Serial Protocol Design
So, you need a serial protocol for sending messages over some kind of bus?
First rule: use something that already exists.
Second rule: don't re-implement if there's code available.
Third rule: steal with pride - use existing patterns, if not whole protocols.
When designing a serial protocol, you need to take care of these issues, among others:
- When does a message begin?
- When does it end?
- Is it corrupted?
- Is it for me?
When does a message begin?
Basically you've got three options:
- Use a special Start Of Message (SOM) byte
- Add a delay between messages
- Assume message begins when data is received
Delay between messages
If you rely on there being a moment of silence between messages, you're limiting the utilization of the bus - and that's rarely what you want.
- Wait until line is idle for X ms
- Start reading message
SOM
A more reasonable choice here is to use a SOM. Any byte will do, at least in theory, but a lot of protocols try to use a byte that contains a lot of bit value transitions - the theory being that the transitions can be used to improve baud rate synchronization. 0xAA and 0x55 are both common choices, since their bit patterns (10101010 and 01010101) flip on every bit.
- Discard data until SOM byte is found
- Start reading message
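The two steps above might look like this in C. It's only a sketch: the SOM value 0xAA and the name find_som are made up for illustration, and the received bytes are assumed to be sitting in a buffer already.

```c
#include <stddef.h>
#include <stdint.h>

#define SOM 0xAA /* assumed start-of-message value, pick your own */

/* Returns the index of the first SOM byte in buf, or len if there is none.
 * Everything before that index is noise, or the tail of a message we
 * joined too late, and can be discarded. */
size_t find_som(const uint8_t *buf, size_t len)
{
    size_t i = 0;
    while (i < len && buf[i] != SOM) {
        i++;
    }
    return i;
}
```

If the function returns len, throw away the whole buffer and wait for more data.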
No start marker
This is a bad idea, at least on a shared bus. Why is it a bad idea to just assume that a message starts when data is received? Imagine that you're about to connect your device to a communication bus where there's already activity going on - how will you be able to start listening between messages, and not in the middle of one?
- Start reading message
When does a message end?
Before you can start acting on a message, you usually need to receive the entire message. And how do you know that there's no more data coming?
We can imagine at least three different methods:
- Use a special End Of Message (EOM) byte
- Encode the length of the message within the message
- Wait until no more data is received within a certain number of milliseconds
EOM
First off, the EOM marker. The logic for reading a message becomes very easy:
- Find start of message
- Read data until an EOM is found
There's a slight problem with this approach: what if your message data contains the SOM or EOM bytes? You'll need to do some kind of data escaping to handle this, e.g. replace the EOM byte within the message with an [ESC, 0] sequence. Then you'll also need to escape the ESC byte, e.g. with [ESC, 1].
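Here's a rough sketch of what that escaping could look like. The marker values and function names are invented for the example, and the [ESC, 2] sequence for a SOM byte inside the data is an extra assumption that follows the same pattern as the two sequences mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* Marker values are assumptions made for this example. */
#define SOM 0xAA
#define EOM 0x55
#define ESC 0x1B

/* Escapes payload bytes that collide with a marker:
 *   EOM -> [ESC, 0], ESC -> [ESC, 1], SOM -> [ESC, 2] (assumed extension).
 * Returns the number of bytes written to out, or 0 if out is too small
 * (an empty payload also returns 0). */
size_t escape_payload(const uint8_t *in, size_t in_len,
                      uint8_t *out, size_t out_cap)
{
    size_t n = 0;
    for (size_t i = 0; i < in_len; i++) {
        uint8_t b = in[i];
        int code = (b == EOM) ? 0 : (b == ESC) ? 1 : (b == SOM) ? 2 : -1;
        if (code < 0) {
            if (n + 1 > out_cap) return 0;
            out[n++] = b;
        } else {
            if (n + 2 > out_cap) return 0;
            out[n++] = ESC;
            out[n++] = (uint8_t)code;
        }
    }
    return n;
}

/* Reverses escape_payload. Returns the decoded length, or 0 on a
 * malformed escape sequence or if out is too small. */
size_t unescape_payload(const uint8_t *in, size_t in_len,
                        uint8_t *out, size_t out_cap)
{
    static const uint8_t original[] = { EOM, ESC, SOM };
    size_t n = 0;
    for (size_t i = 0; i < in_len; i++) {
        uint8_t b = in[i];
        if (b == ESC) {
            /* An ESC must be followed by a known code byte. */
            if (i + 1 >= in_len || in[i + 1] > 2) return 0;
            b = original[in[++i]];
        }
        if (n >= out_cap) return 0;
        out[n++] = b;
    }
    return n;
}
```

Note that the escaped data can be up to twice as long as the original, so size your buffers accordingly.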
Embed length in message
If you don't want to escape your message data, you can add the message size to the message format. Preferably at the start of the message. Something like: [SOM, length, data...]
Reading such a message:
- Find start of message
- Read length
- Read 'length' bytes of data
When encoding the length of the message, decide on what exactly you're measuring the length of. Is it the number of additional bytes to read, or perhaps the full size of the message including SOM (if there is one)?
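A byte-at-a-time reader for the [SOM, length, data...] layout could be sketched like this. It assumes a one-byte length that counts only the data bytes following it, and a SOM value of 0xAA; both are choices made for the example, as are all the names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SOM      0xAA /* assumed marker value */
#define MAX_DATA 255  /* largest value a one-byte length can hold */

enum rx_state { WAIT_SOM, WAIT_LENGTH, READ_DATA };

struct rx_parser {
    enum rx_state state;
    uint8_t data[MAX_DATA];
    uint8_t expected; /* data bytes announced by the length field */
    uint8_t received; /* data bytes stored so far */
};

/* Feed one received byte. Returns true when a complete message is
 * available in p->data (p->expected bytes long). */
bool rx_feed(struct rx_parser *p, uint8_t byte)
{
    switch (p->state) {
    case WAIT_SOM:
        if (byte == SOM) p->state = WAIT_LENGTH; /* else: discard */
        return false;
    case WAIT_LENGTH:
        p->expected = byte;
        p->received = 0;
        if (byte == 0) {          /* empty message is complete at once */
            p->state = WAIT_SOM;
            return true;
        }
        p->state = READ_DATA;
        return false;
    case READ_DATA:
        p->data[p->received++] = byte;
        if (p->received == p->expected) {
            p->state = WAIT_SOM;
            return true;
        }
        return false;
    }
    return false;
}
```

Since the reader counts bytes instead of looking for markers, a SOM value inside the data is stored like any other byte.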
Delay between messages
This is the same as having a delay before message start, so probably not a good idea. Reading a message this way:
- Discard any data until delay between bytes is at least X ms
- Read data until delay between bytes is at least X ms
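For completeness, the gap detection could be sketched as below. The 5 ms threshold, the names, and the idea of passing in a millisecond timestamp with each byte are all assumptions; where the timestamp comes from depends entirely on your platform.

```c
#include <stdbool.h>
#include <stdint.h>

#define GAP_MS 5 /* the "X ms" from the text; the value is an assumption */

struct gap_framer {
    uint32_t last_rx_ms; /* arrival time of the previous byte */
    bool     have_byte;  /* false until the first byte has been seen */
};

/* Call for every received byte with its arrival time in milliseconds.
 * Returns true if this byte starts a new message, i.e. the line was
 * idle for at least GAP_MS before it. The very first byte is never a
 * start, since we can't know how long the line was idle before it. */
bool gap_is_message_start(struct gap_framer *f, uint32_t now_ms)
{
    bool start = f->have_byte &&
                 (uint32_t)(now_ms - f->last_rx_ms) >= GAP_MS;
    f->last_rx_ms = now_ms;
    f->have_byte = true;
    return start;
}
```

This only covers the start of a message. To notice the end without waiting for the next message, you'd also need a timer that fires when GAP_MS has passed with no data, which isn't shown here.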
Is the message corrupted?
After reading what you believe is a message, you need to make sure that it's actually valid before you do any further processing with it. There are a number of ways a message can become corrupted:
- Line interference
- Dropped bytes due to e.g. buffer overflows
- Bugs in message codec functions
Normally you'll calculate a checksum over the bytes of the message, and add the checksum to the data being sent so that the receiver can apply the same checksum to the message data and verify that the calculated and received checksums match.
A simple, but bad, checksum function could be to sum the values of each byte of the message. Unfortunately, such a sum can't detect inserted or dropped zero bytes, or any rearrangement of the data bytes.
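To make that weakness concrete, here's a small sketch of such a summing checksum. The name sum8 and the choice to keep only the low 8 bits are assumptions for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* The naive checksum: add up all bytes and keep the low 8 bits. */
uint8_t sum8(const uint8_t *buf, size_t len)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = (uint8_t)(sum + buf[i]);
    }
    return sum;
}
```

The messages {1, 2, 3}, {3, 2, 1} and {1, 2, 3, 0} all produce the checksum 6, so a receiver can't tell them apart.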
Most protocols use something like CRC8, CRC16 or CRC32. The number in the name signifies the number of bits the resulting checksum contains, and CRC is short for Cyclic Redundancy Check. There's just a slight wrinkle here that always causes a lot of problems: there's not just ONE way of doing a CRC16. There are a myriad of slight variations on the same general approach. So to make things easy for yourself and those reading your protocol specification: include an example implementation in C of the method you've chosen. See Wikipedia for a glimpse into the world of CRCs.
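As an illustration of how small the differences can be, the sketch below implements two variants that are commonly catalogued as CRC-16/CCITT-FALSE and CRC-16/XMODEM. They share the polynomial 0x1021 and differ only in the initial value. The function names are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16 with polynomial 0x1021, no bit reflection and no
 * final XOR. Only the initial value differs between the variants. */
uint16_t crc16_1021(const uint8_t *buf, size_t len, uint16_t init)
{
    uint16_t crc = init;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(buf[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
        }
    }
    return crc;
}

/* CRC-16/CCITT-FALSE: initial value 0xFFFF. */
uint16_t crc16_ccitt_false(const uint8_t *buf, size_t len)
{
    return crc16_1021(buf, len, 0xFFFF);
}

/* CRC-16/XMODEM: initial value 0x0000. */
uint16_t crc16_xmodem(const uint8_t *buf, size_t len)
{
    return crc16_1021(buf, len, 0x0000);
}
```

For the nine ASCII bytes "123456789", the usual test input for CRCs, the first variant gives 0x29B1 and the second gives 0x31C3. Both could reasonably be called "CRC16" in a sloppy specification.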
After you've chosen a checksum method to use, you'll need to decide where in the message you're going to put it. Keeping in mind that it should be easy for recipients to validate a message, it makes sense to put it at the end of the message. Then the recipient can do something like:
bool is_valid = checksum(buffer, len) == buffer[len];
If you put the checksum inside the message, you need to clarify how the checksum is supposed to be calculated. In an IP packet header, for example, the checksum bits are set to 0 while calculating the checksum. You could also skip the checksum bits, but then you'd need to do two calls to your checksum function.
If you're relying on a SOM with no explicit EOM, it's very important to use and validate a checksum. Otherwise you're likely to get confused by random SOM bytes inside messages.
Another thing to decide is which bytes to include when calculating the checksum. Do you include the SOM byte? The address? In short, it's probably best to include the entire message, including the SOM.
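Putting the choices from this section together, a sender and receiver might be sketched as follows. The frame layout [SOM, length, data..., checksum], the SOM value, and the use of a CRC-8 with polynomial 0x07 are all assumptions picked to keep the example short; the checksum sits at the end and covers everything before it, including the SOM.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SOM 0xAA /* assumed marker value */

/* CRC-8 with polynomial 0x07, initial value 0, no reflection. */
uint8_t crc8(const uint8_t *buf, size_t len)
{
    uint8_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 0x80) ? (uint8_t)((crc << 1) ^ 0x07)
                               : (uint8_t)(crc << 1);
        }
    }
    return crc;
}

/* Writes [SOM, length, data..., crc8] into out, where length counts
 * only the data bytes. Returns the frame size, or 0 if out is too small. */
size_t frame_build(const uint8_t *data, uint8_t data_len,
                   uint8_t *out, size_t out_cap)
{
    size_t total = (size_t)data_len + 3;
    if (out_cap < total) return 0;
    out[0] = SOM;
    out[1] = data_len;
    for (size_t i = 0; i < data_len; i++) {
        out[2 + i] = data[i];
    }
    out[total - 1] = crc8(out, total - 1); /* covers SOM, length, data */
    return total;
}

/* Checks the marker, the length field, and the trailing checksum. */
bool frame_is_valid(const uint8_t *frame, size_t frame_len)
{
    if (frame_len < 3 || frame[0] != SOM) return false;
    if ((size_t)frame[1] + 3 != frame_len) return false;
    return crc8(frame, frame_len - 1) == frame[frame_len - 1];
}
```

With the checksum last, validation is a single call over one contiguous range, which is the point made above.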
Is this message meant for me?
Now that you've got a valid message, it's time to decide what to do with it. In many cases you've got some kind of shared bus, where all devices see all the messages but ignore most of them.
First of all, you need to decide how many addresses are needed. If you're wrong here, you'll end up in the mess that is the transition from IPv4 (with 32-bit addresses) to IPv6 (with 128-bit addresses), so make sure that you add some headroom. But not too much, or you're going to waste precious bytes in every message that is sent.
You'll need to look at both your application requirements and your physical limits. If it's physically impossible to connect more than 64 devices to your communication bus, then there's probably little need to support more than 64 addresses. On the other hand you might need to support routing of messages between such busses, and then it might make sense to have more address bits.
In IP packets, both the sender's and the recipient's addresses are placed in the header, but perhaps you don't need to know who the sender is. If you're using a master/slave configuration on a half duplex communication channel, you can probably get away with having just 4 bits of address. But you probably shouldn't get too creative in packing random bits into bytes; ease of implementation is a very important factor to reduce the number of bugs.
There's probably only one bit flag that makes sense in an address byte, and that's an "is reply" bit. Using this scheme, a passive listener can better understand which device is sending data: a message to device N has its address field set to N, and a message from device N has its address field set to N plus the "is reply" bit.
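A sketch of that address scheme, assuming a one-byte address field where the top bit is the "is reply" flag and the remaining 7 bits hold the device number. The bit position and all names are choices made for this example.

```c
#include <stdbool.h>
#include <stdint.h>

#define ADDR_REPLY_BIT 0x80 /* assumed: top bit marks a reply */
#define ADDR_MASK      0x7F /* leaves 7 bits, i.e. 128 device numbers */

/* Address field for a message sent to the given device. */
uint8_t addr_for_request(uint8_t device)
{
    return device & ADDR_MASK;
}

/* Address field for a message sent from the given device. */
uint8_t addr_for_reply(uint8_t device)
{
    return (uint8_t)((device & ADDR_MASK) | ADDR_REPLY_BIT);
}

/* True if the address field belongs to a reply. */
bool addr_is_reply(uint8_t field)
{
    return (field & ADDR_REPLY_BIT) != 0;
}

/* Device number, regardless of message direction. */
uint8_t addr_device(uint8_t field)
{
    return field & ADDR_MASK;
}
```

A passive listener can then log "to device 5" for 0x05 and "from device 5" for 0x85 without keeping any state.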
Additional bits and flags
Many times you'll want to add a couple of flags that signify things about the message. Maybe you want to support optional compression of message contents, or you support multiple checksum methods and need to indicate which one to use.
There's only one recommendation to make here: put all the flags in a separate byte, with the possible exception of the "is reply" flag mentioned above.
Real-world examples
Giving advice is all well and good, but what's used in the real world? Below are a few examples of protocols and a bit about which choices they've made.
Linux terminal port
If you're a bit old-school, or if you're running headless servers, you've probably got a tty set up so you can connect an RS232 cable and get access to a shell.
This is also a serial protocol, in a way. It's crude, but it works. Most of the time.
- When does a message begin?
There's no SOM marker, we just assume that whatever we get first is the start of a message. This is a problem if you don't enter a linefeed before disconnecting your cable; the next time you connect there'll be a half-written command already in the shell's input buffer that you don't see. Imagine typing "rm -rf / " and then disconnecting the cable… It's probably a good habit to always start with a ctrl-C before typing your command after you've just connected the cable.
- When does it end?
EOM marker: linefeed.
- Is it corrupted?
No validation is performed.
- Is it for me?
There is only one recipient, so no addressing is needed.
Dial-up internet connections
In the beginning, there was SLIP. Ok, maybe not in the beginning, but early! SLIP is basically [IP packet, EOM marker]. Since EOM might appear inside the IP packet, there's data escaping to handle this.
This is simple, but apparently it was a bit user-unfriendly. So to replace it we got PPP, with a more robust low-level protocol that also does things such as handling multiple protocol payloads, checksums and more.
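The sending side of SLIP is small enough to sketch here. The four special byte values are the ones defined in RFC 1055; the function name, signature and buffer handling are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Special byte values defined by RFC 1055. */
#define SLIP_END     0xC0 /* the EOM marker */
#define SLIP_ESC     0xDB
#define SLIP_ESC_END 0xDC /* ESC, ESC_END stands for an END in the data */
#define SLIP_ESC_ESC 0xDD /* ESC, ESC_ESC stands for an ESC in the data */

/* Encodes one packet as [escaped data..., END].
 * Returns the number of bytes written, or 0 if out is too small. */
size_t slip_encode(const uint8_t *pkt, size_t len,
                   uint8_t *out, size_t out_cap)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (pkt[i] == SLIP_END || pkt[i] == SLIP_ESC) {
            if (n + 2 > out_cap) return 0;
            out[n++] = SLIP_ESC;
            out[n++] = (pkt[i] == SLIP_END) ? SLIP_ESC_END : SLIP_ESC_ESC;
        } else {
            if (n + 1 > out_cap) return 0;
            out[n++] = pkt[i];
        }
    }
    if (n + 1 > out_cap) return 0;
    out[n++] = SLIP_END;
    return n;
}
```

The packet {0x01, 0xC0, 0xDB} goes out on the wire as {0x01, 0xDB, 0xDC, 0xDB, 0xDD, 0xC0}.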
OSDP
This is a protocol that assumes the following:
- There's a shared bus
- There's a master device that controls all communications
- There can be a limited number of devices connected to the bus
An OSDP message has the following layout:
┏━━━━━━━━━━━━━━━┓
┃ 0   SOM       ┃
┣━━━━━━━━━━━━━━━┫
┃ 1   Address   ┃
┣━━━━━━━━━━━━━━━┫
┃ 2-3 Length    ┃
┃               ┃
┣━━━━━━━━━━━━━━━┫
┃ 4   Flags     ┃
┣───────────────┫
│ 5+  Sec block │
│     (optional)│
┣━━━━━━━━━━━━━━━┫
┃ ?   Command   ┃
┣───────────────┫
│ ?   Data      │
│     (optional)│
┣━━━━━━━━━━━━━━━┫
┃ ?-? Checksum  ┃
┃               ┃
┗━━━━━━━━━━━━━━━┛
A slight aside
The diagram above was drawn using Monodraw, and hopefully your fonts support all the characters used. If they don't, it'll probably render misaligned. Oops!
Decoding an OSDP message is pretty straightforward, but it would have been nice if the Command byte had been placed before the optional secure block. Except for that, it's the opinion of the author that this is a robust protocol design.
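As a rough sketch, the fixed part of the header (offsets 0 to 4 in the diagram) could be decoded like this. The struct and function names are invented, and the code assumes that the two length bytes are stored least significant byte first; check the OSDP specification before relying on either the byte order or what exactly the length counts.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define OSDP_HEADER_SIZE 5 /* offsets 0..4 in the diagram */

struct osdp_header {
    uint8_t  som;
    uint8_t  address;
    uint16_t length;
    uint8_t  flags;
};

/* Decodes the fixed header fields from the start of buf.
 * Assumption: length is little-endian (low byte at offset 2).
 * Returns false if fewer than OSDP_HEADER_SIZE bytes are available. */
bool osdp_read_header(const uint8_t *buf, size_t len,
                      struct osdp_header *h)
{
    if (len < OSDP_HEADER_SIZE) return false;
    h->som     = buf[0];
    h->address = buf[1];
    h->length  = (uint16_t)(buf[2] | (buf[3] << 8));
    h->flags   = buf[4];
    return true;
}
```

Everything after the flags byte depends on the flags and the length, which is why the remaining offsets in the diagram are question marks.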
File formats
It can be argued that file formats share a lot of characteristics with serial protocols. A simple albeit contrived example is JSON, which is terribly fashionable at the moment.
A JSON object consists of a starting '{' character, a number of optional additional bytes, and an ending '}' character. There's no checksum or addressing, but in addition to the SOM and EOM bytes there's a number of rules determining valid sequences of data bytes.