Framing data (Part 4/5) 🐀
Paul Tagliamonte 2021-12-05 diy
In the last post, we were able to build a functioning Layer 1 PHY where we can encode symbols to transmit, and receive symbols on the other end. We’re now at the point where we can encode and decode those symbols as bits and frame blocks of data, marking them with a Sender and a Destination for routing to the right host(s). This is a “Layer 2” scheme in the OSI model, otherwise known as the Data Link Layer. You’re using one to view this website right now – I’m willing to bet your data is going through an Ethernet layer 2 as well as WiFi or maybe a cellular data protocol like 5G or LTE.
Given that this entire exercise is hard enough without designing a complex
Layer 2 scheme, I opted for simplicity in the hopes this would free me from the
complexity and research that has gone into this field for the last 50 years. I
settled on stealing a few ideas from Ethernet Frames – namely, the use of MAC
addresses to identify parties, and the EtherType field to indicate the Payload
type. I also stole the idea of using a CRC at the end of the Frame to check for
corruption, as well as the specific CRC method (crc32 using 0xedb88320 as the
polynomial).
Lastly, I added a callsign field to make life easier on ham radio frequencies
if I were ever to seriously attempt to use a variant of this protocol over the
air with multiple users. However, given this scheme is not a commonly used
scheme, it’s best practice to use a nearby radio to identify your transmissions
on the same frequency while testing – or use a Faraday box to test without
transmitting over the airwaves. I added the callsign field in an effort to lean
into the spirit of the Part 97 regulations, even if I relied on a phone
emission to identify the Frames.
As an aside, I asked the ARRL for input here, and their stance to me over email was I’d be OK according to the regs if I were to stick to UHF and put my callsign into the BPSK stream using a widely understood encoding (even with no knowledge of PACKRAT, the callsign is ASCII over BPSK and should be easily demodulatable for followup with me). Even with all this, I opted to use FM phone to transmit my callsign when I was active on the air (specifically, using an SDR and a small bash script to automate transmission while I watched for interference or other band users).
Right, back to the Frame:
With all that done, I put that layout into a struct, so that we can marshal and unmarshal bytes to and from our Frame objects, and work with it in software.
type FrameType [2]byte
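type Frame struct {
	Destination net.HardwareAddr // MAC address of the receiving party
	Source      net.HardwareAddr // MAC address of the sending party
	Callsign    [8]byte          // ASCII callsign of the transmitting station
	Type        FrameType        // what the Payload holds, like an EtherType
	Payload     []byte
	CRC         uint32 // crc32 at the end of the Frame
}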
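To make the marshaling step concrete, here is a minimal sketch of turning a Frame into bytes. The post does not spell out the wire layout, so this assumes the fields are written in struct order, that both addresses are 6 bytes, and that the CRC covers everything before it and is appended big-endian. `MarshalBinary` and `crcOK` are hypothetical helper names, and the callsign is a placeholder. The one thing taken directly from the text is the CRC: Go's `crc32.ChecksumIEEE` uses the 0xedb88320 polynomial.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
	"net"
)

type FrameType [2]byte

type Frame struct {
	Destination net.HardwareAddr
	Source      net.HardwareAddr
	Callsign    [8]byte
	Type        FrameType
	Payload     []byte
	CRC         uint32
}

// MarshalBinary is a hypothetical helper. It assumes fields go on the wire
// in struct order and that a freshly computed CRC is appended big-endian;
// the stored f.CRC field is ignored.
func (f Frame) MarshalBinary() ([]byte, error) {
	if len(f.Destination) != 6 || len(f.Source) != 6 {
		return nil, errors.New("frame: addresses must be 6 bytes")
	}
	buf := make([]byte, 0, 6+6+8+2+len(f.Payload)+4)
	buf = append(buf, f.Destination...)
	buf = append(buf, f.Source...)
	buf = append(buf, f.Callsign[:]...)
	buf = append(buf, f.Type[:]...)
	buf = append(buf, f.Payload...)

	// ChecksumIEEE uses the 0xedb88320 polynomial named in the post.
	var tail [4]byte
	binary.BigEndian.PutUint32(tail[:], crc32.ChecksumIEEE(buf))
	return append(buf, tail[:]...), nil
}

// crcOK reports whether the last four bytes of b match the CRC of the rest.
func crcOK(b []byte) bool {
	if len(b) < 4 {
		return false
	}
	body, tail := b[:len(b)-4], b[len(b)-4:]
	return crc32.ChecksumIEEE(body) == binary.BigEndian.Uint32(tail)
}

func main() {
	f := Frame{
		Destination: net.HardwareAddr{0xff, 0xff, 0xff, 0xff, 0xff, 0xff},
		Source:      net.HardwareAddr{0x02, 0x00, 0x00, 0x00, 0x00, 0x01},
		Type:        FrameType{0, 1},
		Payload:     []byte("hi"),
	}
	copy(f.Callsign[:], "N0CALL") // placeholder callsign, not a real station

	out, err := f.MarshalBinary()
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d bytes, crc ok: %v\n", len(out), crcOK(out))
}
```

Flipping any bit in the marshaled bytes should make `crcOK` return false, which is the corruption check the trailing CRC is there for.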
Time to pick some consts
I picked a unique and distinctive sync sequence, which the sender will transmit
before the Frame, while the receiver listens for that sequence to know when it’s
in byte alignment with the symbol stream. My sync sequence is
[3]byte{'U', 'f', '~'}, which works out to be a very pleasant bit sequence of
01010101 01100110 01111110. It’s important to have soothing preambles for your
Frames. We need all the good energy we can get at this point.
var (
	FrameStart          = [3]byte{'U', 'f', '~'}
	FrameMaxPayloadSize = 1500
)
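As a quick check on that bit pattern, the sketch below prints the sync bytes as bits and expands them into BPSK symbols. The bits are taken MSB first, and the mapping of 1 to +1 and 0 to -1 is an assumption made for illustration. `bitString` and `toSymbols` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"strings"
)

var FrameStart = [3]byte{'U', 'f', '~'}

// bitString renders each byte as eight binary digits, MSB first.
func bitString(b []byte) string {
	parts := make([]string, len(b))
	for i, v := range b {
		parts[i] = fmt.Sprintf("%08b", v)
	}
	return strings.Join(parts, " ")
}

// toSymbols expands bytes into one symbol per bit, MSB first.
// Assumed mapping: bit 1 becomes +1, bit 0 becomes -1.
func toSymbols(b []byte) []float32 {
	out := make([]float32, 0, len(b)*8)
	for _, v := range b {
		for i := 7; i >= 0; i-- {
			if (v>>uint(i))&1 == 1 {
				out = append(out, 1)
			} else {
				out = append(out, -1)
			}
		}
	}
	return out
}

func main() {
	fmt.Println(bitString(FrameStart[:])) // 01010101 01100110 01111110
	fmt.Println(len(toSymbols(FrameStart[:])), "symbols")
}
```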
Next, I defined some FrameType values for the type field, which I can use to
determine what is done with that data next, something Ethernet was originally
missing, but has since grown to depend on (who needs Length anyway? Not me. See
below!)
| FrameType | Description | Bytes |
| --- | --- | --- |
| Raw | Bytes in the Payload field are opaque and not to be parsed. | [2]byte{0x00, 0x01} |
| IPv4 | Bytes in the Payload field are an IPv4 packet. | [2]byte{0x00, 0x02} |
And finally, I decided to limit the Payload to a maximum of 1500 bytes (the FrameMaxPayloadSize above), to align with the MTU of Ethernet.
var (
	FrameTypeRaw  = FrameType{0, 1}
	FrameTypeIPv4 = FrameType{0, 2}
)
Given we know how we’re going to marshal and unmarshal binary data to and from Frames, we can now move on to looking through the bit stream for our Frames.
Why is there no Length field?
I was initially a bit surprised that Ethernet Frames didn’t have a Length field in use, but the more I thought about it, the more it seemed like a big ole' failure mode without a good implementation outcome. Either the Length is right (resulting in no action and bits spent on every packet) or the Length is not the length of the Payload and the driver needs to determine what to do with the packet – does it try and trim the overlong payload and ignore the rest? What if both the end of the read bytes and the end of the subset of the packet denoted by Length have a valid CRC? Which is used? Will everyone agree? What if Length is longer than the Payload but the CRC is good where we detected a lost carrier?
I decided on simplicity. The end of a Frame is denoted by the loss of the BPSK carrier – when the signal is no longer being transmitted (or more correctly, when the signal is no longer received), we know we’ve hit the end of a packet. Missing a single symbol will result in the Frame being finalized. This can cause some degree of corruption, but it’s also a lot easier than doing tricks like bit stuffing to create an end of symbol stream delimiter.
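With no Length field, the Payload is simply whatever sits between the fixed-size header and the last four bytes received before the carrier dropped. Here is a minimal sketch of that parse, under the same assumed wire layout as the struct order (6-byte addresses, 8-byte callsign, 2-byte type, then payload, then a big-endian CRC). `parseFrame` and `seal` are hypothetical names, and `seal` only stands in for the transmit side so the example can run on its own.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
	"net"
)

const (
	headerLen  = 6 + 6 + 8 + 2 // two addresses, callsign, type (assumed layout)
	trailerLen = 4             // CRC32
	maxPayload = 1500
)

type FrameType [2]byte

type Frame struct {
	Destination net.HardwareAddr
	Source      net.HardwareAddr
	Callsign    [8]byte
	Type        FrameType
	Payload     []byte
	CRC         uint32
}

// parseFrame takes every byte received between the sync sequence and the
// loss of carrier. The payload length is inferred, never read from the wire.
func parseFrame(raw []byte) (*Frame, error) {
	if len(raw) < headerLen+trailerLen {
		return nil, errors.New("frame: too short")
	}
	body, tail := raw[:len(raw)-trailerLen], raw[len(raw)-trailerLen:]
	if len(body)-headerLen > maxPayload {
		return nil, errors.New("frame: payload too long")
	}
	sum := binary.BigEndian.Uint32(tail)
	if crc32.ChecksumIEEE(body) != sum {
		return nil, errors.New("frame: CRC mismatch")
	}
	f := &Frame{
		Destination: append(net.HardwareAddr(nil), body[0:6]...),
		Source:      append(net.HardwareAddr(nil), body[6:12]...),
		Payload:     append([]byte(nil), body[headerLen:]...),
		CRC:         sum,
	}
	copy(f.Callsign[:], body[12:20])
	copy(f.Type[:], body[20:22])
	return f, nil
}

// seal appends a CRC to body, standing in for the transmit side.
func seal(body []byte) []byte {
	var tail [4]byte
	binary.BigEndian.PutUint32(tail[:], crc32.ChecksumIEEE(body))
	return append(append([]byte(nil), body...), tail[:]...)
}

func main() {
	raw := seal(append(make([]byte, headerLen), "hello"...))

	f, err := parseFrame(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("payload: %q\n", f.Payload)

	// Losing the final byte shifts where the CRC is read from.
	_, err = parseFrame(raw[:len(raw)-1])
	fmt.Println("truncated:", err)
}
```

The truncated case shows the trade-off described above: a Frame that ends early is not trimmed or repaired, it just fails the CRC and gets dropped.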
Finding the Frame start in a Symbol Stream
First thing we need to do is find our sync bit pattern in the symbols we’re
receiving from our BPSK demodulator. There are some smart ways to do this, but
given that I’m not much of a smart man, I again decided to go for simple
instead. Given our incoming vector of symbols (which are still float values),
shift them one at a time into a vector of floats that is the same length as the
sync phrase, and compare against the sync phrase, to determine if we’re in sync
with the byte boundary within the symbol stream.
The only trick here is that because we’re using BPSK to modulate and demodulate
the data, post phaselock we can be 180 degrees out of alignment (such that a +1
is demodulated as -1, or vice versa). To deal with that, I check against both
the sync phrase as well as the inverse of the sync phrase (both [1, -1, 1] as
well as [-1, 1, -1]), where if the inverse sync is matched, all symbols to
follow will be inverted as well. This effectively turns our symbols back into
bits, even if we’re flipped out of phase. Other techniques like NRZI will
represent a 0 or 1 by a change in phase state – which is great, but can often
cascade into long runs of bit errors, and is generally more complex to
implement. That representation isn’t ambiguous, given you look for a phase
change, not the absolute phase value, which is incredibly compelling.
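The two-sided check can be sketched as below. This is an illustration under assumptions, not necessarily how PACKRAT does it: symbols are compared by sign only (a hard decision), and the search slices a window out of the symbol stream rather than shifting a buffer. `matchSync` and `findSync` are hypothetical names, and the three-symbol pattern is the short stand-in from the text, not the real 24-symbol sync sequence.

```go
package main

import "fmt"

// matchSync compares a window of soft symbols against the sync pattern by
// sign. It reports whether the window matched, and whether the match was
// against the inverted pattern (180 degrees out of phase).
func matchSync(window, sync []float32) (matched, inverted bool) {
	if len(sync) == 0 || len(window) != len(sync) {
		return false, false
	}
	same, flipped := true, true
	for i := range sync {
		if (window[i] > 0) == (sync[i] > 0) {
			flipped = false
		} else {
			same = false
		}
	}
	switch {
	case same:
		return true, false
	case flipped:
		return true, true
	}
	return false, false
}

// findSync returns the index of the first symbol after the sync pattern,
// along with whether every following symbol needs its sign flipped.
func findSync(symbols, sync []float32) (next int, inverted, ok bool) {
	for i := 0; i+len(sync) <= len(symbols); i++ {
		if m, inv := matchSync(symbols[i:i+len(sync)], sync); m {
			return i + len(sync), inv, true
		}
	}
	return 0, false, false
}

func main() {
	sync := []float32{1, -1, 1} // short stand-in pattern from the text

	// Two noise symbols, then the sync pattern received out of phase.
	rx := []float32{-0.8, -0.9, -1.1, 0.7, -0.9, 1.0, 1.0}
	next, inverted, ok := findSync(rx, sync)
	fmt.Println(next, inverted, ok) // 5 true true
}
```

A pattern this short will match noise all the time, which is one reason the real sync sequence is three bytes long.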
Here’s a notional example of how I’ve been thinking about the phrase sliding window – and how I’ve been thinking of the checks. Each row is a new symbol taken from the BPSK receiver, and pushed to the head of the sliding window, moving all symbols back in the vector by one.
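var (
	sync            = []float{ ... }
	buf             = make([]float, len(sync))
	incomingSymbols = []float{ ... }
)

for _, el := range incomingSymbols {
	// Shift the window down by one and put the newest symbol at the end.
	copy(buf, buf[1:])
	buf[len(buf)-1] = el
	// compare stands in for the check against the sync phrase.
	if compare(sync, buf) {
		// we're synced!
		break
	}
}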
Given the pseudocode above, let’s step through what the checks would be doing at each step:
| Buffer | Sync | Inverse Sync |
| --- | --- | --- |
| […]float{0,…,0} | ❌ […]float{-1,…,-1} | ❌ […]float{1,…,1} |
| […]float{0,…,1} | ❌ […]float{-1,…,-1} | ❌ […]float{1,…,1} |
| [more bits in] | ❌ […]float{-1,…,-1} | ❌ […]float{1,…,1} |
| […]float{1,…,1} | ❌ […]float{-1,…,-1} | ✅ […]float{1,…,1} |
After this notional set of comparisons, we know that at the last step, we are now aligned to the frame and byte boundary – the next symbol / bit will be the MSB of the 0th Frame byte. Additionally, we know we’re also 180 degrees out of phase, so we need to flip the symbol’s sign to get the bit. From this point on we can consume 8 bits at a time, and re-assemble the byte stream. I don’t know what this technique is called – or even if this is used in real grown-up implementations, but it’s been working for my toy implementation.
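Once aligned, turning symbols back into bytes is a small loop. The sketch below assumes a positive symbol means bit 1 (after any phase flip) and packs bits MSB first, as described above. `symbolsToBytes` is a hypothetical name, and leftover symbols that do not fill a whole byte are dropped.

```go
package main

import "fmt"

// symbolsToBytes makes a hard decision on each soft symbol and packs the
// resulting bits into bytes, MSB first. If inverted is true, the sign of
// each symbol is flipped first to undo a 180 degree phase error.
func symbolsToBytes(symbols []float32, inverted bool) []byte {
	out := make([]byte, 0, len(symbols)/8)
	for i := 0; i+8 <= len(symbols); i += 8 {
		var b byte
		for _, s := range symbols[i : i+8] {
			if inverted {
				s = -s
			}
			b <<= 1
			if s > 0 { // assumed mapping: positive symbol is bit 1
				b |= 1
			}
		}
		out = append(out, b)
	}
	return out
}

func main() {
	// 'U' is 01010101; here it arrives 180 degrees out of phase.
	flipped := []float32{1, -1, 1, -1, 1, -1, 1, -1}
	fmt.Printf("%q\n", symbolsToBytes(flipped, true)) // "U"
}
```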
Next Steps
Now that we can read/write Frames to and from PACKRAT, the next steps here are going to be implementing code to encode and decode Ethernet traffic into PACKRAT, coming next in Part 5!