Perfecting Protocol Parsing (Probably) with eBPF
I recently had a little bit of time to kill and decided to see if I could actually do some parsing of other protocols with eBPF. My previous post was about HTTP, and whilst it’s an important application protocol to be able to read and potentially manipulate, it felt like there was only so much that could be done. Webpages are highly dynamic and can contain large amounts of data, qualities that don’t make them the best thing to try and parse with eBPF.
So my next attempt was to see how difficult it would be to parse something a bit spicier 🌶️! I recently wrote a basic parser for BGP messages. It was originally designed to parse just the first bit of data, to understand the different message types and give some insight into what is occurring when BGP peers send info back and forth. It evolved a little over the weekend and now understands peering information and, as of just before I decided to write this, can manipulate the data between peers (without the BGP software being aware).
Code is available here
So to begin we will need to do what we always do when we have some network data (the socket buffer, skb) in eBPF, which is to check it’s Ethernet->IP->TCP/UDP and strip off the headers once we are looking at the correct traffic. This is covered in the previous two eBPF posts, and is in all of the example code, so I won’t triplicate the code here. With all of these headers removed (I say removed, we just move the pointer past them, a bit like the needle on a record player) we’re left with the data portion. We now need to convert this raw data into a format that matches the protocol itself, so let’s start there!
Protocols
A lot of these protocols are pretty old, and are detailed in documents called Request for Comments, or RFCs. These documents, put together by experts in the field, largely define the architecture of a protocol. A good example, which I used in order to parse HTTP, is this one, and you can see that it was originally authored in 1999.
So let’s get to the crux of it: if you’ve been working with JSON/YAML/XML etc. or anything else that is obviously structured, then abandon hope all ye who enter 😂 Almost every protocol has its own unique way of structuring data, and some are cleaner than others. To begin with BGP seemed pretty straightforward…
To begin with we’ll need the RFC document for the BGP standard (RFC 4271). Quickly reading through this, we can see that every BGP message starts with the same “fixed size” header:
| 1 | 0 1 2 3 | 
(the diagrams in RFCs are a tad confusing, however the descriptions are a bit clearer)
Simply put, the marker should be 16 octets (aka 16 bytes), the length should be 2 octets (2 bytes or 16 bits) and the type is 1 octet (1 byte or 8 bits). With this information we can create a structure to put the raw data in that will allow us to shape it into the BGP message header.
| 1 | struct bgp_message { | 
The joys of padding
If we look at our struct above we can see marker is 16 bytes, length is 2 and type is 1, giving us a grand total of (drum roll 🥁) … 19 bytes. So why oh why, when we do a sizeof(struct bgp_message), do we end up with 20 bytes 🤯 This was specifically an issue with the BGP keep alive messages, which consist of just a BGP message header (type set to 4): I would attempt to read the header (expecting it to be 19 bytes) but the code was asking for 20, which was obviously 1 byte too many and caused the load_bytes function to fail.
So after some annoying failed attempts to copy 20 bytes out of 19, I realised that my bgp_message struct was probably being padded. Padding is the process of adding some additional bytes to a struct to make it more efficient for the CPU to load and store the data. More detail is available here. In most cases it’s not a problem, however we need everything to line up exactly with the bytes on the wire, so to set packing per byte we can add #pragma pack(1) (which effectively disables padding). Now our struct is the correct size and we will be able to retrieve data from the skb without causing any errors.
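To make the padding problem concrete, here is a minimal userspace sketch (not the code from the repository). The field names marker, len and type are assumptions based on how the struct is used in this post, and plain stdint types stand in for the kernel's __u8/__u16. The two helper functions are hypothetical and exist only so the sizes can be inspected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Naturally aligned version: the 2-byte len field gives the struct an
 * alignment of 2, so compilers typically round its size up to 20. */
struct bgp_message_padded {
    uint8_t  marker[16];
    uint16_t len;
    uint8_t  type;
};

/* Packed version: members are laid out back to back with no padding. */
#pragma pack(push, 1)
struct bgp_message {
    uint8_t  marker[16]; /* 16 octets */
    uint16_t len;        /* 2 octets, network byte order on the wire */
    uint8_t  type;       /* 1 octet */
};
#pragma pack(pop)

/* The packed struct must match the 19-byte header described in the RFC. */
static_assert(sizeof(struct bgp_message) == 19, "BGP header must be 19 bytes");

/* Hypothetical helpers, only here so the two sizes can be compared. */
static size_t padded_size(void) { return sizeof(struct bgp_message_padded); }
static size_t packed_size(void) { return sizeof(struct bgp_message); }
```

The push/pop form is used here so that the packing only applies to this one struct; a bare #pragma pack(1) as described above stays in effect for everything that follows it in the file.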
Getting data from the skb
So we should have a variable that points to the location in the skb where the data lives, which is just after the frame/IP/TCP headers have ended. In my code it’s usually called poffset. We will create a variable called bgpm that we will populate with the bytes from the skb using the bpf_skb_load_bytes function.
| 1 | struct bgp_message bgpm; | 
(We can see that the sizeof(bgpm), with the padding enabled was causing this to fail as there were only 19 bytes left in the skb and we were trying to load 20 🙄)
Once we have the header, we need to move our poffset so that we point to whatever exists after the header 
| 1 | poffset += sizeof(bgpm); // remove header | 
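The load-then-advance step can be tried outside the kernel with a small stand-in. This is a sketch under assumptions: fake_skb_load_bytes is a hypothetical userspace substitute for bpf_skb_load_bytes that only mimics the "fail if the requested range runs past the end of the packet" behaviour, and read_bgp_header is an illustrative name rather than a function from the original code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#pragma pack(push, 1)
struct bgp_message {
    uint8_t  marker[16];
    uint16_t len;
    uint8_t  type;
};
#pragma pack(pop)

static_assert(sizeof(struct bgp_message) == 19, "BGP header must be 19 bytes");

/* Hypothetical stand-in for bpf_skb_load_bytes(): copies len bytes starting
 * at offset into `to`, or returns -1 if the range is not inside the packet. */
static int fake_skb_load_bytes(const uint8_t *pkt, uint32_t pkt_len,
                               uint32_t offset, void *to, uint32_t len)
{
    if (offset > pkt_len || len > pkt_len - offset)
        return -1;
    memcpy(to, pkt + offset, len);
    return 0;
}

/* Loads the BGP header at poffset and returns the offset just past it,
 * or -1 if there are not enough bytes left in the packet. */
static int read_bgp_header(const uint8_t *pkt, uint32_t pkt_len,
                           uint32_t poffset, struct bgp_message *bgpm)
{
    if (fake_skb_load_bytes(pkt, pkt_len, poffset, bgpm, sizeof(*bgpm)) < 0)
        return -1;
    return (int)(poffset + sizeof(*bgpm)); /* "remove" the header */
}
```

With a 19-byte KEEPALIVE as the whole payload, asking for sizeof(*bgpm) bytes succeeds with the packed struct, whereas asking for 20 bytes is rejected, which is the failure described above.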
Understanding application data
We have successfully parsed the header, so we can now use this information to start to understand what the remaining data is. With BGP, the message type and the length of the remaining data are key. The bgpm.type will be one of the following values:
1 - OPEN
2 - UPDATE
3 - NOTIFICATION
4 - KEEPALIVE
The bgpm.len represents how much data exists in the message (including the header), so to determine how much “remaining” data is left we would calculate remaining = bgpm.len - sizeof(bgpm) (after converting the length from network byte order). Given a KEEPALIVE message is just the header, this should return 0. However, other message types often come with additional data!
So let’s parse the header, and then we’ll look at the UPDATE message in further detail!
| 1 | #define BGP_OPEN 1 | 
(As every message comes through the kernel we parse the header and then process the remainder of the data)
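As a rough illustration of the header handling, the following userspace sketch computes the remaining length and names the message type. BGP_OPEN comes from the post; the other constant names, bgp_remaining and bgp_type_name are assumptions made for this example. It uses ntohs from arpa/inet.h, where an eBPF program would use bpf_ntohs.

```c
#include <arpa/inet.h> /* ntohs, htons */
#include <assert.h>
#include <stdint.h>

#define BGP_OPEN         1
#define BGP_UPDATE       2
#define BGP_NOTIFICATION 3
#define BGP_KEEPALIVE    4

#pragma pack(push, 1)
struct bgp_message {
    uint8_t  marker[16];
    uint16_t len;  /* total message length, header included, big-endian */
    uint8_t  type;
};
#pragma pack(pop)

static_assert(sizeof(struct bgp_message) == 19, "BGP header must be 19 bytes");

/* Bytes that follow the header, or -1 if the length field is too small
 * to even cover the header itself. */
static int bgp_remaining(const struct bgp_message *bgpm)
{
    uint16_t total = ntohs(bgpm->len);
    if (total < sizeof(*bgpm))
        return -1;
    return (int)(total - sizeof(*bgpm));
}

/* Human-readable name for the type field. */
static const char *bgp_type_name(uint8_t type)
{
    switch (type) {
    case BGP_OPEN:         return "OPEN";
    case BGP_UPDATE:       return "UPDATE";
    case BGP_NOTIFICATION: return "NOTIFICATION";
    case BGP_KEEPALIVE:    return "KEEPALIVE";
    default:               return "UNKNOWN";
    }
}
```

Checking that the length is at least the header size before subtracting avoids an underflow on a malformed message.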
The UPDATE message is (personally) pretty bonkers:
| 1 | +-----------------------------------------------------+ | 
As mentioned, if you’ve been writing/parsing JSON or higher level data structures then arrays etc. are pretty simple. With these older structures we need to do various bits of logic to work out how many entries sit inside each field that is marked as variable.
The Withdrawn Routes field is straightforward enough, but the Path Attributes are mind boggling…
Without screaming into the void too much, we’re given {x} amount of bytes as the Path Attributes and we would need to do the following:
- Read the first 3 bytes to get the flags/type/len.
- Then, dependent on the type, read another random sized number of bytes, as each path attribute contains a different amount of data.
- We can load that len into another specific path attribute struct and read that particular data.
- Move the data pointer forward by the size of the Path Attribute “header” plus the length of the remaining data (len).
- Once we’ve done all that, move the poffset by the size of the Total Path Attributes Length so we can read the NLRI data.
- Sip a large glass of whisky
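The steps above can be sketched in userspace as a bounded walk over the Path Attributes bytes. This is an illustration rather than the code from the repository: find_path_attr and MAX_PATH_ATTRS are hypothetical names, and the loop is capped at a fixed count because eBPF programs need loops with a known upper bound. It also handles the Extended Length flag (0x10) from the RFC, where the length field is 2 bytes instead of 1, so the attribute header can be 4 bytes rather than 3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BGP_ATTR_FLAG_EXT_LEN 0x10 /* length field is 2 octets, not 1 */
#define BGP_ATTR_AS_PATH      2
#define MAX_PATH_ATTRS        16   /* arbitrary cap to keep the loop bounded */

/* Walks the Path Attributes block and returns the offset of the value of the
 * first attribute whose type code matches `wanted`, storing the value length
 * in *value_len. Returns -1 if it is not found or the data is truncated. */
static int find_path_attr(const uint8_t *attrs, uint16_t total_len,
                          uint8_t wanted, uint16_t *value_len)
{
    uint32_t off = 0;

    assert(attrs != NULL && value_len != NULL);

    for (int i = 0; i < MAX_PATH_ATTRS; i++) {
        if (off + 3 > total_len)
            return -1;                       /* no room for flags/type/len */

        uint8_t  flags = attrs[off];
        uint8_t  type  = attrs[off + 1];
        uint16_t len   = attrs[off + 2];
        uint32_t hdr   = 3;

        if (flags & BGP_ATTR_FLAG_EXT_LEN) { /* 2-octet length variant */
            if (off + 4 > total_len)
                return -1;
            len = (uint16_t)((attrs[off + 2] << 8) | attrs[off + 3]);
            hdr = 4;
        }

        if (off + hdr + len > total_len)
            return -1;                       /* value runs past the block */

        if (type == wanted) {
            *value_len = len;
            return (int)(off + hdr);
        }

        off += hdr + len;                    /* skip header and value */
    }
    return -1;
}
```

Once the walk is finished, the caller would still advance poffset by the whole Total Path Attributes Length to reach the NLRI data, as in the last parsing step above.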
Modify the BGP data
So whilst everything detailed above is great for gaining insight into what is happening from a BGP perspective, perhaps we may want to impose some changes to the BGP data as it’s flowing through! For this example we will change the AS number of a new route as it’s being pushed out to a ToR switch. In order to do this we will need to look for the Path Attribute with the type of 2 known as the AS_PATH detailed here.
| 1 | struct bgp_path_as { | 
(Here is the format defined as a C struct)
At this point we’ve gone through each of the Path Attributes, found the one with type 2/AS_PATH and pulled it from the skb, and we now want to change it to a different AS number.
| 1 | bgp_as.as = bpf_htonl(65002); | 
(NOTE: pathOffset points to the position just after the path attribute header of the AS_PATH entry.)
Here we can use bpf_skb_store_bytes to write an updated bgp_as that has our changed AS number. This helper also accepts the flag BPF_F_RECOMPUTE_CSUM, which takes care of fixing the checksum after the underlying data has changed.
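For illustration, here is a userspace sketch of the load, modify and store sequence. The struct name bgp_path_as and the as field come from the post, but the type and length field names, the assumption of 4-octet AS numbers, and the function rewrite_first_as are all hypothetical. memcpy stands in for bpf_skb_load_bytes and bpf_skb_store_bytes, so no checksum handling happens in this sketch.

```c
#include <arpa/inet.h> /* htonl */
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout of an AS_PATH segment carrying a single 4-octet AS. */
#pragma pack(push, 1)
struct bgp_path_as {
    uint8_t  type;   /* segment type, e.g. 2 = AS_SEQUENCE */
    uint8_t  length; /* number of AS numbers in the segment */
    uint32_t as;     /* first AS number, network byte order */
};
#pragma pack(pop)

static_assert(sizeof(struct bgp_path_as) == 6, "segment must be 6 bytes");

/* Overwrites the first AS number of the segment at pathOffset.
 * Returns 0 on success, -1 if the segment is out of range or empty. */
static int rewrite_first_as(uint8_t *pkt, uint32_t pkt_len,
                            uint32_t pathOffset, uint32_t new_as)
{
    struct bgp_path_as bgp_as;

    if (pathOffset > pkt_len || sizeof(bgp_as) > pkt_len - pathOffset)
        return -1;

    /* Stand-in for bpf_skb_load_bytes(). */
    memcpy(&bgp_as, pkt + pathOffset, sizeof(bgp_as));
    if (bgp_as.length < 1)
        return -1;

    bgp_as.as = htonl(new_as); /* host to network byte order */

    /* Stand-in for bpf_skb_store_bytes(). */
    memcpy(pkt + pathOffset, &bgp_as, sizeof(bgp_as));
    return 0;
}
```

Because the new AS number occupies the same 4 bytes as the old one, the message length fields do not need to change, which is what makes this kind of in-place edit practical.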
NOTE: You should notice that where we’re assigning the new AS 65002 we’re wrapping it with the function bpf_htonl, which converts a long from host to network byte order. Simply put, numbers used in networking are stored big-endian (most significant byte first), which on most common CPUs is the opposite byte order to the one the host uses. You can read more about that here.
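A quick way to see the byte order is to look at the bytes that actually go on the wire. This small sketch uses the userspace htonl in place of bpf_htonl, and as_to_wire is a hypothetical helper written for this example.

```c
#include <arpa/inet.h> /* htonl */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes a 4-octet AS number into `out` in network (big-endian) order. */
static void as_to_wire(uint32_t as, uint8_t out[4])
{
    assert(out != NULL);
    uint32_t be = htonl(as);      /* no-op on big-endian hosts */
    memcpy(out, &be, sizeof(be)); /* copy the in-memory bytes as they are */
}
```

For AS 65002 (0x0000FDEA) the wire bytes are 00 00 FD EA regardless of the host CPU, whereas copying the unconverted value on a little-endian machine would produce EA FD 00 00.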
The user land BGP program that is peering to the ToR is blissfully unaware that the route it is advertising is using a different AS number 😂 at this point.
Outro
The RFC docs are a great way to begin to understand the seemingly opaque block of data that follows the various headers when processing network data with eBPF. The lack of unbounded loops and of other freely expressible ways of manipulating data means that extra thought has to be given when looking at and parsing application data. But with a thoughtful approach I don’t see why most protocols can’t be processed by eBPF. Today we need to bind programs to TC (Traffic Control), but once XDP has egress support we can offload so much application processing that the network layer will become incredibly powerful. I’m excited to parse more protocols :-) (DNS next).