Application traffic with eBPF

Posted on 2023-12-08 Edited on 2023-12-11

In a previous post I talked a little about building up enough eBPF knowledge to start understanding what is going in and out of a network adapter. Basically, you take your Ethernet frame, strip off the headers (Ethernet + IP header + TCP/UDP header), and you are finally left with what remains within the packet: the application data.

All of the code lives within the “learning eBPF” repository; specifically, the eBPF code is here. The plan for this post is to step through the bits that I think are useful or could be important…

Note: This code does some ingress/egress packet modification, so it uses some eBPF helpers that require version 6.1+ of the Linux kernel to work.

The maps!

Presumably you’ve come across these before? If not, never fear!! Simply put, an eBPF map is the mechanism for communicating between user-land and the in-kernel eBPF program. What is exceptionally cool (in my mind at least) is that these maps use keys and values, so I don’t have to loop around data comparing and looking for whatever it is I’m looking for. I pass a key, and if something matches I get the corresponding data :D

Below is the map that I will use, which is called url_map. The key is 20 characters long (a bounded “string”, some might say), and the value assigned to that key is a struct that I’ve defined just above the map.

// Defines a different URL associated with a key
struct url_path {
  __u8 path_len;
  __u8 path[max_path_len]; // This should be a char but code generation between here and Go..
};

// Defines my URL map
struct {
  __uint(type, BPF_MAP_TYPE_HASH);
  __uint(max_entries, 1024);
  __type(key, char[max_path_len]);
  __type(value, struct url_path);
} url_map SEC(".maps");
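A hash map lookup compares every byte of the fixed-size key, so a path shorter than the key has to be padded the same way on both the kernel and user-land side. Below is a minimal user-space sketch of that idea, assuming the key is 20 bytes as described above. MAX_PATH_LEN, make_key and keys_equal are hypothetical names for illustration and are not part of the repository.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_PATH_LEN 20 /* assumption: matches max_path_len in the post */

/* Build a zero-padded, fixed-size key from a NUL-terminated path.
 * Paths longer than the key are truncated. (Hypothetical helper.) */
static void make_key(unsigned char key[MAX_PATH_LEN], const char *path)
{
    assert(key != NULL && path != NULL);
    size_t n = strlen(path);
    if (n > MAX_PATH_LEN)
        n = MAX_PATH_LEN;
    memset(key, 0, MAX_PATH_LEN); /* padding must be identical on both sides */
    memcpy(key, path, n);
}

/* A hash map matches on every byte of the key, padding included. */
static bool keys_equal(const unsigned char a[MAX_PATH_LEN],
                       const unsigned char b[MAX_PATH_LEN])
{
    return memcmp(a, b, MAX_PATH_LEN) == 0;
}
```

This is why the path buffer in the eBPF program is zeroed before use: two keys that hold the same path but different padding bytes would not match.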

The eBPF programs!

There are two eBPF programs defined in the code, tc_egress and tc_ingress (bonus points if you can guess how they are attached!). For this post, we will only concern ourselves with the tc_ingress program.

As in the myriad of examples that already exist, we need to go through the header identification dance:

1. Do the sanity checks, and cast the data to the type of ethhdr (Ethernet header).
2. Find the protocol within the Ethernet frame by reading h_proto from the Ethernet header (also called the EtherType).
3. Cast the data after the Ethernet header as an iphdr (IP header).
4. Find the protocol within the IP header. We will also need to determine the size of the IP header (turns out they can be different sizes! ¯\_(ツ)_/¯)
5. To determine the size of the header we multiply its ihl value by four. Why, I hear you ask! Well, this value counts 32-bit words, so if the value was 6 then the header would be 192 bits (or 24 bytes). So to get the IP header size in bytes we can simply multiply this value by 4!
6. Cast the data after the IP header as a tcphdr (TCP header).
7. Like step (5), we will need to determine the size of the TCP header (it can again be dynamic), and it’s the same calculation: multiply the doff value by four to get the header size in bytes.
8. With all of this calculated, we can now infer that the data starts at an offset equal to the Ethernet header size plus the IP header size plus the TCP header size.
9. Finally, we can determine how big the application data is by taking tot_len (total length) from the IP header and subtracting the IP and TCP header sizes.
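The arithmetic in the steps above can be tried outside the kernel. Here is a minimal user-space sketch that walks a raw IPv4/TCP frame using plain byte offsets rather than the kernel structs. find_tcp_payload, struct payload_info and ETH_HDR_LEN are hypothetical names for illustration; this is not the code from the repository, and it skips VLAN tags and IPv6.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN 14 /* same value as ETH_HLEN in <linux/if_ether.h> */

struct payload_info {
    size_t offset; /* where the application data starts in the frame */
    size_t len;    /* how many bytes of application data there are */
};

/* Walk Ethernet -> IPv4 -> TCP in a raw frame and locate the payload.
 * Returns 0 on success, -1 if the frame is not IPv4/TCP or is malformed. */
static int find_tcp_payload(const uint8_t *frame, size_t frame_len,
                            struct payload_info *out)
{
    assert(frame != NULL && out != NULL);

    if (frame_len < ETH_HDR_LEN + 20)
        return -1;
    /* EtherType is bytes 12..13, big-endian; 0x0800 is IPv4 */
    if (frame[12] != 0x08 || frame[13] != 0x00)
        return -1;

    const uint8_t *ip = frame + ETH_HDR_LEN;
    if ((ip[0] >> 4) != 4)
        return -1;
    size_t ip_hlen = (size_t)(ip[0] & 0x0f) * 4; /* IHL in 32-bit words */
    if (ip_hlen < 20 || ip[9] != 6)               /* protocol 6 is TCP */
        return -1;
    size_t tot_len = ((size_t)ip[2] << 8) | ip[3];

    if (frame_len < ETH_HDR_LEN + ip_hlen + 20)
        return -1;
    const uint8_t *tcp = ip + ip_hlen;
    size_t tcp_hlen = (size_t)(tcp[12] >> 4) * 4; /* doff in 32-bit words */
    if (tcp_hlen < 20 || tot_len < ip_hlen + tcp_hlen)
        return -1;

    out->offset = ETH_HDR_LEN + ip_hlen + tcp_hlen;
    out->len = tot_len - ip_hlen - tcp_hlen;
    return 0;
}
```

For a frame with an IHL of 6 and a doff of 8, the payload starts at 14 + 24 + 32 = 70 bytes into the frame.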

Application Data !!

In order to read this data we will need a few things that were mentioned above!

First, we will need the data offset (where the data starts) and that is found after the Ethernet header + the IP Header size (once calculated) and the TCP Header (again, once calculated). We will also need a buffer in order to store the data we will be reading from the socket buffer.

// A data buffer to store our application data
char pdata[60];

// Calculate the offset to where our data actually lives
poffset = ETH_HLEN + ip_hlen + tcp_hlen;

// Load data from the socket buffer, poffset starts at the end of the TCP Header
int ret = bpf_skb_load_bytes(skb, poffset, pdata, sizeof(pdata));
if (ret != 0) {
   // Could not read the data; 0 is TC_ACT_OK, so the packet carries on untouched
   return 0;
}

We use the bpf_skb_load_bytes helper to read a set amount of data (60 bytes) into our buffer (pdata) from the socket buffer (skb), starting from the offset where we know the data is (poffset)!

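At this point we have 60 bytes of data, which should be enough for us to write some code to understand it. (Note that bpf_skb_load_bytes fails if fewer than 60 bytes are available from poffset, in which case the check above simply returns.)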

HTTP Data :-)

Let’s look at what happens when we try an HTTP request!

 ~ curl code/test -vvv
*   Trying 192.168.0.22:80...
* Connected to code (192.168.0.22) port 80 (#0)
> GET /test HTTP/1.1
> Host: code
> User-Agent: curl/7.87.0
> Accept: */*

...

I’m using curl to request the URL /test from the host code (code is my development VM, which runs code-server). We can see the data that is sent to the server (each line begins with > to indicate the direction of communication). The first line of data in an HTTP request is typically a verb followed by the resource we would like to interact with, and the line ends with the HTTP version and a carriage return plus line feed, as defined in the HTTP standards. So we can see the part that we care about is GET /test (we/I don’t really care about the HTTP version at this point :D).

Find the HTTP method

The first step is to read the first three characters of pdata and check whether pdata[0] == 'G', pdata[1] == 'E' and pdata[2] == 'T'. This effectively tells us both whether this is an HTTP request in the first place and, specifically, whether it is a GET request!
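As a rough user-space sketch, the method check might look like the following. is_http_get is a hypothetical helper for illustration. It also checks the space after the verb, which goes one byte further than the three-character comparison described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true when the buffer starts with "GET " (method plus the
 * separating space). Hypothetical helper for illustration. */
static bool is_http_get(const char *buf, size_t len)
{
    assert(buf != NULL);
    return len >= 4 &&
           buf[0] == 'G' && buf[1] == 'E' && buf[2] == 'T' && buf[3] == ' ';
}
```

Checking the trailing space avoids treating payloads that merely begin with the letters GET as a request line.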

Once we’ve validated those first 3 bytes we will want to read more data starting from offset 4 (three bytes for the method and one for the space that follows)!

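char path[max_path_len];
memset(&path, 0, sizeof(path));

int path_len = 0;

// Find the request URI (starts at offset 4), ends with a space.
// Stop one byte short of max_path_len so we never write past the end
// of path and always keep a terminating '\0'.
for (int i = 4; i < sizeof(pdata) && i - 4 < max_path_len - 1; i++)
{
    if (pdata[i] != ' ') {
        path[i-4] = pdata[i];
    } else {
        path[i-4] = '\0';
        path_len = i-4;
        break;
    }
}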

The above loop will read through the rest of the HTTP data (from offset 4) until it encounters a space, leaving us with the URL we are trying to GET! We can validate this with a debug print statement:

bpf_printk("<- incoming path [%s], length [%d]", path, path_len);

Which will look like the following in your logs:

<idle>-0 [001] dNs3. 2252901.017812: bpf_trace_printk: <- incoming path [/test], length [5]
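The parsing logic is easy to try out in user space before loading it into the kernel. Below is a minimal sketch of the path extraction as a standalone function, assuming a 20-byte path buffer as in the map key. extract_path and MAX_PATH_LEN are hypothetical names, and unlike the loop above this version reports an error when the path does not fit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PATH_LEN 20 /* assumption: same size as the map key */

/* Copy the request target that follows "GET " into path, stopping at the
 * first space. path is zero-filled first so it can double as a map key.
 * Returns the path length, or -1 if no terminating space was found
 * within the buffer or the path does not fit. */
static int extract_path(const char *buf, size_t len, char path[MAX_PATH_LEN])
{
    assert(buf != NULL && path != NULL);
    memset(path, 0, MAX_PATH_LEN);

    for (size_t i = 4; i < len; i++) {
        size_t n = i - 4;
        if (buf[i] == ' ')
            return (int)n;
        if (n >= MAX_PATH_LEN - 1)
            return -1; /* would overflow: leave room for the trailing NUL */
        path[n] = buf[i];
    }
    return -1; /* ran out of data before the closing space */
}
```

Feeding it the request line from the curl example, GET /test HTTP/1.1, leaves /test in path and returns 5, matching the log line above.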

Acting on the HTTP application request

The above explanations detail what and how we’re reading the data, but if we want to “dynamically” look up the HTTP requests we will need to make use of eBPF maps.

In our Go userland code we do the following:

path := flag.String("path", "", "The URL Path to watch for")
flag.Parse()

// ... 

// Create a fixed-size uint8 array that matches the map key
var urlPath [20]uint8
// copy the bytes of our path into the array (unused bytes stay zero)
copy(urlPath[:], *path)

// place our urlPath as the key
err = objs.UrlMap.Put(urlPath,
  bpfUrlPath{
    Path:    urlPath,
    PathLen: uint8(len(*path)), // length of the path, not of the 20-byte array
  })
if err != nil {
  panic(err)
}

As we can see in the code above, our Go program reads the -path flag when it starts and uses its value as a key in our eBPF map. The value stored under that key can be ignored for now.

struct url_path *found_path = bpf_map_lookup_elem(&url_map, path);
if (found_path) { // a non-NULL pointer means the key exists in the map
    bpf_printk("Looks like we've found your path [%s]", path);
    // perhaps do more, block traffic or redirect?
}

In our eBPF program we do a map lookup on the path from the HTTP request. If that path, as a char array, exists as a key then we can operate on it!

Starting our Go program now with sudo ./http -interface ens160 -path /test will yield the following:

INFO[0000] Starting 🐝 the eBPF HTTP watcher, on interface [ens160] for path [/test] 
INFO[0000] Loaded TC QDisc                              
INFO[0000] Press Ctrl-C to exit and remove the program  
          <idle>-0       [001] d.s3. 2252901.015575: bpf_trace_printk: <- 0.0.0.0:56345 -> 0.0.0.0:80
          <idle>-0       [001] D.s3. 2252901.015642: bpf_trace_printk: -> 192.168.0.22:80 -> 192.168.0.180:56345
          <idle>-0       [001] d.s3. 2252901.017552: bpf_trace_printk: <- 0.0.0.0:56345 -> 0.0.0.0:80
          <idle>-0       [001] d.s3. 2252901.017793: bpf_trace_printk: <- 0.0.0.0:56345 -> 0.0.0.0:80
          <idle>-0       [001] dNs3. 2252901.017812: bpf_trace_printk: <- incoming path [/test], length [5]
          <idle>-0       [001] dNs3. 2252901.017814: bpf_trace_printk: Looks like we've found your path [/test]

Conclusion

Parsing HTTP isn’t too bad as it is a relatively simple protocol: it uses easy verbs and a simple structure, with spaces and carriage returns as separators. This methodology would potentially work OK with other protocols like DNS, POP3 or SMTP. When things are encrypted we would need some way of decrypting before we can parse the data (that’s beyond me…). However, I hope that this sparks some ideas for playing more with eBPF and attempting to parse and operate on application traffic with eBPF!