Continuing building your service mesh
In a previous post I detailed (in my head at least) the “shopping list” of bits needed to implement a simple service mesh; you can read it here: https://thebsdbox.co.uk/2024/11/30/Building-your-own-service-mesh/. Whilst that post covered some of the theoretical bits and the eBPF 🐝 magic, my aim is to wrap up all of the other pieces needed here.
The proxy
In this “build your own” service mesh, the proxy will live next to the application that we care about.
OMG a sidecar 😱
How that sidecar gets there can be an interesting discussion, so let's look at the choices.
- You maintain some YAML: your deployments will need to have the sidecar added to them. This has to happen before you deploy, as you can't add a sidecar to an existing pod.
- But what about ephemeral containers! (I hear you ask.) Well, they're pretty good, and yes, you can add them to an existing pod. BUT if you need to mount files (like 🔐 certificates), then `Volumes` need adding to the `pod.spec`, and you can't do that. AH HA! I hear you think, use secrets and environment variables! That way it doesn't modify the main body of the `pod.spec`, just the `pod.spec.ephemeralContainers[x]`. Great idea, but it doesn't work, and there has been an issue open about it for nearly (checks notes) two years.
- Ye olde sidecar injector! The de facto method of modifying a `pod.spec` before it's actually committed to the Kubernetes API. (I won't go into detail, as people have written about this for some time.)
So regardless of how the proxy gets there, it needs to be there. There will be a proxy next to every application that we care about. When one pod wants to talk to another pod, it will actually be the proxies that are doing the talking!
What is in a proxy
- Our eBPF code
- Code that will create connections and TCP listeners for sending and receiving traffic
- The required certificates in order for traffic to be encrypted
Proxy startup
On startup our proxy will determine its own PID and, along with the pod CIDR range, add it to an eBPF map; it will then attach our eBPF programs to the kernel. Once that has occurred it will start the proxy listener, which is where our proxy will receive all traffic forwarded by our eBPF program. It will then read the certificates (from the filesystem or environment variables), and once they're loaded it will start another listener for incoming TLS connections! That's it.
Proxy running lifecycle
- The proxy is listening on its internal proxy port.
- A new connection is received on this port, hijacked by eBPF 🐝, where we do a `getsockopt` syscall with the option `SO_ORIGINAL_DST`. This returns to us the original destination address and port that this connection was heading to before we hijacked it with eBPF.
- We create a new outbound connection to the original destination address, however we substitute the port with that of the proxy's TLS listening port. This initiates a new TLS connection between both proxies!
- The source proxy will send to the destination proxy the port that we were originally accessing.
- The destination proxy will connect to that port and begin receiving traffic from the source.
- At this point we have full end-to-end connectivity from one application to another, without the applications realising that we're in the middle!
Creating Certificates 📝
In order for the TLS to work, the certificates will need to be created with the correct details, namely the IP addresses of the pods, to ensure that the identification works correctly. This raises a chicken-and-egg scenario: ideally we require these details asap, however the IP address is only allocated once the pod has been created by the Kubernetes API. Although we can't modify the `Volumes` section of a pod once it has been created, we can refer to secrets as environment variables before those secrets have been created.
We then write some code using Kubernetes informers (there is excellent detail here). These informers will “inform” us when a pod has been created or updated; we care more about the update, as this is the operation where the `pod.status.podIP` will be populated with the address we care about. Once we have this we can create the required certificates and upload them as a secret to be used by the proxy container.
The final piece is the injector 💉
This is relatively straightforward: on startup this piece of code registers with the admission controller so that certain resources (pods in our case) are sent to our code when created. We then patch the `pod.spec` to include our container as an `initContainer`, and it is sent back to the Kubernetes API server to be scheduled.
In Summary
We “kind of” have the makings of a service mesh at this point (in my mind at least): we transparently move traffic from the application through our proxy, where we can apply whatever we wish. In this proof of concept we mint fresh certificates and then establish end-to-end mTLS, so traffic is encrypted between source and destination. Although that doesn't mean we have to end there 😀
What next..
All the source code for this experiment is available at https://github.com/thebsdbox/smesh so feel free to go and have a look around. It isn’t the tidiest, but it does work 😂