Building your own service mesh
I saw a few mentions of “service mesh” and mTLS amongst other things during the KubeCon US week, and given some of the messing around I’d been doing with eBPF recently I asked myself “how hard could it be to write one from scratch?”
The service mesh shopping list
There are a bunch of components that we will need to build in order to get “service mesh” type behaviour. Most service meshes implement a heck of a lot more; we’re only exploring the basics needed here.
Traffic redirector 🚦
We need a way of taking traffic from an application and sending it elsewhere, typically to our proxy, where we will potentially modify the traffic. The traffic needs to be redirected in a way that the application doesn’t need to know about; however, we need to ensure that the traffic will still reach its destination and that return traffic comes back in a way that makes sense to the application. In most circumstances this is handled by iptables rules that change the source and destination of packets as they navigate the kernel. As a pod initiates a connection to another pod within the cluster, we will need to redirect it to our program, which we will call the proxy.
The Proxy
Our proxy will need to be listening somewhere that is accessible on the network, and as outbound connections are created their destination will be modified to that of the proxy (we also need to keep a copy of the original destination somewhere). At this point we will start receiving data from the source, and it is here that we have the opportunity to change the original traffic, or parse it and make decisions based upon what we learn.
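As a rough sketch (not the actual code from this post), the user-space side of the proxy could look something like this: a listener on `127.0.0.1:18000` (the port used in the walkthrough further down) that recovers the original destination of each accepted connection with `getsockopt(SO_ORIGINAL_DST)`. Error handling and the actual byte-shuffling between the two sockets are left out.

```c
/* A rough sketch of the proxy's listening side (error handling and the actual
 * forwarding are left out). It assumes the redirection described later in this
 * post: connections arrive on 127.0.0.1:18000 and the original destination can
 * be recovered with getsockopt(SO_ORIGINAL_DST). */
#include <netinet/in.h>
#include <arpa/inet.h>
#include <linux/netfilter_ipv4.h>   /* SO_ORIGINAL_DST */
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int lfd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = {
        .sin_family = AF_INET,
        .sin_port   = htons(18000),
        .sin_addr   = { htonl(INADDR_LOOPBACK) },
    };
    bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
    listen(lfd, 128);

    for (;;) {
        int cfd = accept(lfd, NULL, NULL);

        /* Ask the kernel for the original (pre-redirect) destination. */
        struct sockaddr_in orig;
        socklen_t olen = sizeof(orig);
        if (getsockopt(cfd, SOL_IP, SO_ORIGINAL_DST, &orig, &olen) == 0) {
            char ip[INET_ADDRSTRLEN];
            inet_ntop(AF_INET, &orig.sin_addr, ip, sizeof(ip));
            printf("original destination: %s:%d\n", ip, ntohs(orig.sin_port));
        }

        /* ...connect() out to the original destination (or another proxy)
         * and shuttle bytes between the two sockets... */
        close(cfd);
    }
}
```

The nice thing about `SO_ORIGINAL_DST` is that it is the same interface an iptables REDIRECT setup exposes, so the proxy code doesn’t need to care whether the redirection was done by iptables or by eBPF.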
The Injector 💉
The injector is code that modifies the behaviour of Kubernetes so that when new workloads are scheduled, an additional container can be added, or something can run before the workload starts that writes iptables/nftables rules into the kernel.
Certificates 📝
If we want to use mTLS between pods then we will need to create certificates. These certs will need things like the pod IPs or pod hostnames etc. in order to be valid. Given that we won’t know these details until the pod starts, we will need to capture this information by watching Kubernetes and creating the certificates when we see a pod being created.
Let’s get started 🐝
If I can’t control the traffic then I can’t do anything, so first things first: I’m going to use eBPF to manipulate the traffic and make sure that it is sent to where I need it to go. Why eBPF? Well, because!
So let’s walk this through…
There are a bunch of methods for manipulating traffic (XDP, TC, sockets, etc.), so what’s the choice?
- XDP? Nope, there’s no egress support, and if we’re wanting to capture traffic being initiated out to somewhere else, then that’s egress.
- TC? It has egress, BUT the traffic has already gone through the kernel, iptables, sockets etc.; changing it and sending it back into the kernel is a bit of a pain.
- Sockets? Seems like the best option for what we’re aiming for.
The eBPF 🐝 magic 🪄
Our eBPF code is going to manipulate the L3 & L4 behaviour of packets as they traverse the kernel and, in some cases, user-land (i.e. the proxy).
The life of our packet is the following!
For this walkthrough:

- `pod-01` is `10.0.0.10`
- `pod-02` is `10.10.0.20`
- Our eBPF program is started and is passed the CIDR range of the pods in our Kubernetes cluster and the `pid` of the proxy; this is done through an eBPF map.
- The application within the pod (`pod-01`) wants to create an outbound connection with `connect()`, in this case to `pod-02`. This would typically be from a high internal port, `32305` for example, attempting to connect outbound.
- The eBPF program will change the destination from `10.10.0.20` to the proxy that is listening on localhost, so `10.10.0.20:<port>` becomes `127.0.0.1:18000` (there is a sketch of these hooks just after this list).
- We also stuff the original destination address and port into a map, which uses the socket “cookie” as its key.
- The proxy on `127.0.0.1:18000` will receive all the TCP magic from the application that started the connection, and once the socket has been established we hook in with eBPF.
- Here we add the source port `32305` and the unique socket “cookie” to another map.
- The proxy has an established connection from the application, however it needs to know the original destination. We do this by calling the syscall `getsockopt` with a specific option, `SO_ORIGINAL_DST`. This is captured by eBPF, which does a lookup on the src port `32305` to find the cookie, then uses the `cookie` to look up the original destination `10.10.0.20:<port>` in another map.
- The proxy can now establish a connection outbound to the destination pod or another proxy (this will be covered later).
- As traffic is `read()` by the proxy it is forwarded over the internal connection, and the application in `pod-01` processes it as if there were no proxy in the middle.
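To make the walkthrough above a little more concrete, here is a rough sketch of what the kernel side could look like. To be clear, this is illustrative rather than the exact code behind this post: the map names, the `config` layout and the CIDR check are placeholders of mine, and it assumes a reasonably recent kernel where the cgroup `connect4`, `sockops` and `getsockopt` hooks are available and `bpf_get_current_pid_tgid()` can be called from the connect hook.

```c
/* smesh_redirect.bpf.c -- an illustrative sketch, NOT the actual code behind
 * this post. Map names and the config layout are my own placeholders. */
#include <linux/bpf.h>
#include <linux/in.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

#define AF_INET         2
#define SOL_IP          0
#define SO_ORIGINAL_DST 80
#define PROXY_PORT      18000

struct orig_dst { __u32 addr; __u16 port; };                  /* network byte order */
struct config   { __u32 cidr; __u32 mask; __u32 proxy_pid; }; /* cidr/mask in network byte order */

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 65535);
    __type(key, __u64);              /* socket cookie */
    __type(value, struct orig_dst);  /* original destination */
} cookie_to_dst SEC(".maps");

struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 65535);
    __type(key, __u16);              /* app's source port (host order) */
    __type(value, __u64);            /* socket cookie */
} port_to_cookie SEC(".maps");

struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, struct config);    /* pod CIDR + proxy pid, filled in from user space */
} config_map SEC(".maps");

/* connect() redirection: send pod-to-pod connections to 127.0.0.1:18000 and
 * remember the original destination, keyed by the socket cookie. */
SEC("cgroup/connect4")
int redirect_connect4(struct bpf_sock_addr *ctx)
{
    __u32 zero = 0;
    struct config *cfg = bpf_map_lookup_elem(&config_map, &zero);
    if (!cfg)
        return 1;
    /* Don't redirect the proxy's own outbound connections -- that would loop. */
    if ((bpf_get_current_pid_tgid() >> 32) == cfg->proxy_pid)
        return 1;
    /* Only traffic heading for the pod CIDR is interesting. */
    if ((ctx->user_ip4 & cfg->mask) != cfg->cidr)
        return 1;
    __u64 cookie = bpf_get_socket_cookie(ctx);
    struct orig_dst dst = { .addr = ctx->user_ip4, .port = (__u16)ctx->user_port };
    bpf_map_update_elem(&cookie_to_dst, &cookie, &dst, BPF_ANY);
    ctx->user_ip4  = bpf_htonl(0x7f000001);  /* 127.0.0.1 */
    ctx->user_port = bpf_htons(PROXY_PORT);  /* :18000    */
    return 1;
}

/* Established-connection bookkeeping: map the app's source port to its cookie. */
SEC("sockops")
int note_established(struct bpf_sock_ops *skops)
{
    if (skops->op == BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB) {
        __u64 cookie = bpf_get_socket_cookie(skops);
        __u16 src_port = (__u16)skops->local_port;   /* e.g. 32305 */
        if (bpf_map_lookup_elem(&cookie_to_dst, &cookie))
            bpf_map_update_elem(&port_to_cookie, &src_port, &cookie, BPF_ANY);
    }
    return 1;
}

/* SO_ORIGINAL_DST interception: answer the proxy's getsockopt() from our maps. */
SEC("cgroup/getsockopt")
int intercept_getsockopt(struct bpf_sockopt *ctx)
{
    if (ctx->level != SOL_IP || ctx->optname != SO_ORIGINAL_DST || !ctx->sk)
        return 1;
    /* The peer port of the proxy's accepted socket is the app's source port. */
    __u16 peer_port = bpf_ntohs(ctx->sk->dst_port);
    __u64 *cookie = bpf_map_lookup_elem(&port_to_cookie, &peer_port);
    if (!cookie)
        return 1;
    struct orig_dst *dst = bpf_map_lookup_elem(&cookie_to_dst, cookie);
    if (!dst)
        return 1;
    struct sockaddr_in *sa = ctx->optval;
    if ((void *)(sa + 1) > ctx->optval_end)
        return 1;
    sa->sin_family      = AF_INET;
    sa->sin_addr.s_addr = dst->addr;
    sa->sin_port        = dst->port;
    ctx->optlen = sizeof(*sa);
    ctx->retval = 0;   /* report success back to the proxy */
    return 1;
}
char LICENSE[] SEC("license") = "GPL";
```

The three programs share two hash maps: one keyed by socket cookie holding the original destination, and one mapping the application’s source port back to that cookie, which is the same chain of lookups the walkthrough describes.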
Why do we pass the `pid` of the proxy into the eBPF program? (I hear you ask)
Well, we would end up in a loop if the proxy had its outbound connections looped back to itself. So if we see a connection from the proxy then we don’t redirect it.
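For completeness, the user-space loader that feeds the pod CIDR and the proxy’s `pid` into the config map and attaches the hooks to a cgroup could look roughly like this. Again, a sketch only: the object file, map and program names come from the (placeholder) eBPF code above, the values are hard-coded purely for illustration, and it assumes a recent libbpf plus enough privilege to load BPF programs.

```c
/* Minimal libbpf loader sketch; names and values are illustrative only. */
#include <arpa/inet.h>
#include <bpf/libbpf.h>
#include <fcntl.h>
#include <unistd.h>

struct config { __u32 cidr; __u32 mask; __u32 proxy_pid; };

int main(void)
{
    struct bpf_object *obj = bpf_object__open_file("smesh_redirect.bpf.o", NULL);
    if (!obj || bpf_object__load(obj))
        return 1;

    /* Hand the pod CIDR and the proxy's pid to the eBPF programs via the config map. */
    struct config cfg = {
        .cidr      = inet_addr("10.0.0.0"),    /* pod CIDR (network byte order) */
        .mask      = inet_addr("255.0.0.0"),
        .proxy_pid = 1234,                     /* pid of the smesh proxy process */
    };
    __u32 zero = 0;
    struct bpf_map *map = bpf_object__find_map_by_name(obj, "config_map");
    bpf_map__update_elem(map, &zero, sizeof(zero), &cfg, sizeof(cfg), BPF_ANY);

    /* Attach the three hooks to a cgroup; the root cgroup is the lazy option here. */
    int cg = open("/sys/fs/cgroup", O_RDONLY);
    bpf_program__attach_cgroup(bpf_object__find_program_by_name(obj, "redirect_connect4"), cg);
    bpf_program__attach_cgroup(bpf_object__find_program_by_name(obj, "note_established"), cg);
    bpf_program__attach_cgroup(bpf_object__find_program_by_name(obj, "intercept_getsockopt"), cg);

    for (;;)
        pause();   /* keep the attachments (links) alive */
}
```

For a real mesh you would probably attach per pod cgroup rather than to the root cgroup, and the injector described earlier is what would arrange for this to run as pods are scheduled.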
Abridged logs
```
$ kubectl logs pod-01 -c smesh-proxy
```
Summary
This post steps through the bits needed in order to form a service mesh, and how we use eBPF to redirect traffic to another process listening within the same pod. We know that this is achievable, but we now need to understand how to architect these pieces and get traffic across to the other pod! (which I’ll cover in the next post)
UPDATE: That post is now available here