Putting a VIP in your Kubernetes Clusters

In this post I’ve got a bunch of things I want to cover, all about Type:LoadBalancer (or, in most cases, a VIP (Virtual IP address)). In most Kubernetes environments a user will fire in some YAML defining a LoadBalancer service, or do a kubectl expose, and “magic” will occur. As far as the end-user is concerned their new service will have a brand new IP address attached to it, and when an end-user hits that IP address their traffic will hit a pod that is attached to that service. But what’s actually occurring in most cases? Who can provide this address? How does it all hang together?

That’s what we’re going to cover in this post.

The venerable Tim Hockin covers this in a bunch of slide decks here -> https://speakerdeck.com/thockin/bringing-traffic-into-your-kubernetes-cluster, so feel free to start there and then I’ll cover some bits in more detail or from a different angle :-)

Kubernetes Services

A lot of this is already mentioned above, but put simply a service is a method of providing access to a pod or a number of pods, either externally or internally. A common example is exposing a web server front end, where we may have a deployment of 10 nginx pods and we need to allow end users to access them. Within Kubernetes we can define a service that is attached to this deployment, and thanks to the logic within Kubernetes we don’t need to concern ourselves too much with the pods themselves: we can scale up, scale down, kill pods and so on, and as long as we’re coming through the service it will always have an up-to-date list of the pods underneath it.
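
As a small illustration of that selector mechanism, here is a minimal sketch using the Kubernetes Go API types; the nginx names and the app: nginx label are assumptions for the example, not something from this post.

// A minimal sketch: a Service object that selects nginx pods by label.
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func main() {
	svc := corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "nginx", Namespace: "default"},
		Spec: corev1.ServiceSpec{
			// Every pod carrying this label is a backend; scaling the deployment
			// up/down changes the endpoints, never the service itself.
			Selector: map[string]string{"app": "nginx"},
			Ports: []corev1.ServicePort{{
				Port:       80,
				TargetPort: intstr.FromInt(80),
				Protocol:   corev1.ProtocolTCP,
			}},
		},
	}
	fmt.Printf("%+v\n", svc.Spec)
}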

Types of service

ClusterIP

This is an internal-only service, typically used for communication between two services inside the cluster, such as some middleware-level connectivity.

NodePort

A NodePort is a port created on every node in the cluster. An external user can connect to any node’s address on this port to access the service. If we were to use a NodePort with nginx then we’d be given a high port (usually 30000+) that routes traffic to the nginx pods, e.g. worker0xx:36123 --> [nginx-pod0x:80]

LoadBalancer

The LoadBalancer service type is used to allow external access into a service; it usually requires something external to the cluster (a Cloud Controller Manager) to inform the Kubernetes API which external address traffic should be accepted on.

ExternalName

This allows a service to be exposed under an external name that points to something else. The main use-case is being able to define a service name inside the cluster and have it point at an existing external service…

The Kubernetes Load Balancer service

So how does all this hang together…

-> also, I’ll be walking through how you can implement this yourself, NOT discussing how the big cloud providers do it :-)

If you take a “fresh” Kubernetes cluster, create a simple nginx deployment and try to expose it as a LoadBalancer, you’ll find that it doesn’t work (or sits in pending).

kubectl expose deployment hello-world --type=LoadBalancer --name=my-service

[some time later]

NAME         TYPE           CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
my-service   LoadBalancer   10.3.245.137   <pending>     8080/TCP   54s

Why??? Well, Type:LoadBalancer isn’t the responsibility of Kubernetes itself; it effectively doesn’t come out of the box… As we can see the internal IP (CLUSTER-IP) has been created, but no external address exists. This EXTERNAL-IP is typically a piece of information that is infrastructure specific: in the cloud we need addresses to come from the provider’s IP address management systems, and on-premises who knows how addresses are managed ¯\_(ツ)_/¯

Cloud Controller Manager

The name can be slightly confusing, as anyone can write a CCM and its role isn’t necessarily cloud specific; I presume it’s more that the main use-case is to extend a Kubernetes cluster to be aware of a cloud provider’s functionality.

I’ve covered Cloud Controllers before, but to save the dear reader a few mouse-clicks I’ll cover it briefly again… The role of a CCM is usually to plug into the Kubernetes API and watch and act upon certain resources/objects. For objects of type node, the CCM can speak with the infrastructure to verify that the nodes are deployed correctly and enable them to handle workloads. Objects of type LoadBalancer require the CCM to speak with an API to request an IP address that can be used as the EXTERNAL-IP; alternatively, a CCM may be configured with an IP address range it can use to assign the EXTERNAL-IP.

LoadBalancer watcher

Within the CCM there will be code watching the Kubernetes API, specifically watching Kubernetes services… when one has spec.type = LoadBalancer the CCM will act!
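
This is a minimal sketch of such a watch loop using client-go informers (an assumption for illustration; it isn’t the code of any particular CCM). Leader election, error handling and the actual address allocation are omitted.

// Watch all services and react to those of type LoadBalancer.
package main

import (
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Out-of-cluster config for brevity; a real CCM would use rest.InClusterConfig().
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(cfg)

	factory := informers.NewSharedInformerFactory(clientset, 30*time.Second)
	svcInformer := factory.Core().V1().Services().Informer()

	svcInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			svc := obj.(*corev1.Service)
			// Only services with spec.type = LoadBalancer are our problem.
			if svc.Spec.Type == corev1.ServiceTypeLoadBalancer {
				fmt.Printf("would allocate an address for %s/%s\n", svc.Namespace, svc.Name)
			}
		},
	})

	stop := make(chan struct{})
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	select {} // keep watching forever
}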

Before the CCM has a chance to act, this is what the service will look like:

kubectl get svc my-service -o yaml

(some stuff omitted…)

- apiVersion: v1
  kind: Service
  metadata:
    creationTimestamp: "2021-01-02T17:46:16Z"
    name: my-service
    namespace: default
    resourceVersion: "5281940"
    selfLink: /api/v1/namespaces/default/services/hello-world
    uid: 7c14dbbd-f9fd-4279-8524-7418ff62d08f
  spec:
    clusterIP: 10.107.2.227
    externalTrafficPolicy: Cluster
    ports:
    - nodePort: 30497
      port: 80
      protocol: TCP
      targetPort: 80
    selector:
      app: hello-world
    sessionAffinity: None
    type: LoadBalancer
  status:
    loadBalancer: {}

We can see that the status is blank and spec.loadBalancerIP doesn’t exist… Well, whilst we’ve been reading this the CCM has acted: if we’re in the cloud it has made some API calls, or if we’re running our own it may have looked at its pool of addresses and found a free one. The CCM will take this address and modify the object by updating the spec of the service.
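
A minimal sketch of that update (assuming client-go, a hypothetical in-memory pool, and the default/my-service names used earlier); a real CCM would need proper IPAM and retry-on-conflict handling.

// Give the service an address from our pool by updating spec.loadBalancerIP.
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(cfg)

	// Hypothetical pool of addresses the CCM is allowed to hand out.
	pool := []string{"147.75.100.235", "147.75.100.236"}

	svc, err := clientset.CoreV1().Services("default").Get(context.TODO(), "my-service", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}
	if svc.Spec.LoadBalancerIP == "" {
		svc.Spec.LoadBalancerIP = pool[0]
		if _, err := clientset.CoreV1().Services("default").Update(context.TODO(), svc, metav1.UpdateOptions{}); err != nil {
			panic(err)
		}
	}
}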

kubectl get svc my-service -o yaml

(more stuff omitted…)

- apiVersion: v1
  kind: Service
  [....]
  spec:
    clusterIP: 10.107.2.227
    externalTrafficPolicy: Cluster
    loadBalancerIP: 147.75.100.235
    ports:
    - nodePort: 30497
      port: 80
      protocol: TCP
      targetPort: 80
    selector:
      app: hello-world
    sessionAffinity: None
    type: LoadBalancer
  status:
    loadBalancer: {}

… and that is it.

That is all the CCM needs to do, although your service will still be <pending> as the status is still blank :-(

Two other things need to happen at this point!

Kubernetes Services proxy

What is this “proxy” I speak of? … well, if you’ve noticed the kube-proxy pod on your cluster and wondered why it’s there, then read on!

There is plenty more detail => here, but I’ll break it down mainly for LoadBalancers.

When we expose a service within Kubernetes, the API server will instruct kube-proxy to inject a bunch of rules into iptables/ipvs that effectively capture traffic and ensure that it goes to the pods defined under the service. By default, any service we create will have its clusterIP and a port written into these rules, so that regardless of which node inside the cluster we use to access this clusterIP:port, the services proxy will handle the traffic and distribute it accordingly. The clusterIPs are virtual IP addresses managed by the Kubernetes API; however, with LoadBalancers the traffic is external and we don’t know what the IP address will be ¯\_(ツ)_/¯!

Well, as shown above, the CCM will eventually modify spec.loadBalancerIP with an address from the environment… once this spec is updated, the API will instruct kube-proxy to ensure that any traffic for this externalIP:port is also captured and proxied to the pods underneath the service.

We can see that these rules now exist by looking at the output of iptables-save; all traffic for the address of the LoadBalancer is now forwarded on…

root@k8s:~# iptables-save | grep 147
-A KUBE-SERVICES -d 147.75.100.235/32 -p tcp -m comment --comment "default/hello-world loadbalancer IP" -m tcp --dport 80 -j KUBE-FW-2BRRD7ARGUSQ7MRP

The final piece of the puzzle is getting traffic to the machines themselves …

External traffic!

So, between a CCM and the Kubernetes service proxy, we have been given an external IP address for our service, and the Kubernetes service proxy will ensure that any traffic in the cluster for that external IP address is distributed to the various pods. We now need to get traffic to the nodes themselves…

Hypothetically, if we had a second network adapter in one of the nodes then we could configure this network adapter with the externalIP, and as long as we can route traffic to that IP address the Kubernetes service proxy will capture and distribute that traffic. Unfortunately that is a very manual operation, so what options/technologies could we adopt in order to manage this?

We would usually need a final piece of running software that “watches” a service: once the service has been given a spec.loadBalancerIP by the CCM, we know it’s good to advertise to the outside world! And once we’re exposing it to the outside world, we can update the status of the service with the address we’re advertising, so that clients and end-users know they can now use this address!

status:
  loadBalancer:
    ingress:
    - ip: 147.75.100.235
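
A minimal sketch of that status update using client-go (assuming the default/my-service names used earlier); this is the write that finally clears <pending>.

// Record the advertised address in status.loadBalancer.ingress.
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset := kubernetes.NewForConfigOrDie(cfg)

	svc, err := clientset.CoreV1().Services("default").Get(context.TODO(), "my-service", metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	// Tell clients (and kubectl) that the service is reachable on this address.
	svc.Status.LoadBalancer.Ingress = []corev1.LoadBalancerIngress{{IP: "147.75.100.235"}}
	if _, err := clientset.CoreV1().Services("default").UpdateStatus(context.TODO(), svc, metav1.UpdateOptions{}); err != nil {
		panic(err)
	}
}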

There are two main technologies that we can use to tell an existing environment about our new loadBalancer address, and where to send the traffic when someone accesses that address!

ARP

ARP (Address Resolution Protocol) is a Layer 2 protocol whose main task is working out which hardware (MAC address) an IP address belongs to. When an IP address first appears on the network it will typically broadcast its presence, along with the MAC address of the adapter it’s bound to. This informs the network of the IP <==> MAC binding, meaning that on a simple network, when packets need to go to an IP address, the switching infrastructure knows which machine to send the traffic to.

We can see this mapping by using the arp command:

$ arp -a
_gateway (192.168.0.1) at b4:fb:e4:cc:d3:80 [ether] on ens160
? (10.0.73.65) at e6:ca:eb:b8:a0:f3 [ether] on cali43893380087
? (192.168.0.170) at f4:fe:fb:54:89:16 [ether] on ens160
? (10.0.73.67) at 52:5c:1b:5f:e1:50 [ether] on cali0e915999b8d
? (192.168.0.44) at 00:50:56:a5:13:11 [ether] on ens160
? (192.168.0.45) at 00:50:56:a5:c1:86 [ether] on ens160
? (192.168.0.40) at 00:50:56:a5:5f:1d [ether] on ens160

Load Balancing NOTE: In order for this to work with our Kubernetes cluster, we would need to select a single node to be in charge of hosting this externalIP and use ARP to inform the network that traffic for this address should be sent to that machine. This is for two reasons:

  • If another machine broadcasts an ARP update then existing connections will be disrupted
  • Multiple machines can’t have the same IP address exposed on the same network

The flow of operation is:

  • A leader is selected
  • spec.loadBalancerIP is updated to 147.75.100.235
  • 147.75.100.235 is added as an additional address to interface ens160
  • ARP broadcasts that traffic to 147.75.100.235 is available at 00:50:56:a5:4f:05
  • Update the service status that the service is being advertised

At this point the externalIP address is known to the wider network, and any traffic for it will be sent to the node that was elected leader. Once the traffic is captured by the rules in the kernel, it is then sent to the pods that are part of the service.
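
A minimal sketch of the leader’s side of this flow, assuming the github.com/vishvananda/netlink package and the iputils arping binary are available (tools such as kube-vip and MetalLB do the equivalent natively, so treat this purely as illustration).

// On the elected leader: bind the VIP to the real NIC and announce it via gratuitous ARP.
package main

import (
	"os/exec"

	"github.com/vishvananda/netlink"
)

func main() {
	// Add 147.75.100.235 as an additional address on ens160.
	link, err := netlink.LinkByName("ens160")
	if err != nil {
		panic(err)
	}
	addr, err := netlink.ParseAddr("147.75.100.235/32")
	if err != nil {
		panic(err)
	}
	if err := netlink.AddrAdd(link, addr); err != nil {
		panic(err)
	}

	// Broadcast a gratuitous ARP so the network learns the new IP <==> MAC binding.
	// Flags assume iputils arping: -U unsolicited ARP, -I interface, -c count.
	if out, err := exec.Command("arping", "-U", "-I", "ens160", "-c", "3", "147.75.100.235").CombinedOutput(); err != nil {
		panic(string(out))
	}
}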

BGP

BGP (Border Gateway Protocol) is a Layer 3 protocol whose main task is to update routers with the path to new routes (a new route can be a single address or a range of addresses). For our use-case in a Kubernetes cluster, we would use BGP to announce to the routers in the infrastructure that traffic for our spec.loadBalancerIP should be sent to one or more machines.

Load Balancing NOTE: One additional benefit over ARP is that multiple nodes can advertise the same address; this provides both HA and load-balancing across all of the advertising nodes in the cluster. In order to do this, we can bind the externalIP to the loopback adapter (so it’s not present on the actual network) and leave it to the kernel to accept traffic arriving from the routers and allow it to be proxied by kube-proxy.

The flow of operation is:

  • spec.loadBalancerIP is updated to 147.75.100.235
  • 147.75.100.235 is added as an additional address to interface lo
  • A BGP peer advertisement updates the other peers (routers) that the loadBalancerIP is available by routing to the machine’s address on the network (all machines can/will do this)
  • Update the service status that the service is being advertised

Any client/end-user that tries to access 147.75.100.235 will have their traffic go through the router, where a route will exist to send that traffic to one of the nodes in the cluster, where it will be passed to the Kubernetes service proxy.
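
A minimal sketch of what each advertising node does, again assuming the netlink package and, as a stand-in for an embedded BGP speaker, the GoBGP CLI (MetalLB and kube-vip, for example, speak BGP in-process instead).

// On every node: put the VIP on lo and announce a /32 route to the BGP peers.
package main

import (
	"os/exec"

	"github.com/vishvananda/netlink"
)

func main() {
	// Bind the loadBalancer address to the loopback so it isn't ARPed on the LAN.
	lo, err := netlink.LinkByName("lo")
	if err != nil {
		panic(err)
	}
	addr, err := netlink.ParseAddr("147.75.100.235/32")
	if err != nil {
		panic(err)
	}
	if err := netlink.AddrAdd(lo, addr); err != nil {
		panic(err)
	}

	// Announce the /32 to our peers; the next-hop is this node's real address,
	// so routers can spread traffic across every node making the same announcement.
	if out, err := exec.Command("gobgp", "global", "rib", "add", "147.75.100.235/32").CombinedOutput(); err != nil {
		panic(string(out))
	}
}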

Overview

At this point we can see there are a number of key pieces and technologies that can all be harnessed to put together a load-balancing solution for Kubernetes. However, the CCM is arguably the most important, as it has the role of being “aware” of the topology or architecture of the infrastructure. It may need to prep nodes with configuration details (BGP configuration settings etc.) and speak with other systems to request valid addresses that can be used as loadBalancer addresses.

