Balancing the API Server
This is a lazy introduction to the options that can be employed to load balance the Kubernetes API server. In this first post we will focus on the Kubernetes API server and on load balancers in a general sense, just to get a slightly deeper understanding of what is happening.
Kubernetes API server
As its name suggests, the Kubernetes API server is the “entry-point” into a Kubernetes cluster and allows for all CRUD (create/read/update/delete) operations. Interaction with the Kubernetes API server is typically through REST, with JSON/YAML payloads that define objects within the Kubernetes cluster. A Kubernetes API server will happily run as a single instance, however this is (by its very design) a single point of failure and offers no high availability. To provide a resilient and highly available Kubernetes API server, a load balancer should be placed in front of the Kubernetes API server replicas, and the load balancer should ensure that a client is passed to a healthy instance.
Example Kubernetes API architecture
Here a load balancer at 192.168.1.1 will select one of the two instances beneath it and present it to the client.
[Diagram: load balancer 192.168.1.1 in front of two Kubernetes API server instances]
Load balancers
Load balancers have been a heavily deployed technology for a number of decades. Their use case has typically been to provide a solution to two technical challenges:
- Scaling workloads over application replicas
- High Availability over “working” application replicas
For scaling workloads, a load balancer will take requests and, depending on the distribution algorithm, send each request to one of the pre-configured backends.
For high availability, the load balancer will typically employ a health check that determines whether an endpoint is working correctly, and use the result to decide whether traffic can be sent to that endpoint. As far as the end user is concerned, traffic will always be hitting a working endpoint as long as one is available.
Load balancer architecture
The majority of load balancers operate using a few common structures:
Front end
The front end is the client-side part of the load balancer, and it is this front end that is exposed to the outside world. Either an end user or an application will connect to the front end in exactly the same manner as they would connect to the application directly. The load balancing should always be transparent to the end user or application that is connecting to the load balancer.
Back end(s)
When the front end of a load balancer is accessed by a client, it is the load balancer's main purpose to then redirect that traffic transparently to a back end server. A back end server is where the load balanced application should actually be running, and the load balancer will typically perform a check to ensure that the application is available before sending any traffic to it. As the main point of a load balancer is to provide both high availability and scaling, multiple back ends are usually configured under a single front end. Together these are typically called a pool, from which the load balancer will select one of the healthy back ends.
Selection Algorithm
Once a front end is defined and a pool of backends has been added, there will typically be a predetermined selection algorithm that decides which backend should be chosen from the pool. As this is meant to be a simple overview, we will only cover the two most commonly used algorithms:
Round-Robin
This algorithm is arguably the simplest: it loops through the pool of back end servers in order until they’re exhausted, and then starts again from the first.
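The loop described above can be sketched in a few lines of Go. This is only an illustration: the `roundRobin` type, its `pick` method and the backend addresses are hypothetical names chosen for the example, and it assumes the pool is never empty.

```go
package main

import (
	"fmt"
	"sync"
)

// roundRobin hands out backends in order, wrapping around at the end of the pool.
type roundRobin struct {
	mu       sync.Mutex
	backends []string
	next     int // index of the backend to hand out next
}

// pick returns the next backend in the pool; it is safe for concurrent use.
// It assumes the pool contains at least one backend.
func (r *roundRobin) pick() string {
	r.mu.Lock()
	defer r.mu.Unlock()
	b := r.backends[r.next]
	r.next = (r.next + 1) % len(r.backends)
	return b
}

func main() {
	// Placeholder addresses, not taken from a real cluster.
	pool := &roundRobin{backends: []string{"192.168.1.10:6443", "192.168.1.11:6443"}}
	for i := 0; i < 4; i++ {
		fmt.Println(pool.pick())
	}
}
```

The mutex is there because a load balancer normally serves many clients at once, so the position in the pool has to be updated safely.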
Weighted
This method makes it possible to send traffic more heavily to some backends in the pool than to others. Weighted algorithms differ between load balancers, but if we pretend that the weighting is a simple percentage we can imagine the following two examples:
Equal weighting
backend1.com 50%
backend2.com 50%
This would be almost the same as round-robin load balancing, as requests would be balanced equally between the two backends in the pool.
Weighted load balancing
backend1.com 10%
backend2.com 90%
This would mean that 1 out of every 10 requests would go to backend1.com, and the remaining 9 would go to backend2.com. The main use case for this is typically something like a phased rollout of a new release of an application (a canary deployment).
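As a rough sketch of the percentage idea, the Go snippet below maps a request counter onto the pool in proportion to each weight. The `weightedBackend` type and `pickWeighted` function are hypothetical, and the approach is deliberately naive: it sends requests in bursts per backend, whereas real load balancers usually interleave them more smoothly or choose randomly by weight. It assumes at least one weight is positive.

```go
package main

import "fmt"

// weightedBackend pairs a backend with its relative share of the traffic.
type weightedBackend struct {
	addr   string
	weight int
}

// pickWeighted maps request number n onto the pool in proportion to each weight.
// It assumes the weights are non-negative and sum to more than zero.
func pickWeighted(pool []weightedBackend, n int) string {
	total := 0
	for _, b := range pool {
		total += b.weight
	}
	slot := n % total // position of this request within one full cycle
	for _, b := range pool {
		if slot < b.weight {
			return b.addr
		}
		slot -= b.weight
	}
	return "" // not reached when the assumption above holds
}

func main() {
	pool := []weightedBackend{{"backend1.com", 10}, {"backend2.com", 90}}
	counts := map[string]int{}
	for n := 0; n < 100; n++ {
		counts[pickWeighted(pool, n)]++
	}
	fmt.Println(counts)
}
```

Over 100 requests this sends 10 to backend1.com and 90 to backend2.com, matching the 10%/90% split above.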
Types of Load balancer
Load balancers provide a number of different mechanisms for exposing their services, and each of these mechanisms has advantages and gotchas to be aware of. The majority of load balancers provide their service at a particular layer of the OSI model, and the type numbers used below correspond to those layers.
Type 3 Load balancer
Type 3 can typically be thought of as an “endpoint” load balancer. This load balancer will expose itself using an IP address, sometimes referred to as a VIP (Virtual IP address), and then load balance incoming connections over one or more pre-defined endpoints. Health checks to these internal endpoints are typically performed either by attempting to create a connection to the endpoint or by a simple “ping” check.
Type 4 Load balancer
A Type 4 load balancer will usually provide its functionality over a service that is exposed as a port on an IP address. The most common load balancer of this type is a web traffic load balancer that exposes traffic on the IP address of the load balancer through TCP ports 80 (unencrypted) and 443 (SSL/TLS encrypted web traffic). However, this type of load balancer isn’t restricted to web-based traffic and can load balance connections to other services that listen on other TCP ports.
To ensure that the backends in the pool are “healthy”, a Type 4 load balancer can verify not only that a backend accepts connections, but also that the application is behaving as expected.
Example
Un-Healthy Application
- Ping endpoint (successful)
- Attempt to read data (a lot of web applications expose a /health URL that reports whether the application has completed its initialisation). If the application isn’t ready it will return an HTTP 500 error code (anything 400 or above is typically treated as unhealthy).
Type 7 Load balancer
A Type 7 load balancer has to have knowledge of a particular protocol (or application) in order to load balance across a particular pool of endpoints. This type of load balancer is often found in front of large websites that host various services under the same domain name.
Example
Consider these two urls:
Under the previous two types of load balancer, all traffic will only go to the same pool of servers regardless of the actual URL requested by the end user.
In a Type 3 load balancer:
example.com --> resolves to --> load balancer IP --> which then selects a server from --> endpoint pool
In a Type 4 load balancer:
https://example.com --> resolves to --> load balancer IP --> and the https:// resolves to port 443 --> which then selects a server from --> endpoint pool
The Type 7 load balancer, however, employs knowledge of the protocol/application and can load balance based upon application behaviour. In this example the load balancer still makes all of the network-level decisions needed to direct traffic to the correct pool of servers, but it can now make additional decisions based upon things like application behaviour or the client's request.
In a Type 7 load balancer:
https://example.com/finance --> resolves to --> load balancer IP --> and the https:// resolves to port 443 --> load balancer identifies traffic as http/s and can parse behaviour --> /finance read as the URI --> selects from the “finance” endpoint pool.
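The last step, choosing a pool from the request path, can be sketched in Go as a simple prefix lookup. The `route` type, the `poolFor` function and all of the addresses are hypothetical; real Type 7 load balancers offer far richer matching on hosts, headers and so on.

```go
package main

import (
	"fmt"
	"strings"
)

// route maps a URL path prefix to the pool of endpoints that serves it.
type route struct {
	prefix string
	pool   []string
}

// poolFor returns the pool of the first route whose prefix matches path,
// or fallback when no route matches.
func poolFor(routes []route, fallback []string, path string) []string {
	for _, r := range routes {
		// Match "/finance" itself and anything below it, but not "/financial".
		if path == r.prefix || strings.HasPrefix(path, r.prefix+"/") {
			return r.pool
		}
	}
	return fallback
}

func main() {
	// Placeholder pools for illustration only.
	routes := []route{{prefix: "/finance", pool: []string{"10.0.1.10:443", "10.0.1.11:443"}}}
	fallback := []string{"10.0.0.10:443"}

	fmt.Println(poolFor(routes, fallback, "/finance/reports"))
	fmt.Println(poolFor(routes, fallback, "/about"))
}
```

Once the pool is chosen, a selection algorithm such as round-robin would pick the individual server from it.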
Existing load balancers and examples
There has been a large market for load balancers for a number of decades now, as the explosion of web and other highly available and scalable applications has driven demand. Originally the lion's share of load balancers were hardware devices that would sit alongside your switching, routing, firewall and other network appliances. However, as the pace of change has increased and the need for smaller load balancers that can be deployed quickly has taken hold, the software-based load balancer has become prevalent. Originally the software load balancer was held back by the limitations of commodity hardware and by missing functionality, however with hardware offloads on NICs and built-in crypto in CPUs a lot of these issues have been removed. Finally, as cloud services have exploded so has LBaaS (load balancer as a service), which provides an externally addressable IP that can easily be pointed at a number of internal instances through wizards, APIs or the click of a few buttons.
A Quick Go example of a Load Balancer
Below is a quick example of writing your own http/s load balancer. It creates a handler function that passes traffic to one of the backend servers and returns the results back to the client of the load balancer. The two things to consider here are the backendURL() function on the second line and the req.XYZ (HTTP request) modifications.
backendURL() - This function will choose a server from the available pool and use it as a backend server.
req.XYZ - Here we’re re-writing some of the HTTP request so that the backend returns the traffic back to the correct client.
handler := func(w http.ResponseWriter, req *http.Request) {
HAProxy load balancing example for Kubernetes API server
You will find below a simple example for haproxy -> /etc/haproxy/haproxy.cfg that will load balance from the haproxy VM/server to the two Kubernetes control plane nodes.
frontend kubernetes-api
Nginx load balancing example for Kubernetes API server
You will find below a simple example for nginx -> /etc/nginx/nginx.conf that will load balance from the nginx VM/server to the two Kubernetes control plane nodes.
stream {
Next steps
This entire post was actually leading up to load balancing the Kubernetes API server with nftables, so click here to read that next. :-)