Balancing the API Server

Posted on 2019-06-13 Edited on 2019-06-24 Disqus:

This is a lazy introduction to options that can be employed to load balancer the Kubernetes API-Server, however in this first post we will be focusing on the Kubernetes API-Server and load balancers in a general sense just to get a slightly deeper understanding what is happening.

Kubernetes API server

As its name suggests the Kubernetes API server is the “entry-point” into a Kubernetes cluster, and allows for all CRUD (Create/read/update/delete)
operations. The interaction with the Kubernetes API server is typically through REST and JSON/YAML payloads that define objects within the Kubernetes cluster. A Kubernetes API server will happily run as a singular instance, however this (by its very design) is a single point of failure and offers no high availability. In order to provide a resilient and highly available Kubernetes API server then a load balancer should be placed above the Kubernetes API server replicas and have the load balancer handle ensuring that a client is passed to a healthy instance.

Example Kubernetes API architecture

Here a load balancer 192.168.1.1 will select one of the two instances beneath it and present it to the client

            -----------------
            |192.168.1.1:6443|
            -----------------
                |        |
-------------------    -------------------
|192.168.1.110:6443|   |192.168.1.111:6443|
-------------------    -------------------

Load balancers

Load balancers have been a heavily deployed technology for a number of decades, their use-case has typically been to provide a solution to two technical challenges:

Scaling workloads over application replicas
High Availability over “working” application replicas

With scaling workloads a load balancer will take requests and dependant on the distribution algorithm send the request to one of the pre-configured backends.

The High availability the load balancer will typically employ a technology that will determine if an endpoint is working as correctly and use that to determine if traffic can be sent to that endpoint. As far as the end user is concerned, traffic will always be hitting a working endpoint as long as one is available.

Load balancer architecture

The majority of load balancers operate using a few common structures:

Front end

The front end is the client side part of the load balancer, and it is this front end that is exposed to the outside world. Either an end user or an application will connected to the front end in an identical manner that they would connect to the application directly. The load balancing should always be transparent to the end user/application that is connecting to the load balancer

Back end(s)

When a front end of a load balancer is accessed by a client, it is the load balancers main purpose to then redirect that traffic transparently to a back end server. A back end server is the location where the load balancer application should actually be running, and the load balancer will typically perform a check to ensure that the application is available before sending any traffic to it. As the main point of a load balancer is to provide both high availability and scaling then multiple back ends are usually configured under a single front end, these are typically called a pool from which the load balancer will select one from the many healthy back ends.

Selection Algorithm

Once a front end is defined and a pool of backends have been added there typically will be a predetermined selection algorithm that will be used in order to make the decision which backend should be chosen from the pool. As this is meant to be a simple overview, we will only cover the two most commonly used algorithms:

Round-Robin

This algorithm is arguably the most simple, and will simply loop through the pool of back end servers until they’re exhausted and start again.

Weighted

This method provides the capability of having traffic pre-determined to go more heavily to some backend in the pool than others. Weighted algorithms can be different between load balancers but if we were to pretend that it were based upon simple percentage we could imagine the following two examples:

Equal weighting

backend1.com 50%
backend2.com 50%

This would almost be the same as having round-robin load balancing as requests would be based equally between the two backends in the pool.

Weighted load balancing

backend1.com 10%
backend2.com 90%

This would mean that 1 out of every ten requests would be going to backend1.com, and the remaining 9 would be going to backend2.com. The main use-cases for this are typically things like a phased approach of a new release of an application (canary deployment)

Types of Load balancer

Load balancers provide a number of different mechanisms and ways of exposing their services to be consumed, each of these mechanisms has various advantages and gotchas to be aware of. The majority of load balancers will provide their service at a particular layer in the OSI stack > wiki link

Type 3 Load balancer

Type 3 can typically be thought of as “endpoint” load balancer. This load balancer will expose itself using an IP address sometimes referred to as a VIP (Virtual IP address) and then it will load balance incoming connections over one or more pre-defined endpoints. Health checks to these internal endpoints are typically performed by attempting to either create a connection to the endpoint or a simple “ping” check.

Type 4 Load balancer

A Type 4 load balancer will usually provide it’s functionality over a service that is exposed as a port on an IP address. The most common load balancer of this type is usually a web traffic load balancer that will expose traffic on the IP address of the load balancer through TCP ports 80 (un-encrypted) and 443 for (SSL encrypted web traffic). However this type of load balancer isn’t only restricted to web based traffic and can load balance connections to other services that listen on other TCP ports.

To ensure that the pool of backends are “health” a Type 4 load balancer has the capability to provide that not only does a backend accept connections, but also that the application is behaving as expected.

Example

Un-Healthy Application

Ping endpoint (successful)
Attempt to read data (a lot of web applications will expose /health URL that can determine if the application has completed its initialisation), if the application isn’t ready then it will return a HTTP error code 500 (anything >300 is typically not healthy)

Type 7 Load balancer

A Type 7 load balancer has to have knowledge about a particular protocol (or application knowledge) in order for it to provide the load balancing to a particular pool of endpoints. This particular load balancer if often found in-front of large websites that host various services under the same domain name.

Example

Consider these two urls:

Under the previous two types of load balancer, all traffic will only go to the same pool of servers regardless of the actual URL requested by the end user.

In a Type 3 load balancer:

example.com –> resolves to –> load balancer IP –> which then selects a server from –> endpoint pool

In a Type 4 load balancer:

https://example.com –> resolves to –> load balancer IP –> and the https:// resolves to port 443 –> which then selects a server from –> endpoint pool

The Type 7 load balancer however employs the knowledge of the protocol/application and can perform the load balancing based upon certain levels of application behaviour. In this example the load balancer can examine all of the networking decisions to direct traffic to the correct pool of servers, however it can also make additional decisions now based upon things like application behaviour or client request.

In a Type 7 load balancer:

https://example.com/finance –> resolves to –> load balancer IP –> and the https:// resolves to port 443 –> load balancer identifies traffic as http/s and can parse behaviour –> /finance read as the URI –> Selects from the “finance” endpoint pool.

Existing load balancers and examples

There has been a large market for load balancers for a number of decades now as the explosion of web and other highly available and scalable applications has driven demand. Originally the lions share of load balancers were typically hardware devices, and would sit with your switching, routing, firewall and other network appliances. However as the speed of change and requirements for quick and smaller load balancers has taken hold, the rise of the software based load balancer has become prevalent. Originally the software load balancer was limited due to consumer hardware limitations and missing functionality, however with hardware offloads on NICs and in-built crypto in CPUs a lot of these issues have been removed. Finally, as cloud services have exploded so has the LBaaS (load balancer as a service) providing an externally addressable IP that can be easily pointed at a number of internal instances through wizards/APIs or through the click of a few buttons.

A Quick Go example of a Load Balancer

Below is a quick example of writing your own http/s load balancer, that will create a handler function that will handle the passing of traffic to one of the backend servers and return the results back to the client of the load balancer. The two things to consider here are the backendURL() function on the second line the the req. (HTTP Request) modifications.

backendURL() - This function will choose a server from the available pool and use it as a backend server
req.XYZ - Here we’re re-writing some of the HTTP request so that the backend returns the traffic back to the correct client.

handler := func(w http.ResponseWriter, req *http.Request) {

		url, _ := url.Parse(backendURL())

		proxy := httputil.NewSingleHostReverseProxy(url)

		req.URL.Host = url.Host
		req.URL.Scheme = url.Scheme
		req.Header.Set("X-Forwarded-Host", req.Host)
		req.Host = url.Host

		proxy.ServeHTTP(w, req)
	}

	mux := http.NewServeMux()
	mux.HandleFunc("/", handler)
	http.ListenAndServe(frontEnd, mux)

HAProxy load balancing example for Kubernetes API server

You will find below a simple example for haproxy -> /etc/haproxy/haproxy.cfg that will load balance from the haproxy VM/server to the two Kubernetes control plane nodes.

frontend kubernetes-api
    bind <haproxy address>:6443
    mode tcp
    option tcplog
    default_backend kubernetes-api

backend kubernetes-api
    mode tcp
    option tcplog
    option tcp-check
    balance roundrobin
    default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100

        server k8s-master1 192.168.2.1:6443 check
        server k8s-master2 192.168.2.2:6443 check

Nginx load balancing example for Kubernetes API server

You will find below a simple example for nginx -> /etc/nginx/nginc.cfg that will load balance from the nginx VM/server to the two Kubernetes control plane nodes.

stream {
    upstream kubernetes-api {
        server 192.168.2.1:6443 weight=5 max_fails=3 fail_timeout=30s;
        server 192.168.2.2:6443 weight=5 max_fails=3 fail_timeout=30s;
    }

    server {
        listen 6443;
        proxy_connect_timeout 1s;
        proxy_timeout 3s;
        proxy_pass kubernetes-api;
    }
}

Next steps

This entire post was actually leading up to load balancing the Kubernetes API server with nftables, so click here to read that next. :-)