
Pre-requisites

Before you can begin to create a Cluster-API provider there are a few tools and utilities that need to be in place.

GOLANG

At this time kubebuilder and the resulting examples all produce code in GO, so it makes sense to follow that path.

kubebuilder

The kubebuilder tool is used to handle the management and creation of CRDs (Custom resource definitions) and the controllers that manipulate the objects created from a resource definition. The installation information can be found here https://book.kubebuilder.io/quick-start.html. We will use kubebuilder in order to build our Machine, Cluster and Bootstrap CRDs and Controllers.

Cluster-API

Cluster-API is an upstream Kubernetes project that extends Kubernetes so that it can manage infrastructure resources and Kubernetes clusters in much the same way that it would manage the components of an application hosted on Kubernetes.

It can be installed by applying the manifest kubectl create -f https://github.com/kubernetes-sigs/cluster-api/releases/download/v0.2.6/cluster-api-components.yaml (correct as of 28th Oct’ 2019). This manifest will install the following:

  • CRDs for Machines, Clusters, MachineSets, MachineDeployments and Bootstraps
  • Controllers to handle all of the above resources
  • ClusterRole Bindings that will allow the above controllers to manipulate the resources of other controllers that extend Cluster-API. (important)

If we examine this ClusterRole we can see what it is allowed to work on:

kubectl describe clusterrole capi-manager-role | grep cluster

*.bootstrap.cluster.x-k8s.io [] [] [create delete get list patch update watch]
clusters.cluster.x-k8s.io/status [] [] [create delete get list patch update watch]
clusters.cluster.x-k8s.io [] [] [create delete get list patch update watch]
machinedeployments.cluster.x-k8s.io/status [] [] [create delete get list patch update watch]
machinedeployments.cluster.x-k8s.io [] [] [create delete get list patch update watch]
machines.cluster.x-k8s.io/status [] [] [create delete get list patch update watch]
machines.cluster.x-k8s.io [] [] [create delete get list patch update watch]
machinesets.cluster.x-k8s.io/status [] [] [create delete get list patch update watch]
machinesets.cluster.x-k8s.io [] [] [create delete get list patch update watch]
*.infrastructure.cluster.x-k8s.io [] [] [create delete get list patch update watch]

The final line details that we will have access to the resource *.infrastructure.cluster.x-k8s.io (note the asterisk); this wildcard is a blanket statement covering the Cluster-API infrastructure providers and will be covered a little bit more when we create a provider with kubebuilder.

Building

To build a Cluster-API provider we can make use of the model that exists within kubebuilder and extend it so that it’s both aware of Cluster-API and so that Cluster-API can drive the provider.

Initialise the repository

We will need a workspace in order to create our Cluster-API provider, so we will create our directory with mkdir cluster-api-{x}; cd cluster-api-{x}. If this is outside of the $GOPATH (which we can examine by looking at go env), then we will also need to create a go.mod, which we can do with go mod init {x}.

Create the CAP{x} project

Once our directory is created and we’ve initialised our go environment, we will use kubebuilder to define the initial project.

Important: the domain we specify needs to be the same domain that is specified above as part of the ClusterRole bindings for capi.

kubebuilder init --domain cluster.x-k8s.io --license apache2 --owner "The Kubernetes Authors"

Creating CRDs/Controllers with kubebuilder

With everything ready we can now define our Custom Resource Definitions and the Controllers that will manipulate them.

If we break down the command flags:

  • --kind The type of resource we are defining (needs a capital letter to define it)
  • --group The group these resources will live under
  • --resource Create the CRD
  • --controller Create the Controller code
  • --version The version of our CRD/Controller we’re defining

If we were creating a Cluster-API Provider called example then the command would look something like:

kubebuilder create api --kind ExampleCluster --group infrastructure --resource=true --controller=true --version v1alpha1

Important note

This would create our resource => exampleCluster.infrastructure.cluster.x-k8s.io which, as we can see looking back at the ClusterRole bindings, capi is allowed to manipulate.

Create the Cluster Controller/Resource

kubebuilder create api --kind {x}Cluster --group infrastructure --resource=true --controller=true --version v1alpha1

Create the Machine Controller/Resource

kubebuilder create api --kind {x}Machine --group infrastructure --resource=true --controller=true --version v1alpha1

main.go

Add clusterv1 "sigs.k8s.io/cluster-api/api/v1alpha2" to import(s)

Add _ = clusterv1.AddToScheme(scheme) to init()
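With those two additions in place, the relevant parts of main.go will look roughly like the sketch below (a hedged example: the infrastructurev1alpha1 and clientgoscheme import names follow kubebuilder's generated code and may differ slightly in your project):

import (
    // ... existing generated imports ...
    clusterv1 "sigs.k8s.io/cluster-api/api/v1alpha2"
)

func init() {
    _ = clientgoscheme.AddToScheme(scheme)
    _ = infrastructurev1alpha1.AddToScheme(scheme)
    // Register the Cluster-API types so our manager can read/watch Cluster objects
    _ = clusterv1.AddToScheme(scheme)
}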

API Definitions in /api/v1alpha1

{x}cluster_types.go

Add Finalizer to {x}cluster_types.go

const (
    // ClusterFinalizer allows {x}ClusterReconciler to clean up resources associated with {x}Cluster before
    // removing it from the apiserver.
    ClusterFinalizer = "{x}cluster.infrastructure.cluster.x-k8s.io"
)

Add additional fields to Status

// Ready denotes that the {x} cluster (infrastructure) is ready.
Ready bool `json:"ready"`

// INSERT ADDITIONAL STATUS FIELD - define observed state of cluster
// Important: Run "make" to regenerate code after modifying this file

// APIEndpoints represents the endpoints to communicate with the control plane.
// +optional
APIEndpoints []APIEndpoint `json:"apiEndpoints,omitempty"`

Cluster specific endpoints

// APIEndpoint represents a reachable Kubernetes API endpoint.
type APIEndpoint struct {
    // Host is the hostname on which the API server is serving.
    Host string `json:"host"`

    // Port is the port on which the API server is serving.
    Port int `json:"port"`
}

{x}machine_types.go

TODO

Controllers

Cluster Controller /controller/{x}cluster_controller.go

Modify Imports

Change infrastructurev1alpha1 <import path> to infrav1 <import path>; this will make the code easier to re-use in the future and to share with other infrastructure providers.

Define Cluster Controller Name

const (
    clusterControllerName = "{x}cluster-controller"
)

Modify the Reconcile function (part 1: context, logging and getting our object)

func (r *{x}ClusterReconciler) Reconcile(req ctrl.Request) (_ ctrl.Result, rerr error) {

    ctx := context.Background()
    log := log.Log.WithName(clusterControllerName).WithValues("{x}-cluster", req.NamespacedName)

    // Create an empty instance of a {x}Cluster object
    {x}ClusterObj := &infrav1.{x}Cluster{}

    // Fetch our {x}Cluster object
    if err := r.Client.Get(ctx, req.NamespacedName, {x}ClusterObj); err != nil {
        if apierrors.IsNotFound(err) {
            return ctrl.Result{}, nil
        }
        return ctrl.Result{}, err
    }

Modify the Reconcile function (part 2: Find the Cluster-API cluster)

    // Fetch the Cluster API parent.
    cluster, err := util.GetOwnerCluster(ctx, r.Client, {x}ClusterObj.ObjectMeta)
    if err != nil {
        return ctrl.Result{}, err
    }

    if cluster == nil {
        log.Info("Waiting for Cluster Controller to set OwnerRef on {x} Cluster")
        return ctrl.Result{}, nil
    }

    // Enable logging to reference the Cluster-API cluster
    log = log.WithValues("cluster", cluster.Name)

Modify the Reconcile function (part 3: Create a defer to patch the object)

    // Initialize the patch helper
    patchHelper, err := patch.NewHelper({x}ClusterObj, r)
    if err != nil {
        return ctrl.Result{}, err
    }

    // Always attempt to Patch the {x}Cluster object and status after each reconciliation.
    defer func() {
        if err := patchHelper.Patch(ctx, {x}ClusterObj); err != nil {
            log.Error(err, "failed to patch {x}Cluster Object")
            if rerr == nil {
                rerr = err
            }
        }
    }()

Modify the Reconcile function (part 4: Act on the {x} cluster object)


    // Handle deleted clusters
    if !{x}ClusterObj.DeletionTimestamp.IsZero() {
        return r.reconcileClusterDelete(log, {x}ClusterObj)
    }

    return r.reconcileCluster(log, cluster, {x}ClusterObj)
}
// End of Reconcile function

Additional functions

func (r *{x}ClusterReconciler) reconcileCluster(logger logr.Logger, cluster *clusterv1.Cluster, {x}Cluster *infrav1.{x}Cluster) (_ ctrl.Result, reterr error) {
    logger.Info("Reconciling Cluster")

    if !util.Contains({x}Cluster.Finalizers, infrav1.ClusterFinalizer) {
        {x}Cluster.Finalizers = append({x}Cluster.Finalizers, infrav1.ClusterFinalizer)
    }

    // RECONCILE LOGIC

    // IMPORTANT - Setting this status to true means that it is recognized as provisioned / ready
    {x}Cluster.Status.Ready = true

    return ctrl.Result{}, reterr
}

func (r *{x}ClusterReconciler) reconcileClusterDelete(logger logr.Logger, {x}Cluster *infrav1.{x}Cluster) (_ ctrl.Result, reterr error) {
    logger.Info("Deleting Cluster")

    // DELETE LOGIC

    // Filter this cluster's finalizer out of the list, allowing Kubernetes to remove the object
    {x}Cluster.Finalizers = util.Filter({x}Cluster.Finalizers, infrav1.ClusterFinalizer)

    return ctrl.Result{}, reterr
}

Code Sharing tooling

As part of a shared coding demo I’m hoping to deliver in the near future I decided to document some of the tooling that seems to work pretty well for me and how I’m using it!

ttyd

The ttyd tool allows for sharing a terminal session over a web browser, and works fantastically! The github repository for ttyd is https://github.com/tsl0922/ttyd.

Installation

  • Grab the binary releases from the release page and de-compress the archive
  • Move the binary to a path it can be run from and ensure it is named ttyd

Usage

The usage of ttyd is well explained on the github page, however for quick usage it’s typically ttyd -p <port num> [cmd]. Any user can now connect to the IP address where ttyd is running (on the specified port) and they will instantly access the program specified at the end of the command [cmd].

Shared session with ttyd

One of the requirements of what I want to do is to have all users share the same session, and for that session to be read-only. Luckily we can easily do this with an additional utility called screen.

Create a shared screen session

The first thing we need to do is to create our shared screen session and give it a name, which we can do with the following command:

screen -S ttyd

This will create our screen session that we’ve named ttyd and can be easily viewed with screen -ls.

Read-only ttyd sessions

The behaviour that we want when a user connects to ttyd in their browser is for them to view the one master share, and for that share to be read-only. We can accomplish this by using the screen -x <session> command and starting ttyd in read-only mode. The following command will start ttyd on a particular port, in read-only mode (-R), and when a client connects it will attach to the master screen session.

ttyd -R -p <port num> screen -x ttyd

Keppler

Keppler is a fantastic tool for allowing people to remotely see code updates in a very friendly UI, along with browsing the differences as the code is modified. The github repository for keppler is https://github.com/brunosimon/keppler

To make life easier for myself, I've wrapped the usage of keppler in a Docker container so I can just move to a directory where I want to expose what I'm working on and run a keppler alias!

Fixes to file watchers

On Linux the below is required so that keppler won't panic when trying to monitor source code files changing.

echo fs.inotify.max_user_instances=524288 | sudo tee -a /etc/sysctl.conf && sudo sysctl -p
echo fs.inotify.max_user_watches=524288 | sudo tee -a /etc/sysctl.conf && sudo sysctl -p
echo fs.inotify.max_queued_events=524288 | sudo tee -a /etc/sysctl.conf && sudo sysctl -p

Keppler Dockerfile

Below is my Dockerfile that will create an image with everything needed to start keppler and expose it on port 1234.

FROM node
RUN npm install -g keppler
RUN mkdir /keppler
WORKDIR /keppler
CMD keppler "keppler" --port 1234

Build the dockerfile with the below command, which will create our image keppler:1.0.

docker build -t keppler:1.0 .

Keppler alias

The shell alias below will automatically start the keppler container and expose the contents of the directory where it is run.

alias keppler="docker run --rm -d -v \`pwd\`:/keppler -p1234:1234 --name keppler keppler:1.0"

Stopping Keppler

As it is running as a docker container, it can be simply stopped with the command:

docker stop keppler

Modernising Applications

The current trend of application modernisation is somewhat of a misnomer, especially when we consider the typical end-results from the modernisation procedure:

Typical results:

  • Application code unchanged
  • Operating environment (libraries and dependencies) largely unchanged
  • Application behaviour largely un-modified
  • Application network interactions typically un-modified

The actual result tends to be application environment modernisation, where the application is re-platformed onto a modern platform but is still behaving in a (perhaps) un-modern pattern. This does still allow a number of benefits:

  • Finally getting an old application running in the same environment, but on a modern platform
  • Getting the application monitored by modern tooling
  • Adding the application into a consolidated platform, increased packing (utilisation)
  • The application can be monitored for health and behaviour and restarted
  • Application is now packaged as a container for ease of movement and re-deployment

The next question is how we identify an application that can be “modernised”; a lot depends on what remains of the application. With source code, we can review its behaviour and perhaps make changes to make it more cloud-native. Without it, we need to adopt an investigative route to work out what to do next.

Source code analysis

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main() {
    char buffer[16];
    int byteCount;

    // Attempt to open the state file
    int statePath = open("/state.dat", O_RDWR);
    if (statePath < 0)
        return 1;

    // Read the contents of the file (the stored value)
    byteCount = read(statePath, buffer, sizeof(buffer) - 1);
    buffer[byteCount > 0 ? byteCount : 0] = '\0';

    // Update
    int state = atoi(buffer) + 1;

    // Write back result
    lseek(statePath, 0, SEEK_SET);
    byteCount = snprintf(buffer, sizeof(buffer), "%d", state);
    write(statePath, buffer, byteCount);
    close(statePath);

    // return successfully
    return 0;
}

The above code-snippet is a small example of a program (written in C) that performs a few simple actions:

  • Attempt to open a file state.dat
  • Read the contents of this file (as an integer/number)
  • Increment this value
  • Write the result back to the file

We can start to infer a few things by reading through the source code:

Dependencies

Looking at the source code of any application from the perspective of its dependencies can immediately start to give clues as to its behaviour. The above example includes the headers fcntl.h and unistd.h, which immediately leads to the conclusion that this application may be reading/writing to an underlying filesystem. (note: reading from a filesystem is perfectly acceptable behaviour for a stateless application)

Variable analysis

This type of analysis is completely dependent on the “hope” that the original developer chose to utilise variable names that were descriptive enough to be both human readable and parsable by some level of analysis tool. In most programming languages variables are defined using a protected keyword (or token) such as int or bool to define the type of variable and a name for this variable. With this knowledge we can scan through the source code to find variable definitions that can start to define some level of application behaviour.
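To make that concrete, below is a toy sketch (my own illustration, not part of any existing tooling) written in GO that scans a C source file for simple variable declarations; the filename example.c and the list of type keywords are assumptions:

package main

import (
    "fmt"
    "os"
    "regexp"
)

func main() {
    // Read the source file we want to analyse (the filename is illustrative)
    src, err := os.ReadFile("example.c")
    if err != nil {
        fmt.Println(err)
        return
    }

    // Match simple C declarations such as "int statePath;"
    decl := regexp.MustCompile(`(?m)^\s*(int|bool|char|float|double)\s+(\w+)`)
    for _, m := range decl.FindAllStringSubmatch(string(src), -1) {
        fmt.Printf("type=%s name=%s\n", m[1], m[2])
    }
}

Descriptive names like statePath or byteCount then become small clues that can be collected and weighed alongside the dependency analysis.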

Code analysis (behaviour and external function calls)

The previous two methods for starting to understand application behaviour can be very quick ways to infer how the application will behave at runtime, especially since even in large source files the dependencies are typically only declared once (at the top of the source code) regardless of the number of lines of code. However, to truly understand what the application behaviour entails, we will need to analyse what functions the code is making use of and how it is calling them.

If we consider the above snippet, we can see that it is making use of headers that we know are used for file based operations; we can now start to analyse the code in more detail to determine what operations the application will be performing. A simple scan should reveal that there are three operations above that are attributed to file based behaviour: open(), read() and write().

As mentioned above opening and reading files is perfect behaviour for a stateless application, think of a stateless web server that is only serving assets such as images. Every request will result in an open("/path/to/image") followed by a read() function to read the image data from the file so that it can be served to an end user.

However, just by the naming of the function write() we can infer a behaviour that will attempt to mutate the environment where this code is running and what it is interacting with. Luckily this naming of function call can be seen across functions in various languages, such as GO, python, nodejs and a number of others... although in reading about other languages I came across a perl example (which I've not used in many years) that involves using the print command on a file handle. This exists in some of the prior examples but typically wouldn't be the average behaviour.

No Source code, No Problem (perhaps)

Without source code, we can still determine the behaviour of a program and its approach to immutability (and permissions) by doing the following:

  • Running in a read only environment and watching the behaviour
  • Running in a container, restarting and examining the resulting behaviour
  • Both scenarios in combination with a tool like strace to watch for what caused the failed behaviour

The first example will typically result in one of two scenarios... the program will behave as expected, taking input from somewhere, processing it in memory and sending the results somewhere else... or... reading the input, attempting to open a handle to where results will be stored and then... error messages.

The second example... has caught people out, and will continue to catch people out, since this paradigm began. A container environment is a brand new hire car, a brand new hotel room (built the same way every time you use it). If you left anything in the hire car/hotel room when you last used it... well it's gone (but the room was exactly the same, just it was a new copy).

This is the second post around developing Go code against the vSphere/vCenter APIs using govmomi, if you’re new to this then I would recommend you start with https://thebsdbox.co.uk/2019/07/04/First-steps-with-Govmomi to fully grasp the basic concepts.

Extending our example

The example that we built in the previous post was very simplistic, containing only the basic functionality to log into a vSphere/vCenter server; other than testing a user's credentials it had no real functionality. In this post we will extend its functionality in order for it to actually interact with various objects and types that exist in both vSphere and vCenter.

Proposed functionality

The next version of our demo application will have the following functionality:

  • Retrieve objects
  • Select objects via type VM/Network/Storage
  • Implement RegEX filters to sort results

The credentials will be passed via environment variables, and the search parameters will be passed in through CLI flags.

The Code

func parseCredentials(v *vc) (*url.URL, error) {

    // Check that an address was actually entered
    if v.address == "" {
        return nil, fmt.Errorf("No VMware vCenter URL/Address has been submitted")
    }

    // Check that the URL can be parsed
    u, err := url.Parse(v.address)
    if err != nil {
        return nil, fmt.Errorf("URL can't be parsed, ensure it is https://username:password@<address>/sdk")
    }

    // Check if a username was entered
    if v.username == "" {
        // If no username was given, does one exist as part of the URL?
        if u.User.Username() == "" {
            return nil, fmt.Errorf("No VMware vCenter Username has been submitted")
        }
    } else {
        // A username was submitted, update the URL
        u.User = url.User(v.username)
    }

    if v.password == "" {
        _, set := u.User.Password()
        if set == false {
            return nil, fmt.Errorf("No VMware vCenter Password has been submitted")
        }
    } else {
        u.User = url.UserPassword(u.User.Username(), v.password)
    }
    return u, nil
}
func (i *vcInternal) parseInternals(c *govmomi.Client) error {

    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    // Create a new finder that will discover the defaults and will be used to look for Networks/Datastores
    f := find.NewFinder(c.Client, true)

    // Find one and only datacenter, not sure how VMware linked mode will work
    dc, err := f.DatacenterOrDefault(ctx, "")
    if err != nil {
        return fmt.Errorf("No Datacenter instance could be found inside of vCenter %v", err)
    }

    // Make future calls local to this datacenter
    f.SetDatacenter(dc)

    // Find the Datastore
    i.datastore, err = f.DatastoreOrDefault(ctx, i.findDataStore)
    if err != nil {
        return fmt.Errorf("%v", err)
    }

    i.dcFolders, err = dc.Folders(ctx)
    if err != nil {
        return fmt.Errorf("Error locating default datacenter folder")
    }

    // Set the host that the VM will be created on
    i.hostSystem, err = f.HostSystemOrDefault(ctx, i.findHost)
    if err != nil {
        return fmt.Errorf("%v", err)
    }

    // Find the resource pool attached to this host
    i.resourcePool, err = i.hostSystem.ResourcePool(ctx)
    if err != nil {
        return fmt.Errorf("Error locating default resource pool")
    }

    // Find the Network
    i.network, err = f.NetworkOrDefault(ctx, i.findNetwork)
    if err != nil {
        return fmt.Errorf("Network could not be found")
    }

    return nil
}
// VMInventory will create an inventory of virtual machines
func VMInventory(c *govmomi.Client, sortVMs bool) ([]*object.VirtualMachine, error) {

    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    // Create a new finder that will discover the defaults and will be used to look for Networks/Datastores
    f := find.NewFinder(c.Client, true)

    // Find one and only datacenter, not sure how VMware linked mode will work
    dc, err := f.DatacenterOrDefault(ctx, "")
    if err != nil {
        return nil, fmt.Errorf("No Datacenter instance could be found inside of vCenter %v", err)
    }

    // Make future calls local to this datacenter
    f.SetDatacenter(dc)

    vms, err := f.VirtualMachineList(ctx, "*")
    if err != nil {
        return nil, err
    }

    if sortVMs == true {
        // Sort function to sort by name
        sort.Slice(vms, func(i, j int) bool {
            switch strings.Compare(vms[i].Name(), vms[j].Name()) {
            case -1:
                return true
            case 1:
                return false
            }
            return vms[i].Name() > vms[j].Name()
        })
    }

    return vms, nil
}
func searchVMS(searchString string, v []*object.VirtualMachine) ([]*object.VirtualMachine, error) {

    var newVMList []*object.VirtualMachine
    for x := range v {
        matched, err := regexp.MatchString(searchString, v[x].Name())
        if err != nil {
            return nil, err
        }
        // If the regex matches then add it to the new subset
        if matched == true {
            newVMList = append(newVMList, v[x])
        }
    }
    return newVMList, nil
}
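To give a feel for how these functions hang together, below is a hedged sketch of a main function that wires them up; the vc struct definition and the environment variable names (VCURL, VCUSER, VCPASS) are assumptions for illustration, and the full source has its own definitions:

package main

import (
    "context"
    "flag"
    "fmt"
    "log"
    "os"

    "github.com/vmware/govmomi"
)

// Assumed definition, matching the fields used by parseCredentials above
type vc struct {
    address  string
    username string
    password string
}

func main() {
    // Credentials are passed via environment variables (names are illustrative)
    client := &vc{
        address:  os.Getenv("VCURL"),
        username: os.Getenv("VCUSER"),
        password: os.Getenv("VCPASS"),
    }

    // The search parameters are passed in through CLI flags
    pattern := flag.String("pattern", ".*", "A regex used to filter VM names")
    flag.Parse()

    u, err := parseCredentials(client)
    if err != nil {
        log.Fatal(err)
    }

    ctx := context.Background()
    c, err := govmomi.NewClient(ctx, u, true)
    if err != nil {
        log.Fatal(err)
    }

    // Build a sorted inventory and then filter it with the regex
    vms, err := VMInventory(c, true)
    if err != nil {
        log.Fatal(err)
    }
    matches, err := searchVMS(*pattern, vms)
    if err != nil {
        log.Fatal(err)
    }

    for _, vm := range matches {
        fmt.Println(vm.Name())
    }
}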

This guide will assume at least a basic familiarity with installing the GO (golang) compiler tools; anyone looking for a start will find pretty much everything they need here -> https://golang.org/doc/install

What is Govmomi

The VMware virtual machine products (vSphere/ vCenter) expose the functionality through an API that is documented here -> (https://www.vmware.com/support/pubs/sdk_pubs.html). The technology that these APIs are exposed through is called SOAP (https://en.wikipedia.org/wiki/SOAP) and can be a little bit of a headache to interact with directly. However to ease the lives of anyone wanting to develop code against this SOAP api, VMware have released SDKs and libraries in a variety of languages (python, Java, .Net etc..). The Govmomi library is a GO package that provides the majority of functionality that an end user needs in order to communicate with the VMware vSphere/vCenter APIs.

Getting govmomi

All of the required source code for the VMware vSphere/vCenter GO SDK can be found on github at the following url -> (https://github.com/vmware/govmomi); as mentioned in the README, the go command below will pull the govmomi package.

go get -u github.com/vmware/govmomi

Note: The -u flag instructs go to pull any named dependencies.

To ensure that the govmomi packages were downloaded correctly we should be able to see all the code source tree by running the following command:

ls -la $(go env GOPATH)/src/github.com/vmware/govmomi

Hello World

Just to check that our environment is working as expected, let's quickly create the hello world example to verify your GO environment. The following code snippet will create a file called govmomi/example.go which we can then use to ensure that things are working as expected.

The following commands can be copy and pasted into a terminal and will create a local govmomi directory and create the example.go file that we will use for development moving forward.

Hello World source code

mkdir govmomi; cd govmomi
cat <<EOF > example.go
package main

import "fmt"

func main() {
    fmt.Println("Hello world")
}
EOF

To run the above code we can use the following command go run example.go, and we should see Hello world printed to STDOUT.

Logging into VMware vSphere or vCenter

The following code snippet will extend on our previous example, and will make use of additional packages including the govmomi package to provide our required functionality.

Note: This code uses the url of the vSphere/vCenter instance we're trying to interact with. This url will always require the following formatting:

protocol://username:password@hostname/sdk

A correct example would be:

https://administrator@vsphere.local:pass1234@vcenter.devlab.local/sdk

Updated source code

package main

import (
    "context"
    "flag"
    "fmt"
    "net/url"

    "github.com/vmware/govmomi"
)

func main() {
    vURL := flag.String("url", "", "The URL of a vCenter server")
    flag.Parse()

    u, err := url.Parse(*vURL)
    if err != nil {
        fmt.Printf("Error parsing url %s\n", *vURL)
        return
    }

    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()

    c, err := govmomi.NewClient(ctx, u, true)
    if err != nil {
        fmt.Printf("Logging in error: %s\n", err.Error())
        return
    }

    fmt.Println("Log in successful")
    c.Logout(ctx)
}

To understand this code we will deconstruct the various important sections.

Imports

To provide the required functionality we're pulling in a lot more packages; for a bit more understanding I've added either the project URL or the godoc location.

Flags

There are a number of different methods a developer could use to pass data into a program, it could even be hardcoded into the code itself (not recommended). The main options for getting data into a simple CLI program are:

  • Read the data from a file
  • Pass data from environment variables
  • Pass data to the program at runtime through command line flags

In our example we will be using flags that are provided through the standard library and work in a key/value fashion, the key being the name of the flag -url and the value being the data we pass to this key https://{...}.

The following two lines taken from the code above will handle the user input from the CLI.

This first line will declare a variable vURL that will be assigned from the flag named url.
vURL := flag.String("url", "", "The URL of a vCenter server")

The second line will parse all of the created flags to ensure that they’re read correctly or assigned their default values etc..
flag.Parse()

URL Parsing

The govmomi client requires a URL object, not just a string representation of the vCenter URL. So we will use the net/url library to parse the string taken from the flag -url, as done with the line u, err := url.Parse(*vURL).

Context

The two lines around context (explained above) are used to ensure that state is shared across multiple APIs, functions and goroutines. In our simple example we don't need to delve into this in too much detail, but if we were to imagine that there were multiple operations taking place, such as processing some data, logging into vCenter etc., and one of the operations failed for any reason, the context would be used to share the fact that all of the other operations sharing that context need cancelling.
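If it helps to visualise this, the standalone sketch below (my own example, not taken from the post's code) shows two goroutines sharing a context; cancelling the context stops both:

package main

import (
    "context"
    "fmt"
    "time"
)

func worker(ctx context.Context, name string) {
    for {
        select {
        case <-ctx.Done():
            // The shared context was cancelled, stop this operation too
            fmt.Printf("%s: cancelled\n", name)
            return
        case <-time.After(200 * time.Millisecond):
            fmt.Printf("%s: working\n", name)
        }
    }
}

func main() {
    ctx, cancel := context.WithCancel(context.Background())

    // Two operations sharing the same context
    go worker(ctx, "process data")
    go worker(ctx, "vCenter login")

    // Pretend one operation failed, cancel everything sharing the context
    time.Sleep(500 * time.Millisecond)
    cancel()
    time.Sleep(100 * time.Millisecond)
}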

Creating a vCenter/vSphere client

At this point in the code we’ve taken the VMware vCenter URL from a CLI flag and we’ve set up a context to handle a shared state, we will now use these in order to create our vCenter client with the line c, err := govmomi.NewClient(ctx, u, true).

Quickly deconstructing this line:

  • c, err - Return the client object c and an error object err (more on err in a second)
  • govmomi.NewClient - Call the function from the govmomi package
  • ctx - Pass in the shared context
  • u - Pass in the “parsed” url (ultimately taken from the -url string)
  • true - A Boolean value true/false that dictates if the client will tolerate an insecure certificate (self-signed)

If the function call doesn’t return an error object then the login was successful so we print a message to STDOUT and call the logout method on the govmomi client object.

Error handling

A lot of functions in GO will typically return more than one variable/object and the majority of them will return an object of type error, we can see this by looking at the returns from the url.Parse() and govmomi.NewClient() function calls. In the event of a function being successful then the function will return nil in the place of an error object. However when things go wrong then a function should create a new error object with the appropriate error details/messaging.

To check if a function has failed, the boilerplate code would typically look like:

if err != nil {
    // Write error message / parse err object
}

Running our code

Running our code is very simple; we have two options, we can compile it into a binary or run it directly from the source.

Build a binary

We will build our code into a program called govmomi with the command go build -o govmomi, and we can then run it with the following:

go build -o govmomi
./govmomi -url "https://dan@vsphere.local:CorrectPass@vcenter.devlab.local/sdk"
Log in successful

Run from source

To run directly from the source code we use the go run command as shown below:

go run example.go -url "https://dan@vsphere.local:CorrectPass@vcenter.devlab.local/sdk"
Log in successful

Next Steps

If everything worked as expected then there should be a success message printed on screen, in the next post we will look at extending our very basic example into something that offers a lot more functionality.

BONUS POINTS

In this example we pass in the URL through the use of flags on the CLI; another user on the same host would be able to find the vCenter URL and credentials by looking at the list of processes. The use of environment variables will keep that hidden and within a user's session.

Think about removing the flag.String and look at os.Getenv :-)

Further reading -> https://golang.org/pkg/os/#example_Getenv
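As a hedged sketch, the two flag lines in main() could be swapped for something like the following (the environment variable name VCURL is just an example):

// Read the vCenter URL from the environment instead of a CLI flag
vURL := os.Getenv("VCURL")
if vURL == "" {
    fmt.Println("No vCenter URL found in the VCURL environment variable")
    return
}

The variable would then be set in the user's session with something like export VCURL=https://... before running the program. (Note that vURL is now a string rather than a *string, so the url.Parse call no longer needs the dereference.)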

Clearing up

To remove all of the code examples that we created we will just need to remove the govmomi folder we created in the Hello World section.

… yes that was an attempt to make the title rhyme 🙁

tl;dr make an executable smaller by hiding your code inside the header of the executable… read on for the gory detail. The Gist with the OpCodes can be found here -> https://gist.github.com/thebsdbox/29e395299f89b52214b66269f5b33f7d

There was a great post recently from Dieter Reuter around building the smallest possible Docker Image, which I thought posed an interesting idea, mainly due to some of the crazy sizes of Docker images I keep having to deal with. I decided that I would join in with the challenge and see where I could shrink both a binary and, by association, the resulting Docker Container down to its smallest possibility.

I created a number of binaries during my playing around; below is a list of five of them, all of which print to STDOUT the following text "Hello Docker World!\n". If you're not familiar with escaped characters, the \n is simply the newline character.

*I realise I capitalised some of the strings by accident, but ‘h’ still occupies the same space as ‘H’ 😉

Initial failure

Before I delve into the steps I went through to make the small container, it's worth pointing out that there is a fatal flaw in one of the above binaries when placed in a SCRATCH container (hint: there is a duplicate binary with the suffix _STATIC 🙂).

The reason that the hello_in_C binary will fail to run in the SCRATCH container is that it has dynamic requirements on a number of system libraries. Most notable is libc, which is the base C library that contains a lot of basic day-to-day code providing the standard functionality to C programs. If we were to place this into a Docker container the following would be the result:

$ docker run -it --rm hello:C
standard_init_linux.go:178: exec user process caused "no such file or directory"

We can examine binaries to check for external dependencies using the ldd tool to see what external libraries are needed to run the binary file. Alternatively, we can use volume mapping to pass the host Operating System libraries into the SCRATCH container -v /lib64:/lib64:ro, this will provide the libraries required for this particular executable to successfully execute.

docker run -v /lib64:/lib64:ro -it --rm hello:C
Hello Docker World!

To permanently fix this issue is quite simple and requires building the C binary with the -static compile-time flag (the package glibc-static will be required); this quite simply will bundle all code into a single file instead of relying on external libraries. This has the knock-on effect of making the binary easier to run on other systems (as all code is in one place), however the binary has now increased in size by 100 times... which is the opposite of what we're trying to accomplish.

What makes an Executable

Ignoring MS-DOS .com files, which no-one has touched and which haven't been supported in years, most executables regardless of Operating System typically consist of a header that identifies the executable type (e.g. elf64, winPE) and a number of sections:

  • .text, code that can be executed
  • .data, static variables
  • .rodata, static constants
  • .strtab / .shstrtab, string tables
  • .symtab, symbol tables.

The Executable header will contain an entry that points to the beginning of the .text section, which the Operating System will then use when the executable is started to find the actual code to run. This code then will access the various bits of data that it needs from the .data or .rodata sections.
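As an aside, GO's standard library debug/elf package can dump exactly this information from any ELF binary; a small sketch (pass it the path of a binary to inspect):

package main

import (
    "debug/elf"
    "fmt"
    "os"
)

func main() {
    if len(os.Args) < 2 {
        fmt.Println("usage: elfinfo <binary>")
        return
    }

    f, err := elf.Open(os.Args[1])
    if err != nil {
        fmt.Println(err)
        return
    }
    defer f.Close()

    // The entry point that the OS will jump to once the file is loaded
    fmt.Printf("Entry point: 0x%x\n", f.Entry)

    // The sections that make up the executable
    for _, s := range f.Sections {
        fmt.Printf("%-12s %d bytes\n", s.Name, s.Size)
    }
}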

Basic overview of a “Hello Docker World!” execution process

  1. The exec() family functions will take the path of file and attempt to have the OS execute it.
  2. The Operating System will examine the header to verify the file; if OK it will examine the header structure and find the entry point.
  3. Once the entry point is found, the operating system will start executing the code from that point. It is at this point where the program itself is now running.
  4. The program will set up for the function to write the string to stdout
  5. Set string length
  6. Set the pointer to the string in the .data section
  7. Call the kernel
  8. Call the exit function (otherwise the kernel will assume the execution failed)

Strip out Sections

In the quick overview above, we can see through the course of execution that there are a number of sections within the executable that aren't needed. In most executables there may be debug symbols or various sections that apply to compilers and linkers that are no longer required once the executable has been put together.

In order to have a stripped executable, it can either be compiled with the -s flag (also make sure -g isn’t used, as this adds debug sections). Alternatively we can use the strip tool that has the capability to remove all non-essential sections.

$ strip --strip-all ./hello_in_C_STRIPPED
$ ls -la hello_in_C_ST*
-rwxrwxr-x. 1 dan dan 848930 Feb 28 15:35 hello_in_C_STATIC
-rwxrwxr-x. 1 dan dan 770312 Feb 28 18:07 hello_in_C_STRIPPED

With languages such as GO, there can be significant savings by stripping any sections that aren’t essential (although if you’re doing this for production binaries it should be part of your compile/make process).

Extreme Shrinking

The final option that will keep your hands clean for shrinking an executable is to make use of tools like UPX which adds a layer of compression to your executable shrinking what’s left of your stripped binary. Taking my original GO binary I went from:

  • go build hello_docker_world.go = 1633717 bytes
  • strip --strip-all = 1020296 bytes
  • upx = 377136 bytes

Clearly a significant saving in terms of space.

Getting your hands dirty

Everything that has been discussed so far has been compiled through standard build tools and modified with the compiler or OS toolchain that managed executables. Unfortunately we’ve reached as far as we can go with these tools, as they will always build to the ELF/OS standards and always create the sections that they deem required.

In order to build a smaller binary, we’re going to have to move away from the tools that make building executables easier and hand craft a tailored executable. Instead of sticking with the format of [header][code][data], we’re going to look at how we can hide our code inside the header.

Whilst there are some parts of the header that are a requirement, there are some that have to just be a non-zero value and others that are left blank for future use. This is going to allow us to simply change entries in the ELF header from legal values to the code we want to execute and the following will happen:

  1. The Operating System will be asked to execute the file
  2. The OS will read the ELF header, and verify it (even though some values don’t make sense)
  3. It will then find the code entry point in the header that points to the middle of the actual header 🙂
  4. The OS will then start executing from that point in the header, and run our code.

Explained Code below

This code pretty much fits just in the ELF header itself, so I have broken the header up and labelled the header fields and where we’ve hidden the code we want to execute.

First part of header (has to be correct)

Op Code          Explanation
org 0x05000000   Set Origin address
db 0x7F, "ELF"   Identify as ELF binary
dd 1             32-bit
dd 0             Little endian
dd $$            Pointer to the beginning of the header
dw 2             Code is executable
dw 3             Instruction set (x86)
dd 0x0500001B
dd 0x0500001B    Entry point for our code (section below)
dd 4

Broken Header / Our code

Op Code        Explanation                  Further details
mov dl, 20     Address of Sections header   Take 20 characters
mov ecx, msg                                From the string at this address
int 0x80       Elf Flag table               Print them

Remaining header (has to be correct)

Op Code    Explanation
db 0x25    Size of the Elf Header
dw 0x20    Size of the Program Header
dw 0x01    Entries in the Program Header

Remaining Code (now beyond the header)

Op Code    Explanation
inc eax    Set Exit function
int 0x80   Call it.

String section

msg db 'Hello Docker world!', 10

It's also worth pointing out that this code won't be fully "compiled"; what is written above is essentially binary format already, and therefore nasm will take the text and write out the binary code directly as written above.

Build and run the executable with:

$ nasm -f bin ./tiny_hello_docker.asm -o hello_docker_world
$ chmod +x ./hello_docker_world
$ ./hello_docker_world
Hello Docker world!

Further Reading

This wikipedia article covers all of the ELF standard in the most readable way I’ve come across:

https://en.wikipedia.org/wiki/Executable_and_Linkable_Format

A much more in-depth overview of hiding things in the ELF headers is available here:

http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html

Deconstructing an existing controller

In order to get a deeper understanding of the design and logic of a Kubernetes controller, we will start by deconstructing the example that comes with the Kubernetes source code. We will examine the structure of the source code, the code generation that automates a lot of the boilerplate, and then the actual controller (controller.go) code that manages the logic.

Use-case of sample-controller

A Kubernetes controller is designed to take the functionality that the kubernetes scheduler has and apply its mechanisms to additional use-cases. The “original” controller was an etcd controller which allowed kubernetes to become aware of etcd and its mechanisms so that kubernetes could suddenly create/scale/upgrade and destroy etcd clusters correctly.

The sample-controller is a simplistic controller that creates a resource called Foo that defines a number of additional resources and constraints.

Core components of a controller

A controller can leverage a number of features of the client-go library in order to efficiently interact with the Kubernetes API; this section will cover a few of the concepts exposed from the client library that are used by the sample-controller.

Informer

Without the concept of the informer, a controller would have to repeatedly "poll" the Kubernetes API for state information; this is an anti-pattern that can cause additional load on the Kubernetes API and create controller overhead as it processes the additional reply data from each poll. In order to provide an efficient method of determining the state of a resource in Kubernetes, the client-go library has the concept of an informer. The informer allows a client to specify a particular resource and the operations to be "informed" about.

Create an informer

This will set up the informer and then start it, with a channel that can signify when it needs to shut down.

kubeInformerFactory := kubeinformers.NewSharedInformerFactory(kubeClient, time.Second*30)
kubeInformerFactory.Start(stopCh)

Add handlers for various events

This will add three event handlers for add/update/delete events on the deployments resource within kubernetes:

kubeInformerFactory.Apps().V1().Deployments().Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
    AddFunc:    addDeploymentFunction(),
    UpdateFunc: updateDeploymentFunction(),
    DeleteFunc: deleteDeploymentFunction(),
})

Lister

A lister provides the capability to retrieve various objects (types) from the Kubernetes API, and is part of the informer that we created previously; however, instead of creating various callback functions, the client invokes a "list" as part of a query. The Lister query will typically be invoked with two parameters: the namespace that the resource resides in and the name of the resource of a particular resource type.

deployment, err := c.deploymentsLister.Deployments(foo.Namespace).Get(deploymentName)
// If the resource doesn't exist, we'll create it
if errors.IsNotFound(err) {
    deployment, err = c.kubeclientset.AppsV1().Deployments(foo.Namespace).Create(newDeployment(foo))
}

Getting the source code

There are two components that will be needed in order to build our Kubernetes controllers, one being the code generation tooling and the second being the actual code that makes up the sample controller.

Kubernetes Code Generation tools

go get -u k8s.io/code-generator/...

Note Why the ...

From the command go help packages:

An import path is a pattern if it includes one or more "..." wildcards, each of which can match any string, including the empty string and strings containing slashes. Such a pattern expands to all package directories found in the GOPATH trees with names matching the patterns. As a special case, x/... matches x as well as x's subdirectories. For example, net/... expands to net and packages in its subdirectories.

Kubernetes Sample Controller

go get k8s.io/sample-controller

How the source code is organised

There are a few areas that are key to how the structure of this all makes sense; in order to understand this we will walk the directory structure and identify some of the key files. To begin we should be in the directory where the sample-controller source code is located (should be $GOPATH/src/k8s.io/sample-controller).

main.go
controller.go
|-pkg/
| |-apis/
| | |-samplecontroller/
| |-generated/

main.go

This file has the main() function that starts the controller logic and does the following:

  • Set up the signal handlers (So a ctrl+c or SIG{X} will be handled by the controller)
  • Parse the command line flags
  • Build up two client sets, and start two informers that watch for the signal handler
  • Build the controller configuration
  • Start the newly configured controller, and again pass the signal handler

controller.go

This source code file contains all of the logic and functions that provide all of the kubernetes controller functionality.

Function flow

c := Controller

c.Run() -> c.runWorker() -> c.processNextWorkItem() -> c.syncHandler()

The two main functions that are invoked are syncHandler() and handleObject.

The syncHandler will be invoked with a namespace and object name that is taken from the work queue; it will then attempt to find the object in the Kubernetes API and process it.

  1. Split the key into a namespace/object name
  2. Use a Lister to find the object in the API server
  3. Parse the object and look for its deployment
  4. Use a Lister to find the deployment object referred to in the Foo resource
  5. If the deployment doesn’t exist create it
  6. If the Foo resource has changed update the deployment

The handleObject is invoked through the informer and will add things to the work queue that will be processed eventually by the syncHandler function.
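Condensed down, the worker/queue relationship looks roughly like the sketch below (a simplified illustration of the pattern rather than the exact sample-controller source):

func (c *Controller) runWorker() {
    // Keep processing items until the work queue shuts down
    for c.processNextWorkItem() {
    }
}

func (c *Controller) processNextWorkItem() bool {
    key, shutdown := c.workqueue.Get()
    if shutdown {
        return false
    }
    defer c.workqueue.Done(key)

    if err := c.syncHandler(key.(string)); err != nil {
        // Something went wrong, re-queue the key so it is retried later
        c.workqueue.AddRateLimited(key)
        return true
    }

    // Processed successfully, clear any rate-limiting history for this key
    c.workqueue.Forget(key)
    return true
}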

pkg/apis/samplecontroller

In this folder are a number of files that identify the controller name, the controller version and then all of the information that the controller needs in order to define itself or interact with other systems. All of these files are then processed by the code-generator tools in order to build out the correct source to work as expected with Kubernetes.

Test the code generation

The sample-controller repository comes with pre-generated code already in the pkg folder. We can test the code generation tools by moving the generated code to a different location and re-generating our code.

The following will move the generated folder to a backup folder:

mv pkg/generated pkg/generated-backup

The update-codegen.sh script is a hardcoded script that will regenerate all the code for the sample controller.

./hack/update-codegen.sh

Test that our code compiles correctly from our generated code.

go build

Custom Resource Definition (CRD)

This will update the Kubernetes API to inform it that a new resource of type Foo exists and is controlled by the controller samplecontroller.k8s.io version v1alpha1.

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: foos.samplecontroller.k8s.io
spec:
  group: samplecontroller.k8s.io
  version: v1alpha1
  names:
    kind: Foo
    plural: foos
  scope: Namespaced

Deployment

Creating this Foo resource will update the expected state in the Kubernetes API, which will then be processed by the controller.

apiVersion: samplecontroller.k8s.io/v1alpha1
kind: Foo
metadata:
  name: example-foo
spec:
  deploymentName: example-foo
  replicas: 1

Additional Resources

Details around the sample controller repository

Other Kubernetes controller resources

I've been running an iPad Pro as a travel device for about six months now and I thought I would make a quick post that details some of the ways I've added it into my workflow and a few of the apps that I use on a daily basis.

Hardware

I’ve had a few iPads over the years ranging from the first iPad in 2010 and an iPad mini 2 that has served as my travel entertainment device for a long time. After toying with the idea for some time I decided to opt for the 11 inch iPad Pro (https://www.apple.com/uk/ipad-pro/) as a replacement for my old and tired iPad mini, but also perhaps something that could serve as an in-between when I was in confined spaces (planes/trains) or didn’t have my laptop.

The iPad as a piece of hardware is pretty incredible and, with the removal of the home button, has oodles of screen real-estate. The Face ID functionality has been flawless for unlocking and payments, however the camera is in a bit of an annoying place when using the iPad in landscape mode. I can't count the number of times I've held the iPad by the side and accidentally obscured the camera.

The battery life has been pretty stable, and with moderate usage and 4G the iPad will quite happily last the entire day with plenty of battery to spare. Additionally, with the inclusion of USB-C I can charge the iPad with the same charger as my MacBook Pro in a very short period of time. One additional feature that has been very useful is the ability to use a USB-C to Lightning cable and have the iPad charge my iPhone whilst on the move.

Finally, the Smart Keyboard Folio works as a fantastic keyboard for a good few hours of typing and makes using the iPad a lot more like using a laptop (along with expected keyboard shortcuts e.g. command-Tab). The one complaint with the Smart Keyboard is that it only protects the front and the back; three sides of the iPad have no protection, so it's quite easy to acquire scratches and marks.

The small form factor means that the iPad can be used quite comfortably even on the most budget of "budget flights"; the image above is from a Ryanair flight where I could happily edit various files.

Productivity

My day to day technology stack is pretty average for most enterprise companies and luckily there exists pretty good replica iOS applications for the iPad:

  • Office 365 apps (Outlook)
  • Slack
  • Google drive app / Google docs
  • Trello

All of these are “good enough” for medium-to-light usage when a laptop and the “full fat” application isn’t present.

Remote Access

My typical day-to-day tasks usually require the management and usage of a variety of remote systems, and there are a few native utilities on the iPad that really make this easy. Two of the best applications for this are by the same developer (https://panic.com/), and it really shows that they know what they're doing when it comes to different types of remote workload.

Prompt

For handling SSH into various remote servers, I've found that Prompt is pretty much unbeatable at maintaining stable connections to various servers. It also provides a fantastic UI for managing connection information, making it easy to re-connect to saved sessions upon restarting Prompt.

Coda

Coda could perhaps replace Prompt as it does support SSHing into remote hosts; however, its main focus is being able to access a remote host over SSH/SFTP to provide a great way of manipulating files on the remote filesystem through the Coda UI. I typically use it for editing both markdown files and various source code files; Coda provides source code highlighting based upon the file extension and, more importantly (for markdown), a preview function that will render certain files. Once a file has been opened the first time within Coda it can be edited and saved locally (on the iPad and within the Coda app) and can then be saved back to the remote file system once connectivity is restored.

Visual Studio Code

I would love this to eventually become a native application on the iPad, however until then the only option is to access Visual Studio Code through a web browser using (https://github.com/cdr/code-server). Once code-server is up and running the iPad can access it through the safari web browser, however there are a few things that can make it a little easier to use:

Full screen

To access Visual Studio Code in full screen on the iPad it will need adding to the home screen. Open code-server in Safari and press the button next to the address bar, then navigate to the "Add to Home Screen" button. This will now create a button on the iPad Home screen that will open code-server directly with things like the address bar hidden, providing Visual Studio Code in full screen.

Disable Shortcut bar

When typing in Visual Studio Code iOS will keep popping up a bar with commonly used words and shortcuts that can limit the screen real-estate. To disable this open Settings -> General -> Keyboard -> Shortcuts (disable).

VNC Viewer

For accessing either my MacBook or hackintosh, VNC Viewer with its "Bonjour" capability makes it fantastically easy to find macOS desktops that are being shared through screen sharing. The VNC Viewer tool is perfectly designed for use with the touch screen, with "pinch" and scroll, and makes using a full desktop pretty straightforward.

Additional Usage

I presume that “Sidecar” will remove the need for an additional app (https://www.macrumors.com/2019/06/06/macos-catalina-sidecar-limited-to-newer-macs/) but currently the Duet application has been pretty great for turning the iPad into an external display for my Macbook. The performance is fantastic for the software having to handle all of the external monitor functionality, although it sometimes will take a few app restarts for the iPad to appear as a second display.

In this second post we will cover the use of nftables in the linux kernel to provide very simple and low-level load balancing for the Kubernetes API server.

Nftables

Nftables was added to the Linux kernel in the 3.x release and was presumed to be the de-facto replacement of the often complained about iptables. However at the moment with Linux Kernels in the 5.x release it still looks like iptables has kept its dominance.

Even in some of the larger open source projects, the move towards the more modern nftables appears to stall quite easily and become a stuck issue.

Nftables is driven either through the nft cli tooling, or rules can be created through the netlink interface to provide a programmatic approach. There are a few libraries, in various levels of development, that are being created to help address a programmable approach (see below in the additional resources).

There are a few distributions that have started the migration to nftables; however, to maintain backwards compatibility, a number of nftables wrappers have been created that allow the distro or an end user to issue iptables commands and have them translated into nftables rules.

Kubernetes API server load balancing with nftables

This is where the architecture gets a little bit different to the load balancers that I covered in my previous post. With the more common load balancers, an administrator will create a load balancer endpoint (typically an address and port) for a user to connect to; however, with nftables we will be routing through it like a gateway or firewall.

Routing (simplistic overview)

In networking, routing is the capability of having networking traffic “routed” to different logical networks that aren’t directly addressable from the source host to the destination host.

Routing Example

  • Source Host is given the address 192.168.0.2/24
  • Destination Host is given the address 192.168.1.2/24

If we decompose the source address we can see that it's 192.168.0.2 on the subnet /24 (typically shown through a netmask of 255.255.255.0). The subnet is used to determine how large the addressable network is; with the current subnet the source host can access anything that is 192.168.0.{x} (1 - 254).
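A quick way to convince yourself of this is the small standalone GO sketch below, using only the standard library:

package main

import (
    "fmt"
    "net"
)

func main() {
    // The source host's network: 192.168.0.2/24
    _, srcNet, _ := net.ParseCIDR("192.168.0.2/24")

    // Same subnet, directly addressable
    fmt.Println(srcNet.Contains(net.ParseIP("192.168.0.200"))) // true

    // Different subnet, needs a route
    fmt.Println(srcNet.Contains(net.ParseIP("192.168.1.2"))) // false
}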

Based upon the designated subnets neither network will be able to directly access one another, so we will need to create a router and then update the hosts so that they have routing table entries that specify how to access networks.

A router will need to have an address on each network in order for the hosts on that network to route their packets through, so in this example our router will look like the following:

 ____________________________________________
|                 My Router                  |
|                                            |
|      eth0                      eth1        |
|  192.168.0.100 <---------> 192.168.1.100   |
|____________________________________________|

The router could be a physical routing device or alternatively could just be a simple server or VM; in this example we'll presume a simple Linux VM (as we're focussing on nftables). In order for our Linux VM to forward packets to other networks we need to ensure that ipv4 forwarding is enabled (typically through the sysctl setting net.ipv4.ip_forward=1).

The final step is to ensure that the source host has its routing table updated so that it knows where packets need to go when they are destined for the 192.168.1.0/24 network. Typically it will look like the following pseudo code (each OS has a slightly different way of adding routes):

route add 192.168.1.0/24 via 192.168.0.100

This route effectively tells the kernel/networking stack that any packets addressed to the network 192.168.1.0/24 should be forwarded to the address 192.168.0.100, which is our router with an interface in that network range. This simple additional route means that our source host can now access the destination address by routing packets via our routing VM.

I’ll leave the reader to determine a route for destination hosts wanting to connect to hosts in the source network :)

Nftables architectural overview

This section details the networks and addresses I’ll be using, along with a crude diagram showing the layout.

Networks

  • 192.168.0.0/24 Home network
  • 192.168.1.0/24 VIP (Virtual IP) network
  • 192.168.2.0/24 Kubernetes node network

Important Addresses / Hosts

  • 192.168.0.1 (Gateway / NAT to internet)

  • 192.168.0.11 (Router address in Home network)

  • 192.168.2.1 (Router address in Kubernetes network)

  • 192.168.1.1 (Virtual IP of the Kubernetes API load balancer)

Kubernetes Nodes

  • 192.168.2.110 Master01
  • 192.168.2.111 Master02
  • 192.168.2.120 Worker01

(worker nodes will follow sequential addressing from Worker01)

Network diagram (including expected routing entries)

 _________________
| 192.168.0.0/24 |
 -----------------
        |
        |  192.168.1.0/24 routes via 192.168.0.11
        |
 _________________  --> 192.168.0.11
| 192.168.1.0/24 |
 _________________  --> 192.168.2.1
        |
        |  IPv4 forwarding, and NAT Masquerading
        |  192.168.1.0/24 routes via 192.168.2.1
        |
 _________________
| 192.168.2.0/24 |
 -----------------

Using nft to create the load balancer

The load balancing VM will be an Ubuntu 18.04 VM with the nft package and all of its dependencies installed; we will also ensure that ipv4_forwarding is enabled and persists through reboots.
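On Ubuntu 18.04 the nft tooling lives in the nftables package, so installation should be as simple as:

sudo apt-get update
sudo apt-get install -y nftables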

Step one: Creation of the nat table

nft create table nat

Step two: Creation of the postrouting and prerouting chains

nft add chain nat postrouting { type nat hook postrouting priority 0 \; }

nft add chain nat prerouting { type nat hook prerouting priority 0 \; }

Note that we will need to add masquerading for the post-routing of traffic (so that return traffic gets back to the source):

nft add rule nat postrouting masquerade

Step three: Creation of the load balancer

nft add rule nat prerouting ip daddr 192.168.1.1 tcp dport 6443 dnat to numgen inc mod 2 map { 0 : 192.168.2.110, 1 : 192.168.2.111 }

This is quite a long command, so to understand it we will deconstruct some key parts:

Listening address

  • ip daddr 192.168.1.1 Layer 3 (IP) destination address 192.168.1.1
  • tcp dport 6443 Layer 4 (TCP) destination port 6443

Note
The load balancer listening VIP that we are creating isn’t a “tangible” address that would typically exist with a standard load balancer; there will be no TCP stack (and associated metrics). The address 192.168.1.1 exists only inside the nftables ruleset, and only when traffic is routed through the nftables VM will it be load balanced.

This is important to be aware of: if we placed a host on the same 192.168.1.0/24 network and tried to access the load balancer address 192.168.1.1, we wouldn’t be able to reach it, as traffic within the same subnet is never passed through a router or gateway. So the only way traffic can hit an nftables load balancer address (VIP) is if that traffic is routed through the host and examined by our ruleset. Not realising this can lead to a lot of head scratching whilst contemplating what appears to be simple network configuration.

Destination of traffic

  • to numgen inc mod 2 Use an incremental number generator modulo 2, which yields 0, 1, 0, 1 … (effectively round-robin; numgen random would select a backend at random instead)
  • map { 0 : 192.168.2.110, 1 : 192.168.2.111 } Our pool of backend servers that the generated number selects from (growing the pool is just a case of extending the map, as sketched below)
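For example, re-creating the rule with a hypothetical third control plane node (192.168.2.112 is an assumption here, following the addressing scheme above) would look something like:

nft add rule nat prerouting ip daddr 192.168.1.1 tcp dport 6443 \
    dnat to numgen inc mod 3 \
    map { 0 : 192.168.2.110, 1 : 192.168.2.111, 2 : 192.168.2.112 }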

Step four: Validate nftables rules

nft list tables

This command will list all of the nftables tables that have been created; at this point we should see our nat table.

nft list table ip nat

This command will list all of the rules/chains that have been created as part of the nat table; at this point we should see our routing chains and our rule watching for incoming traffic on our VIP, along with the destination hosts.
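On my VM the ruleset at this point looks roughly like the following (the exact output format can vary slightly between nftables versions):

table ip nat {
        chain postrouting {
                type nat hook postrouting priority 0; policy accept;
                masquerade
        }

        chain prerouting {
                type nat hook prerouting priority 0; policy accept;
                ip daddr 192.168.1.1 tcp dport 6443 dnat to numgen inc mod 2 map { 0 : 192.168.2.110, 1 : 192.168.2.111 }
        }
}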

Step five: Add our routes

Client route

The client here will be my home workstation, which will need a route added so that it can access the load balancer VIP 192.168.1.1 on the 192.168.1.0/24 network:

Pseudo code

route add 192.168.1.0/24 via 192.168.0.11

Kubernetes API server routes

These are needed for a number of reasons, but the main one is that kubeadm will need to access the load balancer VIP 192.168.1.1 to ensure that the control plane is accessible.

Pseudo code

route add 192.168.1.0/24 via 192.168.2.1

Using Kubeadm to create our HA (load balanced) Kubernetes control plane

At this point we should have the following:

  • Load balancer VIP/port created in nftables
  • Pool of servers defined under this VIP
  • Client route set through the nftables VM
  • Kubernetes API server set to route to the VIP network through the nftables VM

In order to create a HA control plane with kubeadm we will need to create a small yaml file with the load balancer configuration detailed. Below is an example configuration for Kubernetes 1.15 with our above load balancer configuration:

apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: 1.15.0
controlPlaneEndpoint: "192.168.1.1:6443"

Once this yaml file has been created on the file system of the first control plane node we can bootstrap the Kubernetes cluster:

kubeadm init --config=/path/to/config.yaml --experimental-upload-certs

The kubeadm utility will run through a large battery of pre-bootstrap tests and checks and will start to deploy the Kubernetes control plane components. Once they’re deployed and starting up, kubeadm will attempt to verify the health of the control plane through the load balancer address. If everything has been deployed correctly then, once the control plane components are healthy, traffic should flow through the load balancer to the bootstrapping node.
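Once the first node is up, kubeadm prints a join command for the remaining control plane nodes pointed at the VIP; it will look something like the following (the token, hash and certificate key are placeholders that kubeadm generates at bootstrap time):

kubeadm join 192.168.1.1:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> \
    --control-plane --certificate-key <key>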

Load Balancing applications

All of the above notes cover using nftables to load balance the Kubernetes API server; however, the same approach can easily be used to load balance an application that is running within the Kubernetes cluster. This quick example will use the Kubernetes “Hello World” example (https://kubernetes.io/docs/tasks/access-application-cluster/service-access-application-cluster/) (which I’ve just discovered is 644MB ¯\_(ツ)_/¯) and will be load balanced in a subnet that I’ll define as my application network (192.168.4.0/24).

Step one: Create “Hello World” deployment

kubectl run hello-world --replicas=2 --labels="run=load-balancer-example" --image=gcr.io/google-samples/node-hello:1.0 --port=8080

This will pull the needed images and start two replicas on the cluster.

Step two: Expose the application through a service and NodePort

kubectl expose deployment hello-world --type=NodePort --name=example-service

Step three: Find the exposed NodePort

kubectl describe services example-service | grep NodePort

This command will show the port that Kubernetes has selected to expose our service on. We can now check that the application is working by examining the deployment to make sure the Pods are running (kubectl get deployments hello-world) and by testing connectivity with curl http://node-IP:NodePort.

Step four: Load Balance the service

Our two workers (192.168.2.120/121) are exposing the service through their NodePort 31243; we will use our load balancer to create a VIP of 192.168.4.1 exposing the same port 31243, which will load balance over both workers.

nft add rule nat prerouting ip daddr 192.168.4.1 \
    tcp dport 31243 dnat to numgen inc mod 2 \
    map { 0 : 192.168.2.120, 1 : 192.168.2.121 }

Note: Connection state tracking is managed by conntrack and is applied on the first packet in a flow (thanks @naadirjeewa).
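If the conntrack-tools package is installed (an assumption, as it isn’t always present by default), the tracked flows can be listed to confirm that the DNAT is taking place:

conntrack -L | grep 31243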

Step five: Add route on client machine to access the load balanced service

My workstation is on the Home network as defined above and will need a route added so that traffic will go through the load balancer to reach the VIP. On my Mac the command is:

sudo route -n add 192.168.4.0/24 192.168.0.11

Step six: Confirm that the service is now available

From the client machine we can now validate that our new load balancer address is both accessible and is load balancing over our application on the Kubernetes cluster.

$ curl http://192.168.4.1:31243/
Hello Kubernetes!

Monitoring and Metrics with nftables

The nft CLI tool provides capability for debugging and monitoring the various rules that have been created within nftables. However, in order to produce any level of metrics, counters (https://wiki.nftables.org/wiki-nftables/index.php/Counters) will need defining on a particular rule. We will re-create our Kubernetes API server load balancer with a counter enabled:

nft add rule nat prerouting ip daddr 192.168.1.1 tcp dport 6443 \
counter dnat to \
numgen inc mod 2 \
map { 0 : 192.168.2.110, 1 : 192.168.2.111 }

If we look at the above rule we can see that on the second line we’ve added a counter statement that is placed after the VIP definition and before the destination dnat to the endpoints.

If we now examine the nat ruleset we can see that counter values are being incremented as the load balancer VIP is being accessed.

nft list table ip nat | grep counter
ip daddr 192.168.1.1 tcp dport 6443 counter packets 54 bytes 3240 dnat to numgen inc mod 2 map { 0 : 192.168.2.110, 1 : 192.168.2.111 }

This output is parsable although not all that useful. The nftables documentation mentions SNMP, but I’ve yet to find any real concrete documentation.

Additional resources

This is a lazy introduction to options that can be employed to load balance the Kubernetes API server; in this first post we will be focusing on the Kubernetes API server and load balancers in a general sense, just to get a slightly deeper understanding of what is happening.

Kubernetes API server

As its name suggests, the Kubernetes API server is the “entry-point” into a Kubernetes cluster and allows for all CRUD (Create/Read/Update/Delete) operations. Interaction with the Kubernetes API server is typically through REST with JSON/YAML payloads that define objects within the Kubernetes cluster. A Kubernetes API server will happily run as a singular instance; however, this (by its very design) is a single point of failure and offers no high availability. In order to provide a resilient and highly available Kubernetes API server, a load balancer should be placed above the Kubernetes API server replicas, with the load balancer ensuring that a client is always passed to a healthy instance.
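As a concrete example (assuming the VIP used elsewhere in this post, and a cluster that permits anonymous health checks, which is the default; -k skips certificate verification), the API server’s health endpoint can be probed straight through the load balancer:

curl -k https://192.168.1.1:6443/healthz
ok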

Example Kubernetes API architecture

Here a load balancer 192.168.1.1 will select one of the two instances beneath it and present it to the client:

              ------------------
             | 192.168.1.1:6443 |
              ------------------
               |              |
 --------------------    --------------------
| 192.168.1.110:6443 |  | 192.168.1.111:6443 |
 --------------------    --------------------

Load balancers

Load balancers have been a heavily deployed technology for a number of decades, their use-case has typically been to provide a solution to two technical challenges:

  • Scaling workloads over application replicas
  • High Availability over “working” application replicas

With scaling workloads, a load balancer will take requests and, dependent on the distribution algorithm, send each request to one of the pre-configured backends.

For high availability, the load balancer will typically employ a health check that determines whether an endpoint is working correctly, and use that to decide if traffic can be sent to that endpoint. As far as the end user is concerned, traffic will always hit a working endpoint as long as one is available.

Load balancer architecture

The majority of load balancers operate using a few common structures:

Front end

The front end is the client-side part of the load balancer, and it is this front end that is exposed to the outside world. Either an end user or an application will connect to the front end in an identical manner to how they would connect to the application directly. The load balancing should always be transparent to the end user/application that is connecting to the load balancer.

Back end(s)

When the front end of a load balancer is accessed by a client, it is the load balancer’s main purpose to then redirect that traffic transparently to a back end server. A back end server is where the application should actually be running, and the load balancer will typically perform a check to ensure that the application is available before sending any traffic to it. As the main point of a load balancer is to provide both high availability and scaling, multiple back ends are usually configured under a single front end; these are typically called a pool, from which the load balancer will select one of the many healthy back ends.

Selection Algorithm

Once a front end is defined and a pool of backends has been added, there will typically be a predetermined selection algorithm that is used to decide which backend should be chosen from the pool. As this is meant to be a simple overview, we will only cover the two most commonly used algorithms:

Round-Robin

This algorithm is arguably the simplest: it will simply loop through the pool of back end servers until the pool is exhausted, and then start again.

Weighted

This method provides the capability of having traffic pre-determined to go more heavily to some backends in the pool than others. Weighted algorithms differ between load balancers, but if we pretend that ours is based upon a simple percentage we can imagine the following two examples:

Equal weighting

backend1.com 50%
backend2.com 50%

This would be almost the same as round-robin load balancing, as requests would be passed equally between the two backends in the pool.

Weighted load balancing

backend1.com 10%
backend2.com 90%

This would mean that 1 out of every 10 requests would go to backend1.com, and the remaining 9 would go to backend2.com. The main use-case for this is typically something like a phased rollout of a new release of an application (canary deployment). A sketch of this selection logic follows below.
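To make the selection logic concrete, here is a minimal Go sketch of weighted backend selection; the backend names and weights are just the examples above, and this is illustrative rather than any particular load balancer’s implementation:

package main

import (
	"fmt"
	"math/rand"
)

type backend struct {
	url    string
	weight int // share of traffic, out of the total of all weights
}

var pool = []backend{
	{"backend1.com", 10},
	{"backend2.com", 90},
}

// pick returns a backend URL with probability proportional to its weight.
func pick() string {
	total := 0
	for _, b := range pool {
		total += b.weight
	}
	n := rand.Intn(total)
	for _, b := range pool {
		if n < b.weight {
			return b.url
		}
		n -= b.weight
	}
	return pool[len(pool)-1].url
}

func main() {
	counts := map[string]int{}
	for i := 0; i < 1000; i++ {
		counts[pick()]++
	}
	fmt.Println(counts) // roughly a 10%/90% split
}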

Types of Load balancer

Load balancers provide a number of different mechanisms and ways of exposing their services to be consumed, and each of these mechanisms has various advantages and gotchas to be aware of. The majority of load balancers will provide their service at a particular layer in the OSI stack (https://en.wikipedia.org/wiki/OSI_model).

Type 3 Load balancer

Type 3 can typically be thought of as an “endpoint” load balancer. This load balancer will expose itself using an IP address, sometimes referred to as a VIP (Virtual IP address), and will then load balance incoming connections over one or more pre-defined endpoints. Health checks to these internal endpoints are typically performed by attempting either to create a connection to the endpoint or a simple “ping” check.

Type 4 Load balancer

A Type 4 load balancer will usually provide its functionality over a service that is exposed as a port on an IP address. The most common load balancer of this type is a web traffic load balancer that will expose traffic on the IP address of the load balancer through TCP ports 80 (unencrypted) and 443 (SSL-encrypted web traffic). However, this type of load balancer isn’t restricted to web-based traffic and can load balance connections to other services that listen on other TCP ports.

To ensure that the pool of backends is “healthy”, a Type 4 load balancer has the capability to verify not only that a backend accepts connections, but also that the application is behaving as expected.

Example

Un-Healthy Application

  1. Ping endpoint (successful)
  2. Attempt to read data (a lot of web applications will expose a /health URL that can determine if the application has completed its initialisation); if the application isn’t ready then it will return an HTTP error code such as 500 (anything >300 is typically not healthy)

Type 7 Load balancer

A Type 7 load balancer has to have knowledge of a particular protocol (or application knowledge) in order for it to provide load balancing to a particular pool of endpoints. This particular load balancer is often found in front of large websites that host various services under the same domain name.

Example

Consider these two URLs: https://example.com and https://example.com/finance

Under the previous two types of load balancer, all traffic will go to the same pool of servers regardless of the actual URL requested by the end user.

In a Type 3 load balancer:

example.com –> resolves to –> load balancer IP –> which then selects a server from –> endpoint pool

In a Type 4 load balancer:

https://example.com –> resolves to –> load balancer IP –> and the https:// resolves to port 443 –> which then selects a server from –> endpoint pool

A Type 7 load balancer, however, employs knowledge of the protocol/application and can perform the load balancing based upon application behaviour. In this example the load balancer can still make all of the networking decisions to direct traffic to the correct pool of servers, but it can also make additional decisions based upon things like application behaviour or the client request.

In a Type 7 load balancer:

https://example.com/finance –> resolves to –> load balancer IP –> and the https:// resolves to port 443 –> load balancer identifies traffic as http/s and can parse behaviour –> /finance read as the URI –> Selects from the “finance” endpoint pool.

Existing load balancers and examples

There has been a large market for load balancers for a number of decades now, as the explosion of web and other highly available, scalable applications has driven demand. Originally the lion’s share of load balancers were hardware devices that would sit with your switching, routing, firewall and other network appliances. However, as the requirement for quick and smaller load balancers has taken hold, the software-based load balancer has become prevalent. Originally the software load balancer was limited by consumer hardware limitations and missing functionality; with hardware offloads on NICs and built-in crypto in CPUs, a lot of these issues have been removed. Finally, as cloud services have exploded, so has LBaaS (load balancer as a service), providing an externally addressable IP that can easily be pointed at a number of internal instances through wizards/APIs or the click of a few buttons.

A Quick Go example of a Load Balancer

Below is a quick example of writing your own HTTP/S load balancer: it creates a handler function that passes traffic to one of the backend servers and returns the results back to the client of the load balancer. The two things to consider here are the backendURL() function and the req (HTTP request) modifications. (The backend pool, addresses and listening port in the sketch below are assumptions added to make the example self-contained.)

  • backendURL() - This function will choose a server from the available pool and use it as the backend server
  • req.XYZ - Here we’re re-writing some of the HTTP request so that the backend returns the traffic back to the correct client.
package main

import (
	"net/http"
	"net/http/httputil"
	"net/url"
)

// Hypothetical backend pool; backendURL picks the next one round-robin.
var backends = []string{"http://192.168.2.120:8080", "http://192.168.2.121:8080"}
var next int

func backendURL() string {
	b := backends[next%len(backends)]
	next++
	return b
}

func main() {
	frontEnd := ":8080" // address/port the load balancer listens on
	handler := func(w http.ResponseWriter, req *http.Request) {
		// Choose a backend from the pool and proxy the request to it
		url, _ := url.Parse(backendURL())
		proxy := httputil.NewSingleHostReverseProxy(url)
		// Re-write the request so the backend replies to the correct client
		req.URL.Host = url.Host
		req.URL.Scheme = url.Scheme
		req.Header.Set("X-Forwarded-Host", req.Host)
		req.Host = url.Host
		proxy.ServeHTTP(w, req)
	}
	mux := http.NewServeMux()
	mux.HandleFunc("/", handler)
	http.ListenAndServe(frontEnd, mux)
}
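Assuming the hypothetical pool and listening port above, the sketch can be exercised with:

go run main.go &
curl http://localhost:8080/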

HAProxy load balancing example for Kubernetes API server

You will find below a simple example for haproxy (/etc/haproxy/haproxy.cfg) that will load balance from the haproxy VM/server to the two Kubernetes control plane nodes.

frontend kubernetes-api
    bind <haproxy address>:6443
    mode tcp
    option tcplog
    default_backend kubernetes-api

backend kubernetes-api
    mode tcp
    option tcplog
    option tcp-check
    balance roundrobin
    default-server inter 10s downinter 5s rise 2 fall 2 slowstart 60s maxconn 250 maxqueue 256 weight 100

    server k8s-master1 192.168.2.1:6443 check
    server k8s-master2 192.168.2.2:6443 check
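The configuration can be sanity-checked and then reloaded (assuming haproxy is running as a systemd service):

haproxy -c -f /etc/haproxy/haproxy.cfg
sudo systemctl restart haproxy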

Nginx load balancing example for Kubernetes API server

You will find below a simple example for nginx (/etc/nginx/nginx.conf) that will load balance from the nginx VM/server to the two Kubernetes control plane nodes.

stream {
    upstream kubernetes-api {
        server 192.168.2.1:6443 weight=5 max_fails=3 fail_timeout=30s;
        server 192.168.2.2:6443 weight=5 max_fails=3 fail_timeout=30s;
    }

    server {
        listen 6443;
        proxy_connect_timeout 1s;
        proxy_timeout 3s;
        proxy_pass kubernetes-api;
    }
}
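Note that the stream block requires nginx to be built with the stream module (most distribution packages include it); the configuration can be validated before reloading:

sudo nginx -t
sudo systemctl restart nginx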

Next steps

This entire post was actually leading up to load balancing the Kubernetes API server with nftables, so click here to read that next. :-)
