Creating a Kubernetes cloud provider doesn't require boiling the ocean

Whilst working on a number of Kubernetes control-plane deployments, and fighting with kubeadm and components like HAProxy, NGINX and keepalived, I decided to try and create my own load-balancer. In the end it proved not too complicated to replicate both a virtual IP and load-balancing over the backends (master nodes) with the standard Go packages. That project is now pretty stable and can easily be used to create an HA control-plane; all of it can be found at https://kube-vip.io. The next thing I wanted to try (and I'd been considering learning about this for a while) was creating a load-balancer "within" Kubernetes, as in providing the capability and functionality behind kubectl expose <...> --type=LoadBalancer. It turns out that in order to provide this functionality within Kubernetes you need to write a Cloud Provider that the Kubernetes Cloud Controller Manager can interface with.

This post will “chronicle” the process for doing that… :-D

Kubernetes Cloud Providers

We will start with the obvious question.

What are Cloud providers?

Out of the box, Kubernetes can't really do a great deal; it needs a number of components to sit on top of, or to interface with, in order to run workloads. For example, even a basic Kubernetes cluster requires a container runtime (CRI, Container Runtime Interface) to execute containers, and a networking plugin (CNI, Container Network Interface) to provide networking within the cluster.

On the flip side, a typical cloud company (AWS, GCloud, Azure etc…) offers a plethora of cool features and functionality that it would be awesome to consume through the Kubernetes cluster:

  • Load Balancers
  • Cloud instances (VMs, in some places bare-metal)
  • Areas/zones
  • Deep API integrations into the infrastructure

So how do we marry up these two platforms to share that functionality …

.. Kubernetes Cloud Providers ..

Using Cloud providers

In most circumstances you won't even know that you're using a cloud provider (which I suppose is kind of the point); only when you try to create an object that the cloud provider can create/manage/delete will it actually be invoked.

The most common use-case (and the one this post is focussing on) is the creation of a load balancer within Kubernetes, with its "counterpart" provided by the cloud vendor. In the case of Amazon Web Services (AWS), creating a service of type: LoadBalancer will create an Elastic Load Balancer (ELB) that will then load balance traffic over the selected pods. All of this functionality from the Cloud Provider Interface abstracts away the underlying technology, and regardless of where a cluster is running a LoadBalancer just becomes a LoadBalancer.

Creating a Cloud Provider!

So now onto the actual steps of creating your own cloud provider. This is all going to be written in Go, and I'll do my best to be as descriptive as possible.

Wait, what is the cloud-controller-manager?

In Kubernetes v1.6 the original design was that all of the cloud providers' vendor-specific code would live in the same place. This ultimately led to a point where every Kubernetes cluster shipped with a large cloud-controller-manager that at startup would be told which vendor code path to run down.

These were originally called In Tree cloud providers, and there has been a push over the last few years to move to Out of Tree providers. When deploying a Kubernetes cluster, the only change is that instead of starting the cloud-controller-manager with a specific vendor path (e.g. vsphere or aws), the operator deploys the vendor-specific cloud provider, such as cloud-provider-aws.

A Note about the “why” of In Tree / Out of Tree

There has been a shift towards stripping code and "vendor specific" functionality out of the main Kubernetes source repositories and into their own repositories. The main reasons for this:

  • Removes the tight coupling between external/vendor code and Kubernetes proper
  • Allows these projects to move at a different release cadence to the main project
  • Slims the Kubernetes code base and makes these components optional
  • Reduces the vulnerability footprint of vendor code within the Kubernetes project
  • The interfaces ensure ongoing compatibility for these Out of Tree projects

So to create their own cloud provider, someone will need to follow a standard set by the original cloud-controller-manager; this standard is exposed through method sets and interfaces, which you can read more about here.

tl;dr simply put, the cloud-controller-manager sets a standard that means if I want to expose a Load Balancer service, my provider needs to expose a number of methods (with matching signatures). We can further see in the LoadBalancer interface here all of the functions that my load balancer must expose in order to work.

The interface

The interface for a cloud-provider can be viewed here; we can see that it provides a number of functions, each of which returns the interface for a specific type of functionality.

The more common interfaces I’ve summarised below:

  • Instances controller - responsible for updating Kubernetes nodes using cloud APIs and deleting Kubernetes nodes that were deleted on your cloud.
  • LoadBalancers controller - responsible for load balancers on your cloud against services of type: LoadBalancer.
  • Routes controller - responsible for setting up network routes on your cloud.
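To illustrate the pattern these controllers follow, here is a trimmed-down, self-contained sketch. The real definitions live in k8s.io/cloud-provider; the toy types and names below are stand-ins purely for illustration. The key idea is that each getter on the provider returns the controller interface plus a bool indicating whether it is supported:

```go
package main

import "fmt"

// Stand-in controller interfaces (the real ones live in k8s.io/cloud-provider).
type LoadBalancer interface{ Name() string }
type Instances interface{}
type Routes interface{}

// Interface mirrors the cloud-provider pattern: each getter returns the
// controller plus a bool saying whether this provider supports it.
type Interface interface {
	LoadBalancer() (LoadBalancer, bool)
	Instances() (Instances, bool)
	Routes() (Routes, bool)
	ProviderName() string
}

// toyLB is a stand-in load-balancer implementation.
type toyLB struct{}

func (toyLB) Name() string { return "toy-lb" }

// toyProvider enables only the LoadBalancer controller, exactly the shape
// of the example provider built in this post.
type toyProvider struct{ lb LoadBalancer }

func (p *toyProvider) LoadBalancer() (LoadBalancer, bool) { return p.lb, true }
func (p *toyProvider) Instances() (Instances, bool)       { return nil, false }
func (p *toyProvider) Routes() (Routes, bool)             { return nil, false }
func (p *toyProvider) ProviderName() string               { return "toy" }

func main() {
	var c Interface = &toyProvider{lb: toyLB{}}
	if lb, ok := c.LoadBalancer(); ok {
		fmt.Println("load balancer enabled:", lb.Name())
	}
	if _, ok := c.Instances(); !ok {
		fmt.Println("instances disabled")
	}
}
```

The cloud-controller-manager only starts the control loops whose getters return true, which is why a minimal provider can get away with implementing a single controller.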

Example provider (code) cloud-provider-thebsdbox

This section will cover in Go code all of the basics for building a cloud-provider that will handle all of the services requests (that are type: LoadBalancer). When implementing your own, ensure you use correct paths and package names!

Our cloud-provider will use the following files:

pkg/thebsdbox/cloud.go
pkg/thebsdbox/loadbalancer.go
pkg/thebsdbox/disabled.go
main.go

The next sections will have the source code and then an overview at the end of each file.

cloud.go

package thebsdbox

import (
	"io"

	"k8s.io/client-go/informers"
	cloudprovider "k8s.io/cloud-provider"
)

const ProviderName = "thebsdbox"

// ThebsdboxCloudProvider - contains all of the interfaces for our cloud provider
type ThebsdboxCloudProvider struct {
	lb cloudprovider.LoadBalancer
}

var _ cloudprovider.Interface = &ThebsdboxCloudProvider{}

func init() {
	cloudprovider.RegisterCloudProvider(ProviderName, newThebsdboxCloudProvider)
}

func newThebsdboxCloudProvider(_ io.Reader) (cloudprovider.Interface, error) {
	return &ThebsdboxCloudProvider{
		lb: newLoadBalancer(),
	}, nil
}

// Initialize - starts the cloud-provider controller
func (t *ThebsdboxCloudProvider) Initialize(clientBuilder cloudprovider.ControllerClientBuilder, stop <-chan struct{}) {
	clientset := clientBuilder.ClientOrDie("do-shared-informers")
	sharedInformer := informers.NewSharedInformerFactory(clientset, 0)

	sharedInformer.Start(stop)
	sharedInformer.WaitForCacheSync(stop)
}

// ProviderName returns the cloud provider ID.
func (t *ThebsdboxCloudProvider) ProviderName() string {
	return ProviderName
}

// ENABLED Services

// LoadBalancer returns a loadbalancer interface. Also returns true if the interface is supported, false otherwise.
func (t *ThebsdboxCloudProvider) LoadBalancer() (cloudprovider.LoadBalancer, bool) {
	return t.lb, true
}

Our Cloud-Provider ThebsdboxCloudProvider struct{}

This struct{} contains our vendor specific implementations of functionality, such as load-balancers, instances etc..

type ThebsdboxCloudProvider struct {
	lb cloudprovider.LoadBalancer
}

We're only defining a load-balancer variable lb as part of our cloud-provider instance, as this is the only functionality our provider will expose.

init()

This function ensures that before our cloud-provider actually starts (before the main() function is called), our vendor-specific cloud-provider is registered. It also ensures that our newly registered cloud-provider will be instantiated with the newThebsdboxCloudProvider function.

Instantiating our cloud-provider newThebsdboxCloudProvider()

When our cloud-provider has actually started (the main() function has been called), the cloud-controller-manager will look at all registered providers and find the one we registered in the init() function. It will then call our instantiation function newThebsdboxCloudProvider(), which calls newLoadBalancer() to do any pre-tasks for setting up our load balancer and assigns the result to lb.

Enabling our Load-Balancer func (t *ThebsdboxCloudProvider) LoadBalancer() (cloudprovider.LoadBalancer, bool)

This function is pretty much the crux of enabling the load balancer functionality, and as part of the cloud-controller-manager spec it defines what functionality our cloud-provider will expose. These functions return two things:

  1. Our instantiated functionality (in this case our load-balancer object, returned as lb)
  2. Whether this functionality is enabled or not (true/false)

Everything that we're not exposing from our cloud-provider will return false, as can be seen in the disabled.go source.

disabled.go

All of these functions disable their corresponding bits of functionality within our cloud-provider.

package thebsdbox

import cloudprovider "k8s.io/cloud-provider"

// Instances returns an instances interface. Also returns true if the interface is supported, false otherwise.
func (t *ThebsdboxCloudProvider) Instances() (cloudprovider.Instances, bool) {
	return nil, false
}

// Zones returns a zones interface. Also returns true if the interface is supported, false otherwise.
func (t *ThebsdboxCloudProvider) Zones() (cloudprovider.Zones, bool) {
	return nil, false
}

// Clusters returns a clusters interface. Also returns true if the interface is supported, false otherwise.
func (t *ThebsdboxCloudProvider) Clusters() (cloudprovider.Clusters, bool) {
	return nil, false
}

// Routes returns a routes interface along with whether the interface is supported.
func (t *ThebsdboxCloudProvider) Routes() (cloudprovider.Routes, bool) {
	return nil, false
}

// HasClusterID returns true if a ClusterID is required and set.
func (t *ThebsdboxCloudProvider) HasClusterID() bool {
	return false
}

loadbalancer.go

Our LoadBalancer source code again has to match the interface as expressed here; we can see those functions defined below and exposed as methods on our thebsdboxLBManager struct.

package thebsdbox

import (
	"context"

	v1 "k8s.io/api/core/v1"
	"k8s.io/client-go/kubernetes"
	cloudprovider "k8s.io/cloud-provider"
	"k8s.io/klog"
)

// thebsdboxLBManager -
type thebsdboxLBManager struct {
	kubeClient *kubernetes.Clientset
	namespace  string
}

func newLoadBalancer() cloudprovider.LoadBalancer {
	var client *kubernetes.Clientset // Needs code to get a kubeclient
	var ns string                    // Needs code to get a namespace to operate in

	return &thebsdboxLBManager{
		kubeClient: client,
		namespace:  ns,
	}
}

func (tlb *thebsdboxLBManager) EnsureLoadBalancer(ctx context.Context, clusterName string, service *v1.Service, nodes []*v1.Node) (lbs *v1.LoadBalancerStatus, err error) {
	return tlb.syncLoadBalancer(service)
}

func (tlb *thebsdboxLBManager) UpdateLoadBalancer(ctx context.Context, clusterName string, service *v1.Service, nodes []*v1.Node) (err error) {
	_, err = tlb.syncLoadBalancer(service)
	return err
}

func (tlb *thebsdboxLBManager) EnsureLoadBalancerDeleted(ctx context.Context, clusterName string, service *v1.Service) error {
	return tlb.deleteLoadBalancer(service)
}

func (tlb *thebsdboxLBManager) GetLoadBalancer(ctx context.Context, clusterName string, service *v1.Service) (status *v1.LoadBalancerStatus, exists bool, err error) {
	var vip string

	// RETRIEVE EXISTING LOAD BALANCER STATUS (populate vip)

	return &v1.LoadBalancerStatus{
		Ingress: []v1.LoadBalancerIngress{
			{
				IP: vip,
			},
		},
	}, true, nil
}

// GetLoadBalancerName returns the name of the load balancer. Implementations must treat the
// *v1.Service parameter as read-only and not modify it.
func (tlb *thebsdboxLBManager) GetLoadBalancerName(_ context.Context, clusterName string, service *v1.Service) string {
	return getDefaultLoadBalancerName(service)
}

func getDefaultLoadBalancerName(service *v1.Service) string {
	return cloudprovider.DefaultLoadBalancerName(service)
}

func (tlb *thebsdboxLBManager) deleteLoadBalancer(service *v1.Service) error {
	klog.Infof("deleting service '%s' (%s)", service.Name, service.UID)

	// DELETE LOAD BALANCER LOGIC

	return nil
}

func (tlb *thebsdboxLBManager) syncLoadBalancer(service *v1.Service) (*v1.LoadBalancerStatus, error) {
	var vip string

	// CREATE / UPDATE LOAD BALANCER LOGIC (populate vip with the load balancer IP)

	return &v1.LoadBalancerStatus{
		Ingress: []v1.LoadBalancerIngress{
			{
				IP: vip,
			},
		},
	}, nil
}

Instantiating the LoadBalancer newLoadBalancer()

This function is called when the cloud-provider itself is initialised, as can be seen in cloud.go as part of the newThebsdboxCloudProvider() function. Once created, the new load-balancer object is added to the cloud-provider's main object for use when needed.

Interface methods

EnsureLoadBalancer

Creates a LoadBalancer if one didn't exist to begin with, and then returns its status (with the load balancer address).

UpdateLoadBalancer

Updates an existing LoadBalancer, or will create one if it didn't exist, and then returns its status (with the load balancer address).

EnsureLoadBalancerDeleted

Calls GetLoadBalancer first to ensure that the load balancer exists; if so, it deletes the vendor-specific load balancer. If this completes successfully, the service of type: LoadBalancer is removed as an object within Kubernetes.

GetLoadBalancer

This will speak natively to the vendor specific load balancer to make sure that it has been provisioned correctly.

GetLoadBalancerName

Returns the name of the load balancer instance.

main.go

This is the standard main.go as given by the actual cloud-controller-manager example. The one change is the addition of our // OUR CLOUD PROVIDER import, which adds all of our vendor-specific cloud-provider methods.

  1. init() in cloud.go is called, registering our cloud-provider and the callback to our newThebsdboxCloudProvider() function.
  2. The command.Execute() in main.go starts the cloud-controller-manager.
  3. The cloud-controller-manager looks at all of the registered cloud-providers and finds our registered provider.
  4. Our provider has its newThebsdboxCloudProvider() function called, which sets up everything that is needed for it to be able to complete its tasks.
  5. Our cloud provider is now running; when a user creates a resource that we've registered for (Load Balancers), our vendor code will be called to provide this functionality.
package main

import (
	"math/rand"
	"os"
	"time"

	"k8s.io/component-base/logs"
	"k8s.io/kubernetes/cmd/cloud-controller-manager/app"

	_ "k8s.io/component-base/metrics/prometheus/version" // for version metric registration
	// NOTE: Importing all in-tree cloud-providers is not required when
	// implementing an out-of-tree cloud-provider.
	_ "k8s.io/component-base/metrics/prometheus/clientgo" // load all the prometheus client-go plugins
	_ "k8s.io/kubernetes/pkg/cloudprovider/providers"

	// OUR CLOUD PROVIDER
	_ "github.com/user/pkg/thebsdbox"
)

func main() {
	rand.Seed(time.Now().UnixNano())
	command := app.NewCloudControllerManagerCommand()

	logs.InitLogs()
	defer logs.FlushLogs()

	if err := command.Execute(); err != nil {
		os.Exit(1)
	}
}
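The register-then-look-up flow in steps 1-4 can be sketched in a self-contained way. This is a toy mirror of what cloudprovider.RegisterCloudProvider and the cloud-controller-manager do between them; the Factory signature and function names below are simplified stand-ins, not the real k8s.io/cloud-provider API:

```go
package main

import "fmt"

// Factory is a simplified stand-in for the real cloud-provider factory
// signature (which takes an io.Reader and returns cloudprovider.Interface).
type Factory func() (string, error)

// providers is the registry the manager consults at startup.
var providers = map[string]Factory{}

// RegisterCloudProvider stores a factory under a provider name.
func RegisterCloudProvider(name string, f Factory) { providers[name] = f }

// InitCloudProvider looks up a registered factory and instantiates it,
// as the cloud-controller-manager does for the provider it was asked to run.
func InitCloudProvider(name string) (string, error) {
	f, ok := providers[name]
	if !ok {
		return "", fmt.Errorf("unknown cloud provider %q", name)
	}
	return f()
}

func init() {
	// Step 1: registration happens before main() runs.
	RegisterCloudProvider("thebsdbox", func() (string, error) {
		return "thebsdbox provider initialised", nil
	})
}

func main() {
	// Steps 2-4: the manager finds the registered factory and calls it.
	msg, err := InitCloudProvider("thebsdbox")
	if err != nil {
		panic(err)
	}
	fmt.Println(msg)
}
```

This is why the blank import of our provider package in main.go is enough: importing it runs its init(), and from then on the cloud-controller-manager can find the provider by name.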

Summary

Hopefully this has been of some use in understanding how a Kubernetes cloud-provider is architected. For a few more examples, I've included some other providers:

Good luck, and feel free to message me with any feedback.