OS automation with InfraKit on VMware

After developing the initial proof-of-concept for automating VMware VM instances with InfraKit, it became clear that a new mechanism would be needed to "initialise" the configuration of a brand-new instance.

But … Immutability?

On a personal level I have no doubt that immutability is the best practice for building modern systems and applications. Projects like the Docker Engine, which provides immutable images containing only application components, or LinuxKit, which is designed to build an effectively read-only, minimal Linux-based OS, demonstrate these benefits. These relatively new projects have allowed, and effectively pushed, people to embrace new concepts and new working methodologies in order to capture the benefits of immutability.

I want something in-between …

The majority of operating systems are designed to be flexible for a multitude of use-cases, which can make them pretty hard to use when trying to adhere to an immutable working pattern. This has led to a number of different methods for producing repeatable operating system builds, including the following:

  • Bootstrapping, through the use of tools like kickstart, in order to network boot an operating system and install the packages specified in the kickstart file
  • Infrastructure automation: these tools are typically used to automate the provisioning of “fleets” of infrastructure, such as bare-metal or virtual machines, from resources such as VM templates or machine images.
  • Configuration management: the goal of these tools is to perform configuration of an existing operating system. Typically SSH keys or agents installed within the operating system are needed to accomplish this task.

This blog post covers in more detail the difference between Infra automation and configuration management.

Another way?

Ideally I wanted to keep all of the provisioning “as native” as possible, removing dependencies on multiple APIs/CLIs or other interactions that can change or be deprecated (or even abandoned) in the future. I also didn’t want to develop a matching solution that relied on SSH (no point copying an already existing solution) or go down the rabbit-hole of developing an operating system agent and having to support such a thing (security nightmare). After looking at some of the code that already existed as part of the vSphere plugin for InfraKit, I hit upon an interesting experiment.

Enter [experimental] vmwscript

Not the greatest name I agree, but it is only experimental…


Ruling out SSH access to a VM and configuration management agents led to determining what would be required to get vmtoolsd to provide a reliable method for executing a process inside the operating system. Anyone who’s had a play with vmtoolsd for doing this may have hit some of the issues I experienced:

  • Odd TTY behaviour
  • Hanging processes
  • No STDOUT from the process
  • Processes not starting with any environment (no $PATH)
  • Others that I’ve since forgotten

After some Linux fiddling, some wrapping around execution calls, and some head-scratching, it appeared that scriptable automation through vmtoolsd would be feasible!  🙂

The positive of using the vCenter API and vmtoolsd is that all of the exec steps we want to run on the virtual machine can be multiplexed through vCenter. No direct access to the virtual machine is needed; in fact, the virtual machine doesn’t need to be on the same network or even have a network adapter in order for vmwscript to automate configuration of the operating system.

Example direct access automation
Standard Automation

Automation through VMware vCenter API

The vCenter API will provide access that traverses any network a VM may be hosted on.

Another positive is that vmtoolsd is supported on multiple platforms, giving vmwscript Windows support for free! Looking at the release notes, it may also cover FreeBSD and Solaris.

Basic usage

Set the environment by using the following file:

export INFRAKIT_VSPHERE_VCURL=https://user:pass@vcenter_url/sdk
export INFRAKIT_VSPHERE_VCHOST=esxi01.vsphere.host

Apply the configuration to the current environment with $ . <path to config file>:

$ . /home/dan/infrakit_environment

Once the environment variables have been configured InfraKit will have everything needed to take a VM template and start applying changes to it.

$ ./infrakit x vmwscript ./path/to/json.json


VMware templating is certainly not a new technology, but vmwscript was designed with a few concepts inherited/“stolen” from the Docker build workflow: essentially taking a base image, applying the relevant changes to it (naming etc.), and producing a new template, powered off and ready to consume. This is managed by setting outputType to Template; once all of the configuration steps have been applied, InfraKit will power down the virtual machine and convert it to a VMware template that can be consumed for future use.

{
    "label": "Updated template",
    "vmconfig": {
        "guestCredentials": {
            "guestUser": "root",
            "guestPass": "password"
        }
    },
    "deployment": [
        {
            "name": "Template to be updated",
            "note": "Build new template for CentOS",
            "task": {
                "inputTemplate": "Old-Centos7-Template",
                "outputName": "New-Centos7-Template",
                "outputType": "Template",
                "commands": [
                    {
                        "type": "execute",
                        "note": "Upgrade all packages (except VMware Tools)",
                        "cmd": "/bin/yum upgrade --exclude=open-vm-tools -y > /tmp/ce-yum-upgrade.log"
                    }
                ]
            }
        }
    ]
}

The above example will take the template Old-Centos7-Template, apply a set of commands to it, and output an updated template under the name New-Centos7-Template. As described above, the model follows the path laid out by the Dockerfile: we start with an image (template) and apply a set of changes that ultimately become the new image (template) we can use for all future deployments.

A much more complete example can be found in the InfraKit repository; this template deployment will take any CentOS 7.x “minimal” installation and update the template so that we have an image configured to deploy the Docker Engine (https://github.com/docker/infrakit/blob/master/pkg/x/vmwscript/examples/Docker-EE.json).

Deploying a Platform

Once we have built our updated image we can use it to deploy the multiple instances required to build a larger platform. We can automate this in the same way, but specifying the outputType as VM will create running virtual machines that are all built from the updated template. We can then apply “bootstrap” configuration that provides the final step of configuration. Some examples:


  • Build updated web-server image: bootstrap with static content
  • Docker Swarm Cluster built from updated Docker image: bootstrap with swarm join token
  • Load balanced application engine image: bootstrap with adding itself into a load balancer upon boot

Example code to apply static networking and add to a swarm cluster:

        {
            "name": "Swarm Worker",
            "note": "Add worker",
            "task": {
                "inputTemplate": "DockerEE-2Template",
                "outputName": "worker001",
                "outputType": "VM",
                "commands": [
                    {
                        "type": "execute",
                        "note": "Join Swarm",
                        "cmd": "/usr/bin/docker swarm join --token SWMTKN--X"
                    }
                ]
            }
        }

The above example makes use of a static token that has been written into the configuration that InfraKit will apply; whilst this will complete the join process successfully, it is a static way of working. The vmwscript utility also has the capability of working dynamically, taking the output of a command and storing it in a key/value store to be used in other commands. A good example of this is the swarm example (https://github.com/docker/infrakit/blob/master/pkg/x/vmwscript/examples/swarm.json) hosted in the GitHub repository.

When we build the Docker swarm master we save the join token in a temporary location; we then download the token (and remove the temporary file). The text contents of this file are stored under the key jointoken.

                    {
                        "type": "execute",
                        "note": "Backing up swarm key for other nodes",
                        "cmd": "/usr/bin/docker swarm join-token worker | grep SWMTKN > /tmp/swm.tkn"
                    },
                    {
                        "type": "download",
                        "filePath": "/tmp/swm.tkn",
                        "resultKey": "jointoken",
                        "delAfterDownload": false
                    }

In the second deployment in the deployment array we deploy a worker node (as above), which requires a swarm join token. To make use of the dynamically generated and stored join token we can access it through the key/value store, as shown below:

                    {
                        "type": "execute",
                        "note": "Join Swarm",
                        "cmd": "/usr/bin/docker swarm join --token",
                        "execKey": "jointoken"
                    }

Here we can see that the resultKey (the stored result from the previous command) can be accessed during execution through an execKey, allowing dynamic command generation when building platforms.

Docker InfraKit deploying Docker-EE from conference WiFi


This is a purely experimental way of building images and then using those images to automate the build-out of a platform. The repository has examples for a few basic use-cases, such as an updated Docker virtual machine and even a WordPress example that can be simply automated. The command syntax may change to make things more efficient, along with the addition of further functionality:

  • Windows testing
  • More networking configuration support
  • Ability to have the actual vSphere execute vmwscript (auto-scaling/healing) of docker infrastructure
  • Better user / sudo support

Automating VMware vCenter with InfraKit

After a brief discussion on Slack it became apparent that a post that details the steps to automate VMware vCenter with InfraKit would be useful.

Building InfraKit

I won’t duplicate the content that already exists on the InfraKit GitHub, located here; the only change is around the build of the binaries. Instead of building everything, we will just build the components that we need to automate vCenter.

The steps below are all that are required:

make build/infrakit && \
make build/infrakit-manager && \
make build/infrakit-group-default && \
make build/infrakit-flavor-swarm && \
make build/infrakit-instance-vsphere

Starting InfraKit

We can then start the group and flavour plugins as normal; however, the vSphere plugin requires some additional details in order to start successfully, namely the details of the vCenter we’re automating. These can be provided in either of the following ways:

Environment variable

export VCURL=https://user:pass@vcIPAddress/sdk

Plugin flag

./build/infrakit-instance-vsphere --url=https://user:pass@vcIPAddress/sdk

Note: For testing purposes the plugin can be instructed to not delete instances when they’re “Destroyed”, instead these instances will effectively stop being managed by InfraKit and will need deleting manually. This can be accomplished by starting the plugin with the flag --ignoreOnDestroy=true.

Using InfraKit to deploy instances from a VMware template/existing VM

In one of the more recent PRs, support was added for using VMware VM Templates (.vmtx) or existing Virtual Machines as the basis for new VM instances. This means that it becomes very straightforward to deploy and maintain an infrastructure using Virtual Machines that have been created through other tools.

Below is a snippet of the JSON used to describe the deployment of instances from a Template:

"Instance": {
    "Plugin": "instance-vsphere",
    "Properties": {
        "Datacenter": "Home Lab",
        "Datastore": "vSphereNFS",
        "Hostname": "esxi01.fnnrn.me",
        "Template": "CentOS7-WEB",
        "PowerOn": true
    }
}

The "Datacenter" is optional, and is only required in the event that the VMware vCenter cluster contains more than one Datacenter (linked-mode etc.). The new VM instances will be built as clones of the template named in "Template".

Using InfraKit to deploy instances from a LinuxKit .iso

In order to build instances using an .iso that has been pushed to VMware vCenter, additional details are required; this is because the .iso doesn’t contain any information describing a Virtual Machine. The plugin has a number of defaults for CPUs/Memory, but these can be overridden as shown below. The ISOPath must also be correct for the Datastore that is used.

"Instance": {
    "Plugin": "instance-vsphere",
    "Properties": {
        "Datacenter": "Home Lab",
        "Datastore": "vSphereNFS",
        "Hostname": "esxi01.fnnrn.me",
        "Network": "HomeNetwork",
        "Memory": 512,
        "CPUs": 1,
        "ISOPath": "linuxkit/vSphere.iso",
        "PowerOn": false
    }
}

This will create an entirely new Virtual Machine for every instance to the specs detailed in the properties and attach it to the VMware vSwitch/dvSwitch network.

Creating Instances

All instances will be created in a folder that matches the group "ID" (InfraKitVMs in this example), and they will also be tagged internally so that InfraKit knows which instances it owns. Creating your allocated instances is then as simple as committing your InfraKit JSON:

./infrakit group commit InfraKitVMs.json

At this point the plugin will inspect your configuration. If the configuration has been committed before, the plugin will compare it against the current configuration and determine the changes that are required. If the configuration has changed (e.g. the Memory allocation), the plugin will recreate the virtual machines with the updated configuration. If only the allocation of VMs has changed, the plugin will determine whether VMs need adding or removing and apply the relevant changes.

VMware vCenter

If you move a Virtual Machine out of the InfraKit-created folder, it will no longer be monitored by InfraKit, and InfraKit will create a new Virtual Machine to replace the one it previously managed. If you then drag the Virtual Machine back into its folder, the previous Virtual Machine will be deleted, as the allocation count will be one too high.

VMware vSphere

vSphere doesn’t support the “construct” of folders, therefore all created Virtual Machines will reside in the root with all other Virtual Machines created on that vSphere host.


The three year mission aboard the Enterprise is over

As with all good things… they must come to an end and sadly my time at HP Enterprise (originally HP) will be coming to an end July 2017.

It has been an amazing adventure, and one that has both shaped and had a lasting effect on my career. Pre-HPE I typically had desk-based roles where I would be issued work requests, such as “configure this VM” or “design and build this datacenter”, and meetings would typically feel like there were more important things I could be doing.

Fast forward to starting at HPE, my first time on the opposite side of the table being a ‘vendor’ was a rather nerve wracking experience (along with being stuck in a suit). Having to learn to deliver customer presentations along with the corporate messaging was a bizarre experience but once I got my head around the admittedly massive portfolio it was an enjoyable experience, especially when the solution came together between vendor and customer.

Occasionally you meet people who for their own amusement will try to take you to task by querying every inane detail of a server or network switch, but you learn pretty quick (or your skin toughens up) how to cope.

I also had my first real exposure to presenting on stage, which was (and still is) an utterly nerve wracking experience. The first time at HP ETSS (Enterprise Technology Solutions Summit) was awful I can still remember standing on stage mumbling and shaking away hoping it would all end ASAP. Since then I’ve presented all over the place including an exhausting solid week of full day presentations in Johannesburg, which again was an amazing experience. I still have a lot to learn but can just about put together a presentation that I’d rate as “acceptable” 🙂

Also whilst at HPE I was exposed to Open Source and started developing again, which is quite amusing given I originally did a degree in Software Engineering and then promptly moved into Infrastructure and Hardware management/configuration. I’ve been very fortunate to get involved and meet some amazing developers in both Docker and Chef over the last couple of years along with getting to contribute to various projects.

I’ve nothing but great memories from the last three years (apart from people trying (and failing) to force me to use Salesforce). Also, here’s to the DCA team, who helped and taught me so much during my time at HPE (gone but not forgotten).

So here’s to what’s next!

In pursuit of a tinier binary-(er)

… yes that was an attempt to make the title rhyme 🙁

tl;dr make an executable smaller by hiding your code inside the header of the executable… read on for the gory detail.

There was a great post recently from Dieter Reuter around building the smallest possible Docker image, which posed an interesting idea, mainly due to some of the crazy sizes of the Docker images I keep having to deal with. I decided to join in with the challenge and see how far I could shrink both a binary and, by association, the resulting Docker container.

I created a number of binaries during my playing around; below is a list of five of them, all of which print the following text to STDOUT: "Hello Docker World!\n". If you’re not familiar with escaped characters, ‘\n’ is simply the newline character.

*I realise I capitalised some of the strings by accident, but ‘h’ still occupies the same space as ‘H’ 😉

Initial failure

Before I delve into the steps I went through to make the small container, it’s worth pointing out that there is a fatal flaw in one of the above binaries when placed in a SCRATCH container. *hint* there is a duplicate binary with the suffix _STATIC 🙂

The reason that the hello_in_C binary will fail to run in the SCRATCH container is that it has dynamic requirements on a number of system libraries. Most notable is libc, the base C library that contains a lot of basic day-to-day code providing the standard functionality to C programs. If we were to place this into a Docker container, the following would be the result:

$ docker run -it --rm hello:C
standard_init_linux.go:178: exec user process caused "no such file or directory"

We can examine binaries for external dependencies using the ldd tool, which lists the external libraries needed to run a binary. Alternatively, we can use volume mapping to pass the host operating system’s libraries into the SCRATCH container (-v /lib64:/lib64:ro); this provides the libraries required for this particular executable to run successfully.

docker run -v /lib64:/lib64:ro -it --rm hello:C
Hello Docker World!

Permanently fixing this issue is quite simple and requires building the C binary with the -static compile-time flag (the glibc-static package will be required); this bundles all code into a single file instead of relying on external libraries. This has the knock-on effect of making the binary easier to run on other systems (as all code is in one place), however the binary has now increased in size by 100 times… which is the opposite of what we’re trying to accomplish.

What makes an Executable

Ignoring MS-DOS .com files, which no one has touched and which haven’t been supported in years, most executables, regardless of operating system, typically consist of a header that identifies the executable type (e.g. ELF64, WinPE) and a number of sections:

  • .text, code that can be executed
  • .data, static variables
  • .rodata, static constants
  • .strtab /.shstrtab, string tables
  • .symtab, symbol tables.

The Executable header will contain an entry that points to the beginning of the .text section, which the Operating System will then use when the executable is started to find the actual code to run. This code then will access the various bits of data that it needs from the .data or .rodata sections.

Basic overview of a “Hello Docker World!” execution process

  1. The exec() family of functions will take the path of a file and attempt to have the OS execute it.
  2. The Operating System will examine the header to verify the file; if OK, it will examine the header structure and find the entry point.
  3. Once the entry point is found, the operating system will start executing the code from that point. It is at this point where the program itself is now running.
  4. The program will set up for the function to write the string to stdout
    1. Set string length
    2. Set the pointer to the string in the .data section
    3. Call the kernel
  5. Call the exit function (otherwise the kernel will assume the execution failed)

Strip out Sections

In the overview above, we can see through the course of execution that there are a number of sections within the executable that aren’t needed. In most executables there may be debug symbols or various sections that apply to compilers and linkers, which are no longer required once the executable has been put together.

To produce a stripped executable, it can either be compiled with the -s flag (also make sure -g isn’t used, as this adds debug sections), or we can use the strip tool, which has the capability to remove all non-essential sections.

$ strip --strip-all ./hello_in_C_STRIPPED
$ ls -la hello_in_C_ST*
-rwxrwxr-x. 1 dan dan 848930 Feb 28 15:35 hello_in_C_STATIC
-rwxrwxr-x. 1 dan dan 770312 Feb 28 18:07 hello_in_C_STRIPPED

With languages such as Go, there can be significant savings from stripping any sections that aren’t essential (although if you’re doing this for production binaries it should be part of your compile/make process, e.g. building with go build -ldflags "-s -w").

Extreme Shrinking

The final option that will keep your hands clean when shrinking an executable is to make use of tools like UPX, which adds a layer of compression to your executable, shrinking what’s left of your stripped binary. Taking my original Go binary I went from:

  • go build hello_docker_world.go = 1633717 bytes
  • strip --strip-all = 1020296 bytes
  • upx = 377136 bytes

Clearly a significant saving in terms of space.

Getting your hands dirty

Everything that has been discussed so far has been compiled through standard build tools and modified with the compiler or OS toolchain that manages executables. Unfortunately we’ve reached as far as we can go with these tools, as they will always build to the ELF/OS standards and always create the sections that they deem required.

In order to build a smaller binary, we’re going to have to move away from the tools that make building executables easier and hand craft a tailored executable. Instead of sticking with the format of [header][code][data], we’re going to look at how we can hide our code inside the header.

Whilst some parts of the header are a requirement, some only have to be a non-zero value and others are left blank for future use. This allows us to change entries in the ELF header from legal values to the code we want to execute, after which the following will happen:

  1. The Operating System will be asked to execute the file
  2. The OS will read the ELF header, and verify it (even though some values don’t make sense)
  3. It will then find the code entry point in the header that points to the middle of the actual header 🙂
  4. The OS will then start executing from that point in the header, and run our code.


Explained Code below

This code pretty much fits just in the ELF header itself, so I have broken the header up and labelled the header fields and where we’ve hidden the code we want to execute.

First part of header (has to be correct)

org     0x05000000              ; Set origin address
db      0x7F, "ELF"             ; Identify as an ELF binary
dd      1                       ; 32-bit
dd      0                       ; Little endian
dd      $$                      ; Pointer to the beginning of the header
dw      2                       ; Code is executable
dw      3                       ; Instruction set (x86)
dd      0x0500001B
dd      0x0500001B              ; Entry point for our code (section below)
dd      4

Broken Header / Our code

mov     dl, 20                  ; [Section header address field] Take 20 characters
mov     ecx, msg                ; From the string at this address
int     0x80                    ; [ELF flag table field] Print them

Remaining header (has to be correct)

db      0x25                    ; Size of the ELF header
dw      0x20                    ; Size of the Program Header
dw      0x01                    ; Entries in the Program Header

Remaining Code (now beyond the header)

inc     eax                     ; Set exit function
int     0x80                    ; Call it

String section

msg     db      'Hello Docker world!', 10

It’s also worth pointing out that this code won’t really be “compiled”: what is written above is essentially the binary format itself, so nasm will take the text and write out the binary code directly as written.

Build and run the executable with:

$ nasm -f bin ./tiny_hello_docker.asm -o hello_docker_world
$ chmod +x ./hello_docker_world
$ ./hello_docker_world
Hello Docker world!

Further Reading

This Wikipedia article covers the ELF standard in the most readable way I’ve come across: https://en.wikipedia.org/wiki/Executable_and_Linkable_Format

A much more in-depth overview of hiding things in the ELF headers is available here: http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html

InfraKit – Writing an Instance Plugin

For a proof of concept I’ve had the opportunity to re-implement a Docker InfraKit instance plugin from scratch (in another language). To help anyone else who decides to do something similar, I thought it best to capture some of the key points that will need implementing when writing your own plugin.

Starting from Scratch

If you’re going to write a plugin for InfraKit in another language, you’ll need to ensure that you implement all of the correct interfaces, APIs and methods needed to provide the expected behaviour. Your plugin may also need to maintain and store expected state somewhere (perhaps locally or in an external store). In short, a plugin requires:


  • UNIX Sockets, that can be accessed by the InfraKit CLI tool
  • HTTPD Server that is bound to the UNIX Socket
  • JSON-RPC 2.0 that will be used as the transport mechanism for methods and method results
  • Methods to:
    • Validate the instance configuration
    • Determine the state of an instance
    • Provision new instances
    • Destroy instances
    • Describe instance parameters (configuration that can be made through the JSON methods)
  • Ideally, garbage collection of sockets upon exit; leaving abandoned sockets in the plugin directory can be somewhat confusing to the CLI tool.

UNIX Sockets

The UNIX sockets should always reside in the same location so that the CLI can find them: an .infrakit/plugins/ directory inside the home directory of the user that runs the plugins (so my plugins reside in /home/dan/.infrakit/plugins/). It may be apparent that, as we’re using UNIX sockets, we’re effectively binding InfraKit to the file system of a single system; however, it’s not a technical challenge to bind your plugin socket to a TCP port through tools such as nc or socat.

HTTPD Server

Typically an HTTP server would (under the covers) create a socket and bind it to a TCP port such as 80/443. In this implementation, however, we create a UNIX socket and bind it to a file on the local file system, so instead of connecting to an IP address/port we speak to a file on the file system in the same manner.


JSON-RPC 2.0

After some head-scratching around some inconsistencies with the API, the InfraKit developers decided to move to JSON-RPC 2.0 as the method for calling procedures. This simply wraps methods and parameters into a standardised JSON format, and expects method results to be reported in the same standardised way. It’s a slightly bizarre standard, as it only makes use of the HTTP POST method and expects an HTTP 200 OK to be returned even in the event that a function fails. The reason for this is that errors should be encapsulated in the returned JSON, essentially moving the error reporting down (or up, I’m not sure) the stack.


The typical workflow for provisioning through the Instance plugin is as follows:

  1. After a user commits some configuration the InfraKit CLI will parse it and pass the correct parameters to the instance plugin that is defined in the configuration.
  2. The Instance plugin will then read through the parameters and use them when interacting (API call, CLI scraping, vendor specific protocol API etc.) with the Infrastructure provider.
  3. The Provider will then provision a new instance.
  4. Once complete the provider will return the new instance details to the Instance Plugin.
  5. The Instance plugin will report back to InfraKit that the instance has been provisioned.

The workflow above is pretty simplistic, but I presume these will be the common steps for interacting with most infrastructure providers; however, depending on your provider there can be some caveats. The main caveat relates to infrastructure that can’t be immediately provisioned, which can result in InfraKit provisioning multiple instances because an instance won’t have been provisioned within the group plugin’s polling window. If your instance is going to take a number of minutes to provision and you poll for it every 30 seconds, InfraKit will try to provision a new instance at every poll, and once the instance count matches what is required it will have to start destroying the excess instances as they come online.

The possible solutions are to increase the poll timeout so that the instance will be provisioned within that window and be reported back as created, to modify the group plugin, or to develop some intelligence in your instance plugin. The plugin that I developed as part of a PoC had instances that would take ~4 minutes to provision, which meant the instance plugin needed a method to track what it had provisioned so that it could then check with the provider what state the instances were currently in.

Instance State

There are a number of different ways for a plugin to handle state; it could well be that just querying the provider will return the instance state. However, some providers may make it difficult to distinguish instances created by InfraKit from instances created directly, etc. So in my example I needed the plugin to keep track of instances and maintain state across plugin restarts.

  1. InfraKit requires the list of instances so that it can make sure that resource is allocated correctly, so it asks the instance plugin to describe all of its instances.
  2. The Instance plugin will iterate through all of the last known state information.
  3. Each of the instances in its last known state will be checked with the provider to determine whether they’re created/in-progress/failed.
  4. State information is updated with the latest information.
  5. The instances that are created or in-progress are listed back to InfraKit, which will either be satisfied with the instance count or require more instances to be provisioned.


The instance plugin code, along with the state management, is located here: https://github.com/thebsdbox/InfraKit-Instance-C


InfraKit – The Architecture

Turns out that my previous InfraKit post was almost a complete duplication of something Docker posted at pretty much the same time: https://blog.docker.com/2016/11/docker-online-meetup-recap-infrakit/

One thing that hasn’t so far been covered in a deep-dive fashion is around the plugins themselves, what they are, how you communicate with them and how they communicate with your infrastructure. So in this post, I aim to cover those pieces of information.

Plugin Architecture

Each plugin for InfraKit is technically its own application, and in most cases can be interacted with directly. As an application it needs to conform to a number of standards defined by InfraKit, and have the following capabilities and behaviours:

  • Create a UNIX socket in /var/run/infrakit
  • Host an httpd server on that socket
  • Respond correctly to a number of URLs dependent on the “type” of plugin
  • Respond to the following HTTP methods:
    • GET, typically looking at system or instance state
    • PUT, performing instance provisioning or system configuration
  • Handle the JSON passed in and parse the data for configuration details

The design choices offer the following benefits:

  • Adheres to the same API design as docker.sock -> https://docs.docker.com/engine/reference/api/docker_remote_api/
  • The use of a UNIX socket (which lives on the filesystem) allows more complex plugins to be placed in containers and their interfaces exposed through volume mapping.
  • As it uses standard HTTP for its communication, it doesn’t tie plugins to any language. In theory a Python plugin could be written using the standard library’s simple HTTP server bound to a UNIX socket.

Reverse-Engineering Ahead!

To see what is happening under the covers, we can create a fake plugin and use the InfraKit CLI to interact with it. Below I’m creating a socket using netcat-openbsd and placing it in the plugins directory. This would be visible with a $ build/infrakit plugins ls; however, that command doesn’t interact with the plugins, it just lists the sockets in that directory.

We will now use the InfraKit CLI to provision an instance by passing it some configuration JSON and asking it to use the “fake” plugin. Immediately afterwards netcat will output the data that was sent to the socket it created, and we can see the URL format along with the JSON data that would be sent to a plugin.

$ nc -lU /root/.infrakit/plugins/instance-test &
$ build/infrakit instance provision test.json --name instance-test
POST /Instance.Provision HTTP/1.1
Host: socket
User-Agent: Go-http-client/1.1
Transfer-Encoding: chunked
Accept-Encoding: gzip


From here we can see that the URLs take the form /<plugin type>.<action>, e.g. POST /Instance.Provision

How to communicate with Plugins

InfraKit gives you the flexibility to go from simple provisioning of an instance, to having your infrastructure deployed, monitored and, in the event of misconfiguration or failure, healed. In the simplest example of just needing an instance, we can use the InfraKit CLI to explicitly request a new instance, passing the JSON that contains the configuration needed to define it.

$ build/infrakit instance provision physical_server.json --name instance-physical

However, for an end-to-end infrastructure definition, a full infrastructure specification would need to be passed to a group plugin, which will ensure that it is created.


How plugins communicate with your infrastructure

Strictly speaking, the InfraKit architecture covers the design of and interaction between plugins (sockets, URL adherence, parsing JSON for configuration information). The plugin itself can be written to use any method that makes sense to provision and destroy the infrastructure, such as speaking to APIs or even SSH’ing into network switches (it’s 2016, but sadly that’s still the way to interact with some devices 🙁 )

If we look at the vagrant plugin (line 55), we can see that it reads all of the properties that were passed through the JSON (CPUs, memory, networking) and builds a Vagrantfile. The plugin then calls vagrant up, which starts the guest machine based upon the configuration in the Vagrantfile.

Some examples that could exist in the future

A networking plugin that takes the properties, writes those changes to the switch, and ensures that the switch remains compliant with the configuration. Given that a lot of switch configuration and feature sets are identical across vendors, it could be possible to have vendor-agnostic networking properties, with the networking plugin identifying the vendor and providing the configuration that matches their interfaces.


DISCLAIMER: I'm only using HPE OneView as an example because I know the API and its capabilities. There is no guarantee that HPE would write such a plugin.


A physical server plugin that would make use of the HPE OneView API to deploy physical server instances.


From the plugins that already exist to the ideas for the future, it quickly becomes clear that InfraKit can really provide Infrastructure as Code. As the plugin ecosystem continues to grow and evolve, InfraKit has the opportunity to provide a great way to help orchestrate your infrastructure.

Find out more about InfraKit and the plugins on the InfraKit GitHub  …

InfraKit from Docker – an Overview

This marks a first: people actually complaining that my somewhat rambling blog post wasn’t ready…

It’s not going to be possible to cover all of the various topics around InfraKit that I aim to cover in a single post, so this is the first of a few posts that will focus on what InfraKit is, and how it connects together.

What is InfraKit

InfraKit was released by Docker quite recently and has already had ample coverage in the tech media and on Wikipedia. My take is that it is a framework (or toolkit; I suppose InfraWork doesn’t work as a name) that has the capability to interact with infrastructure and provide provisioning, monitoring and healing facilities. The fact that it is a framework means that it is completely extendable: anyone can take InfraKit and extend it to provide those infrastructure capabilities for their particular needs. Basically, an infrastructure vendor can take InfraKit and add the capability to have their particular products managed by it. Although that’s a rather commercial point of view; my hope is that overstretched infrastructure admins can take InfraKit, write their own plugins (and contribute them back to the community), and make their lives easier.

Haven’t we been here before?

There has been a lot of politics happening in the world recently, so I choose to give a politician’s answer of yes and no. My entire IT career has been based around deploying, managing and monitoring infrastructure, and over the last ~10 years I’ve seen numerous attempts to make our lives easier through various types of automation.

  • Shell scripts (I guess powershell, sigh 🙁 )
  • Automation engines (with their own languages and nuances)
  • Workflow engines (Task 1 (completed successfully) -> Task 2 (success) -> Task 3 (failed, roll back))
  • OS Specific tools
  • Server/Storage/Network Infrastructure APIs, OS/Hypervisor Management APIs, Cloud APIs … and associated toolkits

All of these have their place and power businesses and companies around the world, but where does InfraKit fit? The answer is that it has the flexibility to replace or enhance the entire list. Alternatively, it can be used in a much more specific role, where it simply “keeps the lights on” by replacing or rebuilding failed infrastructure, or growing and scaling it to meet business and application needs.

What powers InfraKit

There are four components that make up InfraKit, and they are already well documented on GitHub; the possibilities for extending them are also discussed in the overview.


The instance plugin is the component I’ve focussed on so far, and it provides some of the core functionality of InfraKit. As the name suggests, an instance plugin provides an instance of an infrastructure resource. Instance plugins take configuration data in the form of JSON, which provides the properties used to configure an instance.

So, a few possible scenarios:

Hypervisor Plugin:
  • VM Template: Linux_VM
  • Network: Dev
  • vCPUs: 4
  • Mem: 8GB
  • Name: dans_vm_server01

Cloud Plugin:
  • Instance Type: Medium
  • Region: Europe
  • SSH_KEY: gGJtSsV8
  • Machine_image: linux

Physical Infrastructure Plugin:
  • Hardware Type: 2 cpu server
  • Server Template: BigData
  • OS build plan: RHEL_7
  • Power_State: On

I can then use my instance plugin to define 1, 10, 100, as many instances as needed to provide my infrastructure resource. But say I want 30 servers: 20 for web traffic, 7 for middleware and 3 at the back end for persistent storage. How do I define my infrastructure to be those resources…


The flavor plugin is what provides the additional layer of configuration that takes a relatively simple instance and defines it as providing a specific set of services.

Again some possibilities that could exist:

WebServer Plugin:
  • Packages Installed: nginx
  • Storage: nfs://http/
  • Firewall: 80, 443
  • Config: nginx.conf
  • Name: http_0{x}

Middleware Plugin:
  • Packages Installed: rabbitmq
  • Firewall: 5672
  • Cert: ——– cert —
  • RoutingKey: middleware

Persistent Storage Plugin:
  • Packages Installed: mysql
  • Msql_Username: test_env
  • Msql_Password: abc123
  • Bind:
  • DB_Mount: /var/db

So given my requirements, I’d define 20 virtual machine instances and attach the web server flavor to them, and so on; that would give me the capacity and the configuration for my particular application or business needs. The flavor plugin not only provides configuration control, it is also used to ensure that the instance is configured correctly and deemed healthy.
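The count-plus-flavor idea can be sketched with a couple of Go types. This is a hypothetical shape for illustration only; the real group and flavor schemas are defined by InfraKit:

```go
package main

import "fmt"

// Flavor names the role applied to a set of instances, with a sample
// of the configuration it would carry.
type Flavor struct {
	Name     string
	Packages []string
}

// Allocation pairs a flavor with how many instances should carry it.
type Allocation struct {
	Flavor Flavor
	Size   int
}

func main() {
	// The 30-server example: 20 web, 7 middleware, 3 persistent storage.
	group := []Allocation{
		{Flavor{"webserver", []string{"nginx"}}, 20},
		{Flavor{"middleware", []string{"rabbitmq"}}, 7},
		{Flavor{"storage", []string{"mysql"}}, 3},
	}
	total := 0
	for _, a := range group {
		total += a.Size
		fmt.Printf("%d x %s\n", a.Size, a.Flavor.Name)
	}
	fmt.Println("total instances:", total) // prints total instances: 30
}
```

The group plugin’s job is then to keep each allocation at its declared size, asking the instance plugin for replacements whenever the flavor’s health check marks an instance as unhealthy.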

That defines the required infrastructure for one particular application or use case, however to separate numerous sets of instances I need to group them…


The default group plugin is a relatively simple implementation that currently will just hold together all of your instances of varying flavors allowing you to create, scale, heal and destroy everything therein. However groups could be extended to provide the following:

  • Specific and tailored alerting
  • Monitoring
  • Chargeback or utilisation overview
  • Security or role based controls

InfraKit cli

The InfraKit CLI is currently the direct way to interact with all of the plugins and direct them to perform tasks based upon their plugin type. To see all of the actions that can be performed on a plugin of a specific type, use the CLI:

$ build/infrakit <group|flavor|instance> 

$ build/infrakit instance
Available Commands:
  describe    describe the instances
  destroy     destroy the resource
  provision   provisions an instance
  validate    validates an instance configuration

So if we were to use a hypervisor plugin to provision a virtual machine instance we would do something like the following:

Define the instance in some json:

    {
      "Properties": {
        "template": "vm_http_template",
        "network": "dev",
        "vCPUs": 2,
        "mem": "4GB"
      }
    }

Then provision the instance using the correct plugin

$ build/infrakit instance provision instance.json --name instance-hypervisor

… virtual instance is provisioned …


The next post will cover the architecture of the plugins, the communication methods between them, and how more advanced systems can be architected through the use of InfraKit.

Fixing problems deploying Docker DC (Offline)

Inside my current employer there is quite the buzz around Docker, and especially Docker Datacenter being pushed as a commercial offering with server infrastructure. This in turn has led to a number of people deploying various Docker components in their labs around the world; however, they tend to hit the same issues.

I’m presuming that most of the lab environments must be internal/secure environments, as there is always a request to install without an internet connection (offline).

I did come across one person building Docker DC in one place and then trying to manually copy all of the containers to the lab environment (which inevitably broke all of the certificates). Essentially: DON’T DO THAT.

Luckily for people wanting to deploy Docker Datacenter in an environment without an internet connection, there exists a full bundle containing all the bits that you need (UCP/DTR etc.). Simply download it on anything with an internet connection, transfer the bundle over to your offline machines and run the following:

$ docker load < docker_dc_offline.tar.gz

Unfortunately, the majority of people who follow the next steps are then confused by what happens:

Unable to find image 'docker/ucp:latest' locally

Pulling repository docker.io/docker/ucp

docker: Error while pulling image: Get https://index.docker.io/v1/repositories/docker/ucp/images: dial tcp: lookup index.docker.io on [::1]:53: read udp [::1]:50867->[::1]:53: read: connection refused.


The culprit behind this is the default behaviour of Docker: based upon a commit in mid-2014, if no tag is specified for a Docker image then it will default to the tag :latest. Simply put, the installation steps for Docker UCP ask the user to run the image docker/ucp with no tag and a number of passed commands. Docker immediately ignores the offline bundle images and attempts to pull the :latest version from Docker Hub, where the installation promptly fails.
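The tag-defaulting rule is simple to model in Go. This is a sketch of the behaviour only, not Docker’s actual reference-parsing code (which also handles digests and registry hosts with ports):

```go
package main

import (
	"fmt"
	"strings"
)

// withDefaultTag applies Docker's rule: an image reference with no tag
// is treated as :latest.
func withDefaultTag(image string) string {
	if strings.Contains(image, ":") {
		return image // a tag was given explicitly
	}
	return image + ":latest"
}

func main() {
	fmt.Println(withDefaultTag("docker/ucp"))       // docker/ucp:latest
	fmt.Println(withDefaultTag("docker/ucp:1.1.2")) // docker/ucp:1.1.2
}
```

Because the offline bundle loads images under explicit version tags, the untagged reference resolves to :latest, which is not present locally, and the daemon falls back to a pull.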

To fix this, you simply need to look at the images that were installed by the offline bundle and use their tags as part of the installation steps.


ALSO: If you see this error, then please don’t ignore it 😀

WARNING: IPv4 forwarding is disabled. Networking will not work.

$ vim /etc/sysctl.conf   # add: net.ipv4.ip_forward = 1
$ sysctl -p /etc/sysctl.conf

Deploying Docker DataCenter

This is a simple guide to deploying some SLES hosts, configuring them to allow deployment of Docker Engines along with configuration to allow Docker Datacenter to be deployed on the platform. It’s also possible to deploy the components using Docker for Mac/Windows as detailed here.

Disclaimer: This is in “no way” an official or supported procedure for deploying Docker CaaS.


I’ve had Docker Datacenter in about 75% of my emails over the last few months, and it’s certainly been on my to-do list to get a private (lab) deployment done. Given the HPE and SuSE announcement in September, I decided to see how easy it would be to deploy on a few SLES hosts; it turns out it’s surprisingly simple (although I was expecting something like OpenStack deployments a few years ago 😕 )

Also, if you’re looking to *just* deploy Docker Datacenter then ignore the host configuration steps.



You will need:

  1. 1-2 large cups of tea (depends on any typing mistakes)
  2. 2 or more SLES hosts (virtual or physical makes no difference; 1 vCPU, 2GB RAM and a 16GB disk), mine were all built from SLE-12-SP1-Server-DVD-x86_64-GM-DVD1
  3. A SuSE product registration, the 60-day free one is fine (can take 24 hours for the email to arrive) *OPTIONAL*
  4. A Docker Datacenter license, the 60-day trial is the minimum *REQUIRED*
  5. An internet connection? (it’s 2016 … )

Configuring your hosts

SuSE Linux Enterprise Server (SLES) is a pretty locked down beast and will require a few things modified before it can run as a Docker host.

SLES 12 SP1 Installation

The installation was done from the CD images, although if you want to automate the procedure it’s a case of configuring AutoYaST to deploy the correct SuSE patterns. As you step through the installation there are a couple of screens to be aware of:

  • Product Registration: If you have the codes then add them here; it simplifies installing Docker later. ALSO, this is where the Network Settings are hidden 😈 so either set your static networking here, or alternatively it can be done through yast on the CLI (details here). Ensure on the routing page that IPv4 forwarding is enabled for Docker networking.
  • Installation Settings: The defaults can leave you with a system you can’t connect to.


Under the Software headline, deselect the GNOME Desktop Environment and X Window System patterns, as we won’t be needing a GUI or full desktop environment. Also, under the Firewall and SSH headline, the SSH port is blocked by default, which means you won’t be able to SSH into your server once the operating system has been installed, so click (open).

So after my installation I ended up with two hosts (that can happily ping and resolve one another etc.):

ddc01 /

ddc02 /

The next step is to allow the myriad of ports required for UCP and DTR; this is quite simple and consists of opening the file /etc/sysconfig/SuSEfirewall2 and modifying it to look like the following:

FW_SERVICES_EXT_TCP="443 2376 2377 12376 12379:12386"

Once this change has been made, the firewall rules can be re-read using the command SuSEfirewall2

Installing Docker with a Product registration

Follow the instructions here, no point copying it twice.

Installing Docker without a Product registration

I’m still waiting for my 60-day registration to arrive from SuSE, so in the meantime I decided to start adding other repositories to deploy applications. NOTE: As this isn’t coming from an Enterprise repository, it certainly won’t be supported.

So the quickest way of getting the latest Docker on a SLES system is to add the latest OpenSuSE repository; the following two lines will add the repository and install Docker:

zypper ar -f http://download.opensuse.org/tumbleweed/repo/oss/ oss
zypper in docker
docker -v
Docker version 1.12.1, build 8eab29e


To recap: we have a number of hosts configured with network connectivity and the firewall ports open, and finally we have Docker installed and ready to deploy containers.

Deploying Docker Datacenter

Deploying the Universal Control Plane (UCP)

On our first node, ddc01, we run the UCP installer, which automates the pulling of the additional containers that make up the UCP.

docker run --rm -it --name ucp \
  -v /var/run/docker.sock:/var/run/docker.sock \
  docker/ucp install -i --host-address

Errors to watch for:

FATA[0033] The following required ports are blocked on your host: 12376.  Check your firewall settings. 

Make sure that you’ve edited the firewall configuration and reloaded the rules.

WARNING: IPv4 forwarding is disabled. Networking will not work.

Enable IPv4 forwarding in the yast routing configuration.

Once the installation starts it will ask you for a password for the admin user; for this example I set the password to ‘password’, however I highly recommend choosing something a little more secure. The installer will also give you the option to set additional SANs on the TLS certificates for additional domain names.

The installation will complete, and in my environment I’ll be able to connect to my UCP by putting the address of ddc01 into a web browser.


Adding nodes to the UCP

After logging into the UCP for the first time, the dashboard will display everything that the Docker cluster is currently managing. There will be a number of containers displayed, as they make up the UCP (UCP web server, etcd, swarm manager, swarm client etc.). Adding additional nodes is as simple as adding Docker workers to a swarm cluster, possibly simpler, as the UCP provides you with a command that can be copied and pasted on all further nodes to add them to the cluster.

Note: The UCP needs a license added, otherwise additional nodes will fail during the add process.


Deploying the Docker Trusted Registry (DTR)

On ddc02, install the Docker Trusted Registry, as it’s not supported or recommended to have the UCP and the DTR on the same nodes.

From ddc02 we download the UCP certificate:

curl -k > ucp-ca.pem

To then install the DTR, run this docker command and it will pull down the containers and add the registry to the control plane.

docker run -it --rm docker/dtr install --ucp-url \
  --ucp-node ddc02 \
  --dtr-external-url \
  --ucp-username admin \
  --ucp-password password \
  --ucp-ca "$(cat ucp-ca.pem)"



With all this completed we have the following:

  • A number of configured hosts with correct firewall rules.
  • Docker Engine, that starts and stops the containers
  • Docker Swarm, clusters together the Docker Engines (it’s worth noting that it’s not the in built swarm in 1.12 and it still uses the swarm container to link together engines)
  • Docker DTR, the platform for hosting Docker images to be deployed on the engines
  • Docker UCP, as the front end to the entire platform.


I was pleasantly surprised by the simplicity of deploying the components that make up Docker Datacenter. Although it looks a little bit like the various components are lagging behind the new functionality that has been added to the Docker engine; this is evident in that UCP doesn’t use the swarm mode that is part of 1.12, and wastes a little resource deploying additional containers to provide the swarm clustering.

It would be nice in the future to have a more feature-rich UI that provides workflow capabilities to compose applications, as currently it’s based upon hand-crafting compose files in YAML, which you can copy and paste into the UCP, or uploading your existing compose files. However, the UCP provides an excellent overview of your deployed applications and the current status of containers (logs and statistics).

A peek inside Docker for Mac (Hyperkit, wait xhyve, no bhyve …)

It’s no secret that the code for Docker for Mac ultimately comes from the FreeBSD hypervisor (bhyve), and the people who have taken the time to bring it to the Darwin (Mac) platform have done a great job of tweaking the code to handle the design decisions that underpin the Apple operating system.

Recently I noticed that the bhyve project had released code for the E1000 network card, so I decided to take the hyperkit code and see what was required to add in the PCI code. What follows is a (rambling and somewhat incoherent) overview of what was changed to move from bhyve to hyperkit, and some observations to be aware of when porting further PCI devices to hyperkit. Again, please be aware I’m not an OS developer or a hardware designer, so some of this is based upon a possibly flawed understanding… feel free to correct or teach me 🙂

Update: I’ve already heard from @justincormack that Docker for Mac uses vpnkit, not vmnet.

VMM Differences

One of the key factors in the portability of bhyve to OSX is that the Darwin kernel is loosely based upon the same kernel lineage that powers FreeBSD (family tree from Wikipedia here), which means that a lot of the kernel structures and API calls aren’t too different. However, OSX is aimed at the consumer market rather than the server market, and as OSX has matured Apple has stripped away some of the kernel functionality that ships by default; the obvious example being the removal of TUN/TAP devices from the kernel (they can still be exposed by loading a kext (kernel extension)), which, although problematic, hyperkit has a solution for.

VM structure with bhyve

When bhyve starts a virtual machine it will create the structure of the VM as requested (allocate vCPUs, allocate the memory, construct PCI devices etc.); these are then attached to device nodes under /dev/vmm, and the bhyve kernel module handles the VM execution. Being able to examine /dev/vmm/ also gives administrators a place to see which virtual machines are currently running, and allows the VMs to continue running unattended.

Internally, the bhyve userland tools make use of virtual machine contexts that link the VM name to the internal kernel structures running the VM instance. This allows a single tool to run multiple virtual machines, as you typically see with VMMs such as Xen, KVM or ESXi.

Finally, the networking configuration that takes place inside of bhyve… Unlike the OSX kernel, FreeBSD typically comes prebuilt with support for TAP devices (if not, the command kldload if_tap is needed). Simply put, the use of a TAP device greatly simplifies the handling of guest network interfaces. When an interface is created with bhyve, a PCI network device is created inside the VM and a TAP device is created on the physical host. When network frames are written to the PCI device inside the VM, bhyve actually writes those frames to the TAP device on the physical host (using standard write() and read() functions on file descriptors), and the packets are then broadcast out of the physical interface on the host. If you are familiar with VMware ESXi, the concept is almost identical to the way a vSwitch functions.

bhyve Network

VM Structure with Docker for Mac (hyperkit)

So the first observation about the hyperkit architecture is that all of the device node code (/dev/vmm/) has been removed, which has the effect of making virtual machines process-based. This means that when hyperkit starts a VM it will malloc() all of the requested memory and become the sole owner of the virtual machine; essentially, killing the hyperkit process will kill the VM. Internally, all of the virtual machine context code has been removed, because the hyperkit process to VM relationship is now a 1:1 association.

The design decision to remove all of the context code (instead of tagging everything to a single VM context) requires noticeable changes to every PCI module that is added or ported from bhyve, as it is all based on creating and applying these emulated devices to a particular VM context.

To manage VM execution, hyperkit makes use of the Hypervisor.framework, which is a simplified framework for creating vCPUs, passing in mapped memory and creating an execution loop.

Finally, there are the changes around network interfaces. From inside the virtual machine the same virtio devices are created as would be created on bhyve; the difference is in linking these virtual interfaces to a physical interface, as on OSX there is no TAP device that can be created to link virtual and physical. So there currently exist two methods to pass traffic between virtual and physical hosts: the virtio to vmnet (virtio-vmnet) and the virtio to vpnkit (virtio-vpnkit) PCI devices. These both use the virtio drivers (specifically the network driver) that are part of any modern Linux kernel, and then hand over to the backend of your choice on the physical system.

It’s worth pointing out here that the vmnet backend was the default networking method for xhyve, and it makes use of the vmnet.framework, which as mentioned by other people is rather poorly documented. Its design also slightly complicates things: it doesn’t create a file descriptor that would allow the existing simple code to read() and write(), and it requires elevated privileges to use.

With the work that has been done by the developers at Docker, a new alternative method for communicating from virtual network interfaces to the outside world has been created. The solution from Docker has two parts:

  • The virtio-vpnkit device inside hyperkit that handles the reading and writing of network data from the virtual machine
  • The vpnkit component that has a full TCP/IP stack for communication with the outside world.

(I will add more details around vpnkit, when I’ve learnt more … or learnt OCaml, which ever comes first)

Networking overviews

bhyve overview (TAP devices)


xhyve/hyperkit overview (VMNet devices)



 Docker for Mac / hyperkit overview (vpnkit)



Porting (PCI devices) from bhyve to hyperkit

All of the emulated PCI devices adhere to a defined set of function calls, along with a structure that defines pointers to those functions and a string that identifies the name of the PCI device (memory dump below).


The pci_emul_finddev(emul) call looks for a PCI device, e.g. E1000, virtio-blk or virtio-net, and then manages the calling of its pe_init function, which initialises the PCI device and adds it to the virtual machine’s PCI bus as a device that the operating system can use.

Things to be aware of when porting PCI devices are:

  • Removing VM-context-aware code; as mentioned, it is a 1:1 between hyperkit and the VM.
    • This also includes tying up paddr_guest2host(), which maps physical addresses to guests etc.
  • Moving networking code from using TAP devices with read() and write() to making use of the vmnet framework

With regards to the E1000 PCI code, I’ve now managed to tie up the code so that the PCI device is created correctly and added to the PCI bus; I’m just struggling to fix the vmnet code (so feel free to take my poor attempt and fix it successfully 🙂 https://github.com/thebsdbox/hyperkit)



Further reading