In pursuit of a tinier binary-(er)

… yes that was an attempt to make the title rhyme 🙁

tl;dr make an executable smaller by hiding your code inside the header of the executable… read on for the gory detail.

There was a great post recently from Dieter Reuter around building the smallest possible Docker image, which I thought posed an interesting idea, mainly due to some of the crazy sizes of Docker images I keep having to deal with. I decided that I would join in with the challenge and see where I could shrink both a binary and, by association, the resulting Docker container down to its smallest possible size.

I created a number of binaries during my playing around; below is a list of five of them that all print the following text to STDOUT: "Hello Docker World!\n". If you’re not familiar with escaped characters, the ‘\n’ is simply the newline character.

*I realise I capitalised some of the strings by accident, but ‘h’ still occupies the same space as ‘H’ 😉

Initial failure

Before I delve into the steps I went through to make the small container, it’s worth pointing out that there is a fatal flaw in one of the above binaries when placed in a SCRATCH container. *hint* there is a duplicate binary with the suffix _STATIC 🙂

The reason that hello_in_C will fail to run in the SCRATCH container is that it has dynamic requirements on a number of system libraries. Most notable is libc, which is the base C library containing a lot of basic day-to-day code that provides the standard functionality to C programs. If we were to place this into a Docker container, the following would be the result:

$ docker run -it --rm hello:C
standard_init_linux.go:178: exec user process caused "no such file or directory"

We can examine binaries to check for external dependencies using the ldd tool, which lists the external libraries needed to run the binary file. Alternatively, we can use volume mapping to pass the host Operating System’s libraries into the SCRATCH container with -v /lib64:/lib64:ro, which will provide the libraries required for this particular executable to execute successfully.
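Running ldd against the dynamic binary shows something like the following (addresses and library versions will differ between systems):

$ ldd ./hello_in_C
	linux-vdso.so.1 =>  (0x00007ffd1c9fe000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f2d4a51d000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f2d4a8e1000)

With the host’s /lib64 mapped in as described, the container then runs successfully: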

$ docker run -v /lib64:/lib64:ro -it --rm hello:C
Hello Docker World!

Permanently fixing this issue is quite simple: build the C binary with the -static compile-time flag (the package glibc-static will be required), which bundles all code into a single file instead of relying on external libraries. This has the knock-on effect of making the binary easier to run on other systems (as all code is in one place); however, the binary has now increased in size by 100 times… which is the opposite of what we’re trying to accomplish.
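For reference, the static build is just a case of adding the flag (the source file name here is illustrative):

$ gcc hello_docker_world.c -o hello_in_C
$ gcc hello_docker_world.c -static -o hello_in_C_STATIC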

What makes an Executable

Ignoring MS-DOS .com files, which no one has touched and which haven’t been supported in years, most executables regardless of Operating System typically consist of a header that identifies the executable type (e.g. ELF64, WinPE) and a number of sections:

  • .text, code that can be executed
  • .data, static variables
  • .rodata, static constants
  • .strtab /.shstrtab, string tables
  • .symtab, symbol tables.

The Executable header will contain an entry that points to the beginning of the .text section, which the Operating System will then use when the executable is started to find the actual code to run. This code then will access the various bits of data that it needs from the .data or .rodata sections.
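Both of these details are visible with readelf: the entry point from the header, and the section list (output trimmed, addresses illustrative):

$ readelf -h ./hello_in_C | grep "Entry point"
  Entry point address:               0x400430
$ readelf -S ./hello_in_C
  ...
  [13] .text             PROGBITS         00000000004003e0 ...
  [15] .rodata           PROGBITS         00000000004005c8 ...
  [24] .data             PROGBITS         0000000000601020 ...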

Basic overview of a “Hello Docker World!” execution process

  1. The exec() family of functions will take the path of the file and attempt to have the OS execute it.
  2. The Operating System will examine the header to verify the file; if OK, it will examine the header structure and find the entry point.
  3. Once the entry point is found, the operating system will start executing the code from that point. It is at this point that the program itself is running.
  4. The program will set up the call that writes the string to STDOUT
    1. Set string length
    2. Set the pointer to the string in the .data section
    3. Call the kernel
  5. Call the exit function (otherwise the kernel will assume the execution failed); the strace sketch below shows this flow on a real run
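A trimmed strace of the finished binary (the hand-crafted one built later in this post) shows exactly this flow; arguments are abridged:

$ strace ./hello_docker_world
execve("./hello_docker_world", ["./hello_docker_world"], [/* ... */]) = 0
write(1, "Hello Docker world!\n", 20)   = 20
exit(0)                                 = ?
+++ exited with 0 +++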

Strip out Sections

In the quick overview above, we can see that through the course of execution there are a number of sections within the executable that aren’t needed. In most executables there may be debug symbols or various sections that apply to compilers and linkers, which are no longer required once the executable has been put together.

To get a stripped executable, either compile with the -s flag (and make sure -g isn’t used, as that adds debug sections), or use the strip tool, which can remove all non-essential sections.

$ strip --strip-all ./hello_in_C_STRIPPED
$ ls -la hello_in_C_ST*
-rwxrwxr-x. 1 dan dan 848930 Feb 28 15:35 hello_in_C_STATIC
-rwxrwxr-x. 1 dan dan 770312 Feb 28 18:07 hello_in_C_STRIPPED

With languages such as Go, there can be significant savings from stripping any sections that aren’t essential (although if you’re doing this for production binaries it should be part of your compile/make process).

Extreme Shrinking

The final option that will keep your hands clean for shrinking an executable is to make use of tools like UPX, which adds a layer of compression to your executable, shrinking what’s left of your stripped binary. Taking my original Go binary I went from:

  • go build hello_docker_world.go = 1633717 bytes
  • strip --strip-all = 1020296 bytes
  • upx = 377136 bytes

Clearly a significant saving in terms of space.
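For completeness, the whole pipeline is just three commands:

$ go build hello_docker_world.go
$ strip --strip-all ./hello_docker_world
$ upx ./hello_docker_world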

Getting your hands dirty

Everything that has been discussed so far has been compiled through standard build tools and modified with the compiler or OS toolchain that manages executables. Unfortunately we’ve reached as far as we can go with these tools, as they will always build to the ELF/OS standards and always create the sections that they deem required.

In order to build a smaller binary, we’re going to have to move away from the tools that make building executables easier and hand-craft a tailored executable. Instead of sticking with the format of [header][code][data], we’re going to look at how we can hide our code inside the header.

Whilst some parts of the header are a requirement, some just have to be a non-zero value and others are left blank for future use. This is going to allow us to simply change entries in the ELF header from legal values to the code we want to execute, and the following will happen:

  1. The Operating System will be asked to execute the file
  2. The OS will read the ELF header, and verify it (even though some values don’t make sense)
  3. It will then find the code entry point in the header, which points to the middle of the actual header 🙂
  4. The OS will then start executing from that point in the header, and run our code.


Explained Code below

This code pretty much fits just in the ELF header itself, so I have broken the header up and labelled the header fields and where we’ve hidden the code we want to execute.

First part of header (has to be correct)

org     0x05000000              ; Set origin address
db      0x7F, "ELF"             ; Identify as an ELF binary
dd      1                       ; 32-bit
dd      0                       ; Little endian
dd      $$                      ; Pointer to the beginning of the header
dw      2                       ; Code is executable
dw      3                       ; Instruction set (x86)
dd      0x0500001B
dd      0x0500001B              ; Entry point for our code (section below)
dd      4

Broken Header / Our code

mov     dl, 20                  ; [address of sections header]  take 20 characters
mov     ecx, msg                ;                               from the string at this address
int     0x80                    ; [ELF flag table]              print them

Remaining header (has to be correct)

db      0x25                    ; Size of the ELF header
dw      0x20                    ; Size of the program header
dw      0x01                    ; Entries in the program header

Remaining Code (now beyond the header)

inc     eax                     ; Set exit function
int     0x80                    ; Call it

String section

msg     db      'Hello Docker world!', 10

It’s also worth pointing out that this code won’t really be “compiled”: what is written above describes the binary output directly, so nasm will simply take the text and write out the bytes exactly as specified.

Build and run the executable with:

$ nasm -f bin ./tiny_hello_docker.asm -o hello_docker_world
$ chmod +x ./hello_docker_world
$ ./hello_docker_world
Hello Docker world!
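To bring this back to the original challenge, the hand-crafted binary drops straight into a SCRATCH image; a minimal sketch (the image tag is my own choosing):

$ cat > Dockerfile <<EOF
FROM scratch
ADD hello_docker_world /
CMD ["/hello_docker_world"]
EOF
$ docker build -t hello:asm .
$ docker run -it --rm hello:asm
Hello Docker world!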

Further Reading

This Wikipedia article covers the ELF standard in the most readable way I’ve come across: https://en.wikipedia.org/wiki/Executable_and_Linkable_Format

A much more in-depth overview of hiding things in the ELF headers is available here: http://www.muppetlabs.com/~breadbox/software/tiny/teensy.html

Raspberry Pi with Docker

I’ve put off purchasing Raspberry Pis for a few years as I was pretty convinced that the novelty would wear off very quickly and they would be consigned to the drawer of random cables and bizarre IT equipment I’ve collected over the years (parallel cables and Zip drives o_O).

The latest iteration of the Pi is the v3, which comes with 1GB of RAM and four ARM cores, and it turns out that whilst it’s not exactly a computing powerhouse, it can still handle enough load to do a few useful things.

Raspberry Pis

I’ve currently got a pair of them in a Docker swarm cluster (Docker 1.12-rc3 for armv7l, available here), which has given me another opportunity to really play with Docker and try to replace some of my Linux virtual machines with “lightweight” Docker containers.

First up: to ease working between the two Pis I created an NFS share for all my Docker files etc. I then decided that having my own Docker registry to share images between hosts would be useful, so on my first node I did a docker pull for the Docker registry container and attempted to start it. This is where I realised that the container would just continuously restart; a quick peer inside revealed that it has binaries compiled for x86_64, not armv7l, so that clearly wasn’t going to work here. That chalks up failure number one for a pure Raspberry Pi Docker cluster, as my registry had to be run from a CoreOS virtual machine.

Once that was up and running, my first attempt to push an image from the Pis resulted in the error message:

https://X.X.X.X:5000/v1/_ping: http: server gave HTTP response to HTTPS client

After some googling, it appears that this issue is related to the registry being insecure; it can be resolved properly by going down the certificate-generation route. To live with the insecurity instead, the Docker daemon needs starting with some parameters that allow the use of an insecure registry.

Note: the documentation online tells you to update the Docker configuration files and restart the Docker daemon; however, there appears to be a bug in the Raspbian/Debian implementation. For me, editing /etc/default/docker and adding DOCKER_OPTS='--insecure-registry X.X.X.X:5000' had no effect. This can be verified by looking at the output of $ docker info, as insecure registries are listed there.

To fix this I had to edit the systemd start up files so that it would start dockerd with the correct parameters.

$ sudo vim /lib/systemd/system/docker.service

...

ExecStart=/usr/bin/dockerd --insecure-registry X.X.X.X:5000 -H fd://
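The edited unit won’t be picked up until systemd reloads its files and the daemon is restarted:

$ sudo systemctl daemon-reload
$ sudo systemctl restart docker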

After restarting the daemon, I can successfully push/pull from the registry.

Finally: the first service I attempted to move to a lightweight container resulted in a day and a half of fiddling (clearly a lot of learning needed), although to clarify, this was due to me wanting some capabilities that weren’t compiled into the existing packages.

Moving bind into a container is, “in theory”, relatively simple:

  • Pull a base container
  • Pull the bind package and install it (apt-get install -y bind9)
  • Map a volume containing the configuration and zone files
  • Start the bind process and point it at the mapped configuration files (a rough sketch follows)
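Roughly speaking, and assuming an image with bind9 already installed (the image name and host paths here are purely illustrative), that boils down to:

$ docker run -d --name bind \
    -v /docker/bind:/etc/bind:ro \
    -p 53:53/udp -p 53:53/tcp \
    bind9:local /usr/sbin/named -g -c /etc/bind/named.conf

The -g flag keeps named in the foreground, which is what Docker expects of a container’s main process.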

All of this can be automated through the use of a Dockerfile like this one. However, after further investigation and a desire to monitor everything as much as possible, it became apparent that the statistics-channel on bind 9.9 wouldn’t be sufficient and I’d need 9.10 (mainly for JSON output). After creating a new Dockerfile that adds in the unstable repos for Debian and pulls the bind 9.10 packages, it turns out that Debian compiles bind without libjson support 🙁 meaning that JSON output was disabled. This was the point where Docker and I started to fall out, as a combination of Docker’s layered filesystem and docker build’s inability to use --privileged or the -v (volume) parameter got in my way. This resulted in me automating a Docker container build that did the following:

  • Pull a base container
  • Pull a myriad of dev libraries, compilers, make toolchains etc.
  • Download the bind 9.10 tarball and untar it
  • Change the WORKDIR and run ./configure with all of the correct flags
  • make install
  • Delete the source directory and tarball
  • Remove all of the development packages and compilers

This resulted in a MASSIVE 800MB Docker image just to run bind 🙁 In order to shrink it I attempted a number of alternative methods, such as using an NFS mount inside the container to hold all of the source code for compiling, which wouldn’t be needed once make install had run. However, as mentioned, NFS mounts (which require --privileged) aren’t allowed with docker build, and neither is passing one through with a -v flag, as that doesn’t exist for building. This left me with the only option of manually creating a Docker container for bind: start a base container with a volume of already-“made” binaries (along with the required shared libraries) passed through, and run make install inside it. This container could then be exited and committed as an image for future use, and was significantly smaller than the previous image.
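The manual flow looked roughly like this (the names and paths are illustrative, and the volume contains binaries that were already compiled elsewhere):

$ docker run -it --name bind-install -v /nfs/bind-build:/build debian /bin/bash
root@bind-install:/# cd /build/bind-9.10 && make install
root@bind-install:/# exit
$ docker commit bind-install bind:9.10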


There are a number of issues on the Docker GitHub page around docker build not supporting volume mapping or privileged features. However, the general response is that the feature requests which would have assisted my deployment are edge cases and won’t be part of Docker any time soon.

Still, I got there in the end, and with a third Pi in the post I’m looking forward to moving some more systems onto my Pi cluster and coding some cool monitoring solutions 😀

HP OneView Automation through the API

I’ve had the opportunity to head to some exciting places over the last few weeks/months, and especially in the more recent weeks I’ve been heading up and down the country on a regular basis. This has given me time whilst sat on the train “yay!” to really spend some time playing around with HP OneView. I’ve already had a go at wrapping some of the API in Objective-C and decided to make something a little bit more useful.

I probably should have done my development work in a language that is a little more recent, something such as Python; however I stuck with a 43-year-old (at time of writing) programming language… C. This does give me the option of porting it to anything with a C compiler and libcurl, so the option is there 🙂 I’ve also made use of the jansson library, which is fantastic for manipulating and reading JSON (http://www.digip.org/jansson/).

So, what I’ve ended up with is a simple tool that can plug into automation tools pretty easily (Chef, Puppet, Ansible, I’m looking at you), that can interact with HP OneView and do some simple reporting in JSON or tab-delimited output. It can also do some things that currently aren’t available… such as interacting with multiple HP OneView instances!

I’d full screen this before clicking play… 🙂

This is a quick example of pulling some details from one instance (Enclosure-Groups, Server Hardware Type) and using that to move a server profile from one HP OneView to another..

EVO:RAIL – LoudMouth aka Zeroconf

What is Zeroconf?

Zeroconf was first proposed in November 1999 and finalised in 2003, and has found its largest adoption in Mac OS products, nearly all networked printers, and other network device vendors. The most obvious and recognisable implementation of zeroconf is Bonjour, which has been part of Mac OS since version 9 and is used to provide a number of shared network services. The basics of zeroconf are explained quite simply on zeroconf.org with the following (abbreviated) statement: “making it possible to take two laptop computers, and connect them … without needing a man in a white lab coat to set it all up for you”.

Basically zeroconf allows a server/appliance or client device to discover one another without any networking configuration. It is comparable to DHCP in some regards in that a computer with no network configuration can send out a DHCP request (essentially asking to be configured by the DHCP server), the response will be an assigned address and further configuration allowing communication on the network. Where it differs is that zeroconf also allows for advertisement of services (time capsule, printer services, iTunes shared libraries etc.), it also can advertise small amounts of data to identify itself as a type of device.
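If you want to play with the advertising side on Linux, the avahi tools make it a one-liner; for example, publishing a made-up service type with a small TXT record (the name, type, port, and record here are all arbitrary):

$ avahi-publish -s "my-appliance" _mytype._tcp 8080 "version=1.0"
Established under name 'my-appliance'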

A Time machine advertisement over zeroconf: (MAC address removed)

[dan@mgmt ~]$ avahi-browse -r -a -p -t | grep TimeMachine
+;eth0;IPv4;WDMyCloud;Apple TimeMachine;local
=;eth0;IPv4;WDMyCloud;Apple TimeMachine;local;WDMyCloud.local;192.168.0.249;9;"dk0=adVN=TimeMachineBackup,adVF=0x83" "sys=waMA=00:xx:xx:xx:xx:xx,adVF=0x100"

How the EVO:RAIL team are using Zeroconf

From recollection of the deep-dive sessions, I may have mistaken the point (corrections welcome).

Zeroconf has found the largest adoption in networked printers and Apple Bonjour services; however, in the server deployment area a combination of DHCP and MAC address matching is more commonly used (Auto Deploy, or kickstart from PXE boot).

The EVO:RAIL team have implemented a Zeroconf daemon that lives inside every vSphere instance and inside the VCSA instance. The daemon inside the VCSA wasn’t really explained however the vSphere daemon instances allow the EVO:RAIL engine to discover them and take the necessary steps to automate their configuration.

Implementing Zeroconf inside vSphere(esxi)

The EVO:RAIL team had to develop their own zeroconf daemon, named loudmouth, which is coded entirely in Python. The reason behind this was explained in one of the technical deep dives: the majority of pre-existing zeroconf implementations have dependencies on various Linux shared libraries.

/lib # ls *so | wc -l
86
/lib # uname -a
VMkernel esxi02.fnnrn.me 5.5.0 #1 SMP Release build-1331820 Sep 18 2013 23:08:31 x86_64 GNU/Linux
....
[dan@mgmt lib]$ ls *so | wc -l
541
[dan@mgmt lib]$ uname -a
Linux mgmt.fnnrn.me 3.8.7-1-ARCH #1 SMP PREEMPT Sat Apr 13 09:01:47 CEST 2013 x86_64 GNU/Linux

As the quick example above shows (32-bit libs), a vSphere instance contains only a few ELF-based libraries providing a limited subset of shared functionality. This means that whilst ELF binaries can be moved from a Linux distribution over to a vSphere instance, the chances are that a requirement on a shared library won’t be met. Furthermore, building a static binary possibly won’t help, as the VMkernel (VMware’s kernel implementation) doesn’t implement the full set of Linux syscalls; this makes sense, as it’s not an OS implementation and the userland area of vSphere is purely for management of the hypervisor. The biggest issue for a zeroconf implementation, which relies on UDP and datagrams, is the lack of implementation of IP_PKTINFO.

This rules out avahi, Zero Conf IP (zcif), and linux implementations of mDnsResponder.

What about loudmouth?

Unfortunately it is yet to be said if any components of EVO:RAIL will be open sourced or back ported to vSphere, so whilst VMware have a zeroconf implementation for vSphere it is likely it will remain proprietary.

What next…

I’ve improved on where I was with my daemon; however, I’m hoping to upload it to GitHub sooner rather than later. Unfortunately work has occupied most of the weekend and most evenings so far… that, tied with catching up on episodes of Elementary and dealing with endless segfaults as I add any simple functionality, has slowed progress more than I was expecting.

Also I decided to finish writing up this post, which took most of this evening 😐

Debugging on vSphere

A summary of what to expect inside vSphere can be read here and there is no point duplicating existing information (http://www.v-front.de/2013/08/a-myth-busted-and-faq-esxi-is-not-based.html). More importantly when dealing with the vSphere userland libraries or more accurately lack of, then the use of strace is hugely valuable. More details on strace can be found here (http://dansco.de/doku.php?id=technical_documentation:system_debugging).

Update to the sshwrapper

Had quite a few emails recently about using the ssh wrapping class I wrote aaaages ago. I’ve traded a couple of emails back and forth.. and decided that it would be easier for everyone if I just updated these old classes.

So the changes:

  • Added DFSSHConnectionType, this class is used to define how ssh will attempt to connect (password/key/keyboard)
  • Moved everything to a namespace (DF)
  • ARC
  • Tidied up the code, and sorted an issue with CStrings making a mess when converting to an NSString
  • Other things I did ages ago.. (no idea)

It’s uploaded to GitHub… let me know if there are any problems.

https://github.com/thebsdbox/DFSSHWrapper


[UPDATE]: Added the ability to place a timeout on a command sent over ssh…

LVM2 device names and device paths

If you use LVM (Linux volume manager) you’ll no doubt be aware that the ability to place your volumes in groups and name the volumes accordingly makes administration much easier. Having a volume group called oracle and then binaries and data volumes means a quick glance at a system identifies what is doing what (or so I was under the impression). When it comes to doing any sort of low-level administration or system metrics, you may notice that device paths change: use fdisk or look at /proc metrics and you’ll suddenly come across dm-0 etc. devices, which clearly are the LVM devices under another name. Scanning quickly through a ton of web pages and the lvm2 documentation turned up nothing.

Then I realised that the first two columns in /proc/diskstats were the major/minor device numbers, which, when compared with the entries in /dev/mapper, allowed me to match up the devices. This can also be achieved by looking at the major/minor numbers from lvdisplay and dmsetup ls; however, the latter two tools require root privileges, so they clearly weren’t going to work for my current project.

Turned out that the simplest method is to look through the sysfs file hierarchy as the block devices can identify other names that they have. The following command will iterate all of the block devices with alternative names and print them in a format that can be piped into anything else, or in my case be read easily into an NSArray.

grep -H "." /sys/devices/virtual/block/*/dm/name \
| sed 's/\/sys\/devices\/virtual\/block\///g' \
| sed 's/\/dm\/name:/ /g'
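On a host with the oracle volume group described earlier, the output comes out as simple pairs (your names will obviously vary):

dm-0 oracle-binaries
dm-1 oracle-data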


Getting files through a terminal window

This is a technique I had to use numerous times for a previous job where I would need to transfer files from servers that could not be connected to directly. In the majority of companies there will be numerous networks, where “jump boxes” are required to get across various networks and get to the server in question.

The usual approach of moving files to servers would be through a variety of means (FTP/SCP/NFS/CIFS etc.); however, with numerous jump boxes and networks this means having to transfer the file across each jump/network. It is possible to use copy and paste, however that only works for text files; trying to display or copy binary data can cause all manner of issues and really mess up your terminal window. So for me the best solution is to uuencode the binary data into plain text, which can be safely copied out of the terminal window, pasted into a file on your local machine, and uudecoded back to binary data again.

To do this simply UUEncode a file, and the output will be presented to STDOUT i.e. the terminal window.

$ uuencode test.rpm test.rpm

Note: the double typing of the name is required, as the first argument is the file to uuencode whilst the second argument is the name of the file that will be written out on decode. The output is presented to STDOUT, as shown in this example:

begin 644 test.rpm
M"GP]+2TM+2TM+2T]6R!796(@=G5L;F5R86)I;&ET:65S('1O(&=A:6X@86-C
M97-S('1O('1H92!S>7-T96T@73TM+2TM+2TM+2T]?`I\/2TM+2TM+2TM+2TM

The next step is simply to copy everything from the word “begin” to the word “end” out of the terminal window and paste it into a text editor of your choice on your local machine under a temporary name. This file then needs opening with uudecode, which will process the text and spit out the file under the filename specified at encode time.

$ uudecode temporary.uua

In the same location will be the decoded file.
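As a quick sanity check that the file survived the journey intact, it’s worth comparing checksums on both ends:

$ md5sum test.rpm    # run on both the remote server and your local machine; the hashes should match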

ESXi v4.1 SFTP access

[UPDATE]

This is a 32-bit binary, which I think needs some pretty old kernel version; hence it only works on 4.0. I will try to get an updated release for 4.1. (Note: ESXi 5 comes with sftp-server already.)


I came across something interesting while fiddling earlier, after spending about two hours building a static release of the OpenSSH server that was going to replace dropbear. I’d gotten to a point where I could build an i386 release of the binaries with no random library requirements, and sshd would start and listen on the port defined in /etc/ssh/sshd_config. Unfortunately, starting sshd in debug mode allowed me to see numerous glibc errors during connections, which explained why I couldn’t connect. At this point I don’t think there is any real way of replacing dropbear with a complete OpenSSH solution, even statically linked. Even testing the OpenSSH sftp binary that had been compiled showed that it wasn’t coping with a system call not returning UIDs correctly, meaning it would report a FATAL error and close continually.

Given OpenSSH wasn’t going to be replaced, I researched dropbear and whether there was a newer version, perhaps with SFTP; unfortunately not. Eventually I came across notes on a blog mentioning that dropbear “supports” the OpenSSH sftp-server. After restoring ESXi back to its default filesystem settings (SSH enabled), it appears that attempting to sftp to ESXi returns the following error:

ash: /sbin/sftp-server: not found
Connection closed

After compiling a slightly older version of OpenSSH (static), I found a release of sftp-server that, once placed in /sbin on ESXi, allows full usage of SFTP (including sshfs mounting); binary below.

sftp-server.tar.gz


File access in ESXi 4.1 (nfs and tcp)

I’ve had numerous occasions where I’ve needed to upload files to the actual file systems on an ESXi system. The only ‘proper’ method is using the abysmal Virtual Infrastructure Client, and working mainly on a Mac means I need to use VMware Fusion to run Windows just to run the client and connect to the server (overkill). It’s possible to enable SSH access to the server using the tech support menu, which allows access to the underlying hypervisor and its file systems; it’s therefore possible to scp files to the filesystems, but again this is quite slow and overkill due to the encryption being used. Also, because dropbear is used for SSH, there is no SFTP, which means that you can’t mount the filesystems à la FUSE and sshfs.

I should say at this point that the goal of all this was to allow me to keep all my ISOs on one server and be able to access them from everywhere; I also wanted a PXE server to be able to access the ISOs, loopback mount them, and then present the contents via NFS to the installers started by PXE.

So looking around I found some FTP binaries that should work on ESXi. Given that console access for ESXi is done with busybox, there is no file command to determine the binary type, so I was unaware of which binaries I could run inside ESXi. This all worked fine following the instructions located on the vm-help.com website here, although a few of the instructions are a little incorrect, such as the path to tcpd in inetd; I’ll leave you to fix that. On the PXE server, using FUSE again with curlftpfs to mount the filesystem revealed a glaring bug as soon as I loopback mounted the first ISO. Unfortunately the problem lies in the fact that curlftpfs uses memory to store the file as it downloads it for access by FUSE, so trying to open a 4GB DVD ISO quickly exhausted my PXE server’s memory and it became unresponsive. Great.

Further research turned up a blog post about someone trying to use unfs to enable NFS sharing between two ESXi boxes; more specifically, it was mentioned that Linux binaries would work fine in the ESXi service console. One thing that was slightly confusing is that ESXi is x86_64 (64-bit); however, the binaries you need for the service console have to be 32-bit, otherwise you’ll get a confusing error that the binaries can’t be found when you try to run them, due to busybox’s odd handling of errors. I present below the binaries required for NFS in ESXi:

nfs binaries for x86

These are pretty easy to use: scp the file over to ESXi and untar it in /tmp; all that’s left is to place the files from /tmp/sbin into /sbin and the files from /tmp/etc into /etc. The /etc/exports contains one entry to give access to /vmfs/volumes, which means that accessing the NFS share will give you the UUID paths for the disks containing VMs and ISOs. To start the NFS server, start portmap first and then start unfsd, which should be started as unfsd -d & because unfsd isn’t able to detach from the console on startup (something to do with busybox, I assume).
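Sketched end to end, the setup looks something like this (the tarball name is illustrative, and the '#' lines are run on the ESXi console):

$ scp nfs-binaries.tar.gz root@esxi:/tmp/
$ ssh root@esxi
# cd /tmp && tar xzf nfs-binaries.tar.gz
# cp /tmp/sbin/* /sbin/ && cp /tmp/etc/* /etc/
# portmap
# unfsd -d &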

One final note: once another machine connects to the NFS share, portmap will start using 50-70% CPU and will need stopping and starting for other NFS clients. I’m still looking into this; however, having a cron job to restart the process every few minutes should do the job.