In pursuit of a tinier binary-(er)

… yes that was an attempt to make the title rhyme ūüôĀ

tl;dr¬†make an executable smaller by hiding your code inside the header of the executable… read on for the gory detail.

There was a great post recently from¬†Dieter Reuter¬†around building the smallest possible Docker Image, which I thought posed an interesting idea mainly due to some of the crazy sizes of Docker images I keep having to deal with. I decided that I would join in with the challenge and see where I could shrink both a binary and by association the resulting Docker Container down to it’s smallest possibility.

I created a number of binaries during my playing around, below is a list of five of them that all print to STDOUT the following text "Hello Docker World!\n"¬†If you’re not familiar with escaped characters, the ‘\n’ simply is the newline character.

*I realise I capitalised some of the strings by accident, but ‘h’ still occupies the same space as ‘H’ ūüėČ

Initial failure

Before I delve into the steps I went through to make the small container, it’s worth pointing out that there is a fatal flaw in one of the above binaries when placed in a SCRATCH container. *hint* there is a duplicate binary with the suffix _STATIC ūüôā

The reason that the hello_in_C will fail to run in the SCRATCH container is that it has dynamic requirements on a number of system libraries. Most notably is libc, which is the base C library that contains a lot of basic day-to-day code that provides the standard functionality to C programs. If we were to place this into a Docker container the following would be the result:

$ docker run -it --rm hello:C
standard_init_linux.go:178: exec user process caused "no such file or directory"

We can examine binaries to check for external dependencies using the ldd tool to see what external libraries are needed to run the binary file. Alternatively, we can use volume mapping to pass the host Operating System libraries into the SCRATCH container -v /lib64:/lib64:ro, this will provide the libraries required for this particular executable to successfully execute.

docker run -v /lib64:/lib64:ro -it --rm hello:C
Hello Docker World!

To permanently fix this issue is quite simple and requires building the C binary with the -static compile-time flag (the package glibc-static will be required), this quite simply will bundle all code into a single file instead of relying on external libraries. This has to knock on effect of making the binary easier to run on other systems (as all code is in one place) however the binary has now increased in size by 100¬†times… which is the opposite of what we’re trying to accomplish.

What makes an Executable

Ignoring MS-DOS .com files that no-one has touched and hasn’t been supported in years, most executables regardless of Operating System typically consist of a header that identifies the executable type (e.g. elf64, winPE) ¬†and a number of sections:

  • .text, code that can be executed
  • .data, static variables
  • .rodata, static constants
  • .strtab /.shstrtab, string tables
  • .symtab, symbol tables.

The Executable header will contain an entry that points to the beginning of the .text section, which the Operating System will then use when the executable is started to find the actual code to run. This code then will access the various bits of data that it needs from the .data or .rodata sections.

Basic overview of a “Hello Docker World!” execution process

  1. The exec() family functions will take the path of file and attempt to have the OS execute it.
  2. The Operating System will examine the header to verify the file, if OK it will examine the header structure an find the entry point.
  3. Once the entry point is found, the operating system will start executing the code from that point. It is at this point where the program itself is now running.
  4. The program will set up for the function to write the string to stdout
    1. Set string length
    2. Set the pointer to the string in the .data section
    3. Call the kernel
  5. Call the exit function (otherwise the kernel will assume the execution failed)

Strip out Sections

In the quick above diagram, we can see through the course of execution that there is a number of sections within the executable that aren’t needed. In most executables there may be debug symbols or various sections that apply to compilers and linkers that are no longer required once the executable has been put together.

In order to have a stripped executable, it can either be compiled with the -s flag (also make sure -g isn’t used, as this adds¬†debug sections). Alternatively we can use the strip tool that has the capability to remove all non-essential sections.

$ strip --strip-all ./hello_in_C_STRIPPED
$ ls -la hello_in_C_ST*
-rwxrwxr-x. 1 dan dan 848930 Feb 28 15:35 hello_in_C_STATIC
-rwxrwxr-x. 1 dan dan 770312 Feb 28 18:07 hello_in_C_STRIPPED

With languages such as GO, there can be significant savings by stripping any sections that aren’t essential (although if you’re doing this for production binaries it should be part of your compile/make process).

Extreme Shrinking

The final option¬†that will keep your hands clean for shrinking an executable is to make use of tools like UPX¬†which adds a layer of compression to your executable shrinking what’s left of your stripped binary. Taking my original GO binary I went from:

  • go build hello_docker_world.go = 1633717 bytes
  • strip --strip-all = 1020296¬†bytes
  • upx = 377136¬†bytes

Clearly a significant saving in terms of space.

Getting your hands dirty

Everything that has been discussed so far has been compiled through standard build tools and modified with the compiler or OS toolchain that managed executables. Unfortunately we’ve reached as far as we can go with these tools, as they will always build to the ELF/OS standards and always create the sections that they deem required.

In order to build a smaller binary, we’re going to have to move away from¬†the tools that make building executables easier and hand craft a tailored executable. Instead of sticking with the format of [header][code][data], we’re going to look at how we can hide our code inside the header.

Whilst there are some parts of the header that are a requirement, there are some that have to just be a non-zero value and others that are left blank for future use. This is going to allow us to simply change entries in the ELF header from legal values to the code we want to execute and the following will happen:

  1. The Operating System will be asked to execute the file
  2. The OS will read the ELF header, and verify it (even though some values don’t make sense)
  3. It will then find the code entry point in the header that points to the middle of the actual header ūüôā
  4. The OS will then start executing from that point in the header, and run our code.


Explained Code below

This code pretty much fits just in the ELF¬†header itself, so I have¬†broken the header up and labelled the header fields and where we’ve hidden the code we want to execute.

First part of header (has to be correct)

org     0x05000000 Set Origin address
db      0x7F, "ELF" Identify as ELF binary
dd      1 32-bit
dd      0 Little endiann
dd      $$ Pointer to the beginning of the header
dw      2 Code is executable
dw      3 Instruction set (x86)
dd      0x0500001B
dd      0x0500001B Entry point for our code (section below)
dd      4

Broken Header / Our code

mov     dl, 20 Address of Sections header  Take 20 characters
mov     ecx, msg  From the string at this address
int     0x80 Elf Flag table  Print them

Remaining header (has to be correct)

db      0x25  Size of the Elf Header
dw      0x20  Size of the Program Header
dw      0x01  Entries in the Program Header

Remaining Code (now beyond the header)

inc      eax  Set Exit function
int      0x80 Call it

String section

msg     db      'Hello Docker world!', 10

It’s also worth pointing out that this code won’t be fully “compiled”, as what is written above is actually binary format and therefore nasm will take the text and¬†write out the binary code directly as written above.

Build and run the executable with:

$ nasm -f bin ./tiny_hello_docker.asm -o hello_docker_world
$ chmod +x ./hello_docker_world
Hello Docker world!

Further Reading

This wikipedia article covers all of the ELF standard in the most readable way i’ve come across:¬†

A much more in-depth overview of hiding things in the ELF headers is available here:

Leave a Reply

Your email address will not be published. Required fields are marked *