A peek inside Docker for Mac (Hyperkit, wait xhyve, no bhyve …)

It’s no secret that the code behind Docker for Mac ultimately comes from bhyve, the FreeBSD hypervisor, and the people who have taken the time to bring it to the Darwin (Mac) platform have done a great job of tweaking the code to handle the design decisions that underpin Apple’s operating system.

Recently I noticed that the bhyve project had released code for the E1000 network card, so I decided to take the hyperkit code and see what was required in order to add in the PCI code. What follows is a (rambling and somewhat incoherent) overview of what was changed to move from bhyve to hyperkit, and some observations to be aware of when porting further PCI devices to hyperkit. Again, please be aware I’m not an OS developer or a hardware designer, so some of this is based upon a possibly flawed understanding… feel free to correct or teach me 🙂

Update: I’ve already heard from @justincormack that Docker for Mac uses vpnkit, not vmnet.

VMM Differences

One of the key factors behind the portability of bhyve to OSX is that the Darwin kernel is loosely based upon the same lineage as the kernel that powers FreeBSD (family tree from Wikipedia here), which means that a lot of the kernel structure and API calls aren’t too different. However, OSX is aimed at the consumer market rather than the server market, and as OSX has matured Apple has stripped away some of the kernel functionality that ships by default. The obvious example is the removal of TUN/TAP devices from the kernel (they can still be exposed by loading a kext (kernel extension)), which, although problematic, hyperkit has a solution for.

VM structure with bhyve

When bhyve starts a virtual machine it creates the structure of the VM as requested (allocates vCPUs, allocates the memory, constructs PCI devices etc.); these are then attached to device nodes under /dev/vmm, and the bhyve kernel module handles the VM execution. Being able to examine /dev/vmm/ also provides a place for administrators to see which virtual machines are currently running, and allows those VMs to continue running unattended.

Internally the bhyve userland tools make use of virtual machine contexts that link the VM name to the internal kernel structures that are running the VM instance. This allows a single tool to run multiple virtual machines, which is what you typically see from VMMs such as Xen, KVM or ESXi.

Finally, the networking configuration that takes place inside of bhyve… Unlike the OSX kernel, FreeBSD typically comes prebuilt with support for TAP devices (if not, the command kldload if_tap is needed). Simply put, using a TAP device greatly simplifies the handling of guest network interfaces. When an interface is created with bhyve, a PCI network device is created inside the VM and a TAP device is created on the physical host. When network frames are written to the PCI device inside the VM, bhyve actually writes these frames onto the TAP device on the physical host (using the standard read() and write() functions on file descriptors), and those packets are then broadcast out on the physical interface on the host. If you are familiar with VMware ESXi then the concept is almost identical to the way a vSwitch functions.
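To make that concrete, the tap backend really does boil down to ordinary file I/O. Here is a minimal sketch (illustrative only, not the actual bhyve source; the device name and buffer size are my own assumptions):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* On FreeBSD a tap device shows up as a character device */
    int tapfd = open("/dev/tap0", O_RDWR);
    if (tapfd < 0) {
        perror("open /dev/tap0");
        return 1;
    }

    unsigned char frame[2048];

    /* Each read() returns one complete Ethernet frame from the host side... */
    ssize_t len = read(tapfd, frame, sizeof(frame));
    if (len > 0) {
        /* ...and each write() injects one frame back, e.g. a frame the guest
         * has placed in the emulated NIC's transmit ring. */
        write(tapfd, frame, (size_t)len);
    }

    close(tapfd);
    return 0;
}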

bhyve Network

VM Structure with Docker for Mac (hyperkit)

So the first observation with the architecture of hyperkit is that all of the device node code for /dev/vmm/ has been removed, which has the effect of making virtual machines process-based. This means that when hyperkit starts a VM it will malloc() all of the requested memory etc. and become the singular owner of the virtual machine; essentially, killing the hyperkit process kills the VM. Internally, all of the virtual machine context code has been removed, because the hyperkit process to VM relationship is now a 1:1 association.

The decision to remove all of the context code (instead of, say, always tagging everything to a single VM context) requires noticeable changes to every PCI module that is added or ported from bhyve, as the bhyve code is all based on creating these emulated devices and applying them to a particular VM context.

To manage VM execution, hyperkit makes use of Apple’s Hypervisor.framework, which is a simplified framework for creating vCPUs, passing in mapped memory and running an execution loop.
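As a rough illustration of what that looks like from C, here is a minimal sketch of the Hypervisor.framework lifecycle (based on the public x86 API; loading guest code and the segment/VMCS setup are deliberately elided, and error handling is reduced to asserts):

#include <Hypervisor/hv.h>
#include <assert.h>
#include <stdlib.h>

#define GUEST_MEM_SIZE (1 << 20)   /* 1MB of guest "RAM" for the sketch */

int main(void)
{
    /* Create the VM for this process (one VM per process, as in hyperkit) */
    assert(hv_vm_create(HV_VM_DEFAULT) == HV_SUCCESS);

    /* Allocate page-aligned host memory and map it into the guest at GPA 0 */
    void *mem = valloc(GUEST_MEM_SIZE);
    assert(hv_vm_map(mem, 0, GUEST_MEM_SIZE,
                     HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC) == HV_SUCCESS);

    /* Create a vCPU -- hyperkit runs one host thread per vCPU */
    hv_vcpuid_t vcpu;
    assert(hv_vcpu_create(&vcpu, HV_VCPU_DEFAULT) == HV_SUCCESS);

    /* Real code would now copy guest code into 'mem' and initialise the
     * control/segment registers and VMCS fields -- elided here. */

    for (;;) {
        assert(hv_vcpu_run(vcpu) == HV_SUCCESS);
        /* On return, inspect the exit reason (I/O access, HLT, EPT fault...)
         * and emulate the device access before looping again. */
        break;   /* bail out immediately in this sketch */
    }

    hv_vcpu_destroy(vcpu);
    hv_vm_unmap(0, GUEST_MEM_SIZE);
    hv_vm_destroy();
    free(mem);
    return 0;
}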

Finally, there are the changes around network interfaces. From inside the virtual machine the same virtio devices are created as would be created on bhyve; the difference is in linking these virtual interfaces to a physical interface, as with OSX there is no TAP device that can be created to link virtual and physical. So there currently exist two methods to pass traffic between virtual and physical hosts: the virtio to vmnet (virtio-vmnet) and the virtio to vpnkit (virtio-vpnkit) PCI devices. These both use the virtio drivers (specifically the network driver) that are part of any modern Linux kernel and then hand over to the backend of your choice on the physical system.

It’s worth pointing out here that the vmnet backend was the default networking method for xhyve and it makes use of the vmnet.framework, which, as mentioned by other people, is rather poorly documented. It also slightly complicates things by its design, as it doesn’t create a file descriptor that would allow the existing simple code to read() and write() from it, and it also requires elevated privileges to make use of.
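For comparison, this is roughly the shape of the vmnet.framework side (a hedged sketch from the public headers rather than the hyperkit source: an opaque interface_ref plus completion blocks instead of a plain file descriptor, and the binary needs to run with the appropriate privileges):

#include <vmnet/vmnet.h>
#include <dispatch/dispatch.h>
#include <stdio.h>

int main(void)
{
    /* Describe the interface we want: shared (NAT-style) networking */
    xpc_object_t desc = xpc_dictionary_create(NULL, NULL, 0);
    xpc_dictionary_set_uint64(desc, vmnet_operation_mode_key, VMNET_SHARED_MODE);

    dispatch_queue_t q = dispatch_queue_create("sketch.vmnet", DISPATCH_QUEUE_SERIAL);
    dispatch_semaphore_t sem = dispatch_semaphore_create(0);

    /* No file descriptor is handed back -- we get an opaque interface_ref and
     * the result arrives asynchronously in a completion block. */
    interface_ref iface = vmnet_start_interface(desc, q,
        ^(vmnet_return_t status, xpc_object_t params) {
            if (status != VMNET_SUCCESS)
                fprintf(stderr, "vmnet_start_interface failed: %d\n", status);
            dispatch_semaphore_signal(sem);
        });
    dispatch_semaphore_wait(sem, DISPATCH_TIME_FOREVER);

    /* Frames are then moved with vmnet_read()/vmnet_write() using
     * struct vmpktdesc and iovecs, normally driven by an event callback
     * registered through vmnet_interface_set_event_callback(). */
    (void)iface;
    return 0;
}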

With the work that has been done by the developers at Docker, a new alternative method for communicating from virtual network interfaces to the outside world has been created. The solution from Docker has two parts:

  • The virtio-vpnkit device inside hyperkit that handles the reading and writing of network data from the virtual machine
  • The vpnkit component that has a full TCP/IP stack for communication with the outside world.

(I will add more details around vpnkit when I’ve learnt more … or learnt OCaml, whichever comes first)

Networking overviews

bhyve overview (TAP devices)

bhyve_traffic

xhyve/hyperkit overview (VMNet devices)

hyperkit_traffic

 

Docker for Mac / hyperkit overview (vpnkit)

docker_traffic

 

Porting (PCI devices) from bhyve to hyperkit

All of the emulated PCI devices adhere to a defined set of function calls, along with a structure that holds pointers to those functions and a string that identifies the name of the PCI device (memory dump below).

pci_functions

pci_emul_finddev(emul) will look for a PCI device by name (e.g. E1000, virtio-blk, virtio-net) and then manage the calling of its pe_init function, which initialises the PCI device and adds it to the virtual machine’s PCI bus as a device that the guest operating system can use.
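The shape of that structure and registration looks something like the following; this is a simplified, illustrative sketch rather than the exact bhyve/hyperkit definitions (the field names are close to the real ones, the e1000 stubs are hypothetical):

#include <stdint.h>

struct pci_devinst;                    /* opaque here; defined in pci_emul.h */

/* Each emulated PCI device provides a name plus a table of callbacks
 * (simplified -- the real structure has more entries). */
struct pci_devemu {
    char *pe_emu;                      /* name matched by pci_emul_finddev() */
    int (*pe_init)(struct pci_devinst *pi, char *opts);
    void     (*pe_barwrite)(struct pci_devinst *pi, int baridx,
                            uint64_t offset, int size, uint64_t value);
    uint64_t (*pe_barread)(struct pci_devinst *pi, int baridx,
                           uint64_t offset, int size);
};

/* Hypothetical device implementation stubs */
static int      e1000_init(struct pci_devinst *pi, char *opts)  { return 0; }
static void     e1000_write(struct pci_devinst *pi, int baridx,
                            uint64_t offset, int size, uint64_t value) { }
static uint64_t e1000_read(struct pci_devinst *pi, int baridx,
                           uint64_t offset, int size) { return 0; }

/* The real code registers this into a linker set (via a PCI_EMUL_SET-style
 * macro) that pci_emul_finddev() walks when parsing the -s options. */
static struct pci_devemu pci_de_e1000 = {
    .pe_emu      = "e1000",
    .pe_init     = e1000_init,
    .pe_barwrite = e1000_write,
    .pe_barread  = e1000_read,
};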

Things to be aware of when porting PCI devices are:

  • Removing VM-context-aware code; as mentioned, it is now a 1:1 association between a hyperkit process and a VM (see the sketch after this list).
    • This also includes tying up paddr_guest2host(), which maps guest physical addresses into the host process’s address space.
  • Moving networking code from using TAP devices with read() and write() to making use of the vmnet framework.
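As an example of the first point, the kind of signature change involved looks roughly like this (illustrative declarations only, not the exact hyperkit diff):

#include <stddef.h>
#include <stdint.h>

struct vmctx;   /* bhyve's handle for "which VM am I operating on" */

#ifdef ORIGINAL_BHYVE
/* bhyve: every helper is told which VM context to operate on */
void *paddr_guest2host(struct vmctx *ctx, uintptr_t gaddr, size_t len);
#else
/* hyperkit: the process *is* the VM, so the context parameter goes away and
 * guest addresses resolve against the single memory region owned by the process */
void *paddr_guest2host(uintptr_t gaddr, size_t len);
#endif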

With regards to the E1000 PCI code, I’ve now managed to tie up the code so that the PCI device is created correctly and added to the PCI bus; I’m just struggling to fix the vmnet code (so feel free to take my poor attempt and fix it successfully 🙂 https://github.com/thebsdbox/hyperkit).

img_6620

 

Further reading

http://bhyve.org/bhyve-fosdem2013.pdf

https://wiki.freebsd.org/bhyve

https://github.com/docker/hyperkit

Chef and HPE OneView

HPETSS

Currently we’re about 50% of the way through 2016 and I’ve been very privileged to spend a lot of the year working with Chef, not just their software but also presenting with them throughout Europe. All of that brings us up to the current point, where last week I was presenting at HPE TSS (Technology Solutions Summit) on HPE OneView and Chef (picture above :-)). In the last six months I’ve worked a lot with the HPE OneView Chef Provisioning driver and have recently been contributing a lot of changes that have brought the driver version up to 1.20 (as of June 2016). I’ve struggled a little with the documentation around Chef Provisioning, so I thought it best to write up something about Chef Provisioning and how it works with HPE OneView.

Chef Provisioning

So quite simply, Chef Provisioning is a library specifically designed to allow Chef to automate the provisioning of server infrastructure (whether that be physical infrastructure, i.e. servers, or virtual infrastructure from vSphere VMs to AWS compute). This library provides machine resources that describe the logical make-up of a provisioned resource, e.g. operating system, infrastructure/VM template, server configuration.
The Provisioning library can then make use of drivers that extend functionality by allowing Chef to interact with specific endpoints such as vCenter or AWS. These drivers provide specific driver options that allow fine-grained configuration of a Chef machine.

To recap:

  • A machine resource defines a machine inside Chef and can also specify additional recipes that will be run on that machine.
  • Provisioning drivers extend a machine resource so that Chef can interact with various infrastructure providers. With HPE OneView, the driver provides the capability to log into OneView, create Server Profiles from Templates and apply them to server hardware.

Example Recipe:

machine 'web01' do
  action :allocate                       # Action to be performed on this server

  machine_options :driver_options => {   # Specific HPE OneView driver options
    :server_template => 'ChefWebServer', # Name of OneView Template
    :host_name       => 'chef-http-01',  # Name to be applied to Server Profile
    :server_location => 'Encl1, bay 11'  # Location of Server Hardware
  }
end

More information around the Chef Provisioning driver along with examples of using it with things like AWS, Vagrant, Azure, VMware etc. can be found on their GitHub site.

API Interactions

Some vendors have taken the approach of hosting automation agents (Chef clients) inside various infrastructure components such as network switches; I can only assume that this was the only method available that would allow infrastructure automation tools to configure those devices. The HPE OneView Unified API, on the other hand, provides a stable, versioned API that Chef and its associated Chef Provisioning driver can interact with (typically over HTTPS), without either side having to be maintained in lock-step for compatibility reasons.

The diagram below depicts recipes that make use of Chef Provisioning; these have to be run locally (using the -z flag) so that they can make use of the provisioning libraries and associated drivers that have to be installed on a Chef workstation. All of the machines provisioned will then be added into the Chef Server for auditing purposes etc.

HPEOneView

HP OneView – Part 2: Server Profiles

Apologies for the delay; I was busy…

What is a Server Profile

The “Server Profile” is the defining phrase that comes to mind when thinking about the SDDC (Software Defined Data Centre). It allows a server administrator to define a hardware identity or configuration (MAC addresses, WWNN, BIOS, Boot, RAID config etc.) in software and then apply this to a “blank” server.

This brings a number of key advantages:

  • Pre-designed hardware identities allow configuration tasks to be pre-provisioned before hardware deployment (SAN zoning, firewall ACLs etc..)
  • Designated addresses allow easier identification e.g.  aa:bb:cc:dd:01:01 = Management /  aa:bb:cc:dd:02:01 = Production
  • Server failure/replacement doesn’t require upstream changes, software identity (server profile) is applied and all previous configuration is still relevant.

Design

Following on from the previous HP OneView post, this is a continuation of the same simple VMware vSphere deployment. As before, a good design should exist before implementation, so again I’ve embedded a diagram detailing where and how these networks are going to be defined on the virtual interfaces of a blade.

VMware OneView Service Profiles

Quite Simply:

  • Two virtual interfaces defined for all of the Service networks.
  • Two virtual interfaces defined for the Production networks.
  • Two HBAs on each fabric, providing resilience for Fibre Channel traffic.

As mentioned, this is a simple design for a vSphere host but allows expansion in the future with the ability to define a further virtual interface on each physical interface inside the blade.

 

HP OneView – Part 1: Enclosures and Uplinks (logical or otherwise)

The initial configuration of HP OneView is a pretty simple and intuitive process; it just isn’t documented as well as it could be. I’ve decided to put together a few posts detailing some of the areas of configuration that could possibly do with slightly more detailed procedures. I’d expect that anyone who wishes to use anything documented here is already acquainted with Virtual Connect, Onboard Administrator, VMware vSphere, iLOs etc.

Design

Before any implementation work is carried out there has to be a design in place, otherwise what, and how, are you going to configure anything? The design for these posts will be a simple VMware environment consisting of a number of networks to handle the standard traffic one would expect (Production, vMotion etc.). At this point we have defined our networks, and these have been trunked by our network admin on the switches out to the C7000s we will be connecting to.

c7000 VMware

 

From this diagram you can see that the coloured VLANs represent the designated management/service traffic and the grey VLANs represent production traffic. All of these VLANs are trunked from the access switches down to the Virtual Connect modules located in the back of the C7000 enclosures.

 

Note: I’ve omitted SAN switches from this and just presented what would appear as zoned storage direct to the Virtual Connect. I may cover storage and Flat SAN at a later date if there is any request to do so.

This represents the external connectivity being presented to our enclosure; now it’s time for the logical configuration with HP OneView…

 

EVO:RAIL – LoudMouth aka Zeroconf

What is Zeroconf?

Zeroconf was first proposed in November 1999 and finalised in 2003, and has found its largest adoption in Mac OS products, nearly all networked printers, and other network device vendors. The most obvious and recognisable implementation of zeroconf is Bonjour, which has been part of Mac OS since version 9 and is used to provide a number of shared network services. The basics of Zeroconf are explained quite simply on zeroconf.org with the following (abbreviated) statement: “making it possible to take two laptop computers, and connect them … without needing a man in a white lab coat to set it all up for you”.

Basically, zeroconf allows a server, appliance or client device to discover others without any networking configuration. It is comparable to DHCP in some regards, in that a computer with no network configuration can send out a DHCP request (essentially asking to be configured by the DHCP server), and the response will be an assigned address and further configuration allowing communication on the network. Where it differs is that zeroconf also allows for the advertisement of services (Time Capsule, printer services, iTunes shared libraries etc.), and a device can also advertise small amounts of data to identify itself as a particular type of device.

A Time machine advertisement over zeroconf: (MAC address removed)

[dan@mgmt ~]$ avahi-browse -r -a -p -t | grep TimeMachine
+;eth0;IPv4;WDMyCloud;Apple TimeMachine;local
=;eth0;IPv4;WDMyCloud;Apple TimeMachine;local;WDMyCloud.local;192.168.0.249;9;"dk0=adVN=TimeMachineBackup,adVF=0x83" "sys=waMA=00:xx:xx:xx:xx:xx,adVF=0x100"

How the EVO:RAIL team are using Zeroconf

This is from my recollection of the deep-dive sessions, so I may have mistaken some points (corrections welcome).

Zeroconf has found its largest adoption in networked printers and Apple Bonjour services; in the server deployment area, however, a combination of DHCP and MAC address matching is more commonly used (Auto Deploy or kickstart from PXE boot).

The EVO:RAIL team have implemented a zeroconf daemon that lives inside every vSphere instance and inside the VCSA instance. The daemon inside the VCSA wasn’t really explained; however, the vSphere daemon instances allow the EVO:RAIL engine to discover them and take the necessary steps to automate their configuration.

Implementing Zeroconf inside vSphere(esxi)

The EVO:RAIL team had to develop their own zeroconf daemon, named Loudmouth, which is coded entirely in Python. The reason behind this was explained in one of the technical deep dives: the majority of pre-existing zeroconf implementations have dependencies on various Linux shared libraries.

/lib # ls *so | wc -l
86
/lib # uname -a
VMkernel esxi02.fnnrn.me 5.5.0 #1 SMP Release build-1331820 Sep 18 2013 23:08:31 x86_64 GNU/Linux
....
[dan@mgmt lib]$ ls *so | wc -l
541
[dan@mgmt lib]$ uname -a
Linux mgmt.fnnrn.me 3.8.7-1-ARCH #1 SMP PREEMPT Sat Apr 13 09:01:47 CEST 2013 x86_64 GNU/Linux

As the quick example above shows (32-bit libraries), a vSphere instance contains only a few ELF-based libraries, providing a limited subset of shared functionality. This means that whilst ELF binaries can be moved from a Linux distribution over to a vSphere instance, the chances are that a requirement on a shared library won’t be met. Furthermore, building a static binary possibly won’t help, as the VMkernel (VMware’s kernel implementation) doesn’t implement the full set of Linux syscalls; that makes sense, as it isn’t a full OS implementation and the userland area of vSphere is purely for management of the hypervisor. The biggest issue for a zeroconf implementation, which relies on UDP datagrams, is the lack of an implementation of IP_PKTINFO.
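To show why that matters: a well-behaved mDNS responder needs to know which interface and local address each datagram arrived on, which on Linux means enabling IP_PKTINFO and pulling the details out of the ancillary data. A minimal sketch of that pattern is below (it works on a normal Linux userland but, per the above, not in the vSphere busybox environment):

#define _GNU_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/uio.h>

int main(void)
{
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    int on = 1;

    /* Ask the kernel to attach packet info (destination address, interface
     * index) to every received datagram -- the option the VMkernel userland
     * doesn't provide. */
    if (setsockopt(sock, IPPROTO_IP, IP_PKTINFO, &on, sizeof(on)) < 0) {
        perror("setsockopt(IP_PKTINFO)");
        return 1;
    }

    struct sockaddr_in addr = { .sin_family = AF_INET,
                                .sin_port = htons(5353),
                                .sin_addr.s_addr = htonl(INADDR_ANY) };
    bind(sock, (struct sockaddr *)&addr, sizeof(addr));

    char buf[1500], cbuf[256];
    struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = cbuf, .msg_controllen = sizeof(cbuf) };

    if (recvmsg(sock, &msg, 0) >= 0) {
        /* Walk the ancillary data to find which interface the query hit,
         * so the reply can be sent back out of the same interface. */
        for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
            if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_PKTINFO) {
                struct in_pktinfo *pi = (struct in_pktinfo *)CMSG_DATA(c);
                printf("datagram arrived on ifindex %d\n", pi->ipi_ifindex);
            }
        }
    }
    return 0;
}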

This rules out Avahi, Zero Conf IP (zcip), and Linux implementations of mDNSResponder.

What about loudmouth?

Unfortunately it has yet to be said whether any components of EVO:RAIL will be open-sourced or backported to vSphere, so whilst VMware have a zeroconf implementation for vSphere, it is likely to remain proprietary.

What next…

I’ve improved on where I was with my daemon; however, I’m hoping to upload it to GitHub sooner rather than later. Unfortunately work has occupied most of the weekend and most evenings so far… that, tied with catching up on episodes of Elementary and dealing with endless segfaults as I add any simple functionality, has slowed the progress more than I was expecting.

Also I decided to finish writing up this post, which took most of this evening 😐

Debugging on vSphere

A summary of what to expect inside vSphere can be read here, and there is no point duplicating existing information (http://www.v-front.de/2013/08/a-myth-busted-and-faq-esxi-is-not-based.html). More importantly, when dealing with the vSphere userland libraries (or, more accurately, the lack of them), the use of strace is hugely valuable. More details on strace can be found here (http://dansco.de/doku.php?id=technical_documentation:system_debugging).

EVO:RAIL – “Imitation is the sincerest form of flattery”

EVO:RAIL Overview

At VMworld ’14 I managed to catch a few excellent sessions on EVO:RAIL, including the deep-dive session + Q&A that really explained the architecture that makes up the EVO:RAIL appliance. The EVO:RAIL appliance is only built by authorised vendors, or, as VMware call them, QEPs (Qualified EVO:RAIL Partners), from a hardware specification that is determined by VMware.

VMW-LOGO-EVO-Rail-108

Also, the EVO:RAIL software/engine is provided to the QEPs for them to install, and it requires a built-in hardware ID (also provided by the QEPs) in order for the engine to work correctly. This means that currently the EVO:RAIL appliance is a sealed environment that has only been designed to exist on pre-determined hardware, and anyone wanting to use any of this functionality on their pre-existing infrastructure will be unable to do so.

So what are the components of EVO:RAIL:

  • QEP hardware (a 2U appliance that has four separate compute nodes)
  • Loudmouth, a zeroconf daemon written in Python that also detects the hardware ID
  • The EVO:RAIL engine, consisting of Python/bash scripts, a comprehensive UI and the automation capability for deployment
  • The VCSA appliance (containing Loudmouth and the EVO:RAIL engine), which is pre-installed on one node in every appliance

How is it built and how is it configured?

EVO Rail UI

The idea is that a customer will speak to their account manager at a QEP, place an order on a single SKU and provide some simple configuration details. The vendor will then pre-provision the four nodes with the EVO:RAIL version of vSphere; one of these nodes will also be provisioned with the VCSA appliance (also the EVO:RAIL version). The VCSA node will be configured with some IP addresses provided by the customer so that they can complete the configuration once the appliance has been racked. The EVO:RAIL engine, combined with the Loudmouth daemon, will detect the remaining nodes in the appliance and allow them to be configured; the same goes for additional appliances (maximum of 4) that are added.

The simplified UI was crafted by Jehad Affoneh and provides an HTML5 + WebSockets interface that gives real-time information to the end-user as they complete the EVO:RAIL configuration. Once the initial configuration (networking/passwords etc.) is complete, the EVO engine will then handle the automation of the following tasks:

  1. vSphere instance configuration (hostnames, passwords, network configuration)
  2. Deploy and configure the VCSA (once complete add in the vSphere instances, and configure VSAN)
  3. Tidy up
  4. Re-direct user to the EVO:RAIL simplified interface for VM deployment.

vmw-evo-rail-screen-2

 

The final screen that the user is presented with is the EVO:RAIL simplified interface. This is a “reduced functionality” user interface that allows people to complete simple tasks such as deploying a simple VM (from simplified, pre-determined parameters such as sizing) or investigating the health of a vSphere host. The “real” management interface, i.e. vCenter, is still there in the background, and the EVO:RAIL interface still has to interact with it through the awful vCenter SOAP SDK (which hopefully will change in the next releases, thus requiring a re-write of the EVO engine). This vCenter can still be accessed through the URL on port 9443, directly with the infrastructure client, or alternatively there is a small link in the EVO interface in the top right-hand corner.

What next?

EVO:RAIL has been described by JP Morgan as follows: “EVO products could be an iPhone moment for enterprise use of cloud computing”. I see this in two ways:

Incredible simplification and ease of use: deployment of four nodes is meant to take no more than fifteen minutes, the simplified interface for VM deployments takes 3-4 clicks and your virtual machine is ready to use, and the use of the LoudMouth service truly makes deployment plug and play as more capacity is added.

The walled garden: the comparison to the iPhone covers this point perfectly, as this is a closed-off product and only available to authorised partners. There are some really clever design ideas here that could be expanded on and backported to existing vSphere to provide some great functionality.

  • Large scale deployment with the use of the LoudMouth daemon for discovery
  • Disaster recovery would be simplified again via the LoudMouth daemon advertising where virtual machines are located (in the event that vCenter doesn’t auto re-start after failure).

Imitation?

After speaking with the designer of the UI and sitting through the deep-dive sessions, there were a few design decisions or “shortcuts” that had to be taken in order to get the functionality to work, so I decided to see what I could improve, or at least imitate, in my vSphere lab. To begin with I started with the zeroconf agent and how it could be implemented or improved upon; in EVO:RAIL this had to be written from scratch in Python because the development team couldn’t get any pre-existing solution to work (which is understandable: Avahi is hideous and has dependencies on everything).

So I introduce “boaster”, a tiny daemon written in C that handles simple mDNS service announcement. It’s only a PoC; however, I intend to add more functionality in the next few days. At the moment a vSphere hypervisor will advertise itself and its DHCP address to a Bonjour browser or avahi-browse…

mdsn
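For anyone curious, the socket plumbing for this kind of announcer is tiny; below is a minimal sketch (illustrative only, not the actual boaster source, and the encoding of the DNS PTR/SRV/TXT records that make up a real mDNS response is deliberately elided):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    int on = 1;
    setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));

    /* mDNS uses UDP port 5353 and the well-known multicast group 224.0.0.251 */
    struct sockaddr_in local = { .sin_family = AF_INET,
                                 .sin_port = htons(5353),
                                 .sin_addr.s_addr = htonl(INADDR_ANY) };
    bind(sock, (struct sockaddr *)&local, sizeof(local));

    /* Join the multicast group so we also see queries from Bonjour browsers */
    struct ip_mreq mreq;
    mreq.imr_multiaddr.s_addr = inet_addr("224.0.0.251");
    mreq.imr_interface.s_addr = htonl(INADDR_ANY);
    setsockopt(sock, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(mreq));

    /* mDNS announcements are sent with a multicast TTL of 255 */
    unsigned char ttl = 255;
    setsockopt(sock, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, sizeof(ttl));

    /* A real announcer would build a DNS response packet here containing
     * the PTR/SRV/TXT records for the advertised service -- elided. */
    unsigned char announcement[512];
    size_t len = 0;   /* length of the encoded mDNS response */
    memset(announcement, 0, sizeof(announcement));

    struct sockaddr_in group = { .sin_family = AF_INET,
                                 .sin_port = htons(5353),
                                 .sin_addr.s_addr = inet_addr("224.0.0.251") };
    sendto(sock, announcement, len, 0, (struct sockaddr *)&group, sizeof(group));

    close(sock);
    return 0;
}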

 

.. More to follow.

Layer 2 over Layer 3 with vSwitch and Mikrotik virtual routers

I’ve trialled a number of different ideas for having several vSwitches with virtual machines attached when dealing with a vSphere host that has only a single interface. The problem lies in the fact that only one of your vSwitches has a physical interface (uplink) present, which obviously means that traffic can go between the virtual machines on that vSwitch but can’t go northbound to other devices on the network. I decided to give the Mikrotik virtual router a go, as its requirements are so tiny it doesn’t have a noticeable footprint on my small infrastructure (the virtual routers require 64MB of RAM).

Using the two software routers it is possible to bridge interfaces on numerous vSwitches and then use EoIP to create another layer 2 bridge northbound over layer 3. In this simple example we will use two vSphere hosts (01 / 02); in real life both are a pair of Gigabyte Brix hosts that, whilst good for small lab environments, only have a single Gigabit interface. This is limiting with regard to what network-based lab environments you can put together, as any vSwitch that doesn’t have a physical interface can’t send traffic anywhere other than inside that vSwitch, and having different configurations on each host means that vMotion will break the host’s network connectivity.

Below is the configuration I currently have:

layer2 over layer3

 

Although not explicitly mentioned in the diagram, the interface on vSwitch0 is ether1. This interface will be on the same vSwitch that has a physical interface and thus will allow outbound traffic from the ESXi host. This interface will need configuring to enable connectivity to the switch and also to route out to the internet (if required).

Configuring router01

Configure ether1

Enable the interface and assign a reachable address (192.168.0.2)

/interface enable ether1
/ip address add address=192.168.0.2/24 interface=ether1 comment="External Interface"

Also add another IP address on ether1 that will be used as the EoIP endpoint.

/ip address add address=10.0.0.1/24 interface=ether1 comment="EoIP endPoint"

Add a default gateway (192.168.0.1), which for most people is their router.

/ip route add dst-address=0.0.0.0/0 gateway=192.168.0.1

Create an Ethernet over IP interface

This EoIP interface is required to encapsulate layer2 frames into layer3 packets that can be routed etc..

/interface eoip add comment="eoip interface" name="eoip01" remote-address=10.0.0.2 tunnel-id=1

Create a bridge and add interfaces

The bridge is required for allowing layer2 traffic between interfaces that will sit on the different vSwitches.

/interface bridge add comment="Bridge between vmnics" name=esx-bridge protocol-mode=rstp
/interface bridge port add bridge=esx-bridge interface=eoip01
/interface bridge port add bridge=esx-bridge interface=ether2

 

Configuring router02

Configure ether1

Enable the interface and assign a reachable address (192.168.0.3)

/interface enable ether1
/ip address add address=192.168.0.3/24 interface=ether1 comment="External Interface"

Also add another IP address on ether1 that will be used as the EoIP endpoint.

/ip address add address=10.0.0.2/24 interface=ether1 comment="EoIP endPoint"

Add a default gateway (192.168.0.1), which for most people is their router.

/ip route add dst-address=0.0.0.0/0 gateway=192.168.0.1

Create an Ethernet over IP interface

This EoIP interface is required to encapsulate layer2 frames into layer3 packets that can be routed etc..

/interface eoip add comment="eoip interface" name="eoip01" remote-address=10.0.0.1 tunnel-id=1

Create a bridge and add interfaces

The bridge is required for allowing layer2 traffic between interfaces that will sit on the different vSwitches.

/interface bridge add comment="Bridge between vmnics" name=esx-bridge protocol-mode=rstp
/interface bridge port add bridge=esx-bridge interface=eoip01
/interface bridge port add bridge=esx-bridge interface=ether2

Testing and DHCP on vSwitch1

Connectivity between the two routers can be tested by pinging the opposite EoIP endpoint from either side.

e.g. router01 pinging 10.0.0.2 and vice-versa

The final testing is placing DHCP on your vSwitch1 interface and ensuring that clients on both sides of the network receive DHCP leases.

Creating the DHCP pool

/ip pool add name=vswitch1_pool ranges=172.16.0.2-172.16.0.254

Creating the DHCP server

/ip dhcp-server add address-pool=vswitch1_pool disabled=no interface=ether2 name=vswitch1_dhcp

Then connect a client to vSwitch1 on either host and confirm that it receives a lease from the 172.16.0.0/24 pool.

ESXi v4.1 SFTP access

[UPDATE]

This is a 32-bit binary, which I think needs some pretty old kernel version, hence it only works on 4.0; I will try to get an updated release for 4.1. (Note: ESXi 5 comes with sftp-server already.)

 

I came across something interesting while fiddling earlier, after spending about 2 hours building a static release of the OpenSSH server that was going to replace dropbear. I’d got to a point where I could build an i386 release of the binaries with no random library requirements, and sshd would start and listen on a port defined in /etc/ssh/sshd_config. Unfortunately, starting ssh in debug mode allowed me to see numerous glibc errors during connections, which explained why I couldn’t connect. At this point I don’t think there is any real way of replacing dropbear with a complete OpenSSH solution, even with static linking. Even testing the OpenSSH sftp binary that had been compiled showed that it wasn’t coping with a system call not returning UIDs correctly, meaning that it would report a FATAL error and close continually.

Given that OpenSSH wasn’t going to be replaced, I researched dropbear and whether there was a newer version, perhaps with sftp support; unfortunately not. Eventually I came across notes on a blog mentioning that dropbear “supports” the OpenSSH sftp-server. After restoring ESXi back to its default filesystem settings (ssh enabled), it appears that attempting to sftp to ESXi returns the following error.

ash: /sbin/sftp-server: not found
Connection closed

After compiling a slightly older version of OpenSSH (statically), I found a release of sftp-server that, once placed in /sbin on ESXi, allows full usage of sftp (including sshfs mounting); the binary is below.

sftp-server.tar.gz

File access in ESXi 4.1 (nfs and tcp)

I’ve had numerous occasions where I’ve needed to upload files to the actual file systems on an ESXi system, and the only ‘proper’ method is using the abysmal virtual infrastructure client; working mainly on a Mac means I need to use VMware Fusion to run Windows just to run the client and connect to the server (overkill). It’s possible to enable ssh access to the server using the tech support menu, which allows access to the underlying hypervisor and its file systems, and therefore it’s possible to scp files to the filesystems; again this is quite slow and overkill due to the encryption being used. Also, because dropbear is used for the ssh, it doesn’t support sftp, which means that you can’t mount the filesystems à la FUSE and sshfs.

I should say at this point that the goal of all this was to allow me to keep all my ISOs on one server and be able to access them from everywhere. Also, I wanted a PXE server to be able to access the ISOs, loopback-mount them and then present the contents via NFS to the installers started by PXE.

So looking around I found some ftp binaries that should work on ESXi. Given that console access for ESXi is done with BusyBox, there is no file command to determine what binary type the files are, so I was unaware of what binaries I could run inside ESXi. This all worked fine following the instructions located on the vm-help.com website here, although a few of the instructions are a little bit incorrect, such as the path to tcpd in inetd; I’ll leave you to fix that. On the PXE server, using FUSE again with curlftpfs to mount the filesystem revealed a glaring problem as soon as I loopback-mounted the first ISO. Unfortunately the problem lies in the fact that curlftpfs uses memory to store the file as it downloads it for access by FUSE, so trying to open a 4GB DVD ISO quickly exhausted my PXE server’s memory and it became unresponsive, great.

Further research turned up a blog post about someone trying to use unfs to enable NFS sharing between two ESXi boxes; more specifically, it was mentioned that Linux binaries would work fine in the ESXi service console. One thing that was slightly confusing was that ESXi is x86_64 (64-bit), however the binaries that you need for the service console have to be 32-bit, otherwise you’ll get a confusing error that the binaries can’t be found when you try to run them, due to BusyBox’s odd handling of errors. I present below the binaries required for NFS in ESXi:

nfs binaries for x86

These are pretty easy to use: scp the file over to ESXi and untar it in /tmp; all that’s left is to place the files from /tmp/sbin into /sbin and the files from /tmp/etc into /etc. The /etc/exports contains one entry to give access to /vmfs/volumes, which means that accessing the NFS share will give you the UUID paths for the disks containing VMs and ISOs. To start the NFS server, start portmap first and then start unfsd, which should be started the following way (unfsd -d &); this is due to unfsd not being able to disconnect from the console on start-up (something to do with BusyBox, I assume).

One final note: once another machine connects to the NFS share, portmap will start using 50%-70% CPU and will need stopping and starting for other NFS clients. I’m still looking into this, however having a cron job to restart the process every few minutes should do the job.