Removing a Persistent Bridge in Xen after reboots

The most common way to network Xen domains is a network bridge that joins your physical ethernet interface with the virtual NICs belonging to the domains. The default install and configuration gives you a default Xen bridge (xenbr0) containing your ethernet interface and the virtual NICs. This has to be declared explicitly in the various Xen configuration files, but Xen takes care of the actual plumbing and configuration of the bridge and its interfaces.

However, Xen has an oddity (in my opinion): it appears that xend monitors the bridges on your dom0 and adds any other bridges it finds to its state. This means that should you manually create a bridge, e.g.

spike ~ # brctl addbr testbr0

xend will see that this bridge has been created and add it to its state configuration. On a reboot or a restart of xend, xend will configure the networking back to the state that it recorded. This means that not only will xenbr0 be created, so will testbr0, which wouldn’t be a problem if your virtual NICs were added to the correct bridge (xenbr0). However, they will more than likely be added to testbr0, meaning that your domUs will have no networking unless you manually move each virtual NIC to the correct bridge.
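
If you do end up in this state, the manual fix mentioned above might look something like the following sketch. The interface name vif1.0 is an assumption (check `brctl show` for the real names on your system), and RUN is only a hypothetical dry-run switch added for this example, so by default the commands are printed rather than executed.

```shell
# Sketch: move a domU's virtual nic from the wrong bridge to the right one.
# vif1.0 is an example name; list the real ones with `brctl show`.
# RUN is a hypothetical dry-run switch: by default commands are only printed.
# Run with RUN= (empty) as root to actually execute them.
RUN=${RUN-echo}

move_vif() {
    vif=$1 from=$2 to=$3
    $RUN brctl delif "$from" "$vif"   # detach from the wrong bridge
    $RUN brctl addif "$to" "$vif"     # attach to the correct bridge
}

move_vif vif1.0 testbr0 xenbr0
```

This only repairs the running system; xend will still recreate testbr0 on its next restart until its recorded state is cleaned up.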

To permanently remove a bridge you need to stop xend (/etc/init.d/xend stop *BEWARE* this will stop all domUs), then go to /var/lib/xend/state/ and edit network.xml. Delete the entire network uuid section containing the bridge you want removed, and make sure you back everything up beforehand.
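
Since a mistake in network.xml can leave xend without working networking, it is worth taking a copy first. A minimal sketch, assuming the state directory given above; the function name backup_xend_state and the timestamped .bak suffix are just choices made for this example, not part of Xen.

```shell
# Sketch: copy xend's network.xml aside before editing it.
# Stop xend first (remember: this stops all domUs).
# backup_xend_state and the .bak naming are illustrative, not Xen tooling.
backup_xend_state() {
    dir=${1:-/var/lib/xend/state}
    src="$dir/network.xml"
    dest="$src.$(date +%Y%m%d%H%M%S).bak"
    [ -f "$src" ] || { echo "no $src found" >&2; return 1; }
    # -p keeps ownership and timestamps; print the backup path on success
    cp -p "$src" "$dest" && echo "$dest"
}

# Usage (as root, with xend stopped):
#   backup_xend_state
```

If the edit goes wrong, copying the printed backup file back over network.xml before starting xend should return you to where you began.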

Starting xend now will result in only the correct bridges being created, and the domUs’ NICs will be added to the correct bridge.

chasing your tail

In my earlier builds (v66) of OpenSolaris Xen domUs I’d already been through the playing around with the networking bug. Realistically, you would document all of the steps that you took with the physical NIC, the virtual NICs and so on... whoops.

I decided that I’d do a fresh install with OpenSolaris (v77), which, as I discovered in the bug report, has a kernel problem with mounting HSFS. This means that it can’t use an install CD properly (genius). The bug report suggests doing an NFS install, which is fantastic (given that the networking bug requires a reboot and the boot_archive updating).

However:

bash-3.2# uname -a
SunOS host-a 5.11 snv_77 i86pc i386 i86xpv

Notes to follow

Networking and Bridges and such

Searching the internet for solutions to the strangeness created by Xen’s networking really only comes up with snippets from email chains or highly overcomplicated network diagrams. Why? I’m not entirely sure. The default method for networking with Xen consists of a collection of ‘pokey’ scripts that seem to get (at least on my system) 90% of the way there. I assume this may again be a ‘Gentoo’ issue; however, here are the steps (from a simplistic view) that are taken to create Xen networking:

  1. The original system consists of eth0 and lo; eth0 has an IP of 10.0.0.1 etc.
  2. Once the system comes up and Xen starts, its scripts use brctl to create a network bridge, which is then used to bridge the physical interface (currently still eth0) and the virtual interfaces (called vifs).
  3. xend, the Xen daemon, uses brctl to create a bridge called xenbr0; then things get a bit random.
  4. eth0 is renamed peth0 (peth = physical ethernet).
  5. The IP information is taken from peth0, and peth0 is then added to xenbr0.
  6. Once peth0 is added to xenbr0, the IP information taken from peth0 is applied to xenbr0.
  7. Any Xen domU created afterwards gets a vif, which is then added to xenbr0, allowing it to communicate on the network.
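
Done by hand, the steps above correspond roughly to the commands below. This is a simplified sketch of the sequence as described here, not what Xen’s scripts literally run; the netmask is an assumption, and RUN is a hypothetical dry-run switch for this example, so by default the commands are only printed.

```shell
# Sketch of the bridge setup described in the steps, done by hand.
# Not the actual Xen script. The netmask is assumed.
# RUN is a hypothetical dry-run switch: commands are printed unless RUN is empty.
RUN=${RUN-echo}

setup_bridge() {
    ip_addr=$1 netmask=$2
    $RUN brctl addbr xenbr0             # step 3: create the bridge
    $RUN ip addr flush dev eth0         # take the IP information off eth0
    $RUN ip link set eth0 down          # an interface must be down to be renamed
    $RUN ip link set eth0 name peth0    # step 4: eth0 becomes peth0
    $RUN ip link set peth0 up
    $RUN brctl addif xenbr0 peth0       # step 5: peth0 joins the bridge
    $RUN ifconfig xenbr0 "$ip_addr" netmask "$netmask" up   # step 6: IP goes on xenbr0
}

setup_bridge 10.0.0.1 255.255.255.0
```

Running this for real over an ssh session on eth0 will cut you off part way through, so it is best read as an illustration of the order of events.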

This is a very sparse/dumbed-down version of events, but it gives an idea of what’s happening. The problem that occurred with Gentoo is that step 5 never happens.

The result is that Gentoo comes up, brings eth0 up and we have network activity for a few seconds, until Xen starts to get its claws into the network configuration. However, the simplest method for repairing this involves a small configuration change in /etc/conf.d/local.start:

# 00/00/00 -- IP Allocation to Xenbr0
ifconfig xenbr0 10.0.0.1 netmask 255.255.255.0
route add default gw 10.0.0.254

This example is taken from mine; you’ll need to alter the gateway and IP address information. Put simply, this will execute after every other service has been started, resulting in your dom0 being visible, network aware, etc.