Thursday, February 11, 2010

Setting Up a Distributed Build – Part 1

Earlier today I wrote about the fact that I need to rebuild the Gallio build server.  Then I went downstairs to get ready to leave for work and I promptly twisted my ankle.

So… now lying in bed with an ice pack strapped to my ankle, this seems like as good a time as any to start planning.


Gallio currently ships with plugins for several 3rd party tools.  It supports the .Net CLR 2.0 and 4.0 on 32-bit and 64-bit Windows platforms and aims to eventually support Mono on Windows and on Linux.  All told, there are quite a few configurations to be tested.

Here’s what we need:

  • Distributed compilation: components are compiled on platforms that provide the required dependencies.
  • Distributed testing: components are tested on platforms that reflect a significant subset of the supported configurations.
  • Distributed packaging: groups of components need to be assembled into packages on platforms that support the required packaging tools.
  • Fully automatic continuous integration.

Here are some nice things to have:

  • Centralized control of builds and maintenance procedures.
  • Centralized reporting of test results, code coverage, performance metrics and defects.
  • Centralized publication of all artifacts such as binaries and documentation.
  • Hermetic build environment.


Most components are binary-portable and need to be compiled only once, but they may need to be tested more than once depending on how the configuration of the environment affects the component.

Hermetic Build Environment

Hermetic means “sealed.”

A hermetic build environment is one which encapsulates all of its dependencies so that the result of the build is entirely reproducible.

Think for a moment about the dependencies of your own build process.

  • Source code.
  • 3rd party libraries and tools.
  • SDKs, compilers, frameworks, header files.
  • Test tools, coverage tools, build automation tools, documentation tools, linting tools.
  • Scripting languages, command-line tools, scripts.
  • Configuration files, registry entries.
  • Custom patches, tweaks.
  • Operating System revision and patch level.
  • Crazy glue.
  • Position of the mouse pointer on the screen.  (Please don’t depend on this!)
  • Time of day.  (Ditto.)

The more dependencies you control, the more reproducible the build will be.

The idea of establishing a hermetic build environment is to ensure that the dependencies are tightly controlled.  If I change line 101 of Foo.cs to fix a bug, I need to know that the next build will be identical to the first except that it will contain my specific bug fix because that is the only change in the source tree.

Let’s suppose I upgraded the compiler in place.  Now it’s possible that, due to some changes in the compiler, the same source code that I compiled yesterday might produce different output today.  That’s really bad.  It means that I could introduce new bugs into otherwise unchanged versions of my code just by recompiling it!

One solution is to indicate in the source code exactly which version of the compiler should be used.  That way if I recompile yesterday’s code, then I will use exactly the same version of the compiler that was originally specified yesterday.  The only way to introduce a change would be to commit a change in the source tree indicating that a new compiler version should be used instead.

The configuration of a hermetic build environment is well-known and consists of fully specified dependencies.  The only way to change the output of the build is by changing something in the source.
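As a sketch, one way to pin tool versions is a small manifest checked into the source tree that the build verifies before doing anything else.  The file format, tool names, and version numbers below are hypothetical, not something Gallio actually uses:

```python
# Hypothetical toolchain manifest checked into the source tree.
# Upgrading the compiler means committing a change to this file.
TOOLCHAIN_PIN = """
csc=3.5.30729.1
nant=0.86.3317.0
"""

def parse_pins(text):
    """Parse 'tool=version' lines into a dict, ignoring blanks and comments."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            tool, _, version = line.partition("=")
            pins[tool.strip()] = version.strip()
    return pins

def find_mismatches(pins, installed):
    """Compare pinned versions against what the build machine reports."""
    return [(tool, wanted, installed.get(tool))
            for tool, wanted in sorted(pins.items())
            if installed.get(tool) != wanted]
```

A build wrapper would call find_mismatches with the versions actually installed on the machine and refuse to proceed unless the list comes back empty.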

Crucially, the hermetic build environment should not be permanently altered by the build process itself.  For example, running tests shouldn’t leave extra temporary files floating around that remain for the next build to trip over.  The environment should be pristine!

How do we implement something like this?

In Linux, we might run the whole build inside a very carefully constructed chroot directory, assuming that we don’t care about the kernel version or that we control it via other means.

In Windows, it’s a lot harder because important configuration information could in principle reside anywhere in the file system or system registry.  For example, what really happens when we install a new version of .Net?  Hmm…  A more practical concern for Gallio is what happens when we upgrade to a new ReSharper beta, for example.

The case of the ReSharper dependency is interesting.  To compile Gallio’s ReSharper extensions, it would be sufficient to have all of the ReSharper libraries checked into the source tree.  However, to test those extensions we actually need a working installation of ReSharper, which itself requires Visual Studio, which itself requires many things.  Checking in libraries isn’t enough.  We’re going to need a virtual machine snapshot.


A build process can be described as a sequence of gathering and scattering operations.

Here’s a fairly straightforward example:

  • Task 1
    • Gather: Check out all of the sources for the core components from the source tree for a specific revision.
    • Scatter: Compile.
    • Gather: Copy the compiled binaries to a common directory.
  • Task 2
    • Gather: Grab the compiled binaries from the first task.
    • Scatter: Compile extensions with complex environmental dependencies.
    • Gather: Copy the compiled extensions to a common directory.  Generate the documentation and the installers.
  • Task 3
    • Gather: Grab the installer from the second task.
    • Scatter: Install and test extensions with complex environmental dependencies.
    • Gather: Copy all test results into a test database.
  • Task 4
    • Gather: Grab the installer and documentation from the second task.
    • Scatter: Publish the build artifacts and documentation to the web.

In general, tasks gather stuff from the source tree or from the output of other tasks, scatter a bunch of derived artifacts around, then eventually gather them together again into a form that can be consumed by subsequent tasks.

We could represent the dependencies as a linear sequence of steps (as outlined above), but we would be better off taking advantage of the inherent parallelism of certain stages.  For example, there’s no reason we couldn’t compile and test the ReSharper 4.5 extension at the same time as the ReSharper 5.0 extension, as long as both processes ran independently (in different virtual machines).  Instead, we should model dependencies as a graph of parallel and sequential tasks.
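A minimal sketch of such a graph, using Python’s standard-library graphlib to compute which tasks can run concurrently at each stage.  The task names mirror the example tasks above and are purely illustrative:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each task maps to the set of tasks whose output it consumes.
build_graph = {
    "compile_core": set(),
    "compile_extensions": {"compile_core"},
    "test_extensions": {"compile_extensions"},
    "publish_artifacts": {"compile_extensions"},
}

def parallel_stages(graph):
    """Group tasks into stages; every task within a stage could run concurrently."""
    ts = TopologicalSorter(graph)
    ts.prepare()
    stages = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # all tasks whose dependencies are satisfied
        stages.append(ready)
        ts.done(*ready)
    return stages
```

Here parallel_stages(build_graph) yields compile_core alone, then compile_extensions, then test_extensions and publish_artifacts together — exactly the kind of parallelism we want the continuous integration tool to exploit.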

To make this work, we will need a continuous integration tool that coordinates task dependencies elegantly.

Outlining a Solution

Here’s what the solution will look like for us.

Virtual Hermetic Build Environment

Because we need to build and test many components on Windows, the most robust and perhaps simplest way to establish a hermetic build environment will be to use a suite of virtual machines.

To that end, the virtual machine host will maintain a collection of virtual machine snapshots.  Each snapshot will have a unique name, like “Win7 x64, VS 2010, R# 5.0.1161.9 (Snapshot 1.0)”.

There are many virtualization products that we could use, but for our purposes we will use VirtualBox.  VirtualBox is free and performs well, although I have found it to be somewhat unstable at times.  Even so, it’s perfect for what we need.

We’re going to have a lot of snapshots.  Every time we tweak one of the virtual machines, we will create a new snapshot and then check in a change to the source tree so that builds of subsequent revisions use the new snapshot.  Fortunately, VirtualBox supports differencing disk images and snapshot trees, so we only need to store the differences between snapshots.  What’s more, we can easily reset a virtual machine to a previous snapshot state just by throwing away the latest differencing disk image, which is very fast.

Over time, we may choose to throw away snapshots that are no longer relevant, since eventually we will never need to rebuild most old revisions.  In the meantime, gigabytes are cheap and plentiful; time is not!
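As a sketch, the snapshot bookkeeping can be driven through VBoxManage, VirtualBox’s command-line interface.  The VM and snapshot names below are examples; the commands are built but not executed here, so the sketch is safe to inspect:

```python
import subprocess

def take_snapshot_cmd(vm, snapshot):
    """Command to record the VM's current state as a new named snapshot."""
    return ["VBoxManage", "snapshot", vm, "take", snapshot]

def restore_snapshot_cmd(vm, snapshot):
    """Command to reset the VM by discarding the current differencing image."""
    return ["VBoxManage", "snapshot", vm, "restore", snapshot]

def run(cmd, dry_run=True):
    """Execute a VBoxManage command; dry_run=True just returns it for inspection."""
    if not dry_run:
        subprocess.check_call(cmd)
    return cmd
```

For example, run(restore_snapshot_cmd("Win7-VS2010", "Snapshot 1.0"), dry_run=False) would throw away the latest differencing disk image and put that VM back into its pristine state.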

How many virtual machines will there be?  I’m not sure… there may be as many as a dozen.

  1. Windows Server 2003, continuous integration server, build agents and other infrastructure.
  2. Windows 7 x86, .Net Framework 2.0 & 4.0, Mono, NCover.
  3. Windows 7 x64, .Net Framework 2.0 & 4.0, Mono, NCover.
  4. Windows 7 x86, ReSharper 4.5, Visual Studio 2008, NCover.
  5. Windows 7 x86, ReSharper 5.0, Visual Studio 2010, NCover.
  6. Windows 7 x86, TypeMock, NCover.
  7. Windows 7 x86, AutoCAD 2010, NCover.
  8. Windows 7 x86, TeamCity, NCover.  (for testing TeamCity extensions)
  9. Windows 7 x86, CCNet, NCover.  (for testing CCNet extensions)
  10. Ubuntu, Mono, MonoCov.
  11. … more… ?

Continuous Integration

The build process will be broken down into a graph of tasks.  Each task will specify dependencies on other tasks whose output it consumes.  We’ll use a continuous integration build manager to run the tasks and report progress.

There are many different continuous integration tools that we could use, but for our purposes we will use TeamCity.  TeamCity does a very good job of managing dependencies and the professional edition is available for free and does everything we need.

Interestingly, because we will be using virtual machine snapshots as part of our hermetic build environment, we will not use TeamCity’s built-in support for cross-platform builds.

JetBrains’ recommended approach for cross-platform builds is to run multiple TeamCity build agents on multiple machines with different configurations.  However, for that to work, each machine has to persist the state of its build agent.  The problem is that we will always wipe our virtual machines back to their snapshot state after each build.  So if the agents ran inside the virtual machines, they would lose their state changes completely between builds, which would probably make TeamCity somewhat unhappy.

Another problem with running separate build agents inside each virtual machine is that we would have to keep all of the virtual machines running all of the time.  Otherwise, the TeamCity primary server would not be able to connect to the build agents.  Running so many virtual machines (dozens?) would be a prohibitively expensive waste of resources.  Moreover, we would need to keep a lot of build agents running and it costs money to buy licenses for more than 3.

Our solution will be to run TeamCity build agents only on the virtual machine host (or in a virtual machine with access to the host).  Each build task will be responsible for starting up the appropriate virtual machine, invoking a remote command inside the virtual machine, gathering results from inside the virtual machine, and then shutting down the virtual machine (discarding changes to its state).
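The lifecycle each build task drives can be sketched like this.  The VBoxManage subcommands are real, but the remote-execution step is a placeholder for whatever tool dispatches the command inside the guest, and gathering results is elided:

```python
import subprocess

def vm_build_steps(vm, snapshot, remote_cmd):
    """The commands one build task issues, in order: reset, boot, run, discard."""
    return [
        ["VBoxManage", "snapshot", vm, "restore", snapshot],  # wipe back to the pristine snapshot
        ["VBoxManage", "startvm", vm, "--type", "headless"],  # boot without a console window
        remote_cmd,                                           # run the build/tests inside the guest
        # ... gather results out of the guest here ...
        ["VBoxManage", "controlvm", vm, "poweroff"],          # stop; changes die with the diff image
    ]

def execute(steps, dry_run=True):
    """Run each step in sequence; dry_run=True just returns the plan."""
    for step in steps:
        if not dry_run:
            subprocess.check_call(step)
    return steps
```

Because the snapshot restore always comes first and the poweroff discards whatever the build did, each task sees the environment in exactly the state the snapshot captured.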

All we need is one TeamCity build agent!  That one build agent can manage as many virtual machines as we like.

In practical terms, we’ll probably run 3 TeamCity build agents.  That way we can run 3 tasks in parallel to improve performance while still fitting within the license constraints of the free TeamCity Professional Edition.

Odds and Ends

We’re also going to need a few other things.

Err, right… I guess we need to build Archimedes too…


Yann Trevin said...

Oh! Sorry to hear about that, Jeff! Hope you are not suffering. I know how sprains are painful.

Roberto Hernández said...


What do you use to run the remote commands on the VM? Do you use some type of automation framework like SeleniumRC?


Jeff Brown said...

I actually wrote my own tool for this called VMTool.

It's not very polished yet but you'll find it here: