Single Module Builds – The Fastest Heresy in Town

By Andy Thomason

Overload, 25(138):7-9, April 2017

Unity builds can be controversial. Andy Thomason shows how much difference they can make to build times.

We have been building our C++ projects in pretty much the same way since the early 80s, and maybe it is time to change. Back then, no one could imagine the need for more than 640k of RAM, so C and C++ modules needed to be small enough to fit in memory. In the C world, we would build modules containing a single C function or a group of related functions, carefully keeping each module under a thousand lines or so, as anything larger would cause the memory to page and the build time to shoot up. Linking, too, was a problem: file systems were quite slow, so we built libraries as archives of object files to minimise link times.

To understand the build process in depth let us look at the layers that we must go through to make an executable file.

Preprocessor: reads include files and expands macros.
Lex/Parse: generates tokens for uninstantiated functions.
C++ semantics: template expansion and class instantiation.
Intermediate Representation (IR): a platform-independent, assembly-like pseudo-language for high-level optimisation.
Code Generation (CG): a platform-specific pseudo-language for low-level optimisation.
Object files/static libraries: platform-specific binary code plus debug information (DWARF/PDB).
Executable/dynamic library: linker-generated code, ready to run.

This is pretty much the same stack as the original C compilers, with the exception of the C++ semantics layer. Most of the compile time comes from the fact that #include will typically read hundreds of thousands of extra lines per module, no matter how careful you are with the includes. In addition, the full code generation path is executed in every module for many functions defined in header files, yet only one copy of each will make it into the executable. Worse still is the amount of work needed to generate debug information for every class included, with multiple copies of the debug information for each instantiated template class.

Traditional C++ builds

In most C++ projects, classes are declared in .h files and their member functions are defined separately, using ::, in .cpp or .cc files. Some functions are defined inline in the class, but usually only short ones. The reason for this is largely historical, and some developers prefer not to have function definitions in classes because they clutter an otherwise simple class definition. However, as we shall see, this comes at a very high price in terms of build time and code generation (see Listing 1).


  #include "bread.h"
  #include "toaster.h"

  bread &toaster::eject() {
    // ...
  }

  #include "bread.h"

  bread::bread(flour &f, water &w) {
    // ...
  }

Listing 1

Unity builds

In the games industry, many large engines use ‘unity builds’. A unity build reduces the number of modules in a compilation and has a significant impact on build time. It works by constructing .cpp files that #include several other .cpp files, as in Listing 2.


  #include "renderer.cpp"
  #include "ui_elements.cpp"
  #include "gameplay_code.cpp"
  #include "character_AI.cpp"

  #include "file_io.cpp"
  #include "cat_dynamics.cpp"
  #include "wobbly_bits.cpp"
  #include "death_ray.cpp"

Listing 2
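Unity files like these are usually generated rather than written by hand. The following is a minimal sketch of such a generator, assuming a simple policy of batching up to a fixed number of source files into each unity file; make_unity_sources is a hypothetical helper, not part of any particular build system.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split a list of .cpp files into unity translation
// units of at most per_unit files each. Each returned string is the text of
// one unity .cpp file, in the style of Listing 2.
std::vector<std::string> make_unity_sources(const std::vector<std::string> &files,
                                            std::size_t per_unit) {
  std::vector<std::string> units;
  if (per_unit == 0) return units;  // guard against an infinite loop
  for (std::size_t i = 0; i < files.size(); i += per_unit) {
    std::string text;
    // Emit one #include line for each file in this batch.
    for (std::size_t j = i; j < files.size() && j < i + per_unit; ++j) {
      text += "#include \"" + files[j] + "\"\n";
    }
    units.push_back(text);
  }
  return units;
}
```

The batch size is a trade-off: larger batches mean fewer modules and less repeated header parsing, but more code to recompile when any one file in the batch changes.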

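The source code in the .cpp files still looks the same, with the exception that it must be ‘clean’, i.e. no static or global variables and no anonymous namespaces, because names that were private to one file now share a translation unit with every other file in the batch and can clash. It is also important to avoid using namespace directives at global scope, as a directive in one file stays in effect for every file included after it in the same unity file and can cause ambiguities.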
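The ‘clean’ rule exists because every file in a unity build shares one translation unit. Below is a minimal sketch, with hypothetical file and function names, of two former .cpp files pasted together. Had each kept its describe() helper in an anonymous namespace or as a static function, the combined file would fail to compile with a redefinition error, so each file's internals are moved into a named namespace instead.

```cpp
#include <string>

// --- contents of renderer.cpp (hypothetical) ---
// A named namespace replaces the anonymous one, so this helper no longer
// collides with a same-named helper from another file in the batch.
namespace renderer_detail {
std::string describe() { return "renderer"; }
}  // namespace renderer_detail

std::string renderer_name() { return renderer_detail::describe(); }

// --- contents of file_io.cpp (hypothetical) ---
namespace file_io_detail {
std::string describe() { return "file_io"; }
}  // namespace file_io_detail

std::string file_io_name() { return file_io_detail::describe(); }
```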

The result is a build that contains fewer modules. Because every module includes much the same set of header files, despite your best efforts, it turns out that each module takes about the same time to compile regardless of its own complexity. Fewer modules mean less duplication of debug information and common functions, less code generation, and more optimisation opportunities, because the compiler can inline any function defined in the module.

Unity builds made a significant difference to large game projects, which often have more than ten thousand source files. The logical assumption would be that the larger the modules, the slower the build after a change to an individual file. In many cases this turns out not to be true: in projects with thousands of modules the link time tends to dominate and easily exceeds the time taken to compile a single module, so unity builds improve incremental build times as well.

I demonstrated a unity build of Clang at LLVM 2014 that reduced the compile time from over an hour to twenty seconds, and incremental builds to five seconds, but despite the 200× speedup there was fierce resistance to the concept. I had to make over a thousand edits to the codebase to make it build, because Clang makes extensive use of static variables and anonymous namespaces and also has some namespace name clashes.

Beyond unity builds: single module compilation

One thing that C++ did for us was to allow us to define small functions in the class definition itself. As compilers got better, a new kind of library emerged: the header-only C++ library. Many of the libraries in Boost, for example, are header only and this has the huge advantage that we do not need to build and distribute a binary of the library and so can run the code on any platform with a modern compiler.
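As an illustration, here is a small hypothetical header-only library that borrows the bread and toaster names from Listing 1 but is not the article's code. It relies on two language rules: member functions defined inside the class body are implicitly inline, and free functions defined in a header must be marked inline, so the header can be included in many modules without duplicate-definition link errors.

```cpp
// kitchen.h: a hypothetical header-only library (illustrative names only).
#ifndef KITCHEN_H
#define KITCHEN_H

#include <string>
#include <utility>

namespace kitchen {

class bread {
 public:
  explicit bread(std::string kind) : kind_(std::move(kind)) {}
  // Defined in the class body, so implicitly inline: safe to include this
  // header from any number of modules.
  const std::string &kind() const { return kind_; }

 private:
  std::string kind_;
};

class toaster {
 public:
  void insert(const bread &b) {
    slot_ = b.kind();
    ++slices_;
  }
  std::string eject() const { return "toasted " + slot_; }
  int slices() const { return slices_; }

 private:
  std::string slot_;
  int slices_ = 0;
};

// A free function defined in a header needs an explicit 'inline'; without
// it, two modules including this header would each define the symbol.
inline int total_slices(const toaster &a, const toaster &b) {
  return a.slices() + b.slices();
}

}  // namespace kitchen

#endif  // KITCHEN_H
```

No .cpp file or prebuilt binary is involved: any module that includes kitchen.h gets the whole library as source.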

JavaScript is also an example of header-only code. When we load a web page, we compile all the JavaScript in a single module – there is no concept of linking in JavaScript, and yet this works well and is accepted practice.

A single module is the ultimate unity build: only one module to build, no linking and source-only library distribution. Advantages are very fast build times in most circumstances, clearer code, and very fast link and rebuild times in builds that would otherwise be dominated by linking. Disadvantages include the effort of rewriting libraries to be header-only, a ‘brick wall’ slowdown when the compiler consumes all available DRAM and starts paging to SSD, and potentially worse behaviour with very complex template metaprogramming.

What about circular references? In single module builds, if class A refers to class B and vice versa, it is still necessary to separate declarations from definitions, as in traditional builds. This is because C++ processes classes at file scope in the order in which they appear in the file. To do this we generally use .inl files, which define the functions out of class using the :: operator and the inline keyword. The inline keyword means the functions are emitted in linkonce sections rather than regular sections, and as a result the library can be used in multiple modules. The key is to declare the classes from leaf to root so that forward references are minimal. Another method is to use only template classes, or to declare all classes inside a struct, which defers the evaluation of the function definitions until the end of the struct. It would be possible to change the language to allow classes to be declared in any order, a small change now that compilers keep everything in memory.
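A minimal sketch of the declaration/definition split for two mutually dependent classes, using the hypothetical names a and b. In a real library the class definitions would live in a header and the out-of-line definition in an .inl file included after all the classes; here both parts are shown in one block.

```cpp
// --- declarations (header) ---
class b;  // forward declaration: a only needs b by reference here

class a {
 public:
  int value() const { return 1; }
  int sum_with(const b &other) const;  // body needs the complete type b
};

class b {
 public:
  int value() const { return 2; }
  // a is already complete at this point, so this body can stay in the class.
  int sum_with(const a &other) const { return value() + other.value(); }
};

// --- definitions (.inl) ---
// 'inline' lets this definition appear in every module that includes the
// library without duplicate-symbol link errors.
inline int a::sum_with(const b &other) const { return value() + other.value(); }
```

Only the function whose body needs a class defined later has to move out of line; everything else can stay in the class body.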

So why not build C++ programs as a single module? We know that the build time is very good and the code generation is close to optimal. The answer currently is that there are very few header-only libraries and there is no infrastructure of header-only libraries that we can call on to build our code. There is no packaging mechanism for header-only libraries and finding them on GitHub for example is something of a lottery.

The unity build method helps to bridge the gap somewhat. For example, Bullet Physics, TinyXML, Lua (see table) and other libraries can be converted to header-only form without too much effort, but the work needed to ‘clean’ the builds and remove static variables can be daunting.

But if you can build your code in a single module, the build can become orders of magnitude faster, and the generated code can be better optimised too.

In an ideal world, compilers would be multithreaded, but even Clang, which is fairly modern, is stubbornly single threaded and the design is not likely to accommodate multithreaded compilation. This means that there is a lower limit on module compile time that will be with us for some time.

Libraries that can be built header-only

Some libraries can be built header-only even though they were not designed to be:


    Bullet is Erwin Coumans’ excellent physics library. As a multi-module build, it takes about a minute to build; as a single module build, it takes around a second.


    TinyXML is a lovely little XML parser that is widely used. As it has very few files, a single module build does not make it much faster, but it does make it portable as no binary is required.


    Lua is a small scripting language that is widely used in the games industry. It is written in C, but with a little hacking it will run as a header-only library.

Include vs import

Java and C# have both adopted the header-only style as the designers realised that separate declarations and definitions were unnecessary. Java and C# also dropped the #include mechanism in favour of an ‘import’ mechanism. C++ is acquiring an import mechanism of its own and we hope that it will improve build times when it makes it into mainstream compilers. There have been many proposals, however, and all are different.

Daveed Vandevoorde wrote the original proposal in 2006.

Daveed is VP of engineering at Edison Design Group, whose front end we used for the PlayStation 3 and Vita compilers. Many well-known compilers are based on EDG. Incidentally, EDG is a well-structured C library and builds as a single module unity build in less than a second.

Microsoft have a module mechanism in VS2015:

  #include <stdio.h>
  module dog;
  export namespace dog {
    void woof() {
      // ...
    }
  }

  import dog;

Clang is also working towards module support. This involves a module map, which maps headers to modules:

  module std [system] [extern_c] {
    module assert {
      textual header "assert.h"
      header "bits/assert-decls.h"
      export *
    }
  }

  import std.assert;

Whilst modules improve the structure of C++ programmes, will they improve build times? Much will depend on the implementation.


To help demonstrate the benefits of single module compilation, I've created a simple Python script in a project called synthcode.

This script generates a synthetic C++ project with a variable number of classes. It offers the choice of a single module build (one file per class) or a traditional multi-module build (two files per class) so that we can benchmark the two against each other.

The generated classes aggregate other classes and call a single function on each aggregated member. A more sophisticated script could generate more complex behaviour, but even this simple method is quite revealing of build and code performance. Table 1 shows the results on Windows using cmake -G "Ninja" (Ninja is a high-performance build tool that runs build steps in parallel).
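To make the shape of the benchmark concrete, the classes below are a hand-written illustration of the pattern just described, with hypothetical names; they are not output from synthcode. Each class aggregates earlier classes and calls one function on each member.

```cpp
// Hypothetical illustration of the benchmark's class pattern (not synthcode
// output): leaf classes do trivial work, and each later class aggregates
// earlier ones and calls work() on every member.
struct class_0 {
  int work() const { return 1; }
};

struct class_1 {
  int work() const { return 2; }
};

struct class_2 {
  class_0 m0;  // aggregated members
  class_1 m1;
  int work() const { return m0.work() + m1.work(); }
};

struct class_3 {
  class_1 m1;
  class_2 m2;
  int work() const { return m1.work() + m2.work(); }
};
```

In a single module build the compiler sees every work() body and can inline the whole chain; in the traditional layout each body would sit in its own .cpp file, so calls between classes cannot be inlined without link-time optimisation.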

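Results are an average of 3 runs.

Classes | Traditional full rebuild | Traditional build (single change) | Traditional exe size (bytes) | Single module build/rebuild | Speedup vs traditional rebuild | Single module exe size (bytes)
--------|--------------------------|-----------------------------------|------------------------------|-----------------------------|--------------------------------|-------------------------------
10      | 6.12s                    | 1.93s                             | 12800                        | 2.14s                       | 2.87x                          | 11776
100     | 41.12s                   | 2.36s                             | 14336                        | 2.06s                       | 19.92x                         | 11776
1000    | 401.66s                  | 4.77s                             | 30208                        | 2.87s                       | 139.88x                        | 11776

Table 1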

Note that the traditional full rebuild time scales roughly linearly with project size, and even the build after a single change grows, because of link time. In GCC debug builds the link time can increase to an hour or more, thanks to the DWARF information replicated in every object file.

The single module build time grows much more slowly with the number of classes, although in a real project thousands of big, complex classes could push it up to a few tens of seconds.

The benchmark also does not use template metaprogramming, which performs functional-style programming in the compiler. Every template metaprogramming operation can consume millions of cycles in the compiler, and each template expansion can create large amounts of debug information. Properly controlled, however, it can be very useful.
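For readers unfamiliar with the technique, a classic minimal template metaprogram is a compile-time Fibonacci function. It shows why such code costs compile time: each value of N instantiates a distinct class type, so evaluating fib<20> makes the compiler create 21 classes, fib<0> through fib<20>.

```cpp
#include <cstdint>

// Compile-time Fibonacci. Every distinct N is a separate class that the
// compiler must instantiate; the values are computed during compilation.
template <int N>
struct fib {
  static constexpr std::uint64_t value = fib<N - 1>::value + fib<N - 2>::value;
};

// Explicit specialisations terminate the recursion.
template <>
struct fib<0> {
  static constexpr std::uint64_t value = 0;
};

template <>
struct fib<1> {
  static constexpr std::uint64_t value = 1;
};

// Checked entirely by the compiler; no code runs for this.
static_assert(fib<20>::value == 6765, "fib<20> should be 6765");
```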

The module compile time is largely dependent on the number of lines pulled into the project by #include. In this case I am including the following with each class which contributes about 110,000 lines to the build:

  #include <vector>
  #include <iostream>
  #include <algorithm>
  #include <cstdint>
  #include <future>
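A back-of-envelope sketch of why this matters, assuming each translation unit in the traditional build reads the whole 110,000-line set once (include guards prevent a second read within a unit) and the single module reads it exactly once. It counts only header lines, ignoring the classes' own code and the link step, so it is an estimate rather than a measurement.

```cpp
#include <cstdint>

// Approximate lines contributed by the standard headers listed above.
constexpr std::uint64_t header_lines = 110000;

// Traditional build: one translation unit per class, each re-reading the
// standard headers from scratch.
constexpr std::uint64_t traditional_header_lines(std::uint64_t classes) {
  return classes * header_lines;
}

// Single module build: one translation unit, so the headers are read once
// regardless of the number of classes.
constexpr std::uint64_t single_module_header_lines(std::uint64_t /*classes*/) {
  return header_lines;
}

static_assert(traditional_header_lines(1000) == 110000000, "110 million lines");
static_assert(single_module_header_lines(1000) == 110000, "110 thousand lines");
```

For the 1000-class case that is 110 million header lines against 110,000, which is consistent with the large rebuild-time gap in Table 1, although compile time is not simply proportional to line count.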

Executable size, a rough indication of optimisation, stays constant in the single module builds (11776 bytes) but grows with the number of classes in the multi-module builds (from 12800 to 30208 bytes). In GCC builds this can lead to multi-gigabyte executables and hour-long link times.


We can see that single module builds can improve the rebuild times of C++ projects by two orders of magnitude or more (almost 140× for 1000 classes in Table 1). Of course, they are unlikely to be widely adopted, as most projects are legacy projects which cannot be updated, and C++ literature abounds with illustrations of traditional programme layouts. For new projects, however, it would be wise to consider them as an option.

Another very useful side effect of fast compile times is that we could follow the JavaScript route and distribute code solely as source. A three-second compile time for a thousand-class application is less than the loading time of even a modest JavaScript application. C++ scripting in web pages, with libclang embedded in the browser, would be a very nice thing, and in many ways more palatable than asm.js.

Heresy? Undoubtedly. New ideas, especially pragmatic ones, take time to become mainstream. However, hundredfold improvements in build speed are a compelling argument.
