Introducing firebuild

Manage Firecracker root file systems and VMMs

§what is Firecracker

Firecracker is a virtualization technology for creating and managing secure, multi-tenant services suited for container-like and serverless scenarios. Firecracker workloads run in virtual machines, not containers. Unlike containers, they benefit from the extra isolation provided by hardware virtualization. Similar to containers, Firecracker VMs—microVMs—are lightweight and fast to boot. Like containers, they can be treated like cattle. They combine the flexibility of containers with the security of virtual machines. These little things can be started in as little as 125 milliseconds, and a single host can manage thousands of them! Firecracker was developed at Amazon Web Services, primarily for the Lambda and Fargate offerings.

Firecracker uses the Kernel-based Virtual Machine (KVM) to create and run microVMs. A minimalist design is achieved by removing unnecessary devices and guest-facing functionality. This reduces the memory footprint and attack surface of each individual VM, leading to better utilization and increased security. At minimum, a microVM requires a Linux kernel image and a root file system. Networking can be provided by setting up interfaces manually or with the Container Network Interface (CNI).

Firecracker is a couple of years old. That is pretty young in the technology world, but there are already interesting integrations out there: Kata Containers and WeaveWorks Ignite are the major ones.

§firebuild

There is only so much one can learn by looking at existing tools. The best way is to take something and build another useful thing on top of it. Only this way can one hit the roadblocks others have already cleared. Only this way can one investigate alternative avenues, possibly not considered before. That is why, a few weeks ago, I started working on firebuild. The source code is on GitHub[1].

With firebuild it is possible to:

  • build root file systems directly from Dockerfiles
  • tag and version root file systems
  • run and manage microVMs on a single host
  • define run profiles

The concept of firebuild is to leverage as much of the existing Docker world as possible. There are thousands of Docker images out there. Docker images are awesome because they encapsulate the software we want to run in our workloads, along with its dependencies. Dockerfiles are what Docker images are built from. Dockerfiles are the blueprints of modern infrastructure. There are thousands of them for almost anything one can imagine, and new ones are very easy to write.

§an image is worth more than a thousand words

Ah, but the idea is pretty difficult to visualize with a single image. So, instead, let me walk you through this example of running HashiCorp Consul 1.9.4 on Firecracker. I promise, any questions will be answered further down.

Before going all in, some prerequisites[2].
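
Among other things, the commands below expect CNI networks named machine-builds and machines to exist on the host. A network definition could look roughly like the sketch below; the configuration directory, subnet and plugin list are assumptions on my part, the linked prerequisites are authoritative:

# sketch only: adjust the conf directory, subnet and plugins to your setup
sudo mkdir -p /etc/cni/conf.d
sudo tee /etc/cni/conf.d/machines.conflist > /dev/null <<'EOF'
{
  "name": "machines",
  "cniVersion": "0.4.0",
  "plugins": [
    {
      "type": "ptp",
      "ipMasq": true,
      "ipam": {
        "type": "host-local",
        "subnet": "192.168.127.0/24",
        "resolvConf": "/etc/resolv.conf"
      }
    },
    { "type": "firewall" },
    { "type": "tc-redirect-tap" }
  ]
}
EOF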

§create a firebuild profile

sudo $GOPATH/bin/firebuild profile-create \
	--profile=standard \
	--binary-firecracker=$(readlink /usr/bin/firecracker) \
	--binary-jailer=$(readlink /usr/bin/jailer) \
	--chroot-base=/fc/jail \
	--run-cache=/fc/cache \
	--storage-provider=directory \
	--storage-provider-property-string="rootfs-storage-root=/fc/rootfs" \
	--storage-provider-property-string="kernel-storage-root=/fc/vmlinux"

§create a base operating system root file system (baseos)

firebuild uses the Docker metaphor. An image of an application is built FROM a base. An application image can be built FROM alpine:3.13, for example. Or FROM debian:buster-slim, or FROM registry.access.redhat.com/ubi8/ubi-minimal:8.3, and dozens of others.

In order to fulfill those semantics, a base operating system image must be built before the application root file system can be created.

sudo $GOPATH/bin/firebuild baseos \
    --profile=standard \
    --dockerfile $(pwd)/baseos/_/alpine/3.12/Dockerfile

§create a root file system of the application (rootfs)

To run an instance of HashiCorp Consul, firebuild requires the Consul application root file system. To build one:

sudo $GOPATH/bin/firebuild rootfs \
    --profile=standard \
    --dockerfile=git+https://github.com/hashicorp/docker-consul.git:/0.X/Dockerfile \
    --cni-network-name=machine-builds \
    --ssh-user=alpine \
    --vmlinux-id=vmlinux-v5.8 \
    --tag=combust-labs/consul:1.9.4

§start the application

sudo $GOPATH/bin/firebuild run \
    --profile=standard \
    --from=combust-labs/consul:1.9.4 \
    --cni-network-name=machines \
    --vmlinux-id=vmlinux-v5.8

§query Consul

First, find the VM ID:

sudo $GOPATH/bin/firebuild ls \
    --profile=standard \
    --log-as-json 2>&1 | jq '.id' -r

In my case, the value is wcabty1922gloailwrce. I used it to get the IP address of the VM:

sudo $GOPATH/bin/firebuild inspect \
    --profile=standard \
    --vmm-id=wcabty1922gloailwrce | jq '.NetworkInterfaces[0].StaticConfiguration.IPConfiguration.IP' -r

The command returned 192.168.127.89. I could query Consul via REST API:

curl http://192.168.127.89:8500/v1/status/leader
"127.0.0.1:8300"

§what the heck happened

I started by creating a firebuild profile. Technically, firebuild does not require one; common arguments may be provided on every execution instead. The profile exists for two reasons:

  • it makes subsequent operations more concise by moving the tedious arguments out of the way
  • it provides extra isolation with different chroots, cache directories, and image / kernel catalogs

The directories referenced in the profile must exist before a profile can be created.
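
With the example profile above, that means something along the lines of:

sudo mkdir -p /fc/jail /fc/cache /fc/rootfs /fc/vmlinux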

In the next step, I built a base operating system root file system. The elephant-in-the-room question is:

Why does this tool even require that step?

A typical Linux installation in Docker has many parts removed. For example, there is no init system. Further, different base Docker images often have completely different sets of tools available. All of that is for a good reason: Docker images are supposed to be small, must start fast, and should limit the potential attack surface by removing what's unnecessary.
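
This is easy to see for yourself: a container has no init process at all, the command you run is PID 1 (a quick illustration, assuming a local Docker installation):

docker run --rm alpine:3.12 ps
# the only process listed is ps itself, running as PID 1 - there is no init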

firebuild builds Firecracker virtual machines. It does so from Dockerfile blueprints.

In order to provide a consistent experience, it requires a more or less functional multi-user Linux installation with components otherwise hidden in the Docker or OCI runtime. These base Linux installations are built from firebuild-provided Dockerfiles; the --dockerfile $(pwd)/baseos/_/alpine/3.12/Dockerfile in the example above is a base Alpine 3.12. All the commands above were executed from the $GOPATH/src/github.com/combust-labs/firebuild directory, hence the use of $(pwd) in the baseos build.

firebuild uses Docker to build the base operating system root file system (a rough manual equivalent is sketched after the list) by:

  • building a Docker image from the provided Dockerfile
  • starting a container from the newly built image
  • exporting the root file system of the container to an ext4 file on the host using the Docker exec API
  • removing the container and the image
  • persisting the built file in the storage provider and namespacing it; the example above results in the root file system stored in /fc/rootfs/_/alpine/3.12/rootfs
  • persisting the build metadata next to the root file system file; the example above gives /fc/rootfs/_/alpine/3.12/metadata.json
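
To give an intuition of what these steps mean, the manual equivalent looks roughly like this (a sketch only: firebuild drives this through the Docker API rather than the shell, and the image tag, size and paths here are made up):

docker build -t baseos-build:alpine-3.12 -f baseos/_/alpine/3.12/Dockerfile .
CID=$(docker create baseos-build:alpine-3.12)

# create and format an empty ext4 file, then unpack the container file system into it
dd if=/dev/zero of=rootfs bs=1M count=500
mkfs.ext4 -F rootfs
mkdir -p /tmp/rootfs-mount
sudo mount -o loop rootfs /tmp/rootfs-mount
docker export "$CID" | sudo tar -x -C /tmp/rootfs-mount
sudo umount /tmp/rootfs-mount

# clean up the intermediate container and image
docker rm "$CID"
docker rmi baseos-build:alpine-3.12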

This custom, firebuild-provided Dockerfile is based on the upstream alpine:3.12 image from Docker Hub.

The primary reason for following this path is to enable building Firecracker VMs from upstream Dockerfiles as often as possible. Other tools out there can convert a Docker container into a rootfs file, but to achieve the full VM experience, the Docker container has to be launched from a hand-crafted Dockerfile, or extra packages have to be installed in the running container before the export. Dockerfiles are fully auditable but these extra steps are not. The steps often differ between containers. It might be difficult to track how the rootfs was built, and some of the benefits of using a blueprint could be lost.

Step 2 of the example builds Consul directly from the official HashiCorp Docker images GitHub repository. The application root file system was built using the rootfs command.

Note: I refer to the application root file system as rootfs. This is a bit confusing at first, because the result of the baseos command is technically also a root file system. However, to mentally distinguish one from the other, I refer to the base OS using the term baseos, and to an application as a rootfs. This may change in the future.

The rootfs command does much more work than the baseos command.

It starts by fetching a Dockerfile from a source given via the --dockerfile argument. The source can be one of the following (concrete examples follow the list):

  • a git+http(s):// style URL pointing at a git repository (does not have to be GitHub)
  • an http:// or https:// URL, be careful here: There Will Be Dragons (read more[3])
  • a local file
  • an inline Dockerfile
  • a standard ssh://, git:// or git+ssh:// URL with a Dockerfile path appended via :/path/to/Dockerfile
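
For reference, the --dockerfile values used in this post cover the first two forms; the last line shows how a git ref can be pinned:

# local file (the baseos build above)
--dockerfile=$(pwd)/baseos/_/alpine/3.12/Dockerfile

# git repository over https, Dockerfile path appended after ':'
--dockerfile=git+https://github.com/hashicorp/docker-consul.git:/0.X/Dockerfile

# the same form with an explicit ref pinned after '#'
--dockerfile=git+https://github.com/grepplabs/kafka-proxy.git:/Dockerfile#v0.2.8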

The most convenient options are a local file system build or a git repository. If a git repository is used, firebuild clones the complete repository to a temporary directory and treats the build further as a local file system build. Once the sources are on disk, firebuild loads and parses the Dockerfile. Next, a build-time VM is started; firebuild connects to it via SSH and runs all the commands from the Dockerfile against that VM. This part is preliminary and will change in favor of an unattended bootstrap without the SSH requirement.

Resources referenced with the ADD and COPY commands are treated likewise and are supported; remote resources are supported as well. firebuild does its best to properly reflect any WORKDIR, USER and SHELL settings. It supports the --chown flag for ADD and COPY.

What’s more, firebuild supports multi-stage builds. It builds any stages declared with FROM ... AS ... as regular Docker images and extracts resources from those stages into the main build wherever a COPY --from= is found. For example, it’s perfectly fine to build a Kafka Proxy root file system from:

sudo $GOPATH/bin/firebuild rootfs \
    --profile=standard \
    --dockerfile=git+https://github.com/grepplabs/kafka-proxy.git:/Dockerfile#v0.2.8 \
    --cni-network-name=machine-builds \
    --ssh-user=alpine \
    --vmlinux-id=vmlinux-v5.8 \
    --tag=combust-labs/kafka-proxy:0.2.8

The Dockerfile statements which are not supported are: ONBUILD, HEALTHCHECK and STOPSIGNAL (although the last one will be supported at a later stage).

Once all of that is finished, the build VM is stopped and cleaned up, and the resulting root file system is persisted in the storage provider. A metadata file is stored next to the root file system. Currently, only the directory-based storage provider is available.

Finally, the resulting application is launched with the run command. The run command uses an unattended, cloud-init-like mechanism. The metadata of the baseos and the rootfs is combined, and a guest-facing version is put in MMDS (the Firecracker machine metadata service). MMDS provides an HTTP API available to both the host and the guest. If the guest was started with the --allow-mmds flag, it can reach that API via the 169.254.169.254 IP address; firebuild enables MMDS by default for all guests, but this can be disabled. The guest-facing metadata contains a bunch of information required to bootstrap the VM in a cloud-init style. It is fairly short, so let’s look at an example:

{
  "latest": {
    "meta-data": {
      "Drives": {
        "1": {
          "DriveID": "1",
          "IsReadOnly": "false",
          "IsRootDevice": "true",
          "Partuuid": "",
          "PathOnHost": "rootfs"
        }
      },
      "EntrypointJSON": "{\"Cmd\":[\"agent\",\"-dev\",\"-client\",\"0.0.0.0\"],\"EntryPoint\":[\"docker-entrypoint.sh\"],\"Env\":{\"HASHICORP_RELEASES\":\"https://releases.hashicorp.com\"},\"Shell\":[\"/bin/sh\",\"-c\"],\"User\":\"0:0\",\"Workdir\":\"/\"}",
      "Env": {},
      "ImageTag": "combust-labs/consul:1.9.4",
      "LocalHostname": "sharp-mirzakhani",
      "Machine": {
        "CPU": "1",
        "CPUTemplate": "",
        "HTEnabled": "false",
        "KernelArgs": "console=ttyS0 noapic reboot=k panic=1 pci=off nomodules rw",
        "Mem": "128",
        "VMLinux": "vmlinux-v5.8"
      },
      "Network": {
        "CniNetworkName": "machines",
        "Interfaces": {
          "b6:16:f2:3d:29:cf": {
            "Gateway": "192.168.127.1",
            "HostDeviceName": "tap0",
            "IfName": "",
            "IP": "192.168.127.89",
            "IPAddr": "192.168.127.89/24",
            "IPMask": "ffffff00",
            "IPNet": "ip+net",
            "NameServers": ""
          }
        }
      },
      "Users": {},
      "VMMID": "wcabty1922gloailwrce"
    }
  }
}
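
This document can be read back from MMDS: the host talks to it through the Firecracker API socket, while inside the guest it is a plain HTTP call. A quick sketch from the guest side; depending on the Firecracker version the response is JSON or IMDS-style text, so the Accept header may or may not matter:

# inside the guest, assuming MMDS was not disabled (it is on by default in firebuild)
curl -s -H "Accept: application/json" http://169.254.169.254/latest/meta-data/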

The metadata contains information about the attached drives, network interfaces, simple machine data, entrypoint info and the user’s SSH keys, if the --identity-file and --ssh-user arguments were provided. The component responsible for bootstrapping the VM from this data is called vminit and can be found in this GitHub repository[4]. The compiled binary is baked into the baseos (suboptimal, but it’s a first iteration) and invoked as a system service on VM start.
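
For example, the run command from earlier could be extended so that the Users section and the guest environment end up populated. The flags are the ones mentioned in this post; the key path and the variable are made-up, illustrative values, and I am assuming the usual NAME=VALUE convention for --env:

sudo $GOPATH/bin/firebuild run \
    --profile=standard \
    --from=combust-labs/consul:1.9.4 \
    --cni-network-name=machines \
    --vmlinux-id=vmlinux-v5.8 \
    --ssh-user=alpine \
    --identity-file=$HOME/.ssh/id_rsa.pub \
    --env=CONSUL_BIND_INTERFACE=eth0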

Currently, vminit does the following:

  • update the /etc/hosts file if the VM has a network interface and make sure the VM resolves its configured hostname to the interface IP address
  • update /etc/hostname to the configured hostname
  • create an /etc/profile.d/run-env.sh environment variables file for any variables passed via the --env and --env-file flags of the run command
  • when Users contains a user entry with SSH keys, write those keys to the respective authorized_keys file to enable SSH access; an example of a user entry:
"Users": {
  "alpine": {
    "SSHKeys": "ssh-rsa ... \nssh-rsa ...\n"
  }
}
  • write the /usr/bin/firebuild-entrypoint.sh program responsible for invoking the entrypoint from MMDS data

When the machine starts, vminit looks for /usr/bin/firebuild-entrypoint.sh and, if it is found, executes it. Fingers crossed, everything went well and the application starts automatically.
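
To make this concrete, the EntrypointJSON from the metadata example above would, in spirit, expand to something along these lines (an illustration of the intent only, not the actual script firebuild generates):

#!/bin/sh
# illustration only, reconstructed from the EntrypointJSON fields shown earlier:
# Workdir "/", Env HASHICORP_RELEASES, Shell ["/bin/sh","-c"], User "0:0" (root)
cd /
export HASHICORP_RELEASES=https://releases.hashicorp.com
exec docker-entrypoint.sh agent -dev -client 0.0.0.0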

That was a high level overview of the process.

List running VMs:

sudo firebuild ls --profile=standard

Inspect the metadata of a running VM:

sudo firebuild inspect --profile=standard --vmm-id=...

Terminate a running VM:

sudo firebuild kill --profile=standard --vmm-id=...

§unclean shutdowns

Firecracker VMs will stop when a reboot command is issued in the guest. I call these shutdowns unclean, meaning that they leave a bunch of VM-related directories on disk:

  • the jail directory
  • the run cache directory
  • the CNI cache for the VM interface and a veth pair

To mass-clean all these for all exited VMs, run:

sudo firebuild purge --profile=standard

§profile commands

List profiles:

sudo firebuild profile-ls

Inspect a profile:

sudo firebuild profile-inspect --profile=...

Profiles may be updated by issuing subsequent profile-create commands with the name of an existing profile.

§what’s coming next

These are still early stages for firebuild. There are many things to improve.

§short term

  • tests, tests, tests, …, end to end tests
  • remove the requirement to have SSH access during the rootfs build and move to an MMDS / vminit based build
  • add support for building directly from Docker images for special cases where the Dockerfile might not be available or is difficult to handle; an example is the Jaeger Docker image, where the Dockerfile does not incorporate the binary artifact build
  • add a command to build a Linux kernel image directly from the tool
  • manage resolv.conf and nsswitch.conf on the guest

§mid term

  • add service catalog support for service discovery
  • add support for additional disks
  • a VM management API
  • an event bus / hook to be able to react to events originating in firebuild

§long term

  • enable splitting rootfs build and run related operations via remote build and run operators
  • provide a remote registry type of system to host rootfs and kernel files externally
  • add networking tools to create CNI bridge and overlay networks and expose VMs outside of the host

And probably many, many more as time goes by. I’ll be writing more as firebuild develops.

Thanks for reading. Stay safe.