§what is Firecracker
Firecracker is a virtualization technology for creating and managing secure, multi-tenant services suited to container-like and serverless scenarios. Firecracker workloads run in virtual machines, not containers. Unlike containers, they benefit from the extra isolation properties provided by hardware virtualization. Like containers, Firecracker VMs, called microVMs, are lightweight, fast to boot, and can be treated like cattle. They combine the flexibility of containers with the security of virtual machines. These little things can start in as little as 125 milliseconds, and a single host can manage thousands of them! Firecracker was developed at Amazon Web Services, primarily for the Lambda and Fargate offerings.
Firecracker uses the Kernel Virtual Machine (KVM) to create and run microVMs. A minimalist design is achieved by removing unnecessary devices and guest-facing functionality. This reduces the memory footprint and attack surface of each individual VM, leading to better utilization and increased security. At minimum, a microVM requires a Linux kernel image and a root file system. Networking can be provided by setting up interfaces manually or with the Container Network Interface (CNI).
Firecracker is a couple of years old. That's pretty young in the technology world, but there are already interesting integrations out there; Kata Containers and WeaveWorks Ignite are the major ones.
There is only so much one can learn by looking at existing tools. The best way is to take something and build another useful thing on top of it. Only this way can one hit the roadblocks others have already cleared. Only this way can one investigate alternative avenues, possibly not considered before. That is why, a few weeks ago, I started working on firebuild. The source code is on GitHub.
With firebuild it is possible to:
- build root file systems directly from Dockerfiles
- tag and version root file systems
- run and manage microVMs on a single host
- define run profiles
The concept of firebuild is to leverage as much of the existing Docker world as possible. There are thousands of Docker images out there. Docker images are awesome because they encapsulate the software we want to run in our workloads, along with its dependencies. Dockerfiles are what Docker images are built from. Dockerfiles are the blueprints of modern infrastructure: there are thousands of them for almost anything one can imagine, and new ones are very easy to write.
§an image is worth more than a thousand words
Ah, but the idea is pretty difficult to visualize with a single image. So, instead, let me walk you through this example of running HashiCorp Consul 1.9.4 on Firecracker. I promise, any questions will be answered further down.
§create a firebuild profile
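A profile captures the common arguments reused by every later command. The sketch below shows what such an invocation might look like; the flag names and paths are illustrative assumptions and may differ between firebuild versions, so consult `firebuild profile-create --help`:

```sh
# Hypothetical sketch: flag names and paths are illustrative assumptions.
sudo firebuild profile-create \
  --profile standard \
  --binary-firecracker /usr/local/bin/firecracker \
  --binary-jailer /usr/local/bin/jailer \
  --chroot-base /srv/jailer \
  --storage-provider directory \
  --storage-provider-property-string "rootfs-storage-root=/firecracker/rootfs" \
  --storage-provider-property-string "kernel-storage-root=/firecracker/vmlinux"
```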
§create a base operating system root file system (baseos)
firebuild uses the Docker metaphor. An image of an application is built FROM a base. An application image can be built FROM alpine:3.13, for example, or FROM debian:buster-slim, or FROM registry.access.redhat.com/ubi8/ubi-minimal:8.3, and dozens of others.
In order to fulfill those semantics, a base operating system image must be built before the application root file system can be created.
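The base build boils down to a single command. The `--dockerfile` argument below appears verbatim later in this post; the rest of the flag set is an assumption:

```sh
# Run from the $GOPATH/src/github.com/combust-labs/firebuild directory;
# the --profile flag name is an assumption.
sudo firebuild baseos \
  --profile standard \
  --dockerfile $(pwd)/baseos/_/alpine/3.12/Dockerfile
```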
§create a root file system of the application (rootfs)
To run an instance of HashiCorp Consul, firebuild requires the Consul application root file system. To build one:
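A sketch of what that build might look like, using the git source syntax described further down. The in-repo Dockerfile path within hashicorp/docker-consul, the URL-to-path separator, and the flag names are all illustrative assumptions:

```sh
# Hypothetical sketch: flag names, tag format, and the in-repo
# Dockerfile path are illustrative assumptions.
sudo firebuild rootfs \
  --profile standard \
  --dockerfile git+https://github.com/hashicorp/docker-consul.git:/0.X/Dockerfile \
  --ssh-user alpine \
  --tag combust-labs/consul:1.9.4
```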
§start the application
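The VM is started from the freshly built root file system with the run command described later in this post. Roughly (flag names other than --allow-mmds, which is mentioned below, are assumptions):

```sh
# Hypothetical sketch: the --from flag name is an illustrative assumption.
sudo firebuild run \
  --profile standard \
  --from combust-labs/consul:1.9.4 \
  --allow-mmds
```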
First, find the VM ID:
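Something along these lines, assuming a subcommand that lists running VMs (the exact subcommand name is an assumption):

```sh
sudo firebuild ls --profile standard
```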
In my case, the value is wcabty1922gloailwrce. I used it to get the IP address of the VM:
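Presumably via an inspect-style subcommand printing the VM metadata, including the interface IP address (the subcommand and flag names are assumptions):

```sh
sudo firebuild inspect --profile standard --vm-id wcabty1922gloailwrce
```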
The command returned 192.168.127.89. I could query Consul via its REST API:
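Consul's HTTP API listens on port 8500 by default; assuming the agent in the guest is listening on the interface address, its status endpoints can be queried directly from the host:

```sh
# Returns the address of the current cluster leader.
curl http://192.168.127.89:8500/v1/status/leader
# Lists the members the agent sees in the cluster.
curl http://192.168.127.89:8500/v1/agent/members
```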
§what the heck happened
I started by creating a firebuild profile. Technically, firebuild does not require one; common arguments may be provided on every execution. The profile exists for two reasons:
- it makes subsequent operations more concise by moving the tedious arguments away
- it provides extra isolation with different chroots, cache directories, and image / kernel catalogs
The directories referenced in the profile must exist before a profile can be created.
In the next step, I built a base operating system root file system. The elephant-in-the-room question is: why does this tool even require that step?
A typical Linux in Docker has many parts removed. For example, there is no init system. Further, different base Docker images often ship completely different sets of tools. All that is for a good reason: Docker images are supposed to be small, must start fast, and limit the potential attack surface by removing what's unnecessary.
firebuild builds Firecracker virtual machines. It does so from Dockerfile blueprints. In order to provide a consistent experience, it requires a more or less functional multi-user Linux installation with components otherwise hidden in the Docker or OCI runtime. These base Linux installations are built from firebuild-provided Dockerfiles; the --dockerfile $(pwd)/baseos/_/alpine/3.12/Dockerfile is a base Alpine 3.12. All the commands above were executed from the $GOPATH/src/github.com/combust-labs/firebuild directory, hence the use of $(pwd) in the commands.
firebuild uses Docker to build the base operating system root file system by:
- building a Docker image from the provided Dockerfile
- starting a container from the newly built image
- exporting the root file system of the container to an ext4 file on the host using the Docker API
- removing the container and the image
- persisting the built file in the storage provider and namespacing it; the example above results in the root file system stored in
- persisting the build metadata next to the root file system file; the above example gives
The firebuild-provided Dockerfile is based on an upstream alpine:3.12 from Docker Hub.
The primary reason for following this path is to enable building Firecracker VMs from upstream Dockerfiles as often as possible. Other tools out there enable converting a Docker container into a rootfs file, but to achieve the full VM experience, the Docker container has to be launched from a hand-crafted Dockerfile, or extra packages have to be installed on the running container before the export. Dockerfiles are fully auditable, but these extra steps are not, and they often differ between containers. It might be difficult to track how the rootfs was built, and some benefits of using a blueprint could be lost.
Step 2 of the example builds Consul directly from the official HashiCorp Docker images GitHub repository. The application root file system was built using the rootfs command.
Note: I refer to the application root file system as rootfs. This is a bit confusing at first, because the result of the baseos command is technically also a root file system. However, to mentally distinguish one from the other, I refer to the base OS using the term baseos, and to an application image as a rootfs. This may change in the future.
The rootfs command does much more work than the baseos command.
It starts by fetching a Dockerfile from a source given via the --dockerfile argument. The source can be one of:
- a git+http(s):// style URL pointing at a git repository (does not have to be GitHub)
- an https:// URL; be careful here: There Will Be Dragons (read more)
- a local file
- an inline Dockerfile
- a git+ssh:// URL with a Dockerfile path appended via
The most convenient options are a local file system build or a git repository build. If a git repository is used, firebuild will clone the complete repository to a temporary directory and treat the build further as a local file system build. Once the sources are on disk, firebuild loads and parses the Dockerfile. This part is preliminary and will change in favor of an unattended bootstrap without the SSH requirement: next, a build-time VM is started, firebuild connects to it via SSH, and runs all commands from the Dockerfile against that VM.
Resources referenced with COPY commands are treated likewise; remote resources are supported.
firebuild does its best to properly reflect any SHELL conditions, and it supports --chown flags for COPY commands.
firebuild supports multi-stage builds. It will build any stages declared with FROM ... as as regular Docker images and extract resources from the stage into the main build when COPY --from= is found. For example, it's perfectly fine to build a Kafka Proxy root file system from:
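A minimal sketch of such a multi-stage Dockerfile, assuming the Go-based grepplabs/kafka-proxy project; the build command and output path inside that repository are illustrative assumptions:

```dockerfile
# Stage 1: firebuild builds this stage as a regular Docker image.
FROM golang:1.16 as builder
RUN git clone https://github.com/grepplabs/kafka-proxy.git /build
WORKDIR /build
# Assumption: the project provides such a make target producing ./kafka-proxy.
RUN make build

# Stage 2: the actual rootfs; COPY --from= pulls the artifact out of the builder stage.
FROM alpine:3.12
COPY --from=builder /build/kafka-proxy /usr/local/bin/kafka-proxy
ENTRYPOINT ["/usr/local/bin/kafka-proxy"]
```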
Some Dockerfile statements are not supported, among them STOPSIGNAL (although that one will be supported at a later stage).
Once all of that is finished, the build VM is stopped and cleaned up, and the resulting root file system is persisted in the storage provider. A metadata file is stored next to the root file system. Currently, only the directory-based storage provider is available.
Finally, the resulting application is launched with the run command. The run command uses an unattended, cloud-init-like mechanism. The metadata of the rootfs is combined, and a guest-facing version is put in MMDS, the Firecracker machine metadata service. MMDS provides an HTTP API available to both the host and the guest. By default, if the guest was started with the --allow-mmds flag, it can reach that API via the 169.254.169.254 IP address. firebuild uses MMDS by default for all guests, but this can be disabled. The guest-facing metadata contains a bunch of information required to bootstrap the VM in a cloud-init style. These are fairly short, so let's look at an example:
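The real schema lives in the firebuild and vminit sources; the JSON below is only an illustrative mock of the kinds of fields described here (all field names are assumptions):

```json
{
  "vm": { "vm-id": "wcabty1922gloailwrce", "vcpu": 1, "mem": 128 },
  "drives": { "vdb": { "mount-point": "/data" } },
  "network": {
    "interfaces": { "eth0": { "ip": "192.168.127.89", "gateway": "192.168.127.1" } }
  },
  "entrypoint": { "cmd": ["docker-entrypoint.sh", "agent", "-dev"], "env": {} },
  "users": { "alpine": { "ssh-keys": ["ssh-rsa AAAA..."] } }
}
```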
The metadata contains information about attached drives, network interfaces, simple machine data, entrypoint info, and the user's SSH keys, if --ssh-user arguments were provided. The component responsible for bootstrapping the VM from this data is called vminit and can be found in this GitHub repository. The compiled binary is baked into the baseos (suboptimal, but it's a first iteration) and invoked as a system service on VM start.
vminit does the following:
- update the /etc/hosts file if the VM has a network interface, and make sure the VM resolves itself via the configured hostname on the interface IP address
- set /etc/hostname to the configured hostname
- create an environment variables file /etc/profile.d/run-env.sh for any variables passed via --env-file flags of the run command
- if users contains a user entry with SSH keys, write those SSH keys to the respective authorized_keys file to enable SSH access
- write the /usr/bin/firebuild-entrypoint.sh program responsible for invoking the entrypoint from MMDS data
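A user entry as referenced in the SSH step above might look roughly like this excerpt; the field names are illustrative assumptions, not the exact vminit schema:

```json
"users": {
  "alpine": {
    "ssh-keys": ["ssh-rsa AAAA... me@workstation"]
  }
}
```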
When the machine starts, vminit looks for /usr/bin/firebuild-entrypoint.sh and, if one is found, executes it. Fingers crossed, things went well and the application starts automatically.
That was a high level overview of the process.
§other useful VM related commands
List running VMs:
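For example (the subcommand name is an assumption):

```sh
sudo firebuild ls --profile standard
```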
Inspect the metadata of a running VM:
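Along these lines, with the VM ID from the earlier walkthrough (subcommand and flag names are assumptions):

```sh
sudo firebuild inspect --profile standard --vm-id wcabty1922gloailwrce
```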
Terminate a running VM:
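Roughly (the kill subcommand name is an assumption):

```sh
sudo firebuild kill --profile standard --vm-id wcabty1922gloailwrce
```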
Firecracker VMs will stop when a reboot command is issued in the guest. I call these stops unclean, meaning that they will leave a bunch of VM-related directories on disk:
- the jail directory
- the run cache directory
- the CNI cache for the VM interface and a veth pair
To mass-clean all these for all exited VMs, run:
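Something like the following, assuming a purge-style subcommand that removes the leftovers listed above:

```sh
sudo firebuild purge --profile standard
```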
Inspect a profile:
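Presumably via a profile-inspect-style subcommand (the exact name is an assumption):

```sh
sudo firebuild profile-inspect --profile standard
```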
Profiles may be updated by issuing subsequent profile-create commands with the name of an existing profile.
§what’s coming next
These are still early stages for firebuild. There are many things to improve:
- tests, tests, tests, …, end to end tests
- remove the requirement to have SSH access during the rootfs build and move to the MMDS / vminit build
- add support for building directly from Docker images for special cases where the Dockerfile might not be available or is difficult to handle; an example is the Jaeger Docker image, where the Dockerfile does not incorporate the binary artifact build
- add a command to build a Linux kernel image directly from the tool
- configure nsswitch.conf on the guest
- add service catalog support for service discovery
- add support for additional disks
- a VM management API
- an event bus / hook to be able to react to events originating in firebuild
- enable splitting rootfs build and run related operations via remote build and run operators
- provide a remote registry type of system to host rootfs and kernel files externally
- add networking tools to create CNI bridge and overlay networks and expose VMs outside of the host
And probably many, many more as time goes by. I'll be writing more as firebuild evolves.
Thanks for reading. Stay safe.