§what is Firecracker
Firecracker is a virtualization technology for creating and managing secure, multi-tenant services suited for container-like and serverless scenarios. Firecracker workloads run in virtual machines, not containers. Unlike containers, they benefit from the extra isolation provided by hardware virtualization. Similar to containers, Firecracker VMs—microVMs—are lightweight and fast to boot. Like containers, they can be treated like cattle. They combine the flexibility of containers with the security of virtual machines. These little things can be started in as little as 125 milliseconds and a single host can manage thousands of them! Firecracker was developed at Amazon Web Services primarily for its Lambda and Fargate offerings.
Firecracker uses the Kernel-based Virtual Machine (KVM) to create and run microVMs. A minimalist design is achieved by removing unnecessary devices and guest-facing functionality. This reduces the memory footprint and attack surface of each individual VM, leading to better utilization and increased security. At minimum, a microVM requires a Linux kernel image and a root file system. Networking can be provided by setting up interfaces manually or with the Container Network Interface (CNI).
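To make the "kernel image plus root file system" point concrete, here is a sketch of booting a microVM by hand against the Firecracker API socket; the kernel and root file system paths are placeholders:

```sh
# Start the Firecracker process with an API socket, then configure and boot
# a microVM over that socket. ./vmlinux and ./rootfs.ext4 are placeholder paths.
firecracker --api-sock /tmp/firecracker.sock &

curl --unix-socket /tmp/firecracker.sock -X PUT 'http://localhost/boot-source' \
  -H 'Content-Type: application/json' \
  -d '{"kernel_image_path": "./vmlinux", "boot_args": "console=ttyS0 reboot=k panic=1 pci=off"}'

curl --unix-socket /tmp/firecracker.sock -X PUT 'http://localhost/drives/rootfs' \
  -H 'Content-Type: application/json' \
  -d '{"drive_id": "rootfs", "path_on_host": "./rootfs.ext4", "is_root_device": true, "is_read_only": false}'

curl --unix-socket /tmp/firecracker.sock -X PUT 'http://localhost/actions' \
  -H 'Content-Type: application/json' \
  -d '{"action_type": "InstanceStart"}'
```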
Firecracker is a couple of years old. Pretty young in the technology world but there are already interesting integrations out there. Kata Containers and WeaveWorks Ignite are the major ones.
§firebuild
There is only so much one can learn by looking at existing tools. The best way is to take something and build another useful thing on top of it. Only then does one hit the roadblocks others have already cleared. Only then can one investigate alternative avenues, possibly not considered before. That is why, a few weeks ago, I started working on firebuild. The source code is on GitHub[1].
With firebuild it is possible to:
- build root file systems directly from Dockerfiles
- tag and version root file systems
- run and manage microVMs on a single host
- define run profiles
The concept of firebuild is to leverage as much of the existing Docker world as possible. There are thousands of Docker images out there. Docker images are awesome because they encapsulate the software we want to run in our workloads, together with its dependencies. Dockerfiles are what Docker images are built from. Dockerfiles are the blueprints of modern infrastructure. There are thousands of them for almost anything one can imagine, and new ones are very easy to write.
§an image is worth more than a thousand words
Ah, but the idea is pretty difficult to visualize with a single image. So, instead, let me walk you through this example of running HashiCorp Consul 1.9.4 on Firecracker. I promise, any questions will be answered further down.
Before going all in, some prerequisites[2].
§create a firebuild profile
|
|
§create a base operating system root file system (baseos)
firebuild uses the Docker metaphor. An image of an application is built FROM a base. An application image can be built FROM alpine:3.13, for example. Or FROM debian:buster-slim, or FROM registry.access.redhat.com/ubi8/ubi-minimal:8.3, and dozens of others.
In order to fulfill those semantics, a base operating system image must be built before the application root file system can be created.
|
|
§create a root file system of the application (rootfs)
To run an instance of HashiCorp Consul, firebuild requires the Consul application root file system. To build one:
|
|
§start the application
|
|
§query Consul
First, find the VM ID:
|
|
In my case, the value is wcabty1922gloailwrce. I used it to get the IP address of the VM:
|
|
The command returned 192.168.127.89. I could query Consul via its REST API:
|
|
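As an illustration, assuming Consul's HTTP API listens on its default port 8500, such a query could look like the sketch below (not necessarily the exact request I made):

```sh
# Ask the Consul agent on the microVM who the current leader is.
# 8500 is Consul's default HTTP API port.
curl http://192.168.127.89:8500/v1/status/leader
```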
§what the heck happened
I started by creating a firebuild profile. Technically, firebuild does not require one; common arguments may be provided on every execution. The profile exists for two reasons:
- it makes subsequent operations more concise by moving the tedious arguments away
- it provides extra isolation with different chroots, cache directories, and image / kernel catalogs
The directories referenced in the profile must exist before a profile can be created.
In the next step, I built a base operating system root file system. The elephant-in-the-room question is:
Why does this tool even require that step?
A typical Linux in Docker has many parts removed. For example, there is no init system. Further, different base Docker images often have completely different sets of tools available. All that is for a good reason: Docker images are supposed to be small, must start fast and limit the potential attack surface by removing what’s unnecessary.
firebuild builds Firecracker virtual machines. It does so from Dockerfile blueprints. In order to provide a consistent experience, it requires a more or less functional multi-user Linux installation with components otherwise hidden in the Docker or OCI runtime. These base Linux installations are built from firebuild-provided Dockerfiles; the --dockerfile $(pwd)/baseos/_/alpine/3.12/Dockerfile above points at a base Alpine 3.12. All the commands above were executed from the $GOPATH/src/github.com/combust-labs/firebuild directory, hence the use of $(pwd) in the baseos build.
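I am not reproducing the actual Dockerfile here, but conceptually it adds just a handful of things on top of the upstream image; a hypothetical sketch of the kind of additions involved (package choices are illustrative, not the exact contents of the firebuild Dockerfile):

```dockerfile
# Hypothetical illustration only: turn a minimal Docker base image into
# something closer to a bootable multi-user Linux by adding an init system,
# an SSH daemon and basic userspace tools that a container normally lacks.
FROM alpine:3.12
RUN apk add --no-cache openrc openssh sudo util-linux \
 && rc-update add sshd default
```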
firebuild uses Docker to build the base operating system root file system (a manual equivalent of this flow is sketched after the list) by:
- building a Docker image from the provided Dockerfile
- starting a container from the newly built image
- exporting the root file system of the container to an ext4 file on the host using the Docker API exec
- removing the container and the image
- persisting the built file in the storage provider and namespacing it; the example above results in the root file system stored in /fc/rootfs/_/alpine/3.12/rootfs
- persisting the build metadata next to the root file system file; the above example gives /fc/rootfs/_/alpine/3.12/metadata.json
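To make this concrete, here is roughly what a manual equivalent of that sequence looks like with plain Docker and standard Linux tools; the image name, file size and paths are made up, and firebuild's actual export mechanism (the Docker API exec) differs from the docker export shortcut used here:

```sh
# Manual equivalent of the baseos flow, for illustration; firebuild automates this.
docker build -t baseos-alpine:3.12 -f baseos/_/alpine/3.12/Dockerfile .
CID=$(docker create baseos-alpine:3.12)

# Create and mount an empty ext4 file, then unpack the container's root file system into it.
dd if=/dev/zero of=rootfs.ext4 bs=1M count=500
mkfs.ext4 -F rootfs.ext4
mkdir -p /tmp/baseos-rootfs
sudo mount -o loop rootfs.ext4 /tmp/baseos-rootfs
docker export "$CID" | sudo tar -x -C /tmp/baseos-rootfs
sudo umount /tmp/baseos-rootfs

# Clean up the intermediate container and image.
docker rm "$CID"
docker rmi baseos-alpine:3.12
```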
This custom, firebuild-provided Dockerfile is based on the upstream alpine:3.12 from Docker Hub.
The primary reason for following this path is to enable building Firecracker VMs from upstream Dockerfiles as often as possible. Other tools out there enable converting a Docker container into a rootfs file, but to achieve that full VM experience, the Docker container has to be launched from a hand-crafted Dockerfile, or extra packages have to be installed on the running container before the export. Dockerfiles are fully auditable but these extra steps are not. The steps often differ between containers. It might be difficult to track how the rootfs was built, and some of the benefits of using a blueprint could be lost.
Step 2 of the example builds Consul directly from the official HashiCorp Docker images GitHub repository. The application root file system was built using the rootfs command.
Note: I refer to the application root file system as rootfs. It is a bit confusing at first because the result of the baseos command is technically also a rootfs. However, to mentally distinguish one from the other, I refer to the base OS using the term baseos and to an application as a rootfs. This may change in the future.
The rootfs command does much more work than the baseos command.
It starts by fetching a Dockerfile from a source given via the --dockerfile argument. The source can be one of:
- a git+http(s):// style URL pointing at a git repository (it does not have to be GitHub)
- a http:// or https:// URL, but be careful here: There Will Be Dragons (read more[3])
- a local file
- an inline Dockerfile
- a standard ssh://, git:// or git+ssh:// URL with a Dockerfile path appended via :/path/to/Dockerfile
The most convenient options are a local file system build or a git repository. If a git repository is used, firebuild will clone the complete repository to a temporary directory and treat the build further as a local file system build. Once the sources are on disk, firebuild loads and parses the Dockerfile. The next part is preliminary and will change in favor of an unattended bootstrap without the SSH requirement: a build-time VM is started, firebuild connects to it via SSH and runs all commands from the Dockerfile against that VM.
Resources referenced with ADD and COPY commands are supported, including remote resources. firebuild does its best to properly reflect any WORKDIR, USER and SHELL settings. It supports the --chown flags for ADD and COPY.
What’s more, firebuild supports multi-stage builds. It will build any stage declared with FROM ... as as a regular Docker image and extract resources from that stage into the main build when a COPY --from= is found. For example, it’s perfectly fine to build a Kafka Proxy root file system from:
|
|
The Dockerfile statements which are not supported are: ONBUILD, HEALTHCHECK and STOPSIGNAL (although the last one will be supported at a later stage).
Once all of that is finished, the build VM will be stopped, cleaned up and the resulting root file system will be persisted in the storage provider. A metadata file is stored next to the root file system. Currently, only the directory based storage provider is available.
Finally, the resulting application is launched with the run command. The run command uses an unattended, cloud-init-like mechanism. The metadata of the baseos and rootfs is combined and a guest-facing version is put in MMDS (the Firecracker microVM metadata service). MMDS provides an HTTP API available to both the host and the guest. By default, if the guest was started with the --allow-mmds flag, it can reach that API via the 169.254.169.254 IP address. firebuild uses MMDS by default for all guests but this can be disabled. The guest-facing metadata contains a bunch of information required to bootstrap the VM in a cloud-init style. It is fairly short so let’s look at an example:
|
|
The metadata contains information about attached drives, network interfaces, simple machine data, entrypoint info and the user’s SSH keys, if the --identity-file and --ssh-user arguments were provided. The component responsible for bootstrapping the VM from this data is called vminit and can be found in this GitHub repository[4]. The compiled binary is baked into the baseos (suboptimal, but it’s a first iteration) and invoked as a system service on VM start.
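As a side note, MMDS is plain HTTP from the guest's point of view, so the same metadata tree that vminit consumes can be inspected from inside a running VM; a minimal sketch, assuming the guest was started with --allow-mmds:

```sh
# From inside the guest: dump the metadata tree served by MMDS on the
# link-local address. The exact layout of the tree is firebuild specific.
curl -s http://169.254.169.254/
```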
Currently, vminit does the following:
- update the /etc/hosts file if the VM has a network interface and make sure the VM resolves itself via the configured hostname on the interface IP address
- update /etc/hostname to the configured hostname
- create an environment variables file /etc/profile.d/run-env.sh for any variables passed via the --env and --env-file flags of the run command
- when users contains a user entry with SSH keys, write those SSH keys to the respective authorized_keys file to enable SSH access; an example of a user entry:
|
|
- write the /usr/bin/firebuild-entrypoint.sh program responsible for invoking the entrypoint from MMDS data
When the machine starts, vminit looks for /usr/bin/firebuild-entrypoint.sh and, if it is found, executes it. Fingers crossed, things go well and the application starts automatically.
That was a high level overview of the process.
§other useful VM related commands
List running VMs:
|
|
Inspect the metadata of a running VM:
|
|
Terminate a running VM:
|
|
§unclean shutdowns
Firecracker VMs will stop when a reboot command is issued in the guest. I call these shutdowns unclean, meaning that they leave a bunch of VM-related directories on disk:
- the jail directory
- the run cache directory
- the CNI cache for the VM interface and a veth pair
To mass-clean all these for all exited VMs, run:
|
|
§profile commands
List profiles:
|
|
Inspect a profile:
|
|
Profiles may be updated by issuing subsequent profile-create commands with the name of an existing profile.
§what’s coming next
These are still early stages for firebuild. There are many things to improve.
§short term
- tests, tests, tests, …, end to end tests
- remove the requirement to have SSH access during the rootfs build and move to the MMDS / vminit based build
- add support for building directly from Docker images for special cases where the Dockerfile might not be available or is difficult to handle; an example is the Jaeger Docker image, where the Dockerfile does not incorporate the binary artifact build
- add a command to build a Linux kernel image directly from the tool
- manage resolv.conf and nsswitch.conf on the guest
§mid term
- add service catalog support for service discovery
- add support for additional disks
- a VM management API
- an event bus / hook to be able to react to events originating in firebuild
§long term
- enable splitting rootfs build and run related operations via remote build and run operators
- provide a remote registry type of system to host rootfs and kernel files externally
- add networking tools to create CNI bridge and overlay networks, and expose VMs outside of the host
And probably many, many more as time goes by. I’ll be writing more as firebuild develops.
Thanks for reading. Stay safe.