Kubernetes Image Baking

Overview

Every Kubernetes node the platform creates - both control plane VMs and workers - boots from a pre-built VM image. "Image baking" is the process of building that image once per Kubernetes minor version, so that new clusters provision in minutes instead of half an hour. The image already has the Kubernetes binaries installed and the required container images pulled, so a fresh node only has to apply the per-cluster config from cloud-init.

This is admin-only work. End users never see the image baking process; they only pick a Kubernetes version from the dropdown the admin has populated.

Concepts

A few terms used throughout:

  • VM image. A disk snapshot used as the starting point for new VMs. Same idea as an Ubuntu cloud image, but customised.
  • Bake. Build a VM image with specific software pre-installed, by running through the install steps in a throwaway VM and then snapshotting the disk.
  • kubeadm, kubelet, kubectl. The official Kubernetes node tools. kubeadm bootstraps the control plane. kubelet is the per-node agent that talks to the control plane. kubectl is the command line client.
  • containerd. The container runtime, i.e. the software that actually starts containers on the node.
  • CNI plugin. Container Network Interface. The networking plugin that gives each pod an IP and routes pod-to-pod traffic. Default is Cilium.
  • Cloud Controller Manager (CCM). A controller that runs inside the cluster and turns Kubernetes Services of type LoadBalancer into real managed load balancers on the platform. We ship a custom one called cloud-controller-manager-hypervisor.
  • Cluster autoscaler. Controller that runs inside the cluster and adds or removes worker nodes based on demand. We ship a custom one called cluster-autoscaler-hypervisor.
  • cloud-init. The standard tool that configures a VM on first boot using a small "user-data" file. The Kubernetes cluster lifecycle uses cloud-init to feed each new node the cluster-specific config (tokens, IPs, certificates).

What gets baked

A Kubernetes-ready image is just a normal Linux image with extra software installed and a few sysctl/kernel tweaks. You can either bake one combined image used for both roles, or two separate images for control plane and worker.

| Component | Control plane image | Worker image |
|---|---|---|
| etcd container pre-pulled | yes | no |
| kube-apiserver / scheduler / controller-manager pre-pulled | yes | no |
| cluster-autoscaler binary | no | yes |
| Disk size | 20 GB | 40 GB |
| Image purpose | Kubernetes Control Plane | Kubernetes Worker |

Two separate images give smaller images, cleaner separation, and a faster control plane boot.

Strategy B: one combined image

If managing two images per version is more pain than the extra disk, bake one combined image and register it as both the control plane image and the worker image when registering the Kubernetes version. This trades roughly 200 MB of extra disk per worker for half the build cost.

One image per minor version

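Kubernetes only supports a limited version skew between components: in a highly available control plane the kube-apiserver instances must be within one minor version of each other, and a kubelet must never be newer than the API server. Bake separate images for 1.34, 1.35, 1.36, and so on, so that every node in a cluster can run the cluster's own minor version. Patch versions (.0, .1, .2) are picked at build time.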
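
Because images are organised per minor version while the build takes a full patch version, it can help to derive one from the other. The helper below is a minimal sketch; the function name k8s_minor is hypothetical and not part of the platform's build script.

```shell
# Hypothetical helper: derive the minor version ("1.34") from a patch
# version ("1.34.2" or "v1.34.2").
k8s_minor() {
    version="${1#v}"                  # strip an optional leading "v"
    case "$version" in
        [0-9]*.[0-9]*.[0-9]*) ;;      # looks like MAJOR.MINOR.PATCH
        *) echo "expected MAJOR.MINOR.PATCH, got: $1" >&2; return 1 ;;
    esac
    echo "${version%.*}"              # drop the last dot and the patch number
}
```

For example, k8s_minor 1.34.2 prints 1.34, and a value without a patch component (such as 1.34) is rejected.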

Supported base OS

The build script detects the base OS via /etc/os-release and branches on apt (Debian family) vs dnf (RHEL family). Pick whichever your fleet already uses:

  • Ubuntu 22.04 LTS or newer
  • Debian 12 or 13
  • RHEL 9+
  • CentOS Stream 9+
  • Rocky Linux 9+
  • AlmaLinux 9+
  • Fedora 39+

All families pull the same upstream kubeadm packages from pkgs.k8s.io.
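
The OS detection described above could look roughly like the following. This is a sketch under assumptions, not the real script: the function name is hypothetical, and the ID / ID_LIKE values are the ones these distributions normally publish in /etc/os-release.

```shell
# Sketch: choose a package manager from an os-release file.
# Takes an optional path so it can be tried against sample files.
detect_pkg_manager() {
    os_release="${1:-/etc/os-release}"
    # Source in a subshell so ID and ID_LIKE do not leak into the caller.
    (
        . "$os_release" || exit 1
        case " ${ID:-} ${ID_LIKE:-} " in
            *" debian "*|*" ubuntu "*)            echo apt ;;
            *" rhel "*|*" fedora "*|*" centos "*) echo dnf ;;
            *) echo "unsupported base OS: ${ID:-unknown}" >&2; exit 1 ;;
        esac
    )
}
```

Matching on ID_LIKE as well as ID means derivatives such as Rocky Linux and AlmaLinux fall into the dnf branch without being listed by name.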

Before you start

You will need:

  • A throwaway VM running one of the supported base OSes.
  • Root access on that VM.
  • Internet from inside the VM (the script downloads packages and container images).
  • About 6 GB free disk space inside the VM during the build.
  • About 30 to 60 minutes per image. Most of that is downloading.

You also need the registry references for the platform's two custom controllers:

  • cloud-controller-manager-hypervisor
  • cluster-autoscaler-hypervisor

These are built locally from their repositories and pushed to a container registry your hypervisors can reach. The script defaults to ghcr.io/hypervisor-io/.... Override via the CCM_REGISTRY and AUTOSCALER_REGISTRY env vars.

What goes in the image

Required packages

  • containerd (a version compatible with the target Kubernetes version - see Kubernetes version skew policy)
  • kubeadm, kubelet, kubectl (pinned to the target patch version)
  • kubernetes-cni
  • runc
  • cloud-init (with NoCloud and ConfigDrive datasources enabled - these are the two methods the platform uses to feed cluster config to a new node)
  • qemu-guest-agent
  • openssh-server

Kernel and sysctl

  • Modules pre-loaded: overlay, br_netfilter
  • sysctl tweaks:
    • net.bridge.bridge-nf-call-iptables=1
    • net.bridge.bridge-nf-call-ip6tables=1
    • net.ipv4.ip_forward=1
  • Swap disabled and masked. By default, kubelet refuses to start while swap is enabled.
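
As an illustration, the module and sysctl settings above can be persisted with two small config files. The k8s.conf file names follow the upstream kubeadm install guide; the function name and the root-prefix argument are assumptions made so the sketch can be staged into a scratch directory.

```shell
# Sketch: persist the kernel modules and sysctls listed above.
write_kernel_config() {
    root="${1:-}"    # empty = real filesystem; pass a directory to stage files
    mkdir -p "$root/etc/modules-load.d" "$root/etc/sysctl.d"

    # Modules loaded at every boot by systemd-modules-load.
    printf '%s\n' overlay br_netfilter > "$root/etc/modules-load.d/k8s.conf"

    # sysctls applied at every boot by systemd-sysctl.
    cat > "$root/etc/sysctl.d/k8s.conf" <<'EOF'
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
}

# On the build VM (as root) the settings would then be applied with:
#   write_kernel_config
#   modprobe overlay && modprobe br_netfilter
#   sysctl --system
#   swapoff -a    # and remove any swap entries from /etc/fstab
```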

Networking and utilities

  • iptables / iptables-nft plus nftables
  • socat, ipset, conntrack (or conntrack-tools on RHEL family)
  • ebtables (Debian/Ubuntu only - folded into nftables on RHEL 9+)
  • jq, yq, curl, wget, openssl, ca-certificates, gnupg2
  • On Debian/Ubuntu also: lsb-release, apt-transport-https

Pre-pulled container images

Pulled once at build time so a fresh node can bootstrap without first-boot internet for these:

  • registry.k8s.io/kube-apiserver:v<VERSION> (CP image)
  • registry.k8s.io/kube-controller-manager:v<VERSION> (CP image)
  • registry.k8s.io/kube-scheduler:v<VERSION> (CP image)
  • registry.k8s.io/kube-proxy:v<VERSION> (both)
  • registry.k8s.io/etcd:<ETCD_VERSION> (CP image)
  • registry.k8s.io/coredns/coredns:<COREDNS_VERSION> (both)
  • registry.k8s.io/pause:<PAUSE_VERSION> (both)
  • quay.io/cilium/cilium:v<CILIUM_VERSION> (both - the default CNI; swap to Calico or Flannel if you prefer)
  • quay.io/cilium/operator-generic:v<CILIUM_VERSION> (both)
  • registry.k8s.io/metrics-server/metrics-server:v<METRICS_VERSION> (both)
  • <your-registry>/cloud-controller-manager-hypervisor:<CCM_VERSION> (CP image)
  • <your-registry>/cluster-autoscaler-hypervisor:<CA_VERSION> (worker image)

For the exact reference list for a given Kubernetes version, run:

kubeadm config images list --kubernetes-version v1.34.2
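
The per-role split in the list above can be expressed as a filter over that command's output. The following is a sketch only: images_for_role is a hypothetical helper, and it covers just the upstream registry.k8s.io images, not the CNI or platform controller images.

```shell
# Sketch: decide which upstream images to pre-pull for a role.
# Reads image references on stdin and prints the ones the role needs.
# Role names match the build script's --role flag.
images_for_role() {
    role="$1"
    case "$role" in
        cp|combined)
            cat    # control plane and combined images cache everything
            ;;
        worker)
            # Workers skip the control plane components and etcd.
            grep -Ev '/(kube-apiserver|kube-controller-manager|kube-scheduler|etcd):' || true
            ;;
        *)
            echo "unknown role: $role" >&2
            return 1
            ;;
    esac
}

# Possible use on the build VM:
#   kubeadm config images list --kubernetes-version v1.34.2 \
#       | images_for_role worker \
#       | while read -r image; do crictl pull "$image"; done
```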

What does NOT go in the image

These are per-cluster and injected by cloud-init at first boot:

  • Cluster certificates (kubeadm generates these per cluster).
  • Kubeconfigs.
  • CNI manifest YAMLs (the master applies them via kubectl apply after init).
  • Per-cluster CCM credentials (the master injects a JWT into a Secret).
  • Cluster name, VPC config, Pod CIDR, Service CIDR.

Do not pre-bake any of the above. Doing so will break per-cluster isolation.

OS-specific notes

The build script applies these tweaks automatically; they are listed here so you know what to expect.

Ubuntu / Debian

Stock cloud images work as-is. No preconfiguration required.

RHEL / CentOS Stream / Rocky / AlmaLinux / Fedora

  • SELinux is set to permissive. The upstream kubeadm install guide recommends this so that containers can access the host filesystem (needed by pod networks, for example) until SELinux support is configured for kubelet. /etc/selinux/config is rewritten so the change survives reboot.
  • firewalld is disabled (systemctl disable --now firewalld). kube-proxy manages its own iptables/nftables ruleset and firewalld racing against it produces broken NodePort and Service routing.
  • containerd is installed from the Docker CE repo (the containerd.io package), not the distro's own containerd. This is what the kubeadm docs reference.
  • Kubernetes packages come from pkgs.k8s.io via an /etc/yum.repos.d/kubernetes.repo file with exclude= set, so a stray dnf update cannot accidentally bump kubeadm/kubelet/kubectl.

For stock RHEL 9 (not CentOS/Rocky/Alma), make sure BaseOS and AppStream repos are enabled via subscription-manager before running the script.
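
For reference, a repo definition with exclude= set looks roughly like the one below. The file layout follows the upstream kubeadm install guide; the function name and the output-directory argument are assumptions for illustration, and the real script may generate the file differently.

```shell
# Sketch: write a pkgs.k8s.io repo definition for one Kubernetes minor version.
write_k8s_repo() {
    minor="$1"                              # e.g. 1.34
    repo_dir="${2:-/etc/yum.repos.d}"
    cat > "$repo_dir/kubernetes.repo" <<EOF
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v${minor}/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v${minor}/rpm/repodata/repomd.xml.key
exclude=kubelet kubeadm kubectl cri-tools kubernetes-cni
EOF
}

# Because of exclude=, installs have to opt in explicitly (run as root):
#   write_k8s_repo 1.34
#   dnf install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
```

The repo URL is specific to a minor version, which is one more reason each minor gets its own image.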

Admin steps

1. Spin up a fresh VM

Use one of the supported base OS cloud images. 4 vCPU / 4 GB RAM / 30 GB disk is comfortable for the build. Boot, log in as root (or sudo to root).

2. Drop the build script onto the VM

Save the build script (provided in the platform source tree at docs/kubernetes/scripts/build-k8s-image.sh) onto the VM. SCP, curl, or paste it - whatever works.

chmod +x build-k8s-image.sh

3. Run it

# Worker image, K8s 1.34
./build-k8s-image.sh --version 1.34.2 --role worker --cni cilium

# Control plane image, K8s 1.35
./build-k8s-image.sh --version 1.35.0 --role cp --cni cilium

# Combined image (both roles in one)
./build-k8s-image.sh --version 1.36.0 --role combined --cni cilium

Flags:

  • --version <K8s patch> - required, e.g. 1.34.2.
  • --role cp|worker|combined - required.
  • --cni cilium|calico|flannel - defaults to cilium.
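
To make the flag rules concrete, here is a sketch of how the three flags could be parsed and validated. It is not the real script's parser; the function name and the variable names K8S_VERSION, ROLE and CNI are assumptions.

```shell
# Sketch: parse --version, --role and --cni with the documented rules.
parse_build_args() {
    K8S_VERSION=""; ROLE=""; CNI="cilium"    # --cni defaults to cilium
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --version|--role|--cni)
                [ "$#" -ge 2 ] || { echo "missing value for $1" >&2; return 1; }
                case "$1" in
                    --version) K8S_VERSION="$2" ;;
                    --role)    ROLE="$2" ;;
                    --cni)     CNI="$2" ;;
                esac
                shift 2
                ;;
            *)
                echo "unknown flag: $1" >&2
                return 1
                ;;
        esac
    done
    [ -n "$K8S_VERSION" ] || { echo "--version is required" >&2; return 1; }
    case "$ROLE" in
        cp|worker|combined) ;;
        *) echo "--role must be cp, worker, or combined" >&2; return 1 ;;
    esac
    case "$CNI" in
        cilium|calico|flannel) ;;
        *) echo "--cni must be cilium, calico, or flannel" >&2; return 1 ;;
    esac
}
```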

Environment overrides:

  • CCM_REGISTRY - where to pull the cloud-controller-manager image (default ghcr.io/hypervisor-io).
  • CCM_VERSION - tag (default latest).
  • AUTOSCALER_REGISTRY - where to pull cluster-autoscaler-hypervisor (default ghcr.io/hypervisor-io).
  • AUTOSCALER_VERSION - tag (default: per-minor mapping inside the script).
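
The override behaviour can be pictured with ordinary shell default expansion. The sketch below covers only the CCM image, because the autoscaler's per-minor tag mapping lives inside the script and is not reproduced here; the function name ccm_image_ref is hypothetical.

```shell
# Sketch: resolve the CCM image reference from the documented env vars,
# falling back to the documented defaults when they are unset.
ccm_image_ref() {
    registry="${CCM_REGISTRY:-ghcr.io/hypervisor-io}"
    tag="${CCM_VERSION:-latest}"
    # Tolerate a trailing slash on the registry value.
    echo "${registry%/}/cloud-controller-manager-hypervisor:${tag}"
}

# Overrides are passed in the environment of the build command, e.g.
# (registry.example.com/platform and 1.2.3 are placeholder values):
#   CCM_REGISTRY=registry.example.com/platform CCM_VERSION=1.2.3 \
#       ./build-k8s-image.sh --version 1.34.2 --role cp
```

Setting CCM_VERSION to a fixed tag rather than relying on latest makes it easier to tell later which controller build a given image contains.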

The script is idempotent. If it gets interrupted, re-run it and it picks up where it left off.

4. Verify the image

After the script reports Done, sanity-check inside the VM:

# kubelet binary present and correct version
kubelet --version

# kubeadm binary present and correct version
kubeadm version

# containerd running
systemctl is-active containerd

# kubelet NOT running on the snapshot - it will fail without certs.
# cloud-init re-enables it after kubeadm has placed the certs.
systemctl is-enabled kubelet # should report 'disabled'

# Modules loaded
lsmod | grep -E 'overlay|br_netfilter'

# sysctls applied
sysctl net.bridge.bridge-nf-call-iptables net.ipv4.ip_forward

# Pre-pulled images present
crictl images | grep -E 'kube-apiserver|pause|coredns'

# Platform's CCM image cached (CP image only)
crictl images | grep cloud-controller-manager-hypervisor

# Platform's autoscaler image cached (worker image only)
crictl images | grep cluster-autoscaler-hypervisor

# qemu-guest-agent active
systemctl is-active qemu-guest-agent

# cloud-init NoCloud + ConfigDrive datasources enabled
grep -i datasource /etc/cloud/cloud.cfg.d/*.cfg

Every check should pass. If anything fails, fix the script and re-run. Do not patch the snapshot by hand - your fix will not survive the next bake.
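
If you run these checks often, a small wrapper that counts failures saves reading the output line by line. The following is a sketch; the helper names and the PASS/FAIL output format are assumptions, and it covers only a subset of the checks above.

```shell
# Sketch: run verification commands and count how many fail.
FAILURES=0

check() {
    desc="$1"; shift
    if "$@" >/dev/null 2>&1; then
        echo "PASS  $desc"
    else
        echo "FAIL  $desc"
        FAILURES=$((FAILURES + 1))
    fi
}

# kubelet must stay disabled on the snapshot.
kubelet_is_disabled() {
    [ "$(systemctl is-enabled kubelet 2>/dev/null)" = "disabled" ]
}

verify_image() {
    check "containerd active"        systemctl is-active containerd
    check "qemu-guest-agent active"  systemctl is-active qemu-guest-agent
    check "kubelet disabled"         kubelet_is_disabled
    check "overlay module loaded"    grep -q '^overlay ' /proc/modules
    check "br_netfilter loaded"      grep -q '^br_netfilter ' /proc/modules
    [ "$FAILURES" -eq 0 ]            # non-zero exit status if anything failed
}
```

Calling verify_image inside the build VM prints one line per check and returns a non-zero status when any check failed, which makes it easy to gate the snapshot step on it.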

5. Shut down and snapshot

shutdown -h now

Snapshot the VM's disk into your image catalogue using whatever your storage layer supports: Ceph snapshot, libvirt virsh snapshot-create-as, raw qemu-img convert, etc.

6. Register in the Image Browser

  1. Navigate to Media > Images in the admin panel.
  2. Click Create Image.
  3. Fill in:
    • Name - e.g. HKS Control Plane 1.34.2 or HKS Worker 1.34.2.
    • URL - pointer to the snapshot (whatever URL format your storage backend accepts).
    • Default Interface - usually virtio.
    • Purpose - Kubernetes Control Plane for the CP image, Kubernetes Worker for the worker image.
    • Cloudinit - on.
    • Public - off (admin-managed image).
    • Enabled - on.
  4. Save.

The image will then appear in the Image Browser (Media > Images) and is selectable in the Supported Versions form below. There is no screenshot of this step in this guide; see the Image Browser page under the Compute section if you need a tour.

7. Wire the image into a Supported Version

  1. Navigate to Kubernetes > Supported Versions.
  2. Click Register Version (or edit the existing row for that version).
  3. Pick the freshly registered images for Control Plane Image and Worker Image.
  4. Save.

The version is now selectable from the user Create Cluster wizard.

Cloud-init contract

The image MUST honour the NoCloud cloud-init datasource. The master delivers user-data and meta-data to each node via a seed ISO attached during VM provisioning. On first boot, the image's cloud-init processes:

  1. A runcmd: block containing kubeadm init or kubeadm join plus tokens and cert hashes.
  2. A write_files: block dropping cluster metadata into /etc/hypervisor.io/cluster.json.
  3. A systemd-units block enabling kubelet, containerd, and qemu-guest-agent.

The image must not auto-start kubelet on first boot. It would fail without certs. The build script disables it; cloud-init re-enables it after kubeadm has placed the certs.
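
To test an image against this contract without the platform, you can stage a NoCloud seed by hand. The sketch below is illustrative only: the real user-data is generated by the master, every value passed in is a placeholder, the cluster.json content is invented, and kubelet is enabled through runcmd here rather than the systemd-units block.

```shell
# Sketch: stage NoCloud seed files for a worker node.
write_nocloud_seed() {
    seed_dir="$1"; node_name="$2"; endpoint="$3"; token="$4"; ca_hash="$5"
    mkdir -p "$seed_dir"

    # meta-data: NoCloud expects at least an instance-id.
    printf 'instance-id: %s\nlocal-hostname: %s\n' "$node_name" "$node_name" \
        > "$seed_dir/meta-data"

    # user-data: must start with the "#cloud-config" header line.
    cat > "$seed_dir/user-data" <<EOF
#cloud-config
write_files:
  - path: /etc/hypervisor.io/cluster.json
    permissions: "0600"
    content: |
      {"node": "${node_name}"}
runcmd:
  - kubeadm join ${endpoint} --token ${token} --discovery-token-ca-cert-hash sha256:${ca_hash}
  - systemctl enable kubelet
EOF
}

# NoCloud finds the seed by its volume label, which must be "cidata":
#   genisoimage -output seed.iso -volid cidata -joliet -rock \
#       seed/user-data seed/meta-data
```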

Updating a version

Image references are immutable on a given KubernetesSupportedVersion row. To roll out a new patch (e.g. 1.34.2 to 1.34.3):

  1. Re-run the build script with --version 1.34.3.
  2. Snapshot the disk under a new tag (hks-cp-1.34.3, hks-worker-1.34.3).
  3. Register the two new images in Media > Images.
  4. Register a new row in Kubernetes > Supported Versions pointing at the new images.
  5. Existing clusters do NOT auto-upgrade. Users promote per cluster via the Workers tab (worker rolling upgrade) and the control plane rolling upgrade.
  6. Mark the old version Deprecated once the new one is published; mark it EOL once no cluster references it.

What end users see

End users do not see anything image-related directly. They only see the Kubernetes versions you registered, in the Create Cluster wizard's Kubernetes Version dropdown.

Troubleshooting

Script halts during package install

  • DNS or network: the VM needs to reach pkgs.k8s.io, download.docker.com, the distro mirrors, and your container registry.
  • On RHEL 9 stock: verify BaseOS and AppStream subscriptions are attached.
  • On Debian/Ubuntu: a stale apt lock from a previous unclean run can wedge things. Run dpkg --configure -a && apt update.

crictl images shows missing pre-pulled containers

  • Confirm the registry is reachable from the build VM.
  • For custom registries: containerd may need credentials in /etc/containerd/config.toml under [plugins."io.containerd.grpc.v1.cri".registry.configs] (this is the containerd 1.x CRI plugin path; containerd 2.x uses a different config layout).
  • Re-run the script. The pull step is idempotent.

Cluster create succeeds but worker stays NotReady

This usually means the CNI failed to install. The CNI manifests are applied at cluster-create time rather than baked into the image, but the image must still have the CNI's container images pre-pulled. Confirm:

  • The image actually has the CNI plugin's container pre-pulled: crictl images | grep cilium.
  • The Pod CIDR picked at cluster-create time matches the CNI's expectations. Cilium accepts any. Flannel default is 10.244.0.0/16.

kubelet will not start on first boot

Confirm:

  • systemctl is-enabled kubelet returns disabled on the freshly-snapshotted image.
  • Cloud-init's runcmd: ran successfully. Check /var/log/cloud-init-output.log on the node.

Future automation

The current script is a single bash file you run by hand. The plan is to convert it into a Packer template plus an Ansible role so image bakes become reproducible CI artifacts. Out of scope for now. Manual VM runs are acceptable until your image catalogue grows past around 10 entries.