Kubernetes Image Baking
Overview
Every Kubernetes node the platform creates - both control plane VMs and workers - boots from a pre-built VM image. "Image baking" is the process of building that image once per Kubernetes minor version, so that new clusters provision in minutes instead of half an hour. The image already has the Kubernetes binaries installed and the required container images pulled, so a fresh node only has to apply the per-cluster config from cloud-init.
This is admin-only work. End users never see the image baking process; they only pick a Kubernetes version from the dropdown the admin has populated.
Concepts
A few terms used throughout:
- VM image. A disk snapshot used as the starting point for new VMs. Same idea as an Ubuntu cloud image, but customised.
- Bake. Build a VM image with specific software pre-installed, by running through the install steps in a throwaway VM and then snapshotting the disk.
- kubeadm, kubelet, kubectl. The official Kubernetes node tools. kubeadm bootstraps the control plane. kubelet is the per-node agent that talks to the control plane. kubectl is the command line client.
- containerd. The container runtime, i.e. the software that actually starts containers on the node.
- CNI plugin. Container Network Interface. The networking plugin that gives each pod an IP and routes pod-to-pod traffic. Default is Cilium.
- Cloud Controller Manager (CCM). A controller that runs inside the cluster and turns Kubernetes Service: type=LoadBalancer requests into real managed load balancers on the platform. We ship a custom one called cloud-controller-manager-hypervisor.
- Cluster autoscaler. Controller that runs inside the cluster and adds or removes worker nodes based on demand. We ship a custom one called cluster-autoscaler-hypervisor.
- cloud-init. The standard tool that configures a VM on first boot using a small "user-data" file. The Kubernetes cluster lifecycle uses cloud-init to feed each new node the cluster-specific config (tokens, IPs, certificates).
What gets baked
A Kubernetes-ready image is just a normal Linux image with extra software installed and a few sysctl/kernel tweaks. You can either bake one combined image used for both roles, or two separate images for control plane and worker.
Strategy A: two images (recommended)
| Component | Control plane image | Worker image |
|---|---|---|
| etcd container pre-pulled | yes | no |
| kube-apiserver / scheduler / controller-manager pre-pulled | yes | no |
| cluster-autoscaler binary | no | yes |
| Disk size | 20 GB | 40 GB |
| Image purpose | Kubernetes Control Plane | Kubernetes Worker |
Smaller images, cleaner separation, faster control plane boot.
Strategy B: one combined image
If managing two images per version is more pain than the extra disk, bake one combined image and register it as both the control plane image and the worker image when registering the Kubernetes version. Trades ~200 MB extra per worker for half the build cost.
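Kubernetes supports only limited version skew between components: control plane instances must stay within one minor version of each other, and a kubelet must never be newer than the API server it talks to. Bake separate images per minor version (1.34, 1.35, 1.36, and so on). The patch version (.0, .1, .2) is picked at build time.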
Supported base OS
The build script detects the base OS via /etc/os-release and branches on apt (Debian family) vs dnf (RHEL family). Pick whichever your fleet already uses:
- Ubuntu 22.04 LTS or newer
- Debian 12 or 13
- RHEL 9+
- CentOS Stream 9+
- Rocky Linux 9+
- AlmaLinux 9+
- Fedora 39+
All families pull the same upstream kubeadm packages from pkgs.k8s.io.
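The detection step described above can be sketched roughly as follows. This is an illustration only: detect_pkg_family is a hypothetical helper name, and the real build script may branch differently. It reads ID and ID_LIKE from an os-release file and prints apt or dnf.

```shell
# Sketch only: map an os-release file to a package-manager family.
# detect_pkg_family is a hypothetical name, not the build script's real function.
detect_pkg_family() {
  local os_release="${1:-/etc/os-release}"
  local ids
  # Source the file inside a command substitution (a subshell) so that
  # ID / ID_LIKE do not leak into the caller's environment.
  ids="$(. "$os_release" && printf '%s %s' "${ID:-}" "${ID_LIKE:-}")" || return 1
  case " $ids " in
    *" debian "*|*" ubuntu "*)             echo apt ;;
    *" rhel "*|*" fedora "*|*" centos "*)  echo dnf ;;
    *) echo "unsupported base OS: $ids" >&2; return 1 ;;
  esac
}

# Usage on the build VM:
#   family="$(detect_pkg_family)" || exit 1
```

Rocky Linux and AlmaLinux report their own ID but list rhel in ID_LIKE, which is why both fields are checked.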
Before you start
You will need:
- A throwaway VM running one of the supported base OSes.
- Root access on that VM.
- Internet from inside the VM (the script downloads packages and container images).
- About 6 GB free disk space inside the VM during the build.
- About 30 to 60 minutes per image. Most of that is downloading.
You also need the registry references for the platform's two custom controllers:
- cloud-controller-manager-hypervisor
- cluster-autoscaler-hypervisor
These are built locally from their repositories and pushed to a container registry your hypervisors can reach. The script defaults to ghcr.io/hypervisor-io/.... Override via the CCM_REGISTRY and AUTOSCALER_REGISTRY env vars.
What goes in the image
Required packages
- containerd (a version compatible with the target Kubernetes version - see Kubernetes version skew policy)
- kubeadm, kubelet, kubectl (pinned to the target patch version)
- kubernetes-cni
- runc
- cloud-init (with NoCloud and ConfigDrive datasources enabled - these are the two methods the platform uses to feed cluster config to a new node)
- qemu-guest-agent
- openssh-server
Kernel and sysctl
- Modules pre-loaded: overlay, br_netfilter
- sysctl tweaks:
  - net.bridge.bridge-nf-call-iptables=1
  - net.bridge.bridge-nf-call-ip6tables=1
  - net.ipv4.ip_forward=1
- Swap disabled and masked. kubeadm refuses to start if swap is on.
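One way to persist these settings is sketched below. The helper name write_k8s_kernel_config and the file name k8s.conf are assumptions for illustration; the build script may use different paths. The optional argument is a staging root, so the files can be inspected before touching the real /etc.

```shell
# Sketch only: persist the kernel modules and sysctls listed above.
# write_k8s_kernel_config and the k8s.conf file name are illustrative choices.
write_k8s_kernel_config() {
  local root="${1:-}"   # empty = write under the real /etc
  mkdir -p "$root/etc/modules-load.d" "$root/etc/sysctl.d"
  # Modules listed here are loaded by systemd-modules-load at boot.
  printf '%s\n' overlay br_netfilter > "$root/etc/modules-load.d/k8s.conf"
  cat > "$root/etc/sysctl.d/k8s.conf" <<'EOF'
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
}

# On the build VM (as root) you would then apply the settings and turn swap off.
# Masking swap.target is one common approach; swap entries in /etc/fstab
# would also need removing.
#   write_k8s_kernel_config
#   modprobe overlay && modprobe br_netfilter
#   sysctl --system
#   swapoff -a
#   systemctl mask swap.target
```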
Networking and utilities
- iptables / iptables-nft plus nftables
- socat, ipset, conntrack (or conntrack-tools on RHEL family)
- ebtables (Debian/Ubuntu only - folded into nftables on RHEL 9+)
- jq, yq, curl, wget, openssl, ca-certificates, gnupg2
- On Debian/Ubuntu also: lsb-release, apt-transport-https
Pre-pulled container images
Pulled once at build time so a fresh node can bootstrap without first-boot internet for these:
- registry.k8s.io/kube-apiserver:v<VERSION> (CP image)
- registry.k8s.io/kube-controller-manager:v<VERSION> (CP image)
- registry.k8s.io/kube-scheduler:v<VERSION> (CP image)
- registry.k8s.io/kube-proxy:v<VERSION> (both)
- registry.k8s.io/etcd:<ETCD_VERSION> (CP image)
- registry.k8s.io/coredns/coredns:<COREDNS_VERSION> (both)
- registry.k8s.io/pause:<PAUSE_VERSION> (both)
- quay.io/cilium/cilium:v<CILIUM_VERSION> (both - the default CNI; swap to Calico or Flannel if you prefer)
- quay.io/cilium/operator-generic:v<CILIUM_VERSION> (both)
- registry.k8s.io/metrics-server/metrics-server:v<METRICS_VERSION> (both)
- <your-registry>/cloud-controller-manager-hypervisor:<CCM_VERSION> (CP image)
- <your-registry>/cluster-autoscaler-hypervisor:<CA_VERSION> (worker image)
For the exact reference list for a given Kubernetes minor version, run:
kubeadm config images list --kubernetes-version v1.34.2
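The output of that command can be fed straight into a pull loop. The sketch below assumes crictl is the pull tool; prepull_images and the PULL_CMD override are hypothetical names, and the real script may pull images differently.

```shell
# Sketch only: pre-pull every image reference read from stdin.
# prepull_images is a hypothetical helper; PULL_CMD defaults to "crictl pull"
# and can be set to "echo" for a dry run.
prepull_images() {
  local pull_cmd="${PULL_CMD:-crictl pull}"
  local ref failed=0
  while IFS= read -r ref; do
    [ -n "$ref" ] || continue   # skip blank lines
    # $pull_cmd is intentionally unquoted so "crictl pull" splits into words.
    # stdin is redirected so the pull command cannot consume the image list.
    $pull_cmd "$ref" </dev/null || { echo "pull failed: $ref" >&2; failed=1; }
  done
  return "$failed"
}

# Typical use on the build VM:
#   kubeadm config images list --kubernetes-version v1.34.2 | prepull_images
```

The loop keeps going after a failed pull and returns non-zero at the end, so one unreachable registry does not hide problems with the others.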
What does NOT go in the image
These are per-cluster and injected by cloud-init at first boot:
- Cluster certificates (kubeadm generates these per cluster).
- Kubeconfigs.
- CNI manifest YAMLs (the master applies them via kubectl apply after init).
- Per-cluster CCM credentials (the master injects a JWT into a Secret).
- Cluster name, VPC config, Pod CIDR, Service CIDR.
Do not pre-bake any of the above. Doing so will break per-cluster isolation.
OS-specific notes
The build script applies these tweaks automatically; they are listed here so you know what to expect.
Ubuntu / Debian
Stock cloud images work as-is. No preconfiguration required.
RHEL / CentOS Stream / Rocky / AlmaLinux / Fedora
- SELinux is set to permissive. kubeadm refuses to run under enforcing because kubelet's container processes cannot relabel host paths. /etc/selinux/config is rewritten so the change survives reboot.
- firewalld is disabled (systemctl disable --now firewalld). kube-proxy manages its own iptables/nftables ruleset and firewalld racing against it produces broken NodePort and Service routing.
- containerd is installed from the Docker CE repo (the containerd.io package), not the distro's own containerd. This is what the kubeadm docs reference.
- Kubernetes packages come from pkgs.k8s.io via a yum.repos.d/kubernetes.repo file with exclude= set so a stray dnf update cannot accidentally bump kubeadm/kubelet/kubectl.
For stock RHEL 9 (not CentOS/Rocky/Alma), make sure BaseOS and AppStream repos are enabled via subscription-manager before running the script.
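For reference, a kubernetes.repo file with exclude= set looks roughly like the output of the sketch below. render_k8s_repo is a hypothetical helper; the repo layout follows the upstream install documentation for pkgs.k8s.io, and the exact exclude list the build script writes may differ.

```shell
# Sketch only: render /etc/yum.repos.d/kubernetes.repo for one minor version.
# render_k8s_repo is a hypothetical helper; the exclude= list mirrors the
# upstream install docs and is an assumption about what the script writes.
render_k8s_repo() {
  local minor="$1"   # e.g. 1.34
  cat <<EOF
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v${minor}/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v${minor}/rpm/repodata/repomd.xml.key
exclude=kubelet kubeadm kubectl cri-tools kubernetes-cni
EOF
}

# On the build VM (as root):
#   render_k8s_repo 1.34 > /etc/yum.repos.d/kubernetes.repo
#   dnf install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
```

Because of exclude=, an intentional install or upgrade has to pass --disableexcludes=kubernetes, while a plain dnf update leaves the three packages alone.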
Admin steps
1. Spin up a fresh VM
Use one of the supported base OS cloud images. 4 vCPU / 4 GB RAM / 30 GB disk is comfortable for the build. Boot, log in as root (or sudo to root).
2. Drop the build script onto the VM
Save the build script (provided in the platform source tree at docs/kubernetes/scripts/build-k8s-image.sh) onto the VM. SCP, curl, or paste it - whatever works.
chmod +x build-k8s-image.sh
3. Run it
# Worker image, K8s 1.34
./build-k8s-image.sh --version 1.34.2 --role worker --cni cilium
# Control plane image, K8s 1.35
./build-k8s-image.sh --version 1.35.0 --role cp --cni cilium
# Combined image (both roles in one)
./build-k8s-image.sh --version 1.36.0 --role combined --cni cilium
Flags:
- --version <K8s patch> - required, e.g. 1.34.2.
- --role cp|worker|combined - required.
- --cni cilium|calico|flannel - defaults to cilium.
Environment overrides:
- CCM_REGISTRY - where to pull the cloud-controller-manager image (default ghcr.io/hypervisor-io).
- CCM_VERSION - tag (default latest).
- AUTOSCALER_REGISTRY - where to pull cluster-autoscaler-hypervisor (default ghcr.io/hypervisor-io).
- AUTOSCALER_VERSION - tag (default: per-minor mapping inside the script).
The script is idempotent. If it gets interrupted, re-run it and it picks up where it left off.
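To make the flag and override semantics concrete, here is a minimal sketch of how such a command line could be parsed. parse_build_args and the MINOR variable are hypothetical; the script's real internals are not shown in this guide, and the AUTOSCALER_VERSION mapping is deliberately left out because its values are not documented here.

```shell
# Sketch only: parse the documented flags and apply the documented defaults.
# parse_build_args and MINOR are hypothetical names.
parse_build_args() {
  VERSION="" ROLE="" CNI="cilium"
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --version|--role|--cni)
        [ "$#" -ge 2 ] || { echo "missing value for $1" >&2; return 2; }
        case "$1" in
          --version) VERSION="$2" ;;
          --role)    ROLE="$2" ;;
          --cni)     CNI="$2" ;;
        esac
        shift 2 ;;
      *) echo "unknown flag: $1" >&2; return 2 ;;
    esac
  done
  # Loose shape check: three dot-separated parts that each start with a digit.
  case "$VERSION" in
    [0-9]*.[0-9]*.[0-9]*) ;;
    *) echo "--version must be a patch version, e.g. 1.34.2" >&2; return 2 ;;
  esac
  case "$ROLE" in
    cp|worker|combined) ;;
    *) echo "--role must be cp, worker, or combined" >&2; return 2 ;;
  esac
  case "$CNI" in
    cilium|calico|flannel) ;;
    *) echo "--cni must be cilium, calico, or flannel" >&2; return 2 ;;
  esac
  MINOR="${VERSION%.*}"   # 1.34.2 -> 1.34, used to pick the package repo
  CCM_REGISTRY="${CCM_REGISTRY:-ghcr.io/hypervisor-io}"
  CCM_VERSION="${CCM_VERSION:-latest}"
  AUTOSCALER_REGISTRY="${AUTOSCALER_REGISTRY:-ghcr.io/hypervisor-io}"
}

# Usage:
#   CCM_REGISTRY=registry.example.com/platform parse_build_args --version 1.34.2 --role worker
```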
4. Verify the image
After the script reports Done, sanity-check inside the VM:
# kubelet binary present and correct version
kubelet --version
# kubeadm binary present and correct version
kubeadm version
# containerd running
systemctl is-active containerd
# kubelet NOT running on the snapshot - it will fail without certs.
# cloud-init re-enables it after kubeadm has placed the certs.
systemctl is-enabled kubelet # should report 'disabled'
# Modules loaded
lsmod | grep -E 'overlay|br_netfilter'
# sysctls applied
sysctl net.bridge.bridge-nf-call-iptables net.ipv4.ip_forward
# Pre-pulled images present
crictl images | grep -E 'kube-apiserver|pause|coredns'
# Platform's CCM image cached (CP image only)
crictl images | grep cloud-controller-manager-hypervisor
# Platform's autoscaler image cached (worker image only)
crictl images | grep cluster-autoscaler-hypervisor
# qemu-guest-agent active
systemctl is-active qemu-guest-agent
# cloud-init NoCloud + ConfigDrive datasources enabled
cat /etc/cloud/cloud.cfg.d/*.cfg | grep -i datasource
Every check should pass. If anything fails, fix the script and re-run. Do not patch the snapshot by hand - your fix will not survive the next bake.
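If you prefer a single pass/fail summary over eyeballing each command, the checks can be wrapped in a small runner. This is a sketch under assumptions: run_check and FAILED are hypothetical names, and the example wiring covers only a few of the checks listed above.

```shell
# Sketch only: run each verification command and count failures.
# run_check and FAILED are hypothetical names.
FAILED=0
run_check() {
  local desc="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS  $desc"
  else
    echo "FAIL  $desc"
    FAILED=$((FAILED + 1))
  fi
}

# Example wiring (run as root on the build VM):
#   run_check "containerd active"  systemctl is-active --quiet containerd
#   run_check "kubelet disabled"   sh -c '[ "$(systemctl is-enabled kubelet)" = disabled ]'
#   run_check "overlay loaded"     grep -qw overlay /proc/modules
#   run_check "pause image cached" sh -c 'crictl images | grep -q pause'
#   [ "$FAILED" -eq 0 ] || echo "$FAILED check(s) failed; fix the script and re-bake"
```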
5. Shut down and snapshot
shutdown -h now
Snapshot the VM's disk into your image catalogue using whatever your storage layer supports: Ceph snapshot, libvirt virsh snapshot-create-as, raw qemu-img convert, etc.
6. Register in the Image Browser
- Navigate to Media > Images in the admin panel.
- Click Create Image.
- Fill in:
  - Name - e.g. HKS Control Plane 1.34.2 or HKS Worker 1.34.2.
  - URL - pointer to the snapshot (whatever URL format your storage backend accepts).
  - Default Interface - usually virtio.
  - Purpose - Kubernetes Control Plane for the CP image, Kubernetes Worker for the worker image.
  - Cloudinit - on.
  - Public - off (admin-managed image).
  - Enabled - on.
- Save.
The image will then appear in the Image Browser (Media > Images) and is selectable in the Supported Versions form below. There is no screenshot of this step in this guide; see the Image Browser page under the Compute section if you need a tour.
7. Wire the image into a Supported Version
- Navigate to Kubernetes > Supported Versions.
- Click Register Version (or edit the existing row for that version).
- Pick the freshly registered images for Control Plane Image and Worker Image.
- Save.
The version is now selectable from the user Create Cluster wizard.
Cloud-init contract
The image MUST honour the NoCloud cloud-init datasource. The master delivers user-data and meta-data to each node via a seed ISO attached during VM provision. The image's cloud-init runs:
- A runcmd: block containing kubeadm init or kubeadm join plus tokens and cert hashes.
- A write_files: block dropping cluster metadata into /etc/hypervisor.io/cluster.json.
- A systemd-units block enabling kubelet, containerd, and qemu-guest-agent.
The image must not auto-start kubelet on first boot. It would fail without certs. The build script disables it; cloud-init re-enables it after kubeadm has placed the certs.
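To give a feel for what the image has to cope with, the sketch below renders a worker-join user-data document of the shape described above. Everything specific in it is an assumption: render_worker_user_data, its arguments, and the cluster.json fields are hypothetical, and the service enablement is expressed through runcmd for simplicity. The platform generates the real user-data.

```shell
# Sketch only: render a NoCloud user-data document for a worker join.
# render_worker_user_data, its arguments, and the cluster.json fields are
# hypothetical; the platform generates the real document.
render_worker_user_data() {
  local endpoint="$1" token="$2" ca_hash="$3" cluster="$4"
  cat <<EOF
#cloud-config
write_files:
  - path: /etc/hypervisor.io/cluster.json
    permissions: "0600"
    content: |
      {"cluster": "${cluster}", "role": "worker"}
runcmd:
  - systemctl enable --now containerd qemu-guest-agent
  - kubeadm join ${endpoint} --token ${token} --discovery-token-ca-cert-hash ${ca_hash}
  - systemctl enable kubelet
EOF
}

# Usage (placeholder values):
#   render_worker_user_data 10.0.0.10:6443 abcdef.0123456789abcdef sha256:<hash> demo > user-data
```

Note the ordering: kubelet is enabled only after kubeadm join has run, which matches the requirement that the baked image ships with kubelet disabled.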
Updating a version
Image references are immutable on a given KubernetesSupportedVersion row. To roll out a new patch (e.g. 1.34.2 to 1.34.3):
- Re-run the build script with --version 1.34.3.
- Snapshot the disk under a new tag (hks-cp-1.34.3, hks-worker-1.34.3).
- Register the two new images in Media > Images.
- Register a new row in Kubernetes > Supported Versions pointing at the new images.
- Existing clusters do NOT auto-upgrade. Users promote per cluster via the Workers tab (worker rolling upgrade) and the control plane rolling upgrade.
- Mark the old version Deprecated once the new one is published; mark it EOL once no cluster references it.
What end users see
End users do not see anything image-related directly. They only see the Kubernetes versions you registered, in the Create Cluster wizard's Kubernetes Version dropdown.
Troubleshooting
Script halts during package install
- DNS or network: the VM needs to reach pkgs.k8s.io, download.docker.com, the distro mirrors, and your container registry.
- On RHEL 9 stock: verify BaseOS and AppStream subscriptions are attached.
- On Debian/Ubuntu: a stale apt lock from a previous unclean run can wedge things. Run dpkg --configure -a && apt update.
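A quick way to rule out the DNS or network case is to probe each endpoint from inside the VM. The sketch below is an assumption-laden helper, not part of the build script: check_reachable and the PROBE override are hypothetical names. curl is called without -f, so any HTTP response counts as reachable and only DNS, TCP, or TLS failures are reported.

```shell
# Sketch only: report which endpoints the build VM cannot reach.
# check_reachable is a hypothetical helper; PROBE can be overridden for testing.
check_reachable() {
  local probe="${PROBE:-curl -sS --head --max-time 10 -o /dev/null}"
  local host rc=0
  for host in "$@"; do
    # $probe is intentionally unquoted so the default command splits into words.
    if $probe "https://$host/" 2>/dev/null; then
      echo "ok $host"
    else
      echo "FAILED $host"
      rc=1
    fi
  done
  return "$rc"
}

# Usage on the build VM (add your distro mirror and container registry host):
#   check_reachable pkgs.k8s.io download.docker.com
```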
crictl images shows missing pre-pulled containers
- Confirm the registry is reachable from the build VM.
- For custom registries: containerd may need credentials in /etc/containerd/config.toml under [plugins."io.containerd.grpc.v1.cri".registry.configs].
- Re-run the script. The pull step is idempotent.
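For a private registry, the credentials stanza under that section has roughly the shape printed by the sketch below. render_registry_auth and the example values are hypothetical, and the layout assumes the containerd 1.x CRI plugin path named above; newer containerd releases may use a different plugin path, so check the version you bake.

```shell
# Sketch only: print a registry auth stanza for the CRI plugin section above.
# render_registry_auth is a hypothetical helper. The table path assumes the
# containerd 1.x CRI plugin; newer releases may lay this out differently.
render_registry_auth() {
  local host="$1" user="$2" pass="$3"
  cat <<EOF
[plugins."io.containerd.grpc.v1.cri".registry.configs."${host}".auth]
  username = "${user}"
  password = "${pass}"
EOF
}

# Usage (as root). Check first that config.toml does not already define this
# table, since duplicate TOML tables make containerd refuse the config.
#   render_registry_auth registry.example.com builder 'secret' >> /etc/containerd/config.toml
#   systemctl restart containerd
```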
Cluster create succeeds but worker stays NotReady
Usually means the CNI failed to install. The CNI installs at cluster create time, not from the image, but the image still has to have the CNI plugin's container ready. Confirm:
- The image actually has the CNI plugin's container pre-pulled: crictl images | grep cilium.
- The Pod CIDR picked at cluster-create time matches the CNI's expectations. Cilium accepts any. Flannel default is 10.244.0.0/16.
kubelet will not start on first boot
Confirm:
- systemctl is-enabled kubelet returns disabled on the freshly-snapshotted image.
- Cloud-init's runcmd: ran successfully. Check /var/log/cloud-init-output.log on the node.
Future automation
The current script is a single bash file you run by hand. The plan is to convert it into a Packer template plus an Ansible role so image bakes become reproducible CI artifacts. Out of scope for now. Manual VM runs are acceptable until your image catalogue grows past around 10 entries.
Related pages
- Kubernetes admin guide - cluster-level setup.
- Kubernetes Node Pools - how worker images are used per pool.