Kubernetes

Overview

Kubernetes clusters run on managed VMs inside a VPC, fronted by a managed Load Balancer that exposes the Kubernetes API server. Each cluster ships with:

  • Single or HA control plane -- 1 node (cost-optimised) or 3 nodes (survives single-node failure)
  • Manual or autoscaled workers -- set a fixed count, or pin min/max and let the cluster autoscaler add nodes when pods are pending
  • Integrated Cloud Controller Manager (CCM) -- Service: type=LoadBalancer in your manifests creates a real managed Load Balancer on the platform
  • Rolling upgrades -- one-click Kubernetes minor/patch upgrades for both control plane and workers, with drain + surge protection
  • Per-cluster security groups -- separate rule scopes for the LB (API exposure), control plane, and workers
  • Custom TLS on the API endpoint -- bind a Let's Encrypt or custom certificate to a domain you point at the API LB

Each cluster lives inside one VPC and exposes its API server through dedicated public and private endpoints, or through a private endpoint only.


Admin Configuration

Before users can create clusters, an administrator must enable Kubernetes per region and register at least one supported version.

Enabling Kubernetes on a Region

  1. Navigate to Compute → Hypervisor Groups
  2. Open the hypervisor group (region) you want to enable
  3. Tick Enable Kubernetes service in this group
  4. Make sure the same group has VPC and Load Balancer enabled (Kubernetes needs both)
  5. Save

Screenshot: Admin > Hypervisor Groups > Edit page showing the "Enable Kubernetes service in this group" toggle.

The region's slug is used as the Kubernetes region label baked into node hostnames. The slug is locked once any cluster exists in the group.

Registering Supported Versions

The supported-versions catalogue is the curated list of Kubernetes versions users can pick when creating or upgrading a cluster.

  1. Navigate to Kubernetes → Supported Versions in the admin sidebar
  2. Click Register Version
  3. Fill in:
    • Semantic Version -- e.g. 1.31.0
    • State -- Active (selectable), Deprecated (selectable with warning), or EOL (hidden from new clusters)
    • Control Plane Image -- image baked for control plane nodes (purpose = Kubernetes Control Plane)
    • Worker Image -- image baked for worker nodes (purpose = Kubernetes Worker)
    • Min CPU Cores / Min RAM (MB) -- minimum plan requirements (used to filter the plan picker in the Create wizard)
    • EOL Date -- optional date the version reaches end-of-life
    • Upgrade From -- comma-separated list of versions that may be upgraded to this version (e.g. 1.30.0, 1.30.1)
    • Bundled Components -- optional JSON describing co-packaged components (etcd, coredns, cilium, etc.)
  4. Save

Screenshot: Admin > Kubernetes > Supported Versions page showing the registered versions table and the "Register Version" modal.
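To make the catalogue fields concrete, the sketch below models one registered version and two of the ways its fields are used: splitting the comma-separated Upgrade From list, and filtering plans against Min CPU Cores / Min RAM (MB). The `SupportedVersion` class, the helper names, and the plan names and sizes are hypothetical illustrations, not the platform's API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SupportedVersion:
    """Hypothetical mirror of the Register Version form fields."""
    version: str            # Semantic Version, e.g. "1.31.0"
    state: str              # "Active", "Deprecated", or "EOL"
    min_cpu_cores: int      # Min CPU Cores
    min_ram_mb: int         # Min RAM (MB)
    upgrade_from: str = ""  # Upgrade From, e.g. "1.30.0, 1.30.1"

    def upgrade_sources(self) -> List[str]:
        """Split the comma-separated Upgrade From field into clean entries."""
        return [v.strip() for v in self.upgrade_from.split(",") if v.strip()]

    def offered_for_new_clusters(self) -> bool:
        """Active and Deprecated stay selectable; EOL is hidden."""
        return self.state in ("Active", "Deprecated")


def eligible_plans(version: SupportedVersion,
                   plans: List[Tuple[str, int, int]]) -> List[str]:
    """Keep the (name, cpu_cores, ram_mb) plans that meet the version minimums."""
    return [name for name, cpu, ram in plans
            if cpu >= version.min_cpu_cores and ram >= version.min_ram_mb]


v = SupportedVersion("1.31.0", "Active", 2, 4096, "1.30.0, 1.30.1")
print(v.upgrade_sources())  # ['1.30.0', '1.30.1']
print(eligible_plans(v, [("small", 1, 2048), ("medium", 2, 4096), ("large", 4, 8192)]))
# ['medium', 'large']
```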

Where do CP / Worker images come from?

You bake them yourself. See the Kubernetes Image Baking guide for the full procedure -- everything from base OS prerequisites to which container images to pre-pull.

Admin Cluster Management

Admins can view and operate on all user clusters from Kubernetes → Clusters in the admin sidebar. The admin cluster page mirrors the user view and adds destructive controls:

  • Force destroy -- terminate the cluster even when normal delete fails
  • Force evict -- forcibly remove a single node that won't drain cleanly
  • Suspend / Resume -- suspend a cluster (stops billing for nodes) without deleting it
  • Reset state -- set a cluster stuck in Provisioning or Upgrading to Running (use only after manual intervention)
  • Cancel task -- cancel a stuck task

All destructive actions are audit-logged.


Prerequisites

Before creating a cluster you need:

  1. A region with Kubernetes enabled -- the region must have VPC, Load Balancer, and Kubernetes enabled by an administrator
  2. A VPC in that region -- the cluster will run inside it
  3. A NAT Gateway attached to the VPC -- workers need outbound internet to pull images and reach pkgs.k8s.io
  4. At least one private subnet -- control plane nodes must sit in a private subnet
  5. Sufficient credits -- clusters are billed hourly per node (control plane + workers + control plane LB)

If any prerequisite is missing, the Create page surfaces a warning telling you exactly what to fix.

Screenshot: User panel > Kubernetes > Create page showing the Region & VPC step with NAT Gateway warning when a VPC has no NAT GW attached.


Creating a Cluster

  1. Navigate to Kubernetes → Clusters in the user sidebar
  2. Click Create Cluster
  3. Walk through the six-step wizard described below

Screenshot: User panel > Kubernetes > Create page showing the six-step wizard with progress indicator at the top.

Step 1 -- Region & Networking

  • Region -- only regions with VPC + Load Balancer + Kubernetes enabled appear
  • VPC -- only VPCs in the selected region appear; the panel shows whether the VPC has a NAT Gateway attached

NAT Gateway required

Workers need outbound internet to pull container images. Create a NAT Gateway on the VPC before deploying the cluster.

Step 2 -- Subnets

  • Control plane subnet -- must be a private subnet. The API server is reached through the cluster's API load balancer, never directly from the internet.
  • Worker subnet -- defaults to the control plane subnet. Pick a different one to isolate workers.

Step 3 -- Cluster Identity

  • Name -- display name (up to 50 characters)
  • Slug -- lowercase DNS-compatible identifier used in node hostnames (up to 63 chars, [a-z0-9-])
  • Description -- optional free-text notes
  • Kubernetes Version -- pick from the admin-curated list of supported versions
  • Endpoint Mode:
    • Private -- API server reachable only from inside the VPC
    • Public & Private -- API server reachable from both the internet and the VPC (defaults to public)
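A quick pre-check of the Name and Slug rules can save a failed submission. This is a minimal sketch; `validate_identity` is a hypothetical helper, and it assumes "DNS-compatible" means an RFC 1123 label, so the slug must also start and end with a letter or digit (the page itself only states `[a-z0-9-]` and 63 characters).

```python
import re

# Assumption: "DNS-compatible" is read as an RFC 1123 label, so the slug
# must also start and end with a letter or digit.
SLUG_RE = re.compile(r"[a-z0-9]([a-z0-9-]*[a-z0-9])?")


def validate_identity(name: str, slug: str) -> list:
    """Return a list of problems; an empty list means both fields look valid."""
    problems = []
    if not 1 <= len(name) <= 50:
        problems.append("name must be 1-50 characters")
    if len(slug) > 63:
        problems.append("slug must be at most 63 characters")
    if not SLUG_RE.fullmatch(slug):
        problems.append("slug must use [a-z0-9-] and start/end with a letter or digit")
    return problems


print(validate_identity("Production cluster", "prod-eu-1"))  # []
print(validate_identity("Test", "Prod_EU"))
# ['slug must use [a-z0-9-] and start/end with a letter or digit']
```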

Step 4 -- Control Plane

  • Control Node Count:
    • 1 node (Single) -- lower cost, no redundancy. Recommended for dev / staging.
    • 3 nodes (HA) -- survives single-node failure. Recommended for production.
  • Control Plane Plan -- CPU / RAM / disk for each control plane VM
  • Control Plane Load Balancer Plan -- resource size for the managed LB fronting the API server
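The 1-versus-3 choice follows from majority quorum. Assuming each control plane node runs an etcd member (the page mentions etcd PKI and etcd snapshots, but not the exact topology), the cluster stays writable only while a majority of members is up. The helper names below are illustrative.

```python
def quorum(members: int) -> int:
    """Smallest majority of an etcd cluster with `members` voting members."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can fail while a majority is still available."""
    return members - quorum(members)


for n in (1, 3):
    print(n, "node(s): quorum", quorum(n), "tolerates", tolerated_failures(n), "failure(s)")
# 1 node(s): quorum 1 tolerates 0 failure(s)
# 3 node(s): quorum 2 tolerates 1 failure(s)
```

The same arithmetic shows why 2 control plane nodes would not help: `tolerated_failures(2)` is 0, so the extra node adds cost without adding fault tolerance.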

Step 5 -- Workers

The wizard creates the cluster's default node pool. You can add more pools after the cluster is up (see Node Pools for mixed-shape clusters, GPU pools, etc.).

  • Worker Plan -- CPU / RAM / disk for each worker VM (plan-defined; per-pool, not changeable on existing nodes)
  • Worker Count -- initial number of workers in the default pool (1--100)
  • Enable cluster autoscaler -- when on, the autoscaler grows/shrinks the default pool within bounds
    • Min Size -- floor (autoscaler will never scale below this)
    • Max Size -- ceiling
  • Pod CIDR -- in-cluster pod IP range (default 10.244.0.0/16)
  • Service CIDR -- ClusterIP service range (default 10.96.0.0/12)
  • Pod Security Admission Default -- cluster-wide default PSA profile (privileged, baseline, or restricted)
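If you change Pod CIDR or Service CIDR from the defaults, check the ranges before you create the cluster. The sketch below uses Python's standard `ipaddress` module. It assumes the usual Kubernetes rule that the pod range, the service range, and the VPC's own subnets must not collide; `check_cidrs` and the example subnet are hypothetical.

```python
import ipaddress


def check_cidrs(pod_cidr: str, service_cidr: str, vpc_subnets=()) -> list:
    """Report overlaps between the pod range, service range, and VPC subnets."""
    pod = ipaddress.ip_network(pod_cidr)
    svc = ipaddress.ip_network(service_cidr)
    issues = []
    if pod.overlaps(svc):
        issues.append(f"pod CIDR {pod} overlaps service CIDR {svc}")
    for subnet in map(ipaddress.ip_network, vpc_subnets):
        for label, net in (("pod", pod), ("service", svc)):
            if net.overlaps(subnet):
                issues.append(f"{label} CIDR {net} overlaps VPC subnet {subnet}")
    return issues


# The wizard defaults do not overlap each other or this example subnet.
print(check_cidrs("10.244.0.0/16", "10.96.0.0/12", ["10.0.1.0/24"]))  # []
print(check_cidrs("10.96.0.0/16", "10.96.0.0/12"))
# ['pod CIDR 10.96.0.0/16 overlaps service CIDR 10.96.0.0/12']
```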

Step 6 -- Review & Create

The review step shows every selection and an estimated hourly cost (when hourly billing is enabled). Click Create Cluster to launch.
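The estimate adds up the billed components listed under Prerequisites: every control plane VM, every worker VM, and the control plane LB. A minimal sketch, with made-up rates, since real prices come from the plans you pick:

```python
def estimate_hourly_cost(cp_nodes: int, cp_rate: float,
                         workers: int, worker_rate: float,
                         lb_rate: float) -> float:
    """Hourly estimate: each CP VM + each worker VM + the API load balancer."""
    return round(cp_nodes * cp_rate + workers * worker_rate + lb_rate, 4)


# Hypothetical rates in credits per hour, for illustration only.
print(estimate_hourly_cost(cp_nodes=3, cp_rate=0.04,
                           workers=5, worker_rate=0.06, lb_rate=0.02))  # 0.44
```

With the autoscaler on, the worker term moves between Min Size and Max Size, so the real hourly cost varies within that band.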

Provisioning typically takes 5--12 minutes depending on cluster size. The cluster page shows a live progress overlay throughout.

Screenshot: Cluster show page during creation showing the progress overlay with phase name, percentage, and an optional "Show details" log expansion.


Cluster Overview Page

After provisioning starts, the cluster detail page is the single pane of glass for everything.

Screenshot: User panel > Kubernetes > Cluster show page with the metadata card at the top (name, status badge, region, version, CP count, worker count, public/private endpoint URLs) and the tab bar below.

The header shows:

  • State badge -- Provisioning, Running, Upgrading, Suspended, Failed
  • Region, version, control plane count, worker count -- quick stat tiles
  • Public Endpoint -- the URL of the API server LB (public). Copy with the clipboard button.
  • Private Endpoint -- the URL reachable inside the VPC
  • Kubeconfig button -- downloads a kubeconfig file you can plug into kubectl
  • Redeploy CCM button -- re-applies the in-cluster cloud-controller-manager (needed only if the platform's master domain changes)
  • Delete button -- destroys the cluster and all backing nodes

Kubeconfig

Click Kubeconfig to download a YAML file. Save it and export the path:

export KUBECONFIG=~/Downloads/cluster-mycluster.yaml
kubectl get nodes

You should see the control plane node(s) and workers listed as Ready.

Kubeconfig endpoint

The downloaded kubeconfig points at the public endpoint by default if the cluster has a public LB, otherwise the private endpoint. To reach a private-only cluster, run kubectl from a VM inside the same VPC.

Automatic Certificate Renewal

The cluster's internal certificates and the in-cluster controller tokens (cluster autoscaler, cloud controller manager) renew automatically. You do not need to schedule renewals or run any commands.

What rotates and when:

| Asset | When | Action you need to take |
| --- | --- | --- |
| Control-plane and etcd PKI | 30 days before expiry | None |
| Admin kubeconfig (the one you download from the panel) | Reissued with the PKI above | Re-download from the cluster page after the renewal email |
| Cluster Autoscaler controller token | 30 days before expiry, from inside the cluster | None |
| Cloud Controller Manager token | 30 days before expiry, from inside the cluster | None |

You receive an email 30 days before renewal as a heads-up, and a second email immediately after renewal telling you to re-download the kubeconfig. The cluster page also shows a banner with a one-click download until you acknowledge it.

In-cluster workloads, ingress, the autoscaler, and the CCM are not interrupted during renewal. Only kubectl sessions using the old admin kubeconfig need a refresh.

One-time autoscaler manifest

The cluster autoscaler manifest you apply from the Workers tab only needs to be applied once: the autoscaler rotates its own controller token from inside the cluster, so you never need to reapply it.


Tabs

The cluster page is organised into tabs across the top of the body.

Screenshot: Cluster show page tab bar with Nodes, Tasks, Workers, Security, SSL & Domains.

Nodes Tab

Lists every VM backing the cluster (control plane + workers) with role, hostname, status, and creation time. Click a row to open a side drawer with:

  • CPU and memory donut gauges -- requested vs. allocatable
  • Top pods -- highest CPU and memory consumers on that node
  • Pod list -- all pods scheduled on the node with their status

Use this view to confirm a node went Ready and to find pods crowding a node before scaling.

Tasks Tab

Shows every operation performed against the cluster (create, scale, upgrade, delete, etc.) with status, progress, and per-task logs. Click a task row to expand its log stream.

Useful when:

  • A provisioning task fails -- the log shows the exact phase and error
  • An upgrade is in flight -- watch each wave complete
  • You want a per-step audit trail of who did what

Filter logs by source (Master / Slave / Cluster / Controller) using the dropdown.

Workers Tab

The workhorse tab for day-2 worker operations.

Screenshot: Workers tab showing autoscaling toggle with min/max bounds, "Scale Workers" form, "Upgrade Workers" form, "Upgrade Control Plane" form, "Worker Labels & Taints" editor, and the Worker Nodes table at the bottom.

Autoscaling

Toggle the cluster autoscaler on/off and set min/max worker bounds. Workers will scale within these bounds in response to pending pods and node utilisation.

Scale Workers

Manually set a desired worker count. An optional reason string is logged for auditing.

Upgrade Workers

Roll workers to a newer Kubernetes version using a surge-replace strategy:

  • Target version -- pick from the supported versions list
  • Max surge -- how many extra workers to launch per wave (1 replaces workers one at a time with the least temporary extra capacity)
  • Drain grace (sec) -- how long to allow pods to terminate gracefully before forcing removal

Each wave adds up to Max surge new workers on the target version, then drains and terminates the same number of old workers, repeating until every worker is on the new version.
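The wave arithmetic can be modelled in a few lines. This simplified sketch (hypothetical `surge_replace_plan`; it assumes old nodes are retired in list order) shows how Max surge trades speed for temporary extra capacity. The same model with `max_surge=1` reproduces the control plane rule below, where the number of waves equals the control plane node count.

```python
def surge_replace_plan(old_nodes, max_surge):
    """Split a surge-replace upgrade into waves (simplified model).

    Each wave provisions up to `max_surge` new nodes on the target version,
    then drains and terminates the same number of old nodes.
    """
    if max_surge < 1:
        raise ValueError("max_surge must be at least 1")
    total = len(old_nodes)
    waves = []
    for start in range(0, total, max_surge):
        batch = old_nodes[start:start + max_surge]
        waves.append({
            "provision": len(batch),           # new nodes launched this wave
            "drain_and_terminate": batch,      # old nodes retired this wave
            "peak_nodes": total + len(batch),  # node count before the old batch goes
        })
    return waves


plan = surge_replace_plan(["w1", "w2", "w3", "w4", "w5"], max_surge=2)
for i, wave in enumerate(plan, 1):
    print(f"wave {i}: +{wave['provision']} new, drain {wave['drain_and_terminate']}, "
          f"peak {wave['peak_nodes']} nodes")
# wave 1: +2 new, drain ['w1', 'w2'], peak 7 nodes
# wave 2: +2 new, drain ['w3', 'w4'], peak 7 nodes
# wave 3: +1 new, drain ['w5'], peak 6 nodes

print(len(surge_replace_plan(["cp1", "cp2", "cp3"], max_surge=1)))  # 3
```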

Upgrade Control Plane

Rolling upgrade for the control plane using surge=1 -- one new CP node provisioned per wave, then one old CP drained. The number of waves equals the control plane node count (1 or 3).

Concurrent operations

Worker scale and worker upgrade are blocked while a control plane upgrade is running. Wait for the CP upgrade to finish before scaling workers.

Worker Labels & Taints

Apply Kubernetes labels (key=value) and taints (key=value:effect) to all workers. Taint effects: NoSchedule, PreferNoSchedule, NoExecute. Click Apply to push the changes to the cluster.
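To catch typos before clicking Apply, you can pre-parse the strings locally. This sketch uses hypothetical helpers; `parse_taint` splits `key=value:effect` into the `key` / `value` / `effect` fields a Kubernetes node taint carries and rejects effects other than the three listed above.

```python
VALID_EFFECTS = {"NoSchedule", "PreferNoSchedule", "NoExecute"}


def parse_label(text: str) -> tuple:
    """Split 'key=value' into (key, value)."""
    key, sep, value = text.partition("=")
    if not sep or not key:
        raise ValueError(f"label must look like key=value: {text!r}")
    return key, value


def parse_taint(text: str) -> dict:
    """Split 'key=value:effect' into the fields of a Kubernetes node taint."""
    spec, sep, effect = text.rpartition(":")
    if not sep:
        raise ValueError(f"taint must look like key=value:effect: {text!r}")
    if effect not in VALID_EFFECTS:
        raise ValueError(f"unknown taint effect {effect!r}")
    key, _, value = spec.partition("=")
    if not key:
        raise ValueError(f"taint is missing a key: {text!r}")
    return {"key": key, "value": value, "effect": effect}


print(parse_label("workload=batch"))            # ('workload', 'batch')
print(parse_taint("dedicated=gpu:NoSchedule"))  # {'key': 'dedicated', 'value': 'gpu', 'effect': 'NoSchedule'}
```

The same `key=value:effect` form is what `kubectl taint nodes <node> dedicated=gpu:NoSchedule` accepts, so strings that parse here can also be tried by hand on a single node.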

Worker Nodes table

Per-node row with hostname, status, creation time, and a Delete button to cordon, drain, and remove a single worker.

Security Tab

Per-cluster security group rules grouped into three scopes:

  • LB -- rules for the public LB fronting the API server (covers who can reach the K8s API)
  • Control Plane -- rules for the control plane nodes themselves
  • Worker -- rules for worker nodes

Each scope has Inbound and Outbound sub-tabs. Traffic in either direction is denied by default unless a rule explicitly allows it.

Screenshot: Security tab showing the three SG scope cards (LB / CP / Worker), inbound/outbound sub-tabs, and an "Add Rule" modal.

To add a rule:

  1. Pick the scope (LB / CP / Worker) and direction (Inbound / Outbound)
  2. Click Add Rule
  3. Fill in:
    • Protocol -- TCP, UDP, ICMP, ICMPv6, or Any
    • Port Range -- min / max (disabled for ICMP / Any)
    • IP Version -- IPv4 or IPv6
    • CIDR -- e.g. 0.0.0.0/0 for any, 203.0.113.10/32 for one host
    • Description -- free text
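The default-deny behaviour means a packet passes only when some rule in its scope and direction matches protocol, source CIDR, and (for TCP/UDP) port. The sketch below models that check. `Rule`, `is_allowed`, and the 1-65535 default port range are hypothetical, and the model is not the platform's firewall implementation.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Rule:
    protocol: str          # "TCP", "UDP", "ICMP", "ICMPv6", or "Any"
    cidr: str              # e.g. "0.0.0.0/0" or "203.0.113.10/32"
    port_min: int = 1      # only checked for TCP / UDP
    port_max: int = 65535


def is_allowed(rules, protocol: str, source_ip: str, port: int = 0) -> bool:
    """Default deny: traffic passes only if at least one rule matches it."""
    addr = ipaddress.ip_address(source_ip)
    for rule in rules:
        if rule.protocol not in ("Any", protocol):
            continue
        if addr not in ipaddress.ip_network(rule.cidr):
            continue  # an IPv4/IPv6 mismatch also lands here
        if rule.protocol in ("TCP", "UDP") and not rule.port_min <= port <= rule.port_max:
            continue
        return True
    return False  # no rule matched: denied


# LB-scope inbound rule: allow the Kubernetes API port from a single admin host.
lb_inbound = [Rule("TCP", "203.0.113.10/32", 6443, 6443)]
print(is_allowed(lb_inbound, "TCP", "203.0.113.10", 6443))  # True
print(is_allowed(lb_inbound, "TCP", "198.51.100.7", 6443))  # False
print(is_allowed(lb_inbound, "TCP", "203.0.113.10", 22))    # False
```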

SSL & Domains Tab

Bind a TLS certificate to the cluster API load balancer so you can reach the API on a custom domain (e.g. k8s.example.com).

  1. Point a DNS CNAME from your domain to the public LB hostname shown in the info banner
  2. Click Add Certificate
  3. Pick a source:
    • Let's Encrypt -- the platform issues + auto-renews a free certificate. Requires DNS to already resolve to the LB.
    • Custom -- paste your own PEM certificate chain + private key

Once issued, the certificate is bound to the API LB, and kubectl pointed at your custom domain (for example, by setting the server URL in your kubeconfig to that domain) works without --insecure-skip-tls-verify.

Screenshot: SSL & Domains tab showing the info banner with the CNAME target, the SSL Certificates table, and the Add Certificate modal with issuance method selection.


Using kubectl

After downloading the kubeconfig:

export KUBECONFIG=~/Downloads/cluster-mycluster.yaml

# Smoke checks
kubectl get nodes
kubectl get pods -A

# Deploy nginx
kubectl create deployment nginx --image=nginx
kubectl expose deployment nginx --port=80 --type=LoadBalancer

# Wait for external IP to populate
kubectl get svc nginx -w

The Service: type=LoadBalancer triggers the in-cluster Cloud Controller Manager to create a managed Load Balancer on the platform automatically. The Service's EXTERNAL-IP will be the LB's public IP once provisioning completes (typically 30--90 seconds).

For the full list of supported Service annotations (custom plans, SSL termination, source-range filtering, health checks, blue/green and canary traffic splitting, etc.) see the Service LoadBalancer annotation reference.
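If you generate manifests programmatically, the Service above can be written out as JSON, which `kubectl apply -f` accepts alongside YAML. This sketch sets the `service.beta.kubernetes.io/managed-loadbalancer-plan` annotation named under Troubleshooting. The plan value `lb-small` is a placeholder for an LB plan enabled in your region, and `lb_service` is a hypothetical helper. The selector `app=nginx` matches the label `kubectl create deployment nginx` puts on its pods.

```python
import json


def lb_service(name: str, app: str, port: int, lb_plan: str) -> dict:
    """Build a Service of type LoadBalancer for the in-cluster CCM to act on."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {
            "name": name,
            "annotations": {
                # The plan must be an LB plan enabled in the cluster's region.
                "service.beta.kubernetes.io/managed-loadbalancer-plan": lb_plan,
            },
        },
        "spec": {
            "type": "LoadBalancer",
            "selector": {"app": app},
            "ports": [{"port": port, "targetPort": port, "protocol": "TCP"}],
        },
    }


manifest = lb_service("nginx", app="nginx", port=80, lb_plan="lb-small")  # placeholder plan
print(json.dumps(manifest, indent=2))
```

Redirect the output to a file (for example `svc.json`) and run `kubectl apply -f svc.json`. Check the annotation reference for the exact plan values your region accepts.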


Cluster Autoscaler

When enabled, the cluster autoscaler runs as a Deployment inside the cluster and:

  • Scales up when there are unschedulable pods that would fit if a new worker existed
  • Scales down when a worker has been underutilised for 10 minutes and its pods can move elsewhere

Bounds (min / max) are enforced strictly. The autoscaler will never go below min even if all pods could be evicted, and never above max even if more pods are pending.
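The clamping behaviour can be pictured with a deliberately simplified model. The real Cluster Autoscaler simulates pod scheduling to decide how many nodes to add or remove. Here those results are passed in as two hypothetical inputs, and the function only shows how the min/max bounds cap the outcome.

```python
def autoscaler_target(current: int, min_size: int, max_size: int,
                      nodes_for_pending: int = 0, idle_removable: int = 0) -> int:
    """Next worker count under a simplified scale-up / scale-down rule.

    nodes_for_pending: extra nodes that would let pending pods schedule.
    idle_removable: nodes underutilised long enough whose pods fit elsewhere.
    """
    if min_size > max_size:
        raise ValueError("min_size must not exceed max_size")
    if nodes_for_pending > 0:
        desired = current + nodes_for_pending  # scale up first
    else:
        desired = current - idle_removable     # otherwise consider scale-down
    return max(min_size, min(max_size, desired))  # bounds are strict


print(autoscaler_target(4, 2, 6, nodes_for_pending=5))  # 6: capped at max
print(autoscaler_target(3, 2, 6, idle_removable=3))     # 2: floored at min
```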

To tweak autoscaling behaviour:

  1. Go to the Workers tab
  2. Toggle Enable cluster autoscaler on/off
  3. Adjust Minimum workers and Maximum workers
  4. Click Save bounds

Right-sizing bounds

Set min to your baseline (what handles steady-state traffic) and max to your absolute ceiling -- not just a comfortable peak. The autoscaler will only add nodes when actually needed, so a high max is free until you have pending pods.


Node Pools

Workers can be split across multiple node pools, each with its own instance plan, labels, taints, autoscaling bounds, and drain policy. Use this to mix general / memory / compute / GPU shapes in one cluster, or to taint-isolate noisy workloads from system pods.

The cluster wizard creates a single default pool. Open the Pools tab on the cluster page to add, edit, or delete pools.

For per-pool field reference, scheduling examples (nodeSelector / tolerations), and rate-limit tuning, see the Node Pools guide.
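A common pattern is to taint a pool so that general workloads stay off it, then give selected pods both a toleration (permission to land on the tainted nodes) and a nodeSelector (a requirement to land only there). The sketch builds such a Pod as JSON for `kubectl apply -f`. The `pool=gpu` label, the `dedicated=gpu:NoSchedule` taint, and the `pinned_pod` helper are hypothetical. Use the labels and taints you actually set on the pool.

```python
import json


def pinned_pod(name: str, image: str, pool_label: dict, taint: dict) -> dict:
    """Pod that targets a labelled pool and tolerates that pool's taint."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "nodeSelector": pool_label,  # only nodes carrying this label qualify
            "tolerations": [{
                "key": taint["key"],
                "operator": "Equal",
                "value": taint["value"],
                "effect": taint["effect"],
            }],
            "containers": [{"name": name, "image": image}],
        },
    }


# Hypothetical GPU pool labelled pool=gpu and tainted dedicated=gpu:NoSchedule.
pod = pinned_pod("trainer", "nginx", {"pool": "gpu"},
                 {"key": "dedicated", "value": "gpu", "effect": "NoSchedule"})
print(json.dumps(pod, indent=2))
```

A toleration on its own does not attract pods to the pool. Without the nodeSelector, the pod could still be scheduled on any untainted worker.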


Rolling Upgrades

Both control plane and workers are upgraded using a surge strategy: provision new node(s) on the target version, drain the old node(s), terminate, repeat.

Worker Upgrade

  1. Workers tab → Upgrade Workers card
  2. Pick a Target version
  3. Set Max surge (how many new workers to provision per wave; 1 adds the least temporary capacity, higher values finish faster)
  4. Set Drain grace (sec) (default 30; pods get this long to exit cleanly before force-removal)
  5. Click Upgrade

The upgrade runs in waves until every worker is on the target version. Watch progress on the Tasks tab.

Control Plane Upgrade

  1. Workers tab → Upgrade Control Plane card
  2. Pick a Target version
  3. Set Drain grace (sec)
  4. Click Upgrade CP

CP upgrade always uses surge=1: one new CP node per wave, one old CP drained per wave. Total waves = control plane node count.

Plan upgrades during low-traffic windows

Even with surge + drain, briefly cycling control plane nodes can introduce small API server hiccups visible to kubectl and Service controllers. Schedule CP upgrades when downstream traffic is low.

Skip-Version Upgrades

Kubernetes does not support skipping minor versions (e.g. 1.30 → 1.32 in one hop). The dropdown only shows versions that are valid upgrade targets from your current version, as configured by the administrator.
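The filtering rule can be sketched as a single check. This is an illustration only: `is_valid_hop` is a hypothetical helper that accepts either a newer patch of the same minor version or the next minor version. When an admin-supplied Upgrade From list is passed, the current version must also appear on it.

```python
def parse(version: str) -> tuple:
    """'1.30.1' -> (1, 30, 1)"""
    major, minor, patch = (int(p) for p in version.split("."))
    return major, minor, patch


def is_valid_hop(current: str, target: str, upgrade_from=None) -> bool:
    """True if target is a newer patch of the same minor or the next minor."""
    cur, tgt = parse(current), parse(target)
    if tgt <= cur or tgt[0] != cur[0]:
        return False  # not newer, or a major-version change
    if tgt[1] - cur[1] > 1:
        return False  # would skip a minor version
    # When the admin filled in Upgrade From, it must also list the current version.
    return upgrade_from is None or current in upgrade_from


print(is_valid_hop("1.30.1", "1.31.0"))  # True
print(is_valid_hop("1.30.1", "1.32.0"))  # False: skips 1.31
print(is_valid_hop("1.30.2", "1.31.0", ["1.30.0", "1.30.1"]))  # False: not listed
```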


Backups & Snapshots

Etcd snapshots are taken automatically on a schedule configured at the admin level. Restoring from a snapshot is currently an admin-initiated operation -- contact support if you need to roll back cluster state.


Lifecycle Operations

From the cluster header you can:

  • Redeploy CCM -- re-applies the in-cluster cloud-controller-manager. Use after the platform's master domain changes (rare).
  • Delete -- destroys the cluster, all CP and worker VMs, the API LB, and the cluster's SGs. Irreversible.
  • Force Cleanup -- visible if a previous delete failed; force-removes whatever residual records remain. Use only after a normal delete fails.

Deletion is permanent

Deleting a cluster removes every node, the API LB, certificates, and all in-cluster data. Take a backup of anything you need first.


Troubleshooting

Cluster stuck in Provisioning

  • Check the Tasks tab -- the failing task's log shows the phase and error
  • Verify the VPC's NAT Gateway is healthy and has internet access; workers cannot finish bootstrapping without outbound DNS resolution and package downloads
  • Verify the chosen Kubernetes version still has its CP and Worker images registered (admin task)

kubectl connection refused

  • Confirm the cluster is Running on the overview page
  • For a public endpoint: confirm the SG on the LB scope allows your source IP on the API port (defaults to TCP 6443)
  • For a private endpoint: confirm you're running kubectl from a host inside the VPC

Pods stuck in Pending

  • Open the Nodes tab and check whether any node is under heavy resource pressure
  • If autoscaling is enabled: confirm max hasn't been hit
  • If autoscaling is disabled: scale workers manually from the Workers tab

Service: type=LoadBalancer stuck without EXTERNAL-IP

  • Check kubectl describe svc <name> for a CreateLoadBalancerFailed event; it names the specific error, such as missing_annotation or unknown_lb_plan
  • Verify the annotation service.beta.kubernetes.io/managed-loadbalancer-plan references an LB plan enabled in the cluster's region
  • See the Service LoadBalancer annotation reference for every supported annotation

Worker upgrade stalls

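  • A single pod covered by a strict PodDisruptionBudget can block a drain indefinitely
  • Find the blocking pod: list the pods on the draining node with kubectl get pods -A -o wide --field-selector spec.nodeName=<node-name>, then run kubectl get pdb -A and look for budgets whose ALLOWED DISRUPTIONS is 0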
  • Either relax the PDB, scale the workload temporarily, or increase Drain grace (sec)
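The ALLOWED DISRUPTIONS figure comes from simple arithmetic: the number of currently healthy pods minus the number the budget requires to stay healthy, floored at zero. This sketch covers only integer `minAvailable` / `maxUnavailable` values (percentages also exist but are left out here), and the helper name is illustrative.

```python
def disruptions_allowed(current_healthy: int, expected_pods: int,
                        min_available=None, max_unavailable=None) -> int:
    """Evictions a PodDisruptionBudget permits right now (integer fields only)."""
    if (min_available is None) == (max_unavailable is None):
        raise ValueError("set exactly one of min_available / max_unavailable")
    if min_available is not None:
        desired_healthy = min_available
    else:
        desired_healthy = expected_pods - max_unavailable
    return max(0, current_healthy - desired_healthy)


# A single-replica workload guarded by minAvailable: 1 can never be evicted,
# so draining its node stalls.
print(disruptions_allowed(current_healthy=1, expected_pods=1, min_available=1))  # 0
print(disruptions_allowed(current_healthy=3, expected_pods=3, min_available=2))  # 1
```

This is why scaling the workload up temporarily helps: with a second healthy replica, the same budget allows one eviction.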