Kubernetes Node Pools

Overview

A node pool is a group of worker nodes inside a Kubernetes cluster where every node uses the same instance plan (CPU, RAM, disk), the same Kubernetes labels, and the same taints. Every cluster has at least one pool (the default pool, created with the cluster) and can have any number of extra pools added later.

Pools are how you mix node shapes inside one cluster. A common layout is one small general-purpose pool for system pods, plus one or more specialty pools (a memory-heavy pool for caches, a GPU pool for ML serving) that only run specific workloads. The Kubernetes scheduler decides which pool a pod lands on using the labels and taints described below.

Node pools are documented under the cluster detail page (the Pools tab) because there is no separate admin index for node pools; the pool list lives on each cluster. See the Kubernetes admin guide for the cluster-level admin view.

Concepts

A few terms used throughout:

  • Worker node. One VM that runs user containers (called pods).
  • Pod. A group of one or more containers Kubernetes schedules together as one unit.
  • Label. A key=value tag attached to a node. Pods can request a node with a specific label using nodeSelector. Labels are descriptive - they do not keep other pods off the node.
  • Taint. A key=value:effect mark on a node that blocks pods unless they explicitly tolerate it. Taints are exclusive - they reserve the node for pods that opt in. Effects: NoSchedule (block new pods), PreferNoSchedule (avoid if possible), NoExecute (also evict existing pods).
  • Toleration. A field on a pod spec that says "I accept this taint". Without a matching toleration the pod cannot run on a tainted node.
  • Autoscaler. Controller inside the cluster that grows or shrinks pools to match demand. See the Cluster Autoscaler reference.
  • Drain. Move every pod off a node before shutting the node down, so workloads keep running.
  • DaemonSet. A Kubernetes object that runs one pod on every node (typical examples: log collectors, metrics agents). DaemonSet pods are usually ignored when deciding whether a node is empty enough to remove.
  • emptyDir. A pod volume backed by the node's local disk. The data is lost when the pod is removed. Pods using emptyDir are normally not drained without explicit permission, because draining them deletes the data.
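
To make the label and taint terms concrete, here is a rough sketch of how they might appear on a Node object (for example in the output of kubectl get node -o yaml). The node name, the workload=memory label, and the dedicated=cache taint are made-up values for illustration, not values the platform sets.

```yaml
# Illustrative Node fragment; every name and value here is hypothetical.
apiVersion: v1
kind: Node
metadata:
  name: example-pool-node-1
  labels:
    workload: memory      # label: pods can ask for it with nodeSelector
spec:
  taints:
    - key: dedicated      # taint: blocks pods that lack a matching toleration
      value: cache
      effect: NoSchedule
```

A pod that sets nodeSelector workload: memory but has no toleration for dedicated=cache:NoSchedule would stay Pending on a cluster where this is the only matching node.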

When to use multiple pools

A single pool is enough for most clusters. Reach for multiple pools when one of these applies:

  • Mixed workload shapes. Cache pods want 64 GB RAM, batch jobs want 32 vCPUs, model serving wants a GPU. One plan cannot fit all three; one pool per shape can.
  • Taint-based isolation. You want to keep noisy workloads off the nodes that run ingress or system pods. A taint on the noisy pool keeps other pods off it; a matching toleration plus a nodeSelector on the noisy pods keeps them on it, so the rest of the cluster stays clean.
  • Per-workload scaling profile. Batch nodes can tolerate dense packing and slow reclaim; latency-sensitive nodes want fast reclaim. Per-pool autoscaler tuning lets you do both in one cluster.
  • Different fault domains. Run a pool on a different hypervisor group to isolate the blast radius of a host failure.

If none of those apply, one default pool with autoscaling on is fine.

Admin steps

There is no separate admin index page for pools. Pool management lives inside each cluster's detail page on the admin Kubernetes > Clusters screen (same view shown to end users plus admin-only buttons).

[Figure: Admin Kubernetes Clusters list. Drill into a cluster to manage its pools.]

Add a pool

  1. Open the cluster's detail page (admin or user panel - same UI).
  2. Switch to the Pools tab.
  3. Click Add Pool.
  4. Fill in the form (fields below).
  5. Click Create.

A new pool targets its min size. If autoscaling is on, the cluster scales the pool up to min size immediately. If autoscaling is off, no nodes are created until you scale the pool manually.

Pool fields

| Field | What it means |
|---|---|
| Name | A short label for the pool. Lowercase letters, numbers, and dashes. Used as a Kubernetes label and in node hostnames. |
| Plan | The instance plan that defines CPU, RAM, storage, and price for every node in this pool. |
| Min size | The lowest number of nodes the pool will keep, even when idle. Set to 0 to allow the pool to drain fully when not in use. |
| Max size | The highest number of nodes the pool can grow to. The autoscaler will refuse to scale past this. |
| Autoscaling | Toggle. On = autoscaler can grow and shrink this pool within the bounds above. Off = pool stays at whatever size you set manually. |
| Labels | Kubernetes labels applied to every node in the pool. Use these as nodeSelector targets on your pods. |
| Taints | Kubernetes taints applied to every node in the pool. Pods need a matching toleration to land here. |
| Drain timeout | How long the system will wait for a node to drain before force-killing pods. Default 5 minutes. |
| Drain grace period | How long the kubelet gives each pod to shut down cleanly before killing it. |
| Ignore DaemonSets | Skip DaemonSet pods when deciding if a node is safe to remove. Usually on. |
| Delete emptyDir data | Allow draining pods that have an emptyDir volume. Off by default to avoid silently losing data. |

Advanced (expand the form to see):

| Field | What it means |
|---|---|
| Max surge per period | Cap on how many nodes can be created inside one rolling window. Avoids stampedes. |
| Max unavailable per period | Cap on how many nodes can be removed inside one rolling window. Protects in-flight workloads. |
| Scale period | The length of the rolling window the two caps above apply to. |
| Cooldown after scale up | Idle gap the autoscaler waits after a scale-up before another scale-up. |
| Cooldown after scale down | Idle gap the autoscaler waits after a scale-down before another scale-down. |

Leave the advanced fields blank to use the defaults.
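
The drain settings interact with how each pod shuts down. As a sketch, a pod can declare how long it needs using the standard Kubernetes terminationGracePeriodSeconds field and a preStop hook. How this platform combines that value with the pool's Drain grace period is not specified here, so treat the numbers as placeholders. The pod name and image are hypothetical.

```yaml
# Hypothetical pod that asks for time to shut down cleanly when its node drains.
apiVersion: v1
kind: Pod
metadata:
  name: graceful-worker
spec:
  # Upper bound, in seconds, that the pod requests between SIGTERM and SIGKILL.
  terminationGracePeriodSeconds: 60
  containers:
    - name: worker
      image: my-org/worker:latest   # placeholder image
      lifecycle:
        preStop:
          exec:
            # Runs before SIGTERM is sent; here it simply delays shutdown by 10 s.
            command: ["sh", "-c", "sleep 10"]
```

If a workload needs longer than the pool's Drain timeout to stop, the pool-level timeout is the setting to raise.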

One autoscaler, many pools

A single cluster-autoscaler Deployment manages every autoscaling pool in the cluster. Each pool shows up as its own node group with its own bounds. See the Cluster Autoscaler reference for tuning.

What end users see

End users get the same Pools tab on their cluster page. They can add, edit, and delete pools the same way. The only admin-only action is Delete Now (described under Deleting a pool).

Sending pods to a specific pool

End users target pools from their pod spec using Kubernetes scheduling fields. The pool's labels and taints are what the user matches against.

Example: a label-only pool

Pool config: label workload=memory, no taint. Pods that ask for this label land here. Pods that do not ask still might, because there is no taint to keep them out.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
spec:
  replicas: 3
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      nodeSelector:
        workload: memory
      containers:
        - name: redis
          image: redis:7

Example: a tainted pool (GPU)

Pool config: label accelerator=gpu, taint nvidia.com/gpu=present:NoSchedule. The taint reserves the pool for pods that explicitly tolerate it.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: inference
spec:
  replicas: 2
  selector:
    matchLabels:
      app: inference
  template:
    metadata:
      labels:
        app: inference
    spec:
      nodeSelector:
        accelerator: gpu
      tolerations:
        - key: nvidia.com/gpu
          operator: Equal
          value: present
          effect: NoSchedule
      containers:
        - name: server
          image: my-org/inference:latest
          resources:
            limits:
              nvidia.com/gpu: 1

The toleration lets the pod schedule on GPU nodes; the nodeSelector keeps it there. Workloads without the toleration cannot land on GPU nodes, so the GPU pool is reserved for pods that actually want a GPU.

Built-in labels on every pool

The cluster attaches these labels to every node automatically, on top of any labels you set on the pool. You can use them as nodeSelector targets without configuring them explicitly.

| Label | Value |
|---|---|
| hypervisor.io/pool-name | The pool name |
| hypervisor.io/pool-id | The pool's UUID |
| topology.kubernetes.io/region | The hypervisor group region slug |
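
As a minimal sketch, a pod can target a pool by name through the built-in hypervisor.io/pool-name label, with no custom labels configured on the pool. The pool name batch, the pod name, and the image are assumptions for illustration.

```yaml
# Hypothetical pod pinned to a pool by its built-in pool-name label.
apiVersion: v1
kind: Pod
metadata:
  name: report-job
spec:
  nodeSelector:
    hypervisor.io/pool-name: batch   # assumes a pool named "batch" exists
  containers:
    - name: report
      image: my-org/report:latest    # placeholder image
```

If the target pool also carries a taint, the pod still needs a matching toleration; the nodeSelector alone does not bypass it.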

The default pool

Every cluster has exactly one default pool. It is created with the cluster and shows up in the Pools tab marked with a Default badge.

You can:

  • Rename it. The default flag stays.
  • Edit its plan, size, labels, taints, and policies like any other pool.
  • Reassign the default flag by editing another pool and ticking Make default. The previous default becomes a regular pool. There is always exactly one default at a time.

You cannot delete the default pool directly. To remove it, reassign the default flag to another pool first, then delete the old one.

Scaling rate limits

Each pool has rate limits that cap how aggressively the cluster can grow or shrink it. The limits exist to avoid two failure modes:

  • Stampedes. A burst of pending pods triggering the autoscaler to ask for 50 nodes at once and overwhelming the hypervisor.
  • Capacity flapping. Rapid alternating scale-ups and scale-downs that churn billing without doing useful work.

The defaults are conservative and most clusters never hit them. If you have a pool that needs to scale fast (a batch pool processing a daily 09:00 queue), raise Max surge per period and shorten Scale period. If you have a pool that needs to be slow and steady (a stateful pool that takes a long time to drain), lower Max unavailable per period.

When a rate limit is hit, the autoscaler queues the rest of the request and retries on its next cycle. No part of the request is dropped; the remaining nodes are added or removed over the following cycles.

Scale-down behaviour

When the autoscaler decides a node is no longer needed, the cluster does the following:

  1. Cordon the node so no new pods land on it.
  2. Drain the pods using the pool's drain policy (grace period, ignore-DaemonSets flag, emptyDir flag).
  3. If drain succeeds within the drain timeout, destroy the underlying VM.
  4. If drain fails or times out, leave the node marked and retry on the next cycle.

The cluster will never drain so many nodes at once that zero workers remain. If a scale-down would remove the last worker, that node is exempted until at least one other worker exists.
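
Workloads can limit how many of their pods a drain takes down at once with a PodDisruptionBudget. The sketch below assumes the drain step evicts pods through the standard Kubernetes eviction API, which honors PodDisruptionBudgets (the troubleshooting table suggests it does). The name redis-pdb is hypothetical, and the selector reuses the app: redis label from the Deployment example earlier on this page.

```yaml
# Hypothetical budget: with 3 redis replicas, at most one can be evicted at a time.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: redis-pdb
spec:
  minAvailable: 2        # evictions that would leave fewer than 2 ready pods are refused
  selector:
    matchLabels:
      app: redis
```

A budget that is too strict (for example minAvailable equal to the replica count) can never be satisfied during a drain and will stall scale-down on that node.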

Min size is a hard floor

The autoscaler will never scale a pool below its configured min, even if every node is empty. To let a pool drain to zero, set min: 0 and make sure no critical workload pins itself to the pool.

Deleting a pool

Schedule deletion

The standard path. Click Delete on the pool in the Pools tab. The pool's nodes are cordoned, drained according to the drain policy, then destroyed. Rate limits apply, so a large pool may take a few cycles to fully drain.

While deletion is in progress the pool stays visible in the Pools tab with a Deleting status. New pods that would have scheduled here go to other pools (assuming their selectors and tolerations match).

Delete now (admin only)

Admins can bypass drain and rate limits using Delete Now on the admin panel. This destroys all the pool's VMs immediately. Use only when the pool is already broken (every node stuck in NotReady, drain will never succeed). Pods running on the pool's nodes are killed without a grace period.

The default pool cannot be deleted by either path. Reassign the default flag first.

Troubleshooting

| Symptom | Likely cause | Fix |
|---|---|---|
| Pods stuck Pending even though max not hit | Pod's nodeSelector / tolerations do not match any pool, or the pool template cannot fit pod requests. | kubectl describe pod <name> shows scheduler events. Verify pool labels match nodeSelector and the worker plan has enough CPU / RAM. |
| Pool stays at min even with zero load | Working as intended. min is a floor. | Lower min if you want full reclaim. Set min: 0 to allow scale to zero. |
| Scale-up adds nodes but pods still do not schedule | New node's labels / taints do not match pod's selectors, or pod has bigger requests than the template. | Recheck pool labels / taints. Increase plan size or pick a different pool. |
| Scale-down stalls on one node | A pod with a strict PodDisruptionBudget or the cluster-autoscaler.kubernetes.io/safe-to-evict: "false" annotation is pinned there. | Relax the PDB, scale the blocking workload temporarily, or raise Drain timeout on the pool. |
| Cannot delete the default pool | Default pools are protected. | Reassign the default flag to another pool first, then delete. |
| Two pools, autoscaler always picks the same one | The default random expander keeps breaking the tie in favor of one pool. | Switch the cluster autoscaler's --expander flag. See Expander strategies. |
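
For the "Scale-down stalls on one node" row, it helps to know where the safe-to-evict setting lives. The sketch below shows the upstream Cluster Autoscaler annotation on a pod template; it assumes this platform's autoscaler honors the annotation the same way upstream does. The Deployment name and image are hypothetical.

```yaml
# Hypothetical Deployment whose pods tell the autoscaler not to evict them.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scratch-worker
spec:
  replicas: 1
  selector:
    matchLabels:
      app: scratch-worker
  template:
    metadata:
      labels:
        app: scratch-worker
      annotations:
        # "false" blocks scale-down of the node this pod runs on.
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
    spec:
      containers:
        - name: worker
          image: my-org/scratch-worker:latest   # placeholder image
```

Removing the annotation, or setting it to "true", lets the autoscaler consider the node for removal again.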