Kubernetes Node Pools
Overview
A node pool is a group of worker nodes inside a Kubernetes cluster that all share the same instance plan, the same Kubernetes labels, and the same taints. Every cluster has at least one pool (the default pool created with the cluster) and can have any number of additional pools added later.
Pools let you mix node shapes inside one cluster. A typical setup is a small general-purpose pool for system pods and ingress, plus one or more specialty pools (memory-optimized for caches and databases, compute-optimized for batch workloads, GPU-equipped for ML serving) that only schedule specific pods. The Kubernetes scheduler decides which pool a pod lands on using the labels and taints you set on the pool.
Key characteristics:
- Per-pool plan -- each pool picks its own instance plan, so node CPU, RAM, and storage are independent across pools.
- Per-pool scaling -- each pool has its own min/max bounds, autoscaling toggle, and rate limits, so a busy GPU pool will not throttle a quiet general pool.
- Per-pool labels and taints -- labels and taints are applied to every node in the pool at boot time. Edit them on the pool and new nodes pick them up.
- Per-pool drain policy -- set how long to wait before force-killing pods on scale-down, and whether to ignore DaemonSets or delete emptyDir data.
- Default pool -- the cluster's first pool is the default. The default pool is the one the autoscaler falls back to if a request does not specify a pool.
A single cluster-autoscaler Deployment manages every autoscaling pool in the cluster. Each pool shows up as its own node group with its own bounds. See the Cluster Autoscaler reference for tuning.
When to use multiple pools
A single pool is enough for most clusters. Reach for multiple pools when one of these applies:
- Mixed workload shapes. Cache pods want 64 GB RAM, batch jobs want 32 vCPUs, model serving wants a GPU. One plan cannot fit all three; one pool per shape can.
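- Taint-based isolation. You want to keep noisy workloads off the nodes that run ingress or system pods. Put the noisy workloads on their own tainted pool: the taint keeps other pods off that pool, and a matching toleration plus a nodeSelector on the noisy pods keeps them on it and off the rest of the cluster.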
- Per-workload scaling profile. Batch nodes can tolerate dense packing and slow reclaim; latency-sensitive nodes want fast reclaim. Per-pool autoscaler tuning lets you do both in one cluster.
- Different fault domains. Run a pool on a different hypervisor group for blast-radius isolation.
If none of those apply, a single default pool with autoscaling on is fine.
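As a rough sketch of the taint-based isolation case, assume a batch pool configured with the label workload=batch and the taint dedicated=batch:NoSchedule. Both names, the Job name, and the image are invented for illustration and are not defaults of the platform.

```yaml
# Hypothetical batch Job pinned to a tainted "batch" pool.
apiVersion: batch/v1
kind: Job
metadata:
  name: nightly-crunch            # hypothetical name
spec:
  template:
    spec:
      restartPolicy: Never
      nodeSelector:
        workload: batch           # assumed pool label; keeps the Job on the batch pool
      tolerations:
        - key: dedicated          # assumed pool taint; lets the Job onto tainted nodes
          operator: Equal
          value: batch
          effect: NoSchedule
      containers:
        - name: crunch
          image: busybox:1.36
          command: ["sh", "-c", "echo crunching && sleep 60"]
```

Pods without the toleration stay off the batch nodes, and the nodeSelector stops this Job from spilling onto the general pool.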
Adding a pool
- Open your cluster page.
- Switch to the Pools tab.
- Click Add Pool.
- Fill in the form (fields described below).
- Click Create.
The new pool starts at its min size. If autoscaling is on, the cluster scales up to min size immediately; if autoscaling is off, nothing happens until you manually scale the pool.
Fields explained
| Field | What it means |
|---|---|
| Name | A short label for the pool. Lowercase letters, numbers, and dashes. Used as a Kubernetes label and in node names. |
| Plan | The instance plan that defines CPU, RAM, storage, and price for every node in this pool. |
| Min size | The lowest number of nodes the pool will keep, even when idle. Set to 0 to allow the pool to drain fully when not in use. |
| Max size | The highest number of nodes the pool can grow to. The autoscaler will refuse to scale past this. |
| Autoscaling | Toggle. When on, the cluster autoscaler can grow and shrink this pool within the bounds above. When off, the pool stays at whatever size you set manually. |
| Labels | Kubernetes labels applied to every node in the pool. Use these as nodeSelector targets on your pods. |
| Taints | Kubernetes taints applied to every node in the pool. Pods need a matching toleration to land here. |
| Drain timeout | How long to wait before force-killing pods during scale-down or node removal. Default is 5 minutes. |
| Drain grace period | How long the kubelet gives each pod to shut down cleanly before killing it. |
| Ignore DaemonSets | Skip DaemonSet pods when deciding if a node is safe to remove. Usually on. |
| Delete emptyDir data | Allow draining pods that have an emptyDir volume. Off by default to avoid losing data. |
Advanced (visible if you expand the form):
| Field | What it means |
|---|---|
| Max surge per period | Cap on how many nodes can be created inside one rolling window. Avoids stampedes. |
| Max unavailable per period | Cap on how many nodes can be removed inside one rolling window. Protects in-flight workloads. |
| Scale period | The length of the rolling window for the two caps above. |
| Cooldown after scale up | Idle gap the autoscaler waits after a scale-up before another scale-up. |
| Cooldown after scale down | Idle gap the autoscaler waits after a scale-down before another scale-down. |
If you leave the advanced fields blank, the pool uses sensible defaults.
Sending pods to a specific pool
Use Kubernetes scheduling fields on your pod spec. The pool's labels and taints are what you match against.
Example: a label-only pool
Pool config: label workload=memory, no taint.
Deployment that requests this pool:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
spec:
  replicas: 3
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      nodeSelector:
        workload: memory
      containers:
        - name: redis
          image: redis:7
Pods land only on nodes in this pool. Other workloads can also land here (no taint to keep them out).
Example: a tainted pool (GPU)
Pool config: label accelerator=gpu, taint nvidia.com/gpu=present:NoSchedule.
Deployment that requests this pool:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inference
spec:
  replicas: 2
  selector:
    matchLabels:
      app: inference
  template:
    metadata:
      labels:
        app: inference
    spec:
      nodeSelector:
        accelerator: gpu
      tolerations:
        - key: nvidia.com/gpu
          operator: Equal
          value: present
          effect: NoSchedule
      containers:
        - name: server
          image: my-org/inference:latest
          resources:
            limits:
              nvidia.com/gpu: 1
The toleration lets the pod schedule on tainted GPU nodes; the nodeSelector keeps it there. Workloads without the toleration cannot land on GPU nodes, so the GPU pool is reserved for pods that actually want a GPU.
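For orientation, the GPU pool's label and taint should surface on each of its nodes roughly as in the trimmed Node object below. This is a sketch using standard Kubernetes Node fields; the node name is a made-up placeholder, and a real node carries many more labels and fields.

```yaml
# Trimmed view of a node in the GPU pool (illustrative only).
apiVersion: v1
kind: Node
metadata:
  name: gpu-pool-node-1         # hypothetical node name
  labels:
    accelerator: gpu            # label set on the pool; matched by nodeSelector
spec:
  taints:
    - key: nvidia.com/gpu       # taint set on the pool; matched by the toleration
      value: present
      effect: NoSchedule
```

Comparing this with the Deployment makes the matching explicit: the toleration's key, value, and effect must line up with the taint, and the nodeSelector entry must equal the node label.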
Useful labels available on every pool
The cluster automatically attaches these labels to every node, in addition to any labels you set on the pool:
| Label | Value |
|---|---|
| hypervisor.io/pool-name | The pool name |
| hypervisor.io/pool-id | The pool's UUID |
| topology.kubernetes.io/region | The hypervisor group region slug |
You can use any of these as a nodeSelector target without configuring it explicitly on the pool.
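A minimal sketch of targeting a pool by its automatic name label, assuming a pool called batch exists (the pool name, pod name, and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: report-builder                  # hypothetical name
spec:
  nodeSelector:
    hypervisor.io/pool-name: batch      # assumes a pool named "batch"
  containers:
    - name: worker
      image: busybox:1.36
      command: ["sh", "-c", "echo running on the batch pool && sleep 3600"]
```

If the target pool also carries taints, the pod still needs matching tolerations; the label only selects the nodes.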
The default pool
Every cluster has exactly one default pool. It is created automatically when the cluster is created and shows up in the Pools tab marked with a Default badge.
You can:
- Rename it (the default flag stays).
- Edit its plan, size, labels, taints, and policies like any other pool.
- Reassign the default flag by editing another pool and ticking Make default. The previous default becomes a regular pool. There is always exactly one default at a time.
You cannot delete the default pool directly. If you want to remove it, first reassign the default flag to another pool, then delete the old one.
Scaling rate limits
Each pool has a set of rate limits that cap how aggressively the cluster can grow or shrink it. The limits exist to avoid two failure modes:
- Stampedes -- a sudden burst of pending pods triggering the autoscaler to ask for fifty nodes at once and overwhelming the hypervisor.
- Capacity flapping -- rapid alternating scale-ups and scale-downs that churn billing without doing useful work.
The defaults are conservative and most clusters never hit them. If you have a pool that needs to scale fast (for example, a batch pool that processes a daily queue at 09:00), raise max surge per period and shorten scale period. If you have a pool that needs to be slow and steady (for example, a stateful pool that takes a long time to drain), lower max unavailable per period.
When a rate limit is hit, the autoscaler queues the rest of the request and retries on its next cycle. Nothing is lost; the work just paces.
Scale down behavior
When the autoscaler decides a node is no longer needed, the cluster does the following:
- Cordon the node so no new pods land on it.
- Drain the pods according to the pool's drain policy (grace period, ignore-DaemonSets flag, emptyDir flag).
- If drain succeeds within the drain timeout, destroy the underlying VM.
- If drain fails or times out, leave the node marked and retry on the next cycle.
The cluster will never drain so many nodes at once that it leaves zero workers. If a scale-down would remove the last remaining worker, that node is exempted until at least one other worker exists.
The autoscaler will never scale a pool below its configured min, even if every node on it is empty. To let a pool drain to zero, set min: 0 and make sure no critical workload pins itself to the pool.
Deleting a pool
Schedule deletion
The standard path. Click Delete on the pool in the Pools tab. The pool's nodes are cordoned and drained according to the drain policy, then destroyed. Rate limits apply, so a large pool may take a few cycles to fully drain.
While deletion is in progress the pool stays visible in the Pools tab with a Deleting status. New pods that would have scheduled here go to other pools (assuming their selectors and tolerations match).
Delete now (admin only)
Admins can bypass the drain and rate limits using Delete Now on the admin panel. This destroys all the pool's VMs immediately. Use only when the pool is already broken (for example, every node is stuck in NotReady and a graceful drain will never succeed). Pods running on the pool's nodes are killed without a grace period.
The cluster's default pool cannot be deleted by either path. Reassign the default flag first.
Troubleshooting
| Symptom | Likely cause | Fix |
|---|---|---|
| Pods stuck Pending even though max not hit | Pod's nodeSelector / tolerations don't match any pool, or the pool template wouldn't fit pod requests. | kubectl describe pod <name> shows scheduler events. Verify pool labels match nodeSelector and the worker plan has enough CPU / RAM. |
| Pool stays at min even with zero load | Working as intended. min is a floor. | Lower min if you want full reclaim. Set min: 0 to allow scale to zero. |
| Scale-up adds nodes but pods still don't schedule | New node's labels / taints don't match pod's selectors, or pod has bigger requests than template. | Recheck pool labels / taints. Increase plan size or pick a different pool. |
| Scale-down stalls on one node | A pod with a strict PodDisruptionBudget or safe-to-evict: false is pinned there. | Relax the PDB, scale the blocking workload temporarily, or raise Drain timeout on the pool. |
| Cannot delete the default pool | Default pools are protected. | Reassign the default flag to another pool first, then delete. |
| Two pools, autoscaler always picks the same one | The default random expander makes no attempt to balance pools, so a run of picks can favor one pool. | Change the cluster autoscaler's --expander flag; see Expander strategies. |
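For the stalled scale-down row, a PodDisruptionBudget that still allows one eviction at a time usually lets a drain make progress. The sketch below assumes the three-replica redis Deployment from the earlier example; the PDB name is a placeholder.

```yaml
# Allows the drain to evict one redis pod at a time.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: redis-pdb               # hypothetical name
spec:
  maxUnavailable: 1             # with 3 replicas, 2 stay up during a drain
  selector:
    matchLabels:
      app: redis
```

By contrast, a budget that permits zero disruptions (for example, minAvailable equal to the replica count) blocks every eviction, and the node cannot drain until the budget is relaxed.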