“This guide is part of the Production Kubernetes Debugging Handbook — a complete reference for debugging production Kubernetes clusters.”
Why Kubernetes Scheduling Fails
The Kubernetes scheduler is responsible for one job: deciding which node a pod should run on. When it cannot find a suitable node, the pod stays in Pending state indefinitely. No error message is thrown at the application level. The pod simply never starts.
Scheduling failures are particularly frustrating because they are invisible from the application’s perspective. A deployment shows 0 of 3 replicas ready. An alert fires. Engineers check pod logs — but there are no logs, because the container never started. The clue lives in the scheduler’s reasoning, and you have to know where to find it.
```bash
kubectl get pods -n production
# NAME                 READY   STATUS    RESTARTS   AGE
# api-svc-7d9f-xp2k1   0/1     Pending   0          8m
# api-svc-7d9f-mn3q2   0/1     Pending   0          8m
# api-svc-7d9f-kx9p3   0/1     Pending   0          8m
```
Three pods, all Pending, no restarts — this is a scheduling failure, not a crash.
This guide covers every major scheduling failure: insufficient resources, taints and tolerations, affinity rules, ResourceQuota limits, HPA and VPA misconfigurations, and cloud node autoscaler delays.
How the Kubernetes Scheduler Works
The scheduler filters and scores nodes in two phases:
Filtering removes nodes that cannot run the pod. A node fails filtering if it has insufficient CPU or memory, the wrong labels for a node selector, taints the pod does not tolerate, or a PVC the pod needs that cannot be bound in that zone.
Scoring ranks the remaining nodes. The scheduler prefers nodes with the best resource fit, correct zone distribution, and pod affinity alignment.
If zero nodes pass the filtering phase, the pod stays Pending. The scheduler logs its reasoning in the pod events — and that is exactly where you start.
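The filtering phase can be pictured as a simple predicate applied to every node. The sketch below is our own illustration with hypothetical node data — it is not how the real scheduler is implemented, but it shows why a pod with a large request can end up with zero candidate nodes:

```shell
# Minimal sketch of the filtering idea (hypothetical data, not the real
# scheduler): keep only nodes whose free CPU covers the pod's request.
pod_request_mcpu=500

filter_nodes() {
  # stdin: "<node-name> <free-millicores>" per line
  while read -r node free; do
    if [ "$free" -ge "$pod_request_mcpu" ]; then
      echo "$node"
    fi
  done
}

# node-a has only 200m free, so it is filtered out
printf 'node-a 200\nnode-b 900\nnode-c 1500\n' | filter_nodes
```

If this filter returns nothing, there is nothing left to score — which is exactly the `0/N nodes are available` situation.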
Step 1 — Read the Scheduler’s Reasoning
The single most important command for scheduling failures:
```bash
kubectl describe pod <pod-name> -n <namespace>
```
Go directly to the Events section at the bottom. The scheduler always explains why it could not place the pod:
```
Events:
  Type     Reason            Age  From               Message
  Warning  FailedScheduling  8m   default-scheduler  0/5 nodes are available:
                                                     2 Insufficient cpu,
                                                     2 node(s) had taint {dedicated:gpu}
                                                     that the pod did not tolerate,
                                                     1 node(s) didn't match
                                                     Pod's node affinity/selector.
```
This single message tells you exactly what is wrong: two nodes are out of CPU, two have a taint the pod cannot tolerate, and one does not match the affinity rules. You do not need to guess — the scheduler already did the analysis.
The format is always: 0/N nodes are available: <reason1>, <reason2>...
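Because the format is stable, the message is easy to break apart when triaging many pods at once. A small sketch (the message text is the example above; the parsing pipeline is our own):

```shell
# Sketch: split a FailedScheduling message into one reason per line.
msg="0/5 nodes are available: 2 Insufficient cpu, 2 node(s) had taint {dedicated:gpu} that the pod did not tolerate, 1 node(s) didn't match Pod's node affinity/selector."

# Drop the "0/5 nodes are available: " prefix, split on commas,
# trim leading spaces and the trailing period.
echo "${msg#*: }" | tr ',' '\n' | sed 's/^ *//; s/\.$//'
```

Each output line is one filtering reason, and the counts across all reasons add up to the node total in the prefix.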
Step 2 — The 5-Minute Scheduling Triage Checklist
```bash
# 1. What is blocking scheduling? (read the Events section)
kubectl describe pod <pod-name> -n <namespace>

# 2. What resources are available across nodes?
kubectl describe nodes | grep -A8 "Allocated resources"

# 3. What are nodes actually consuming right now?
kubectl top nodes

# 4. Are there taints on nodes that could block this pod?
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

# 5. Is there a ResourceQuota limiting the namespace?
kubectl describe resourcequota -n <namespace>
```
Cause 1 — Insufficient CPU or Memory
Symptom:

```
0/5 nodes are available: 5 Insufficient memory.
```

or

```
0/5 nodes are available: 3 Insufficient cpu, 2 Insufficient memory.
```
How it works: The scheduler uses requests — not limits — to determine if a node has capacity. A pod with requests.memory: 4Gi needs a node with 4Gi of unallocated memory, even if the pod only ever uses 512Mi at runtime.
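As a concrete illustration (the values are hypothetical), this is the shape of the spec the scheduler reads — only the `requests` block counts toward placement:

```yaml
# Hypothetical pod spec fragment: the scheduler reserves the requests,
# not the limits, when choosing a node.
resources:
  requests:
    cpu: "500m"      # scheduler needs a node with 500m unallocated CPU
    memory: "4Gi"    # ...and 4Gi of unallocated memory
  limits:
    cpu: "1"         # runtime ceiling only; ignored for scheduling
    memory: "6Gi"
```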
How to Diagnose
```bash
# See how much of each node is already allocated
kubectl describe nodes | grep -A6 "Allocated resources"
# Example output:
#   Allocated resources:
#     Resource  Requests     Limits
#     cpu       3800m (95%)  4200m (105%)
#     memory    6Gi (85%)    8Gi (106%)

# Check the resource requests on the pending pod
kubectl get pod <pod-name> -o yaml | grep -A10 resources

# Find the biggest resource consumers right now
kubectl top pods --all-namespaces --sort-by=cpu | head -15
kubectl top pods --all-namespaces --sort-by=memory | head -15
```
Fix Checklist
```bash
# Option 1: Reduce the pod's resource requests if they are over-provisioned.
# Compare actual usage vs requests — if usage is consistently 20% of requests, lower them.

# Option 2: Find and remove idle workloads consuming cluster capacity
kubectl get pods --all-namespaces | grep -E "Completed|Evicted"
kubectl delete pods --field-selector=status.phase=Succeeded --all-namespaces

# Option 3: Scale up the node pool (cloud environments)
# AKS:
az aks nodepool scale \
  --resource-group <rg> \
  --cluster-name <cluster> \
  --name <nodepool> \
  --node-count 6

# Option 4: Check for a namespace ResourceQuota blocking new pods
kubectl describe resourcequota -n <namespace>
```
Important: Resource `requests` determine scheduling. A pod with `requests: 4Gi` occupies 4Gi of scheduling capacity on its node even if it only uses 200Mi. Audit your requests regularly — over-provisioned requests are the most common cause of cluster resource starvation.
Cause 2 — Taints and Tolerations
Symptom:
```
0/5 nodes are available: 5 node(s) had taint {dedicated:gpu} that the pod did not tolerate.
```
Taints are applied to nodes to repel pods that do not explicitly tolerate them. Common in clusters with dedicated node pools — GPU nodes, spot/preemptible nodes, system-only nodes.
How to Diagnose
```bash
# Check all node taints
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

# Or check a specific node
kubectl describe node <node-name> | grep Taints

# Check what tolerations the pod has
kubectl get pod <pod-name> -o yaml | grep -A15 tolerations
```
Understanding taint effects:
| Effect | Behavior |
|---|---|
| NoSchedule | Pod will not be scheduled on this node unless it tolerates the taint |
| PreferNoSchedule | Scheduler tries to avoid this node but will use it if no better option exists |
| NoExecute | Existing pods without the toleration are evicted; new pods are not scheduled |
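For `NoExecute` specifically, a toleration can also bound how long the pod is allowed to stay after the taint appears, via `tolerationSeconds`. A sketch (the `dedicated=gpu` key/value is illustrative):

```yaml
# Tolerate a NoExecute taint, but only for 5 minutes after it appears;
# once tolerationSeconds elapses, the pod is evicted anyway.
tolerations:
- key: "dedicated"
  operator: "Equal"
  value: "gpu"
  effect: "NoExecute"
  tolerationSeconds: 300
```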
Fix Checklist
Add the appropriate toleration to your pod spec:

```yaml
# Example: tolerate a spot instance taint in AKS
tolerations:
- key: "kubernetes.azure.com/scalesetpriority"
  operator: "Equal"
  value: "spot"
  effect: "NoSchedule"
```

```yaml
# Example: tolerate any taint with the key "dedicated", whatever its value
tolerations:
- key: "dedicated"
  operator: "Exists"
  effect: "NoSchedule"
```
Taint a node (adding taints):
bash
# Add a taint to a node
kubectl taint nodes <node-name> dedicated=gpu:NoSchedule
# Remove a taint from a node
kubectl taint nodes <node-name> dedicated=gpu:NoSchedule-
Cause 3 — Node Affinity and Anti-Affinity
Symptom:
```
0/5 nodes are available: 5 node(s) didn't match Pod's node affinity/selector.
```
Node affinity rules tell the scheduler which nodes a pod can or prefers to run on, based on node labels.
How to Diagnose
```bash
# Check the pod's affinity rules
kubectl get pod <pod-name> -o yaml | grep -A30 affinity

# Check what labels nodes actually have
kubectl get nodes --show-labels

# Check whether any node matches the required labels
kubectl get nodes -l <key>=<value>
```
Required vs Preferred Affinity
```yaml
affinity:
  nodeAffinity:
    # HARD RULE — pod will never schedule if no node matches
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.kubernetes.io/zone
          operator: In
          values:
          - eastus-1
          - eastus-2
    # SOFT RULE — scheduler prefers but does not require
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      preference:
        matchExpressions:
        - key: node-type
          operator: In
          values:
          - high-memory
```
Pod Anti-Affinity Surprises
Pod anti-affinity prevents pods from being co-located on the same node or zone. In small clusters, strict anti-affinity makes pods permanently unschedulable.
```
0/3 nodes are available: 3 node(s) didn't match pod anti-affinity rules.
```
This means you have more replicas than nodes that satisfy the anti-affinity constraint — for example, 5 replicas with required anti-affinity on a 3-node cluster.
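The ceiling is easy to compute: with required anti-affinity on the `kubernetes.io/hostname` topology key, at most one replica fits per matching node. A tiny sketch with hypothetical numbers:

```shell
# With required anti-affinity on the hostname topology key, at most one
# replica can land on each matching node (hypothetical numbers).
desired_replicas=5
matching_nodes=3

# Schedulable replicas are capped at the node count; the rest stay Pending.
schedulable=$(( desired_replicas < matching_nodes ? desired_replicas : matching_nodes ))
pending=$(( desired_replicas - schedulable ))

echo "schedulable=$schedulable pending=$pending"
# → schedulable=3 pending=2
```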
```bash
# Check the anti-affinity config
kubectl get pod <pod-name> -o yaml | grep -A20 podAntiAffinity
```
Fix: Switch from required to preferred anti-affinity unless you have an absolute requirement that replicas never share a node:
```yaml
affinity:
  podAntiAffinity:
    # Use preferred unless you truly cannot tolerate co-location
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: myapp
        topologyKey: kubernetes.io/hostname
```
Cause 4 — ResourceQuota Blocking Pod Creation
Symptom: The cluster has available capacity but pods still cannot be created. The error appears as an event or API rejection:
```
Error from server (Forbidden): pods "api-svc-7d9f-xp2k1" is forbidden:
  exceeded quota: production-quota, requested: limits.cpu=500m,
  used: limits.cpu=7800m, limited: limits.cpu=8000m
```
How to Diagnose
```bash
kubectl describe resourcequota -n <namespace>
# Output shows used vs hard limits:
#   Resource        Used    Hard
#   --------        ----    ----
#   limits.cpu      7800m   8000m   <- almost at the limit
#   limits.memory   14Gi    16Gi
#   pods            48      50
#   requests.cpu    3900m   4000m
```
Fix Checklist
```bash
# Option 1: Increase the quota limit
kubectl edit resourcequota <quota-name> -n <namespace>

# Option 2: Find and clean up completed or failed pods
kubectl get pods -n <namespace> | grep -E "Completed|Evicted|Error"
kubectl delete pod --field-selector=status.phase=Succeeded -n <namespace>

# Option 3: Reduce limits on over-provisioned pods.
# Find pods with high limits but low actual usage:
kubectl top pods -n <namespace> --sort-by=cpu
```
Cause 5 — HPA Not Scaling — Metrics Server Down
The Horizontal Pod Autoscaler depends entirely on metrics-server. If metrics-server is down, HPA silently stops scaling — and you will not notice until a traffic spike arrives.
How to Diagnose
```bash
# Check HPA status
kubectl get hpa -n <namespace>
# TARGETS showing <unknown> means metrics are not available:
#   NAME      REFERENCE        TARGETS         MINPODS   MAXPODS   REPLICAS
#   api-hpa   Deployment/api   <unknown>/50%   3         20        3

kubectl describe hpa <hpa-name> -n <namespace>
# Conditions:
#   ScalingActive:  False
#   Reason:         FailedGetResourceMetric
#   Message:        unable to get metrics for resource cpu

# Check metrics-server
kubectl get pods -n kube-system | grep metrics-server
kubectl top nodes   # if this fails, metrics-server is broken
```
Fix Checklist
```bash
# Restart metrics-server
kubectl rollout restart deployment/metrics-server -n kube-system

# If metrics-server keeps crashing, check its logs
kubectl logs -n kube-system -l k8s-app=metrics-server

# Common fix for AKS: add the --kubelet-insecure-tls flag
kubectl patch deployment metrics-server -n kube-system --type=json \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'

# Verify the HPA starts reading metrics after the fix
kubectl get hpa -n <namespace> -w
```
Production rule: Add a Prometheus alert on `kube_horizontalpodautoscaler_status_condition{condition="ScalingActive",status="false"}`. A silent HPA failure during off-hours will not be noticed until peak traffic the next day.
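A sketch of such an alerting rule — the rule name, `for:` duration, and labels are our choices; the metric itself is exposed by kube-state-metrics:

```yaml
# Hypothetical Prometheus alerting rule: fire when an HPA has reported
# ScalingActive=false for more than 10 minutes.
groups:
- name: hpa-health
  rules:
  - alert: HPAScalingInactive
    expr: kube_horizontalpodautoscaler_status_condition{condition="ScalingActive",status="false"} == 1
    for: 10m
    labels:
      severity: critical
    annotations:
      summary: "HPA {{ $labels.horizontalpodautoscaler }} has stopped scaling"
```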
Cause 6 — Node Autoscaler Delays
In cloud environments (AKS, EKS, GKE), when all nodes are full and a new pod cannot be scheduled, the cluster autoscaler provisions a new node. This takes 2 to 5 minutes. During this window, pods sit in Pending and your application may be degraded.
How to Diagnose
```bash
# Check whether the autoscaler is working
kubectl get events -n kube-system | grep -i "scale\|autoscal"

# Check autoscaler logs
kubectl logs -n kube-system -l app=cluster-autoscaler --tail=50
# Common log messages:
#   "Scale up triggered"                  — CA decided to add a node
#   "Node group has reached maximum size" — cannot scale further; check the max node count
#   "No candidates for node removal"      — scale-down did not happen
```
Fix: Pre-Warm With Buffer Pods
To eliminate cold-start delays, keep buffer capacity using low-priority placeholder pods that get evicted when real workloads need the space:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cluster-capacity-buffer
  namespace: kube-system
spec:
  replicas: 3
  selector:
    matchLabels:
      app: cluster-capacity-buffer
  template:
    metadata:
      labels:
        app: cluster-capacity-buffer
    spec:
      priorityClassName: low-priority-buffer
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9
        resources:
          requests:
            cpu: "500m"
            memory: "1Gi"
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority-buffer
value: -10
globalDefault: false
```
When a real high-priority pod needs scheduling, it evicts the buffer pods. Those buffer pods then trigger the autoscaler to add a new node — while your application traffic is already being served.
Real Production Example — Anti-Affinity Blocking Scale-Out
Scenario: Black Friday traffic spike. HPA tries to scale the checkout service from 3 to 15 replicas. After 10 minutes, only 3 replicas are running. 12 pods are stuck in Pending.
```bash
kubectl describe pod checkout-svc-6b8d-xp2k1 -n ecommerce
# Events:
#   Warning  FailedScheduling
#   0/3 nodes are available:
#   3 node(s) didn't match pod anti-affinity rules.
```
The checkout service had required pod anti-affinity set months ago to ensure high availability. With 3 nodes and 3 existing replicas, the scheduler cannot place any new replica because every node already runs one — and the anti-affinity rule forbids it.
```bash
# Check the anti-affinity rule
kubectl get deployment checkout-svc -o yaml | grep -A15 podAntiAffinity
#   requiredDuringSchedulingIgnoredDuringExecution:
#     topologyKey: kubernetes.io/hostname
# Hard rule — one pod per node, maximum

# Fix: change to preferred anti-affinity
kubectl edit deployment checkout-svc -n ecommerce
#   Change: requiredDuringSchedulingIgnoredDuringExecution
#   To:     preferredDuringSchedulingIgnoredDuringExecution

# Pods begin scheduling immediately
kubectl get pods -n ecommerce -w
```
Time to resolution: 9 minutes. Lesson: required anti-affinity is a hard ceiling on your replica count equal to the number of matching nodes. Review all anti-affinity rules before Black Friday and any expected traffic spike. Switch to preferred unless co-location genuinely causes a correctness issue.
Quick Reference
```bash
# Why is the pod Pending? (always start here)
kubectl describe pod <pod-name> -n <namespace>

# Check node available capacity
kubectl describe nodes | grep -A6 "Allocated resources"

# Check node taints
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints

# Check node labels
kubectl get nodes --show-labels

# Check ResourceQuota
kubectl describe resourcequota -n <namespace>

# Check HPA status
kubectl get hpa -n <namespace>
kubectl describe hpa <hpa-name> -n <namespace>

# Check metrics-server
kubectl top nodes

# Check the cluster autoscaler
kubectl logs -n kube-system -l app=cluster-autoscaler --tail=30

# Clean up completed pods consuming quota
kubectl delete pod --field-selector=status.phase=Succeeded --all-namespaces
```
Summary
Scheduling failures always have a specific reason that the scheduler records in pod events. The diagnosis path is:
- Read the Events section first — `kubectl describe pod` tells you exactly why scheduling failed
- Insufficient resources — check allocated vs available with `kubectl describe nodes`
- Taints — check node taints and add tolerations to the pod spec
- Affinity rules — switch `required` to `preferred` unless co-location is a correctness issue
- ResourceQuota — check used vs hard limits, clean up completed pods
- HPA silent failure — check metrics-server health, alert on `ScalingActive=False`
- Autoscaler delay — use buffer pods to pre-warm capacity for traffic spikes