This guide is part of the Production Kubernetes Debugging Handbook — a complete reference for debugging production Kubernetes clusters.
What is CrashLoopBackOff?
If you have worked with Kubernetes for more than a week, you have seen it. A pod that should be running instead shows this:
```
NAME                       READY   STATUS             RESTARTS   AGE
payment-svc-7d9f6b-xk2p9   0/1     CrashLoopBackOff   6          12m
```
CrashLoopBackOff is not an error in itself — it is Kubernetes telling you that your container keeps starting and immediately crashing, and that it has applied a backoff delay before trying again.
The “BackOff” part is key. Kubernetes uses an exponential backoff strategy between each restart attempt:
| Restart Attempt | Wait Before Next Restart |
|---|---|
| 1st | 10 seconds |
| 2nd | 20 seconds |
| 3rd | 40 seconds |
| 4th | 80 seconds |
| 5th | 160 seconds |
| 6th+ | Capped at 300 seconds (5 minutes) |
This is why CrashLoopBackOff pods can sit broken for a long time without anyone noticing — restart attempts slow down dramatically, so the pod looks “stable” in a broken state.
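The sequence in the table can be reproduced with a few lines of shell. This is an illustrative sketch of the default kubelet policy (initial 10-second delay, doubled after each crash, capped at 300 seconds), not output from any Kubernetes tooling:

```shell
# Sketch of the default kubelet restart backoff: start at 10s,
# double after each crash, cap at 300s (5 minutes).
delay=10
for attempt in 1 2 3 4 5 6; do
  echo "after crash ${attempt}: wait ${delay}s"
  delay=$(( delay * 2 ))
  if [ "$delay" -gt 300 ]; then delay=300; fi
done
```

Note that a successful run of 10 minutes resets the backoff, which is why a pod that crashes only occasionally never reaches the 5-minute cap.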
Why CrashLoopBackOff Happens
CrashLoopBackOff is a symptom, not a root cause. The actual problem lives inside the container or in its configuration. Here are the most common reasons, roughly in order of frequency in production:
1. **Application error on startup.** The most common cause. Your application starts, hits an unhandled exception, and exits with a non-zero code. This could be a missing config file, a failed database connection, or a bug in initialization code.
2. **Missing or incorrect environment variables.** The application expects DATABASE_URL but the pod spec references a Secret that does not exist, or references the wrong key name. The app fails the moment it reads the missing variable.
3. **Incorrect container entrypoint or command.** The CMD or ENTRYPOINT in the Dockerfile — or the command/args in the pod spec — points to a binary that does not exist in the image, or passes arguments the binary does not accept.
4. **OOMKilled on startup.** The memory limit is too low for the application to even initialize. The kernel's OOM killer terminates the container before it finishes starting. This is especially common with Java applications that allocate a large JVM heap during initialization.
5. **Liveness probe failing too early.** The liveness probe starts checking before the application has finished starting up. Kubernetes kills the container as "unhealthy" before it has had a chance to become healthy. This is a configuration problem, not an application problem.
6. **Init container failure.** If one of a pod's init containers fails, the main container never starts. The pod keeps restarting the init container, which surfaces as Init:CrashLoopBackOff in the pod status.
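Cause 2 is easy to guard against in the container's own entrypoint. A hedged sketch of a fail-fast check follows; the variable name DATABASE_URL is just an example, not anything Kubernetes itself requires:

```shell
# Fail fast with a clear message if a required variable is missing,
# instead of crashing somewhere deeper in startup.
status=0
( unset DATABASE_URL; : "${DATABASE_URL:?DATABASE_URL is not set}" ) 2>/dev/null || status=$?
echo "startup check exit status: $status"
```

With the variable unset, the check exits non-zero immediately, so the container dies with the reason right at the top of its logs instead of a stack trace from deep inside the app.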
How to Diagnose CrashLoopBackOff — Step by Step
Step 1 — Confirm the Status and Restart Count
```bash
kubectl get pods -n <namespace>
```
Look at two things: the STATUS column and the RESTARTS column. A high restart count with a recent AGE tells you the pod is crashing fast and often.
Step 2 — Check the Exit Code
The exit code tells you how the container died. This is the fastest way to narrow down the cause.
```bash
kubectl describe pod <pod-name> -n <namespace>
```
Look for the Last State section in the output:
```
Last State:   Terminated
  Reason:     Error
  Exit Code:  1
  Started:    Sun, 08 Mar 2026 02:00:00 +0000
  Finished:   Sun, 08 Mar 2026 02:00:02 +0000
```
What exit codes mean:
| Exit Code | Meaning | Where to Look Next |
|---|---|---|
| 1 | Application error | kubectl logs --previous |
| 137 | OOMKilled — killed by kernel | Increase memory limit |
| 139 | Segmentation fault | Application or library bug |
| 143 | SIGTERM received | Check preStop hooks and liveness probe |
| 127 | Command or entrypoint not found | Check image CMD or pod spec command |
A container that exits in under 2 seconds with exit code 1 almost always means the application failed on startup — go straight to logs.
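The arithmetic behind these codes is simple: anything above 128 means the process was killed by a signal, and the signal number is the exit code minus 128. A small helper captures it; this is a sketch for illustration, not part of kubectl:

```shell
# Decode a container exit code: values above 128 mean the process was
# killed by signal (code - 128); 127 means the command was not found.
explain_exit() {
  code=$1
  if [ "$code" -gt 128 ]; then
    echo "killed by signal $(( code - 128 ))"
  elif [ "$code" -eq 127 ]; then
    echo "command not found"
  else
    echo "application exited with code $code"
  fi
}

explain_exit 137   # killed by signal 9 (SIGKILL, what the OOM killer sends)
explain_exit 143   # killed by signal 15 (SIGTERM)
```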
Step 3 — Read the Previous Container Logs
This is the most important step. The logs from the crashed container tell you exactly what went wrong.
```bash
# Logs from the previously crashed container
kubectl logs <pod-name> -n <namespace> --previous
```
Without --previous, you get logs from the current container run — which may be empty if the container just started and crashed before logging anything.
```bash
# For pods with multiple containers
kubectl logs <pod-name> -n <namespace> -c <container-name> --previous

# Tail the last 50 lines only
kubectl logs <pod-name> -n <namespace> --previous --tail=50
```
Step 4 — Read Pod Events
Events give you context that logs cannot — things Kubernetes observed about the pod from the outside.
```bash
kubectl describe pod <pod-name> -n <namespace>
```
Scroll to the bottom and read the Events section:
```
Events:
  Type     Reason   Age                From     Message
  ----     ------   ----               ----     -------
  Normal   Created  13m (x4 over 13m)  kubelet  Created container payment-svc
  Normal   Started  13m (x4 over 13m)  kubelet  Started container payment-svc
  Warning  BackOff  2m (x8 over 12m)   kubelet  Back-off restarting failed container
```
The (x4 over 13m) annotation shows the container has been created and started 4 times in 13 minutes. The Back-off restarting failed container warning confirms CrashLoopBackOff.
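If you want to pull that repeat counter out programmatically (to feed an alert, say), a small sed expression works. The event line below is a hypothetical example, not live cluster output:

```shell
# Extract the repeat counter from a kubelet event annotation like "(x4 over 13m)".
line='Normal  Started  13m (x4 over 13m)  kubelet  Started container payment-svc'
count=$(printf '%s\n' "$line" | sed -n 's/.*(x\([0-9]*\) over.*/\1/p')
echo "restarts observed: $count"
```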
Step 5 — Check Referenced Secrets and ConfigMaps
A very common production cause: the pod references a Secret or ConfigMap that does not exist in the same namespace.
```bash
# Check what the pod references
kubectl get pod <pod-name> -n <namespace> -o yaml | \
  grep -E "secretKeyRef|configMapKeyRef|secretName|configMapRef"

# Verify each one exists in the correct namespace
kubectl get secret <secret-name> -n <namespace>
kubectl get configmap <configmap-name> -n <namespace>
```
If a referenced Secret does not exist, the pod will fail immediately with an event like:
```
Warning  Failed  Error: secret "db-credentials-v2" not found
```
Debugging Init Container Failures
If your pod has init containers, check them separately. Init container failures show a slightly different status:
```bash
kubectl get pod <pod-name> -n <namespace>
# STATUS: Init:CrashLoopBackOff or Init:Error

# Get init container logs
kubectl logs <pod-name> -n <namespace> -c <init-container-name> --previous

# Describe shows init container state separately
kubectl describe pod <pod-name> -n <namespace>
# Look for the Init Containers section near the top
```
Common init container failure causes:
- Database migration script fails — DB not reachable yet, or wrong credentials
- Wait-for script times out waiting for a dependency service
- File permission setup fails due to wrong user context
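A typical "wait-for" init container script can be sketched like this. It uses bash's /dev/tcp feature and a hard retry cap so a missing dependency fails fast instead of hanging; the host, port, and retry count are examples, not values from the source:

```shell
# Retry a TCP connect until it succeeds or the retry budget is spent.
# Returning non-zero makes the init container (and so the pod) fail
# visibly instead of waiting forever.
wait_for() {
  host=$1; port=$2; retries=$3
  i=0
  until (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; do
    i=$(( i + 1 ))
    if [ "$i" -ge "$retries" ]; then
      echo "gave up waiting for ${host}:${port}" >&2
      return 1
    fi
    sleep 1
  done
  echo "${host}:${port} is reachable"
}
```

Pair a script like this with a generous `activeDeadlineSeconds` or a sane retry budget so the failure shows up in events quickly.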
Fixing the Most Common CrashLoopBackOff Causes
Fix 1 — Missing Environment Variable or Secret
```bash
# Find what the app is complaining about
kubectl logs <pod-name> --previous | grep -i "env\|variable\|not found\|undefined"

# Check what keys the secret actually has
kubectl get secret <secret-name> -n <namespace> -o jsonpath='{.data}' | python3 -m json.tool

# Re-create the secret with the correct keys
kubectl create secret generic db-credentials \
  --from-literal=password=mypassword \
  --from-literal=username=myuser \
  -n <namespace>
```
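Keep in mind that the values under `.data` are base64-encoded. Decoding one by hand looks like this; the encoded string below is just the example value `mypassword`, not anything from a real cluster:

```shell
# Secret data from `kubectl get secret -o jsonpath='{.data}'` is
# base64-encoded; decode a single value like this.
encoded="bXlwYXNzd29yZA=="
decoded=$(printf '%s' "$encoded" | base64 -d)
echo "$decoded"   # mypassword
```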
Fix 2 — OOMKilled (Exit Code 137)
```bash
# Confirm OOMKilled
kubectl describe pod <pod-name> | grep -A3 "Last State"
# Reason: OOMKilled

# Check current memory limit
kubectl get pod <pod-name> -o yaml | grep -A5 resources

# Patch the deployment with a higher memory limit
kubectl patch deployment <deployment-name> -n <namespace> --type=json \
  -p='[{"op":"replace","path":"/spec/template/spec/containers/0/resources/limits/memory","value":"512Mi"}]'
```
For Java applications specifically: the JVM allocates heap aggressively on startup. If your container memory limit is 256Mi but the JVM tries to allocate 512Mi of heap, the container is OOMKilled before the application even starts.
```yaml
# Set JVM max heap below your container memory limit
env:
  - name: JAVA_OPTS
    value: "-Xms128m -Xmx384m"          # For a 512Mi container limit

  # Or use JVM container awareness (Java 11+)
  - name: JAVA_OPTS
    value: "-XX:MaxRAMPercentage=75.0"  # Use 75% of container memory for heap
```
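The 75% figure translates into an absolute heap cap like this; a quick arithmetic sketch for a 512Mi limit:

```shell
# MaxRAMPercentage=75 on a 512Mi container limit yields a 384Mi max heap,
# leaving roughly 128Mi of headroom for metaspace, thread stacks, and
# other native JVM memory.
limit_mib=512
pct=75
heap_mib=$(( limit_mib * pct / 100 ))
echo "max heap: ${heap_mib}Mi"   # max heap: 384Mi
```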
Fix 3 — Liveness Probe Killing the Container Too Early
```bash
# Check current liveness probe config
kubectl get pod <pod-name> -o yaml | grep -A10 livenessProbe
```
If initialDelaySeconds is too low, Kubernetes starts checking before the app is ready and kills it:
```yaml
# Too aggressive -- kills the app before it can start
livenessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 5    # Not enough for most apps
  periodSeconds: 10
  failureThreshold: 3

# Better -- give the app time to initialize
livenessProbe:
  httpGet:
    path: /ping             # Simple endpoint, no DB check
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 15
  failureThreshold: 3

# Best for apps with variable startup time -- use a startupProbe
startupProbe:
  httpGet:
    path: /ping
    port: 8080
  failureThreshold: 30      # 30 x 10s = up to 5 minutes to start
  periodSeconds: 10
```
Important: Use a `startupProbe` for applications that have long or variable startup times — Java apps, apps that run DB migrations on boot, or anything that loads large datasets into memory. The startup probe disables the liveness probe until it passes, preventing premature kills.
Fix 4 — Wrong Container Entrypoint (Exit Code 127)
```bash
# Check what the image actually defines as its entrypoint
docker inspect <image-name>:<tag> | jq '.[0].Config.Entrypoint'
docker inspect <image-name>:<tag> | jq '.[0].Config.Cmd'

# Check what your pod spec is overriding it with
kubectl get pod <pod-name> -o yaml | grep -A5 -E "command:|args:"

# Debug by running the image interactively
docker run -it --entrypoint /bin/sh <image-name>:<tag>
# Then manually run the command to see what error you get
```
Real Production Example — CrashLoopBackOff After Namespace Migration
The situation: A team migrated a payment microservice from the staging namespace to production. Within minutes of the rollout, all pods entered CrashLoopBackOff with restart counts climbing fast.
Diagnosis:
```bash
kubectl logs payment-svc-7d9f6b-xk2p9 -n production --previous
# Error: failed to connect to database: connection refused
# dial tcp 10.0.1.45:5432: connect: connection refused

kubectl describe pod payment-svc-7d9f6b-xk2p9 -n production | grep -A5 "Environment"
# DB_PASSWORD: <set to the key 'password' in secret 'db-credentials'>

kubectl get secret db-credentials -n production
# Error from server (NotFound): secrets "db-credentials" not found
```
Root cause: The Secret db-credentials existed in staging but had never been created in production. The migration checklist covered the Deployment and Service manifests — but not the Secrets they depended on.
Fix:
```bash
# Copy the secret from staging to production
kubectl get secret db-credentials -n staging -o yaml | \
  sed 's/namespace: staging/namespace: production/' | \
  kubectl apply -f -

# Restart the deployment
kubectl rollout restart deployment/payment-svc -n production

# Verify pods recover
kubectl get pods -n production -w
# All 5 pods Running within 45 seconds
```
Prevention: Add a pre-deployment validation step to your CI pipeline that checks all referenced Secrets and ConfigMaps exist in the target namespace before applying manifests.
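A minimal sketch of such a check, assuming manifests reference Secrets via `secretKeyRef`. The manifest snippet and names here are hypothetical, and the kubectl step is shown commented out because it needs a live cluster:

```shell
# Extract every Secret name referenced via secretKeyRef in a manifest,
# then (in CI) verify each one exists in the target namespace.
manifest='
        env:
          - name: DB_PASSWORD
            valueFrom:
              secretKeyRef:
                name: db-credentials
                key: password
'
secrets=$(printf '%s\n' "$manifest" | grep -A1 'secretKeyRef:' \
  | sed -n 's/.*name: *\(.*\)/\1/p' | sort -u)
echo "referenced secrets: $secrets"

# for s in $secrets; do
#   kubectl get secret "$s" -n production >/dev/null || { echo "missing: $s"; exit 1; }
# done
```

A parser-based tool (yq, or a small script over `kubectl apply --dry-run=server`) is more robust than grep for real pipelines; the sketch just shows the shape of the check.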
CrashLoopBackOff Debugging Cheatsheet
```bash
# 1. Check status and restart count
kubectl get pods -n <namespace>

# 2. Get exit code from last crash
kubectl describe pod <pod-name> -n <namespace> | grep -A5 "Last State"

# 3. Read logs from crashed container
kubectl logs <pod-name> -n <namespace> --previous

# 4. Read pod events
kubectl describe pod <pod-name> -n <namespace>

# 5. Check secrets and configmaps exist in namespace
kubectl get secret <name> -n <namespace>
kubectl get configmap <name> -n <namespace>

# 6. Check init container logs separately
kubectl logs <pod-name> -c <init-container-name> --previous -n <namespace>

# 7. Force a fresh restart after fixing
kubectl rollout restart deployment/<deployment-name> -n <namespace>
```
Summary
CrashLoopBackOff always has a root cause, usually inside the container itself and occasionally in its configuration. The debugging process is always the same: check the exit code first, then read the previous logs, then read events, then check configuration.
The five most common fixes in production:
- Create the missing Secret or ConfigMap in the correct namespace
- Increase the memory limit for OOMKilled containers
- Add `initialDelaySeconds` or a `startupProbe` to the liveness probe config
- Fix environment variable references pointing to non-existent Secret keys
- Correct the container entrypoint or command in the pod spec
Next in the series: How to Fix Kubernetes Node NotReady