How to Debug CrashLoopBackOff in Kubernetes
CrashLoopBackOff is Kubernetes telling you a container started, died, and is being restarted on an increasing back-off delay. It's not a root cause — it's a symptom. The skill is getting from the symptom to the cause quickly, and the path is almost always the same four steps.
1. Confirm the state
kubectl get pods
Look at two columns: STATUS (CrashLoopBackOff) and RESTARTS. A high, climbing restart count means the container keeps dying seconds after it starts. Note the exact pod name — you'll need it.
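On a busy namespace it can help to filter the listing down to the pods that are actually crash-looping. The sketch below is one way to do that, assuming kubectl's default column order (NAME, READY, STATUS, RESTARTS, AGE); the function name `crashlooping_pods` is made up for illustration and is not part of kubectl.

```shell
# List pods stuck in CrashLoopBackOff with their restart counts.
# Assumes kubectl's default columns: NAME READY STATUS RESTARTS AGE.
# Extra arguments (for example -n <namespace>) are passed through.
crashlooping_pods() {
  kubectl get pods --no-headers "$@" |
    awk '$3 == "CrashLoopBackOff" { print $1, $4 }'
}
```

Calling `crashlooping_pods` with no arguments checks the current namespace; `crashlooping_pods -n <namespace>` checks another one.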
2. Read the logs
kubectl logs <pod>
If the app logged why it exited (bad config, missing env var, failed DB connection), you're done — fix that and skip to step 4. The trap: when a container dies before it can log anything, kubectl logs is empty. That's where most people get stuck.
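Before giving up on logs, it is worth checking the crashed instance rather than the freshly restarted one: by default kubectl logs reads the current container, and the `--previous` flag reads the one that just died. A minimal sketch, assuming a single-container pod; the helper name `crash_logs` and the default of 50 lines are illustrative choices.

```shell
# Show the last lines logged by the crashed container instance.
# Falls back to the current instance when no previous one exists yet.
# Usage: crash_logs <pod> [lines]
crash_logs() {
  kubectl logs "$1" --previous --tail="${2:-50}" 2>/dev/null ||
    kubectl logs "$1" --tail="${2:-50}"
}
```

For a pod with several containers, the same kubectl logs calls would also need `-c <container>`.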
3. When logs are empty, describe the pod
kubectl describe pod <pod>
Scroll to the Events section. This is the kubelet's view, and it surfaces the causes the app never got to log:
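- Liveness/readiness probe failed: the app is up but the probe is wrong, or it's slow to start. Of the two, only a liveness failure restarts the container.
- OOMKilled: the container exceeded its memory limit. This usually shows as the Reason under Last State in the container details rather than as an event. Raise the limit or fix the leak.
- ImagePullBackOff: wrong image tag or missing registry credentials. Strictly this is a separate status (the container never starts, so it can't crash), but it is diagnosed from the same Events list.
- Back-off restarting failed container: the generic 'it keeps dying' line; pair it with the more specific warning above it.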
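The termination reason and exit code can also be pulled out directly instead of scrolling through the describe output. The sketch below assumes a single-container pod (it reads index 0 of `containerStatuses`); the helper names `last_exit` and `explain_exit_code` are hypothetical, and the exit-code hints are common conventions offered as a first guess, not a diagnosis.

```shell
# Print the reason and exit code of the first container's last termination.
# Assumes a single-container pod (index 0 of containerStatuses).
last_exit() {
  kubectl get pod "$1" -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}{" "}{.status.containerStatuses[0].lastState.terminated.exitCode}{"\n"}'
}

# Translate common exit codes into a first guess at the cause.
# Codes above 128 follow the shell convention 128 + signal number.
explain_exit_code() {
  case "$1" in
    0)   echo "exited cleanly; the process may not be meant to stay running" ;;
    1)   echo "application error; check the logs" ;;
    126) echo "command found but not executable" ;;
    127) echo "command not found; check the image's command and args" ;;
    137) echo "SIGKILL (128+9); often an OOM kill" ;;
    143) echo "SIGTERM (128+15); the container was asked to stop" ;;
    *)   echo "no common meaning; check the application's own docs" ;;
  esac
}
```

Running `last_exit <pod>` prints the reason followed by the code, and the code can then be passed to `explain_exit_code` for a hint about where to look next.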
4. Fix the cause, then recover
Fix the actual problem — a bad value in a ConfigMap, a too-tight memory limit, a wrong probe path. Then make the pods pick it up:
kubectl rollout restart deployment/<name>
For a single wedged pod that just needs rescheduling, kubectl delete pod <name> lets the deployment create a fresh one. Don't try to repair a managed pod in place — they're cattle, not pets.
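A restart is only finished once the new pods are healthy, so it is reasonable to follow it with kubectl rollout status, which waits for the rollout to complete. A minimal sketch, assuming the workload is a Deployment; the function name `restart_and_wait` and the 120s default timeout are illustrative choices.

```shell
# Restart a deployment and block until the rollout finishes or times out.
# Usage: restart_and_wait <deployment> [timeout]
restart_and_wait() {
  kubectl rollout restart "deployment/$1" &&
    kubectl rollout status "deployment/$1" --timeout="${2:-120s}"
}
```

A non-zero exit status means the new pods did not become ready within the timeout, which usually means the underlying cause is still there.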
The one-line method
get pods → logs → (if empty) describe → fix the cause → rollout restart → verify.
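The read-only half of that method can be chained into a single helper. This is only a sketch: the name `triage` is made up, it assumes a single-container pod, and it deliberately stops before the fix and restart, which still need a human decision.

```shell
# Read-only triage for one pod: status, then logs, then describe
# only when the logs come back empty. Fixing and restarting stay manual.
triage() {
  kubectl get pod "$1" || return 1
  logs="$(kubectl logs "$1" --tail=50 2>/dev/null)"
  if [ -n "$logs" ]; then
    printf '%s\n' "$logs"
  else
    kubectl describe pod "$1"
  fi
}
```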
Common mistakes
- Staring at empty logs. If the container dies instantly, the answer is in describe, not logs.
- Restarting blindly. A restart doesn't fix a bad config or an OOM; it just resets the loop.
- Confusing readiness with liveness. A failing readiness probe keeps a pod out of service; a failing liveness probe kills and restarts it.
Practise it for real
Reading is one thing; doing it under pressure is another. The Kubernetes arc in Terminal Trials drops you into a simulated cluster where you triage a CrashLoopBackOff pod end to end — get pods, describe, and rollout restart — in a safe sandbox. It's free and runs in your browser.