Troubleshooting — The Exam’s Biggest Domain
20 labs covering every failure mode you will face in the exam and in production. This domain is worth 30% — more than any other. Start early, practice often.
Objectives
- Diagnose a pod stuck in Pending — no nodes, taints, resource pressure
- Debug a pod stuck in Init:0/1 — init container failing
- Fix a pod failing readiness probe — wrong port or endpoint
Key Commands
kubectl describe pod <pod> # Events section solves 80% of issues
kubectl logs <pod> --previous
kubectl logs <pod> -c <init-container>
kubectl get events --sort-by=.lastTimestamp
Every troubleshooting task: describe first, then logs. The Events section of the describe output resolves 80% of problems before you need to look elsewhere.
Objectives
- Debug a deployment rollout stuck at 0/3 ready
- Identify wrong image tag, missing ConfigMap key, missing Secret as root causes
- Use kubectl rollout status and check ReplicaSet events
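The rollout objectives above can be sketched as one helper. This is a minimal sketch, not a lab solution: the function name debug_rollout and the 30-second timeout are assumptions chosen for illustration.

```shell
# debug_rollout is a hypothetical helper, not a kubectl built-in.
# Usage: debug_rollout <deployment> [namespace]
debug_rollout() {
  deploy="$1"
  ns="${2:-default}"

  # Stop waiting after 30s instead of blocking on a stuck rollout.
  kubectl rollout status "deployment/$deploy" -n "$ns" --timeout=30s || true

  # -o wide adds the IMAGES column, which exposes a wrong image tag quickly.
  kubectl get rs -n "$ns" -o wide

  # ReplicaSet events, oldest first.
  kubectl get events -n "$ns" --field-selector involvedObject.kind=ReplicaSet \
    --sort-by=.lastTimestamp
}
```

If the ReplicaSet events look clean, run kubectl describe pod on one of its pods: a missing ConfigMap key or Secret is reported in the pod's own events.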
Objectives
- Diagnose a NotReady node — kubelet stopped, certificate expired, network plugin missing
- Fix disk pressure by identifying and clearing large log files
- Restart kubelet and verify node returns to Ready
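For the disk pressure objective above, a quick way to find what is filling the disk is to rank files by size. A minimal sketch: the helper name largest_files and the /var/log default are assumptions, so point it at whichever directory the node reports as full.

```shell
# largest_files is a hypothetical helper for spotting what fills a disk.
# Usage: largest_files [dir] [count]
largest_files() {
  dir="${1:-/var/log}"
  count="${2:-10}"

  # du -k prints each file's size in KiB; sort numerically, largest first.
  find "$dir" -type f -exec du -k {} + 2>/dev/null | sort -rn | head -n "$count"
}
```

Emptying a large log with truncate -s 0 <file> frees the space even while a process still holds the file open, which deleting the file does not.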
Key Commands
kubectl describe node node01
systemctl status kubelet
journalctl -u kubelet -n 100 --no-pager
systemctl restart kubelet
Objectives
- Diagnose a broken API server static pod manifest
- Fix a scheduler that stopped running
- Use crictl when kubectl itself is unavailable
Key Commands
ls /etc/kubernetes/manifests/ # check here first
crictl ps -a # works even when kubectl is down
crictl logs <container-id>
When kubectl does not respond, go straight to crictl. This appears on nearly every CKA exam.
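The crictl steps above can be combined so the container ID does not have to be copied by hand. A minimal sketch, run on the control plane node; the helper name cp_logs and the 50-line tail are assumptions.

```shell
# cp_logs is a hypothetical helper: show recent logs of a container by name.
# Usage: cp_logs kube-apiserver
cp_logs() {
  name="$1"

  # -a includes exited containers; -q prints only container IDs.
  id=$(crictl ps -a --name "$name" -q | head -n 1)

  if [ -z "$id" ]; then
    echo "no container matching '$name'; check /etc/kubernetes/manifests/" >&2
    return 1
  fi

  crictl logs --tail 50 "$id"
}
```

If no container exists at all, the kubelet could not create it from the manifest, so the next place to look is journalctl -u kubelet.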
Objectives
- Debug pod-to-pod connectivity failure due to CNI plugin crash
- Identify a NetworkPolicy blocking legitimate traffic
- Use kubectl exec with nc/curl to isolate where traffic breaks
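Isolating where traffic breaks comes down to repeating one probe against different targets. A minimal sketch, assuming the source pod's image ships an nc that supports -z and -w; the helper name probe_from_pod is hypothetical.

```shell
# probe_from_pod is a hypothetical helper: test TCP reachability from inside a pod.
# Usage: probe_from_pod <source-pod> <target-host-or-ip> <port> [namespace]
probe_from_pod() {
  src="$1"
  target="$2"
  port="$3"
  ns="${4:-default}"

  # -z: connect without sending data; -w 2: give up after 2 seconds.
  kubectl exec "$src" -n "$ns" -- nc -z -w 2 "$target" "$port"
}
```

Probe the target pod IP first, then the Service ClusterIP, then the Service DNS name. A failure at the pod IP points to the CNI plugin or a NetworkPolicy, a failure only at the ClusterIP points to the Service or its endpoints, and a failure only at the name points to DNS.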
Objectives
- Fix a Service with no endpoints — wrong label selector
- Debug a port mismatch between Service port and container targetPort
Key Commands
kubectl get endpoints rx-api # none = selector mismatch
kubectl describe svc rx-api
kubectl get pods --show-labels
Empty endpoints is the #1 service issue on the exam. If get endpoints shows none, the selector does not match any pod labels.
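The selector check can be done in one step by feeding the Service's selector back into a pod query. A minimal sketch: check_selector is a hypothetical helper, and it assumes the Service actually has a selector (a Service without one prints <none> in that column).

```shell
# check_selector is a hypothetical helper: list the pods a Service would select.
# Usage: check_selector <service> [namespace]
check_selector() {
  svc="$1"
  ns="${2:-default}"

  # With -o wide the last column is SELECTOR, e.g. "app=rx-api,tier=backend".
  selector=$(kubectl get svc "$svc" -n "$ns" -o wide --no-headers | awk '{print $NF}')
  echo "selector: $selector"

  # An empty result here is exactly why the endpoints list is empty.
  kubectl get pods -n "$ns" -l "$selector" --show-labels
}
```

If pods are listed but endpoints are still empty, the pods are probably not Ready, so check the readiness probe next.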
Objectives
- Identify CoreDNS crash as root cause of cluster-wide DNS failure
- Fix a broken Corefile causing CoreDNS to refuse to start
- Debug cross-namespace DNS using full FQDN
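Cross-namespace lookups need the full name in the form service.namespace.svc.cluster-domain. A minimal sketch, assuming the default cluster domain cluster.local, a source image that includes nslookup, and the k8s-app=kube-dns label that kubeadm puts on CoreDNS pods; all three function names are hypothetical.

```shell
# Build the fully qualified name of a Service.
# Usage: svc_fqdn <service> <namespace> [cluster-domain]
svc_fqdn() {
  echo "$1.$2.svc.${3:-cluster.local}"
}

# Resolve a Service in another namespace from inside a pod.
# Usage: dns_check <source-pod> <service> <service-namespace>
dns_check() {
  kubectl exec "$1" -- nslookup "$(svc_fqdn "$2" "$3")"
}

# Show CoreDNS pod status and recent logs (label assumed from kubeadm defaults).
coredns_status() {
  kubectl -n kube-system get pods -l k8s-app=kube-dns
  kubectl -n kube-system logs -l k8s-app=kube-dns --tail=20
}
```

A Corefile syntax error shows up in the CoreDNS logs at startup, and the Corefile itself lives in the coredns ConfigMap in kube-system on kubeadm clusters.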
Objectives
- Identify an OOMKilled container and increase its memory limit
- Observe CPU throttling and fix with correct limits
- Understand eviction thresholds and prevention
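An OOMKilled container records its fate in the pod status, so it can be confirmed without reading the full describe output. A minimal sketch: both helper names are hypothetical, and the new limit value is whatever your observed usage justifies.

```shell
# Print name, last termination reason and exit code for each container.
# Usage: last_termination <pod> [namespace]
last_termination() {
  pod="$1"
  ns="${2:-default}"
  kubectl get pod "$pod" -n "$ns" -o \
    jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
}

# Raise the memory limit of one container in a deployment.
# Usage: raise_memory_limit <deployment> <container> <limit> [namespace]
raise_memory_limit() {
  kubectl set resources deployment "$1" -c "$2" --limits=memory="$3" -n "${4:-default}"
}
```

A reason of OOMKilled with exit code 137 means the container exceeded its memory limit; raise_memory_limit rx-api app 512Mi would then trigger a new rollout with the higher limit.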
Objectives
- Retrieve logs from a specific container in a multi-container pod
- Stream logs and filter with grep
- Access logs from a previously crashed container
Key Commands
kubectl logs <pod> -c <container> -f
kubectl logs <pod> --previous
kubectl logs <pod> --since=10m | grep ERROR
Objectives
- Install metrics-server and verify kubectl top nodes and kubectl top pods
- Identify highest CPU and memory consuming pods in the cluster
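Once metrics-server is running, the heaviest pods can be listed with the sort flag of kubectl top. A minimal sketch; the helper name top_consumers and the default of five rows are assumptions.

```shell
# top_consumers is a hypothetical helper. It needs metrics-server;
# without it kubectl top returns an error instead of data.
# Usage: top_consumers [count]
top_consumers() {
  n="${1:-5}"

  echo "== top $n pods by CPU =="
  kubectl top pods -A --sort-by=cpu --no-headers | head -n "$n"

  echo "== top $n pods by memory =="
  kubectl top pods -A --sort-by=memory --no-headers | head -n "$n"
}
```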
Objectives
- Add an ephemeral debug container with kubectl debug
- Port-forward a service to local machine for testing
- Copy files out of a container with kubectl cp
Key Commands
kubectl debug -it <pod> --image=busybox --target=<container>
kubectl port-forward svc/rx-api 8080:80
kubectl cp <pod>:/app/logs/error.log ./error.log
Objectives
- List all Warning events cluster-wide sorted by time
- Filter events by resource type and namespace
- Interpret BackOff, FailedScheduling, FailedMount messages
Key Commands
kubectl get events -A --sort-by=.lastTimestamp | grep Warning
kubectl get events -n rx-prod --field-selector=reason=BackOff
Objectives
- Interpret exit codes: 0=success, 1=error, 137=SIGKILL (usually OOMKilled), 143=SIGTERM
- Fix a failing liveness probe — wrong path, wrong port, too aggressive timing
- Understand restartPolicy: Always, OnFailure, Never
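The exit codes in the objectives follow one rule: any code above 128 means the process was killed by signal number (code - 128). A minimal sketch of that rule; the function name explain_exit_code is hypothetical.

```shell
# explain_exit_code is a hypothetical helper that decodes a container exit code.
# Usage: explain_exit_code <code>
explain_exit_code() {
  code="$1"
  if [ "$code" -eq 0 ]; then
    echo "0: completed successfully"
  elif [ "$code" -gt 128 ]; then
    # Codes above 128 mean the process died from signal (code - 128).
    sig=$((code - 128))
    case "$sig" in
      9)  echo "$code: SIGKILL (signal 9), often the OOM killer" ;;
      15) echo "$code: SIGTERM (signal 15), graceful stop requested" ;;
      *)  echo "$code: terminated by signal $sig" ;;
    esac
  else
    echo "$code: application error"
  fi
}
```

For example, explain_exit_code 137 prints "137: SIGKILL (signal 9), often the OOM killer". A failed liveness probe usually surfaces as 143 or 137 as well, because the kubelet stops the container with SIGTERM and follows up with SIGKILL if it does not exit in time.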
Objectives
- Systematically identify the 6 most common CrashLoopBackOff causes
- Fix: wrong command, missing env var, missing Secret, wrong image, permission denied
- Override entrypoint temporarily to keep container alive for debugging
Always check kubectl logs <pod> --previous first. Crash logs from the last run are the fastest path to root cause.
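One way to keep a crashing container alive is to copy the pod and replace its command. A minimal sketch using kubectl debug with --copy-to; the helper name debug_copy and the one-hour sleep are assumptions, and it presumes the image contains a sleep binary.

```shell
# debug_copy is a hypothetical helper: clone a pod with its entrypoint replaced.
# Usage: debug_copy <pod> <container> [namespace]
debug_copy() {
  pod="$1"
  ctr="$2"
  ns="${3:-default}"

  # Creates "<pod>-debug": same spec, but <container> runs sleep
  # instead of its normal entrypoint, so it stays up for inspection.
  kubectl debug "$pod" -n "$ns" --copy-to="${pod}-debug" --container="$ctr" -- sleep 3600
}
```

After that, kubectl exec -it <pod>-debug -- sh gives a shell in which the original command can be run by hand to see why it fails. Delete the copy when finished.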
Objectives
- Fix a pod failing with ImagePullBackOff due to wrong image tag
- Create a docker-registry Secret and add it as imagePullSecrets
- Distinguish ErrImagePull from ImagePullBackOff
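Creating the registry Secret is a single command; attaching it can be done per pod or per service account. A minimal sketch that takes the service account route, which is an alternative to editing imagePullSecrets in each pod spec; the helper name add_pull_secret and all argument values are hypothetical.

```shell
# add_pull_secret is a hypothetical helper.
# Usage: add_pull_secret <secret> <registry> <user> <password> [namespace]
add_pull_secret() {
  ns="${5:-default}"

  # Create the docker-registry Secret holding the credentials.
  kubectl create secret docker-registry "$1" \
    --docker-server="$2" --docker-username="$3" --docker-password="$4" \
    -n "$ns"

  # Attach it to the default service account so new pods in the
  # namespace pull with these credentials.
  kubectl patch serviceaccount default -n "$ns" \
    -p "{\"imagePullSecrets\": [{\"name\": \"$1\"}]}"
}
```

On the two statuses: ErrImagePull is the pull attempt failing, and ImagePullBackOff is the kubelet waiting before it retries. The cause is the same in both cases, and the pod events name it.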
Objectives
- Check certificate expiry dates for all cluster certificates
- Renew all certificates with kubeadm certs renew all
- Debug TLS handshake failures using openssl s_client
Key Commands
kubeadm certs check-expiration
kubeadm certs renew all
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -dates
Objectives
- Diagnose kubelet failure from journalctl output
- Fix a wrong --config path in the kubelet systemd unit
- Resolve cgroup driver mismatch between containerd and kubelet
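A cgroup driver mismatch is easiest to see with both settings side by side. A minimal sketch: the helper name cgroup_drivers is hypothetical, and the default paths are the usual kubeadm and containerd locations, which may differ on your node.

```shell
# cgroup_drivers is a hypothetical helper: print both cgroup driver settings.
# Usage: cgroup_drivers [kubelet-config] [containerd-config]
cgroup_drivers() {
  kubelet_cfg="${1:-/var/lib/kubelet/config.yaml}"
  containerd_cfg="${2:-/etc/containerd/config.toml}"

  echo "kubelet:    $(grep -i 'cgroupDriver' "$kubelet_cfg")"
  echo "containerd: $(grep -i 'SystemdCgroup' "$containerd_cfg")"
}
```

The two agree when the kubelet shows cgroupDriver: systemd and containerd shows SystemdCgroup = true. After changing either file, restart the matching service (containerd or kubelet) and check journalctl -u kubelet again.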
Objectives
- Simulate etcd quorum loss and restore from snapshot
- Fix a scheduler that stopped assigning pods
- Identify a broken API server flag preventing admission
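The snapshot and restore steps can be sketched as two helpers. This assumes a kubeadm cluster with etcd as a static pod: the endpoint and certificate paths below are the kubeadm defaults, not values taken from this lab, and both function names are hypothetical.

```shell
# Save a snapshot of the running etcd (paths assume kubeadm defaults).
# Usage: etcd_snapshot_save <snapshot-file>
etcd_snapshot_save() {
  ETCDCTL_API=3 etcdctl snapshot save "$1" \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key
}

# Restore a snapshot into a fresh data directory.
# Usage: etcd_snapshot_restore <snapshot-file> <new-data-dir>
etcd_snapshot_restore() {
  ETCDCTL_API=3 etcdctl snapshot restore "$1" --data-dir="$2"
}
```

The restore only writes files. etcd starts using them once the data directory hostPath in /etc/kubernetes/manifests/etcd.yaml points at the new directory and the kubelet recreates the static pod.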
Objectives
- Identify over-provisioned and under-provisioned pods using metrics
- Set correct requests based on actual observed usage
- Use LimitRange to enforce sensible namespace defaults
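A LimitRange gives every container in a namespace default requests and limits when its spec omits them. A minimal sketch that prints a manifest: the helper name, the object name default-limits and all four resource values are illustrative assumptions, to be replaced with numbers from observed usage.

```shell
# limitrange_manifest is a hypothetical helper that prints a LimitRange.
# Usage: limitrange_manifest <namespace> | kubectl apply -f -
limitrange_manifest() {
  cat <<EOF
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: $1
spec:
  limits:
  - type: Container
    defaultRequest:
      cpu: 100m
      memory: 128Mi
    default:
      cpu: 500m
      memory: 256Mi
EOF
}
```

defaultRequest fills in missing requests and default fills in missing limits. Existing pods are not changed; the defaults apply to pods created after the LimitRange exists.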
What this lab contains
- 20 tasks covering all 5 domains at exam weight proportions
- Strict 2-hour limit — no solutions mid-way
- Cluster pre-broken in multiple ways — find and fix each issue
- Scoring: 74%+ = exam ready
Complete this lab twice before booking. First attempt reveals gaps. Second confirms readiness. Score 74%+ under time pressure and you are ready.
Your CKA registration includes 2 free killer.sh simulator sessions. Use them in Weeks 9 and 10. killer.sh is intentionally harder than the real exam.