This guide is part of the Kubernetes Guide — a complete topic cluster covering Kubernetes concepts, operations, and production debugging.
Introduction
Storage is where Kubernetes gets significantly more complex than running stateless workloads. A pod that crashes and restarts loses everything written to its container filesystem. A pod that gets rescheduled to a different node loses access to any local disk. Kubernetes storage solves these problems — but it introduces a layered abstraction model that trips up many engineers when they first encounter it.
Understanding Kubernetes storage means understanding three distinct problems it solves:
Ephemeral storage — sharing data between containers in the same pod, or storing temporary data that survives container restarts but not pod deletion.
Persistent storage — data that must survive pod deletion, rescheduling, and cluster upgrades. Databases, message queues, file stores.
Storage provisioning — how storage is requested, created, and bound to workloads in a way that works across cloud providers and on-premises environments.
This guide covers the complete storage model: Volumes, PersistentVolumes, PersistentVolumeClaims, StorageClasses, dynamic provisioning, StatefulSet storage, and the production patterns that keep stateful workloads reliable.
1. The Container Filesystem Problem
A container’s filesystem is ephemeral by design. When a container restarts — due to a crash, a liveness probe failure, or an OOMKill — its filesystem resets to the state of the original image. Anything written after container start is lost.

DIAGRAM: Container filesystem ephemerality problem — timeline diagram. Left column shows Container lifecycle: Start → Write data to /app/data → Crash → Restart → /app/data empty. Right column shows the solution: Volume mounted at /app/data persists across container restarts. Container restarts but volume data survives.
This is not a bug — it is intentional. Containers are designed to be stateless and replaceable. Storage is a separate concern handled through Kubernetes Volumes.
2. Volumes — Ephemeral Shared Storage
A Kubernetes Volume is a directory accessible to containers in a pod. Unlike a container filesystem, a Volume’s lifetime is tied to the pod — not the container. It survives container restarts within the pod but is deleted when the pod is deleted.
Volumes are declared in the pod spec and mounted into containers at specified paths.

DIAGRAM: Kubernetes Volume architecture inside a Pod — show a Pod boundary. Inside: two containers (App Container and Log Shipper Sidecar). Both containers connect to a single ’emptyDir Volume’ cylinder in the center. App Container mounts it at /app/data, Log Shipper mounts it at /logs. Show data flowing from App Container writing logs to /app/data, Log Shipper reading from /logs. Volume labeled ‘survives container restarts, deleted with pod’.
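The sidecar pattern in the diagram above can be sketched as a pod manifest. This is illustrative only — the container names, images, and log path are assumptions, not a reference implementation:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-log-shipper
spec:
  containers:
  - name: app
    image: my-app:1.0          # hypothetical application image
    volumeMounts:
    - name: shared-logs
      mountPath: /app/data     # the app writes logs here
  - name: log-shipper
    image: busybox:1.36
    command: ["sh", "-c", "tail -F /logs/app.log"]
    volumeMounts:
    - name: shared-logs
      mountPath: /logs         # same volume, different mount path
  volumes:
  - name: shared-logs
    emptyDir: {}               # shared for the pod's lifetime
```

Both containers see the same directory; the volume outlives either container's restarts but is removed with the pod.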
Common Volume Types
emptyDir — starts empty when the pod starts. Data is shared between all containers in the pod. Deleted when the pod is removed. Stored on the node’s local disk (or in memory if medium: Memory is set).
volumes:
- name: cache
  emptyDir:
    medium: Memory    # store in RAM — faster but counts against memory limits
    sizeLimit: 512Mi
configMap — mounts a ConfigMap as files in the container. When the ConfigMap is updated, the mounted files are eventually updated too (within ~1 minute, depending on kubelet sync period).
volumes:
- name: app-config
  configMap:
    name: api-config
    items:
    - key: config.yaml
      path: config.yaml
secret — mounts a Secret as files. Stored in tmpfs (memory) on the node — not written to disk.
volumes:
- name: tls-certs
  secret:
    secretName: api-tls
    defaultMode: 0400   # read-only for owner only
hostPath — mounts a directory from the node’s filesystem into the pod. Rarely appropriate for application workloads — mainly used for system-level pods (log collectors, monitoring agents) that need access to node-level files.
projected — combines multiple volume sources (configMap, secret, serviceAccountToken, downwardAPI) into a single directory.
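A projected volume might look like the following sketch, reusing the app-config ConfigMap and api-tls Secret names from the earlier examples (the token path and expiry are illustrative):

```yaml
volumes:
- name: combined
  projected:
    sources:
    - configMap:
        name: app-config
    - secret:
        name: api-tls
    - serviceAccountToken:
        path: token                  # token file inside the mount
        expirationSeconds: 3600      # short-lived, auto-rotated by kubelet
```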
3. PersistentVolumes — Durable Storage
Volumes are tied to pod lifetime. For data that must survive pod deletion — databases, message queues, uploaded files — Kubernetes uses a different abstraction: PersistentVolumes (PV) and PersistentVolumeClaims (PVC).
┌─────────────────────────────────────────────────────────┐
│                     Storage Model                       │
│                                                         │
│   Pod ──references──► PVC ──bound to──► PV              │
│                        │                │               │
│                 (what you want)  (what exists)          │
│                 request:         actual:                │
│                 100Gi RWO        100Gi Azure Disk       │
│                 premium-ssd      Premium_LRS            │
└─────────────────────────────────────────────────────────┘

DIAGRAM: PV/PVC/Pod relationship — Shows how a Pod uses persistent storage via a PVC. Left: Pod references a PVC by name (my-pvc). Center: PVC requests storage (100 Gi, ReadWriteOnce, StorageClass: premium-ssd). Right: PV provides actual storage (Azure Disk, Premium_LRS, 100 Gi). Arrows illustrate the flow: Pod → PVC (reference), PVC → PV (bound 1:1), PV → Cloud Storage (provisioned by CSI driver).
PersistentVolume (PV)
A PV is a piece of storage in the cluster. It can be provisioned statically by an administrator (creating a PV manifest that references an existing disk) or dynamically by a StorageClass provisioner.
# Manually provisioned PV (static provisioning — rarely needed with dynamic provisioning)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-legacy-database
spec:
  capacity:
    storage: 200Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain   # keep the disk after PVC deletion
  storageClassName: premium-ssd
  azureDisk:
    diskName: legacy-db-disk
    diskURI: /subscriptions/.../disks/legacy-db-disk
PersistentVolumeClaim (PVC)
A PVC is a request for storage. Your pod references a PVC — it does not reference a PV directly. This separation means the pod spec is portable: the same pod manifest works in any cluster as long as a StorageClass can satisfy the PVC request.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
  namespace: production
spec:
  accessModes:
  - ReadWriteOnce                 # one node can mount read/write at a time
  resources:
    requests:
      storage: 100Gi
  storageClassName: premium-ssd   # which StorageClass to use
# Pod referencing the PVC
spec:
  volumes:
  - name: db-storage
    persistentVolumeClaim:
      claimName: postgres-data    # reference the PVC by name
  containers:
  - name: postgres
    image: postgres:15
    volumeMounts:
    - name: db-storage
      mountPath: /var/lib/postgresql/data
Access Modes
Access modes define how many nodes can mount the volume and with what permissions:
| Access Mode | Short | Description | Common Use |
|---|---|---|---|
| ReadWriteOnce | RWO | One node, read/write | Databases, single-instance apps |
| ReadOnlyMany | ROX | Many nodes, read only | Shared config files, static assets |
| ReadWriteMany | RWX | Many nodes, read/write | Shared file systems (NFS, Azure Files) |
| ReadWriteOncePod | RWOP | One pod only, read/write | Strict single-pod access guarantee |
Critical production point: Azure Disk and AWS EBS are RWO only — they can only be attached to one node at a time. If your workload needs multiple pods to write to the same volume simultaneously, you need Azure Files (NFS) or AWS EFS, which support RWX.
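A shared-access claim on AKS might look like this sketch, assuming the azurefile-csi StorageClass shown later in this guide (the claim name and size are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-uploads
spec:
  accessModes:
  - ReadWriteMany                   # multiple nodes can mount read/write
  storageClassName: azurefile-csi   # Azure Files supports RWX; Azure Disk does not
  resources:
    requests:
      storage: 50Gi
```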
4. StorageClasses and Dynamic Provisioning
Manually creating PVs for every workload is not scalable. StorageClasses enable dynamic provisioning — when a PVC is created, the StorageClass provisioner automatically creates the underlying storage and a PV to match.
PVC created
    │
    ▼
StorageClass provisioner receives request
(disk.csi.azure.com / ebs.csi.aws.com / pd.csi.storage.gke.io)
    │
    ▼
Provisioner calls cloud API
(Create Azure Disk / Create EBS Volume / Create Persistent Disk)
    │
    ▼
PV created and bound to PVC automatically
    │
    ▼
Pod can mount the volume

DIAGRAM: Dynamic provisioning flow — sequential flow diagram. Step 1: Developer creates PVC manifest. Step 2: StorageClass provisioner (disk.csi.azure.com) receives request. Step 3: Provisioner calls Azure API. Step 4: Cloud disk created. Step 5: PV automatically created and bound to PVC. Step 6: PVC status changes to Bound. Step 7: Pod mounts the volume. Each step numbered with arrows between them.
StorageClass Configuration
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: premium-ssd
provisioner: disk.csi.azure.com            # CSI driver that creates the storage
parameters:
  skuName: Premium_LRS                     # Azure Disk SKU
  cachingMode: ReadOnly                    # disk caching for read-heavy workloads
reclaimPolicy: Retain                      # keep disk after PVC deletion (safe for production)
allowVolumeExpansion: true                 # allow resizing PVCs without recreation
volumeBindingMode: WaitForFirstConsumer    # wait for pod scheduling before provisioning
volumeBindingMode
Immediate — PV is provisioned as soon as the PVC is created, before any pod is scheduled. Risk: the disk may be provisioned in a different zone than where the pod eventually schedules, causing a mount failure.
WaitForFirstConsumer — PV is not provisioned until a pod that uses the PVC is scheduled to a node. The provisioner creates the disk in the same zone as the pod’s node. This is the recommended setting for zone-aware storage like Azure Disk and AWS EBS.
Default StorageClasses in AKS
kubectl get storageclass
# NAME                    PROVISIONER          RECLAIMPOLICY   VOLUMEBINDINGMODE
# default (default)       disk.csi.azure.com   Delete          WaitForFirstConsumer
# managed-csi             disk.csi.azure.com   Delete          WaitForFirstConsumer
# managed-csi-premium     disk.csi.azure.com   Delete          WaitForFirstConsumer
# azurefile-csi           file.csi.azure.com   Delete          Immediate
# azurefile-csi-premium   file.csi.azure.com   Delete          Immediate
Azure Disk (managed-csi, managed-csi-premium) — block storage, RWO only, high performance for databases. Azure Files (azurefile-csi, azurefile-csi-premium) — SMB/NFS file share, RWX supported, for shared access scenarios.
5. Reclaim Policies
What happens to the underlying storage when a PVC is deleted?
| Policy | Behavior | When to use |
|---|---|---|
| Delete | PV and underlying disk are deleted | Dev/test environments, ephemeral data |
| Retain | PV and disk remain, PV goes to Released state | Production databases — protect against accidental deletion |
| Recycle | (Deprecated) Basic scrub and reuse | Not recommended |
Production rule: Use reclaimPolicy: Retain for any StorageClass that handles production database volumes. With Delete, a mistaken kubectl delete pvc permanently destroys your data. With Retain, the disk remains and you can recover by creating a new PV that references it.
# After a PVC is deleted with Retain policy, PV status becomes Released
kubectl get pv
# NAME          STATUS     CLAIM
# pv-postgres   Released   production/postgres-data   ← disk still exists
# To reuse: remove the claimRef so a new PVC can bind to it
kubectl patch pv pv-postgres -p '{"spec":{"claimRef": null}}'
# PV status changes to Available
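To re-bind deliberately rather than relying on the scheduler's normal matching, a new PVC can target the specific PV by name via spec.volumeName. A sketch, assuming the pv-postgres PV from above (the request must not exceed the PV's capacity, and storageClassName must match the PV's):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data-restored
  namespace: production
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: premium-ssd   # must match the PV's storageClassName
  volumeName: pv-postgres         # bind to this specific PV
  resources:
    requests:
      storage: 100Gi
```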
6. StatefulSet Storage — volumeClaimTemplates
StatefulSets manage stateful applications where each pod has a unique, stable identity and its own persistent storage. Kafka, PostgreSQL, Elasticsearch, and ZooKeeper are typical StatefulSet workloads.
StatefulSets use volumeClaimTemplates to automatically create a dedicated PVC for each pod replica:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
  namespace: messaging
spec:
  serviceName: kafka-headless
  replicas: 3
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka               # must match the selector above
    spec:
      containers:
      - name: kafka
        image: confluentinc/cp-kafka:7.5.0
        volumeMounts:
        - name: data
          mountPath: /var/lib/kafka/data
  volumeClaimTemplates:          # one PVC per pod, created automatically
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: premium-ssd
      resources:
        requests:
          storage: 100Gi
This creates three PVCs automatically:
kubectl get pvc -n messaging
# NAME           STATUS   VOLUME        CAPACITY   STORAGECLASS
# data-kafka-0   Bound    pvc-abc-001   100Gi      premium-ssd
# data-kafka-1   Bound    pvc-abc-002   100Gi      premium-ssd
# data-kafka-2   Bound    pvc-abc-003   100Gi      premium-ssd

DIAGRAM: StatefulSet volumeClaimTemplates — show a StatefulSet with 3 replicas. kafka-0 bound to data-kafka-0 PVC (100Gi). kafka-1 bound to data-kafka-1 PVC (100Gi). kafka-2 bound to data-kafka-2 PVC (100Gi). Each PVC bound to its own PV and underlying cloud disk. Show that scaling down does NOT delete PVCs — orphaned PVCs persist.
Important StatefulSet storage behaviors:
- Scaling down does not delete PVCs. When you scale from 3 to 1 replicas, data-kafka-1 and data-kafka-2 still exist. This is intentional — Kubernetes protects StatefulSet data.
- Scaling back up reuses existing PVCs. Scaling from 1 back to 3 reattaches data-kafka-1 and data-kafka-2 to the new pods.
- Deleting the StatefulSet does not delete PVCs by default. Clean up PVCs manually after confirming data is no longer needed.
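If you do want PVCs cleaned up automatically, newer Kubernetes releases support a per-StatefulSet retention policy (the StatefulSetAutoDeletePVC feature, beta since v1.27). A sketch:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Delete   # delete PVCs when the StatefulSet is deleted
    whenScaled: Retain    # keep PVCs on scale-down (the default behavior)
  # ... rest of the spec as above
```

Use whenDeleted: Delete only where losing the data with the workload is acceptable; the default Retain/Retain preserves the protective behavior described above.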
7. Volume Expansion
PVC storage can be expanded without recreating the pod — as long as the StorageClass has allowVolumeExpansion: true.
# Expand a PVC from 100Gi to 200Gi
kubectl patch pvc postgres-data -n production \
-p '{"spec":{"resources":{"requests":{"storage":"200Gi"}}}}'
# Check expansion status
kubectl describe pvc postgres-data -n production
# Conditions:
# FileSystemResizePending: True ← disk resized, filesystem resize pending
# The filesystem resize happens automatically when the pod restarts
# For online resize (no pod restart needed), the CSI driver must support it
You cannot shrink a PVC. Kubernetes does not support reducing PVC size. If you need a smaller volume, create a new PVC and migrate the data.
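One common migration pattern is a one-off Job that mounts both claims and copies the data across. A sketch: the claim names are illustrative, both PVCs must be attachable to the same node (fine for two RWO claims mounted by one pod), and the workload should be stopped during the copy:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: migrate-data
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: copy
        image: busybox:1.36
        # -a preserves ownership, permissions, and timestamps
        command: ["sh", "-c", "cp -a /old/. /new/"]
        volumeMounts:
        - name: old
          mountPath: /old
        - name: new
          mountPath: /new
      volumes:
      - name: old
        persistentVolumeClaim:
          claimName: postgres-data          # the oversized claim
      - name: new
        persistentVolumeClaim:
          claimName: postgres-data-small    # the new, smaller claim
```

Once the Job completes, point the workload at the new claim and delete the old one (after verifying the data).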
8. Real Production Example — StorageClass Deleted During Node Pool Upgrade
Scenario: After a planned AKS node pool upgrade, 3 of 5 Kafka pods enter Pending state. The other 2 are healthy. No code was changed.
kubectl get pvc -n messaging
# data-kafka-2   Pending   <none>   100Gi   premium-zrs
# data-kafka-3   Pending   <none>   100Gi   premium-zrs
# data-kafka-4   Pending   <none>   100Gi   premium-zrs
kubectl describe pvc data-kafka-2 -n messaging
# Events:
# Warning ProvisioningFailed:
# storageclass.storage.k8s.io "premium-zrs" not found
The premium-zrs StorageClass had been deleted three weeks earlier during a routine infrastructure review. Existing bound PVCs were unaffected — but when the node upgrade evicted the StatefulSet pods, they needed to provision new PVCs. The provisioner could not find the StorageClass.
# Always audit before deleting a StorageClass
kubectl get pvc --all-namespaces -o json | \
jq '.items[] | select(.spec.storageClassName=="premium-zrs") |
{namespace: .metadata.namespace, name: .metadata.name}'
# Would have shown: messaging/data-kafka-2,3,4
# Fix: recreate the StorageClass
kubectl apply -f - <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: premium-zrs
provisioner: disk.csi.azure.com
parameters:
  skuName: Premium_ZRS
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
EOF
# PVCs provision and bind within 2 minutes
kubectl get pvc -n messaging -w
# data-kafka-2 Bound pvc-new-001 100Gi premium-zrs
# Pods recover automatically
kubectl get pods -n messaging
Time to resolution: 41 minutes. Lesson: before deleting any StorageClass, run the audit command above. StatefulSets with currently bound PVCs will survive the deletion — but the breakage stays hidden until the next pod rescheduling, node replacement, or scale-up, which has a way of happening at 2 AM during an upgrade window.
9. When Things Go Wrong
PVC stuck in Pending — StorageClass does not exist, provisioner is down, or cloud quota is exhausted. Check kubectl describe pvc for events. See: Debugging Kubernetes Storage (PV/PVC)
Pod stuck in ContainerCreating with volume mount error — Multi-Attach error (RWO disk still attached to old node) or CSI driver not running. Check kubectl get volumeattachment. See: Debugging Kubernetes Storage (PV/PVC)
PVC stuck in Terminating — pvc-protection finalizer is blocking deletion. Check no pod is actively using the volume, then remove the finalizer. See: Debugging Kubernetes Storage (PV/PVC)
StatefulSet pods Pending after node pool upgrade — StorageClass was deleted or changed. Check kubectl describe pvc for ProvisioningFailed events. See: Debugging Kubernetes Storage (PV/PVC)
Data loss after PVC deletion — StorageClass had reclaimPolicy: Delete. Check if cloud disk still exists (it may if deletion was recent). Change to Retain immediately for all production StorageClasses.
Quick Reference
# Check PVC status
kubectl get pvc -n <namespace>
# Describe stuck PVC
kubectl describe pvc <pvc-name> -n <namespace>
# Check available StorageClasses
kubectl get storageclass
# Check PV status
kubectl get pv
# Check VolumeAttachments (for multi-attach errors)
kubectl get volumeattachment
# Expand a PVC
kubectl patch pvc <pvc-name> -n <namespace> \
-p '{"spec":{"resources":{"requests":{"storage":"200Gi"}}}}'
# Audit PVCs using a StorageClass before deleting it
kubectl get pvc --all-namespaces -o json | \
jq '.items[] | select(.spec.storageClassName=="<class>") |
{namespace: .metadata.namespace, name: .metadata.name}'
# Force delete stuck PVC (remove finalizer)
kubectl patch pvc <pvc-name> -n <namespace> \
-p '{"metadata":{"finalizers":[]}}' --type=merge
# Release a PV for reuse (remove old claimRef)
kubectl patch pv <pv-name> -p '{"spec":{"claimRef": null}}'
Summary
Kubernetes storage is built on three layers working together:
- Volumes — ephemeral shared storage within a pod lifetime, for container-to-container data sharing and config injection
- PersistentVolumes / PersistentVolumeClaims — durable storage that survives pod deletion, bound in a 1:1 relationship
- StorageClasses — dynamic provisioning blueprints that create cloud storage automatically when PVCs are requested
StatefulSets extend this with per-pod PVCs created from volumeClaimTemplates, giving each replica its own stable, persistent storage.
The two most important production rules: use reclaimPolicy: Retain for production databases to prevent accidental data loss, and always audit PVC references before deleting a StorageClass.