This guide is part of the Kubernetes Guide — a complete topic cluster covering Kubernetes concepts, operations, and production debugging.
Introduction
Storage is where Kubernetes gets significantly more complex than running stateless workloads. A pod that crashes and restarts loses everything written to its container filesystem. A pod that gets rescheduled to a different node loses access to any local disk. Kubernetes storage solves these problems — but it introduces a layered abstraction model that trips up many engineers when they first encounter it.
Understanding Kubernetes storage means understanding three distinct problems it solves:
Ephemeral storage — sharing data between containers in the same pod, or storing temporary data that survives container restarts but not pod deletion.
Persistent storage — data that must survive pod deletion, rescheduling, and cluster upgrades. Databases, message queues, file stores.
Storage provisioning — how storage is requested, created, and bound to workloads in a way that works across cloud providers and on-premises environments.
This guide covers the complete storage model: Volumes, PersistentVolumes, PersistentVolumeClaims, StorageClasses, dynamic provisioning, StatefulSet storage, and the production patterns that keep stateful workloads reliable.
1. The Container Filesystem Problem
A container’s filesystem is ephemeral by design. When a container restarts — due to a crash, a liveness probe failure, or an OOMKill — its filesystem resets to the state of the original image. Anything written after container start is lost.

DIAGRAM: Container filesystem ephemerality problem — timeline diagram. Left column shows Container lifecycle: Start → Write data to /app/data → Crash → Restart → /app/data empty. Right column shows the solution: Volume mounted at /app/data persists across container restarts. Container restarts but volume data survives.
This is not a bug — it is intentional. Containers are designed to be stateless and replaceable. Storage is a separate concern handled through Kubernetes Volumes.
2. Volumes — Ephemeral Shared Storage
A Kubernetes Volume is a directory accessible to containers in a pod. Unlike a container filesystem, a Volume’s lifetime is tied to the pod — not the container. It survives container restarts within the pod but is deleted when the pod is deleted.
Volumes are declared in the pod spec and mounted into containers at specified paths.

DIAGRAM: Kubernetes Volume architecture inside a Pod — show a Pod boundary. Inside: two containers (App Container and Log Shipper Sidecar). Both containers connect to a single ’emptyDir Volume’ cylinder in the center. App Container mounts it at /app/data, Log Shipper mounts it at /logs. Show data flowing from App Container writing logs to /app/data, Log Shipper reading from /logs. Volume labeled ‘survives container restarts, deleted with pod’.
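The sidecar pattern in the diagram above can be sketched as a pod manifest. This is illustrative only — the container names, images, and log path are assumptions, not a reference implementation:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-log-shipper
spec:
  containers:
  - name: app
    image: my-app:1.0          # hypothetical application image
    volumeMounts:
    - name: shared-logs
      mountPath: /app/data     # the app writes logs here
  - name: log-shipper
    image: busybox:1.36
    command: ["sh", "-c", "tail -F /logs/app.log"]
    volumeMounts:
    - name: shared-logs
      mountPath: /logs         # same volume, different mount path
  volumes:
  - name: shared-logs
    emptyDir: {}               # shared for the pod's lifetime
```

Both containers see the same directory; the volume outlives either container's restarts but is removed with the pod.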
Common Volume Types
emptyDir — starts empty when the pod starts. Data is shared between all containers in the pod. Deleted when the pod is removed. Stored on the node’s local disk (or in memory if medium: Memory is set).
volumes:
- name: cache
  emptyDir:
    medium: Memory    # store in RAM — faster but counts against memory limits
    sizeLimit: 512Mi
configMap — mounts a ConfigMap as files in the container. When the ConfigMap is updated, the mounted files are eventually updated too (within ~1 minute, depending on kubelet sync period).
volumes:
- name: app-config
  configMap:
    name: api-config
    items:
    - key: config.yaml
      path: config.yaml
secret — mounts a Secret as files. Stored in tmpfs (memory) on the node — not written to disk.
volumes:
- name: tls-certs
  secret:
    secretName: api-tls
    defaultMode: 0400   # read-only for owner only
hostPath — mounts a directory from the node’s filesystem into the pod. Rarely appropriate for application workloads — mainly used for system-level pods (log collectors, monitoring agents) that need access to node-level files.
projected — combines multiple volume sources (configMap, secret, serviceAccountToken, downwardAPI) into a single directory.
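A projected volume might look like the following sketch, reusing the app-config ConfigMap and api-tls Secret names from the earlier examples (the token path and expiry are illustrative):

```yaml
volumes:
- name: combined
  projected:
    sources:
    - configMap:
        name: app-config
    - secret:
        name: api-tls
    - serviceAccountToken:
        path: token                  # token file inside the mount
        expirationSeconds: 3600      # short-lived, auto-rotated by kubelet
```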
3. PersistentVolumes — Durable Storage
Volumes are tied to pod lifetime. For data that must survive pod deletion — databases, message queues, uploaded files — Kubernetes uses a different abstraction: PersistentVolumes (PV) and PersistentVolumeClaims (PVC).
┌─────────────────────────────────────────────────────────┐
│                     Storage Model                       │
│                                                         │
│   Pod ──references──► PVC ──bound to──► PV              │
│                        │                │               │
│                 (what you want)  (what exists)          │
│                 request:         actual:                │
│                 100Gi RWO        100Gi Azure Disk       │
│                 premium-ssd      Premium_LRS            │
└─────────────────────────────────────────────────────────┘

DIAGRAM: PV/PVC/Pod relationship — Shows how a Pod uses persistent storage via a PVC. Left: Pod references a PVC by name (my-pvc). Center: PVC requests storage (100 Gi, ReadWriteOnce, StorageClass: premium-ssd). Right: PV provides actual storage (Azure Disk, Premium_LRS, 100 Gi). Arrows illustrate the flow: Pod → PVC (reference), PVC → PV (bound 1:1), PV → Cloud Storage (provisioned by CSI driver).
PersistentVolume (PV)
A PV is a piece of storage in the cluster. It can be provisioned statically by an administrator (creating a PV manifest that references an existing disk) or dynamically by a StorageClass provisioner.
# Manually provisioned PV (static provisioning — rarely needed with dynamic provisioning)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-legacy-database
spec:
  capacity:
    storage: 200Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain   # keep the disk after PVC deletion
  storageClassName: premium-ssd
  azureDisk:
    diskName: legacy-db-disk
    diskURI: /subscriptions/.../disks/legacy-db-disk
PersistentVolumeClaim (PVC)
A PVC is a request for storage. Your pod references a PVC — it does not reference a PV directly. This separation means the pod spec is portable: the same pod manifest works in any cluster as long as a StorageClass can satisfy the PVC request.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
  namespace: production
spec:
  accessModes:
  - ReadWriteOnce                 # one node can mount read/write at a time
  resources:
    requests:
      storage: 100Gi
  storageClassName: premium-ssd   # which StorageClass to use
# Pod referencing the PVC
spec:
  volumes:
  - name: db-storage
    persistentVolumeClaim:
      claimName: postgres-data    # reference the PVC by name
  containers:
  - name: postgres
    image: postgres:15
    volumeMounts:
    - name: db-storage
      mountPath: /var/lib/postgresql/data
Access Modes
Access modes define how many nodes can mount the volume and with what permissions:
| Access Mode | Short | Description | Common Use |
|---|---|---|---|
| ReadWriteOnce | RWO | One node, read/write | Databases, single-instance apps |
| ReadOnlyMany | ROX | Many nodes, read only | Shared config files, static assets |
| ReadWriteMany | RWX | Many nodes, read/write | Shared file systems (NFS, Azure Files) |
| ReadWriteOncePod | RWOP | One pod only, read/write | Strict single-pod access guarantee |
Critical production point: Azure Disk and AWS EBS are RWO only — they can only be attached to one node at a time. If your workload needs multiple pods to write to the same volume simultaneously, you need Azure Files (NFS) or AWS EFS, which support RWX.
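A shared-access claim on AKS might look like this sketch, assuming the azurefile-csi StorageClass shown later in this guide (the claim name and size are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-uploads
spec:
  accessModes:
  - ReadWriteMany                   # multiple nodes can mount read/write
  storageClassName: azurefile-csi   # Azure Files supports RWX; Azure Disk does not
  resources:
    requests:
      storage: 50Gi
```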
4. StorageClasses and Dynamic Provisioning
Manually creating PVs for every workload is not scalable. StorageClasses enable dynamic provisioning — when a PVC is created, the StorageClass provisioner automatically creates the underlying storage and a PV to match.
PVC created
    │
    ▼
StorageClass provisioner receives request
(disk.csi.azure.com / ebs.csi.aws.com / pd.csi.storage.gke.io)
    │
    ▼
Provisioner calls cloud API
(Create Azure Disk / Create EBS Volume / Create Persistent Disk)
    │
    ▼
PV created and bound to PVC automatically
    │
    ▼
Pod can mount the volume

DIAGRAM: Dynamic provisioning flow — sequential flow diagram. Step 1: Developer creates PVC manifest. Step 2: StorageClass provisioner (disk.csi.azure.com) receives request. Step 3: Provisioner calls Azure API. Step 4: Cloud disk created. Step 5: PV automatically created and bound to PVC. Step 6: PVC status changes to Bound. Step 7: Pod mounts the volume. Each step numbered with arrows between them.
StorageClass Configuration
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: premium-ssd
provisioner: disk.csi.azure.com            # CSI driver that creates the storage
parameters:
  skuName: Premium_LRS                     # Azure Disk SKU
  cachingMode: ReadOnly                    # disk caching for read-heavy workloads
reclaimPolicy: Retain                      # keep disk after PVC deletion (safe for production)
allowVolumeExpansion: true                 # allow resizing PVCs without recreation
volumeBindingMode: WaitForFirstConsumer    # wait for pod scheduling before provisioning
volumeBindingMode
Immediate — PV is provisioned as soon as the PVC is created, before any pod is scheduled. Risk: the disk may be provisioned in a different zone than where the pod eventually schedules, causing a mount failure.
WaitForFirstConsumer — PV is not provisioned until a pod that uses the PVC is scheduled to a node. The provisioner creates the disk in the same zone as the pod’s node. This is the recommended setting for zone-aware storage like Azure Disk and AWS EBS.
Default StorageClasses in AKS
kubectl get storageclass
# NAME                    PROVISIONER          RECLAIMPOLICY   VOLUMEBINDINGMODE
# default (default)       disk.csi.azure.com   Delete          WaitForFirstConsumer
# managed-csi             disk.csi.azure.com   Delete          WaitForFirstConsumer
# managed-csi-premium     disk.csi.azure.com   Delete          WaitForFirstConsumer
# azurefile-csi           file.csi.azure.com   Delete          Immediate
# azurefile-csi-premium   file.csi.azure.com   Delete          Immediate
Azure Disk (managed-csi, managed-csi-premium) — block storage, RWO only, high performance for databases. Azure Files (azurefile-csi, azurefile-csi-premium) — SMB/NFS file share, RWX supported, for shared access scenarios.
5. Reclaim Policies
What happens to the underlying storage when a PVC is deleted?
| Policy | Behavior | When to use |
|---|---|---|
| Delete | PV and underlying disk are deleted | Dev/test environments, ephemeral data |
| Retain | PV and disk remain, PV goes to Released state | Production databases — protect against accidental deletion |
| Recycle | (Deprecated) Basic scrub and reuse | Not recommended |
Production rule: Use reclaimPolicy: Retain for any StorageClass that handles production database volumes. With Delete, a mistaken kubectl delete pvc permanently destroys your data. With Retain, the disk remains and you can recover by creating a new PV that references it.
# After a PVC is deleted with Retain policy, PV status becomes Released
kubectl get pv
# NAME          STATUS     CLAIM
# pv-postgres   Released   production/postgres-data   ← disk still exists
# To reuse: remove the claimRef so a new PVC can bind to it
kubectl patch pv pv-postgres -p '{"spec":{"claimRef": null}}'
# PV status changes to Available
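To re-bind deliberately rather than relying on the scheduler's normal matching, a new PVC can target the specific PV by name via spec.volumeName. A sketch, assuming the pv-postgres PV from above (the request must not exceed the PV's capacity, and storageClassName must match the PV's):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data-restored
  namespace: production
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: premium-ssd   # must match the PV's storageClassName
  volumeName: pv-postgres         # bind to this specific PV
  resources:
    requests:
      storage: 100Gi
```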
6. StatefulSet Storage — volumeClaimTemplates
StatefulSets manage stateful applications where each pod has a unique, stable identity and its own persistent storage. Kafka, PostgreSQL, Elasticsearch, and ZooKeeper are typical StatefulSet workloads.
StatefulSets use volumeClaimTemplates to automatically create a dedicated PVC for each pod replica:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
  namespace: messaging
spec:
  serviceName: kafka-headless
  replicas: 3
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka               # must match the selector above
    spec:
      containers:
      - name: kafka
        image: confluentinc/cp-kafka:7.5.0
        volumeMounts:
        - name: data
          mountPath: /var/lib/kafka/data
  volumeClaimTemplates:          # one PVC per pod, created automatically
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: premium-ssd
      resources:
        requests:
          storage: 100Gi
This creates three PVCs automatically:
kubectl get pvc -n messaging
# NAME           STATUS   VOLUME        CAPACITY   STORAGECLASS
# data-kafka-0   Bound    pvc-abc-001   100Gi      premium-ssd
# data-kafka-1   Bound    pvc-abc-002   100Gi      premium-ssd
# data-kafka-2   Bound    pvc-abc-003   100Gi      premium-ssd

DIAGRAM: StatefulSet volumeClaimTemplates — show a StatefulSet with 3 replicas. kafka-0 bound to data-kafka-0 PVC (100Gi). kafka-1 bound to data-kafka-1 PVC (100Gi). kafka-2 bound to data-kafka-2 PVC (100Gi). Each PVC bound to its own PV and underlying cloud disk. Show that scaling down does NOT delete PVCs — orphaned PVCs persist.
Important StatefulSet storage behaviors:
- Scaling down does not delete PVCs. When you scale from 3 to 1 replicas, data-kafka-1 and data-kafka-2 still exist. This is intentional — Kubernetes protects StatefulSet data.
- Scaling back up reuses existing PVCs. Scaling from 1 back to 3 reattaches data-kafka-1 and data-kafka-2 to the new pods.
- Deleting the StatefulSet does not delete PVCs by default. Clean up PVCs manually after confirming data is no longer needed.
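If you do want PVCs cleaned up automatically, newer Kubernetes releases support a per-StatefulSet retention policy (the StatefulSetAutoDeletePVC feature, beta since v1.27). A sketch:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
spec:
  persistentVolumeClaimRetentionPolicy:
    whenDeleted: Delete   # delete PVCs when the StatefulSet is deleted
    whenScaled: Retain    # keep PVCs on scale-down (the default behavior)
  # ... rest of the spec as above
```

Use whenDeleted: Delete only where losing the data with the workload is acceptable; the default Retain/Retain preserves the protective behavior described above.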
7. Volume Expansion
PVC storage can be expanded without recreating the pod — as long as the StorageClass has allowVolumeExpansion: true.
# Expand a PVC from 100Gi to 200Gi
kubectl patch pvc postgres-data -n production \
-p '{"spec":{"resources":{"requests":{"storage":"200Gi"}}}}'
# Check expansion status
kubectl describe pvc postgres-data -n production
# Conditions:
# FileSystemResizePending: True ← disk resized, filesystem resize pending
# The filesystem resize happens automatically when the pod restarts
# For online resize (no pod restart needed), the CSI driver must support it
You cannot shrink a PVC. Kubernetes does not support reducing PVC size. If you need a smaller volume, create a new PVC and migrate the data.
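One common migration pattern is a one-off Job that mounts both claims and copies the data across. A sketch: the claim names are illustrative, both PVCs must be attachable to the same node (fine for two RWO claims mounted by one pod), and the workload should be stopped during the copy:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: migrate-data
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: copy
        image: busybox:1.36
        # -a preserves ownership, permissions, and timestamps
        command: ["sh", "-c", "cp -a /old/. /new/"]
        volumeMounts:
        - name: old
          mountPath: /old
        - name: new
          mountPath: /new
      volumes:
      - name: old
        persistentVolumeClaim:
          claimName: postgres-data          # the oversized claim
      - name: new
        persistentVolumeClaim:
          claimName: postgres-data-small    # the new, smaller claim
```

Once the Job completes, point the workload at the new claim and delete the old one (after verifying the data).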
8. Real Production Example — StorageClass Deleted During Node Pool Upgrade
Scenario: After a planned AKS node pool upgrade, 3 of 5 Kafka pods enter Pending state. The other 2 are healthy. No code was changed.
kubectl get pvc -n messaging
# data-kafka-2   Pending   <none>   100Gi   premium-zrs
# data-kafka-3   Pending   <none>   100Gi   premium-zrs
# data-kafka-4   Pending   <none>   100Gi   premium-zrs
kubectl describe pvc data-kafka-2 -n messaging
# Events:
# Warning ProvisioningFailed:
# storageclass.storage.k8s.io "premium-zrs" not found
The premium-zrs StorageClass had been deleted three weeks earlier during a routine infrastructure review. Existing bound PVCs were unaffected — but when the node upgrade evicted the StatefulSet pods, they needed to provision new PVCs. The provisioner could not find the StorageClass.
# Always audit before deleting a StorageClass
kubectl get pvc --all-namespaces -o json | \
jq '.items[] | select(.spec.storageClassName=="premium-zrs") |
{namespace: .metadata.namespace, name: .metadata.name}'
# Would have shown: messaging/data-kafka-2,3,4
# Fix: recreate the StorageClass
kubectl apply -f - <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: premium-zrs
provisioner: disk.csi.azure.com
parameters:
  skuName: Premium_ZRS
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
EOF
# PVCs provision and bind within 2 minutes
kubectl get pvc -n messaging -w
# data-kafka-2 Bound pvc-new-001 100Gi premium-zrs
# Pods recover automatically
kubectl get pods -n messaging
Time to resolution: 41 minutes. Lesson: before deleting any StorageClass, run the audit command above. StatefulSets with currently bound PVCs will survive the deletion — but the breakage stays hidden until the next pod rescheduling, node replacement, or scale-up, which has a way of happening at 2 AM during an upgrade window.
9. When Things Go Wrong
PVC stuck in Pending — StorageClass does not exist, provisioner is down, or cloud quota is exhausted. Check kubectl describe pvc for events. See: Debugging Kubernetes Storage (PV/PVC)
Pod stuck in ContainerCreating with volume mount error — Multi-Attach error (RWO disk still attached to old node) or CSI driver not running. Check kubectl get volumeattachment. See: Debugging Kubernetes Storage (PV/PVC)
PVC stuck in Terminating — pvc-protection finalizer is blocking deletion. Check no pod is actively using the volume, then remove the finalizer. See: Debugging Kubernetes Storage (PV/PVC)
StatefulSet pods Pending after node pool upgrade — StorageClass was deleted or changed. Check kubectl describe pvc for ProvisioningFailed events. See: Debugging Kubernetes Storage (PV/PVC)
Data loss after PVC deletion — StorageClass had reclaimPolicy: Delete. Check if cloud disk still exists (it may if deletion was recent). Change to Retain immediately for all production StorageClasses.
Quick Reference
# Check PVC status
kubectl get pvc -n <namespace>
# Describe stuck PVC
kubectl describe pvc <pvc-name> -n <namespace>
# Check available StorageClasses
kubectl get storageclass
# Check PV status
kubectl get pv
# Check VolumeAttachments (for multi-attach errors)
kubectl get volumeattachment
# Expand a PVC
kubectl patch pvc <pvc-name> -n <namespace> \
-p '{"spec":{"resources":{"requests":{"storage":"200Gi"}}}}'
# Audit PVCs using a StorageClass before deleting it
kubectl get pvc --all-namespaces -o json | \
jq '.items[] | select(.spec.storageClassName=="<class>") |
{namespace: .metadata.namespace, name: .metadata.name}'
# Force delete stuck PVC (remove finalizer)
kubectl patch pvc <pvc-name> -n <namespace> \
-p '{"metadata":{"finalizers":[]}}' --type=merge
# Release a PV for reuse (remove old claimRef)
kubectl patch pv <pv-name> -p '{"spec":{"claimRef": null}}'
Summary
Kubernetes storage is built on three layers working together:
- Volumes — ephemeral shared storage within a pod lifetime, for container-to-container data sharing and config injection
- PersistentVolumes / PersistentVolumeClaims — durable storage that survives pod deletion, bound in a 1:1 relationship
- StorageClasses — dynamic provisioning blueprints that create cloud storage automatically when PVCs are requested
StatefulSets extend this with per-pod PVCs created from volumeClaimTemplates, giving each replica its own stable, persistent storage.
The two most important production rules: use reclaimPolicy: Retain for production databases to prevent accidental data loss, and always audit PVC references before deleting a StorageClass.