Kubernetes Storage & CSI

From Legacy Volumes to Modern Container Storage Interface

Lesson 1: Evolution of Kubernetes Storage

Understanding how Kubernetes storage has evolved is crucial for implementing persistent storage solutions in modern clusters.

Legacy In-Tree Volumes (Deprecated)

The original method required developers to specify vendor-specific storage details directly within the Pod manifest.

apiVersion: v1
kind: Pod
metadata:
  name: legacy-pod
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    awsElasticBlockStore:      # Vendor-specific!
      volumeID: vol-0a1b2c3d4e5f6g7h8
      fsType: ext4
Critical Problems with Legacy Volumes:
  • Manual Provisioning: All storage volumes had to be created manually on the external system before referencing them
  • Non-Portability: Applications were tightly bound to a specific cloud or storage provider
  • Migration Nightmare: Moving workloads (e.g., from AWS to Google Cloud) required wholesale manifest changes
  • No Abstraction: Developers needed detailed knowledge of underlying storage systems

The Abstraction Layer: PV/PVC/SC

To solve these issues, Kubernetes introduced three key abstractions that separate storage implementation from usage:

1. PersistentVolumeClaim (PVC)

A user-facing request for storage, defining requirements like size (e.g., 5Gi) and access mode (e.g., ReadWriteOnce). Developers don't need to know about the underlying storage system.

2. StorageClass (SC)

An administrative object that defines how a volume is created, specifying the Provisioner (the driver) and configuration parameters (like Ceph pool name or replication policy).

3. PersistentVolume (PV)

The actual volume created by the Provisioner that satisfies the PVC request. This is automatically created and bound to the PVC.

Modern Storage Workflow

Developer Creates PVC
"I need 10GB of storage"
StorageClass Defines How
"Use Ceph with 3x replication"
Provisioner Creates PV
Automatically provisions storage
PV Binds to PVC
Developer can now use the volume
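The PV produced in the last step is a regular API object you can inspect with kubectl get pv. A sketch of what a dynamically provisioned PV might look like (the name, volumeHandle, and claim namespace here are illustrative, not values from a real cluster):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-8f3d2c1a                      # auto-generated by the provisioner (illustrative)
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete   # inherited from the StorageClass
  storageClassName: ceph-rbd
  claimRef:                               # the PVC this PV is bound to
    name: my-pvc
    namespace: default
  csi:
    driver: rbd.csi.ceph.com
    volumeHandle: csi-vol-example         # backend volume ID (illustrative)
```

Developers never write this object by hand; the provisioner creates and binds it automatically.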

Example: PVC and Pod

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: ceph-rbd   # References StorageClass
---
apiVersion: v1
kind: Pod
metadata:
  name: modern-pod
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: my-pvc        # Clean abstraction!
Benefits of PV/PVC/SC Abstraction:
  • Developers request storage without knowing implementation details
  • Administrators control storage provisioning through StorageClasses
  • Applications are portable across different storage backends
  • Automatic volume provisioning eliminates manual steps
  • Clear separation of concerns between dev and ops teams

StorageClass Example

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd
provisioner: rbd.csi.ceph.com   # CSI driver
parameters:
  clusterID: ceph-cluster-1
  pool: kubernetes
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: csi-rbd-secret
  csi.storage.k8s.io/node-stage-secret-name: csi-rbd-secret
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: Immediate
Key StorageClass Fields:
  • provisioner: The CSI driver that will create volumes
  • parameters: Driver-specific configuration
  • reclaimPolicy: Delete or Retain PV when PVC is deleted
  • allowVolumeExpansion: Enable dynamic volume resizing
  • volumeBindingMode: Immediate or WaitForFirstConsumer

Lesson 2: Container Storage Interface (CSI)

The CSI standard was developed to decouple storage logic from the core Kubernetes code, providing a standardized interface for storage providers.

Why CSI Matters

CSI Advantages:
  • Independent Updates: Storage providers can update their drivers (CSI plugins) without requiring users to update their entire Kubernetes cluster version
  • Vendor Flexibility: Any storage vendor can implement CSI to integrate with Kubernetes
  • Standardization: Common interface across all storage providers
  • Advanced Features: Snapshots, cloning, resizing, and topology awareness
  • Out-of-Tree: Storage code lives outside Kubernetes core, reducing complexity

CSI Driver Components

A CSI driver typically deploys two main components as Pods in the cluster:

Controller Plugin

Deployment: Typically a Deployment or StatefulSet (1-3 replicas)

Responsibilities:

  • Creates volumes on backend storage system
  • Deletes volumes when PVCs are removed
  • Handles volume expansion/resizing
  • Creates snapshots and clones
  • Manages volume lifecycle

Example: Creates a Ceph RBD image when PVC is created

Node Plugin

Deployment: DaemonSet (runs on every worker node)

Responsibilities:

  • Mounts volumes to the node
  • Formats filesystems
  • Bind-mounts volume into Pod containers
  • Unmounts volumes when Pods are deleted
  • Performs local filesystem operations

Example: Maps Ceph RBD to node, formats as ext4, mounts to Pod
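Conceptually, the node plugin automates what an administrator would otherwise do by hand. An illustrative sketch of the equivalent manual steps for an RBD volume (device names and paths are hypothetical; the real driver uses its own staging directories under /var/lib/kubelet and is driven by kubelet via the CSI gRPC calls named in the comments):

```
# Map the RBD image to the node; it appears as a local block device
rbd map kubernetes-rbd/csi-vol-example        # e.g. becomes /dev/rbd0

# Format the device (first use only)
mkfs.ext4 /dev/rbd0

# NodeStageVolume: mount once per node into a staging directory
mount /dev/rbd0 /var/lib/kubelet/plugins/staging/csi-vol-example

# NodePublishVolume: bind-mount the staged volume into the Pod's volume path
mount --bind /var/lib/kubelet/plugins/staging/csi-vol-example \
  /var/lib/kubelet/pods/<pod-uid>/volumes/kubernetes.io~csi/data/mount
```

The stage/publish split is why many Pods on one node can share a single staged mount.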

CSI Architecture Diagram

User Creates PVC
Requests 10GB storage
Kubernetes API Server
Receives PVC request
CSI Controller Plugin
Creates volume in Ceph backend
PV Created & Bound
PersistentVolume bound to PVC
Pod Scheduled to Node
Kubelet requests volume mount
CSI Node Plugin
Mounts volume to node, then to Pod

CSI Driver Installation (Ceph Example)

This practical section demonstrates installing the Ceph CSI drivers from their Helm charts. Installation requires the Ceph cluster ID and the list of monitor addresses (IP:port).

# Add Ceph CSI Helm repository
helm repo add ceph-csi https://ceph.github.io/csi-charts
helm repo update

# Install Ceph CSI RBD driver
helm install ceph-csi-rbd ceph-csi/ceph-csi-rbd \
  --namespace ceph-csi-rbd \
  --create-namespace \
  --set csiConfig[0].clusterID=ceph-cluster-1 \
  --set csiConfig[0].monitors="{10.0.1.10:6789,10.0.1.11:6789,10.0.1.12:6789}"

# Install Ceph CSI CephFS driver
helm install ceph-csi-cephfs ceph-csi/ceph-csi-cephfs \
  --namespace ceph-csi-cephfs \
  --create-namespace \
  --set csiConfig[0].clusterID=ceph-cluster-1 \
  --set csiConfig[0].monitors="{10.0.1.10:6789,10.0.1.11:6789,10.0.1.12:6789}"
Configuration Requirements:
  • Cluster ID: Unique identifier for your Ceph cluster
  • Monitors: List of Ceph monitor IP addresses and ports
  • Secrets: Credentials for Ceph authentication (usually created separately)
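The referenced secrets must exist before the first PVC can be provisioned. A minimal sketch, assuming a cephx user named kubernetes and the userID/userKey key names expected by the ceph-csi RBD driver (verify both against your driver version's documentation):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: csi-rbd-secret
  namespace: ceph-csi-rbd
stringData:
  userID: kubernetes                                        # cephx user (illustrative)
  userKey: <key from 'ceph auth get-key client.kubernetes'> # replace with the real key
```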

Verify CSI Driver Installation

# Check CSI controller plugin
kubectl get pods -n ceph-csi-rbd -l app=ceph-csi-rbd

# Check CSI node plugin (should be one per node)
kubectl get pods -n ceph-csi-rbd -l app=csi-rbdplugin

# Check CSIDriver object
kubectl get csidriver
# Expected output:
# NAME                  ATTACHREQUIRED   PODINFOONMOUNT   MODES
# rbd.csi.ceph.com      true             false            Persistent
# cephfs.csi.ceph.com   false            false            Persistent
CSI Best Practices:
  • Use separate namespaces for different CSI drivers
  • Monitor CSI plugin health and logs
  • Keep CSI drivers updated for bug fixes and features
  • Test volume operations in non-production first
  • Configure resource limits for CSI plugins

Lesson 3: Ceph RBD - Block Storage

RBD (RADOS Block Device) provides block-level storage, typically used for exclusive, single-Pod access. It acts like a virtual hard drive attached to the Pod.

Ceph RBD Characteristics

Feature              Behavior
Access Mode          Exclusive (ReadWriteOnce): only one Pod can mount at a time
Volume Use           Primary storage for databases or applications requiring exclusive block access
Resizing Mechanism   Two-step process involving both the controller and node plugins
Performance          High performance, low latency; ideal for databases
Use Cases            PostgreSQL, MySQL, MongoDB, application state

Creating RBD StorageClass

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: ceph-cluster-1
  pool: kubernetes-rbd
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: csi-rbd-secret
  csi.storage.k8s.io/provisioner-secret-namespace: ceph-csi-rbd
  csi.storage.k8s.io/controller-expand-secret-name: csi-rbd-secret
  csi.storage.k8s.io/controller-expand-secret-namespace: ceph-csi-rbd
  csi.storage.k8s.io/node-stage-secret-name: csi-rbd-secret
  csi.storage.k8s.io/node-stage-secret-namespace: ceph-csi-rbd
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete
allowVolumeExpansion: true
mountOptions:
- discard

Using RBD Storage in a Pod

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  storageClassName: ceph-rbd
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:15
        env:
        - name: POSTGRES_PASSWORD
          value: "secretpassword"
        ports:
        - containerPort: 5432
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: postgres-pvc

RBD Volume Resizing Process

Two-Step Resizing Process:
  1. Step 1 - CSI Controller: Resizes the block device in Ceph backend
  2. Step 2 - CSI Node Plugin: Waits for volume to be actively mounted to a Pod, then performs in-place filesystem resize (e.g., xfs_growfs or resize2fs) before completion
Important: For RBD resizing, the volume must be mounted to a Pod. The filesystem resize happens automatically by the CSI driver.

Expanding an RBD Volume

# Original PVC with 20Gi
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi   # Original size
  storageClassName: ceph-rbd

# Edit PVC to expand to 50Gi
kubectl edit pvc postgres-pvc
# Change storage: 20Gi to storage: 50Gi
# Save and exit

# Check expansion status
kubectl describe pvc postgres-pvc
# Wait for conditions:
# - FileSystemResizePending -> Volume expansion in progress
# - Normal status           -> Expansion complete

# Verify inside the Pod
kubectl exec -it postgres-0 -- df -h /var/lib/postgresql/data
RBD Best Practices:
  • Use RBD for stateful applications requiring exclusive access
  • Enable volume expansion in StorageClass for future growth
  • Use appropriate filesystem (ext4 for general use, xfs for large files)
  • Monitor IOPS and latency for database workloads
  • Use snapshots for backup before major operations

Lesson 4: CephFS - Shared File Storage

CephFS (Ceph File System) offers shared file-level storage, allowing multiple Pods on different nodes to simultaneously access the volume (ReadWriteMany).

CephFS Characteristics

Feature              Behavior
Access Mode          Shared (ReadWriteMany): multiple Pods can mount simultaneously
Volume Use           Often used for legacy applications or shared data directories
Resizing Mechanism   Single-step, instantaneous; does not require a mounted volume
Performance          Good for shared access; slightly lower than RBD
Use Cases            PHP applications, shared configuration, media files, logs

Creating CephFS StorageClass

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cephfs
provisioner: cephfs.csi.ceph.com
parameters:
  clusterID: ceph-cluster-1
  fsName: kubernetes-fs
  pool: cephfs-data
  csi.storage.k8s.io/provisioner-secret-name: csi-cephfs-secret
  csi.storage.k8s.io/provisioner-secret-namespace: ceph-csi-cephfs
  csi.storage.k8s.io/controller-expand-secret-name: csi-cephfs-secret
  csi.storage.k8s.io/controller-expand-secret-namespace: ceph-csi-cephfs
  csi.storage.k8s.io/node-stage-secret-name: csi-cephfs-secret
  csi.storage.k8s.io/node-stage-secret-namespace: ceph-csi-cephfs
reclaimPolicy: Delete
allowVolumeExpansion: true

Using CephFS for Shared Storage

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-files-pvc
spec:
  accessModes:
  - ReadWriteMany   # Multiple Pods can access
  resources:
    requests:
      storage: 50Gi
  storageClassName: cephfs
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-app
spec:
  replicas: 3   # Multiple replicas share the same volume
  selector:
    matchLabels:
      app: php-app
  template:
    metadata:
      labels:
        app: php-app
    spec:
      containers:
      - name: php
        image: php:8.2-apache
        volumeMounts:
        - name: shared-files
          mountPath: /var/www/html/uploads
      volumes:
      - name: shared-files
        persistentVolumeClaim:
          claimName: shared-files-pvc
ReadWriteMany Use Case: All 3 replicas of the PHP application can read and write to the same shared volume simultaneously. Perfect for user uploads, shared configuration, or cached assets.

CephFS Volume Resizing Process

Single-Step Resizing Process:
  1. CSI Controller modifies an extended file attribute (ceph.quota.max_bytes) on the CephFS directory backing the volume
  2. The filesystem remains the same; the quota limit is immediately enforced by Ceph
  3. Resizing is instantaneous and does not require the volume to be mounted
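Under the hood this is the standard CephFS quota mechanism: from any client with the filesystem mounted, the same limit could be set manually (the mount path here is illustrative):

```
# Set a 100 GiB quota (100 * 2^30 = 107374182400 bytes) on the backing directory
setfattr -n ceph.quota.max_bytes -v 107374182400 /mnt/cephfs/kubernetes/pvc-xyz
```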

Expanding a CephFS Volume

# Original PVC with 50Gi
kubectl get pvc shared-files-pvc
# NAME               STATUS   VOLUME    CAPACITY   ACCESS MODES
# shared-files-pvc   Bound    pvc-xyz   50Gi       RWX

# Edit PVC to expand to 100Gi
kubectl patch pvc shared-files-pvc -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}'

# Expansion happens immediately (no waiting for Pod mount)
kubectl describe pvc shared-files-pvc
# Conditions:
#   Type                      Status
#   FileSystemResizePending   False

# Verify new size
kubectl get pvc shared-files-pvc
# NAME               STATUS   VOLUME    CAPACITY   ACCESS MODES
# shared-files-pvc   Bound    pvc-xyz   100Gi      RWX
Key Difference from RBD: CephFS expansion happens immediately without requiring the volume to be mounted. The quota is updated instantly, and all Pods see the new capacity right away.

CephFS Quotas

CephFS provides a mechanism to enforce size limits on shared file volumes, preventing one application from consuming all available space.

# Check quota on a CephFS mount (from inside Ceph)
ceph fs get kubernetes-fs
ceph fs status kubernetes-fs

# View quota attributes
getfattr -n ceph.quota.max_bytes /mnt/cephfs/kubernetes/pvc-xyz
# Output:
# ceph.quota.max_bytes="107374182400"   # 100GiB in bytes

RBD vs CephFS Comparison

Ceph RBD (Block)

  • Access: ReadWriteOnce only
  • Performance: Higher IOPS, lower latency
  • Resize: Two-step, requires mounted Pod
  • Use Cases: Databases, exclusive storage
  • Protocol: Block device (like /dev/sdb)
  • Snapshots: Block-level snapshots

CephFS (File)

  • Access: ReadWriteMany (shared)
  • Performance: Good for shared files
  • Resize: One-step, instant quota update
  • Use Cases: Shared files, legacy apps
  • Protocol: POSIX filesystem
  • Snapshots: Filesystem snapshots
CephFS Best Practices:
  • Use CephFS when you need ReadWriteMany access
  • Ideal for legacy applications that need shared file access
  • Monitor quota usage to prevent exhaustion
  • Use subdirectories for different application tenants
  • Consider RBD if you don't need shared access (better performance)

Lesson 5: Advanced CSI Capabilities

The Ceph CSI driver unlocks advanced storage capabilities for Kubernetes workloads beyond basic volume provisioning.

1. Volume Snapshots

Creating point-in-time copies of volumes for backup and recovery.

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: ceph-rbd-snapshot
driver: rbd.csi.ceph.com
deletionPolicy: Delete
parameters:
  clusterID: ceph-cluster-1
  csi.storage.k8s.io/snapshotter-secret-name: csi-rbd-secret
  csi.storage.k8s.io/snapshotter-secret-namespace: ceph-csi-rbd
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: postgres-snapshot-2024
spec:
  volumeSnapshotClassName: ceph-rbd-snapshot
  source:
    persistentVolumeClaimName: postgres-pvc
Snapshot Use Cases:
  • Backup before database migrations or upgrades
  • Point-in-time recovery for disaster scenarios
  • Testing with production-like data (clone from snapshot)
  • Compliance and audit requirements
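Before restoring or cloning, wait until the snapshot is ready to use. The status a bound VolumeSnapshot might report (field names come from the snapshot.storage.k8s.io/v1 API; the values shown are illustrative):

```yaml
# kubectl get volumesnapshot postgres-snapshot-2024 -o yaml (status section)
status:
  boundVolumeSnapshotContentName: snapcontent-example
  creationTime: "2024-01-15T10:30:00Z"
  readyToUse: true        # only restore/clone once this is true
  restoreSize: 20Gi
```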

2. Volume Cloning

Creating new volumes from existing volumes or snapshots.

# Clone from existing PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-clone
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  storageClassName: ceph-rbd
  dataSource:
    kind: PersistentVolumeClaim
    name: postgres-pvc   # Source PVC
---
# Clone from snapshot
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-restore
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  storageClassName: ceph-rbd
  dataSource:
    kind: VolumeSnapshot
    name: postgres-snapshot-2024
    apiGroup: snapshot.storage.k8s.io
Cloning Benefits: Clones are created using Ceph's copy-on-write mechanism, making them extremely fast and space-efficient. Only differences from the source consume additional storage.

3. Volume Expansion (Dynamic Resizing)

Allowing administrators to increase the size of volumes dynamically without downtime.

# Enable in StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd
provisioner: rbd.csi.ceph.com
allowVolumeExpansion: true   # Enable dynamic expansion
# ... other parameters ...

# Expand existing PVC
kubectl patch pvc postgres-pvc -p '{"spec":{"resources":{"requests":{"storage":"50Gi"}}}}'

# Monitor expansion progress
kubectl get pvc postgres-pvc -w
# For RBD:    wait for the mounted Pod to trigger the filesystem resize
# For CephFS: expansion is immediate
Volume Shrinking Not Supported: Kubernetes and most CSI drivers do not support reducing volume size. Always plan capacity carefully and monitor usage trends.

4. Topology Awareness

Configuring the cluster to provision storage volumes from a Ceph cluster located in the same availability zone or region as the consuming Pod.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd-topology
provisioner: rbd.csi.ceph.com
parameters:
  # ... other parameters ...
volumeBindingMode: WaitForFirstConsumer   # Enable topology awareness
allowedTopologies:
- matchLabelExpressions:
  - key: topology.kubernetes.io/zone
    values:
    - us-east-1a
    - us-east-1b
Topology Benefits:
  • Reduced Latency: Volume provisioned in the same zone as the Pod
  • Lower Costs: Avoids inter-zone data transfer charges
  • Better Performance: Local access is faster than cross-zone
  • Failure Domain Awareness: Workloads can be zone-specific
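Topology-aware provisioning depends on the standard zone labels that kubelet or the cloud provider sets on each node. A quick way to check what your nodes advertise (the label key is the well-known topology.kubernetes.io/zone):

```
kubectl get nodes -L topology.kubernetes.io/zone
```

Nodes without a zone label will not match the allowedTopologies constraint above.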

5. Volume Metrics and Monitoring

# CSI provides volume metrics via kubelet
kubectl get --raw /api/v1/nodes/worker-1/proxy/metrics/cadvisor | grep volume

# Prometheus metrics from CSI driver:
# csi_sidecar_operations_seconds       - Operation latency
# csi_sidecar_operations_total         - Total operations count
# kubelet_volume_stats_capacity_bytes  - Volume capacity
# kubelet_volume_stats_used_bytes      - Volume usage
# kubelet_volume_stats_available_bytes - Available space

# Example PVC monitoring with kubectl
kubectl exec -it postgres-0 -- df -h /var/lib/postgresql/data
# Filesystem   Size  Used  Avail  Use%  Mounted on
# /dev/rbd0    50G   12G   38G    24%   /var/lib/postgresql/data

Advanced Features Summary

Feature         RBD Support         CephFS Support    Primary Benefit
Snapshots       ✅ Yes              ✅ Yes            Backup and recovery
Cloning         ✅ Yes              ✅ Yes            Fast, space-efficient copies
Expansion       ✅ Yes (two-step)   ✅ Yes (instant)  Grow volumes on demand
Topology        ✅ Yes              ✅ Yes            Lower latency, reduced cost
Quotas          N/A                 ✅ Yes            Prevent space exhaustion
ReadWriteMany   ❌ No               ✅ Yes            Shared access across Pods
Production Recommendations:
  • Enable volume expansion in all StorageClasses for flexibility
  • Implement regular snapshot schedules for critical workloads
  • Use topology awareness to optimize performance and costs
  • Monitor volume metrics and set alerts for capacity thresholds
  • Test snapshot restore procedures regularly
  • Use RBD for databases, CephFS for shared file workloads
  • Document your storage architecture and disaster recovery plan

Final Quiz

Test your knowledge of Kubernetes Storage and CSI!

Question 1: What was the main problem with legacy in-tree volumes in Kubernetes?

a) They were too fast and efficient
b) Applications were tightly bound to specific storage providers, requiring manual provisioning
c) They only worked with cloud providers
d) They didn't support any storage types

Question 2: What is the primary benefit of CSI (Container Storage Interface)?

a) It makes storage slower but more reliable
b) Storage providers can update drivers independently without requiring Kubernetes cluster updates
c) It eliminates the need for PersistentVolumes
d) It only works with Ceph storage

Question 3: What does the CSI Controller Plugin do?

a) Mounts volumes to Pods on worker nodes
b) Manages volume lifecycle on backend storage (create, delete, resize)
c) Monitors network traffic
d) Formats Pod filesystems

Question 4: What access mode does Ceph RBD support?

a) ReadWriteMany (shared access)
b) ReadWriteOnce (exclusive access)
c) ReadOnlyMany
d) All access modes equally

Question 5: How does CephFS volume resizing work?

a) Requires Pod restart and two-step process
b) Instantly updates quota.max_bytes attribute, doesn't require mounted volume
c) Not supported in CephFS
d) Requires manual intervention on each node

Question 6: What is required for RBD volume expansion to complete?

a) Deleting the PersistentVolume
b) Volume must be actively mounted to a Pod for filesystem resize
c) Restarting the entire cluster
d) Nothing, it's always instant

Question 7: What is the primary use case for CephFS over RBD?

a) Higher performance databases
b) Shared file access across multiple Pods (ReadWriteMany)
c) Reducing storage costs
d) Block-level access

Question 8: What does volumeBindingMode: WaitForFirstConsumer enable?

a) Faster volume provisioning
b) Topology awareness - volumes provisioned in same zone as Pod
c) Automatic volume deletion
d) ReadWriteMany access mode
Quiz Complete!
All correct answers are option 'b'. Review the lessons above to understand why these are the best answers.