Lesson 1: Evolution of Kubernetes Storage
Understanding how Kubernetes storage has evolved is crucial for implementing persistent storage solutions in modern clusters.
Legacy In-Tree Volumes (Deprecated)
The original method required developers to specify vendor-specific storage details directly within the Pod manifest.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: legacy-pod
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    awsElasticBlockStore: # Vendor-specific!
      volumeID: vol-0a1b2c3d4e5f67890
      fsType: ext4
```

Problems with this approach:
- Manual Provisioning: All storage volumes had to be created manually on the external system before referencing them
- Non-Portability: Applications were tightly bound to a specific cloud or storage provider
- Migration Nightmare: Moving workloads (e.g., from AWS to Google Cloud) required wholesale manifest changes
- No Abstraction: Developers needed detailed knowledge of underlying storage systems
The Abstraction Layer: PV/PVC/SC
To solve these issues, Kubernetes introduced three key abstractions that separate storage implementation from usage:
1. PersistentVolumeClaim (PVC)
A user-facing request for storage, defining requirements like size (e.g., 5GB) and access mode (e.g., ReadWriteOnce). Developers don't need to know about the underlying storage system.
2. StorageClass (SC)
An administrative object that defines how a volume is created, specifying the Provisioner (the driver) and configuration parameters (like Ceph pool name or replication policy).
3. PersistentVolume (PV)
The actual volume created by the Provisioner that satisfies the PVC request. This is automatically created and bound to the PVC.
Modern Storage Workflow
1. Developer creates a PVC: "I need 10GB of storage"
2. Administrator's StorageClass defines the how: "Use Ceph with 3x replication"
3. The provisioner automatically creates a PV that satisfies the claim
4. The Pod mounts the bound volume, and the developer can now use it
Example: PVC and Pod
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: ceph-rbd # References StorageClass
---
apiVersion: v1
kind: Pod
metadata:
  name: modern-pod
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: my-pvc # Clean abstraction!
```

Benefits of this model:
- Developers request storage without knowing implementation details
- Administrators control storage provisioning through StorageClasses
- Applications are portable across different storage backends
- Automatic volume provisioning eliminates manual steps
- Clear separation of concerns between dev and ops teams
StorageClass Example
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd
provisioner: rbd.csi.ceph.com # CSI driver
parameters:
  clusterID: ceph-cluster-1
  pool: kubernetes
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: csi-rbd-secret
  csi.storage.k8s.io/node-stage-secret-name: csi-rbd-secret
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: Immediate
```

Key fields:
- provisioner: The CSI driver that will create volumes
- parameters: Driver-specific configuration
- reclaimPolicy: Delete or Retain PV when PVC is deleted
- allowVolumeExpansion: Enable dynamic volume resizing
- volumeBindingMode: Immediate or WaitForFirstConsumer
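To illustrate how these fields combine differently, here is a hedged sketch of a variant StorageClass an administrator might define for critical data, pairing Retain with WaitForFirstConsumer (the name `ceph-rbd-retain` and parameter values are illustrative, not from the lessons above):

```yaml
# Illustrative variant: keep the backing volume after PVC deletion and
# delay binding until a consuming Pod is scheduled.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd-retain
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: ceph-cluster-1
  pool: kubernetes
reclaimPolicy: Retain                    # PV survives PVC deletion for manual recovery
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer  # bind where the Pod actually lands
```

With Retain, deleting the PVC leaves the PV (and the Ceph image behind it) intact until an administrator cleans it up explicitly.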
Lesson 2: Container Storage Interface (CSI)
The CSI standard was developed to decouple storage logic from the core Kubernetes code, providing a standardized interface for storage providers.
Why CSI Matters
- Independent Updates: Storage providers can update their drivers (CSI plugins) without requiring users to update their entire Kubernetes cluster version
- Vendor Flexibility: Any storage vendor can implement CSI to integrate with Kubernetes
- Standardization: Common interface across all storage providers
- Advanced Features: Snapshots, cloning, resizing, and topology awareness
- Out-of-Tree: Storage code lives outside Kubernetes core, reducing complexity
CSI Driver Components
A CSI driver typically deploys two main components as Pods in the cluster:
Controller Plugin
Deployment: Typically a Deployment or StatefulSet (1-3 replicas)
Responsibilities:
- Creates volumes on backend storage system
- Deletes volumes when PVCs are removed
- Handles volume expansion/resizing
- Creates snapshots and clones
- Manages volume lifecycle
Example: Creates a Ceph RBD image when PVC is created
Node Plugin
Deployment: DaemonSet (runs on every worker node)
Responsibilities:
- Mounts volumes to the node
- Formats filesystems
- Bind-mounts volume into Pod containers
- Unmounts volumes when Pods are deleted
- Performs local filesystem operations
Example: Maps Ceph RBD to node, formats as ext4, mounts to Pod
CSI Architecture Diagram
1. Developer creates a PVC requesting 10GB of storage
2. The CSI Controller Plugin receives the provisioning request
3. The Controller Plugin creates the volume in the Ceph backend
4. A PersistentVolume is created and bound to the PVC
5. When the Pod is scheduled, the kubelet requests the volume mount
6. The CSI Node Plugin mounts the volume to the node, then into the Pod
CSI Driver Installation (Ceph Example)
The practical section demonstrates configuring the Ceph CSI driver using a Helm chart. This requires specifying the Ceph Cluster ID and the list of Monitor IP addresses.
```bash
# Add Ceph CSI Helm repository
helm repo add ceph-csi https://ceph.github.io/csi-charts
helm repo update

# Install Ceph CSI RBD driver
helm install ceph-csi-rbd ceph-csi/ceph-csi-rbd \
  --namespace ceph-csi-rbd \
  --create-namespace \
  --set csiConfig[0].clusterID=ceph-cluster-1 \
  --set csiConfig[0].monitors="{10.0.1.10:6789,10.0.1.11:6789,10.0.1.12:6789}"

# Install Ceph CSI CephFS driver
helm install ceph-csi-cephfs ceph-csi/ceph-csi-cephfs \
  --namespace ceph-csi-cephfs \
  --create-namespace \
  --set csiConfig[0].clusterID=ceph-cluster-1 \
  --set csiConfig[0].monitors="{10.0.1.10:6789,10.0.1.11:6789,10.0.1.12:6789}"
```

Key configuration values:
- Cluster ID: Unique identifier for your Ceph cluster
- Monitors: List of Ceph monitor IP addresses and ports
- Secrets: Credentials for Ceph authentication (usually created separately)
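The credentials Secret referenced by the StorageClasses is typically created before installing the driver. A minimal sketch, assuming a Ceph client named `kubernetes` (the `userID`/`userKey` key names follow the ceph-csi convention; the key value is a placeholder, not a real credential):

```yaml
# Hedged sketch of the Secret referenced as csi-rbd-secret in the
# StorageClass examples in this course.
apiVersion: v1
kind: Secret
metadata:
  name: csi-rbd-secret
  namespace: ceph-csi-rbd
stringData:
  userID: kubernetes        # Ceph client name (without the "client." prefix)
  userKey: REPLACE_ME       # output of: ceph auth get-key client.kubernetes
```

The CephFS driver expects an equivalent Secret (`csi-cephfs-secret`) in its own namespace.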
Verify CSI Driver Installation
```bash
# Check CSI controller plugin
kubectl get pods -n ceph-csi-rbd -l app=ceph-csi-rbd

# Check CSI node plugin (should be one per node)
kubectl get pods -n ceph-csi-rbd -l app=csi-rbdplugin

# Check CSIDriver object
kubectl get csidriver

# Expected output:
# NAME                  ATTACHREQUIRED   PODINFOONMOUNT   MODES
# rbd.csi.ceph.com      true             false            Persistent
# cephfs.csi.ceph.com   false            false            Persistent
```

Best practices:
- Use separate namespaces for different CSI drivers
- Monitor CSI plugin health and logs
- Keep CSI drivers updated for bug fixes and features
- Test volume operations in non-production first
- Configure resource limits for CSI plugins
Lesson 3: Ceph RBD - Block Storage
RBD (RADOS Block Device) offers block-level storage, typically used for single-Pod, exclusive access. It acts like a virtual hard drive.
Ceph RBD Characteristics
| Feature | Behavior |
|---|---|
| Access Mode | Exclusive (ReadWriteOnce) - Only one Pod can mount at a time |
| Volume Use | Primary storage for databases or applications requiring exclusive block access |
| Resizing Mechanism | Two-step process involving both controller and node plugins |
| Performance | High performance, low latency, ideal for databases |
| Use Cases | PostgreSQL, MySQL, MongoDB, application state |
Creating RBD StorageClass
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: ceph-cluster-1
  pool: kubernetes-rbd
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: csi-rbd-secret
  csi.storage.k8s.io/provisioner-secret-namespace: ceph-csi-rbd
  csi.storage.k8s.io/controller-expand-secret-name: csi-rbd-secret
  csi.storage.k8s.io/controller-expand-secret-namespace: ceph-csi-rbd
  csi.storage.k8s.io/node-stage-secret-name: csi-rbd-secret
  csi.storage.k8s.io/node-stage-secret-namespace: ceph-csi-rbd
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete
allowVolumeExpansion: true
mountOptions:
- discard
```

Using RBD Storage in a Pod
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  storageClassName: ceph-rbd
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:15
        env:
        - name: POSTGRES_PASSWORD
          value: "secretpassword"
        ports:
        - containerPort: 5432
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: postgres-pvc
```

RBD Volume Resizing Process
- Step 1 - CSI Controller: Resizes the block device in the Ceph backend
- Step 2 - CSI Node Plugin: Waits for the volume to be actively mounted to a Pod, then performs an in-place filesystem resize (e.g., xfs_growfs or resize2fs) before completion
Expanding an RBD Volume
```yaml
# Original PVC with 20Gi
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi # Original size
  storageClassName: ceph-rbd
```

```bash
# Edit PVC to expand to 50Gi
kubectl edit pvc postgres-pvc
# Change storage: 20Gi to storage: 50Gi
# Save and exit

# Check expansion status
kubectl describe pvc postgres-pvc
# Wait for conditions:
# - FileSystemResizePending -> Volume expansion in progress
# - Normal status -> Expansion complete

# Verify inside the Pod
kubectl exec -it postgres-0 -- df -h /var/lib/postgresql/data
```

Best practices:
- Use RBD for stateful applications requiring exclusive access
- Enable volume expansion in StorageClass for future growth
- Use appropriate filesystem (ext4 for general use, xfs for large files)
- Monitor IOPS and latency for database workloads
- Use snapshots for backup before major operations
Lesson 4: CephFS - Shared File Storage
CephFS (Ceph File System) offers shared file-level storage, allowing multiple Pods on different nodes to simultaneously access the volume (ReadWriteMany).
CephFS Characteristics
| Feature | Behavior |
|---|---|
| Access Mode | Shared (ReadWriteMany) - Multiple Pods can mount simultaneously |
| Volume Use | Often used for legacy applications or shared data directories |
| Resizing Mechanism | Single-step process, instantaneous, doesn't require mounted volume |
| Performance | Good for shared access, slightly lower performance than RBD |
| Use Cases | PHP applications, shared configuration, media files, logs |
Creating CephFS StorageClass
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cephfs
provisioner: cephfs.csi.ceph.com
parameters:
  clusterID: ceph-cluster-1
  fsName: kubernetes-fs
  pool: cephfs-data
  csi.storage.k8s.io/provisioner-secret-name: csi-cephfs-secret
  csi.storage.k8s.io/provisioner-secret-namespace: ceph-csi-cephfs
  csi.storage.k8s.io/controller-expand-secret-name: csi-cephfs-secret
  csi.storage.k8s.io/controller-expand-secret-namespace: ceph-csi-cephfs
  csi.storage.k8s.io/node-stage-secret-name: csi-cephfs-secret
  csi.storage.k8s.io/node-stage-secret-namespace: ceph-csi-cephfs
reclaimPolicy: Delete
allowVolumeExpansion: true
```

Using CephFS for Shared Storage
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-files-pvc
spec:
  accessModes:
  - ReadWriteMany # Multiple Pods can access
  resources:
    requests:
      storage: 50Gi
  storageClassName: cephfs
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-app
spec:
  replicas: 3 # Multiple replicas share the same volume
  selector:
    matchLabels:
      app: php-app
  template:
    metadata:
      labels:
        app: php-app
    spec:
      containers:
      - name: php
        image: php:8.2-apache
        volumeMounts:
        - name: shared-files
          mountPath: /var/www/html/uploads
      volumes:
      - name: shared-files
        persistentVolumeClaim:
          claimName: shared-files-pvc
```

CephFS Volume Resizing Process
- The CSI Controller modifies an extended file attribute (quota.max_bytes) on the CephFS directory
- The filesystem remains the same; the quota limit is immediately enforced by Ceph
- Resizing is instantaneous and does not require the volume to be mounted
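As a quick sanity check, the quota value is simply the requested PVC size converted to bytes. A small shell sketch (the helper name is ours, not part of the CSI driver):

```shell
# Convert a PVC size in GiB to the ceph.quota.max_bytes value that the
# CSI controller sets on the CephFS directory backing the volume.
gib_to_quota_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_quota_bytes 50    # -> 53687091200  (quota for a 50Gi PVC)
gib_to_quota_bytes 100   # -> 107374182400 (quota for a 100Gi PVC)
```

This is why the quota attribute for a 100Gi PVC reads 107374182400 bytes.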
Expanding a CephFS Volume
```bash
# Original PVC with 50Gi
kubectl get pvc shared-files-pvc
# NAME               STATUS   VOLUME    CAPACITY   ACCESS MODES
# shared-files-pvc   Bound    pvc-xyz   50Gi       RWX

# Edit PVC to expand to 100Gi
kubectl patch pvc shared-files-pvc -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}'

# Expansion happens immediately (no waiting for Pod mount)
kubectl describe pvc shared-files-pvc
# Conditions:
# Type                      Status
# FileSystemResizePending   False

# Verify new size
kubectl get pvc shared-files-pvc
# NAME               STATUS   VOLUME    CAPACITY   ACCESS MODES
# shared-files-pvc   Bound    pvc-xyz   100Gi      RWX
```

CephFS Quotas
CephFS provides a mechanism to enforce size limits on shared file volumes, preventing one application from consuming all available space.
```bash
# Inspect the CephFS filesystem (from a Ceph admin host)
ceph fs get kubernetes-fs
ceph fs status kubernetes-fs

# View quota attributes on a volume's backing directory
getfattr -n ceph.quota.max_bytes /mnt/cephfs/kubernetes/pvc-xyz
# Output:
# ceph.quota.max_bytes="107374182400" # 100GiB in bytes
```

RBD vs CephFS Comparison
Ceph RBD (Block)
- Access: ReadWriteOnce only
- Performance: Higher IOPS, lower latency
- Resize: Two-step, requires mounted Pod
- Use Cases: Databases, exclusive storage
- Protocol: Block device (like /dev/sdb)
- Snapshots: Block-level snapshots
CephFS (File)
- Access: ReadWriteMany (shared)
- Performance: Good for shared files
- Resize: One-step, instant quota update
- Use Cases: Shared files, legacy apps
- Protocol: POSIX filesystem
- Snapshots: Filesystem snapshots
- Use CephFS when you need ReadWriteMany access
- Ideal for legacy applications that need shared file access
- Monitor quota usage to prevent exhaustion
- Use subdirectories for different application tenants
- Consider RBD if you don't need shared access (better performance)
Lesson 5: Advanced CSI Capabilities
The Ceph CSI driver unlocks advanced storage capabilities for Kubernetes workloads beyond basic volume provisioning.
1. Volume Snapshots
Creating point-in-time copies of volumes for backup and recovery.
```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: ceph-rbd-snapshot
driver: rbd.csi.ceph.com
deletionPolicy: Delete
parameters:
  clusterID: ceph-cluster-1
  csi.storage.k8s.io/snapshotter-secret-name: csi-rbd-secret
  csi.storage.k8s.io/snapshotter-secret-namespace: ceph-csi-rbd
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: postgres-snapshot-2024
spec:
  volumeSnapshotClassName: ceph-rbd-snapshot
  source:
    persistentVolumeClaimName: postgres-pvc
```

Common use cases:
- Backup before database migrations or upgrades
- Point-in-time recovery for disaster scenarios
- Testing with production-like data (clone from snapshot)
- Compliance and audit requirements
2. Volume Cloning
Creating new volumes from existing volumes or snapshots.
```yaml
# Clone from existing PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-clone
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  storageClassName: ceph-rbd
  dataSource:
    kind: PersistentVolumeClaim
    name: postgres-pvc # Source PVC
---
# Clone from snapshot
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-restore
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  storageClassName: ceph-rbd
  dataSource:
    kind: VolumeSnapshot
    name: postgres-snapshot-2024
    apiGroup: snapshot.storage.k8s.io
```

3. Volume Expansion (Dynamic Resizing)
Allowing administrators to increase the size of volumes dynamically without downtime.
```yaml
# Enable in StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd
provisioner: rbd.csi.ceph.com
allowVolumeExpansion: true # Enable dynamic expansion
# ... other parameters ...
```

```bash
# Expand existing PVC
kubectl patch pvc postgres-pvc -p '{"spec":{"resources":{"requests":{"storage":"50Gi"}}}}'

# Monitor expansion progress
kubectl get pvc postgres-pvc -w
# For RBD: Wait for Pod to trigger filesystem resize
# For CephFS: Expansion is immediate
```

4. Topology Awareness
Configuring the cluster to provision storage volumes from a Ceph cluster located in the same availability zone or region as the consuming Pod.
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd-topology
provisioner: rbd.csi.ceph.com
parameters:
  # ... other parameters ...
volumeBindingMode: WaitForFirstConsumer # Enable topology awareness
allowedTopologies:
- matchLabelExpressions:
  - key: topology.kubernetes.io/zone
    values:
    - us-east-1a
    - us-east-1b
```

Benefits:
- Reduced Latency: Volume provisioned in the same zone as the Pod
- Lower Costs: Avoids inter-zone data transfer charges
- Better Performance: Local access is faster than cross-zone
- Failure Domain Awareness: Workloads can be zone-specific
5. Volume Metrics and Monitoring
```bash
# CSI provides volume metrics via kubelet
kubectl get --raw /api/v1/nodes/worker-1/proxy/metrics/cadvisor | grep volume

# Prometheus metrics from CSI driver
# csi_sidecar_operations_seconds     - Operation latency
# csi_sidecar_operations_total       - Total operations count
# kubelet_volume_stats_capacity_bytes  - Volume capacity
# kubelet_volume_stats_used_bytes      - Volume usage
# kubelet_volume_stats_available_bytes - Available space

# Example PVC monitoring with kubectl
kubectl exec -it postgres-0 -- df -h /var/lib/postgresql/data
# Filesystem   Size  Used  Avail  Use%  Mounted on
# /dev/rbd0    50G   12G   38G    24%   /var/lib/postgresql/data
```

Advanced Features Summary
| Feature | RBD Support | CephFS Support | Primary Benefit |
|---|---|---|---|
| Snapshots | ✅ Yes | ✅ Yes | Backup and recovery |
| Cloning | ✅ Yes | ✅ Yes | Fast, space-efficient copies |
| Expansion | ✅ Yes (2-step) | ✅ Yes (instant) | Grow volumes on demand |
| Topology | ✅ Yes | ✅ Yes | Lower latency, reduced cost |
| Quotas | N/A | ✅ Yes | Prevent space exhaustion |
| ReadWriteMany | ❌ No | ✅ Yes | Shared access across Pods |
- Enable volume expansion in all StorageClasses for flexibility
- Implement regular snapshot schedules for critical workloads
- Use topology awareness to optimize performance and costs
- Monitor volume metrics and set alerts for capacity thresholds
- Test snapshot restore procedures regularly
- Use RBD for databases, CephFS for shared file workloads
- Document your storage architecture and disaster recovery plan
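One way to implement a regular snapshot schedule is a CronJob that creates timestamped VolumeSnapshot objects. A hedged sketch only: the names, image, and the `snapshot-creator` ServiceAccount (which needs RBAC permission to create `volumesnapshots.snapshot.storage.k8s.io`) are assumptions, not something the lessons above define:

```yaml
# Sketch: nightly VolumeSnapshot of postgres-pvc via a CronJob.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-nightly-snapshot
spec:
  schedule: "0 2 * * *"   # 02:00 every day
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: snapshot-creator  # assumed, needs snapshot RBAC
          restartPolicy: OnFailure
          containers:
          - name: snapshot
            image: bitnami/kubectl:latest
            command:
            - /bin/sh
            - -c
            - |
              cat <<EOF | kubectl create -f -
              apiVersion: snapshot.storage.k8s.io/v1
              kind: VolumeSnapshot
              metadata:
                name: postgres-$(date +%Y%m%d)
              spec:
                volumeSnapshotClassName: ceph-rbd-snapshot
                source:
                  persistentVolumeClaimName: postgres-pvc
              EOF
```

Pair any schedule like this with a retention mechanism that deletes old VolumeSnapshot objects, or snapshots will accumulate in the Ceph backend.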
Final Quiz
Test your knowledge of Kubernetes Storage and CSI!
Question 1: What was the main problem with legacy in-tree volumes in Kubernetes?
Question 2: What is the primary benefit of CSI (Container Storage Interface)?
Question 3: What does the CSI Controller Plugin do?
Question 4: What access mode does Ceph RBD support?
Question 5: How does CephFS volume resizing work?
Question 6: What is required for RBD volume expansion to complete?
Question 7: What is the primary use case for CephFS over RBD?
Question 8: What does volumeBindingMode: WaitForFirstConsumer enable?
Review the lessons above to check your answers.