Lesson 1: Evolution of Kubernetes Storage
Understanding how Kubernetes storage has evolved is crucial for implementing persistent storage solutions in modern clusters.
Legacy In-Tree Volumes (Deprecated)
The original method required developers to specify vendor-specific storage details directly within the Pod manifest.
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: legacy-pod
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    awsElasticBlockStore: # Vendor-specific!
      volumeID: vol-0a1b2c3d4e5f67890
      fsType: ext4
```

Problems with this approach:
- Manual Provisioning: All storage volumes had to be created manually on the external system before referencing them
- Non-Portability: Applications were tightly bound to a specific cloud or storage provider
- Migration Nightmare: Moving workloads (e.g., from AWS to Google Cloud) required wholesale manifest changes
- No Abstraction: Developers needed detailed knowledge of underlying storage systems
The Abstraction Layer: PV/PVC/SC
To solve these issues, Kubernetes introduced three key abstractions that separate storage implementation from usage:
1. PersistentVolumeClaim (PVC)
A user-facing request for storage, defining requirements like size (e.g., 5GB) and access mode (e.g., ReadWriteOnce). Developers don't need to know about the underlying storage system.
2. StorageClass (SC)
An administrative object that defines how a volume is created, specifying the Provisioner (the driver) and configuration parameters (like Ceph pool name or replication policy).
3. PersistentVolume (PV)
The actual volume created by the Provisioner that satisfies the PVC request. This is automatically created and bound to the PVC.
Modern Storage Workflow
1. Developer creates a PVC: "I need 10GB of storage"
2. Administrator's StorageClass defines the how: "Use Ceph with 3x replication"
3. The provisioner automatically creates a PV that satisfies the claim
4. The Pod mounts the bound volume, and the developer can now use it
Example: PVC and Pod
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: ceph-rbd # References StorageClass
---
apiVersion: v1
kind: Pod
metadata:
  name: modern-pod
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: my-pvc # Clean abstraction!
```

Benefits of this model:
- Developers request storage without knowing implementation details
- Administrators control storage provisioning through StorageClasses
- Applications are portable across different storage backends
- Automatic volume provisioning eliminates manual steps
- Clear separation of concerns between dev and ops teams
StorageClass Example
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd
provisioner: rbd.csi.ceph.com # CSI driver
parameters:
  clusterID: ceph-cluster-1
  pool: kubernetes
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: csi-rbd-secret
  csi.storage.k8s.io/node-stage-secret-name: csi-rbd-secret
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: Immediate
```

Key fields:
- provisioner: The CSI driver that will create volumes
- parameters: Driver-specific configuration
- reclaimPolicy: Delete or Retain PV when PVC is deleted
- allowVolumeExpansion: Enable dynamic volume resizing
- volumeBindingMode: Immediate or WaitForFirstConsumer
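To illustrate how these fields combine differently, here is a hedged sketch of a variant StorageClass an administrator might define for critical data, pairing Retain with WaitForFirstConsumer (the name `ceph-rbd-retain` and parameter values are illustrative, not from the lessons above):

```yaml
# Illustrative variant: keep the backing volume after PVC deletion and
# delay binding until a consuming Pod is scheduled.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd-retain
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: ceph-cluster-1
  pool: kubernetes
reclaimPolicy: Retain                    # PV survives PVC deletion for manual recovery
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer  # bind where the Pod actually lands
```

With Retain, deleting the PVC leaves the PV (and the Ceph image behind it) intact until an administrator cleans it up explicitly.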
Lesson 2: Container Storage Interface (CSI)
The CSI standard was developed to decouple storage logic from the core Kubernetes code, providing a standardized interface for storage providers.
Why CSI Matters
- Independent Updates: Storage providers can update their drivers (CSI plugins) without requiring users to update their entire Kubernetes cluster version
- Vendor Flexibility: Any storage vendor can implement CSI to integrate with Kubernetes
- Standardization: Common interface across all storage providers
- Advanced Features: Snapshots, cloning, resizing, and topology awareness
- Out-of-Tree: Storage code lives outside Kubernetes core, reducing complexity
CSI Driver Components
A CSI driver typically deploys two main components as Pods in the cluster:
Controller Plugin
Deployment: Typically a Deployment or StatefulSet (1-3 replicas)
Responsibilities:
- Creates volumes on backend storage system
- Deletes volumes when PVCs are removed
- Handles volume expansion/resizing
- Creates snapshots and clones
- Manages volume lifecycle
Example: Creates a Ceph RBD image when PVC is created
Node Plugin
Deployment: DaemonSet (runs on every worker node)
Responsibilities:
- Mounts volumes to the node
- Formats filesystems
- Bind-mounts volume into Pod containers
- Unmounts volumes when Pods are deleted
- Performs local filesystem operations
Example: Maps Ceph RBD to node, formats as ext4, mounts to Pod
CSI Architecture Diagram
1. Developer creates a PVC requesting 10GB of storage
2. The CSI Controller Plugin receives the provisioning request
3. The Controller Plugin creates the volume in the Ceph backend
4. A PersistentVolume is created and bound to the PVC
5. When the Pod is scheduled, the kubelet requests the volume mount
6. The CSI Node Plugin mounts the volume to the node, then into the Pod
CSI Driver Installation (Ceph Example)
The practical section demonstrates configuring the Ceph CSI driver using a Helm chart. This requires specifying the Ceph Cluster ID and the list of Monitor IP addresses.
```bash
# Add Ceph CSI Helm repository
helm repo add ceph-csi https://ceph.github.io/csi-charts
helm repo update

# Install Ceph CSI RBD driver
helm install ceph-csi-rbd ceph-csi/ceph-csi-rbd \
  --namespace ceph-csi-rbd \
  --create-namespace \
  --set csiConfig[0].clusterID=ceph-cluster-1 \
  --set csiConfig[0].monitors="{10.0.1.10:6789,10.0.1.11:6789,10.0.1.12:6789}"

# Install Ceph CSI CephFS driver
helm install ceph-csi-cephfs ceph-csi/ceph-csi-cephfs \
  --namespace ceph-csi-cephfs \
  --create-namespace \
  --set csiConfig[0].clusterID=ceph-cluster-1 \
  --set csiConfig[0].monitors="{10.0.1.10:6789,10.0.1.11:6789,10.0.1.12:6789}"
```

Key configuration values:
- Cluster ID: Unique identifier for your Ceph cluster
- Monitors: List of Ceph monitor IP addresses and ports
- Secrets: Credentials for Ceph authentication (usually created separately)
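The credentials Secret referenced by the StorageClasses is typically created before installing the driver. A minimal sketch, assuming a Ceph client named `kubernetes` (the `userID`/`userKey` key names follow the ceph-csi convention; the key value is a placeholder, not a real credential):

```yaml
# Hedged sketch of the Secret referenced as csi-rbd-secret in the
# StorageClass examples in this course.
apiVersion: v1
kind: Secret
metadata:
  name: csi-rbd-secret
  namespace: ceph-csi-rbd
stringData:
  userID: kubernetes        # Ceph client name (without the "client." prefix)
  userKey: REPLACE_ME       # output of: ceph auth get-key client.kubernetes
```

The CephFS driver expects an equivalent Secret (`csi-cephfs-secret`) in its own namespace.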
Verify CSI Driver Installation
```bash
# Check CSI controller plugin
kubectl get pods -n ceph-csi-rbd -l app=ceph-csi-rbd

# Check CSI node plugin (should be one per node)
kubectl get pods -n ceph-csi-rbd -l app=csi-rbdplugin

# Check CSIDriver object
kubectl get csidriver

# Expected output:
# NAME                  ATTACHREQUIRED   PODINFOONMOUNT   MODES
# rbd.csi.ceph.com      true             false            Persistent
# cephfs.csi.ceph.com   false            false            Persistent
```

Best practices:
- Use separate namespaces for different CSI drivers
- Monitor CSI plugin health and logs
- Keep CSI drivers updated for bug fixes and features
- Test volume operations in non-production first
- Configure resource limits for CSI plugins
Lesson 3: Ceph RBD - Block Storage
RBD (RADOS Block Device) offers block-level storage, typically used for single-Pod, exclusive access. It acts like a virtual hard drive.
Ceph RBD Characteristics
| Feature | Behavior |
|---|---|
| Access Mode | Exclusive (ReadWriteOnce) - Only one Pod can mount at a time |
| Volume Use | Primary storage for databases or applications requiring exclusive block access |
| Resizing Mechanism | Two-step process involving both controller and node plugins |
| Performance | High performance, low latency, ideal for databases |
| Use Cases | PostgreSQL, MySQL, MongoDB, application state |
Creating RBD StorageClass
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: ceph-cluster-1
  pool: kubernetes-rbd
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: csi-rbd-secret
  csi.storage.k8s.io/provisioner-secret-namespace: ceph-csi-rbd
  csi.storage.k8s.io/controller-expand-secret-name: csi-rbd-secret
  csi.storage.k8s.io/controller-expand-secret-namespace: ceph-csi-rbd
  csi.storage.k8s.io/node-stage-secret-name: csi-rbd-secret
  csi.storage.k8s.io/node-stage-secret-namespace: ceph-csi-rbd
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete
allowVolumeExpansion: true
mountOptions:
- discard
```

Using RBD Storage in a Pod
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  storageClassName: ceph-rbd
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:15
        env:
        - name: POSTGRES_PASSWORD
          value: "secretpassword"
        ports:
        - containerPort: 5432
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: postgres-pvc
```

RBD Volume Resizing Process
- Step 1 - CSI Controller: Resizes the block device in the Ceph backend
- Step 2 - CSI Node Plugin: Waits for the volume to be actively mounted to a Pod, then performs an in-place filesystem resize (e.g., xfs_growfs or resize2fs) before completion
Expanding an RBD Volume
```yaml
# Original PVC with 20Gi
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi # Original size
  storageClassName: ceph-rbd
```

```bash
# Edit PVC to expand to 50Gi
kubectl edit pvc postgres-pvc
# Change storage: 20Gi to storage: 50Gi
# Save and exit

# Check expansion status
kubectl describe pvc postgres-pvc
# Wait for conditions:
# - FileSystemResizePending -> Volume expansion in progress
# - Normal status -> Expansion complete

# Verify inside the Pod
kubectl exec -it postgres-0 -- df -h /var/lib/postgresql/data
```

Best practices:
- Use RBD for stateful applications requiring exclusive access
- Enable volume expansion in StorageClass for future growth
- Use appropriate filesystem (ext4 for general use, xfs for large files)
- Monitor IOPS and latency for database workloads
- Use snapshots for backup before major operations
Lesson 4: CephFS - Shared File Storage
CephFS (Ceph File System) offers shared file-level storage, allowing multiple Pods on different nodes to simultaneously access the volume (ReadWriteMany).
CephFS Characteristics
| Feature | Behavior |
|---|---|
| Access Mode | Shared (ReadWriteMany) - Multiple Pods can mount simultaneously |
| Volume Use | Often used for legacy applications or shared data directories |
| Resizing Mechanism | Single-step process, instantaneous, doesn't require mounted volume |
| Performance | Good for shared access, slightly lower performance than RBD |
| Use Cases | PHP applications, shared configuration, media files, logs |
Creating CephFS StorageClass
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cephfs
provisioner: cephfs.csi.ceph.com
parameters:
  clusterID: ceph-cluster-1
  fsName: kubernetes-fs
  pool: cephfs-data
  csi.storage.k8s.io/provisioner-secret-name: csi-cephfs-secret
  csi.storage.k8s.io/provisioner-secret-namespace: ceph-csi-cephfs
  csi.storage.k8s.io/controller-expand-secret-name: csi-cephfs-secret
  csi.storage.k8s.io/controller-expand-secret-namespace: ceph-csi-cephfs
  csi.storage.k8s.io/node-stage-secret-name: csi-cephfs-secret
  csi.storage.k8s.io/node-stage-secret-namespace: ceph-csi-cephfs
reclaimPolicy: Delete
allowVolumeExpansion: true
```

Using CephFS for Shared Storage
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-files-pvc
spec:
  accessModes:
  - ReadWriteMany # Multiple Pods can access
  resources:
    requests:
      storage: 50Gi
  storageClassName: cephfs
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-app
spec:
  replicas: 3 # Multiple replicas share the same volume
  selector:
    matchLabels:
      app: php-app
  template:
    metadata:
      labels:
        app: php-app
    spec:
      containers:
      - name: php
        image: php:8.2-apache
        volumeMounts:
        - name: shared-files
          mountPath: /var/www/html/uploads
      volumes:
      - name: shared-files
        persistentVolumeClaim:
          claimName: shared-files-pvc
```

CephFS Volume Resizing Process
- The CSI Controller modifies an extended file attribute (quota.max_bytes) on the CephFS directory
- The filesystem remains the same; the quota limit is immediately enforced by Ceph
- Resizing is instantaneous and does not require the volume to be mounted
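As a quick sanity check, the quota value is simply the requested PVC size converted to bytes. A small shell sketch (the helper name is ours, not part of the CSI driver):

```shell
# Convert a PVC size in GiB to the ceph.quota.max_bytes value that the
# CSI controller sets on the CephFS directory backing the volume.
gib_to_quota_bytes() {
  echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_quota_bytes 50    # -> 53687091200  (quota for a 50Gi PVC)
gib_to_quota_bytes 100   # -> 107374182400 (quota for a 100Gi PVC)
```

This is why the quota attribute for a 100Gi PVC reads 107374182400 bytes.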
Expanding a CephFS Volume
```bash
# Original PVC with 50Gi
kubectl get pvc shared-files-pvc
# NAME               STATUS   VOLUME    CAPACITY   ACCESS MODES
# shared-files-pvc   Bound    pvc-xyz   50Gi       RWX

# Edit PVC to expand to 100Gi
kubectl patch pvc shared-files-pvc -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}'

# Expansion happens immediately (no waiting for Pod mount)
kubectl describe pvc shared-files-pvc
# Conditions:
# Type                      Status
# FileSystemResizePending   False

# Verify new size
kubectl get pvc shared-files-pvc
# NAME               STATUS   VOLUME    CAPACITY   ACCESS MODES
# shared-files-pvc   Bound    pvc-xyz   100Gi      RWX
```

CephFS Quotas
CephFS provides a mechanism to enforce size limits on shared file volumes, preventing one application from consuming all available space.
```bash
# Inspect the CephFS filesystem (from a Ceph admin host)
ceph fs get kubernetes-fs
ceph fs status kubernetes-fs

# View quota attributes on a volume's backing directory
getfattr -n ceph.quota.max_bytes /mnt/cephfs/kubernetes/pvc-xyz
# Output:
# ceph.quota.max_bytes="107374182400" # 100GiB in bytes
```

RBD vs CephFS Comparison
Ceph RBD (Block)
- Access: ReadWriteOnce only
- Performance: Higher IOPS, lower latency
- Resize: Two-step, requires mounted Pod
- Use Cases: Databases, exclusive storage
- Protocol: Block device (like /dev/sdb)
- Snapshots: Block-level snapshots
CephFS (File)
- Access: ReadWriteMany (shared)
- Performance: Good for shared files
- Resize: One-step, instant quota update
- Use Cases: Shared files, legacy apps
- Protocol: POSIX filesystem
- Snapshots: Filesystem snapshots
- Use CephFS when you need ReadWriteMany access
- Ideal for legacy applications that need shared file access
- Monitor quota usage to prevent exhaustion
- Use subdirectories for different application tenants
- Consider RBD if you don't need shared access (better performance)
Lesson 5: Advanced CSI Capabilities
The Ceph CSI driver unlocks advanced storage capabilities for Kubernetes workloads beyond basic volume provisioning.
1. Volume Snapshots
Creating point-in-time copies of volumes for backup and recovery.
```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: ceph-rbd-snapshot
driver: rbd.csi.ceph.com
deletionPolicy: Delete
parameters:
  clusterID: ceph-cluster-1
  csi.storage.k8s.io/snapshotter-secret-name: csi-rbd-secret
  csi.storage.k8s.io/snapshotter-secret-namespace: ceph-csi-rbd
---
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: postgres-snapshot-2024
spec:
  volumeSnapshotClassName: ceph-rbd-snapshot
  source:
    persistentVolumeClaimName: postgres-pvc
```

Common use cases:
- Backup before database migrations or upgrades
- Point-in-time recovery for disaster scenarios
- Testing with production-like data (clone from snapshot)
- Compliance and audit requirements
2. Volume Cloning
Creating new volumes from existing volumes or snapshots.
```yaml
# Clone from existing PVC
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-clone
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  storageClassName: ceph-rbd
  dataSource:
    kind: PersistentVolumeClaim
    name: postgres-pvc # Source PVC
---
# Clone from snapshot
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-restore
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  storageClassName: ceph-rbd
  dataSource:
    kind: VolumeSnapshot
    name: postgres-snapshot-2024
    apiGroup: snapshot.storage.k8s.io
```

3. Volume Expansion (Dynamic Resizing)
Allowing administrators to increase the size of volumes dynamically without downtime.
```yaml
# Enable in StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd
provisioner: rbd.csi.ceph.com
allowVolumeExpansion: true # Enable dynamic expansion
# ... other parameters ...
```

```bash
# Expand existing PVC
kubectl patch pvc postgres-pvc -p '{"spec":{"resources":{"requests":{"storage":"50Gi"}}}}'

# Monitor expansion progress
kubectl get pvc postgres-pvc -w
# For RBD: Wait for Pod to trigger filesystem resize
# For CephFS: Expansion is immediate
```

4. Topology Awareness
Configuring the cluster to provision storage volumes from a Ceph cluster located in the same availability zone or region as the consuming Pod.
```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-rbd-topology
provisioner: rbd.csi.ceph.com
parameters:
  # ... other parameters ...
volumeBindingMode: WaitForFirstConsumer # Enable topology awareness
allowedTopologies:
- matchLabelExpressions:
  - key: topology.kubernetes.io/zone
    values:
    - us-east-1a
    - us-east-1b
```

Benefits:
- Reduced Latency: Volume provisioned in the same zone as the Pod
- Lower Costs: Avoids inter-zone data transfer charges
- Better Performance: Local access is faster than cross-zone
- Failure Domain Awareness: Workloads can be zone-specific
5. Volume Metrics and Monitoring
```bash
# CSI provides volume metrics via kubelet
kubectl get --raw /api/v1/nodes/worker-1/proxy/metrics/cadvisor | grep volume

# Prometheus metrics from CSI driver
# csi_sidecar_operations_seconds     - Operation latency
# csi_sidecar_operations_total       - Total operations count
# kubelet_volume_stats_capacity_bytes  - Volume capacity
# kubelet_volume_stats_used_bytes      - Volume usage
# kubelet_volume_stats_available_bytes - Available space

# Example PVC monitoring with kubectl
kubectl exec -it postgres-0 -- df -h /var/lib/postgresql/data
# Filesystem   Size  Used  Avail  Use%  Mounted on
# /dev/rbd0    50G   12G   38G    24%   /var/lib/postgresql/data
```

Advanced Features Summary
| Feature | RBD Support | CephFS Support | Primary Benefit |
|---|---|---|---|
| Snapshots | ✅ Yes | ✅ Yes | Backup and recovery |
| Cloning | ✅ Yes | ✅ Yes | Fast, space-efficient copies |
| Expansion | ✅ Yes (2-step) | ✅ Yes (instant) | Grow volumes on demand |
| Topology | ✅ Yes | ✅ Yes | Lower latency, reduced cost |
| Quotas | N/A | ✅ Yes | Prevent space exhaustion |
| ReadWriteMany | ❌ No | ✅ Yes | Shared access across Pods |
- Enable volume expansion in all StorageClasses for flexibility
- Implement regular snapshot schedules for critical workloads
- Use topology awareness to optimize performance and costs
- Monitor volume metrics and set alerts for capacity thresholds
- Test snapshot restore procedures regularly
- Use RBD for databases, CephFS for shared file workloads
- Document your storage architecture and disaster recovery plan
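One way to implement a regular snapshot schedule is a CronJob that creates timestamped VolumeSnapshot objects. A hedged sketch only: the names, image, and the `snapshot-creator` ServiceAccount (which needs RBAC permission to create `volumesnapshots.snapshot.storage.k8s.io`) are assumptions, not something the lessons above define:

```yaml
# Sketch: nightly VolumeSnapshot of postgres-pvc via a CronJob.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-nightly-snapshot
spec:
  schedule: "0 2 * * *"   # 02:00 every day
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: snapshot-creator  # assumed, needs snapshot RBAC
          restartPolicy: OnFailure
          containers:
          - name: snapshot
            image: bitnami/kubectl:latest
            command:
            - /bin/sh
            - -c
            - |
              cat <<EOF | kubectl create -f -
              apiVersion: snapshot.storage.k8s.io/v1
              kind: VolumeSnapshot
              metadata:
                name: postgres-$(date +%Y%m%d)
              spec:
                volumeSnapshotClassName: ceph-rbd-snapshot
                source:
                  persistentVolumeClaimName: postgres-pvc
              EOF
```

Pair any schedule like this with a retention mechanism that deletes old VolumeSnapshot objects, or snapshots will accumulate in the Ceph backend.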
Final Quiz
Test your knowledge of Kubernetes Storage and CSI!
Question 1: What was the main problem with legacy in-tree volumes in Kubernetes?
Question 2: What is the primary benefit of CSI (Container Storage Interface)?
Question 3: What does the CSI Controller Plugin do?
Question 4: What access mode does Ceph RBD support?
Question 5: How does CephFS volume resizing work?
Question 6: What is required for RBD volume expansion to complete?
Question 7: What is the primary use case for CephFS over RBD?
Question 8: What does volumeBindingMode: WaitForFirstConsumer enable?
Review the lessons above to check your answers.