Kubernetes Cluster Upgrades

Best Practices, Common Issues & Safe Upgrade Procedures

Lesson 1: Maintenance & Self-Healing

Kubernetes is designed as a self-healing system with intelligent resource management capabilities. Understanding maintenance procedures is critical for safe cluster operations.

Kubernetes Self-Healing

Self-Healing Capability: Kubernetes is designed to be a self-healing system, and kubelet is smart enough to manage resources and, if necessary, evict pods when resources are scarce.
How Self-Healing Works:
  • Kubelet monitors node resources (CPU, memory, disk)
  • When resources are scarce, it evicts low-priority pods
  • Failed pods are automatically restarted
  • ReplicaSets ensure desired pod count is maintained
  • Health checks (liveness/readiness) detect unhealthy pods
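The probe-driven part of this loop can be sketched in a Deployment manifest; the names, image, and paths below are illustrative, not from the original lesson:

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                  # illustrative name
spec:
  replicas: 3                # the ReplicaSet keeps 3 pods running
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.25    # illustrative image
        livenessProbe:       # container is restarted if this fails
          httpGet:
            path: /healthz
            port: 80
        readinessProbe:      # pod is removed from Service endpoints if this fails
          httpGet:
            path: /
            port: 80
```

If a container crashes or its liveness probe fails, kubelet restarts it; if the whole pod is lost, the ReplicaSet schedules a replacement.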

Node Maintenance Procedure

For planned maintenance on a worker node, you should use kubectl drain to gracefully remove all pods, perform the necessary work (e.g., reboot, kernel update), and then use kubectl uncordon to bring the node back into service.

# Step 1: Check current node status
kubectl get nodes

# Step 2: Cordon the node (prevent new pod scheduling)
kubectl cordon worker-node-1
# node/worker-node-1 cordoned

# Step 3: Drain the node (gracefully evict all pods)
kubectl drain worker-node-1 \
  --ignore-daemonsets \
  --delete-emptydir-data \
  --force

# Output shows pods being evicted:
# evicting pod default/web-app-abc
# evicting pod default/api-server-xyz
# pod/web-app-abc evicted
# pod/api-server-xyz evicted
# node/worker-node-1 drained

# Step 4: Perform maintenance
ssh worker-node-1
sudo apt update && sudo apt upgrade -y
sudo reboot

# Step 5: After reboot, uncordon the node
kubectl uncordon worker-node-1
# node/worker-node-1 uncordoned

# Step 6: Verify node is back in service
kubectl get nodes
# NAME            STATUS   ROLES    AGE   VERSION
# worker-node-1   Ready    worker   30d   v1.28.0
Important: Always ensure applications have multiple replicas across different nodes before draining. Otherwise, draining could cause application downtime.
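One way to enforce this during a drain is a PodDisruptionBudget: eviction is refused while it would drop availability below the budget. A minimal sketch (names are illustrative):

```
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 2        # drain is blocked while eviction would leave fewer than 2 pods
  selector:
    matchLabels:
      app: web
```

With this in place, `kubectl drain` waits (and retries) instead of evicting the last healthy replicas.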

Managed vs. Self-Managed Updates

Managed Kubernetes (GKE, EKS, AKS)

Update Approach: Delete and replace nodes

  • Cloud provider handles upgrades
  • Old nodes are deleted
  • New nodes with updated version are provisioned
  • Rolling fashion (one at a time)
  • Minimal manual intervention
  • Usually zero-downtime

Self-Managed Kubernetes

Update Approach: In-place upgrades

  • Full control and responsibility
  • Update components in sequence
  • Requires careful planning
  • Tools like Kubespray help automate
  • More manual steps involved
  • Greater risk if not done correctly

Control Plane Maintenance Scenarios

High Availability (3 Masters)

HA Masters: In an HA setup (three masters), you can update masters one-by-one, as the remaining masters will maintain cluster functionality. This provides zero-downtime upgrades.
# With 3 masters, you can safely upgrade one at a time
# Master 1 goes down → Masters 2 & 3 handle API requests
# Master 2 goes down → Masters 1 & 3 handle API requests
# Master 3 goes down → Masters 1 & 2 handle API requests

# etcd maintains quorum: 2 out of 3 nodes = quorum maintained

Single Master (Not Recommended for Production)

Single Master Risk: With only one master, running workloads continue, but the cluster cannot react to failures. If a pod dies, components such as the Ingress Controller and kube-proxy cannot reach the API server for updated endpoint information, so traffic may still be routed to the failed pod until the master is restored.
What Happens During Single Master Downtime:
  • Existing pods continue running normally
  • No new pods can be scheduled
  • Failed pods cannot be restarted
  • Ingress/Services may route to failed pods
  • kubectl commands fail (no API server)
  • Cluster state changes are not processed
Best Practice: Always run at least 3 master nodes in production to ensure high availability during maintenance and unexpected failures.

Lesson 2: Common Problems During Upgrade

Understanding common upgrade issues is critical for planning and troubleshooting. Most problems stem from non-backward-compatible changes.

The Importance of Reading the Changelog

Critical Rule: Always thoroughly study the changelog for breaking changes before upgrading. Many upgrade failures can be prevented by reading release notes.

Problem 1: Outdated Manifest API Versions

Deprecated API Versions

Kubernetes 1.16 removed support for several older API versions (e.g., apps/v1beta1 for Deployments). Existing applications continue to run, but applying new or updated manifests that use the removed formats will fail.

# Old manifest (stopped working in Kubernetes 1.16)
apiVersion: apps/v1beta1   # DEPRECATED!
kind: Deployment
metadata:
  name: old-app
spec:
  replicas: 3
  template:
    # ...

# Error when applying after upgrade:
# error: unable to recognize "deployment.yaml":
# no matches for kind "Deployment" in version "apps/v1beta1"

# Fixed manifest (correct version)
apiVersion: apps/v1        # CORRECT
kind: Deployment
metadata:
  name: old-app
spec:
  replicas: 3
  selector:                # Now required in apps/v1
    matchLabels:
      app: old-app
  template:
    metadata:
      labels:
        app: old-app
    # ...
Impact: Existing deployments continue running, but you cannot update or create new resources using old API versions. This breaks CI/CD pipelines that rely on old manifests.
Solution: Before upgrading, update all manifests to current API versions. Use tools like kubectl convert or pluto to detect deprecated APIs in your cluster.
# Install pluto to detect deprecated APIs
brew install FairwindsOps/tap/pluto

# Scan your manifests for deprecated APIs
pluto detect-files -d ./manifests/

# Output shows deprecated APIs:
# NAME      KIND         VERSION        DEPRECATED   DEPRECATED IN   REMOVED   REMOVED IN
# old-app   Deployment   apps/v1beta1   true         v1.9.0          true      v1.16.0

# Convert old API versions to current
kubectl convert -f old-deployment.yaml --output-version apps/v1

Problem 2: Deprecated/Changed Kubelet Flags

Command-Line Flag Changes

Command-line flags for kubelet and other components often change names over time (e.g., --experimental-bootstrap-kubeconfig changed to --bootstrap-kubeconfig), which can break automated deployment scripts.

# Old kubelet configuration (Kubernetes 1.8)
/usr/bin/kubelet \
  --experimental-bootstrap-kubeconfig=/etc/kubernetes/bootstrap.conf \
  --experimental-allowed-unsafe-sysctls=net.ipv4.tcp_syncookies \
  # ...

# After upgrade to 1.10+, these flags are removed:
# Flag --experimental-bootstrap-kubeconfig has been deprecated
# Use --bootstrap-kubeconfig instead

# Updated configuration
/usr/bin/kubelet \
  --bootstrap-kubeconfig=/etc/kubernetes/bootstrap.conf \
  --allowed-unsafe-sysctls=net.ipv4.tcp_syncookies \
  # ...
Impact: Kubelet fails to start with unknown flag errors, breaking the entire node. Automation scripts that provision new nodes also fail.

Problem 3: Stricter Validation

Bug Fixes That Break Existing Configs

Sometimes, a bug fix in a newer version makes manifest validation stricter (e.g., version 1.14.1), causing previously working Helm charts or configuration files to fail.

# Illustrative example: a manifest accepted in Kubernetes 1.13
# (due to lax validation) that fails after upgrading to 1.14.1
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  ports:
  - port: 80
    targetPort: "http"   # String value
    protocol: TCP
  # selector is MISSING (accepted before stricter validation)

# After upgrade to 1.14.1:
# Error: Service "my-service" is invalid:
# spec.selector: Required value

# Fixed manifest
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:              # Now properly enforced
    app: my-app
  ports:
  - port: 80
    targetPort: http
    protocol: TCP
Why This Happens: Kubernetes sometimes accepted invalid configurations due to bugs. When these bugs are fixed, previously "working" manifests suddenly fail validation.

Problem 4: Docker/Container Runtime Issues

Forgotten Containers

Upgrading Docker can sometimes cause the daemon to "forget" about running containers (like kube-proxy). A new instance of the container will be launched, but the old process still occupies the ports, leading to conflicts and crash loops until the old process is manually killed or the node is rebooted.

# Scenario: Upgraded Docker from 19.03 to 20.10

# Old kube-proxy container still running (PID 1234)
ps aux | grep kube-proxy
# root  1234  kube-proxy --kubeconfig=/etc/kubernetes/kubeconfig

# Docker daemon doesn't see it
docker ps | grep kube-proxy
# (no output - Docker "forgot" about it)

# Kubelet tries to start new kube-proxy container
# Error: port 10256 already in use by PID 1234

# Check logs
kubectl logs -n kube-system kube-proxy-abc
# Error: listen tcp :10256: bind: address already in use

# Solution 1: Kill the old process
sudo kill 1234
# Kubelet will restart kube-proxy successfully

# Solution 2: Reboot the node (cleaner)
sudo reboot

API Version Incompatibility

A mismatch between the API version supported by the kubelet and the installed Docker client (docker cli) or daemon can cause connection failures.

# Kubelet expects Docker API 1.40
# Installed Docker supports only API 1.38

# Kubelet logs show:
# Error: failed to create containerd task:
# failed to create shim: API version mismatch

# Check Docker API version
docker version
# Client: API version: 1.38
# Server: API version: 1.38

# Solution: Upgrade Docker to a compatible version
# OR: Use containerd directly instead of Docker
Recommendation: Kubernetes 1.24 removed the dockershim, ending built-in Docker support. Migrate to containerd or CRI-O as your container runtime to avoid these issues.

Lesson 3: General Upgrade Procedure

Following a structured upgrade procedure is essential for safe, successful cluster upgrades. Never rush or skip steps.

The Complete Upgrade Workflow

Step 1: Read Documentation

Thoroughly study the changelog for breaking changes. Look for:

  • Deprecated API versions
  • Removed or renamed flags
  • Changes in default behavior
  • Known issues and workarounds
  • Feature gates that changed

Step 2: Practice on Test Cluster

Install and upgrade a test cluster to identify and resolve potential issues specific to your setup. The test cluster should mirror production as closely as possible.

Step 3: Plan and Backup

Schedule the upgrade during a maintenance window and ensure etcd backups are performed. Have a rollback plan ready.

Step 4: Sequential Execution

Update components one-by-one, always starting with the control plane, then worker nodes.

Component Update Order

1. etcd (First)

The cluster's datastore must be upgraded first. Back up etcd before starting.

# Backup etcd
ETCDCTL_API=3 etcdctl snapshot save backup.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Verify backup
ETCDCTL_API=3 etcdctl snapshot status backup.db --write-out=table

# Upgrade etcd (example with kubeadm)
# This is usually handled by kubeadm upgrade apply

2. Control Plane Components

Upgrade API Server, Scheduler, Controller Manager, and kubelet on master nodes. In HA setup, do one master at a time.

# On first master node
sudo kubeadm upgrade plan
# Review upgrade plan
sudo kubeadm upgrade apply v1.28.0

# Upgrade kubelet and kubectl on master
sudo apt-mark unhold kubelet kubectl
sudo apt-get update
sudo apt-get install -y kubelet=1.28.0-00 kubectl=1.28.0-00
sudo apt-mark hold kubelet kubectl
sudo systemctl daemon-reload
sudo systemctl restart kubelet

# Repeat for other master nodes
# On subsequent masters, use:
# sudo kubeadm upgrade node

3. Worker Nodes

Upgrade kubelet, CNI, kube-proxy, and CoreDNS on worker nodes. Do this in batches or one node at a time.

# For each worker node:

# Drain the node
kubectl drain worker-node-1 --ignore-daemonsets --delete-emptydir-data

# SSH to the worker node
ssh worker-node-1

# Upgrade kubeadm
sudo apt-mark unhold kubeadm
sudo apt-get update
sudo apt-get install -y kubeadm=1.28.0-00
sudo apt-mark hold kubeadm

# Upgrade node configuration
sudo kubeadm upgrade node

# Upgrade kubelet and kubectl
sudo apt-mark unhold kubelet kubectl
sudo apt-get install -y kubelet=1.28.0-00 kubectl=1.28.0-00
sudo apt-mark hold kubelet kubectl

# Restart kubelet
sudo systemctl daemon-reload
sudo systemctl restart kubelet

# Exit SSH and uncordon the node
kubectl uncordon worker-node-1

# Verify node is updated
kubectl get nodes

Version Skew Policy

Important Rules:
  • No Version Skipping: Don't skip minor versions. Upgrade sequentially (1.25 → 1.26 → 1.27)
  • kubelet Skew: kubelet may be up to 2 minor versions behind the API server (3 from v1.28 onward), but never newer
  • kubectl Skew: kubectl can be 1 minor version ahead or behind the API server
  • kube-proxy Skew: kube-proxy should match the kubelet version
# Valid version combinations:
# API Server: 1.28
# kubelet:    1.28, 1.27, or 1.26 (within 2 minor versions)
# kubectl:    1.29, 1.28, or 1.27 (±1 minor version)

# Invalid:
# API Server: 1.28
# kubelet:    1.25 (too old - 3 versions behind)
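The skew rule is easy to check mechanically before touching any node. A small bash sketch (the function names `minor_of` and `skew_ok` are my own, not from any Kubernetes tooling):

```shell
#!/usr/bin/env bash
# minor_of v1.28.3 -> 28 (strip leading "v", major, and patch)
minor_of() {
  local v="${1#v}"      # drop leading "v"
  v="${v#*.}"           # drop major component ("1.")
  echo "${v%%.*}"       # keep minor component only
}

# skew_ok <api_server_version> <kubelet_version> [max_skew]
# Succeeds when kubelet is not newer than the API server and at most
# max_skew (default 2) minor versions behind it.
skew_ok() {
  local max="${3:-2}" api_minor kubelet_minor
  api_minor=$(minor_of "$1")
  kubelet_minor=$(minor_of "$2")
  [ "$kubelet_minor" -le "$api_minor" ] &&
  [ $((api_minor - kubelet_minor)) -le "$max" ]
}

skew_ok v1.28.0 v1.26.5 && echo "OK: within skew"
skew_ok v1.28.0 v1.25.0 || echo "FAIL: kubelet too old"
```

Pass `3` as the third argument to model the relaxed n-3 policy introduced in v1.28.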

Pre-Upgrade Checklist

  • Read changelog and release notes: look for breaking changes
  • Test upgrade on staging cluster: mirror the production environment
  • Backup etcd: store the backup securely off-cluster
  • Update deprecated API versions in manifests: use pluto or kubectl-convert
  • Schedule maintenance window: communicate to stakeholders
  • Prepare rollback plan: document the rollback procedure
  • Verify sufficient node capacity: pods must be reschedulable during drain
  • Check PodDisruptionBudgets: ensure PDBs won't block drain
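The "update deprecated API versions" item can be pre-checked without cluster access. pluto is the thorough option; the sketch below illustrates the idea with plain grep (the sample pattern list and the `scan_manifests` helper name are mine, and the list is far from exhaustive):

```shell
#!/usr/bin/env bash
# API versions removed in Kubernetes 1.16 (sample, not exhaustive)
DEPRECATED='apps/v1beta1|apps/v1beta2|extensions/v1beta1'

# Print file:line for every manifest under the given directory that
# still uses one of the deprecated apiVersions.
scan_manifests() {
  grep -rEn "apiVersion:[[:space:]]*(${DEPRECATED})" "$1" || true
}

# Usage: scan_manifests ./manifests/
```

Anything this prints needs migrating before the upgrade; an empty result from a crude grep is of course not proof of cleanliness.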

Lesson 4: Kubespray Upgrade Automation

Kubespray is an Ansible-based tool that automates Kubernetes cluster deployment and upgrades, making the process significantly simpler and more reliable.

What is Kubespray?

Kubespray: An open-source project that uses Ansible to deploy and manage production-ready Kubernetes clusters. It supports various OS distributions, network plugins, and infrastructure providers.
Benefits of Kubespray:
  • Automated, repeatable deployments
  • Safe, sequential component upgrades
  • Configurable via inventory files
  • Supports on-premises and cloud environments
  • Active community and well-tested
  • Handles complex HA configurations
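The inventory file is the heart of a Kubespray setup. A minimal sketch of the hosts file (hostnames and IPs are illustrative; group names follow recent Kubespray releases, older ones used `kube-master`/`kube-node` with hyphens):

```
# inventory/mycluster/hosts.yml
all:
  hosts:
    master-1: {ansible_host: 10.0.0.11}
    master-2: {ansible_host: 10.0.0.12}
    master-3: {ansible_host: 10.0.0.13}
    worker-1: {ansible_host: 10.0.0.21}
  children:
    kube_control_plane:
      hosts:
        master-1:
        master-2:
        master-3:
    kube_node:
      hosts:
        worker-1:
    etcd:
      hosts:
        master-1:
        master-2:
        master-3:
    k8s_cluster:
      children:
        kube_control_plane:
        kube_node:
```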

Kubespray Upgrade Process

The demonstration showed how simple the upgrade can be when using an automation tool like Kubespray:

Step 1: Update Inventory Configuration

The user only needed to modify the kube_version variable in the inventory file from 1.17.5 to 1.18.4.

# File: inventory/mycluster/group_vars/k8s_cluster/k8s-cluster.yml

# Before upgrade
kube_version: v1.17.5

# After change (ready to upgrade)
kube_version: v1.18.4

# Other important settings:
kubeadm_upgrade: true        # Enable upgrade mode
upgrade_cluster_setup: true
kube_proxy_mode: ipvs
dns_mode: coredns

Step 2: Run Upgrade Playbook

# Execute the Kubespray upgrade playbook
ansible-playbook -i inventory/mycluster/hosts.yml \
  upgrade-cluster.yml \
  --become \
  --become-user=root

# Optional: Use bastion host for private networks
ansible-playbook -i inventory/mycluster/hosts.yml \
  upgrade-cluster.yml \
  --become \
  --become-user=root \
  -e ansible_ssh_common_args='-o ProxyCommand="ssh -W %h:%p user@bastion-host"'

How Kubespray Manages the Upgrade

Sequential Upgrade Strategy:
  • Masters: Updated sequentially (serial: 1) to ensure HA is maintained
  • Workers: Updated in batches (default serial: 20%) to minimize disruption
  • Automatic Drain: Kubespray drains each node before upgrading
  • Verification: Checks node health before proceeding to next
# Kubespray upgrade playbook workflow:

# 1. Pre-flight checks
#    - Verify SSH connectivity to all nodes
#    - Check current Kubernetes version
#    - Validate inventory configuration

# 2. Upgrade etcd (if needed)
#    - Backup etcd on each master
#    - Upgrade etcd binaries
#    - Restart etcd service
#    - Verify cluster health

# 3. Upgrade first master (serial: 1)
#    - Drain node
#    - Upgrade control plane components
#    - Upgrade kubelet
#    - Uncordon node
#    - Verify API server is healthy

# 4. Upgrade second master (serial: 1)
#    - Same process as first master
#    - Cluster remains available via other masters

# 5. Upgrade third master (serial: 1)
#    - Same process

# 6. Upgrade worker nodes (serial: 20%)
#    - Batch 1 (20% of workers): drain, upgrade, uncordon
#    - Verify batch health
#    - Batch 2 (next 20%): drain, upgrade, uncordon
#    - Continue until all workers upgraded

# 7. Post-upgrade tasks
#    - Upgrade CoreDNS
#    - Update kube-proxy
#    - Verify all components
#    - Run cluster smoke tests

Upgrade Timing

Performance: In the demonstration, the entire cluster of 6 nodes was upgraded from v1.17.5 to v1.18.4 in approximately 13 minutes.
Phase                        Approximate Time   Details
Pre-flight checks            1-2 minutes        Validation and connectivity tests
etcd upgrade                 2-3 minutes        Backup and upgrade 3 etcd instances
Master upgrades (3 nodes)    5-6 minutes        Sequential, ~2 min per master
Worker upgrades (3 nodes)    4-5 minutes        Parallel batches, faster than masters
Post-upgrade verification    1 minute           Health checks and validation
Total                        ~13 minutes        For 6-node cluster

Bastion Host for Private Networks

Bastion Host (Jump Host): A bastion host is necessary when the cluster nodes are in a private network not directly accessible from the internet. It acts as an intermediary SSH server, allowing tools like Ansible (Kubespray) to securely access the internal cluster network.
# Network topology:
# Your laptop → Public Internet → Bastion Host → Private Network → Cluster Nodes

# Inventory configuration for bastion
# File: inventory/mycluster/hosts.yml
all:
  vars:
    ansible_user: ubuntu
    # Configure bastion/jump host
    bastion_host: bastion.example.com
    bastion_user: ubuntu
    bastion_ssh_key: ~/.ssh/bastion-key.pem

# SSH through bastion
ansible-playbook -i inventory/mycluster/hosts.yml \
  upgrade-cluster.yml \
  -e ansible_ssh_common_args='-o ProxyCommand="ssh -W %h:%p -i ~/.ssh/bastion-key.pem ubuntu@bastion.example.com"'
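Instead of passing ProxyCommand on every ansible-playbook run, the jump can also live in SSH client configuration, which Ansible then picks up automatically. A sketch (hostnames, subnet, and key path are illustrative):

```
# ~/.ssh/config
Host bastion
    HostName bastion.example.com
    User ubuntu
    IdentityFile ~/.ssh/bastion-key.pem

Host 10.0.0.*              # private cluster subnet
    User ubuntu
    ProxyJump bastion      # hop through the bastion transparently
```

With this in place, `ssh 10.0.0.11` (and therefore Ansible) reaches the private nodes directly.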

Docker live-restore Caveat

Docker live-restore: While live-restore allows containers to continue running if the Docker daemon restarts, it can sometimes cause issues during a major Docker upgrade where the new daemon version fails to properly pick up the old containers, leading to service disruption.
# Docker daemon.json with live-restore
{
  "live-restore": true
}

# Potential issue during upgrade:
# 1. Docker upgraded from 19.03 to 20.10
# 2. New daemon starts but doesn't recognize old containers
# 3. Containers still running but "orphaned"
# 4. Kubelet tries to start new containers → port conflicts
# 5. Manual intervention required

# Recommendation: Disable live-restore during cluster upgrades
# Or migrate to containerd/CRI-O (no Docker dependency)
Best Practice: For Kubernetes 1.24+, migrate away from Docker to containerd or CRI-O as your container runtime. This avoids Docker-specific issues and is officially recommended by Kubernetes.

Final Quiz

Test your knowledge of Kubernetes Cluster Upgrades!

Question 1: What is the correct procedure for worker node maintenance?

a) Directly reboot the node without any preparation
b) Use kubectl drain to evacuate pods, perform maintenance, then kubectl uncordon
c) Delete all pods manually before maintenance
d) Stop kubelet service and reboot immediately

Question 2: What happens during single master downtime?

a) All pods immediately stop running
b) Existing pods run normally but failed pods cannot restart and no new pods can be scheduled
c) The cluster automatically creates a new master
d) Worker nodes take over master responsibilities

Question 3: What is a common issue when upgrading to Kubernetes 1.16?

a) All pods are automatically deleted
b) Old API versions like apps/v1beta1 are removed and manifests using them fail
c) etcd becomes incompatible
d) CNI plugins stop working

Question 4: What is the correct component upgrade order?

a) Worker nodes first, then control plane
b) etcd first, then control plane components, then worker nodes
c) All components simultaneously
d) CNI first, then everything else

Question 5: What must you always do before upgrading a cluster?

a) Delete all workloads first
b) Read changelog, test on staging cluster, and backup etcd
c) Upgrade production first to find issues
d) Disable all monitoring systems

Question 6: How does Kubespray handle master node upgrades in HA setup?

a) Upgrades all masters simultaneously
b) Updates masters sequentially (serial: 1) to maintain availability
c) Requires manual intervention for each master
d) Only upgrades the primary master

Question 7: What is a bastion host used for?

a) Running the Kubernetes control plane
b) Acting as SSH jump host to access cluster nodes in private network
c) Storing etcd backups
d) Load balancing API server traffic

Question 8: Why can Docker upgrades cause "forgotten container" issues?

a) Docker always deletes all containers during upgrade
b) New Docker daemon may not recognize old containers, causing port conflicts when kubelet tries to start new instances
c) Containers are stored in wrong directory
d) Kubernetes is incompatible with Docker

Quiz Complete!

All correct answers are option 'b'. Review the lessons above to understand why these are the best answers.