Kubernetes Deployments
What is a Deployment?
A Deployment is a Kubernetes object that provides declarative updates for Pods and ReplicaSets. It's the standard way to manage stateless applications in production environments.
Deployment Purpose
Deployments allow you to:
- Declare the desired state of your application
- Roll out updates with zero downtime
- Roll back to previous versions if needed
- Scale applications up or down
- Pause and resume deployments
Managing Stateless Applications
Deployments are designed specifically for stateless applications—applications where each instance is identical and interchangeable:
- Web servers (nginx, Apache)
- API servers
- Microservices
- Worker processes
Stateful vs. Stateless
Stateless: No persistent data stored locally. Any Pod can handle any request.
Stateful: Applications like databases that require persistent storage and stable network identities. These use StatefulSets, not Deployments.
Deployment YAML Structure
A typical Deployment manifest contains the desired state specification:
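A minimal manifest might look like the following sketch (the name, labels, and image are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app            # illustrative name
  labels:
    app: web-app
spec:
  replicas: 3              # desired number of identical Pods
  selector:
    matchLabels:
      app: web-app         # must match the Pod template labels below
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web
        image: nginx:1.25  # pin a version tag, never :latest
        ports:
        - containerPort: 80
```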
Declaring the Desired State
The Deployment continuously works to maintain the state you've declared:
Update Strategies
Deployments support two update strategies:
1. Rolling Update (Default)
Gradually replaces old Pods with new ones, ensuring availability:
- maxSurge: How many extra Pods can exist during update (e.g., 1 means 4 Pods for a 3-replica deployment)
- maxUnavailable: How many Pods can be down during update (0 means always keep all replicas available)
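As a sketch, both knobs live under `spec.strategy`; the field names are the real API fields, the values are examples:

```yaml
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # allow one extra Pod (up to 4 total) during the update
      maxUnavailable: 0    # never drop below 3 available replicas
```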
2. Recreate
Terminates all old Pods before creating new ones (causes downtime):
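Configuring it is a one-line change:

```yaml
spec:
  strategy:
    type: Recreate   # all old Pods are terminated before any new ones start
```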
When to Use Recreate
Use Recreate strategy when:
- Your application can't run multiple versions simultaneously
- Resource constraints prevent running extra Pods
- Downtime is acceptable
Handling Updates
To update your application, modify the Deployment spec and apply:
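For example, assuming a Deployment named `web-app` with a container named `web` (both illustrative):

```shell
# Option 1: edit the manifest (e.g. bump the image tag) and re-apply it
kubectl apply -f deployment.yaml

# Option 2: change the image directly
kubectl set image deployment/web-app web=nginx:1.26

# Watch the rolling update progress
kubectl rollout status deployment/web-app
```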
Rollback Capabilities
If an update causes issues, easily rollback:
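Again assuming a Deployment named `web-app`:

```shell
# Inspect the revision history
kubectl rollout history deployment/web-app

# Roll back to the previous revision
kubectl rollout undo deployment/web-app

# Or roll back to a specific revision
kubectl rollout undo deployment/web-app --to-revision=2
```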
Deployment Best Practices
- Always use version tags, never :latest
- Set appropriate resource requests and limits (covered in Lesson 3)
- Configure health probes (covered in Lesson 2)
- Use Rolling Updates with maxUnavailable=0 for zero downtime
- Test rollouts in staging before production
Kubernetes Probes: Health Checks
Why Health Checks Matter
Applications can fail in various ways. A container might be running, but the application inside could be:
- Deadlocked or frozen
- Unable to handle requests
- Still initializing
- Overloaded and unresponsive
Kubernetes probes allow you to configure automated health checks to detect and respond to these conditions.
What Are Probes?
Probes are diagnostic checks performed periodically by Kubernetes to determine the state of containers. Based on probe results, Kubernetes can automatically restart containers or stop sending traffic to them.
Types of Probes
1. Liveness Probe
Determines if the application is running and healthy.
Liveness Probe Behavior
If liveness probe fails: Kubernetes assumes the application is stuck or broken and automatically restarts the container.
Use case: Detect when an application is in an unrecoverable state and needs to be restarted.
2. Readiness Probe
Determines if the application is ready to serve user traffic.
Readiness Probe Behavior
If readiness probe fails: Kubernetes removes the Pod from Service endpoints, stopping traffic from being sent to it. The container is NOT restarted.
Use case: Prevent traffic from reaching Pods that are still initializing or temporarily unable to serve requests.
3. Startup Probe (Kubernetes 1.16+)
Used for slow-starting containers. While a startup probe is configured and has not yet succeeded, liveness and readiness checks are held off; once it succeeds, the other probes take over.
Probe Lifecycle
Container Starts → Startup Probe (if configured) → Readiness Probe → Liveness Probe
Probe Mechanisms
Probes can check health in three ways:
1. HTTP GET
Most common for web applications:
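A typical sketch, with an illustrative health endpoint and port:

```yaml
livenessProbe:
  httpGet:
    path: /healthz        # illustrative health endpoint
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
```

Any HTTP status in the 200–399 range counts as success.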
2. TCP Socket
Check if a port is accepting connections:
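For example, for a service listening on port 5432:

```yaml
readinessProbe:
  tcpSocket:
    port: 5432       # succeeds if the port accepts a TCP connection
  periodSeconds: 10
```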
3. Exec Command
Execute a command inside the container:
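A common sketch checks for a marker file the application maintains:

```yaml
livenessProbe:
  exec:
    command:         # healthy if the command exits with status 0
    - cat
    - /tmp/healthy
  periodSeconds: 5
```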
Probe Configuration Parameters
| Parameter | Description | Default |
|---|---|---|
| initialDelaySeconds | Wait time before first probe | 0 |
| periodSeconds | How often to perform probe | 10 |
| timeoutSeconds | Timeout for probe response | 1 |
| successThreshold | Successes needed to be considered healthy | 1 |
| failureThreshold | Failures needed to take action | 3 |
Complete Example with Both Probes
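A sketch of a container spec combining both probes (endpoint paths, port, and image are illustrative):

```yaml
containers:
- name: web
  image: my-app:1.0       # illustrative image
  ports:
  - containerPort: 8080
  livenessProbe:
    httpGet:
      path: /healthz      # checks only the app's own health
      port: 8080
    initialDelaySeconds: 10
    periodSeconds: 10
    failureThreshold: 3
  readinessProbe:
    httpGet:
      path: /ready        # checks readiness to serve traffic
      port: 8080
    initialDelaySeconds: 5
    periodSeconds: 5
```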
Common Mistakes
- Setting initialDelaySeconds too low: Container hasn't started yet, causing restart loops
- Using same endpoint for liveness and readiness: Different purposes need different checks
- Heavy probe operations: Probes should be lightweight and fast
- No probes at all: Kubernetes can't detect unhealthy containers
Probe Best Practices
- Always configure both probes for production applications
- Liveness: check only the application's own health, not external dependencies (a down database would otherwise put every Pod into a restart loop)
- Readiness: check whether the Pod can actually serve requests (dependencies reachable, warm-up complete, etc.)
- Keep probes lightweight: They run frequently
- Set appropriate thresholds: Balance between quick detection and false positives
Resource Management: Requests & Limits
Why Resource Management Matters
In a Kubernetes cluster, multiple applications share the same physical resources (CPU and memory). Without proper resource management:
- One application can starve others of resources
- Nodes can run out of memory, causing crashes
- The scheduler can't make informed decisions about Pod placement
- Resource contention leads to unpredictable performance
Resource Concepts
Kubernetes uses two mechanisms to manage resources:
- Requests: Guaranteed minimum resources (used for scheduling)
- Limits: Maximum resources a container can use (enforced caps)
Resource Requests
A request is the amount of CPU and memory that Kubernetes guarantees to a container.
How Requests Work
- Used by the Kubernetes scheduler to decide which node can run the Pod
- The scheduler ensures the node has enough unreserved resources
- Resources are reserved for the container, even if not fully used
- Container is guaranteed to get at least this much
CPU Units
- 1 CPU = 1 vCPU/core (AWS vCPU, GCP core, Azure vCore, etc.)
- 1000m = 1 CPU (m = millicores)
- 500m = 0.5 CPU = half a core
- 100m = 0.1 CPU = 10% of a core
Memory Units
- 128974848 (bytes)
- 129M or 129e6 (megabytes, 1000-based)
- 128Mi (mebibytes, 1024-based) - recommended
- 1Gi = 1024 MiB
Resource Limits
A limit is the maximum amount of resources a container can consume.
How Limits Work
- CPU: Container is throttled if it tries to use more than the limit
- Memory: Container is killed (OOMKilled) if it exceeds the limit
- Limits are enforced by the container runtime (Docker/containerd)
- Containers can use less than the limit, but never more
Memory vs. CPU Behavior
CPU (compressible): Container is throttled (slowed down) if exceeding limit. No crash.
Memory (incompressible): Container is terminated (OOMKilled) if exceeding limit. This causes restarts!
Scheduling with Requests
The scheduler uses requests to make placement decisions:
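For example, on a node with 4 CPU allocatable, only two Pods that each request 1500m fit; a third stays Pending until capacity frees up. The request is declared per container:

```yaml
resources:
  requests:
    cpu: 1500m       # reserved for scheduling, even if unused
    memory: 512Mi
```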
Quality of Service (QoS) Classes
Based on how you configure requests and limits, Kubernetes assigns a QoS class that determines eviction priority when nodes run out of resources.
1. Guaranteed (Highest Priority)
Pod with requests = limits for all containers:
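```yaml
resources:
  requests:
    cpu: 500m
    memory: 256Mi
  limits:
    cpu: 500m       # equal to the request
    memory: 256Mi   # equal to the request
```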
- Priority: Last to be evicted
- Use case: Critical production workloads
2. Burstable (Medium Priority)
Pod with requests < limits (or only requests set):
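```yaml
resources:
  requests:
    cpu: 250m
    memory: 128Mi
  limits:
    cpu: 1          # may burst up to a full core when available
    memory: 512Mi
```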
- Priority: Evicted after BestEffort
- Use case: Most applications (can burst when resources available)
3. BestEffort (Lowest Priority)
Pod with no requests or limits:
- Priority: First to be evicted
- Use case: Low-priority batch jobs
Eviction Order (Under Resource Pressure)
When a node runs out of resources:
- BestEffort Pods are killed first
- Burstable Pods using more than requested are killed next
- Guaranteed Pods are killed last (only if system processes need resources)
Complete Deployment with Resources
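A sketch pulling the pieces together (name, image, and values are illustrative; tune them from real metrics):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web
        image: nginx:1.25
        resources:
          requests:
            cpu: 100m        # used by the scheduler for placement
            memory: 128Mi
          limits:
            cpu: 500m        # throttled above this
            memory: 256Mi    # OOMKilled above this
```

With requests < limits, this Pod lands in the Burstable QoS class.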
Monitoring Resource Usage
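Assuming the metrics-server add-on is installed, actual usage can be inspected with:

```shell
kubectl top nodes                    # per-node CPU and memory usage
kubectl top pods                     # per-Pod usage in the current namespace
kubectl describe node <node-name>    # allocatable vs. requested resources
```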
Resource Best Practices
- Always set requests for production Pods
- Set limits for memory to prevent OOM on nodes
- Monitor actual usage with kubectl top and adjust
- Use Burstable QoS for most workloads (requests < limits)
- Use Guaranteed QoS for critical services
- Start conservative and tune based on metrics
Common Resource Issues
- Pods stuck Pending: No node has enough resources to fit requests
- OOMKilled: Container exceeded memory limit
- CPU throttling: Application slow because hitting CPU limit
- No requests set: Scheduler can't make good decisions
Important Notes & Practical Exercises
Correction: Docker EXPOSE Directive
An important clarification about Docker networking:
EXPOSE Does NOT Open Ports
The EXPOSE directive in a Dockerfile does not actually open a port at the network level. It serves only as documentation to indicate which ports the application uses.
What EXPOSE Actually Does
- Serves as documentation for developers
- Used by docker ps to display ports
- Can be used by container orchestrators for port mapping
- Has no actual network effect by itself
Understanding the Container Runtime
Kubernetes delegates actual container execution to the container runtime (Docker, containerd, CRI-O). Understanding this delegation is crucial for troubleshooting.
How Kubernetes Uses cgroups
Resource limits and requests are enforced using Linux cgroups (control groups):
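A rough way to see this on a node (exact paths vary with cgroup version and runtime; the paths below assume cgroup v2 with systemd slices):

```shell
# Pod cgroups live under the kubepods hierarchy on the node
ls /sys/fs/cgroup/kubepods.slice/

# A container's memory limit appears as memory.max (cgroup v2)
cat /sys/fs/cgroup/kubepods.slice/<pod-slice>/<container-scope>/memory.max

# CPU limits appear as a quota/period pair in cpu.max
cat /sys/fs/cgroup/kubepods.slice/<pod-slice>/<container-scope>/cpu.max
```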
Viewing Underlying Docker Containers
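On a node whose runtime is Docker, the containers backing each Pod can be listed directly (with containerd, `crictl ps` plays the same role):

```shell
# Kubernetes-managed containers carry k8s_ name prefixes
docker ps | grep k8s_

# Inspect the resource settings the runtime enforces (illustrative container ID)
docker inspect <container-id> --format '{{.HostConfig.Memory}} {{.HostConfig.NanoCpus}}'
```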
Kubernetes Quality of Service (QoS)
As covered in Lesson 3, understanding QoS classes is crucial for production deployments:
Recommended Reading
Review the official Kubernetes documentation on:
- Quality of Service Classes: How Kubernetes assigns QoS
- Resource Management: Best practices for requests and limits
- Pod Priority and Preemption: Advanced scheduling features
- Node Pressure Eviction: How nodes handle resource shortages
Practical Homework Assignments
Assignment 1: Implement Liveness Probe
Goal: Add a Liveness Probe to a Deployment and observe its behavior.
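One possible approach (endpoint and values are up to you): add a probe like the sketch below to a test Deployment, then break the health endpoint and watch Kubernetes restart the container.

```yaml
livenessProbe:
  httpGet:
    path: /healthz   # point this at your app's health endpoint
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 3
```

`kubectl get pods -w` and `kubectl describe pod` will show the probe failures and the rising restart count.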
Assignment 2: Explore Docker and cgroups
Goal: Understand how Kubernetes delegates work to Docker.
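A possible sequence, assuming a node where Docker is the runtime (adjust for containerd):

```shell
# 1. Create a Pod with CPU/memory limits, then find its container on the node
docker ps | grep k8s_

# 2. Compare the limits you set with what the runtime enforces
docker inspect <container-id> --format '{{.HostConfig.Memory}} {{.HostConfig.NanoCpus}}'

# 3. Find the matching cgroup files under /sys/fs/cgroup and compare values
```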
Assignment 3: Study Kubernetes Documentation
Goal: Familiarize yourself with the structure and content of official docs.
- Visit kubernetes.io/docs
- Read Configure Quality of Service for Pods
- Understand the three QoS classes and eviction behavior
- Review Managing Resources for Containers
- Study the documentation structure for future reference
Production-Ready Checklist
Before deploying to production, ensure:
- ✓ Using Deployments (not bare Pods)
- ✓ Liveness probes configured
- ✓ Readiness probes configured
- ✓ Resource requests set (for scheduling)
- ✓ Resource limits set (for protection)
- ✓ Rolling update strategy configured
- ✓ Multiple replicas for high availability
- ✓ Proper labels and selectors
- ✓ Version tags (not :latest)
Key Takeaways
Summary: Production-Ready Kubernetes
- Deployments manage stateless apps with declarative updates and rollbacks
- Liveness Probes detect broken apps and restart containers
- Readiness Probes prevent traffic to unready Pods
- Resource Requests guarantee minimum resources and enable scheduling
- Resource Limits cap maximum usage and prevent resource starvation
- QoS Classes determine eviction priority under resource pressure
- cgroups are how resource limits are actually enforced
Next Topics to Explore
- Services: Networking and load balancing
- ConfigMaps & Secrets: Configuration management
- Volumes & PersistentVolumes: Storage
- StatefulSets: Stateful applications
- Ingress: HTTP routing and SSL/TLS
- RBAC: Security and access control
- Helm: Package management