Kubernetes Production Ready

Deployments, Probes & Resources

Lesson 1 of 4

Kubernetes Deployments

What is a Deployment?

A Deployment is a Kubernetes object that provides declarative updates for Pods and ReplicaSets. It's the standard way to manage stateless applications in production environments.

Deployment Purpose

Deployments allow you to:

  • Declare the desired state of your application
  • Roll out updates with zero downtime
  • Roll back to previous versions if needed
  • Scale applications up or down
  • Pause and resume deployments

Managing Stateless Applications

Deployments are designed specifically for stateless applications—applications where each instance is identical and interchangeable:

  • Web servers (nginx, Apache)
  • API servers
  • Microservices
  • Worker processes

Stateful vs. Stateless

Stateless: No persistent data stored locally. Any Pod can handle any request.

Stateful: Applications like databases that require persistent storage and stable network identities. These use StatefulSets, not Deployments.

Deployment YAML Structure

A typical Deployment manifest contains the desired state specification:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-web-app
  labels:
    app: web
spec:
  # Desired state
  replicas: 3
  # Selector to identify managed Pods
  selector:
    matchLabels:
      app: web
  # Pod template
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
        ports:
        - containerPort: 80

Declaring the Desired State

The Deployment continuously works to maintain the state you've declared:

# Create the Deployment
kubectl apply -f deployment.yaml

# Check status
kubectl get deployments
kubectl get pods

# Scale the Deployment (change desired state)
kubectl scale deployment my-web-app --replicas=5
# The Deployment controller automatically creates 2 more Pods

Update Strategies

Deployments support two update strategies:

1. Rolling Update (Default)

Gradually replaces old Pods with new ones, ensuring availability:

spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # Max number of extra Pods during update
      maxUnavailable: 0    # Max number of Pods that can be unavailable

  • maxSurge: How many extra Pods can exist during update (e.g., 1 means 4 Pods for a 3-replica deployment)
  • maxUnavailable: How many Pods can be down during update (0 means always keep all replicas available)
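To make the arithmetic concrete, here is a small Python sketch of the bounds these two fields imply (an illustration only, not anything Kubernetes runs; real manifests also accept percentage values, which this ignores):

```python
# Illustrative sketch: pod-count bounds implied by a RollingUpdate
# strategy. Assumes absolute values for maxSurge/maxUnavailable.

def rolling_update_bounds(replicas, max_surge, max_unavailable):
    """Return (min_available, max_total) Pods during a rolling update."""
    min_available = replicas - max_unavailable  # never fewer available Pods
    max_total = replicas + max_surge            # never more total Pods
    return min_available, max_total

# maxSurge: 1, maxUnavailable: 0 on a 3-replica Deployment:
print(rolling_update_bounds(3, max_surge=1, max_unavailable=0))  # (3, 4)
```

With these zero-downtime values, the Deployment briefly runs a fourth Pod during the update while never dropping below three available.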

2. Recreate

Terminates all old Pods before creating new ones (causes downtime):

spec:
  strategy:
    type: Recreate

When to Use Recreate

Use Recreate strategy when:

  • Your application can't run multiple versions simultaneously
  • Resource constraints prevent running extra Pods
  • Downtime is acceptable

Handling Updates

To update your application, modify the Deployment spec and apply:

# Update the image version
kubectl set image deployment/my-web-app nginx=nginx:1.22

# Or edit directly
kubectl edit deployment my-web-app

# Watch the rollout
kubectl rollout status deployment/my-web-app

# View rollout history
kubectl rollout history deployment/my-web-app

Rollback Capabilities

If an update causes issues, you can easily roll back:

# Roll back to the previous version
kubectl rollout undo deployment/my-web-app

# Roll back to a specific revision
kubectl rollout history deployment/my-web-app
kubectl rollout undo deployment/my-web-app --to-revision=2

# Pause a rollout (for troubleshooting)
kubectl rollout pause deployment/my-web-app

# Resume the rollout
kubectl rollout resume deployment/my-web-app

Deployment Best Practices

  • Always use version tags, never :latest
  • Set appropriate resource requests and limits (covered in Lesson 3)
  • Configure health probes (covered in Lesson 2)
  • Use Rolling Updates with maxUnavailable=0 for zero downtime
  • Test rollouts in staging before production
Lesson 2 of 4

Kubernetes Probes: Health Checks

Why Health Checks Matter

Applications can fail in various ways. A container might be running, but the application inside could be:

  • Deadlocked or frozen
  • Unable to handle requests
  • Still initializing
  • Overloaded and unresponsive

Kubernetes probes allow you to configure automated health checks to detect and respond to these conditions.

What Are Probes?

Probes are diagnostic checks performed periodically by Kubernetes to determine the state of containers. Based on probe results, Kubernetes can automatically restart containers or stop sending traffic to them.

Types of Probes

1. Liveness Probe

Determines if the application is running and healthy.

Liveness Probe Behavior

If liveness probe fails: Kubernetes assumes the application is stuck or broken and automatically restarts the container.

Use case: Detect when an application is in an unrecoverable state and needs to be restarted.

apiVersion: v1
kind: Pod
metadata:
  name: liveness-example
spec:
  containers:
  - name: app
    image: my-app:1.0
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 30   # Wait 30s before first check
      periodSeconds: 10         # Check every 10 seconds
      timeoutSeconds: 5         # Time out after 5 seconds
      failureThreshold: 3       # Restart after 3 failed checks

2. Readiness Probe

Determines if the application is ready to serve user traffic.

Readiness Probe Behavior

If readiness probe fails: Kubernetes removes the Pod from Service endpoints, stopping traffic from being sent to it. The container is NOT restarted.

Use case: Prevent traffic from reaching Pods that are still initializing or temporarily unable to serve requests.

apiVersion: v1
kind: Pod
metadata:
  name: readiness-example
spec:
  containers:
  - name: app
    image: my-app:1.0
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 5
      failureThreshold: 3

3. Startup Probe (Kubernetes 1.16+)

Used for slow-starting containers. Disables liveness and readiness checks until the app starts.

startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  failureThreshold: 30   # 30 * 10 = 300 seconds (5 minutes) max
  periodSeconds: 10

Probe Lifecycle

Container Starts → Startup Probe (if configured) → Readiness Probe + Liveness Probe
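A minimal Python sketch of that gating (illustrative only, not Kubernetes code):

```python
# Illustrative sketch: until a configured startup probe succeeds,
# liveness and readiness checks are suppressed; afterwards both run
# for the rest of the container's life.

def active_probes(startup_configured, startup_succeeded):
    if startup_configured and not startup_succeeded:
        return ["startup"]
    return ["liveness", "readiness"]

print(active_probes(True, False))  # ['startup']
print(active_probes(True, True))   # ['liveness', 'readiness']
```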

Probe Mechanisms

Probes can check health in three ways:

1. HTTP GET

Most common for web applications:

livenessProbe:
  httpGet:
    path: /healthz          # Endpoint to check
    port: 8080              # Port to check
    httpHeaders:            # Optional headers
    - name: Custom-Header
      value: Awesome
# Success: 200-399 response code
# Failure: anything else, or a timeout

2. TCP Socket

Check if a port is accepting connections:

livenessProbe:
  tcpSocket:
    port: 3306              # Check if the MySQL port is open
  initialDelaySeconds: 15
  periodSeconds: 20

3. Exec Command

Execute a command inside the container:

livenessProbe:
  exec:
    command:
    - cat
    - /tmp/healthy
# Success: exit code 0
# Failure: non-zero exit code

Probe Configuration Parameters

Parameter            Description                                      Default
initialDelaySeconds  Wait time before the first probe                 0
periodSeconds        How often to perform the probe                   10
timeoutSeconds       Timeout for the probe response                   1
successThreshold     Consecutive successes to be considered healthy   1
failureThreshold     Consecutive failures before taking action        3
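These parameters combine into a reaction time. A rough upper bound, sketched in Python (an illustration of the arithmetic, not real Kubernetes logic):

```python
# Illustrative sketch: a broken container can go unnoticed for roughly
# failureThreshold consecutive probe periods before Kubernetes acts
# (a restart for liveness, de-routing for readiness).

def max_reaction_seconds(period_seconds, failure_threshold):
    return period_seconds * failure_threshold

# With periodSeconds=10 and failureThreshold=3:
print(max_reaction_seconds(10, 3))  # 30
```

Lower values detect failures faster but increase the risk of acting on a transient blip.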

Complete Example with Both Probes

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
        ports:
        - containerPort: 80
        # Check if nginx is alive
        livenessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          failureThreshold: 3
        # Check if nginx is ready to serve traffic
        readinessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 5
          failureThreshold: 3

Common Mistakes

  • Setting initialDelaySeconds too low: Container hasn't started yet, causing restart loops
  • Using same endpoint for liveness and readiness: Different purposes need different checks
  • Heavy probe operations: Probes should be lightweight and fast
  • No probes at all: Kubernetes can't detect unhealthy containers

Probe Best Practices

  • Always configure both probes for production applications
  • Liveness: Check only the application process itself (checking external dependencies such as a database can trigger pointless restart cascades)
  • Readiness: Check whether the Pod can handle requests right now (warm-up complete, dependencies reachable, etc.)
  • Keep probes lightweight: They run frequently
  • Set appropriate thresholds: Balance between quick detection and false positives
Lesson 3 of 4

Resource Management: Requests & Limits

Why Resource Management Matters

In a Kubernetes cluster, multiple applications share the same physical resources (CPU and memory). Without proper resource management:

  • One application can starve others of resources
  • Nodes can run out of memory, causing crashes
  • The scheduler can't make informed decisions about Pod placement
  • Resource contention leads to unpredictable performance

Resource Concepts

Kubernetes uses two mechanisms to manage resources:

  • Requests: Guaranteed minimum resources (used for scheduling)
  • Limits: Maximum resources a container can use (enforced caps)

Resource Requests

A request is the amount of CPU and memory that Kubernetes guarantees to a container.

How Requests Work

  • Used by the Kubernetes scheduler to decide which node can run the Pod
  • The scheduler ensures the node has enough unreserved resources
  • Resources are reserved for the container, even if not fully used
  • Container is guaranteed to get at least this much

apiVersion: v1
kind: Pod
metadata:
  name: resource-demo
spec:
  containers:
  - name: app
    image: my-app:1.0
    resources:
      requests:
        memory: "256Mi"   # Request 256 MiB of memory
        cpu: "500m"       # Request 500 millicores (0.5 CPU)

CPU Units

  • 1 CPU = 1 vCPU/Core (AWS vCPU, GCP Core, Azure vCore, etc.)
  • 1000m = 1 CPU (m = millicores)
  • 500m = 0.5 CPU = half a core
  • 100m = 0.1 CPU = 10% of a core

Memory Units

  • 128974848 (bytes)
  • 129M or 129e6 (megabytes, 1000-based)
  • 128Mi (mebibytes, 1024-based) - Recommended
  • 1Gi = 1024 MiB
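The conversions above can be sketched as a tiny parser in Python (a small illustrative subset of the quantity syntax Kubernetes actually accepts, not the real parser):

```python
# Illustrative sketch: convert a subset of Kubernetes CPU and memory
# quantity strings to plain numbers.

def cpu_to_cores(value):
    if value.endswith("m"):              # millicores
        return int(value[:-1]) / 1000
    return float(value)                  # whole cores

def memory_to_bytes(value):
    units = {"M": 1000**2, "Mi": 1024**2, "G": 1000**3, "Gi": 1024**3}
    # Check two-letter (1024-based) suffixes before one-letter ones
    for suffix in sorted(units, key=len, reverse=True):
        if value.endswith(suffix):
            return int(value[: -len(suffix)]) * units[suffix]
    return int(value)                    # plain bytes

print(cpu_to_cores("500m"))      # 0.5
print(memory_to_bytes("128Mi"))  # 134217728
print(memory_to_bytes("129M"))   # 129000000
```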

Resource Limits

A limit is the maximum amount of resources a container can consume.

How Limits Work

  • CPU: Container is throttled if it tries to use more than the limit
  • Memory: Container is killed (OOMKilled) if it exceeds the limit
  • Limits are enforced by the container runtime (Docker/containerd)
  • Containers can use less than the limit, but never more

apiVersion: v1
kind: Pod
metadata:
  name: resource-demo
spec:
  containers:
  - name: app
    image: my-app:1.0
    resources:
      requests:
        memory: "256Mi"
        cpu: "500m"
      limits:
        memory: "512Mi"   # Max 512 MiB (killed if exceeded)
        cpu: "1000m"      # Max 1 CPU (throttled if exceeded)

Memory vs. CPU Behavior

CPU (compressible): Container is throttled (slowed down) if exceeding limit. No crash.

Memory (incompressible): Container is terminated (OOMKilled) if exceeding limit. This causes restarts!

Scheduling with Requests

The scheduler uses requests to make placement decisions:

# Node has 4 CPU and 16Gi memory.
# Pods already running on it have requested:
#   - 2 CPU, 8Gi memory
# A new Pod requests:
#   - 1.5 CPU, 6Gi memory

# Scheduler checks:
#   Available CPU:    4 - 2 = 2 CPU  ✓ (1.5 fits)
#   Available memory: 16 - 8 = 8Gi   ✓ (6Gi fits)
# The Pod is scheduled to this node.

# If no node can fit the Pod, it remains Pending.
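The same fit check, sketched in Python (the real scheduler also weighs taints, affinity, ports, and volumes; this reproduces only the resource arithmetic above):

```python
# Illustrative sketch: can a new Pod's requests fit on a node?

def fits(node_capacity, node_requested, pod_request):
    free_cpu = node_capacity["cpu"] - node_requested["cpu"]
    free_mem = node_capacity["memory_gi"] - node_requested["memory_gi"]
    return pod_request["cpu"] <= free_cpu and pod_request["memory_gi"] <= free_mem

node = {"cpu": 4, "memory_gi": 16}      # node capacity
already = {"cpu": 2, "memory_gi": 8}    # requests of Pods already placed
pod = {"cpu": 1.5, "memory_gi": 6}      # the new Pod's requests

print(fits(node, already, pod))  # True
```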

Quality of Service (QoS) Classes

Based on how you configure requests and limits, Kubernetes assigns a QoS class that determines eviction priority when nodes run out of resources.

1. Guaranteed (Highest Priority)

Pod with requests = limits for all containers:


resources:
  requests:
    memory: "256Mi"
    cpu: "500m"
  limits:
    memory: "256Mi"   # Same as request
    cpu: "500m"       # Same as request

  • Priority: Last to be evicted
  • Use case: Critical production workloads

2. Burstable (Medium Priority)

Pod with requests < limits (or only requests set):


resources:
  requests:
    memory: "256Mi"
    cpu: "500m"
  limits:
    memory: "512Mi"   # Higher than request
    cpu: "1000m"      # Higher than request

  • Priority: Evicted after BestEffort
  • Use case: Most applications (can burst when resources available)

3. BestEffort (Lowest Priority)

Pod with no requests or limits:


resources: {}   # No requests or limits

  • Priority: First to be evicted
  • Use case: Low-priority batch jobs

Eviction Order (Under Resource Pressure)

When a node runs out of resources:

  1. BestEffort Pods are killed first
  2. Burstable Pods using more than requested are killed next
  3. Guaranteed Pods are killed last (only if system processes need resources)
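
The classification rules can be sketched in Python for a single-container Pod (the real rules examine every container in the Pod; illustrative only):

```python
# Illustrative sketch: QoS class for a single-container Pod.
# If only limits are set, Kubernetes defaults requests to the limits,
# which also yields Guaranteed.

def qos_class(requests, limits):
    if not requests and not limits:
        return "BestEffort"
    effective_requests = requests or limits
    if limits and effective_requests == limits:
        return "Guaranteed"
    return "Burstable"

print(qos_class({}, {}))  # BestEffort
print(qos_class({"cpu": "500m", "memory": "256Mi"},
                {"cpu": "500m", "memory": "256Mi"}))  # Guaranteed
print(qos_class({"cpu": "500m", "memory": "256Mi"},
                {"cpu": "1000m", "memory": "512Mi"}))  # Burstable
```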

Complete Deployment with Resources

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
        ports:
        - containerPort: 80
        # Resource management
        resources:
          requests:
            memory: "128Mi"
            cpu: "250m"
          limits:
            memory: "256Mi"
            cpu: "500m"
        # Health probes
        livenessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 5

Monitoring Resource Usage

# View node resource usage
kubectl top nodes

# View Pod resource usage
kubectl top pods

# View a Pod's resource requests/limits
kubectl describe pod my-pod

# Check whether a Pod was OOMKilled
kubectl describe pod my-pod | grep -i oom

Resource Best Practices

  • Always set requests for production Pods
  • Set limits for memory to prevent OOM on nodes
  • Monitor actual usage with kubectl top and adjust
  • Use Burstable QoS for most workloads (requests < limits)
  • Use Guaranteed QoS for critical services
  • Start conservative and tune based on metrics

Common Resource Issues

  • Pods stuck Pending: No node has enough resources to fit requests
  • OOMKilled: Container exceeded memory limit
  • CPU throttling: Application slow because hitting CPU limit
  • No requests set: Scheduler can't make good decisions
Lesson 4 of 4

Important Notes & Practical Exercises

Correction: Docker EXPOSE Directive

An important clarification about Docker networking:

EXPOSE Does NOT Open Ports

The EXPOSE directive in a Dockerfile does not actually open a port at the network level. It serves only as documentation to indicate which ports the application uses.

# In the Dockerfile
EXPOSE 8080

# This only documents that port 8080 is used.
# It does NOT:
#   - open the port in a firewall
#   - make the port accessible from outside
#   - configure any network settings

# To actually publish the port, use:
docker run -p 8080:8080 my-image

# In Kubernetes, expose ports via a Service object, not just the Pod spec

What EXPOSE Actually Does

  • Serves as documentation for developers
  • Used by docker ps to display ports
  • Can be used by container orchestrators for port mapping
  • No actual network effect by itself

Understanding the Container Runtime

Kubernetes delegates actual container execution to the container runtime (Docker, containerd, CRI-O). Understanding this delegation is crucial for troubleshooting.

How Kubernetes Uses cgroups

Resource limits and requests are enforced using Linux cgroups (control groups):

# When you set in Kubernetes:
resources:
  limits:
    memory: "512Mi"
    cpu: "1000m"

# ...Kubernetes tells the container runtime to configure cgroups:
#   memory.limit_in_bytes = 536870912  (512 * 1024 * 1024)
#   cpu.cfs_quota_us      = 100000     (1 CPU worth, per 100ms period)

# You can inspect these directly on the node
# (cgroup v1 paths; substitute the Pod's UID):
cat /sys/fs/cgroup/memory/kubepods/pod<pod-uid>/memory.limit_in_bytes
cat /sys/fs/cgroup/cpu/kubepods/pod<pod-uid>/cpu.cfs_quota_us
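The arithmetic behind those two cgroup values, sketched in Python (assumes cgroup v1 names and the default 100 ms CFS period):

```python
# Illustrative sketch: derive cgroup v1 values from Kubernetes limits.

def cgroup_values(memory_mi, cpu_millicores, cfs_period_us=100_000):
    memory_limit_bytes = memory_mi * 1024 * 1024           # Mi -> bytes
    cpu_cfs_quota_us = cpu_millicores * cfs_period_us // 1000
    return memory_limit_bytes, cpu_cfs_quota_us

# limits of memory: "512Mi" and cpu: "1000m":
print(cgroup_values(512, 1000))  # (536870912, 100000)
```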

Viewing Underlying Docker Containers

# SSH to a node and view its Docker containers
docker ps
# You'll see containers created by Kubernetes:
# each Pod's containers plus "pause" containers

# View a container's resource constraints
docker inspect <container-id> | grep -A 10 "Memory"

# This shows how Kubernetes delegated the work to Docker

Kubernetes Quality of Service (QoS)

As covered in Lesson 3, understanding QoS classes is crucial for production deployments:

Recommended Reading

Review the official Kubernetes documentation on:

  • Quality of Service Classes: How Kubernetes assigns QoS
  • Resource Management: Best practices for requests and limits
  • Pod Priority and Preemption: Advanced scheduling features
  • Node Pressure Eviction: How nodes handle resource shortages

Practical Homework Assignments

Assignment 1: Implement Liveness Probe

Goal: Add a Liveness Probe to a Deployment and observe its behavior.

# 1. Create a Deployment with a liveness probe that will start failing
#    (manifest reconstructed from the exec-probe pattern in Lesson 2)
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: liveness-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: liveness-demo
  template:
    metadata:
      labels:
        app: liveness-demo
    spec:
      containers:
      - name: app
        image: busybox
        args: ["/bin/sh", "-c", "touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600"]
        livenessProbe:
          exec:
            command: ["cat", "/tmp/healthy"]
          initialDelaySeconds: 5
          periodSeconds: 5
EOF

# 2. Watch the restart counter
kubectl get pods -w

# Expected: the container restarts when /tmp/healthy is removed

Assignment 2: Explore Docker and cgroups

Goal: Understand how Kubernetes delegates work to Docker.

# 1. Create a Pod with resource limits
kubectl run resource-demo \
  --image=nginx \
  --requests='cpu=100m,memory=128Mi' \
  --limits='cpu=200m,memory=256Mi'

# 2. Find the node where the Pod is running
kubectl get pod resource-demo -o wide

# 3. SSH to that node and find the Docker container
docker ps | grep resource-demo

# 4. Inspect the container's resources
docker inspect <container-id> | grep -i memory
docker inspect <container-id> | grep -i cpu

# 5. Check the cgroups (if you have access)
find /sys/fs/cgroup -name "*<pod-uid>*"
cat /sys/fs/cgroup/memory/kubepods/.../memory.limit_in_bytes

Assignment 3: Study Kubernetes Documentation

Goal: Familiarize yourself with the structure and content of official docs.

  1. Visit kubernetes.io/docs
  2. Read Configure Quality of Service for Pods
  3. Understand the three QoS classes and eviction behavior
  4. Review Managing Resources for Containers
  5. Study the documentation structure for future reference

Production-Ready Checklist

Before deploying to production, ensure:

  • ✓ Using Deployments (not bare Pods)
  • ✓ Liveness probes configured
  • ✓ Readiness probes configured
  • ✓ Resource requests set (for scheduling)
  • ✓ Resource limits set (for protection)
  • ✓ Rolling update strategy configured
  • ✓ Multiple replicas for high availability
  • ✓ Proper labels and selectors
  • ✓ Version tags (not :latest)

Key Takeaways

Summary: Production-Ready Kubernetes

  1. Deployments manage stateless apps with declarative updates and rollbacks
  2. Liveness Probes detect broken apps and restart containers
  3. Readiness Probes prevent traffic to unready Pods
  4. Resource Requests guarantee minimum resources and enable scheduling
  5. Resource Limits cap maximum usage and prevent resource starvation
  6. QoS Classes determine eviction priority under resource pressure
  7. cgroups are how resource limits are actually enforced

Next Topics to Explore

  • Services: Networking and load balancing
  • ConfigMaps & Secrets: Configuration management
  • Volumes & PersistentVolumes: Storage
  • StatefulSets: Stateful applications
  • Ingress: HTTP routing and SSL/TLS
  • RBAC: Security and access control
  • Helm: Package management
Final Assessment

Test Your Knowledge

Production-Ready Kubernetes Quiz

Question 1: What is the primary purpose of a Deployment?

Question 2: What happens when a Liveness Probe fails?

Question 3: What happens when a Readiness Probe fails?

Question 4: What is a resource "request" in Kubernetes?

Question 5: What happens if a container exceeds its memory limit?

Question 6: What is the highest priority QoS class?

Question 7: What does the EXPOSE directive in a Dockerfile actually do?

Question 8: What is the default update strategy for Deployments?