Kubernetes Deployments
What is a Deployment?
A Deployment is a Kubernetes object that provides declarative updates for Pods and ReplicaSets. It's the standard way to manage stateless applications in production environments.
Deployment Purpose
Deployments allow you to:
- Declare the desired state of your application
- Roll out updates with zero downtime
- Roll back to previous versions if needed
- Scale applications up or down
- Pause and resume deployments
Managing Stateless Applications
Deployments are designed specifically for stateless applications—applications where each instance is identical and interchangeable:
- Web servers (nginx, Apache)
- API servers
- Microservices
- Worker processes
Stateful vs. Stateless
Stateless: No persistent data stored locally. Any Pod can handle any request.
Stateful: Applications like databases that require persistent storage and stable network identities. These use StatefulSets, not Deployments.
Deployment YAML Structure
A typical Deployment manifest contains the desired state specification:
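A minimal manifest might look like the following sketch (the name, labels, and image are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app            # illustrative name
  labels:
    app: web-app
spec:
  replicas: 3              # desired number of identical Pods
  selector:
    matchLabels:
      app: web-app         # must match the Pod template labels below
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web
        image: nginx:1.25  # pin a version tag, never :latest
        ports:
        - containerPort: 80
```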
Declaring the Desired State
The Deployment continuously works to maintain the state you've declared:
Update Strategies
Deployments support two update strategies:
1. Rolling Update (Default)
Gradually replaces old Pods with new ones, ensuring availability:
- maxSurge: How many extra Pods can exist during update (e.g., 1 means 4 Pods for a 3-replica deployment)
- maxUnavailable: How many Pods can be down during update (0 means always keep all replicas available)
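As a sketch, both knobs live under `spec.strategy`; the field names are the real API fields, the values are examples:

```yaml
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1          # allow one extra Pod (up to 4 total) during the update
      maxUnavailable: 0    # never drop below 3 available replicas
```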
2. Recreate
Terminates all old Pods before creating new ones (causes downtime):
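Configuring it is a one-line change:

```yaml
spec:
  strategy:
    type: Recreate   # all old Pods are terminated before any new ones start
```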
When to Use Recreate
Use Recreate strategy when:
- Your application can't run multiple versions simultaneously
- Resource constraints prevent running extra Pods
- Downtime is acceptable
Handling Updates
To update your application, modify the Deployment spec and apply:
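For example, assuming a Deployment named `web-app` with a container named `web` (both illustrative):

```shell
# Option 1: edit the manifest (e.g. bump the image tag) and re-apply it
kubectl apply -f deployment.yaml

# Option 2: change the image directly
kubectl set image deployment/web-app web=nginx:1.26

# Watch the rolling update progress
kubectl rollout status deployment/web-app
```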
Rollback Capabilities
If an update causes issues, easily rollback:
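Again assuming a Deployment named `web-app`:

```shell
# Inspect the revision history
kubectl rollout history deployment/web-app

# Roll back to the previous revision
kubectl rollout undo deployment/web-app

# Or roll back to a specific revision
kubectl rollout undo deployment/web-app --to-revision=2
```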
Deployment Best Practices
- Always use version tags, never :latest
- Set appropriate resource requests and limits (covered in Lesson 3)
- Configure health probes (covered in Lesson 2)
- Use Rolling Updates with maxUnavailable=0 for zero downtime
- Test rollouts in staging before production
Kubernetes Probes: Health Checks
Why Health Checks Matter
Applications can fail in various ways. A container might be running, but the application inside could be:
- Deadlocked or frozen
- Unable to handle requests
- Still initializing
- Overloaded and unresponsive
Kubernetes probes allow you to configure automated health checks to detect and respond to these conditions.
What Are Probes?
Probes are diagnostic checks performed periodically by Kubernetes to determine the state of containers. Based on probe results, Kubernetes can automatically restart containers or stop sending traffic to them.
Types of Probes
1. Liveness Probe
Determines if the application is running and healthy.
Liveness Probe Behavior
If liveness probe fails: Kubernetes assumes the application is stuck or broken and automatically restarts the container.
Use case: Detect when an application is in an unrecoverable state and needs to be restarted.
2. Readiness Probe
Determines if the application is ready to serve user traffic.
Readiness Probe Behavior
If readiness probe fails: Kubernetes removes the Pod from Service endpoints, stopping traffic from being sent to it. The container is NOT restarted.
Use case: Prevent traffic from reaching Pods that are still initializing or temporarily unable to serve requests.
3. Startup Probe (Kubernetes 1.16+)
Used for slow-starting containers. While a startup probe is configured and has not yet succeeded, liveness and readiness checks are held off; once it succeeds, the other probes take over.
Probe Lifecycle
Container Starts → Startup Probe (if configured) → Readiness Probe → Liveness Probe
Probe Mechanisms
Probes can check health in three ways:
1. HTTP GET
Most common for web applications:
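A typical sketch, with an illustrative health endpoint and port:

```yaml
livenessProbe:
  httpGet:
    path: /healthz        # illustrative health endpoint
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
```

Any HTTP status in the 200–399 range counts as success.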
2. TCP Socket
Check if a port is accepting connections:
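For example, for a service listening on port 5432:

```yaml
readinessProbe:
  tcpSocket:
    port: 5432       # succeeds if the port accepts a TCP connection
  periodSeconds: 10
```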
3. Exec Command
Execute a command inside the container:
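A common sketch checks for a marker file the application maintains:

```yaml
livenessProbe:
  exec:
    command:         # healthy if the command exits with status 0
    - cat
    - /tmp/healthy
  periodSeconds: 5
```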
Probe Configuration Parameters
| Parameter | Description | Default |
|---|---|---|
| initialDelaySeconds | Wait time before first probe | 0 |
| periodSeconds | How often to perform probe | 10 |
| timeoutSeconds | Timeout for probe response | 1 |
| successThreshold | Successes needed to be considered healthy | 1 |
| failureThreshold | Failures needed to take action | 3 |
Complete Example with Both Probes
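A sketch of a container spec combining both probes (endpoint paths, port, and image are illustrative):

```yaml
containers:
- name: web
  image: my-app:1.0       # illustrative image
  ports:
  - containerPort: 8080
  livenessProbe:
    httpGet:
      path: /healthz      # checks only the app's own health
      port: 8080
    initialDelaySeconds: 10
    periodSeconds: 10
    failureThreshold: 3
  readinessProbe:
    httpGet:
      path: /ready        # checks readiness to serve traffic
      port: 8080
    initialDelaySeconds: 5
    periodSeconds: 5
```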
Common Mistakes
- Setting initialDelaySeconds too low: Container hasn't started yet, causing restart loops
- Using same endpoint for liveness and readiness: Different purposes need different checks
- Heavy probe operations: Probes should be lightweight and fast
- No probes at all: Kubernetes can't detect unhealthy containers
Probe Best Practices
- Always configure both probes for production applications
- Liveness: check only the application's own health, not external dependencies (a down database would otherwise put every Pod into a restart loop)
- Readiness: check whether the Pod can actually serve requests (dependencies reachable, warm-up complete, etc.)
- Keep probes lightweight: They run frequently
- Set appropriate thresholds: Balance between quick detection and false positives
Resource Management: Requests & Limits
Why Resource Management Matters
In a Kubernetes cluster, multiple applications share the same physical resources (CPU and memory). Without proper resource management:
- One application can starve others of resources
- Nodes can run out of memory, causing crashes
- The scheduler can't make informed decisions about Pod placement
- Resource contention leads to unpredictable performance
Resource Concepts
Kubernetes uses two mechanisms to manage resources:
- Requests: Guaranteed minimum resources (used for scheduling)
- Limits: Maximum resources a container can use (enforced caps)
Resource Requests
A request is the amount of CPU and memory that Kubernetes guarantees to a container.
How Requests Work
- Used by the Kubernetes scheduler to decide which node can run the Pod
- The scheduler ensures the node has enough unreserved resources
- Resources are reserved for the container, even if not fully used
- Container is guaranteed to get at least this much
CPU Units
- 1 CPU = 1 vCPU/core (AWS vCPU, GCP core, Azure vCore, etc.)
- 1000m = 1 CPU (m = millicores)
- 500m = 0.5 CPU = half a core
- 100m = 0.1 CPU = 10% of a core
Memory Units
- 128974848 (bytes)
- 129M or 129e6 (megabytes, 1000-based)
- 128Mi (mebibytes, 1024-based) - recommended
- 1Gi = 1024 MiB
Resource Limits
A limit is the maximum amount of resources a container can consume.
How Limits Work
- CPU: Container is throttled if it tries to use more than the limit
- Memory: Container is killed (OOMKilled) if it exceeds the limit
- Limits are enforced by the container runtime (Docker/containerd)
- Containers can use less than the limit, but never more
Memory vs. CPU Behavior
CPU (compressible): Container is throttled (slowed down) if exceeding limit. No crash.
Memory (incompressible): Container is terminated (OOMKilled) if exceeding limit. This causes restarts!
Scheduling with Requests
The scheduler uses requests to make placement decisions:
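For example, on a node with 4 CPU allocatable, only two Pods that each request 1500m fit; a third stays Pending until capacity frees up. The request is declared per container:

```yaml
resources:
  requests:
    cpu: 1500m       # reserved for scheduling, even if unused
    memory: 512Mi
```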
Quality of Service (QoS) Classes
Based on how you configure requests and limits, Kubernetes assigns a QoS class that determines eviction priority when nodes run out of resources.
1. Guaranteed (Highest Priority)
Pod with requests = limits for all containers:
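```yaml
resources:
  requests:
    cpu: 500m
    memory: 256Mi
  limits:
    cpu: 500m       # equal to the request
    memory: 256Mi   # equal to the request
```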
- Priority: Last to be evicted
- Use case: Critical production workloads
2. Burstable (Medium Priority)
Pod with requests < limits (or only requests set):
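```yaml
resources:
  requests:
    cpu: 250m
    memory: 128Mi
  limits:
    cpu: 1          # may burst up to a full core when available
    memory: 512Mi
```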
- Priority: Evicted after BestEffort
- Use case: Most applications (can burst when resources available)
3. BestEffort (Lowest Priority)
Pod with no requests or limits:
- Priority: First to be evicted
- Use case: Low-priority batch jobs
Eviction Order (Under Resource Pressure)
When a node runs out of resources:
- BestEffort Pods are killed first
- Burstable Pods using more than requested are killed next
- Guaranteed Pods are killed last (only if system processes need resources)
Complete Deployment with Resources
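A sketch pulling the pieces together (name, image, and values are illustrative; tune them from real metrics):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web
        image: nginx:1.25
        resources:
          requests:
            cpu: 100m        # used by the scheduler for placement
            memory: 128Mi
          limits:
            cpu: 500m        # throttled above this
            memory: 256Mi    # OOMKilled above this
```

With requests < limits, this Pod lands in the Burstable QoS class.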
Monitoring Resource Usage
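Assuming the metrics-server add-on is installed, actual usage can be inspected with:

```shell
kubectl top nodes                    # per-node CPU and memory usage
kubectl top pods                     # per-Pod usage in the current namespace
kubectl describe node <node-name>    # allocatable vs. requested resources
```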
Resource Best Practices
- Always set requests for production Pods
- Set limits for memory to prevent OOM on nodes
- Monitor actual usage with kubectl top and adjust
- Use Burstable QoS for most workloads (requests < limits)
- Use Guaranteed QoS for critical services
- Start conservative and tune based on metrics
Common Resource Issues
- Pods stuck Pending: No node has enough resources to fit requests
- OOMKilled: Container exceeded memory limit
- CPU throttling: Application slow because hitting CPU limit
- No requests set: Scheduler can't make good decisions
Important Notes & Practical Exercises
Correction: Docker EXPOSE Directive
An important clarification about Docker networking:
EXPOSE Does NOT Open Ports
The EXPOSE directive in a Dockerfile does not actually open a port at the network level. It serves only as documentation to indicate which ports the application uses.
What EXPOSE Actually Does
- Serves as documentation for developers
- Used by docker ps to display ports
- Can be used by container orchestrators for port mapping
- Has no actual network effect by itself
Understanding the Container Runtime
Kubernetes delegates actual container execution to the container runtime (Docker, containerd, CRI-O). Understanding this delegation is crucial for troubleshooting.
How Kubernetes Uses cgroups
Resource limits and requests are enforced using Linux cgroups (control groups):
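A rough way to see this on a node (exact paths vary with cgroup version and runtime; the paths below assume cgroup v2 with systemd slices):

```shell
# Pod cgroups live under the kubepods hierarchy on the node
ls /sys/fs/cgroup/kubepods.slice/

# A container's memory limit appears as memory.max (cgroup v2)
cat /sys/fs/cgroup/kubepods.slice/<pod-slice>/<container-scope>/memory.max

# CPU limits appear as a quota/period pair in cpu.max
cat /sys/fs/cgroup/kubepods.slice/<pod-slice>/<container-scope>/cpu.max
```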
Viewing Underlying Docker Containers
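On a node whose runtime is Docker, the containers backing each Pod can be listed directly (with containerd, `crictl ps` plays the same role):

```shell
# Kubernetes-managed containers carry k8s_ name prefixes
docker ps | grep k8s_

# Inspect the resource settings the runtime enforces (illustrative container ID)
docker inspect <container-id> --format '{{.HostConfig.Memory}} {{.HostConfig.NanoCpus}}'
```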
Kubernetes Quality of Service (QoS)
As covered in Lesson 3, understanding QoS classes is crucial for production deployments:
Recommended Reading
Review the official Kubernetes documentation on:
- Quality of Service Classes: How Kubernetes assigns QoS
- Resource Management: Best practices for requests and limits
- Pod Priority and Preemption: Advanced scheduling features
- Node Pressure Eviction: How nodes handle resource shortages
Practical Homework Assignments
Assignment 1: Implement Liveness Probe
Goal: Add a Liveness Probe to a Deployment and observe its behavior.
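One possible approach (endpoint and values are up to you): add a probe like the sketch below to a test Deployment, then break the health endpoint and watch Kubernetes restart the container.

```yaml
livenessProbe:
  httpGet:
    path: /healthz   # point this at your app's health endpoint
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 3
```

`kubectl get pods -w` and `kubectl describe pod` will show the probe failures and the rising restart count.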
Assignment 2: Explore Docker and cgroups
Goal: Understand how Kubernetes delegates work to Docker.
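A possible sequence, assuming a node where Docker is the runtime (adjust for containerd):

```shell
# 1. Create a Pod with CPU/memory limits, then find its container on the node
docker ps | grep k8s_

# 2. Compare the limits you set with what the runtime enforces
docker inspect <container-id> --format '{{.HostConfig.Memory}} {{.HostConfig.NanoCpus}}'

# 3. Find the matching cgroup files under /sys/fs/cgroup and compare values
```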
Assignment 3: Study Kubernetes Documentation
Goal: Familiarize yourself with the structure and content of official docs.
- Visit kubernetes.io/docs
- Read Configure Quality of Service for Pods
- Understand the three QoS classes and eviction behavior
- Review Managing Resources for Containers
- Study the documentation structure for future reference
Production-Ready Checklist
Before deploying to production, ensure:
- ✓ Using Deployments (not bare Pods)
- ✓ Liveness probes configured
- ✓ Readiness probes configured
- ✓ Resource requests set (for scheduling)
- ✓ Resource limits set (for protection)
- ✓ Rolling update strategy configured
- ✓ Multiple replicas for high availability
- ✓ Proper labels and selectors
- ✓ Version tags (not :latest)
Key Takeaways
Summary: Production-Ready Kubernetes
- Deployments manage stateless apps with declarative updates and rollbacks
- Liveness Probes detect broken apps and restart containers
- Readiness Probes prevent traffic to unready Pods
- Resource Requests guarantee minimum resources and enable scheduling
- Resource Limits cap maximum usage and prevent resource starvation
- QoS Classes determine eviction priority under resource pressure
- cgroups are how resource limits are actually enforced
Next Topics to Explore
- Services: Networking and load balancing
- ConfigMaps & Secrets: Configuration management
- Volumes & PersistentVolumes: Storage
- StatefulSets: Stateful applications
- Ingress: HTTP routing and SSL/TLS
- RBAC: Security and access control
- Helm: Package management