Kubernetes Jobs & Security

Jobs, CronJobs, RBAC & Best Practices

Lesson 1 of 4

Kubernetes Jobs: Run-to-Completion Workloads

Understanding Jobs

Unlike Deployments, which manage long-running services, Jobs are designed for workloads that need to complete their work and then terminate.

Job vs Deployment

Aspect           | Deployment                    | Job
Purpose          | Long-running services         | Run-to-completion tasks
Lifecycle        | Runs indefinitely             | Runs until task completes
Restart Policy   | Always restarted on failure   | Restarted until success or retry limit
Success Criteria | None (keeps running)          | Exit code 0 (successful completion)
Examples         | Web servers, APIs, databases  | Batch jobs, data processing, migrations

How Jobs Work

Job Completion Criteria

A Job runs the Pod it creates, retrying as needed, until the Pod completes its task and exits with a success code (exit code 0).

1. Job Created
kubectl apply -f job.yaml
2. Pod Started
Job controller creates Pod to run task
3. Task Executes
Container runs to completion
4. Check Exit Code
Did container exit with code 0?
If Exit Code = 0: SUCCESS
Job marked as Complete, Pod kept for logs
If Exit Code ≠ 0: FAILURE
Retry Pod (up to backoffLimit)

Basic Job Manifest

apiVersion: batch/v1
kind: Job
metadata:
  name: data-migration
spec:
  # Number of successful completions required
  completions: 1
  # Run pods in parallel (default: 1)
  parallelism: 1
  # Number of retries before marking the Job failed
  backoffLimit: 3
  # Pod template
  template:
    spec:
      restartPolicy: Never  # or OnFailure
      containers:
      - name: migration
        image: my-migration-tool:1.0
        command:
        - /bin/sh
        - -c
        - |
          echo "Starting data migration..."
          # Do the actual work
          migrate-data --source=old-db --dest=new-db
          echo "Migration complete!"
          # Exit 0 for success
          exit 0

Job Parameters Explained

1. completions

spec:
  completions: 3  # Job needs 3 successful Pod completions

# Use cases:
# - Process 3 batches of data
# - Run 3 different migration tasks
# - Split work into 3 chunks

2. parallelism

spec:
  completions: 10
  parallelism: 3  # Run 3 Pods at a time

# Execution:
# - Pods 1, 2, 3 run in parallel
# - When Pod 1 completes, Pod 4 starts
# - Continue until all 10 completions are achieved

3. backoffLimit

spec:
  backoffLimit: 4  # Retry up to 4 times on failure

# If the Pod fails, retries use exponential back-off:
# - Retry 1: wait 10s
# - Retry 2: wait 20s
# - Retry 3: wait 40s
# - Retry 4: wait 80s (delay capped at 6 minutes)
# After 4 failures: Job marked as Failed

Job Operations

Create and Monitor Job

# Create Job
kubectl apply -f job.yaml

# Watch Job status
kubectl get jobs -w
NAME             COMPLETIONS   DURATION   AGE
data-migration   0/1           5s         5s
data-migration   1/1           45s        45s   ← Complete!

# View Job details
kubectl describe job data-migration

# View Job Pods
kubectl get pods -l job-name=data-migration
NAME                    READY   STATUS      RESTARTS   AGE
data-migration-abc123   0/1     Completed   0          2m

# View logs from the completed Job
kubectl logs data-migration-abc123

Delete Job

# Delete Job and its Pods
kubectl delete job data-migration

# Delete Job but keep completed Pods for manual cleanup
kubectl delete job data-migration --cascade=orphan

Common Job Patterns

1. Simple One-Off Task

apiVersion: batch/v1
kind: Job
metadata:
  name: database-backup
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: backup
        image: postgres:14
        command:
        - /bin/sh
        - -c
        - |
          pg_dump -h $DB_HOST -U $DB_USER $DB_NAME > /backup/db.sql
          aws s3 cp /backup/db.sql s3://backups/$(date +%Y%m%d).sql
        env:
        - name: DB_HOST
          value: "postgres.default.svc"
        - name: DB_USER
          valueFrom:
            secretKeyRef:
              name: db-credentials
              key: username
        volumeMounts:
        - name: backup
          mountPath: /backup
      volumes:
      - name: backup
        emptyDir: {}

2. Parallel Processing

apiVersion: batch/v1
kind: Job
metadata:
  name: batch-processor
spec:
  completions: 100   # Process 100 items
  parallelism: 10    # Run 10 workers in parallel
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: processor
        image: batch-processor:1.0
        command:
        - python
        - process.py
        - --queue
        - work-queue

3. Work Queue Pattern

apiVersion: batch/v1
kind: Job
metadata:
  name: queue-worker
spec:
  parallelism: 5     # Run 5 workers
  completions: null  # Run until the queue is empty
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: worker
        image: worker:1.0
        command:
        - /worker
        - --queue-url=$(QUEUE_URL)

Job Use Cases

Common Job Applications

  • Data migration: Move data between databases
  • Batch processing: Process large datasets
  • Backups: Database or file system backups
  • Report generation: Create periodic reports
  • Data transformation: ETL (Extract, Transform, Load) tasks
  • Testing: Run test suites
  • Cleanup: Delete old data or files
  • Imports: Load data from external sources

Jobs in CI/CD Pipelines

A significant application of Jobs is in Continuous Integration/Continuous Deployment (CI/CD) pipelines.

CI/CD Use Case: Automated Test Environments

A Job can be triggered automatically when a new branch is created in a repository to:

  1. Create temporary testing environment
  2. Store credentials in Kubernetes Secrets or ConfigMaps
  3. Provision complete isolated environment for testing and debugging
  4. Run automated tests
  5. Clean up resources when done
# Example: Create a test environment on a new branch
apiVersion: batch/v1
kind: Job
metadata:
  name: create-test-env-feature-xyz
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: setup
        image: kubectl:latest
        command:
        - /bin/sh
        - -c
        - |
          # Create namespace for this branch
          kubectl create namespace feature-xyz

          # Create secrets
          kubectl create secret generic db-creds \
            --from-literal=password=$(generate-password) \
            -n feature-xyz

          # Deploy application
          helm install myapp ./charts/myapp \
            --namespace feature-xyz \
            --set image.tag=feature-xyz

          # Run smoke tests
          run-tests --namespace feature-xyz

          echo "Test environment ready at https://feature-xyz.example.com"

Helm Chart Hooks

Jobs are also integral to customizing Helm Chart deployments via hooks.

# Helm Job Hook Example
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ .Release.Name }}-db-migration
  annotations:
    # Run BEFORE install/upgrade
    "helm.sh/hook": pre-install,pre-upgrade
    # Delete Job after success
    "helm.sh/hook-delete-policy": hook-succeeded
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: migration
        image: {{ .Values.migration.image }}
        command:
        - migrate
        - up
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: {{ .Release.Name }}-db
              key: url

Common Helm Hook Types

  • pre-install: Before resources are created
  • post-install: After all resources are created
  • pre-upgrade: Before upgrade is applied
  • post-upgrade: After upgrade completes
  • pre-delete: Before resources are deleted
  • post-delete: After resources are deleted

Job Best Practices

  • Set appropriate backoffLimit for retries
  • Use restartPolicy: Never or OnFailure
  • Clean up completed Jobs to avoid clutter
  • Set ttlSecondsAfterFinished for automatic cleanup
  • Use resource limits to prevent runaway Jobs
  • Monitor Job status and set up alerts
  • Store logs before Job cleanup
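
The ttlSecondsAfterFinished field mentioned above can be sketched as follows. This is a minimal example, not a production manifest; the Job name, image, TTL, and resource values are illustrative:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: cleanup-demo               # hypothetical Job name
spec:
  ttlSecondsAfterFinished: 100     # delete the Job (and its Pods) 100s after it finishes
  backoffLimit: 3                  # bounded retries
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: task
        image: busybox
        command: ["sh", "-c", "echo done"]
        resources:
          limits:                  # resource limits prevent runaway Jobs
            cpu: "500m"
            memory: 128Mi
```

Note that once the TTL expires the Job and its Pods are gone, so logs must be shipped elsewhere (step "Store logs before Job cleanup") before that window closes.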
Lesson 2 of 4

CronJobs: Scheduled Tasks

What are CronJobs?

CronJobs provide the mechanism to schedule Jobs to run periodically, similar to the traditional Unix cron utility.

CronJob Purpose

CronJobs create Jobs on a repeating schedule:

  • Time-based execution: Run at specific times/intervals
  • Automated recurring tasks: No manual intervention
  • Cron syntax: Familiar scheduling format

CronJob Manifest

apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-backup
spec:
  # Schedule in cron format
  schedule: "0 2 * * *"  # Every day at 2:00 AM
  # Optional: time zone
  timeZone: "America/New_York"
  # Job template
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: backup-tool:1.0
            command:
            - /backup.sh
  # Keep the last 3 successful Jobs
  successfulJobsHistoryLimit: 3
  # Keep the last failed Job
  failedJobsHistoryLimit: 1
  # Prevent concurrent runs
  concurrencyPolicy: Forbid

Cron Schedule Syntax

Cron Format

# ┌───────────── minute (0 - 59)
# │ ┌───────────── hour (0 - 23)
# │ │ ┌───────────── day of month (1 - 31)
# │ │ │ ┌───────────── month (1 - 12)
# │ │ │ │ ┌───────────── day of week (0 - 6) (Sunday = 0)
# │ │ │ │ │
# * * * * *

Common Cron Schedules

Schedule             | Description             | Cron Expression
Every minute         | Runs every minute       | * * * * *
Every 5 minutes      | At 0, 5, 10, 15, etc.   | */5 * * * *
Every hour           | Top of every hour       | 0 * * * *
Every day at 2 AM    | Daily at 02:00          | 0 2 * * *
Every Monday at 9 AM | Weekly on Monday        | 0 9 * * 1
First day of month   | Monthly at midnight     | 0 0 1 * *
Weekdays at 6 PM     | Mon-Fri at 18:00        | 0 18 * * 1-5

CronJob Parameters

1. concurrencyPolicy

spec:
  concurrencyPolicy: Allow  # Default: allow concurrent runs

# Options:
# - Allow:   allow concurrent Jobs
# - Forbid:  skip the new run if the previous one is still running
# - Replace: cancel the current Job and start a new one

Concurrency Example

# Scenario: Job scheduled every minute, takes 2 minutes to complete

concurrencyPolicy: Allow
- Minute 1: Job 1 starts
- Minute 2: Job 2 starts (Job 1 still running)
- Minute 3: Job 3 starts (Jobs 1 & 2 still running)
Result: 3 Jobs running simultaneously

concurrencyPolicy: Forbid
- Minute 1: Job 1 starts
- Minute 2: Skipped (Job 1 still running)
- Minute 3: Job 2 starts (Job 1 finished)
Result: Never more than 1 Job running

concurrencyPolicy: Replace
- Minute 1: Job 1 starts
- Minute 2: Job 1 cancelled, Job 2 starts
- Minute 3: Job 2 cancelled, Job 3 starts
Result: Only the latest Job runs

2. startingDeadlineSeconds

spec:
  startingDeadlineSeconds: 300  # 5 minutes

# If the CronJob misses a scheduled time (cluster down, etc.):
# - Try to start within 300 seconds of the scheduled time
# - If the deadline has passed, count the run as missed
# - Prevents a backlog of old Jobs

3. suspend

spec:
  suspend: true  # Temporarily disable the CronJob

# Use cases:
# - Maintenance windows
# - Debugging
# - Pause without deleting

CronJob Examples

1. Database Backup

apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres-backup
spec:
  schedule: "0 2 * * *"          # Daily at 2 AM
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 7  # Keep 1 week
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: postgres:14
            command:
            - /bin/sh
            - -c
            - |
              TIMESTAMP=$(date +%Y%m%d_%H%M%S)
              pg_dump -h postgres -U admin mydb | \
                gzip > /backups/backup_${TIMESTAMP}.sql.gz

              # Upload to S3
              aws s3 cp /backups/backup_${TIMESTAMP}.sql.gz \
                s3://backups/postgres/

              # Delete local copy
              rm /backups/backup_${TIMESTAMP}.sql.gz
            env:
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-creds
                  key: password
            volumeMounts:
            - name: backups
              mountPath: /backups
          volumes:
          - name: backups
            emptyDir: {}

2. Log Rotation

apiVersion: batch/v1
kind: CronJob
metadata:
  name: log-cleanup
spec:
  schedule: "0 0 * * 0"  # Weekly, Sunday at midnight
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: cleanup
            image: busybox
            command:
            - /bin/sh
            - -c
            - |
              # Delete logs older than 30 days
              find /logs -name "*.log" -mtime +30 -delete
              echo "Cleanup complete"
            volumeMounts:
            - name: logs
              mountPath: /logs
          volumes:
          - name: logs
            hostPath:
              path: /var/log/myapp

3. Health Check Reporter

apiVersion: batch/v1
kind: CronJob
metadata:
  name: health-report
spec:
  schedule: "0 9 * * 1"  # Monday at 9 AM
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: reporter
            image: health-checker:1.0
            command:
            - python
            - report.py
            - --email
            - ops@example.com

4. Certificate Renewal

apiVersion: batch/v1
kind: CronJob
metadata:
  name: cert-renewal
spec:
  schedule: "0 0 1 * *"  # First of the month
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: certbot
            image: certbot/certbot
            command:
            - certbot
            - renew
            - --quiet

CronJob Operations

Create and Monitor CronJob

# Create CronJob
kubectl apply -f cronjob.yaml

# List CronJobs
kubectl get cronjobs
NAME             SCHEDULE    SUSPEND   ACTIVE   LAST SCHEDULE   AGE
nightly-backup   0 2 * * *   False     0        8h              5d

# Describe CronJob
kubectl describe cronjob nightly-backup

# View Jobs created by the CronJob
# (assumes the Job template applies a cronjob=nightly-backup label)
kubectl get jobs -l cronjob=nightly-backup

# Manually trigger the CronJob (create a Job immediately)
kubectl create job manual-backup --from=cronjob/nightly-backup

Suspend/Resume CronJob

# Suspend CronJob
kubectl patch cronjob nightly-backup -p '{"spec":{"suspend":true}}'

# Resume CronJob
kubectl patch cronjob nightly-backup -p '{"spec":{"suspend":false}}'

CronJob Considerations

Important Notes

  • Idempotency: Jobs should be idempotent (safe to run multiple times)
  • Missed runs: CronJobs may miss schedules if cluster is down
  • Timezone: Default is controller manager's timezone (use timeZone field)
  • Concurrency: Set appropriate concurrencyPolicy
  • History limits: Clean up old Jobs to avoid clutter

CronJob Best Practices

  • Make Jobs idempotent (can run multiple times safely)
  • Set concurrencyPolicy: Forbid for non-overlapping tasks
  • Use startingDeadlineSeconds to prevent backlogs
  • Keep history limits reasonable (3-7 successful, 1-3 failed)
  • Set resource limits on Job Pods
  • Monitor CronJob execution and failures
  • Test schedule syntax before deploying
  • Document what each CronJob does
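
The practices above can be combined into one spec. The following is a sketch, not a definitive template; the name, schedule, image, and resource values are illustrative:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report             # hypothetical name
spec:
  schedule: "30 1 * * *"           # nightly at 01:30
  concurrencyPolicy: Forbid        # no overlapping runs
  startingDeadlineSeconds: 300     # skip runs missed by more than 5 minutes
  successfulJobsHistoryLimit: 3    # keep history limits reasonable
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      backoffLimit: 2              # bounded retries per run
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: report
            image: report-tool:1.0     # hypothetical image
            resources:
              limits:                  # resource limits on Job Pods
                cpu: "250m"
                memory: 256Mi
```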
Lesson 3 of 4

RBAC: Role-Based Access Control

Understanding RBAC

RBAC (Role-Based Access Control) is the system implemented in Kubernetes to govern permissions, allowing cluster operators to strictly define what users and service accounts can and cannot do.

RBAC Purpose

RBAC manages the distribution of user rights and access to various components within the Kubernetes cluster:

  • Security: Prevent unauthorized access
  • Least privilege: Give minimum necessary permissions
  • Granular control: Fine-grained access policies
  • Audit: Track who can do what

RBAC Core Concepts

1. Subjects (Who)

Entities that can perform actions:

  • Users: Human users (developers, operators)
  • Groups: Collections of users
  • ServiceAccounts: Accounts for Pods/applications

2. Resources (What)

Kubernetes API resources:

  • Pods, Deployments, Services, ConfigMaps, Secrets
  • Namespaces, Nodes, PersistentVolumes
  • Custom Resources

3. Verbs (Actions)

Operations that can be performed:

  • get - Read individual resource
  • list - Read multiple resources
  • watch - Watch for changes
  • create - Create new resources
  • update - Modify existing resources
  • patch - Partially modify resources
  • delete - Delete resources
  • deletecollection - Delete multiple resources

RBAC Components

Role / ClusterRole
Defines WHAT actions are allowed on WHICH resources
+
RoleBinding / ClusterRoleBinding
Binds a Role to WHO (users, groups, service accounts)
=
Access Control
Subject can perform allowed actions

Role vs ClusterRole

Aspect    | Role                       | ClusterRole
Scope     | Namespace-specific         | Cluster-wide
Resources | Namespaced resources only  | All resources (including cluster-scoped)
Use Case  | Team/project access        | Admin access, cluster-scoped resources

Creating Roles

Example 1: Read-Only Access

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
  namespace: development
rules:
- apiGroups: [""]  # "" indicates the core API group
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch"]

Example 2: Deployment Manager

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployment-manager
  namespace: development
rules:
- apiGroups: ["apps"]
  resources: ["deployments", "replicasets"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch"]

Example 3: ClusterRole for Node Access

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: node-reader
rules:
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources: ["nodes/status"]
  verbs: ["get"]

Creating RoleBindings

Bind Role to User

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: development
subjects:
- kind: User
  name: jane@example.com
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io

Bind Role to Group

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developers-deployment-access
  namespace: development
subjects:
- kind: Group
  name: developers
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: deployment-manager
  apiGroup: rbac.authorization.k8s.io

Bind ClusterRole to ServiceAccount

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: monitoring-cluster-reader
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitoring
roleRef:
  kind: ClusterRole
  name: cluster-reader
  apiGroup: rbac.authorization.k8s.io

Real-World RBAC Example

Scenario: Multi-Environment Access Control

Organization has development and production environments:

  • Developers: Full access to development, read-only to production
  • Testers: Access only to development
  • DevOps: Full access to all environments
# Role: Full access in development
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: dev-full-access
  namespace: development
rules:
- apiGroups: ["*"]
  resources: ["*"]
  verbs: ["*"]
---
# Role: Read-only in production
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prod-read-only
  namespace: production
rules:
- apiGroups: ["*"]
  resources: ["*"]
  verbs: ["get", "list", "watch"]
---
# Binding: Developers to development (full access)
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developers-dev-access
  namespace: development
subjects:
- kind: Group
  name: developers
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: dev-full-access
  apiGroup: rbac.authorization.k8s.io
---
# Binding: Developers to production (read-only)
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: developers-prod-access
  namespace: production
subjects:
- kind: Group
  name: developers
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: prod-read-only
  apiGroup: rbac.authorization.k8s.io
---
# Binding: Testers to development only
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: testers-dev-access
  namespace: development
subjects:
- kind: Group
  name: testers
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: dev-full-access
  apiGroup: rbac.authorization.k8s.io

ServiceAccounts for Pods

Pods use ServiceAccounts to authenticate with the API server:

# Create ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app
  namespace: default
---
# Create Role for the ServiceAccount
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: configmap-reader
  namespace: default
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "list"]
---
# Bind Role to ServiceAccount
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: my-app-configmap-access
  namespace: default
subjects:
- kind: ServiceAccount
  name: my-app
  namespace: default
roleRef:
  kind: Role
  name: configmap-reader
  apiGroup: rbac.authorization.k8s.io
---
# Use the ServiceAccount in a Pod
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  serviceAccountName: my-app  # Use this ServiceAccount
  containers:
  - name: app
    image: my-app:1.0

Testing RBAC

# Check if you can perform an action
kubectl auth can-i create deployments --namespace=development

# Check for another user
kubectl auth can-i create deployments \
  --namespace=development \
  --as=jane@example.com

# List all permissions for a user
kubectl auth can-i --list --as=jane@example.com

Common RBAC Patterns

1. View Role (Read-Only)

kubectl create role viewer \
  --verb=get,list,watch \
  --resource=pods,services,deployments \
  --namespace=development

2. Edit Role (Read-Write)

kubectl create role editor \
  --verb=get,list,watch,create,update,patch,delete \
  --resource=pods,services,deployments,configmaps \
  --namespace=development

3. Admin Role (Full Access)

# Quote the wildcards so the shell does not glob-expand them
kubectl create role admin \
  --verb='*' \
  --resource='*' \
  --namespace=development

RBAC Best Practices

  • Least privilege: Grant minimum necessary permissions
  • Use Roles for namespaces: Not ClusterRoles when possible
  • Regular audits: Review permissions periodically
  • Groups over individuals: Manage group memberships
  • ServiceAccounts for Pods: Don't use default SA
  • Test before applying: Use kubectl auth can-i
  • Document permissions: Keep track of who has what access
  • Avoid wildcards: Be specific about resources and verbs
Lesson 4 of 4

DNS & Service Access Best Practices

Service Access Methods

There are multiple ways for applications to access Services in Kubernetes. Understanding the trade-offs is important for performance.

Method 1: DNS Round Robin

DNS Round Robin Drawbacks

While DNS seems like a natural choice, it has operational drawbacks for service access within the cluster.

How DNS Round Robin Works

# Service with 3 Pods
kubectl get svc webapp
NAME     TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)
webapp   ClusterIP   10.96.100.50   <none>        80/TCP

# DNS query returns multiple Pod IPs
# (this happens when the Service is headless, i.e. clusterIP: None;
#  a regular ClusterIP Service resolves to its single ClusterIP)
nslookup webapp.default.svc.cluster.local
Name:    webapp.default.svc.cluster.local
Address: 10.244.1.5    # Pod 1
Address: 10.244.2.8    # Pod 2
Address: 10.244.3.12   # Pod 3

# The client randomly selects one IP from the list

Problem 1: System Latency

Up to 20 Seconds of Latency

When a Pod goes down, it can take up to 20 seconds for all components to register the change and stop sending traffic to the failed Pod.

T = 0s
Pod Crashes
Pod 2 (10.244.2.8) becomes unavailable
T = 0-5s
Kubelet Detects Failure
Readiness probe fails, kubelet notifies API Server
T = 5-10s
API Server Updates Endpoints
Failed Pod removed from Service endpoints
T = 10-15s
kube-proxy Updates iptables
Network rules updated on all nodes
T = 15-20s
DNS Records Update
DNS server (CoreDNS) updates A records

Impact: During this 20-second window, clients may still try to connect to the failed Pod, resulting in connection errors.

Problem 2: DNS TTL Overhead

Constant DNS Query Load

When using DNS Round Robin, the Time-To-Live (TTL) for DNS records is often set to a short duration (e.g., 5 seconds) to quickly reflect changes.

This forces client applications to repeatedly send queries to the DNS server every 5 seconds, placing significant and unnecessary load on DNS infrastructure.

# Short TTL forces frequent DNS queries
# TTL = 5 seconds

Client application loop:
1. Query DNS for webapp.default.svc.cluster.local
2. Receive list of IPs (TTL: 5 seconds)
3. Use the IPs for requests
4. Wait 5 seconds
5. Query DNS again (refresh)
6. Repeat indefinitely

Load impact:
- 1 client:      12 queries/minute
- 100 clients:   1,200 queries/minute
- 1,000 clients: 12,000 queries/minute

# This places heavy load on:
# - Local CoreDNS pods
# - Upstream DNS servers
# - Network bandwidth

Method 2: ClusterIP with kube-proxy (Recommended)

More Viable Scheme

Relying on the cluster's internal network address translation (NAT), typically managed by kube-proxy, is a more viable scheme for service resolution compared to the high overhead and latency risks of DNS Round Robin.

How ClusterIP/kube-proxy Works

1. Client Uses Service Name
Application connects to webapp.default.svc.cluster.local
2. DNS Returns ClusterIP (Once)
DNS query returns stable ClusterIP: 10.96.100.50
TTL can be long (30s+) since IP doesn't change
3. Client Sends to ClusterIP
All requests go to 10.96.100.50:80
4. kube-proxy Intercepts (iptables)
iptables rules on local node intercept traffic
5. Load Balanced to Healthy Pod
Traffic routed to one of the healthy Pods
Failed Pods automatically excluded

Benefits of ClusterIP/kube-proxy

  • Faster failover: kube-proxy updates iptables rules immediately when Endpoints change
  • Lower DNS load: ClusterIP is stable, so DNS queries can have long TTL
  • Automatic load balancing: iptables rules distribute traffic
  • No client-side logic: Transparent to application
  • Better performance: Kernel-level routing (iptables/IPVS)

Comparison: DNS Round Robin vs ClusterIP

Aspect           | DNS Round Robin           | ClusterIP (kube-proxy)
Failover Time    | Up to 20 seconds          | Near-instant (seconds)
DNS Queries      | Frequent (every TTL)      | Infrequent (stable IP)
Load on DNS      | High (short TTL)          | Low (long TTL)
Load Balancing   | Client-side random pick   | Kernel-level iptables/IPVS
Health Awareness | Delayed (TTL-dependent)   | Immediate (Endpoints-driven)
Recommendation   | Not recommended           | ✓ Recommended

Best Practices for Service Access

Recommended Approach

  1. Use Service DNS names in application config
  2. Let kube-proxy handle routing via ClusterIP
  3. Configure readiness probes for fast failure detection
  4. Use headless Services only when you need direct Pod access
  5. Monitor Service health and Pod availability
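
Readiness probes (step 3) are what let the Endpoints controller drop a failed Pod quickly. A minimal sketch follows; the /healthz path, port, and timing values are assumptions about the application, not fixed requirements:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: webapp
  labels:
    app: webapp           # matched by the Service selector
spec:
  containers:
  - name: app
    image: webapp:1.0
    ports:
    - containerPort: 8080
    readinessProbe:
      httpGet:
        path: /healthz    # assumed health endpoint
        port: 8080
      periodSeconds: 5    # probe every 5 seconds
      failureThreshold: 2 # ~10s until the Pod is marked NotReady
                          # and removed from the Service Endpoints
```

Tighter periodSeconds and failureThreshold values shorten the failover window at the cost of more probe traffic.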

When to Use Each Method

# Use ClusterIP (default - recommended)
apiVersion: v1
kind: Service
metadata:
  name: webapp
spec:
  type: ClusterIP  # or omit (default)
  selector:
    app: webapp
  ports:
  - port: 80
    targetPort: 8080

# Application connects to: webapp.default.svc.cluster.local
# kube-proxy handles load balancing automatically
---
# Use a headless Service (special cases only)
apiVersion: v1
kind: Service
metadata:
  name: database
spec:
  clusterIP: None  # Headless
  selector:
    app: database
  ports:
  - port: 5432

# Use when:
# - A StatefulSet needs direct Pod access
# - Custom client-side load balancing is required
# - Service discovery without kube-proxy is needed

Performance Optimization

# 1. Configure a longer DNS TTL for stable Services
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            ttl 30   # Longer TTL is safe since ClusterIPs are stable
        }
        forward . /etc/resolv.conf
        cache 30
    }

# 2. Use IPVS instead of iptables for better performance
kubectl edit configmap kube-proxy -n kube-system
# Set mode: ipvs

# 3. Configure application connection pools properly
# - Reuse connections
# - Handle connection failures gracefully
# - Implement retry logic with exponential backoff

Summary: Service Access

  • Use ClusterIP Services for internal service-to-service communication
  • Let kube-proxy handle load balancing and failover
  • Avoid DNS Round Robin for performance reasons
  • Configure readiness probes for fast failure detection
  • Use headless Services only when necessary (StatefulSets)
  • Monitor DNS and network performance
Final Assessment

Test Your Knowledge

Jobs, CronJobs & RBAC Quiz

Question 1: What is the key difference between a Job and a Deployment?

Question 2: What exit code indicates successful Job completion?

Question 3: What is the purpose of CronJobs?

Question 4: What does RBAC stand for?

Question 5: What is the maximum latency when a Pod fails in DNS Round Robin?

Question 6: Why is DNS Round Robin problematic for service access?

Question 7: What is the recommended method for service access in Kubernetes?

Question 8: Jobs can be used in CI/CD pipelines to automatically create what?