Distributed Databases & Failover Crash Course

Master Database Scalability, Reliability, and High Availability

Lesson 1 of 6

Introduction to Distributed Databases

What is a Distributed Database?

A distributed database is a database system that stores data across multiple physical locations, either on the same network or across different geographic locations. Unlike traditional centralized databases, data is distributed across multiple nodes, servers, or data centers.

Why Distributed Databases?

Modern applications require distributed databases to address several critical challenges:

  • Scalability: Handle massive amounts of data and traffic by distributing load
  • High Availability: Ensure service continuity even when components fail
  • Performance: Reduce latency by placing data closer to users
  • Fault Tolerance: Survive hardware failures and disasters
  • Geographic Distribution: Serve users globally with low latency

Types of Distributed Databases

1. Homogeneous Distributed Databases

All sites use the same database management system (DBMS) and appear as a single logical database to users.

  • Same DBMS across all nodes
  • Easier to manage and maintain
  • Example: Multiple PostgreSQL instances synchronized together

2. Heterogeneous Distributed Databases

Different sites may use different DBMSs with varying schemas and data models.

  • Different DBMSs at different locations
  • More complex integration
  • Example: MySQL at one site, Oracle at another, integrated through middleware

Distributed Database Architecture

Architecture  | Description                                           | Use Case
--------------|-------------------------------------------------------|----------------------------------
Client-Server | Clients connect to one or more database servers       | Web applications, mobile apps
Peer-to-Peer  | Each node is both client and server                   | Blockchain, decentralized systems
Multi-tier    | Multiple layers (presentation, application, database) | Enterprise applications

Key Challenges in Distributed Databases

1. Data Distribution

Deciding how to distribute data across nodes while maintaining performance and consistency.

2. Transaction Management

Ensuring ACID properties across multiple nodes is complex and requires sophisticated protocols.

3. Network Partitioning

Handling scenarios where network failures split the database into isolated segments.

4. Data Consistency

Maintaining consistent data across all replicas when updates occur.

CAP Theorem Preview

The CAP theorem states that distributed databases can only guarantee two of three properties: Consistency, Availability, and Partition Tolerance. We'll explore this in detail in Lesson 4.

Popular Distributed Database Systems

Database        | Type                  | Key Features
----------------|-----------------------|---------------------------------------
Cassandra       | NoSQL (Column-family) | High availability, linear scalability
MongoDB         | NoSQL (Document)      | Flexible schema, horizontal scaling
CockroachDB     | SQL (Relational)      | ACID compliant, geo-distribution
Amazon DynamoDB | NoSQL (Key-value)     | Managed service, auto-scaling
Google Spanner  | SQL (Relational)      | Global consistency, SQL support

Basic Concepts

Node

An individual database server or instance in the distributed system. Each node stores a portion of the data.

Replica

A copy of data stored on multiple nodes to ensure availability and fault tolerance.

Shard

A horizontal partition of data. Each shard contains a subset of the total data.

Cluster

A collection of interconnected database nodes working together as a single system.

Real-World Example

Netflix uses Apache Cassandra with hundreds of nodes distributed across multiple AWS regions. This allows them to handle billions of requests per day, survive data center failures, and provide low-latency access to users worldwide.

Test Your Knowledge - Lesson 1

Answer the following questions to proceed. You need at least 2 of 3 correct to pass.

Question 1: What is a key benefit of distributed databases?

Question 2: What is a shard in distributed databases?

Question 3: Which database is known for being ACID compliant with geo-distribution?

Lesson 2 of 6

Replication Strategies

What is Database Replication?

Database replication is the process of copying and maintaining database objects (like tables and databases) in multiple database servers. This creates redundant copies of data to improve availability, fault tolerance, and read performance.

Benefits of Replication

  • High Availability: If one server fails, others can continue serving requests
  • Load Distribution: Read queries can be distributed across replicas
  • Disaster Recovery: Data is preserved even if one location is destroyed
  • Geographic Distribution: Place data closer to users for lower latency

Replication Architectures

1. Master-Slave Replication (Primary-Secondary)

One primary node handles all writes, and changes are replicated to one or more secondary nodes that handle reads.

# Master Node (Primary)
- Accepts all write operations
- Propagates changes to slaves
- Can also serve read requests

# Slave Nodes (Secondaries)
- Receive data from master
- Serve read-only requests
- Cannot accept write operations

Example: MySQL Master-Slave Setup
------------------------------------
Master: 192.168.1.10 (writes)
Slave1: 192.168.1.11 (reads)
Slave2: 192.168.1.12 (reads)

Advantages:

  • Simple to implement and understand
  • Scales read operations effectively
  • Reduced load on the master for read-heavy workloads

Disadvantages:

  • Single point of failure for writes (master)
  • Replication lag may cause stale reads
  • Limited write scalability

2. Master-Master Replication (Multi-Master)

Multiple nodes can accept both read and write operations, with changes synchronized between all masters.

# All nodes are equal
Master1: 192.168.1.10 (reads & writes)
Master2: 192.168.1.11 (reads & writes)
Master3: 192.168.1.12 (reads & writes)

# Bidirectional synchronization
Master1 <---> Master2 <---> Master3

Advantages:

  • No single point of failure
  • Scales both reads and writes
  • Better availability

Disadvantages:

  • Complex conflict resolution needed
  • Risk of write conflicts
  • More difficult to maintain consistency

3. Peer-to-Peer Replication

All nodes are equal peers, each can read and write. Used in NoSQL databases like Cassandra.

# Cassandra-style architecture
Node1 <---> Node2
  ^           ^
  |           |
  v           v
Node4 <---> Node3

- No master node
- Data distributed via consistent hashing
- Replication factor determines copies

Synchronous vs Asynchronous Replication

Synchronous Replication

Write operations must be confirmed by replicas before completing the transaction.

# Synchronous Write Flow
1. Client sends write to primary
2. Primary writes to local storage
3. Primary sends to all replicas
4. Replicas acknowledge receipt
5. Primary confirms to client ← Only then!

Latency: Higher
Consistency: Strong
Data Loss Risk: Very Low

Use When: Data consistency is critical (financial transactions, inventory management)

Asynchronous Replication

Write operations are confirmed immediately after being written to the primary, replicas updated later.

# Asynchronous Write Flow
1. Client sends write to primary
2. Primary writes to local storage
3. Primary confirms to client ← Immediately!
4. Primary sends to replicas (background)
5. Replicas eventually receive updates

Latency: Lower
Consistency: Eventual
Data Loss Risk: Higher

Use When: Performance is critical and you can tolerate eventual consistency (social media feeds, caching)

Replication Lag

Replication lag is the delay between when data is written to the primary and when it appears on replicas.

Handling Replication Lag

Strategies to mitigate replication lag issues:

  • Read-your-writes consistency: Route user's reads to primary after they write
  • Monotonic reads: Always read from the same replica for a session
  • Monitoring: Track lag metrics and alert when exceeding thresholds
  • Connection pooling: Use smart routing to direct traffic appropriately
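The first two strategies above can be combined in a small routing layer. A minimal sketch, assuming a hypothetical `ReadRouter` class and node names; the sticky-window length is an assumption you would tune to your worst-case replication lag:

```python
import time

STICKY_WINDOW = 5.0  # seconds; tune to your worst-case replication lag

class ReadRouter:
    """After a user writes, route that user's reads to the primary for a
    short window so they always see their own writes (read-your-writes)."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        self.last_write = {}  # user_id -> timestamp of that user's last write

    def record_write(self, user_id):
        self.last_write[user_id] = time.time()

    def pick_read_node(self, user_id):
        # Route to primary while the user's write may still be replicating.
        if time.time() - self.last_write.get(user_id, 0) < STICKY_WINDOW:
            return self.primary
        # Otherwise pin the user to one replica (also gives monotonic reads).
        return self.replicas[hash(user_id) % len(self.replicas)]
```

Pinning a user to a single replica by hashing the user id is what makes the reads monotonic: the same session never flips between replicas with different lag.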

Replication Topologies

Topology | Structure                          | Best For
---------|------------------------------------|----------------------------
Star     | Central master, multiple slaves    | Simple setups, read scaling
Chain    | Master → Slave1 → Slave2 → Slave3  | Geographic distribution
Ring     | Circular replication between nodes | Multi-master scenarios
Mesh     | All nodes replicate to all others  | Maximum redundancy

Real-World Example: Instagram

Instagram uses PostgreSQL with master-slave replication. They use the master for writes and route read queries to multiple read replicas. During peak times, they can have 12+ read replicas to handle billions of read requests per day.

PostgreSQL Replication Example

# On Primary Server (postgresql.conf)
wal_level = replica
max_wal_senders = 10
wal_keep_size = 1GB

# Create replication user
CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'secret';

# On Replica Server
# (PostgreSQL 12+ replaced recovery.conf: create an empty standby.signal
#  file and put primary_conninfo in postgresql.conf)
standby_mode = 'on'    # pre-12 recovery.conf only
primary_conninfo = 'host=192.168.1.10 port=5432 user=replicator password=secret'
trigger_file = '/tmp/postgresql.trigger.5432'    # creating this file promotes the standby

# Start replica server - it will sync from primary

Test Your Knowledge - Lesson 2

Answer the following questions to proceed. You need at least 2 of 3 correct to pass.

Question 1: In Master-Slave replication, where are write operations handled?

Question 2: What is the main advantage of asynchronous replication over synchronous?

Question 3: What is replication lag?

Lesson 3 of 6

Partitioning and Sharding

What is Data Partitioning?

Data partitioning is the process of dividing a large database into smaller, more manageable pieces called partitions. Each partition contains a subset of the data and can be stored on different servers or storage devices.

Why Partition Data?

  • Scalability: Distribute data across multiple servers to handle more data
  • Performance: Smaller datasets mean faster queries
  • Availability: Partition failures don't affect entire database
  • Manageability: Easier to backup, restore, and maintain smaller partitions

Partitioning vs Sharding

While often used interchangeably, there's a subtle difference:

  • Partitioning: Splitting data within a single database instance (vertical or horizontal)
  • Sharding: Horizontal partitioning across multiple database instances/servers

Types of Partitioning

1. Horizontal Partitioning (Sharding)

Dividing rows of a table across multiple partitions. Each partition has the same schema but different rows.

# Original Users Table (1M rows)
users (user_id, name, email, country, created_at)

# After Horizontal Partitioning:
Shard 1: users_1 (user_id 1-250,000)
Shard 2: users_2 (user_id 250,001-500,000)
Shard 3: users_3 (user_id 500,001-750,000)
Shard 4: users_4 (user_id 750,001-1,000,000)

Each shard: Same columns, different rows

2. Vertical Partitioning

Dividing columns of a table across multiple partitions. Each partition has different columns but the same rows.

# Original Users Table
users (user_id, name, email, address, phone, preferences, settings)

# After Vertical Partitioning:
users_basic (user_id, name, email)
users_contact (user_id, address, phone)
users_preferences (user_id, preferences, settings)

Each partition: Different columns, same user_ids

Use Case: Separate frequently accessed columns from rarely accessed ones.

Sharding Strategies

1. Range-Based Sharding

Partition data based on ranges of a key (e.g., date ranges, alphabetical ranges).

# Sharding by User ID Range
Shard 1: user_id 1 - 1,000,000
Shard 2: user_id 1,000,001 - 2,000,000
Shard 3: user_id 2,000,001 - 3,000,000

# Sharding by Date Range
Shard 1: orders from 2023
Shard 2: orders from 2024
Shard 3: orders from 2025

Advantages: Simple to implement, easy to add new ranges

Disadvantages: Can lead to unbalanced shards (hotspots)

2. Hash-Based Sharding

Use a hash function on a shard key to determine which shard stores the data.

# Hash function determines shard
shard_number = hash(user_id) % number_of_shards

Example:
user_id: 12345
hash(12345) = 98765
98765 % 4 = 1 → Store in Shard 1

user_id: 67890
hash(67890) = 45678
45678 % 4 = 2 → Store in Shard 2

Advantages: Even data distribution, no hotspots

Disadvantages: Difficult to add/remove shards (requires re-hashing)
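The re-hashing pain can be shown in a few lines of Python. This is an illustrative sketch: `shard_for` is our own helper, and md5 stands in for whatever stable hash a real system would use (Python's built-in `hash()` changes between runs, so it is unsuitable for sharding):

```python
import hashlib

def shard_for(key, num_shards):
    # Stable hash of the key, reduced to a shard index by modulo.
    digest = int(hashlib.md5(str(key).encode()).hexdigest(), 16)
    return digest % num_shards

keys = range(10_000)
before = {k: shard_for(k, 3) for k in keys}  # 3 shards
after = {k: shard_for(k, 4) for k in keys}   # add a 4th shard
moved = sum(1 for k in keys if before[k] != after[k])
# With plain modulo hashing, roughly 3/4 of keys land on a different
# shard when going from 3 to 4 shards.
print(f"{moved / 10_000:.0%} of keys changed shards")
```

This is exactly the problem consistent hashing (below) was designed to avoid.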

3. Directory-Based Sharding

Use a lookup table (directory) that maps keys to shards.

# Shard Directory
{
  "user_1000": "shard_1",
  "user_2000": "shard_2",
  "user_3000": "shard_1",
  "region_US": "shard_1",
  "region_EU": "shard_2",
  "region_ASIA": "shard_3"
}

Advantages: Flexible, can use complex logic

Disadvantages: Directory is a single point of failure, needs maintenance
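In code, the directory is little more than a replicated map. A toy sketch (the `SHARD_DIRECTORY` contents, function name, and fallback shard are hypothetical):

```python
# Hypothetical shard directory; in production this map would be
# replicated/cached, since it sits on the critical path of every query.
SHARD_DIRECTORY = {
    "region_US": "shard_1",
    "region_EU": "shard_2",
    "region_ASIA": "shard_3",
}

def shard_for_region(region_key, default="shard_1"):
    # Unknown keys fall back to a default shard rather than failing.
    return SHARD_DIRECTORY.get(region_key, default)
```

The flexibility comes from the fact that the map can encode arbitrary placement decisions, which no hash function can.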

4. Geographic Sharding

Partition data based on geographic location for lower latency.

# Geographic Sharding
Shard_US_EAST: Users in US East Coast
Shard_US_WEST: Users in US West Coast
Shard_EU:      Users in Europe
Shard_ASIA:    Users in Asia

Benefit: Data close to users = lower latency

Consistent Hashing

An advanced hashing technique that minimizes data movement when adding or removing shards.

# Traditional hashing problem:
- 3 shards → hash(key) % 3
- Add 4th shard → hash(key) % 4
- Result: Most keys move to different shards!

# Consistent hashing solution:
- Uses a hash ring (0 to 2^32-1)
- Shards placed on ring at hashed positions
- Keys placed on ring, assigned to next shard clockwise
- Adding/removing a shard only affects immediate neighbors
- Used by: Cassandra, DynamoDB, Memcached
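A minimal Python sketch of the hash ring, assuming one position per node (class and helper names are ours; production systems such as Cassandra add many "virtual nodes" per server for smoother balance):

```python
import bisect
import hashlib

def _position(value):
    # Stable position on the ring for a node name or a key.
    return int(hashlib.md5(str(value).encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes):
        self.ring = sorted((_position(n), n) for n in nodes)

    def node_for(self, key):
        positions = [p for p, _ in self.ring]
        # First node clockwise from the key's position, wrapping to 0.
        i = bisect.bisect_left(positions, _position(key)) % len(self.ring)
        return self.ring[i][1]

    def add_node(self, node):
        bisect.insort(self.ring, (_position(node), node))

ring = HashRing(["node-A", "node-B", "node-C"])
before = {k: ring.node_for(k) for k in range(1000)}
ring.add_node("node-D")
after = {k: ring.node_for(k) for k in range(1000)}
moved = [k for k in before if before[k] != after[k]]
# Only keys in the new node's arc move, and they all move *to* node-D.
```

Contrast with the modulo approach: here the only keys that relocate are the ones the new node takes over from its clockwise neighbor.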

Sharding Challenges

  • Cross-shard queries: Queries spanning multiple shards are expensive
  • Shard rebalancing: Moving data between shards is complex
  • Hotspots: Uneven access patterns can overload specific shards
  • Transactions: Distributed transactions across shards are complex
  • Application complexity: Application must be shard-aware

Choosing a Shard Key

The shard key determines how data is distributed. A good shard key:

  • Has high cardinality: many unique values for even distribution
  • Avoids hotspots: no single value is accessed far more than others
  • Is query-friendly: most queries can target a single shard
  • Is immutable: it shouldn't change after record creation
# Good Shard Keys:
✓ user_id (high cardinality, even distribution)
✓ order_id (high cardinality, immutable)
✓ tenant_id (for multi-tenant apps)

# Bad Shard Keys:
✗ status (low cardinality: 'active', 'inactive')
✗ country (uneven: USA might have 80% of users)
✗ timestamp (creates hotspot on latest shard)

Real-World Example: MongoDB Sharding

MongoDB automatically shards collections across multiple servers. For example, Foursquare uses MongoDB sharding to manage billions of check-ins. They shard by user_id to ensure each user's data is on one shard, making queries efficient.

MongoDB Sharding Example

# Enable sharding on database
use admin
db.runCommand({ enableSharding: "mydb" })

# Shard a collection by user_id
db.runCommand({
  shardCollection: "mydb.users",
  key: { user_id: "hashed" }
})

# MongoDB automatically:
- Distributes data across shards
- Routes queries to appropriate shards
- Rebalances data when adding shards

Test Your Knowledge - Lesson 3

Answer the following questions to proceed. You need at least 2 of 3 correct to pass.

Question 1: What is horizontal partitioning?

Question 2: What is the main advantage of hash-based sharding?

Question 3: Which is a good characteristic for a shard key?

Lesson 4 of 6

Consistency Models & CAP Theorem

The CAP Theorem

The CAP theorem, proposed by Eric Brewer, states that a distributed database system can only guarantee two of the following three properties simultaneously:

The Three Properties

  • Consistency (C): All nodes see the same data at the same time
  • Availability (A): Every request receives a response (success or failure)
  • Partition Tolerance (P): System continues operating despite network partitions

In reality, network partitions are inevitable in distributed systems, so you must choose between Consistency and Availability.

CAP Trade-offs

CP Systems (Consistency + Partition Tolerance)

Sacrifice availability to maintain consistency during network partitions.

# CP System Behavior During Partition:
- Nodes cannot communicate with each other
- System rejects writes to maintain consistency
- Reads may be rejected or delayed
- System becomes unavailable until partition resolves

Examples:
- MongoDB (with strong consistency settings)
- HBase
- Redis (with WAIT command)
- ZooKeeper

Use When: Data correctness is critical (banking, inventory)

AP Systems (Availability + Partition Tolerance)

Sacrifice consistency to maintain availability during network partitions.

# AP System Behavior During Partition:
- Nodes accept writes independently
- System remains available
- Different nodes may have different data
- Eventually converges after partition resolves

Examples:
- Cassandra
- DynamoDB
- Couchbase
- Riak

Use When: Availability is critical (social media, analytics)

CA Systems (Consistency + Availability)

In practice, CA systems don't exist in truly distributed environments because network partitions are inevitable.

CA Systems Reality Check

Traditional relational databases (MySQL, PostgreSQL) running on a single node can be considered CA, but they're not truly distributed. Once you distribute them, you must handle partitions, moving into CP or AP territory.

Consistency Models

1. Strong Consistency (Linearizability)

Once a write completes, all subsequent reads will see that write or a later one.

# Strong Consistency Example:
Time T1 - Client A writes X=1
Time T2 - Write completes
Time T3 - Client B reads X → Must see X=1 or a later value

Guarantees: All clients see the same value at the same time
Latency: Higher (must wait for replication)
Use Case: Financial transactions, inventory systems

2. Eventual Consistency

If no new updates are made, all replicas will eventually converge to the same value.

# Eventual Consistency Example:
Time T1 - Client A writes X=1 to Node 1
Time T2 - Client B reads from Node 2 → May see old value X=0
Time T3 - Replication completes
Time T4 - Client B reads from Node 2 → Now sees X=1

Guarantees: Eventually consistent, but reads may be stale
Latency: Lower (don't wait for replication)
Use Case: Social media feeds, caching, analytics

3. Causal Consistency

Writes that are causally related are seen by all nodes in the same order.

# Causal Consistency Example:
Alice posts: "I love distributed databases!"
Bob replies: "Me too!"

All users must see:
1. Alice's post first
2. Bob's reply second

But unrelated posts can appear in any order relative to these.

4. Read-Your-Writes Consistency

A client will always see their own writes in subsequent reads.

# Read-Your-Writes Example:
User updates profile: name = "John Doe"
User immediately views profile → Must see "John Doe"

Implementation: Route the user's reads to the primary or use session affinity

5. Monotonic Reads

Once a client reads a value, it will never see an older value in subsequent reads.

# Monotonic Reads Example:
Time T1 - User reads X=5 from Replica 1
Time T2 - User reads X from Replica 2 → Must see X≥5

Prevents: Time going backwards from the user's perspective

ACID vs BASE

ACID (Traditional Databases)

Property    | Description
------------|--------------------------------------------------------------
Atomicity   | All or nothing: a transaction fully completes or fully fails
Consistency | Database moves from one valid state to another
Isolation   | Concurrent transactions don't interfere with each other
Durability  | Completed transactions survive system failures

BASE (Distributed NoSQL)

Property             | Description
---------------------|--------------------------------------------------------
Basically Available  | System guarantees availability (CAP theorem)
Soft state           | State may change over time without input (convergence)
Eventual consistency | System becomes consistent over time

Quorum-Based Consistency

Use voting to balance consistency and availability. Common in distributed databases like Cassandra and DynamoDB.

# Quorum Formula:
N = Total number of replicas
W = Write quorum (nodes that must acknowledge a write)
R = Read quorum (nodes queried for a read)

Strong Consistency: W + R > N
Examples:
- N=3, W=2, R=2 (2+2>3) → Strong consistency
- N=3, W=3, R=1 (3+1>3) → Strong consistency, slow writes
- N=3, W=1, R=3 (1+3>3) → Strong consistency, slow reads

Eventual Consistency: W + R ≤ N
Examples:
- N=3, W=1, R=1 (1+1≤3) → Fast but eventually consistent
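The quorum rule reduces to a one-line overlap check: if W + R > N, every read quorum must intersect every write quorum, so a read always touches at least one replica that saw the latest write. A small sketch (the function name is ours):

```python
def is_strongly_consistent(n, w, r):
    """True when read and write quorums must overlap (W + R > N)."""
    return w + r > n

# The examples from the formula above:
assert is_strongly_consistent(3, 2, 2)      # balanced quorum
assert is_strongly_consistent(3, 3, 1)      # slow writes, fast reads
assert not is_strongly_consistent(3, 1, 1)  # fast but eventually consistent
```

A common default is the majority quorum W = R = N // 2 + 1, which satisfies the inequality for any N.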

Cassandra Consistency Levels

Cassandra allows you to tune consistency per query:

  • ONE: 1 node responds (fastest, least consistent)
  • QUORUM: Majority of nodes respond
  • ALL: All nodes respond (slowest, most consistent)
  • LOCAL_QUORUM: Quorum in local datacenter

Real-World Example: Amazon

Amazon's DynamoDB uses eventual consistency by default for high availability. However, critical operations like checkout and payment use strong consistency settings to prevent inventory overselling and payment errors.

Conflict Resolution

In eventually consistent systems, conflicts can occur when the same data is updated on multiple nodes simultaneously.

Conflict Resolution Strategies:

  • Last Write Wins (LWW): Use timestamps, newest write wins (simple but can lose data)
  • Version Vectors: Track causality, identify true conflicts
  • Application-level resolution: Let application decide (e.g., shopping cart merge)
  • CRDTs: Conflict-free Replicated Data Types that merge automatically
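Last-Write-Wins can be sketched in a few lines; note how the merge silently drops the losing write, which is exactly the data-loss risk mentioned above (`lww_merge` and the timestamps are illustrative):

```python
def lww_merge(a, b):
    """a, b are (timestamp, value) pairs for the same key on two replicas.
    The pair with the newer timestamp wins; the other write is discarded."""
    return a if a[0] >= b[0] else b

# Two replicas accepted different writes during a partition:
replica1 = (1700000005, "alice@new.example")  # later write
replica2 = (1700000001, "alice@old.example")  # earlier write
winner = lww_merge(replica1, replica2)
print(winner[1])  # the later write survives; the earlier one is lost
```

Version vectors and CRDTs exist precisely because this discard step is sometimes unacceptable.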

Test Your Knowledge - Lesson 4

Answer the following questions to proceed. You need at least 2 of 3 correct to pass.

Question 1: According to CAP theorem, which two properties can be guaranteed together in the presence of network partitions?

Question 2: What does eventual consistency guarantee?

Question 3: For strong consistency in a quorum system with N=3 replicas, which setting works?

Lesson 5 of 6

Database Failover Mechanisms

What is Database Failover?

Database failover is the automatic or manual process of switching to a standby database server when the primary server fails. It's a critical component of high-availability database systems.

Goals of Failover

  • Minimize Downtime: Restore service as quickly as possible
  • Prevent Data Loss: Ensure no committed transactions are lost
  • Maintain Consistency: Keep data consistent across nodes
  • Transparent to Users: Users shouldn't notice the switch

Types of Failover

1. Automatic Failover

System automatically detects failure and promotes a standby without human intervention.

# Automatic Failover Flow:
1. Health check detects primary failure
2. Failover coordinator verifies failure
3. Standby is automatically promoted to primary
4. DNS/load balancer updated to point to new primary
5. Clients reconnect to new primary
6. Old primary joins as standby when recovered

Time to Failover: Seconds to minutes
Risk: Potential split-brain scenarios
Tools: Patroni, MySQL Group Replication, MongoDB Replica Sets

2. Manual Failover

Database administrator manually triggers failover after assessing the situation.

# Manual Failover Flow:
1. DBA is alerted to primary failure
2. DBA investigates the cause
3. DBA manually promotes standby
4. DBA updates application configuration
5. DBA monitors the new primary

Time to Failover: Minutes to hours
Risk: Longer downtime
Benefit: More control, reduced risk of split-brain

3. Planned Failover (Switchover)

Controlled failover during maintenance windows with zero data loss.

# Planned Failover Flow:
1. Ensure standby is fully synchronized
2. Stop writes to primary
3. Wait for replication to complete
4. Verify standby is caught up
5. Promote standby to primary
6. Redirect traffic to new primary
7. Demote old primary to standby

Downtime: Minimal (seconds)
Data Loss: Zero
Use Case: Upgrades, maintenance, migrations

Failure Detection

Health Check Methods:

  • Heartbeat: Regular "alive" signals between nodes
  • TCP Connection: Monitor database port connectivity
  • Query-based: Execute test queries periodically
  • Replication lag: Monitor replication delay
#!/bin/bash
# PostgreSQL health check script

pg_isready -h primary-db.example.com -p 5432 -U postgres
# Exit codes:
# - 0: accepting connections
# - 1: rejecting connections
# - 2: no response
# - 3: no attempt (bad parameters)

# Advanced check with a real query
if psql -h primary-db.example.com -U postgres -c "SELECT 1" > /dev/null 2>&1; then
  echo "Primary is healthy"
else
  echo "Primary is down - trigger failover"
fi

Split-Brain Problem

A dangerous scenario where two nodes both think they're the primary, potentially causing data divergence and corruption.

Split-Brain Scenario

# What Happens:
1. Network partition separates primary from standby
2. Standby can't reach primary, thinks it's down
3. Standby promotes itself to primary
4. Now TWO primaries exist!
5. Both accept writes → data divergence
6. When the network heals, conflict resolution is needed

Prevention:
- Use quorum-based systems
- Implement fencing/STONITH
- Use distributed consensus (Raft, Paxos)
- Witness servers for tie-breaking
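One of the prevention measures above, witness-based tie-breaking, can be sketched as follows. The witness probes are stubbed as booleans here; a real system would check reachability over the network and also fence the old primary before promoting:

```python
def should_promote(witness_votes):
    """witness_votes: list of booleans, True = 'primary looks down'.
    Promote only if a strict majority of witnesses agree, so a standby
    that is merely partitioned off cannot promote itself."""
    majority = len(witness_votes) // 2 + 1
    return sum(witness_votes) >= majority

# Two of three witnesses see the primary as down -> safe to promote.
print(should_promote([True, True, False]))   # True
# Only one witness agrees (likely our own network issue) -> do not promote.
print(should_promote([True, False, False]))  # False
```

The majority requirement is the same idea that makes quorum-based consensus (next section) immune to split-brain.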

Fencing and STONITH

Techniques to prevent split-brain by forcibly isolating or shutting down the old primary.

# Fencing Methods:
1. Network Fencing: Block old primary's network access
2. Power Fencing: Power off old primary
3. STONITH: "Shoot The Other Node In The Head"
4. Disk Fencing: Revoke storage access

Example with STONITH:
- Standby detects primary failure
- Before promoting, standby sends power-off to old primary
- Ensures old primary cannot accept writes
- Safe to promote standby

Consensus Algorithms

Raft Consensus

A consensus algorithm used in many distributed databases for leader election and maintaining consistency.

# Raft Leader Election:
1. Nodes start as followers
2. On leader heartbeat timeout, a follower becomes a candidate
3. Candidate requests votes from other nodes
4. Node with majority votes becomes leader
5. Leader handles all writes
6. If the leader fails, the process repeats

Used by: etcd, Consul, CockroachDB

Benefits:
- Guaranteed single leader per term
- Handles network partitions correctly
- No split-brain possible
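The election steps above can be sketched in Python. The key invariant is that each node votes at most once per term, so two candidates can never both reach a majority in the same term. This is a simplified sketch with names of our own choosing; real Raft also compares log freshness before granting a vote:

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.voted_in_term = {}  # term -> candidate this node voted for

    def request_vote(self, term, candidate):
        # Grant the vote only if we have not voted in this term yet.
        if term not in self.voted_in_term:
            self.voted_in_term[term] = candidate
            return True
        return self.voted_in_term[term] == candidate

def run_election(candidate, term, cluster):
    votes = sum(node.request_vote(term, candidate) for node in cluster)
    return votes >= len(cluster) // 2 + 1  # strict majority wins

cluster = [Node(n) for n in ("n1", "n2", "n3")]
print(run_election("n1", term=1, cluster=cluster))  # True: n1 wins term 1
print(run_election("n2", term=1, cluster=cluster))  # False: votes already cast
```

Because the votes for term 1 are spent, a competing candidate must start a new term, which is how Raft avoids two simultaneous leaders.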

Paxos Consensus

An older, more complex consensus algorithm that Raft was designed to improve upon.

Used by: Google Spanner; Apache ZooKeeper uses ZAB, a closely related protocol. More complex than Raft, but proven correct and battle-tested. Standard Paxos assumes crash failures; tolerating Byzantine (malicious) nodes requires extended variants.

Failover Patterns by Database

PostgreSQL with Patroni

# Patroni Configuration (patroni.yml)
scope: postgres-cluster
namespace: /db/
name: postgres-node-1

etcd:
  host: etcd-cluster:2379

bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576

postgresql:
  data_dir: /var/lib/postgresql/data
  parameters:
    max_connections: 100
    shared_buffers: 256MB

# Patroni handles:
- Automatic failover
- Leader election via etcd
- Health checks
- Replication configuration

MySQL Group Replication

# MySQL Group Replication
- Multi-master replication
- Automatic failover
- Conflict detection and resolution
- Majority-based decision making

# Configuration
[mysqld]
gtid_mode = ON
enforce_gtid_consistency = ON
plugin_load_add = 'group_replication.so'
group_replication_group_name = "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"
group_replication_start_on_boot = OFF
group_replication_local_address = "192.168.1.10:33061"
group_replication_group_seeds = "192.168.1.10:33061,192.168.1.11:33061"

MongoDB Replica Sets

# MongoDB Automatic Failover
rs.initiate({
  _id: "myReplicaSet",
  members: [
    { _id: 0, host: "mongodb0.example.net:27017" },
    { _id: 1, host: "mongodb1.example.net:27017" },
    { _id: 2, host: "mongodb2.example.net:27017" }
  ]
})

# Automatic behaviors:
- Elections use majority vote
- Failover typically completes in 10-30 seconds
- Primary election requires a majority of voting members
- Arbiter nodes can vote without holding data

Monitoring Failover Systems

Key Metrics to Monitor:

Metric              | Description                            | Alert Threshold
--------------------|----------------------------------------|----------------
Replication Lag     | Time delay between primary and standby | > 10 seconds
Health Check Status | Is the primary responding?             | Any failure
Connection Count    | Active database connections            | > 80% of max
Failover Count      | Number of failovers                    | > 1 per week
Recovery Time       | Time to complete failover              | > 5 minutes
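A simple threshold check against numeric metrics like those in the table might look like this (metric names, limits, and sample values are hypothetical; real setups would express these as Prometheus alert rules or similar):

```python
# Hypothetical numeric thresholds, loosely mirroring the table above.
ALERT_THRESHOLDS = {
    "replication_lag_seconds": 10,
    "connection_pct_of_max": 80,
    "recovery_time_seconds": 300,
}

def alerts_for(metrics):
    """Return the names of all metrics exceeding their threshold."""
    return [name for name, limit in ALERT_THRESHOLDS.items()
            if metrics.get(name, 0) > limit]

sample = {"replication_lag_seconds": 14, "connection_pct_of_max": 55}
print(alerts_for(sample))  # only replication lag exceeds its threshold
```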

Real-World Example: GitHub

In 2018, GitHub experienced a 24-hour outage due to a split-brain scenario in their MySQL cluster. Network partition caused both data centers to think they were primary. They had to carefully reconcile divergent data. Since then, they've improved their consensus mechanisms and monitoring.

Testing Failover

Regular failover testing is crucial to ensure your system works when needed.

# Failover Testing Checklist:
1. Chaos Engineering: Randomly kill primary in a test environment
2. Measure failover time
3. Verify zero data loss
4. Check application reconnection
5. Monitor for split-brain
6. Test failback to original primary
7. Document any issues

# Tools for Testing:
- Chaos Monkey (Netflix)
- Gremlin
- Litmus (Kubernetes)
- Custom scripts to kill processes/VMs

Test Your Knowledge - Lesson 5

Answer the following questions to proceed. You need at least 2 of 3 correct to pass.

Question 1: What is the split-brain problem?

Question 2: What is STONITH in database failover?

Question 3: What is the main advantage of automatic failover over manual failover?

Lesson 6 of 6

Real-World Implementation & Best Practices

Designing a Distributed Database System

Implementing a production-ready distributed database system requires careful planning and consideration of multiple factors.

Key Design Decisions

  • Database Selection: SQL vs NoSQL, specific technology
  • Replication Strategy: Master-slave, multi-master, or peer-to-peer
  • Partitioning Strategy: How to shard data across nodes
  • Consistency Model: Strong vs eventual consistency
  • Failover Approach: Automatic, manual, or hybrid
  • Geographic Distribution: Single region vs multi-region

Multi-Region Architecture

Distributing databases across geographic regions for global availability and low latency.

# Multi-Region Setup Example

Region: US-EAST (Primary)
- Primary database cluster
- 3 nodes with replication
- Serves North American traffic

Region: EU-WEST (Secondary)
- Read replica cluster
- 3 nodes synced from US-EAST
- Serves European traffic

Region: ASIA-PACIFIC (Secondary)
- Read replica cluster
- 3 nodes synced from US-EAST
- Serves Asian traffic

Replication: Asynchronous between regions
Failover: Automatic within region, manual between regions
Latency: <50ms local, 100-200ms cross-region

Multi-Region Challenges:

  • Cross-region latency: Speed of light limits
  • Data sovereignty: Legal requirements for data location
  • Consistency: CAP theorem trade-offs amplified
  • Cost: Data transfer between regions is expensive

Best Practices for High Availability

1. Redundancy at Every Layer

# High Availability Stack
Load Balancer (HA pair)
        ↓
Application Servers (3+ instances)
        ↓
Connection Pooler (PgBouncer/ProxySQL)
        ↓
Database Cluster (3+ nodes)
        ↓
Storage (Replicated/RAID)

No single point of failure!

2. Automate Everything

  • Automated failover
  • Automated backups
  • Automated monitoring and alerting
  • Automated scaling
  • Infrastructure as Code (Terraform, CloudFormation)

3. Monitor Proactively

# Essential Metrics
- Query performance (slow query log)
- Connection pool utilization
- Replication lag
- Disk I/O and space
- CPU and memory usage
- Network throughput
- Error rates
- Backup success/failure

Tools: Prometheus, Grafana, Datadog, New Relic

4. Test Failure Scenarios

# Regular Disaster Recovery Drills
1. Simulate primary database failure
2. Test automatic failover
3. Measure recovery time objective (RTO)
4. Verify recovery point objective (RPO)
5. Test restore from backups
6. Simulate network partitions
7. Test cross-region failover
8. Document lessons learned

Frequency: Monthly for critical systems

Backup Strategies

3-2-1 Backup Rule

  • 3 copies of data (original + 2 backups)
  • 2 different storage types (disk + cloud)
  • 1 offsite backup (different geographic location)
# PostgreSQL Backup Strategy

# Full Backup (Daily)
pg_basebackup -h primary -D /backup/full/$(date +%Y%m%d) -Ft -z -P

# WAL Archiving (Continuous)
archive_command = 'rsync -a %p backup-server:/wal_archive/%f'

# Point-in-Time Recovery (PITR)
# Can restore to any point in time between backups

# Retention Policy
- Daily backups: Keep 7 days
- Weekly backups: Keep 4 weeks
- Monthly backups: Keep 12 months
- Yearly backups: Keep 7 years

Security Best Practices

Network Security

  • Firewall rules: Only allow necessary ports
  • VPC/Private networks: Isolate database traffic
  • SSL/TLS encryption: Encrypt data in transit
  • VPN/Bastion hosts: Secure administrative access

Access Control

-- Principle of Least Privilege

-- Read-only user for reporting
CREATE USER reporting_user WITH PASSWORD 'strong_password';
GRANT SELECT ON ALL TABLES IN SCHEMA public TO reporting_user;

-- Application user with limited permissions
CREATE USER app_user WITH PASSWORD 'strong_password';
GRANT SELECT, INSERT, UPDATE ON specific_tables TO app_user;

-- Admin user (use sparingly)
CREATE USER admin_user WITH PASSWORD 'very_strong_password' SUPERUSER;

-- Audit logging
ALTER DATABASE mydb SET log_statement = 'all';
-- log_connections = on   (set in postgresql.conf; it cannot be applied per-database)

Encryption

  • At rest: Encrypt database files (LUKS, dm-crypt, AWS EBS encryption)
  • In transit: SSL/TLS for all connections
  • Backups: Encrypt backup files before storing
  • Sensitive data: Application-level encryption for PII

Performance Optimization

Connection Pooling

# PgBouncer Configuration (pgbouncer.ini)
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_port = 6432
listen_addr = *
auth_type = md5
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25

# Benefits:
- Reduces connection overhead
- Prevents connection exhaustion
- Better resource utilization
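The mechanism behind a pooler like PgBouncer is simple: a fixed number of server connections are opened once and handed out to many clients in turn. A toy sketch with fake connections (the class and names are illustrative, not PgBouncer's internals):

```python
import queue

class ConnectionPool:
    """Toy connection pool: a fixed number of (here, fake) connections
    are created up front and reused, instead of opening a new server
    connection for every client request."""
    def __init__(self, size, connect):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(connect())

    def acquire(self, timeout=None):
        return self._pool.get(timeout=timeout)  # blocks if the pool is exhausted

    def release(self, conn):
        self._pool.put(conn)

opened = []
pool = ConnectionPool(size=2, connect=lambda: opened.append("conn") or len(opened))
c = pool.acquire()
pool.release(c)
c = pool.acquire()   # reuses an existing connection, no new one is opened
print("connections opened:", len(opened))  # connections opened: 2
```

This is why `default_pool_size = 25` can serve `max_client_conn = 1000`: clients queue for a pooled connection instead of each holding a PostgreSQL backend process.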

Query Optimization

-- Indexing Strategy

-- Good indexes
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_user_created ON orders(user_id, created_at);

-- Analyze query plans
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'user@example.com';

-- Monitor slow queries
-- PostgreSQL
ALTER DATABASE mydb SET log_min_duration_statement = 1000; -- log queries > 1s

-- MySQL
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;

Caching

# Multi-layer caching
Application Cache (Redis/Memcached)
        ↓ (cache miss)
Database Query Cache
        ↓ (cache miss)
Actual Database Query

# Cache common data
- User sessions
- Product catalogs
- Configuration data
- Frequently accessed data

# Cache invalidation strategies
- TTL (Time To Live)
- Write-through cache
- Cache-aside pattern
- Event-driven invalidation

Real-World Case Studies

Case Study 1: Uber

Challenge: Global ride-hailing service needs low latency worldwide.

Solution:

  • Schemaless (MySQL-based) for flexible data model
  • Geographic sharding by city
  • Multi-region deployment
  • Eventually consistent for non-critical data
  • Strong consistency for billing/payments

Result: Handles millions of trips per day with <50ms latency for critical operations.

Case Study 2: Discord

Challenge: Real-time chat with billions of messages, rapid growth.

Solution:

  • Started with MongoDB, migrated to Cassandra
  • Partitioned by channel_id
  • Eventually consistent (acceptable for chat)
  • Custom message ID scheme for ordering
  • Aggressive caching with Redis
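The "custom message ID scheme for ordering" works because the timestamp occupies the ID's high bits, so sorting IDs numerically sorts messages chronologically, even when they were written by different nodes. A simplified Snowflake-style sketch (the exact bit widths here are the classic Twitter layout, not necessarily Discord's):

```python
def make_message_id(timestamp_ms, worker_id, sequence):
    """Simplified Snowflake-style ID: 41 bits of timestamp, 10 bits of
    worker, 12 bits of per-millisecond sequence. The timestamp in the
    high bits makes IDs sort chronologically."""
    assert worker_id < (1 << 10) and sequence < (1 << 12)
    return (timestamp_ms << 22) | (worker_id << 12) | sequence

earlier = make_message_id(1_700_000_000_000, worker_id=3, sequence=0)
later   = make_message_id(1_700_000_000_001, worker_id=1, sequence=0)
print(earlier < later)  # True
```

This lets a partitioned store like Cassandra return messages in order with a simple clustering key on the ID, with no cross-node coordination at write time.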

Result: Scales to 140 million monthly active users, storing trillions of messages.

Case Study 3: Airbnb

Challenge: Complex booking system requiring strong consistency.

Solution:

  • MySQL with master-slave replication
  • Sharded by listing_id
  • Synchronous replication for bookings
  • Asynchronous for search/browse
  • Read replicas for analytics

Result: Prevents double-bookings while maintaining high availability.

Choosing the Right Database

Use Case | Recommended Database | Why
Financial transactions | PostgreSQL, CockroachDB | ACID compliance, strong consistency
Social media feed | Cassandra, DynamoDB | High write throughput, eventual consistency OK
E-commerce catalog | MongoDB, Elasticsearch | Flexible schema, full-text search
Real-time analytics | ClickHouse, TimescaleDB | Columnar storage, time-series optimization
Session storage | Redis, Memcached | In-memory, fast reads/writes
Graph relationships | Neo4j, Amazon Neptune | Optimized for graph queries

Migration Strategies

Blue-Green Deployment

# Migrating to a distributed database
1. Set up new distributed database (Green)
2. Keep old database running (Blue)
3. Replicate data from Blue to Green
4. Run both in parallel (dual writes)
5. Verify data consistency
6. Switch reads to Green
7. Switch writes to Green
8. Monitor for issues
9. Decommission Blue after stabilization

Allows instant rollback if issues arise!
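The dual-write step (step 4) is usually a small wrapper: write to the old (Blue) database first, then best-effort mirror to the new (Green) one. Blue remains the source of truth, so a Green failure must not break the request. A minimal sketch, with lists standing in for the two databases:

```python
import logging

def dual_write(blue_write, green_write, record):
    """Dual-write during migration: Blue is authoritative; Green is a
    shadow write whose failure is logged for reconciliation, never
    surfaced to the caller."""
    blue_write(record)          # authoritative write to the old system
    try:
        green_write(record)     # shadow write to the new system
    except Exception:
        logging.exception("green write failed; will reconcile later")

blue, green = [], []
dual_write(blue.append, green.append, {"id": 1})
print(blue == green == [{"id": 1}])  # True
```

The consistency-verification step (step 5) then compares Blue and Green row counts and checksums to catch any shadow writes that were dropped and logged.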

Common Pitfalls to Avoid

  • Over-sharding: Too many small shards increases complexity
  • Under-monitoring: You can't fix what you can't see
  • Ignoring backups: Test restores regularly!
  • No capacity planning: Know your growth trajectory
  • Premature optimization: Start simple, scale when needed
  • Ignoring security: Security should be built-in from day one

Summary: Key Takeaways

  1. Distributed databases trade simplicity for scalability and availability
  2. Choose consistency model based on business requirements (CAP theorem)
  3. Implement proper replication for fault tolerance
  4. Use sharding to scale beyond single-server limits
  5. Automated failover minimizes downtime but requires careful implementation
  6. Monitor everything and test failure scenarios regularly
  7. There's no one-size-fits-all solution - understand your requirements

Final Test - Lesson 6

Answer the following questions to complete the course. You need to answer at least 2 of the 3 questions correctly to pass.

Question 1: What is the 3-2-1 backup rule?

Question 2: Which database would you choose for financial transactions?

Question 3: What is the main benefit of connection pooling?