Consistency

How it works

Consistency Overview
How it works
  1. Choose model (strong/eventual)
  2. Apply quorum or leader rules
  3. Handle conflicts (CRDTs/outbox)
  4. Validate with invariants

🎯 What is Consistency?

Consistency in distributed systems ensures that all nodes see the same data at the same time. It's like having a shared notebook - everyone writes on the same page, and no one can write on a different page until the first one is done.
🖥️ Node 1
🖥️ Node 2
🖥️ Node 3
⬇️
📓 Shared State

Overview

  • Governs what readers observe relative to recent writes across replicas.
  • Strong: reads see latest committed write. Eventual: replicas converge over time.
  • Tunable consistency lets callers pick (e.g., QUORUM vs ONE) per request.

When to use stronger guarantees

  • Monetary transfers, inventory updates, auth/entitlements, uniqueness constraints.
  • Workflows where dual-writes or replays cause significant harm.

Trade-offs

  • Stronger consistency increases latency and reduces availability during partitions.
  • Weaker consistency improves availability but requires idempotency and reconciliation.

Patterns

  • Quorum reads/writes: W+R > N to ensure overlap.
  • CRDTs/event sourcing for conflict-free merges.
  • Sagas and outbox pattern for cross-service updates.
  • Read your own writes via sticky routing or session tokens.

Anti-patterns

  • Blind dual-writes to multiple stores without ordering/transactions.
  • Assuming clock sync; prefer logical timestamps/vector clocks.
  • Relying on caches without invalidation strategy.

📐 Quick Diagrams


  # Quorum (N=3, W=2, R=2)
  Client ▶ LB ▶ Replicas A,B,C
         └─ write to any 2
         └─ read from any 2
  

  # Outbox + Poller
  Service ▶ DB(outbox) ▶ Poller ▶ Broker ▶ Consumers
  

❓ Interview Q&A (concise)

  • Q: CAP vs PACELC? A: CAP focuses on partitions; PACELC adds latency vs consistency trade-offs even without partitions.
  • Q: Ensure idempotency? A: Use request IDs, upserts, de-dup tables, and at-least-once consumers with exactly-once semantics per key.
  • Q: Read-after-write consistency? A: Route to primary, use sticky sessions, or quorum reads; avoid stale replicas.

📚 Consistency Models

🔒 Strong Consistency

All reads return the most recent write

✨ Characteristics:

  • Immediate consistency across nodes
  • High latency due to coordination
  • Example: Traditional RDBMS

⏳ Eventual Consistency

System will become consistent over time

✨ Characteristics:

  • Low latency, high availability
  • Data may be stale for a period
  • Example: DNS, Amazon DynamoDB

⚠️ Weak Consistency

No guarantees when all nodes will be consistent

✨ Characteristics:

  • Best effort to be consistent
  • Used in real-time systems
  • Example: Video streaming, gaming

ACID Properties

  1. Atomicity: All or nothing execution
  2. Consistency: Database remains in a valid state
  3. Isolation: Concurrent transactions don't interfere
  4. Durability: Committed transactions persist

BASE Properties (Alternative to ACID)

  1. Basically Available: System remains available
  2. Soft state: State may change over time
  3. Eventually consistent: System becomes consistent eventually

🛠️ Achieving Consistency

🔄 Data Replication

Maintain copies of data across nodes

Synchronous: Immediate consistency, higher latency
Asynchronous: Eventual consistency, lower latency

🛠️ Distributed Transactions

Ensure all-or-nothing across services

Two-Phase Commit: Coordinator and participants agree on commit
Saga Pattern: Chained transactions with compensation actions

🔑 Consensus Algorithms

Agree on values among distributed nodes

Paxos: A family of protocols for solving consensus in a network of unreliable processors
Raft: A consensus algorithm that is easy to understand and implements a replicated log

🎯 Consistency Best Practices

🔄 Choose the right consistency model: Based on application needs
🛠️ Use distributed transactions judiciously: Be aware of the trade-offs
🔑 Implement consensus algorithms: For critical distributed data
📊 Monitor and tune: Consistency-related metrics and performance