Caching
How it works
- Check cache for key
- On miss, fetch from origin
- Populate cache with TTL
- Serve and monitor hit ratio
🎯 What is Caching?
Caching is like keeping a stash of frequently used ingredients in a restaurant kitchen: you can serve customers faster without going back to the store for every order. In computing, caching stores copies of files or data in fast, temporary storage so that later requests can be served quickly.
👤 Client Request
⬇️
🔍 Check Cache
⬇️
✅ Cache Hit
⬇️
📬 Serve from Cache
Overview
- Speeds up reads by storing computed or fetched data closer to the requester.
- Multiple tiers: browser, CDN, reverse proxy, app cache, DB cache.
- Improves latency and reduces origin load when hit ratio is healthy.
When to use
- Read-heavy workloads with temporal/spatial locality.
- Expensive computations or slow upstreams where slight staleness is acceptable.
- Content with natural versioning (e.g., assets) or query results reused across users.
Trade-offs
- Staleness vs. freshness: requires a deliberate TTL and revalidation policy.
- Extra complexity: invalidation, key design, and observability.
Patterns
- Cache-aside for apps managing their own misses.
- Read-through for transparent loads; write-through for strong consistency.
- Stale-while-revalidate and request coalescing to avoid stampedes.
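Request coalescing can be sketched as a "single-flight" wrapper: concurrent loads of the same key share one origin call. This is a minimal illustration (error propagation to waiting callers is omitted for brevity):

```python
import threading

class SingleFlight:
    """Coalesce concurrent loads of one key into a single origin call."""
    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> Event that will carry the result

    def do(self, key, loader):
        with self._lock:
            ev = self._inflight.get(key)
            leader = ev is None
            if leader:                 # first caller becomes the leader
                ev = threading.Event()
                self._inflight[key] = ev
        if not leader:
            ev.wait()                  # followers wait for the leader's result
            return ev.result
        try:
            ev.result = loader()       # the only call to the origin
        finally:
            with self._lock:
                self._inflight.pop(key, None)
            ev.set()
        return ev.result
```

Combined with jittered TTLs and stale-while-revalidate, this keeps a popular key's expiry from turning into a stampede.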
Anti-patterns
- Caching everything by default without TTL strategy.
- Keys without namespacing/versioning, making broad invalidation impossible.
- Large blob values that negate cache benefits due to transfer cost.
📚 Cache Levels
- Browser Cache: Client-side caching
- CDN: Geographic distribution of static content
- Reverse Proxy: Server-side caching (Nginx, Varnish)
- Application Cache: In-memory cache (Redis, Memcached)
- Database Cache: Query result caching
🛠️ Caching Strategies
🗂️ Cache-Aside (Lazy Loading)
Application manages cache
- Load data on cache miss
- Good for read-heavy workloads
- Example: Product catalogs
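A product-catalog sketch of cache-aside with a namespaced, versioned key. `FakeRedis` is a hypothetical in-memory stand-in mimicking a Redis client's `get`/`setex` calls, and `load_product` stands in for the database query:

```python
import time

class FakeRedis:
    """Hypothetical in-memory stand-in for a Redis client (get/setex subset)."""
    def __init__(self):
        self._data = {}
    def get(self, key):
        hit = self._data.get(key)
        if hit is not None and hit[1] > time.monotonic():
            return hit[0]
        return None
    def setex(self, key, ttl, value):
        self._data[key] = (value, time.monotonic() + ttl)

cache = FakeRedis()  # swap in a real client for production use

def load_product(product_id):
    # Stand-in for the catalog database query.
    return {"id": product_id, "name": "widget"}

def get_product(product_id, ttl=300):
    key = f"catalog:v1:product:{product_id}"   # namespaced + versioned key
    value = cache.get(key)
    if value is None:                          # miss: the app loads and populates
        value = load_product(product_id)
        cache.setex(key, ttl, value)
    return value
```

The `v1` segment in the key lets you invalidate the whole catalog namespace later by bumping the version.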
📝 Write-Through
Write to cache and database simultaneously
- Ensures consistency
- Higher write latency
- Example: User profiles
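A write-through sketch for user profiles; the `db` and `cache` dicts are stand-ins for the real tiers:

```python
db = {}     # stand-in for the system of record
cache = {}  # stand-in for the cache tier

def save_profile(user_id, profile):
    """Write-through: cache and database are updated before we acknowledge."""
    db[user_id] = profile      # durable write to the database...
    cache[user_id] = profile   # ...and the cache in the same operation

def get_profile(user_id):
    # Reads can trust the cache: it is never behind the database.
    return cache.get(user_id, db.get(user_id))
```

The cost is that every write pays both latencies before returning, which is the "higher write latency" trade-off above.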
📥 Write-Back (Write-Behind)
Write to cache immediately, then to database asynchronously
- Risk of data loss on failure
- Higher performance for writes
- Example: Logging, analytics
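A write-back sketch: the write is acknowledged after the cache update, and a buffer is drained to the database later. The queue makes the data-loss risk concrete; anything still buffered when the process dies is gone:

```python
import queue

db = {}     # stand-in for the system of record
cache = {}  # stand-in for the cache tier
_pending = queue.Queue()  # buffered writes; lost if the process crashes

def record_event(key, value):
    """Write-back: acknowledge after the cache write; persist asynchronously."""
    cache[key] = value
    _pending.put((key, value))

def flush():
    """Drain buffered writes to the database (run by a background worker)."""
    while not _pending.empty():
        k, v = _pending.get()
        db[k] = v
```

Real systems back `_pending` with a durable queue so a crash does not drop acknowledged writes.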
📖 Read-Through
Cache loads data automatically on miss
- Transparent to application
- Example: Configuration settings
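A read-through sketch: the cache owns the loader, so callers only ever call `get()`. The `settings` dict is a hypothetical configuration store:

```python
import time

class ReadThroughCache:
    """The cache itself loads from the origin on a miss; it is transparent to callers."""
    def __init__(self, loader, ttl=60):
        self._loader = loader   # origin fetch, supplied once at construction
        self._ttl = ttl
        self._store = {}        # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]
        value = self._loader(key)  # the cache, not the app, fetches the data
        self._store[key] = (value, time.monotonic() + self._ttl)
        return value

# Hypothetical settings store backing a configuration cache.
settings = {"timeout_ms": 500, "retries": 3}
config = ReadThroughCache(loader=lambda k: settings[k], ttl=300)
```

Contrast with cache-aside: here the miss-handling logic lives in the cache layer, not in every caller.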
📊 Cache Patterns
🌡️ Cache Warming
Pre-load cache with expected data
🚫 Cache Invalidation
Remove stale data from cache
⏳ TTL (Time To Live)
Automatic expiration of cache entries
🚧 Common Issues
- Cache Stampede: many concurrent requests miss on the same expired key and all hit the origin at once
- Hot Keys: a few keys receive a disproportionate share of traffic, overloading their cache node or shard
- Cache Penetration: queries for non-existent data always miss and fall through to the database
🛠️ Solutions and Best Practices
🔄 Cache-Aside Pattern
Application is responsible for loading data into the cache
📊 Monitoring and Alerts
Track cache hit/miss rates, evictions, and latency
🔧 Regular Cache Maintenance
Invalidate stale data, adjust TTLs, and optimize size
🧩 Real-World Scenarios
- CDN image cache: Long TTL with versioned file names (hash in path) to make purges safe and fast.
- API response cache: Product details cached for 60–300s to cut DB load; key by path+query and user scope.
- Session/profile writes: Write-through for strong consistency; write-back for analytics or logs where loss is tolerable.
- Search autosuggest: Read-through with short TTL and background refresh.
- Microservice aggregator: Per-endpoint cache to shield upstream fan-out.
⚠️ Pitfalls and Anti-patterns
- Thundering herd/stampede on popular key expiration.
- Poor key design: missing namespace, collisions, or no version for schema changes.
- Oversized values increase network and serialization cost; prefer smaller granular keys.
- Inconsistent TTLs across related keys serve mismatched freshness; use coherent policies and add jitter to avoid synchronized expiry.
- Eviction mismatch (e.g., LFU vs LRU) with traffic shape; monitor and adjust.
- No negative caching: repeated misses for non-existent entries hammer the DB.
📐 Quick Diagrams
# Cache-aside (lazy)
App ──get(k)──▶ Cache ──hit──▶ value served
App ──miss: query──▶ DB ──value──▶ App
App ──set(k,v,ttl)──▶ Cache
# Multi-layer (Edge + App + DB)
Client ▶ CDN ▶ App ▶ Redis ▶ DB
# Sharded Redis with consistent hashing
key ──hash(key)──▶ point on ring ──clockwise──▶ owning node
(plain hash(key) % N is modulo sharding: changing N remaps most keys)
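A hash-ring sketch of consistent hashing (node names are illustrative): unlike modulo sharding, adding or removing a node remaps only the keys near it on the ring.

```python
import bisect
import hashlib

class HashRing:
    """Consistent hashing: nodes and keys hash onto a ring; a key is owned by
    the first node clockwise, so resizing remaps only a fraction of keys."""
    def __init__(self, nodes, vnodes=100):
        ring = []
        for node in nodes:
            for i in range(vnodes):  # virtual nodes smooth the distribution
                ring.append((self._point(f"{node}#{i}"), node))
        ring.sort()
        self._points = [p for p, _ in ring]
        self._nodes = [n for _, n in ring]

    @staticmethod
    def _point(s):
        # Any stable hash works; md5 is used here for its even spread.
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key):
        i = bisect.bisect(self._points, self._point(key)) % len(self._points)
        return self._nodes[i]
```

Going from 3 to 4 nodes should move roughly a quarter of the keys, not nearly all of them as `% N` would.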
🧪 Operations Checklist
- Track hit ratio, evictions, item size distribution, tail latency.
- Size memory with headroom; set maxmemory-policy explicitly.
- Use pipelining/batching for chatty patterns; compress large values.
- Add jitter to TTLs; warm critical keys on deploys/cold starts.
- Version cache keys to invalidate safely during schema changes.
- For write-back, ensure durable queues and recovery on crash.
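Two of the checklist items can be sketched in a few lines; the `app` prefix and `SCHEMA_VERSION` constant are illustrative names:

```python
import random

def jittered_ttl(base=300, jitter=0.10):
    """Spread expirations so keys written together don't all expire together."""
    return int(base * (1 + random.uniform(-jitter, jitter)))

SCHEMA_VERSION = 3  # bump to invalidate every key in this namespace at once

def cache_key(entity, entity_id):
    return f"app:v{SCHEMA_VERSION}:{entity}:{entity_id}"
```

Bumping the version abandons old entries rather than deleting them; they simply age out under their TTLs, which is why broad invalidation by prefix is safe and cheap.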
❓ Interview Q&A (concise)
- Q: When not to cache? A: Highly dynamic, user-specific, or sensitive data; when recompute is cheap and correctness trumps speed.
- Q: Prevent cache stampede? A: Request coalescing/single-flight, mutex per key, early refresh, jittered TTL, and stale-while-revalidate.
- Q: Cache consistency options? A: Write-through, write-around, write-back; plus invalidation on writes and short TTLs.
- Q: Negative caching? A: Cache “not found” briefly to avoid repeated DB hits for missing data.
- Q: Redis vs Memcached? A: Redis = richer data types, persistence, clustering; Memcached = simple, fast KV in-memory.
- Q: How to invalidate broadly? A: Versioned prefixes/namespaces rather than scanning keys.