Content Delivery Network
Global content distribution with edge caching, TLS termination, and dynamic content acceleration.
Learning Objectives
By the end of this case study, you will be able to:
- Design a global edge network with well-placed POPs
- Implement caching strategies and cache invalidation
- Build traffic routing with health checks and failover mechanisms
- Design TLS termination and certificate management at scale
- Implement dynamic content acceleration and origin shielding
Real-World Examples
Cloudflare: 270+ cities, handles 25+ million HTTP requests per second
Amazon CloudFront: 225+ edge locations, powers AWS and Netflix streaming
Fastly: Real-time configuration changes, powers GitHub and Shopify
Akamai: 300,000+ servers in 135+ countries, handles 30% of web traffic
Requirements
Functional Requirements
- Cache static assets with TTL controls
- TLS termination with modern cipher suites
- Dynamic routing based on client geography and measured performance
- Purge/invalidate content globally
Non-functional Requirements
- High cache hit ratio; high origin offload (low origin load)
- Low p95 latency globally
- Scalable invalidation propagation
High-Level Design
- Edge POPs with tiered caches and origin shielding
- Control plane for invalidations and configs
Capacity & Sizing
- Requests/sec per POP, average payload size
- Cache storage per POP and tier
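The sizing inputs above combine into a few back-of-the-envelope estimates. The sketch below is illustrative only: the function name pop_sizing and every input value are hypothetical placeholders, not measurements from any real CDN.

```python
def pop_sizing(rps, avg_payload_bytes, hit_ratio, distinct_objects):
    """Rough per-POP estimates from requests/sec, payload size, and hit ratio."""
    # Bits per second served to clients, expressed in Gbps.
    egress_gbps = rps * avg_payload_bytes * 8 / 1e9
    # Requests that miss the POP cache and travel upstream.
    upstream_rps = rps * (1 - hit_ratio)
    # Storage needed to hold the hot working set, ignoring metadata overhead.
    storage_gb = distinct_objects * avg_payload_bytes / 1e9
    return {
        "egress_gbps": egress_gbps,
        "upstream_rps": upstream_rps,
        "storage_gb": storage_gb,
    }


# Hypothetical inputs chosen only to make the arithmetic easy to follow.
estimate = pop_sizing(
    rps=50_000,
    avg_payload_bytes=50_000,
    hit_ratio=0.95,
    distinct_objects=10_000_000,
)
```

With these assumed inputs, the POP serves about 20 Gbps to clients, forwards about 2,500 requests/sec upstream, and needs about 500 GB for its working set.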
Key Components
- POP caches, Tiered cache, Origin
- Invalidation service
Architecture
High-level components and data flow
Data Model
Core entities and relationships
- cache_entries (key PK, etag, ttl, size, ts)
- invalidations (id PK, pattern, ts, actor)
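One way to picture the two entities is as plain records. This is a minimal sketch under stated assumptions: ts is taken to be a Unix timestamp in seconds, ttl a duration in seconds, and pattern a shell-style glob. The methods is_fresh and applies_to are hypothetical helpers added for illustration; the outline itself only lists the fields.

```python
import fnmatch
from dataclasses import dataclass


@dataclass
class CacheEntry:
    key: str      # primary key, e.g. the normalized URL path
    etag: str
    ttl: int      # assumed: seconds
    size: int     # assumed: bytes
    ts: float     # assumed: time the entry was stored (Unix seconds)

    def is_fresh(self, now: float) -> bool:
        """True while the entry is younger than its TTL."""
        return now < self.ts + self.ttl


@dataclass
class Invalidation:
    id: int
    pattern: str  # assumed: glob such as "/img/*"
    ts: float     # assumed: time the purge was issued (Unix seconds)
    actor: str

    def applies_to(self, entry: CacheEntry) -> bool:
        """Purge only entries stored at or before the invalidation was issued."""
        return entry.ts <= self.ts and fnmatch.fnmatchcase(entry.key, self.pattern)
```

Comparing timestamps lets an entry that was refetched after the purge survive it, which matters when invalidations propagate asynchronously.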
APIs
- POST /api/invalidate { pattern }
- GET /api/cache/:key
- DELETE /api/cache/:key
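The three endpoints can be sketched as methods on an in-memory store, without committing to any web framework. Everything here is an assumption made for illustration: the class name CacheApi, the status codes, the response bodies, and the choice to treat pattern as a shell-style glob.

```python
import fnmatch


class CacheApi:
    """In-memory stand-in for the cache API; each method returns (status, body)."""

    def __init__(self):
        self.store = {}  # key -> cached body

    def get(self, key):
        # GET /api/cache/:key
        if key in self.store:
            return 200, self.store[key]
        return 404, None

    def delete(self, key):
        # DELETE /api/cache/:key
        if key in self.store:
            del self.store[key]
            return 204, None
        return 404, None

    def invalidate(self, body):
        # POST /api/invalidate { pattern }
        pattern = body.get("pattern")
        if not pattern:
            return 400, {"error": "pattern is required"}
        matched = [k for k in self.store if fnmatch.fnmatchcase(k, pattern)]
        for k in matched:
            del self.store[k]
        # 202 reflects that a real purge would continue propagating to other POPs.
        return 202, {"purged": len(matched)}
```

A real control plane would enqueue the pattern for global propagation rather than deleting from a single local store.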
Hot Path
- Request path: POP cache → tier cache → origin (on miss)
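The request path above can be sketched as a lookup that falls through each layer and fills the caches on the way back. The dictionaries and the origin_fetch callback are hypothetical stand-ins for real cache stores and an HTTP client; freshness checks are omitted to keep the flow visible.

```python
def fetch(key, pop_cache, tier_cache, origin_fetch):
    """Return (body, source), where source names the layer that answered."""
    if key in pop_cache:
        return pop_cache[key], "pop"
    if key in tier_cache:
        # Tier hit: copy into the POP cache so the next request stays local.
        pop_cache[key] = tier_cache[key]
        return pop_cache[key], "tier"
    # Miss at both layers: go to origin and populate both caches.
    body = origin_fetch(key)
    tier_cache[key] = body
    pop_cache[key] = body
    return body, "origin"


# Usage: two POPs sharing one tier cache; the origin is only contacted once.
origin_calls = []

def fake_origin(key):
    origin_calls.append(key)
    return f"body-of-{key}"

tier = {}
pop_a, pop_b = {}, {}
first = fetch("/index.html", pop_a, tier, fake_origin)
second = fetch("/index.html", pop_a, tier, fake_origin)
third = fetch("/index.html", pop_b, tier, fake_origin)
```

The third request shows the point of the tier: a different POP misses locally but is served from the shared tier, so the origin is shielded.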
Caching & TTL
- Honor Cache-Control/Surrogate-Control; stale-while-revalidate
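A minimal sketch of how an edge might turn these headers into a freshness decision. It assumes the usual precedence for a shared cache (Surrogate-Control max-age, then Cache-Control s-maxage, then max-age) and deliberately ignores directives such as no-store and private. The function names parse_directives and classify are hypothetical.

```python
def parse_directives(header):
    """Parse 'max-age=60, stale-while-revalidate=30' into a dict."""
    directives = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"')
    return directives


def classify(age, cache_control, surrogate_control=""):
    """Classify a cached response of the given age (seconds)."""
    cc = parse_directives(cache_control)
    sc = parse_directives(surrogate_control)
    if "max-age" in sc:
        ttl = int(sc["max-age"])      # targeted at surrogates such as CDNs
    elif "s-maxage" in cc:
        ttl = int(cc["s-maxage"])     # shared-cache override
    else:
        ttl = int(cc.get("max-age", 0))
    swr = int(cc.get("stale-while-revalidate", 0))
    if age < ttl:
        return "fresh"                    # serve from cache
    if age < ttl + swr:
        return "stale-while-revalidate"   # serve stale, refresh in background
    return "expired"                      # must revalidate or refetch first
```

For example, with Cache-Control: max-age=60, stale-while-revalidate=30, a response aged 75 seconds is served stale while a background refresh runs.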
Scaling
- Hash ring to spread keys across POP caches
- Async propagation of invalidations
- Prefetch queues for new deployments
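The hash ring mentioned above can be sketched as consistent hashing with virtual nodes. This is one common variant, not necessarily what any particular CDN runs; the class name HashRing, the use of MD5 as the hash, and the default of 100 virtual nodes per cache are all assumptions.

```python
import bisect
import hashlib


class HashRing:
    """Consistent hash ring mapping cache keys to cache nodes."""

    def __init__(self, nodes, vnodes=100):
        ring = []
        for node in nodes:
            # Several points per node smooth out the key distribution.
            for i in range(vnodes):
                ring.append((self._hash(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring
        self._points = [point for point, _ in ring]

    @staticmethod
    def _hash(value):
        digest = hashlib.md5(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, key):
        # First ring point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._ring[idx][1]
```

The property that matters for caches is that removing a node only remaps the keys that lived on it, so the rest of the fleet keeps its warm content.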
Trade-offs
- Cache staleness vs purge frequency
- Tiered cache hit ratio vs added latency
- Compression formats vs CPU cost
Failure Modes & Mitigations
- POP outage → reroute to nearest healthy POP
- Miss storms → throttle and warm tiers
- Invalidation backlog → prioritize patterns
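The first mitigation, rerouting to the nearest healthy POP, can be sketched as a filter followed by a distance comparison. Great-circle distance is used here as a stand-in for "nearest"; production routing typically relies on anycast or measured latency instead. The Pop record, the function names, and the coordinates are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Pop:
    name: str
    lat: float
    lon: float
    healthy: bool = True


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance via the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def pick_pop(client_lat, client_lon, pops):
    """Return the closest POP that passes health checks, or None if all are down."""
    healthy = [p for p in pops if p.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda p: distance_km(client_lat, client_lon, p.lat, p.lon))


# Usage with approximate city coordinates; the client is near Munich.
pops = [
    Pop("fra", 50.11, 8.68),
    Pop("ams", 52.37, 4.90),
    Pop("nyc", 40.71, -74.01),
]
before = pick_pop(48.14, 11.58, pops).name
pops[0].healthy = False  # simulate a Frankfurt outage
after = pick_pop(48.14, 11.58, pops).name
```

In this example the client moves from the Frankfurt POP to Amsterdam once Frankfurt fails its health check, rather than crossing the Atlantic.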
Observability
- SLIs: cache hit ratio, p95 latency, origin offload
- TLS termination errors, cert expirations
- Invalidation throughput and lag
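The three SLIs in the first bullet reduce to simple formulas over counters and latency samples. The sketch below assumes origin offload is measured in bytes and that p95 uses the nearest-rank method; both are common conventions, but other definitions exist. The function names are hypothetical.

```python
def cache_hit_ratio(hits, misses):
    """Fraction of requests answered from cache."""
    total = hits + misses
    return hits / total if total else 0.0


def origin_offload(total_bytes, origin_bytes):
    """Fraction of delivered bytes that did not come from the origin."""
    if not total_bytes:
        return 0.0
    return (total_bytes - origin_bytes) / total_bytes


def p95(latencies_ms):
    """95th percentile by nearest rank: the value at position ceil(0.95 * n)."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    rank = -(-len(ordered) * 95 // 100)  # integer ceiling of 0.95 * n
    return ordered[rank - 1]
```

For instance, 950 hits and 50 misses give a hit ratio of 0.95, and if the origin supplied 100 of 1,000 delivered bytes the offload is 0.9.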