Content Delivery Network

Global content distribution with edge caching, TLS termination, and dynamic content acceleration.

Learning Objectives

By the end of this case study, you will be able to:

Design a global edge network with optimal POP placement
Implement intelligent caching strategies and cache invalidation
Build traffic routing with health checks and failover mechanisms
Design TLS termination and certificate management at scale
Implement dynamic content acceleration and origin shielding

Real-World Examples

Cloudflare: 270+ cities, handles 25+ million HTTP requests per second

Amazon CloudFront: 225+ edge locations, integrated with AWS services such as S3

Fastly: Real-time configuration changes, powers GitHub and Shopify

Akamai: 300,000+ servers in 135+ countries, carries up to 30% of web traffic by commonly cited estimates

Requirements

Functional Requirements

Cache static assets with TTL controls
TLS termination with modern cipher suites
Dynamic routing based on client geography and measured performance
Purge/invalidate content globally

Non-functional Requirements

High cache hit ratio; high origin offload (low load on the origin)
Low p95 latency globally
Scalable invalidation propagation

High-Level Design

Edge POPs with tiered caches and origin shielding
Control plane for invalidations and configs

Capacity & Sizing

Requests/sec per POP, average payload size
Cache storage per POP and tier
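
A back-of-the-envelope sketch of how these inputs turn into per-POP numbers. Every figure used here (50,000 requests/sec, 100 KB average payload, 95% hit ratio, 2 million cached objects) is an illustrative assumption, not a number from this case study.

```python
# Rough per-POP sizing; all inputs are assumed values for illustration.
def pop_sizing(rps: float, avg_payload_bytes: float, hit_ratio: float,
               working_set_objects: int) -> dict:
    egress_bps = rps * avg_payload_bytes * 8         # bits/sec served to clients
    origin_rps = rps * (1.0 - hit_ratio)             # requests that miss the cache
    storage_bytes = working_set_objects * avg_payload_bytes
    return {
        "egress_gbps": egress_bps / 1e9,
        "origin_rps": origin_rps,
        "storage_gb": storage_bytes / 1e9,
    }

sizing = pop_sizing(rps=50_000, avg_payload_bytes=100_000,
                    hit_ratio=0.95, working_set_objects=2_000_000)
print(sizing)
```

With these assumed inputs the POP needs roughly 40 Gbps of egress and 200 GB of cache storage, and sends about 2,500 requests/sec upstream.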

Key Components

POP caches, Tiered cache, Origin
Invalidation service

Architecture

High-level components and data flow

Data Model

Core entities and relationships

cache_entries (key PK, etag, ttl, size, ts)
invalidations (id PK, pattern, ts, actor)
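
One possible way to express the two entities in code. The field names come from the lines above; the units (seconds, bytes), the reading of ts as an epoch timestamp, and the is_expired helper are assumptions added for illustration.

```python
from dataclasses import dataclass

@dataclass
class CacheEntry:
    key: str     # primary key, e.g. a normalized URL
    etag: str    # validator used for conditional revalidation
    ttl: int     # lifetime in seconds (assumed unit)
    size: int    # object size in bytes (assumed unit)
    ts: float    # time the entry was stored, epoch seconds (assumed meaning)

    def is_expired(self, now: float) -> bool:
        # Hypothetical helper: an entry expires ttl seconds after ts.
        return now >= self.ts + self.ttl

@dataclass
class Invalidation:
    id: int       # primary key
    pattern: str  # which keys to purge
    ts: float     # when the purge was requested
    actor: str    # who requested it (for auditing)
```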

APIs

POST /api/invalidate { pattern }
GET /api/cache/:key
DELETE /api/cache/:key
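
A minimal in-memory sketch of what these three endpoints might do. The CacheAPI class, the status codes, and the choice of shell-style glob patterns (via Python's fnmatch) are assumptions; the case study does not specify the pattern syntax or the response format.

```python
from fnmatch import fnmatchcase

class CacheAPI:
    """Hypothetical handler logic behind the three endpoints."""

    def __init__(self):
        self.store = {}  # key -> cached body

    def get(self, key):             # GET /api/cache/:key
        if key in self.store:
            return 200, self.store[key]
        return 404, None

    def delete(self, key):          # DELETE /api/cache/:key
        existed = key in self.store
        self.store.pop(key, None)
        return (204 if existed else 404), None

    def invalidate(self, pattern):  # POST /api/invalidate { pattern }
        # Assumed semantics: glob match, where "*" also matches "/".
        doomed = [k for k in self.store if fnmatchcase(k, pattern)]
        for k in doomed:
            del self.store[k]
        return 202, {"purged": len(doomed)}
```

In a real deployment the invalidate call would only enqueue the pattern for propagation to every POP; here it purges a single local store so the behavior is easy to follow.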

Hot Path

Request path: POP cache → tier cache → origin (on miss)
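
The lookup chain can be sketched as a small recursive fetch. The Tier class and its names are hypothetical, and eviction, TTLs, and concurrency are left out to keep the path itself visible.

```python
class Tier:
    """One cache layer; falls through to its parent, or to the origin."""

    def __init__(self, name, parent=None, origin=None):
        self.name, self.parent, self.origin = name, parent, origin
        self.data = {}

    def fetch(self, key):
        """Return (body, served_by)."""
        if key in self.data:
            return self.data[key], self.name
        if self.parent is not None:
            body, source = self.parent.fetch(key)
        else:
            body, source = self.origin(key), "origin"
        self.data[key] = body  # fill this layer on the way back
        return body, source

origin = lambda key: f"body of {key}"   # stand-in for the origin server
shield = Tier("tier", origin=origin)    # shared tier cache / origin shield
pop_a = Tier("pop-a", parent=shield)
pop_b = Tier("pop-b", parent=shield)
```

The first request through pop_a reaches the origin; a later request for the same key through pop_b stops at the shared tier, which is the point of origin shielding.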

Caching & TTL

Honor Cache-Control and Surrogate-Control headers; serve stale content while revalidating in the background (stale-while-revalidate)
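
A sketch of how an edge might derive its caching policy from these headers. The precedence used here (Surrogate-Control max-age, then Cache-Control s-maxage, then max-age) is a common convention but an assumption for this design; the function names are hypothetical and malformed header values are not handled.

```python
def parse_directives(value: str) -> dict:
    """Split 'a=1, b' into {'a': '1', 'b': ''}."""
    out = {}
    for part in value.split(","):
        part = part.strip().lower()
        if not part:
            continue
        name, _, arg = part.partition("=")
        out[name.strip()] = arg.strip().strip('"')
    return out

def edge_policy(headers: dict) -> tuple:
    """Return (ttl_seconds, stale_while_revalidate_seconds) for the edge."""
    sc = parse_directives(headers.get("Surrogate-Control", ""))
    cc = parse_directives(headers.get("Cache-Control", ""))
    if "no-store" in sc or "no-store" in cc or "private" in cc:
        return 0, 0  # not cacheable at a shared cache
    if "max-age" in sc:
        ttl = int(sc["max-age"])
    elif "s-maxage" in cc:
        ttl = int(cc["s-maxage"])
    else:
        ttl = int(cc.get("max-age", 0))
    swr = int(cc.get("stale-while-revalidate", 0))
    return ttl, swr

def classify(age: int, ttl: int, swr: int) -> str:
    if age < ttl:
        return "fresh"                       # serve from cache
    if age < ttl + swr:
        return "stale-serve-and-revalidate"  # serve now, refresh in background
    return "miss"                            # fetch before responding
```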

Scaling

Consistent-hash ring to spread keys across the cache nodes in a POP
Async propagation of invalidations
Prefetch queues for new deployments
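
A minimal consistent-hash ring with virtual nodes, to illustrate the first item. The HashRing class, the node names, the choice of SHA-256, and the default of 100 virtual nodes per cache are assumptions made for this sketch.

```python
import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each node is placed on the ring at `vnodes` pseudo-random points.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(s: str) -> int:
        return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["cache-a", "cache-b", "cache-c"])
owner = ring.node_for("/img/logo.png")
```

The property that matters for a cache fleet is that removing one node only remaps the keys that node owned; keys on the surviving nodes stay put, so their cached copies remain useful.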

Trade-offs

Cache staleness vs purge frequency
Tiered cache hit ratio vs added latency
Compression formats vs CPU cost

Failure Modes & Mitigations

POP outage → reroute to nearest healthy POP
Miss storms → throttle origin fetches and pre-warm cache tiers
Invalidation backlog → prioritize patterns
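
The POP-outage mitigation can be sketched as a selection over health-checked candidates. Here "nearest" is approximated by measured round-trip time, which is an assumption, as are the Pop type, the POP names, and the pick_pop helper; real routing would typically live in DNS or anycast.

```python
from dataclasses import dataclass

@dataclass
class Pop:
    name: str
    rtt_ms: float   # measured client-to-POP round trip (assumed input)
    healthy: bool   # result of the most recent health check

def pick_pop(pops):
    """Return the lowest-latency POP that is currently healthy."""
    candidates = [p for p in pops if p.healthy]
    if not candidates:
        raise RuntimeError("no healthy POP available")
    return min(candidates, key=lambda p: p.rtt_ms)

pops = [
    Pop("fra", rtt_ms=12.0, healthy=False),  # closest, but failing checks
    Pop("ams", rtt_ms=18.0, healthy=True),
    Pop("lhr", rtt_ms=25.0, healthy=True),
]
chosen = pick_pop(pops)
```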

Observability

SLIs: cache hit ratio, p95 latency, origin offload
TLS termination errors, cert expirations
Invalidation throughput and lag
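
A sketch of how the first three SLIs could be computed from raw counters. The byte-based definition of origin offload and the nearest-rank percentile method are assumptions; other definitions are in use, and the function names are hypothetical.

```python
def cache_hit_ratio(hits: int, misses: int) -> float:
    total = hits + misses
    return hits / total if total else 0.0

def origin_offload(bytes_to_clients: int, bytes_from_origin: int) -> float:
    # Fraction of delivered bytes that did not have to come from the origin.
    if bytes_to_clients == 0:
        return 0.0
    return 1.0 - bytes_from_origin / bytes_to_clients

def p95(latencies_ms):
    # Nearest-rank percentile: smallest value with at least 95% of samples at or below it.
    if not latencies_ms:
        raise ValueError("no samples")
    ordered = sorted(latencies_ms)
    rank = (95 * len(ordered) + 99) // 100   # integer ceil(0.95 * n)
    return ordered[rank - 1]
```

For example, 900 hits and 100 misses give a hit ratio of 0.9, and 100 latency samples of 1 ms through 100 ms give a p95 of 95 ms.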