Content Delivery Network

Global content distribution with edge caching, TLS termination, and dynamic content acceleration.

Learning Objectives

By the end of this case study, you will be able to:

  • Design a global edge network with optimal POP placement
  • Implement intelligent caching strategies and cache invalidation
  • Build traffic routing with health checks and failover mechanisms
  • Design TLS termination and certificate management at scale
  • Implement dynamic content acceleration and origin shielding

Real-World Examples

Cloudflare: 270+ cities, handles 25+ million HTTP requests per second

Amazon CloudFront: 225+ edge locations, integrated with AWS origins such as S3 and EC2

Fastly: Real-time configuration changes, powers GitHub and Shopify

Akamai: 300,000+ servers in 135+ countries, reported to handle up to 30% of web traffic

Requirements

Functional Requirements

  • Cache static assets with TTL controls
  • TLS termination with modern cipher suites
  • Dynamic routing based on client geography and measured performance
  • Purge/invalidate content globally
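
To make the TLS requirement concrete, here is a minimal sketch of a server-side TLS context using Python's standard ssl module. The function name make_edge_tls_context, the TLS 1.2 floor, and the cipher string are illustrative assumptions, not settings taken from any particular CDN.

```python
import ssl

def make_edge_tls_context(certfile=None, keyfile=None):
    """Build a server-side TLS context with a conservative, assumed policy."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # Assumption: refuse anything older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # Restrict TLS 1.2 suites to forward-secret AEAD ciphers. TLS 1.3 suites
    # are managed separately by OpenSSL and keep their defaults.
    ctx.set_ciphers("ECDHE+AESGCM:ECDHE+CHACHA20")
    if certfile:
        # Hypothetical paths supplied by the caller; not loaded in this sketch.
        ctx.load_cert_chain(certfile, keyfile)
    return ctx

ctx = make_edge_tls_context()
```

At scale the certificate and key would come from a certificate management service rather than local files; that part is left out here.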

Non-functional Requirements

  • High cache hit ratio; high origin offload (low origin load)
  • Low p95 latency globally
  • Scalable invalidation propagation

High-Level Design

  • Edge POPs with tiered caches and origin shielding
  • Control plane for invalidations and configs

Capacity & Sizing

  • Requests/sec per POP, average payload size
  • Cache storage per POP and tier
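
As a rough illustration of how these inputs combine, the sketch below estimates client egress and upstream traffic for one POP. Every number in it (50,000 requests/sec, 40 KB average payload, 95% hit ratio) is a made-up assumption, not a figure from a real deployment, and pop_sizing is a hypothetical helper.

```python
def pop_sizing(rps, avg_payload_bytes, hit_ratio):
    """Estimate per-POP egress and upstream load from three inputs."""
    egress_bps = rps * avg_payload_bytes * 8        # bits/sec served to clients
    miss_rps = rps * (1.0 - hit_ratio)              # requests forwarded upstream
    upstream_bps = miss_rps * avg_payload_bytes * 8 # bits/sec pulled from upstream
    return {
        "egress_gbps": egress_bps / 1e9,
        "miss_rps": miss_rps,
        "upstream_gbps": upstream_bps / 1e9,
    }

# Hypothetical inputs for a single POP.
estimate = pop_sizing(rps=50_000, avg_payload_bytes=40_000, hit_ratio=0.95)
```

With these inputs the POP serves 16 Gbps to clients and pulls roughly 0.8 Gbps from the next tier, which shows how strongly the hit ratio drives upstream capacity.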

Key Components

  • POP caches, tiered cache, origin
  • Invalidation service

Architecture

High-level components and data flow

Data Model

Core entities and relationships

  • cache_entries (key PK, etag, ttl, size, ts)
  • invalidations (id PK, pattern, ts, actor)
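
The two entities above can be sketched as plain records. The field meanings are assumptions: ttl is taken to be in seconds, ts to be epoch seconds (store time for an entry, issue time for an invalidation), and pattern to use shell-style globs.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase

@dataclass
class CacheEntry:
    key: str     # primary key, e.g. the normalized URL
    etag: str    # validator returned by the origin
    ttl: int     # assumed: freshness lifetime in seconds
    size: int    # body size in bytes
    ts: float    # assumed: time the entry was stored (epoch seconds)

    def is_fresh(self, now):
        """An entry is fresh until ts + ttl."""
        return now < self.ts + self.ttl

@dataclass
class Invalidation:
    id: int      # primary key
    pattern: str # assumed: shell-style glob over cache keys
    ts: float    # assumed: time the invalidation was issued
    actor: str   # who requested it

    def covers(self, entry):
        """True if the entry matches the pattern and predates the purge."""
        return fnmatchcase(entry.key, self.pattern) and entry.ts <= self.ts
```

Comparing timestamps in covers lets an entry that was refetched after the purge survive it.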

APIs

  • POST /api/invalidate { pattern }
  • GET /api/cache/:key
  • DELETE /api/cache/:key
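
A framework-free sketch of what the three endpoints might do against an in-memory store. The class name CacheApi, the status codes, and the glob interpretation of pattern are assumptions made for illustration.

```python
from fnmatch import fnmatchcase

class CacheApi:
    """In-memory stand-in for the cache admin API."""

    def __init__(self):
        self.store = {}  # cache key -> body bytes

    def get(self, key):
        """GET /api/cache/:key"""
        if key in self.store:
            return 200, self.store[key]
        return 404, None

    def delete(self, key):
        """DELETE /api/cache/:key"""
        if key in self.store:
            del self.store[key]
            return 204, None
        return 404, None

    def invalidate(self, pattern):
        """POST /api/invalidate { pattern }"""
        # Collect first, then delete, so the dict is not mutated mid-iteration.
        doomed = [k for k in self.store if fnmatchcase(k, pattern)]
        for k in doomed:
            del self.store[k]
        return 202, {"purged": len(doomed)}
```

In a real CDN the invalidate call would only enqueue the pattern for propagation to every POP; the synchronous delete here stands in for one node applying it.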

Hot Path

  1. Request path: POP cache → tier cache → origin (on miss)
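
The request path can be sketched as a lookup that falls through the tiers and fills each cache on the way back. Plain dicts stand in for the caches, and origin_fetch is a hypothetical callable supplied by the caller.

```python
def fetch(key, pop_cache, tier_cache, origin_fetch):
    """Return (body, served_from), filling caches on the way back."""
    if key in pop_cache:
        return pop_cache[key], "pop"
    if key in tier_cache:
        body = tier_cache[key]
        pop_cache[key] = body      # fill the edge so the next request hits locally
        return body, "tier"
    body = origin_fetch(key)       # miss at both levels: go to origin
    tier_cache[key] = body         # fill the shield tier first
    pop_cache[key] = body
    return body, "origin"
```

TTLs, eviction, and error handling are deliberately omitted; the point is only the order of lookups and fills.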

Caching & TTL

  • Honor Cache-Control/Surrogate-Control; stale-while-revalidate
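
One way an edge might derive its TTL from these headers is sketched below. It assumes the common precedence of Surrogate-Control max-age, then Cache-Control s-maxage, then Cache-Control max-age, and treats no-store or private as uncacheable; real caches handle many more directives. The function names are hypothetical.

```python
def parse_directives(header):
    """Split 'a=1, b' into {'a': '1', 'b': ''} with lowercase names."""
    out = {}
    for part in header.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            out[name.strip().lower()] = value.strip().strip('"')
    return out

def _seconds(value):
    """Parse a non-negative integer number of seconds, else 0."""
    try:
        return max(0, int(value))
    except (TypeError, ValueError):
        return 0

def edge_ttl(headers):
    """Return (ttl, stale_while_revalidate) in seconds for a shared edge cache."""
    sc = parse_directives(headers.get("Surrogate-Control", ""))
    cc = parse_directives(headers.get("Cache-Control", ""))
    swr = _seconds(sc.get("stale-while-revalidate", cc.get("stale-while-revalidate")))
    if "no-store" in sc:
        return 0, 0
    if "max-age" in sc:                      # surrogate-specific TTL wins
        return _seconds(sc["max-age"]), swr
    if "no-store" in cc or "private" in cc:  # not cacheable by a shared cache
        return 0, 0
    for name in ("s-maxage", "max-age"):     # shared-cache TTL before generic TTL
        if name in cc:
            return _seconds(cc[name]), swr
    return 0, 0

def classify(age, ttl, swr):
    """Decide how to serve an object of the given age (seconds)."""
    if age < ttl:
        return "fresh"
    if age < ttl + swr:
        return "stale-while-revalidate"  # serve stale, refresh in the background
    return "expired"
```

For example, an object with ttl 300 and stale-while-revalidate 30 is served from cache at age 310 while a background refresh runs, and is refetched synchronously at age 400.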

Scaling

  • Consistent-hash ring to spread keys across the cache nodes in a POP
  • Async propagation of invalidations
  • Prefetch queues for new deployments
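
A minimal sketch of the hash ring idea, using consistent hashing with virtual nodes. The node names, the choice of SHA-256, and the count of 100 virtual nodes per cache are assumptions for illustration.

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring mapping cache keys to cache nodes."""

    def __init__(self, nodes, vnodes=100):
        ring = []
        for node in nodes:
            # Each node owns several points so load spreads evenly.
            for i in range(vnodes):
                ring.append((self._hash(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring
        self._points = [point for point, _ in ring]

    @staticmethod
    def _hash(value):
        # First 8 bytes of SHA-256 as a 64-bit ring position.
        return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")

    def node_for(self, key):
        """Return the node owning the first ring point at or after the key."""
        if not self._ring:
            raise ValueError("ring has no nodes")
        idx = bisect.bisect_left(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

The property that matters for scaling: removing a node only remaps the keys that node owned, so the other caches keep their contents warm.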

Trade-offs

  • Cache staleness vs purge frequency
  • Tiered cache hit ratio vs added latency
  • Compression formats vs CPU cost
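
The tiered-cache trade-off can be put in numbers with a simple expected-latency model. The model and the inputs below are assumptions: each level adds a fixed latency, and tier_hit is the hit probability at the tier given an edge miss.

```python
def expected_latency_ms(edge_hit, tier_hit, edge_ms, tier_ms, origin_ms):
    """Mean latency for the path edge -> tier -> origin."""
    edge_miss = 1.0 - edge_hit
    return (
        edge_hit * edge_ms
        + edge_miss * tier_hit * (edge_ms + tier_ms)
        + edge_miss * (1.0 - tier_hit) * (edge_ms + tier_ms + origin_ms)
    )

def origin_request_fraction(edge_hit, tier_hit):
    """Share of client requests that reach the origin."""
    return (1.0 - edge_hit) * (1.0 - tier_hit)

# Hypothetical latencies: edge 10 ms, tier hop 30 ms, origin hop 150 ms.
with_tier = expected_latency_ms(0.9, 0.8, 10, 30, 150)
without_tier = expected_latency_ms(0.9, 0.0, 10, 0, 150)
```

With these made-up inputs the tier lowers mean latency from about 25 ms to about 16 ms and cuts origin traffic from 10% to 2% of requests, but every request that misses both levels pays the extra 30 ms hop.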

Failure Modes & Mitigations

  • POP outage → reroute to nearest healthy POP
  • Miss storms → throttle and warm tiers
  • Invalidation backlog → prioritize queued patterns (for example, by urgency or scope)
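
The first mitigation, rerouting around a failed POP, might look like the sketch below. It assumes a health-check system has already set a healthy flag per POP and that rtt_ms is a round-trip time measured from the client's region; the POP names and values are invented.

```python
def pick_pop(pops):
    """Choose the lowest-latency POP among those passing health checks."""
    healthy = [p for p in pops if p["healthy"]]
    if not healthy:
        raise RuntimeError("no healthy POP available")
    return min(healthy, key=lambda p: p["rtt_ms"])["name"]

# Hypothetical view from one client region.
pops = [
    {"name": "fra", "healthy": True, "rtt_ms": 12},
    {"name": "ams", "healthy": True, "rtt_ms": 18},
    {"name": "lhr", "healthy": True, "rtt_ms": 25},
]
primary = pick_pop(pops)

pops[0]["healthy"] = False   # simulate an outage at the nearest POP
fallback = pick_pop(pops)
```

Production systems usually implement this choice through anycast routing or DNS steering rather than an explicit function call, but the selection rule is the same.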

Observability

  • SLIs: cache hit ratio, p95 latency, origin offload
  • TLS termination errors, cert expirations
  • Invalidation throughput and lag
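
The SLIs listed above can be computed from basic counters, as in this sketch. The definitions are assumptions: hit ratio is request-based, origin offload is byte-based, and p95 uses the nearest-rank method over raw latency samples.

```python
import math

def cache_hit_ratio(hits, misses):
    """Fraction of requests answered from cache."""
    total = hits + misses
    return hits / total if total else 0.0

def origin_offload(bytes_served, bytes_from_origin):
    """Fraction of delivered bytes that did not come from the origin."""
    if not bytes_served:
        return 0.0
    return 1.0 - bytes_from_origin / bytes_served

def percentile(samples, pct):
    """Nearest-rank percentile, e.g. pct=95 for p95."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]
```

At production volumes the latency percentile would normally come from histograms or sketches rather than sorted raw samples.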