URL Shortener

Designing a TinyURL-like service that serves billions of redirects per month.

Learning Objectives

By the end of this case study, you will be able to:

Design high-throughput URL encoding and decoding systems
Implement efficient key generation strategies (base62, hash-based, counter-based)
Build globally distributed caching for 95%+ cache hit rates
Handle massive read-to-write ratios (1000:1) with proper data partitioning
Design analytics pipelines for click tracking and URL performance metrics

Real-World Examples

Bitly: Processes 600+ million links per month with 99.9% uptime, used by Nike, Disney, and BBC

TinyURL: One of the first URL shorteners (2002), handles millions of redirects daily with minimal infrastructure

t.co (Twitter): Processes billions of clicks, automatically shortens all URLs for security and analytics

youtu.be (YouTube) and goo.gl (Google): youtu.be powers YouTube video sharing links; goo.gl was Google's general-purpose shortener until it was retired

Architecture overview: global routing, caching, app tier, key generation, KV store, and analytics

Requirements

Functional Requirements

URL Shortening: Convert long URLs to unique short codes
URL Redirection: Redirect short URLs to original destinations
Custom Aliases: Allow users to create custom short codes
Expiration: Support time-based URL expiration
Analytics: Track clicks, geographic data, referrers
User Management: Account creation and URL management
Bulk Operations: API for bulk URL shortening
Link Preview: Safe preview before redirecting

Non-Functional Requirements

Scale: 100M URLs created/month, 10B redirects/month
Latency: < 50ms P95 redirect latency globally
Availability: 99.99% uptime for redirect service
Read/Write Ratio: heavily read-optimized; about 100:1 at the volumes above (10B redirects vs 100M creations), with headroom for up to 1000:1
Storage: 10TB for 5 years of URL data
Cache Hit Rate: > 95% for popular URLs
Security: Prevent malicious URLs, rate limiting

Capacity Planning & Traffic Analysis

Write Traffic

100M URLs/month ≈ 38.6 URLs/second (30-day month)
Peak traffic (10x avg) ≈ 386 URLs/second
Average URL size = 200 bytes
With metadata = 300 bytes per record

Read Traffic

10B redirects/month = 3.86K redirects/second
Peak traffic (5x avg) = 19.3K redirects/second
Cache memory (1M hot URLs) = 300MB
95% cache hit rate target

Storage Growth

Per year: 100M × 12 × 300B = 360GB
5 years: 1.8TB raw data
With replication (3x): 5.4TB
Analytics data: ~2x URL data
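The estimates above can be reproduced with a few lines of arithmetic. The sketch below assumes a 30-day month and reuses the figures from this section (300 bytes per record, 10x write peak, 5x read peak, 3x replication); the variable names are illustrative only.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day month

urls_per_month = 100_000_000
redirects_per_month = 10_000_000_000
record_bytes = 300  # URL plus metadata

# Average and peak request rates
write_qps = urls_per_month / SECONDS_PER_MONTH
read_qps = redirects_per_month / SECONDS_PER_MONTH
peak_write_qps = write_qps * 10
peak_read_qps = read_qps * 5

# Storage growth for URL records only (analytics excluded)
yearly_bytes = urls_per_month * 12 * record_bytes
five_year_bytes = yearly_bytes * 5
replicated_bytes = five_year_bytes * 3

# Memory needed to keep 1M hot records in cache
hot_cache_bytes = 1_000_000 * record_bytes

print(f"writes/s: {write_qps:.1f} avg, {peak_write_qps:.0f} peak")
print(f"reads/s:  {read_qps:.0f} avg, {peak_read_qps:.0f} peak")
print(f"storage:  {yearly_bytes / 1e9:.0f} GB/year, "
      f"{five_year_bytes / 1e12:.1f} TB over 5 years, "
      f"{replicated_bytes / 1e12:.1f} TB replicated")
print(f"hot cache: {hot_cache_bytes / 1e6:.0f} MB")
```

Changing any single input (for example the record size) shows how sensitive the storage and cache budgets are to that assumption.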

High-Level Design

Edge/CDN for geo routing and caching of hot redirects
App layer for create/redirect, rate limiting, auth
Primary KV store: code → long URL (+ TTL, flags)
ID generator for base62 codes; collision-safe

Key Components

API/App tier, WAF/Rate limiter
Key generator (Snowflake/KSUID, or hash with retry on collision)
Cache (Redis/Memcached) and persistent KV store (DynamoDB/Cassandra)
Analytics pipeline, optional (Kafka → OLAP store such as BigQuery or ClickHouse)

Capacity & Sizing

Upper-bound sizing: assume 10M new URLs/day (about 3× the 100M/month baseline above) → ~115 QPS writes on average (peak 10×)
Reads at 1000× writes → ~115k QPS; cache target >95% hit ratio
Average URL 200 bytes; with metadata ~300 bytes/record
Storage per year ≈ 10M × 300B × 365 ≈ 1.1 TB (before replication)

Data Model

Codes table + analytics events

codes (code PK, url, created_at, expire_at, owner_id, flags, hits)
owner_codes (owner_id, code, created_at) — for listing by user
events (event_id PK, code, ts, ip, ua, country) — optional analytics stream
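As an illustration of the codes table, the record could be modeled in application code roughly as follows. This is a sketch only: the field types, the defaults, and the is_expired helper are assumptions, not part of the schema above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class CodeRecord:
    """One row of the codes table; types and defaults are assumed."""
    code: str                              # primary key
    url: str                               # original long URL
    created_at: datetime
    expire_at: Optional[datetime] = None   # None means the code never expires
    owner_id: Optional[str] = None         # None for anonymous links
    flags: int = 0                         # bit flags, e.g. disabled or preview-only
    hits: int = 0                          # updated asynchronously

    def is_expired(self, now: datetime) -> bool:
        """Return True once the optional expiry time has passed."""
        return self.expire_at is not None and now >= self.expire_at


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
record = CodeRecord(
    code="AbCd12",
    url="https://example.com/very/long",
    created_at=created,
    expire_at=created + timedelta(days=30),
)
```

Keeping expire_at nullable lets the redirect path skip the expiry check for permanent links.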

APIs

Create shortened URL: POST /api/shorten with body { "url": "https://example.com/very/long", "custom": "mycode" }
Redirect: GET /:code
Owner list: GET /api/urls?owner=me

Response (create): { "code": "AbCd12", "shortUrl": "https://x.y/AbCd12", "expireAt": null }
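The create endpoint's behavior can be sketched as a plain function, independent of any web framework. Everything here is an assumption for illustration: the alias rules, the 400/409 status codes, the in-memory dict standing in for the KV store, and the next_code callable standing in for the key generator.

```python
import re
from typing import Callable, Dict, Tuple

ALIAS_RE = re.compile(r"^[0-9A-Za-z_-]{3,32}$")  # assumed alias rules
BASE_URL = "https://x.y"


def shorten(body: dict, store: Dict[str, str],
            next_code: Callable[[], str]) -> Tuple[int, dict]:
    """Handle POST /api/shorten; returns (HTTP status, JSON body)."""
    url = body.get("url")
    if not isinstance(url, str) or not url.startswith(("http://", "https://")):
        return 400, {"error": "url must be an absolute http(s) URL"}

    custom = body.get("custom")
    if custom is not None:
        if not isinstance(custom, str) or not ALIAS_RE.fullmatch(custom):
            return 400, {"error": "invalid custom alias"}
        if custom in store:
            return 409, {"error": "alias already taken"}
        code = custom
    else:
        code = next_code()
        while code in store:  # generated codes may clash with custom aliases
            code = next_code()

    store[code] = url
    return 201, {"code": code, "shortUrl": f"{BASE_URL}/{code}", "expireAt": None}


store: Dict[str, str] = {}
status, response = shorten(
    {"url": "https://example.com/very/long", "custom": "mycode"},
    store,
    next_code=lambda: "AbCd12",
)
```

In a real deployment the existence check and the write would need to be a single conditional put so that two concurrent requests cannot claim the same alias.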

Hot Path

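Client hits https://x.y/AbCd12
Edge cache lookup; on miss, forward to nearest region
App reads cache, falling back to the DB on a miss; returns 301 (or 302 when every click must reach the service for analytics, since browsers cache 301 responses)
Async increment hit counter; stream event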
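The app-tier part of this path can be sketched as a read-through lookup. In this illustration plain dicts stand in for the cache and the KV store, and a deque stands in for the event stream; none of these names come from a real client library.

```python
import time
from collections import deque
from typing import Deque, Dict, Optional, Tuple


def redirect(code: str, cache: Dict[str, str], db: Dict[str, str],
             events: Deque[dict]) -> Tuple[int, Optional[str]]:
    """Resolve a short code; returns (HTTP status, Location header or None)."""
    url = cache.get(code)
    if url is None:
        url = db.get(code)          # cache miss: fall back to the KV store
        if url is None:
            return 404, None
        cache[code] = url           # populate the cache for later requests
    # Record the click off the critical path; a real system would publish
    # to a message queue instead of appending to an in-process deque.
    events.append({"code": code, "ts": time.time()})
    return 301, url


cache: Dict[str, str] = {}
db = {"AbCd12": "https://example.com/very/long"}
events: Deque[dict] = deque()
status, location = redirect("AbCd12", cache, db, events)
```

After the first call the code is served from the cache, so the KV store is only touched on misses.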

Caching & TTL

Edge cache 301 responses for active codes (e.g., 10–60 minutes)
Invalidate on update/delete; background warmup for top codes
Local app cache (LRU) to reduce DB tail latency
Conditional requests for previews via ETag
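A local app cache that combines LRU eviction with a TTL might look like the sketch below. The class name, the injectable clock, and the explicit invalidate method are assumptions made for illustration; the capacity and TTL values would be tuned per deployment.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional, Tuple


class TTLLRUCache:
    """In-process cache with LRU eviction and a fixed per-entry TTL."""

    def __init__(self, capacity: int, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._capacity = capacity
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: "OrderedDict[str, Tuple[str, float]]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        item = self._items.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._items[key]         # expired: treat as a miss
            return None
        self._items.move_to_end(key)     # mark as most recently used
        return value

    def put(self, key: str, value: str) -> None:
        self._items[key] = (value, self._clock() + self._ttl)
        self._items.move_to_end(key)
        while len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict least recently used

    def invalidate(self, key: str) -> None:
        """Drop an entry on update/delete of the underlying code."""
        self._items.pop(key, None)
```

The TTL bounds how long a stale mapping can be served if an invalidation is missed, while the LRU bound keeps memory use predictable.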

Scaling

Partition KV by code prefix; consistent hashing to distribute
Replicate multi-region; read local, write-through to home region
Asynchronous analytics to decouple read path
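One way to distribute codes across KV nodes is a consistent hash ring with virtual nodes. The following is a minimal sketch under assumed names (HashRing, node_for, the kv-* node labels) and an assumed choice of SHA-256 for ring positions; production stores such as Cassandra or DynamoDB handle this internally.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


class HashRing:
    """Consistent hash ring with virtual nodes."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 100) -> None:
        ring: List[Tuple[int, str]] = []
        for node in nodes:
            for i in range(vnodes):
                # Each physical node owns many points on the ring
                ring.append((self._hash(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring
        self._points = [point for point, _ in ring]

    @staticmethod
    def _hash(value: str) -> int:
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, code: str) -> str:
        """Return the node owning the first ring point after the code's hash."""
        idx = bisect.bisect_right(self._points, self._hash(code))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["kv-a", "kv-b", "kv-c"])
owner = ring.node_for("AbCd12")
```

When a node is added, only the keys that fall just before its new ring points move to it; every other key keeps its previous owner, which limits data movement during scaling.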

Trade-offs

Eventual consistency acceptable for analytics
Cache TTL vs purge on update
Collision probability vs code length

Failure Modes & Mitigations

DB outage → serve from cache with stale-if-error window
Hot keys → per-key rate limiting and targeted pre-warm
Keygen collision → retry with different salt/sequence

Observability

SLIs: redirect success rate, p95 latency, cache hit ratio
Error budgets and alerts for saturation and failures
Structured logs for redirects and creation events

URL Encoding Strategies

Base62 Counter

Advantages

Sequential, predictable length
No collisions by design
Compact encoding (6 chars = 62^6 ≈ 56.8 billion combinations)

Disadvantages

Single point of failure (counter service)
Difficult to scale horizontally
URLs are predictable (security concern)
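A counter value can be turned into a short code with a standard base62 conversion. In this sketch the alphabet ordering (digits, then uppercase, then lowercase) is an assumption; any fixed ordering of the 62 characters works as long as encoding and decoding agree.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
BASE = len(ALPHABET)  # 62


def encode_base62(n: int) -> str:
    """Encode a non-negative counter value as a base62 string."""
    if n < 0:
        raise ValueError("counter must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


def decode_base62(code: str) -> int:
    """Inverse of encode_base62; raises ValueError on characters outside the alphabet."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)
    return n


# The largest value that still fits in six characters
largest_six_char = encode_base62(BASE ** 6 - 1)
```

Because consecutive counters produce consecutive codes, deployments that care about enumeration often scramble the counter (for example with a keyed permutation) before encoding.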

Hash-Based (MD5/SHA)

Advantages

Stateless generation
Distributed-friendly
Same URL produces same hash

Disadvantages

Potential collisions
Fixed length (may be longer than needed)
Need collision detection logic
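Hash-based generation with collision handling could be sketched as follows. The choices here are assumptions for illustration: SHA-256 as the hash, a 7-character code, a "#attempt" suffix as the salt, and a dict standing in for the KV store. The retry count is bounded so a persistent collision fails loudly instead of looping forever.

```python
import hashlib
from typing import Dict

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def to_base62(n: int, length: int) -> str:
    """Encode n as exactly `length` base62 characters (n must be < 62**length)."""
    chars = []
    for _ in range(length):
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def hash_code(url: str, store: Dict[str, str],
              length: int = 7, max_attempts: int = 5) -> str:
    """Derive a code from the URL's hash, re-salting when the code is taken."""
    for attempt in range(max_attempts):
        salted = url if attempt == 0 else f"{url}#{attempt}"
        digest = hashlib.sha256(salted.encode("utf-8")).digest()
        # Reduce the first 64 bits of the digest into the code space
        n = int.from_bytes(digest[:8], "big") % (62 ** length)
        code = to_base62(n, length)
        existing = store.get(code)
        if existing is None:
            store[code] = url       # free slot: claim it
            return code
        if existing == url:
            return code             # same URL already mapped to this code
        # Otherwise another URL owns this code: retry with a new salt
    raise RuntimeError("no free code found within max_attempts")
```

With a real datastore the check and the write should be one conditional insert, otherwise two writers can both see the slot as free.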

UUID/Snowflake (Recommended)

Advantages

Uniqueness across nodes (guaranteed for Snowflake given unique node IDs; probabilistic for random UUIDs)
Timestamp ordering capability
High throughput generation

Disadvantages

Longer codes (a 64-bit Snowflake ID needs up to 11 base62 characters)
Requires node ID assignment and reasonably synchronized clocks
More complex implementation
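A Snowflake-style generator can be sketched in a few dozen lines. This version assumes the commonly described layout of 41 timestamp bits, 10 node bits, and 12 sequence bits; the custom epoch, the class name, and the policy of raising on a backwards clock are illustrative choices, not a reference implementation.

```python
import threading
import time
from typing import Callable

EPOCH_MS = 1_600_000_000_000  # assumed custom epoch (September 2020)
NODE_BITS = 10
SEQ_BITS = 12
MAX_SEQ = (1 << SEQ_BITS) - 1


def _now_ms() -> int:
    return int(time.time() * 1000)


class SnowflakeGenerator:
    """Generates 64-bit IDs: timestamp | node ID | per-millisecond sequence."""

    def __init__(self, node_id: int, clock_ms: Callable[[], int] = _now_ms) -> None:
        if not 0 <= node_id < (1 << NODE_BITS):
            raise ValueError("node_id must fit in 10 bits")
        self._node_id = node_id
        self._clock_ms = clock_ms
        self._lock = threading.Lock()
        self._last_ms = -1
        self._seq = 0

    def next_id(self) -> int:
        with self._lock:
            now = self._clock_ms()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._seq = (self._seq + 1) & MAX_SEQ
                if self._seq == 0:
                    # Sequence exhausted: spin until the next millisecond
                    while now <= self._last_ms:
                        now = self._clock_ms()
            else:
                self._seq = 0
            self._last_ms = now
            return (((now - EPOCH_MS) << (NODE_BITS + SEQ_BITS))
                    | (self._node_id << SEQ_BITS)
                    | self._seq)


generator = SnowflakeGenerator(node_id=1)
ids = [generator.next_id() for _ in range(10_000)]
```

Each node needs a distinct node_id, which is the coordination cost listed above; the resulting integer can then be base62-encoded to form the short code.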

Caching Strategy Deep Dive

L1: CDN/Edge Cache

TTL: 24 hours for hot URLs
Coverage: Top 10% of URLs (80% traffic)
Invalidation: API-triggered purge
Size: 100K URLs per edge location

L2: Application Cache (Redis)

TTL: 6 hours with LRU eviction
Coverage: Top 50% of URLs (95% traffic)
Size: 10M URLs (~3GB memory)
Replication: Redis Cluster with 3 replicas

L3: Database Read Replicas

Purpose: Cache misses and analytics queries
Replication Lag: < 1 second
Read Distribution: Round-robin load balancing
Fallback: Primary DB for consistency

Analytics Pipeline Architecture

1. Event Collection

Async event publishing to Kafka
Click events with: timestamp, IP, user-agent, referrer
Batch processing for high throughput
Guaranteed delivery with at-least-once semantics

2. Stream Processing

Real-time aggregation using Apache Flink/Kafka Streams
Geographical IP resolution for location analytics
Bot detection and filtering based on patterns
Windowed aggregations (1min, 1hr, 1day)
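The windowed aggregation step can be illustrated without a streaming framework. This sketch counts clicks per code in fixed (tumbling) windows over an in-memory list; it is plain Python rather than Flink or Kafka Streams code, and the function name and event shape are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def tumbling_counts(events: Iterable[Tuple[float, str]],
                    window_seconds: int) -> Dict[Tuple[int, str], int]:
    """Count clicks per (window start, code) for fixed-size windows.

    Each event is (unix timestamp in seconds, short code).
    """
    counts: Counter = Counter()
    for ts, code in events:
        # Align the timestamp down to the start of its window
        window_start = int(ts // window_seconds) * window_seconds
        counts[(window_start, code)] += 1
    return dict(counts)


clicks = [(0, "AbCd12"), (59, "AbCd12"), (60, "AbCd12"), (61, "Zx9")]
per_minute = tumbling_counts(clicks, window_seconds=60)
per_hour = tumbling_counts(clicks, window_seconds=3600)
```

A streaming engine adds what this sketch leaves out: handling of late and out-of-order events, state checkpointing, and incremental emission of results.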

3. Data Storage

ClickHouse for fast analytical queries
Partitioning by date for efficient time-range queries
Materialized views for common aggregations
Data retention: 2 years with compression

Security Considerations

🛡️ Rate Limiting

IP-based: 100 requests/minute for creation
User-based: 1000 URLs/day for authenticated users
Sliding window with Redis counters
Progressive penalties for repeat offenders
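The sliding-window limit can be sketched in process memory to show the logic. In the design above the counters would live in Redis so that all app servers share them; the class below is a single-process stand-in with assumed names and an injectable clock.

```python
import time
from collections import defaultdict, deque
from typing import Callable, DefaultDict, Deque


class SlidingWindowLimiter:
    """Allows at most `limit` requests per key within the trailing window."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._limit = limit
        self._window = window_seconds
        self._clock = clock
        self._hits: DefaultDict[str, Deque[float]] = defaultdict(deque)

    def allow(self, key: str) -> bool:
        now = self._clock()
        hits = self._hits[key]
        # Drop timestamps that have slid out of the window
        while hits and hits[0] <= now - self._window:
            hits.popleft()
        if len(hits) >= self._limit:
            return False
        hits.append(now)
        return True


# 100 creation requests per minute per client IP, as in the limits above
ip_limiter = SlidingWindowLimiter(limit=100, window_seconds=60)
allowed = ip_limiter.allow("203.0.113.7")
```

Rejected requests are not recorded here, so a client that keeps retrying is admitted again as soon as its oldest accepted request leaves the window.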

🔍 URL Validation

Malware scanning integration (VirusTotal API)
Phishing domain blacklist checking
URL format validation and sanitization
Recursive shortener detection
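Format validation, blocklist checks, and recursive shortener detection can be combined in one pre-check. In this sketch the host sets are small hypothetical examples, the length limit is an assumed value, and malware scanning is left out because it depends on an external service.

```python
from typing import Tuple
from urllib.parse import urlsplit

# Hypothetical lists; real deployments load these from maintained feeds
SHORTENER_HOSTS = {"x.y", "bit.ly", "tinyurl.com", "t.co"}
BLOCKED_HOSTS = {"phishing.example"}
MAX_URL_LENGTH = 2048  # assumed limit


def validate_target(url: str) -> Tuple[bool, str]:
    """Return (is_valid, reason) for a URL submitted for shortening."""
    if len(url) > MAX_URL_LENGTH:
        return False, "url too long"
    try:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
    except ValueError:
        return False, "malformed url"
    if parts.scheme not in ("http", "https"):
        return False, "unsupported scheme"
    if not host:
        return False, "missing host"
    if host in SHORTENER_HOSTS:
        return False, "target is itself a short link"
    if host in BLOCKED_HOSTS:
        return False, "blocked domain"
    return True, "ok"


result = validate_target("https://example.com/very/long")
```

Restricting the scheme to http and https is what rejects javascript: and data: targets, which are a common vector for malicious redirects.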

🔐 Access Control

JWT tokens for authenticated operations
API key management for enterprise clients
URL ownership validation for modifications
HTTPS enforcement for all endpoints

Best Practices

Design for cache-first architecture: 95%+ cache hit rate is critical for performance
Implement proper URL validation and sanitization to prevent malicious redirects
Use CDN with geographic distribution for global low-latency redirects
Design analytics as a separate service to avoid impacting redirect performance
Implement gradual key expiration and cleanup to manage storage costs

Common Pitfalls

Not handling key collisions properly - can lead to data corruption or infinite loops
Poor cache warming strategy leading to cache misses during traffic spikes
Insufficient URL validation allowing redirects to malicious sites
Not implementing proper rate limiting - vulnerable to abuse and DDoS
Storing analytics data synchronously - impacts redirect latency significantly
