Analytics Pipeline

A real-time analytics pipeline, from event ingestion through stream processing to an OLAP store, with rollups and dashboards.

Learning Objectives

By the end of this case study, you will be able to:

  • Design high-throughput event ingestion with backpressure handling
  • Implement stream processing for real-time aggregations and rollups
  • Build OLAP systems for fast analytical queries
  • Design exactly-once or idempotent processing guarantees
  • Implement efficient data partitioning and compaction strategies

Real-World Examples

Netflix: Processes 700+ billion events daily for recommendations and viewing analytics

Uber: Real-time analytics for surge pricing, driver matching, and demand forecasting

Airbnb: Event pipeline powers search ranking, pricing, and user behavior analytics

LinkedIn: Processes member activity for feed ranking and professional insights

Requirements

Functional Requirements

  • Ingest events, validate, and persist
  • Compute rollups/materialized views
  • Query metrics, segments, sessions

Non-functional Requirements

  • Handle bursts with backpressure
  • Exactly-once or idempotent at-least-once guarantees
  • Low-latency queries for dashboards

High-Level Design

  • Producers → event bus → stream processors
  • OLAP store for analytical queries; materialized views for low-latency aggregates
  • DLQs and replay mechanisms

Capacity & Sizing

  • Events per second, average event size
  • Storage for raw vs aggregated data; retention
  • Throughput for processors and query concurrency
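The sizing bullets above can be turned into a quick back-of-envelope calculation. All numbers below are assumed examples, not measurements from any of the systems mentioned:

```python
# Back-of-envelope sizing. Every input here is an assumed example value.
events_per_sec = 50_000        # peak ingest rate (assumed)
avg_event_bytes = 1_000        # ~1 KB per encoded event (assumed)
raw_retention_days = 30        # raw-event retention (assumed)
compression_ratio = 5          # columnar encoding + compression (assumed)

raw_bytes_per_day = events_per_sec * avg_event_bytes * 86_400
raw_tb_per_day = raw_bytes_per_day / 1e12
stored_tb = raw_tb_per_day * raw_retention_days / compression_ratio

print(f"raw/day: {raw_tb_per_day:.1f} TB, stored ({raw_retention_days}d): {stored_tb:.1f} TB")
# → raw/day: 4.3 TB, stored (30d): 25.9 TB
```

At these assumed rates the pipeline writes ~4.3 TB of raw events per day, so retention policy and compression dominate the storage bill long before query capacity does.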

Key Components

  • Gateway/collector and stream processors
  • OLAP store (columnar) and materialized views
  • DLQ and replay tools

Architecture

High-level components and data flow

Data Model

Core entities and relationships

  • events (event_id PK, ts, type, user_id, props_json)
  • rollups (bucket PK, metric, value)
  • sessions (session_id PK, user_id, start, end)
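The `bucket` PK in the rollups table implies some function that floors an event timestamp into a fixed-width window. A minimal sketch, assuming minute-wide buckets and a `metric:timestamp` key format (both are illustrative choices, not mandated by the model):

```python
from datetime import datetime, timezone

def rollup_bucket(ts_epoch: float, metric: str, width_sec: int = 60) -> str:
    """Bucket key for the rollups table: floor the event timestamp to the
    bucket width so all events in one window aggregate into one row."""
    floored = int(ts_epoch) // width_sec * width_sec
    iso = datetime.fromtimestamp(floored, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M")
    return f"{metric}:{iso}"

# Events at 12:34:05 and 12:34:59 map to the same minute bucket,
# so incrementing rollups is a single upsert per (bucket, metric).
```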

APIs

  • POST /api/events { type, userId, props }
  • GET /api/metrics?metric=...&range=...
  • GET /api/sessions?userId=...&range=...
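A sketch of the `POST /api/events` payload handling, framework-free and with hypothetical field rules (the spec above only names `type`, `userId`, and `props`; everything else here is an assumption):

```python
import json
import time
import uuid

REQUIRED = {"type", "userId"}  # assumed required fields

def validate_event(body: dict) -> dict:
    """Validate a POST /api/events payload and normalize it into the
    internal events-table shape. Raises ValueError on bad input."""
    missing = REQUIRED - body.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if not isinstance(body.get("props", {}), dict):
        raise ValueError("props must be an object")
    return {
        "event_id": str(uuid.uuid4()),  # server-assigned PK
        "ts": time.time(),
        "type": body["type"],
        "user_id": body["userId"],
        "props_json": json.dumps(body.get("props", {})),
    }
```

Assigning `event_id` at the edge (rather than trusting the client) gives downstream consumers a stable deduplication key.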

Hot Path

  1. Ingest: validate → enqueue → ack
  2. Query: scan rollups → return
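The ingest hot path (validate → enqueue → ack) can be sketched with a bounded in-process buffer; the class and capacity below are illustrative, not a prescribed implementation:

```python
import queue

class Ingestor:
    """Hot-path ingest: validate, enqueue into a bounded buffer, ack.
    A full buffer is the backpressure signal; the HTTP layer would
    translate the False return into a 429 so producers slow down."""

    def __init__(self, capacity: int = 10_000):
        self.buffer = queue.Queue(maxsize=capacity)

    def ingest(self, event: dict) -> bool:
        if "type" not in event:            # 1. validate
            raise ValueError("event missing type")
        try:
            self.buffer.put_nowait(event)  # 2. enqueue (non-blocking)
        except queue.Full:
            return False                   # backpressure: retry or shed
        return True                        # 3. ack
```

The key property is that the ack only confirms durability of the enqueue step; rollup computation happens asynchronously off this buffer.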

Caching & TTL

  • Cache dashboard queries for seconds to minutes
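A minimal sketch of that short-TTL cache, assuming dashboard results are keyed by the query string:

```python
import time

class TTLCache:
    """Short-lived cache for dashboard query results. Dashboards tolerate
    seconds of staleness, so even a small TTL absorbs repeated refreshes
    of the same chart without hitting the OLAP store."""

    def __init__(self, ttl_sec: float = 30.0):
        self.ttl = ttl_sec
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry and entry[0] > time.monotonic():
            return entry[1]
        self._store.pop(key, None)  # expired or absent
        return None

    def put(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)
```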

Scaling

  • Partition by event time and key; compact segments
  • Backpressure with bounded buffers and DLQs
  • Cold storage for raw events
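The "partition by event time and key" bullet can be sketched as hash partitioning on the key, with event time ordering rows within each partition so segments compact cleanly by time range. The key choice (`user_id`) and partition count are assumptions:

```python
import hashlib

def partition_for(event: dict, num_partitions: int = 32) -> int:
    """Route an event to a partition by hashing its key (user_id here,
    as an assumed choice). Hash partitioning balances load across
    partitions; event time then orders data *within* each partition,
    which is what makes time-range compaction of segments cheap."""
    key = str(event["user_id"]).encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % num_partitions
```

For skewed keys, the mitigation below (salting) appends a small random suffix to the key before hashing, spreading one hot key across several partitions at the cost of a fan-in merge at query time.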

Trade-offs

  • Exactly-once cost vs at-least-once + idempotency
  • Freshness of rollups vs compute cost
  • Wide-column vs columnar stores for queries
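The first trade-off is worth making concrete: at-least-once delivery plus an idempotent consumer approximates exactly-once *effects* without transactional machinery. A sketch, with an in-memory seen-set standing in for what would be a TTL'd external store in production:

```python
class IdempotentConsumer:
    """At-least-once delivery + idempotent apply: redelivered events are
    detected by event_id and skipped, so duplicates never double-count.
    This trades a dedup lookup per event for avoiding the coordination
    cost of true exactly-once processing."""

    def __init__(self):
        self.seen = set()  # production: a TTL'd store, e.g. Redis (assumed)
        self.counter = 0

    def apply(self, event: dict) -> bool:
        if event["event_id"] in self.seen:
            return False            # duplicate delivery: no-op
        self.seen.add(event["event_id"])
        self.counter += 1           # the actual side effect (a rollup increment)
        return True
```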

Failure Modes & Mitigations

  • Consumer lag → autoscale consumers or tune batch sizes
  • Skewed keys → repartitioning and salting
  • Provider outage → buffer and shed load

Observability

  • SLIs: ingest ack time, processing lag, query latency
  • DLQ size and reprocess metrics
  • Per-partition throughput dashboards
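Processing lag, the SLI that drives the consumer-lag mitigation above, is simply the gap between produced and committed offsets per partition. A sketch, assuming offsets are tracked as plain dicts:

```python
def processing_lag(produced: dict, committed: dict) -> dict:
    """Per-partition consumer lag: latest produced offset minus latest
    committed offset. Partitions the consumer has never committed for
    count their full produced offset as lag."""
    return {p: produced[p] - committed.get(p, 0) for p in produced}
```

Alerting on the *trend* of this value (lag growing over several intervals) is usually more useful than alerting on any absolute threshold.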