Centralized Logging

A centralized pipeline that collects metrics, logs, and traces, makes them queryable, and alerts on them.

Requirements

Functional Requirements

Ingest metrics/logs/traces
Query/visualize
Alerting

Non-functional Requirements

Handle bursts
Low-latency dashboards

High-Level Design

Agents → gateway → TSDB/index → UI

Capacity & Sizing

Sizing inputs: active time series (driven by label cardinality), logs/sec, traces/sec
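
The sizing inputs above can be turned into rough storage numbers with simple arithmetic. The sketch below is illustrative only: the function names, the replication and compression parameters, and the idea of bounding series count by the product of label cardinalities are assumptions, not figures from this design.

```python
import math

SECONDS_PER_DAY = 86_400


def daily_storage_bytes(events_per_sec: float, bytes_per_event: float) -> float:
    """Raw bytes written per day, before compression or replication."""
    return events_per_sec * bytes_per_event * SECONDS_PER_DAY


def retained_storage_bytes(
    events_per_sec: float,
    bytes_per_event: float,
    retention_days: int,
    replication: int = 1,
    compression_ratio: float = 1.0,
) -> float:
    """Bytes on disk for the full retention window (assumed model)."""
    raw = daily_storage_bytes(events_per_sec, bytes_per_event) * retention_days
    return raw * replication / compression_ratio


def max_active_series(metric_names: int, label_cardinalities: list[int]) -> int:
    """Upper bound on series count: names times the product of the
    number of distinct values for each label."""
    return metric_names * math.prod(label_cardinalities)
```

For example, 1,000 log lines/sec at 100 bytes each is 8.64 GB/day raw; the series bound shows why a single high-cardinality label (such as a user ID) multiplies the series count.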

Key Components

Ingest gateway, TSDB, Index store, UI/Alerting

Architecture

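The ingest gateway receives telemetry from agents and writes it to the TSDB (metrics) and the index store (logs); UI/Alerting query both.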

Data Model

Core entities and relationships

metrics (ts, name, labels_json, value)
logs (ts, level, msg, labels_json)
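traces (trace_id, span_id, parent_id, ts, dur); PK (trace_id, span_id), since one trace has many spans; parent_id points to a span_id in the same trace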
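
As a sketch of how these entities might look in code, the dataclasses below mirror the three record types. The class names and the `key` and `root_spans` helpers are hypothetical, and the sketch assumes a span is identified by the pair (trace_id, span_id) because a trace contains many spans.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MetricPoint:
    ts: float
    name: str
    value: float
    labels: dict = field(default_factory=dict)


@dataclass
class LogRecord:
    ts: float
    level: str
    msg: str
    labels: dict = field(default_factory=dict)


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None for the root span of a trace
    ts: float                 # start time
    dur: float                # duration

    @property
    def key(self) -> tuple:
        """Assumed composite identity: one trace holds many spans."""
        return (self.trace_id, self.span_id)


def root_spans(spans: list) -> list:
    """Return spans that have no parent (the entry point of each trace)."""
    return [s for s in spans if s.parent_id is None]
```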

APIs

POST /api/metrics
POST /api/logs
POST /api/traces
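
The design does not specify request bodies. Assuming POST /api/logs accepts JSON objects shaped like the logs entity (ts, level, msg, labels), a gateway-side validation step might look like the sketch below. The `validate_log_record` function and the allowed level names are assumptions for illustration.

```python
# Assumed set of accepted levels; the design does not list them.
ALLOWED_LEVELS = {"debug", "info", "warn", "error"}


def validate_log_record(record: dict) -> list:
    """Return a list of human-readable problems; empty means the record is valid."""
    errors = []
    if not isinstance(record.get("ts"), (int, float)):
        errors.append("ts must be a number")
    if record.get("level") not in ALLOWED_LEVELS:
        errors.append("level must be one of " + ", ".join(sorted(ALLOWED_LEVELS)))
    if not isinstance(record.get("msg"), str):
        errors.append("msg must be a string")
    # labels is optional in this sketch; when present it must be an object.
    if not isinstance(record.get("labels", {}), dict):
        errors.append("labels must be an object")
    return errors
```

Rejecting malformed records at the gateway keeps bad data out of the stores and gives agents an immediate error to act on.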

Hot Path

Ingest → store
Query → aggregate → return
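
The aggregate step on the query path can be illustrated with fixed-width time bucketing. This is a minimal sketch that assumes points arrive as (ts, value) pairs and that the dashboard wants a per-bucket average; `aggregate_avg` is a hypothetical name, and a real TSDB would do this inside the storage engine.

```python
from collections import defaultdict


def aggregate_avg(points, step):
    """Average values into buckets of `step` seconds.

    points: iterable of (ts, value) pairs.
    Returns a list of (bucket_start, average) sorted by bucket_start.
    """
    sums = defaultdict(lambda: [0.0, 0])  # bucket_start -> [sum, count]
    for ts, value in points:
        bucket = int(ts // step) * step
        acc = sums[bucket]
        acc[0] += value
        acc[1] += 1
    return [(bucket, total / count) for bucket, (total, count) in sorted(sums.items())]
```

Usage: `aggregate_avg([(0, 1.0), (30, 3.0), (60, 5.0)], 60)` returns `[(0, 2.0), (60, 5.0)]`.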

Caching & TTL

Cache queries with short TTL; precompute heavy dashboards
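
A short-TTL query cache could be sketched as follows. The `TTLCache` class is hypothetical; it is single-process, does no eviction beyond expiry on read, and takes an injectable clock so that expiry can be tested.

```python
import time


class TTLCache:
    """Cache query results for a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        """Return the cached value if still fresh, else call compute() and store it."""
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]
        value = compute()
        self._entries[key] = (now + self._ttl, value)
        return value
```

Keying the cache on the normalized query text plus the time range rounded to the dashboard refresh interval would let many viewers of the same dashboard share one backend query.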

Scaling

TSDB sharding
Log index partitioning
Sampling traces
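
Sharding and trace sampling can both be driven by a stable hash. The sketch below assumes a series is identified by its name plus sorted labels and that sampling decisions are made per trace_id, so every span of a trace is kept or dropped together. The function names are hypothetical, and plain modulo sharding is used only for brevity: it remaps most series when the shard count changes, which consistent hashing would avoid.

```python
import hashlib
import json


def _hash64(text: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() varies between processes)."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:8], "big")


def series_key(name: str, labels: dict) -> str:
    """Canonical series identity: label order must not matter."""
    return name + json.dumps(labels, sort_keys=True, separators=(",", ":"))


def shard_for(name: str, labels: dict, num_shards: int) -> int:
    """Pick the TSDB shard that owns this series."""
    return _hash64(series_key(name, labels)) % num_shards


def keep_trace(trace_id: str, sample_rate: float) -> bool:
    """Deterministic head sampling: same trace_id always gets the same decision."""
    return _hash64(trace_id) < sample_rate * 2**64
```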

Trade-offs

Retention vs cost
Sampling vs visibility
Index depth vs query speed

Failure Modes & Mitigations

Ingest overload → backpressure
Hot partitions → rebalance
Alert spam → dedupe
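
Two of these mitigations are small enough to sketch. The code below assumes backpressure is implemented as a bounded in-memory buffer whose rejection the gateway turns into a retry-later response, and that alerts carry a fingerprint string used to suppress repeats within a time window. `IngestBuffer` and `AlertDeduper` are hypothetical names.

```python
import queue
import time


class IngestBuffer:
    """Bounded buffer: when full, the caller is told to back off."""

    def __init__(self, capacity: int):
        self._q = queue.Queue(maxsize=capacity)

    def offer(self, item) -> bool:
        """Return True if accepted, False if the buffer is full."""
        try:
            self._q.put_nowait(item)
            return True
        except queue.Full:
            return False


class AlertDeduper:
    """Suppress alerts with the same fingerprint inside a time window."""

    def __init__(self, window_seconds: float, clock=time.monotonic):
        self._window = window_seconds
        self._clock = clock
        self._last_sent = {}  # fingerprint -> time of last delivered alert

    def should_send(self, fingerprint: str) -> bool:
        now = self._clock()
        last = self._last_sent.get(fingerprint)
        if last is not None and now - last < self._window:
            return False
        self._last_sent[fingerprint] = now
        return True
```

When `offer` returns False, the gateway can reject the request (for example with HTTP 429) so that agents buffer locally and retry instead of overwhelming the stores.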

Observability

Ingest rate
Query latency
Alert delivery