Centralized Logging
A centralized observability system that collects metrics, logs, and traces, and alerts on them.
Requirements
Functional Requirements
- Ingest metrics/logs/traces
- Query/visualize
- Alerting
Non-functional Requirements
- Handle ingest bursts without overloading storage
- Low-latency dashboards
High-Level Design
- Agents → gateway → TSDB/index → UI
Capacity & Sizing
- Active time series (label cardinality), log lines/sec, trace spans/sec
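A rough sizing pass multiplies each of these rates out to bytes per day. The sketch below is a back-of-envelope calculator under stated assumptions: every input (series count, scrape interval, bytes per sample/log/span, trace sample rate) is a hypothetical placeholder to be replaced with measured values, and `Workload` / `daily_bytes` are illustrative names, not part of any tool.

```python
from __future__ import annotations

from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Workload:
    # All fields are hypothetical inputs; substitute measured values.
    active_series: int        # distinct metric name + label-set combinations
    scrape_interval_s: float  # seconds between samples of one series
    bytes_per_sample: float   # stored (compressed) size of one sample
    logs_per_sec: float
    bytes_per_log: float
    spans_per_sec: float
    bytes_per_span: float
    trace_sample_rate: float  # fraction of traces kept, 0..1


def daily_bytes(w: Workload) -> dict[str, float]:
    """Estimated bytes written per day, per signal type."""
    samples_per_sec = w.active_series / w.scrape_interval_s
    return {
        "metrics": samples_per_sec * w.bytes_per_sample * SECONDS_PER_DAY,
        "logs": w.logs_per_sec * w.bytes_per_log * SECONDS_PER_DAY,
        "traces": w.spans_per_sec * w.trace_sample_rate * w.bytes_per_span * SECONDS_PER_DAY,
    }


# Made-up example workload.
w = Workload(active_series=1_000_000, scrape_interval_s=10, bytes_per_sample=2,
             logs_per_sec=50_000, bytes_per_log=500,
             spans_per_sec=20_000, bytes_per_span=1_000, trace_sample_rate=0.1)
for kind, b in daily_bytes(w).items():
    print(f"{kind}: {b / 1e9:.1f} GB/day")
```

With these made-up inputs the estimate is roughly 17 GB/day of metrics, 2,160 GB/day of logs, and 173 GB/day of traces. Metrics volume scales linearly with active series, which is why label cardinality is usually the first thing to cap; multiplying each figure by the retention period gives the storage side of the retention-vs-cost trade-off.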
Key Components
- Ingest gateway, TSDB, index store, UI/alerting
Architecture
High-level components and data flow
Data Model
Core entities and relationships
- metrics (ts, name, labels_json, value)
- logs (ts, level, msg, labels_json)
- traces (trace_id, span_id, parent_id, ts, dur), one row per span; key (trace_id, span_id), since a trace has many spans
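One way to pin the data model down is as typed records. This is a sketch, not a storage schema: it assumes `ts` and `dur` are integer milliseconds, and it canonicalizes `labels_json` with sorted keys so the same label set always yields the same series identity (otherwise `{"a":"1","b":"2"}` and `{"b":"2","a":"1"}` would become two series). A span is identified by `(trace_id, span_id)`, because one trace contains many spans. All class and helper names are hypothetical.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Optional


def canonical_labels(labels: dict[str, str]) -> str:
    """Serialize labels deterministically: sorted keys, no whitespace."""
    return json.dumps(labels, sort_keys=True, separators=(",", ":"))


@dataclass(frozen=True)
class MetricPoint:
    ts: int           # epoch milliseconds (assumed unit)
    name: str
    labels_json: str  # output of canonical_labels()
    value: float

    @property
    def series_key(self) -> str:
        # Metric name + canonical labels identifies one time series.
        return self.name + self.labels_json


@dataclass(frozen=True)
class LogRecord:
    ts: int
    level: str
    msg: str
    labels_json: str


@dataclass(frozen=True)
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None for the root span
    ts: int                   # span start, epoch milliseconds (assumed)
    dur: int                  # duration in milliseconds (assumed)

    @property
    def key(self) -> tuple[str, str]:
        return (self.trace_id, self.span_id)


def children(spans: list[Span]) -> dict[Optional[str], list[str]]:
    """Group span_ids by parent_id within one trace (None = root spans)."""
    tree: dict[Optional[str], list[str]] = {}
    for s in spans:
        tree.setdefault(s.parent_id, []).append(s.span_id)
    return tree
```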
APIs
- POST /api/metrics
- POST /api/logs
- POST /api/traces
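The endpoints are listed without a payload format. As an illustration only, the sketch below assumes `POST /api/metrics` takes a JSON body of the form `{"points": [{"ts", "name", "labels", "value"}, ...]}`, caps batch size at a hypothetical `MAX_BATCH`, and rejects malformed points individually so one bad point does not fail the whole batch. The schema, limit, and function names are assumptions, not a defined contract.

```python
from __future__ import annotations

import json
import math

MAX_BATCH = 10_000  # hypothetical per-request limit on points


def _valid_value(v: object) -> bool:
    if isinstance(v, bool):  # bool is a subclass of int in Python
        return False
    if isinstance(v, int):
        return True
    return isinstance(v, float) and math.isfinite(v)  # rejects NaN/Infinity


def _valid_point(p: object) -> bool:
    if not isinstance(p, dict):
        return False
    labels = p.get("labels", {})
    return (
        isinstance(p.get("ts"), int) and not isinstance(p.get("ts"), bool)
        and isinstance(p.get("name"), str) and p["name"] != ""
        and isinstance(labels, dict)
        and all(isinstance(k, str) and isinstance(v, str) for k, v in labels.items())
        and _valid_value(p.get("value"))
    )


def parse_metrics_batch(body: bytes) -> tuple[list[dict], list[int]]:
    """Return (accepted points, indexes of rejected points).

    Raises ValueError (json.JSONDecodeError is a subclass) if the body is not
    valid JSON, has no "points" list, or exceeds MAX_BATCH.
    """
    doc = json.loads(body)
    points = doc.get("points") if isinstance(doc, dict) else None
    if not isinstance(points, list):
        raise ValueError('body must be an object with a "points" list')
    if len(points) > MAX_BATCH:
        raise ValueError(f"batch too large: {len(points)} > {MAX_BATCH}")
    accepted, rejected = [], []
    for i, p in enumerate(points):
        if _valid_point(p):
            accepted.append(p)
        else:
            rejected.append(i)
    return accepted, rejected
```

An HTTP handler built on this would map `ValueError` to a 400 and return the rejected indexes alongside a success status for partial batches; that mapping is a design choice, not something the outline specifies.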
Hot Path
- Ingest → store
- Query → aggregate → return
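The two hot paths can be illustrated with an in-memory stand-in for the TSDB. This is a toy under obvious assumptions (single process, no persistence, no compression): ingest inserts a sample into its series in timestamp order, and a query selects series by name and label matchers, then aggregates by summing sample values into fixed `step`-wide time buckets. `TinyTSDB`, `ingest`, and `query_sum` are hypothetical names; real TSDB query languages define aggregation semantics more carefully than a raw per-bucket sum.

```python
from __future__ import annotations

from bisect import bisect_left, bisect_right
from collections import defaultdict

SeriesKey = tuple  # (metric name, frozenset of (label, value) pairs)


class TinyTSDB:
    """In-memory, single-process stand-in for a TSDB (illustrative only)."""

    def __init__(self) -> None:
        # Per series: parallel lists of timestamps and values, sorted by timestamp.
        self._ts: dict[SeriesKey, list[int]] = defaultdict(list)
        self._vals: dict[SeriesKey, list[float]] = defaultdict(list)

    def ingest(self, name: str, labels: dict[str, str], ts: int, value: float) -> None:
        key = (name, frozenset(labels.items()))
        tss, vals = self._ts[key], self._vals[key]
        i = bisect_right(tss, ts)  # keeps order even if a sample arrives late
        tss.insert(i, ts)
        vals.insert(i, value)

    def query_sum(self, name: str, match: dict[str, str],
                  start: int, end: int, step: int) -> list[tuple[int, float]]:
        """Sum matching samples in [start, end) into buckets of width `step`."""
        wanted = set(match.items())
        buckets: dict[int, float] = defaultdict(float)
        for key, tss in self._ts.items():
            series_name, labels = key
            if series_name != name or not wanted <= labels:
                continue  # series not selected by the name/label matchers
            vals = self._vals[key]
            lo, hi = bisect_left(tss, start), bisect_left(tss, end)
            for t, v in zip(tss[lo:hi], vals[lo:hi]):
                buckets[start + (t - start) // step * step] += v
        return sorted(buckets.items())


db = TinyTSDB()
db.ingest("http_requests", {"svc": "api", "host": "a"}, 0, 1.0)
db.ingest("http_requests", {"svc": "api", "host": "b"}, 5, 2.0)
db.ingest("http_requests", {"svc": "api", "host": "a"}, 12, 3.0)
db.ingest("http_requests", {"svc": "web", "host": "c"}, 3, 100.0)
print(db.query_sum("http_requests", {"svc": "api"}, 0, 20, 10))  # [(0, 3.0), (10, 3.0)]
```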
Caching & TTL
- Cache query results with a short TTL; precompute heavy dashboards
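A short-TTL query cache only helps if repeated dashboard refreshes produce the same cache key. The sketch below assumes keys are built from the query text plus a time range snapped down to the query step, so two refreshes a few seconds apart hit the same entry (at the cost of omitting the newest partial step). `TTLCache` and `cache_key` are hypothetical; the clock is injectable for testing, and the sketch omits the size bound and expired-entry eviction a real cache would need.

```python
from __future__ import annotations

import time
from typing import Any, Callable, Hashable


class TTLCache:
    """Memoize query results for ttl_s seconds (no size bound; sketch only)."""

    def __init__(self, ttl_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_s = ttl_s
        self._clock = clock
        self._entries: dict[Hashable, tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit
        value = compute()  # miss or expired: run the real query
        self._entries[key] = (now + self.ttl_s, value)
        return value


def cache_key(query: str, start: int, end: int, step: int) -> tuple[str, int, int, int]:
    # Snap the range to step boundaries so refreshes within one step share a key.
    return (query, start - start % step, end - end % step, step)
```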
Scaling
- TSDB sharding
- Log index partitioning
- Sampling traces
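Each of these scaling levers reduces to a routing or keep/drop decision computed from a stable hash. A minimal sketch under these assumptions: series are sharded by a hash of the metric name plus sorted labels, the log index is partitioned by a hypothetical tenant field and an hourly time bucket, and traces are head-sampled by hashing `trace_id` so every service makes the same decision for a given trace. It uses `hashlib.blake2b` because Python's built-in `hash()` for strings is randomized per process. The partition scheme and all function names are hypothetical.

```python
from __future__ import annotations

import hashlib


def _h64(s: str) -> int:
    """Stable 64-bit hash (same result in every process, unlike built-in hash())."""
    return int.from_bytes(hashlib.blake2b(s.encode(), digest_size=8).digest(), "big")


def series_shard(name: str, labels: dict[str, str], num_shards: int) -> int:
    # Sort labels so label order in the payload cannot change the shard.
    key = name + "{" + ",".join(f"{k}={v}" for k, v in sorted(labels.items())) + "}"
    return _h64(key) % num_shards


def log_partition(tenant: str, ts_s: int, bucket_s: int = 3600) -> str:
    # Hypothetical scheme: one index partition per tenant per hour.
    return f"{tenant}/{ts_s - ts_s % bucket_s}"


def keep_trace(trace_id: str, rate: float) -> bool:
    # Head sampling: the decision depends only on trace_id, so all spans of a
    # trace are kept or dropped together. rate is the kept fraction, 0..1.
    return _h64(trace_id) < rate * 2**64
```

Plain modulo sharding remaps most series whenever `num_shards` changes, which makes rebalancing hot partitions expensive; consistent hashing or an explicit shard map are common alternatives.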
Trade-offs
- Retention vs cost
- Sampling vs visibility
- Index depth vs query speed
Failure Modes & Mitigations
- Ingest overload → backpressure
- Hot partitions → rebalance
- Alert spam → dedupe
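Two of these mitigations are simple enough to sketch. Backpressure here means the gateway places work in a bounded buffer and, when it is full, answers with HTTP 429 and a `Retry-After` header so agents slow down instead of the gateway exhausting memory. Dedupe means an alert with the same name and label set is re-notified at most once per repeat interval. Buffer capacity, the 5-second retry hint, the repeat interval, and all names are hypothetical; production alert routers typically also group related alerts, which this omits.

```python
from __future__ import annotations

import queue
from typing import Any, Callable


class IngestBuffer:
    """Bounded buffer between the gateway and storage writers."""

    def __init__(self, capacity: int, retry_after_s: int = 5) -> None:
        self._q: queue.Queue = queue.Queue(maxsize=capacity)
        self._retry_after = str(retry_after_s)

    def offer(self, batch: Any) -> tuple[int, dict[str, str]]:
        """Return an HTTP-style (status, headers) pair for one ingest request."""
        try:
            self._q.put_nowait(batch)
            return 202, {}  # accepted; storage writers drain the queue asynchronously
        except queue.Full:
            return 429, {"Retry-After": self._retry_after}  # push back on the agent


class AlertDeduper:
    """Suppress repeat notifications for the same alert within repeat_interval_s."""

    def __init__(self, repeat_interval_s: float, clock: Callable[[], float]) -> None:
        self.repeat_interval_s = repeat_interval_s
        self._clock = clock
        self._last_sent: dict[tuple, float] = {}

    def should_notify(self, alertname: str, labels: dict[str, str]) -> bool:
        key = (alertname, tuple(sorted(labels.items())))  # identity = name + label set
        now = self._clock()
        last = self._last_sent.get(key)
        if last is not None and now - last < self.repeat_interval_s:
            return False  # duplicate within the window: suppress
        self._last_sent[key] = now
        return True
```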
Observability
- Ingest rate
- Query latency
- Alert delivery
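Ingest rate and query latency are the two signals here that need a little computation. A minimal sketch, assuming the pipeline records its own events in-process: a sliding-window counter for events per second and a nearest-rank percentile for p99 query latency. Names are hypothetical, and at scale one would usually export counters and latency histograms rather than keep and sort raw samples in memory.

```python
from __future__ import annotations

import math
from collections import deque


class SlidingRate:
    """Events per second over the trailing window_s seconds."""

    def __init__(self, window_s: float) -> None:
        self.window_s = window_s
        self._events: deque[tuple[float, int]] = deque()  # (timestamp, count)
        self._total = 0

    def add(self, now: float, count: int = 1) -> None:
        self._events.append((now, count))
        self._total += count
        self._evict(now)

    def rate(self, now: float) -> float:
        self._evict(now)
        return self._total / self.window_s

    def _evict(self, now: float) -> None:
        # Drop events that have fallen out of the trailing window.
        while self._events and self._events[0][0] <= now - self.window_s:
            _, c = self._events.popleft()
            self._total -= c


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile, e.g. p=99 for p99 query latency."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]
```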