Centralized Logging

A centralized logging and observability system that collects metrics, logs, and traces, and supports querying and alerting on them.

Requirements

Functional Requirements

  • Ingest metrics/logs/traces
  • Query/visualize
  • Alerting

Non-functional Requirements

  • Handle ingest bursts
  • Low-latency dashboard queries

High-Level Design

  • Agents → gateway → TSDB/index → UI

Capacity & Sizing

  • Time series/cardinality, logs/sec, traces/sec
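
The sizing inputs above can be turned into a rough daily storage estimate. The sketch below is a back-of-the-envelope helper only; every field name and every number passed to it is a hypothetical placeholder, not a measurement from this design.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Workload:
    """Hypothetical sizing inputs; replace with measured values."""
    active_series: int       # distinct metric name + label combinations (cardinality)
    scrape_interval_s: int   # seconds between samples for each series
    bytes_per_sample: float  # assumed on-disk size of one sample
    logs_per_sec: int
    avg_log_bytes: int
    spans_per_sec: int
    avg_span_bytes: int


def daily_bytes(w: Workload) -> dict[str, float]:
    """Estimate raw bytes written per day for each signal type."""
    samples_per_day = w.active_series * SECONDS_PER_DAY / w.scrape_interval_s
    return {
        "metrics": samples_per_day * w.bytes_per_sample,
        "logs": w.logs_per_sec * SECONDS_PER_DAY * w.avg_log_bytes,
        "traces": w.spans_per_sec * SECONDS_PER_DAY * w.avg_span_bytes,
    }
```

Multiplying each figure by the retention period in days gives a first approximation of total storage, before replication and index overhead.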

Key Components

  • Ingest gateway, TSDB, Index store, UI/Alerting

Architecture

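Agents send telemetry to the ingest gateway, which writes to the TSDB and the index store; the UI and alerting components read from both.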

Data Model

Core entities and relationships

  • metrics (ts, name, labels_json, value)
  • logs (ts, level, msg, labels_json)
  • traces (trace_id, span_id, parent_id, ts, dur); one row per span, PK (trace_id, span_id)
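
As an illustration, the three entities could be modelled as plain records. This is a sketch under assumptions: the class names, the `labels` dict standing in for `labels_json`, and the `build_span_tree` helper are all hypothetical, and it assumes one row per span with `parent_id` pointing at another span in the same trace.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MetricPoint:
    ts: float
    name: str
    value: float
    labels: dict = field(default_factory=dict)  # serialized as labels_json


@dataclass
class LogRecord:
    ts: float
    level: str
    msg: str
    labels: dict = field(default_factory=dict)  # serialized as labels_json


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None for the root span of a trace
    ts: float                 # start time
    dur: float                # duration


def build_span_tree(spans: list[Span]) -> dict:
    """Map each parent_id to its child span_ids, ordered by start time."""
    children = defaultdict(list)
    for span in sorted(spans, key=lambda s: s.ts):
        children[span.parent_id].append(span.span_id)
    return dict(children)
```

The root span appears under the `None` key, so a trace view can walk the mapping from there.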

APIs

  • POST /api/metrics
  • POST /api/logs
  • POST /api/traces
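
The outline does not specify request bodies. As a sketch, a handler for POST /api/logs might validate a batch as shown below, assuming (hypothetically) that the body is a JSON array of objects carrying the `ts`, `level`, and `msg` fields from the data model, with optional `labels`.

```python
import json

# Assumed required fields and their accepted JSON types.
REQUIRED_LOG_FIELDS = {"ts": (int, float), "level": str, "msg": str}


def parse_log_batch(body: bytes) -> tuple[list[dict], list[str]]:
    """Return (accepted_records, errors) for one request body."""
    try:
        payload = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return [], ["invalid JSON"]
    if not isinstance(payload, list):
        return [], ["body must be a JSON array of log records"]

    accepted, errors = [], []
    for i, rec in enumerate(payload):
        if not isinstance(rec, dict):
            errors.append(f"record {i}: not an object")
            continue
        bad = [k for k, t in REQUIRED_LOG_FIELDS.items()
               if not isinstance(rec.get(k), t)]
        if bad:
            errors.append(f"record {i}: missing or invalid {', '.join(bad)}")
            continue
        rec.setdefault("labels", {})  # labels are optional in this sketch
        accepted.append(rec)
    return accepted, errors
```

Accepting the valid records and reporting the rejected ones separately lets a sender retry only what failed.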

Hot Path

  1. Ingest → store
  2. Query → aggregate → return
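
The two steps above can be sketched with an in-memory stand-in for the TSDB. The class and method names are hypothetical, and a real store would persist and index the data; the point is only the shape of ingest followed by a bucketed aggregate query.

```python
from collections import defaultdict


class InMemoryMetricStore:
    """Toy store: append on ingest, aggregate on query."""

    def __init__(self):
        self._points = defaultdict(list)  # metric name -> [(ts, value), ...]

    def ingest(self, name, ts, value):
        self._points[name].append((ts, value))

    def query_avg(self, name, start, end, step):
        """Average value per `step`-second bucket over [start, end)."""
        sums = defaultdict(lambda: [0.0, 0])  # bucket start -> [total, count]
        for ts, value in self._points.get(name, []):
            if start <= ts < end:
                bucket = start + ((ts - start) // step) * step
                acc = sums[bucket]
                acc[0] += value
                acc[1] += 1
        return [(b, total / n) for b, (total, n) in sorted(sums.items())]
```

Buckets with no samples are simply omitted from the result, which a dashboard would render as a gap.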

Caching & TTL

  • Cache queries with short TTL; precompute heavy dashboards
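
A short-TTL query cache could look like the sketch below. The `TTLCache` name and its interface are hypothetical; the clock is injectable so expiry can be tested without sleeping. There is no eviction of stale keys here, which a production cache would need.

```python
import time


class TTLCache:
    """Cache query results for `ttl_s` seconds, recomputing after expiry."""

    def __init__(self, ttl_s, clock=time.monotonic):
        self._ttl = ttl_s
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh
        value = compute()  # miss or expired: run the real query
        self._entries[key] = (now + self._ttl, value)
        return value
```

Precomputing a heavy dashboard then amounts to calling `get_or_compute` for its queries on a schedule, so user requests mostly hit fresh entries.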

Scaling

  • TSDB sharding
  • Log index partitioning
  • Trace sampling
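
One possible way to route series to shards and to sample traces is a stable hash, sketched below. The function names are hypothetical, and hash-modulo sharding is an assumption for illustration; the outline does not say which scheme is used.

```python
import hashlib
import json


def _stable_hash(text: str) -> int:
    """64-bit hash that is identical across processes and restarts."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def shard_for_series(name: str, labels: dict, num_shards: int) -> int:
    """Pick a shard from the metric name plus its labels in canonical order."""
    key = name + json.dumps(labels, sort_keys=True)
    return _stable_hash(key) % num_shards


def keep_trace(trace_id: str, sample_rate: float) -> bool:
    """Deterministic sampling: every span of a trace gets the same decision."""
    return _stable_hash(trace_id) < int(sample_rate * 2**64)
```

Plain modulo sharding moves most series when `num_shards` changes; consistent hashing is a common alternative when shards are added or removed often.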

Trade-offs

  • Retention vs cost
  • Sampling vs visibility
  • Index depth vs query speed

Failure Modes & Mitigations

  • Ingest overload → backpressure
  • Hot partitions → rebalance
  • Alert spam → dedupe
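
Two of these mitigations can be sketched briefly: a bounded buffer that refuses new items when full (backpressure), and a deduper that suppresses repeats of the same alert inside a time window. Both class names and the notion of an alert "fingerprint" are hypothetical.

```python
import queue
import time


class IngestBuffer:
    """Bounded buffer between the gateway and the storage writers."""

    def __init__(self, max_items):
        self._q = queue.Queue(maxsize=max_items)

    def offer(self, item) -> bool:
        """Return False when full so the caller can tell the sender to back off."""
        try:
            self._q.put_nowait(item)
            return True
        except queue.Full:
            return False

    def drain(self, max_batch):
        """Remove and return up to `max_batch` buffered items."""
        batch = []
        while len(batch) < max_batch:
            try:
                batch.append(self._q.get_nowait())
            except queue.Empty:
                break
        return batch


class AlertDeduper:
    """Send an alert at most once per `window_s` for each fingerprint."""

    def __init__(self, window_s, clock=time.monotonic):
        self._window = window_s
        self._clock = clock
        self._last_sent = {}  # fingerprint -> time of last delivery

    def should_send(self, fingerprint) -> bool:
        now = self._clock()
        last = self._last_sent.get(fingerprint)
        if last is not None and now - last < self._window:
            return False  # duplicate inside the window
        self._last_sent[fingerprint] = now
        return True
```

A gateway built this way might answer a failed `offer` with a retry-later response; how the sender is signalled is left open here.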

Observability

  • Ingest rate
  • Query latency
  • Alert delivery
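
As a sketch of how the platform could track the first two of these signals about itself, the recorder below counts ingested items and reports a nearest-rank latency percentile. The `SelfMetrics` class is hypothetical, and alert delivery tracking is not covered.

```python
import math
import time


class SelfMetrics:
    """Minimal self-monitoring: ingest rate and query latency percentiles."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._started = clock()
        self._ingested = 0
        self._query_latencies_ms = []

    def record_ingest(self, n=1):
        self._ingested += n

    def record_query(self, latency_ms):
        self._query_latencies_ms.append(latency_ms)

    def ingest_rate(self):
        """Items per second since this recorder was created."""
        elapsed = self._clock() - self._started
        return self._ingested / elapsed if elapsed > 0 else 0.0

    def query_latency_percentile(self, p):
        """Nearest-rank percentile of recorded latencies, or None if empty."""
        data = sorted(self._query_latencies_ms)
        if not data:
            return None
        rank = max(1, math.ceil(p * len(data) / 100))
        return data[rank - 1]
```

Keeping every latency in a list is fine for a sketch; a long-running service would more likely use fixed histogram buckets.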