Notification Service

Designing a multi-channel notification service (email, SMS, push, and in-app) with templates and user preferences.

API + Router + Channel workers + Providers

Requirements

  • Templates with variables; localization
  • User preferences; channel fallbacks and quiet hours
  • Deduplication and idempotency
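
The template requirement can be sketched as variable substitution plus a locale fallback chain. This is a minimal sketch using only Python's standard library; the in-memory TEMPLATES store, the "en" default locale, and the render function are assumptions made for illustration, not part of the design above.

```python
from string import Template

# Hypothetical in-memory template store keyed by (template_id, locale).
TEMPLATES = {
    ("welcome", "en"): "Hello $name, welcome to $product!",
    ("welcome", "de"): "Hallo $name, willkommen bei $product!",
}

DEFAULT_LOCALE = "en"  # assumed fallback locale


def render(template_id: str, locale: str, data: dict) -> str:
    """Render a template, falling back from e.g. 'de-AT' to 'de' to the default."""
    candidates = [locale, locale.split("-")[0], DEFAULT_LOCALE]
    for candidate in candidates:
        body = TEMPLATES.get((template_id, candidate))
        if body is not None:
            # substitute() raises KeyError on a missing variable, so a broken
            # payload fails before anything is sent.
            return Template(body).substitute(data)
    raise LookupError(f"no template {template_id!r} for locale {locale!r}")
```

Failing on a missing variable is a deliberate choice in this sketch: a half-rendered message reaching a user is usually worse than a rejected enqueue.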

High-Level Design

  • API to enqueue notifications; producers publish through a transactional outbox so no event is lost between a database commit and the queue
  • Router selects channels based on preferences and urgency
  • Per-channel workers that retry provider calls and move exhausted deliveries to a dead-letter queue (DLQ)
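
The outbox pattern on the producer side might look like the sketch below. It uses an in-memory SQLite database for brevity; the orders table, the order_placed template ID, and the place_order and relay_once functions are hypothetical names chosen for the example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id TEXT);
    CREATE TABLE outbox (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        payload TEXT NOT NULL,
        published INTEGER NOT NULL DEFAULT 0
    );
""")


def place_order(user_id: str) -> int:
    """Producer side: the business row and the outbox row commit atomically."""
    with conn:  # one transaction; both inserts roll back together on error
        cur = conn.execute("INSERT INTO orders (user_id) VALUES (?)", (user_id,))
        order_id = cur.lastrowid
        payload = {
            "userId": user_id,
            "templateId": "order_placed",  # hypothetical template ID
            "data": {"orderId": order_id},
        }
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(payload),))
    return order_id


def relay_once(publish) -> int:
    """Outbox processor: publish pending rows in order, then mark them published."""
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE published = 0 ORDER BY id"
    ).fetchall()
    for row_id, payload in rows:
        publish(json.loads(payload))  # if this raises, the row stays pending
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE id = ?", (row_id,))
    return len(rows)
```

A crash between publish and the UPDATE republishes the row on the next pass, so this sketch gives at-least-once delivery and relies on downstream deduplication.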

Capacity & Sizing

  • Peak notifications per second, including burst tolerance; per-channel throughput limits
  • Provider quotas and rate limits per account/region
  • DLQ sizing and reprocessing policies
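
Two back-of-the-envelope calculations help with the sizing items above: how many workers a peak rate needs (Little's law) and how long a burst backlog takes to drain under a provider quota. The function names and every number in the usage note are illustrative assumptions, not measurements.

```python
import math


def workers_needed(peak_per_sec: float, avg_send_seconds: float,
                   concurrency_per_worker: int) -> int:
    """Little's law: in-flight sends = arrival rate x time per send."""
    in_flight = peak_per_sec * avg_send_seconds
    return math.ceil(in_flight / concurrency_per_worker)


def drain_seconds(backlog: int, provider_limit_per_sec: float,
                  steady_per_sec: float) -> float:
    """Time to clear a backlog using the headroom left under the provider quota."""
    headroom = provider_limit_per_sec - steady_per_sec
    if headroom <= 0:
        raise ValueError("quota does not exceed steady-state load; backlog never drains")
    return backlog / headroom
```

For example, workers_needed(2000, 0.25, 50) returns 10, and drain_seconds(60_000, 500, 300) returns 300.0, meaning a 60,000-message burst needs five minutes of headroom at those assumed rates.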

Key Components

  • Enqueue API and outbox processors
  • Router with preference evaluation and channel selection
  • Channel workers with provider adapters

Data Model

Templates, notifications, preferences, and deliveries

  • templates (template_id PK, name, locale, body)
  • notifications (notif_id PK, user_id, template_id, status, created_at)
  • preferences (user_id PK, email, sms, push, quiet_hours)
  • deliveries (delivery_id PK, notif_id, channel, provider, status, attempts, last_error)
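
One way to make the four tables concrete is the SQLite schema below, created from Python. The column names come from the list above; the column types, defaults, foreign keys, the quiet_hours encoding, and the index are assumptions added for the sketch.

```python
import sqlite3

SCHEMA = """
CREATE TABLE templates (
    template_id TEXT PRIMARY KEY,
    name        TEXT NOT NULL,
    locale      TEXT NOT NULL,
    body        TEXT NOT NULL
);
CREATE TABLE notifications (
    notif_id    TEXT PRIMARY KEY,
    user_id     TEXT NOT NULL,
    template_id TEXT NOT NULL REFERENCES templates(template_id),
    status      TEXT NOT NULL DEFAULT 'queued',
    created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE preferences (
    user_id     TEXT PRIMARY KEY,
    email       INTEGER NOT NULL DEFAULT 1,
    sms         INTEGER NOT NULL DEFAULT 0,
    push        INTEGER NOT NULL DEFAULT 1,
    quiet_hours TEXT              -- assumed encoding, e.g. '22:00-07:00'
);
CREATE TABLE deliveries (
    delivery_id TEXT PRIMARY KEY,
    notif_id    TEXT NOT NULL REFERENCES notifications(notif_id),
    channel     TEXT NOT NULL,
    provider    TEXT,
    status      TEXT NOT NULL DEFAULT 'pending',
    attempts    INTEGER NOT NULL DEFAULT 0,
    last_error  TEXT
);
-- Assumed index: the status endpoint looks deliveries up by notification.
CREATE INDEX deliveries_by_notif ON deliveries (notif_id);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create all four tables and the lookup index on the given connection."""
    conn.executescript(SCHEMA)
```

One notification row fans out into one deliveries row per selected channel, which is why attempts and last_error live on deliveries rather than on notifications.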

APIs

  • Enqueue: POST /api/notifications with body { userId, templateId, data }
  • Preferences: PUT /api/preferences/:userId with body { email, sms, push, quietHours }
  • Status: GET /api/notifications/:id
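
A framework-free sketch of the enqueue handler shows the validation the request body needs. The handle_enqueue function, the 400 and 202 status codes, and the response shape are assumptions; the design above only specifies the route and the body fields.

```python
import uuid


def handle_enqueue(body: dict) -> tuple:
    """Validate a POST /api/notifications body and return (status_code, response)."""
    errors = []
    for field in ("userId", "templateId"):
        value = body.get(field)
        if not isinstance(value, str) or not value:
            errors.append(f"{field} must be a non-empty string")
    if not isinstance(body.get("data", {}), dict):
        errors.append("data must be an object")
    if errors:
        return 400, {"errors": errors}
    notif_id = str(uuid.uuid4())
    # Persisting and publishing would happen here. 202 (assumed) signals that
    # delivery is asynchronous and can be polled via GET /api/notifications/:id.
    return 202, {"id": notif_id, "status": "queued"}
```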

Hot Path

  1. Enqueue → persist → publish → route channels
  2. Deliver via provider → update delivery status

Delivery Flow

  1. Enqueue notification; persist and publish event
  2. Router expands into channel-specific deliveries
  3. Workers call providers with retries/backoff; DLQ on failure
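
Step 3 can be sketched as a retry loop with capped exponential backoff and jitter, falling through to a dead-letter list when attempts run out. The PermanentError class, the default limits, and the dict-based delivery record are hypothetical; a real worker would persist these fields to the deliveries table.

```python
import random
import time


class PermanentError(Exception):
    """Hypothetical marker for failures not worth retrying (e.g. invalid address)."""


def deliver_with_retries(send, delivery, dead_letters, max_attempts=4,
                         base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Try send(delivery) up to max_attempts times; dead-letter on final failure."""
    for attempt in range(1, max_attempts + 1):
        delivery["attempts"] = attempt
        try:
            send(delivery)
            delivery["status"] = "sent"
            return True
        except PermanentError as exc:
            delivery["last_error"] = str(exc)
            break  # retrying cannot help; go straight to the DLQ
        except Exception as exc:
            delivery["last_error"] = str(exc)
            if attempt < max_attempts:
                # Full jitter: sleep a random time up to the capped exponential delay.
                cap = min(max_delay, base_delay * 2 ** (attempt - 1))
                sleep(random.uniform(0, cap))
    delivery["status"] = "dead_lettered"
    dead_letters.append(delivery)
    return False
```

The jitter spreads retries out so that many workers recovering from the same provider outage do not hit the provider again at the same instant.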

Scaling

  • Partition queues by channel; autoscale workers
  • Use provider pools and rate limits; fallback providers
  • Batch sends where the provider supports it (e.g. email)
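
Per-provider rate limiting is commonly done with a token bucket; the sketch below is one minimal, single-process version. The TokenBucket class and its parameters are illustrative, and a fleet of workers sharing one provider quota would need a shared counter instead of local state.

```python
import time


class TokenBucket:
    """Allow `rate_per_sec` sends on average, with bursts of up to `burst`."""

    def __init__(self, rate_per_sec: float, burst: int, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)
        self.clock = clock
        self.updated = clock()

    def try_acquire(self, n: int = 1) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False
```

A worker that gets False from try_acquire can requeue the delivery with a short delay or switch to a fallback provider that still has headroom.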

Caching & TTL

  • Cache compiled templates per locale with TTL
  • Short TTL cache for user preferences; invalidate on update
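
Both caches above follow the same read-through shape: serve a fresh entry, reload on expiry, and drop the entry when the underlying record changes. The TTLCache class below is a minimal single-process sketch with assumed names; it has no size bound or locking.

```python
import time


class TTLCache:
    """Read-through cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get(self, key, loader):
        entry = self._entries.get(key)
        now = self.clock()
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader(key)  # cache miss or expired: reload from the source
        self._entries[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key) -> None:
        """Call after a preferences update so the next read sees the new value."""
        self._entries.pop(key, None)
```

For preferences, the PUT handler would call invalidate(user_id) after the write; the short TTL then only bounds staleness on other processes that did not see the update.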

Failure Modes & Mitigations

  • Provider outage → route to a fallback provider; DLQ deliveries that still fail
  • Spam complaints and blocks → process provider feedback loops and maintain suppression lists
  • Duplicate sends → idempotency keys per notification and user
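
Duplicate suppression can be sketched as a claim on a deterministic key before sending. Adding the channel to the key is an assumption made here because one notification fans out into several channel deliveries; DedupStore is an in-memory stand-in for a shared store with an atomic set-if-absent operation, and the key format assumes IDs contain no ":" characters.

```python
import hashlib


def idempotency_key(notif_id: str, user_id: str, channel: str) -> str:
    """Deterministic key: the same delivery always hashes to the same value."""
    raw = f"{notif_id}:{user_id}:{channel}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class DedupStore:
    """In-memory stand-in for a shared store with atomic set-if-absent."""

    def __init__(self):
        self._seen = set()

    def claim(self, key: str) -> bool:
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


def send_once(store: DedupStore, notif_id: str, user_id: str,
              channel: str, send) -> bool:
    """Call send() only if this delivery has not been claimed before."""
    if not store.claim(idempotency_key(notif_id, user_id, channel)):
        return False  # duplicate: another attempt already claimed this delivery
    send()
    return True
```

Because the claim happens before the send, a failed send stays claimed in this sketch; a retrying worker would either release the claim on failure or key retries on the delivery record instead.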

Observability

  • Delivery rates, failure reasons, and time-to-send
  • End-to-end tracing across router and workers