Web Crawler
High-level architecture and design considerations for a Web Crawler.
Requirements
Functional Requirements
- Core CRUD operations plus list and search endpoints
Non-functional Requirements
- Availability and latency targets (e.g., uptime SLO, p99 latency)
High-Level Design
- Client → API → DB
- Cache hot reads
Capacity & Sizing
- Requests per second, stored data size, growth rate
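The sizing bullets above can be worked through as back-of-envelope arithmetic. The input numbers below (daily volume, peak ratio, row size, write fraction) are illustrative assumptions, not figures from this document:

```python
# Back-of-envelope capacity sizing with illustrative (assumed) inputs.
requests_per_day = 100_000_000           # assumed daily request volume
seconds_per_day = 86_400
avg_rps = requests_per_day / seconds_per_day
peak_rps = avg_rps * 3                   # assume a 3x peak-to-average ratio

row_bytes = 1_000                        # assumed average row size (id + ts + data_json)
writes_per_day = requests_per_day * 0.1  # assume 10% of traffic is writes
daily_growth_bytes = writes_per_day * row_bytes
yearly_growth_tb = daily_growth_bytes * 365 / 1e12

print(f"avg {avg_rps:.0f} req/s, peak {peak_rps:.0f} req/s")
print(f"~{yearly_growth_tb:.2f} TB/year of new data")
```

Swapping in real traffic numbers is the point of the exercise; the structure (average vs. peak, write fraction times row size) stays the same.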
Key Components
- API, DB, Cache
Architecture
High-level components and data flow
Data Model
Core entities and relationships
- entities (id PK, ts, data_json)
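A minimal sketch of the entities table from the data model above, using SQLite for illustration; the concrete column types are assumptions since the outline only names the columns:

```python
import sqlite3

# In-memory SQLite sketch of the entities table (assumed column types).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE entities (
        id        TEXT PRIMARY KEY,   -- entity identifier (PK)
        ts        INTEGER NOT NULL,   -- timestamp (epoch seconds)
        data_json TEXT NOT NULL       -- opaque JSON payload
    )
""")
conn.execute(
    "INSERT INTO entities (id, ts, data_json) VALUES (?, ?, ?)",
    ("e1", 1700000000, '{"name": "example"}'),
)
row = conn.execute("SELECT id, ts FROM entities WHERE id = ?", ("e1",)).fetchone()
print(row)
```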
APIs
- GET /api/entities
- POST /api/entities
- GET /api/entities/:id
Hot Path
- Create → read → update
Caching & TTL
- Cache hot reads with short TTLs (seconds); support conditional requests (ETag / If-None-Match)
Scaling
- Shard by id
- Cache hot reads
- Async writes
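Shard-by-id, the first scaling bullet above, usually means routing each entity to a shard via a stable hash of its id. The shard count and hashing choice here are assumptions for illustration:

```python
import hashlib

# Shard-by-id sketch: a stable hash of the entity id picks one of N shards.
NUM_SHARDS = 4   # assumed shard count

def shard_for(entity_id: str) -> int:
    """Map an entity id deterministically to a shard index."""
    digest = hashlib.sha256(entity_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

# The same id always routes to the same shard.
print({eid: shard_for(eid) for eid in ("e1", "e2", "e3")})
```

Plain modulo reshuffles most keys when NUM_SHARDS changes; consistent hashing is the usual refinement if resharding must move less data.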
Trade-offs
- Consistency vs availability
- Cost vs latency
- Simplicity vs features
Failure Modes & Mitigations
- DB outage → degrade to cached/stale reads
- Hot keys → rate limit
- Network partitions → retries
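The retry mitigation for network partitions is typically bounded retries with exponential backoff and jitter, so clients don't hammer a recovering dependency in lockstep. The retry budget, delays, and the flaky call below are assumptions for illustration:

```python
import random
import time

def with_retries(call, max_attempts: int = 4, base_delay: float = 0.05):
    """Retry a transiently failing call with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return call()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # retry budget exhausted: surface the error
            # Exponential backoff with full jitter to avoid thundering herds.
            time.sleep(base_delay * (2 ** attempt) * random.random())

# Simulated dependency that fails twice, then recovers (assumption).
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("partition")
    return "ok"

result = with_retries(flaky)
print(result, "after", attempts["n"], "attempts")
```

Only transient errors should be retried; permanent failures (e.g., validation errors) should fail fast.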
Observability
- Error rate
- Latency
- Capacity
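The first two signals above reduce to simple arithmetic over request samples: error rate is failed requests over total, and latency is usually reported as percentiles rather than averages. The sample data and nearest-rank percentile below are illustrative assumptions:

```python
# Synthetic (assumed) request samples: (latency_ms, ok).
samples = [
    (12, True), (15, True), (230, False), (18, True), (9, True),
    (14, True), (500, False), (11, True), (16, True), (13, True),
]

error_rate = sum(1 for _, ok in samples if not ok) / len(samples)

def percentile(values, p):
    """Nearest-rank percentile over a small in-memory sample."""
    ordered = sorted(values)
    idx = min(len(ordered) - 1, round(p / 100 * (len(ordered) - 1)))
    return ordered[idx]

latencies = [ms for ms, _ in samples]
print(f"error rate: {error_rate:.0%}")
print(f"p50: {percentile(latencies, 50)} ms, p99: {percentile(latencies, 99)} ms")
```

In production these come from a metrics pipeline (histograms, not raw samples), but the definitions are the same; capacity is tracked as utilization against the sized limits.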