Search Service
Distributed search with indexing pipeline, shard/replica management, and low-latency queries.
Learning Objectives
By the end of this case study, you will be able to:
- Design horizontally scalable search architecture with sharding
- Implement efficient indexing pipeline with near real-time updates
- Build query routing and result merging for distributed search
- Design relevance scoring and ranking algorithms (BM25, neural)
- Implement high availability through replica management
Real-World Examples
Elasticsearch: Powers search for Netflix, Uber, and thousands of companies
Google Search: Handles 8.5 billion searches per day across trillions of pages
Amazon Product Search: Powers search across 600+ million products
Shopify Search: Enables product discovery for 4+ million businesses
Requirements
Functional Requirements
- Index documents with fields; update/delete
- Search with filters, sorting, pagination
- Reindex segments and support schema evolution
Non-functional Requirements
- Low p95 query latency under load
- High availability via replicas
- Near real-time index freshness
High-Level Design
- Write pipeline: parse/analyze → index
- Query router: route → fanout → merge
- Shard/replica management for HA and scale
Capacity & Sizing
- Document count, average size, and unique terms per document
- Posting lists size and memory for caches
- QPS for search and indexing throughput
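The sizing bullets above can be turned into a back-of-envelope estimate. A minimal sketch; the corpus figures and the ~12 bytes per compressed posting are illustrative assumptions, not recommendations:

```python
def index_size_estimate(num_docs, avg_terms_per_doc, bytes_per_posting=12):
    """Rough inverted-index footprint: one posting per (term, doc) pair,
    at an assumed ~12 bytes per compressed posting."""
    return num_docs * avg_terms_per_doc * bytes_per_posting

# Hypothetical corpus: 100M docs, ~500 unique terms each -> ~600 GB of postings
size_gb = index_size_estimate(100_000_000, 500) / 1024**3
```

Posting-list size dominates; caches and doc stores are sized on top of this.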
Key Components
- Indexer, Query Router
- Shard nodes (storage + search executors)
- Coordinator for cluster state
Architecture
High-level components and data flow
Data Model
Core entities and relationships
- documents (doc_id PK, title, body, ts)
- terms (term PK, df)
- postings (term, doc_id, tf, positions)
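A minimal in-memory sketch of the documents/terms/postings model above (whitespace tokenization only; a real analyzer comes later):

```python
from collections import defaultdict

class InvertedIndex:
    """Toy inverted index: docs store, term -> {doc_id: tf} postings."""
    def __init__(self):
        self.docs = {}                     # doc_id -> stored fields
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}

    def index(self, doc_id, title, body):
        self.docs[doc_id] = {"title": title, "body": body}
        for term in (title + " " + body).lower().split():
            self.postings[term][doc_id] = self.postings[term].get(doc_id, 0) + 1

    def df(self, term):
        """Document frequency: how many docs contain the term."""
        return len(self.postings.get(term, {}))
```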
APIs
- POST /api/index { doc }
- GET /api/search?q=...&topk=...
- POST /api/reindex/segment { segmentId }
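A client-side sketch of the search endpoint's query string; parameter names (`q`, `topk`) follow the API list above, the filter names are hypothetical:

```python
from urllib.parse import urlencode

def search_url(base, q, topk=10, filters=None):
    """Build a GET /api/search URL with optional filter params."""
    params = {"q": q, "topk": topk}
    if filters:
        params.update(filters)
    return f"{base}/api/search?{urlencode(params)}"
```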
Hot Path
- Search: route → fanout → score → merge top-k
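The merge step of the hot path can be sketched as scatter-gather over per-shard top-k lists; this assumes each shard returns `(score, doc_id)` pairs:

```python
import heapq

def merge_topk(shard_results, k):
    """Merge per-shard (score, doc_id) hit lists into a global top-k."""
    all_hits = (hit for shard in shard_results for hit in shard)
    return heapq.nlargest(k, all_hits)  # sorts by score, ties by doc_id

shard_a = [(0.9, "d1"), (0.4, "d3")]
shard_b = [(0.7, "d2"), (0.1, "d4")]
top2 = merge_topk([shard_a, shard_b], 2)  # [(0.9, 'd1'), (0.7, 'd2')]
```

Each shard only needs to return its local top-k, which bounds fanout response size.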
Caching & TTL
- Cache hot queries and partial results with short TTLs
- Warm caches for frequent filters and facets
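A short-TTL result cache for hot queries can be sketched like this (single-process, not thread-safe; the `now` parameter exists for testability):

```python
import time

class TTLCache:
    """Minimal TTL cache: entries expire ttl_seconds after insertion."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, inserted_at)

    def put(self, key, value, now=None):
        now = time.monotonic() if now is None else now
        self.store[key] = (value, now)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        entry = self.store.get(key)
        if entry and now - entry[1] < self.ttl:
            return entry[0]
        self.store.pop(key, None)  # expired or missing: evict lazily
        return None
```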
Scaling
- Shard by term range or hash; replicas for HA
- Columnar/segment storage for compression and speed
- Cache hot queries and postings lists
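For hash sharding, document-to-shard assignment can be sketched as a stable hash mod shard count (MD5 here purely for a stable, non-cryptographic routing hash):

```python
import hashlib

def shard_for(doc_id, num_shards):
    """Stable hash-based shard assignment for document sharding."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards
```

Note that plain modulo resharding moves most keys when `num_shards` changes; consistent hashing or fixed virtual shards mitigate that.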
Trade-offs
- Freshness vs query performance (near-real-time)
- BM25 vs neural rerank compute cost
- Strict consistency vs availability under failures
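To ground the BM25 side of the trade-off, the textbook per-term score (Robertson idf variant; `k1` and `b` at common defaults):

```python
import math

def bm25(tf, df, num_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """BM25 contribution of one term to one document's score."""
    idf = math.log(1 + (num_docs - df + 0.5) / (df + 0.5))
    norm = 1 - b + b * doc_len / avg_doc_len
    return idf * tf * (k1 + 1) / (tf + k1 * norm)
```

This is cheap enough to run per posting; neural reranking is typically applied only to the BM25 top-k.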
Failure Modes & Mitigations
- Shard outage → route to replicas; degrade features
- Index corruption → restore from checkpoints
- Skewed terms → adaptive query planning
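The shard-outage mitigation above amounts to replica failover in the router; a minimal sketch, where `None` signals the caller to degrade gracefully:

```python
def pick_replica(replicas, healthy):
    """Return the first healthy replica for a shard, or None if all are down."""
    for replica in replicas:
        if healthy.get(replica, False):
            return replica
    return None  # caller degrades: partial results, cached answers, etc.
```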
Implementation Notes
- Use inverted indexes with posting lists for efficient term lookup
- Implement segment-based architecture for concurrent read/write operations
- Design query parser with support for complex boolean expressions
- Use skip lists or compressed indexes for memory efficiency
- Implement real-time search with incremental index updates
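The boolean-AND building block over posting lists is a linear merge-intersection of sorted doc_id lists (skip pointers accelerate the same idea):

```python
def intersect(p1, p2):
    """Intersect two sorted posting lists of doc_ids (boolean AND)."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out
```

Evaluating the rarest term first keeps intermediate results small.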
Best Practices
- Design the schema carefully: field types significantly affect query performance
- Implement proper text analysis pipeline (tokenization, stemming, synonyms)
- Use appropriate data structures: tries for autocomplete, bloom filters for existence
- Implement query optimization and caching at multiple levels
- Design for gradual index rebuilds without service interruption
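The text analysis pipeline mentioned above can be sketched as a small chain (the stopword set and synonym map are illustrative; real analyzers also stem):

```python
import re

STOPWORDS = {"the", "a", "an", "and", "or"}
SYNONYMS = {"tv": "television"}  # hypothetical synonym map

def analyze(text):
    """Toy analysis chain: lowercase -> tokenize -> stop -> synonyms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [SYNONYMS.get(t, t) for t in tokens if t not in STOPWORDS]
```

The same chain must run at index time and query time, or terms will not match.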
Common Pitfalls
- Ignoring the term-frequency distribution: hot terms can kill performance
- Insufficient memory allocation for posting lists and caches
- Poor shard key selection leading to uneven data distribution
- Not implementing proper circuit breakers for slow queries
- Inadequate testing with production-scale data and query patterns
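The circuit-breaker pitfall above can be addressed with even a simple count-based breaker; a sketch that opens after N consecutive failures (real breakers add half-open probes and timeouts):

```python
class CircuitBreaker:
    """Opens (rejects calls) after `threshold` consecutive failures."""
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    def allow(self):
        return self.failures < self.threshold

    def record(self, success):
        self.failures = 0 if success else self.failures + 1
```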
Observability
- SLIs: p95 query latency, error rate, tail latency by shard
- Indexing lag and segment merge metrics
- Cache hit ratios and hot term dashboards
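For the p95 latency SLI, a nearest-rank percentile over raw samples (monitoring systems usually approximate this with histograms instead):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile of a non-empty sample list (p in 0..100)."""
    ranked = sorted(samples)
    idx = max(0, math.ceil(p / 100 * len(ranked)) - 1)
    return ranked[idx]
```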