Search Service
Distributed search with indexing pipeline, shard/replica management, and low-latency queries.
Learning Objectives
By the end of this case study, you will be able to:
- Design horizontally scalable search architecture with sharding
- Implement efficient indexing pipeline with near real-time updates
- Build query routing and result merging for distributed search
- Design relevance scoring and ranking algorithms (BM25, neural)
- Implement high availability through replica management
Real-World Examples
Elasticsearch: Powers search for Netflix, Uber, and thousands of companies
Google Search: Handles 8.5 billion searches per day across trillions of pages
Amazon Product Search: Powers search across 600+ million products
Shopify Search: Enables product discovery for 4+ million businesses
Requirements
Functional Requirements
- Index documents with fields; update/delete
- Search with filters, sorting, pagination
- Reindex segments and support schema evolution
Non-functional Requirements
- Low p95 query latency under load
- High availability via replicas
- Near real-time index freshness
High-Level Design
- Write pipeline: parse/analyze → index
- Query router: route → fanout → merge
- Shard/replica management for HA and scale
Capacity & Sizing
- Document count, average size, and unique terms per document
- Posting lists size and memory for caches
- QPS for search and indexing throughput
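The sizing bullets above can be turned into a back-of-envelope estimate. A minimal sketch; the corpus figures and the ~12 bytes per compressed posting are illustrative assumptions, not recommendations:

```python
def index_size_estimate(num_docs, avg_terms_per_doc, bytes_per_posting=12):
    """Rough inverted-index footprint: one posting per (term, doc) pair,
    at an assumed ~12 bytes per compressed posting."""
    return num_docs * avg_terms_per_doc * bytes_per_posting

# Hypothetical corpus: 100M docs, ~500 unique terms each -> ~600 GB of postings
size_gb = index_size_estimate(100_000_000, 500) / 1024**3
```

Posting-list size dominates; caches and doc stores are sized on top of this.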
Key Components
- Indexer, Query Router
- Shard nodes (storage + search executors)
- Coordinator for cluster state
Architecture
High-level components and data flow
Data Model
Core entities and relationships
- documents (doc_id PK, title, body, ts)
- terms (term PK, df)
- postings (term, doc_id, tf, positions)
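A minimal in-memory sketch of the documents/terms/postings model above (whitespace tokenization only; a real analyzer comes later):

```python
from collections import defaultdict

class InvertedIndex:
    """Toy inverted index: docs store, term -> {doc_id: tf} postings."""
    def __init__(self):
        self.docs = {}                     # doc_id -> stored fields
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}

    def index(self, doc_id, title, body):
        self.docs[doc_id] = {"title": title, "body": body}
        for term in (title + " " + body).lower().split():
            self.postings[term][doc_id] = self.postings[term].get(doc_id, 0) + 1

    def df(self, term):
        """Document frequency: how many docs contain the term."""
        return len(self.postings.get(term, {}))
```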
APIs
- POST /api/index { doc }
- GET /api/search?q=...&topk=...
- POST /api/reindex/segment { segmentId }
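A client-side sketch of the search endpoint's query string; parameter names (`q`, `topk`) follow the API list above, the filter names are hypothetical:

```python
from urllib.parse import urlencode

def search_url(base, q, topk=10, filters=None):
    """Build a GET /api/search URL with optional filter params."""
    params = {"q": q, "topk": topk}
    if filters:
        params.update(filters)
    return f"{base}/api/search?{urlencode(params)}"
```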
Hot Path
- Search: route → fanout → score → merge top-k
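The merge step of the hot path can be sketched as scatter-gather over per-shard top-k lists; this assumes each shard returns `(score, doc_id)` pairs:

```python
import heapq

def merge_topk(shard_results, k):
    """Merge per-shard (score, doc_id) hit lists into a global top-k."""
    all_hits = (hit for shard in shard_results for hit in shard)
    return heapq.nlargest(k, all_hits)  # sorts by score, ties by doc_id

shard_a = [(0.9, "d1"), (0.4, "d3")]
shard_b = [(0.7, "d2"), (0.1, "d4")]
top2 = merge_topk([shard_a, shard_b], 2)  # [(0.9, 'd1'), (0.7, 'd2')]
```

Each shard only needs to return its local top-k, which bounds fanout response size.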
Caching & TTL
- Cache hot queries and partial results with short TTLs
- Warm caches for frequent filters and facets
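A short-TTL result cache for hot queries can be sketched like this (single-process, not thread-safe; the `now` parameter exists for testability):

```python
import time

class TTLCache:
    """Minimal TTL cache: entries expire ttl_seconds after insertion."""
    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # key -> (value, inserted_at)

    def put(self, key, value, now=None):
        now = time.monotonic() if now is None else now
        self.store[key] = (value, now)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        entry = self.store.get(key)
        if entry and now - entry[1] < self.ttl:
            return entry[0]
        self.store.pop(key, None)  # expired or missing: evict lazily
        return None
```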
Scaling
- Shard by term range or hash; replicas for HA
- Columnar/segment storage for compression and speed
- Cache hot queries and postings lists
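For hash sharding, document-to-shard assignment can be sketched as a stable hash mod shard count (MD5 here purely for a stable, non-cryptographic routing hash):

```python
import hashlib

def shard_for(doc_id, num_shards):
    """Stable hash-based shard assignment for document sharding."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards
```

Note that plain modulo resharding moves most keys when `num_shards` changes; consistent hashing or fixed virtual shards mitigate that.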
Trade-offs
- Freshness vs query performance (near-real-time)
- BM25 vs neural rerank compute cost
- Strict consistency vs availability under failures
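To ground the BM25 side of the trade-off, the textbook per-term score (Robertson idf variant; `k1` and `b` at common defaults):

```python
import math

def bm25(tf, df, num_docs, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """BM25 contribution of one term to one document's score."""
    idf = math.log(1 + (num_docs - df + 0.5) / (df + 0.5))
    norm = 1 - b + b * doc_len / avg_doc_len
    return idf * tf * (k1 + 1) / (tf + k1 * norm)
```

This is cheap enough to run per posting; neural reranking is typically applied only to the BM25 top-k.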
Failure Modes & Mitigations
- Shard outage → route to replicas; degrade features
- Index corruption → restore from checkpoints
- Skewed terms → adaptive query planning
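The shard-outage mitigation above amounts to replica failover in the router; a minimal sketch, where `None` signals the caller to degrade gracefully:

```python
def pick_replica(replicas, healthy):
    """Return the first healthy replica for a shard, or None if all are down."""
    for replica in replicas:
        if healthy.get(replica, False):
            return replica
    return None  # caller degrades: partial results, cached answers, etc.
```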
Implementation Notes
- Use inverted indexes with posting lists for efficient term lookup
- Implement segment-based architecture for concurrent read/write operations
- Design query parser with support for complex boolean expressions
- Use skip lists or compressed indexes for memory efficiency
- Implement real-time search with incremental index updates
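The boolean-AND building block over posting lists is a linear merge-intersection of sorted doc_id lists (skip pointers accelerate the same idea):

```python
def intersect(p1, p2):
    """Intersect two sorted posting lists of doc_ids (boolean AND)."""
    i = j = 0
    out = []
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            out.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1
        else:
            j += 1
    return out
```

Evaluating the rarest term first keeps intermediate results small.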
Best Practices
- Design the schema carefully: field types significantly affect query performance
- Implement proper text analysis pipeline (tokenization, stemming, synonyms)
- Use appropriate data structures: tries for autocomplete, bloom filters for existence
- Implement query optimization and caching at multiple levels
- Design for gradual index rebuilds without service interruption
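The text analysis pipeline mentioned above can be sketched as a small chain (the stopword set and synonym map are illustrative; real analyzers also stem):

```python
import re

STOPWORDS = {"the", "a", "an", "and", "or"}
SYNONYMS = {"tv": "television"}  # hypothetical synonym map

def analyze(text):
    """Toy analysis chain: lowercase -> tokenize -> stop -> synonyms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [SYNONYMS.get(t, t) for t in tokens if t not in STOPWORDS]
```

The same chain must run at index time and query time, or terms will not match.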
Common Pitfalls
- Ignoring the term-frequency distribution: hot terms can kill performance
- Insufficient memory allocation for posting lists and caches
- Poor shard key selection leading to uneven data distribution
- Not implementing proper circuit breakers for slow queries
- Inadequate testing with production-scale data and query patterns
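The circuit-breaker pitfall above can be addressed with even a simple count-based breaker; a sketch that opens after N consecutive failures (real breakers add half-open probes and timeouts):

```python
class CircuitBreaker:
    """Opens (rejects calls) after `threshold` consecutive failures."""
    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    def allow(self):
        return self.failures < self.threshold

    def record(self, success):
        self.failures = 0 if success else self.failures + 1
```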
Observability
- SLIs: p95 query latency, error rate, tail latency by shard
- Indexing lag and segment merge metrics
- Cache hit ratios and hot term dashboards
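For the p95 latency SLI, a nearest-rank percentile over raw samples (monitoring systems usually approximate this with histograms instead):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile of a non-empty sample list (p in 0..100)."""
    ranked = sorted(samples)
    idx = max(0, math.ceil(p / 100 * len(ranked)) - 1)
    return ranked[idx]
```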