Choosing a vector database for scale (EXPLAINED)

Company-Based · Vector Databases · Hard · 25 min read

500M vectors at sub-100ms p99 is a staff-level vector search design question from Uber, Airbnb, and large-scale ML platform teams. Learn sharding strategies, index tuning, and the operational trade-offs that separate senior from principal engineers.

TL;DR — Quick Answer

Shard the vector index, cache query embeddings, tier storage, run ANN search as a dedicated service, and tune the index aggressively with monitoring in place.

The Interview Question

You need to store 500M vectors with <100ms p99 latency. How do you architect the vector search layer?
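A quick sizing estimate helps frame the question before choosing an architecture. The sketch below is back-of-envelope arithmetic only: the embedding dimension (768), the 64 GiB of usable RAM per node, and the compressed size of 96 bytes per vector are assumptions made for illustration, since the question states none of them. It also ignores index overhead such as graph links and codebooks, so real footprints will be larger.

```python
import math


def raw_bytes(num_vectors: int, dim: int, bytes_per_component: int = 4) -> int:
    """Bytes needed to hold uncompressed vectors (float32 by default)."""
    return num_vectors * dim * bytes_per_component


def compressed_bytes(num_vectors: int, bytes_per_vector: int) -> int:
    """Bytes needed when each vector is stored as a fixed-size compressed code."""
    return num_vectors * bytes_per_vector


def nodes_needed(total_bytes: int, usable_ram_per_node: int) -> int:
    """Minimum node count if the whole data set must fit in RAM."""
    return math.ceil(total_bytes / usable_ram_per_node)


NUM_VECTORS = 500_000_000      # from the question
DIM = 768                      # assumption: embedding dimension
CODE_BYTES = 96                # assumption: compressed code size per vector
NODE_RAM = 64 * 1024**3        # assumption: 64 GiB usable RAM per node

raw = raw_bytes(NUM_VECTORS, DIM)
compressed = compressed_bytes(NUM_VECTORS, CODE_BYTES)

print(f"raw float32:  {raw / 1e12:.3f} TB, nodes: {nodes_needed(raw, NODE_RAM)}")
print(f"compressed:   {compressed / 1e9:.1f} GB, nodes: {nodes_needed(compressed, NODE_RAM)}")
```

Under these assumptions the uncompressed vectors alone occupy about 1.5 TB, which is why the answer has to involve sharding, compression, or both.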

Deep Explanation

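At 500M vectors, partition the index by tenant or use case. Either use a managed vector DB with auto-scaling (Pinecone, or Milvus via Zilliz Cloud) or run a self-hosted Milvus or Qdrant cluster. Then optimize along five axes: tune the HNSW search-time parameter (ef_search) to trade recall against latency, apply product quantization to cut memory, pre-filter on metadata to shrink the candidate set, cache query embeddings so repeated queries skip the embedding model, and add read replicas to absorb query load.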
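Once the index is partitioned, a query that spans several shards has to fan out to each one and merge the partial results. The following is a minimal sketch of that scatter-gather step, not any particular database's implementation. `InMemoryShard` is a hypothetical stand-in that does exact squared-L2 search over a small dictionary; in a real deployment each shard would be an ANN index (for example HNSW) behind a network call.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence, Tuple

Hit = Tuple[float, str]  # (squared L2 distance, vector id); smaller is better


class InMemoryShard:
    """Hypothetical shard: exact search over a dict, standing in for an ANN index."""

    def __init__(self, vectors: Dict[str, Sequence[float]]):
        self.vectors = vectors

    def search(self, query: Sequence[float], k: int) -> List[Hit]:
        scored = [
            (sum((q - x) ** 2 for q, x in zip(query, vec)), vec_id)
            for vec_id, vec in self.vectors.items()
        ]
        return heapq.nsmallest(k, scored)


def scatter_gather(shards: List[InMemoryShard], query: Sequence[float], k: int) -> List[Hit]:
    """Query every shard in parallel, then merge the per-shard top-k lists."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda shard: shard.search(query, k), shards))
    # Each shard returns at most k hits, so the merge sees at most k * len(shards) items.
    return heapq.nsmallest(k, (hit for partial in partials for hit in partial))


shards = [
    InMemoryShard({"a1": (0.0, 0.0), "a2": (5.0, 5.0)}),
    InMemoryShard({"b1": (1.0, 0.0), "b2": (9.0, 9.0)}),
]
top2 = scatter_gather(shards, query=(0.0, 0.0), k=2)
```

Because the fan-out waits for every shard, end-to-end latency is bounded by the slowest one, so the p99 target effectively applies per shard. Partitioning by tenant avoids the fan-out entirely when a query only needs one tenant's shard.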

Scale · ANN · Architecture · Uber · Airbnb
