Tuning Semantic Search for Real Performance Gains

Optimizing search latency in semantic search has become part of my daily work. From everything I've run into, I've learned that a few key factors really determine how fast your semantic search can be. Most performance issues don't come from exotic optimizations, but from a small set of parameters and design choices that shape everything else downstream.

1. Choose the Right Index: HNSW vs IVF

Let's start with the index. For most small to mid-scale workloads, HNSW is the default choice.

It performs well when the dataset is within a range where graph-based search remains memory-efficient and fast, typically up to a few million vectors depending on hardware. In this regime, it offers strong latency, recall, and operational simplicity without needing clustering or training steps.

IVF becomes more relevant as scale increases, often beyond roughly 10 million vectors, or when memory usage becomes a limiting factor. It improves efficiency by grouping vectors into clusters and restricting search to a subset of them, but introduces additional tuning complexity.

If there is uncertainty, HNSW is usually the safer starting point and remains sufficient for most applications before scale becomes a constraint.
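
To make the distinction concrete, here is a minimal sketch using FAISS (assuming faiss-cpu and NumPy are installed); the dataset size and parameter values are illustrative, not recommendations:

```python
import faiss
import numpy as np

d = 384                                   # embedding dimension
xb = np.random.rand(100_000, d).astype("float32")  # stand-in corpus vectors

# HNSW: graph-based, no training step, a good default at small/mid scale
hnsw = faiss.IndexHNSWFlat(d, 32)         # 32 = M, neighbors per node
hnsw.add(xb)

# IVF: cluster-based, needs a training pass; search only visits nprobe clusters
nlist = 1024                              # number of clusters
ivf = faiss.IndexIVFFlat(faiss.IndexFlatL2(d), d, nlist)
ivf.train(xb)                             # clustering step HNSW doesn't need
ivf.add(xb)
ivf.nprobe = 16                           # clusters scanned per query
```

The extra train step and the nprobe knob are exactly the tuning complexity mentioned above: IVF gives you more levers, but you have to pull them.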

2. HNSW Parameters and Why They Dominate Search Performance

Assuming you are using HNSW, the most important tuning parameters are those that affect how the graph is explored during query time. These directly influence both search speed and result quality, and even small changes can noticeably affect production behavior.

To understand the cost structure, compare against brute force search: O(N × D), where N is the number of vectors and D is the embedding dimension.

efSearch

HNSW avoids scanning all vectors by navigating a graph, which reduces the effective cost to:

search time ∝ efSearch

This makes efSearch the main runtime control knob in HNSW, directly affecting both latency and recall.

Typical settings:

  • 20 - 50: very fast queries, lower recall
  • 50 - 150: balanced performance
  • 150 - 300+: higher recall, higher latency
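
A quick way to see where your workload sits in these ranges is to sweep efSearch and measure latency against a brute-force ground truth. A rough sketch with FAISS on synthetic data, so the absolute numbers are only directional:

```python
import time
import faiss
import numpy as np

d = 384
xb = np.random.rand(100_000, d).astype("float32")
xq = np.random.rand(100, d).astype("float32")

index = faiss.IndexHNSWFlat(d, 32)
index.add(xb)

flat = faiss.IndexFlatL2(d)               # brute force as ground truth
flat.add(xb)
_, gt = flat.search(xq, 10)

for ef in (20, 50, 150, 300):
    index.hnsw.efSearch = ef              # runtime knob, no rebuild needed
    t0 = time.perf_counter()
    _, ids = index.search(xq, 10)
    ms = (time.perf_counter() - t0) * 1000 / len(xq)
    recall = np.mean([len(set(a) & set(b)) for a, b in zip(ids, gt)]) / 10
    print(f"efSearch={ef:>3}: {ms:.2f} ms/query, recall@10 ~ {recall:.2f}")
```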

topK

topK defines how many results are returned after search. It does not affect traversal, only how many items are selected from the explored candidates.

Its usefulness depends on efSearch. If efSearch is too small, the candidate pool is weak, and increasing topK only returns more low-quality results rather than improving relevance. As a rule of thumb, keep efSearch comfortably above topK.
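
A small sketch of that interaction, again with FAISS and illustrative values:

```python
import faiss
import numpy as np

d = 384
index = faiss.IndexHNSWFlat(d, 32)
index.add(np.random.rand(10_000, d).astype("float32"))

q = np.random.rand(1, d).astype("float32")
index.hnsw.efSearch = 128            # size of the explored candidate pool
dist, ids = index.search(q, 20)      # topK = 20, sliced from that pool

# Raising topK to 100 here would not explore more of the graph; it would
# only return weaker candidates from the same efSearch=128 pool.
```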

M and efConstruction

  • M controls graph connectivity and memory usage
  • efConstruction controls index build quality

These influence how well the graph is structured during indexing. A better graph can improve recall at a given efSearch, but they do not directly affect how many nodes are visited per query, making their impact on latency indirect.
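
Both are set at build time, before vectors are added. A minimal sketch, with values that are illustrative defaults rather than tuned recommendations:

```python
import faiss
import numpy as np

d = 384
M = 32                                   # graph degree: connectivity vs memory
index = faiss.IndexHNSWFlat(d, M)
index.hnsw.efConstruction = 200          # candidate pool used while building
index.add(np.random.rand(100_000, d).astype("float32"))

# A better-built graph can hit the same recall at a lower efSearch,
# which is how these build-time settings pay off at query time.
```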

3. Embedding Dimension and Compute Cost

Embedding dimension directly affects search speed because similarity computation scales linearly with vector size.

search time ∝ embedding dimension

Since efSearch determines how many comparisons are performed, total query time scales with both factors: roughly efSearch × D per query.

Common real-world examples:

  • all-MiniLM-L6-v2: 384 dimensions
  • bge-base / e5-base: 768 dimensions
  • text-embedding-ada-002: 1536 dimensions

Higher dimensions increase per-comparison cost proportionally, making embedding size one of the most important latency drivers in practice.
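
You can see the linear scaling directly by timing raw dot products at the dimensions above; results depend on hardware and BLAS, so treat them as directional:

```python
import time
import numpy as np

n = 100_000                              # candidates compared per query
for d in (384, 768, 1536):
    xb = np.random.rand(n, d).astype("float32")
    q = np.random.rand(d).astype("float32")
    t0 = time.perf_counter()
    scores = xb @ q                      # one dot product per candidate
    ms = (time.perf_counter() - t0) * 1000
    print(f"d={d:>4}: {ms:.1f} ms for {n:,} comparisons")
```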

4. Distance Metric Cost

Common similarity metrics include cosine similarity, dot product, and L2 distance.

Cosine similarity is the most common choice for text embeddings. When vectors are normalized, it reduces to a dot product, which is computationally efficient. All three metrics have similar asymptotic cost, since each requires a full pass over the embedding dimensions.

Metric              Speed            Notes
Dot product         Fastest          Simple and highly optimized
Cosine similarity   Very fast        Equivalent to dot product if normalized
L2 distance         Slightly slower  Extra arithmetic per dimension

In practice, performance differences are minor. The main constraint is consistency with how the embedding model was trained.
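
The cosine-to-dot-product equivalence is easy to verify, and it is why many pipelines normalize embeddings once at index time:

```python
import numpy as np

a = np.random.rand(384).astype("float32")
b = np.random.rand(384).astype("float32")

cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a_n = a / np.linalg.norm(a)              # normalize once, e.g. at indexing time
b_n = b / np.linalg.norm(b)

# After normalization, the plain dot product gives the same score
assert np.isclose(cos, np.dot(a_n, b_n), atol=1e-6)
```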

5. Filtering Strategy

Most production systems combine semantic search with metadata filters such as location, category, language, or business constraints.

Filtering affects both latency and recall depending on where it is applied.

If filtering happens after retrieval, compute is wasted on irrelevant candidates. If filtering is too strict before retrieval, recall suffers.

A balanced approach is to push metadata constraints into the vector search itself where the index supports it, or to retrieve a slightly larger candidate pool and apply filters afterward, as in the sketch below.
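
One hedged sketch of the over-retrieval pattern: the 5x overfetch factor and the metadata store are illustrative stand-ins for whatever your system actually keeps alongside its vectors.

```python
import faiss
import numpy as np

d = 384
index = faiss.IndexHNSWFlat(d, 32)
index.add(np.random.rand(10_000, d).astype("float32"))

# Hypothetical metadata store keyed by vector id
metadata = {i: {"lang": "en" if i % 2 else "de"} for i in range(10_000)}

def filtered_search(q, k, pred, overfetch=5):
    # Pull a larger pool so post-filtering can still fill k slots.
    _, ids = index.search(q.reshape(1, -1), k * overfetch)
    hits = [i for i in ids[0] if i != -1 and pred(metadata[i])]
    return hits[:k]

q = np.random.rand(d).astype("float32")
results = filtered_search(q, 10, lambda m: m["lang"] == "en")
```

If the filter is very selective, even a large overfetch may not fill k slots, which is the point where index-level pre-filtering becomes worth the extra complexity.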