After implementing a basic autocomplete system using trigram indexing in PostgreSQL, I wanted to add typo tolerance next.
I already covered the prefix-search side in Building Prefix Search Autocomplete Using Trigram Indexing, but prefix matching alone is not enough for real-world search.
For example, if a user types piza, the system should still be able to return:
pizza
pizza hut
pizza toronto
This is accomplished by implementing a typo-tolerant query.
Basic Typo-Tolerant Query
PostgreSQL provides the % operator for similarity matching.
Example:
SELECT
term,
similarity(term, 'piza') AS score
FROM search_terms
WHERE term % 'piza'
ORDER BY score DESC
LIMIT 10;
This searches for terms sufficiently similar to "piza" and ranks them by similarity score.
The similarity() function returns a value between:
0→ completely different1→ identical
Similarity Threshold
The % operator uses a similarity threshold internally. The default threshold is 0.3, but you can adjust it:
SET pg_trgm.similarity_threshold = 0.2;
Lower thresholds:
- allow fuzzier matches
- return more results
Higher thresholds:
- produce stricter matching
- reduce false positives
Usually something around 0.2 - 0.4 works reasonably well for autocomplete.
Combining Prefix and Fuzzy Matching
In practice, it usually makes sense to combine:
- prefix matching
- typo tolerance
- popularity ranking
Example:
SELECT
term,
search_count,
similarity(term, 'piza') AS score
FROM search_terms
WHERE
term ILIKE 'piza%'
OR term % 'piza'
ORDER BY
score DESC,
search_count DESC
LIMIT 10;
This allows:
- exact prefix matches first
- typo-tolerant fallback matches
- more popular searches ranked higher
Important Limitation
Similarity matching works much better once the input reaches at least 3 characters. Therefore, very short inputs like p or pi do not contain enough information for meaningful fuzzy matching.
A common approach is:
- use normal prefix matching for short inputs
- enable similarity matching after 3+ characters
That usually produces much better autocomplete behavior.