I needed to implement an autocomplete feature for a search field, so I looked into how these systems are usually built. Turns out the basic implementation is much simpler than I expected, and PostgreSQL alone can already handle a lot of it.
The idea is straightforward:
- store search terms
- track how often they are searched
- query terms matching the user's prefix
- add proper indexing so the queries stay fast
Storing Search Terms
For a simple autocomplete system, a single table is enough:
CREATE TABLE search_terms (
term TEXT PRIMARY KEY,
search_count INTEGER NOT NULL DEFAULT 0,
updated_at TIMESTAMP NOT NULL DEFAULT NOW()
);
Whenever a user performs a search, increment the counter:
INSERT INTO search_terms(term, search_count)
VALUES ('pizza toronto', 1)
ON CONFLICT(term)
DO UPDATE SET
search_count = search_terms.search_count + 1,
updated_at = NOW();
Example data:
| term | search_count |
|---|---|
| pizza | 1200 |
| pizza hut | 850 |
| pizza toronto | 430 |
| pickup | 300 |
Basic Autocomplete Query
When the user types: pi, the backend can query:
SELECT term, search_count
FROM search_terms
WHERE term ILIKE 'pi%'
ORDER BY search_count DESC
LIMIT 10;
This gives:
- prefix matching
- popularity ranking
- deterministic results
Trigram Indexes
The important part is indexing. Without an index, PostgreSQL must scan the entire table for every autocomplete request, which becomes expensive once the dataset grows:
WHERE term ILIKE 'pi%'
PostgreSQL solves this using the pg_trgm extension, which implements trigram indexing.
A trigram is simply a sequence of 3 consecutive characters. For example:
pizza
is internally broken into:
" pi"
"piz"
"izz"
"zza"
"za "
PostgreSQL indexes these trigram fragments instead of entire strings. This allows it to quickly narrow down candidate matches for partial searches like:
LIKEILIKE- fuzzy matching
- typo-tolerant matching
Enable trigram support:
CREATE EXTENSION pg_trgm;
Then create a GIN trigram index:
CREATE INDEX idx_search_terms_trgm
ON search_terms
USING gin (term gin_trgm_ops);
Now prefix searches become dramatically faster, even with large datasets.
Another nice benefit is typo tolerance. A query like piza can still match pizza using PostgreSQL similarity functions.