Evaluate Index Strategies Beyond Big-O Complexity


I was recently doing some query optimization, trying to minimize search latency in a backend system. Naturally, that led me down the rabbit hole of indexing strategies.

I already knew that O(1) lookups are possible. Hash-based structures can achieve constant-time access, and databases do support hash indexes. But their limitations are pretty clear. They only handle exact matches and do not work for range queries or ordering, so they did not feel like a general solution.
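The limitation is easy to see in miniature. Here is a toy sketch (hypothetical data, just for illustration) contrasting a hash table's exact-match lookups with the range query an ordered structure answers naturally:

```python
import bisect

# Hash-based index: O(1) average-case exact-match lookups.
index = {"alice": 1, "bob": 2, "carol": 3}
assert index["bob"] == 2  # exact match works

# A range query ("all keys in ['a', 'c')") forces a full scan,
# because a hash table keeps no key order.
in_range = sorted(k for k in index if "a" <= k < "c")
assert in_range == ["alice", "bob"]

# An ordered structure (a sorted list with binary search here)
# answers the same range query in O(log n + k).
keys = sorted(index)
lo = bisect.bisect_left(keys, "a")
hi = bisect.bisect_left(keys, "c")
assert keys[lo:hi] == ["alice", "bob"]
```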

That got me curious about something else.
Are there other indexing strategies that go beyond O(log n), but are still practical?

Faster Than O(log n)

There are indeed data structures with better asymptotic complexity.

One example is the van Emde Boas tree, which supports fast integer-key operations like predecessor and successor queries over a bounded integer universe of size u in O(log log u) time. Another is Union-Find with path compression, used to efficiently track connected components in dynamic connectivity problems. Its operations run in amortized O(α(n)), where α(n) is the inverse Ackermann function, which grows so slowly that it is effectively constant for any realistic input size.
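Union-Find is compact enough to sketch in full. This is a standard implementation (not tied to any particular library): find() flattens the paths it walks, and that compression is what drives the near-constant amortized bound.

```python
# Union-Find with path compression and union by size.
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        # First pass: locate the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Second pass: point every node on the path at the root.
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]

uf = UnionFind(5)
uf.union(0, 1)
uf.union(1, 2)
assert uf.find(0) == uf.find(2)   # 0, 1, 2 are connected
assert uf.find(3) != uf.find(0)   # 3 is still separate
```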

They look strictly better than O(log n), but they come with trade-offs. Large constant factors, complex implementations, and poor cache and disk behavior make them impractical for real databases. Most assume everything fits in memory, which is rarely the case.

What Databases Actually Use

While these theoretical improvements are interesting, they matter less in real systems. Big-O is rarely the main bottleneck in databases. Performance is driven more by how data is physically accessed than by index complexity.

Disk I/O, memory access, cache locality, and serialization often dominate. A structure with a better asymptotic bound, such as O(log log n), can still be slower in practice than O(log n) if it produces worse access patterns. This is why faster-than-O(log n) data structures rarely make it into production systems: their hardware-level costs usually cancel out the algorithmic gains.

Modern relational databases like PostgreSQL and MySQL rely on a small set of well-tested index types. PostgreSQL is a good example because it supports several specialized indexes.

Here is a quick survey.

B-tree (O(log n))

Default index type. Balanced, disk-friendly, and highly versatile.

  • Supports equality, range queries, and ordering
  • Very shallow in practice due to high branching factor
  • Often behaves close to constant time for lookups
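The "very shallow in practice" point is just arithmetic. Assuming a fan-out of roughly 500 keys per internal page (a plausible figure for 8 KB pages; the exact number varies with key size), even a billion rows fit in a tree only a few levels deep:

```python
import math

fanout = 500            # assumption: keys per internal page
rows = 1_000_000_000

# Tree depth grows with log base fanout of the row count.
depth = math.ceil(math.log(rows, fanout))
assert depth == 4       # ~1e9 rows need only about 4 levels
```

Four page reads per lookup, with the top levels almost always cached, is why B-tree lookups feel close to constant time.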

GIN (O(log n + k))

Inverted index used for JSONB, arrays, and full-text search.

  • Maps value to many rows
  • Performance depends on result size k
  • Great for containment and search queries
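The core idea behind GIN is the inverted index: each value points to the set of rows containing it. A toy sketch with made-up data (not PostgreSQL's actual on-disk format, which stores posting lists in B-trees):

```python
from collections import defaultdict

# Hypothetical rows: row id -> tokens it contains.
rows = {
    1: ["postgres", "index"],
    2: ["hash", "index"],
    3: ["postgres", "hash"],
}

# Invert: token -> set of row ids containing it.
inverted = defaultdict(set)
for row_id, tokens in rows.items():
    for token in tokens:
        inverted[token].add(row_id)

# A containment query intersects posting lists; the cost grows
# with the number of matches k, hence O(log n + k).
matches = inverted["postgres"] & inverted["index"]
assert matches == {1}
```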

GiST (roughly O(log n), data-dependent)

Generalized Search Tree, used for geometric data, ranges, and custom types.

  • Flexible framework rather than a single structure
  • Performance depends on how well data can be partitioned
  • Used for things like spatial queries
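The "depends on how well data can be partitioned" point can be illustrated with a toy spatial example (invented data, and a drastic simplification of a real GiST): points are grouped under bounding boxes, and a query skips any group whose box cannot contain the target. If the boxes overlap heavily, pruning stops helping, which is exactly the data-dependence described above.

```python
# Toy GiST-style partitioning: groups of points summarized by
# bounding boxes (x1, y1, x2, y2).
groups = {
    "west": {"box": (0, 0, 5, 5), "points": [(1, 1), (4, 3)]},
    "east": {"box": (6, 0, 10, 5), "points": [(7, 2), (9, 4)]},
}

def contains(box, p):
    x1, y1, x2, y2 = box
    return x1 <= p[0] <= x2 and y1 <= p[1] <= y2

def find_group(point):
    for name, g in groups.items():
        # Prune: only descend into groups whose box may match.
        if contains(g["box"], point) and point in g["points"]:
            return name
    return None

assert find_group((7, 2)) == "east"
assert find_group((1, 1)) == "west"
```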

BRIN (O(n) with pruning)

Block Range Index, optimized for very large tables with ordered data, such as timestamps.

  • Stores summaries of data blocks instead of row-level entries
  • Extremely small index size
  • Works best when data is naturally ordered
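A minimal sketch of the BRIN idea, using synthetic ordered data: summarize each block of rows by its (min, max), then scan only the blocks whose range could contain the key. Real BRIN summaries cover physical page ranges, but the pruning logic is the same.

```python
values = list(range(100))   # naturally ordered data, e.g. timestamps
BLOCK = 10                  # assumption: rows per block

# Summarize each block as (start offset, min, max).
summaries = [
    (start, min(values[start:start + BLOCK]), max(values[start:start + BLOCK]))
    for start in range(0, len(values), BLOCK)
]

def lookup(key):
    hits = []
    for start, lo, hi in summaries:
        if lo <= key <= hi:  # prune blocks that cannot match
            hits.extend(i for i in range(start, start + BLOCK)
                        if values[i] == key)
    return hits

assert lookup(42) == [42]   # only one block of ten is scanned
```

The index itself is just ten tuples for a hundred rows, which is why BRIN stays tiny even on enormous tables. If the data were shuffled, every block's (min, max) would span nearly the full range and pruning would do nothing.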

Comparison

Index Type | Typical Complexity | Best For                          | Key Trade-off
B-tree     | O(log n)           | General queries, ranges, sorting  | Reliable and versatile
GIN        | O(log n + k)       | JSONB, arrays, full-text search   | Depends on number of matches
GiST       | ~O(log n)          | Spatial, range, custom data types | Highly data-dependent
BRIN       | O(n) with pruning  | Very large ordered datasets       | Less precise, may scan extra blocks