Batch-Based Approaches to Updating Large Tables in PostgreSQL

I sometimes need to update a large table in Postgres, something like:

UPDATE users
SET is_active = true
WHERE last_login IS NOT NULL;

This looks simple, but on a large table a single statement like this can run for a long time, hold row locks until it commits, and generate a large burst of WAL and dead tuples all at once. Over time I ended up learning to run these kinds of updates in batches instead of doing everything in one shot.

The idea is to process a fixed number of rows per iteration, usually a few thousand at a time, until everything is done.
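Before wrapping anything in a loop, the idea can be sketched as a single statement you run repeatedly by hand until it reports `UPDATE 0` (this assumes `id` is the primary key and `is_active` tells us which rows are still unprocessed):

```sql
-- One batch: update at most 2000 rows that still need the change.
-- Re-run until the command reports "UPDATE 0".
UPDATE users
SET is_active = true
WHERE id IN (
    SELECT id
    FROM users
    WHERE last_login IS NOT NULL
      AND is_active = false
    LIMIT 2000
);
```

The loops in the rest of this post are essentially automations of this statement.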

Case 1: Integer primary keys

If the table has an integer id, batching is straightforward because the values are naturally ordered and index-friendly.

The pattern is to keep track of the last processed ID and move forward:

DO $$
DECLARE
    batch_size INT := 2000;
    last_id BIGINT := 0;
    max_id BIGINT;
BEGIN
    LOOP
        -- Walk the primary key in index order, batch_size ids at a time.
        -- The UPDATE and the MAX(id) read the same CTE, so the cursor
        -- advances even when a batch happens to update zero rows.
        WITH batch AS (
            SELECT id
            FROM users
            WHERE id > last_id
            ORDER BY id
            LIMIT batch_size
        ),
        updated AS (
            UPDATE users u
            SET is_active = true
            FROM batch
            WHERE u.id = batch.id
              AND u.last_login IS NOT NULL
        )
        SELECT MAX(id) INTO max_id FROM batch;

        -- batch comes back empty once we run out of ids.
        EXIT WHEN max_id IS NULL;
        last_id := max_id;
    END LOOP;
END $$;

This works well because Postgres can use the index on id, and each iteration only touches a small slice of the table.
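One caveat: a DO block runs as a single transaction, so row locks and dead tuples still accumulate until the whole loop finishes. On PostgreSQL 11 and later you can commit between batches by moving the loop into a procedure (the procedure name here is my own; the body follows the same keyset pattern as above):

```sql
CREATE PROCEDURE activate_users_in_batches(batch_size INT DEFAULT 2000)
LANGUAGE plpgsql
AS $$
DECLARE
    last_id BIGINT := 0;
    max_id BIGINT;
BEGIN
    LOOP
        WITH batch AS (
            SELECT id
            FROM users
            WHERE id > last_id
            ORDER BY id
            LIMIT batch_size
        ),
        updated AS (
            UPDATE users u
            SET is_active = true
            FROM batch
            WHERE u.id = batch.id
              AND u.last_login IS NOT NULL
        )
        SELECT MAX(id) INTO max_id FROM batch;

        EXIT WHEN max_id IS NULL;
        last_id := max_id;

        COMMIT;  -- release locks and let autovacuum reclaim dead tuples
    END LOOP;
END $$;

CALL activate_users_in_batches();
```

With a COMMIT per batch, each iteration is its own transaction, which is what makes batching genuinely gentler on the database rather than just slower.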

Case 2: String IDs (UUID, CUID, etc.)

When the primary key is a string type such as a UUID or CUID, keyset batching is still possible in principle, since any indexed, totally ordered type can be walked with `id > last_id`. But it is less attractive: random UUIDs bear no relationship to insertion or temporal order, so the walk jumps around the table, and reasoning about ranges of ids becomes awkward.

Instead of trying to walk through the ID space, it is more reliable to repeatedly pick rows that still need to be processed.

DO $$
DECLARE
    batch_size INT := 2000;
    rows_updated INT;
BEGIN
    LOOP
        WITH batch AS (
            -- Pick up to batch_size rows that still need the change.
            -- Assumes is_active is NOT NULL; if it can be NULL, use
            -- is_active IS DISTINCT FROM true here and in the index below.
            SELECT id
            FROM users
            WHERE last_login IS NOT NULL
              AND is_active = false
            LIMIT batch_size
        )
        UPDATE users u
        SET is_active = true
        FROM batch
        WHERE u.id = batch.id;

        GET DIAGNOSTICS rows_updated = ROW_COUNT;
        EXIT WHEN rows_updated = 0;
    END LOOP;
END $$;

This version does not depend on ordering. Instead, it repeatedly selects a chunk of unfinished rows and processes them until none are left.

It is also easier to reason about because the batching condition is tied directly to state rather than key structure.
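A side benefit of state-based batching is that it parallelizes easily. If you want several sessions running the loop at once (not something the single-session loop above needs), the batch query can lock the rows it claims and skip rows another worker already holds:

```sql
WITH batch AS (
    SELECT id
    FROM users
    WHERE last_login IS NOT NULL
      AND is_active = false
    LIMIT 2000
    FOR UPDATE SKIP LOCKED
)
UPDATE users u
SET is_active = true
FROM batch
WHERE u.id = batch.id;
```

`FOR UPDATE SKIP LOCKED` makes each worker claim a disjoint set of rows instead of blocking on rows a sibling is already updating.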

Notes on performance and indexing

Batching only works well if the database can quickly find the next set of rows to process.

For integer ID batching, the primary key index is usually sufficient.

For state-based batching, ensure the filtering condition is indexed when possible:

CREATE INDEX ON users (last_login)
WHERE is_active = false;

Without proper indexing, each batch can degrade into repeated full table scans, which defeats the purpose of batching.
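A quick way to confirm the batch query is actually hitting the index is to run its SELECT half under EXPLAIN:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM users
WHERE last_login IS NOT NULL
  AND is_active = false
LIMIT 2000;
-- You want an Index Scan or Bitmap Index Scan on the partial index,
-- not a Seq Scan on users.
```

If the plan shows a sequential scan, the batching loop will re-scan the table on every iteration, and the total cost can easily exceed the single big UPDATE you were trying to avoid.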