I sometimes need to update a large table in Postgres, something like:
UPDATE users
SET is_active = true
WHERE last_login IS NOT NULL;
This looks simple, but once the table gets large, a single statement can run for a long time, hold its locks for the whole duration, and put unnecessary pressure on the database. Over time I learned to run these kinds of updates in batches instead of doing everything in one shot.
The idea is to process a fixed number of rows per iteration, usually a few thousand at a time, until everything is done.
Case 1: Integer primary keys
If the table has an integer id, batching is straightforward because the values are naturally ordered and index-friendly.
The pattern is to keep track of the last processed ID and move forward:
DO $$
DECLARE
  batch_size INT := 2000;
  last_id BIGINT := 0;
BEGIN
  LOOP
    WITH batch AS (
      SELECT id
      FROM users
      WHERE id > last_id
        AND last_login IS NOT NULL
      ORDER BY id
      LIMIT batch_size
    ),
    updated AS (
      UPDATE users u
      SET is_active = true
      FROM batch
      WHERE u.id = batch.id
      RETURNING u.id
    )
    -- A CTE is only visible inside the statement that defines it,
    -- so the new high-water mark has to be captured here. The
    -- data-modifying "updated" CTE still runs even though it is
    -- not referenced by the final SELECT.
    SELECT MAX(id) INTO last_id FROM batch;

    -- MAX over an empty batch is NULL: nothing left to process.
    EXIT WHEN last_id IS NULL;
  END LOOP;
END $$;
This works well because Postgres can use the index on id, and each iteration only touches a small slice of the table.
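One caveat: a DO block runs in a single transaction by default, so the locks taken by every batch accumulate until the very end. Since PostgreSQL 11, a DO block executed outside an explicit transaction block may issue COMMIT itself, which releases locks after each batch. A sketch, assuming the same table and loop as above:

```sql
DO $$
DECLARE
  batch_size INT := 2000;
  last_id BIGINT := 0;
BEGIN
  LOOP
    WITH batch AS (
      SELECT id
      FROM users
      WHERE id > last_id
        AND last_login IS NOT NULL
      ORDER BY id
      LIMIT batch_size
    ),
    updated AS (
      UPDATE users u
      SET is_active = true
      FROM batch
      WHERE u.id = batch.id
      RETURNING u.id
    )
    SELECT MAX(id) INTO last_id FROM batch;

    EXIT WHEN last_id IS NULL;

    -- End the current transaction and start a new one (PostgreSQL 11+,
    -- only works when the DO block is not inside an outer transaction).
    COMMIT;
  END LOOP;
END $$;
```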
Case 2: String IDs (UUID, CUID, etc.)
When the primary key is a string, such as a UUID or CUID, you can still walk the key space in index order, but that order is lexicographic: it reflects neither insertion time nor any useful locality, and with random IDs each batch is scattered across the table.
Instead of trying to walk through the ID space, it is more reliable to repeatedly pick rows that still need to be processed.
DO $$
DECLARE
  batch_size INT := 2000;
  rows_updated INT;
BEGIN
  LOOP
    WITH batch AS (
      -- Pick any chunk of rows that still need processing.
      SELECT id
      FROM users
      WHERE last_login IS NOT NULL
        AND is_active = false
      LIMIT batch_size
    )
    UPDATE users u
    SET is_active = true
    FROM batch
    WHERE u.id = batch.id;

    GET DIAGNOSTICS rows_updated = ROW_COUNT;

    -- Stop once no unprocessed rows remain.
    EXIT WHEN rows_updated = 0;
  END LOOP;
END $$;
This version does not depend on ordering. Instead, it repeatedly selects a chunk of unfinished rows and processes them until none are left.
It is also easier to reason about because the batching condition is tied directly to state rather than key structure.
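A nice side effect of the state-based shape is that it also works as a single statement driven from a client. Each execution is its own transaction, so locks are released between batches; just repeat the statement until it reports UPDATE 0:

```sql
UPDATE users u
SET is_active = true
FROM (
  SELECT id
  FROM users
  WHERE last_login IS NOT NULL
    AND is_active = false
  LIMIT 2000
) AS batch
WHERE u.id = batch.id;
```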
Notes on performance and indexing
Batching only works well if the database can quickly find the next set of rows to process.
For integer ID batching, the primary key index is usually sufficient.
For state-based batching, make sure the filtering condition is backed by an index, ideally a partial one. On a live table, add it with CREATE INDEX CONCURRENTLY so the build does not block writes:
CREATE INDEX CONCURRENTLY ON users (last_login)
WHERE is_active = false;
Without proper indexing, each batch can degrade into repeated full table scans, which defeats the purpose of batching.
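A quick way to check is to look at the plan for the batch-selection query (table and column names as in the examples above). With a suitable partial index in place, it should show an index scan rather than a sequential scan:

```sql
EXPLAIN
SELECT id
FROM users
WHERE last_login IS NOT NULL
  AND is_active = false
LIMIT 2000;
```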