Re: Exposing the lock manager's WaitForLockers() to SQL - Mailing list pgsql-hackers
From | Will Mortensen
Subject | Re: Exposing the lock manager's WaitForLockers() to SQL
Date |
Msg-id | CAMpnoC4AZ963rzce_HOi2drB9TkB6SPhCUtspEPJUzQiGJ5KNA@mail.gmail.com
In response to | Exposing the lock manager's WaitForLockers() to SQL (Will Mortensen <will@extrahop.com>)
Responses | Re: Exposing the lock manager's WaitForLockers() to SQL
          | Re: Exposing the lock manager's WaitForLockers() to SQL
List | pgsql-hackers
I got some very helpful off-list feedback from Robert Haas that this
needed more self-contained explanation/motivation. So here goes. :-)

This patch set adds a new SQL function pg_wait_for_lockers(), which
waits for transactions holding specified table locks to commit or
roll back. This can be useful with knowledge of the queries in those
transactions, particularly for asynchronous and incremental
processing of inserted/updated rows.

Specifically, consider a scenario where INSERTs and UPDATEs always
set a serial column to its default value. A client can call
pg_sequence_last_value() + pg_wait_for_lockers() and then take a new
DB snapshot and know that rows committed after this snapshot will
have values of the serial column greater than the value from
pg_sequence_last_value(). As shown in the example at the end, this
allows the client to asynchronously and incrementally read
inserted/updated rows with minimal per-client state, without
buffering changes, and without affecting writer transactions.

There are lots of other ways to support incrementally reading new
rows, but they don't have all of those qualities. For example:

* Forcing writers to commit in a specific order (e.g. by serial
column value) would reduce throughput

* Explicitly tracking or coordinating with writers would likely be
more complex, impact performance, and/or require much more state

* Methods that are synchronous or buffer/queue changes are
problematic if readers fall behind

Existing ways to wait for table locks also have downsides:

* Taking a conflicting lock with LOCK blocks new transactions from
taking the lock of interest while LOCK waits. And in order to wait
for writers holding RowExclusiveLock, we must take ShareLock, which
also conflicts with ShareUpdateExclusiveLock and therefore
unnecessarily interferes with (auto)vacuum. Finally, with multiple
tables LOCK locks them one at a time, so it waits (and holds locks)
longer than necessary.
* Using pg_locks / pg_lock_status() to identify the transactions
holding the locks is more expensive, since it also returns all other
locks in the DB cluster, plus there's no efficient built-in way to
wait for those transactions to commit or roll back.

By contrast, pg_wait_for_lockers() doesn't block other transactions,
waits on multiple tables in parallel, and doesn't spend time looking
at irrelevant locks.

This change is split into three patches for ease of review. The
first two patches modify the existing WaitForLockers() C function
and other locking internals to support waiting for lockers in a
single lock mode, which allows waiting for INSERT/UPDATE without
waiting for vacuuming. These changes could be omitted at the cost of
unnecessary waiting, potentially for a long time with slow vacuums.
The third patch adds the pg_wait_for_lockers() SQL function, which
just calls WaitForLockers().

FWIW, another solution might be to directly expose the functions
that WaitForLockers() calls, namely GetLockConflicts() (generalized
to GetLockers() in the first patch) to identify the transactions
holding the locks, and VirtualXactLock() to wait for each
transaction to commit or roll back. That would be more complicated
for the client but could be more broadly useful. I could investigate
that further if it seems preferable.

=== Example ===

Assume we have the following table:

    CREATE TABLE page_views (
        id bigserial,
        view_time timestamptz
    );

which is only ever modified by (potentially concurrent) INSERT
commands that assign the default value to the id column.
We can run the following commands:

    SELECT pg_sequence_last_value('page_views_id_seq');
     pg_sequence_last_value
    ------------------------
                          4

    SELECT pg_wait_for_lockers(array['page_views']::regclass[],
                               'RowExclusiveLock', FALSE);

Now we know that all rows where id <= 4 have been committed or
rolled back, and we can observe/process them:

    SELECT * FROM page_views WHERE id <= 4;
     id |           view_time
    ----+-------------------------------
      2 | 2024-01-01 12:34:01.000000-00
      3 | 2024-01-01 12:34:00.000000-00

Later we can iterate:

    SELECT pg_sequence_last_value('page_views_id_seq');
     pg_sequence_last_value
    ------------------------
                          9

    SELECT pg_wait_for_lockers(array['page_views']::regclass[],
                               'RowExclusiveLock', FALSE);

We already observed all the rows where id <= 4, so this time we can
filter them out:

    SELECT * FROM page_views WHERE id > 4 AND id <= 9;
     id |           view_time
    ----+-------------------------------
      5 | 2024-01-01 12:34:05.000000-00
      8 | 2024-01-01 12:34:04.000000-00
      9 | 2024-01-01 12:34:07.000000-00

We can continue iterating like this to incrementally observe more
newly inserted rows. Note that the only state we persist across
iterations is the value returned by pg_sequence_last_value().

In this example, we processed inserted rows exactly once. Variations
are possible for handling updates, as discussed in the original
email, and I could explain that again better if it would be
helpful. :-)
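To make the correctness argument concrete, here is a toy simulation of
the protocol above in Python, with threads standing in for writer
transactions. It is a sketch, not PostgreSQL code: wait_for_lockers()
mimics pg_wait_for_lockers() by snapshotting the writers currently
"holding the lock" and waiting only for those (much as
GetLockConflicts() + VirtualXactLock() would), without blocking
writers that start afterward. All names here (begin_write, end_write,
last_value, etc.) are hypothetical.

```python
# Toy simulation: serial-column + wait-for-lockers incremental reads.
import threading, random, time

lock = threading.Lock()
next_id = 0        # stands in for the sequence
committed = {}     # id -> row; stands in for visible committed rows
active = {}        # writer id -> Event; stands in for lock holders

def begin_write():
    """Writer takes its serial value and registers as a lock holder."""
    global next_id
    with lock:
        next_id += 1
        done = threading.Event()
        active[next_id] = done
        return next_id, done

def end_write(wid, done, commit):
    """Writer commits (row becomes visible) or rolls back, releases."""
    with lock:
        if commit:
            committed[wid] = f"row-{wid}"
        del active[wid]
    done.set()

def last_value():
    """Like pg_sequence_last_value(): highest serial value handed out."""
    with lock:
        return next_id

def wait_for_lockers():
    """Wait only for writers active *now*; later writers are ignored."""
    with lock:
        holders = list(active.values())  # snapshot current holders
    for done in holders:                 # wait for each to finish
        done.wait()

def writer():
    wid, done = begin_write()
    time.sleep(random.uniform(0, 0.01))
    end_write(wid, done, commit=random.random() < 0.9)

threads = [threading.Thread(target=writer) for _ in range(200)]
for t in threads:
    t.start()

# Reader: incremental, exactly-once observation of committed rows.
# The only state carried between iterations is `prev`.
seen, prev = [], 0
while prev < 200:
    hi = last_value()
    wait_for_lockers()   # now all writers with id <= hi have finished
    with lock:
        batch = [i for i in sorted(committed) if prev < i <= hi]
    seen.extend(batch)
    prev = hi
    time.sleep(0.001)

for t in threads:
    t.join()
assert seen == sorted(committed)  # every committed row, exactly once
print(len(seen), "committed rows observed exactly once")
```

The key property the simulation exercises is the same one the SQL
example relies on: because a writer registers as a lock holder in the
same critical section in which it takes its serial value, once
wait_for_lockers() returns, every id up to the previously observed
last_value() is settled, so batches over disjoint id ranges never
miss or double-count a row.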