Thread: Exposing the lock manager's WaitForLockers() to SQL
Hi there,

We'd like to be able to call the lock manager's WaitForLockers() and
WaitForLockersMultiple() from SQL. Below I describe our use case, but
basically I'm wondering if this:

1. Seems like a reasonable thing to do

2. Would be of interest upstream

3. Should be done with a new pg_foo() function (taking an
oid?), or a whole new SQL command, or something else

If this sounds promising, we may be able to code this up and submit
it. The rest of this email describes our use cases and related
observations:

==== Use Case Background ====

Our use case is inspired by this blog post by Marco Slot (CC'ed) at
Citus Data:
https://www.citusdata.com/blog/2018/06/14/scalable-incremental-data-aggregation/

This describes a scheme for correctly aggregating rows given minimal
coordination with an arbitrary number of writers while keeping minimal
additional state. It relies on two simple facts:

1. INSERT/UPDATE take their ROW EXCLUSIVE lock on the target table
before evaluating any column DEFAULT expressions, and thus before
e.g. calling nextval() on a sequence in the DEFAULT expression. And of
course, this lock is only released when the transaction commits or
rolls back.

2. pg_sequence_last_value() (still undocumented!) can be used to
obtain an instantaneous upper bound on the sequence values that have
been returned by nextval(), even if the transaction that called
nextval() hasn't yet committed.

So, assume we have a table:

    create table tbl (
        id bigserial,
        data text
    );

which is only ever modified by INSERTs that use DEFAULT for id. Then,
a client can process each row exactly once using a loop like this
(excuse the pseudo-SQL):

    min_id := 0;
    while true:
        max_id := pg_sequence_last_value('tbl_id_seq');
        wait_for_writers('tbl'::regclass);
        SELECT some_aggregation(data)
            FROM tbl
            WHERE id > min_id AND id <= max_id;
        min_id := max_id;

In the blog post, the equivalent of wait_for_writers() is implemented
by taking and immediately releasing a SHARE ROW EXCLUSIVE lock on tbl.
It's unclear why this can't be SHARE, since it just needs to conflict
with INSERT's ROW EXCLUSIVE, but in any case it's sufficient for
correctness. (Note that this version only works if the rows committed
by the transactions that it waited for are actually visible to the
SELECT, so for example, the whole thing can't be within a Repeatable
Read or Serializable transaction.)

==== Why WaitForLockers()? ====

No new writer can acquire a ROW EXCLUSIVE lock as long as we're
waiting to obtain the SHARE lock, even if we only hold it for an
instant. If we have to wait a long time, because some existing writer
holds its ROW EXCLUSIVE lock for a long time, this could noticeably
reduce overall writer throughput. But we don't actually need to obtain
a lock at all--and waiting for transactions that already hold
conflicting locks is exactly what WaitForLockers() /
WaitForLockersMultiple() does. Using it instead would prevent any
interference with writers.

==== Appendix: Extensions and Observations ====

Aside from downgrading to SHARE mode and merely waiting instead of
locking, we propose a couple other extensions and observations related
to Citus' scheme. These only tangentially motivate our need for
WaitForLockers(), so you may stop reading here unless the overall
scheme is of interest.

== Separate client for reading sequences and waiting ==

First, in our use case each batch of rows might require extensive
processing as part of a larger operation that doesn't want to block
waiting for writers to commit. A simple extension is to separate the
processing from the determination of sequence values. In other words,
have a single client that sits in a loop:

    while true:
        seq_val := pg_sequence_last_value('tbl_id_seq');
        WaitForLockers('tbl'::regclass, 'SHARE');
        publish(seq_val);

and any number of other clients that use the series of published
sequence values to do their own independent processing (maintaining
their own additional state).
This can be extended to multiple tables with WaitForLockersMultiple():

    while true:
        seq_val1 := pg_sequence_last_value('tbl1_id_seq');
        seq_val2 := pg_sequence_last_value('tbl2_id_seq');
        WaitForLockersMultiple(
            ARRAY['tbl1', 'tbl2']::regclass[], 'SHARE');
        publish('tbl1', seq_val1);
        publish('tbl2', seq_val2);

which is clearly more efficient than locking or waiting for the tables
in sequence, hence the desire for that function as well.

== Latency ==

This brings us to a series of observations about latency. If some
writers take a long time to commit, some already-committed rows might
not be processed for a long time. To avoid exacerbating this when
using WaitForLockersMultiple(), which obviously has to wait for the
last writer of any specified table, it should be used with groups of
tables that are generally written by the same transactions.

Also, while in Citus' example the aggregation needs to process each
row exactly once, latency can be reduced if a row may be processed
more than once and if rows can be processed out of order by sequence
value (id), by simply removing the "id <= max_id" term from the WHERE
clause in the reader. This particularly reduces latency if waiting and
processing are separated as described in the above section.

== Updates and latency ==

In our application we have some use cases with a table like:

    create table tbl (
        id bigint primary key,
        data text,
        mod_idx bigserial
    );

where writers do:

    INSERT INTO tbl (id, data) VALUES (1, 'foo')
        ON CONFLICT (id) DO UPDATE
        SET data = excluded.data, mod_idx = DEFAULT;

and where the reader's job is to continuously replicate rows within a
fixed range of id's in an eventually-consistent fashion.
Since the writer always bumps mod_idx by setting it to DEFAULT,
superficially it seems we can use this scheme with mod_idx:

    min_mod_idx := 0;
    while true:
        max_mod_idx := pg_sequence_last_value('tbl_mod_idx_seq');
        WaitForLockers('tbl'::regclass, 'SHARE');
        SELECT do_replicate(id, data, mod_idx)
            FROM tbl
            WHERE id >= my_min_id        -- app-specific
              AND id < my_max_id         -- app-specific
              AND mod_idx > min_mod_idx
              AND mod_idx <= max_mod_idx
            ORDER BY mod_idx;
        min_mod_idx := max_mod_idx;

This version replicates all rows eventually (if writers stop),
replicates each version of a row at most once (allowing updates to be
skipped if obviated by a later committed update), and replicates
changes in order by mod_idx, which may make bookkeeping easier. But
continuous overlapping updates could keep some rows *perpetually* out
of the reader's reach, leading to *unbounded* latency.

If the application can instead tolerate potentially replicating the
same version of a row more than once, and replicating changes to
different rows out of order by mod_idx, latency can be minimized by
removing "mod_idx <= max_mod_idx" from the WHERE clause. (The ORDER BY
should likely also be removed, since later batches may contain rows
with a lower mod_idx.) The remainder of the scheme still ensures that
all rows are eventually replicated, and limits redundant replication
while keeping minimal state.

== Latency tradeoff and advantages ==

In conclusion, with this scheme there is a tradeoff between minimizing
latency and avoiding redundant processing, where (depending on the
scenario) the amount of latency or redundant processing is related to
the maximum amount of time that a writer transaction holds a ROW
EXCLUSIVE lock on the table. Therefore, this time should be minimized
wherever possible. This tradeoff seems to be an inherent consequence
of the minimalist advantages of this scheme:

1. If we use WaitForLockers(), no additional locks are taken, so
there's no impact on concurrency of writers

2. If WaitForLockers() is separated from readers, there's no impact on
concurrency/waiting of readers

3. Can be used to guarantee eventual consistency as desired

4. Keeps O(1) state per table (per reader)--no tracking of individual
writers or individual row updates

5. Requires minimal cooperation from writers (just use DEFAULT
expressions that use nextval())
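P.S. For anyone who wants to play with the exactly-once argument
above, here's a toy single-process Python model. It is emphatically
not Postgres code: the Sequence class, the in_flight list, and
wait_for_writers() are made-up stand-ins for the sequence, the held
ROW EXCLUSIVE locks, and WaitForLockers() respectively, just to show
why waiting plus the (min_id, max_id] window processes each row
exactly once.

```python
# Toy model (not Postgres code) of the reader loop above: a "sequence"
# hands out ids, a writer "holds ROW EXCLUSIVE" from nextval() until
# commit, and the reader processes the (min_id, max_id] batch after
# waiting for the writers it observed.

class Sequence:
    def __init__(self):
        self.last_value = 0

    def nextval(self):
        self.last_value += 1
        return self.last_value

seq = Sequence()
committed = []   # rows visible to the reader
in_flight = []   # writers still holding ROW EXCLUSIVE

def begin_insert():
    row_id = seq.nextval()   # id assigned while the lock is held
    in_flight.append(row_id)
    return row_id

def commit(row_id):
    in_flight.remove(row_id)
    committed.append(row_id)

def wait_for_writers():
    # stand-in for WaitForLockers(): block until the writers that
    # already hold locks finish; here they just commit synchronously
    for row_id in list(in_flight):
        commit(row_id)

processed = []
min_id = 0

def reader_pass():
    global min_id
    max_id = seq.last_value   # pg_sequence_last_value()
    wait_for_writers()
    processed.extend(r for r in committed if min_id < r <= max_id)
    min_id = max_id

a = begin_insert(); commit(a)   # writer 1 commits before the reader runs
b = begin_insert()              # writer 2 is still in flight
reader_pass()                   # waits for writer 2; batch is [1, 2]
c = begin_insert(); commit(c)
reader_pass()                   # batch is [3]

# every row processed exactly once, despite the in-flight writer
assert processed == [1, 2, 3]
```

Of course the real scheme works with any interleaving of writers;
this just exercises the case where the reader overtakes an
uncommitted insert.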
On Fri, Dec 23, 2022 at 11:43 AM Will Mortensen <will@extrahop.com> wrote:
> We'd like to be able to call the lock manager's WaitForLockers() and
> WaitForLockersMultiple() from SQL. Below I describe our use case, but
> basically I'm wondering if this:
>
> 1. Seems like a reasonable thing to do
>
> 2. Would be of interest upstream
>
> 3. Should be done with a new pg_foo() function (taking an
> oid?), or a whole new SQL command, or something else

Definitely +1 on adding a function/syntax to wait for lockers without
actually taking a lock. The get sequence value + lock-and-release
approach is still the only scheme I've found for reliably and
efficiently processing new inserts in PostgreSQL.

I'm wondering whether it could be an option of the LOCK command.
(LOCK WAIT ONLY?)

Marco
Hi Marco, thanks for the reply! Glad to know you'd find it useful too. :-)

On Tue, Jan 10, 2023 at 1:01 AM Marco Slot <marco.slot@gmail.com> wrote:
> I'm wondering whether it could be an option of the LOCK command.
> (LOCK WAIT ONLY?)

I assume that's doable, but just from looking at the docs, it might be
a little confusing. For example, at least if we use
WaitForLockersMultiple(), waiting for multiple tables would happen in
parallel (which I think is good), while locking them is documented to
happen sequentially. Also, normal LOCK is illegal outside a
transaction, but waiting makes perfect sense. (Actually, normal LOCK
makes sense too, if the goal was just to wait. :-) )

By contrast, while LOCK has NOWAIT, and SELECT's locking clause has
NOWAIT and SKIP LOCKED, they only change the blocking/failure
behavior, while success still means taking the lock and has the same
semantics.

But I'm really no expert on SQL syntax or typical practice for things
like this. Anything that works is fine with me. :-)

====

As a possibly superfluous sidebar, I wanted to correct this part of my
original message:

> On Fri, Dec 23, 2022 at 11:43 AM Will Mortensen <will@extrahop.com> wrote:
> > pg_sequence_last_value() (still undocumented!) can be used to
> > obtain an instantaneous upper bound on the sequence values
> > that have been returned by nextval(), even if the transaction
> > that called nextval() hasn't yet committed.

This is true, but not the most important part of making this scheme
work: as you mentioned in the Citus blog post, to avoid missing rows,
we need (and this gives us) an instantaneous *lower* bound on the
sequence values that could be used by transactions that commit after
we finish waiting (and start processing). This doesn't work with
sequence caching, since without somehow inspecting all sessions'
sequence caches, rows with arbitrarily old/low cached sequence values
could be committed arbitrarily far into the future, and we'd fail to
process them.
As you also implied in the blog post, the upper bound is what allows
us to also process each row *exactly* once (instead of at least once)
and in sequence order, if desired. So those are the respective
justifications for both arms of the WHERE clause:
id > min_id AND id <= max_id.

On Tue, Jan 10, 2023 at 1:01 AM Marco Slot <marco.slot@gmail.com> wrote:
>
> On Fri, Dec 23, 2022 at 11:43 AM Will Mortensen <will@extrahop.com> wrote:
> > We'd like to be able to call the lock manager's WaitForLockers() and
> > WaitForLockersMultiple() from SQL. Below I describe our use case, but
> > basically I'm wondering if this:
> >
> > 1. Seems like a reasonable thing to do
> >
> > 2. Would be of interest upstream
> >
> > 3. Should be done with a new pg_foo() function (taking an
> > oid?), or a whole new SQL command, or something else
>
> Definitely +1 on adding a function/syntax to wait for lockers without
> actually taking a lock. The get sequence value + lock-and-release
> approach is still the only scheme I've found for reliably and
> efficiently processing new inserts in PostgreSQL. I'm wondering
> whether it could be an option of the LOCK command. (LOCK WAIT ONLY?)
>
> Marco
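P.S. To make the sequence-caching hazard above concrete, here's a toy
Python model. It is not Postgres code: the CachedSession class is a
made-up stand-in for a backend's sequence cache, and the "reader pass"
is inlined rather than looped, just to show how a prefetched value can
be committed after the reader has already advanced min_id past it.

```python
# Toy model (not Postgres code) of the sequence-caching pitfall: a
# session that has prefetched a block of sequence values can commit an
# old, low id in a brand-new transaction, after the reader has already
# advanced min_id past it.

class CachedSession:
    CACHE = 5  # like CREATE SEQUENCE ... CACHE 5

    def __init__(self, seq):
        self.seq = seq
        self.cached = []

    def nextval(self):
        if not self.cached:
            start = self.seq['last_value'] + 1
            self.seq['last_value'] += self.CACHE  # prefetch a block
            self.cached = list(range(start, start + self.CACHE))
        return self.cached.pop(0)

seq = {'last_value': 0}
writer = CachedSession(seq)
committed = []

committed.append(writer.nextval())  # commits id 1; last_value is now 5

# reader pass: no writer currently holds a lock, so waiting is a no-op
max_id = seq['last_value']          # pg_sequence_last_value() -> 5
processed = [r for r in committed if r <= max_id]
min_id = max_id                     # reader will only look at id > 5

# a later, brand-new transaction uses a *cached* value and commits it
committed.append(writer.nextval())  # id 2, below min_id

# the reader never revisits id <= min_id, so id 2 is lost forever
missed = [r for r in committed if r <= min_id and r not in processed]
assert missed == [2]
```

With CACHE 1 (the default), last_value is a true lower bound on what
future transactions can commit, and missed would be empty.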
Hi,

On 2023-01-10 10:01:25 +0100, Marco Slot wrote:
> On Fri, Dec 23, 2022 at 11:43 AM Will Mortensen <will@extrahop.com> wrote:
> > We'd like to be able to call the lock manager's WaitForLockers() and
> > WaitForLockersMultiple() from SQL. Below I describe our use case, but
> > basically I'm wondering if this:
> >
> > 1. Seems like a reasonable thing to do
> >
> > 2. Would be of interest upstream
> >
> > 3. Should be done with a new pg_foo() function (taking an
> > oid?), or a whole new SQL command, or something else
>
> Definitely +1 on adding a function/syntax to wait for lockers without
> actually taking a lock.

I think such a function would still have to integrate enough with the
lock manager infrastructure to participate in the deadlock detector.
Otherwise I think you'd trivially end up with loads of deadlocks.

Greetings,

Andres Freund
Hi Andres,

On Wed, Jan 11, 2023 at 12:33 PM Andres Freund <andres@anarazel.de> wrote:
> I think such a function would still have to integrate enough with the lock
> manager infrastructure to participate in the deadlock detector. Otherwise I
> think you'd trivially end up with loads of deadlocks.

Could you elaborate on which unusual deadlock concerns arise? To be
clear, WaitForLockers() is an existing function in lmgr.c
(https://github.com/postgres/postgres/blob/216a784829c2c5f03ab0c43e009126cbb819e9b2/src/backend/storage/lmgr/lmgr.c#L986),
and naively it seems like we mostly just need to call it. To my very
limited understanding, from looking at the existing callers and the
implementation of LOCK, that would look something like this (assuming
we're in a SQL command like LOCK and calling unmodified
WaitForLockers() with a single table):

1. Call something like RangeVarGetRelidExtended() with AccessShareLock
to ensure the table is not dropped and obtain the table oid

2. Use SET_LOCKTAG_RELATION() to construct the lock tag from the oid

3. Call WaitForLockers(), which internally calls GetLockConflicts()
and VirtualXactLock(). These certainly take plenty of locks of various
types, and will likely sleep in LockAcquire() waiting for transactions
to finish, but there don't seem to be any unusual pre/postconditions,
nor do we hold any unusual locks already.

Obviously a deadlock is possible if transactions end up waiting for
each other, just as when taking table or row locks, etc., but it seems
like this would be detected as usual?
I suppose if it's correct that we need to lock the table first (at
least in ACCESS SHARE mode), an option to LOCK perhaps makes more
sense. Maybe you could specify two modes like:

    LOCK TABLE IN _lockmode_ MODE
        AND THEN WAIT FOR CONFLICTS WITH _waitmode_ MODE;

But that might be excessive. :-D And I don't know if there's any
reason to use a _lockmode_ other than ACCESS SHARE.

On Wed, Jan 11, 2023 at 11:03 PM Will Mortensen <will@extrahop.com> wrote:
>
> Hi Andres,
>
> On Wed, Jan 11, 2023 at 12:33 PM Andres Freund <andres@anarazel.de> wrote:
> > I think such a function would still have to integrate enough with the lock
> > manager infrastructure to participate in the deadlock detector. Otherwise I
> > think you'd trivially end up with loads of deadlocks.
>
> Could you elaborate on which unusual deadlock concerns arise? To be
> clear, WaitForLockers() is an existing function in lmgr.c
> (https://github.com/postgres/postgres/blob/216a784829c2c5f03ab0c43e009126cbb819e9b2/src/backend/storage/lmgr/lmgr.c#L986),
> and naively it seems like we mostly just need to call it. To my very
> limited understanding, from looking at the existing callers and the
> implementation of LOCK, that would look something like this
> (assuming we're in a SQL command like LOCK and calling unmodified
> WaitForLockers() with a single table):
>
> 1. Call something like RangeVarGetRelidExtended() with AccessShareLock
> to ensure the table is not dropped and obtain the table oid
>
> 2. Use SET_LOCKTAG_RELATION() to construct the lock tag from the oid
>
> 3. Call WaitForLockers(), which internally calls GetLockConflicts() and
> VirtualXactLock(). These certainly take plenty of locks of various types,
> and will likely sleep in LockAcquire() waiting for transactions to finish,
> but there don't seem to be any unusual pre/postconditions, nor do we
> hold any unusual locks already.
>
> Obviously a deadlock is possible if transactions end up waiting for each
> other, just as when taking table or row locks, etc., but it seems like this
> would be detected as usual?
Hi,

On 2023-01-11 23:03:30 -0800, Will Mortensen wrote:
> On Wed, Jan 11, 2023 at 12:33 PM Andres Freund <andres@anarazel.de> wrote:
> > I think such a function would still have to integrate enough with the lock
> > manager infrastructure to participate in the deadlock detector. Otherwise I
> > think you'd trivially end up with loads of deadlocks.
>
> Could you elaborate on which unusual deadlock concerns arise? To be
> clear, WaitForLockers() is an existing function in lmgr.c
> (https://github.com/postgres/postgres/blob/216a784829c2c5f03ab0c43e009126cbb819e9b2/src/backend/storage/lmgr/lmgr.c#L986),
> and naively it seems like we mostly just need to call it.

I know that WaitForLockers() is an existing function :). I'm not sure
it's entirely suitable for your use case. So I mainly wanted to point
out that if you end up writing a separate version of it, you still
need to integrate with the deadlock detection. WaitForLockers() does
that by actually acquiring a lock on the "transaction" it's waiting
for.

> To my very limited understanding, from looking at the existing callers and
> the implementation of LOCK, that would look something like this (assuming
> we're in a SQL command like LOCK and calling unmodified WaitForLockers()
> with a single table):
>
> 1. Call something like RangeVarGetRelidExtended() with AccessShareLock
> to ensure the table is not dropped and obtain the table oid
>
> 2. Use SET_LOCKTAG_RELATION() to construct the lock tag from the oid
>
> 3. Call WaitForLockers(), which internally calls GetLockConflicts() and
> VirtualXactLock(). These certainly take plenty of locks of various types,
> and will likely sleep in LockAcquire() waiting for transactions to finish,
> but there don't seem to be any unusual pre/postconditions, nor do we
> hold any unusual locks already.

I suspect that keeping the AccessShareLock while doing the
WaitForLockers() is likely to increase the deadlock risk noticeably. I
think for the use case you might get away with resolving the relation
names, building the locktags, and then releasing the lock, before
calling WaitForLockers(). If somebody drops the table or such, you'd
presumably still get the desired behaviour that way, without the
increased deadlock risk.

Greetings,

Andres Freund
Hi Andres,

On Thu, Jan 12, 2023 at 11:31 AM Andres Freund <andres@anarazel.de> wrote:
> I know that WaitForLockers() is an existing function :). I'm not sure it's
> entirely suitable for your use case. So I mainly wanted to point out that if
> you end up writing a separate version of it, you still need to integrate with
> the deadlock detection.

I see. What about it seems potentially unsuitable?

> On 2023-01-11 23:03:30 -0800, Will Mortensen wrote:
> > To my very limited understanding, from looking at the existing callers and
> > the implementation of LOCK, that would look something like this (assuming
> > we're in a SQL command like LOCK and calling unmodified WaitForLockers()
> > with a single table):
> >
> > 1. Call something like RangeVarGetRelidExtended() with AccessShareLock
> > to ensure the table is not dropped and obtain the table oid
> >
> > 2. Use SET_LOCKTAG_RELATION() to construct the lock tag from the oid
> >
> > 3. Call WaitForLockers(), which internally calls GetLockConflicts() and
> > VirtualXactLock(). These certainly take plenty of locks of various types,
> > and will likely sleep in LockAcquire() waiting for transactions to finish,
> > but there don't seem to be any unusual pre/postconditions, nor do we
> > hold any unusual locks already.
>
> I suspect that keeping the AccessShareLock while doing the WaitForLockers() is
> likely to increase the deadlock risk noticeably. I think for the use case you
> might get away with resolving the relation names, building the locktags, and
> then release the lock, before calling WaitForLockers. If somebody drops the
> table or such, you'd presumably still get desired behaviour that way, without
> the increased deadlock risk.

That makes sense. I agree it seems fine to just return if e.g. the
table is dropped.

FWIW re: deadlocks in general, I probably didn't highlight it well in
my original email, but the existing solution for this use case (as
Marco described in his blog post) is to actually lock the table
momentarily. Marco's blog post uses ShareRowExclusiveLock, but I think
ShareLock is sufficient for us; in any case, that's stronger than the
AccessShareLock that we need to merely wait.

And actually locking the table with e.g. ShareLock seems perhaps
*more* likely to cause deadlocks (and hurts performance), since it not
only waits for existing conflicting lockers (e.g. RowExclusiveLock) as
desired, but also undesirably blocks other transactions from newly
acquiring conflicting locks in the meantime. Hence the motivation for
this feature. :-)

I'm sure I may be missing something though. Thanks for all your
feedback. :-)
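P.S. A toy Python model of the blocking difference described above.
This is not Postgres code: the single-entry wait queue and the
new_writer_blocked() helper are made-up stand-ins for the lock
manager's conflict/queueing behavior, just to show why queueing for a
SHARE lock stalls later writers while merely waiting for existing
lockers does not.

```python
# Toy model (not Postgres code): if the reader actually queues for a
# SHARE lock while a writer holds ROW EXCLUSIVE, a later writer must
# queue behind the reader; if the reader merely waits for the existing
# lockers (as WaitForLockers() does), the later writer is unaffected.

def new_writer_blocked(reader_takes_lock):
    # w1 holds ROW EXCLUSIVE on the table for a long time
    wait_queue = []  # FIFO queue of blocked lock requests

    if reader_takes_lock:
        # SHARE conflicts with w1's ROW EXCLUSIVE, so the reader queues
        wait_queue.append(('reader', 'SHARE'))
    # otherwise the reader waits on w1's transaction directly and
    # never enters the lock queue at all

    # w2 now requests ROW EXCLUSIVE; that conflicts with any queued
    # SHARE request, so w2 must wait behind the reader if it's queued
    return any(mode == 'SHARE' for _, mode in wait_queue)

assert new_writer_blocked(reader_takes_lock=True) is True
assert new_writer_blocked(reader_takes_lock=False) is False
```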
Hi,

On 2023-01-12 19:21:00 -0800, Will Mortensen wrote:
> FWIW re: deadlocks in general, I probably didn't highlight it well in my
> original email, but the existing solution for this use case (as Marco
> described in his blog post) is to actually lock the table momentarily.
> Marco's blog post uses ShareRowExclusiveLock, but I think ShareLock is
> sufficient for us; in any case, that's stronger than the AccessShareLock that
> we need to merely wait.
>
> And actually locking the table with e.g. ShareLock seems perhaps *more*
> likely to cause deadlocks (and hurts performance), since it not only waits for
> existing conflicting lockers (e.g. RowExclusiveLock) as desired, but also
> undesirably blocks other transactions from newly acquiring conflicting locks
> in the meantime. Hence the motivation for this feature. :-)
>
> I'm sure I may be missing something though. Thanks for all your feedback. :-)

From a deadlock risk pov, it's worse to hold an AccessShareLock and
then wait for other transactions to end, than to just wait for
ShareRowExclusiveLock, without holding any locks. If you don't hold
any locks (*) and wait for a lock, you cannot participate in a
deadlock, because nobody will wait for you. A deadlock is a cycle in
the lock graph; a node can't participate in a deadlock if it doesn't
have any incoming edges, and there can't be incoming edges if there's
nothing to wait on.

Consider a scenario like this:

tx 1: acquires RowExclusiveLock on tbl1 to insert rows
tx 2: acquires AccessShareLock on tbl1
tx 2: WaitForLockers(ShareRowExclusiveLock, tbl1) ends up waiting for tx1
tx 1: truncate tbl1 needs an AccessExclusiveLock

Boom, a simple deadlock. tx1 can't progress, because it can't get
AccessExclusiveLock, and tx2 can't progress because tx1 didn't finish.
But if tx2 directly waited for ShareRowExclusiveLock, there'd not have
been any cycle in the lock graph, and everything would have worked.

Regards,

Andres

(*) If you define "holding locks" expansively, it's impossible to wait
for a lock without holding a lock, since every transaction holds a
lock on its own virtual transaction id. But normally nobody just waits
for a transaction that hasn't done anything.
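P.S. The cycle argument above can be checked mechanically with a toy
wait-for graph. This is plain Python, not the real deadlock detector:
has_cycle() is a made-up helper, and an edge (a, b) just means
"transaction a waits for transaction b".

```python
# Toy wait-for graph (not the real deadlock detector) for the scenario
# above; a deadlock is a cycle in this graph.

def has_cycle(edges):
    graph = {}
    for waiter, holder in edges:
        graph.setdefault(waiter, set()).add(holder)

    def on_cycle(start):
        # DFS: can we get from `start` back to `start`?
        stack, seen = [start], set()
        while stack:
            for nxt in graph.get(stack.pop(), ()):
                if nxt == start:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    return any(on_cycle(node) for node in graph)

# tx2 waits (via WaitForLockers) for tx1 to finish, while tx1's
# TRUNCATE waits for tx2's AccessShareLock: a cycle, i.e. a deadlock
assert has_cycle([('tx2', 'tx1'), ('tx1', 'tx2')])

# if tx2 holds no lock while waiting, nothing ever waits on tx2:
# tx2 has no incoming edge, so it can't be part of a cycle
assert not has_cycle([('tx2', 'tx1')])
```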
Hi Andres,

On Thu, Jan 12, 2023 at 7:49 PM Andres Freund <andres@anarazel.de> wrote:
> Consider a scenario like this:
>
> tx 1: acquires RowExclusiveLock on tbl1 to insert rows
> tx 2: acquires AccessShareLock on tbl1
> tx 2: WaitForLockers(ShareRowExclusiveLock, tbl1) ends up waiting for tx1
> tx 1: truncate tbl1 needs an AccessExclusiveLock

Oh of course, thanks. Is it even necessary to take the
AccessShareLock? I see that one can call e.g.
RangeVarGetRelidExtended() with NoLock, and from the comments it seems
like that might be OK here?

Did you have any remaining concerns about the suitability of
WaitForLockers() for the use case?

Any thoughts on the syntax? It seems like an option to LOCK (like
Marco suggested) might be simplest to implement, albeit a little
tricky to document. Supporting descendant tables looks straightforward
enough (just collect more locktags?). Views look more involved; maybe
we can avoid supporting them?
Here is a first attempt at a WIP patch. Sorry about the MIME type.

It doesn't take any locks on the tables, but I'm not super confident
that that's safe, so any input would be appreciated.

I omitted view support for simplicity, but if that seems like a
requirement I'll see about adding it. I assume we would need to take
AccessShareLock on views (and release it, per above).

If the syntax and behavior seem roughly correct I'll work on updating
the docs. The commit message at the beginning of the .patch has
slightly more commentary.

Thanks for any and all feedback!
Attachment
Updated patch with more tests and a first attempt at doc updates.

As the commit message and doc now point out, using
WaitForLockersMultiple() makes for a behavior difference from actually
locking multiple tables: the combined set of conflicting locks is
obtained only once for all tables, rather than obtaining conflicts and
locking / waiting for the first table, then obtaining conflicts and
locking / waiting for the second table, and so on. This is definitely
desirable for my use case, but maybe these kinds of differences
illustrate the potential awkwardness of extending LOCK?

Thanks again for any and all feedback!
Attachment
Updated docs a bit. I'll see about adding this to the next CF in hopes of attracting a reviewer. :-)
Attachment
I realized that for our use case, we'd ideally wait for holders of
RowExclusiveLock only, and not e.g. VACUUM holding
ShareUpdateExclusiveLock. Waiting for lockers in a specific mode seems
possible by generalizing/duplicating WaitForLockersMultiple() and
GetLockConflicts(), but I'd love to have a sanity check before
attempting that. Also, I imagine those semantics might be too
different to make sense as part of the LOCK command.

Alternatively, I had originally been trying to use the pg_locks view,
which obviously provides flexibility in identifying existing lock
holders. But I couldn't find a way to wait for the locks to be
released / transactions to finish, and I was a little concerned about
the performance impact of selecting from it frequently when we only
care about a subset of the locks, although I didn't try to assess that
in our particular application.

In any case, I'm looking forward to hearing more feedback from
reviewers and potential users. :-)
On Sun, Sep 3, 2023 at 11:16 PM Will Mortensen <will@extrahop.com> wrote:
> I realized that for our use case, we'd ideally wait for holders of
> RowExclusiveLock only, and not e.g. VACUUM holding
> ShareUpdateExclusiveLock. Waiting for lockers in a specific mode seems
> possible by generalizing/duplicating WaitForLockersMultiple() and
> GetLockConflicts(), but I'd love to have a sanity check before
> attempting that. Also, I imagine those semantics might be too
> different to make sense as part of the LOCK command.

Well I attempted it. :-) Here is a new series that refactors
GetLockConflicts(), generalizes WaitForLockersMultiple(), and adds a
new WAIT FOR LOCKERS command.

I first tried extending LOCK further, but the code became somewhat
unwieldy and the syntax became very confusing. I also thought again
about making new pg_foo() functions, but that would seemingly make it
harder to share code with LOCK, and sharing syntax (to the extent it
makes sense) feels very natural. Also, a new SQL command provides
plenty of doc space. :-) (I'll see about adding more examples later.)

I'll try to edit the title of the CF entry accordingly. Still looking
forward to any feedback. :-)
Attachment
I meant to add that the example in the doc is adapted from Marco Slot's blog post linked earlier: https://www.citusdata.com/blog/2018/06/14/scalable-incremental-data-aggregation/
Simplified the code and docs, and rewrote the example with more prose instead of PL/pgSQL, which unfortunately made it longer, although it could be truncated. Not really sure what's best...
Attachment
On Sat, 2024-01-06 at 02:57 -0800, Will Mortensen wrote:
> Simplified the code and docs, and rewrote the example with more prose
> instead of PL/pgSQL, which unfortunately made it longer, although it
> could be truncated. Not really sure what's best...

I thought about this idea, and I have some doubts. WAIT FOR LOCKERS
only waits for transactions that were holding locks when the statement
started. Transactions that obtained locks later on are ignored.

While your original use case is valid, I cannot think of any other use
case. So it is a special-purpose statement that is only useful for
certain processing of append-only tables. Is it worth creating a new
SQL statement for that, which could lead to a conflict with future
editions of the SQL standard? Couldn't we follow the PostgreSQL
idiosyncrasy of providing a function with side effects instead?

Yours,

Laurenz Albe
Hi Laurenz, thanks for taking a look!

On Sat, Jan 6, 2024 at 4:00 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> While your original use case is valid, I cannot think of
> any other use case. So it is a special-purpose statement that is only
> useful for certain processing of append-only tables.

It is definitely somewhat niche. :-) But as I mentioned in my
longwinded original message, the scheme is easily extended (with some
tradeoffs) to process updates, if they set a non-primary-key column
using a sequence. As for deletions though, our applications handle
them separately.

> Is it worth creating a new SQL statement for that, which could lead to
> a conflict with future editions of the SQL standard? Couldn't we follow
> the PostgreSQL idiosyncrasy of providing a function with side effects
> instead?

I would be happy to add a pg_foo() function instead. Here are a few
things to figure out:

* To support waiting for lockers in a specified mode vs. conflicting
with a specified mode, should there be two functions, or one function
with a boolean argument like I used in C?

* Presumably the function(s) would take a regclass[] argument?

* Presumably the lock mode would be specified using strings like
'ShareLock'? There's no code to parse these AFAICT, but we could add
it.

* Maybe we could omit LOCK's handling of descendant tables for
simplicity?

I will have to see how much other code needs to be duplicated or
shared. I'll look further into it later this week.
Here is a new series adding a single pg_wait_for_lockers() function that takes a boolean argument to control the interpretation of the lock mode. It omits LOCK's handling of descendant tables so it requires permissions directly on descendants in order to wait for locks on them. Not sure if that would be a problem for anyone.
Attachment
On Thu, 11 Jan 2024 at 15:22, Will Mortensen <will@extrahop.com> wrote:
>
> Here is a new series adding a single pg_wait_for_lockers() function
> that takes a boolean argument to control the interpretation of the
> lock mode. It omits LOCK's handling of descendant tables so it
> requires permissions directly on descendants in order to wait for
> locks on them. Not sure if that would be a problem for anyone.

CFBot shows that there is one warning as in [1]:

patching file doc/src/sgml/libpq.sgml
...
[09:30:40.000] [940/2212] Compiling C object src/backend/postgres_lib.a.p/storage_lmgr_condition_variable.c.obj
[09:30:40.000] [941/2212] Compiling C object src/backend/postgres_lib.a.p/storage_lmgr_deadlock.c.obj
[09:30:40.000] [942/2212] Compiling C object src/backend/postgres_lib.a.p/storage_lmgr_lmgr.c.obj
[09:30:40.000] [943/2212] Compiling C object src/backend/postgres_lib.a.p/storage_lmgr_lock.c.obj
[09:30:40.000] c:\cirrus\src\backend\storage\lmgr\lock.c(4084) : warning C4715: 'ParseLockmodeName': not all control paths return a value

Please post an updated version for the same.

[1] - https://api.cirrus-ci.com/v1/task/4884224944111616/logs/build.log

Regards,
Vignesh
On Fri, Jan 26, 2024 at 4:54 AM vignesh C <vignesh21@gmail.com> wrote:
>
> CFBot shows that there is one warning as in [1]:
> patching file doc/src/sgml/libpq.sgml
> ...
> [09:30:40.000] [943/2212] Compiling C object
> src/backend/postgres_lib.a.p/storage_lmgr_lock.c.obj
> [09:30:40.000] c:\cirrus\src\backend\storage\lmgr\lock.c(4084) :
> warning C4715: 'ParseLockmodeName': not all control paths return a
> value

Thanks Vignesh, I guess the MS compiler doesn't have __builtin_constant_p()? So I added an unreachable return, and a regression test that exercises this error path. I also made various other simplifications and minor fixes to the code, docs, and tests.

Back in v5 (with a new SQL command) I had a detailed example in the docs, which I removed when changing to a function, and I'm not sure if I should try to add it back now... I could shrink it, but it might still be too long for this part of the docs? Anyway, please see attached.
Attachment
I guess the output of the deadlock test was unstable, so I simply removed it in v8 here, but I can try to fix it instead if it seems important to test that.
Attachment
Minor style fix; sorry for the spam.
Attachment
Rebased and fixed conflicts.

FWIW re: Andrey's comment in his excellent CF summary email[0]: we're currently using vanilla Postgres (via Gentoo) on single nodes, and not anything fancy like Citus. The Citus relationship is just that we were inspired by Marco's blog post there. We have a variety of clients written in different languages that generally don't coordinate their table modifications, and Marco's scheme merely requires them to use sequences idiomatically, which we can just about manage. :-)

This feature is then a performance optimization for this scheme: it avoids the case where one writer holding a RowExclusiveLock blocks the reader from taking a ShareLock, which in turn blocks other writers from taking a RowExclusiveLock for a long time. Instead, the reader can wait for the first writer without taking any locks or blocking later writers. I've illustrated this difference in the isolation tests.

Still hoping we can get this into 17. :-)

[0] https://www.postgresql.org/message-id/C8D65462-0888-4484-A72C-C99A94381ECD%40yandex-team.ru
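The blocking difference described above can be sketched with a toy model of PostgreSQL's FIFO lock queue (an assumption-level simplification, not the real lock manager): because ShareLock conflicts with RowExclusiveLock, a *waiting* ShareLock request blocks all later RowExclusiveLock requests even before it is granted.

```python
# Toy FIFO lock-queue model: SHARE and ROW EXCLUSIVE conflict; requests
# that conflict with a holder or an earlier waiter must queue.

CONFLICTS = {
    ("SHARE", "ROW EXCLUSIVE"): True,
    ("ROW EXCLUSIVE", "SHARE"): True,
    ("ROW EXCLUSIVE", "ROW EXCLUSIVE"): False,
    ("SHARE", "SHARE"): False,
}

def grantable(mode, holders, queue_ahead):
    # granted only if it conflicts with no current holder and no earlier
    # waiter (conflicting requests are granted strictly in queue order)
    return (all(not CONFLICTS.get((mode, h), False) for h in holders)
            and all(not CONFLICTS.get((mode, q), False) for q in queue_ahead))

# writer1 holds ROW EXCLUSIVE; the reader requests SHARE (e.g. via LOCK);
# then writer2 requests ROW EXCLUSIVE.
holders = ["ROW EXCLUSIVE"]   # writer1, still in its transaction
queue = []

if not grantable("SHARE", holders, queue):
    queue.append("SHARE")     # reader waits behind writer1

# writer2 is now stuck behind the reader's pending SHARE request:
writer2_blocked = not grantable("ROW EXCLUSIVE", holders, queue)
print(writer2_blocked)  # True: one slow writer stalls all new writers

# With pg_wait_for_lockers(), the reader never enqueues a lock at all,
# so writer2 would be granted immediately:
print(grantable("ROW EXCLUSIVE", holders, []))  # True
```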
Attachment
Rebased, fixed a couple typos, and reordered the isolation tests to put the most elaborate pair last.
Attachment
I got some very helpful off-list feedback from Robert Haas that this needed more self-contained explanation/motivation. So here goes. :-)

This patch set adds a new SQL function pg_wait_for_lockers(), which waits for transactions holding specified table locks to commit or roll back. This can be useful with knowledge of the queries in those transactions, particularly for asynchronous and incremental processing of inserted/updated rows.

Specifically, consider a scenario where INSERTs and UPDATEs always set a serial column to its default value. A client can call pg_sequence_last_value() + pg_wait_for_lockers() and then take a new DB snapshot and know that rows committed after this snapshot will have values of the serial column greater than the value from pg_sequence_last_value(). As shown in the example at the end, this allows the client to asynchronously and incrementally read inserted/updated rows with minimal per-client state, without buffering changes, and without affecting writer transactions.

There are lots of other ways to support incrementally reading new rows, but they don’t have all of those qualities. For example:

* Forcing writers to commit in a specific order (e.g. by serial column value) would reduce throughput
* Explicitly tracking or coordinating with writers would likely be more complex, impact performance, and/or require much more state
* Methods that are synchronous or buffer/queue changes are problematic if readers fall behind

Existing ways to wait for table locks also have downsides:

* Taking a conflicting lock with LOCK blocks new transactions from taking the lock of interest while LOCK waits. And in order to wait for writers holding RowExclusiveLock, we must take ShareLock, which also conflicts with ShareUpdateExclusiveLock and therefore unnecessarily interferes with (auto)vacuum. Finally, with multiple tables, LOCK locks them one at a time, so it waits (and holds locks) longer than necessary.
* Using pg_locks / pg_lock_status() to identify the transactions holding the locks is more expensive, since it also returns all other locks in the DB cluster, plus there’s no efficient built-in way to wait for the transactions to commit or roll back.

By contrast, pg_wait_for_lockers() doesn’t block other transactions, waits on multiple tables in parallel, and doesn’t spend time looking at irrelevant locks.

This change is split into three patches for ease of review. The first two patches modify the existing WaitForLockers() C function and other locking internals to support waiting for lockers in a single lock mode, which allows waiting for INSERT/UPDATE without waiting for vacuuming. These changes could be omitted at the cost of unnecessary waiting, potentially for a long time with slow vacuums. The third patch adds the pg_wait_for_lockers() SQL function, which just calls WaitForLockers().

FWIW, another solution might be to directly expose the functions that WaitForLockers() calls, namely GetLockConflicts() (generalized to GetLockers() in the first patch) to identify the transactions holding the locks, and VirtualXactLock() to wait for each transaction to commit or roll back. That would be more complicated for the client but could be more broadly useful. I could investigate that further if it seems preferable.

=== Example ===

Assume we have the following table:

CREATE TABLE page_views (
    id bigserial,
    view_time timestamptz
);

which is only ever modified by (potentially concurrent) INSERT commands that assign the default value to the id column.
We can run the following commands:

SELECT pg_sequence_last_value('page_views_id_seq');
 pg_sequence_last_value
------------------------
                      4

SELECT pg_wait_for_lockers(array['page_views']::regclass[], 'RowExclusiveLock', FALSE);

Now we know that all rows where id <= 4 have been committed or rolled back, and we can observe/process them:

SELECT * FROM page_views WHERE id <= 4;
 id |           view_time
----+-------------------------------
  2 | 2024-01-01 12:34:01.000000-00
  3 | 2024-01-01 12:34:00.000000-00

Later we can iterate:

SELECT pg_sequence_last_value('page_views_id_seq');
 pg_sequence_last_value
------------------------
                      9

SELECT pg_wait_for_lockers(array['page_views']::regclass[], 'RowExclusiveLock', FALSE);

We already observed all the rows where id <= 4, so this time we can filter them out:

SELECT * FROM page_views WHERE id > 4 AND id <= 9;
 id |           view_time
----+-------------------------------
  5 | 2024-01-01 12:34:05.000000-00
  8 | 2024-01-01 12:34:04.000000-00
  9 | 2024-01-01 12:34:07.000000-00

We can continue iterating like this to incrementally observe more newly inserted rows. Note that the only state we persist across iterations is the value returned by pg_sequence_last_value().

In this example, we processed inserted rows exactly once. Variations are possible for handling updates, as discussed in the original email, and I could explain that again better if it would be helpful. :-)
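To make the exactly-once property concrete, here is a pure-Python simulation of the reader loop (an illustrative sketch with a simulated sequence and simulated writers, not real client code). The key invariant it models: each writer draws its id while "holding its lock", so after snapshotting the sequence and waiting for lockers, every id <= max_id has been decided.

```python
# Simulation of the incremental-read loop: writers draw ids from a shared
# sequence before committing; the reader snapshots the sequence, waits for
# in-flight writers, then processes each committed row exactly once.

seq = 0          # simulates the sequence backing page_views.id
in_flight = []   # ids assigned but not yet committed (lock still held)
committed = []   # committed rows, in arbitrary commit order

def nextval():
    global seq
    seq += 1
    return seq

def insert_begin():
    # the DEFAULT is evaluated while RowExclusiveLock is held
    rid = nextval()
    in_flight.append(rid)
    return rid

def commit(rid):
    in_flight.remove(rid)
    committed.append(rid)

def wait_for_lockers():
    # stand-in for pg_wait_for_lockers(): in this toy model, every
    # in-flight writer simply commits before the reader proceeds
    for rid in list(in_flight):
        commit(rid)

processed = []
min_id = 0
for _ in range(3):              # a few reader iterations
    for _ in range(2):          # concurrent writers begin inserting...
        insert_begin()
    max_id = seq                # pg_sequence_last_value()
    wait_for_lockers()          # now all ids <= max_id are decided
    batch = sorted(r for r in committed if min_id < r <= max_id)
    processed.extend(batch)
    min_id = max_id             # the only state carried across iterations

print(processed)  # every id exactly once: [1, 2, 3, 4, 5, 6]
```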
Attachment
I should add that the latest patches remove permissions checks because pg_locks doesn't have any, and improve the commit messages. Hope I didn't garble anything doing this late after the dev conference. :-) Robert asked me about other existing functions that could be leveraged, such as GetConflictingVirtualXIDs(), but I didn't see any with close-enough semantics that handle fast-path locks as needed for tables/relations.
On Thu, May 30, 2024 at 12:01 AM Will Mortensen <will@extrahop.com> wrote:
> FWIW, another solution might be to directly expose the functions that
> WaitForLockers() calls, namely GetLockConflicts() (generalized to
> GetLockers() in the first patch) to identify the transactions holding
> the locks, and VirtualXactLock() to wait for each transaction to
> commit or roll back. That would be more complicated for the client but
> could be more broadly useful. I could investigate that further if it
> seems preferable.

We will look further into this. Since the main advantage over polling the existing pg_locks view would be efficiency, we will try to provide more quantitative evidence/analysis of that. That will probably want to be a new thread and CF entry, so I'm withdrawing this one.

Thanks again for all the replies, and to Robert for your off-list feedback and letting me bend your ear in Vancouver. :-)