Thread: Exposing the lock manager's WaitForLockers() to SQL
Hi there,

We'd like to be able to call the lock manager's WaitForLockers() and
WaitForLockersMultiple() from SQL. Below I describe our use case, but
basically I'm wondering if this:

1. Seems like a reasonable thing to do

2. Would be of interest upstream

3. Should be done with a new pg_foo() function (taking an
oid?), or a whole new SQL command, or something else

If this sounds promising, we may be able to code this up and submit
it. The rest of this email describes our use cases and related
observations:

==== Use Case Background ====

Our use case is inspired by this blog post by Marco Slot (CC'ed) at
Citus Data:
https://www.citusdata.com/blog/2018/06/14/scalable-incremental-data-aggregation/

This describes a scheme for correctly aggregating rows given minimal
coordination with an arbitrary number of writers while keeping minimal
additional state. It relies on two simple facts:

1. INSERT/UPDATE take their ROW EXCLUSIVE lock on the target table
before evaluating any column DEFAULT expressions, and thus before
e.g. calling nextval() on a sequence in the DEFAULT expression. And of
course, this lock is only released when the transaction commits or
rolls back.

2. pg_sequence_last_value() (still undocumented!) can be used to
obtain an instantaneous upper bound on the sequence values that have
been returned by nextval(), even if the transaction that called
nextval() hasn't yet committed.

So, assume we have a table:

    create table tbl (
        id bigserial,
        data text
    );

which is only ever modified by INSERTs that use DEFAULT for id. Then,
a client can process each row exactly once using a loop like this
(excuse the pseudo-SQL):

    min_id := 0;
    while true:
        max_id := pg_sequence_last_value('tbl_id_seq');
        wait_for_writers('tbl'::regclass);
        SELECT some_aggregation(data)
            FROM tbl
            WHERE id > min_id AND id <= max_id;
        min_id := max_id;

In the blog post, the equivalent of wait_for_writers() is implemented
by taking and immediately releasing a SHARE ROW EXCLUSIVE lock on tbl.
It's unclear why this can't be SHARE, since it just needs to conflict
with INSERT's ROW EXCLUSIVE, but in any case it's sufficient for
correctness. (Note that this version only works if the rows committed
by the transactions that it waited for are actually visible to the
SELECT, so for example, the whole thing can't be within a Repeatable
Read or Serializable transaction.)

==== Why WaitForLockers()? ====

No new writer can acquire a ROW EXCLUSIVE lock as long as we're
waiting to obtain the SHARE lock, even if we only hold it for an
instant. If we have to wait a long time, because some existing writer
holds its ROW EXCLUSIVE lock for a long time, this could noticeably
reduce overall writer throughput. But we don't actually need to obtain
a lock at all--and waiting for transactions that already hold
conflicting locks is exactly what WaitForLockers() /
WaitForLockersMultiple() does. Using it instead would prevent any
interference with writers.

==== Appendix: Extensions and Observations ====

Aside from downgrading to SHARE mode and merely waiting instead of
locking, we propose a couple other extensions and observations related
to Citus' scheme. These only tangentially motivate our need for
WaitForLockers(), so you may stop reading here unless the overall
scheme is of interest.

== Separate client for reading sequences and waiting ==

First, in our use case each batch of rows might require extensive
processing as part of a larger operation that doesn't want to block
waiting for writers to commit. A simple extension is to separate the
processing from the determination of sequence values. In other words,
have a single client that sits in a loop:

    while true:
        seq_val := pg_sequence_last_value('tbl_id_seq');
        WaitForLockers('tbl'::regclass, 'SHARE');
        publish(seq_val);

and any number of other clients that use the series of published
sequence values to do their own independent processing (maintaining
their own additional state).
This can be extended to multiple tables with WaitForLockersMultiple():

    while true:
        seq_val1 := pg_sequence_last_value('tbl1_id_seq');
        seq_val2 := pg_sequence_last_value('tbl2_id_seq');
        WaitForLockersMultiple(
            ARRAY['tbl1', 'tbl2']::regclass[], 'SHARE');
        publish('tbl1', seq_val1);
        publish('tbl2', seq_val2);

which is clearly more efficient than locking or waiting for the tables
in sequence, hence the desire for that function as well.

== Latency ==

This brings us to a series of observations about latency. If some
writers take a long time to commit, some already-committed rows might
not be processed for a long time. To avoid exacerbating this when
using WaitForLockersMultiple(), which obviously has to wait for the
last writer of any specified table, it should be used with groups of
tables that are generally written by the same transactions.

Also, while in Citus' example the aggregation needs to process each
row exactly once, latency can be reduced if a row may be processed
more than once and if rows can be processed out of order by sequence
value (id), by simply removing the "id <= max_id" term from the WHERE
clause in the reader. This particularly reduces latency if waiting and
processing are separated as described in the above section.

== Updates and latency ==

In our application we have some use cases with a table like:

    create table tbl (
        id bigint primary key,
        data text,
        mod_idx bigserial
    );

where writers do:

    INSERT INTO tbl (id, data) VALUES (1, 'foo')
        ON CONFLICT (id) DO UPDATE
        SET data = excluded.data, mod_idx = DEFAULT;

and where the reader's job is to continuously replicate rows within a
fixed range of id's in an eventually-consistent fashion.
Since the writer always bumps mod_idx by setting it to DEFAULT,
superficially it seems we can use this scheme with mod_idx:

    min_mod_idx := 0;
    while true:
        max_mod_idx := pg_sequence_last_value('tbl_mod_idx_seq');
        WaitForLockers('tbl'::regclass, 'SHARE');
        SELECT do_replicate(id, data, mod_idx)
            FROM tbl
            WHERE id >= my_min_id        -- app-specific
              AND id < my_max_id         -- app-specific
              AND mod_idx > min_mod_idx
              AND mod_idx <= max_mod_idx
            ORDER BY mod_idx;
        min_mod_idx := max_mod_idx;

This version replicates all rows eventually (if writers stop),
replicates each version of a row at most once (allowing updates to be
skipped if obviated by a later committed update), and replicates
changes in order by mod_idx, which may make bookkeeping easier. But
continuous overlapping updates could keep some rows *perpetually* out
of the reader's reach, leading to *unbounded* latency.

If the application can instead tolerate potentially replicating the
same version of a row more than once, and replicating changes to
different rows out of order by mod_idx, latency can be minimized by
removing "mod_idx <= max_mod_idx" from the WHERE clause. (The ORDER BY
should likely also be removed, since later batches may contain rows
with a lower mod_idx.) The remainder of the scheme still ensures that
all rows are eventually replicated, and limits redundant replication
while keeping minimal state.

== Latency tradeoff and advantages ==

In conclusion, with this scheme there is a tradeoff between minimizing
latency and avoiding redundant processing, where (depending on the
scenario) the amount of latency or redundant processing is related to
the maximum amount of time that a writer transaction holds a ROW
EXCLUSIVE lock on the table. Therefore, this time should be minimized
wherever possible. This tradeoff seems to be an inherent consequence
of the minimalist advantages of this scheme:

1. If we use WaitForLockers(), no additional locks are taken, so
there's no impact on concurrency of writers

2. If WaitForLockers() is separated from readers, there's no impact on
concurrency/waiting of readers

3. Can be used to guarantee eventual consistency as desired

4. Keeps O(1) state per table (per reader)--no tracking of individual
writers or individual row updates

5. Requires minimal cooperation from writers (just use DEFAULT
expressions that use nextval())
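P.S. For anyone who wants to play with the exactly-once argument
above, here's a toy single-process Python model. It is emphatically
not Postgres code: the Sequence class, the in_flight list, and
wait_for_writers() are made-up stand-ins for the sequence, the held
ROW EXCLUSIVE locks, and WaitForLockers() respectively, just to show
why waiting plus the (min_id, max_id] window processes each row
exactly once.

```python
# Toy model (not Postgres code) of the reader loop above: a "sequence"
# hands out ids, a writer "holds ROW EXCLUSIVE" from nextval() until
# commit, and the reader processes the (min_id, max_id] batch after
# waiting for the writers it observed.

class Sequence:
    def __init__(self):
        self.last_value = 0

    def nextval(self):
        self.last_value += 1
        return self.last_value

seq = Sequence()
committed = []   # rows visible to the reader
in_flight = []   # writers still holding ROW EXCLUSIVE

def begin_insert():
    row_id = seq.nextval()   # id assigned while the lock is held
    in_flight.append(row_id)
    return row_id

def commit(row_id):
    in_flight.remove(row_id)
    committed.append(row_id)

def wait_for_writers():
    # stand-in for WaitForLockers(): block until the writers that
    # already hold locks finish; here they just commit synchronously
    for row_id in list(in_flight):
        commit(row_id)

processed = []
min_id = 0

def reader_pass():
    global min_id
    max_id = seq.last_value   # pg_sequence_last_value()
    wait_for_writers()
    processed.extend(r for r in committed if min_id < r <= max_id)
    min_id = max_id

a = begin_insert(); commit(a)   # writer 1 commits before the reader runs
b = begin_insert()              # writer 2 is still in flight
reader_pass()                   # waits for writer 2; batch is [1, 2]
c = begin_insert(); commit(c)
reader_pass()                   # batch is [3]

# every row processed exactly once, despite the in-flight writer
assert processed == [1, 2, 3]
```

Of course the real scheme works with any interleaving of writers;
this just exercises the case where the reader overtakes an
uncommitted insert.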
On Fri, Dec 23, 2022 at 11:43 AM Will Mortensen <will@extrahop.com> wrote:
> We'd like to be able to call the lock manager's WaitForLockers() and
> WaitForLockersMultiple() from SQL. Below I describe our use case, but
> basically I'm wondering if this:
>
> 1. Seems like a reasonable thing to do
>
> 2. Would be of interest upstream
>
> 3. Should be done with a new pg_foo() function (taking an
> oid?), or a whole new SQL command, or something else

Definitely +1 on adding a function/syntax to wait for lockers without
actually taking a lock. The get sequence value + lock-and-release
approach is still the only scheme I've found for reliably and
efficiently processing new inserts in PostgreSQL.

I'm wondering whether it could be an option of the LOCK command.
(LOCK WAIT ONLY?)

Marco
Hi Marco, thanks for the reply! Glad to know you'd find it useful too. :-)

On Tue, Jan 10, 2023 at 1:01 AM Marco Slot <marco.slot@gmail.com> wrote:
> I'm wondering whether it could be an option of the LOCK command.
> (LOCK WAIT ONLY?)

I assume that's doable, but just from looking at the docs, it might be
a little confusing. For example, at least if we use
WaitForLockersMultiple(), waiting for multiple tables would happen in
parallel (which I think is good), while locking them is documented to
happen sequentially. Also, normal LOCK is illegal outside a
transaction, but waiting makes perfect sense. (Actually, normal LOCK
makes sense too, if the goal was just to wait. :-) )

By contrast, while LOCK has NOWAIT, and SELECT's locking clause has
NOWAIT and SKIP LOCKED, they only change the blocking/failure
behavior, while success still means taking the lock and has the same
semantics.

But I'm really no expert on SQL syntax or typical practice for things
like this. Anything that works is fine with me. :-)

====

As a possibly superfluous sidebar, I wanted to correct this part of my
original message:

> On Fri, Dec 23, 2022 at 11:43 AM Will Mortensen <will@extrahop.com> wrote:
> > pg_sequence_last_value() (still undocumented!) can be used to
> > obtain an instantaneous upper bound on the sequence values
> > that have been returned by nextval(), even if the transaction
> > that called nextval() hasn't yet committed.

This is true, but not the most important part of making this scheme
work: as you mentioned in the Citus blog post, to avoid missing rows,
we need (and this gives us) an instantaneous *lower* bound on the
sequence values that could be used by transactions that commit after
we finish waiting (and start processing). This doesn't work with
sequence caching, since without somehow inspecting all sessions'
sequence caches, rows with arbitrarily old/low cached sequence values
could be committed arbitrarily far into the future, and we'd fail to
process them.
As you also implied in the blog post, the upper bound is what allows
us to also process each row *exactly* once (instead of at least once)
and in sequence order, if desired. So those are the respective
justifications for both arms of the WHERE clause:
id > min_id AND id <= max_id.

On Tue, Jan 10, 2023 at 1:01 AM Marco Slot <marco.slot@gmail.com> wrote:
>
> On Fri, Dec 23, 2022 at 11:43 AM Will Mortensen <will@extrahop.com> wrote:
> > We'd like to be able to call the lock manager's WaitForLockers() and
> > WaitForLockersMultiple() from SQL. Below I describe our use case, but
> > basically I'm wondering if this:
> >
> > 1. Seems like a reasonable thing to do
> >
> > 2. Would be of interest upstream
> >
> > 3. Should be done with a new pg_foo() function (taking an
> > oid?), or a whole new SQL command, or something else
>
> Definitely +1 on adding a function/syntax to wait for lockers without
> actually taking a lock. The get sequence value + lock-and-release
> approach is still the only scheme I've found for reliably and
> efficiently processing new inserts in PostgreSQL. I'm wondering
> whether it could be an option of the LOCK command. (LOCK WAIT ONLY?)
>
> Marco
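P.S. To make the sequence-caching hazard above concrete, here's a toy
Python model. It is not Postgres code: the CachedSession class is a
made-up stand-in for a backend's sequence cache, and the "reader pass"
is inlined rather than looped, just to show how a prefetched value can
be committed after the reader has already advanced min_id past it.

```python
# Toy model (not Postgres code) of the sequence-caching pitfall: a
# session that has prefetched a block of sequence values can commit an
# old, low id in a brand-new transaction, after the reader has already
# advanced min_id past it.

class CachedSession:
    CACHE = 5  # like CREATE SEQUENCE ... CACHE 5

    def __init__(self, seq):
        self.seq = seq
        self.cached = []

    def nextval(self):
        if not self.cached:
            start = self.seq['last_value'] + 1
            self.seq['last_value'] += self.CACHE  # prefetch a block
            self.cached = list(range(start, start + self.CACHE))
        return self.cached.pop(0)

seq = {'last_value': 0}
writer = CachedSession(seq)
committed = []

committed.append(writer.nextval())  # commits id 1; last_value is now 5

# reader pass: no writer currently holds a lock, so waiting is a no-op
max_id = seq['last_value']          # pg_sequence_last_value() -> 5
processed = [r for r in committed if r <= max_id]
min_id = max_id                     # reader will only look at id > 5

# a later, brand-new transaction uses a *cached* value and commits it
committed.append(writer.nextval())  # id 2, below min_id

# the reader never revisits id <= min_id, so id 2 is lost forever
missed = [r for r in committed if r <= min_id and r not in processed]
assert missed == [2]
```

With CACHE 1 (the default), last_value is a true lower bound on what
future transactions can commit, and missed would be empty.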
Hi,

On 2023-01-10 10:01:25 +0100, Marco Slot wrote:
> On Fri, Dec 23, 2022 at 11:43 AM Will Mortensen <will@extrahop.com> wrote:
> > We'd like to be able to call the lock manager's WaitForLockers() and
> > WaitForLockersMultiple() from SQL. Below I describe our use case, but
> > basically I'm wondering if this:
> >
> > 1. Seems like a reasonable thing to do
> >
> > 2. Would be of interest upstream
> >
> > 3. Should be done with a new pg_foo() function (taking an
> > oid?), or a whole new SQL command, or something else
>
> Definitely +1 on adding a function/syntax to wait for lockers without
> actually taking a lock.

I think such a function would still have to integrate enough with the
lock manager infrastructure to participate in the deadlock detector.
Otherwise I think you'd trivially end up with loads of deadlocks.

Greetings,

Andres Freund
Hi Andres,

On Wed, Jan 11, 2023 at 12:33 PM Andres Freund <andres@anarazel.de> wrote:
> I think such a function would still have to integrate enough with the lock
> manager infrastructure to participate in the deadlock detector. Otherwise I
> think you'd trivially end up with loads of deadlocks.

Could you elaborate on which unusual deadlock concerns arise? To be
clear, WaitForLockers() is an existing function in lmgr.c
(https://github.com/postgres/postgres/blob/216a784829c2c5f03ab0c43e009126cbb819e9b2/src/backend/storage/lmgr/lmgr.c#L986),
and naively it seems like we mostly just need to call it. To my very
limited understanding, from looking at the existing callers and the
implementation of LOCK, that would look something like this (assuming
we're in a SQL command like LOCK and calling unmodified
WaitForLockers() with a single table):

1. Call something like RangeVarGetRelidExtended() with AccessShareLock
to ensure the table is not dropped and obtain the table oid

2. Use SET_LOCKTAG_RELATION() to construct the lock tag from the oid

3. Call WaitForLockers(), which internally calls GetLockConflicts()
and VirtualXactLock(). These certainly take plenty of locks of various
types, and will likely sleep in LockAcquire() waiting for transactions
to finish, but there don't seem to be any unusual pre/postconditions,
nor do we hold any unusual locks already.

Obviously a deadlock is possible if transactions end up waiting for
each other, just as when taking table or row locks, etc., but it seems
like this would be detected as usual?
I suppose if it's correct that we need to lock the table first (at
least in ACCESS SHARE mode), an option to LOCK perhaps makes more
sense. Maybe you could specify two modes like:

    LOCK TABLE IN _lockmode_ MODE
        AND THEN WAIT FOR CONFLICTS WITH _waitmode_ MODE;

But that might be excessive. :-D And I don't know if there's any
reason to use a _lockmode_ other than ACCESS SHARE.

On Wed, Jan 11, 2023 at 11:03 PM Will Mortensen <will@extrahop.com> wrote:
>
> Hi Andres,
>
> On Wed, Jan 11, 2023 at 12:33 PM Andres Freund <andres@anarazel.de> wrote:
> > I think such a function would still have to integrate enough with the lock
> > manager infrastructure to participate in the deadlock detector. Otherwise I
> > think you'd trivially end up with loads of deadlocks.
>
> Could you elaborate on which unusual deadlock concerns arise? To be
> clear, WaitForLockers() is an existing function in lmgr.c
> (https://github.com/postgres/postgres/blob/216a784829c2c5f03ab0c43e009126cbb819e9b2/src/backend/storage/lmgr/lmgr.c#L986),
> and naively it seems like we mostly just need to call it. To my very
> limited understanding, from looking at the existing callers and the
> implementation of LOCK, that would look something like this
> (assuming we're in a SQL command like LOCK and calling unmodified
> WaitForLockers() with a single table):
>
> 1. Call something like RangeVarGetRelidExtended() with AccessShareLock
> to ensure the table is not dropped and obtain the table oid
>
> 2. Use SET_LOCKTAG_RELATION() to construct the lock tag from the oid
>
> 3. Call WaitForLockers(), which internally calls GetLockConflicts() and
> VirtualXactLock(). These certainly take plenty of locks of various types,
> and will likely sleep in LockAcquire() waiting for transactions to finish,
> but there don't seem to be any unusual pre/postconditions, nor do we
> hold any unusual locks already.
>
> Obviously a deadlock is possible if transactions end up waiting for each
> other, just as when taking table or row locks, etc., but it seems like this
> would be detected as usual?
Hi,

On 2023-01-11 23:03:30 -0800, Will Mortensen wrote:
> On Wed, Jan 11, 2023 at 12:33 PM Andres Freund <andres@anarazel.de> wrote:
> > I think such a function would still have to integrate enough with the lock
> > manager infrastructure to participate in the deadlock detector. Otherwise I
> > think you'd trivially end up with loads of deadlocks.
>
> Could you elaborate on which unusual deadlock concerns arise? To be
> clear, WaitForLockers() is an existing function in lmgr.c
> (https://github.com/postgres/postgres/blob/216a784829c2c5f03ab0c43e009126cbb819e9b2/src/backend/storage/lmgr/lmgr.c#L986),
> and naively it seems like we mostly just need to call it.

I know that WaitForLockers() is an existing function :). I'm not sure
it's entirely suitable for your use case. So I mainly wanted to point
out that if you end up writing a separate version of it, you still
need to integrate with the deadlock detection. WaitForLockers() does
that by actually acquiring a lock on the "transaction" it's waiting
for.

> To my very limited understanding, from looking at the existing callers and
> the implementation of LOCK, that would look something like this (assuming
> we're in a SQL command like LOCK and calling unmodified WaitForLockers()
> with a single table):
>
> 1. Call something like RangeVarGetRelidExtended() with AccessShareLock
> to ensure the table is not dropped and obtain the table oid
>
> 2. Use SET_LOCKTAG_RELATION() to construct the lock tag from the oid
>
> 3. Call WaitForLockers(), which internally calls GetLockConflicts() and
> VirtualXactLock(). These certainly take plenty of locks of various types,
> and will likely sleep in LockAcquire() waiting for transactions to finish,
> but there don't seem to be any unusual pre/postconditions, nor do we
> hold any unusual locks already.

I suspect that keeping the AccessShareLock while doing the
WaitForLockers() is likely to increase the deadlock risk noticeably. I
think for the use case you might get away with resolving the relation
names, building the locktags, and then releasing the lock, before
calling WaitForLockers(). If somebody drops the table or such, you'd
presumably still get the desired behaviour that way, without the
increased deadlock risk.

Greetings,

Andres Freund
Hi Andres,

On Thu, Jan 12, 2023 at 11:31 AM Andres Freund <andres@anarazel.de> wrote:
> I know that WaitForLockers() is an existing function :). I'm not sure it's
> entirely suitable for your use case. So I mainly wanted to point out that if
> you end up writing a separate version of it, you still need to integrate with
> the deadlock detection.

I see. What about it seems potentially unsuitable?

> On 2023-01-11 23:03:30 -0800, Will Mortensen wrote:
> > To my very limited understanding, from looking at the existing callers and
> > the implementation of LOCK, that would look something like this (assuming
> > we're in a SQL command like LOCK and calling unmodified WaitForLockers()
> > with a single table):
> >
> > 1. Call something like RangeVarGetRelidExtended() with AccessShareLock
> > to ensure the table is not dropped and obtain the table oid
> >
> > 2. Use SET_LOCKTAG_RELATION() to construct the lock tag from the oid
> >
> > 3. Call WaitForLockers(), which internally calls GetLockConflicts() and
> > VirtualXactLock(). These certainly take plenty of locks of various types,
> > and will likely sleep in LockAcquire() waiting for transactions to finish,
> > but there don't seem to be any unusual pre/postconditions, nor do we
> > hold any unusual locks already.
>
> I suspect that keeping the AccessShareLock while doing the WaitForLockers() is
> likely to increase the deadlock risk noticeably. I think for the use case you
> might get away with resolving the relation names, building the locktags, and
> then release the lock, before calling WaitForLockers. If somebody drops the
> table or such, you'd presumably still get desired behaviour that way, without
> the increased deadlock risk.

That makes sense. I agree it seems fine to just return if e.g. the
table is dropped.

FWIW re: deadlocks in general, I probably didn't highlight it well in
my original email, but the existing solution for this use case (as
Marco described in his blog post) is to actually lock the table
momentarily. Marco's blog post uses ShareRowExclusiveLock, but I think
ShareLock is sufficient for us; in any case, that's stronger than the
AccessShareLock that we need to merely wait.

And actually locking the table with e.g. ShareLock seems perhaps
*more* likely to cause deadlocks (and hurts performance), since it not
only waits for existing conflicting lockers (e.g. RowExclusiveLock) as
desired, but also undesirably blocks other transactions from newly
acquiring conflicting locks in the meantime. Hence the motivation for
this feature. :-)

I'm sure I may be missing something though. Thanks for all your
feedback. :-)
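P.S. A toy Python model of the blocking difference described above.
This is not Postgres code: the single-entry wait queue and the
new_writer_blocked() helper are made-up stand-ins for the lock
manager's conflict/queueing behavior, just to show why queueing for a
SHARE lock stalls later writers while merely waiting for existing
lockers does not.

```python
# Toy model (not Postgres code): if the reader actually queues for a
# SHARE lock while a writer holds ROW EXCLUSIVE, a later writer must
# queue behind the reader; if the reader merely waits for the existing
# lockers (as WaitForLockers() does), the later writer is unaffected.

def new_writer_blocked(reader_takes_lock):
    # w1 holds ROW EXCLUSIVE on the table for a long time
    wait_queue = []  # FIFO queue of blocked lock requests

    if reader_takes_lock:
        # SHARE conflicts with w1's ROW EXCLUSIVE, so the reader queues
        wait_queue.append(('reader', 'SHARE'))
    # otherwise the reader waits on w1's transaction directly and
    # never enters the lock queue at all

    # w2 now requests ROW EXCLUSIVE; that conflicts with any queued
    # SHARE request, so w2 must wait behind the reader if it's queued
    return any(mode == 'SHARE' for _, mode in wait_queue)

assert new_writer_blocked(reader_takes_lock=True) is True
assert new_writer_blocked(reader_takes_lock=False) is False
```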
Hi,

On 2023-01-12 19:21:00 -0800, Will Mortensen wrote:
> FWIW re: deadlocks in general, I probably didn't highlight it well in my
> original email, but the existing solution for this use case (as Marco
> described in his blog post) is to actually lock the table momentarily.
> Marco's blog post uses ShareRowExclusiveLock, but I think ShareLock is
> sufficient for us; in any case, that's stronger than the AccessShareLock that
> we need to merely wait.
>
> And actually locking the table with e.g. ShareLock seems perhaps *more*
> likely to cause deadlocks (and hurts performance), since it not only waits for
> existing conflicting lockers (e.g. RowExclusiveLock) as desired, but also
> undesirably blocks other transactions from newly acquiring conflicting locks
> in the meantime. Hence the motivation for this feature. :-)
>
> I'm sure I may be missing something though. Thanks for all your feedback. :-)

From a deadlock risk pov, it's worse to hold an AccessShareLock and
then wait for other transactions to end, than to just wait for
ShareRowExclusiveLock, without holding any locks. If you don't hold
any locks (*) and wait for a lock, you cannot participate in a
deadlock, because nobody will wait for you. A deadlock is a cycle in
the lock graph; a node can't participate in a deadlock if it doesn't
have any incoming edges, and there can't be incoming edges if there's
nothing to wait on.

Consider a scenario like this:

tx 1: acquires RowExclusiveLock on tbl1 to insert rows
tx 2: acquires AccessShareLock on tbl1
tx 2: WaitForLockers(ShareRowExclusiveLock, tbl1) ends up waiting for tx1
tx 1: truncate tbl1 needs an AccessExclusiveLock

Boom, a simple deadlock. tx1 can't progress, because it can't get
AccessExclusiveLock, and tx2 can't progress because tx1 didn't finish.
But if tx2 directly waited for ShareRowExclusiveLock, there'd not have
been any cycle in the lock graph, and everything would have worked.

Regards,

Andres

(*) If you define "holding locks" expansively, it's impossible to wait
for a lock without holding a lock, since every transaction holds a
lock on its own virtual transaction id. But normally nobody just waits
for a transaction that hasn't done anything.
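P.S. The cycle argument above can be checked mechanically with a toy
wait-for graph. This is plain Python, not the real deadlock detector:
has_cycle() is a made-up helper, and an edge (a, b) just means
"transaction a waits for transaction b".

```python
# Toy wait-for graph (not the real deadlock detector) for the scenario
# above; a deadlock is a cycle in this graph.

def has_cycle(edges):
    graph = {}
    for waiter, holder in edges:
        graph.setdefault(waiter, set()).add(holder)

    def on_cycle(start):
        # DFS: can we get from `start` back to `start`?
        stack, seen = [start], set()
        while stack:
            for nxt in graph.get(stack.pop(), ()):
                if nxt == start:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    return any(on_cycle(node) for node in graph)

# tx2 waits (via WaitForLockers) for tx1 to finish, while tx1's
# TRUNCATE waits for tx2's AccessShareLock: a cycle, i.e. a deadlock
assert has_cycle([('tx2', 'tx1'), ('tx1', 'tx2')])

# if tx2 holds no lock while waiting, nothing ever waits on tx2:
# tx2 has no incoming edge, so it can't be part of a cycle
assert not has_cycle([('tx2', 'tx1')])
```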
Hi Andres,

On Thu, Jan 12, 2023 at 7:49 PM Andres Freund <andres@anarazel.de> wrote:
> Consider a scenario like this:
>
> tx 1: acquires RowExclusiveLock on tbl1 to insert rows
> tx 2: acquires AccessShareLock on tbl1
> tx 2: WaitForLockers(ShareRowExclusiveLock, tbl1) ends up waiting for tx1
> tx 1: truncate tbl1 needs an AccessExclusiveLock

Oh of course, thanks. Is it even necessary to take the
AccessShareLock? I see that one can call e.g.
RangeVarGetRelidExtended() with NoLock, and from the comments it seems
like that might be OK here?

Did you have any remaining concerns about the suitability of
WaitForLockers() for the use case?

Any thoughts on the syntax? It seems like an option to LOCK (like
Marco suggested) might be simplest to implement, albeit a little
tricky to document. Supporting descendant tables looks straightforward
enough (just collect more locktags?). Views look more involved; maybe
we can avoid supporting them?
Here is a first attempt at a WIP patch. Sorry about the MIME type.

It doesn't take any locks on the tables, but I'm not super confident
that that's safe, so any input would be appreciated.

I omitted view support for simplicity, but if that seems like a
requirement I'll see about adding it. I assume we would need to take
AccessShareLock on views (and release it, per above).

If the syntax and behavior seem roughly correct I'll work on updating
the docs. The commit message at the beginning of the .patch has
slightly more commentary.

Thanks for any and all feedback!
Attachment
Updated patch with more tests and a first attempt at doc updates.

As the commit message and doc now point out, using
WaitForLockersMultiple() makes for a behavior difference from actually
locking multiple tables: the combined set of conflicting locks is
obtained only once for all tables, rather than obtaining conflicts and
locking / waiting for the first table, then obtaining conflicts and
locking / waiting for the second table, and so on. This is definitely
desirable for my use case, but maybe these kinds of differences
illustrate the potential awkwardness of extending LOCK?

Thanks again for any and all feedback!
Attachment
Updated docs a bit. I'll see about adding this to the next CF in hopes of attracting a reviewer. :-)
Attachment
I realized that for our use case, we'd ideally wait for holders of
RowExclusiveLock only, and not e.g. VACUUM holding
ShareUpdateExclusiveLock. Waiting for lockers in a specific mode seems
possible by generalizing/duplicating WaitForLockersMultiple() and
GetLockConflicts(), but I'd love to have a sanity check before
attempting that. Also, I imagine those semantics might be too
different to make sense as part of the LOCK command.

Alternatively, I had originally been trying to use the pg_locks view,
which obviously provides flexibility in identifying existing lock
holders. But I couldn't find a way to wait for the locks to be
released / transactions to finish, and I was a little concerned about
the performance impact of selecting from it frequently when we only
care about a subset of the locks, although I didn't try to assess that
in our particular application.

In any case, I'm looking forward to hearing more feedback from
reviewers and potential users. :-)
On Sun, Sep 3, 2023 at 11:16 PM Will Mortensen <will@extrahop.com> wrote:
> I realized that for our use case, we'd ideally wait for holders of
> RowExclusiveLock only, and not e.g. VACUUM holding
> ShareUpdateExclusiveLock. Waiting for lockers in a specific mode seems
> possible by generalizing/duplicating WaitForLockersMultiple() and
> GetLockConflicts(), but I'd love to have a sanity check before
> attempting that. Also, I imagine those semantics might be too
> different to make sense as part of the LOCK command.

Well I attempted it. :-) Here is a new series that refactors
GetLockConflicts(), generalizes WaitForLockersMultiple(), and adds a
new WAIT FOR LOCKERS command.

I first tried extending LOCK further, but the code became somewhat
unwieldy and the syntax became very confusing. I also thought again
about making new pg_foo() functions, but that would seemingly make it
harder to share code with LOCK, and sharing syntax (to the extent it
makes sense) feels very natural. Also, a new SQL command provides
plenty of doc space. :-) (I'll see about adding more examples later.)

I'll try to edit the title of the CF entry accordingly. Still looking
forward to any feedback. :-)
Attachment
I meant to add that the example in the doc is adapted from Marco Slot's blog post linked earlier: https://www.citusdata.com/blog/2018/06/14/scalable-incremental-data-aggregation/
Simplified the code and docs, and rewrote the example with more prose instead of PL/pgSQL, which unfortunately made it longer, although it could be truncated. Not really sure what's best...
Attachment
On Sat, 2024-01-06 at 02:57 -0800, Will Mortensen wrote:
> Simplified the code and docs, and rewrote the example with more prose
> instead of PL/pgSQL, which unfortunately made it longer, although it
> could be truncated. Not really sure what's best...

I thought about this idea, and I have some doubts. WAIT FOR LOCKERS
only waits for transactions that were holding locks when the statement
started. Transactions that obtained locks later on are ignored.

While your original use case is valid, I cannot think of any other use
case. So it is a special-purpose statement that is only useful for
certain processing of append-only tables. Is it worth creating a new
SQL statement for that, which could lead to a conflict with future
editions of the SQL standard? Couldn't we follow the PostgreSQL
idiosyncrasy of providing a function with side effects instead?

Yours,

Laurenz Albe
Hi Laurenz, thanks for taking a look!

On Sat, Jan 6, 2024 at 4:00 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> While your original use case is valid, I cannot think of
> any other use case. So it is a special-purpose statement that is only
> useful for certain processing of append-only tables.

It is definitely somewhat niche. :-) But as I mentioned in my
longwinded original message, the scheme is easily extended (with some
tradeoffs) to process updates, if they set a non-primary-key column
using a sequence. As for deletions though, our applications handle
them separately.

> Is it worth creating a new SQL statement for that, which could lead to
> a conflict with future editions of the SQL standard? Couldn't we follow
> the PostgreSQL idiosyncrasy of providing a function with side effects
> instead?

I would be happy to add a pg_foo() function instead. Here are a few
things to figure out:

* To support waiting for lockers in a specified mode vs. conflicting
with a specified mode, should there be two functions, or one function
with a boolean argument like I used in C?

* Presumably the function(s) would take a regclass[] argument?

* Presumably the lock mode would be specified using strings like
'ShareLock'? There's no code to parse these AFAICT, but we could add
it.

* Maybe we could omit LOCK's handling of descendant tables for
simplicity?

I will have to see how much other code needs to be duplicated or
shared. I'll look further into it later this week.
Here is a new series adding a single pg_wait_for_lockers() function that takes a boolean argument to control the interpretation of the lock mode. It omits LOCK's handling of descendant tables so it requires permissions directly on descendants in order to wait for locks on them. Not sure if that would be a problem for anyone.
Attachment
On Thu, 11 Jan 2024 at 15:22, Will Mortensen <will@extrahop.com> wrote:
>
> Here is a new series adding a single pg_wait_for_lockers() function
> that takes a boolean argument to control the interpretation of the
> lock mode. It omits LOCK's handling of descendant tables so it
> requires permissions directly on descendants in order to wait for
> locks on them. Not sure if that would be a problem for anyone.

CFBot shows that there is one warning as in [1]:

patching file doc/src/sgml/libpq.sgml
...
[09:30:40.000] [940/2212] Compiling C object src/backend/postgres_lib.a.p/storage_lmgr_condition_variable.c.obj
[09:30:40.000] [941/2212] Compiling C object src/backend/postgres_lib.a.p/storage_lmgr_deadlock.c.obj
[09:30:40.000] [942/2212] Compiling C object src/backend/postgres_lib.a.p/storage_lmgr_lmgr.c.obj
[09:30:40.000] [943/2212] Compiling C object src/backend/postgres_lib.a.p/storage_lmgr_lock.c.obj
[09:30:40.000] c:\cirrus\src\backend\storage\lmgr\lock.c(4084) : warning C4715: 'ParseLockmodeName': not all control paths return a value

Please post an updated version for the same.

[1] - https://api.cirrus-ci.com/v1/task/4884224944111616/logs/build.log

Regards,
Vignesh
On Fri, Jan 26, 2024 at 4:54 AM vignesh C <vignesh21@gmail.com> wrote:
>
> CFBot shows that there is one warning as in [1]:
> patching file doc/src/sgml/libpq.sgml
> ...
> [09:30:40.000] [943/2212] Compiling C object
> src/backend/postgres_lib.a.p/storage_lmgr_lock.c.obj
> [09:30:40.000] c:\cirrus\src\backend\storage\lmgr\lock.c(4084) :
> warning C4715: 'ParseLockmodeName': not all control paths return a
> value

Thanks Vignesh, I guess the MS compiler doesn't have __builtin_constant_p()? So I added an unreachable return, and a regression test that exercises this error path. I also made various other simplifications and minor fixes to the code, docs, and tests.

Back in v5 (with a new SQL command) I had a detailed example in the docs, which I removed when changing to a function, and I'm not sure if I should try to add it back now... I could shrink it, but it might still be too long for this part of the docs? Anyway, please see attached.
Attachment
I guess the output of the deadlock test was unstable, so I simply removed it in v8 here, but I can try to fix it instead if it seems important to test that.
Attachment
Minor style fix; sorry for the spam.
Attachment
Rebased and fixed conflicts.

FWIW re: Andrey's comment in his excellent CF summary email[0]: we're currently using vanilla Postgres (via Gentoo) on single nodes, and not anything fancy like Citus. The Citus relationship is just that we were inspired by Marco's blog post there. We have a variety of clients written in different languages that generally don't coordinate their table modifications, and Marco's scheme merely requires them to use sequences idiomatically, which we can just about manage. :-)

This feature is then a performance optimization for this scheme: it avoids the case where one writer holding a RowExclusiveLock blocks the reader from taking a ShareLock, which in turn blocks other writers from taking a RowExclusiveLock for a long time. Instead, the reader can wait for the first writer without taking any locks or blocking later writers. I've illustrated this difference in the isolation tests.

Still hoping we can get this into 17. :-)

[0] https://www.postgresql.org/message-id/C8D65462-0888-4484-A72C-C99A94381ECD%40yandex-team.ru
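The blocking difference described above can be sketched with a toy model of PostgreSQL's FIFO lock queue (an assumption-level simplification, not the real lock manager): because ShareLock conflicts with RowExclusiveLock, a *waiting* ShareLock request blocks all later RowExclusiveLock requests even before it is granted.

```python
# Toy FIFO lock-queue model: SHARE and ROW EXCLUSIVE conflict; requests
# that conflict with a holder or an earlier waiter must queue.

CONFLICTS = {
    ("SHARE", "ROW EXCLUSIVE"): True,
    ("ROW EXCLUSIVE", "SHARE"): True,
    ("ROW EXCLUSIVE", "ROW EXCLUSIVE"): False,
    ("SHARE", "SHARE"): False,
}

def grantable(mode, holders, queue_ahead):
    # granted only if it conflicts with no current holder and no earlier
    # waiter (conflicting requests are granted strictly in queue order)
    return (all(not CONFLICTS.get((mode, h), False) for h in holders)
            and all(not CONFLICTS.get((mode, q), False) for q in queue_ahead))

# writer1 holds ROW EXCLUSIVE; the reader requests SHARE (e.g. via LOCK);
# then writer2 requests ROW EXCLUSIVE.
holders = ["ROW EXCLUSIVE"]   # writer1, still in its transaction
queue = []

if not grantable("SHARE", holders, queue):
    queue.append("SHARE")     # reader waits behind writer1

# writer2 is now stuck behind the reader's pending SHARE request:
writer2_blocked = not grantable("ROW EXCLUSIVE", holders, queue)
print(writer2_blocked)  # True: one slow writer stalls all new writers

# With pg_wait_for_lockers(), the reader never enqueues a lock at all,
# so writer2 would be granted immediately:
print(grantable("ROW EXCLUSIVE", holders, []))  # True
```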
Attachment
Rebased, fixed a couple typos, and reordered the isolation tests to put the most elaborate pair last.
Attachment
I got some very helpful off-list feedback from Robert Haas that this needed more self-contained explanation/motivation. So here goes. :-)

This patch set adds a new SQL function pg_wait_for_lockers(), which waits for transactions holding specified table locks to commit or roll back. This can be useful with knowledge of the queries in those transactions, particularly for asynchronous and incremental processing of inserted/updated rows.

Specifically, consider a scenario where INSERTs and UPDATEs always set a serial column to its default value. A client can call pg_sequence_last_value() + pg_wait_for_lockers() and then take a new DB snapshot and know that rows committed after this snapshot will have values of the serial column greater than the value from pg_sequence_last_value(). As shown in the example at the end, this allows the client to asynchronously and incrementally read inserted/updated rows with minimal per-client state, without buffering changes, and without affecting writer transactions.

There are lots of other ways to support incrementally reading new rows, but they don’t have all of those qualities. For example:

* Forcing writers to commit in a specific order (e.g. by serial column value) would reduce throughput
* Explicitly tracking or coordinating with writers would likely be more complex, impact performance, and/or require much more state
* Methods that are synchronous or buffer/queue changes are problematic if readers fall behind

Existing ways to wait for table locks also have downsides:

* Taking a conflicting lock with LOCK blocks new transactions from taking the lock of interest while LOCK waits. And in order to wait for writers holding RowExclusiveLock, we must take ShareLock, which also conflicts with ShareUpdateExclusiveLock and therefore unnecessarily interferes with (auto)vacuum. Finally, with multiple tables, LOCK locks them one at a time, so it waits (and holds locks) longer than necessary.
* Using pg_locks / pg_lock_status() to identify the transactions holding the locks is more expensive, since it also returns all other locks in the DB cluster, plus there’s no efficient built-in way to wait for the transactions to commit or roll back.

By contrast, pg_wait_for_lockers() doesn’t block other transactions, waits on multiple tables in parallel, and doesn’t spend time looking at irrelevant locks.

This change is split into three patches for ease of review. The first two patches modify the existing WaitForLockers() C function and other locking internals to support waiting for lockers in a single lock mode, which allows waiting for INSERT/UPDATE without waiting for vacuuming. These changes could be omitted at the cost of unnecessary waiting, potentially for a long time with slow vacuums. The third patch adds the pg_wait_for_lockers() SQL function, which just calls WaitForLockers().

FWIW, another solution might be to directly expose the functions that WaitForLockers() calls, namely GetLockConflicts() (generalized to GetLockers() in the first patch) to identify the transactions holding the locks, and VirtualXactLock() to wait for each transaction to commit or roll back. That would be more complicated for the client but could be more broadly useful. I could investigate that further if it seems preferable.

=== Example ===

Assume we have the following table:

CREATE TABLE page_views (
    id bigserial,
    view_time timestamptz
);

which is only ever modified by (potentially concurrent) INSERT commands that assign the default value to the id column.
We can run the following commands:

SELECT pg_sequence_last_value('page_views_id_seq');
 pg_sequence_last_value
------------------------
                      4

SELECT pg_wait_for_lockers(array['page_views']::regclass[], 'RowExclusiveLock', FALSE);

Now we know that all rows where id <= 4 have been committed or rolled back, and we can observe/process them:

SELECT * FROM page_views WHERE id <= 4;
 id |           view_time
----+-------------------------------
  2 | 2024-01-01 12:34:01.000000-00
  3 | 2024-01-01 12:34:00.000000-00

Later we can iterate:

SELECT pg_sequence_last_value('page_views_id_seq');
 pg_sequence_last_value
------------------------
                      9

SELECT pg_wait_for_lockers(array['page_views']::regclass[], 'RowExclusiveLock', FALSE);

We already observed all the rows where id <= 4, so this time we can filter them out:

SELECT * FROM page_views WHERE id > 4 AND id <= 9;
 id |           view_time
----+-------------------------------
  5 | 2024-01-01 12:34:05.000000-00
  8 | 2024-01-01 12:34:04.000000-00
  9 | 2024-01-01 12:34:07.000000-00

We can continue iterating like this to incrementally observe more newly inserted rows. Note that the only state we persist across iterations is the value returned by pg_sequence_last_value().

In this example, we processed inserted rows exactly once. Variations are possible for handling updates, as discussed in the original email, and I could explain that again better if it would be helpful. :-)
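To make the exactly-once property concrete, here is a pure-Python simulation of the reader loop (an illustrative sketch with a simulated sequence and simulated writers, not real client code). The key invariant it models: each writer draws its id while "holding its lock", so after snapshotting the sequence and waiting for lockers, every id <= max_id has been decided.

```python
# Simulation of the incremental-read loop: writers draw ids from a shared
# sequence before committing; the reader snapshots the sequence, waits for
# in-flight writers, then processes each committed row exactly once.

seq = 0          # simulates the sequence backing page_views.id
in_flight = []   # ids assigned but not yet committed (lock still held)
committed = []   # committed rows, in arbitrary commit order

def nextval():
    global seq
    seq += 1
    return seq

def insert_begin():
    # the DEFAULT is evaluated while RowExclusiveLock is held
    rid = nextval()
    in_flight.append(rid)
    return rid

def commit(rid):
    in_flight.remove(rid)
    committed.append(rid)

def wait_for_lockers():
    # stand-in for pg_wait_for_lockers(): in this toy model, every
    # in-flight writer simply commits before the reader proceeds
    for rid in list(in_flight):
        commit(rid)

processed = []
min_id = 0
for _ in range(3):              # a few reader iterations
    for _ in range(2):          # concurrent writers begin inserting...
        insert_begin()
    max_id = seq                # pg_sequence_last_value()
    wait_for_lockers()          # now all ids <= max_id are decided
    batch = sorted(r for r in committed if min_id < r <= max_id)
    processed.extend(batch)
    min_id = max_id             # the only state carried across iterations

print(processed)  # every id exactly once: [1, 2, 3, 4, 5, 6]
```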
Attachment
I should add that the latest patches remove permissions checks because pg_locks doesn't have any, and improve the commit messages. Hope I didn't garble anything doing this late after the dev conference. :-) Robert asked me about other existing functions that could be leveraged, such as GetConflictingVirtualXIDs(), but I didn't see any with close-enough semantics that handle fast-path locks as needed for tables/relations.
On Thu, May 30, 2024 at 12:01 AM Will Mortensen <will@extrahop.com> wrote:
> FWIW, another solution might be to directly expose the functions that
> WaitForLockers() calls, namely GetLockConflicts() (generalized to
> GetLockers() in the first patch) to identify the transactions holding
> the locks, and VirtualXactLock() to wait for each transaction to
> commit or roll back. That would be more complicated for the client but
> could be more broadly useful. I could investigate that further if it
> seems preferable.

We will look further into this. Since the main advantage over polling the existing pg_locks view would be efficiency, we will try to provide more quantitative evidence/analysis of that. That will probably want to be a new thread and CF entry, so I'm withdrawing this one.

Thanks again for all the replies, and to Robert for your off-list feedback and letting me bend your ear in Vancouver. :-)