Re: Interrupts vs signals - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Interrupts vs signals
Date
Msg-id b3930251-78d5-4379-b213-b37b469f6dfb@iki.fi
Whole thread Raw
In response to Interrupts vs signals  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: Interrupts vs signals
List pgsql-hackers
One remaining issue with these patches is backwards-compatibility for 
extensions:

1. Any extensions using WaitLatch, SetLatch, ResetLatch need to be 
changed to use WaitInterrupt, SetInterrupt/RaiseInterrupt, 
ResetInterrupt instead.

2. If an extension is defining its own Latch with InitLatch, rather than 
using MyLatch, it's out of luck. You could probably rewrite it using the 
INTERRUPT_GENERAL_WAKEUP, which is multiplexed like MyLatch, but it's a 
bit more effort.

We can provide backwards compatibility macros and a new facility to 
allocate custom interrupt bits. But how big of a problem is this anyway? 
In an in-person chat last week, Andres said something like "this will 
break every extension". I don't think it's quite that bad, and more 
complicated extensions need small tweaks in every major release anyway. 
But let's try to quantify that.

Analysis
--------

Joel compiled a list of all extensions that are on github: 
https://gist.github.com/joelonsql/e5aa27f8cc9bd22b8999b7de8aee9d47. I 
did "git checkout" on all of them, 1159 repositories in total (some of 
the links on the list were broken, or required authentication).

86 out of those 1159 extensions contain the word "Latch":

$ ls -d  */*  | xargs -IFOO bash -c "grep -q -r -I  Latch FOO && echo 
FOO" | wc -l
86

For comparison, in PostgreSQL v10, we added a new 'wait_event_info' 
argument to WaitLatch and WaitLatchForSocket. That breakage was on 
similar scale:

$ ls -d  */*  | xargs -IFOO bash -c "grep -q -r -I  WaitLatch FOO && 
echo FOO" | wc -l
71

Admittedly that was a long time ago, we might have different priorities now.

Deeper analysis
---------------

Most of those calls are simple calls to SetLatch(MyLatch), 
WaitLatch(MyLatch, ...) etc. that could be handled by backwards 
compatibility macros. Many of the calls don't use MyLatch, but use 
MyProc->procLatch directly:

     rc = WaitLatch(&MyProc->procLatch, ...)

That makes sense, since  MyLatch was added in 2015, while the initial 
latch code was added in 2010. Any extensions that lived during that era 
would use MyProc->procLatch for backwards-compatibility, and there's 
been no need to change that since. That could also be supported by 
backwards-compatibility macros, or we could tell the extension authors 
to search & replace MyProc->procLatch to MyLatch.

Excluding those, there are a handful of extensions that would need to be 
updated. Below is a quick analysis of the remaining results of "grep 
Latch" in all the repos:

These calls use the process latch to wake up a different process (I 
excluded some duplicated forks of the same extension):

2ndQuadrant/pglogical/pglogical_sync.c: 
SetLatch(&apply->proc->procLatch);
2ndQuadrant/pglogical/pglogical_apply.c: 
                SetLatch(&worker->proc->procLatch);
2ndQuadrant/pglogical/pglogical_worker.c: 
SetLatch(&w->proc->procLatch);
2ndQuadrant/pglogical/pglogical_worker.c: 
SetLatch(&PGLogicalCtx->supervisor->procLatch);
heterodb/pg-strom/src/gpu_cache.c:      Latch      *backend;
heterodb/pg-strom/src/gpu_cache.c:      cmd->backend = (is_async ? NULL 
: MyLatch);
heterodb/pg-strom/src/gpu_cache.c: 
SetLatch(cmd->backend);
heterodb/pg-strom/src/gpu_cache.c: 
SetLatch(cmd->backend);
citusdata/citus/src/backend/distributed/utils/maintenanced.c:   Latch 
*latch; /* pointer to the background worker's latch */
citusdata/citus/src/backend/distributed/utils/maintenanced.c: 
                SetLatch(dbData->latch);
liaorc/devel-master/src/cuda_program.c: 
SetLatch(&proc->procLatch);
okbob/generic-scheduler/dbworker.c: 
SetLatch(&dbworker->parent->procLatch);
postgrespro/lsm3/lsm3.c: 
SetLatch(&entry->merger->procLatch);
postgrespro/lsm3/lsm3.c: 
SetLatch(&entry->merger->procLatch);
postgrespro/pg_pageprep/pg_pageprep.c: 
SetLatch(&starter->procLatch);
postgrespro/pg_pageprep/pg_pageprep.c: 
SetLatch(&starter->procLatch);
postgrespro/pg_query_state/pg_query_state.c:    Latch   *caller;
postgrespro/pg_query_state/pg_query_state.c: 
SetLatch(counterpart_userid->caller);
postgrespro/pg_query_state/pg_query_state.c: 
counterpart_userid->caller = MyLatch;
timescale/timescaledb/src/loader/bgw_message_queue.c: 
SetLatch(&BackendPidGetProc(queue_get_reader(queue))->procLatch);
troels/hbase_fdw/process_communication.c:       control->latch = MyLatch;
troels/hbase_fdw/process_communication.c: 
SetLatch(control->latch);

The above could easily be fairly replaced with the new SendInterrupt() 
function. A backwards compatibility macro might be possible for some of 
these, but would be tricky for others. So I think these extensions will 
need to adapt by switching to SendInterrupt(). At quick glance, it would 
easy for all of these to do that with a small "#if PG_VERSION_NUM" block.


postgrespro/pg_wait_sampling/pg_wait_sampling.c:         * null wait 
events. So instead we make use of DisownLatch() resetting
postgrespro/pg_wait_sampling/pg_wait_sampling.c:        if (proc->pid == 
0 || proc->procLatch.owner_pid == 0 || proc->pid == MyProcPid)
postgrespro/pg_wait_sampling/pg_wait_sampling.c: 
SetLatch(pgws_collector_hdr->latch);
postgrespro/pg_wait_sampling/pg_wait_sampling.c: 
SetLatch(pgws_collector_hdr->latch);
postgrespro/pg_wait_sampling/pg_wait_sampling.h:        Latch 
   *latch;

This case is similar to the previous group: a latch is used to wake up 
another process. It peeks directly into procLatch.owner_pid however. I 
suspect this will be simpler and better after rewriting to use 
interrupts, but not sure.


darold/datalink/datalink_bgw.c:                         (errmsg("Latch 
status before waitlatch call: %d", MyProc->procLatch.is_set)));

This is a DEBUG1 message. Could probably be just removed, it doesn't 
seem very useful.

postgrespro/jsonbd/jsonbd.c:    SetLatch(&wd->latch);
postgrespro/jsonbd/jsonbd.h:    Latch                           latch;
postgrespro/jsonbd/jsonbd.h:    Latch 
launcher_latch;
postgrespro/jsonbd/jsonbd_worker.c:     SetLatch(&hdr->launcher_latch);
postgrespro/jsonbd/jsonbd_worker.c:     InitLatch(&worker_state->latch);
postgrespro/jsonbd/jsonbd_worker.c:     InitLatch(&hdr->launcher_latch);
postgrespro/jsonbd/jsonbd_worker.c:             rc = WaitLatch(MyLatch, 
WL_LATCH_SET | WL_POSTMASTER_DEATH,
postgrespro/jsonbd/jsonbd_worker.c:             ResetLatch(MyLatch);
postgrespro/jsonbd/jsonbd_worker.c:             rc = 
WaitLatch(&worker_state->latch, WL_LATCH_SET | WL_POSTMASTER_DEATH,
postgrespro/jsonbd/jsonbd_worker.c: 
ResetLatch(&worker_state->latch);
postgrespro/jsonbd/jsonbd_worker.c:     WaitLatch(&hdr->launcher_latch, 
WL_LATCH_SET, 0, PG_WAIT_EXTENSION);
postgrespro/jsonbd/jsonbd_worker.c:     WaitLatch(&hdr->launcher_latch, 
WL_LATCH_SET, 0);
postgrespro/jsonbd/jsonbd_worker.c:     ResetLatch(&hdr->launcher_latch);
timescale/timescaledb/test/src/bgw/params.h:    Latch timer_latch;
timescale/timescaledb/test/src/bgw/params.c: 
SetLatch(&wrapper->params.timer_latch);
timescale/timescaledb/test/src/bgw/params.c: 
InitLatch(&wrapper->params.timer_latch);
timescale/timescaledb/test/src/bgw/params.c: 
ResetLatch(&wrapper->params.timer_latch);
timescale/timescaledb/test/src/bgw/params.c: 
WaitLatch(&wrapper->params.timer_latch,

These two extensions, 'jsonbd' and 'timescaledb', actually use InitLatch 
to define their own latches.

I think the 'jsonbd' usage was copied from logical apply workers. At 
quick glance, it probably should be using the process latch instead of 
creating its own.

In 'timescaledb', the InitLatch call is in a mock test module. It 
probably could be written differently..



Summary
-------


We have a few options for how to deal with backwards-compatibility for 
extensions:

A) If we rip off the bandaid in one go and don't provide any 
backwards-compatibility macros, we will break 96 extensions. Most of 
them can be fixed by replacing WaitLatch, SetLatch, ResetLatch with 
corresponding WaitInterrupt, RaiseInterrupt, ResetInterrupt calls. (With 
#ifdef for compatibility with older postgres versions)

B) If we provide backwards-compatibility macros so that simple Latch
calls on MyLatch continue working, we will break about 14 extensions. 
They will need some tweaking like in option A). A bit more than the 
simple cases in option A), but nothing too difficult.

C) We could provide "forward-compatibility" macros in a separate header 
file, to make the new "SetInterrupt" etc calls work in old PostgreSQL 
versions. Many of the extensions already have a header file like this, 
see e.g. citusdata/citus/src/include/pg_version_compat.h, 
pipelinedb/pipelinedb/include/compat.h. It might actually be a good idea 
to provide a semi-official header file like this on the Postgres wiki, 
to help extension authors. It would encourage extensions to use the 
latest idioms, while still being able to compile for older versions.



I'm leaning towards option C). Let's rip off the band-aid, but provide 
documentation for how to adapt your extension code. And provide a 
forwards-compatibility header on the wiki, that extension authors can 
use to make the new Interrupt calls work against old server versions.


The second question is whether we need to provide a replacement for 
InitLatch for extensions. ISTM that none of the extensions out there 
truly need it. But maybe if we had a nicer interface for allocating 
interrupt bits for custom usage, they would actually find that handy. 
But I'm leaning towards "no" at this point. Could be added later.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Eager aggregation, take 3
Next
From: Nathan Bossart
Date:
Subject: Re: Reduce one comparison in binaryheap's sift down