Thread: [COMMITTERS] pgsql: Add support for temporary replication slots

[COMMITTERS] pgsql: Add support for temporary replication slots

From

Peter Eisentraut

Date:

12 December 2016, 17:16:34

Add support for temporary replication slots

This allows creating temporary replication slots that are removed
automatically at the end of the session or on error.

From: Petr Jelinek <petr.jelinek@2ndquadrant.com>

Branch
------
master

Details
-------
http://git.postgresql.org/pg/commitdiff/a924c327e2793d2025b19e18de7917110dc8afd8

Modified Files
--------------
contrib/test_decoding/Makefile          |  2 +-
contrib/test_decoding/expected/ddl.out  |  4 +-
contrib/test_decoding/expected/slot.out | 58 +++++++++++++++++++++++++++
contrib/test_decoding/sql/slot.sql      | 20 ++++++++++
doc/src/sgml/func.sgml                  | 16 ++++++--
doc/src/sgml/protocol.sgml              | 13 ++++++-
src/backend/catalog/system_views.sql    | 11 ++++++
src/backend/replication/repl_gram.y     | 22 +++++++----
src/backend/replication/repl_scanner.l  |  1 +
src/backend/replication/slot.c          | 69 ++++++++++++++++++++++++++-------
src/backend/replication/slotfuncs.c     | 24 ++++++++----
src/backend/replication/walsender.c     | 28 +++++++------
src/backend/storage/lmgr/proc.c         |  3 ++
src/backend/tcop/postgres.c             |  3 ++
src/include/catalog/pg_proc.h           |  6 +--
src/include/nodes/replnodes.h           |  1 +
src/include/replication/slot.h          |  4 +-
src/test/regress/expected/rules.out     |  3 +-
18 files changed, 237 insertions(+), 51 deletions(-)

Re: [COMMITTERS] pgsql: Add support for temporary replication slots

From

Tom Lane

Date:

12 December 2016, 18:43:00

Peter Eisentraut <peter_e@gmx.net> writes:
> Add support for temporary replication slots

Some of the slower buildfarm members are failing test-decoding-check since
this went in.  At least on prairiedog, it looks like a race condition in
the test:

** /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/test_decoding/expected/slot.out    Mon Dec 12 09:24:32 2016
--- /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/test_decoding/./regression_output/results/slot.out    Mon Dec 12
10:01:312016 
***************
*** 32,38 ****

  -- should fail because the temporary slot was dropped automatically
  SELECT pg_drop_replication_slot('regression_slot_t');
! ERROR:  replication slot "regression_slot_t" does not exist
  -- test switching between slots in a session
  SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot1', 'test_decoding', true);
   ?column?
--- 32,38 ----

  -- should fail because the temporary slot was dropped automatically
  SELECT pg_drop_replication_slot('regression_slot_t');
! ERROR:  replication slot "regression_slot_t" is active for PID 17615
  -- test switching between slots in a session
  SELECT 'init' FROM pg_create_logical_replication_slot('regression_slot1', 'test_decoding', true);
   ?column?

            regards, tom lane

Re: [COMMITTERS] pgsql: Add support for temporary replication slots

From

Tom Lane

Date:

12 December 2016, 18:55:44

I wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> Add support for temporary replication slots

> Some of the slower buildfarm members are failing test-decoding-check since
> this went in.  At least on prairiedog, it looks like a race condition in
> the test:

Having now looked a bit more closely, that's almost certainly what it is:
the test is assuming that the old backend exits instantaneously, which
it doesn't.  You might want to borrow the wait-for-previous-session-to-die
loop I put into src/test/modules/test_extensions/sql/test_extensions.sql
recently.  (Make sure you get the fixed version ;-))

            regards, tom lane

Re: [COMMITTERS] pgsql: Add support for temporary replication slots

From

Petr Jelinek

Date:

12 December 2016, 21:47:53

On 12/12/16 16:55, Tom Lane wrote:
> I wrote:
>> Peter Eisentraut <peter_e@gmx.net> writes:
>>> Add support for temporary replication slots
>
>> Some of the slower buildfarm members are failing test-decoding-check since
>> this went in.  At least on prairiedog, it looks like a race condition in
>> the test:
>
> Having now looked a bit more closely, that's almost certainly what it is:
> the test is assuming that the old backend exits instantaneously, which
> it doesn't.  You might want to borrow the wait-for-previous-session-to-die
> loop I put into src/test/modules/test_extensions/sql/test_extensions.sql
> recently.  (Make sure you get the fixed version ;-))
>

Yes you are correct, we tried naively to first drop another slot in
hopes that it will give the other backend enough time to close, but
apparently that wasn't robust enough for slower machines.

Attached is the fixed test using the loop from test_extensions (and yes
I would have missed the pg_stat_clear_snapshot() call without the hint,
thanks :) ).

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Attachment

0001-Fix-the-test_decoding-slot-test-race-condition.patch

Re: [COMMITTERS] pgsql: Add support for temporary replication slots

From

Tom Lane

Date:

12 December 2016, 22:32:57

Petr Jelinek <petr.jelinek@2ndquadrant.com> writes:
> Yes you are correct, we tried naively to first drop another slot in
> hopes that it will give the other backend enough time to close, but
> apparently that wasn't robust enough for slower machines.

> Attached is the fixed test using the loop from test_extensions (and yes
> I would have missed the pg_stat_clear_snapshot() call without the hint,
> thanks :) ).

Pushed, thanks.

            regards, tom lane

Re: [COMMITTERS] pgsql: Add support for temporary replication slots

From

Tom Lane

Date:

13 December 2016, 02:39:13

I wrote:
> Petr Jelinek <petr.jelinek@2ndquadrant.com> writes:
>> Attached is the fixed test using the loop from test_extensions (and yes
>> I would have missed the pg_stat_clear_snapshot() call without the hint,
>> thanks :) ).

> Pushed, thanks.

Hm, buildfarm says this didn't fix it.  Where exactly does the dropping
of the slot happen ... is it not complete by the time the backend exits?

            regards, tom lane

Re: [COMMITTERS] pgsql: Add support for temporary replication slots

From

Petr Jelinek

Date:

13 December 2016, 03:05:08

On 13/12/16 00:39, Tom Lane wrote:
> I wrote:
>> Petr Jelinek <petr.jelinek@2ndquadrant.com> writes:
>>> Attached is the fixed test using the loop from test_extensions (and yes
>>> I would have missed the pg_stat_clear_snapshot() call without the hint,
>>> thanks :) ).
>
>> Pushed, thanks.
>
> Hm, buildfarm says this didn't fix it.  Where exactly does the dropping
> of the slot happen ... is it not complete by the time the backend exits?
>

Yes, I've seen. The cleanup of slots is done in ProcKill(), I wonder,
since it's on_shmem_exit hook, and the pgstat_beshutdown_hook is as
well, maybe the pgstat_beshutdown_hook is called before ProcKill and the
query is lucky to hit right in between.

I am not quite sure what would be the best way to work around that. We
could wait for the slot to disappear instead of trying to drop it, but
then the test would hang on failure.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Re: [COMMITTERS] pgsql: Add support for temporary replication slots

From

Tom Lane

Date:

13 December 2016, 03:40:37

Petr Jelinek <petr.jelinek@2ndquadrant.com> writes:
> On 13/12/16 00:39, Tom Lane wrote:
>> Hm, buildfarm says this didn't fix it.  Where exactly does the dropping
>> of the slot happen ... is it not complete by the time the backend exits?

> Yes, I've seen. The cleanup of slots is done in ProcKill(), I wonder,
> since it's on_shmem_exit hook, and the pgstat_beshutdown_hook is as
> well, maybe the pgstat_beshutdown_hook is called before ProcKill and the
> query is lucky to hit right in between.

Hm.  That seems like a pretty bogus place to do it.  An awful lot of the
backend infrastructure is already gone by then, if I recall the ordering
correctly.  Maybe ShutdownPostgres would be a saner place; but it really
depends on what you think the module layering is for this facility.
I would definitely not think it is proc.c's responsibility, though.

            regards, tom lane

Re: [COMMITTERS] pgsql: Add support for temporary replication slots

From

Petr Jelinek

Date:

13 December 2016, 03:42:48

On 13/12/16 01:40, Tom Lane wrote:
> Petr Jelinek <petr.jelinek@2ndquadrant.com> writes:
>> On 13/12/16 00:39, Tom Lane wrote:
>>> Hm, buildfarm says this didn't fix it.  Where exactly does the dropping
>>> of the slot happen ... is it not complete by the time the backend exits?
>
>> Yes, I've seen. The cleanup of slots is done in ProcKill(), I wonder,
>> since it's on_shmem_exit hook, and the pgstat_beshutdown_hook is as
>> well, maybe the pgstat_beshutdown_hook is called before ProcKill and the
>> query is lucky to hit right in between.
>
> Hm.  That seems like a pretty bogus place to do it.  An awful lot of the
> backend infrastructure is already gone by then, if I recall the ordering
> correctly.  Maybe ShutdownPostgres would be a saner place; but it really
> depends on what you think the module layering is for this facility.
> I would definitely not think it is proc.c's responsibility, though.
>

Well, the problem is that that's the place where the currently active
slot is released (if there was an active one). So we'd need to move that
part somewhere else as well. I am not sure what's the reasoning for
releasing it at that specific spot so CCing Andres.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Re: [COMMITTERS] pgsql: Add support for temporary replication slots

From

Andres Freund

Date:

13 December 2016, 03:46:48

On 2016-12-12 19:40:37 -0500, Tom Lane wrote:
> Petr Jelinek <petr.jelinek@2ndquadrant.com> writes:
> > Yes, I've seen. The cleanup of slots is done in ProcKill(), I wonder,
> > since it's on_shmem_exit hook, and the pgstat_beshutdown_hook is as
> > well, maybe the pgstat_beshutdown_hook is called before ProcKill and the
> > query is lucky to hit right in between.
>
> Hm.  That seems like a pretty bogus place to do it.

Well, that'd not be Pet[e]r's fault. Robert and I had some preexisting
slot cleanpucode there (which I'd like to consolidate btw).

> backend infrastructure is already gone by then, if I recall the ordering
> correctly.  Maybe ShutdownPostgres would be a saner place; but it really
> depends on what you think the module layering is for this facility.
> I would definitely not think it is proc.c's responsibility, though.

Slots quite possibly can used by bgworkers, so I don't think
ShutdownPostgres would be right.  I don't think there's a perfect place
for it atm, so it's ProcKill()...

- Andres

Re: [COMMITTERS] pgsql: Add support for temporary replication slots

From

Andres Freund

Date:

13 December 2016, 03:49:03

On 2016-12-13 01:42:48 +0100, Petr Jelinek wrote:
> On 13/12/16 01:40, Tom Lane wrote:
> > Petr Jelinek <petr.jelinek@2ndquadrant.com> writes:
> >> On 13/12/16 00:39, Tom Lane wrote:
> >>> Hm, buildfarm says this didn't fix it.  Where exactly does the dropping
> >>> of the slot happen ... is it not complete by the time the backend exits?
> >
> >> Yes, I've seen. The cleanup of slots is done in ProcKill(), I wonder,
> >> since it's on_shmem_exit hook, and the pgstat_beshutdown_hook is as
> >> well, maybe the pgstat_beshutdown_hook is called before ProcKill and the
> >> query is lucky to hit right in between.
> >
> > Hm.  That seems like a pretty bogus place to do it.  An awful lot of the
> > backend infrastructure is already gone by then, if I recall the ordering
> > correctly.  Maybe ShutdownPostgres would be a saner place; but it really
> > depends on what you think the module layering is for this facility.
> > I would definitely not think it is proc.c's responsibility, though.
> >
>
> Well, the problem is that that's the place where the currently active
> slot is released (if there was an active one). So we'd need to move that
> part somewhere else as well. I am not sure what's the reasoning for
> releasing it at that specific spot so CCing Andres.

Why don't we just instead make the loop over pg_replication_slots?

Andres

Re: [COMMITTERS] pgsql: Add support for temporary replication slots

From

Petr Jelinek

Date:

13 December 2016, 03:57:01

On 13/12/16 01:49, Andres Freund wrote:
> On 2016-12-13 01:42:48 +0100, Petr Jelinek wrote:
>> On 13/12/16 01:40, Tom Lane wrote:
>>> Petr Jelinek <petr.jelinek@2ndquadrant.com> writes:
>>>> On 13/12/16 00:39, Tom Lane wrote:
>>>>> Hm, buildfarm says this didn't fix it.  Where exactly does the dropping
>>>>> of the slot happen ... is it not complete by the time the backend exits?
>>>
>>>> Yes, I've seen. The cleanup of slots is done in ProcKill(), I wonder,
>>>> since it's on_shmem_exit hook, and the pgstat_beshutdown_hook is as
>>>> well, maybe the pgstat_beshutdown_hook is called before ProcKill and the
>>>> query is lucky to hit right in between.
>>>
>>> Hm.  That seems like a pretty bogus place to do it.  An awful lot of the
>>> backend infrastructure is already gone by then, if I recall the ordering
>>> correctly.  Maybe ShutdownPostgres would be a saner place; but it really
>>> depends on what you think the module layering is for this facility.
>>> I would definitely not think it is proc.c's responsibility, though.
>>>
>>
>> Well, the problem is that that's the place where the currently active
>> slot is released (if there was an active one). So we'd need to move that
>> part somewhere else as well. I am not sure what's the reasoning for
>> releasing it at that specific spot so CCing Andres.
>
> Why don't we just instead make the loop over pg_replication_slots?
>

I mentioned that as possible solution upthread, I am only worried that
the failure scenario is basically infinite loop.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Re: [COMMITTERS] pgsql: Add support for temporary replication slots

From

Andres Freund

Date:

13 December 2016, 04:00:59

On 2016-12-13 01:57:01 +0100, Petr Jelinek wrote:
> I mentioned that as possible solution upthread, I am only worried that
> the failure scenario is basically infinite loop.

I don't see the problem with that. If you're really concerned you can
set a statement timeout.

Andres

Re: [COMMITTERS] pgsql: Add support for temporary replication slots

From

Tom Lane

Date:

13 December 2016, 04:08:01

Andres Freund <andres@anarazel.de> writes:
> On 2016-12-13 01:57:01 +0100, Petr Jelinek wrote:
>> I mentioned that as possible solution upthread, I am only worried that
>> the failure scenario is basically infinite loop.

> I don't see the problem with that. If you're really concerned you can
> set a statement timeout.

I don't think "change the test" is the right response here.
I think the problem is that we're disconnecting from the slot at the
wrong step of backend shutdown, and that we need to fix that before
it bites us on some more painful parts of our anatomies.  You can't
just throw darts at the code when deciding where to do things, and
proc.c is NOT the place that should be concerned with replication
slots.

            regards, tom lane

Re: [COMMITTERS] pgsql: Add support for temporary replication slots

From

Andres Freund

Date:

13 December 2016, 04:11:01

On 2016-12-12 20:08:01 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2016-12-13 01:57:01 +0100, Petr Jelinek wrote:
> >> I mentioned that as possible solution upthread, I am only worried that
> >> the failure scenario is basically infinite loop.
>
> > I don't see the problem with that. If you're really concerned you can
> > set a statement timeout.
>
> I don't think "change the test" is the right response here.
> I think the problem is that we're disconnecting from the slot at the
> wrong step of backend shutdown, and that we need to fix that before
> it bites us on some more painful parts of our anatomies.  You can't
> just throw darts at the code when deciding where to do things, and
> proc.c is NOT the place that should be concerned with replication
> slots.

And why is that / where would be more appropriate? It's not
ShutdownPostgres() (usable from bgworkers). And it has to be
after LWLockReleaseAll(), because otherwise we'll potentially just
deadlock against ourselves.

Greetings,

Andres Freund

Re: [COMMITTERS] pgsql: Add support for temporary replication slots

From

Petr Jelinek

Date:

13 December 2016, 04:21:27

On 13/12/16 02:08, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
>> On 2016-12-13 01:57:01 +0100, Petr Jelinek wrote:
>>> I mentioned that as possible solution upthread, I am only worried that
>>> the failure scenario is basically infinite loop.
>
>> I don't see the problem with that. If you're really concerned you can
>> set a statement timeout.
>
> I don't think "change the test" is the right response here.
> I think the problem is that we're disconnecting from the slot at the
> wrong step of backend shutdown, and that we need to fix that before
> it bites us on some more painful parts of our anatomies.  You can't
> just throw darts at the code when deciding where to do things, and
> proc.c is NOT the place that should be concerned with replication
> slots.
>

It has been concerned with replication slots since 9.4 so it's not like
it's newly invented concern. As Andres says, we don't have better place
to put it to at the moment.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Re: [COMMITTERS] pgsql: Add support for temporary replication slots

From

Petr Jelinek

Date:

13 December 2016, 04:28:26

On 13/12/16 02:00, Andres Freund wrote:
> On 2016-12-13 01:57:01 +0100, Petr Jelinek wrote:
>> I mentioned that as possible solution upthread, I am only worried that
>> the failure scenario is basically infinite loop.
>
> I don't see the problem with that. If you're really concerned you can
> set a statement timeout.
>

Okay in case we decide it's the right way to go attached patch does
that. I also added some more tests based on your feedback while I am at it.

--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

Attachment

0001-Various-temporary-slots-test-improvements.patch

Re: [COMMITTERS] pgsql: Add support for temporary replication slots

From

Peter Eisentraut

Date:

14 December 2016, 17:35:23

On 12/12/16 8:28 PM, Petr Jelinek wrote:
> On 13/12/16 02:00, Andres Freund wrote:
>> On 2016-12-13 01:57:01 +0100, Petr Jelinek wrote:
>>> I mentioned that as possible solution upthread, I am only worried that
>>> the failure scenario is basically infinite loop.
>>
>> I don't see the problem with that. If you're really concerned you can
>> set a statement timeout.
>>
>
> Okay in case we decide it's the right way to go attached patch does
> that. I also added some more tests based on your feedback while I am at it.

This looks mostly reasonable to me, but the location and xid output from
pg_logical_slot_get_changes() is not portable/repeatable, so it should
be omitted from the output.

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [COMMITTERS] pgsql: Add support for temporary replication slots

From

Petr Jelinek

Date:

15 December 2016, 11:46:27

On 14/12/16 15:35, Peter Eisentraut wrote:
> On 12/12/16 8:28 PM, Petr Jelinek wrote:
>> On 13/12/16 02:00, Andres Freund wrote:
>>> On 2016-12-13 01:57:01 +0100, Petr Jelinek wrote:
>>>> I mentioned that as possible solution upthread, I am only worried that
>>>> the failure scenario is basically infinite loop.
>>>
>>> I don't see the problem with that. If you're really concerned you can
>>> set a statement timeout.
>>>
>>
>> Okay in case we decide it's the right way to go attached patch does
>> that. I also added some more tests based on your feedback while I am at it.
> 
> This looks mostly reasonable to me, but the location and xid output from
> pg_logical_slot_get_changes() is not portable/repeatable, so it should
> be omitted from the output.
> 

Sigh, yes you are right.

-- 
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-committers mailing list (pgsql-committers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-committers

Attachment

0001-Various-temporary-slots-test-improvements.patch

Re: [COMMITTERS] pgsql: Add support for temporary replication slots

From

Peter Eisentraut

Date:

15 December 2016, 16:51:04

On 12/15/16 3:46 AM, Petr Jelinek wrote:
>>> Okay in case we decide it's the right way to go attached patch does
>>> that. I also added some more tests based on your feedback while I am at it.
>>
>> This looks mostly reasonable to me, but the location and xid output from
>> pg_logical_slot_get_changes() is not portable/repeatable, so it should
>> be omitted from the output.
>>
>
> Sigh, yes you are right.

committed

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services