Thread: [PATCH] Send catalog_xmin separately in hot standby feedback
Hi all Currently hot standby feedback sends GetOldestXmin()'s result to the upstream as the required xmin. GetOldestXmin() returns a slot's catalog_xmin if that's the lowest xmin on the system. That's fine so long as we don't do logical decoding on standbys, but if we start allowing logical slots on standbys it'll cause the master to retain too much bloat since it'll pin the master's xmin (not catalog_xmin) down based on the catalog_xmin of any slots on the downstream. To fix that, add new fields to the hot standby feedback protocol message to carry a separate catalog_xmin. This doesn't need any special care for backward compatibility because the only thing that has any business sending hot standby feedback is a physical standby and they're required to be the same major version as the master. If someone tries to connect with a standby of the wrong version they'll fail long before this, and even if they didn't they'd just get an error saying there's not enough data in the message. pg_basebackup, pg_recvlogical and pg_receivexlog don't send hot standby feedback messages. I'm posting this now because Petr was interested in it for his work on logical replication. I'll be following it a subsequent patch to allow logical slot creation on physical replicas if they're using a slot to talk to the master and have hot_standby_feedback enabled, just so you know the direction this is going in. Passes 'make check', src/bin/pg_basebackup and src/test/recovery TAP tests. I haven't added specific tests for this functionality since there isn't (yet) a way to set catalog_xmin separately on a physical standby without a dedicated test module. The logical decoding timeline following patch[1] is also relevant for this, since it is required for logical decoding on standby to survive promotion. Next steps will be: * Expose information about whether or not a slot is in use from walreceiver.c * Allow logical slots to be created on replicas if hot_standby_feedback is enabled and a logical slot is in use. Return null as the exported snapshot ID when creating over the walsender protocol, since we can't export a snapshot on a standby due to the need to allocate an xid. (That can be addressed separately). * Now that recovery tests are possible, write the recovery test suite for logical decoding on standby * Auto-drop replication slots when dropping a database in dbase_redo * Add a safety mechanism to stop users disabling hs feedback on the replica or stopping using a physical slot to the upstream while logical slots exist on the replica. Or mark such logical slots as unusable using a new persistent field on the slot. Not trivial because we must allow crash recovery without a slot to upstream (obviously), and should preferably also allow fallback to archive recovery when server with slot is temporarily unreachable. Must also consider handling of physical slot with catalog_xmin set from cascading physical replica with logical slots on it. * Extend the logical replication patch to add support for following physical failover using this functionality, likely in 11.0. [1] https://commitfest.postgresql.org/10/779/ I'll add this to the next CF, but I realise the inability to test it standalone may mean it can only be committed as part of a series along with full support for logical decoding from standby. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
On 5 September 2016 at 12:40, Craig Ringer <craig@2ndquadrant.com> wrote: > Hi all > > Currently hot standby feedback sends GetOldestXmin()'s result to the > upstream as the required xmin. GetOldestXmin() returns a slot's > catalog_xmin if that's the lowest xmin on the system. Note that this patch changes the API to GetOldestXmin(), adding a new boolean to allow it to disregard the catalog_xmin of slots. Per Simon's feedback I'm going to split that out into a separate patch, so will post a follow-up split one soon as the series. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
On 5 September 2016 at 14:44, Craig Ringer <craig@2ndquadrant.com> wrote: > On 5 September 2016 at 12:40, Craig Ringer <craig@2ndquadrant.com> wrote: >> Hi all >> >> Currently hot standby feedback sends GetOldestXmin()'s result to the >> upstream as the required xmin. GetOldestXmin() returns a slot's >> catalog_xmin if that's the lowest xmin on the system. > > > Note that this patch changes the API to GetOldestXmin(), adding a new > boolean to allow it to disregard the catalog_xmin of slots. > > Per Simon's feedback I'm going to split that out into a separate > patch, so will post a follow-up split one soon as the series. Now formatted a series: 1. Send catalog_xmin in hot standby feedback protocol 2. Make walsender respect catalog_xmin in hot standby feedback messages 3. Allow GetOldestXmin(...) to optionally disregard the catalog_xmin 4. Send catalog_xmin separately in hot_standby_feedback messages Descriptions are in the patch headers. 1 adds the protocol field only. The value is at this point always sent as 0 by walreceiver and ignored by walsender. There's need to handle backward compatibility in the addition to the hot standby protocol message here as only the same major version Pg has any business sending us hot standby feedback. pg_receivexlog, pg_basebackup etc don't use hs feedback. Includes protocol docs change. 2 makes walsender now pay attention to the sent catalog_xmin. walreceiver doesn't set it yet and has no way to get it separately. 3 Provides a way to get the global xmin without considering the catalog_xmin so walreceiver can use it. 4 makes walsender use the modified GetOldestXmin() (3) needs additional attention: By ignoring slot catalog_xmin in the GetOldestXmin() call then separately calling ProcArrayGetReplicationSlotXmin() to get the catalog_xmin to we might produce a catalog xmin slightly later than what was in the procarray at the time we previously called GetOldestXmin() to examine backend/session state. ProcArrayLock is released so it can be advanced in-between the calls. That's harmless - it isn't necessary for the reported catalog_xmin to be exactly consistent with backend state. If it advances it's safe to report the new position since we know the confirmed positions are on-disk locally. The alternative here is to extend GetOldestXmin() to take an out-param to report the slot catalog xmin. That really just duplicates the functionality of ProcArrayGetReplicationSlotXmin but means we can grab it within a single ProcArray lock. Variants of GetOldestXmin and ProcArrayGetReplicationSlotXmin that take an already-locked flag would work too, but would hold the lock across parts of GetOldestXmin() that currently don't retain it. I could also convert the current boolean param ignoreVacuum into a flags argument instead of adding another boolean. No real preference from me. I cut out some comment changes to be submitted separately; otherwise this series is much the same as the original patch upthread. Also available at https://github.com/2ndQuadrant/postgres/tree/dev/feedback-catalog-xmin (and tagged dev/feedback-catalog-xmin). Branch subject to rebasing. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
Hi Craig, On 05/09/16 11:28, Craig Ringer wrote: > On 5 September 2016 at 14:44, Craig Ringer <craig@2ndquadrant.com> wrote: >> On 5 September 2016 at 12:40, Craig Ringer <craig@2ndquadrant.com> wrote: >>> Hi all >>> >>> Currently hot standby feedback sends GetOldestXmin()'s result to the >>> upstream as the required xmin. GetOldestXmin() returns a slot's >>> catalog_xmin if that's the lowest xmin on the system. >> >> >> Note that this patch changes the API to GetOldestXmin(), adding a new >> boolean to allow it to disregard the catalog_xmin of slots. >> >> Per Simon's feedback I'm going to split that out into a separate >> patch, so will post a follow-up split one soon as the series. > Here is my review of them. > Now formatted a series: > > 1. Send catalog_xmin in hot standby feedback protocol > + xmin_epoch = nextEpoch; > if (nextXid < xmin) > - nextEpoch--; > + xmin_epoch --; > + catalog_xmin_epoch = nextEpoch; > + if (nextXid < catalog_xmin) > + catalog_xmin_epoch --; Don't understand why you keep the nextEpoch here, it's not used anywhere that I can see, you could just as well use the xmin_epoch directly if that's how you want to name it. > + /* > + * A 10.0+ standby's walsender passes the lowest catalog xmin of any > + * replication slot up to the master. > + */ > + feedbackCatalogXmin = pq_getmsgint(&reply_message, 4); > + feedbackCatalogEpoch = pq_getmsgint(&reply_message, 4); > I'd be more interested to know why this is sent rather than it's sent since version 10+ in this comment. > 2. Make walsender respect catalog_xmin in hot standby feedback messages > + if (TransactionIdIsNormal(feedbackCatalogXmin) > + && TransactionIdPrecedes(feedbackCatalogXmin, feedbackXmin)) > + { > + MyPgXact->xmin = feedbackCatalogXmin; > + } > + else > + { > + MyPgXact->xmin = feedbackXmin; > + } This not how we usually use the {} brackets (there are some more instances of using them for one line of code in this particular commit). > 3. Allow GetOldestXmin(...) to optionally disregard the catalog_xmin > By ignoring slot catalog_xmin in the GetOldestXmin() call then > separately calling ProcArrayGetReplicationSlotXmin() to get the > catalog_xmin to we might produce a catalog xmin slightly later than > what was in the procarray at the time we previously called > GetOldestXmin() to examine backend/session state. ProcArrayLock is > released so it can be advanced in-between the calls. That's harmless - > it isn't necessary for the reported catalog_xmin to be exactly > consistent with backend state. If it advances it's safe to report the > new position since we know the confirmed positions are on-disk > locally. > > The alternative here is to extend GetOldestXmin() to take an out-param > to report the slot catalog xmin. That really just duplicates the > functionality of ProcArrayGetReplicationSlotXmin but means we can grab > it within a single ProcArray lock. Variants of GetOldestXmin and > ProcArrayGetReplicationSlotXmin that take an already-locked flag would > work too, but would hold the lock across parts of GetOldestXmin() that > currently don't retain it. I could also convert the current boolean > param ignoreVacuum into a flags argument instead of adding another > boolean. No real preference from me. > I would honestly prefer the change to GetOldestXmin to return the catalog_xmin. It seems both cleaner and does less locking. > 4. Send catalog_xmin separately in hot_standby_feedback messages > This looks okay (provided the change above). In general it's simpler patch than I expected which is good. But it would be good to have some tests. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On 25 October 2016 at 00:19, Petr Jelinek <petr@2ndquadrant.com> wrote: >> Now formatted a series: >> >> 1. Send catalog_xmin in hot standby feedback protocol > >> + xmin_epoch = nextEpoch; >> if (nextXid < xmin) >> - nextEpoch--; >> + xmin_epoch --; >> + catalog_xmin_epoch = nextEpoch; >> + if (nextXid < catalog_xmin) >> + catalog_xmin_epoch --; > > Don't understand why you keep the nextEpoch here, it's not used anywhere > that I can see, you could just as well use the xmin_epoch directly if > that's how you want to name it. Fixed, thanks. >> + /* >> + * A 10.0+ standby's walsender passes the lowest catalog xmin of any >> + * replication slot up to the master. >> + */ >> + feedbackCatalogXmin = pq_getmsgint(&reply_message, 4); >> + feedbackCatalogEpoch = pq_getmsgint(&reply_message, 4); >> > > I'd be more interested to know why this is sent rather than it's sent > since version 10+ in this comment. Removed. It's explained in a comment inside the if (hot_standby_feedback) block in walreceiver anyway. >> 2. Make walsender respect catalog_xmin in hot standby feedback messages > >> + if (TransactionIdIsNormal(feedbackCatalogXmin) >> + && TransactionIdPrecedes(feedbackCatalogXmin, feedbackXmin)) >> + { >> + MyPgXact->xmin = feedbackCatalogXmin; >> + } >> + else >> + { >> + MyPgXact->xmin = feedbackXmin; >> + } > > This not how we usually use the {} brackets (there are some more > instances of using them for one line of code in this particular commit). Whoops. Thanks. I find the Pg convention pretty ghastly when dealing with multi-line 'if' conditions followed by a single-line statement, but it's still the convention whether I like it or not. >> 3. Allow GetOldestXmin(...) to optionally disregard the catalog_xmin >> By ignoring slot catalog_xmin in the GetOldestXmin() call then >> separately calling ProcArrayGetReplicationSlotXmin() to get the >> catalog_xmin to we might produce a catalog xmin slightly later than >> what was in the procarray at the time we previously called >> GetOldestXmin() to examine backend/session state. ProcArrayLock is >> released so it can be advanced in-between the calls. That's harmless - >> it isn't necessary for the reported catalog_xmin to be exactly >> consistent with backend state. If it advances it's safe to report the >> new position since we know the confirmed positions are on-disk >> locally. >> >> The alternative here is to extend GetOldestXmin() to take an out-param >> to report the slot catalog xmin. That really just duplicates the >> functionality of ProcArrayGetReplicationSlotXmin but means we can grab >> it within a single ProcArray lock. Variants of GetOldestXmin and >> ProcArrayGetReplicationSlotXmin that take an already-locked flag would >> work too, but would hold the lock across parts of GetOldestXmin() that >> currently don't retain it. I could also convert the current boolean >> param ignoreVacuum into a flags argument instead of adding another >> boolean. No real preference from me. >> > > I would honestly prefer the change to GetOldestXmin to return the > catalog_xmin. It seems both cleaner and does less locking. Fair enough. Done. > In general it's simpler patch than I expected which is good. But it > would be good to have some tests. Agreed. OK, I've added basic tests for physical replication slots and hot_standby_feedback to t/001_stream_rep.pl since it make sense to test both along with stream_rep.pl's tests of cascading, etc and I don't think a separate test is needed. It's not actually practical to add tests for the catalog_xmin on standby functionality until the next patch in the series (pending) which enables logical decoding on standby. Currently you can't create a slot on a standby so you can't cause the standby to hold down catalog_xmin. But the tests show things work how they should within the range of currently exposed functionality. In the process I noticed how skeletal those tests still are. We have a great framework now (thanks Michael!) and I'd like to start filling it out with tests involving unclean shutdowns, promotions, etc. There's a lot still to write to get solid coverage. Tests aren't hard. Who's keen to write some? I'll happily help any volunteers out. New patch series attached. 0001 is the new tests. The guts is patches 2-5. I'm not sure whether 2, 3, 4 and 5 should be squashed for commit or not, but I left them separate for easier review. For complete functionality this series will want to be coupled with logical decoding timeline following and a pending patch to enable logical decoding on standby. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Attachment
- 0005-Send-catalog_xmin-separately-in-hot_standby_feedback.patch
- 0004-Allow-GetOldestXmin-.-to-optionally-disregard-the-ca.patch
- 0003-Make-walsender-respect-catalog_xmin-in-hot-standby-f.patch
- 0002-Send-catalog_xmin-in-hot-standby-feedback-protocol.patch
- 0001-Expand-streaming-replication-tests-to-cover-hot-stan.patch