Re: Minimal logical decoding on standbys - Mailing list pgsql-hackers
From | Drouvot, Bertrand |
---|---|
Subject | Re: Minimal logical decoding on standbys |
Date | |
Msg-id | 2feee395-959b-6d53-369a-c00f8f2fa6c5@amazon.com Whole thread Raw |
In response to | Re: [UNVERIFIED SENDER] Re: Minimal logical decoding on standbys ("Drouvot, Bertrand" <bdrouvot@amazon.com>) |
Responses |
Re: Minimal logical decoding on standbys
|
List | pgsql-hackers |
Hi Andres, On 6/22/21 12:38 PM, Drouvot, Bertrand wrote: > Hi Andres, > > On 6/14/21 7:41 AM, Drouvot, Bertrand wrote: >> Hi Andres, >> >> On 4/8/21 5:47 AM, Andres Freund wrote: >>> Hi, >>> >>> On 2021-04-07 13:32:18 -0700, Andres Freund wrote: >>>> While working on this I found a, somewhat substantial, issue: >>>> >>>> When the primary is idle, on the standby logical decoding via >>>> walsender >>>> will typically not process the records until further WAL writes >>>> come in >>>> from the primary, or until a 10s lapsed. >>>> >>>> The problem is that WalSndWaitForWal() waits for the *replay* LSN to >>>> increase, but gets woken up by walreceiver when new WAL has been >>>> flushed. Which means that typically walsenders will get woken up at >>>> the >>>> same time that the startup process will be - which means that by the >>>> time the logical walsender checks GetXLogReplayRecPtr() it's unlikely >>>> that the startup process already replayed the record and updated >>>> XLogCtl->lastReplayedEndRecPtr. >>>> >>>> I think fixing this would require too invasive changes at this >>>> point. I >>>> think we might be able to live with 10s delay issue for one >>>> release, but >>>> it sure is ugly :(. >>> This is indeed pretty painful. It's a lot more regularly occuring if >>> you >>> either have a slot disk, or you switch around the order of >>> WakeupRecovery() and WalSndWakeup() XLogWalRcvFlush(). >>> >>> - There's about which timeline to use. If you use pg_recvlogical and >>> you >>> restart the server, you'll see errors like: >>> >>> pg_recvlogical: error: unexpected termination of replication >>> stream: ERROR: requested WAL segment 000000000000000000000003 has >>> already been removed >>> >>> the real filename is 000000010000000000000003 - i.e. the timeline is >>> 0. >>> >>> This isn't too hard to fix, but definitely needs fixing. >> >> Thanks, nice catch! >> >> From what I have seen, we are not going through InitXLOGAccess() on a >> Standby and in some cases (like the one you mentioned) >> StartLogicalReplication() is called without IdentifySystem() being >> called previously: this lead to ThisTimeLineID still set to 0. >> >> I am proposing a fix in the attached v18 by adding a check in >> StartLogicalReplication() and ensuring that ThisTimeLineID is retrieved. >> >>> >>> - ResolveRecoveryConflictWithLogicalSlots() is racy - potentially >>> leading us to drop a slot that has been created since we signalled a >>> recovery conflict. See >>> https://www.postgresql.org/message-id/20210408020913.zzprrlvqyvlt5cyy%40alap3.anarazel.de >>> >>> for some very similar issues. >> >> I have rewritten this part by following the same logic as the one >> used in 96540f80f8 (the commit linked to the thread you mentioned). >> >>> >>> - Given the precedent of max_slot_wal_keep_size, I think it's wrong to >>> just drop the logical slots. Instead we should just mark them as >>> invalid, like InvalidateObsoleteReplicationSlots(). >> >> Makes fully sense and done that way in the attached patch. >> >> I am setting the slot's data.xmin and data.catalog_xmin as >> InvalidTransactionId to mark the slot(s) as invalid in case of conflict. >> >>> - There's no tests covering timeline switches, what happens if >>> there's a >>> promotion if logical decoding is currently ongoing. >> >> I'll now work on the tests. >> >>> >>> - The way ResolveRecoveryConflictWithLogicalSlots() builds the error >>> message is not good (and I've complained about it before...). >> >> I changed it and made it more simple. >> >> I also removed the details around mentioning xmin or catalog xmin (as >> I am not sure of the added value and they are currently also not >> mentioned during standby recovery snapshot conflict). >> >>> >>> Unfortunately I think the things I have found are too many for me to >>> address within the given time. I'll send a version with a somewhat >>> polished set of the changes I made in the next few days... >> >> Thanks for the review and feedback. >> >> Please find enclosed v18 with the changes I worked on. >> >> I still need to have a look on the tests. > > Please find enclosed v19 that also contains the changes related to > your TAP tests remarks, mainly: > > - get rid of 024 and add more tests in 026 (025 has been used in the > meantime) > > - test that logical decoding actually produces useful and correct results > > - test standby promotion and logical decoding behavior once done > > - useless "use" removal > > - check_confl_logicalslot() function removal > > - rewrote make_slot_active() to make use of poll_query_until() and > timeout > > - remove the useless eval() > > - remove the "Catalog xmins should advance after standby logical slot > fetches the changes" test > > One thing that's not clear to me is your remark "There's also no test > for a recovery conflict due to row removal": Don't you think that the > "vacuum full" conflict test is enough? if not, what kind of additional > tests would you like to see? > >> >> There is also the 10s delay to work on, do you already have an idea >> on how we should handle it? >> >> Thanks >> >> Bertrand >> > Thanks > > Bertrand > Please find enclosed v20 a needed rebase (nothing serious worth mentioning) of v19. FWIW, just to sum up that v19 (and so v20): - contained the changes (see details above) related to your TAP tests remarks - contained the changes (see details above) related to your code remarks There is still the 10s delay thing that need work: do you already have an idea on how we should handle it? And still one thing that's not clear to me is your remark "There's also no test for a recovery conflict due to row removal": Don't you think that the "vacuum full" conflict test is enough? if not, what kind of additional tests would you like to see? Thanks Bertrand
Attachment
pgsql-hackers by date: