Re: [HACKERS] Logical decoding on standby - Mailing list pgsql-hackers
From | Craig Ringer |
---|---|
Subject | Re: [HACKERS] Logical decoding on standby |
Date | |
Msg-id | CAMsr+YH797YL4OxZaU-KSQRDj158tE7t3=9+04Em7rDcL1Xbkg@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Logical decoding on standby (Andres Freund <andres@anarazel.de>) |
Responses |
Re: [HACKERS] Logical decoding on standby
Re: [HACKERS] Logical decoding on standby |
List | pgsql-hackers |
.On 20 March 2017 at 17:33, Andres Freund <andres@anarazel.de> wrote: > Hi, > > Have you checked how high the overhead of XLogReadDetermineTimeline is? > A non-local function call, especially into a different translation-unit > (no partial inlining), for every single page might end up being > noticeable. That's fine in the cases it actually adds functionality, > but for a master streaming out data, that's not actually adding > anything. I don't anticipate any significant effect given the large amount of indirection via decoding, reorder buffer, snapshot builder, output plugin, etc that we already do and how much memory allocation gets done ... but it's worth checking. I could always move the fast path into a macro or inline function if it does turn out to make a detectable difference. One of the things I want to get to is refactoring all the xlog page reading stuff into a single place, shared between walsender and normal backends, to get rid of this confusing mess we currently have. The only necessary difference is how we wait for new WAL, the rest should be as common as possible allowing for xlogreader's needs. I particularly want to get rid of the two identically named static XLogRead functions. But all that probably means making timeline.c FRONTEND friendly and it's way too intrusive to contemplate at this stage. > Did you check whether you changes to read_local_xlog_page could cause > issues with twophase.c? Because that now also uses it. Thanks, that's a helpful point. The commit in question is 978b2f65. I didn't notice that it introduced XLogReader use in twophase.c, though I should've realised given the discussion about fetching recent 2pc info from xlog. I don't see any potential for issues at first glance, but I'll go over it in more detail. The main concern is validity of ThisTimeLineID, but since it doesn't run in recovery I don't see much of a problem there. That also means it can afford to use the current timeline-oblivious read_local_xlog_page safely. TAP tests for 2pc were added by 3082098. I'll check to make sure they have appropriate coverage for this. > Did you check whether ThisTimeLineID is actually always valid in the > processes logical decoding could run in? IIRC it's not consistently > update during recovery in any process but the startup process. I share your concerns that it may not be well enough maintained. Thankyou for the reminder, that's been on my TODO and got lost when I had to task-hop to other priorities. I have some TAP tests to validate promotion that need finishing off. My main concern is around live promotions, both promotion of standby to master, and promotion of upstream master when streaming from a cascaded replica. [Will cover review of 0003 separately, next] > 0002 should be doable as a whole this release, I have severe doubts that > 0003 as a whole has a chance for 10 - the code is in quite a raw shape, > there's a significant number of open ends. I'd suggest breaking of bits > that are independently useful, and work on getting those committed. That would be my preference too. I do not actually feel strongly about the need for logical decoding on standby, and would in many ways prefer to defer it until we have two-way hot standby feedback and the ability to have the master confirm the actual catalog_xmin locked in to eliminate the current race and ugly workaround for it. I'd rather have solid timeline following in place now and bare-minimum failover capability. I'm confident that the ability for xlogreader to follow timeline switches will also be independently useful. The parts I think are important for Pg10 are: * Teach xlogreader to follow timeline switches * Ability to create logical slots on replicas * Ability to advance (via feedback or via SQL function) - no need to actually decode and call output plugins at all. * Ability to drop logical slots on replicas That would be enough to provide minimal standby promotion without hideous hacks. Unfortunately, slot creation on standby is probably the ugliest part of the patch. It can be considerably simplified by imposing the rule that the application must ensure catalog_xmin on the master is already held down (with a replication slot) before creating a slot on the standby, and it's the application's job to send feedback to the master before any standbys it's keeping slots on. If the app fails to do so, the slot on the downstream will become unusable and attempts to decode changes from it will fail with conflict with recovery. That'd get rid of a lot of the code including some of the ugliest bits, since we'd no longer make any special effort with catalog_xmin during slot creation. We're already pushing complexity onto apps for this, after concluding that the transparent failover slots approach wasn't the way forward, so I'm OK with that. Let the apps that want logical decoding to support physical replica promotion pay most of the cost. I'd then like to revisit full decoding on standby later, once we have 2-way hot standby feedback, where the upstream can reply with confirmation xmin is locked in, including cascading handling. Getting there would mostly involve trimming this patch down, which is nice. It would be necessary to add a SQL function and/or walsender command to send feedback on a slot we're not currently replaying changes from, but I see that as independently valuable and have wanted it for a number of things already. We'd still have to decode (so we found the right restart_lsn), but we'd suppress output plugin calls entirely. -- Craig Ringer http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
pgsql-hackers by date: