Thread: Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
From
Ashutosh Bapat
Date:
On Wed, Nov 6, 2024 at 8:54 PM Tomas Vondra <tomas@vondra.me> wrote: > > Hi, > > I've been investigating some issues reported by users, related to > logical replication unexpectedly breaking with messages like: > > LOG: invalidating slot "s" because its restart_lsn X/Y exceeds > max_slot_wal_keep_size > > which is pretty confusing, because the system has that GUC set to -1 (so > disabled, there should be no limit). Or a message like this: > > ERROR: requested WAL segment 000...0AA has already been removed > > which is a bit less confusing, but still puzzling because replication > slots are meant to prevent exactly this. > > I speculated there's some sort of rare race condition, in how we advance > the slot LSN values. I didn't know where to start, so I gave up on > trying to understand the whole complex code. Instead, I wrote a simple > stress test that > > 1) sets up a cluster (primary + 1+ logical replicas) > > 2) runs pgbench on the primary, checks replicas/slots > > 3) randomly restarts the nodes (both primary/replicas) with either fast > and immediate mode, with random short pauses > > 4) runs checkpoints in tight while loop > > This is based on the observations that in past reports the issues seem > to happen only with logical replication, shortly after reconnect (e.g. > after connection reset etc.). And the slots are invalidated / WAL > removed during a checkpoint, so frequent checkpoints should make it > easier to hit ... > > Attached is a couple scripts running this - it's not particularly pretty > and may need some tweak to make it work on your machine (run.sh is the > script to run). > > And unfortunately, this started to fail pretty quick. The short story is > that it's not difficult to get into a situation where restart_lsn for a > slot moves backwards, so something like this can happen: > > 1) slot restart_lsn moves forward to LSN A > > 2) checkpoint happens, updates min required LSN for slots, recycles > segments it considers unnecessary (up to LSN A) > > 3) slot restart_lsn moves backwards to LSN B (where B < A) > > 4) kaboom > > This would perfectly explain all the symptoms I mentioned earlier. The > max_slot_wal_keep_size reference is confusing, but AFAIK it's just based > on (reasonable) belief that LSN can't move backwards, and so the only > reason for restart_lsn being before min required LSN is that this GUC > kicked in. But the assumption does not hold :-( > > Now, why/how can this happen? > > I kept adding a elog(LOG) messages to all places updating LSNs for a > slot, and asserts to fail if data.restart_lsn moves backwards. See the > attached 0001 patch. An example for a failure (for the walsended backend > that triggered the assert) looks like this: > > 344.139 LOG: starting logical decoding for slot "s1" > > 344.139 DETAIL: Streaming transactions committing after 1/E97EAC30, > reading WAL from 1/E96FB4D0. > > 344.140 LOG: logical decoding found consistent point at 1/E96FB4D0 > > 344.140 DETAIL: Logical decoding will begin using saved snapshot. 
> > 344.140 LOG: LogicalConfirmReceivedLocation 1/E9865398 > > 344.140 LOG: LogicalConfirmReceivedLocation updating > data.restart_lsn to 1/E979D4C8 (from 1/E96FB4D0) > candidate_restart_valid 0/0 (from 1/E9865398) > candidate_restart_lsn 0/0 (from 1/E979D4C8) > > 344.145 LOG: LogicalIncreaseRestartDecodingForSlot > restart_decoding_lsn 1/E96FB4D0 > > 344.145 LOG: LogicalIncreaseRestartDecodingForSlot s1 > candidate_restart_valid_lsn 1/E979D4C8 (0/0) > candidate_restart_lsn 1/E96FB4D0 (0/0) > > 344.147 LOG: LogicalIncreaseRestartDecodingForSlot > restart_decoding_lsn 1/E979D4C8 > > 344.156 LOG: LogicalIncreaseXminForSlot > candidate_catalog_xmin 30699 > candidate_xmin_lsn 1/E993AD68 (0/0) > ... > 344.235 LOG: LogicalIncreaseRestartDecodingForSlot > restart_decoding_lsn 1/E9F33AF8 > > 344.240 LOG: LogicalConfirmReceivedLocation 1/E9DCCD78 > > 344.240 LOG: LogicalConfirmReceivedLocation updating > data.restart_lsn to 1/E96FB4D0 (from 1/E979D4C8) > candidate_restart_valid 0/0 (from 1/E979D4C8) > candidate_restart_lsn 0/0 (from 1/E96FB4D0) > > 345.536 LOG: server process (PID 2518127) was terminated by > signal 6: Aborted > > We start decoding at 1/E96FB4D0, and right after startup we get a > confirmation, and LogicalConfirmReceivedLocation updates restart_lsn to > 1/E979D4C8. > > But then LogicalIncreaseRestartDecodingForSlot comes along, and stores > the restart_decoding_lsn 1/E96FB4D0 (which is the initial restart_lsn) > into candidate_restart_lsn. > > And then a little bit later we get another confirmation, we call > LogicalConfirmReceivedLocation which propagates candidate_restart_lsn > into data.restart_lsn. > > This is how restart_lsn moves backwards, causing issues. I've reproduced > this on PG14 and current master, but I believe the issue exists since > the introduction of logical replication in 9.4 :-( > > I'm not claiming this is the only way this can happen, but all the > cases I've seen in my stress testing look like this. Moreover, I'm not > claiming this is the only LSN field that can move backwards like this. > It seems to me various other candidate_ fields have the same issue, but > may have consequences other than discarding "unnecessary" WAL. > > I've been reminded of this [1] thread from 2022. I'm 99% sure it's the > same issue - it happened shortly after a reconnect, etc. And it seems to > me Masahiko-san wrote about the causes in [2]. I don't think the fix > suggested in that message (changing the branch to "else if") can work, > though. At least it did not really help in my testing. > > And I'm not sure it'd fix all the issues - it only affects restart_lsn, > but AFAICS the same issue (LSNs moving backwards) can happen for the > other LSN slot field (candidate_xmin_lsn). After examining the code before reading [2], I came to the same conclusion as Masahiko-san in [2]. We install candidate_restart_lsn based on the running transaction record whose LSN is between restart_lsn and confirmed_flush_lsn, since candidate_restart_valid of such candidates would be less than any confirmed_flush_lsn received from downstream. I am surprised that the fix suggested by Masahiko-san didn't work, though. The fix also fixes the asymmetry between LogicalIncreaseXminForSlot and LogicalIncreaseRestartDecodingForSlot that you have pointed out in your next email. What behaviour do you see with that fix applied?
[1] https://www.postgresql.org/message-id/Yz2hivgyjS1RfMKs%40depesz.com [2] https://www.postgresql.org/message-id/CAD21AoBVhYnGBuW_o%3DwEGgTp01qiHNAx1a14b1X9kFXmuBe%3Dsg%40mail.gmail.com -- Best Wishes, Ashutosh Bapat
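For readers following the trace above: the backward jump happens inside the confirmation path. Once the confirmed LSN passes candidate_restart_valid, whatever candidate_restart_lsn is currently stored in the slot is copied into data.restart_lsn, with no check that it is newer than the current value. A simplified sketch of that logic, paraphrased for illustration only (the real LogicalConfirmReceivedLocation() also handles the catalog-xmin candidates and marks/persists the slot):

/*
 * Simplified sketch of how a downstream confirmation is applied to the
 * slot.  Paraphrased for illustration; not a verbatim copy of the source.
 */
static void
confirm_received_location_sketch(XLogRecPtr lsn)
{
    ReplicationSlot *slot = MyReplicationSlot;

    SpinLockAcquire(&slot->mutex);

    slot->data.confirmed_flush = lsn;

    /*
     * If the downstream has confirmed past candidate_restart_valid, promote
     * the stored candidate_restart_lsn to data.restart_lsn.  Nothing here
     * verifies that the candidate is newer than the current restart_lsn,
     * which is how a stale candidate (installed from an old running-xacts
     * record, as in the trace above) can move restart_lsn backwards.
     */
    if (slot->candidate_restart_valid != InvalidXLogRecPtr &&
        slot->candidate_restart_valid <= lsn)
    {
        slot->data.restart_lsn = slot->candidate_restart_lsn;
        slot->candidate_restart_lsn = InvalidXLogRecPtr;
        slot->candidate_restart_valid = InvalidXLogRecPtr;
    }

    SpinLockRelease(&slot->mutex);
}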
Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
From
Ashutosh Bapat
Date:
On Sat, Nov 9, 2024 at 5:05 PM Tomas Vondra <tomas@vondra.me> wrote: > > On 11/8/24 15:57, Ashutosh Bapat wrote: > > ... > > > > After examining the code before reading [2], I came to the same > > conclusion as Masahiko-san in [2]. We install candidate_restart_lsn > > based on the running transaction record whose LSN is between > > restart_lsn and confirmed_flush_lsn. Since candidate_restart_valid of > > such candidates would be lesser than any confirmed_flush_lsn received > > from downstream. I am surprised that the fix suggested by Masahiko-san > > didn't work though. The fix also fix the asymmetry, between > > LogicalIncreaseXminForSlot and LogicalIncreaseRestartDecodingForSlot, > > that you have pointed out in your next email. What behaviour do you > > see with that fix applied? > > > > > > [1] https://www.postgresql.org/message-id/Yz2hivgyjS1RfMKs%40depesz.com > > [2] https://www.postgresql.org/message-id/CAD21AoBVhYnGBuW_o%3DwEGgTp01qiHNAx1a14b1X9kFXmuBe%3Dsg%40mail.gmail.com > > > > > > I read that message (and the earlier discussion multiple times) while > investigating the issue, and TBH it's not very clear to me what the > conclusion is :-( > > There's some discussion about whether the candidate fields should be > reset on release or not. There are even claims that it might be > legitimate to not reset the fields and update the restart_lsn. Using > such "stale" LSN values seems rather suspicious to me, but I don't have > a proof that it's incorrect. FWIW this unclarity is what I mentioned the > policy/contract for candidate fields is not explained anywhere. > > That being said, I gave that fix a try - see the attached 0001 patch. It > tweaks LogicalIncreaseRestartDecodingForSlot (it needs a bit more care > because of the spinlock), and it adds a couple asserts to make sure the > data.restart_lsn never moves back. > > And indeed, with this my stress test script does not crash anymore. :) I think the problem is about processing older running transactions record and setting data.restart_lsn based on the candidates those records produce. But what is not clear to me is how come a newer candidate_restart_lsn is available immediately upon WAL sender restart. I.e. in the sequence of events you mentioned in your first email 1. 344.139 LOG: starting logical decoding for slot "s1" 2. 344.139 DETAIL: Streaming transactions committing after 1/E97EAC30, reading WAL from 1/E96FB4D0. 3. 344.140 LOG: logical decoding found consistent point at 1/E96FB4D0 4. 344.140 DETAIL: Logical decoding will begin using saved snapshot. 5. 344.140 LOG: LogicalConfirmReceivedLocation 1/E9865398 6. 344.140 LOG: LogicalConfirmReceivedLocation updating data.restart_lsn to 1/E979D4C8 (from 1/E96FB4D0) candidate_restart_valid 0/0 (from 1/E9865398) candidate_restart_lsn 0/0 (from 1/E979D4C8) how did candidate_restart_lsn = 1/E979D4C8 and candidate_restart_valid = 1/E9865398 were set in ReplicationSlot after WAL sender? It means it must have read and processed running transaction record at 1/E9865398. If that's true, how come it went back to a running transactions WAL record at 1/E979D4C8? It should be reading WAL records sequentially, hence read 1/E979D4C8 first then 1/E9865398. 344.145 LOG: LogicalIncreaseRestartDecodingForSlot s1 candidate_restart_valid_lsn 1/E979D4C8 (0/0) candidate_restart_lsn 1/E96FB4D0 (0/0) -- Best Wishes, Ashutosh Bapat
Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
From
Tomas Vondra
Date:
On 11/11/24 14:51, Ashutosh Bapat wrote: > ... > > I think the problem is about processing older running transactions > record and setting data.restart_lsn based on the candidates those > records produce. But what is not clear to me is how come a newer > candidate_restart_lsn is available immediately upon WAL sender > restart. I.e. in the sequence of events you mentioned in your first > email > 1. 344.139 LOG: starting logical decoding for slot "s1" > > 2. 344.139 DETAIL: Streaming transactions committing after 1/E97EAC30, > reading WAL from 1/E96FB4D0. > > 3. 344.140 LOG: logical decoding found consistent point at 1/E96FB4D0 > > 4. 344.140 DETAIL: Logical decoding will begin using saved snapshot. > > 5. 344.140 LOG: LogicalConfirmReceivedLocation 1/E9865398 > > 6. 344.140 LOG: LogicalConfirmReceivedLocation updating > data.restart_lsn to 1/E979D4C8 (from 1/E96FB4D0) > candidate_restart_valid 0/0 (from 1/E9865398) > candidate_restart_lsn 0/0 (from 1/E979D4C8) > > how did candidate_restart_lsn = 1/E979D4C8 and candidate_restart_valid > = 1/E9865398 were set in ReplicationSlot after WAL sender? It means it > must have read and processed running transaction record at 1/E9865398. > If that's true, how come it went back to a running transactions WAL > record at 1/E979D4C8? It should be reading WAL records sequentially, > hence read 1/E979D4C8 first then 1/E9865398. > > 344.145 LOG: LogicalIncreaseRestartDecodingForSlot s1 > candidate_restart_valid_lsn 1/E979D4C8 (0/0) > candidate_restart_lsn 1/E96FB4D0 (0/0) > Those are good questions, but IIUC that's explained by this comment from Masahiko-san's analysis [1]: Thinking about the root cause more, it seems to me that the root cause is not the fact that candidate_xxx values are not cleared when being released. In the scenario I reproduced, after restarting the logical decoding, the walsender sets the restart_lsn to a candidate_restart_lsn left in the slot upon receiving the ack from the subscriber. ... If this is correct, then what happens is: 1) replication is running, at some point we set candidate LSN to B 2) something breaks, causing reconnect with restart LSN A (< B) 3) we still have the candidate LSN B in memory, and after receiving some confirmation we set it as restart_lsn 4) we get to decode the RUNNING_XACTS, which moves restart_lsn back If this analysis is correct, I think it's rather suspicious we don't reset the candidate fields on restart. Can those "old" values ever be valid? But I haven't tried resetting them. Also, this is why I'm not entirely sure just tweaking the conditions in LogicalIncreaseRestartDecodingForSlot is quite correct. Maybe it fixes this particular issue, but maybe the right fix would be to reset the candidate fields on reconnect? And this change would be just hiding the actual problem. I haven't tried this. [1] https://www.postgresql.org/message-id/CAD21AoBVhYnGBuW_o%3DwEGgTp01qiHNAx1a14b1X9kFXmuBe%3Dsg%40mail.gmail.com -- Tomas Vondra
Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
From
Masahiko Sawada
Date:
On Mon, Nov 11, 2024 at 6:17 AM Tomas Vondra <tomas@vondra.me> wrote: > > If this analysis is correct, I think it's rather suspicious we don't > reset the candidate fields on restart. Can those "old" values ever be > valid? But I haven't tried resetting them. I had the same question. IIRC resetting them also fixes the problem[1]. However, I got a comment from Alvaro[2]: Hmm, interesting -- I was studying some other bug recently involving the xmin of a slot that had been invalidated and I remember wondering if these "candidate" fields were being properly ignored when the slot is marked not in use; but I didn't check. Are you sure that resetting them when the slot is released is the appropriate thing to do? I mean, shouldn't they be kept set while the slot is in use, and only reset if we destroy it? Which made me re-investigate the issue and thought that it doesn't necessarily need to clear these candidate values in memory on releasing a slot as long as we're carefully updating restart_lsn. Which seems a bit efficient for example when restarting from a very old point. Of course, even if we reset them on releasing a slot, it would perfectly work since it's the same as restarting logical decoding with a server restart. I think LogicalIncreaseRestartDecodingForSlot() should be fixed as it seems not to be working expectedly, but I could not have proof that we should either keep or reset them on releasing a slot. Regards, [1] https://www.postgresql.org/message-id/CAD21AoBG2OSDOFTtpPtQ7fx5Vt8p3dS5hPAv28CBSC6z2kHx-g%40mail.gmail.com [2] https://www.postgresql.org/message-id/20230206152209.yglmntznhcmaueyn%40alvherre.pgsql -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
From
Tomas Vondra
Date:
On 11/11/24 23:41, Masahiko Sawada wrote: > On Mon, Nov 11, 2024 at 6:17 AM Tomas Vondra <tomas@vondra.me> wrote: >> >> If this analysis is correct, I think it's rather suspicious we don't >> reset the candidate fields on restart. Can those "old" values ever be >> valid? But I haven't tried resetting them. > > I had the same question. IIRC resetting them also fixes the > problem[1]. However, I got a comment from Alvaro[2]: > > Hmm, interesting -- I was studying some other bug recently involving the > xmin of a slot that had been invalidated and I remember wondering if > these "candidate" fields were being properly ignored when the slot is > marked not in use; but I didn't check. Are you sure that resetting them > when the slot is released is the appropriate thing to do? I mean, > shouldn't they be kept set while the slot is in use, and only reset if > we destroy it? > > Which made me re-investigate the issue and thought that it doesn't > necessarily need to clear these candidate values in memory on > releasing a slot as long as we're carefully updating restart_lsn. Not sure, but maybe it'd be useful to ask the opposite question. Why shouldn't it be correct to reset the fields, which essentially puts the slot into the same state as if it was just read from disk? That also discards all these values, and we can't rely on accidentally keeping something important info in memory (because if the instance restarts we'd lose that). But this reminds me that the patch I shared earlier today resets the slot in the ReplicationSlotAcquire() function, but I guess that's not quite correct. It probably should be in the "release" path. > Which seems a bit efficient for example when restarting from a very > old point. Of course, even if we reset them on releasing a slot, it > would perfectly work since it's the same as restarting logical > decoding with a server restart. I find the "efficiency" argument a bit odd. It'd be fine if we had a correct behavior to start with, but we don't have that ... Also, I'm not quite sure why exactly would it be more efficient? And how likely is this in practice? It seems to me that performance-sensitive cases can't do reconnects very often anyway, that's inherently inefficient. No? > I think > LogicalIncreaseRestartDecodingForSlot() should be fixed as it seems > not to be working expectedly, but I could not have proof that we > should either keep or reset them on releasing a slot. > Not sure. Chances are we need both fixes, if only to make LogicalIncreaseRestartDecodingForSlot more like the other function. regards -- Tomas Vondra
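To make the "reset on release" idea concrete, a minimal sketch of what it could look like (hypothetical: the helper name is made up, and whether it belongs in the release path, the acquire path, or both is exactly the open question here):

/*
 * Hypothetical sketch: clear the in-memory candidate fields so that the
 * next walsender acquiring the slot starts from the same state as if the
 * slot had just been read back from disk.
 */
static void
ReplicationSlotResetCandidates(ReplicationSlot *slot)
{
    SpinLockAcquire(&slot->mutex);
    slot->candidate_catalog_xmin = InvalidTransactionId;
    slot->candidate_xmin_lsn = InvalidXLogRecPtr;
    slot->candidate_restart_valid = InvalidXLogRecPtr;
    slot->candidate_restart_lsn = InvalidXLogRecPtr;
    SpinLockRelease(&slot->mutex);
}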
Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
From
Masahiko Sawada
Date:
On Mon, Nov 11, 2024 at 2:08 PM Tomas Vondra <tomas@vondra.me> wrote: > > > But neither of those fixes prevents backwards move for confirmed_flush > LSN, as enforced by asserts in the 0005 patch. I don't know if this > assert is incorrect or not. It seems natural that once we get a > confirmation for some LSN, we can't move before that position, but I'm > not sure about that. Maybe it's too strict. Hmm, I'm concerned that it might be another problem. I think there are some cases where a subscriber sends a flush position older than the slot's confirmed_flush as a feedback message. But it seems to be dangerous if we always accept it as a new confirmed_flush value. It could happen that confirmed_flush gets set to an LSN older than restart_lsn. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
From
Ashutosh Bapat
Date:
On Tue, Nov 12, 2024 at 12:02 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > On Mon, Nov 11, 2024 at 2:08 PM Tomas Vondra <tomas@vondra.me> wrote: > > > > > > But neither of those fixes prevents backwards move for confirmed_flush > > LSN, as enforced by asserts in the 0005 patch. I don't know if this > > assert is incorrect or not. It seems natural that once we get a > > confirmation for some LSN, we can't move before that position, but I'm > > not sure about that. Maybe it's too strict. > > Hmm, I'm concerned that it might be another problem. I think there are > some cases where a subscriber sends a flush position older than the slot's > confirmed_flush as a feedback message. But it seems to be dangerous if > we always accept it as a new confirmed_flush value. It could happen > that confirmed_flush gets set to an LSN older than restart_lsn. > If confirmed_flush LSN moves backwards, it means the transactions which were thought to be replicated earlier are no longer considered to be replicated. This means that the restart_lsn of the slot needs to be at least as far back as the oldest of the starting points of those transactions. Thus the restart_lsn of the slot has to be pushed further back. That WAL may not be available anymore. There is a similar issue with catalog_xmin: the older catalog rows may have been removed. Another problem is that we may send some transactions twice, which might cause trouble downstream. So I agree that confirmed_flush LSN should not move backwards. IIRC, if the downstream sends an older confirmed_flush in the START_REPLICATION message, the WAL sender does not consider it and instead uses the one in the replication slot. -- Best Wishes, Ashutosh Bapat
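The behaviour Ashutosh recalls is the forwarding done when the decoding context is created: if the client asks to start streaming below the slot's confirmed_flush, the walsender starts from confirmed_flush instead. Roughly (paraphrased sketch of that check, with logging omitted; not the exact source):

/*
 * Paraphrased sketch of the START_REPLICATION handling described above:
 * an older start position from the client is forwarded to the slot's
 * confirmed_flush instead of being taken at face value.
 */
if (start_lsn != InvalidXLogRecPtr &&
    start_lsn < slot->data.confirmed_flush)
{
    /* The real code also logs that the start position is being forwarded. */
    start_lsn = slot->data.confirmed_flush;
}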
Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
From
Ashutosh Bapat
Date:
On Tue, Nov 12, 2024 at 4:54 AM Tomas Vondra <tomas@vondra.me> wrote: > > > > On 11/11/24 23:41, Masahiko Sawada wrote: > > On Mon, Nov 11, 2024 at 6:17 AM Tomas Vondra <tomas@vondra.me> wrote: > > > > Which made me re-investigate the issue and thought that it doesn't > > necessarily need to clear these candidate values in memory on > > releasing a slot as long as we're carefully updating restart_lsn. > > Not sure, but maybe it'd be useful to ask the opposite question. Why > shouldn't it be correct to reset the fields, which essentially puts the > slot into the same state as if it was just read from disk? That also > discards all these values, and we can't rely on accidentally keeping > something important info in memory (because if the instance restarts > we'd lose that). > > But this reminds me that the patch I shared earlier today resets the > slot in the ReplicationSlotAcquire() function, but I guess that's not > quite correct. It probably should be in the "release" path. > > > Which seems a bit efficient for example when restarting from a very > > old point. Of course, even if we reset them on releasing a slot, it > > would perfectly work since it's the same as restarting logical > > decoding with a server restart. > > I find the "efficiency" argument a bit odd. It'd be fine if we had a > correct behavior to start with, but we don't have that ... Also, I'm not > quite sure why exactly would it be more efficient? > > And how likely is this in practice? It seems to me that > performance-sensitive cases can't do reconnects very often anyway, > that's inherently inefficient. No? > > > I think > > LogicalIncreaseRestartDecodingForSlot() should be fixed as it seems > > not to be working expectedly, but I could not have proof that we > > should either keep or reset them on releasing a slot. > > > > Not sure. Chances are we need both fixes, if only to make > LogicalIncreaseRestartDecodingForSlot more like the other function. > Thanks a lot for pointing to Masahiko's analysis. I missed that part when I read that thread. Sorry. A candidate_restart_lsn and candidate_restart_valid pair just tells that we may set slot's data.restart_lsn to candidate_restart_lsn when the downstream confirms an LSN >= candidate_restart_valid. That pair can never get inaccurate. It may get stale but never inaccurate. So wiping those fields from ReplicationSlot is unnecessary. What should ideally happen is we should ignore candidates produced by older running transactions WAL records after WAL sender restart. This is inline with what logical replication does with transactions committed before slot's confirmed_flush_lsn - those are ignored. But the criteria for ignoring running transactions records is slightly different from that for transactions. If we ignore candidate_restart_lsn which has candidate_restart_valid <= confirmed_flush_lsn, we might lose some opportunity to advance data.restart_lsn. Instead we should ignore any candidate_restart_lsn <= data.restart_lsn especially before WAL sender finds first change to send downstream. We can do that in SnapBuildProcessRunningXacts() by accessing MyReplicationSlot, taking lock on it and then comparing data.restart_lsn with txn->restart_decoding_lsn before calling LogicalIncreaseRestartDecodingForSlot(). But then LogicalIncreaseRestartDecodingForSlot() would be doing the same anyway after applying your patch 0004. 
The only downside of 0004 is that the logic to ignore candidates produced by a running transactions record is not clearly visible in SnapBuildProcessRunningXacts(). For a transaction which is ignored the logic to ignore the transaction is visible in DecodeCommit() or DecodeAbort() - where people are likely to look for that logic. We may add a comment to that effect in SnapBuildProcessRunningXacts(). -- Best Wishes, Ashutosh Bapat
Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
From
Tomas Vondra
Date:
On 11/12/24 10:37, Ashutosh Bapat wrote: > On Tue, Nov 12, 2024 at 4:54 AM Tomas Vondra <tomas@vondra.me> wrote: >> >> >> >> On 11/11/24 23:41, Masahiko Sawada wrote: >>> On Mon, Nov 11, 2024 at 6:17 AM Tomas Vondra <tomas@vondra.me> wrote: >>> >>> Which made me re-investigate the issue and thought that it doesn't >>> necessarily need to clear these candidate values in memory on >>> releasing a slot as long as we're carefully updating restart_lsn. >> >> Not sure, but maybe it'd be useful to ask the opposite question. Why >> shouldn't it be correct to reset the fields, which essentially puts the >> slot into the same state as if it was just read from disk? That also >> discards all these values, and we can't rely on accidentally keeping >> something important info in memory (because if the instance restarts >> we'd lose that). >> >> But this reminds me that the patch I shared earlier today resets the >> slot in the ReplicationSlotAcquire() function, but I guess that's not >> quite correct. It probably should be in the "release" path. >> >>> Which seems a bit efficient for example when restarting from a very >>> old point. Of course, even if we reset them on releasing a slot, it >>> would perfectly work since it's the same as restarting logical >>> decoding with a server restart. >> >> I find the "efficiency" argument a bit odd. It'd be fine if we had a >> correct behavior to start with, but we don't have that ... Also, I'm not >> quite sure why exactly would it be more efficient? >> >> And how likely is this in practice? It seems to me that >> performance-sensitive cases can't do reconnects very often anyway, >> that's inherently inefficient. No? >> >>> I think >>> LogicalIncreaseRestartDecodingForSlot() should be fixed as it seems >>> not to be working expectedly, but I could not have proof that we >>> should either keep or reset them on releasing a slot. >>> >> >> Not sure. Chances are we need both fixes, if only to make >> LogicalIncreaseRestartDecodingForSlot more like the other function. >> > > Thanks a lot for pointing to Masahiko's analysis. I missed that part > when I read that thread. Sorry. > No need to apologize. The discussion is complex and spread over a rather long time period. > A candidate_restart_lsn and candidate_restart_valid pair just tells > that we may set slot's data.restart_lsn to candidate_restart_lsn when > the downstream confirms an LSN >= candidate_restart_valid. That pair > can never get inaccurate. It may get stale but never inaccurate. So > wiping those fields from ReplicationSlot is unnecessary. > Isn't this issue a proof that those fields *can* get inaccurate? Or what do you mean by "stale but not inaccurate"? > What should ideally happen is we should ignore candidates produced by > older running transactions WAL records after WAL sender restart. This > is inline with what logical replication does with transactions > committed before slot's confirmed_flush_lsn - those are ignored. But > the criteria for ignoring running transactions records is slightly > different from that for transactions. If we ignore > candidate_restart_lsn which has candidate_restart_valid <= > confirmed_flush_lsn, we might lose some opportunity to advance > data.restart_lsn. Instead we should ignore any candidate_restart_lsn > <= data.restart_lsn especially before WAL sender finds first change to > send downstream. 
We can do that in SnapBuildProcessRunningXacts() by > accessing MyReplicationSlot, taking lock on it and then comparing > data.restart_lsn with txn->restart_decoding_lsn before calling > LogicalIncreaseRestartDecodingForSlot(). But then > LogicalIncreaseRestartDecodingForSlot() would be doing the same anyway > after applying your patch 0004. The only downside of 0004 is that the > logic to ignore candidates produced by a running transactions record > is not clearly visible in SnapBuildProcessRunningXacts(). For a > transaction which is ignored the logic to ignore the transaction is > visible in DecodeCommit() or DecodeAbort() - where people are likely > to look for that logic. We may add a comment to that effect in > SnapBuildProcessRunningXacts(). > I have thought about just doing something like: slot->data.restart_lsn = Max(slot->data.restart_lsn, new_lsn); and similar for the other LSN fields. And it did resolve the issue at hand, of course. But it seems sloppy, and I'm worried it might easily mask actual issues in other cases. I'm still of the opinion that (with the exception of a reconnect), these places should not need to deal with values that go backwards. It should work just fine without the Max(), and we should add Asserts() to check that it's always a higher LSN. For the reconnect, I think it's a bit as if the primary restarted right before the reconnect. That could happen anyway, and we need to handle that correctly - if not, we have yet another issue, IMHO. And with the restart it's the same as writing the slot to disk and reading it back, which also doesn't retain most of the fields. So it seems cleaner to do the same thing and just reset the various fields. I haven't thought about making SnapBuildProcessRunningXacts() more complex to consider this stuff. But I doubt we'd like to be accessing slots from that code - it has nothing to do with slots. If anything, tweaking LogicalIncreaseRestartDecodingForSlot() seems more appropriate, but I'm still wondering if the current coding was intentional and we're just missing why it was written like this. There's also the question of backpatching - the simpler the better, and this I think just resetting the fields wins in this regard. The main question is whether it's correct - I think it is. I'm not too worried about efficiency very much, on the grounds that this should not matter very often (only after unexpected restart). Correctness > efficiency. regards -- Tomas Vondra
Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
From
Ashutosh Bapat
Date:
On Tue, Nov 12, 2024 at 4:55 PM Tomas Vondra <tomas@vondra.me> wrote: > > > A candidate_restart_lsn and candidate_restart_valid pair just tells > > that we may set slot's data.restart_lsn to candidate_restart_lsn when > > the downstream confirms an LSN >= candidate_restart_valid. That pair > > can never get inaccurate. It may get stale but never inaccurate. So > > wiping those fields from ReplicationSlot is unnecessary. > > > > Isn't this issue a proof that those fields *can* get inaccurate? Or what > do you mean by "stale but not inaccurate"? While processing a running transaction log record at RT1 the WAL sender gathers that slot's data.restart_lsn can be set to candidate_restart_lsn1 when confirmed_flush_lsn is past candidate_valid_restart1. In that sense it's accurate irrespective of when those candidates are generated. But when the next running transactions record RT2 is processed a new pair of candidate_restart_lsn2 and candidate_restart_valid2 is produced (candidate_restart_valid2 > candidate_restart_valid1). Thus if confirmed_flush_lsn >= candidate_restart_valid2, it's better to set data.restart_lsn = candidate_restart_lsn2 instead of candidate_restart_lsn1. In that sense the first pair becomes stale. If we could maintain all such pairs in a sorted fashion, we would be able to set data.restart_lsn to the latest candiate_restart_lsn for a received confirmed_flush_lsn and remove all the pairs including the latest one after setting data.restart_lsn. But we don't maintain such a list and hence the business of resetting candiate_restart_lsn and candidate_restart_valid to indicate that we have consumed the previous pair and are ready for a new one. We don't keep updating candidate_restart_lsn and candidate_restart_valid to avoid chasing a moving target and never updating data.restart_lsn. > > > What should ideally happen is we should ignore candidates produced by > > older running transactions WAL records after WAL sender restart. This > > is inline with what logical replication does with transactions > > committed before slot's confirmed_flush_lsn - those are ignored. But > > the criteria for ignoring running transactions records is slightly > > different from that for transactions. If we ignore > > candidate_restart_lsn which has candidate_restart_valid <= > > confirmed_flush_lsn, we might lose some opportunity to advance > > data.restart_lsn. Instead we should ignore any candidate_restart_lsn > > <= data.restart_lsn especially before WAL sender finds first change to > > send downstream. We can do that in SnapBuildProcessRunningXacts() by > > accessing MyReplicationSlot, taking lock on it and then comparing > > data.restart_lsn with txn->restart_decoding_lsn before calling > > LogicalIncreaseRestartDecodingForSlot(). But then > > LogicalIncreaseRestartDecodingForSlot() would be doing the same anyway > > after applying your patch 0004. The only downside of 0004 is that the > > logic to ignore candidates produced by a running transactions record > > is not clearly visible in SnapBuildProcessRunningXacts(). For a > > transaction which is ignored the logic to ignore the transaction is > > visible in DecodeCommit() or DecodeAbort() - where people are likely > > to look for that logic. We may add a comment to that effect in > > SnapBuildProcessRunningXacts(). > > > > I have thought about just doing something like: > > slot->data.restart_lsn = Max(slot->data.restart_lsn, new_lsn); > > and similar for the other LSN fields. And it did resolve the issue at > hand, of course. 
But it seems sloppy, and I'm worried it might easily > mask actual issues in other cases. Agreed. > > I'm still of the opinion that (with the exception of a reconnect), these > places should not need to deal with values that go backwards. It should > work just fine without the Max(), and we should add Asserts() to check > that it's always a higher LSN. Agreed. > > I haven't thought about making SnapBuildProcessRunningXacts() more > complex to consider this stuff. But I doubt we'd like to be accessing > slots from that code - it has nothing to do with slots. If anything, > tweaking LogicalIncreaseRestartDecodingForSlot() seems more appropriate, Agree. > but I'm still wondering if the current coding was intentional and we're > just missing why it was written like this. Interestingly, the asymmetry between the functions was added in the same commit b89e151054a05f0f6d356ca52e3b725dd0505e53. I doubt it was intentional; the comments at both places say the same thing. This problem is probably as old as that commit. > > For the reconnect, I think it's a bit as if the primary restarted right > before the reconnect. That could happen anyway, and we need to handle > that correctly - if not, we have yet another issue, IMHO. And with the > restart it's the same as writing the slot to disk and reading it back, > which also doesn't retain most of the fields. So it seems cleaner to do > the same thing and just reset the various fields. > > > There's also the question of backpatching - the simpler the better, and > this I think just resetting the fields wins in this regard. The main > question is whether it's correct - I think it is. I'm not too worried > about efficiency very much, on the grounds that this should not matter > very often (only after unexpected restart). Correctness > efficiency. If the slot's restart_lsn is very old before the disconnect, we will lose an opportunity to update the restart_lsn and thus release some resources earlier. However, that opportunity is only for a short duration. On a fast enough machine, data.restart_lsn will be updated anyway after processing all the running transactions WAL records. So I am also of the opinion that the efficiency argument doesn't stand here. But I doubt that "not resetting" is wrong. It looks more intentional to me than the asymmetry between LogicalIncreaseRestartDecodingForSlot and LogicalIncreaseXminForSlot. This is our chance to settle that asymmetry forever :D. I don't have a strong argument against resetting candidates in slots at SlotRelease. However, resetting effective_xmin and effective_catalog_xmin may not be a good idea: the old values may have been used to advance the xmin horizon. I have not read the code closely enough to know whether the on-disk xmin and effective_xmin are always in sync when computing the xmin horizon. -- Best Wishes, Ashutosh Bapat
Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
From
Amit Kapila
Date:
On Tue, Nov 12, 2024 at 4:55 PM Tomas Vondra <tomas@vondra.me> wrote: > > There's also the question of backpatching - the simpler the better, and > this I think just resetting the fields wins in this regard. The main > question is whether it's correct - I think it is. I'm not too worried > about efficiency very much, on the grounds that this should not matter > very often (only after unexpected restart). Correctness > efficiency. > Sure, but what is wrong with changing LogicalIncreaseRestartDecodingForSlot's "if (slot->candidate_restart_valid == InvalidXLogRecPtr)" to "else if (slot->candidate_restart_valid == InvalidXLogRecPtr)"? My previous analysis [1][2] of a similar issue also leads to that conclusion. Then later Sawada-San's email [3] also leads to the same solution. I know that the same has been discussed in this thread and we are primarily worried about whether we are missing some case that needs the current code in LogicalIncreaseRestartDecodingForSlot(). It is always possible that all who have analyzed this are missing some point, but I feel the chances are low. I vote to fix LogicalIncreaseRestartDecodingForSlot. Now, we have at least some theory for not clearing the candidate_restart_* values, which is to allow advancing restart_lsn earlier if we get a confirmation from the subscriber. Now, your theory that walsender exits should be rare, so this doesn't impact much, is also true, but OTOH why change something that can work more efficiently, provided we fix LogicalIncreaseRestartDecodingForSlot as per our analysis? [1] - https://www.postgresql.org/message-id/CAA4eK1JvyWHzMwhO9jzPquctE_ha6bz3EkB3KE6qQJx63StErQ%40mail.gmail.com [2] - https://www.postgresql.org/message-id/CAA4eK1KcnTvwrVqmpRTEMpyarBeTxwr8KA%2BkaveQOiqJ0zYsXA%40mail.gmail.com [3] - https://www.postgresql.org/message-id/CAD21AoBVhYnGBuW_o%3DwEGgTp01qiHNAx1a14b1X9kFXmuBe%3Dsg%40mail.gmail.com -- With Regards, Amit Kapila.
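For reference, the shape of the code being discussed, paraphrased from LogicalIncreaseRestartDecodingForSlot() with comments trimmed; the one-word change proposed above targets the last branch. The xmin counterpart, LogicalIncreaseXminForSlot(), already uses "else if" for the corresponding branch.

    SpinLockAcquire(&slot->mutex);

    /* don't overwrite if we already have a newer restart_lsn */
    if (restart_lsn <= slot->data.restart_lsn)
    {
    }
    /* if the client already confirmed past this point, accept it directly */
    else if (current_lsn <= slot->data.confirmed_flush)
    {
        slot->candidate_restart_valid = current_lsn;
        slot->candidate_restart_lsn = restart_lsn;
        updated_lsn = true;
    }

    /*
     * Only increase if the previous values have been applied.  Because this
     * is a plain "if" today, a restart_lsn that was rejected by the first
     * branch (i.e. older than data.restart_lsn) can still be installed as a
     * candidate here; making it "else if" is the proposed fix.
     */
    if (slot->candidate_restart_valid == InvalidXLogRecPtr)
    {
        slot->candidate_restart_valid = current_lsn;
        slot->candidate_restart_lsn = restart_lsn;
        updated_lsn = true;
    }

    SpinLockRelease(&slot->mutex);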
Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
From
Tomas Vondra
Date:
On 11/12/24 13:19, Amit Kapila wrote: > On Tue, Nov 12, 2024 at 4:55 PM Tomas Vondra <tomas@vondra.me> wrote: >> >> There's also the question of backpatching - the simpler the better, and >> this I think just resetting the fields wins in this regard. The main >> question is whether it's correct - I think it is. I'm not too worried >> about efficiency very much, on the grounds that this should not matter >> very often (only after unexpected restart). Correctness > efficiency. >> > > Sure, but what is wrong with changing > LogicalIncreaseRestartDecodingForSlot's "if > (slot->candidate_restart_valid == InvalidXLogRecPtr)" to "else if > (slot->candidate_restart_valid == InvalidXLogRecPtr)"? My previous > analysis [1][2] of a similar issue also leads to that conclusion. Then > later Sawada-San's email [3] also leads to the same solution. I know > that the same has been discussed in this thread and we are primarily > worried about whether we are missing some case that needs the current > code in LogicalIncreaseRestartDecodingForSlot(). It is always possible > that all who have analyzed this are missing some point, but I feel the > chances are low. I vote to fix LogicalIncreaseRestartDecodingForSlot. > I'm not opposed to adjusting LogicalIncreaseRestartDecodingForSlot(). The difference from LogicalIncreaseXminForSlot() seems accidental, in which case fixing it seems desirable. > Now, we have at least some theory for not clearing the candidate_restart_* > values, which is to allow advancing restart_lsn earlier if we get a > confirmation from the subscriber. Now, your theory that walsender > exits should be rare, so this doesn't impact much, is also true, but OTOH > why change something that can work more efficiently, provided we fix > LogicalIncreaseRestartDecodingForSlot as per our analysis? > Because with how it works now it's impossible to check if the LSN values are correct. We get an LSN value that's before what we already have in the slot, and it might be either fine after a restart, or a bug (perhaps on the subscriber side) causing all kinds of strange issues. I'm all in favor of making stuff more efficient, but only if it doesn't harm reliability. And that does seem to be happening here, because this (not resetting + inability to have strict checks on the LSN updates) seems to be one of the reasons why we have had this bug since 9.4. Sure, maybe fixing LogicalIncreaseRestartDecodingForSlot() is enough to fix this particular case. But I'd be happier if we could also add asserts checking that the LSNs advance, to detect similar issues that we may be unaware of yet. regards -- Tomas Vondra
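The kind of check being argued for here is small; an illustrative sketch (the name new_restart_lsn is a placeholder, and a real patch would still have to decide how the reconnect case is treated):

    /* Illustrative only: LSNs stored in the slot should only advance. */
    Assert(new_restart_lsn >= slot->data.restart_lsn);
    slot->data.restart_lsn = new_restart_lsn;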
Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
From
Masahiko Sawada
Date:
On Tue, Nov 12, 2024 at 4:08 AM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > On Tue, Nov 12, 2024 at 4:55 PM Tomas Vondra <tomas@vondra.me> wrote: > > > > but I'm still wondering if the current coding was intentional and we're > > just missing why it was written like this. > > Interestingly, the asymmetry between the functions is added in the > same commit b89e151054a05f0f6d356ca52e3b725dd0505e53. I doubt it was > intentional; the comments at both places say the same thing. This > problem is probably as old as that commit. FYI the asymmetry appears to have been present since the first version of the patch that gave the LogicalIncreaseRestartDecodingForSlot() this form[1]. > > > > > For the reconnect, I think it's a bit as if the primary restarted right > > before the reconnect. That could happen anyway, and we need to handle > > that correctly - if not, we have yet another issue, IMHO. And with the > > restart it's the same as writing the slot to disk and reading it back, > > which also doesn't retain most of the fields. So it seems cleaner to do > > the same thing and just reset the various fields. > > > > > > > There's also the question of backpatching - the simpler the better, and > > this I think just resetting the fields wins in this regard. The main > > question is whether it's correct - I think it is. I'm not too worried > > about efficiency very much, on the grounds that this should not matter > > very often (only after unexpected restart). Correctness > efficiency. > > If the slot's restart_lsn is very old before disconnect we will lose > an opportunity to update the restart_lsn and thus release some > resources earlier. However, that opportunity is only for a short > duration. On a fast enough machine the data.restart_lsn will be > updated anyway after processing all running transactions wal records > anyway. So I am also of the opinion that the argument of efficiency > won't stand here. But I doubt if "not resetting" is wrong. It looks > more intentional to me than the asymmetry between > LogicalIncreaseRestartDecodingForSlot and LogicalIncreaseXminForSlot. In addition to the asymmetry between two functions, the reason why I think we should fix LogicalIncreaseRestartDecodingForSlot() is the fact that it accepts any LSN as a candidate restart_lsn. Which is dangerous and incorrect to me even if there wasn't the asymmetry stuff. Regards, [1] https://www.postgresql.org/message-id/flat/20131212000839.GA22776%40awork2.anarazel.de#e21aed91a35b31c86d34026369501816 -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
From
Ashutosh Bapat
Date:
On Tue, Nov 12, 2024 at 4:55 PM Tomas Vondra <tomas@vondra.me> wrote: > > > > On 11/12/24 10:37, Ashutosh Bapat wrote: > > On Tue, Nov 12, 2024 at 4:54 AM Tomas Vondra <tomas@vondra.me> wrote: > >> > >> > >> > >> On 11/11/24 23:41, Masahiko Sawada wrote: > >>> On Mon, Nov 11, 2024 at 6:17 AM Tomas Vondra <tomas@vondra.me> wrote: > >>> > >>> Which made me re-investigate the issue and thought that it doesn't > >>> necessarily need to clear these candidate values in memory on > >>> releasing a slot as long as we're carefully updating restart_lsn. > >> > >> Not sure, but maybe it'd be useful to ask the opposite question. Why > >> shouldn't it be correct to reset the fields, which essentially puts the > >> slot into the same state as if it was just read from disk? That also > >> discards all these values, and we can't rely on accidentally keeping > >> something important info in memory (because if the instance restarts > >> we'd lose that). > >> > >> But this reminds me that the patch I shared earlier today resets the > >> slot in the ReplicationSlotAcquire() function, but I guess that's not > >> quite correct. It probably should be in the "release" path. > >> > >>> Which seems a bit efficient for example when restarting from a very > >>> old point. Of course, even if we reset them on releasing a slot, it > >>> would perfectly work since it's the same as restarting logical > >>> decoding with a server restart. > >> > >> I find the "efficiency" argument a bit odd. It'd be fine if we had a > >> correct behavior to start with, but we don't have that ... Also, I'm not > >> quite sure why exactly would it be more efficient? > >> > >> And how likely is this in practice? It seems to me that > >> performance-sensitive cases can't do reconnects very often anyway, > >> that's inherently inefficient. No? > >> > >>> I think > >>> LogicalIncreaseRestartDecodingForSlot() should be fixed as it seems > >>> not to be working expectedly, but I could not have proof that we > >>> should either keep or reset them on releasing a slot. > >>> > >> > >> Not sure. Chances are we need both fixes, if only to make > >> LogicalIncreaseRestartDecodingForSlot more like the other function. > >> > > > > Thanks a lot for pointing to Masahiko's analysis. I missed that part > > when I read that thread. Sorry. > > > > No need to apologize. The discussion is complex and spread over a rather > long time period. > > > A candidate_restart_lsn and candidate_restart_valid pair just tells > > that we may set slot's data.restart_lsn to candidate_restart_lsn when > > the downstream confirms an LSN >= candidate_restart_valid. That pair > > can never get inaccurate. It may get stale but never inaccurate. So > > wiping those fields from ReplicationSlot is unnecessary. > > > > Isn't this issue a proof that those fields *can* get inaccurate? Or what > do you mean by "stale but not inaccurate"? > > > What should ideally happen is we should ignore candidates produced by > > older running transactions WAL records after WAL sender restart. This > > is inline with what logical replication does with transactions > > committed before slot's confirmed_flush_lsn - those are ignored. But > > the criteria for ignoring running transactions records is slightly > > different from that for transactions. If we ignore > > candidate_restart_lsn which has candidate_restart_valid <= > > confirmed_flush_lsn, we might lose some opportunity to advance > > data.restart_lsn. 
Instead we should ignore any candidate_restart_lsn > > <= data.restart_lsn especially before WAL sender finds first change to > > send downstream. We can do that in SnapBuildProcessRunningXacts() by > > accessing MyReplicationSlot, taking lock on it and then comparing > > data.restart_lsn with txn->restart_decoding_lsn before calling > > LogicalIncreaseRestartDecodingForSlot(). But then > > LogicalIncreaseRestartDecodingForSlot() would be doing the same anyway > > after applying your patch 0004. The only downside of 0004 is that the > > logic to ignore candidates produced by a running transactions record > > is not clearly visible in SnapBuildProcessRunningXacts(). For a > > transaction which is ignored the logic to ignore the transaction is > > visible in DecodeCommit() or DecodeAbort() - where people are likely > > to look for that logic. We may add a comment to that effect in > > SnapBuildProcessRunningXacts(). > > > > I have thought about just doing something like: > > slot->data.restart_lsn = Max(slot->data.restart_lsn, new_lsn); > > and similar for the other LSN fields. And it did resolve the issue at > hand, of course. But it seems sloppy, and I'm worried it might easily > mask actual issues in other cases. > > I'm still of the opinion that (with the exception of a reconnect), these > places should not need to deal with values that go backwards. It should > work just fine without the Max(), and we should add Asserts() to check > that it's always a higher LSN. > > For the reconnect, I think it's a bit as if the primary restarted right > before the reconnect. That could happen anyway, and we need to handle > that correctly - if not, we have yet another issue, IMHO. And with the > restart it's the same as writing the slot to disk and reading it back, > which also doesn't retain most of the fields. So it seems cleaner to do > the same thing and just reset the various fields. > > > I haven't thought about making SnapBuildProcessRunningXacts() more > complex to consider this stuff. But I doubt we'd like to be accessing > slots from that code - it has nothing to do with slots. Here's a way we can fix SnapBuildProcessRunningXacts(), similar to DecodeCommit(). DecodeCommit() uses SnapBuildXactNeedsSkip() to decide whether a given transaction should be decoded or not:

/*
 * Should the contents of transaction ending at 'ptr' be decoded?
 */
bool
SnapBuildXactNeedsSkip(SnapBuild *builder, XLogRecPtr ptr)
{
    return ptr < builder->start_decoding_at;
}

Similar to SnapBuild::start_decoding_at, we could maintain a field SnapBuild::start_reading_at set to the LSN from which the WAL sender would start reading WAL. If the candidate_restart_lsn produced by a running transactions WAL record is less than SnapBuild::start_reading_at, SnapBuildProcessRunningXacts() won't call LogicalIncreaseRestartDecodingForSlot() with that candidate LSN. We won't access the slot here, and the solution will be in line with DecodeCommit(), which skips the transactions. -- Best Wishes, Ashutosh Bapat
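A rough sketch of that idea (hypothetical: SnapBuild has no start_reading_at field today, and the wrapper function below is invented purely to show where the check would sit):

/*
 * Hypothetical: skip candidates derived from running-xacts records that lie
 * before the point decoding started reading from, analogous to how
 * SnapBuildXactNeedsSkip() skips already-replicated transactions.
 */
static void
SnapBuildMaybeIncreaseRestartDecoding(SnapBuild *builder, XLogRecPtr lsn,
                                      XLogRecPtr restart_decoding_lsn)
{
    if (restart_decoding_lsn < builder->start_reading_at)
        return;

    LogicalIncreaseRestartDecodingForSlot(lsn, restart_decoding_lsn);
}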
Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
From
Amit Kapila
Date:
On Tue, Nov 12, 2024 at 6:29 PM Tomas Vondra <tomas@vondra.me> wrote: > > Sure, maybe fixing LogicalIncreaseRestartDecodingForSlot() is enough to > fix this particular case. But I'd be happier if we could also add > asserts checking the LSN advances, to detect similar issues that we may > be unaware of yet. > As most of us lean towards fixing LogicalIncreaseRestartDecodingForSlot(), let's fix that in the HEAD and back branches. Separately we can consider other asserts just for HEAD that you think will make the code robust and help avoid such bugs in the future. -- With Regards, Amit Kapila.
Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
From
Amit Kapila
Date:
On Tue, Nov 12, 2024 at 12:43 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote: > > On Tue, Nov 12, 2024 at 12:02 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > > > On Mon, Nov 11, 2024 at 2:08 PM Tomas Vondra <tomas@vondra.me> wrote: > > > > > > > > > But neither of those fixes prevents backwards move for confirmed_flush > > > LSN, as enforced by asserts in the 0005 patch. I don't know if this > > > assert is incorrect or not. It seems natural that once we get a > > > confirmation for some LSN, we can't move before that position, but I'm > > > not sure about that. Maybe it's too strict. > > > > Hmm, I'm concerned that it might be another problem. I think there are > > some cases where a subscriber sends a flush position older than the slot's > > confirmed_flush as a feedback message. > > Right, it can happen in cases where the subscriber doesn't have to do anything (for example, DDLs), as I have explained in one of my emails [1] > > But it seems to be dangerous if > > we always accept it as a new confirmed_flush value. It could happen > > that confirmed_flush gets set to an LSN older than restart_lsn. > > Possible, though I haven't tried to reproduce such a case. But, will it create any issues? I don't know if there is any benefit in allowing the confirmed_flush LSN to move backward. AFAIR, we don't allow such backward values to persist. They will temporarily be in memory. I think as a separate patch we should prevent it from moving backward. > > If confirmed_flush LSN moves backwards, it means the transactions > which were thought to be replicated earlier are no longer considered > to be replicated. This means that the restart_lsn of the slot needs to > be at least as far back as the oldest of the starting points of those > transactions. Thus the restart_lsn of the slot has to be pushed further back. > I don't see a reason to move restart_lsn backward. Why do you think so? [1] - https://www.postgresql.org/message-id/CAA4eK1%2BzWQwOe5G8zCYGvErnaXh5%2BDbyg_A1Z3uywSf_4%3DT0UA%40mail.gmail.com -- With Regards, Amit Kapila.
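A guard of the sort being suggested as a separate patch could be as small as the following sketch (hedged: whether to silently ignore an older value, log it, or assert on it is exactly what such a patch would have to decide):

    /* Sketch: only let confirmed_flush advance, never retreat. */
    SpinLockAcquire(&slot->mutex);
    if (lsn > slot->data.confirmed_flush)
        slot->data.confirmed_flush = lsn;
    SpinLockRelease(&slot->mutex);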
Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
From
Tomas Vondra
Date:
On 11/13/24 11:59, Amit Kapila wrote: > On Tue, Nov 12, 2024 at 12:43 PM Ashutosh Bapat > <ashutosh.bapat.oss@gmail.com> wrote: >> >> On Tue, Nov 12, 2024 at 12:02 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>> >>> On Mon, Nov 11, 2024 at 2:08 PM Tomas Vondra <tomas@vondra.me> wrote: >>>> >>>> >>>> But neither of those fixes prevents backwards move for confirmed_flush >>>> LSN, as enforced by asserts in the 0005 patch. I don't know if this >>>> assert is incorrect or now. It seems natural that once we get a >>>> confirmation for some LSN, we can't move before that position, but I'm >>>> not sure about that. Maybe it's too strict. >>> >>> Hmm, I'm concerned that it might be another problem. I think there are >>> some cases where a subscriber sends a flush position older than slot's >>> confirmed_flush as a feedback message. >>> > > Right, it can happen for cases where subscribers doesn't have to do > anything (for example DDLs) like I have explained in one of my emails > [1] > Thanks. I admit not being entirely familiar with all the details, but doesn't that email explain more "Why it currently happens?" rather than "Is this what should be happening?" Sure, the subscriber needs to confirm changes for which nothing needs to be done, like DDL. But isn't there a better way to do that, rather than allowing confirmed_lsn to go backwards? >>> But it seems to be dangerous if >>> we always accept it as a new confirmed_flush value. It could happen >>> that confirm_flush could be set to a LSN older than restart_lsn. >>> > > Possible, though I haven't tried to reproduce such a case. But, will > it create any issues? I don't know if there is any benefit in allowing > to move confirmed_flush LSN backward. AFAIR, we don't allow such > backward values to persist. They will temporarily be in memory. I > think as a separate patch we should prevent it from moving backward. > >> >> If confirmed_flush LSN moves backwards, it means the transactions >> which were thought to be replicated earlier are no longer considered >> to be replicated. This means that the restart_lsn of the slot needs to >> be at least far back as the oldest of starting points of those >> transactions. Thus restart_lsn of slot has to be pushed further back. >> > > I don't see a reason to move restart_lsn backward. Why do you think so? > I think what Ashutosh is saying that if confirmed_flush is allowed to move backwards, that may result in start_lsn moving backwards too. And we need to be able to decode all transactions committed since start_lsn, so if start_lsn moves backwards, maybe restart_lsn needs to move backwards too. I have no idea if confirmed_flush/start_lsn can move backwards enough to require restart_lsn to move, though. Anyway, these discussions are a good illustration why I think allowing these LSNs to move backwards is a problem. It either causes bugs (like with breaking replication slots) and/or it makes the reasoning about correct behavior much harder. regards -- Tomas Vondra
Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
From
Tomas Vondra
Date:
On 11/13/24 10:38, Amit Kapila wrote: > On Tue, Nov 12, 2024 at 6:29 PM Tomas Vondra <tomas@vondra.me> wrote: >> >> Sure, maybe fixing LogicalIncreaseRestartDecodingForSlot() is enough to >> fix this particular case. But I'd be happier if we could also add >> asserts checking the LSN advances, to detect similar issues that we may >> be unaware of yet. >> > > As most of us lean towards fixing > LogicalIncreaseRestartDecodingForSlot(), let's fix that in the HEAD > and back branches. Separately we can consider other asserts just for > HEAD that you think will make the code robust and help avoid such bugs > in the future. > +1 to that regards -- Tomas Vondra
Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
From
Tomas Vondra
Date:
On 11/13/24 05:38, Ashutosh Bapat wrote: > ... > > Here's a way we can fix SnapBuildProcessRunningXacts() similar to > DecodeCommit(). DecodeCommit() uses SnapBuildXactNeedsSkip() to decide > whether a given transaction should be decoded or not. > /* > * Should the contents of transaction ending at 'ptr' be decoded? > */ > bool > SnapBuildXactNeedsSkip(SnapBuild *builder, XLogRecPtr ptr) > { > return ptr < builder->start_decoding_at; > } > > Similar to SnapBuild::start_decoding_at we could maintain a field > SnapBuild::start_reading_at, set to the LSN from which the WAL sender would > start reading WAL. If candidate_restart_lsn produced by a running > transactions WAL record is less than SnapBuild::start_reading_at, > SnapBuildProcessRunningXacts() won't call > LogicalIncreaseRestartDecodingForSlot() with that candidate LSN. We > won't access the slot here and the solution will be in line with > DecodeCommit(), which skips the transactions. > Could you maybe write a patch doing this? That would allow proper testing, etc. regards -- Tomas Vondra
Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
From
Masahiko Sawada
Date:
On Wed, Nov 13, 2024 at 3:53 AM Tomas Vondra <tomas@vondra.me> wrote: > > > > On 11/13/24 10:38, Amit Kapila wrote: > > On Tue, Nov 12, 2024 at 6:29 PM Tomas Vondra <tomas@vondra.me> wrote: > >> > >> Sure, maybe fixing LogicalIncreaseRestartDecodingForSlot() is enough to > >> fix this particular case. But I'd be happier if we could also add > >> asserts checking the LSN advances, to detect similar issues that we may > >> be unaware of yet. > >> > > > > As most of us lean towards fixing > > LogicalIncreaseRestartDecodingForSlot(), let's fix that in the HEAD > > and back branches. Separately we can consider other asserts just for > > HEAD that you think will make the code robust and help avoid such bugs > > in the future. > > > > +1 to that > +1. Tomas, are you going to update the patch you shared before[1] or shall I? Regards, [1] https://www.postgresql.org/message-id/abf794c4-c459-4fed-84d9-968c4f0e2052%40vondra.me, 0004-fix-LogicalIncreaseRestartDecodingForSlot.patch -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
From
Tomas Vondra
Date:
On 11/14/24 00:33, Masahiko Sawada wrote: > On Wed, Nov 13, 2024 at 3:53 AM Tomas Vondra <tomas@vondra.me> wrote: >> >> >> >> On 11/13/24 10:38, Amit Kapila wrote: >>> On Tue, Nov 12, 2024 at 6:29 PM Tomas Vondra <tomas@vondra.me> wrote: >>>> >>>> Sure, maybe fixing LogicalIncreaseRestartDecodingForSlot() is enough to >>>> fix this particular case. But I'd be happier if we could also add >>>> asserts checking the LSN advances, to detect similar issues that we may >>>> be unaware of yet. >>>> >>> >>> As most of us lean towards fixing >>> LogicalIncreaseRestartDecodingForSlot(), let's fix that in the HEAD >>> and back branches. Separately we can consider other asserts just for >>> HEAD that you think will make the code robust and help avoid such bugs >>> in the future. >>> >> >> +1 to that >> > > +1. > > Tomas, are you going to update the patch you shared before[1] or shall I? > Please feel free to take over. I'm busy with some other stuff and the initial analysis was done by you anyway. regards -- Tomas Vondra
Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
From
Ashutosh Bapat
Date:
On Wed, Nov 13, 2024 at 5:25 PM Tomas Vondra <tomas@vondra.me> wrote: > > On 11/13/24 05:38, Ashutosh Bapat wrote: > > ... > > > > Here's a way we can fix SnapBuildProcessRunningXacts() similar to > > DecodeCommit(). DecodeCommit() uses SnapBuildXactNeedsSkip() to decide > > whether a given transaction should be decoded or not. > > /* > > * Should the contents of transaction ending at 'ptr' be decoded? > > */ > > bool > > SnapBuildXactNeedsSkip(SnapBuild *builder, XLogRecPtr ptr) > > { > > return ptr < builder->start_decoding_at; > > } > > > > Similar to SnapBuild::start_decoding_at we could maintain a field > > SnapBuild::start_reading_at, set to the LSN from which the WAL sender would > > start reading WAL. If candidate_restart_lsn produced by a running > > transactions WAL record is less than SnapBuild::start_reading_at, > > SnapBuildProcessRunningXacts() won't call > > LogicalIncreaseRestartDecodingForSlot() with that candidate LSN. We > > won't access the slot here and the solution will be in line with > > DecodeCommit(), which skips the transactions. > > > > Could you maybe write a patch doing this? That would allow proper > testing, etc. Here's a quick and dirty patch which describes the idea. I didn't get time to implement code to move SnapBuild::restart_lsn if SnapBuild::start_decoding_at moves forward while building the initial snapshot. I am not sure whether that's necessary either. I have added three elogs to see if the logic is working as expected. I see two of the elogs in the patch in the server log when I run tests from tests/subscription and tests/recovery. But I do not see the third one. That either means that the situation causing the bug is not covered by those tests or the fix is not triggered. If you run your reproduction and still see the crashes, please provide the output of those elog messages along with the rest of the elogs you have added. -- Best Wishes, Ashutosh Bapat
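A rough sketch of how the proposed guard might look. The SnapBuild struct is private to snapbuild.c and start_reading_at is the proposed, not yet existing, field; the helper below is hypothetical and only meant to make the idea concrete, with the value simply passed in as an argument.

#include "postgres.h"
#include "access/xlogdefs.h"
#include "replication/logical.h"    /* LogicalIncreaseRestartDecodingForSlot() */

/*
 * Hypothetical helper: hand a candidate restart LSN to the slot only if
 * it is not older than the point the WAL sender started reading from,
 * analogous to how SnapBuildXactNeedsSkip() skips already-replayed commits.
 */
static void
MaybeIncreaseRestartDecoding(XLogRecPtr current_lsn,
                             XLogRecPtr candidate_restart_lsn,
                             XLogRecPtr start_reading_at)
{
    if (candidate_restart_lsn < start_reading_at)
    {
        /* A stale candidate could only drag restart_lsn backwards. */
        elog(DEBUG1, "skipping stale candidate restart_lsn %X/%X",
             LSN_FORMAT_ARGS(candidate_restart_lsn));
        return;
    }

    LogicalIncreaseRestartDecodingForSlot(current_lsn, candidate_restart_lsn);
}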
Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
From
Amit Kapila
Date:
On Wed, Nov 13, 2024 at 5:23 PM Tomas Vondra <tomas@vondra.me> wrote: > > On 11/13/24 11:59, Amit Kapila wrote: > > On Tue, Nov 12, 2024 at 12:43 PM Ashutosh Bapat > > <ashutosh.bapat.oss@gmail.com> wrote: > >> > >> On Tue, Nov 12, 2024 at 12:02 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > >>> > >>> On Mon, Nov 11, 2024 at 2:08 PM Tomas Vondra <tomas@vondra.me> wrote: > >>>> > >>>> > >>>> But neither of those fixes prevents backwards move for confirmed_flush > >>>> LSN, as enforced by asserts in the 0005 patch. I don't know if this > >>>> assert is incorrect or now. It seems natural that once we get a > >>>> confirmation for some LSN, we can't move before that position, but I'm > >>>> not sure about that. Maybe it's too strict. > >>> > >>> Hmm, I'm concerned that it might be another problem. I think there are > >>> some cases where a subscriber sends a flush position older than slot's > >>> confirmed_flush as a feedback message. > >>> > > > > Right, it can happen for cases where subscribers doesn't have to do > > anything (for example DDLs) like I have explained in one of my emails > > [1] > > > > Thanks. I admit not being entirely familiar with all the details, but > doesn't that email explain more "Why it currently happens?" rather than > "Is this what should be happening?" > Right. > Sure, the subscriber needs to confirm changes for which nothing needs to > be done, like DDL. But isn't there a better way to do that, rather than > allowing confirmed_lsn to go backwards? > Firstly, I agree that we should try to find ways to avoid going confirmed_lsn going backward. We can try to explore solutions both in the publisher and subscriber-side. > >>> But it seems to be dangerous if > >>> we always accept it as a new confirmed_flush value. It could happen > >>> that confirm_flush could be set to a LSN older than restart_lsn. > >>> > > > > Possible, though I haven't tried to reproduce such a case. But, will > > it create any issues? I don't know if there is any benefit in allowing > > to move confirmed_flush LSN backward. AFAIR, we don't allow such > > backward values to persist. They will temporarily be in memory. I > > think as a separate patch we should prevent it from moving backward. > > > >> > >> If confirmed_flush LSN moves backwards, it means the transactions > >> which were thought to be replicated earlier are no longer considered > >> to be replicated. This means that the restart_lsn of the slot needs to > >> be at least far back as the oldest of starting points of those > >> transactions. Thus restart_lsn of slot has to be pushed further back. > >> > > > > I don't see a reason to move restart_lsn backward. Why do you think so? > > > > I think what Ashutosh is saying that if confirmed_flush is allowed to > move backwards, that may result in start_lsn moving backwards too. And > we need to be able to decode all transactions committed since start_lsn, > so if start_lsn moves backwards, maybe restart_lsn needs to move > backwards too. I have no idea if confirmed_flush/start_lsn can move > backwards enough to require restart_lsn to move, though. > > Anyway, these discussions are a good illustration why I think allowing > these LSNs to move backwards is a problem. It either causes bugs (like > with breaking replication slots) and/or it makes the reasoning about > correct behavior much harder. > Right, I think one needs to come up with some reproducible scenarios where these cause any kind of problem or inefficiency. Then, we can discuss the solutions accordingly. 
I mean to say that someone has to put effort into making a more solid case for changing this code, because it may not be a good idea to change something based only on theory, unless it is just adding some assertions. -- With Regards, Amit Kapila.
Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
From
Amit Kapila
Date:
On Thu, Nov 14, 2024 at 7:08 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > > Sure. I've attached the updated patch. I just added the commit message. > @@ -1815,6 +1818,8 @@ LogicalIncreaseRestartDecodingForSlot(XLogRecPtr current_lsn, XLogRecPtr restart confirmed_flush = slot->data.confirmed_flush; SpinLockRelease(&slot->mutex); + spin_released = true; + elog(DEBUG1, "failed to increase restart lsn: proposed %X/%X, after %X/%X, current candidate %X/%X, current after %X/%X, flushed up to %X/%X", LSN_FORMAT_ARGS(restart_lsn), LSN_FORMAT_ARGS(current_lsn), @@ -1823,6 +1828,9 @@ LogicalIncreaseRestartDecodingForSlot(XLogRecPtr current_lsn, XLogRecPtr restart LSN_FORMAT_ARGS(confirmed_flush)); } + if (!spin_released) + SpinLockRelease(&slot->mutex); This coding pattern looks odd to me. We can consider releasing spinlock in the other two if/else if checks. I understand it is a matter of individual preference, so, if you and or others prefer the current way, that is also fine with me. Other than this, the patch looks good to me. -- With Regards, Amit Kapila.
Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
From
Tomas Vondra
Date:
On 11/15/24 18:40, Masahiko Sawada wrote: > On Thu, Nov 14, 2024 at 10:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote: >> >> On Thu, Nov 14, 2024 at 7:08 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>> >>> Sure. I've attached the updated patch. I just added the commit message. >>> >> >> @@ -1815,6 +1818,8 @@ LogicalIncreaseRestartDecodingForSlot(XLogRecPtr >> current_lsn, XLogRecPtr restart >> confirmed_flush = slot->data.confirmed_flush; >> SpinLockRelease(&slot->mutex); >> >> + spin_released = true; >> + >> elog(DEBUG1, "failed to increase restart lsn: proposed %X/%X, after >> %X/%X, current candidate %X/%X, current after %X/%X, flushed up to >> %X/%X", >> LSN_FORMAT_ARGS(restart_lsn), >> LSN_FORMAT_ARGS(current_lsn), >> @@ -1823,6 +1828,9 @@ LogicalIncreaseRestartDecodingForSlot(XLogRecPtr >> current_lsn, XLogRecPtr restart >> LSN_FORMAT_ARGS(confirmed_flush)); >> } >> >> + if (!spin_released) >> + SpinLockRelease(&slot->mutex); >> >> This coding pattern looks odd to me. We can consider releasing >> spinlock in the other two if/else if checks. I understand it is a >> matter of individual preference, so, if you and or others prefer the >> current way, that is also fine with me. Other than this, the patch >> looks good to me. > > Indeed, I prefer your idea. I"ve attached the updated patch. I'll push > it early next week unless there are further comments. > I'm not particularly attached to how I did this in my WIP patch, it was simply the simplest way to make it work for experimentation. I'd imagine it'd be best to just mirror how LogicalIncreaseXminForSlot() does this. regards -- Tomas Vondra
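For illustration, a sketch of the locking pattern being discussed: release the spinlock inside every branch so the DEBUG1 message is logged without holding it, instead of tracking a spin_released flag. It assumes the enclosing function's slot, current_lsn and restart_lsn variables, as in LogicalIncreaseRestartDecodingForSlot(); the branch conditions are simplified and this is not necessarily what was eventually committed.

    SpinLockAcquire(&slot->mutex);

    if (restart_lsn <= slot->data.restart_lsn)
    {
        /* Nothing to do: the proposed LSN would not advance the slot. */
        SpinLockRelease(&slot->mutex);
    }
    else if (current_lsn <= slot->data.confirmed_flush)
    {
        /* Already flushed past current_lsn: accept the candidate directly. */
        slot->candidate_restart_valid = current_lsn;
        slot->candidate_restart_lsn = restart_lsn;
        SpinLockRelease(&slot->mutex);
    }
    else if (slot->candidate_restart_valid == InvalidXLogRecPtr)
    {
        /* No pending candidate yet: remember this one. */
        slot->candidate_restart_valid = current_lsn;
        slot->candidate_restart_lsn = restart_lsn;
        SpinLockRelease(&slot->mutex);
    }
    else
    {
        /* Keep the existing candidate; log only after dropping the lock. */
        XLogRecPtr  confirmed_flush = slot->data.confirmed_flush;

        SpinLockRelease(&slot->mutex);
        elog(DEBUG1, "failed to increase restart lsn: proposed %X/%X, flushed up to %X/%X",
             LSN_FORMAT_ARGS(restart_lsn),
             LSN_FORMAT_ARGS(confirmed_flush));
    }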
Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
From
Tomas Vondra
Date:
On 11/15/24 04:26, Amit Kapila wrote: > On Wed, Nov 13, 2024 at 5:23 PM Tomas Vondra <tomas@vondra.me> wrote: >> >> On 11/13/24 11:59, Amit Kapila wrote: >>> On Tue, Nov 12, 2024 at 12:43 PM Ashutosh Bapat >>> <ashutosh.bapat.oss@gmail.com> wrote: >>>> >>>> On Tue, Nov 12, 2024 at 12:02 PM Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>>>> >>>>> On Mon, Nov 11, 2024 at 2:08 PM Tomas Vondra <tomas@vondra.me> wrote: >>>>>> >>>>>> >>>>>> But neither of those fixes prevents backwards move for confirmed_flush >>>>>> LSN, as enforced by asserts in the 0005 patch. I don't know if this >>>>>> assert is incorrect or now. It seems natural that once we get a >>>>>> confirmation for some LSN, we can't move before that position, but I'm >>>>>> not sure about that. Maybe it's too strict. >>>>> >>>>> Hmm, I'm concerned that it might be another problem. I think there are >>>>> some cases where a subscriber sends a flush position older than slot's >>>>> confirmed_flush as a feedback message. >>>>> >>> >>> Right, it can happen for cases where subscribers doesn't have to do >>> anything (for example DDLs) like I have explained in one of my emails >>> [1] >>> >> >> Thanks. I admit not being entirely familiar with all the details, but >> doesn't that email explain more "Why it currently happens?" rather than >> "Is this what should be happening?" >> > > Right. > >> Sure, the subscriber needs to confirm changes for which nothing needs to >> be done, like DDL. But isn't there a better way to do that, rather than >> allowing confirmed_lsn to go backwards? >> > > Firstly, I agree that we should try to find ways to avoid going > confirmed_lsn going backward. We can try to explore solutions both in > the publisher and subscriber-side. > Good that we agree. I'm not sure how could we do this on the subscriber side. Or more precisely, I see "LSNs don't go backwards" as an invariant that the publisher can enforce for all subscribers (with whatever plugin). And if a subscriber responds in a way that contradicts the invariant, I'd treat that as a subscriber bug. I don't think we should rely on subscriber to do the right thing - we have little control over that, and when things break (say, WAL gets removed too early), people generally point at the publisher. >> >> I think what Ashutosh is saying that if confirmed_flush is allowed to >> move backwards, that may result in start_lsn moving backwards too. And >> we need to be able to decode all transactions committed since start_lsn, >> so if start_lsn moves backwards, maybe restart_lsn needs to move >> backwards too. I have no idea if confirmed_flush/start_lsn can move >> backwards enough to require restart_lsn to move, though. >> >> Anyway, these discussions are a good illustration why I think allowing >> these LSNs to move backwards is a problem. It either causes bugs (like >> with breaking replication slots) and/or it makes the reasoning about >> correct behavior much harder. >> > > Right, I think one needs to come up with some reproducible scenarios > where these cause any kind of problem or inefficiency. Then, we can > discuss the solutions accordingly. I mean to say that someone has to > put effort into making a bit more solid case for changing this code > because it may not be a good idea to change something just based on > some theory unless it is just adding some assertions. > I don't know, but isn't this a bit backwards? 
I understand we don't want to just recklessly change long-standing code, but I think it's reasonable to enforce invariants like "LSNs can't go backwards" to ensure correct behavior. And then if there's a performance/efficiency problem, and someone proposes a fix, it's up to that patch to prove it's correct. Here we seem to have code no one is quite sure is correct, but we're asking for reproducible scenarios proving the existence of a bug. The bugs can be quite subtle / hard to reproduce, as evidenced by the restart_lsn issue present since 9.4. Maybe this should be the other way around, i.e., make it "strict" and then prove that (a) relaxing the check is still correct and (b) it actually has other benefits. That being said, I don't have a clear idea how to change this. So +1 to at least introducing the asserts. regards -- Tomas Vondra
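As a concrete example of such an assert, a minimal sketch (not taken from the thread's patches) of a check that could sit wherever a slot's LSN fields are overwritten. Assert() and XLogRecPtrIsInvalid() are the existing backend macros; the helper name is made up.

/*
 * Hypothetical assert-only guard: a slot LSN field must never move
 * backwards once it has been set.
 */
static inline void
AssertLSNDoesNotGoBackwards(XLogRecPtr old_lsn, XLogRecPtr new_lsn)
{
    /* The new value must be a real position ... */
    Assert(!XLogRecPtrIsInvalid(new_lsn));
    /* ... and, if the field was already set, it must not move backwards. */
    Assert(XLogRecPtrIsInvalid(old_lsn) || new_lsn >= old_lsn);
}

/* e.g., before overwriting the persistent restart_lsn: */
AssertLSNDoesNotGoBackwards(slot->data.restart_lsn, candidate_restart_lsn);
slot->data.restart_lsn = candidate_restart_lsn;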
Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
From
Masahiko Sawada
Date:
On Fri, Nov 15, 2024 at 9:48 AM Tomas Vondra <tomas@vondra.me> wrote: > > > > On 11/15/24 18:40, Masahiko Sawada wrote: > > On Thu, Nov 14, 2024 at 10:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote: > >> > >> On Thu, Nov 14, 2024 at 7:08 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: > >>> > >>> Sure. I've attached the updated patch. I just added the commit message. > >>> > >> > >> @@ -1815,6 +1818,8 @@ LogicalIncreaseRestartDecodingForSlot(XLogRecPtr > >> current_lsn, XLogRecPtr restart > >> confirmed_flush = slot->data.confirmed_flush; > >> SpinLockRelease(&slot->mutex); > >> > >> + spin_released = true; > >> + > >> elog(DEBUG1, "failed to increase restart lsn: proposed %X/%X, after > >> %X/%X, current candidate %X/%X, current after %X/%X, flushed up to > >> %X/%X", > >> LSN_FORMAT_ARGS(restart_lsn), > >> LSN_FORMAT_ARGS(current_lsn), > >> @@ -1823,6 +1828,9 @@ LogicalIncreaseRestartDecodingForSlot(XLogRecPtr > >> current_lsn, XLogRecPtr restart > >> LSN_FORMAT_ARGS(confirmed_flush)); > >> } > >> > >> + if (!spin_released) > >> + SpinLockRelease(&slot->mutex); > >> > >> This coding pattern looks odd to me. We can consider releasing > >> spinlock in the other two if/else if checks. I understand it is a > >> matter of individual preference, so, if you and or others prefer the > >> current way, that is also fine with me. Other than this, the patch > >> looks good to me. > > > > Indeed, I prefer your idea. I"ve attached the updated patch. I'll push > > it early next week unless there are further comments. > > > > I'm not particularly attached to how I did this in my WIP patch, it was > simply the simplest way to make it work for experimentation. I'd imagine > it'd be best to just mirror how LogicalIncreaseXminForSlot() does this. I was going to push it next Monday but we're going to have out-of-cycle minor releases next week and we don't have to wait for February minor releases given that we already agreed with the current fix. So I pushed the fix. Looking at the buildfarm animals, it seems that alligator started complaining about a build error: plancat.c: In function \342\200\230get_relation_info\342\200\231: plancat.c:331:54: error: assignment to \342\200\230void (*)(void)\342\200\231 from incompatible pointer type \342\200\230amcostestimate_function\342\200\231 {aka \342\200\230void (*)(struct PlannerInfo *, struct IndexPath *, double, double *, double *, double *, double *, double *)\342\200\231} [-Wincompatible-pointer-types] 331 | info->amcostestimate = amroutine->amcostestimate; | ^ make[4]: *** [<builtin>: plancat.o] Error 1 I think it's not related to this fix or recent commits but is caused by changes happening in gcc[1] (the animal uses a nightly gcc build). From a success log, 5f28e6b[2]: configure: using compiler=gcc (GCC) 15.0.0 20241115 (experimental) On the other hand, from a failure log[3]: configure: using compiler=gcc (GCC) 15.0.0 20241116 (experimental) FYI, it started to report build errors with the 20241116 build also on v12, where this fix is not pushed. Regards, [1] likely https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=55e3bd376b2214e200fa76d12b67ff259b06c212 [2] https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=alligator&dt=2024-11-16%2000%3A20%3A53&stg=configure [3] https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=alligator&dt=2024-11-16%2002%3A14%3A05&stg=configure -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
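For context on that compiler error (unrelated to the slot fix): GCC 15 snapshots from around this date switched the default C dialect to C23, in which an empty parameter list () means (void), which would explain the diagnostic quoted above. A minimal standalone illustration follows, with made-up names standing in for the real planner structures; it compiles with -std=gnu17 but fails with -std=c23.

/*
 * Under -std=gnu17 the empty parameter list leaves the parameters
 * unspecified and the assignment is accepted; under -std=c23 the field
 * becomes "void (*)(void)" and the assignment is rejected with
 * -Wincompatible-pointer-types, much like the plancat.c error above.
 */
typedef void (*costestimate_fn) (int loop_count, double *cost);

struct index_opt_info
{
    void        (*amcostestimate) ();   /* empty parens: "(void)" in C23 */
};

static void
my_costestimate(int loop_count, double *cost)
{
    *cost = loop_count * 1.0;
}

int
main(void)
{
    struct index_opt_info info;
    costestimate_fn fn = my_costestimate;

    info.amcostestimate = fn;   /* error with a C23-defaulting gcc */
    (void) info;
    return 0;
}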
Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
From
Tomas Vondra
Date:
On 11/16/24 08:26, Masahiko Sawada wrote: > On Fri, Nov 15, 2024 at 9:48 AM Tomas Vondra <tomas@vondra.me> wrote: >> >> >> >> On 11/15/24 18:40, Masahiko Sawada wrote: >>> On Thu, Nov 14, 2024 at 10:16 PM Amit Kapila <amit.kapila16@gmail.com> wrote: >>>> >>>> On Thu, Nov 14, 2024 at 7:08 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote: >>>>> >>>>> Sure. I've attached the updated patch. I just added the commit message. >>>>> >>>> >>>> @@ -1815,6 +1818,8 @@ LogicalIncreaseRestartDecodingForSlot(XLogRecPtr >>>> current_lsn, XLogRecPtr restart >>>> confirmed_flush = slot->data.confirmed_flush; >>>> SpinLockRelease(&slot->mutex); >>>> >>>> + spin_released = true; >>>> + >>>> elog(DEBUG1, "failed to increase restart lsn: proposed %X/%X, after >>>> %X/%X, current candidate %X/%X, current after %X/%X, flushed up to >>>> %X/%X", >>>> LSN_FORMAT_ARGS(restart_lsn), >>>> LSN_FORMAT_ARGS(current_lsn), >>>> @@ -1823,6 +1828,9 @@ LogicalIncreaseRestartDecodingForSlot(XLogRecPtr >>>> current_lsn, XLogRecPtr restart >>>> LSN_FORMAT_ARGS(confirmed_flush)); >>>> } >>>> >>>> + if (!spin_released) >>>> + SpinLockRelease(&slot->mutex); >>>> >>>> This coding pattern looks odd to me. We can consider releasing >>>> spinlock in the other two if/else if checks. I understand it is a >>>> matter of individual preference, so, if you and or others prefer the >>>> current way, that is also fine with me. Other than this, the patch >>>> looks good to me. >>> >>> Indeed, I prefer your idea. I"ve attached the updated patch. I'll push >>> it early next week unless there are further comments. >>> >> >> I'm not particularly attached to how I did this in my WIP patch, it was >> simply the simplest way to make it work for experimentation. I'd imagine >> it'd be best to just mirror how LogicalIncreaseXminForSlot() does this. > > I was going to push it next Monday but we're going to have > out-of-cycle minor releases next week and we don't have to wait for > February minor releases given that we already agreed with the current > fix. So I pushed the fix. > Thanks. I see you only backpatched to 13, but I believe 12 will be rewrapped too. So maybe backpatch to 12 too? regards -- Tomas Vondra
Re: logical replication: restart_lsn can go backwards (and more), seems broken since 9.4
From
Tom Lane
Date:
Tomas Vondra <tomas@vondra.me> writes: > Thanks. I see you only backpatched to 13, but I believe 12 will be > rewrapped too. So maybe backpatch to 12 too? Please don't. The release team already discussed this and determined that we want to push 12.22 with only the ROLE-regression fix. v12 is not "back in support" for any other purpose. regards, tom lane