Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: POC: Cleaning up orphaned files using undo logs |
Date | |
Msg-id | CAA4eK1KKAFBCJuPnFtgdc89djv4xO=ZkAdXvKQinqN4hWiRbvA@mail.gmail.com Whole thread Raw |
In response to | Re: POC: Cleaning up orphaned files using undo logs (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: POC: Cleaning up orphaned files using undo logs
|
List | pgsql-hackers |
On Tue, Jul 16, 2019 at 2:09 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Mon, Jul 1, 2019 at 3:54 AM Thomas Munro <thomas.munro@gmail.com> wrote: > > 2. Introduced a new RMGR callback rm_undo_status. It is used to > > decide when record sets in the UNDO_SHARED category should be > > discarded (instead of the usual single xid-based rules). The possible > > answers are "discard me now!", "ask me again when a given XID is all > > visible", and "ask me again when a given XID is no longer running". > > From the minor nitpicking department, the patches from this stack that > are updating rmgrlist.h are consistently failing to update the comment > line preceding the list of PG_RMGR() lines. This looks to be patches > 0014 and 0015 in this stack; 0015 seems to need to be squashed into > 0014. > Fixed. You can verify in patch 0011-Infrastructure-to-execute-pending-undo-actions. The 'mask' was missing in the list as well which I have added here, we might want to commit that separately. > Reviewing Amit's 0016: > > performUndoActions appears to be badly-designed. For starters, it's > sometimes wrong: the only place it gets set to true is in > UndoActionsRequired (which is badly named, because from the name you > expect it to return a Boolean and to not have side effects, but > instead it doesn't return anything and does have side effects). > UndoActionsRequired() only gets called from selected places, like > AbortCurrentTransaction(), so the rest of the time it just returns a > wrong answer. Now maybe it's never called at those times, but there's > no guard to prevent a function like CanPerformUndoActions() (which is > also badly named, because performUndoActions tells you whether you > need to perform undo actions, not whether it's possible to perform > undo actions) from being called before the flag is set. I think that > this flag should be either (1) maintained eagerly - so that wherever > we set start_urec_ptr we also set the flag right away or (2) removed - > so when we need to know, we just loop over all of the undo categories > on the spot, which is not that expensive because there aren't that > many of them. > I have taken approach-2 to fix this. > It seems pointless to make PrepareTransaction() take undo pointers as > arguments, because those pointers are just extracted from the > transaction state, to which PrepareTransaction() has a pointer. > Fixed. > Thomas has already objected to another proposal to add functions that > turn 32-bit XIDs into 64-bit XIDs. Therefore, I feel confident in > predicting that he will likewise object to GetEpochForXid. I think > this needs to be changed somehow, maybe by doing what the XXX comment > you added suggests. > I will fix this later. I think we can separately write a patch to extend Two-phase file to use fulltransactionid and then use it here. > This patch has some problems with naming consistency. There's a > function called PushUndoRequest() which calls a function called > RegisterRollbackReq() to do the heart of the work. So, is it undo or > rollback? Are we pushing or registering? Is it a request or a req? > For bonus points, the flag that the function sets is called > undo_req_pushed, which is halfway in between the two competing > terminologies. Other gripes about PushUndoRequest: push is vague and > doesn't really explain what's happening, "apllying" is a typo, > per_level is a poor variable name and shouldn't be declared volatile. > This function has problems with naming in other places, too; please go > through all of the names carefully and make them consistent and > adequately descriptive. > I have changed the namings to make them consistent. If you see anything else, then do let me know. > I am not a fan of applying_subxact_undo. I think we should look for a > better design there. A couple of things occur to me. One is that we > don't necessarily need to go to FATAL; we could just force the current > transaction and all of its subtransactions fail all the way out to the > top level, but then perhaps allow new transactions to be started > afterwards. I'm not sure that's worth it, but it would work, and I > think it has precedent in SxactIsDoomed. Assuming we're going to stick > with the current FATAL plan, I think we should do something like > invent a new kind of critical section that forces ERROR to be promoted > to FATAL and then use it here. We could call it a semi-critical or > locally-critical section, and the undo machinery could use it, but > then also so could other things. I've wanted that sort of concept > before, so I think it's a good idea to try to have something general > and independent of undo. The same concept could be used in > PerformUndoActions() instead of having to invent > pg_rethrow_as_fatal(), so we'd have two uses for this mechanism right > away. > Okay, I have developed the concept of semi-critical section and used it for sub-transactions and temp tables. Kindly check if this is something that you have in mind? > FinishPreparedTransactions() tries to apply undo actions while > interrupts are still held. Is that necessary? Can we avoid it? > Fixed. > It seems highly likely that the logic added to the TBLOCK_SUBCOMMIT > case inside CommitTransactionCommand and also into > ReleaseCurrentSubTransaction should have been added to > CommitSubTransaction instead. If that's not true, then we have to > believe that the TBLOCK_SUBRELEASE call to CommitSubTransaction needs > different treatment from the other two cases, which sounds unlikely; > we also have to explain why undo is somehow different from all of > these other releases that are already handled in that function, not in > its callers. I also strongly suspect it is altogether wrong to do > this before CommitSubTransaction sets s->state to TRANS_COMMIT; what > if a subxact callback throws an error? > > For related reasons, I don't think that the change ReleaseSavepoint() > are right either. Notice the header comment: "As above, we don't > actually do anything here except change blockState." The "as above" > part of the comment probably didn't originally refer to > DefineSavepoint(), which definitely does do other stuff, but to > something like EndImplicitTransactionBlock() or EndTransactionBlock(), > and DefineSavepoint() got stuck in the middle later. Anyway, your > patch makes the comment false by doing actual state changes in this > function, rather than just marking the subtransactions for commit. > But why should that be right? If none of the many other bits of state > are manipulated here rather than in CommitSubTransaction(), why is > undo the one thing that is different? I guess this is basically just > compensation for the lack of any of this code in the TBLOCK_SUBRELEASE > path which I noted in the previous paragraph, but I still think the > right answer is to put it all in CommitSubTransaction() *after* we set > TRANS_COMMIT. > Changed as per suggestion. > There are a number of things I either don't like or don't understand > about PerformUndoActions. One is that undo_req_pushed gets passed to > this function. That just looks really odd from an abstraction point > of view. Basically, we have a function whose job is to "perform undo > actions," and it gets a flag as an argument that tells it to not > actually perform some of the undo actions: that's odd. I think the > reason it's like that is because of the issue we've been discussing > elsewhere that there's a separate undo request for each category. If > you didn't have that, you wouldn't need to do this here. I'm not > saying that proves that the one-request-per-persistence-level design > is definitely wrong, but this is certainly not a point in its favor, > at least IMHO. > I think we have discussed in detail about one-request-per-persistence-level design and I will investigate it to see if we can make it one-request-per-transaction and if not what are the challenges and can we overcome them without significantly more work and complexity. So for now, I have not changed anything related to this point. Apart from these comments, I have changed a few more things: a. Changed TWOPHASE_MAGIC define as we are changing TwoPhaseFileHeader. b. Fixed comments by Dilip on same patch [1]. I will respond to them separately. c. Fixed the problem reported by Thomas [2] and one similar problem in an error queue noticed by me. I have still not addressed all the comments raised. This is mainly to unblock Thomas's test and share whatever is done until now. I am posting all the patches, but have not modified anything related to undo-log and undo-interface patches (aka from 0001 to 0008). [1] - https://www.postgresql.org/message-id/CAFiTN-tObs5BQZETqK12QuOz7nPSXb90PdG49AzK2ZJ4ts1c5g%40mail.gmail.com [2] - https://www.postgresql.org/message-id/CA%2BhUKGLv016-1y%3DCwx%2Bmme%2BcFRD5Bn03%3D2JVFnRB7JMLsA35%3Dw%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Attachment
- 0001-Move-some-md.c-specific-logic-from-smgr.c-to-md.c.patch
- 0002-Prepare-to-support-multiple-SMGR-implementations.patch
- 0003-Add-undo-log-manager.patch
- 0004-Allow-WAL-record-data-on-first-modification-after-a-.patch
- 0005-Add-prefetch-support-for-the-undo-log.patch
- 0006-Defect-and-enhancement-in-multi-log-support.patch
- 0007-Provide-interfaces-to-store-and-fetch-undo-records.patch
- 0008-undo-page-consistency-checker.patch
- 0008-undo-page-consistency-checker-1.patch
- 0009-Extend-binary-heap-functionality.patch
- 0010-Infrastructure-to-register-and-fetch-undo-action-req.patch
- 0011-Infrastructure-to-execute-pending-undo-actions.patch
- 0012-Allow-foreground-transactions-to-perform-undo-action.patch
- 0013-Allow-execution-and-discard-of-undo-by-background-wo.patch
pgsql-hackers by date: