Re: [HACKERS] Speedup twophase transactions - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: [HACKERS] Speedup twophase transactions |
Date | |
Msg-id | CAB7nPqSjrtRQMYnFKXCnjYcO=ni2VH1C1hDcuOo+_O2qNkmtNw@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Speedup twophase transactions (Nikhil Sontakke <nikhils@2ndquadrant.com>) |
Responses |
Re: [HACKERS] Speedup twophase transactions
|
List | pgsql-hackers |
On Sat, Mar 11, 2017 at 7:26 PM, Nikhil Sontakke <nikhils@2ndquadrant.com> wrote: > Hi David and Michael, >> It would be great to get this thread closed out after 14 months and many >> commits. > > PFA, latest patch which addresses Michael's comments. Thanks for the new version. Let's head toward a final patch. >> + ereport(WARNING, >> + (errmsg("removing future two-phase state data from >> memory \"%u\"", >> + xid))); >> + PrepareRedoRemove(xid); >> + continue >> Those are not normal (partially because unlink is atomic, but not >> durable)... But they match the correct coding pattern regarding >> incorrect 2PC entries... I'd really like to see those switched to a >> FATAL with unlink() made durable for those calls. > > Hmm, not sure what exactly we need to do here. If you look at the prior > checks, there we already skip on-disk entries. So, typically, the entries > that we encounter here will be in shmem only. As long as we don't have an alternative to offer a durable unlink, let's do nothing then. This is as well consistent with the other code paths handling corrupted or incorrect 2PC entries. >> + /* Deconstruct header */ >> + hdr = (TwoPhaseFileHeader *) buf; >> + Assert(TransactionIdEquals(hdr->xid, xid)); >> + >> + if (TransactionIdPrecedes(xid, result)) >> + result = xid; >> This portion is repeated three times and could be easily refactored. >> You could just have a routine that returns the oldes transaction ID >> used, and ignore the result for StandbyRecoverPreparedTransactions() >> by casting a (void) to let any kind of static analyzer understand that >> we don't care about the result for example. Handling for subxids is >> necessary as well depending on the code path. Spliting things into a >> could of sub-routines may be more readable as well. There are really a >> couple of small parts that can be gathered and strengthened. > > Have added a new function to do this now. It reads either from disk or > shared memory and produces error/log messages accordingly as well now. And that's ProcessTwoPhaseBufferAndReturn in the patch. ProcessTwoPhaseBuffer may be a better name. >> + /* >> + * Recreate its GXACT and dummy PGPROC >> + */ >> + gxactnew = MarkAsPreparing(xid, gid, >> + hdr->prepared_at, >> + hdr->owner, hdr->database, >> + gxact->prepare_start_lsn, >> + gxact->prepare_end_lsn); >> MarkAsPreparing() does not need to be extended with two new arguments. >> RecoverPreparedTransactions() is used only at the end of recovery, >> where it is not necessary to look at the 2PC state files from the >> records. In this code path inredo is also set to false :) > > That's not true. We will have entries with inredo set at the end of recovery > as well. Infact the MarkAsPreparing() call from > RecoverPreparedTransactions() is the one which will remove these inredo > entries and convert them into regular entries. We have optimized the > recovery code path as well. > >> + /* Delete TwoPhaseState gxact entry and/or 2PC file. */ >> + PrepareRedoRemove(parsed.twophase_xid); >> Both things should not be present, no? If the file is pushed to disk >> it means that the checkpoint horizon has already moved. >> > PREPARE in redo, followed by a checkpoint, followed by a COMMIT/ROLLBACK. We > can have both the bits set in this case. Oh, I see where our thoughts don't overlap. I actually thought that the shared memory entry and the on-disk file cannot co-exist (or if you want a file flushed at checkpoint should have its shmem entry removed). But you are right and I am wrong. In order to have the error handling done properly if the maximum amount of 2PC transactions is reached. Still.... >> - ereport(ERROR, >> + /* It's ok to find an entry in the redo/recovery case */ >> + if (!gxact->inredo) >> + ereport(ERROR, >> (errcode(ERRCODE_DUPLICATE_OBJECT), >> errmsg("transaction identifier \"%s\" is already in >> use", >> gid))); >> + else >> + { >> + found = true; >> + break; >> + } >> I would not have thought so. > > Since we are using the TwoPhaseState structure to track redo entries, at end > of recovery, we will find existing entries. Please see my comments above for > RecoverPreparedTransactions() This is absolutely not good, because it is a direct result of the interactions of the first loop of RecoverPreparedTransaction() with its second loop, and because MarkAsPreparing() can finished by being called *twice* from the same transaction. I really think that this portion should be removed and that RecoverPreparedTransactions() should be more careful when scanning the entries in pg_twophase by looking up at what exists as well in shared memory, instead of doing that in MarkAsPreparing(). Here are some more comments: + /* + * Recreate its GXACT and dummy PGPROC + */ + gxactnew = MarkAsPreparing(xid, gid, + hdr->prepared_at, + hdr->owner, hdr->database, + gxact->prepare_start_lsn, + gxact->prepare_end_lsn); + + Assert(gxactnew == gxact); Here it would be better to set ondisk to false. This makes the code more consistent with the previous loop, and the intention clear. The first loop of RecoverPreparedTransactions() has a lot in common with its second loop. You may want to refactor a little bit more here. +/* + * PrepareRedoRemove + * + * Remove the corresponding gxact entry from TwoPhaseState. Also + * remove the 2PC file. + */ This could be a bit more expanded. The removal of the 2PC does not happen after removing the in-memory data, it would happen if the in-memory data is not found. +MarkAsPreparingInRedo(TransactionId xid, const char *gid, + TimestampTz prepared_at, Oid owner, Oid databaseid, + XLogRecPtr prepare_start_lsn, XLogRecPtr prepare_end_lsn) +{ + GlobalTransaction gxact; + + LWLockAcquire(TwoPhaseStateLock, LW_EXCLUSIVE); MarkAsPreparingInRedo is internal to twophase.c. There is no need to expose it externally and it is just used in PrepareRedoAdd so you could just group both. bool valid; /* TRUE if PGPROC entry is in proc array */ bool ondisk; /* TRUE if preparestate file is on disk */ + bool inredo; /* TRUE if entry was added via xlog_redo */ We could have a set of flags here, that's the 3rd boolean of the structure used for a status. -- Michael
pgsql-hackers by date: