Re: Undo logs - Mailing list pgsql-hackers

From: Dilip Kumar
Subject: Re: Undo logs
Msg-id: CAFiTN-s112o6KLi_gVX=b-vO9t_2_ufxxRxG6Ff1iDx5Knvohw@mail.gmail.com
In response to: Re: Undo logs (Amit Kapila <amit.kapila16@gmail.com>)
Responses: Re: Undo logs
List: pgsql-hackers
On Sat, Dec 1, 2018 at 12:58 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Nov 29, 2018 at 6:27 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Mon, Nov 26, 2018 at 2:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > 10.
> > > + if (!UndoRecPtrIsValid(multi_prep_urp))
> > > + urecptr = UndoRecordAllocateMulti(urec, 1, upersistence, txid);
> > > + else
> > > + urecptr = multi_prep_urp;
> > > +
> > > + size = UndoRecordExpectedSize(urec);
> > > ..
> > > ..
> > > + if (UndoRecPtrIsValid(multi_prep_urp))
> > > + {
> > > + UndoLogOffset insert = UndoRecPtrGetOffset(multi_prep_urp);
> > > + insert = UndoLogOffsetPlusUsableBytes(insert, size);
> > > + multi_prep_urp = MakeUndoRecPtr(UndoRecPtrGetLogNo(urecptr), insert);
> > > + }
> > >
> > > Can't we use urecptr instead of multi_prep_urp in the above code after
> > > urecptr is initialized?
> >
> > I think we can't, because urecptr is just the current pointer we are
> > going to return, but multi_prep_urp is the static variable we need to
> > update so that the next prepare can calculate urecptr from this location.
>
> Okay, but that was not apparent from the code, so I added a comment in
> the attached delta patch. BTW, wouldn't it be better to move this code
> to the end of the function, once the prepare for the current record is
> complete?
>
> More comments
> ----------------------------
> 1.
> * We can consider that the log as switched if
>
> /that/ needs to be removed.
>
> 2.
> + if (prepare_idx == max_prepared_undo)
> + elog(ERROR, "Already reached the maximum prepared limit.");
>
> a. /Already/already
> b. we don't use a full stop (.) in error messages

Merged your change.

> 3.
> + * also stores the dbid and the progress of the undo apply during rollback.
>
> /the progress/ extra space.

Merged your change.

> 4.
> +UndoSetPrepareSize(int nrecords, UnpackedUndoRecord *undorecords,
> + TransactionId xid, UndoPersistence upersistence)
> +{
>
> nrecords should be the second parameter.
Merged your change.

> 5.
> +UndoRecPtr
> +PrepareUndoInsert(UnpackedUndoRecord *urec, UndoPersistence upersistence,
> + TransactionId xid)
>
> It seems better to have the xid parameter before UndoPersistence.

Merged your change. Done the same change in UndoRecordAllocate as well.

> 6.
> + /* FIXME: Should we just report error ? */
> + Assert(index < MAX_BUFFER_PER_UNDO);
>
> No need of this Fixme.

Merged your change.

> 7.
> PrepareUndoInsert()
> {
> ..
> do
> {
> ..
> + /* Undo record can not fit into this block so go to the next block. */
> + cur_blk++;
> ..
> } while (cur_size < size);
> ..
> }
>
> This comment was making me uneasy, so I slightly adjusted the code.
> Basically, by that time it was not decided whether the undo record
> can fit in the current buffer or not.

Merged your change.

> 8.
> + /*
> + * Save referenced of undo record pointer as well as undo record.
> + * InsertPreparedUndo will use these to insert the prepared record.
> + */
> + prepared_undo[prepare_idx].urec = urec;
> + prepared_undo[prepare_idx].urp = urecptr;
>
> Slightly adjust the above comment.

Merged your change.

> 9.
> +InsertPreparedUndo(void)
> {
> ..
> + Assert(prepare_idx > 0);
> +
> + /* This must be called under a critical section. */
> + Assert(InRecovery || CritSectionCount > 0);
> ..
> }
>
> I have added a few more comments for the above assertions; see if
> those are correct.

Merged your change.

> 10.
> +InsertPreparedUndo(void)
> {
> ..
> + prev_undolen = undo_len;
> +
> + UndoLogSetPrevLen(UndoRecPtrGetLogNo(urp), prev_undolen);
> ..
> }
>
> There is no need to use an additional variable prev_undolen in the
> above code. I have modified the code to remove its usage; check if
> that is correct.

Looks fine to me.

> 11.
> + * Fetch the next undo record for given blkno, offset and transaction id (if
> + * valid). We need to match transaction id along with block number and offset
> + * because in some cases (like reuse of slot for committed transaction), we
> + * need to skip the record if it is modified by a transaction later than the
> + * transaction indicated by previous undo record. For example, consider a
> + * case where tuple (ctid - 0,1) is modified by transaction id 500 which
> + * belongs to transaction slot 0. Then, the same tuple is modified by
> + * transaction id 501 which belongs to transaction slot 1. Then, both the
> + * transaction slots are marked for reuse. Then, again the same tuple is
> + * modified by transaction id 502 which has used slot 0. Now, some
> + * transaction which has started before transaction 500 wants to traverse the
> + * chain to find visible tuple will keep on rotating infinitely between undo
> + * tuple written by 502 and 501. In such a case, we need to skip the undo
> + * tuple written by transaction 502 when we want to find the undo record
> + * indicated by the previous pointer of undo tuple written by transaction 501.
> + * Start the search from urp. Caller need to call UndoRecordRelease to
> + * release the resources allocated by this function.
> + *
> + * urec_ptr_out is undo record pointer of the qualified undo record if valid
> + * pointer is passed.
> + */
> +UnpackedUndoRecord *
> +UndoFetchRecord(UndoRecPtr urp, BlockNumber blkno, OffsetNumber offset,
> + TransactionId xid, UndoRecPtr * urec_ptr_out,
> + SatisfyUndoRecordCallback callback)
>
> The comment above UndoFetchRecord is very zheap specific, so I have
> tried to simplify it. I think we can give such detailed examples only
> when we introduce the zheap code.

Makes sense.

> Apart from the above, there are miscellaneous comments and minor code
> edits in the attached delta patch.

I have merged your changes.

> 12.
> PrepareUndoInsert()
> {
> ..
> + /*
> + * If this is the first undo record for this top transaction add the
> + * transaction information to the undo record.
> + *
> + * XXX there is also an option that instead of adding the information to
> + * this record we can prepare a new record which only contains transaction
> + * information.
> + */
> + if (xid == InvalidTransactionId)
>
> The above comment seems to be out of place; we are doing nothing like
> that here. This work is done in UndoRecordAllocate, so maybe you can
> move the 'XXX ..' part of the comment into that function.

Done.

> 13.
> PrepareUndoInsert()
> {
> ..
> if (!UndoRecPtrIsValid(prepared_urec_ptr))
> + urecptr = UndoRecordAllocate(urec, 1, upersistence, txid);
> + else
> + urecptr = prepared_urec_ptr;
> +
> + size = UndoRecordExpectedSize(urec);
> ..
>
> I think we should make the above code a bit more bulletproof. As it is
> written, there is no guarantee the size we have allocated is the same
> as we are using in this function. How about if we take 'size' as an
> output parameter from UndoRecordAllocate and then use it in this
> function? Additionally, we can have an Assert that the size returned
> by UndoRecordAllocate is the same as UndoRecordExpectedSize.

I agree. With this change we will be able to guarantee the size when we
are allocating a single undo record, but multi-prepare will still be a
problem. I haven't fixed this as of now; I will think about how to
handle both cases, when we prepare one at a time and when we allocate
once and prepare multiple times.

> 14.
> +
> +void
> +RegisterUndoLogBuffers(uint8 first_block_id)
> +{
> + int idx;
> + int flags;
> +
> + for (idx = 0; idx < buffer_idx; idx++)
> + {
> + flags = undo_buffer[idx].zero ?
> REGBUF_WILL_INIT : 0;
> + XLogRegisterBuffer(first_block_id + idx, undo_buffer[idx].buf, flags);
> + }
> +}
> +
> +void
> +UndoLogBuffersSetLSN(XLogRecPtr recptr)
> +{
> + int idx;
> +
> + for (idx = 0; idx < buffer_idx; idx++)
> + PageSetLSN(BufferGetPage(undo_buffer[idx].buf), recptr);
> +}
>
> A one-line comment atop each of these functions will be good. It will
> be better if we place these functions at the end of the file or
> someplace else; right now they sit between the prepare* and insert*
> functions, which makes the code flow a bit awkward.

Done.

> 15.
> + * and mark them dirty. For persistent undo, this step should be performed
> + * after entering a critical section; it should never fail.
> + */
> +void
> +InsertPreparedUndo(void)
> +{
>
> Why only for persistent undo should this step be performed in a
> critical section? I think as this function operates on shared buffers,
> even for unlogged undo, it should be done in a critical section.

I think we can remove this comment? I have removed it in the current
patch.

> 16.
> +InsertPreparedUndo(void)
> {
> ..
> /* if starting a new log then there is no prevlen to store */
> + if (offset == UndoLogBlockHeaderSize)
> + {
> + if (log->meta.prevlogno != InvalidUndoLogNumber)
> + {
> + UndoLogControl *prevlog = UndoLogGet(log->meta.prevlogno, false);
> +
> + uur->uur_prevlen = prevlog->meta.prevlen;
> + }
> ..
> }
>
> The comment here suggests that for new logs, we don't need prevlen,
> but still, in one case you are maintaining the length; can you add a
> few comments to explain why?

Done.

> 17.
> +UndoGetOneRecord(UnpackedUndoRecord *urec, UndoRecPtr urp, RelFileNode rnode,
> + UndoPersistence persistence)
> {
> ..
> + /*
> + * FIXME: This can be optimized to just fetch header first and only if
> + * matches with block number and offset then fetch the complete
> + * record.
> + */
> + if (UnpackUndoRecord(urec, page, starting_byte, &already_decoded, false))
> + break;
> ..
> }
>
> I don't know how much it matters whether we fetch the complete record
> or just its header, unless the record is big or it falls across two
> pages. I think both are boundary cases, and I couldn't see this part
> much in perf profiles. There is nothing to fix here; if you want, you
> can add an XXX comment or maybe suggest it as a future optimization.

Changed.

> 18.
> +UndoFetchRecord()
> {
> ...
> + /*
> + * Prevent UndoDiscardOneLog() from discarding data while we try to
> + * read it. Usually we would acquire log->mutex to read log->meta
> + * members, but in this case we know that discard can't move without
> + * also holding log->discard_lock.
> + */
> + LWLockAcquire(&log->discard_lock, LW_SHARED);
> + if (!UndoRecPtrIsValid(log->oldest_data))
> + {
> + /*
> + * UndoDiscardInfo is not yet initialized. Hence, we've to check
> + * UndoLogIsDiscarded and if it's already discarded then we have
> + * nothing to do.
> + */
> + LWLockRelease(&log->discard_lock);
> + if (UndoLogIsDiscarded(urp))
> + {
> + if (BufferIsValid(urec->uur_buffer))
> + ReleaseBuffer(urec->uur_buffer);
> + return NULL;
> + }
> +
> + LWLockAcquire(&log->discard_lock, LW_SHARED);
> + }
> +
> + /* Check if it's already discarded. */
> + if (urp < log->oldest_data)
> + {
> + LWLockRelease(&log->discard_lock);
> + if (BufferIsValid(urec->uur_buffer))
> + ReleaseBuffer(urec->uur_buffer);
> + return NULL;
> + }
> ..
> }
>
> Can't we replace this logic with UndoRecordIsValid?

Done.

> 19.
> UndoFetchRecord()
> {
> ..
> while(true)
> {
> ..
> /*
> + * If we have a valid buffer pinned then just ensure that we want to
> + * find the next tuple from the same block. Otherwise release the
> + * buffer and set it invalid.
> + */
> + if (BufferIsValid(urec->uur_buffer))
> + {
> + /*
> + * Undo buffer will be changed if the next undo record belongs to
> + * a different block or undo log.
> + */
> + if (UndoRecPtrGetBlockNum(urp) != BufferGetBlockNumber(urec->uur_buffer) ||
> + (prevrnode.relNode != rnode.relNode))
> + {
> + ReleaseBuffer(urec->uur_buffer);
> + urec->uur_buffer = InvalidBuffer;
> + }
> + }
> + else
> + {
> + /*
> + * If there is not a valid buffer in urec->uur_buffer that means
> + * we had copied the payload data and tuple data so free them.
> + */
> + if (urec->uur_payload.data)
> + pfree(urec->uur_payload.data);
> + if (urec->uur_tuple.data)
> + pfree(urec->uur_tuple.data);
> + }
> +
> + /* Reset the urec before fetching the tuple */
> + urec->uur_tuple.data = NULL;
> + urec->uur_tuple.len = 0;
> + urec->uur_payload.data = NULL;
> + urec->uur_payload.len = 0;
> + prevrnode = rnode;
> ..
> }
>
> Can't we move this logic after getting a record with UndoGetOneRecord
> and matching with a callback? This is certainly required after the
> first record, so it looks a bit odd here. Also, if possible, can we
> move it to a separate function, as this is not the main logic and it
> makes the main logic difficult to follow.

Fixed.

Apart from fixing these comments, I have also rebased Thomas' undo log
patches on the current head.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com