Re: Reviewing freeze map code - Mailing list pgsql-hackers
From | Masahiko Sawada
---|---
Subject | Re: Reviewing freeze map code
Date |
Msg-id | CAD21AoAqXDbR0TwK42YLJsujK9HqA=pT9O_yRWmyq5WiW9GoaQ@mail.gmail.com
In response to | Re: Reviewing freeze map code (Amit Kapila <amit.kapila16@gmail.com>)
Responses | Re: Reviewing freeze map code
List | pgsql-hackers
On Fri, Jul 8, 2016 at 10:24 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Thu, Jul 7, 2016 at 12:34 PM, Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>> Thank you for reviewing!
>>
>> On Thu, Jul 7, 2016 at 7:06 AM, Andres Freund <andres@anarazel.de> wrote:
>>> On 2016-07-05 23:37:59 +0900, Masahiko Sawada wrote:
>>>> diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
>>>> index 57da57a..fd66527 100644
>>>> --- a/src/backend/access/heap/heapam.c
>>>> +++ b/src/backend/access/heap/heapam.c
>>>> @@ -3923,6 +3923,17 @@ l2:
>>>>
>>>>      if (need_toast || newtupsize > pagefree)
>>>>      {
>>>> +        /*
>>>> +         * To prevent data corruption due to updating old tuple by
>>>> +         * other backends after released buffer
>>>
>>> That's not really the reason, is it? The prime problem is crash safety /
>>> replication. The row-lock we're faking (by setting xmax to our xid)
>>> prevents concurrent updates until this backend dies.
>>
>> Fixed.
>>
>>>> , we need to emit that
>>>> +         * xmax of old tuple is set and clear visibility map bits if
>>>> +         * needed before releasing buffer. We can reuse xl_heap_lock
>>>> +         * for this purpose. It should be fine even if we crash midway
>>>> +         * from this section and the actual updating one later, since
>>>> +         * the xmax will appear to come from an aborted xid.
>>>> +         */
>>>> +        START_CRIT_SECTION();
>>>> +
>>>>          /* Clear obsolete visibility flags ... */
>>>>          oldtup.t_data->t_infomask &= ~(HEAP_XMAX_BITS | HEAP_MOVED);
>>>>          oldtup.t_data->t_infomask2 &= ~HEAP_KEYS_UPDATED;
>>>> @@ -3936,6 +3947,46 @@ l2:
>>>>          /* temporarily make it look not-updated */
>>>>          oldtup.t_data->t_ctid = oldtup.t_self;
>>>>          already_marked = true;
>>>> +
>>>> +        /* Clear PD_ALL_VISIBLE flags */
>>>> +        if (PageIsAllVisible(BufferGetPage(buffer)))
>>>> +        {
>>>> +            all_visible_cleared = true;
>>>> +            PageClearAllVisible(BufferGetPage(buffer));
>>>> +            visibilitymap_clear(relation, BufferGetBlockNumber(buffer),
>>>> +                                vmbuffer);
>>>> +        }
>>>> +
>>>> +        MarkBufferDirty(buffer);
>>>> +
>>>> +        if (RelationNeedsWAL(relation))
>>>> +        {
>>>> +            xl_heap_lock xlrec;
>>>> +            XLogRecPtr recptr;
>>>> +
>>>> +            /*
>>>> +             * For logical decoding we need combocids to properly decode the
>>>> +             * catalog.
>>>> +             */
>>>> +            if (RelationIsAccessibleInLogicalDecoding(relation))
>>>> +                log_heap_new_cid(relation, &oldtup);
>>>
>>> Hm, I don't see that being necessary here. Row locks aren't logically
>>> decoded, so there's no need to emit this here.
>>
>> Fixed.
>>
>>>
>>>> +        /* Clear PD_ALL_VISIBLE flags */
>>>> +        if (PageIsAllVisible(page))
>>>> +        {
>>>> +            Buffer      vmbuffer = InvalidBuffer;
>>>> +            BlockNumber block = BufferGetBlockNumber(*buffer);
>>>> +
>>>> +            all_visible_cleared = true;
>>>> +            PageClearAllVisible(page);
>>>> +            visibilitymap_pin(relation, block, &vmbuffer);
>>>> +            visibilitymap_clear(relation, block, vmbuffer);
>>>> +        }
>>>> +
>>>
>>> That clears all-visible unnecessarily, we only need to clear all-frozen.
>>>
>>
>> Fixed.
>>
>>>
>>>> @@ -8694,6 +8761,23 @@ heap_xlog_lock(XLogReaderState *record)
>>>>      }
>>>>      HeapTupleHeaderSetXmax(htup, xlrec->locking_xid);
>>>>      HeapTupleHeaderSetCmax(htup, FirstCommandId, false);
>>>> +
>>>> +    /* The visibility map need to be cleared */
>>>> +    if ((xlrec->infobits_set & XLHL_ALL_VISIBLE_CLEARED) != 0)
>>>> +    {
>>>> +        RelFileNode rnode;
>>>> +        Buffer      vmbuffer = InvalidBuffer;
>>>> +        BlockNumber blkno;
>>>> +        Relation    reln;
>>>> +
>>>> +        XLogRecGetBlockTag(record, 0, &rnode, NULL, &blkno);
>>>> +        reln = CreateFakeRelcacheEntry(rnode);
>>>> +
>>>> +        visibilitymap_pin(reln, blkno, &vmbuffer);
>>>> +        visibilitymap_clear(reln, blkno, vmbuffer);
>>>> +        PageClearAllVisible(page);
>>>> +    }
>>>> +
>>>
>>>
>>>>      PageSetLSN(page, lsn);
>>>>      MarkBufferDirty(buffer);
>>>>  }
>>>> diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
>>>> index a822d0b..41b3c54 100644
>>>> --- a/src/include/access/heapam_xlog.h
>>>> +++ b/src/include/access/heapam_xlog.h
>>>> @@ -242,6 +242,7 @@ typedef struct xl_heap_cleanup_info
>>>>  #define XLHL_XMAX_EXCL_LOCK      0x04
>>>>  #define XLHL_XMAX_KEYSHR_LOCK    0x08
>>>>  #define XLHL_KEYS_UPDATED        0x10
>>>> +#define XLHL_ALL_VISIBLE_CLEARED 0x20
>>>
>>> Hm. We can't easily do that in the back-patched version; because a
>>> standby won't know to check for the flag. That's kinda ok, since we
>>> don't need to clear all-visible yet at that point of heap_update.
>>> But that better means we don't do so on the master either.
>>>
>>
>> Attached is the latest version of the patch.
>
> + /* Clear only the all-frozen bit on visibility map if needed */
> + if (PageIsAllVisible(BufferGetPage(buffer)) &&
> +     VM_ALL_FROZEN(relation, block, &vmbuffer))
> + {
> +     visibilitymap_clear_extended(relation, block, vmbuffer,
> +                                  VISIBILITYMAP_ALL_FROZEN);
> + }
> +
>
> + if (RelationNeedsWAL(relation))
> + {
> ..
> + XLogBeginInsert();
> + XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
> +
> + xlrec.offnum = ItemPointerGetOffsetNumber(&oldtup.t_self);
> + xlrec.locking_xid = xmax_old_tuple;
> + xlrec.infobits_set = compute_infobits(oldtup.t_data->t_infomask,
> +                                       oldtup.t_data->t_infomask2);
> + XLogRegisterData((char *) &xlrec, SizeOfHeapLock);
> + recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_LOCK);
> ..
>
> One thing that looks awkward in this code is that it doesn't record
> whether the frozen bit is actually cleared during the actual operation,
> and then during replay it always clears the frozen bit irrespective of
> whether it has been cleared by the actual operation or not.
>

I changed it so that we look the all-frozen bit up first, and then clear
it if needed.

> + /* Clear only the all-frozen bit on visibility map if needed */
> + if (PageIsAllVisible(page) &&
> +     VM_ALL_FROZEN(relation, BufferGetBlockNumber(*buffer), &vmbuffer))
> + {
> +     BlockNumber block = BufferGetBlockNumber(*buffer);
> +
> +     visibilitymap_pin(relation, block, &vmbuffer);
>
> I think it is not right to call visibilitymap_pin after holding a
> buffer lock (visibilitymap_pin can perform I/O). Refer to heap_update
> for how to pin the visibility map.
>

Thank you for your advice! Fixed.

Attached are two separate patches; please give me your feedback.

Regards,

--
Masahiko Sawada
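To make Andres's crash-safety point above concrete, here is a condensed sketch of the ordering the patch aims for in heap_update: the faked row lock (xmax) is WAL-logged inside a critical section before the buffer lock is released for TOAST work. This is a simplification of the hunks quoted above, not the committed code; xmax_old_tuple and compute_infobits are names taken from heapam.c and the patch.

/*
 * Sketch: fake a row lock on the old tuple and WAL-log it before
 * releasing the buffer lock.  If we crash between this point and the
 * real update, the xmax simply looks like it came from an aborted xid.
 */
START_CRIT_SECTION();

HeapTupleHeaderSetXmax(oldtup.t_data, xmax_old_tuple);
MarkBufferDirty(buffer);

if (RelationNeedsWAL(relation))
{
    xl_heap_lock xlrec;
    XLogRecPtr   recptr;

    xlrec.offnum = ItemPointerGetOffsetNumber(&oldtup.t_self);
    xlrec.locking_xid = xmax_old_tuple;
    xlrec.infobits_set = compute_infobits(oldtup.t_data->t_infomask,
                                          oldtup.t_data->t_infomask2);

    XLogBeginInsert();
    XLogRegisterBuffer(0, buffer, REGBUF_STANDARD);
    XLogRegisterData((char *) &xlrec, SizeOfHeapLock);
    recptr = XLogInsert(RM_HEAP_ID, XLOG_HEAP_LOCK);
    PageSetLSN(BufferGetPage(buffer), recptr);
}

END_CRIT_SECTION();

/* only now is it safe to drop the lock and go off to toast the tuple */
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);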
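The "look the all-frozen bit up first" approach could look roughly like this on the heap_xlog_lock replay side. This is a hypothetical simplification, not the committed code: visibilitymap_clear_extended and VM_ALL_FROZEN are the names used by the patch, and the fake-relcache handling follows the other heap_xlog_* routines.

/*
 * Sketch: during replay, clear the all-frozen bit only if it is
 * actually set, instead of clearing it unconditionally.
 */
RelFileNode rnode;
BlockNumber blkno;
Relation    reln;
Buffer      vmbuffer = InvalidBuffer;

XLogRecGetBlockTag(record, 0, &rnode, NULL, &blkno);
reln = CreateFakeRelcacheEntry(rnode);

visibilitymap_pin(reln, blkno, &vmbuffer);
if (VM_ALL_FROZEN(reln, blkno, &vmbuffer))
    visibilitymap_clear_extended(reln, blkno, vmbuffer,
                                 VISIBILITYMAP_ALL_FROZEN);

if (BufferIsValid(vmbuffer))
    ReleaseBuffer(vmbuffer);
FreeFakeRelcacheEntry(reln);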
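The visibilitymap_pin ordering Amit points out follows the pattern heap_update itself uses: pin the visibility map page before taking the buffer's content lock, and recheck after acquiring it, since visibilitymap_pin can perform I/O. A minimal sketch of that pattern (variable names simplified):

/*
 * Sketch: pin the visibility map page while not holding the content
 * lock, then recheck once the lock is held.
 */
Buffer vmbuffer = InvalidBuffer;
Page   page = BufferGetPage(buffer);

if (PageIsAllVisible(page))
    visibilitymap_pin(relation, block, &vmbuffer);

LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);

/*
 * The all-visible bit may have been set while we weren't holding the
 * lock; if so, drop the lock, pin, and take the lock again.
 */
if (!BufferIsValid(vmbuffer) && PageIsAllVisible(page))
{
    LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
    visibilitymap_pin(relation, block, &vmbuffer);
    LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
    /* re-verify the page state here before relying on it */
}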