Re: [HACKERS] WAL consistency check facility - Mailing list pgsql-hackers
From | Kuntal Ghosh |
---|---|
Subject | Re: [HACKERS] WAL consistency check facility |
Date | |
Msg-id | CAGz5QCKa_n_qV-p_zf7b9HS0gjei=e18NiKurfrg-v9NRBinpg@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] WAL consistency check facility (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: [HACKERS] WAL consistency check facility
|
List | pgsql-hackers |
On Tue, Jan 31, 2017 at 9:36 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Jan 5, 2017 at 12:54 AM, Kuntal Ghosh > <kuntalghosh.2007@gmail.com> wrote: >> I've attached the patch with the modified changes. PFA. > > Can this patch check contrib/bloom? > Added support for generic resource manager type. > + /* > + * Mask some line pointer bits, particularly those marked as > + * used on a master and unused on a standby. > + */ > > Comment doesn't explain why we need to do this. > Added comments. > + /* > + * For GIN_DELETED page, the page is initialized to empty. > + * Hence mask everything. > + */ > + if (opaque->flags & GIN_DELETED) > + memset(page_norm, MASK_MARKER, BLCKSZ); > + else > + mask_unused_space(page_norm); > > If the page is initialized to empty, why do we need to mask > anything/everything? Anyway, it doesn't seem right to mask the > GinPageOpaque structure itself. Or at least it doesn't seem right to > mask the flags word. > Modified accordingly. > > + if (!HeapTupleHeaderXminFrozen(page_htup)) > + page_htup->t_infomask |= HEAP_XACT_MASK; > + else > + page_htup->t_infomask |= HEAP_XMAX_COMMITTED | > HEAP_XMAX_INVALID; > > Comment doesn't address this logic. Also, the "else" case shouldn't > exist at all, I think. Added comments. I think that "Else" part is needed for xmax. > > + /* > + * For a speculative tuple, the content of t_ctid is conflicting > + * between the backup page and current page. Hence, we set it > + * to the current block number and current offset. > + */ > > Why does it differ? Is that a bug? > Added comments. > > + /* > + * During replay of a btree page split, we don't set the BTP_SPLIT_END > + * flag of the right sibling and initialize th cycle_id to 0 for the same > + * page. > + */ > > A reference to the name of the redo function might be helpful here > (and in some other places), unless it's just ${AMNAME}_redo. "th" is > a typo for "the". Corrected. > + appendStringInfoString(buf, " (full page > image, apply)"); > + else > + appendStringInfoString(buf, " (full page image)"); > > How about "(full page image)" and "(full page image, for WAL verification)"? > > Similarly in XLogDumpDisplayRecord, I think we should assume that > "FPW" means what it has always meant, and leave that output alone. > Instead, distinguish the WAL-consistency-checking case when it > happens, e.g. "(FPW for consistency check)". > Corrected. > +checkConsistency(XLogReaderState *record) > > How about CheckXLogConsistency()? > > + * If needs_backup is true or wal consistency check is enabled for > > ...or WAL checking is enabled... > > + * If WAL consistency is enabled for the resource manager of > > If WAL consistency checking is enabled... > > + * Note: when a backup block is available in XLOG with BKPIMAGE_APPLY flag > > with the BKPIMAGE_APPLY flag Modified accordingly. > > - * In RBM_ZERO_* modes, if the page doesn't exist, the relation is extended > - * with all-zeroes pages up to the referenced block number. In > - * RBM_ZERO_AND_LOCK and RBM_ZERO_AND_CLEANUP_LOCK modes, the return value > + * In RBM_ZERO_* modes, if the page doesn't exist or BKPIMAGE_APPLY flag > + * is not set for the backup block, the relation is extended with all-zeroes > + * pages up to the referenced block number. > > OK, I'm puzzled by this. Surely we don't want the WAL consistency > checking facility to cause the relation to be extended like this. And > I don't see why it would, because the WAL consistency checking happens > after the page changes have already been applied. I wonder if this > hunk is in error and should be dropped. > Dropped comments. > + PageXLogRecPtrSet(phdr->pd_lsn, PG_UINT64_MAX); > + phdr->pd_prune_xid = PG_UINT32_MAX; > + phdr->pd_flags |= PD_PAGE_FULL | PD_HAS_FREE_LINES; > + phdr->pd_flags |= PD_ALL_VISIBLE; > +#define MASK_MARKER 0xFF > (and many others) > > Why do we mask by setting bits rather than clearing bits? My > intuition would have been to zero things we want to mask, rather than > setting them to one. > Modified accordingly so that we can zero things during masking instead of setting them to one. > + { > + newwalconsistency[i] = true; > + } > > Superfluous braces. > Corrected. > + /* > + * Leave if no masking functions defined, this is possible in the case > + * resource managers generating just full page writes, comparing an > + * image to itself has no meaning in those cases. > + */ > + if (RmgrTable[rmid].rm_mask == NULL) > + return; > > ...and also... > > + /* > + * This feature is enabled only for the resource managers where > + * a masking function is defined. > + */ > + for (i = 0; i <= RM_MAX_ID; i++) > + { > + if (RmgrTable[i].rm_mask != NULL) > > Why do we assume that the feature is only enabled for RMs that have a > mask function? Why not instead assume that if there's no masking > function, no masking is required? I've introduced a function in rmgr.c, named consistencyCheck_is_enabled, which returns true if wal_consistency_checking is enabled for a resource manager. It does not have any dependency on the masking function. If a masking function is defined, then we mask the page before consistency check. However, I'm not sure whether rmgr.c is the right place to define the function consistencyCheck_is_enabled. > > + /* Definitely not an individual resource manager. Check for 'all'. */ > + if (pg_strcasecmp(tok, "all") == 0) > > It seems like it might be cleaner to check for "all" first, and then > check for individual RMs afterward. > Done. > + /* > + * Parameter should contain either 'all' or a combination of resource > + * managers. > + */ > + if (isAll && isRmgrId) > + { > + GUC_check_errdetail("Invalid value combination"); > + return false; > + } > > That error message is not very clear, and I don't see why we even need > to check this. If someone sets wal_consistency_checking = hash, all, > let's just set it for all and the fact that hash is also set won't > matter to anything. > Modified accordingly. > + void (*rm_mask) (char *page, BlockNumber blkno); > > Could the page be passed as type "Page" rather than a "char *" to make > things more convenient for the masking functions? If not, could those > functions at least do something like "Page page = (Page) pagebytes;" > rather than "Page page_norm = (Page) page;"? > Modified it as "Page tempPage = (Page) page;" > + /* > + * Read the contents from the current buffer and store it in a > + * temporary page. > + */ > + buf = XLogReadBufferExtended(rnode, forknum, blkno, > + RBM_NORMAL); > + if (!BufferIsValid(buf)) > + continue; > + > + new_page = BufferGetPage(buf); > + > + /* > + * Read the contents from the backup copy, stored in WAL record > + * and store it in a temporary page. There is not need to allocate > + * a new page here, a local buffer is fine to hold its contents and > + * a mask can be directly applied on it. > + */ > + if (!RestoreBlockImage(record, block_id, old_page_masked)) > + elog(ERROR, "failed to restore block image"); > + > + /* > + * Take a copy of the new page where WAL has been applied to have > + * a comparison base before masking it... > + */ > + memcpy(new_page_masked, new_page, BLCKSZ); > + > + /* No need for this page anymore now that a copy is in */ > + ReleaseBuffer(buf); > > The order of operations is strange here. We read the "new" page, > holding the pin (but no lock?). Then we restore the block image into > old_page_masked. Now we copy the new page and release the pin. It > would be better, ISTM, to rearrange that so that we finish with the > new page and release the pin before dealing with the old page. Also, > I think we need to actually lock the buffer before copying it. Maybe > that's not strictly necessary since this is all happening on the > standby, but it seems like a bad idea to set the precedent that you > can read a page without taking the content lock. > Modified accordingly. > I think the "new" and "old" page terminology is kinda weird too. > Maybe we should call them the "replay image" and the "master image" or > something like that. A few more comments wouldn't hurt either. > Done. > + * Also mask the all-visible flag. > + * > + * XXX: It is unfortunate that we have to do this. If the flag is set > + * incorrectly, that's serious, and we would like to catch it. If the flag > + * is cleared incorrectly, that's serious too. But redo of HEAP_CLEAN > + * records don't currently set the flag, even though it is set in the > + * master, so we must silence failures that that causes. > + */ > + phdr->pd_flags |= PD_ALL_VISIBLE; > > I'm puzzled by the reference to HEAP_CLEAN. The thing that might set > the all-visible bit is XLOG_HEAP2_VISIBLE, not XLOG_HEAP2_CLEAN. > Unless I'm missing something, there's no situation in which > XLOG_HEAP2_CLEAN might be associated with setting PD_ALL_VISIBLE. > Also, XLOG_HEAP2_VISIBLE records do SOMETIMES set the bit, just not > always. And there's a good reason for that, which is explained in > this comment: > > * We don't bump the LSN of the heap page when setting the visibility > * map bit (unless checksums or wal_hint_bits is enabled, in which > * case we must), because that would generate an unworkable volume of > * full-page writes. This exposes us to torn page hazards, but since > * we're not inspecting the existing page contents in any way, we > * don't care. > * > * However, all operations that clear the visibility map bit *do* bump > * the LSN, and those operations will only be replayed if the XLOG LSN > * follows the page LSN. Thus, if the page LSN has advanced past our > * XLOG record's LSN, we mustn't mark the page all-visible, because > * the subsequent update won't be replayed to clear the flag. > > So I think this comment needs to be rewritten with a bit more nuance. > Corrected. > +extern void mask_unused_space(Page page); > +#endif > > Missing newline. > Done. Thank you Robert for the review. Please find the updated patch in the attachment. Thanks to Amit Kapila and Dilip Kumar for their suggestions in offline discussions. -- Thanks & Regards, Kuntal Ghosh EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
pgsql-hackers by date: