Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) - Mailing list pgsql-hackers
| From | Kirill Reshke |
|---|---|
| Subject | Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) |
| Date | |
| Msg-id | CALdSSPjvhGXihT_9f-GJabYU=_PjrFDUxYaURuTbfLyQM6TErg@mail.gmail.com Whole thread Raw |
| In response to | Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) (Melanie Plageman <melanieplageman@gmail.com>) |
| List | pgsql-hackers |
On Sat, 20 Dec 2025 at 02:10, Melanie Plageman <melanieplageman@gmail.com> wrote: > > Attached v29 addresses some feedback and also corrects a small error > with the assertion I had added in the previous version's 0009. > > On Thu, Dec 18, 2025 at 10:38 PM Xuneng Zhou <xunengzhou@gmail.com> wrote: > > > > I’ve done a basic review of patches 1 and 2. Here are some comments > > which may be somewhat immature, as this is a fairly large change set > > and I’m new to some parts of the code. > > > > 1) Potential stale old_vmbits after VM repair n v2 > > Good catch! I've fixed this in attached v29. > > > 2) Add Assert(BufferIsDirty(buf)) > > > > Since the patch's core claim is "buffer must be dirty before WAL > > registration", an assertion encodes this invariant. Should we add: > > > > Assert(BufferIsValid(buf)); > > Assert(BufferIsDirty(buf)); > > > > right before the visibilitymap_set() call? > > There are already assertions that will trip in various places -- most > importantly in XLogRegisterBuffer(), which is the one that inspired > this refactor. > > > The comment at lines: > > > "The only scenario where it is not already dirty is if the VM was removed…" > > > > This phrasing could become misleading after future refactors. Can we > > make it more direct like: > > > > > "We must mark the heap buffer dirty before calling visibilitymap_set(), because it may WAL-log the buffer and XLogRegisterBuffer()requires it." > > I see your point about future refactors missing updating comments like > this. But, I don't think we are going to refactor the code such that > we can have PD_ALL_VISIBLE set without the VM bits set more often. > Also, it is common practice in Postgres to describe very specific edge > cases or odd scenarios in order to explain code that may seem > confusing without the comment. It does risk that comment later > becoming stale, but it is better that future developers understand why > the code is there. > > That being said, I take your point that the comment is confusing. I > have updated it in a different way. > > > > "Even if PD_ALL_VISIBLE is already set, we don't need to worry about unnecessarily dirtying the heap buffer, as itmust be marked dirty before adding it to the WAL chain. The only scenario where it is not already dirty is if the VM wasremoved..." > > > > In this test we now call MarkBufferDirty() on the heap page even when > > only setting the VM, so the comments claiming “does not need to modify > > the heap buffer”/“no heap page modification” might be misleading. It > > might be better to say the test doesn’t need to modify heap > > tuples/page contents or doesn’t need to prune/freeze. > > The point I'm trying to make is that we have to dirty the buffer even > if we don't modify the page because of the XLOG sub-system > requirements. And, it may seem like a waste to do that if not > modifying the page, but the page will rarely be clean anyway. I've > tried to make this more clear in attached v29. > > - Melanie Hi! I checked v29-0009, about HeapTupleSatisfiesVacuumHorizon. Origins of this code track down to fdf9e21196a6 which was committed as part of [0], at which point there was no HeapTupleSatisfiesVacuumHorizon function. I guess this is the reason this optimization was not performed earlier. I also think this patch is correct, because we do similar things for HEAPTUPLE_DEAD & HEAPTUPLE_RECENTLY_DEAD, and HeapTupleSatisfiesVacuumHorizon is just a proxy to HeapTupleSatisfiesVacuumHorizon with only difference in DEAD VS RECENTLY_DEAD handling. Similar change could be done at heapam_scan_analyze_next_tuple ... case HEAPTUPLE_DEAD: case HEAPTUPLE_RECENTLY_DEAD: /* Count dead and recently-dead rows */ *deadrows += 1; break; ... [0] https://www.postgresql.org/message-id/CABOikdP0meGuXPPWuYrP%3DvDvoqUdshF2xJAzZHWSKg03Rz_%2B9Q%40mail.gmail.com -- Best regards, Kirill Reshke
pgsql-hackers by date: