Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access) - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)
Date
Msg-id CAAKRu_YxD3UC3BXxS55jPjBC_yj_vn3FVoLvBMwQuHXGDXacGg@mail.gmail.com
In response to Re: eliminate xl_heap_visible to reduce WAL (and eventually set VM on-access)  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Thanks for the review! I've made the changes to comments and minor
fixes you suggested in attached v13 and have limited my inline
responses to areas where further discussion is required.

On Tue, Sep 9, 2025 at 3:26 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> 0003. What you've done here with xl_heap_prune.flags is kind of
> horrifying. The problem is that, while you've added code explaining
> that VISIBILITYMAP_ALL_{VISIBLE,FROZEN} are honorary XLHP flags,
> nobody who isn't looking directly at that comment is going to
> understand the muddling of the two namespaces. I would suggest not
> doing this, even if it means defining redundant constants and writing
> technically-unnecessary code to translate between them.

Fair. I've introduced new XLHP flags in attached v13. Hopefully it
puts an end to the horror.

> 0004. It is not clear to me why you need to get
> log_heap_prune_and_freeze to do the work here. Why can't
> log_newpage_buffer get the job done already?

Well, I need something to emit the changes to the VM. I'm eliminating
all users of xl_heap_visible. Empty pages are the ones that benefit
the least from switching from xl_heap_visible -> xl_heap_prune. But,
if I don't transition them, we have to maintain all the
xl_heap_visible code (including visibilitymap_set() in its long form).

As for log_newpage_buffer(): if you think it is too confusing to change log_heap_prune_and_freeze()'s API (by passing force_heap_fpi) to handle this case, I can leave log_newpage_buffer() there and then call log_heap_prune_and_freeze().

I just thought it seemed simple to avoid emitting the new page record
and the VM update record, so why not -- but I don't have strong
feelings.

> 0005. It looks a little curious that you delete the
> identify-corruption logic from the end of the if-nest and add it to
> the beginning. Ceteris paribus, you'd expect that to be worse, since
> corruption is a rare case.

On master, the two corruption cases are sandwiched between the normal
VM-set cases. And I actually think doing it that way is brittle. If
you put the cases which set the VM first, you have to completely
bulletproof the if statements guarding them, to foreclose any possible
corruption case from entering, because otherwise you will overwrite
the very corruption you then try to detect.

But, specifically, from a performance perspective:

I think moving up the third case doesn't matter because the check is so cheap:

    else if (presult.lpdead_items > 0 && PageIsAllVisible(page))

And as for moving up the second case (the other corruption case), the
non-cheap thing it does is call visibilitymap_get_status()

    else if (all_visible_according_to_vm && !PageIsAllVisible(page) &&
             visibilitymap_get_status(vacrel->rel, blkno, &vmbuffer) != 0)

But once you call visibilitymap_get_status() once, assuming there is
no corruption and you need to go set the VM, you've already got that
page of the VM read, so it is probably pretty cheap. Overall, I didn't
think this would add noticeable overhead or many wasted operations.

And I thought that reorganizing the code improved clarity as well as
decreased the likelihood of bugs from insufficiently guarding positive
cases against corrupt pages and overwriting corruption instead of
detecting it.

If we're really worried about it from a performance perspective, I
could add an extra test at the top of identify_and_fix_vm_corruption()
that bails out early if (!all_visible_according_to_vm &&
presult.all_visible).

> +                * If the heap page is all-visible but the VM bit is not set,
> +                * we don't need to dirty the heap page.  However, if checksums
> +                * are enabled, we do need to make sure that the heap page is
> +                * dirtied before passing it to visibilitymap_set(), because it
> +                * may be logged.
>                  */
> -               PageSetAllVisible(page);
> -               MarkBufferDirty(buf);
> +               if (!PageIsAllVisible(page) || XLogHintBitIsNeeded())
> +               {
> +                       PageSetAllVisible(page);
> +                       MarkBufferDirty(buf);
> +               }
>
> I really hate this. Maybe you're going to argue that it's not the job
> of this patch to fix the awfulness here, but surely marking a buffer
> dirty in case some other function decides to WAL-log it is a
> ridiculous plan.

Right, it isn't pretty. But I don't quite see what the alternative is.
We need to mark the buffer dirty before setting the LSN. We could
perhaps rewrite visibilitymap_set()'s API to return the LSN of the
xl_heap_visible record and stamp it on the heap buffer ourselves. But
1) I think visibilitymap_set() purposefully conceals its WAL logging
ways from the caller and propagating that info back up starts to make
the API messy in another way and 2) I'm a bit loath to make big
changes to visibilitymap_set() right now since my patch set eventually
resolves this by putting the changes to the VM and heap page in the
same WAL record.

- Melanie

