Re: Eliminate redundant tuple visibility check in vacuum - Mailing list pgsql-hackers
From: Melanie Plageman
Subject: Re: Eliminate redundant tuple visibility check in vacuum
Msg-id: CAAKRu_YRv4OvT3t4Dw+8UCuji2aVmQxmkJpXvqvTBTWBHEJ-ww@mail.gmail.com
In response to: Re: Eliminate redundant tuple visibility check in vacuum (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Eliminate redundant tuple visibility check in vacuum
           Re: Eliminate redundant tuple visibility check in vacuum
List: pgsql-hackers
On Thu, Sep 7, 2023 at 3:30 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Sep 7, 2023 at 3:10 PM Melanie Plageman <melanieplageman@gmail.com> wrote:
> > I can fix it by changing the type of PruneResult->off_loc to be an OffsetNumber pointer. This does mean that PruneResult will be initialized partially by heap_page_prune() callers. I wonder if you think that undermines the case for making a new struct.
>
> I think that it undermines the case for including that particular argument in the struct. It's not really a Prune*Result* if the caller initializes it in part. It seems fairly reasonable to still have a PruneResult struct for the other stuff, though, at least to me. How do you see it?

Yes, I think off_loc probably didn't belong in PruneResult to begin with. It is inherently not a result of pruning, since it will only be used in the event that pruning doesn't complete (it errors out).

In the attached v4 patch set, I have both PruneResult and off_loc as parameters to heap_page_prune(). I have also added a separate commit which adds comments, both above heap_page_prune()'s call site in lazy_scan_prune() and in the heap_page_prune() function header, clarifying the various points we discussed.

> > I still want to eliminate the NULL check of off_loc in heap_page_prune() by making it a required parameter. Even though on-access pruning does not have an error callback mechanism which uses the offset, it seems better to have a pointless local variable in heap_page_prune_opt() than to do a NULL check for every tuple pruned.
>
> It doesn't seem important to me unless it improves performance. If it's just stylistic, I don't object, but I also don't really see a reason to care.

I did some performance testing but, as expected, I couldn't concoct a scenario where the overhead was noticeable in a profile. So much else is happening in that code that the NULL check basically doesn't matter (even though it isn't optimized out).

I mostly wanted to remove the NULL checks because I found them distracting (so, a stylistic complaint). However, upon further reflection, I actually think it is better if heap_page_prune_opt() passes NULL. heap_page_prune_opt() has no error callback mechanism that could use it, and passing a valid value implies otherwise. Also, as you said, off_loc will always be InvalidOffsetNumber when heap_page_prune() returns normally, so having heap_page_prune_opt() pass a dummy value might actually be more confusing for future programmers.
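To make that concrete, here is a rough sketch of the shape I ended up with. This is heavily simplified and abbreviated, not the actual v4 patch code; the unrelated parameters are elided:

    void
    heap_page_prune(Relation relation, Buffer buffer,
                    GlobalVisState *vistest,
                    /* ... other parameters elided ... */
                    PruneResult *presult,
                    OffsetNumber *off_loc)
    {
        OffsetNumber offnum,
                     maxoff;

        maxoff = PageGetMaxOffsetNumber(BufferGetPage(buffer));

        for (offnum = FirstOffsetNumber;
             offnum <= maxoff;
             offnum = OffsetNumberNext(offnum))
        {
            /* let the caller's error callback see which tuple we are on */
            if (off_loc)
                *off_loc = offnum;

            /* ... visibility check and pruning of this tuple ... */
        }

        /* finished normally: no tuple is "in progress" any more */
        if (off_loc)
            *off_loc = InvalidOffsetNumber;

        /* ... fill in *presult ... */
    }

In lazy_scan_prune(), vacuum passes the field that its error callback reports:

    heap_page_prune(rel, buf, vistest, /* ... */ &presult, &vacrel->offnum);

while heap_page_prune_opt(), which has no error callback, passes NULL:

    heap_page_prune(rel, buf, vistest, /* ... */ &presult, NULL);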
> > > I haven't run the regression suite with 0001 applied. I'm guessing that you did, and that they passed, which if true means that we don't have a test for this, which is a shame, although writing such a test might be a bit tricky. If there is a test case for this and you didn't run it, woops.
> >
> > There is no test coverage for the vacuum error callback context currently (tests passed for me). I looked into how we might add such a test. First, I investigated what kind of errors might occur during heap_prune_satisfies_vacuum(). Some of the multixact functions called by HeapTupleSatisfiesVacuumHorizon() could error out -- for example GetMultiXactIdMembers(). It seems difficult to trigger the errors in GetMultiXactIdMembers(), as we would have to cause wraparound. It would be even more difficult to ensure that we hit those specific errors from a call stack containing heap_prune_satisfies_vacuum(). As such, I'm not sure I can think of a way to protect future developers from repeating my mistake -- apart from improving the comment like you mentioned.
>
> 004_verify_heapam.pl has some tests that intentionally corrupt pages and then use pg_amcheck to detect the corruption. Such an approach could probably also be used here. But it's a pain to get such tests right, because any change to the page format due to endianness, different block size, or whatever can make carefully-written tests go boom.

Cool! I hadn't examined how these tests work until now. I took a stab at writing a test in the existing 004_verify_heapam.pl. The simplest thing would be if we could just vacuum the corrupted table ("test") after running pg_amcheck and compare the error context to our expectation. I found that this didn't work, though. In an assert build, vacuum trips an assert before it hits an error while vacuuming a corrupted tuple in the "test" table. There might be a way of modifying the existing test code to avoid this, but I tried the next easiest thing -- corrupting a tuple in the other existing table in the file, "junk". This is possible to do, but we have to repeat a lot of the setup code done for the "test" table to get the line pointer array and loop through and corrupt a tuple. In order to do this well, I would want to refactor some of the boilerplate into a function. There are other fiddly bits as well that I needed to consider.

I think a test like this could be useful coverage of some of the possible errors that could happen in heap_prune_satisfies_vacuum(), but it definitely isn't coverage of pg_amcheck (and thus shouldn't go in that file), and a whole new test which spins up a Postgres to cover vacuum_error_callback() seemed like a bit much.

- Melanie
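P.S. To be concrete about what such a test would assert: the CONTEXT line comes from vacuum_error_callback() in vacuumlazy.c. Paraphrasing from memory rather than quoting the source, the relevant branch looks something like this simplified sketch:

    static void
    vacuum_error_callback(void *arg)
    {
        LVRelState *errinfo = arg;

        switch (errinfo->phase)
        {
            case VACUUM_ERRCB_PHASE_SCAN_HEAP:
                /* report block and offset if we know them */
                if (BlockNumberIsValid(errinfo->blkno) &&
                    OffsetNumberIsValid(errinfo->offnum))
                    errcontext("while scanning block %u offset %u of relation \"%s.%s\"",
                               errinfo->blkno, errinfo->offnum,
                               errinfo->relnamespace, errinfo->relname);
                break;

            /* other phases elided */
            default:
                break;
        }
    }

so the test would vacuum the corrupted "junk" table and check that the error comes with a CONTEXT line naming the expected block and offset.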