Re: Optimising compactify_tuples() - Mailing list pgsql-hackers
From: Thomas Munro
Subject: Re: Optimising compactify_tuples()
Date:
Msg-id: CA+hUKGLhcvXFMvAG6O1UsE2sN2CUkMBNwdOFAHGEv4Szco_yXQ@mail.gmail.com
In response to: Re: Optimising compactify_tuples() (David Rowley <dgrowleyml@gmail.com>)
Responses: Re: Optimising compactify_tuples()
List: pgsql-hackers
On Thu, Sep 10, 2020 at 2:34 AM David Rowley <dgrowleyml@gmail.com> wrote:
> I think you were adequately caffeinated. You're right that this is
> fairly simple to do, but it looks even more simple than looping twice
> over the array. I think it's just a matter of looping over the
> itemidbase backwards and putting the higher itemid tuples at the end
> of the page. I've done it this way in the attached patch.

Yeah, I was wondering how to make as much of the algorithm work in a
memory-forwards direction as possible (even the item pointer access),
but it was just a hunch. Once you have the adjacent-tuple merging
thing, so that you're down to just a couple of big memcpy calls, it's
probably moot anyway. (Rough sketches of the backward walk, the
presorted path and the adjacent-run merging appear at the end of this
mail.)

> I also added a presorted path which falls back on doing memmoves
> without the temp buffer when the itemidbase array indicates that
> higher lineitems all have higher offsets. I'm doing the presort check
> in the calling function since that loops over the lineitems already.
> We can just memmove the tuples in reverse order without overwriting
> any yet-to-be-moved tuples when these are in order.

Great. I wonder if we could also identify a range at the high end
that is already correctly sorted and maximally compacted, so it
doesn't even need to be copied out.

+ * Do the tuple compactification. Collapse memmove calls for adjacent
+ * tuples.

s/memmove/memcpy/

> With the attached v3, performance is better. The test now runs
> recovery in 65.6 seconds, vs master's 148.5 seconds. So about 2.2x
> faster.

On my machine I'm seeing 57s, down from 86s on unpatched master, for
the simple pgbench workload from
https://github.com/macdice/redo-bench/. That's not quite what you're
reporting, but it still blows the doors off the faster-sorting patch,
which does it in 74s.

> We should probably consider what else can be done to try to write
> pages with tuples for earlier lineitems earlier in the page. VACUUM
> FULLs and friends will switch back to the opposite order when
> rewriting the heap.

Yeah, and also bulk inserts/COPY. Ultimately, if we flipped our page
format on its head, that'd come for free, but that'd be a bigger
project with more ramifications.

So one question is whether we want to do the order-reversing as part
of this patch, or wait for a more joined-up project to make lots of
code paths collude on making scan order match memory order
(correlation = 1). Most or all of the gain from your patch would
presumably still apply if we did the exact opposite and forced offset
order to match reverse-item-ID order (correlation = -1), which also
happens to be the initial state when you insert tuples today; you'd
still tend towards a state that allows nice big memmove/memcpy calls
during page compaction. IIUC we currently start with correlation = -1
and then tend towards correlation = 0 after many random updates,
because we can't change the order, so it gets scrambled over time.
I'm not sure what I think about that.

PS You might as well post future patches with .patch endings so that
the cfbot tests them; it seems pretty clear now that a patch to
optimise sorting (as useful as it may be for future work) can't beat
a patch that skips the sort completely. I took the liberty of
switching the author and reviewer names in the commitfest entry to
reflect this.
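To make the backward walk concrete, here's a minimal sketch of that
loop in isolation. The ItemIdCompact struct, the function name and the
calling convention are simplified stand-ins for illustration, not the
real structures or names from the patch:

#include <stdint.h>
#include <string.h>

#define BLCKSZ 8192   /* PostgreSQL's default page size */

/* Simplified stand-in for a line pointer summary entry (assumption). */
typedef struct
{
    uint16_t itemoff;      /* current tuple offset within the page */
    uint16_t alignedlen;   /* MAXALIGN'd tuple length */
} ItemIdCompact;

/*
 * Walk itemidbase from the highest line pointer to the lowest, copying
 * each tuple out of a scratch copy of the page so that higher line
 * pointers end up at higher offsets, packed against pd_special.
 */
static void
compactify_backwards(char *page, uint16_t pd_special,
                     ItemIdCompact *itemidbase, int nitems)
{
    char     scratch[BLCKSZ];
    uint16_t upper = pd_special;

    memcpy(scratch, page, BLCKSZ);   /* tuples are read from the copy */

    for (int i = nitems - 1; i >= 0; i--)
    {
        upper -= itemidbase[i].alignedlen;
        memcpy(page + upper, scratch + itemidbase[i].itemoff,
               itemidbase[i].alignedlen);
        itemidbase[i].itemoff = upper;   /* caller fixes line pointers */
    }
}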
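And a sketch of the presorted path plus the caller-side check, under
the same simplified assumptions; in the patch the check would
piggyback on the caller's existing loop over the line pointers rather
than being a separate pass:

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Same simplified stand-in as in the previous sketch. */
typedef struct
{
    uint16_t itemoff;
    uint16_t alignedlen;
} ItemIdCompact;

/* "Presorted" means higher line pointers all have higher offsets. */
static bool
offsets_presorted(const ItemIdCompact *itemidbase, int nitems)
{
    for (int i = 1; i < nitems; i++)
        if (itemidbase[i].itemoff < itemidbase[i - 1].itemoff)
            return false;
    return true;
}

/*
 * Presorted path: slide tuples up toward pd_special in place, highest
 * offset first.  Each destination is at or above its source, so no
 * tuple that hasn't been moved yet can be overwritten, and no scratch
 * buffer is needed; memmove copes with any overlap.
 */
static void
compactify_presorted(char *page, uint16_t pd_special,
                     ItemIdCompact *itemidbase, int nitems)
{
    uint16_t upper = pd_special;

    for (int i = nitems - 1; i >= 0; i--)
    {
        upper -= itemidbase[i].alignedlen;
        if (upper != itemidbase[i].itemoff)
            memmove(page + upper, page + itemidbase[i].itemoff,
                    itemidbase[i].alignedlen);
        itemidbase[i].itemoff = upper;
    }
}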
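Finally, a sketch of how the adjacent-tuple collapsing might look:
consecutive itemidbase entries whose tuples also abut in the source
are folded into one larger memcpy. Again, the names and types are made
up for illustration and the real patch may arrange this differently:

#include <stdint.h>
#include <string.h>

#define BLCKSZ 8192

/* Same simplified stand-in as in the earlier sketches. */
typedef struct
{
    uint16_t itemoff;
    uint16_t alignedlen;
} ItemIdCompact;

static void
compactify_merged(char *page, uint16_t pd_special,
                  ItemIdCompact *itemidbase, int nitems)
{
    char     scratch[BLCKSZ];
    uint16_t upper = pd_special;
    int      i = nitems - 1;

    memcpy(scratch, page, BLCKSZ);

    while (i >= 0)
    {
        int      run_end = i;   /* highest entry in this run */
        uint16_t copy_start = itemidbase[i].itemoff;
        uint16_t copy_len = itemidbase[i].alignedlen;

        /* Grow the run while the next-lower entry abuts it in the source. */
        while (i > 0 &&
               itemidbase[i - 1].itemoff + itemidbase[i - 1].alignedlen ==
               copy_start)
        {
            i--;
            copy_start = itemidbase[i].itemoff;
            copy_len += itemidbase[i].alignedlen;
        }

        /* One memcpy covers every tuple in the run. */
        upper -= copy_len;
        memcpy(page + upper, scratch + copy_start, copy_len);

        /* Point each entry in the run at its new location. */
        for (int j = i, off = upper; j <= run_end; j++)
        {
            itemidbase[j].itemoff = (uint16_t) off;
            off += itemidbase[j].alignedlen;
        }

        i--;
    }
}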