Re: Parallel CREATE INDEX for BRIN indexes - Mailing list pgsql-hackers
From: Matthias van de Meent
Subject: Re: Parallel CREATE INDEX for BRIN indexes
Date:
Msg-id: CAEze2WiMsPZg=xkvSF_jt4=69k6K7gz5B8V2wY3gCGZ+1BzCbQ@mail.gmail.com
In response to: Re: Parallel CREATE INDEX for BRIN indexes (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses: Re: Parallel CREATE INDEX for BRIN indexes
List: pgsql-hackers
On Thu, 30 Nov 2023 at 01:10, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 11/29/23 23:59, Matthias van de Meent wrote:
>> On Wed, 29 Nov 2023 at 21:56, Tomas Vondra
>> <tomas.vondra@enterprisedb.com> wrote:
>>>
>>> On 11/29/23 21:30, Matthias van de Meent wrote:
>>>> On Wed, 29 Nov 2023 at 18:55, Tomas Vondra
>>>> <tomas.vondra@enterprisedb.com> wrote:
>>>>> I did try to measure how much it actually saves, but none of the
>>>>> tests I did found a measurable improvement. So I'm tempted to just
>>>>> not include this part, and accept that we may deserialize some of
>>>>> the tuples unnecessarily.
>>>>>
>>>>> Did you actually observe measurable improvements in some cases?
>>>>
>>>> The improvements would mostly stem from BRIN indexes with multiple
>>>> (potentially compressed) by-ref types, as those go through more
>>>> complex and expensive code to deserialize, requiring separate
>>>> palloc() and memcpy() calls for each value.
>>>> For single-column and by-value types the improvements are expected
>>>> to be negligible, because there is no meaningful difference between
>>>> copying a single by-ref value and copying its container; the
>>>> additional work done for each tuple is marginal for those.
>>>>
>>>> For an 8-column BRIN index ((sha256((id)::text::bytea)::text),
>>>> (sha256((id+1)::text::bytea)::text),
>>>> (sha256((id+2)::text::bytea)::text), ...) instrumented with 0003, I
>>>> measured 10x less time spent in the main loop of
>>>> _brin_end_parallel: from ~30ms down to ~3ms when dealing with 55k
>>>> 1-block ranges. It's not a lot, but worth at least something, I
>>>> guess?
>>>
>>> It is something, but I can't really convince myself it's worth the
>>> extra code complexity. It's a somewhat extreme example, and the
>>> parallelism certainly saves much more than this.
>>
>> True. For this, I usually keep in mind that the docs on multi-column
>> indexes still advise one N-column BRIN index over N single-column
>> BRIN indexes (assuming the same storage parameters), so multi-column
>> BRIN indexes should not be considered uncommon:
>>
>> "The only reason to have multiple BRIN indexes instead of one
>> multicolumn BRIN index on a single table is to have a different
>> pages_per_range storage parameter."
>>
>> Note that most of the time in my example index is spent creating the
>> actual tuples, due to the use of hashing for data generation; for
>> index or plain to-text formatting the improvement is much more
>> pronounced: if I use an 8-column index (id::text, id, ...), index
>> creation takes ~500ms with 4+ workers. Of this, deforming takes some
>> 20ms, whereas with the deforming step skipped (i.e., with my patch)
>> it takes ~3.5ms. That's 3% shaved off the build time when the index
>> shape is beneficial.
>
> That's all true, and while 3.5% is not something to ignore, my POV is
> that the parallelism speeds this up from ~2000ms to ~500ms. Yes, it
> would be great to shave off the extra 1% (relative to the original
> duration). But I don't have a great idea how to code that in a way
> that is readable, and I don't want to stall the patch indefinitely
> because of a comparatively small improvement.
>
> Therefore I propose we get the simpler code committed and leave this
> as a future improvement.

That's fine with me; it is one reason why I kept it as a separate
patch file.
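For concreteness, the index shapes from those measurements look
roughly like the sketch below. The table definition, row count, and
pages_per_range = 1 are assumptions on my part (chosen to produce many
small, 1-block ranges); only the column expressions are as quoted
above.

-- A sketch, not the exact benchmark setup.
CREATE TABLE t (id bigint);
INSERT INTO t SELECT generate_series(1, 1000000);

-- The "expensive" shape: 8 hashed by-ref (text) expression columns.
CREATE INDEX ON t USING brin (
    (sha256((id)::text::bytea)::text),
    (sha256((id + 1)::text::bytea)::text),
    (sha256((id + 2)::text::bytea)::text),
    (sha256((id + 3)::text::bytea)::text),
    (sha256((id + 4)::text::bytea)::text),
    (sha256((id + 5)::text::bytea)::text),
    (sha256((id + 6)::text::bytea)::text),
    (sha256((id + 7)::text::bytea)::text)
) WITH (pages_per_range = 1);

-- The cheaper "plain to-text" 8-column shape (id::text, id, ...).
CREATE INDEX ON t USING brin (
    (id::text), id, (id::text), id, (id::text), id, (id::text), id
) WITH (pages_per_range = 1);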
>>>> The attached patch fixes the issue that you called out.
>>>> It also further updates _brin_end_parallel: the final 'write empty
>>>> tuples' loop is never hit, and is thus removed. If there were any
>>>> tuples in the spool, we'd have filled the empty ranges at the end
>>>> of the main loop; and if there were no tuples in the spool, the
>>>> memtuple would still be at its original initialized value of 0,
>>>> resulting in a constant-false condition. I also updated some
>>>> comments.
>>>
>>> Ah, right. I'll take a look tomorrow, but I guess I didn't realize
>>> we insert the empty ranges in the main loop, because we're already
>>> looking at the *next* summary.
>>
>> Yes, merging adds some significant complexity here. I don't think we
>> can easily get around that, though...
>>
>>> But I think the idea was to insert empty ranges if there's a chunk
>>> of empty ranges at the end of the table, after the last tuple the
>>> index build reads. But I'm not sure that can actually happen ...
>>
>> This would be trivial to construct with partial indexes; e.g. WHERE
>> (my_pk IS NULL) would consist of exclusively empty ranges.
>> I don't see a lot of value in partial BRIN indexes, but I may be
>> overlooking something.
>
> Oh, I hadn't even thought about partial BRIN indexes! I'm sure for
> those it's even more important to actually fill in the empty ranges;
> otherwise we end up scanning the whole supposedly filtered-out part
> of the table. I'll do some testing with that.

I just ran some more tests in less favorable environments, and it
looks like I hit a bug:

% SET max_parallel_workers = 0;
% CREATE INDEX ... USING brin (...);
ERROR: cannot update tuples during a parallel operation

Fix attached in 0002.

In 0003 I add the mentioned backfilling of empty ranges at the end of
the table. I added it for both normal and parallel index builds, as
normal builds apparently didn't have this yet.
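To make the partial-index scenario above concrete, a minimal sketch
(the table, its size, and the payload column are made up here; my_pk
mirrors the predicate quoted above):

-- my_pk is a primary key and therefore never NULL, so the predicate
-- matches no rows and every page range covered by this index is empty.
CREATE TABLE t2 (my_pk bigint PRIMARY KEY, payload text);
INSERT INTO t2 SELECT g, 'x' FROM generate_series(1, 100000) g;
CREATE INDEX ON t2 USING brin (my_pk) WHERE (my_pk IS NULL);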
Kind regards,

Matthias van de Meent