Re: Parallel CREATE INDEX for BRIN indexes - Mailing list pgsql-hackers
From | Matthias van de Meent
Subject | Re: Parallel CREATE INDEX for BRIN indexes
Date |
Msg-id | CAEze2Wh8TLFC=GZCdJ0uPbuhGS-Mo1HXrcd52+cEXhwSb+mvUw@mail.gmail.com
In response to | Re: Parallel CREATE INDEX for BRIN indexes (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses | Re: Parallel CREATE INDEX for BRIN indexes
List | pgsql-hackers
On Wed, 5 Jul 2023 at 00:08, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
>
>
> On 7/4/23 23:53, Matthias van de Meent wrote:
> > On Thu, 8 Jun 2023 at 14:55, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
> >>
> >> Hi,
> >>
> >> Here's a WIP patch allowing parallel CREATE INDEX for BRIN indexes. The
> >> infrastructure (starting workers etc.) is "inspired" by the BTREE code
> >> (i.e. copied from that and massaged a bit to call brin stuff).
> >
> > Nice work.
> >
> >> In both cases _brin_end_parallel then reads the summaries from worker
> >> files, and adds them into the index. In 0001 this is fairly simple,
> >> although we could do one more improvement and sort the ranges by range
> >> start to make the index nicer (and possibly a bit more efficient). This
> >> should be simple, because the per-worker results are already sorted like
> >> that (so a merge sort in _brin_end_parallel would be enough).
> >
> > I see that you manually built the passing and sorting of tuples
> > between workers, but can't we use the parallel tuplesort
> > infrastructure for that? It already has similar features in place and
> > improves code commonality.
> >
>
> Maybe. I wasn't that familiar with what parallel tuplesort can and can't
> do, and the little I knew I managed to forget since I wrote this patch.
> Which similar features do you have in mind?
>
> The workers are producing the results in "start_block" order, so if they
> pass that to the leader, it probably can do the usual merge sort.
>
> >> For 0002 it's a bit more complicated, because with a single parallel
> >> scan brinbuildCallbackParallel can't decide if a range is assigned to a
> >> different worker or empty. And we want to generate summaries for empty
> >> ranges in the index. We could either skip such range during index build,
> >> and then add empty summaries in _brin_end_parallel (if needed), or add
> >> them and then merge them using "union".
> >>
> >>
> >> I just realized there's a third option to do this - we could just do
> >> regular parallel scan (with no particular regard to pagesPerRange), and
> >> then do "union" when merging results from workers. It doesn't require
> >> the sequence of TID scans, and the union would also handle the empty
> >> ranges. The per-worker results might be much larger, though, because
> >> each worker might produce up to the "full" BRIN index.
> > Would it be too much effort to add a 'min_chunk_size' argument to
> > table_beginscan_parallel (or ParallelTableScanDesc) that defines the
> > minimum granularity of block ranges to be assigned to each process? I
> > think that would be the most elegant solution that would require
> > relatively little effort: table_block_parallelscan_nextpage already
> > does parallel management of multiple chunk sizes, and I think this
> > modification would fit quite well in that code.
> >
>
> I'm confused. Isn't that pretty much exactly what 0002 does? I mean,
> that passes pagesPerRange to table_parallelscan_initialize(), so that
> each pagesPerRange is assigned to a single worker.

Huh, I overlooked that one... Sorry for that.

> The trouble I described above is that the scan returns tuples, and the
> consumer has no idea what was the chunk size or how many other workers
> are there. Imagine you get a tuple from block 1, and then a tuple from
> block 1000. Does that mean that the blocks in between are empty or that
> they were processed by some other worker?

If the unit of work for parallel table scans is the index's
pages_per_range, then I think we can just fill in expected-but-missing
ranges as 'empty' in the parallel leader during index loading, like the
first of the two solutions you proposed.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech/)