Re: PATCH: Using BRIN indexes for sorted output - Mailing list pgsql-hackers
From: Matthias van de Meent
Subject: Re: PATCH: Using BRIN indexes for sorted output
Msg-id: CAEze2WgGueCY+ihQDASaKAWNG9C2Ex0vzoNU5inrMLwWyj3KMA@mail.gmail.com
In response to: Re: PATCH: Using BRIN indexes for sorted output (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses: Re: PATCH: Using BRIN indexes for sorted output (×2)
List: pgsql-hackers
On Thu, 23 Feb 2023 at 16:22, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
> On 2/23/23 15:19, Matthias van de Meent wrote:
>> Comments on 0001, mostly comments and patch design:

One more comment:

>>> + range->min_value = bval->bv_values[0];
>>> + range->max_value = bval->bv_values[1];

This produces dangling pointers for minmax indexes on by-ref types such
as text and bytea, because the memory context of the decoded tuple is
reset every iteration. :-( (A possible fix is sketched below.)

>>> +range_values_cmp(const void *a, const void *b, void *arg)
>>
>> Can the arguments of these functions be modified into the types they
>> are expected to receive? If not, could you add a comment on why that's
>> not possible?
>
> The reason is that that's what qsort() expects. If you change that to
> actual data types, you'll get compile-time warnings. I agree this may
> need better comments, though.

Thanks in advance. (The usual idiom is sketched below.)

>>> + * Statistics calculated by index AM (e.g. BRIN for ranges, etc.).
>>
>> Could you please expand on this? We do have GiST support for ranges, too.
>
> Expand in what way? This is meant to be AM-specific, so if GiST wants to
> collect some additional stats, it's free to do so - perhaps some of the
> ideas from the stats collected for BRIN would be applicable, but it's
> also bound to the index structure.

I don't quite understand the flow of the comment, as I don't clearly see
what "BRIN for ranges" is meant to refer to. In my view, that makes it a
bad example, one that needs further explanation or rewriting - that is,
"expanding on".

>>> + * brin_minmax_stats
>>> + *    Calculate custom statistics for a BRIN minmax index.
>>> + *
>>> + * At the moment this calculates:
>>> + *
>>> + *  - number of summarized/not-summarized and all/has nulls ranges
>>
>> I think statistics gathering of an index should be done at the AM
>> level, not the attribute level. The docs currently suggest that the
>> user build one BRIN index with 16 columns instead of 16 BRIN indexes
>> with one column each, which would make the statistics gathering use
>> 16x more I/O if the scanned data cannot be reused.
>
> Why? The row sample is collected only once and used for building all the
> index AM stats - it doesn't really matter if we analyze 16 single-column
> indexes or 1 index with 16 columns. Yes, we'll need to scan more
> indexes, but with 16 columns the summaries will be larger, so the total
> amount of I/O will be almost the same, I think.
>
> Or maybe I don't understand what I/O you're talking about?

With the proposed patch, we do O(ncols_statsenabled) scans over the BRIN
index. Each scan reads all ncols columns of all block ranges from disk,
so in effect the data scan does on the order of
O(ncols_statsenabled * ncols * nranges) I/Os - that is, O(n^2) in the
number of columns when all columns have statistics enabled.

>> It is possible to build BRIN indexes on more than one column with more
>> than one opclass family, like `USING brin (id int8_minmax_ops, id
>> int8_bloom_ops)`. This would mean various duplicate statistics fields,
>> no?
>> It seems to me that it's more useful to track the null- and
>> n_summarized counts at the index level instead of duplicating them
>> inside each opclass.
>
> I don't think it's worth it. The amount of data this would save is tiny,
> and it'd only apply to cases where the index includes the same attribute
> multiple times, and that's pretty rare I think. I don't think it's worth
> the extra complexity.

Not necessarily - it was just an example of where we'd save I/O. Note
that the current gathering method already retrieves all tuple attribute
data, so from a basic processing perspective we'd save some decoding
time as well.
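To make the lifetime issue above concrete: one possible (untested) fix
is to copy the summary Datums out of the per-tuple context before it is
reset, along these lines. The helper, its name, and "rangectx" are
hypothetical stand-ins, not code from the patch:

#include "postgres.h"
#include "access/brin_tuple.h"		/* BrinValues */
#include "utils/datum.h"		/* datumCopy() */
#include "utils/memutils.h"		/* MemoryContext */

/*
 * Hypothetical helper, not from the patch: copy the min/max Datums of
 * one BRIN minmax summary into "rangectx", a context that must outlive
 * the per-tuple context that bval->bv_values points into.
 */
static void
copy_minmax_values(MemoryContext rangectx, BrinValues *bval,
		   bool attbyval, int16 attlen,
		   Datum *min_value, Datum *max_value)
{
	MemoryContext oldctx = MemoryContextSwitchTo(rangectx);

	/*
	 * datumCopy() returns by-value Datums unchanged and palloc()s a
	 * copy of by-ref Datums (text, bytea, ...) in the current context,
	 * so the results survive the per-tuple context reset.
	 */
	*min_value = datumCopy(bval->bv_values[0], attbyval, attlen);
	*max_value = datumCopy(bval->bv_values[1], attbyval, attlen);

	MemoryContextSwitchTo(oldctx);
}

datumCopy() only needs the attribute's attbyval/attlen, both available
from the index tuple descriptor.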
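As for the comparator signature: qsort_arg()'s comparator type is fixed
as int (*)(const void *, const void *, void *), so the usual idiom is to
cast back to the real element type inside the function and document the
actual types in a comment. A minimal sketch with a hypothetical element
type (the patch's real struct and comparison logic will differ):

#include "postgres.h"
#include "fmgr.h"

/* Hypothetical element type; the patch's actual struct will differ. */
typedef struct RangeValue
{
	Datum		value;
} RangeValue;

/*
 * qsort_arg() comparator.  The const void * signature is dictated by
 * the qsort_arg_comparator typedef; a and b are really RangeValue *,
 * and arg carries the FmgrInfo of the type's btree comparison proc.
 */
static int
range_value_cmp(const void *a, const void *b, void *arg)
{
	const RangeValue *va = (const RangeValue *) a;
	const RangeValue *vb = (const RangeValue *) b;
	FmgrInfo   *cmpfn = (FmgrInfo *) arg;

	/* btree comparison procs return negative / zero / positive */
	return DatumGetInt32(FunctionCall2(cmpfn, va->value, vb->value));
}

A call would then look like qsort_arg(values, nvalues,
sizeof(RangeValue), range_value_cmp, &cmpfn), with qsort_arg() declared
in port.h.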
>>
>> I'm planning on reviewing the other patches, and noticed that a lot of
>> the patches are marked WIP. Could you share a status on those, because
>> currently that status is unknown: are these patches you don't plan on
>> including, or are these patches only (or mostly) included for
>> debugging?
>
> I think the WIP label is a bit misleading; I used it mostly to mark
> patches that are not meant to be committed on their own. A quick
> overview:
>
> [...]

Thanks for the explanation, that's quite helpful. I'll see to further
reviewing 0004 and 0005 when I have additional time.

Kind regards,

Matthias van de Meent