Re: PATCH: Using BRIN indexes for sorted output - Mailing list pgsql-hackers
From: Matthias van de Meent
Subject: Re: PATCH: Using BRIN indexes for sorted output
Msg-id: CAEze2WgGueCY+ihQDASaKAWNG9C2Ex0vzoNU5inrMLwWyj3KMA@mail.gmail.com
In response to: Re: PATCH: Using BRIN indexes for sorted output (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses: Re: PATCH: Using BRIN indexes for sorted output (×2)
List: pgsql-hackers
On Thu, 23 Feb 2023 at 16:22, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
>
> On 2/23/23 15:19, Matthias van de Meent wrote:
>> Comments on 0001, mostly comments and patch design:

One more comment:

>>> + range->min_value = bval->bv_values[0];
>>> + range->max_value = bval->bv_values[1];

This produces dangling pointers for minmax indexes on by-ref types such
as text and bytea, because the memory context of the decoded tuple is
reset every iteration. :-( (A possible fix is sketched below.)

>>> +range_values_cmp(const void *a, const void *b, void *arg)
>>
>> Can the arguments of these functions be modified into the types they
>> are expected to receive? If not, could you add a comment on why that's
>> not possible?
>
> The reason is that that's what qsort() expects. If you change that to
> actual data types, you'll get compile-time warnings. I agree this may
> need better comments, though.

Thanks in advance. (The usual idiom is sketched below.)

>>> + * Statistics calculated by index AM (e.g. BRIN for ranges, etc.).
>>
>> Could you please expand on this? We do have GiST support for ranges, too.
>
> Expand in what way? This is meant to be AM-specific, so if GiST wants to
> collect some additional stats, it's free to do so - perhaps some of the
> ideas from the stats collected for BRIN would be applicable, but it's
> also bound to the index structure.

I don't quite understand the flow of the comment, as I don't clearly see
what "BRIN for ranges" is meant to refer to. In my view, that makes it a
bad example, one that needs further explanation or rewriting - that is,
"expanding on".

>>> + * brin_minmax_stats
>>> + *    Calculate custom statistics for a BRIN minmax index.
>>> + *
>>> + * At the moment this calculates:
>>> + *
>>> + *  - number of summarized/not-summarized and all/has nulls ranges
>>
>> I think statistics gathering of an index should be done at the AM
>> level, not the attribute level. The docs currently suggest that the
>> user build one BRIN index with 16 columns instead of 16 BRIN indexes
>> with one column each, which would make the statistics gathering use
>> 16x more I/O if the scanned data cannot be reused.
>
> Why? The row sample is collected only once and used for building all the
> index AM stats - it doesn't really matter if we analyze 16 single-column
> indexes or 1 index with 16 columns. Yes, we'll need to scan more
> indexes, but with 16 columns the summaries will be larger, so the total
> amount of I/O will be almost the same, I think.
>
> Or maybe I don't understand what I/O you're talking about?

With the proposed patch, we do O(ncols_statsenabled) scans over the BRIN
index. Each scan reads all ncols columns of all block ranges from disk,
so in effect the data scan does on the order of
O(ncols_statsenabled * ncols * nranges) I/Os - that is, O(n^2) in the
number of columns when all columns have statistics enabled.

>> It is possible to build BRIN indexes on more than one column with more
>> than one opclass family, like `USING brin (id int8_minmax_ops, id
>> int8_bloom_ops)`. This would mean various duplicate statistics fields,
>> no?
>> It seems to me that it's more useful to track the null- and
>> n_summarized counts at the index level instead of duplicating them
>> inside each opclass.
>
> I don't think it's worth it. The amount of data this would save is tiny,
> and it'd only apply to cases where the index includes the same attribute
> multiple times, and that's pretty rare I think. I don't think it's worth
> the extra complexity.

Not necessarily - it was just an example of where we'd save I/O. Note
that the current gathering method already retrieves all tuple attribute
data, so from a basic processing perspective we'd save some decoding
time as well.
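To make the lifetime issue above concrete: one possible (untested) fix
is to copy the summary Datums out of the per-tuple context before it is
reset, along these lines. The helper, its name, and "rangectx" are
hypothetical stand-ins, not code from the patch:

#include "postgres.h"
#include "access/brin_tuple.h"		/* BrinValues */
#include "utils/datum.h"		/* datumCopy() */
#include "utils/memutils.h"		/* MemoryContext */

/*
 * Hypothetical helper, not from the patch: copy the min/max Datums of
 * one BRIN minmax summary into "rangectx", a context that must outlive
 * the per-tuple context that bval->bv_values points into.
 */
static void
copy_minmax_values(MemoryContext rangectx, BrinValues *bval,
		   bool attbyval, int16 attlen,
		   Datum *min_value, Datum *max_value)
{
	MemoryContext oldctx = MemoryContextSwitchTo(rangectx);

	/*
	 * datumCopy() returns by-value Datums unchanged and palloc()s a
	 * copy of by-ref Datums (text, bytea, ...) in the current context,
	 * so the results survive the per-tuple context reset.
	 */
	*min_value = datumCopy(bval->bv_values[0], attbyval, attlen);
	*max_value = datumCopy(bval->bv_values[1], attbyval, attlen);

	MemoryContextSwitchTo(oldctx);
}

datumCopy() only needs the attribute's attbyval/attlen, both available
from the index tuple descriptor.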
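As for the comparator signature: qsort_arg()'s comparator type is fixed
as int (*)(const void *, const void *, void *), so the usual idiom is to
cast back to the real element type inside the function and document the
actual types in a comment. A minimal sketch with a hypothetical element
type (the patch's real struct and comparison logic will differ):

#include "postgres.h"
#include "fmgr.h"

/* Hypothetical element type; the patch's actual struct will differ. */
typedef struct RangeValue
{
	Datum		value;
} RangeValue;

/*
 * qsort_arg() comparator.  The const void * signature is dictated by
 * the qsort_arg_comparator typedef; a and b are really RangeValue *,
 * and arg carries the FmgrInfo of the type's btree comparison proc.
 */
static int
range_value_cmp(const void *a, const void *b, void *arg)
{
	const RangeValue *va = (const RangeValue *) a;
	const RangeValue *vb = (const RangeValue *) b;
	FmgrInfo   *cmpfn = (FmgrInfo *) arg;

	/* btree comparison procs return negative / zero / positive */
	return DatumGetInt32(FunctionCall2(cmpfn, va->value, vb->value));
}

A call would then look like qsort_arg(values, nvalues,
sizeof(RangeValue), range_value_cmp, &cmpfn), with qsort_arg() declared
in port.h.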
>>
>> I'm planning on reviewing the other patches, and noticed that a lot of
>> the patches are marked WIP. Could you share a status on those, because
>> currently that status is unknown: are these patches you don't plan on
>> including, or are these patches only (or mostly) included for
>> debugging?
>
> I think the WIP label is a bit misleading; I used it mostly to mark
> patches that are not meant to be committed on their own. A quick
> overview:
>
> [...]

Thanks for the explanation, that's quite helpful. I'll see to further
reviewing 0004 and 0005 when I have additional time.

Kind regards,

Matthias van de Meent