Home > mailing lists

Thread: amcheck support for BRIN indexes

amcheck support for BRIN indexes

From

Arseniy Mukhin

Date:

22 April, 12:05:46

Hi,

It's not for v18, just wanted to share with the community and register
it in the upcoming Commitfest 2025-07.

Here is the patch with amcheck support for BRIN indexes.

Patch uses amcheck common infrastructure that was introduced in [1].
It works and I deleted all the code that
I copied from btree check at first. Great.

During the check we hold ShareUpdateExclusiveLock, so we don't block
regular reads/inserts/updates
and the same time range summarizations/desummarizations are impossible
during the check which simplifies check logic.
While checking we do ereport(ERROR) on the first issue, the same way
as btree, gin checks do.

There are two parts:

First part is 'index structure check':
1) Meta page check
2) Revmap check. Walk revmap and check every valid revmap item points
to the index tuple with the expected range blkno,
and index tuple is consistent with the tuple description. Also it's
not documented, but known from the brin code that
for every empty range we should have allnulls = true, hasnulls =
false. So this is also checked here.
3) Regular pages check. Walk regular pages and check that every index
tuple has a corresponding revmap item that points to it.
We don't check index tuple structures here, because it was already
done in 2 steps.

Regular pages check is optional. Orphan index tuple errors in this
check doesn't necessary mean that index is corrupted,
but AFAIS brin is not supposed to have such orphan index tuples now,
so if we encounter one than probably something
wrong with the index.

Second part is 'all consistent check':
We check all heap tuples are consistent with the index. It's the most
expensive part and it's optional.
Here we call consistent functions for every heap tuple. Also here we
check that fields like 'has_nulls', 'all_nulls',
'empty_range' are consistent with what we see in the heap.

There are two patch files:

0001-brin-refactoring.patch

It's just two tiny changes in the brin code.
1) For index tuple structure check we need to know how on-disk index
tuples look like.
Function that lets you get it 'brtuple_disk_tupdesc' is internal. This
patch makes it extern.
2) Create macros for BRIN_MAX_PAGES_PER_RANGE. For the meta page check.

0002-amechek-brin-support.patch

It's check functionality itself + regression tests + amcheck extension updates.

Some open questions:

1) How to get the correct strategy number for the scan key in "all
heap consistent" check. The consistent function wants
a scan key, and to generate it we have to pick a strategy number. We
can't just use the same strategy number for all
indexes because its meaning differs from opclass to opclass (for
example equal strategy number is 1 in Bloom
and 3 in min_max). We also can't pick an operator and use it for every
check, because opclasses don't have any requirements
about what operators they should support. The solution was to let user
to define check operator
(parameter consistent_operator_names). It's an array as we can have a
multicolumn index. We can use '=' as default check
operator, because AFAIS '=' operator is supported by all core
opclasses except 'box_inclusion_ops', and IMHO it's the
most obvious candidate for such a check. So if param
'consistent_operator_names' is an empty array (param default value),
then for all attributes we use operator '='. In most cases operator
'=' does the job and you don't need to worry about
consistent_operator_names param. Not sure about it, what do you think?

2) The current implementation of "all heap consistent" uses the scan
of the entire heap. So if we have a lot of
unsummarized ranges, we do a lot of wasted work because we can't use
the tuples that belong to the unsummarized ranges.
Instead of having one scan for the entire heap, we can walk the
revmap, take only the summarized ranges, and
scan only the pages that belong to those ranges. So we have one scan
per range. This approach helps us avoid touching
those heap tuples that we can't use for index checks. But I'm not sure
if we should to worry about that because every
autovacuum summarizes all the unsummarized ranges, so don't expect a
lot of unsummarized ranges on average.

TODO list:

1) add TAP tests
2) update SGML docs for amcheck (think it's better to do after patch
is reviewed and more or less finalized)
3) pg_amcheck integration

Big thanks to Tomas Vondra for the first patch idea and initial review.

[1] https://www.postgresql.org/message-id/45AC9B0A-2B45-40EE-B08F-BDCF5739D1E1%40yandex-team.ru

Best regards,
Arseniy Mukhin

Attachment

Re: amcheck support for BRIN indexes

From

Arseniy Mukhin

Date:

08 June, 15:39:22

Hi,

Here is a new version.

TAP tests were added. Tried to reproduce more or less every violation.
I don't include 2 tests where disk representation ItemIdData needs to
be changed: it works locally, but I don't think these tests are
portable. While writing tests some minor issues were found and fixed.
Also ci compiler warnings were fixed.


Best regards,
Arseniy Mukhin

Attachment

Re: amcheck support for BRIN indexes

From

Tomas Vondra

Date:

09 June, 01:16:04

On 6/8/25 14:39, Arseniy Mukhin wrote:
> Hi,
> 
> Here is a new version.
> 
> TAP tests were added. Tried to reproduce more or less every violation.
> I don't include 2 tests where disk representation ItemIdData needs to
> be changed: it works locally, but I don't think these tests are
> portable. While writing tests some minor issues were found and fixed.
> Also ci compiler warnings were fixed.
> 

Thanks. I've added myself as a reviewer, so that I don't forget about
this for the next CF.


regards

-- 
Tomas Vondra

Re: amcheck support for BRIN indexes

From

Andrey Borodin

Date:

16 June, 20:11:31

Hi!

Nice feature!

> On 8 Jun 2025, at 17:39, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote:
>
> <v2-0001-brin-refactoring.patch>

+#define BRIN_MAX_PAGES_PER_RANGE    131072

I'd personally prefer some words on where does this limit come from. I'm not a BRIN pro, just curious.
Or, at least, maybe we can use a form 128 * 1024, if it's just a sane limit.

> On 8 Jun 2025, at 17:39, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote:
> <v2-0002-amcheck-brin-support.patch>

A bit more detailed commit message would be very useful.

The test coverage is very decent. The number of possible corruptions in tests is impressive. I don't really have an
experiencewith BRIN to say how different they are, but I want to ask if you are sure that these types of corruption
willbe portable across big-endian machines and such stuff. 

Copyright year in all new files should be 2025.

Some documentation about brin_index_check() would be handy, especially about its parameters. Perhaps, somewhere near
gin_index_check()in amcheck.sgml. 

brin_check_ereport() reports ERRCODE_DATA_CORRUPTED. We should distinguish between ERRCODE_INDEX_CORRUPTED which is a
stormbringerand ERRCODE_DATA_CORRUPTED which is an actual disaster. 

If it's not very difficult - it would be great to use read_stream infrastructure. See btvacuumscan() in nbtree.c
callingread_stream_begin_relation() for example. We cannot use it in logical scans in B-tree\GiST\GIN, but maybe BRIN
cantake some advantage of this new shiny technology. 

+        state->revmapbuf = ReadBufferExtended(idxrel, MAIN_FORKNUM, state->revmapBlk, RBM_NORMAL,
+                                              state->checkstrategy);
+        LockBuffer(state->revmapbuf, BUFFER_LOCK_SHARE);
// usage of state->revmapbuf
+        LockBuffer(state->revmapbuf, BUFFER_LOCK_UNLOCK);
// more usage of state->revmapbuf
+        ReleaseBuffer(state->revmapbuf);

I hope you know what you are doing. BRIN concurrency is not known to me at all.

That's all for first pass through patches. Thanks for working on it!

Best regards, Andrey Borodin.

Re: amcheck support for BRIN indexes

From

Álvaro Herrera

Date:

06 July, 22:49:54

On 2025-Jul-06, Arseniy Mukhin wrote:

> Sorry, forget to run a full test run with the new patch version. Some
> tests were unhappy with the new unknown support function. Here the new
> version with the fix.

Hello, I think this patch is probably a good idea.  I don't think it
makes sense to introduce a bunch of code in 0003 only to rewrite it
completely in 0005.  I would ask that you re-split your WITHIN_RANGE
(0004) to appear before the amcheck code, and then write the amcheck
code using that new functionality.

>  /*
>   * Return a tuple descriptor used for on-disk storage of BRIN tuples.
>   */
> -static TupleDesc
> +TupleDesc
>  brtuple_disk_tupdesc(BrinDesc *brdesc)

I think we should give this function a better name if it's going to be
exported.  How about brin_tuple_tupdesc?  (in brin_tuple.h we
seem to distinguish "brin tuples" which are the stored ones, from "brin
mem tuples" which are the ones to be used in memory.)

I didn't read the other patches.

Thanks

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"I'm impressed how quickly you are fixing this obscure issue. I came from 
MS SQL and it would be hard for me to put into words how much of a better job
you all are doing on [PostgreSQL]."
 Steve Midgley, http://archives.postgresql.org/pgsql-sql/2008-08/msg00000.php

Re: amcheck support for BRIN indexes

From

Arseniy Mukhin

Date:

07 July, 14:06:24

On Sun, Jul 6, 2025 at 10:49 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
>
> On 2025-Jul-06, Arseniy Mukhin wrote:
>
> > Sorry, forget to run a full test run with the new patch version. Some
> > tests were unhappy with the new unknown support function. Here the new
> > version with the fix.
>
> Hello, I think this patch is probably a good idea.  I don't think it
> makes sense to introduce a bunch of code in 0003 only to rewrite it
> completely in 0005.  I would ask that you re-split your WITHIN_RANGE
> (0004) to appear before the amcheck code, and then write the amcheck
> code using that new functionality.

Hi, Álvaro!

Thank you for looking into this.

OK, we can easily revert to the version with consistent function if
needed, so let's get rid of it.

>
> >  /*
> >   * Return a tuple descriptor used for on-disk storage of BRIN tuples.
> >   */
> > -static TupleDesc
> > +TupleDesc
> >  brtuple_disk_tupdesc(BrinDesc *brdesc)
>
> I think we should give this function a better name if it's going to be
> exported.  How about brin_tuple_tupdesc?  (in brin_tuple.h we
> seem to distinguish "brin tuples" which are the stored ones, from "brin
> mem tuples" which are the ones to be used in memory.)
>

'brin_tuple_tupdesc' sounds good to me. Done.


So here is a new version. 0001, 0002 - index structure check. 0003,
0004 - all heap indexed using WITHIN_RANGE approach.

Thank you!


Best regards,
Arseniy Mukhin