Thread: amcheck support for BRIN indexes
Hi, It's not for v18, just wanted to share with the community and register it in the upcoming Commitfest 2025-07. Here is the patch with amcheck support for BRIN indexes. Patch uses amcheck common infrastructure that was introduced in [1]. It works and I deleted all the code that I copied from btree check at first. Great. During the check we hold ShareUpdateExclusiveLock, so we don't block regular reads/inserts/updates and the same time range summarizations/desummarizations are impossible during the check which simplifies check logic. While checking we do ereport(ERROR) on the first issue, the same way as btree, gin checks do. There are two parts: First part is 'index structure check': 1) Meta page check 2) Revmap check. Walk revmap and check every valid revmap item points to the index tuple with the expected range blkno, and index tuple is consistent with the tuple description. Also it's not documented, but known from the brin code that for every empty range we should have allnulls = true, hasnulls = false. So this is also checked here. 3) Regular pages check. Walk regular pages and check that every index tuple has a corresponding revmap item that points to it. We don't check index tuple structures here, because it was already done in 2 steps. Regular pages check is optional. Orphan index tuple errors in this check doesn't necessary mean that index is corrupted, but AFAIS brin is not supposed to have such orphan index tuples now, so if we encounter one than probably something wrong with the index. Second part is 'all consistent check': We check all heap tuples are consistent with the index. It's the most expensive part and it's optional. Here we call consistent functions for every heap tuple. Also here we check that fields like 'has_nulls', 'all_nulls', 'empty_range' are consistent with what we see in the heap. There are two patch files: 0001-brin-refactoring.patch It's just two tiny changes in the brin code. 1) For index tuple structure check we need to know how on-disk index tuples look like. Function that lets you get it 'brtuple_disk_tupdesc' is internal. This patch makes it extern. 2) Create macros for BRIN_MAX_PAGES_PER_RANGE. For the meta page check. 0002-amechek-brin-support.patch It's check functionality itself + regression tests + amcheck extension updates. Some open questions: 1) How to get the correct strategy number for the scan key in "all heap consistent" check. The consistent function wants a scan key, and to generate it we have to pick a strategy number. We can't just use the same strategy number for all indexes because its meaning differs from opclass to opclass (for example equal strategy number is 1 in Bloom and 3 in min_max). We also can't pick an operator and use it for every check, because opclasses don't have any requirements about what operators they should support. The solution was to let user to define check operator (parameter consistent_operator_names). It's an array as we can have a multicolumn index. We can use '=' as default check operator, because AFAIS '=' operator is supported by all core opclasses except 'box_inclusion_ops', and IMHO it's the most obvious candidate for such a check. So if param 'consistent_operator_names' is an empty array (param default value), then for all attributes we use operator '='. In most cases operator '=' does the job and you don't need to worry about consistent_operator_names param. Not sure about it, what do you think? 2) The current implementation of "all heap consistent" uses the scan of the entire heap. So if we have a lot of unsummarized ranges, we do a lot of wasted work because we can't use the tuples that belong to the unsummarized ranges. Instead of having one scan for the entire heap, we can walk the revmap, take only the summarized ranges, and scan only the pages that belong to those ranges. So we have one scan per range. This approach helps us avoid touching those heap tuples that we can't use for index checks. But I'm not sure if we should to worry about that because every autovacuum summarizes all the unsummarized ranges, so don't expect a lot of unsummarized ranges on average. TODO list: 1) add TAP tests 2) update SGML docs for amcheck (think it's better to do after patch is reviewed and more or less finalized) 3) pg_amcheck integration Big thanks to Tomas Vondra for the first patch idea and initial review. [1] https://www.postgresql.org/message-id/45AC9B0A-2B45-40EE-B08F-BDCF5739D1E1%40yandex-team.ru Best regards, Arseniy Mukhin
Attachment
Hi, Here is a new version. TAP tests were added. Tried to reproduce more or less every violation. I don't include 2 tests where disk representation ItemIdData needs to be changed: it works locally, but I don't think these tests are portable. While writing tests some minor issues were found and fixed. Also ci compiler warnings were fixed. Best regards, Arseniy Mukhin
Attachment
On 6/8/25 14:39, Arseniy Mukhin wrote: > Hi, > > Here is a new version. > > TAP tests were added. Tried to reproduce more or less every violation. > I don't include 2 tests where disk representation ItemIdData needs to > be changed: it works locally, but I don't think these tests are > portable. While writing tests some minor issues were found and fixed. > Also ci compiler warnings were fixed. > Thanks. I've added myself as a reviewer, so that I don't forget about this for the next CF. regards -- Tomas Vondra
Hi! Nice feature! > On 8 Jun 2025, at 17:39, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote: > > <v2-0001-brin-refactoring.patch> +#define BRIN_MAX_PAGES_PER_RANGE 131072 I'd personally prefer some words on where does this limit come from. I'm not a BRIN pro, just curious. Or, at least, maybe we can use a form 128 * 1024, if it's just a sane limit. > On 8 Jun 2025, at 17:39, Arseniy Mukhin <arseniy.mukhin.dev@gmail.com> wrote: > <v2-0002-amcheck-brin-support.patch> A bit more detailed commit message would be very useful. The test coverage is very decent. The number of possible corruptions in tests is impressive. I don't really have an experiencewith BRIN to say how different they are, but I want to ask if you are sure that these types of corruption willbe portable across big-endian machines and such stuff. Copyright year in all new files should be 2025. Some documentation about brin_index_check() would be handy, especially about its parameters. Perhaps, somewhere near gin_index_check()in amcheck.sgml. brin_check_ereport() reports ERRCODE_DATA_CORRUPTED. We should distinguish between ERRCODE_INDEX_CORRUPTED which is a stormbringerand ERRCODE_DATA_CORRUPTED which is an actual disaster. If it's not very difficult - it would be great to use read_stream infrastructure. See btvacuumscan() in nbtree.c callingread_stream_begin_relation() for example. We cannot use it in logical scans in B-tree\GiST\GIN, but maybe BRIN cantake some advantage of this new shiny technology. + state->revmapbuf = ReadBufferExtended(idxrel, MAIN_FORKNUM, state->revmapBlk, RBM_NORMAL, + state->checkstrategy); + LockBuffer(state->revmapbuf, BUFFER_LOCK_SHARE); // usage of state->revmapbuf + LockBuffer(state->revmapbuf, BUFFER_LOCK_UNLOCK); // more usage of state->revmapbuf + ReleaseBuffer(state->revmapbuf); I hope you know what you are doing. BRIN concurrency is not known to me at all. That's all for first pass through patches. Thanks for working on it! Best regards, Andrey Borodin.
On 2025-Jul-06, Arseniy Mukhin wrote: > Sorry, forget to run a full test run with the new patch version. Some > tests were unhappy with the new unknown support function. Here the new > version with the fix. Hello, I think this patch is probably a good idea. I don't think it makes sense to introduce a bunch of code in 0003 only to rewrite it completely in 0005. I would ask that you re-split your WITHIN_RANGE (0004) to appear before the amcheck code, and then write the amcheck code using that new functionality. > /* > * Return a tuple descriptor used for on-disk storage of BRIN tuples. > */ > -static TupleDesc > +TupleDesc > brtuple_disk_tupdesc(BrinDesc *brdesc) I think we should give this function a better name if it's going to be exported. How about brin_tuple_tupdesc? (in brin_tuple.h we seem to distinguish "brin tuples" which are the stored ones, from "brin mem tuples" which are the ones to be used in memory.) I didn't read the other patches. Thanks -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ "I'm impressed how quickly you are fixing this obscure issue. I came from MS SQL and it would be hard for me to put into words how much of a better job you all are doing on [PostgreSQL]." Steve Midgley, http://archives.postgresql.org/pgsql-sql/2008-08/msg00000.php
On Sun, Jul 6, 2025 at 10:49 PM Álvaro Herrera <alvherre@kurilemu.de> wrote: > > On 2025-Jul-06, Arseniy Mukhin wrote: > > > Sorry, forget to run a full test run with the new patch version. Some > > tests were unhappy with the new unknown support function. Here the new > > version with the fix. > > Hello, I think this patch is probably a good idea. I don't think it > makes sense to introduce a bunch of code in 0003 only to rewrite it > completely in 0005. I would ask that you re-split your WITHIN_RANGE > (0004) to appear before the amcheck code, and then write the amcheck > code using that new functionality. Hi, Álvaro! Thank you for looking into this. OK, we can easily revert to the version with consistent function if needed, so let's get rid of it. > > > /* > > * Return a tuple descriptor used for on-disk storage of BRIN tuples. > > */ > > -static TupleDesc > > +TupleDesc > > brtuple_disk_tupdesc(BrinDesc *brdesc) > > I think we should give this function a better name if it's going to be > exported. How about brin_tuple_tupdesc? (in brin_tuple.h we > seem to distinguish "brin tuples" which are the stored ones, from "brin > mem tuples" which are the ones to be used in memory.) > 'brin_tuple_tupdesc' sounds good to me. Done. So here is a new version. 0001, 0002 - index structure check. 0003, 0004 - all heap indexed using WITHIN_RANGE approach. Thank you! Best regards, Arseniy Mukhin