Resetting snapshots during the first phase of [CREATE |RE]INDEX CONCURRENTLY - Mailing list pgsql-hackers

From Mihail Nikalayeu
Subject Resetting snapshots during the first phase of [CREATE |RE]INDEX CONCURRENTLY
Date
Msg-id CADzfLwWAiimwngq38x8Ji26AtYEmSqnFD7aa4kVGF4ObDswHMA@mail.gmail.com
Responses Re: Resetting snapshots during the first phase of [CREATE |RE]INDEX CONCURRENTLY
List pgsql-hackers
Hello, everyone! 

(CC'ing people involved, based on the history section and previous thread activity)

This thread extracts part of https://commitfest.postgresql.org/patch/4971/ (STIR) into a separate commitfest entry, as agreed in [0].

HISTORY

In 2021 Álvaro proposed [1] and committed [2] a feature: VACUUM ignores snapshots involved in concurrent indexing operations. It was a great feature in PG14.

But in 2022 a bug causing tuples to go missing from indexes was detected and, a little later, explained by Andres [3]. As a result, the feature was reverted [4].

This patch set brings back part of it (the first phase of CIC) using a snapshot-resetting technique.
For the second phase, another solution is available in the entry mentioned above.

STRUCTURE

It is based on Matthias' idea [5]: simply reset snapshots every so often during a concurrent index build. This works only during the first scan (with such an approach we would miss some tuples during the validation scan).
The logic is simple: since the index built by the first scan already misses a lot of tuples, missing a few more does no harm - the validation phase is going to fix it anyway. Of course, it is not so simple in the case of unique indexes, but it is still possible.
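The two-phase reasoning above can be shown with a toy model (plain Python, not backend code; all names here are invented for illustration): tuples skipped because the first scan's snapshot was swapped mid-scan are picked up later by the validation phase.

```python
def build_index_first_scan(heap_pages, is_visible, reset_snapshot):
    """Toy model of the first CIC scan: the snapshot is replaced between
    pages, so some concurrently inserted tuples may be skipped."""
    index = set()
    snapshot = reset_snapshot()
    for page in heap_pages:
        for tid in page:
            if is_visible(tid, snapshot):
                index.add(tid)
        snapshot = reset_snapshot()  # "between" pages: drop the old snapshot
    return index

def validate_index(heap_pages, index):
    """Toy model of the validation phase: insert every tuple the first
    scan missed, yielding a complete index."""
    complete = set(index)
    for page in heap_pages:
        for tid in page:
            complete.add(tid)
    return complete

# Tuples 1..6 on three pages; tuple 4 happens to be invisible to the
# snapshot active while its page is scanned.
pages = [[1, 2], [3, 4], [5, 6]]
counter = {"n": 0}

def next_snapshot():
    counter["n"] += 1
    return counter["n"]

partial = build_index_first_scan(
    pages, lambda tid, snap: not (tid == 4 and snap % 2 == 0), next_snapshot)
full = validate_index(pages, partial)
assert 4 not in partial       # first scan missed it...
assert full == {1, 2, 3, 4, 5, 6}  # ...validation fixed it
```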

Commits are:

- Add stress tests for concurrent index builds

This is a set of stress tests to ensure concurrent index operations work correctly.
It easily detects the bug from 2022 (and various other issues I encountered during development, some of which were in master and have already been fixed - [6] and [7]).

- Reset snapshots periodically in non-unique non-parallel concurrent index builds

Apply the technique to the simplest case: non-unique, non-parallel builds. The snapshot is changed "between" pages.
One place to worry about: to make sure xmin advances, we need to call InvalidateCatalogSnapshot during each snapshot switch.
Theoretically this may cause issues, but the table is locked against changes during the process; commit [2] (which ignored the xmin of the
CIC backend) did essentially the same thing. Another, "cleaner" option would be to introduce a separate catalog snapshot horizon (one more field next to xmin, used only for the catalog snapshot), which seems to be a fairly straightforward change.
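Why the resets help the horizon at all can be seen in a toy model (invented names, not the backend's actual snapshot bookkeeping): a scan holding a single snapshot pins the oldest-xmin horizon for its whole duration, while a scan that re-takes its snapshot between pages lets the horizon advance, so VACUUM can clean up behind it.

```python
class ToySnapshotManager:
    """Toy model: the global horizon is the oldest xmin still registered."""
    def __init__(self):
        self.next_xid = 1
        self.registered = {}

    def take_snapshot(self, backend):
        # Registering replaces any previous snapshot of this backend.
        self.registered[backend] = self.next_xid

    def advance_xid(self, n=1):
        self.next_xid += n  # other transactions commit meanwhile

    def horizon(self):
        return min(self.registered.values(), default=self.next_xid)

mgr = ToySnapshotManager()

# A long scan holding one snapshot: the horizon stays pinned at xid 1.
mgr.take_snapshot("cic")
mgr.advance_xid(100)
pinned = mgr.horizon()        # still 1 - nothing newer can be vacuumed

# The same scan, re-taking its snapshot "between pages": horizon advances.
for _ in range(3):
    mgr.advance_xid(10)
    mgr.take_snapshot("cic")
advanced = mgr.horizon()

assert pinned == 1
assert advanced > pinned
```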

- Support snapshot resets in parallel concurrent index builds

Extend the technique to parallel builds. This is mostly about ensuring workers have restored the initial snapshot from the leader before the leader resets it.
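The ordering constraint can be sketched with a barrier (a toy threading model, not the actual parallel-worker infrastructure; all names are invented): every worker must finish restoring the leader's serialized snapshot before the leader is allowed to reset it.

```python
import threading

NWORKERS = 3
restored = threading.Barrier(NWORKERS + 1)  # all workers plus the leader
events = []
lock = threading.Lock()

def worker(i):
    # Restore the serialized snapshot handed over by the leader...
    with lock:
        events.append(("restore", i))
    restored.wait()  # ...then signal the leader it may proceed

def leader():
    restored.wait()  # block until every worker has restored the snapshot
    with lock:
        events.append(("reset", None))  # only now is it safe to reset it

threads = [threading.Thread(target=worker, args=(i,)) for i in range(NWORKERS)]
threads.append(threading.Thread(target=leader))
for t in threads:
    t.start()
for t in threads:
    t.join()

# The reset happens strictly after all three restores.
assert events[-1] == ("reset", None)
assert sum(1 for e in events if e[0] == "restore") == NWORKERS
```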

- Support snapshot resets in concurrent builds of unique indexes

The trickiest commit in the patch set: applying the technique to unique indexes. Changing snapshots may cause issues with the validation of unique constraints. Currently that validation is done while sorting the tuples, but it does not work for tuples read with different snapshots (some of them are already dead).

To deal with it:
- if we see two identical tuples during the tuplesort, ignore the conflict if one of them is dead according to SnapshotSelf, but fail if both are alive. This part is not strictly required; it only provides fail-fast behavior and may be removed or limited in the resources it spends.
- to provide the actual guarantee: during _bt_load, compare each inserted index value with the previously inserted one. If they are equal, make sure only a single SnapshotSelf-alive tuple exists in the whole equal "group" (which in general may contain more than two tuples).
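The _bt_load-time guarantee can be sketched as follows (plain Python with invented names; SnapshotSelf-liveness is reduced to a boolean flag): walk the sorted stream, compare each key with the previous one, and allow at most one alive tuple per run of equal keys.

```python
def check_unique_groups(sorted_tuples):
    """sorted_tuples: list of (key, alive) pairs in index sort order,
    where 'alive' stands in for visibility under SnapshotSelf.

    Counts alive tuples per equal-key group and fails as soon as a
    second alive tuple with the same key is seen."""
    prev_key = object()        # sentinel distinct from any real key
    alive_in_group = 0
    for key, alive in sorted_tuples:
        if key != prev_key:    # new group starts: reset the counter
            prev_key = key
            alive_in_group = 0
        if alive:
            alive_in_group += 1
            if alive_in_group > 1:
                raise ValueError(f"duplicate key {key!r}")

# Dead duplicates are fine: only one alive tuple per group.
check_unique_groups([(1, True), (1, False), (1, False), (2, True)])

# Two alive tuples with the same key must fail.
try:
    check_unique_groups([(1, True), (1, False), (1, True)])
    violated = False
except ValueError:
    violated = True
assert violated
```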

Theoretically this may affect the performance of _bt_load because of the _bt_keep_natts(_fast) call for each tuple, but I was unable to measure any significant difference.

Best regards,
Mikhail.

[0]: https://www.postgresql.org/message-id/flat/CAEze2Wg6d2M8hop4uwdQXeH-YkOmEHyqp83%2BaE7vEzKdmu7w-A%40mail.gmail.com#b5dbefccc537167bc8c1efe0ad063491
[1]: https://www.postgresql.org/message-id/flat/20210115142926.GA19300%40alvherre.pgsql#0988173cb0cf4b8eb710a6cdaa88fcac
[2]: https://github.com/postgres/postgres/commit/d9d076222f5b94a85e0e318339cfc44b8f26022d
[3]: https://www.postgresql.org/message-id/flat/20220524190133.j6ee7zh4f5edt5je%40alap3.anarazel.de#1781408f40034c414ad6738140c118ef
[4]: https://github.com/postgres/postgres/commit/e28bb885196916b0a3d898ae4f2be0e38108d81b
[5]: https://www.postgresql.org/message-id/flat/CAEze2WgW6pj48xJhG_YLUE1QS%2Bn9Yv0AZQwaWeb-r%2BX%3DHAxU_g%40mail.gmail.com#b3809c158de4481bb1b29894aaa63fae

