Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements - Mailing list pgsql-hackers
| From | Matthias van de Meent |
|---|---|
| Subject | Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements |
| Date | |
| Msg-id | CAEze2WigOH5Cyxo-oPjHebkfrohbe1H7sESKcDMi92vNUingWA@mail.gmail.com Whole thread Raw |
| In response to | Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements (Antonin Houska <ah@cybertec.at>) |
| List | pgsql-hackers |
On Thu, 27 Nov 2025 at 17:56, Antonin Houska <ah@cybertec.at> wrote:
>
> Michail Nikolaev <michail.nikolaev@gmail.com> wrote:
>
> > I think about revisiting (1) ({CREATE INDEX, REINDEX} CONCURRENTLY
> > improvements) in some lighter way.
>
> I haven't read the whole thread yet, but the effort to minimize the impact of
> C/RIC on VACUUM seems to prevail. Following is one more proposal. The core
> idea is that C/RIC should avoid indexing dead tuples, however snapshot is not
> necessary to distinguish dead tuple from a live one. And w/o snapshot, the
> backend executing C/RIC does not restrict VACUUM on other tables.
>
> Concurrent (re)build of unique index appears to be another topic of this
> thread, but I think this approach should handle the problem too. The workflow
> is:
>
> 1. Create an empty index.
>
> 2. Wait until all transactions are aware of the index, so they take the new
> index into account when deciding on new HOT chains. (This is already
> implemented.)
>
> 3. Set the 'indisready' flag so the index is ready for insertions.
>
> 4. While other transactions can insert their tuples into the index now,
> process the table one page at a time this way:
>
> 4.1 Acquire (shared) content lock on the buffer.
>
> 4.3 Collect the root tuples of HOT chains - these and only these need to be
> inserted into the index.
>
> 4.4 Unlock the buffer.
> 5. Once the whole table is processed, insert the collected tuples into the
> index.
>
> To avoid insertions of tuples that concurrent transactions have just
> inserted, we'd need something like index.c:validate_index() (i.e. insert
> into the index only the tuples that it does not contain yet), but w/o
> snapshot because we already have the heap tuples collected.
>
> Also it'd make sense to wait for completion of all the transactions that
> currently have the table locked for INSERT/UPDATE: some of these might have
> inserted their tuples into the heap, but not yet into the index. If we
> included some of those tuples into our collection and insert them into the
> index first, the other transactions could end up with ERROR when inserting
> those tuples again.
>
> 6. Set the 'indisvalid' flag so that the index can be used by queries.
>
> Note on pruning: As we only deal with the root tuples of HOT chains (4.3),
> page pruning triggered by queries (heap_page_prune_opt) should not be
> disruptive. Actually C/RIC can do the pruning itself it it appears to be
> useful. For example, if whole HOT chain should be considered DEAD by the next
> VACUUM, pruning is likely (depending on the OldestXid) to remove it so that we
> do not insert TID of the root tuple into the index unnecessarily.
[...]
> Of course, I could have missed some important point, so please explain why
> this concept is broken :-) Or let me know if something needs to be explained
> more in detail. Thanks.
1. When do you select and insert tuples that aren't part of a hot
chain into the index, i.e. tuples that were never updated after they
got inserted into the table? Or is every tuple "part of a hot chain"
even if the tuple wasn't ever updated?
2. HOT chains can be created while the index wasn't yet present, and
thus the indexed attributes of the root tuples can be different from
the most current tuple of a chain. If you only gather root tuples, we
could index incorrect data for that HOT chain. The correct approach
here is to index only the visible tuples, as those won't have been
updated in a non-HOT manner without all indexed attributes being
unchanged.
3. Having the index marked indisready before it contains any data is
going to slow down the indexing process significantly:
a. The main index build now must go through shared memory and buffer
locking, instead of being able to use backend-local memory
b. The tuple-wise insertion path (IndexAmRoutine->aminsert) can have a
significantly higher overhead than the bulk insertion logic in
ambuild(); in metrics of WAL, pages accessed (IO), and CPU cycles
spent.
So, I don't think moving away from ambuild() as basis for initially
building the index this is such a great idea.
(However, I do think that having an _option_ to build the index using
ambuildempty()+aminsert() instead of ambuild() might be useful, if
only to more easily compare "natural grown" indexes vs freshly built
ones, but that's completely orthogonal to CIC snapshotting
improvements.)
Kind regards,
Matthias van de Meent
Databricks (https://www.databricks.com)
pgsql-hackers by date: