Dear Mikhail,
Thanks for the comments!
> But as far as I know, to solve this problem, we need to wait for slot.xmin during the [0]
> (WaitForOlderSnapshots) while creating index concurrently.
WaitForOlderSnapshots() waits for other transactions that can access tuples older
than the specified (= current) transaction, right? I think it does not solve our issue.
Assume that the same workload [1] is executed, that slot.xmin on node2 is arbitrarily
older than the SQL statements noted there, and that WaitForOlderSnapshots(slot.xmin)
is added to ReindexRelationConcurrently(). In this case, no transaction older than
slot.xmin exists at step 5, so the REINDEX will finish immediately. The worker then
receives changes at step 7, so it is problematic if the worker uses the reindexed index.
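
To illustrate the point, here is a toy model, not PostgreSQL code. The Txn class,
the blockers() helper, and all xid values are hypothetical, and the comparison
ignores xid wraparound. It only sketches the idea that a wait of this kind blocks
on transactions that are running right now with an old enough snapshot, so it has
nothing to wait for when no such transaction exists:

```python
from dataclasses import dataclass


@dataclass
class Txn:
    """Hypothetical stand-in for a backend's transaction state."""
    xid: int
    snapshot_xmin: int
    running: bool = True


def blockers(txns, limit_xmin):
    """Return the transactions a snapshot-based wait would block on.

    Simplified assumption: only currently running transactions whose
    snapshot xmin is at or before limit_xmin are waited for.
    """
    return [t for t in txns if t.running and t.snapshot_xmin <= limit_xmin]


# Hypothetical values: slot.xmin is much older than anything still running.
slot_xmin = 100
running_at_reindex = [Txn(xid=205, snapshot_xmin=200)]

# Nothing qualifies, so the wait returns at once and REINDEX completes.
reindex_blockers = blockers(running_at_reindex, slot_xmin)
print(reindex_blockers)
```

In this model the wait list is empty, so nothing delays the REINDEX, while the
changes that the worker applies later are not represented by any running
transaction at that moment.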
From another point of view, this approach requires modifying the REINDEX code, but
we should avoid modifying other components as much as possible. This feature is
related to replication, so the changes should be confined to the replication subdir.
[1]:
https://www.postgresql.org/message-id/TYAPR01MB5692541820BCC365C69442FFF54F2%40TYAPR01MB5692.jpnprd01.prod.outlook.com
Best regards,
Hayato Kuroda
FUJITSU LIMITED