REINDEX INDEX pg_catalog.pg_default_acl_role_nsp_obj_index stuck waiting for transaction from the future in PG 13.16 - Mailing list pgsql-bugs

From Marcin Barczyński
Subject REINDEX INDEX pg_catalog.pg_default_acl_role_nsp_obj_index stuck waiting for transaction from the future in PG 13.16
Date
Msg-id CAP3o3PdTOTBngAq7cBo6DymqjAv+apMZv22tykf41r8HntUwBQ@mail.gmail.com
Whole thread Raw
Responses Re: REINDEX INDEX pg_catalog.pg_default_acl_role_nsp_obj_index stuck waiting for transaction from the future in PG 13.16
List pgsql-bugs
Hello!

REINDEX INDEX pg_catalog.pg_default_acl_role_nsp_obj_index got stuck
waiting for a transaction from the future in PG 13.16. Below is some
additional data that I was able to gather.

Index size is small:

postgres=# SELECT
pg_size_pretty(pg_table_size('pg_default_acl_role_nsp_obj_index'));
 pg_size_pretty
----------------
 16 kB
(1 row)


pg_locks showed that all locks had been granted:

postgres=# select * from pg_locks;
   locktype    | database | relation | page | tuple | virtualxid |
transactionid | classid | objid | objsubid | virtualtransaction |  pid
 |        mode         | granted | fastpath

---------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+-------+---------------------+---------+----------
 relation      |    14172 |    12141 |      |       |            |
          |         |       |          | 4/158              | 87021 |
AccessShareLock     | t       | t
 virtualxid    |          |          |      |       | 4/158      |
          |         |       |          | 4/158              | 87021 |
ExclusiveLock       | t       | t
 virtualxid    |          |          |      |       | 3/28       |
          |         |       |          | 3/28               | 86635 |
ExclusiveLock       | t       | t
 transactionid |          |          |      |       |            |
      792 |         |       |          | 3/28               | 86635 |
ExclusiveLock       | t       | f
 relation      |    16385 |      826 |      |       |            |
          |         |       |          | 3/28               | 86635 |
ShareLock           | t       | f
 relation      |    16385 |      827 |      |       |            |
          |         |       |          | 3/28               | 86635 |
AccessExclusiveLock | t       | f
(6 rows)

Stack trace:

(gdb) bt
#0  0x00007f8790f04199 in __GI___select (nfds=nfds@entry=0,
readfds=readfds@entry=0x0, writefds=writefds@entry=0x0,
exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7fffd823be20)
    at ../sysdeps/unix/sysv/linux/select.c:69
#1  0x000000000082d3d9 in pg_usleep (microsec=1000) at ../port/pgsleep.c:56
#2  pg_usleep (microsec=1000) at ../port/pgsleep.c:47
#3  XactLockTableWait (xid=2004, rel=<optimized out>, ctid=<optimized
out>, oper=XLTW_InsertIndexUnique) at storage/lmgr/lmgr.c:690
#4  0x000000000054b44b in heapam_index_build_range_scan
(heapRelation=0x7f86ca209a38, indexRelation=0x7f86ca20a5f8,
indexInfo=0x10cd008, allow_sync=<optimized out>, anyvisible=false,
progress=true,
    start_blockno=0, numblocks=4294967295, callback=0x568530
<_bt_build_callback>, callback_state=0x7fffd823c3b0, scan=0x10ca408)
at access/heap/heapam_handler.c:1552
#5  0x000000000056a3f8 in table_index_build_scan (scan=0x0,
callback_state=0x7fffd823c3b0, callback=0x568530 <_bt_build_callback>,
progress=true, allow_sync=true, index_info=0x10cd008,
    index_rel=0x7f86ca20a5f8, table_rel=0x7f86ca209a38) at
access/brin/../../../../src/include/access/tableam.h:1536
#6  _bt_spools_heapscan (indexInfo=0x10cd008,
buildstate=0x7fffd823c3b0, index=..., heap=0x7f86ca209a38) at
access/nbtree/nbtsort.c:478
#7  btbuild (heap=0x7f86ca209a38, index=0x7f86ca20a5f8,
indexInfo=0x10cd008) at access/nbtree/nbtsort.c:326
#8  0x00000000005ca5bc in index_build (heapRelation=0x7f86ca209a38,
indexRelation=0x7f86ca20a5f8, indexInfo=0x10cd008,
isreindex=<optimized out>, parallel=<optimized out>) at
catalog/index.c:3012
#9  0x00000000005cafbc in reindex_index (indexId=827,
skip_constraint_checks=<optimized out>, persistence=<optimized out>,
options=2) at catalog/index.c:3654
#10 0x00000000006666ed in ReindexIndex (indexRelation=<optimized out>,
options=0, concurrent=<optimized out>) at commands/indexcmds.c:2589
#11 0x0000000000846187 in standard_ProcessUtility (pstmt=<optimized
out>, queryString=0x1084758 "REINDEX INDEX
pg_catalog.pg_default_acl_role_nsp_obj_index",
context=PROCESS_UTILITY_TOPLEVEL, params=0x0,
    queryEnv=0x0, dest=0x10855e0, qc=0x7fffd823ccf0) at tcop/utility.c:946
#12 0x00007f8790922c1d in pgss_ProcessUtility () from
/usr/pgsql-13/lib/pg_stat_statements.so
#13 0x0000000000846a39 in ProcessUtility (qc=0x7fffd823ccf0,
dest=0x10855e0, queryEnv=<optimized out>, params=<optimized out>,
context=PROCESS_UTILITY_TOPLEVEL, queryString=<optimized out>,
pstmt=0x1085320)
    at tcop/utility.c:520
#14 PortalRunUtility (portal=portal@entry=0x115b748,
pstmt=pstmt@entry=0x1085320, isTopLevel=isTopLevel@entry=true,
setHoldSnapshot=setHoldSnapshot@entry=false, dest=0x10855e0,
qc=0x7fffd823ccf0)
    at tcop/pquery.c:1153
#15 0x0000000000846bac in PortalRunMulti
(portal=portal@entry=0x115b748, isTopLevel=isTopLevel@entry=true,
setHoldSnapshot=setHoldSnapshot@entry=false,
dest=dest@entry=0x10855e0,
    altdest=altdest@entry=0x10855e0, qc=qc@entry=0x7fffd823ccf0) at
tcop/pquery.c:1316
#16 0x0000000000847060 in PortalRun (portal=0x115b748,
count=9223372036854775807, isTopLevel=<optimized out>,
run_once=<optimized out>, dest=0x10855e0, altdest=0x10855e0,
qc=0x7fffd823ccf0) at tcop/pquery.c:788
#17 0x0000000000840083 in exec_simple_query (query_string=0x1084758
"REINDEX INDEX  pg_catalog.pg_default_acl_role_nsp_obj_index") at
tcop/postgres.c:1241
#18 0x00000000008426fe in PostgresMain (argc=<optimized out>,
argv=<optimized out>, dbname=<optimized out>, username=<optimized
out>) at tcop/postgres.c:4426
#19 0x00000000007c6eaf in BackendRun (port=0x10cdb00) at
postmaster/postmaster.c:4560
#20 BackendStartup (port=0x10cdb00) at postmaster/postmaster.c:4244
#21 ServerLoop () at postmaster/postmaster.c:1742
#22 0x00000000007c7d27 in PostmasterMain (argc=3, argv=<optimized
out>) at postmaster/postmaster.c:1415
#23 0x00000000004fad81 in main (argc=3, argv=0x103f7b0) at main/main.c:210

The process waited for txid 2004 to complete.
But according to pg latest committed xid was:

(gdb) call ShmemVariableCache->latestCompletedXid
$5 = 791

And this was consistent with the fact that REINDEX was in transaction 792:

 16385 | redacted | 86635 |            |    16384 | redacted |
redacted    | 192.168.10.163 |                 |       39826 |
2024-08-14 08:23:39.689327-04 | 2024-08-14 08:23:39.712004-04 |
2024-08-1
4 08:23:39.712004-04 | 2024-08-14 08:23:39.712005-04 |
|                     | active |         792 |          792 | REINDEX
INDEX  pg_catalog.pg_default_acl_role_nsp_obj_index | client backend

Based on the following line in the source code:

xwait = HeapTupleHeaderGetUpdateXid(heapTuple->t_data);

I managed to figure out that the tuple was in a table with oid 826:

(gdb) print *heapTuple
$4 = {t_len = 82, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 0},
ip_posid = 13}, t_tableOid = 826, t_data = 0x7f2c0ab12eb8}

which is:

postgres=# select relname from pg_class where oid = 826;
    relname
----------------
 pg_default_acl
(1 row)


And here is the content of `pg_default_acl`:

postgres=# SELECT * FROM heap_page_items(get_raw_page('pg_default_acl', 0));
 lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid
| t_infomask2 | t_infomask | t_hoff | t_bits | t_oid |
                            t_data


----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------+-------+-------------------------------------------------------------------------------
-----------------
  1 |   8120 |        1 |     70 |    500 |   1941 |        0 | (0,3)
|       16389 |       1282 |     24 |        |       |
\xc94000000a0000009808000072430100000000000000090400000100000001000000c8400000
0a00000002000000
  2 |   8048 |        1 |     70 |    500 |   1941 |        1 | (0,4)
|       16389 |       1282 |     24 |        |       |
\xca4000000a0000000c40000072430100000000000000090400000100000001000000c8400000
0a00000002000000
  3 |   7976 |        1 |     70 |   1941 |   2019 |        0 | (0,5)
|       49157 |       8450 |     24 |        |       |
\xc94000000a0000009808000072430100000000000000090400000100000001000000c8400000
0a00000002000000
  4 |   7904 |        1 |     70 |   1941 |   2019 |        1 | (0,6)
|       49157 |       8450 |     24 |        |       |
\xca4000000a0000000c40000072430100000000000000090400000100000001000000c8400000
0a00000002000000
  5 |   7832 |        1 |     70 |   2019 |      0 |        0 | (0,5)
|       32773 |      10242 |     24 |        |       |
\xc94000000a0000009808000072430100000000000000090400000100000001000000c8400000
0a00000002000000
  6 |   7760 |        1 |     70 |   2019 |      0 |        1 | (0,6)
|       32773 |      10242 |     24 |        |       |
\xca4000000a0000000c40000072430100000000000000090400000100000001000000c8400000
0a00000002000000
(6 rows)


The instance is running in a virtual machine, so I can gather more
data if needed. If it makes a difference, the instance has logical
replication set up. Stopping the replication did not help.



postgres=# select version();
                                                  version
-----------------------------------------------------------------------------------------------------------
 PostgreSQL 13.16 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.4.1
20231218 (Red Hat 11.4.1-3), 64-bit
(1 row)

--
Best regards,
Marcin Barczyński



pgsql-bugs by date:

Previous
From: Fire Emerald
Date:
Subject: Re: TLS session tickets disabled?
Next
From: Tom Lane
Date:
Subject: Re: TLS session tickets disabled?