Re: BUG #18630: Incorrect memory access inside ReindexIsProcessingIndex() on VACUUM - Mailing list pgsql-bugs

From Tender Wang
Subject Re: BUG #18630: Incorrect memory access inside ReindexIsProcessingIndex() on VACUUM
Date
Msg-id CAHewXNkKeE3i7aT_nysWphRHhwMnyG1r55Di=RCNtLa5dRSd4Q@mail.gmail.com
In response to BUG #18630: Incorrect memory access inside ReindexIsProcessingIndex() on VACUUM  (PG Bug reporting form <noreply@postgresql.org>)
List pgsql-bugs


Tender Wang <tndrwang@gmail.com> wrote on Wed, Sep 25, 2024 at 17:28:


PG Bug reporting form <noreply@postgresql.org> wrote on Wed, Sep 25, 2024 at 13:35:
The following bug has been logged on the website:

Bug reference:      18630
Logged by:          Alexander Lakhin
Email address:      exclusion@gmail.com
PostgreSQL version: 17rc1
Operating system:   Ubuntu 22.04
Description:       

The following script:
psql -c "SELECT pg_sleep(5)" &

echo "
SET lock_timeout = '3s';
CREATE TABLE t(i int, t text);
REINDEX TABLE CONCURRENTLY t;
SELECT pg_sleep(3);
" | psql

psql -c "VACUUM (PROCESS_MAIN FALSE, FULL TRUE) t;"

produces:
WARNING:  cannot reindex invalid index "pg_toast.pg_toast_16384_index_ccnew"
on TOAST table, skipping

and then a Valgrind-detected error:
==00:00:00:10.727 3193327== Invalid read of size 4
==00:00:00:10.727 3193327==    at 0x5A6D80: list_member_oid (list.c:726)
==00:00:00:10.727 3193327==    by 0x33FE2F: ReindexIsProcessingIndex
(index.c:4083)
==00:00:00:10.727 3193327==    by 0x27B43F: systable_beginscan
(genam.c:396)
==00:00:00:10.727 3193327==    by 0x4CE8F9: vac_update_datfrozenxid
(vacuum.c:1723)
==00:00:00:10.727 3193327==    by 0x4CCFAB: vacuum (vacuum.c:691)
==00:00:00:10.727 3193327==    by 0x4CC910: ExecVacuum (vacuum.c:449)
==00:00:00:10.727 3193327==    by 0x7CE082: standard_ProcessUtility
(utility.c:859)
==00:00:00:10.727 3193327==    by 0x7CD61D: ProcessUtility (utility.c:523)
==00:00:00:10.727 3193327==    by 0x7CBE98: PortalRunUtility
(pquery.c:1158)
==00:00:00:10.727 3193327==    by 0x7CC10F: PortalRunMulti (pquery.c:1316)
==00:00:00:10.727 3193327==    by 0x7CB559: PortalRun (pquery.c:791)
==00:00:00:10.727 3193327==    by 0x7C3C7A: exec_simple_query
(postgres.c:1284)
==00:00:00:10.727 3193327==  Address 0x72f4878 is 7,496 bytes inside a
recently re-allocated block of size 8,192 alloc'd
==00:00:00:10.727 3193327==    at 0x4848899: malloc
(vg_replace_malloc.c:381)
==00:00:00:10.727 3193327==    by 0x9FDA95: AllocSetContextCreateInternal
(aset.c:444)
==00:00:00:10.727 3193327==    by 0x2E0BBC: AtStart_Memory (xact.c:1206)
==00:00:00:10.727 3193327==    by 0x2E1C56: StartTransaction (xact.c:2143)
==00:00:00:10.727 3193327==    by 0x2E2CA8: StartTransactionCommand
(xact.c:3050)
==00:00:00:10.727 3193327==    by 0x9DF444: InitPostgres (postinit.c:830)
==00:00:00:10.727 3193327==    by 0x7C8B3A: PostgresMain (postgres.c:4349)
==00:00:00:10.727 3193327==    by 0x7BF5AE: BackendMain
(backend_startup.c:107)
==00:00:00:10.727 3193327==    by 0x6D1E75: postmaster_child_launch
(launch_backend.c:274)
==00:00:00:10.727 3193327==    by 0x6D7CE8: BackendStartup
(postmaster.c:3420)
==00:00:00:10.727 3193327==    by 0x6D539A: ServerLoop (postmaster.c:1653)
==00:00:00:10.727 3193327==    by 0x6D4C92: PostmasterMain
(postmaster.c:1351)
==00:00:00:10.727 3193327==
...
2024-09-25 02:44:16.496 UTC|||66f378f6.30b9b3|LOG:  server process (PID
3193327) exited with exit code 1
2024-09-25 02:44:16.496 UTC|||66f378f6.30b9b3|DETAIL:  Failed process was
running: VACUUM (PROCESS_MAIN FALSE, FULL TRUE) t;

or an assertion failure (when executed without Valgrind):
TRAP: failed Assert("IsOidList(list)"), File: "list.c", Line: 726, PID:
3213057

Reproduced on REL_16_STABLE (starting from 4211fbd84) .. master.


Thanks for reporting.  I can reproduce this issue.

The statement "REINDEX TABLE CONCURRENTLY t;" fails because of the lock timeout.
When we then run VACUUM as in this case and process the toast relation of table t, we get two index OIDs.
One of them, pg_toast.pg_toast_16384_index_ccnew, is invalid because the REINDEX failed.

Currently we only report a warning in reindex_relation(), so pg_toast.pg_toast_16384_index_ccnew
stays on the pendingReindexedIndexes list. After the toast relation of table t is finished, the transaction commits
and the memory behind pendingReindexedIndexes is reset, but the list itself is not set to NIL. The next call to ReindexIsProcessingIndex() then reads the stale list and triggers the assertion failure.

I think we can remove the invalid index OID from pendingReindexedIndexes instead of only reporting the warning.
I tried this approach and the assertion failure no longer occurs. See the attached patch.
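
To make the idea concrete, here is a minimal standalone sketch of the bookkeeping described above. It is a toy model and not PostgreSQL code: the fixed-size array, the toy_* function names, the ToyIndex struct and the remove_skipped switch are all assumptions made for illustration. It only shows that an index skipped as invalid stays on the pending list unless it is removed explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int Oid;

#define MAX_PENDING 8

/* Toy stand-in for a backend-local list of indexes awaiting rebuild. */
static Oid pending[MAX_PENDING];
static size_t npending = 0;

/* Replace the pending list with the given OIDs. */
void toy_set_pending(const Oid *oids, size_t n)
{
    assert(n <= MAX_PENDING);
    for (size_t i = 0; i < n; i++)
        pending[i] = oids[i];
    npending = n;
}

/* Is this OID still waiting on the pending list? */
bool toy_is_pending(Oid oid)
{
    for (size_t i = 0; i < npending; i++)
        if (pending[i] == oid)
            return true;
    return false;
}

/* Drop one OID from the pending list (order is not preserved). */
void toy_remove_pending(Oid oid)
{
    for (size_t i = 0; i < npending; i++)
    {
        if (pending[i] == oid)
        {
            pending[i] = pending[npending - 1];
            npending--;
            return;
        }
    }
}

/* Hypothetical descriptor: an index OID plus its validity flag. */
typedef struct ToyIndex
{
    Oid  oid;
    bool valid;
} ToyIndex;

/*
 * Model of one reindex pass: every index is marked pending, valid ones are
 * "rebuilt" (which removes them), invalid ones are skipped.  With
 * remove_skipped == false the skipped entry is left behind, which is the
 * situation described in this thread.  Returns the number of entries that
 * are still pending after the pass.
 */
size_t toy_reindex_relation(const ToyIndex *indexes, size_t n,
                            bool remove_skipped)
{
    Oid oids[MAX_PENDING];

    assert(n <= MAX_PENDING);
    for (size_t i = 0; i < n; i++)
        oids[i] = indexes[i].oid;
    toy_set_pending(oids, n);

    for (size_t i = 0; i < n; i++)
    {
        if (!indexes[i].valid)
        {
            /* Skipped (the real code emits a WARNING here). */
            if (remove_skipped)
                toy_remove_pending(indexes[i].oid);
            continue;
        }
        /* Rebuild done: the entry leaves the pending list. */
        toy_remove_pending(indexes[i].oid);
    }
    return npending;
}
```

Calling toy_reindex_relation() on one valid and one invalid index with remove_skipped set to false leaves one entry pending; with it set to true the list ends up empty, so nothing stale can outlive the pass.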


Adding an "if (flags & REINDEX_REL_SUPPRESS_INDEX_USE)" check before calling RemoveReindexPending() might look better.
Any thoughts?

--
Thanks,
Tender Wang
