Re: [WIP]Vertical Clustered Index (columnar store extension) - take2 - Mailing list pgsql-hackers

From Japin Li
Subject Re: [WIP]Vertical Clustered Index (columnar store extension) - take2
Date
Msg-id ME0P300MB0445ED7C368B81944E578664B616A@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
Whole thread Raw
List pgsql-hackers
On Thu, 18 Sep 2025 at 15:07, Peter Smith <smithpb2250@gmail.com> wrote:
> Hi Timur,
>
> Thanks for your ongoing work for this patch.
>
> On Thu, Sep 18, 2025 at 1:15 AM Timur Magomedov
> <t.magomedov@postgrespro.ru> wrote:
> ...
>> I've found (using valgrind) some cases of reading random garbage after
>> allocated memory. Investigation showed this was caused by converting
>> some nodes to VciScanState* even if they have smaller size allocated
>> originally. So accessing VciScanState fields was accessing memory after
>> palloc'ed memory which could be used by any other allocation.
>>
>> I think converting to VciScanState* is only valid for nodes with tag
>> T_CustomScanState so here is a patch that adds assertions for this:
>> 0001-Assert-corrrect-node-tags-when-casting-to-VciScanSta.patch
>
> What exactly did Valgrind report? For example, you said the
> VciScanState points beyond allocated memory. Do you have any more
> clues, like where that happened? Did you discover where that (smaller
> than it should be) memory was allocated in the first place?
>
>>
>> VCI v23 passes the tests with this patch applied.
>
> OK. I am not 100% certain about the asserts, but since the existing
> VCI tests are passing, I have merged your patch as-is into v24-0002. I
> guess we will find out later if the bug below is due to an old code
> cast problem or a new code assert problem.
>
>>
>> There are queries that fail unfortunately. I've added one of them to
>> bugs.sql:
>> 0002-Reproducer-of-invalid-cast-to-VciScanState.patch
>> Node with tag 420 (T_NestLoopState) is cast to VciScanState* that fails
>> newly added assertion.
>>
>> I'm not sure where to look next to fix this. Looking forward for you
>> comments and ideas.
>
> OK. I ran with your 2nd patch applied and reproduced the core-dump
> below, where it tripped over one of your new Asserts at
> executor/vci_sort.c:89. I can see it is an unexpected value
> T_NestLoopState.
>
I'm curious about the assert here. Why is it that children must be CustomScan?
It seems that there is a JOIN node, and we do not support customizing the JOIN
operation, right?

> (gdb) bt 15
> #0  0x00007ff948aa62c7 in raise () from /lib64/libc.so.6
> #1  0x00007ff948aa79b8 in abort () from /lib64/libc.so.6
> #2  0x0000000000c07977 in ExceptionalCondition
> (conditionName=0x7ff940839fa8 "scanstate->vci.css.ss.ps.type ==
> T_CustomScanState", fileName=0x7ff940839f90 "executor/vci_sort.c",
>     lineNumber=89) at assert.c:66
> #3  0x00007ff9408084e6 in vci_sort_ExecCustomPlan (node=0x2a862f0) at
> executor/vci_sort.c:89
> #4  0x000000000079d5bd in ExecCustomScan (pstate=0x2a862f0) at nodeCustom.c:137
> #5  0x000000000077f693 in ExecProcNodeInstr (node=0x2a862f0) at
> execProcnode.c:486
> #6  0x000000000077f664 in ExecProcNodeFirst (node=0x2a862f0) at
> execProcnode.c:470
> #7  0x0000000000772b72 in ExecProcNode (node=0x2a862f0) at
> ../../../src/include/executor/executor.h:316
> #8  0x0000000000775774 in ExecutePlan (queryDesc=0x2a89100,
> operation=CMD_SELECT, sendTuples=true, numberTuples=0,
> direction=ForwardScanDirection, dest=0xe5b1a0 <donothingDR>)
>     at execMain.c:1697
> #9  0x000000000077317b in standard_ExecutorRun (queryDesc=0x2a89100,
> direction=ForwardScanDirection, count=0) at execMain.c:366
> #10 0x00007ff9407f9efd in vci_executor_run_routine
> (queryDesc=0x2a89100, direction=ForwardScanDirection, count=0) at
> executor/vci_executor.c:178
> #11 0x0000000000772ff5 in ExecutorRun (queryDesc=0x2a89100,
> direction=ForwardScanDirection, count=0) at execMain.c:301
> #12 0x00000000006b7f66 in ExplainOnePlan (plannedstmt=0x2a8a628,
> into=0x0, es=0x2a81388,
>     queryString=0x28b0fd0 "EXPLAIN (ANALYZE, COSTS FALSE, BUFFERS
> FALSE, TIMING FALSE, SUMMARY FALSE)\nSELECT *\n  FROM main m\n  JOIN
> secondary s\n\tON m.id = s.main_id\n WHERE s.val in (\n\t\tSELECT
> MAX(val)\n\t\t  FROM secondary s2\n\t\t W"..., params=0x0,
> queryEnv=0x0, planduration=0x7ffe51311320, bufusage=0x0,
> mem_counters=0x0) at explain.c:579
> #13 0x00000000006b799a in standard_ExplainOneQuery (query=0x2ac21f8,
> cursorOptions=2048, into=0x0, es=0x2a81388,
>     queryString=0x28b0fd0 "EXPLAIN (ANALYZE, COSTS FALSE, BUFFERS
> FALSE, TIMING FALSE, SUMMARY FALSE)\nSELECT *\n  FROM main m\n  JOIN
> secondary s\n\tON m.id = s.main_id\n WHERE s.val in (\n\t\tSELECT
> MAX(val)\n\t\t  FROM secondary s2\n\t\t W"..., params=0x0,
> queryEnv=0x0) at explain.c:372
> #14 0x00007ff9407f9ff3 in vci_explain_one_query_routine
> (queryDesc=0x2ac21f8, cursorOptions=2048, into=0x0, es=0x2a81388,
>     queryString=0x28b0fd0 "EXPLAIN (ANALYZE, COSTS FALSE, BUFFERS
> FALSE, TIMING FALSE, SUMMARY FALSE)\nSELECT *\n  FROM main m\n  JOIN
> secondary s\n\tON m.id = s.main_id\n WHERE s.val in (\n\t\tSELECT
> MAX(val)\n\t\t  FROM secondary s2\n\t\t W"..., params=0x0,
> queryEnv=0x0) at executor/vci_executor.c:224
> (More stack frames follow...)
> (gdb)
>
> I will keep investigating it...
>
> I have not included your test case in the v24* patches because I
> didn't want this known test failure to mask out any other unknown test
> problems that might occur.

--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Proposal: Conflict log history table for Logical Replication
Next
From: Алена Васильева
Date:
Subject: PATCH for BUG #18785: Pointer bmr.rel, dereferenced by passing as 1st parameter to function is checked for NULL later