Re: [WIP]Vertical Clustered Index (columnar store extension) - take2 - Mailing list pgsql-hackers
From | Peter Smith |
---|---|
Subject | Re: [WIP]Vertical Clustered Index (columnar store extension) - take2 |
Date | |
Msg-id | CAHut+PsyZ_LQE01WmQ4X32PqLGFTiKbohVD+5o7UL_z_GNnjWQ@mail.gmail.com Whole thread Raw |
In response to | Re: [WIP]Vertical Clustered Index (columnar store extension) - take2 (Timur Magomedov <t.magomedov@postgrespro.ru>) |
List | pgsql-hackers |
Hi Timur, Thanks for your ongoing work for this patch. On Thu, Sep 18, 2025 at 1:15 AM Timur Magomedov <t.magomedov@postgrespro.ru> wrote: ... > I've found (using valgrind) some cases of reading random garbage after > allocated memory. Investigation showed this was caused by converting > some nodes to VciScanState* even if they have smaller size allocated > originally. So accessing VciScanState fields was accessing memory after > palloc'ed memory which could be used by any other allocation. > > I think converting to VciScanState* is only valid for nodes with tag > T_CustomScanState so here is a patch that adds assertions for this: > 0001-Assert-corrrect-node-tags-when-casting-to-VciScanSta.patch What exactly did Valgrind report? For example, you said the VciScanState points beyond allocated memory. Do you have any more clues, like where that happened? Did you discover where that (smaller than it should be) memory was allocated in the first place? > > VCI v23 passes the tests with this patch applied. OK. I am not 100% certain about the asserts, but since the existing VCI tests are passing, I have merged your patch as-is into v24-0002. I guess we will find out later if the bug below is due to an old code cast problem or a new code assert problem. > > There are queries that fail unfortunately. I've added one of them to > bugs.sql: > 0002-Reproducer-of-invalid-cast-to-VciScanState.patch > Node with tag 420 (T_NestLoopState) is cast to VciScanState* that fails > newly added assertion. > > I'm not sure where to look next to fix this. Looking forward for you > comments and ideas. OK. I ran with your 2nd patch applied and reproduced the core-dump below, where it tripped over one of your new Asserts at executor/vci_sort.c:89. I can see it is an unexpected value T_NestLoopState. (gdb) bt 15 #0 0x00007ff948aa62c7 in raise () from /lib64/libc.so.6 #1 0x00007ff948aa79b8 in abort () from /lib64/libc.so.6 #2 0x0000000000c07977 in ExceptionalCondition (conditionName=0x7ff940839fa8 "scanstate->vci.css.ss.ps.type == T_CustomScanState", fileName=0x7ff940839f90 "executor/vci_sort.c", lineNumber=89) at assert.c:66 #3 0x00007ff9408084e6 in vci_sort_ExecCustomPlan (node=0x2a862f0) at executor/vci_sort.c:89 #4 0x000000000079d5bd in ExecCustomScan (pstate=0x2a862f0) at nodeCustom.c:137 #5 0x000000000077f693 in ExecProcNodeInstr (node=0x2a862f0) at execProcnode.c:486 #6 0x000000000077f664 in ExecProcNodeFirst (node=0x2a862f0) at execProcnode.c:470 #7 0x0000000000772b72 in ExecProcNode (node=0x2a862f0) at ../../../src/include/executor/executor.h:316 #8 0x0000000000775774 in ExecutePlan (queryDesc=0x2a89100, operation=CMD_SELECT, sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0xe5b1a0 <donothingDR>) at execMain.c:1697 #9 0x000000000077317b in standard_ExecutorRun (queryDesc=0x2a89100, direction=ForwardScanDirection, count=0) at execMain.c:366 #10 0x00007ff9407f9efd in vci_executor_run_routine (queryDesc=0x2a89100, direction=ForwardScanDirection, count=0) at executor/vci_executor.c:178 #11 0x0000000000772ff5 in ExecutorRun (queryDesc=0x2a89100, direction=ForwardScanDirection, count=0) at execMain.c:301 #12 0x00000000006b7f66 in ExplainOnePlan (plannedstmt=0x2a8a628, into=0x0, es=0x2a81388, queryString=0x28b0fd0 "EXPLAIN (ANALYZE, COSTS FALSE, BUFFERS FALSE, TIMING FALSE, SUMMARY FALSE)\nSELECT *\n FROM main m\n JOIN secondary s\n\tON m.id = s.main_id\n WHERE s.val in (\n\t\tSELECT MAX(val)\n\t\t FROM secondary s2\n\t\t W"..., params=0x0, queryEnv=0x0, planduration=0x7ffe51311320, bufusage=0x0, mem_counters=0x0) at explain.c:579 #13 0x00000000006b799a in standard_ExplainOneQuery (query=0x2ac21f8, cursorOptions=2048, into=0x0, es=0x2a81388, queryString=0x28b0fd0 "EXPLAIN (ANALYZE, COSTS FALSE, BUFFERS FALSE, TIMING FALSE, SUMMARY FALSE)\nSELECT *\n FROM main m\n JOIN secondary s\n\tON m.id = s.main_id\n WHERE s.val in (\n\t\tSELECT MAX(val)\n\t\t FROM secondary s2\n\t\t W"..., params=0x0, queryEnv=0x0) at explain.c:372 #14 0x00007ff9407f9ff3 in vci_explain_one_query_routine (queryDesc=0x2ac21f8, cursorOptions=2048, into=0x0, es=0x2a81388, queryString=0x28b0fd0 "EXPLAIN (ANALYZE, COSTS FALSE, BUFFERS FALSE, TIMING FALSE, SUMMARY FALSE)\nSELECT *\n FROM main m\n JOIN secondary s\n\tON m.id = s.main_id\n WHERE s.val in (\n\t\tSELECT MAX(val)\n\t\t FROM secondary s2\n\t\t W"..., params=0x0, queryEnv=0x0) at executor/vci_executor.c:224 (More stack frames follow...) (gdb) I will keep investigating it... I have not included your test case in the v24* patches because I didn't want this known test failure to mask out any other unknown test problems that might occur. ====== Kind Regards, Peter Smith. Fujitsu Australia
pgsql-hackers by date: