Re: [WIP]Vertical Clustered Index (columnar store extension) - take2 - Mailing list pgsql-hackers
From | Japin Li |
---|---|
Subject | Re: [WIP]Vertical Clustered Index (columnar store extension) - take2 |
Date | |
Msg-id | ME0P300MB0445ED7C368B81944E578664B616A@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM Whole thread Raw |
List | pgsql-hackers |
On Thu, 18 Sep 2025 at 15:07, Peter Smith <smithpb2250@gmail.com> wrote: > Hi Timur, > > Thanks for your ongoing work for this patch. > > On Thu, Sep 18, 2025 at 1:15 AM Timur Magomedov > <t.magomedov@postgrespro.ru> wrote: > ... >> I've found (using valgrind) some cases of reading random garbage after >> allocated memory. Investigation showed this was caused by converting >> some nodes to VciScanState* even if they have smaller size allocated >> originally. So accessing VciScanState fields was accessing memory after >> palloc'ed memory which could be used by any other allocation. >> >> I think converting to VciScanState* is only valid for nodes with tag >> T_CustomScanState so here is a patch that adds assertions for this: >> 0001-Assert-corrrect-node-tags-when-casting-to-VciScanSta.patch > > What exactly did Valgrind report? For example, you said the > VciScanState points beyond allocated memory. Do you have any more > clues, like where that happened? Did you discover where that (smaller > than it should be) memory was allocated in the first place? > >> >> VCI v23 passes the tests with this patch applied. > > OK. I am not 100% certain about the asserts, but since the existing > VCI tests are passing, I have merged your patch as-is into v24-0002. I > guess we will find out later if the bug below is due to an old code > cast problem or a new code assert problem. > >> >> There are queries that fail unfortunately. I've added one of them to >> bugs.sql: >> 0002-Reproducer-of-invalid-cast-to-VciScanState.patch >> Node with tag 420 (T_NestLoopState) is cast to VciScanState* that fails >> newly added assertion. >> >> I'm not sure where to look next to fix this. Looking forward for you >> comments and ideas. > > OK. I ran with your 2nd patch applied and reproduced the core-dump > below, where it tripped over one of your new Asserts at > executor/vci_sort.c:89. I can see it is an unexpected value > T_NestLoopState. > I'm curious about the assert here. Why is it that children must be CustomScan? It seems that there is a JOIN node, and we do not support customizing the JOIN operation, right? > (gdb) bt 15 > #0 0x00007ff948aa62c7 in raise () from /lib64/libc.so.6 > #1 0x00007ff948aa79b8 in abort () from /lib64/libc.so.6 > #2 0x0000000000c07977 in ExceptionalCondition > (conditionName=0x7ff940839fa8 "scanstate->vci.css.ss.ps.type == > T_CustomScanState", fileName=0x7ff940839f90 "executor/vci_sort.c", > lineNumber=89) at assert.c:66 > #3 0x00007ff9408084e6 in vci_sort_ExecCustomPlan (node=0x2a862f0) at > executor/vci_sort.c:89 > #4 0x000000000079d5bd in ExecCustomScan (pstate=0x2a862f0) at nodeCustom.c:137 > #5 0x000000000077f693 in ExecProcNodeInstr (node=0x2a862f0) at > execProcnode.c:486 > #6 0x000000000077f664 in ExecProcNodeFirst (node=0x2a862f0) at > execProcnode.c:470 > #7 0x0000000000772b72 in ExecProcNode (node=0x2a862f0) at > ../../../src/include/executor/executor.h:316 > #8 0x0000000000775774 in ExecutePlan (queryDesc=0x2a89100, > operation=CMD_SELECT, sendTuples=true, numberTuples=0, > direction=ForwardScanDirection, dest=0xe5b1a0 <donothingDR>) > at execMain.c:1697 > #9 0x000000000077317b in standard_ExecutorRun (queryDesc=0x2a89100, > direction=ForwardScanDirection, count=0) at execMain.c:366 > #10 0x00007ff9407f9efd in vci_executor_run_routine > (queryDesc=0x2a89100, direction=ForwardScanDirection, count=0) at > executor/vci_executor.c:178 > #11 0x0000000000772ff5 in ExecutorRun (queryDesc=0x2a89100, > direction=ForwardScanDirection, count=0) at execMain.c:301 > #12 0x00000000006b7f66 in ExplainOnePlan (plannedstmt=0x2a8a628, > into=0x0, es=0x2a81388, > queryString=0x28b0fd0 "EXPLAIN (ANALYZE, COSTS FALSE, BUFFERS > FALSE, TIMING FALSE, SUMMARY FALSE)\nSELECT *\n FROM main m\n JOIN > secondary s\n\tON m.id = s.main_id\n WHERE s.val in (\n\t\tSELECT > MAX(val)\n\t\t FROM secondary s2\n\t\t W"..., params=0x0, > queryEnv=0x0, planduration=0x7ffe51311320, bufusage=0x0, > mem_counters=0x0) at explain.c:579 > #13 0x00000000006b799a in standard_ExplainOneQuery (query=0x2ac21f8, > cursorOptions=2048, into=0x0, es=0x2a81388, > queryString=0x28b0fd0 "EXPLAIN (ANALYZE, COSTS FALSE, BUFFERS > FALSE, TIMING FALSE, SUMMARY FALSE)\nSELECT *\n FROM main m\n JOIN > secondary s\n\tON m.id = s.main_id\n WHERE s.val in (\n\t\tSELECT > MAX(val)\n\t\t FROM secondary s2\n\t\t W"..., params=0x0, > queryEnv=0x0) at explain.c:372 > #14 0x00007ff9407f9ff3 in vci_explain_one_query_routine > (queryDesc=0x2ac21f8, cursorOptions=2048, into=0x0, es=0x2a81388, > queryString=0x28b0fd0 "EXPLAIN (ANALYZE, COSTS FALSE, BUFFERS > FALSE, TIMING FALSE, SUMMARY FALSE)\nSELECT *\n FROM main m\n JOIN > secondary s\n\tON m.id = s.main_id\n WHERE s.val in (\n\t\tSELECT > MAX(val)\n\t\t FROM secondary s2\n\t\t W"..., params=0x0, > queryEnv=0x0) at executor/vci_executor.c:224 > (More stack frames follow...) > (gdb) > > I will keep investigating it... > > I have not included your test case in the v24* patches because I > didn't want this known test failure to mask out any other unknown test > problems that might occur. -- Regards, Japin Li ChengDu WenWu Information Technology Co., Ltd.
pgsql-hackers by date: