Re: BUG #15677: Crash while deleting from partitioned table - Mailing list pgsql-bugs
From | Amit Langote |
---|---|
Subject | Re: BUG #15677: Crash while deleting from partitioned table |
Date | |
Msg-id | 3ad5ba71-d200-96da-f903-7e3b16416140@lab.ntt.co.jp |
In response to | BUG #15677: Crash while deleting from partitioned table (PG Bug reporting form <noreply@postgresql.org>) |
Responses | Re: BUG #15677: Crash while deleting from partitioned table; Re: BUG #15677: Crash while deleting from partitioned table |
List | pgsql-bugs |
Hi,

On 2019/03/08 16:29, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference: 15677
> Logged by: Norbert Benkocs
> Email address: infernorb@gmail.com
> PostgreSQL version: 11.2
> Operating system: CentOS Linux release 7.4.1708 (Core)
> Description:
>
> Version: PostgreSQL 11.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5
> 20150623 (Red Hat 4.8.5-36), 64-bit
> OS: CentOS Linux release 7.4.1708 (Core)
>
> Hello,
>
> We have an insert/update/delete query on a partitioned table (multiple
> CTE-s) that causes our PostgreSQL server to crash once every few days. We
> haven't been able to reproduce this crash so far, and re-running the same
> query with the same parameters didn't result in a crash either. The table in
> question is updated thousands of times each hour, and most of these work
> fine.
> Previously this table was not partitioned, we started seeing the crash after
> partitioning the table.

Thanks for the report and for providing detailed information which was
useful for diagnosing the bug.

I looked at this:

> (gdb) bt
> #0 ExecInitModifyTable (node=node@entry=0x2568180,
> estate=estate@entry=0x35f1440, eflags=eflags@entry=0) at
> nodeModifyTable.c:2327
> #1 0x000000000060af88 in ExecInitNode (node=0x2568180,
> estate=estate@entry=0x35f1440, eflags=eflags@entry=0) at
> execProcnode.c:174
> #2 0x0000000000606fdd in EvalPlanQualStart (epqstate=0x3773848,
> epqstate=0x3773848, planTree=0x36c3f08, parentestate=0xa6) at
> execMain.c:3257

note: ExecInitModifyTable() being called from EvalPlanQualStart().
and:

> (gdb) p *mtstate
> $4 = {ps = {type = T_ModifyTableState, plan = 0x2568180, state = 0x35f1440,
> ExecProcNode = 0x626e30 <ExecModifyTable>, ExecProcNodeReal = 0x0,
> instrument = 0x0, worker_instrument = 0x0, worker_jit_instrument = 0x0, qual
> = 0x0, lefttree = 0x0, righttree = 0x0, initPlan = 0x0, subPlan = 0x0,
> chgParam = 0x0, ps_ResultTupleSlot = 0x0, ps_ExprContext = 0x0,
> ps_ProjInfo = 0x0, scandesc = 0x0}, operation = CMD_DELETE, canSetTag =
> false, mt_done = false, mt_plans = 0x39c8088, mt_nplans = 15, mt_whichplan =
> 0, resultRelInfo = 0x35f3f78, rootResultRelInfo = 0xc0, mt_arowmarks =

note: rootResultRelInfo = 0xc0

and:

> (gdb) p *estate
> $7 = {type = T_EState, es_direction = ForwardScanDirection, es_snapshot =
> 0x208ba70, es_crosscheck_snapshot = 0x0, es_range_table = 0x282af48,
> es_plannedstmt = 0x2829e98, es_sourceText = 0x0, es_junkFilter = 0x0,
> es_output_cid = 0, es_result_relations = 0x35f3378, es_num_result_relations
> = 34, es_result_relation_info = 0x0,
> es_root_result_relations = 0x0, es_num_root_result_relations = 0,

note: es_root_result_relations = 0x0

From the above, I conclude that EvalPlanQualStart() does not copy the value
of es_root_result_relations from the parent EState. So when
ExecInitModifyTable() is called during EvalPlanQual() rechecking,
es_root_result_relations is NULL to begin with, and the rootResultRelInfo it
computes for the ModifyTableState being initialized is a bogus pointer (0xc0,
as seen above).
To reproduce, use these steps (needs 2 sessions to invoke EvalPlanQual at all):

Setup:

create table p (a int) partition by list (a);
create table p1 partition of p for values in (1);
insert into p values (1);

Session 1:

begin;
update p set a = a;

Session 2:

with u as (update p set a = a returning p.*) update p set a = u.a from u;
<blocks>

Session 1:

commit;

Session 2:

<invokes-EvalPlanQual-and-crashes>
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

This can be fixed by the attached patch, which modifies EvalPlanQualStart
to copy the value of es_root_result_relations from its parent EState.

Thanks,
Amit