Thread: SEGFAULT on a concurrent UPDATE of mix of local and foreign partitions
Hi, Postgres SEGFAULT'ed on the UPDATE of mix of local and foreign partitions. Initialization - see t.sql For replaying this segfault just execute in parallel: UPDATE test SET x = x - 1; The problem was introduced by commit 1375422. ExecUpdate has found a concurrently updated tuples and starts subplan evaluation. This operation creates new EState for EPQState and sets es_result_relations in NULL value. Next, ExecInitNode(subplan) is launched and underlying ExecInitForeignScan tries to access to an element of es_result_relations. This causes SEGFAULT. I studied this problem shortly. I think, EPQState can use es_result_relations of a parent EState. Patch in attachment fixes this. check-world passed clearly. -- regards, Andrey Lepikhov Postgres Professional
Attachment
Re: SEGFAULT on a concurrent UPDATE of mix of local and foreign partitions
From
Heikki Linnakangas
Date:
On 06/08/2021 08:34, Andrey Lepikhov wrote: > Hi, > Postgres SEGFAULT'ed on the UPDATE of mix of local and foreign partitions. > Initialization - see t.sql > For replaying this segfault just execute in parallel: > UPDATE test SET x = x - 1; > > The problem was introduced by commit 1375422. > ExecUpdate has found a concurrently updated tuples and starts subplan > evaluation. This operation creates new EState for EPQState and sets > es_result_relations in NULL value. Next, ExecInitNode(subplan) is > launched and underlying ExecInitForeignScan tries to access to an > element of es_result_relations. This causes SEGFAULT. Thanks for the report! > I studied this problem shortly. I think, EPQState can use > es_result_relations of a parent EState. Patch in attachment fixes this. > check-world passed clearly. Hmm, that seems to work, but I have a few questions: Can we be sure that the es_result_relations entry of the result relation has been initialized when the EPQState is constructed? The general model now is that you initialize them lazily, when needed. Does it ever make sense to re-evaluate a direct Foreign Update as part of EvalPlanQual? It doesn't seem sane to me, you cannot just run an UPDATE statement again and expect it to be idempotent. So I think it would make sense skip this during EvalPlanQual processing altogether. And no need to call the FDW's Begin/EndDirectModify functions either. It's not clear to me what guarantee there is that a direct-modify Foreign Update/Delete node is never re-evaluated as part of EvalPlanQual. Is it because the a ModifyTable node is always at the top of the plan tree? Or is it because the FDW never uses direct-modify in queries where EvalPlanQual might be needed? Or because we use ROW_MARK_COPY in those cases? In any case, re-evaluating a direct-modify statement doesn't seem sane, so I propose the attached patch to add some runtime checks for that and to avoid the original segfault. - Heikki
Attachment
Re: SEGFAULT on a concurrent UPDATE of mix of local and foreign partitions
From
"Andrey V. Lepikhov"
Date:
On 8/10/21 1:29 AM, Heikki Linnakangas wrote: > On 06/08/2021 08:34, Andrey Lepikhov wrote: >> Hi, >> Postgres SEGFAULT'ed on the UPDATE of mix of local and foreign >> partitions. >> Initialization - see t.sql >> For replaying this segfault just execute in parallel: >> UPDATE test SET x = x - 1; >> >> The problem was introduced by commit 1375422. >> ExecUpdate has found a concurrently updated tuples and starts subplan >> evaluation. This operation creates new EState for EPQState and sets >> es_result_relations in NULL value. Next, ExecInitNode(subplan) is >> launched and underlying ExecInitForeignScan tries to access to an >> element of es_result_relations. This causes SEGFAULT. > > Thanks for the report! > >> I studied this problem shortly. I think, EPQState can use >> es_result_relations of a parent EState. Patch in attachment fixes this. >> check-world passed clearly. > > Hmm, that seems to work, but I have a few questions: > > Can we be sure that the es_result_relations entry of the result relation > has been initialized when the EPQState is constructed? The general model > now is that you initialize them lazily, when needed. As I understand the code correctly - yes, because a ModifyTable node higher and will initialize this field earlier. But in many cases it is not needed. > > Does it ever make sense to re-evaluate a direct Foreign Update as part > of EvalPlanQual? It doesn't seem sane to me, you cannot just run an > UPDATE statement again and expect it to be idempotent. So I think it > would make sense skip this during EvalPlanQual processing altogether. > And no need to call the FDW's Begin/EndDirectModify functions either. Agree. It is my first dive into the EPQ code. After studying i think this makes no sense. > > It's not clear to me what guarantee there is that a direct-modify > Foreign Update/Delete node is never re-evaluated as part of > EvalPlanQual. Is it because the a ModifyTable node is always at the top > of the plan tree? Or is it because the FDW never uses direct-modify in > queries where EvalPlanQual might be needed? Or because we use > ROW_MARK_COPY in those cases? ModifyTable node may be found in an underlying CTE, for example. And, right now, it's not clear for me too. > > In any case, re-evaluating a direct-modify statement doesn't seem sane, > so I propose the attached patch to add some runtime checks for that and > to avoid the original segfaultYour patch looks much better. -- regards, Andrey Lepikhov Postgres Professional
Re: SEGFAULT on a concurrent UPDATE of mix of local and foreign partitions
From
Heikki Linnakangas
Date:
On 10/08/2021 12:18, Andrey V. Lepikhov wrote: > On 8/10/21 1:29 AM, Heikki Linnakangas wrote: >> Does it ever make sense to re-evaluate a direct Foreign Update as part >> of EvalPlanQual? It doesn't seem sane to me, you cannot just run an >> UPDATE statement again and expect it to be idempotent. So I think it >> would make sense skip this during EvalPlanQual processing altogether. >> And no need to call the FDW's Begin/EndDirectModify functions either. > Agree. It is my first dive into the EPQ code. After studying i think > this makes no sense. Pushed the fix to master and v14. Thanks for the report! - Heikki