Re: a misbehavior of partition row movement (?) - Mailing list pgsql-hackers
From: Amit Langote
Subject: Re: a misbehavior of partition row movement (?)
Date:
Msg-id: CA+HiwqEi9S2_1qcG-v98jPEEoNhs+Goonfz+Cjb4nqwf4CkcpQ@mail.gmail.com
In response to: Re: a misbehavior of partition row movement (?) (Amit Langote <amitlangote09@gmail.com>)
Responses: Re: a misbehavior of partition row movement (?)
List: pgsql-hackers
On Thu, Oct 14, 2021 at 6:00 PM Amit Langote <amitlangote09@gmail.com> wrote:
> On Mon, Sep 20, 2021 at 3:32 PM Amit Langote <amitlangote09@gmail.com> wrote:
> > The problem was that the tuplestore
> > (afterTriggers.query_stack[query_level].tuplestore) that I decided to
> > use to store the AFTER trigger tuples of a partitioned table that is
> > the target of a cross-partition update lives only for the duration of
> > a given query.  So that approach wouldn't work if the foreign key
> > pointing into that partitioned table is marked INITIALLY DEFERRED.  To
> > fix, I added a List field to AfterTriggersData that holds the
> > tuplestores storing the tuples of partitioned tables that undergo
> > cross-partition updates in a transaction and are pointed to by
> > INITIALLY DEFERRED foreign key constraints.  I couldn't understand one
> > comment about why using a tuplestore for such cases *might not* work,
> > which is as follows:
> >
> >  * Note that we need triggers on foreign tables to be fired in exactly the
> >  * order they were queued, so that the tuples come out of the tuplestore in
> >  * the right order.  To ensure that, we forbid deferrable (constraint)
> >  * triggers on foreign tables.  This also ensures that such triggers do not
> >  * get deferred into outer trigger query levels, meaning that it's okay to
> >  * destroy the tuplestore at the end of the query level.
> >
> > I tried to break the approach using various test cases (some can be
> > seen added by the patch to foreign_key.sql), but couldn't see the
> > issue alluded to in the above comment.  So I've marked the comment
> > with an XXX note as follows:
> >
> > - * Note that we need triggers on foreign tables to be fired in exactly the
> > - * order they were queued, so that the tuples come out of the tuplestore in
> > - * the right order.  To ensure that, we forbid deferrable (constraint)
> > - * triggers on foreign tables.  This also ensures that such triggers do not
> > - * get deferred into outer trigger query levels, meaning that it's okay to
> > - * destroy the tuplestore at the end of the query level.
> > + * Note that we need triggers on foreign and partitioned tables to be fired in
> > + * exactly the order they were queued, so that the tuples come out of the
> > + * tuplestore in the right order.  To ensure that, we forbid deferrable
> > + * (constraint) triggers on foreign tables.  This also ensures that such
> > + * triggers do not get deferred into outer trigger query levels, meaning that
> > + * it's okay to destroy the tuplestore at the end of the query level.
> > + * XXX - update this paragraph if the new approach, whereby tuplestores in
> > + * afterTriggers.deferred_tuplestores outlive any given query, can be proven
> > + * to not really break any assumptions mentioned here.
> >
> > If anyone reading this can think of the issue the original comment
> > seems to be talking about, please let me know.
>
> I brought this up in an off-list discussion with Robert, and he asked
> why I hadn't considered not using tuplestores to remember the tuples
> in the first place, specifically pointing out that it may lead to
> unnecessarily consuming a lot of memory when such updates move a bunch
> of rows around.  For example, we could simply remember the tuples to be
> passed to the trigger function using their CTIDs, as is done for normal
> (single-heap-relation) updates, though in this case also remember the
> OIDs of the source and destination partitions of a particular
> cross-partition update.
>
> I had considered that idea before, but I think I had overestimated the
> complexity of that approach, so didn't go with it.  I tried it, and the
> resulting patch doesn't look all that complicated, with the added
> bonus that the bulk update case shouldn't consume as much memory with
> this approach as the previous tuplestore version would have.
>
> Updated patches attached.
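[For readers following along, the kind of case being discussed might look like the sketch below. The table and constraint names here are made up for illustration, not taken from the patch's regression tests.]

```sql
-- A partitioned PK table referenced by an INITIALLY DEFERRED foreign key.
CREATE TABLE p (a int PRIMARY KEY) PARTITION BY RANGE (a);
CREATE TABLE p1 PARTITION OF p FOR VALUES FROM (1) TO (10);
CREATE TABLE p2 PARTITION OF p FOR VALUES FROM (10) TO (20);
CREATE TABLE q (a int REFERENCES p ON UPDATE CASCADE
                DEFERRABLE INITIALLY DEFERRED);
INSERT INTO p VALUES (1);
INSERT INTO q VALUES (1);

BEGIN;
-- A cross-partition update: the row moves from p1 to p2.  Because the
-- RI check is deferred, the AFTER trigger fires only at COMMIT, after
-- the query (and any per-query tuplestore) is gone -- hence the need to
-- remember the affected tuples at the transaction level.
UPDATE p SET a = 11 WHERE a = 1;
COMMIT;
```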
Patch 0001 conflicted with some pg_dump changes that were recently
committed, so rebased.

--
Amit Langote
EDB: http://www.enterprisedb.com