Thread: Problem with transition tables on partitioned tables with foreign-table partitions
Problem with transition tables on partitioned tables with foreign-table partitions
From
Etsuro Fujita
Date:
Hi, While working on something else, I noticed that while we disallow transition tables on foreign tables, we allow transition tables on partitioned tables with foreign-table partitions, which produces incorrect results. Here is an example using postgres_fdw: create table parent (a text, b int) partition by list (a); create table loct (a text, b int); create foreign table child (a text, b int) server loopback options (table_name 'loct'); alter table parent attach partition child for values in ('AAA'); create function dump_insert() returns trigger language plpgsql as $$ begin raise notice 'trigger = %, new table = %', TG_NAME, (select string_agg(new_table::text, ', ' order by a) from new_table); return null; end; $$; create trigger parent_insert_trig after insert on parent referencing new table as new_table for each statement execute procedure dump_insert(); create function intercept_insert() returns trigger language plpgsql as $$ begin new.b = new.b + 1000; return new; end; $$; create trigger intercept_insert_loct before insert on loct for each row execute procedure intercept_insert(); insert into parent values ('AAA', 42); NOTICE: trigger = parent_insert_trig, new table = (AAA,42) INSERT 0 1 The trigger shows the original tuple created by the core, not the actual tuple inserted into the foreign-table partition, as postgres_fdw does not collect the actual tuple, of course! UPDATE/DELETE also produce incorrect results: create function dump_update() returns trigger language plpgsql as $$ begin raise notice 'trigger = %, old table = %, new table = %', TG_NAME, (select string_agg(old_table::text, ', ' order by a) from old_table), (select string_agg(new_table::text, ', ' order by a) from new_table); return null; end; $$; create trigger parent_update_trig after update on parent referencing old table as old_table new table as new_table for each statement execute procedure dump_update(); update parent set b = b + 1; NOTICE: trigger = parent_update_trig, old table = <NULL>, new table = <NULL> UPDATE 1 create function dump_delete() returns trigger language plpgsql as $$ begin raise notice 'trigger = %, old table = %', TG_NAME, (select string_agg(old_table::text, ', ' order by a) from old_table); return null; end; $$; create trigger parent_delete_trig after delete on parent referencing old table as old_table for each statement execute procedure dump_delete(); delete from parent; NOTICE: trigger = parent_delete_trig, old table = <NULL> DELETE 1 In both cases the triggers fail to show transition tuples. The cause of this is that postgres_fdw mistakenly performs direct modify for UPDATE/DELETE on the partition, which skips ExecARUpdateTriggers()/ExecARDeleteTriggers() entirely. To fix, I think we could disallow creating transition-table triggers on such partitioned tables, but I think that that is too restrictive because some users might have been using such triggers, avoiding this problem by e.g., modifying only plain-table partitions. So I would like to propose to fix this by the following: 1) disable using direct modify to modify foreign-table partitions if there are any transition-table triggers on the partitioned table, and then 2) throw an error in ExecARInsertTriggers()/ExecARUpdateTriggers()/ExecARDeleteTriggers() if they collects transition tuple(s) from a foreign-table partition. Attached is a WIP patch for that. Best regards, Etsuro Fujita
Attachment
Re: Problem with transition tables on partitioned tables with foreign-table partitions
From
Etsuro Fujita
Date:
Hi Amit-san, On Tue, Jul 1, 2025 at 4:42 PM Amit Langote <amitlangote09@gmail.com> wrote: > On Tue, Jul 1, 2025 at 11:55 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > > While working on something else, I noticed that while we disallow > > transition tables on foreign tables, we allow transition tables on > > partitioned tables with foreign-table partitions, which produces > > incorrect results. Here is an example using postgres_fdw: > > > > create table parent (a text, b int) partition by list (a); > > create table loct (a text, b int); > > create foreign table child (a text, b int) > > server loopback options (table_name 'loct'); > > alter table parent attach partition child for values in ('AAA'); > > > > create function dump_insert() returns trigger language plpgsql as > > $$ > > begin > > raise notice 'trigger = %, new table = %', > > TG_NAME, > > (select string_agg(new_table::text, ', ' order by a) > > from new_table); > > return null; > > end; > > $$; > > create trigger parent_insert_trig > > after insert on parent referencing new table as new_table > > for each statement execute procedure dump_insert(); > > > > create function intercept_insert() returns trigger language plpgsql as > > $$ > > begin > > new.b = new.b + 1000; > > return new; > > end; > > $$; > > create trigger intercept_insert_loct > > before insert on loct > > for each row execute procedure intercept_insert(); > > > > insert into parent values ('AAA', 42); > > NOTICE: trigger = parent_insert_trig, new table = (AAA,42) > > INSERT 0 1 > > > > The trigger shows the original tuple created by the core, not the > > actual tuple inserted into the foreign-table partition, as > > postgres_fdw does not collect the actual tuple, of course! > > Maybe I'm missing something, but given that the intercept_insert() > function is applied during the "remote" operation, isn't it expected > that the parent table's trigger for a "local" operation shows the > original tuple? That is the question of how we define the after image of a row inserted into a foreign table, but consider the case where the partition is a plain table: create table parent (a text, b int) partition by list (a); create table child partition of parent for values in ('AAA'); create trigger intercept_insert_child before insert on child for each row execute procedure intercept_insert(); insert into parent values ('AAA', 42); NOTICE: trigger = parent_insert_trig, new table = (AAA,1042) INSERT 0 1 The trigger shows the final tuple, not the original tuple. So from a consistency perspective, I thought it would be good if the trigger does so even in the case where the partition is a foreign table. > > So I would > > like to propose to fix this by the following: 1) disable using direct > > modify to modify foreign-table partitions if there are any > > transition-table triggers on the partitioned table, and then 2) throw > > an error in ExecARInsertTriggers()/ExecARUpdateTriggers()/ExecARDeleteTriggers() > > if they collects transition tuple(s) from a foreign-table partition. > > Is (2) intended to catch cases that occur during a foreign insert and > foreign/non-direct update/delete? That is right; the patch forces the FDW to perform ExecForeign* functions, and then throws an error in ExecAR* functions. One good thing about this is that we are able to avoid throwing the error when local/remote row-level BEFORE triggers return NULL. Thanks for the comments! Best regards, Etsuro Fujita
Re: Problem with transition tables on partitioned tables with foreign-table partitions
From
Amit Langote
Date:
On Wed, Jul 2, 2025 at 7:05 PM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > On Tue, Jul 1, 2025 at 4:42 PM Amit Langote <amitlangote09@gmail.com> wrote: > > On Tue, Jul 1, 2025 at 11:55 AM Etsuro Fujita <etsuro.fujita@gmail.com> wrote: > > > While working on something else, I noticed that while we disallow > > > transition tables on foreign tables, we allow transition tables on > > > partitioned tables with foreign-table partitions, which produces > > > incorrect results. Here is an example using postgres_fdw: > > > > > > create table parent (a text, b int) partition by list (a); > > > create table loct (a text, b int); > > > create foreign table child (a text, b int) > > > server loopback options (table_name 'loct'); > > > alter table parent attach partition child for values in ('AAA'); > > > > > > create function dump_insert() returns trigger language plpgsql as > > > $$ > > > begin > > > raise notice 'trigger = %, new table = %', > > > TG_NAME, > > > (select string_agg(new_table::text, ', ' order by a) > > > from new_table); > > > return null; > > > end; > > > $$; > > > create trigger parent_insert_trig > > > after insert on parent referencing new table as new_table > > > for each statement execute procedure dump_insert(); > > > > > > create function intercept_insert() returns trigger language plpgsql as > > > $$ > > > begin > > > new.b = new.b + 1000; > > > return new; > > > end; > > > $$; > > > create trigger intercept_insert_loct > > > before insert on loct > > > for each row execute procedure intercept_insert(); > > > > > > insert into parent values ('AAA', 42); > > > NOTICE: trigger = parent_insert_trig, new table = (AAA,42) > > > INSERT 0 1 > > > > > > The trigger shows the original tuple created by the core, not the > > > actual tuple inserted into the foreign-table partition, as > > > postgres_fdw does not collect the actual tuple, of course! > > > > Maybe I'm missing something, but given that the intercept_insert() > > function is applied during the "remote" operation, isn't it expected > > that the parent table's trigger for a "local" operation shows the > > original tuple? > > That is the question of how we define the after image of a row > inserted into a foreign table, but consider the case where the > partition is a plain table: > > create table parent (a text, b int) partition by list (a); > create table child partition of parent for values in ('AAA'); > create trigger intercept_insert_child > before insert on child > for each row execute procedure intercept_insert(); > insert into parent values ('AAA', 42); > NOTICE: trigger = parent_insert_trig, new table = (AAA,1042) > INSERT 0 1 > > The trigger shows the final tuple, not the original tuple. So from a > consistency perspective, I thought it would be good if the trigger > does so even in the case where the partition is a foreign table. Ok, but if you define the trigger on the foreign table partition (child) as follows, you do get what I think is the expected result? create trigger intercept_insert_foreign_child before insert on child for each row execute procedure intercept_insert(); insert into parent values ('AAA', 42); NOTICE: trigger = parent_insert_trig, new table = (AAA,1042) -- 2042, because row modified by both triggers table parent; a | b -----+------ AAA | 2042 (1 row) Or perhaps you're saying that the row returned by this line in ExecInsert(): /* * insert into foreign table: let the FDW do it */ slot = resultRelInfo->ri_FdwRoutine->ExecForeignInsert(estate, resultRelInfo, slot, planSlot); is not the expected "after image", and thus should not be added to the parent's transition table? IIUC, to prevent that, we now hit the following error in: void ExecARInsertTriggers(EState *estate, ResultRelInfo *relinfo, TupleTableSlot *slot, List *recheckIndexes, TransitionCaptureState *transition_capture) { TriggerDesc *trigdesc = relinfo->ri_TrigDesc; if (relinfo->ri_FdwRoutine && transition_capture && transition_capture->tcs_insert_new_table) { Assert(relinfo->ri_RootResultRelInfo); ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("cannot collect transition tuples from child foreign tables"))); } > > > So I would > > > like to propose to fix this by the following: 1) disable using direct > > > modify to modify foreign-table partitions if there are any > > > transition-table triggers on the partitioned table, and then 2) throw > > > an error in ExecARInsertTriggers()/ExecARUpdateTriggers()/ExecARDeleteTriggers() > > > if they collects transition tuple(s) from a foreign-table partition. > > > > Is (2) intended to catch cases that occur during a foreign insert and > > foreign/non-direct update/delete? > > That is right; the patch forces the FDW to perform ExecForeign* > functions, and then throws an error in ExecAR* functions. One good > thing about this is that we are able to avoid throwing the error when > local/remote row-level BEFORE triggers return NULL. Given my question above, I’m not sure I fully understand the intention behind adding these checks. -- Thanks, Amit Langote