Home > mailing lists
SQL:2011 Application Time Update & Delete - Mailing list pgsql-hackers

From	Paul Jungwirth
Subject	SQL:2011 Application Time Update & Delete
Date	June 2 08:24:44
Msg-id	ec498c3d-5f2b-48ec-b989-5561c8aa2024@illuminatedcomputing.com Whole thread Raw
List	pgsql-hackers
Tree view
Hi Hackers,

Here is a new thread for the next part of SQL:2011 Application Time: UPDATE and DELETE commands with 
FOR PORTION OF. This continues the long-running thread that ended with [1].

I don't have a new patch set yet, but I wanted to summarize the discussion at the PGConf.dev 
Advanced Patch Feedback session, especially to continue the conversation about triggers fired from 
inserting "temporal leftovers" as part of an UPDATE/DELETE FOR PORTION OF.

In my last patch series, I fire all statement & row triggers when the inserts happen for temporal 
leftovers. So let's assume there is a row with valid_at of [2000-01-01,2020-01-01) and the user's 
query is UPDATE t FOR PORTION OF valid_at FROM '2010-01-01' TO '2011-01-01'. So it changes one row, 
targeting only 2010. There are two temporal leftovers: one for 2000-2009 and one for 2011-2019 
(inclusive). Then these triggers fire in the order given:

BEFORE UPDATE STATEMENT
BEFORE UPDATE ROW
BEFORE INSERT STATEMENT -- for the 2000-2009 leftovers
BEFORE INSERT ROW
AFTER INSERT ROW
AFTER INSERT STATEMENT
BEFORE INSERT STATEMENT -- for the 2011-2019 leftovers
BEFORE INSERT ROW
AFTER INSERT ROW
AFTER INSERT STATEMENT
AFTER UPDATE ROW
AFTER UPDATE STATEMENT

I think this is the correct behavior (as I'll get to below), but at the session none of us seemed 
completely sure. What we all agreed on is that we shouldn't implement it with SPI.

Before I switched to SPI, I feared that getting INSERT STATEMENT triggers to fire was going to cause 
a lot of code duplication. But I took my last pre-SPI patch (v39 from 7 Aug 2024), restored its 
implementation for ExecForPortionOfLeftovers, and got the desired behavior with just these lines 
(executed once per temporal leftover):

AfterTriggerBeginQuery()
ExecSetupTransitionCaptureState(mtstate, estate);
fireBSTriggers(mtstate);
ExecInsert(context, resultRelInfo, leftoverSlot, node->canSetTag, NULL, NULL);
fireASTriggers(mtstate);
AfterTriggerEndQuery(estate);

You'll be able to see all that with my next patch set, but for now I'm just saying: replacing SPI 
was easier than I thought.

There were different opinions about whether this behavior is correct. Robert and Tom both thought 
that firing INSERT STATEMENT triggers was weird. (Please correct me if I misrepresent anything you 
said!)

Robert pointed out that if you are using statement triggers for performance reasons (since that may 
be the only reason to prefer them to row triggers), you might be annoyed to find that your INSERT 
STATEMENT triggers fire up to two times every time you update a *row*.

Robert also warned that some people implement replication with statement triggers (though maybe not 
people running v18), and they might not like INSERT STATEMENT triggers firing when there was no 
user-issued insert statement. This is especially true since C-based triggers have access to the FOR 
PORTION OF details, as do PL/pgSQL triggers (in a follow-on patch), so they don't need to hear about 
the implicit inserts.

Also trigger-based auditing will see insert statements that were never explicitly sent by a user.
(OTOH this is also true for inserts made from triggers, and (as we'll see below) several other 
commands fire statement triggers for implicit actions.)

Robert & Tom agreed that if we leave out the statement triggers, then the NEW transition table for 
the overall UPDATE STATEMENT trigger should include all three rows: the updated version of the old 
row and the (up to) two temporal leftovers.

A philosophical argument I can see for omitting INSERT STATEMENT is that the temporal leftovers only 
preserve the history that was already there. They don't add to what is asserted by the table. But 
reporting them as statements feels a bit like treating them as user assertions. (I'm not saying I 
find this argument very strong, but I can see how someone would make it.)

Tom & Robert thought that firing the INSERT *ROW* triggers made sense and was valuable for some 
use-cases, e.g. auditing.

Robert also thought that nesting was weird. He thought that the order should be this (and even 
better if omitting the INSERT STATEMENTs):

BEFORE UPDATE STATEMENT
BEFORE UPDATE ROW
AFTER UPDATE ROW
AFTER UPDATE STATEMENT
BEFORE INSERT STATEMENT -- for the 2000-2009 leftovers
BEFORE INSERT ROW
AFTER INSERT ROW
AFTER INSERT STATEMENT
BEFORE INSERT STATEMENT -- for the 2011-2019 leftovers
BEFORE INSERT ROW
AFTER INSERT ROW
AFTER INSERT STATEMENT

But I think that the behavior I have is correct. My draft copy of the 2011 standard says this about 
inserting temporal leftovers (15.13, General Rules 10.c.ii):

 > The following <insert statement> is effectively executed without further Access Rule
 > and constraint checking:
 >   INSERT INTO TN VALUES (VL1, ..., VLd)

When I compared IBM DB2 and MariaDB, I found that DB2 does this:

AFTER INSERT ROW -- for the 2000-2009 leftovers
AFTER INSERT STATEMENT
AFTER INSERT ROW -- for the 2011-2019 leftovers
AFTER INSERT STATEMENT
AFTER UPDATE ROW
AFTER UPDATE STATEMENT

(I didn't quickly find a way to observe BEFORE triggers firing, so they aren't show here. I was 
misremembering when I said at the session that it doesn't support BEFORE triggers. It does, but they 
can't do certain things, like insert into an auditing table.)

And MariaDB (which doesn't have statement triggers) does this:

BEFORE UPDATE ROW
BEFORE INSERT ROW -- for the 2000-2009 leftovers
AFTER INSERT ROW
BEFORE INSERT ROW -- for the 2011-2019 leftovers
AFTER INSERT ROW
AFTER UPDATE ROW

So both of those match the behavior I've implemented (including the nesting).

Peter later looked up the current text of the standard, and he found several parts that confirm the 
existing behavior. (Thank you for checking that for me Peter!) To paraphrase a note from him:

Paper SQL-026R2, which originally created this feature, says:

 > All UPDATE triggers defined on the table will get activated in the usual way for all rows that are
 > updated. In addition, all INSERT triggers will get activated for all rows that are inserted.

He also found the same text I quoted above (now in section 15.14).

He also brought up this other passage from SQL-026R2:

 > Currently it is not possible
 > for the body of an UPDATE trigger to gain access to the FROM and TO values in the FOR PORTION OF
 > clause if one is specified. The syntax of <trigger definition> will need to be extended to allow
 > such access. We are not proposing to enhance the syntax of <trigger definition> in this proposal.
 > We leave it as a future Language Opportunity.

Since the standard still hasn't added that, firing at least INSERT ROW triggers is necessary if you 
want trigger-based replication. (I don't think this speaks strongly to INSERT STATEMENT triggers 
though.)

Incidentally, note that my patches *do* include this information (as noted above): both in the 
TriggerData struct passed to C triggers, and (in a separate patch) via PL/pgSQL variables. I don't 
include it for SQL-language triggers, and perhaps those should wait to see what the standard recommends.

In a world where we *do* fire statement triggers, I think each statement should get its own 
transition table contents.

Robert also said that we should choose behavior that is consistent with other features in Postgres.
I've attached a script to demonstrate a few interesting comparisons. It tests:

- INSERT ON CONFLICT DO NOTHING (without then with a conflict)
- INSERT ON CONFLICT DO UPDATE (without then with a conflict)
- INSERT ON CONFLICT DO UPDATE WHERE (with a conflict)
- MERGE DO NOTHING (without then with a conflict)
- MERGE UPDATE (without then with a conflict)
- cross-partition UPDATE
- ON DELETE CASCADE
- ON DELETE SET NULL

ON CONFLICT DO NOTHING and MERGE DO NOTHING do not fire an UPDATE STATEMENT trigger (naturally).

Cross-partition update does not fire extra statement triggers. Everything else does fire extra 
statement triggers. I think this is what I would have guessed if I hadn't tested it first. It feels 
like the natural choice for each feature.

Note that commands have to "decide" a priori which statement triggers they'll fire, before they 
process rows. So ON CONFLICT DO UPDATE fires first BEFORE INSERT STATEMENT, then BEFORE UPDATE 
STATEMENT, then row triggers, and finally AFTER UPDATE STATEMENT and AFTER INSERT STATEMENT. MERGE 
UPDATE is the same. It fires BEFORE INSERT STATEMENT, then BEFORE UPDATE STATEMENT, then row 
triggers, and finally AFTER UPDATE STATEMENT and AFTER INSERT STATEMENT. And the referential 
integrity actions fire statement triggers (as expected, since they are implemented with SPI).

In all cases we see nesting. With cross-partition update, the DELETE & INSERT triggers are nested 
inside the before/after UPDATE trigger (although interestingly the AFTER DELETE/INSERT triggers 
don't quite follow a nesting-like order with respect to each other):

BEFORE UPDATE STATEMENT
BEFORE UPDATE ROW
BEFORE DELETE ROW
BEFORE INSERT ROW
AFTER DELETE ROW
AFTER INSERT ROW
AFTER UPDATE STATEMENT

That covers all my research. My conclusion is that we *should* fire INSERT STATEMENT triggers, and 
they should be nested within the BEFORE & AFTER UPDATE triggers. I'm pleased that achieving that 
without SPI is not as hard as I expected.

Please stay tuned for some actual patches!

[1] 
https://www.postgresql.org/message-id/CA%2BrenyUZuWOxvY1Lv9O3F1LdpKc442EYvViR1DVzbD9ztaa6Yg%40mail.gmail.com

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com
Attachment

how-do-triggers-work.sql
pgsql-hackers by date:
From: Dilip Kumar
Date: 02 June, 07:14:20
Subject: Re: Speedup truncations of temporary relation forks
From: jian he
Date: 02 June, 08:30:58
Subject: Re: Virtual generated columns
SQL:2011 Application Time Update & Delete - Mailing list pgsql-hackers

Attachment

Previous

Next