SQL:2011 Application Time Update & Delete - Mailing list pgsql-hackers
From | Paul Jungwirth |
---|---|
Subject | SQL:2011 Application Time Update & Delete |
Date | |
Msg-id | ec498c3d-5f2b-48ec-b989-5561c8aa2024@illuminatedcomputing.com Whole thread Raw |
List | pgsql-hackers |
Hi Hackers, Here is a new thread for the next part of SQL:2011 Application Time: UPDATE and DELETE commands with FOR PORTION OF. This continues the long-running thread that ended with [1]. I don't have a new patch set yet, but I wanted to summarize the discussion at the PGConf.dev Advanced Patch Feedback session, especially to continue the conversation about triggers fired from inserting "temporal leftovers" as part of an UPDATE/DELETE FOR PORTION OF. In my last patch series, I fire all statement & row triggers when the inserts happen for temporal leftovers. So let's assume there is a row with valid_at of [2000-01-01,2020-01-01) and the user's query is UPDATE t FOR PORTION OF valid_at FROM '2010-01-01' TO '2011-01-01'. So it changes one row, targeting only 2010. There are two temporal leftovers: one for 2000-2009 and one for 2011-2019 (inclusive). Then these triggers fire in the order given: BEFORE UPDATE STATEMENT BEFORE UPDATE ROW BEFORE INSERT STATEMENT -- for the 2000-2009 leftovers BEFORE INSERT ROW AFTER INSERT ROW AFTER INSERT STATEMENT BEFORE INSERT STATEMENT -- for the 2011-2019 leftovers BEFORE INSERT ROW AFTER INSERT ROW AFTER INSERT STATEMENT AFTER UPDATE ROW AFTER UPDATE STATEMENT I think this is the correct behavior (as I'll get to below), but at the session none of us seemed completely sure. What we all agreed on is that we shouldn't implement it with SPI. Before I switched to SPI, I feared that getting INSERT STATEMENT triggers to fire was going to cause a lot of code duplication. But I took my last pre-SPI patch (v39 from 7 Aug 2024), restored its implementation for ExecForPortionOfLeftovers, and got the desired behavior with just these lines (executed once per temporal leftover): AfterTriggerBeginQuery() ExecSetupTransitionCaptureState(mtstate, estate); fireBSTriggers(mtstate); ExecInsert(context, resultRelInfo, leftoverSlot, node->canSetTag, NULL, NULL); fireASTriggers(mtstate); AfterTriggerEndQuery(estate); You'll be able to see all that with my next patch set, but for now I'm just saying: replacing SPI was easier than I thought. There were different opinions about whether this behavior is correct. Robert and Tom both thought that firing INSERT STATEMENT triggers was weird. (Please correct me if I misrepresent anything you said!) Robert pointed out that if you are using statement triggers for performance reasons (since that may be the only reason to prefer them to row triggers), you might be annoyed to find that your INSERT STATEMENT triggers fire up to two times every time you update a *row*. Robert also warned that some people implement replication with statement triggers (though maybe not people running v18), and they might not like INSERT STATEMENT triggers firing when there was no user-issued insert statement. This is especially true since C-based triggers have access to the FOR PORTION OF details, as do PL/pgSQL triggers (in a follow-on patch), so they don't need to hear about the implicit inserts. Also trigger-based auditing will see insert statements that were never explicitly sent by a user. (OTOH this is also true for inserts made from triggers, and (as we'll see below) several other commands fire statement triggers for implicit actions.) Robert & Tom agreed that if we leave out the statement triggers, then the NEW transition table for the overall UPDATE STATEMENT trigger should include all three rows: the updated version of the old row and the (up to) two temporal leftovers. A philosophical argument I can see for omitting INSERT STATEMENT is that the temporal leftovers only preserve the history that was already there. They don't add to what is asserted by the table. But reporting them as statements feels a bit like treating them as user assertions. (I'm not saying I find this argument very strong, but I can see how someone would make it.) Tom & Robert thought that firing the INSERT *ROW* triggers made sense and was valuable for some use-cases, e.g. auditing. Robert also thought that nesting was weird. He thought that the order should be this (and even better if omitting the INSERT STATEMENTs): BEFORE UPDATE STATEMENT BEFORE UPDATE ROW AFTER UPDATE ROW AFTER UPDATE STATEMENT BEFORE INSERT STATEMENT -- for the 2000-2009 leftovers BEFORE INSERT ROW AFTER INSERT ROW AFTER INSERT STATEMENT BEFORE INSERT STATEMENT -- for the 2011-2019 leftovers BEFORE INSERT ROW AFTER INSERT ROW AFTER INSERT STATEMENT But I think that the behavior I have is correct. My draft copy of the 2011 standard says this about inserting temporal leftovers (15.13, General Rules 10.c.ii): > The following <insert statement> is effectively executed without further Access Rule > and constraint checking: > INSERT INTO TN VALUES (VL1, ..., VLd) When I compared IBM DB2 and MariaDB, I found that DB2 does this: AFTER INSERT ROW -- for the 2000-2009 leftovers AFTER INSERT STATEMENT AFTER INSERT ROW -- for the 2011-2019 leftovers AFTER INSERT STATEMENT AFTER UPDATE ROW AFTER UPDATE STATEMENT (I didn't quickly find a way to observe BEFORE triggers firing, so they aren't show here. I was misremembering when I said at the session that it doesn't support BEFORE triggers. It does, but they can't do certain things, like insert into an auditing table.) And MariaDB (which doesn't have statement triggers) does this: BEFORE UPDATE ROW BEFORE INSERT ROW -- for the 2000-2009 leftovers AFTER INSERT ROW BEFORE INSERT ROW -- for the 2011-2019 leftovers AFTER INSERT ROW AFTER UPDATE ROW So both of those match the behavior I've implemented (including the nesting). Peter later looked up the current text of the standard, and he found several parts that confirm the existing behavior. (Thank you for checking that for me Peter!) To paraphrase a note from him: Paper SQL-026R2, which originally created this feature, says: > All UPDATE triggers defined on the table will get activated in the usual way for all rows that are > updated. In addition, all INSERT triggers will get activated for all rows that are inserted. He also found the same text I quoted above (now in section 15.14). He also brought up this other passage from SQL-026R2: > Currently it is not possible > for the body of an UPDATE trigger to gain access to the FROM and TO values in the FOR PORTION OF > clause if one is specified. The syntax of <trigger definition> will need to be extended to allow > such access. We are not proposing to enhance the syntax of <trigger definition> in this proposal. > We leave it as a future Language Opportunity. Since the standard still hasn't added that, firing at least INSERT ROW triggers is necessary if you want trigger-based replication. (I don't think this speaks strongly to INSERT STATEMENT triggers though.) Incidentally, note that my patches *do* include this information (as noted above): both in the TriggerData struct passed to C triggers, and (in a separate patch) via PL/pgSQL variables. I don't include it for SQL-language triggers, and perhaps those should wait to see what the standard recommends. In a world where we *do* fire statement triggers, I think each statement should get its own transition table contents. Robert also said that we should choose behavior that is consistent with other features in Postgres. I've attached a script to demonstrate a few interesting comparisons. It tests: - INSERT ON CONFLICT DO NOTHING (without then with a conflict) - INSERT ON CONFLICT DO UPDATE (without then with a conflict) - INSERT ON CONFLICT DO UPDATE WHERE (with a conflict) - MERGE DO NOTHING (without then with a conflict) - MERGE UPDATE (without then with a conflict) - cross-partition UPDATE - ON DELETE CASCADE - ON DELETE SET NULL ON CONFLICT DO NOTHING and MERGE DO NOTHING do not fire an UPDATE STATEMENT trigger (naturally). Cross-partition update does not fire extra statement triggers. Everything else does fire extra statement triggers. I think this is what I would have guessed if I hadn't tested it first. It feels like the natural choice for each feature. Note that commands have to "decide" a priori which statement triggers they'll fire, before they process rows. So ON CONFLICT DO UPDATE fires first BEFORE INSERT STATEMENT, then BEFORE UPDATE STATEMENT, then row triggers, and finally AFTER UPDATE STATEMENT and AFTER INSERT STATEMENT. MERGE UPDATE is the same. It fires BEFORE INSERT STATEMENT, then BEFORE UPDATE STATEMENT, then row triggers, and finally AFTER UPDATE STATEMENT and AFTER INSERT STATEMENT. And the referential integrity actions fire statement triggers (as expected, since they are implemented with SPI). In all cases we see nesting. With cross-partition update, the DELETE & INSERT triggers are nested inside the before/after UPDATE trigger (although interestingly the AFTER DELETE/INSERT triggers don't quite follow a nesting-like order with respect to each other): BEFORE UPDATE STATEMENT BEFORE UPDATE ROW BEFORE DELETE ROW BEFORE INSERT ROW AFTER DELETE ROW AFTER INSERT ROW AFTER UPDATE STATEMENT That covers all my research. My conclusion is that we *should* fire INSERT STATEMENT triggers, and they should be nested within the BEFORE & AFTER UPDATE triggers. I'm pleased that achieving that without SPI is not as hard as I expected. Please stay tuned for some actual patches! [1] https://www.postgresql.org/message-id/CA%2BrenyUZuWOxvY1Lv9O3F1LdpKc442EYvViR1DVzbD9ztaa6Yg%40mail.gmail.com Yours, -- Paul ~{:-) pj@illuminatedcomputing.com
Attachment
pgsql-hackers by date: