Re: [HACKERS] MERGE SQL Statement for PG11 - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: [HACKERS] MERGE SQL Statement for PG11 |
Date | |
Msg-id | CAH2-WzkBufhEn0tPtUDZ=jcGBnjAspFAN_Ewr_19TcZuJVdZgw@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] MERGE SQL Statement for PG11 (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: [HACKERS] MERGE SQL Statement for PG11
|
List | pgsql-hackers |
On Fri, Feb 9, 2018 at 6:36 AM, Robert Haas <robertmhaas@gmail.com> wrote: > Here's my $0.02: I think that new concurrency errors thrown by the > merge code itself deserve strict scrutiny and can survive only if they > have a compelling justification, but if the merge code happens to > choose an action that leads to a concurrency error of its own, I think > that deserves only mild scrutiny. > > On that basis, of the options I listed in > http://postgr.es/m/CA+TgmoZDL-caukHkWet7sr7sqr0-e2T91+DEvhqeN5sfqsMjqw@mail.gmail.com > I like #1 least. > > I also dislike #4 from that list for the reasons stated there. For > example, if you say WHEN MATCHED AND x.some_boolean and then WHEN > MATCHED, you expect that every tuple that hits the latter clause will > have that Boolean as false or null, but #4 makes that not true. > > I think the best options are #2 and #5 -- #2 because it's simple, and > #5 because it's (maybe) more intuitive, albeit at the risk of > livelock. But I'm willing to convinced of some other option; like > you, I'm willing to be open-minded about this. But, as you say, we > need a coherent, well-considered justification for the selected > option, not just "well, this is what we implemented so you should like > it". The specification should define the implementation, not the > reverse. At first I hated option #2. I'm warming to #2 a lot now, though, because I've come to understand the patch's approach a little better. (Pavan and Simon should still verify that I got things right in my mail from earlier today, though.) It now seems like the patch throws a RC serialization error more or less only due to concurrent deletions (rarely, it will actually be a concurrent update that changed the join qual values of our MERGE). We're already not throwing the error (we just move on to the next input row from the join) when we happen to not have a WHEN NOT MATCHED case. But why even make that distinction? Why not just go ahead and WHEN NOT MATCHED ... INSERT when the MERGE happens to have such a clause? The INSERT will succeed, barring any concurrent conflicting insertion by a third session -- a hazard that has nothing to do with RC conflict handling in particular. Just inserting without throwing a RC serialization error is almost equivalent to a new MVCC snapshot being acquired due to a RC conflict, so all of this now looks okay to me. Especially because every other MERGE implementation seems to have serious issues with doing anything too fancy with the MERGE target table's quals within the main ON join. I think that SQL Server actually assumes that you're using the target's PK in a simple equi-join. All the examples look like that, and this assumption is heavily suggested by the "Do not attempt to improve query performance by filtering out rows in the target table in the ON clause" weasel words from their docs, that I referenced in my mail from earlier today. I can get my head around all of this now only because I've come to understand that once we've decided that a given input from the main join is NOT MATCHED, we stick to that determination. We don't bounce around between MATCHED and NOT MATCHED cases during an EPQ update chain walk. We can bounce around between multiple alternative MATCHED merge actions/WHEN cases, but that seems okay because it's still part of essentially the same EPQ update chain walk -- no obvious livelock hazard. It seems fairly clean to restart everything ("goto lmerge") for each and every tuple in the update chain. Maybe we should actually formalize that you're only supposed to do a PK or unique index equi-join within the main ON join, though -- you can do something fancy with the source table, but not the target table. -- Peter Geoghegan
pgsql-hackers by date: