Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE - Mailing list pgsql-hackers

From: Peter Geoghegan
Subject: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE
Msg-id: CAM3SWZRi1bZVLfUk6CQKcJXXVd0c-F5rzmy-5uNeRBRFTN24UQ@mail.gmail.com
In response to: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE (Peter Geoghegan <pg@heroku.com>)
List: pgsql-hackers
I've worked on a simple set of tests, written quickly in bash, that I think exercise interesting cases: https://github.com/petergeoghegan/upsert

Perhaps most notably, these tests compare the performance of ordinary inserts into a table with a serial primary key against effectively equivalent upserts that always insert. Even though a SERIAL primary key is used, which you might imagine to be a bad case for the extended heavyweight leaf page locks, performance is almost as good as regular insertion (though I should add that there was still only one unique index).

Upserts that only update are significantly slower, ending up at very approximately 2/3 the performance of equivalent updates (a figure I arrived at by running the tests on my laptop, which is not terribly rigorous, so take it with a grain of salt). That the update-only case is slower is hardly surprising, since it requires an "extra" index scan to re-find the conflict tuple. Using ctid projection to avoid that index scan doesn't work, at least not without extra in-application handling, because a tid scan is never used for a typical wCTE upsert statement - the planner forces a seqscan. It would have been interesting to see where a tid scan for the Update/ModifyTable node's nestloop join left things, but I didn't get that far. I think that if we had a really representative test (one with no "extra" index scan), the performance of upserts that only update would similarly be almost as good as that of regular updates for many representative cases.

The upsert tests also provide cursory smoke testing for cases of interest. I suggest comparing the test cases, and their performance/behavior, between the exclusion* and btree* patches. For concreteness, a sketch of the general shape of wCTE upsert statement I'm referring to follows below.
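This sketch uses the patch's proposed ON DUPLICATE KEY LOCK FOR UPDATE / RETURNING REJECTS syntax; the table, column names, and values are illustrative only, not lifted from the test suite:

    -- Hypothetical table, for illustration only.
    CREATE TABLE foo (key serial PRIMARY KEY, val text);

    -- Insert-or-update in a single statement: rows whose insertion would
    -- violate the unique constraint are value-locked and projected by the
    -- wCTE as "rejects", and the outer UPDATE updates them instead.
    WITH rej AS (
        INSERT INTO foo (key, val)
        VALUES (1, 'inserted')
        ON DUPLICATE KEY LOCK FOR UPDATE
        RETURNING REJECTS *
    )
    UPDATE foo
    SET    val = 'updated'
    FROM   rej
    WHERE  foo.key = rej.key;  -- the "extra" index scan happens here; joining
                               -- on ctid instead doesn't avoid it, because the
                               -- planner picks a seqscan (never a tid scan)
                               -- for this join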
A new revision of my patch is attached. The changes are mostly improvements that are equally applicable to promise tuples, so this should not be interpreted as a rejection of promise tuples, so much as a way of using my time efficiently while I wait to hear how others feel my approach compares. I'm still rather curious about what people think in light of recent developments. :-)

Changes include:

* Another ecpg fix.

* Misc polishing and refactoring of the btree code. Most notably, I now cache btree insertion scankeys across phases (as well as continuing to cache each IndexTuple).

* contrib/pageinspect notes when btree leaf pages have the locked flag bit set.

* Better lock/unlock ordering. Commit a1dfaef6c6e2da34dbc808f205665a6c96362149 added strict ordering of indexes because of exclusive heavyweight locks held on indexes. Back then (~2000), heavyweight page locks were also acquired on btree pages [1], and in general it was expected that heavyweight locks could be held on indexes across complex operations (I don't think this is relied upon anymore, but the assumption that it could happen remains). I've extended this so that the relcache ensures that we always insert into primary key indexes first (and, in the case of my patch, lock values in PKs last, to minimize the locking window). Getting the primary key's insertion out of the way seems sensible as a general bloat avoidance measure: we're then guaranteed that a slot's index tuples bloat only unique indexes when the slot is responsible for a unique violation, whereas simply ordering by oid can leave dead index tuples behind in previously inserted non-unique indexes.

* I believe I've fixed the bug in the modifications made to HeapTupleSatisfiesMVCC(), though I'd like confirmation that I have the details right. What do you think, Andres?

* I stop higher isolation levels from availing of the aforementioned modifications to HeapTupleSatisfiesMVCC(), since that would certainly be inconsistent with user expectations. I'm not sure which "read phenomenon" described by the standard this violates, but it is obviously inconsistent with the spirit of the higher isolation levels to be able to see a row committed by a transaction conceptually still in progress. It's bad enough to do this for READ COMMITTED, but I think a consensus may be emerging that that's the least-worst option (correct me if I'm mistaken). A sketch of the schedule I have in mind appears after this list.

* The "priorConflict" optimization, which was previously shown to really help performance [2], has been slightly improved to remember row locking conflicts too.
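To illustrate the anomaly that the isolation-level restriction guards against, here is a hypothetical two-session schedule (a sketch of the intended behavior, not output captured from the tests; "foo" is the same illustrative table as above):

    -- Session 1:
    BEGIN ISOLATION LEVEL REPEATABLE READ;
    SELECT count(*) FROM foo;  -- snapshot taken; a row with key = 1 is not visible

    -- Session 2 (autocommit): commits a conflicting row after session 1's
    -- snapshot was taken.
    INSERT INTO foo (key, val) VALUES (1, 'theirs');

    -- Session 1, continued: this upsert conflicts with a row that session 1's
    -- snapshot says does not exist.  Under READ COMMITTED, the modified
    -- HeapTupleSatisfiesMVCC() lets us lock and update that row anyway; at
    -- REPEATABLE READ and above, updating it would mean seeing a row committed
    -- by a transaction conceptually still in progress, so the statement should
    -- fail with a serialization error rather than quietly apply the update.
    WITH rej AS (
        INSERT INTO foo (key, val)
        VALUES (1, 'ours')
        ON DUPLICATE KEY LOCK FOR UPDATE
        RETURNING REJECTS *
    )
    UPDATE foo SET val = 'ours' FROM rej WHERE foo.key = rej.key;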
[1] https://github.com/postgres/postgres/blob/a1dfaef6c6e2da34dbc808f205665a6c96362149/src/backend/access/nbtree/nbtpage.c#L314
[2] http://www.postgresql.org/message-id/CAM3SWZQZTAN1fDiq4o2umGOaczbpemyQoM-6OxgUFBzi+dQzkg@mail.gmail.com

--
Peter Geoghegan