Thread: Performance question
I'm trying to convince another open-source project (phpOpenTracker) to modify their current INSERT sql queries. Currently they just do an INSERT into a table without first checking if their might be a record with the same primary key. The reason for this that they need fast inserts and most user I assume are using MySQL which silently drops INSERT queries that violate primary key constraints. But postgres on the other hand (and rightly so) issues and logs an error. I have suggested that their current INSERT INTO t VALUES() be changed to: INSERT INTO T SELECT 'v1', 'v2' WHERE NOT EXISTS ( SELECT NULL FROM t WHERE pk='v1' ) However one of the developers is worried that this would cause a performance drop for MySQL users b/c of the extra SELECT in my version of the INSERT query. I had thought that the 'extra' SELECT isn't really extra at all since *any* DB still has to check that there is a record with the primary key that we are trying to insert. So whereas in my query the SELECT is explicitly stated in the regular version of a simple INSERT, the select is still there but implicit since there was a primary key defined on the table. So there really shouldn't be much, if any of a performance drop. Is there any truth to my logic in the previous paragraph? :) I'd like to test my theory on a real MySQL database but I don't have access to one right now, and I am not sure how to go about testing a MySQL db or even what kind of testing. If I can get a spare machine I will give it a go though. Thanks, Jean-Christian Imbeault
Jean-Christian Imbeault <jc@mega-bucks.co.jp> writes: > I have suggested that their current INSERT INTO t VALUES() be changed to: > INSERT INTO > T > SELECT 'v1', 'v2' > WHERE > NOT EXISTS ( > SELECT NULL FROM t WHERE pk='v1' > ) That doesn't really buy anything in safety terms: if two backends execute this sort of command concurrently, it's perfectly likely that both sub-SELECTS will find no row matching 'v1', and so they'll both try the INSERT anyway. IMO the best way to do this (assuming that you have a unique index defined on the primary key column) is to just go ahead and try the INSERT, but be prepared to roll back your transaction and retry if you get a failure. You might find it useful to read the slides from my talk at last year's O'Reilly conference about this and related concurrency problems: http://conferences.oreillynet.com/cs/os2002/view/e_sess/2681 regards, tom lane
Tom Lane wrote: > > That doesn't really buy anything in safety terms: if two backends > execute this sort of command concurrently, it's perfectly likely > that both sub-SELECTS will find no row matching 'v1', and so they'll > both try the INSERT anyway. I wasn't looking for safety so much as avoiding the insert if possible. I've just finished a long thread about this and the consensus was that this query would be: 1- the insert would always happen if there was no record with the same primary key 2- is concurrency safe. Even if this query is executed by multiple backends at the same time one will always succeed. The reason I had asked for a query that could check before insert was to reduce the number of log errors the phpOpenTracker tool generates from trying to insert duplicate records into a table. The above would reduce (but not eliminate) the number attempts to insert duplicates and as such reduce the number of log error entries. At least that's what I am hoping for ... Jean-Christian Imbeault
Jean-Christian Imbeault <jc@mega-bucks.co.jp> writes: > I wasn't looking for safety so much as avoiding the insert if possible. > ... > The above would reduce (but not eliminate) the number attempts to insert > duplicates and as such reduce the number of log error entries. Fair enough, it would do that. But you still need the surrounding retry logic to cope when a collision does happen. regards, tom lane
Tom Lane wrote: > > Fair enough, it would do that. But you still need the surrounding retry > logic to cope when a collision does happen. Why retry logic? I thought that if there was a collision one of the insert attempts would necessarily succeed and all the others fails. So no need for a retry since the record (pk) was inserted. What I am trying to duplicate is if an inserts into a primary key field is a duplicate it fails silently. If it isn't or there are multiple attempts at the same time to insert the *same* new primary key, the new pk is inserted. The suggested method does guarantee an insert if it there is a collision right? If not I need to find a new solution :( Jean-Christian Imbeault
Jean-Christian Imbeault <jc@mega-bucks.co.jp> writes: > Tom Lane wrote: >> Fair enough, it would do that. But you still need the surrounding retry >> logic to cope when a collision does happen. > Why retry logic? I thought that if there was a collision one of the > insert attempts would necessarily succeed and all the others fails. So > no need for a retry since the record (pk) was inserted. Well, if that's how your application logic works then you're done. I was envisioning a case where you'd prefer to generate a new key value and try the insert again. regards, tom lane
Tom Lane wrote: > > Well, if that's how your application logic works then you're done. Great! It's not _my_ application logic, someone else's. I'm trying to find a way to make it more postgres friendly and generate less error messages. The application was first written for MySQL and the business _logic_ seems to have been to just throw inserts at the database and if they failed it must be because the record was already there so no need to worry about it. Pretty loose logic but then again the app can be *very* write intensive and the extra cycles needed to check whether something is already in the DB or not before inserting seems to be a real concern for the application developers. Jean-Christian Imbeault
On Wed, 2 Jul 2003, Tom Lane wrote: > Jean-Christian Imbeault <jc@mega-bucks.co.jp> writes: > > I have suggested that their current INSERT INTO t VALUES() be changed to: > > > INSERT INTO > > T > > SELECT 'v1', 'v2' > > WHERE > > NOT EXISTS ( > > SELECT NULL FROM t WHERE pk='v1' > > ) > > That doesn't really buy anything in safety terms: if two backends > execute this sort of command concurrently, it's perfectly likely > that both sub-SELECTS will find no row matching 'v1', and so they'll > both try the INSERT anyway. > > IMO the best way to do this (assuming that you have a unique index > defined on the primary key column) is to just go ahead and try the > INSERT, but be prepared to roll back your transaction and retry > if you get a failure. > > You might find it useful to read the slides from my talk at last > year's O'Reilly conference about this and related concurrency > problems: > http://conferences.oreillynet.com/cs/os2002/view/e_sess/2681 > I'd like to see all presentations in one collections. It'd be nice addition to documentation. > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 7: don't forget to increase your free space map settings > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83