Re: Select for update with offset interferes with concurrent transactions - Mailing list pgsql-general
From: Yngve Nysaeter Pettersen
Subject: Re: Select for update with offset interferes with concurrent transactions
Date:
Msg-id: op.vp8kr1buvqd7e2@killashandra.oslo.osa
In response to: Re: Select for update with offset interferes with concurrent transactions (Andy Colson <andy@squeakycode.net>)
Responses: Re: Select for update with offset interferes with concurrent transactions
List: pgsql-general
On Tue, 01 Feb 2011 18:11:23 +0100, Andy Colson <andy@squeakycode.net> wrote:

> On 2/1/2011 10:59 AM, Yngve Nysaeter Pettersen wrote:
>> Hi,
>>
>> Thanks for the quick answer, Andy.
>>
>> On Tue, 01 Feb 2011 16:19:17 +0100, Andy Colson <andy@squeakycode.net>
>> wrote:
>>
>> <snip>
>>> So, if I understand correctly, you:
>>>
>>>    q = SELECT record_id FROM queue
>>>          WHERE project_id = my_project AND state = idle
>>>          LIMIT n OFFSET i FOR UPDATE
>>>    while not q.eof
>>>        update queue set state = started where record_id = x;
>>>        process record_id
>>>        update queue set state = finished where record_id = x;
>>>        q.next;
>>
>> Almost: the update to "started" is done for all selected elements first,
>> releasing the lock; then the items are processed one at a time, and each
>> is marked "finished" as it completes. (Each processing step can take
>> minutes, so keeping a lock the whole time is not an option.)
>>
>>> Might I suggest an alternative:
>>>
>>>    q = update queue set state = started
>>>          WHERE project_id = my_project AND state = idle
>>>          LIMIT n OFFSET i
>>>          RETURNING record_id;
>>>    idlist = @q;
>>>    commit;
>>>
>>>    foreach x in idlist
>>>        process record_id
>>>        begin
>>>        update queue set state = finished where record_id = x;
>>>        commit;
>>>
>>> Forgive the part-Perl, part-Python pseudocode. Oh, and I've never done
>>> this, no idea if it actually works. :-)
>>
>> Thanks for that suggestion, I'll take a look at it.
>>
>> While I hadn't caught on to the "RETURNING" part, I had been wondering
>> whether a single-step UPDATE might be a solution. One concern I have is
>> how concurrent updates will affect the returned list (or whether those
>> rows will just be skipped, the way a SELECT in normal transaction mode
>> would skip them, if I understood correctly), or whether the statement
>> might return an error (I know that the normal UPDATE return value is the
>> number of updated rows; I'm just not sure whether that applies with
>> "RETURNING").
>>
>> Although, I will note that this approach (if it works) will, in a sense,
>> make FOR UPDATE redundant. Or, if it doesn't work, the current locking
>> policy might cause issues for concurrent updates in the use cases where
>> FOR UPDATE is relevant.
>>
>
> Yeah, I'd wondered the same thing. It could be that two updates hitting
> the same row will deadlock, or maybe not, I'm not sure. But I think it's
> the same as with the select: if you happen to have two limits that hit
> the same range, you're in trouble.
>
> I think the random limit thing is a race condition itself. Whenever you
> have multiple processes hitting the same rows you're going to run into
> problems. Have you thought of using a sequence instead of a random
> limit? Each process could get the next 100 record_id's via a sequence;
> then there would be much less chance of deadlock.

How would that work, in case you would like to provide an example? I am
not really familiar with sequences, as I have only seen them used for
the "id" field in Django-generated tables.

In case it is relevant, the processes do not (currently, at least) have
a unique ID, though they do have a local sequence number on the machine
they are running on.

--
Sincerely,
Yngve N. Pettersen
********************************************************************
Senior Developer                     Email: yngve@opera.com
Opera Software ASA                   http://www.opera.com/
Phone:  +47 23 69 32 60              Fax:    +47 23 69 24 01
********************************************************************
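
For reference, PostgreSQL's UPDATE statement has no LIMIT/OFFSET clause,
so the single-step claim sketched in the thread needs a subselect to pick
the batch. A minimal sketch, following the thread's table layout, with
my_project, n, and the 'idle'/'started' state values as placeholders:

    UPDATE queue
       SET state = 'started'
     WHERE record_id IN (
             SELECT record_id
               FROM queue
              WHERE project_id = my_project
                AND state = 'idle'
              LIMIT n
                FOR UPDATE)   -- lock the candidate rows while claiming them
    RETURNING record_id;      -- one result row per row actually updated

RETURNING yields one row per row the UPDATE actually changed, so at the
default READ COMMITTED isolation level, rows grabbed by a concurrent
worker in the meantime simply drop out of the result rather than raising
an error. Two workers can still briefly block on the same locked candidate
rows; PostgreSQL 9.5 and later sidestep that with FOR UPDATE SKIP LOCKED.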
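
As for the sequence idea, a rough sketch, assuming record_id values are
assigned densely from their own sequence (the queue_dispatch name and the
block size of 100 are made up for illustration): each worker atomically
reserves a disjoint block of ids, so two workers never claim the same rows:

    -- One-time setup: a dispatch sequence that advances 100 ids at a time.
    CREATE SEQUENCE queue_dispatch INCREMENT BY 100;

    -- In each worker: reserve the next block (nextval is atomic across
    -- sessions, so every worker gets a different block_start).
    SELECT nextval('queue_dispatch') AS block_start;

    -- Claim only the idle rows inside the reserved block; block_start is
    -- the value returned by the SELECT above, bound by the client.
    UPDATE queue
       SET state = 'started'
     WHERE project_id = my_project
       AND state = 'idle'
       AND record_id BETWEEN block_start AND block_start + 99
    RETURNING record_id;

Gaps in record_id just make some blocks come back smaller (or empty), and
no FOR UPDATE is needed because the reserved blocks never overlap.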