Thread: Assigning data-entry tasks to multiple concurrent clients
Hi,

I am trying to provide a simple data-entry interface to allow multiple people to efficiently work through every record in a table and fill in the missing values.

The interface is a web application that simply loads up record after record until they're all complete.

I want to minimize the possibility of assigning the same record to two users.

Below is how I'm thinking of assigning records to clients for editing. The idea is to pick a record for a user and remove it from the queue temporarily. It re-enters the queue after 5 minutes if no edit has been made.

BEGIN;
SELECT * FROM records
WHERE in_edit_queue AND id NOT IN (
    SELECT record_id FROM locked_records
    WHERE locked_since > now() - interval '5 minutes')
LIMIT 1;

INSERT INTO locked_records (record_id, locked_since) VALUES (?, now());
COMMIT;

Then to save (first-in wins is acceptable for this environment):

BEGIN;
UPDATE records SET in_edit_queue = false WHERE id = ? AND in_edit_queue = true;
DELETE FROM locked_records WHERE record_id = ?;
COMMIT;

Is this a sane approach? Is there a better way to do this with PostgreSQL?

All feedback is greatly appreciated.

Cheers,
J.
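[For reference, a minimal schema consistent with the statements above. This is inferred from the queries, not taken from the original post; column types and constraints are guesses.]

```sql
-- Assumed schema, reconstructed from the queries in the post above.
CREATE TABLE records (
    id            serial PRIMARY KEY,
    in_edit_queue boolean NOT NULL DEFAULT true
    -- ... plus the data-entry columns themselves ...
);

CREATE TABLE locked_records (
    record_id    integer NOT NULL REFERENCES records (id),
    locked_since timestamptz NOT NULL DEFAULT now()
);
```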
On 2009-06-01, Jamie Tufnell <diesql@googlemail.com> wrote:
> Hi,
>
> I am trying to provide a simple data entry interface to allow multiple
> people to efficiently work through every record in a table and fill in
> the missing values.
>
> The interface is a web application that simply loads up record after
> record until they're all complete.
>
> I want to minimize the possibility of assigning the same record to two
> users.

update records set locked_since = now()
where id = (select id from records
            order by locked_since is not null, locked_since desc
            limit 1)
returning *

your operators may end up fighting over the last two records, would that be a bad thing?
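[A variation on the single-statement suggestion above, making the 5-minute expiry from the original post explicit. This is a sketch, not from the thread; it assumes a nullable locked_since column has been added to records itself, so no separate locked_records table is needed.]

```sql
-- One round-trip check-out: atomically claim the next unclaimed or
-- expired record and return it in the same statement. Never-locked
-- records (locked_since IS NULL) are handed out first, then records
-- whose 5-minute lock has expired, oldest lock first.
UPDATE records
SET locked_since = now()
WHERE id = (
    SELECT id FROM records
    WHERE in_edit_queue
      AND (locked_since IS NULL
           OR locked_since < now() - interval '5 minutes')
    ORDER BY locked_since NULLS FIRST
    LIMIT 1)
RETURNING *;
```

Because the claim and the read happen in one UPDATE, there is no window between SELECT and INSERT for two clients to grab the same row, though two concurrent UPDATEs can still race on the same id and one of them may return zero rows, in which case the client simply runs it again.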
On Sun, May 31, 2009 at 9:54 PM, Jamie Tufnell <diesql@googlemail.com> wrote:
> BEGIN;
> SELECT * FROM records
> WHERE in_edit_queue AND id NOT IN (
>     SELECT record_id FROM locked_records
>     WHERE locked_since > now() - interval '5 minutes')
> LIMIT 1;
>
> INSERT INTO locked_records (record_id, locked_since) VALUES (?, now());
> COMMIT;

There's a race condition here, but a unique constraint on record_id will take care of that, as long as you catch the error and retry.

> Then to save (first-in wins is acceptable for this environment):
>
> BEGIN;
> UPDATE records SET in_edit_queue = false WHERE id = ? AND in_edit_queue = true;
> DELETE FROM locked_records WHERE record_id = ?;
> COMMIT;
>
> Is this a sane approach? Is there a better way to do this with
> PostgreSQL?

It'll work. The key to any kind of system like this is monitoring the progress for things that get stuck / fail to be processed, and running them a second time if need be.

I had a system that processed 1M rows at a time from an 880M-row db, and I used a secondary sequence and recid/1M to partition it out. So, the next job grabs an id from the secondary sequence, which matches the record (or set) to be processed. With that method there's no locking or anything needed, and no one needs to "check out" the records, because incrementing the secondary sequence is in fact checking them out. Just check the finished table to see if there are any holes, and if there are, put those jobs back in the queue by simply updating their id to the next value of the primary id sequence.

Sequences can be an elegant way of assigning jobs to multiple threads without locking issues.
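[Sketches of the two fixes suggested in the reply above. The constraint name and sequence name are illustrative, not from the thread; the retry loop around the failing INSERT lives in the application.]

```sql
-- 1. Close the SELECT-then-INSERT race: with this constraint in place,
--    the second client to INSERT the same record_id gets a
--    unique_violation error, which the application catches before
--    retrying with a different record.
ALTER TABLE locked_records
    ADD CONSTRAINT locked_records_record_id_key UNIQUE (record_id);

-- 2. Sequence-based checkout: nextval() is atomic and never hands the
--    same value to two sessions, so drawing a number from a secondary
--    sequence *is* the check-out; no lock table is needed. Each value
--    maps to a record (or a batch of records, e.g. recid / 1000000).
CREATE SEQUENCE batch_seq;
SELECT nextval('batch_seq');
```

Note that sequences are non-transactional: a value drawn by a job that later fails is never handed out again, which is exactly why the reply recommends scanning the finished table for holes and re-queuing those jobs.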