Thread: Parallel updates on multiple cores

Parallel updates on multiple cores

From

Andrei

Date:

09 June 2008, 05:30:21

I have the following case: a simple table<br /><br />drop table test_data;<br />create table test_data (<br />id
bigserialnot null primary key,<br />content varchar(50),<br />processed varchar(1)<br />);<br /><br />My function doing
theinserts<br /><br />CREATE OR REPLACE FUNCTION populate_test_data(IN nr_records BIGINT, IN proc_nr BIGINT) RETURNS
integerAS $$<br />DECLARE<br />    counter BIGINT := 0;<br />    record_val text;<br />BEGIN<br />LOOP    <br />  
 counter:=counter+1;<br/>    record_val:=((('v ' || counter) || ' p ') || proc_nr);<br />    insert into
test_data(content,processed) values(record_val,'n');<br />    EXIT WHEN counter > nr_records;<br />END LOOP;<br
/>RETURN0;<br />END;<br />$$ LANGUAGE plpgsql;<br /><br />where nr_records represents the number of inserts, and<br
/><br/>CREATE OR REPLACE FUNCTION select_unprocessed(IN start_id BIGINT, IN end_id BIGINT) RETURNS integer AS $$<br
/>DECLARE<br/>    counter BIGINT := 0;<br />    record_val text;<br />    rec record;<br /><br />BEGIN<br />FOR rec IN
SELECTid, content, processed FROM test_data WHERE id >= start_id AND id < end_id<br />LOOP        <br />  
 record_val:=rec.content|| '-DONE-';<br />    update test_data set content=record_val, processed='n' where
id=rec.id;<br/>END LOOP;<br />RETURN 0;<br /><br />END;<br />$$ LANGUAGE plpgsql;<br /><br />The function above updates
therows between the ids start_id and end_id.<br />I have a quad core procesor so i run two separate connections to the
database:select populate_test_data(5000,1) and another select populate_test_data(5000,2). In this case each function
runson one core doing the inserts in parallel, but when i try to run select select_unprocessed(1,5001) and from another
connectionselect select_unprocessed(5001, 10001), one of the processes locks the table so the other one has to wait
untilthe table is unlocked.<br />Each process updates different parts of the table.<br />Is there a way to do the
updatesin parallel on multiple cores?<p>

Re: Parallel updates on multiple cores

From

Shane Ambler

Date:

09 June 2008, 07:30:20

Andrei wrote:

> The function above updates the rows between the ids start_id and
> end_id. I have a quad core procesor so i run two separate connections
> to the database: select populate_test_data(5000,1) and another select
> populate_test_data(5000,2). In this case each function runs on one
> core doing the inserts in parallel, but when i try to run select
> select_unprocessed(1,5001) and from another connection select
> select_unprocessed(5001, 10001), one of the processes locks the table
> so the other one has to wait until the table is unlocked. Each
> process updates different parts of the table. Is there a way to do
> the updates in parallel on multiple cores?
> 

Wait until the other is done or wait until it has done what it needs to?

If it appears to not update the records I would look at the id ranges 
you are passing. You insert 5000 rows with the first function then you 
tell the update function to update row id's 1 to 50001 - have you reset 
the sequence for the id column? or do you drop and create the table 
before each test? My guess is no updates appear to happen as the id's 
entered by the serial type are larger than 10000.

Also you update with processed='n' - is that what you want? Is that the 
only column you look at to see that it is done?

Transactions would be the only cause of the problem you describe. I am 
guessing that you use bigger numbers than 5000 in your tests and the 
examples above use overlapping id's. If the first updates row 5001 early 
then the second may need to wait until it commits to update it again. 
This can work the other way 5001 is updated by the second locking it 
until it finishes and the first waits until the second commits to update 
it again.
With 5000 rows I wouldn't expect to see a time difference.

Without an order by in the select the rows can be returned and updated 
in any order.

-- 

Shane Ambler
pgSQL (at) Sheeky (dot) Biz

Get Sheeky @ http://Sheeky.Biz

Re: Parallel updates on multiple cores

From

Simon Riggs

Date:

09 June 2008, 11:24:38

On Mon, 2008-06-09 at 01:29 -0700, Andrei wrote:
> The function above updates the rows between the ids start_id and
> end_id.
> I have a quad core procesor so i run two separate connections to the
> database: select populate_test_data(5000,1) and another select
> populate_test_data(5000,2). In this case each function runs on one
> core doing the inserts in parallel, but when i try to run select
> select_unprocessed(1,5001) and from another connection select
> select_unprocessed(5001, 10001), one of the processes locks the table
> so the other one has to wait until the table is unlocked.
> Each process updates different parts of the table.

Your ranges overlap. So one waits for the other on tuple=5001.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support