Re: INSERT ... ON CONFLICT UPDATE/IGNORE 4.0 - Mailing list pgsql-hackers
From | Peter Geoghegan
---|---
Subject | Re: INSERT ... ON CONFLICT UPDATE/IGNORE 4.0
Date | 
Msg-id | CAM3SWZQ3TzqC13FtH5D1kC6iJ8N_=yrKwg4PCwDcy_ZiQ2FDoA@mail.gmail.com
In response to | Re: INSERT ... ON CONFLICT UPDATE/IGNORE 4.0 (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses | Re: INSERT ... ON CONFLICT UPDATE/IGNORE 4.0
List | pgsql-hackers
On Thu, Apr 30, 2015 at 7:00 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> To fix that, we need to fix the "livelock insurance" check so that A does
> not wait for B here. Because B is not a speculative insertion, A should
> cancel its speculative insertion and retry instead. (I pushed the one-line
> fix for that to your github repository)

I've been unable to reproduce the unprincipled deadlock using the same test case as before. However, the exclusion constraint code now livelocks. Here is example output from a stress-testing session:

trying 128 clients:
[Fri May 1 04:45:15 2015] NOTICE: extension "btree_gist" already exists, skipping
[Fri May 1 04:45:15 2015] init done at count_upsert_exclusion.pl line 106.
[Fri May 1 04:45:19 2015] sum is 96
[Fri May 1 04:45:19 2015] count is 8906
[Fri May 1 04:45:19 2015] normal exit at 1430455519 after 128000 items processed at count_upsert_exclusion.pl line 192.
trying 128 clients:
[Fri May 1 04:45:19 2015] NOTICE: extension "btree_gist" already exists, skipping
[Fri May 1 04:45:19 2015] init done at count_upsert_exclusion.pl line 106.
[Fri May 1 04:45:23 2015] sum is -610
[Fri May 1 04:45:23 2015] count is 8867
[Fri May 1 04:45:23 2015] normal exit at 1430455523 after 128000 items processed at count_upsert_exclusion.pl line 192.
trying 128 clients:
[Fri May 1 04:45:23 2015] NOTICE: extension "btree_gist" already exists, skipping
[Fri May 1 04:45:23 2015] init done at count_upsert_exclusion.pl line 106.
[Fri May 1 04:45:27 2015] sum is 352
[Fri May 1 04:45:27 2015] count is 8861
[Fri May 1 04:45:27 2015] normal exit at 1430455527 after 128000 items processed at count_upsert_exclusion.pl line 192.
trying 128 clients:
[Fri May 1 04:45:27 2015] NOTICE: extension "btree_gist" already exists, skipping
[Fri May 1 04:45:27 2015] init done at count_upsert_exclusion.pl line 106.
[Fri May 1 04:45:31 2015] sum is 190
[Fri May 1 04:45:31 2015] count is 8895
[Fri May 1 04:45:31 2015] normal exit at 1430455531 after 128000 items processed at count_upsert_exclusion.pl line 192.
trying 128 clients:
[Fri May 1 04:45:31 2015] NOTICE: extension "btree_gist" already exists, skipping
[Fri May 1 04:45:31 2015] init done at count_upsert_exclusion.pl line 106.
[Fri May 1 04:45:35 2015] sum is -76
[Fri May 1 04:45:35 2015] count is 8833
[Fri May 1 04:45:35 2015] normal exit at 1430455535 after 128000 items processed at count_upsert_exclusion.pl line 192.
trying 128 clients:
[Fri May 1 04:45:58 2015] NOTICE: extension "btree_gist" already exists, skipping
[Fri May 1 04:45:58 2015] init done at count_upsert_exclusion.pl line 106.

(Here I ssh into the server and check progress.) Then, due to some issue with the scheduler or something, progress continues:

[Fri May 1 05:17:57 2015] sum is 462
[Fri May 1 05:17:57 2015] count is 8904
[Fri May 1 05:17:58 2015] normal exit at 1430457478 after 128000 items processed at count_upsert_exclusion.pl line 192.
trying 128 clients:
[Fri May 1 05:17:58 2015] NOTICE: extension "btree_gist" already exists, skipping
[Fri May 1 05:17:58 2015] init done at count_upsert_exclusion.pl line 106.
[Fri May 1 05:18:01 2015] sum is 316
[Fri May 1 05:18:01 2015] count is 8839
[Fri May 1 05:18:01 2015] normal exit at 1430457481 after 128000 items processed at count_upsert_exclusion.pl line 192.

Note that what I actually see is non-uniform durations for each run, rather than a livelock per se. Most runs are 3-4 seconds, but every so often one will last 25 seconds for no good reason.
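For context, count_upsert_exclusion.pl drives the exclusion-constraint variant of the usual upsert stress test. The exact definitions live in that script; the following is only a rough sketch, and the exclusion constraint shown is my assumption based on the btree_gist NOTICE above:

    create extension if not exists btree_gist;

    -- sketch only; the real table definition is in count_upsert_exclusion.pl
    create table upsert_race_test (
        index int,
        count int,
        -- assumed: a btree_gist exclusion constraint standing in for a unique
        -- index, so that the exclusion constraint code path gets exercised
        exclude using gist (index with =)
    );

The statements each client runs against the table appear verbatim in the server log further down.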
So maybe this is better described as a lock starvation issue:

trying 128 clients:
[Fri May 1 05:41:16 2015] NOTICE: extension "btree_gist" already exists, skipping
[Fri May 1 05:41:16 2015] init done at count_upsert_exclusion.pl line 106.
[Fri May 1 05:41:19 2015] sum is -264
[Fri May 1 05:41:19 2015] count is 8886
[Fri May 1 05:41:20 2015] normal exit at 1430458880 after 128000 items processed at count_upsert_exclusion.pl line 192.
trying 128 clients:
[Fri May 1 05:41:20 2015] NOTICE: extension "btree_gist" already exists, skipping
[Fri May 1 05:41:20 2015] init done at count_upsert_exclusion.pl line 106.
[Fri May 1 05:41:43 2015] sum is -14
[Fri May 1 05:41:43 2015] count is 8894
[Fri May 1 05:41:43 2015] normal exit at 1430458903 after 128000 items processed at count_upsert_exclusion.pl line 192.
trying 128 clients:
[Fri May 1 05:41:43 2015] NOTICE: extension "btree_gist" already exists, skipping
[Fri May 1 05:41:43 2015] init done at count_upsert_exclusion.pl line 106.
[Fri May 1 05:41:47 2015] sum is -338
[Fri May 1 05:41:47 2015] count is 8867
[Fri May 1 05:41:47 2015] normal exit at 1430458907 after 128000 items processed at count_upsert_exclusion.pl line 192.

If I look at the Postgres server log itself, I see very long-running queries that match the following pattern:

2015-05-01 06:06:42 UTC [ 0 ]: LOG: duration: 21855.521 ms statement: delete from upsert_race_test where index='9399' and count=0
2015-05-01 06:06:42 UTC [ 0 ]: LOG: duration: 21855.303 ms statement: delete from upsert_race_test where index='5296' and count=0
2015-05-01 06:06:42 UTC [ 0 ]: LOG: duration: 21849.475 ms statement: insert into upsert_race_test (index, count) values ('9777','-1') on conflict do update set count=TARGET.count + EXCLUDED.count where TARGET.index = EXCLUDED.index returning count
2015-05-01 06:06:42 UTC [ 0 ]: LOG: duration: 21853.646 ms statement: insert into upsert_race_test (index, count) values ('6843','1') on conflict do update set count=TARGET.count + EXCLUDED.count where TARGET.index = EXCLUDED.index returning count
2015-05-01 06:06:42 UTC [ 0 ]: LOG: duration: 21856.496 ms statement: delete from upsert_race_test where index='6015' and count=0
2015-05-01 06:06:42 UTC [ 0 ]: LOG: duration: 21855.670 ms statement: insert into upsert_race_test (index, count) values ('3283','-1') on conflict do update set count=TARGET.count + EXCLUDED.count where TARGET.index = EXCLUDED.index returning count
2015-05-01 06:06:42 UTC [ 0 ]: LOG: duration: 21849.788 ms statement: insert into upsert_race_test (index, count) values ('5355','-1') on conflict do update set count=TARGET.count + EXCLUDED.count where TARGET.index = EXCLUDED.index returning count
2015-05-01 06:06:42 UTC [ 0 ]: LOG: duration: 21856.756 ms statement: insert into upsert_race_test (index, count) values ('6716','1') on conflict do update set count=TARGET.count + EXCLUDED.count where TARGET.index = EXCLUDED.index returning count
2015-05-01 06:06:42 UTC [ 0 ]: LOG: duration: 21855.800 ms statement: insert into upsert_race_test (index, count) values ('9573','1') on conflict do update set count=TARGET.count + EXCLUDED.count where TARGET.index = EXCLUDED.index returning count
2015-05-01 06:06:42 UTC [ 0 ]: LOG: duration: 21854.482 ms statement: delete from upsert_race_test where index='2909' and count=0
2015-05-01 06:06:42 UTC [ 0 ]: LOG: duration: 21851.270 ms statement: insert into upsert_race_test (index, count) values ('5129','1') on conflict do update set count=TARGET.count + EXCLUDED.count where TARGET.index = EXCLUDED.index returning count
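With the line wrapping restored, the statement pattern in those entries looks like this (the values differ per client; this is just the shape of each query):

    -- upsert issued by each stress-test client
    insert into upsert_race_test (index, count)
    values ('9777', '-1')
    on conflict do update
        set count = TARGET.count + EXCLUDED.count
        where TARGET.index = EXCLUDED.index
    returning count;

    -- cleanup of rows whose count has returned to zero
    delete from upsert_race_test
    where index = '9777' and count = 0;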
The server is the same one that I shared credentials with you for. Feel free to ssh in and investigate it yourself.

Thanks
-- 
Peter Geoghegan