Thread: Avoiding deadlocks when performing bulk update and delete operations
On 24/11/14 16:51, Sanjaya Vithanagama wrote:
> Hi All,
>
> <snip - original post quoted in full below>

Unless there is some sort of implied locking, or other nonsense like different
lock types, always acquiring locks in the same order should work - as far as I
can tell.

For purely locking problems, and assuming that all the relevant tables are
locked: if process p1 acquires locks in the order A, B, C and process p2
acquires locks in the order A, C, D, then as soon as one process grabs A, the
other process cannot grab A nor the later locks - so deadlock is avoided.

Similarly:

p1: A, B, C, D
p2: B, C

If p1 grabs A and then p2 grabs B, p1 will have to wait for p2 to finish before
p1 continues - but still, neither is deadlocked. Even if there is a p3 which
also locks B, at worst two processes will wait until the lucky first process
releases its locks.

You may have problems if there is some resource in contention: two processes
require the resource and grab it in several parts at different times, both grab
some, and then there is insufficient left to completely satisfy either. But
this is guesswork - I'm not sure what resources (if any) would be a problem
here.

Cheers,
Gavin
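(Editorial aside, not part of Gavin's message: if a consistent row-level
ordering cannot be guaranteed, another way to get "locks in a consistent order"
is to have every bulk writer take the same transaction-scoped advisory lock
before touching the table. A sketch only; the key 871234 is an arbitrary
constant the callers would have to agree on, and the cost is that the competing
bulk operations run one at a time:)

BEGIN;
-- All stored procedures that bulk-update or bulk-delete this table take the
-- same advisory lock first; pg_advisory_xact_lock() blocks until the lock is
-- available and releases it automatically at commit or rollback.
SELECT pg_advisory_xact_lock(871234);

DELETE FROM table_name
WHERE id_A = ANY(array_of_id_A)
  AND id_B = ANY(array_of_id_B);
COMMIT;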
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Sanjaya Vithanagama
Sent: Sunday, November 23, 2014 10:52 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Avoiding deadlocks when performing bulk update and delete operations
Hi All,
We have a single table which does not have any foreign key references.
id_A (bigint)
id_B (bigint)
val_1 (varchar)
val_2 (varchar)
The primary key of the table is a composite of id_A and id_B.
Reads and writes of this table are highly concurrent and the table has millions of rows. We have several stored procedures which do mass updates and deletes. Those stored procedures are being called concurrently mainly by triggers and application code.
The operations usually look like the following where it could match thousands of records to update or delete:
DELETE FROM table_name t
USING (
    SELECT id_A, id_B
    FROM table_name
    WHERE id_A = ANY(array_of_id_A)
      AND id_B = ANY(array_of_id_B)
    ORDER BY id_A, id_B
    FOR UPDATE
) del
WHERE t.id_A = del.id_A
  AND t.id_B = del.id_B;

UPDATE table_name t
SET val_1 = 'some value'
  , val_2 = 'some value'
FROM (
    SELECT id_A, id_B
    FROM table_name
    WHERE id_A = ANY(array_of_id_A)
      AND id_B = ANY(array_of_id_B)
    ORDER BY id_A, id_B
    FOR UPDATE
) upd
WHERE t.id_A = upd.id_A
  AND t.id_B = upd.id_B;
We are experiencing deadlocks and all our attempts to perform operations with locks (row level using SELECT FOR UPDATE as used in the above queries and table level locks) do not seem to solve these deadlock issues. (Note that we cannot in any way use access exclusive locking on this table because of the performance impact)
Is there another way that we could try to solve these deadlock situations? The reference manual says — "The best defense against deadlocks is generally to avoid them by being certain that all applications using a database acquire locks on multiple objects in a consistent order."
Is there a guaranteed way to do bulk update/delete operations in a particular order so that we can ensure deadlocks won't occur? Or are there any other tricks to avoid deadlocks in this situation?
Thank you in advance,
Sanjaya
Maybe I'm missing something here, but it seems that you make the problem worse by using:
DELETE … USING (SELECT … FOR UPDATE)…
Can’t you just do:
DELETE FROM table_name
WHERE id_A = ANY(array_of_id_A)
AND id_B = ANY(array_of_id_B);
?
Regards,
Igor Neyman
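(Editorial aside, not part of Igor's message: the UPDATE in the original post
admits the same simplification; a sketch:)

UPDATE table_name
SET val_1 = 'some value'
  , val_2 = 'some value'
WHERE id_A = ANY(array_of_id_A)
  AND id_B = ANY(array_of_id_B);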
Thread: How to individually list the DDL for all individual data base objects

Is there a good way to individually list the DDL for all individual data base objects?

Running a database dump like

    pg_dump mydatabase > mydatabase-database.sql

produces one big file with all the DDL and DML to restore the database, which
is very convenient for most cases. Using that, I have sometimes cut-and-pasted
out of it the DDL for individual items that I want to work on, and then fed
that back in with

    psql mydatabase < newstuff.sql

What I would like, though, is a set of output files, one for each DDL/DML item
currently represented in the one big file. I could maybe attempt some
convoluted automated parsing of the big file with, for example, Perl, but that
seems, well, convoluted, error-prone, and probably fragile.

The directory output format of pg_dump is similar to, but not really, what I
want (as far as I can see, the individual files it produces are only the ones
for reloading data, so correct me if I am wrong ... and please show me how to
do it right!).

I have played around with the custom format dump followed by pg_restore and
various options, but did not get what I wanted, at least not as elegantly as I
wanted. What I have come up with is fairly simple in appearance, but the way it
works - reading one line at a time from the list file associated with the dump
file, then running pg_restore with just that one line, rinse and repeat for
each piece of DDL/DML - also seems convoluted and potentially fragile.
Something along the lines of (... if anyone thinks this is a good idea, or a
good starting point ...):

grep -v '^;' listfile | while read a b c n
do
    a=${a/;}
    echo $a > f
    pg_restore -L f -f outputdir/$a dumpfile
done

This, as it is, creates a set of files named according to the id number that
pg_dump uses to identify each element. Ideally, I would like the files named
after the schema+object it represents.

Thanks for your help!

Berend Tober
Re: How to individually list the DDL for all individual data base objects
On 2014-11-24 at 10:14, Berend Tober <btober@broadstripe.net> wrote:
> Is there a good way to individually list the DDL for all individual data base objects?
>
> <snip>
>
> grep -v '^;' listfile | while read a b c n
> do
>     a=${a/;}
>     echo $a > f
>     pg_restore -L f -f outputdir/$a dumpfile
> done
>
> This, as it is, creates a set of files named according to the id number that
> pg_dump uses to identify each element. Ideally, I would like the files named
> after the schema+object it represents.

Were you aware that pg_restore can restore to STDOUT, and output DDL for only
a single named object from a custom dump file?

$ pg_restore --help
...
  -f, --file=FILENAME      output file name
...
  -t, --table=NAME         restore named table
...

Such that you could run:

$ pg_restore -f public.mytable.sql -t public.mytable whatever.pgdump

Unfortunately, this does not respect dependencies and you may have issues. The
--disable-triggers option can alleviate some of these problems. YMMV.

Hope that helps!
François
François Beausoleil wrote:
> On 2014-11-24 at 10:14, Berend Tober <btober@broadstripe.net> wrote:
>
>> Is there a good way to individually list the DDL for all individual data base objects?
>
> <snip>
>
> Were you aware that pg_restore can restore to STDOUT, and output DDL for
> only a single named object from a custom dump file?

Yes sir, thank you. That is what the script does.

But that little script seems fragile in its dependence on the format of the
list file, and it runs (starts a new process for) pg_restore for every
individual line in the list file ... which seems like poor practice,
generally. Additionally, I'd like stability in the output file names, rather
than the (likely changing) internal, automatically numerically-named items.

And, by the way, I want not just a single table.

The dependency issue is potentially a problem, but (as I believe) it looks
like the output of pg_restore in a list file is in the correct order to
process dependencies, so I could walk backwards or forwards through that if
changes to one object were dependent on, or caused cascading effects in,
other objects.

> <snip>
>
> Hope that helps!
> François
Re: How to individually list the DDL for all individual data base objects
You should probably look at the pg_extractor utility.

https://github.com/omniti-labs/pg_extractor

With it, you can dump individual or selected objects to separate directories.

Melvin Davidson
Berend Tober wrote:
> Is there a good way to individually list the DDL for all individual data
> base objects?
These threads might interest you:
http://www.postgresql.org/message-id/AANLkTikLHA2x6U=q-t0j0YS78txHFmdtyxJfsrsRcLqN@mail.gmail.com
http://www.postgresql.org/message-id/CAASwCXdKROy2UHLof4Zvxx00goSccqXkg_g4Yf9vOuK8aXWh0g@mail.gmail.com
--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

On 11/24/2014 08:12 AM, Berend Tober wrote:
> François Beausoleil wrote:
>>
>> Were you aware that pg_restore can restore to STDOUT, and output DDL
>> for only a single named object from a custom dump file?
>
> Yes sir, thank you. That is what the script does.
>
> <snip>
>
> The dependency issue is potentially a problem, but (as I believe) it
> looks like the output of pg_restore in a list file is in the correct
> order to process dependencies, so I could walk backwards or forwards
> through that if changes to one object were dependent on, or caused
> cascading effects in, other objects.

At this point I have to ask:

What is the problem you are trying to solve?

Your last paragraph seems to be reinventing the -Fc TOC, so I am not sure
where you are going with this?

--
Adrian Klaver
adrian.klaver@aklaver.com
Adrian Klaver wrote:
> On 11/24/2014 08:12 AM, Berend Tober wrote:
> <snip>
>
> At this point I have to ask:
>
> What is the problem you are trying to solve?
>
> Your last paragraph seems to be reinventing the -Fc TOC, so I am not sure
> where you are going with this?

On the small scale, I wanted to have an individual file listing the DDL for
each database object so that, if I need to modify the object, I can start
with the script that created it as a basis for modifications - for views and
functions, etc.

On the larger scale, I was thinking I would like to check all of the
individual modules into revision control, retaining fine-grained control,
rather than committing the one big comprehensive file.
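(Editorial aside, not from the thread: for the views-and-functions case
specifically, per-object DDL can also be regenerated straight from the
catalogs of a live database and written out one file per object. A sketch
using pg_get_functiondef() and pg_get_viewdef(); it covers only functions and
views, so tables and the rest still need pg_dump:)

-- One row per function and per view in schema public, with reconstructable DDL.
-- pg_proc.proisagg excludes aggregates (on PostgreSQL 11+ use prokind = 'f').
SELECT n.nspname || '.' || p.proname AS object_name,
       pg_get_functiondef(p.oid)     AS ddl
  FROM pg_proc p
  JOIN pg_namespace n ON n.oid = p.pronamespace
 WHERE n.nspname = 'public'
   AND NOT p.proisagg
UNION ALL
SELECT n.nspname || '.' || c.relname,
       'CREATE OR REPLACE VIEW ' || n.nspname || '.' || c.relname
       || ' AS ' || pg_get_viewdef(c.oid, true)
  FROM pg_class c
  JOIN pg_namespace n ON n.oid = c.relnamespace
 WHERE n.nspname = 'public'
   AND c.relkind = 'v';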
Melvin Davidson wrote:
> You should probably look at the pg_extractor utility.
>
> https://github.com/omniti-labs/pg_extractor
>
> With it, you can dump individual or selected objects to separate
> directories.

That looks like exactly what I'm looking for.

(Note: I did do some Google searching, but apparently did not use the right
set of search terms, because this tool is described as doing exactly what I
am trying to do!)

Thanks!
Re: Avoiding deadlocks when performing bulk update and delete operations
On Mon, 24 Nov 2014 14:51:42 +1100
Sanjaya Vithanagama <svithanagama@gmail.com> wrote:
> Hi All,
>
> We have a single table which does not have any foreign key references.
>
> id_A (bigint)
> id_B (bigint)
> val_1 (varchar)
> val_2 (varchar)
>
> The primary key of the table is a composite of id_A and id_B.
>
> Reads and writes of this table are highly concurrent and the table has
> millions of rows. We have several stored procedures which do mass updates
> and deletes. Those stored procedures are being called concurrently mainly
> by triggers and application code.
>
> The operations usually look like the following where it could match
> thousands of records to update or delete:
>
> DELETE FROM table_name t
> USING (
> SELECT id_A, id_B
> FROM table_name
> WHERE id_A = ANY(array_of_id_A)
> AND id_B = ANY(array_of_id_B)
> ORDER BY id_A, id_B
> FOR UPDATE
> ) del
> WHERE t.id_A = del.id_A
> AND t.id_B = del.id_B;
>
>
> UPDATE table_name t
> SET val_1 = 'some value'
> , val_2 = 'some value'
> FROM (
> SELECT id_A, id_B
> FROM table_name
> WHERE id_A = ANY(array_of_id_A)
> AND id_B = ANY(array_of_id_B)
> ORDER BY id_A, id_B
> FOR UPDATE
> ) upd
> WHERE t.id_A = upd.id_A
> AND t.id_B = upd.id_B;
>
> We are experiencing deadlocks and all our attempts to perform operations
> with locks (row level using SELECT FOR UPDATE as used in the above queries
> and table level locks) do not seem to solve these deadlock issues. (Note
> that we cannot in any way use access exclusive locking on this table
> because of the performance impact)
>
> Is there another way that we could try to solve these deadlock situations?
> The reference manual says - "The best defense against deadlocks is
> generally to avoid them by being certain that all applications using a
> database acquire locks on multiple objects in a consistent order."
>
> Is there a guaranteed way to do bulk update/delete operations in a
> particular order so that we can ensure deadlocks won't occur? Or are there
> any other tricks to avoid deadlocks in this situation?
Lots of stuff to say about this ...

First off, Igor has a valid point that the subselects are not helping and may
be making the situation slightly worse. I can't see any reason not to simplify
the queries as he suggested.
Secondly, a lot of your comments are too vague for me to understand what's
happening, so I'm going to ask a bunch of questions to clarify:
* How many UPDATE/INSERT queries are you running per second?
* How many in parallel on average?
* What's the typical execution time for an UPDATE/INSERT that might cause
this problem?
* How frequently do deadlocks occur?
* Are there other tables involved in the transactions ... i.e., have you
confirmed that these are the _only_ tables causing the deadlock?
Since you didn't include any log output, I'm fuzzy on some of those things
above ... but I'm assuming that you're unable to post specific details of
the precise problem.
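(Editorial aside, not part of Bill's message: if the missing log output is the
obstacle, note that the server already writes a detailed report for every
deadlock, including the statements involved, and log_lock_waits adds a log
line for any lock wait longer than deadlock_timeout. A sketch of the settings,
assuming superuser; ALTER SYSTEM needs PostgreSQL 9.4 or later, on older
versions set the same parameters in postgresql.conf and reload:)

-- Log sessions that wait on a lock for longer than deadlock_timeout, so that
-- near-deadlocks as well as actual deadlocks leave details in the server log.
ALTER SYSTEM SET log_lock_waits = on;
ALTER SYSTEM SET deadlock_timeout = '1s';   -- 1s is the default; shown for clarity
SELECT pg_reload_conf();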
I have a lot of suggestions, but instead of bombing you with all of them, I
think it would be better if you answered those questions, which will tell
me which suggestions are most likely to help.
--
Bill Moran
I need your help to succeed:
http://gamesbybill.com
Re: Avoiding deadlocks when performing bulk update and delete operations
On Wed, 26 Nov 2014 10:41:56 +1100
Sanjaya Vithanagama <svithanagama@gmail.com> wrote:
>
> > * How frequently do deadlocks occur?
>
> We are seeing deadlocks about 2-3 times per day in the production server.
> To reproduce the problem easily we've written a simple Java class with
> multiple threads calling to the stored procedures running the above queries
> inside a loop. This way we can easily recreate a scenario that happens in
> the production.
Don't overcomplicate your solution. Adjust your code to detect the deadlock
and replay the transaction when it happens. At 2-3 deadlocks per day, it's
difficult to justify any other solution (as any other solution would be
more time-consuming to implement, AND would interfere with performance).
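(Editorial aside, not part of Bill's message: a minimal sketch of "detect and
replay" done on the database side, using a hypothetical wrapper function; the
same SQLSTATE, 40P01, can just as well be caught and retried in the Java code
that calls the stored procedure:)

-- Retry the bulk DELETE a few times if this transaction is chosen as the
-- deadlock victim. Each BEGIN/EXCEPTION block is a subtransaction, so a
-- failed attempt rolls back cleanly before the retry.
CREATE OR REPLACE FUNCTION delete_with_retry(array_of_id_A bigint[],
                                             array_of_id_B bigint[])
RETURNS void LANGUAGE plpgsql AS $$
DECLARE
    attempt int;
BEGIN
    FOR attempt IN 1..3 LOOP
        BEGIN
            DELETE FROM table_name
             WHERE id_A = ANY(array_of_id_A)
               AND id_B = ANY(array_of_id_B);
            RETURN;                               -- success
        EXCEPTION WHEN deadlock_detected THEN     -- SQLSTATE 40P01
            IF attempt = 3 THEN
                RAISE;                            -- give up; let the caller see it
            END IF;
            PERFORM pg_sleep(0.1 * attempt);      -- brief backoff before retrying
        END;
    END LOOP;
END;
$$;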
I've worked with a number of write-heavy applications that experienced
deadlocks, some of them on the order of hundreds of deadlocks per day.
In some cases, you can adjust the queries to reduce the incidence of
deadlocks, or eliminate the possibility of deadlocks completely. The
situation that you describe is not one of those cases, as the planner
can choose to lock rows in whatever order it thinks is most efficient
and you don't have direct control over that.
The performance hit you'll take 2-3 times a day when a statement has to
be replayed due to deadlock will hardly be noticed (although a statement
that takes 50 seconds will cause raised eyebrows if it runs 2x) but that
will only happen 2-3 times a day, and the solution I'm proposing won't
have any performance impact on the other 13000000 queries per day that
don't deadlock.
2-3 deadlocks per day is normal operation for a heavily contended table,
in my experience.
On Thu, 27 Nov 2014 15:07:49 +1100
Sanjaya Vithanagama <svithanagama@gmail.com> wrote:

> On Wed, Nov 26, 2014 at 11:47 AM, Bill Moran <wmoran@potentialtech.com> wrote:
>
> > Don't overcomplicate your solution. Adjust your code to detect the deadlock
> > and replay the transaction when it happens. At 2-3 deadlocks per day, it's
> > difficult to justify any other solution (as any other solution would be
> > more time-consuming to implement, AND would interfere with performance).
>
> When you say replay the transaction, I believe that is to catch the
> exception inside the stored procedure? We considered that option at one
> stage, but the problem with that is we don't have enough context
> information at the stored procedure where this deadlock occurs.

Why not catch it in the application calling the stored procedure?

I don't understand how you could not have enough context to run the command
you were just trying to run. Can you elaborate on what you mean by that?

> > <snip>
> >
> > 2-3 deadlocks per day is normal operation for a heavily contended table,
> > in my experience.
>
> Given that we have no control over how Postgres performs delete and update
> operations, the only other possibility seems to be to partition this table
> by id_A (so that the individual tables will never be deadlocked). But that
> seems too extreme an option at this stage.

That would be overcomplicating the solution, and almost certainly won't work
anyway. If you're getting deadlocks, it's because two processes are trying to
modify the same rows. Even if you partition, those same rows will be on the
same partition, so you'll still deadlock.

--
Bill Moran
I need your help to succeed: http://gamesbybill.com
On Thu, 2014-11-27 at 06:49 -0500, Bill Moran wrote:
> <snip>

I've been following this discussion for a few days. In my experience,
deadlocks are caused either by poor database design or within the
application. As you are only talking about a single table, the problem has
to be with the application.

You mentioned that you are using Java, so can we assume there is a class for
the table with getters and setters? The table should also have its action
factory, where there ought to be a method to handle updates and deletes.
Instead of passing id_A and id_B values via an array, why not have a method
to which you pass an id_A value, an id_B value, a boolean to indicate either
update or delete, and the values for the update (or nulls if deleting)? Then
you can go into a transaction, lock the row for update, throw all the
exceptions you need, and gracefully handle any feedback to your users. If you
cannot lock the row, just display an alert - "Please try again in five
minutes" or something similar.

HTH.
Rob
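(Editorial aside, not part of Rob's message: a sketch of the per-row approach
he describes, with hypothetical names. FOR UPDATE NOWAIT raises SQLSTATE
55P03, condition lock_not_available, as soon as another transaction already
holds the row, which maps naturally onto a "please try again" message:)

-- Update or delete a single (id_A, id_B) row, failing fast instead of
-- queueing behind another transaction's lock.
CREATE OR REPLACE FUNCTION update_one_row(p_id_A   bigint,
                                          p_id_B   bigint,
                                          p_val_1  varchar,
                                          p_val_2  varchar,
                                          p_delete boolean)
RETURNS boolean LANGUAGE plpgsql AS $$
BEGIN
    -- Try to lock just this row without waiting.
    PERFORM 1
       FROM table_name
      WHERE id_A = p_id_A AND id_B = p_id_B
        FOR UPDATE NOWAIT;

    IF NOT FOUND THEN
        RETURN false;                          -- row does not exist
    END IF;

    IF p_delete THEN
        DELETE FROM table_name WHERE id_A = p_id_A AND id_B = p_id_B;
    ELSE
        UPDATE table_name
           SET val_1 = p_val_1, val_2 = p_val_2
         WHERE id_A = p_id_A AND id_B = p_id_B;
    END IF;
    RETURN true;
EXCEPTION WHEN lock_not_available THEN         -- SQLSTATE 55P03
    RETURN false;                              -- caller shows "please try again"
END;
$$;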