Thread: autovacuum process blocks without reporting a deadlock
Hi all,

On just one of our otherwise identical customer databases I have run into a kind of deadlock with autovacuum involved. An autovacuum process got stuck in the middle of the night, and it seemed to be competing with another SELECT process for an index:

[14398 / 2007-11-21 00:52:04 CET]CONTEXT:  SQL statement "UPDATE hst_timerecording SET id_timerecording_join = NULL WHERE id_timerecording_join = -1"
        PL/pgSQL function "set_id_timerecording_join" line 121 at SQL statement
        SQL statement "UPDATE hst_timerecording SET sales_volume = NULL, sales_volume_commission = NULL WHERE business_day = $1 AND id_employee = $2"
        PL/pgSQL function "compress_salaries_day" line 168 at SQL statement
        SQL statement "SELECT compress_salaries_day( $1 , NULL, NULL)"
        PL/pgSQL function "compress" line 460 at perform
[14398 / 2007-11-21 00:52:04 CET]LOCATION:  exec_stmt_raise, pl_exec.c:2110
[14391 / 2007-11-21 00:52:14 CET]DEBUG:  00000: index "hst_timerecording_id_timerecording_idx" now contains 8537 row versions in 61 pages
[14391 / 2007-11-21 00:52:14 CET]DETAIL:  4454 index row versions were removed.
        12 index pages have been deleted, 0 are currently reusable.
        CPU 0.00s/0.00u sec elapsed 3.47 sec.
[14391 / 2007-11-21 00:52:14 CET]LOCATION:  lazy_vacuum_index, vacuumlazy.c:736

These are the last log entries for both processes. Over 9 hours later, I can see them still sitting in the process list:

14391 ?  S  0:00 postgres: autovacuum process backoffice_db
14398 ?  S  0:02 postgres: spoon backoffice_db office(39302) SELECT waiting

This happens every night, and a dump/restore doesn't help. I don't take any explicit locks, so I would expect that no deadlock could occur, and if one did occur, I would expect it to be logged together with the deadlocked relations. Can anyone give me a tip, please?

Version: PostgreSQL 8.1.4 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.0.2 (Debian 4.0.2-2)

regards,
thomas
Thomas Chille wrote:

> These are the last log entries for both processes. Over 9 hours later, I
> can see them still sitting in the process list:
>
> 14391 ?  S  0:00 postgres: autovacuum process backoffice_db
> 14398 ?  S  0:02 postgres: spoon backoffice_db office(39302) SELECT waiting

Can you paste the relevant lines from pg_locks?

-- 
Alvaro Herrera                                http://www.PlanetPostgreSQL.org/
"Estoy de acuerdo contigo en que la verdad absoluta no existe...
El problema es que la mentira sí existe y tu estás mintiendo" (G. Lama)
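For reference, a query along these lines is one way to capture the requested pg_locks detail together with what each backend is doing; this is only a sketch, using the 8.1-era pg_stat_activity column names (procpid, current_query), and current_query is only populated when stats_command_string is enabled:

    SELECT l.locktype, l.relation::regclass AS relation, l.pid, l.mode,
           l.granted, a.current_query
    FROM pg_locks l
         LEFT JOIN pg_stat_activity a ON a.procpid = l.pid
    ORDER BY l.relation, l.granted DESC, l.pid;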
I have to wait until Monday; then I can provide those lines.

thanks,
thomas

On Nov 23, 2007 1:32 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Thomas Chille wrote:
>
> > These are the last log entries for both processes. Over 9 hours later, I
> > can see them still sitting in the process list:
> >
> > 14391 ?  S  0:00 postgres: autovacuum process backoffice_db
> > 14398 ?  S  0:02 postgres: spoon backoffice_db office(39302) SELECT waiting
>
> Can you paste the relevant lines from pg_locks?
"Thomas Chille" <thomas.chille@gmail.com> writes: > One Autovacuum process stuck in the middle of the night and it seemed > that it compete with another Select process for an index: > [14391 / 2007-11-21 00:52:14 CET]DEBUG: 00000: index > "hst_timerecording_id_timerecording_idx" now contains 8537 row > versions in 61 pages What other indexes does that table have? regards, tom lane
On Nov 24, 2007 6:20 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> What other indexes does that table have?

Hi,

Last night it happened again. In this log snippet you can see all the indexes of that table:

[9293 / 2007-11-26 21:46:28 CET]CONTEXT:  SQL statement "UPDATE hst_timerecording SET id_timerecording_join = NULL WHERE id_timerecording_join = -1"
        PL/pgSQL function "set_id_timerecording_join" line 121 at SQL statement
        SQL statement "UPDATE hst_timerecording SET sales_volume = NULL, sales_volume_commission = NULL WHERE business_day = $1 AND id_employee = $2"
        PL/pgSQL function "compress_salaries_day" line 168 at SQL statement
        SQL statement "SELECT compress_salaries_day( $1 , NULL, NULL)"
        PL/pgSQL function "compress" line 460 at perform
[9293 / 2007-11-26 21:46:28 CET]LOCATION:  exec_stmt_raise, pl_exec.c:2110
[9317 / 2007-11-26 21:46:34 CET]DEBUG:  00000: index "hst_timerecording_business_day_idx" now contains 8640 row versions in 80 pages
[9317 / 2007-11-26 21:46:34 CET]DETAIL:  4469 index row versions were removed.
        13 index pages have been deleted, 9 are currently reusable.
        CPU 0.00s/0.00u sec elapsed 3.40 sec.
[9317 / 2007-11-26 21:46:34 CET]LOCATION:  lazy_vacuum_index, vacuumlazy.c:736
[9317 / 2007-11-26 21:46:39 CET]DEBUG:  00000: index "hst_timerecording_id_employee_idx" now contains 8640 row versions in 95 pages
[9317 / 2007-11-26 21:46:39 CET]DETAIL:  4469 index row versions were removed.
        10 index pages have been deleted, 6 are currently reusable.
        CPU 0.00s/0.00u sec elapsed 4.22 sec.
[9317 / 2007-11-26 21:46:39 CET]LOCATION:  lazy_vacuum_index, vacuumlazy.c:736
[9317 / 2007-11-26 21:46:43 CET]DEBUG:  00000: index "hst_timerecording_id_timerecording_idx" now contains 8640 row versions in 97 pages
[9317 / 2007-11-26 21:46:43 CET]DETAIL:  4469 index row versions were removed.
        11 index pages have been deleted, 5 are currently reusable.
        CPU 0.00s/0.00u sec elapsed 4.40 sec.
[9317 / 2007-11-26 21:46:43 CET]LOCATION:  lazy_vacuum_index, vacuumlazy.c:736

These are again the last log entries of the two stuck processes. So apparently all three indexes were vacuumed successfully?

I think these are the relevant pg_locks entries:

relation  75685778  75686189  9017862  25467  AccessShareLock           f
relation  75685778  75686189  9009323   9317  ShareUpdateExclusiveLock  t
relation  75685778  75686189  9009312   9293  AccessShareLock           t
relation  75685778  75686189  9009312   9293  RowExclusiveLock          t
relation  75685778  75686189  9009312   9293  AccessExclusiveLock       f
relation  75685778  75686189  9012978  28370  AccessShareLock           f

75686189 is the table hst_timerecording. To me it looks like autovacuum is not releasing the blocking ShareUpdateExclusiveLock?

I hope this information helps.

regards,
thomas
Thomas Chille wrote:
> I think these are the relevant pg_locks entries:
>
> relation  75685778  75686189  9017862  25467  AccessShareLock           f
> relation  75685778  75686189  9009323   9317  ShareUpdateExclusiveLock  t
> relation  75685778  75686189  9009312   9293  AccessShareLock           t
> relation  75685778  75686189  9009312   9293  RowExclusiveLock          t
> relation  75685778  75686189  9009312   9293  AccessExclusiveLock       f
> relation  75685778  75686189  9012978  28370  AccessShareLock           f
>
> 75686189 is the table hst_timerecording. To me it looks like autovacuum is
> not releasing the blocking ShareUpdateExclusiveLock?

What are the column headings?  I find this difficult to read.

Please post the whole of pg_locks.  I may be missing something, but I think
we're missing part of the picture here.  Autovacuum does not seem to be
blocked on anything.

-- 
Alvaro Herrera                  http://www.amazon.com/gp/registry/CTMLCN8V17R4
One man's impedance mismatch is another man's layer of abstraction.
(Lincoln Yeoh)
On Nov 27, 2007 3:14 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> What are the column headings?  I find this difficult to read.
>
> Please post the whole of pg_locks.  I may be missing something, but I think
> we're missing part of the picture here.  Autovacuum does not seem to be
> blocked on anything.

Unfortunately I did not log the pg_locks state in a well-formatted way. I
have now added the headings manually; maybe that is better, or I will have
to wait until tomorrow morning.

What I think I can see is that the process with pid 9317 is holding a
ShareUpdateExclusiveLock:

locktype | database | relation | transaction | pid  | mode                     | granted
relation | 75685778 | 75686189 | 9009323     | 9317 | ShareUpdateExclusiveLock | t

and the process with pid 9293 is waiting to get an AccessExclusiveLock on the
same relation:

relation | 75685778 | 75686189 | 9009312     | 9293 | AccessExclusiveLock      | f

... and 9317 is the autovacuum process? After killing it, everything is fine
again.

I attached the full pg_locks output.

regards,
thomas
Attachment
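As an aside, the numeric OIDs in those pg_locks rows can be resolved directly in SQL rather than from memory; a minimal sketch using the numbers quoted above (75686189 is confirmed later in the thread to be hst_timerecording):

    SELECT 75686189::regclass;                              -- relation name
    SELECT datname FROM pg_database WHERE oid = 75685778;   -- database name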
Thomas Chille wrote:
> On Nov 27, 2007 3:14 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> > What are the column headings?  I find this difficult to read.
> >
> > Please post the whole of pg_locks.  I may be missing something, but I think
> > we're missing part of the picture here.  Autovacuum does not seem to be
> > blocked on anything.
>
> Unfortunately I did not log the pg_locks state in a well-formatted way. I
> have now added the headings manually; maybe that is better, or I will have
> to wait until tomorrow morning.
>
> What I think I can see is that the process with pid 9317 is holding a
> ShareUpdateExclusiveLock

You missed that 9317 is also holding an ExclusiveLock.

What are the vacuum_cost_delay and autovacuum_vacuum_cost_delay settings?

I think what's happening here is that you're being bitten by the bug that
made autovacuum sleep because of vacuum_cost_delay while holding the
exclusive lock it takes to truncate the table.  The solution is to upgrade
to 8.1.10.

Also, I just noticed you're on 8.1.4.  This is a bad idea because of another
ancient autovacuum bug.  As soon as you upgrade, allow connections to
template0 (update pg_database, set datallowconn), connect to it and issue
VACUUM FREEZE.  Then disallow connections to it again.

-- 
Alvaro Herrera                                http://www.PlanetPostgreSQL.org/
"La experiencia nos dice que el hombre peló millones de veces las patatas,
pero era forzoso admitir la posibilidad de que en un caso entre millones,
las patatas pelarían al hombre" (Ijon Tichy)
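Spelled out as SQL, that post-upgrade template0 step would look roughly like this; a sketch of the procedure run by a superuser, not an exact transcript:

    -- allow connections to template0
    UPDATE pg_database SET datallowconn = true WHERE datname = 'template0';

    -- then, connected to template0 (e.g. psql template0):
    VACUUM FREEZE;

    -- finally, from another database, lock it down again
    UPDATE pg_database SET datallowconn = false WHERE datname = 'template0';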
Alvaro Herrera wrote:
> Thomas Chille wrote:
> > What I think I can see is that the process with pid 9317 is holding a
> > ShareUpdateExclusiveLock
>
> You missed that 9317 is also holding an ExclusiveLock.

Sorry, I misread it.  My analysis is bogus :-)  I suggest you upgrade anyway,
because there are nasty bugs in the version you are using.

I assume 77862554, 77862552 and 77862553 are indexes on the 75686189 table?

-- 
Alvaro Herrera                  http://www.amazon.com/gp/registry/CTMLCN8V17R4
"La experiencia nos dice que el hombre peló millones de veces las patatas,
pero era forzoso admitir la posibilidad de que en un caso entre millones,
las patatas pelarían al hombre" (Ijon Tichy)
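Whether those OIDs really belong to indexes on that table can be checked against pg_index; a small sketch using the OIDs quoted above:

    SELECT indexrelid::regclass AS index_name, indrelid::regclass AS table_name
    FROM pg_index
    WHERE indexrelid IN (77862554, 77862552, 77862553);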
Yes, you are right. These are the 3 involved indexes:

hst_timerecording_business_day_idx on hst_timerecording
hst_timerecording_id_employee_idx on hst_timerecording
hst_timerecording_id_timerecording_idx on hst_timerecording

regards,
thomas

On Nov 27, 2007 4:07 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> Alvaro Herrera wrote:
> > Thomas Chille wrote:
> > > What I think I can see is that the process with pid 9317 is holding a
> > > ShareUpdateExclusiveLock
> >
> > You missed that 9317 is also holding an ExclusiveLock.
>
> Sorry, I misread it.  My analysis is bogus :-)  I suggest you upgrade anyway,
> because there are nasty bugs in the version you are using.
>
> I assume 77862554, 77862552 and 77862553 are indexes on the 75686189 table?
"Thomas Chille" <thomas@chille.de> writes: > I think this are the relevant pg_locks entries: > relation 75685778 75686189 > 9017862 25467 AccessShareLock f > relation 75685778 75686189 > 9009323 9317 ShareUpdateExclusiveLock > t > relation 75685778 75686189 > 9009312 9293 AccessShareLock t > relation 75685778 75686189 > 9009312 9293 RowExclusiveLock t > relation 75685778 75686189 > 9009312 9293 AccessExclusiveLock f > relation 75685778 75686189 > 9012978 28370 AccessShareLock f I don't think the vacuum is the problem here. The problem is process 9293, which for some reason is trying to get AccessExclusiveLock, and is blocked behind autovac, and everything else is stacking up behind it. You didn't happen to note what 9293 was doing did you? It's living fairly dangerously in any case by trying to acquire exclusive lock when it already holds a bunch of other lower-level locks; that's a recipe for deadlock if I ever saw one. regards, tom lane
On Nov 27, 2007 4:52 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> You didn't happen to note what 9293 was doing, did you?  It's living fairly
> dangerously in any case by trying to acquire exclusive lock when it already
> holds a bunch of other lower-level locks; that's a recipe for deadlock if I
> ever saw one.

Ah OK, 9293 is a triggered process that tries to do "ALTER TABLE ... DISABLE
TRIGGER (other trigger)", so it implicitly tries to acquire an
AccessExclusiveLock and runs into a deadlock?

So it is better not to use "ALTER TABLE ..." in trigger functions, because
there will always be lower-level locks already held?

regards,
thomas
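For illustration, the risky pattern presumably looks something like the sketch below; the function and trigger names are invented, not taken from the actual schema:

    CREATE OR REPLACE FUNCTION example_compress_fn() RETURNS trigger AS $$
    BEGIN
        -- The ALTER needs an AccessExclusiveLock on hst_timerecording, while
        -- this trigger already holds the RowExclusiveLock taken by the UPDATE
        -- that fired it, hence the queueing behind autovacuum's lock.
        EXECUTE 'ALTER TABLE hst_timerecording DISABLE TRIGGER some_other_trigger';
        -- ... bulk updates ...
        EXECUTE 'ALTER TABLE hst_timerecording ENABLE TRIGGER some_other_trigger';
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;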
"Thomas Chille" <thomas@chille.de> writes: > Ah ok, 9293 is a triggerd process and tries to "ALTER TABLE ... > DISABLE TRIGGER (other trigger)" and so implicitly tries to acquire an > AccessExclusiveLock and runs in a deadlock? Well, you're certainly risking deadlock with that; and even if no actual deadlock happens, waiting for the exclusive lock needed to do the ALTER can cause the kind of lock queueing you had here. I'd suggest trying to think of a way to solve your problem that doesn't need trigger enabling/disabling. regards, tom lane