Re: BUG #11264: Auto vacuum wraparound job blocking everything - Mailing list pgsql-bugs
From | Alvaro Herrera |
---|---|
Subject | Re: BUG #11264: Auto vacuum wraparound job blocking everything |
Date | |
Msg-id | 20140827152636.GC7046@eldon.alvh.no-ip.org Whole thread Raw |
In response to | BUG #11264: Auto vacuum wraparound job blocking everything (dbhandary@switchfly.com) |
Responses |
Re: BUG #11264: Auto vacuum wraparound job blocking everything
Re: BUG #11264: Auto vacuum wraparound job blocking everything |
List | pgsql-bugs |
dbhandary@switchfly.com wrote: > We recently upgraded to postgressql 9.3.5 from postgres 9.1 using > pg_upgrade. We are running streaming replication. DB was functioning without > any issues till today where we discovered an auto vacuum wraparound job on > one of our tables which was blocking everything. There were no waiting jobs > when we queried pg_stat_activity, but incoming request would just hang, so > database was unusable. Can you please paste select oid::regclass, relfrozenxid, relminmxid from pg_class where relname = 'room_contract_service_code'; select datname, datfrozenxid, datminmxid from pg_database where datname = 'jetblue'; > (gdb) bt > #0 0x00007fb9740eb5b3 in __select_nocancel () from /lib64/libc.so.6 > #1 0x000000000075ad7a in pg_usleep () > #2 0x00000000004ae2d4 in GetMultiXactIdMembers () > #3 0x000000000047f6cc in heap_prepare_freeze_tuple () > #4 0x000000000057d036 in lazy_vacuum_rel () This stall in GetMultiXactIdMembers should only occur when somebody is writing the multixact immediately following. It certainly should not sleep for long. I'm not sure what's happening here but this is probably where the problem is. While waiting, it holds the buffer content lock in exclusive mode. Can you install debug symbols and pageinspect? I'm curious why this is stalling. If the tuple has a FOR SHARE or FOR UPDATE marking from before the upgrade, it shouldn't be inquiring the multixact members at all. I'm curious about Latest checkpoint's NextMultiXactId: 23431 Latest checkpoint's NextMultiOffset: 5678 Latest checkpoint's oldestMultiXid: 1 Latest checkpoint's oldestMulti's DB: 16423 It says the oldest multi is 1, so the database should not have any values between 1 and 23431 that correspond to pg_upgrade'd multixact values ... so what is going on here? Unless the recent mucking with pg_upgrade to handle multixact's got something wrong. > 10.33.11.11 | | 40735 | 2014-08-25 10:38:02.944083+00 > | 2014-08-25 10:38:02.964562+00 | 2014-08-25 10:38:22.805859+00 | 2014-08-25 > 10:38:22.805861+00 | f | active | COPY > settings.room_contract_service_code (room_contract_service_code, > room_contract_service_description > , room_contract_service_code_id, supplierid, inactive) TO stdout; > > (gdb) bt > #0 0x00007fb9740f5187 in semop () from /lib64/libc.so.6 > #1 0x0000000000609397 in PGSemaphoreLock () > #2 0x000000000064e871 in LWLockAcquire () > #3 0x000000000047e1a9 in ?? () > #4 0x000000000047e66a in ?? () > #5 0x000000000047f0a6 in heap_getnext () > #6 0x00000000005343e6 in ?? () > #7 0x000000000053521a in DoCopy () > #8 0x000000000066112d in standard_ProcessUtility () > #9 0x00007fb96c4df261 in ?? () from /usr/pgsql-9.3/lib/pg_stat_statements.so This one is stalling on the buffer content lock trying to read the table to dump it. > SELECTS: > > query | > | SELECT room_contract_service_code, > room_contract_service_description, room_contract_service_code_id > | FROM room_contract_service_code > | WHERE room_contract_service_code.supplierid = $1 > | AND NOT inactive > | ORDER by room_contract_service_description > (gdb) bt > #0 0x00007fb9740f5187 in semop () from /lib64/libc.so.6 > #1 0x0000000000609397 in PGSemaphoreLock () > #2 0x000000000064e871 in LWLockAcquire () > #3 0x000000000048d05d in index_fetch_heap () > #4 0x000000000048d22e in index_getnext () And this one is stalling in the same buffer lock most likely. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
pgsql-bugs by date: