Home > mailing lists

Re: BUG #11264: Auto vacuum wraparound job blocking everything - Mailing list pgsql-bugs

From	Alvaro Herrera
Subject	Re: BUG #11264: Auto vacuum wraparound job blocking everything
Date	August 27, 2014 15:26:47
Msg-id	20140827152636.GC7046@eldon.alvh.no-ip.org Whole thread Raw
In response to	BUG #11264: Auto vacuum wraparound job blocking everything (dbhandary@switchfly.com)
Responses	Re: BUG #11264: Auto vacuum wraparound job blocking everything Re: BUG #11264: Auto vacuum wraparound job blocking everything
List	pgsql-bugs

Tree view

dbhandary@switchfly.com wrote:

> We recently upgraded to postgressql 9.3.5  from postgres 9.1 using
> pg_upgrade. We are running streaming replication. DB was functioning without
> any issues till today where we discovered an auto vacuum wraparound job on
> one of our tables which was blocking everything. There were no waiting jobs
> when we queried pg_stat_activity, but incoming request would just hang, so
> database was unusable.

Can you please paste

select oid::regclass, relfrozenxid, relminmxid from pg_class where relname = 'room_contract_service_code';
select datname, datfrozenxid, datminmxid from pg_database where datname = 'jetblue';

> (gdb) bt
> #0  0x00007fb9740eb5b3 in __select_nocancel () from /lib64/libc.so.6
> #1  0x000000000075ad7a in pg_usleep ()
> #2  0x00000000004ae2d4 in GetMultiXactIdMembers ()
> #3  0x000000000047f6cc in heap_prepare_freeze_tuple ()
> #4  0x000000000057d036 in lazy_vacuum_rel ()

This stall in GetMultiXactIdMembers should only occur when somebody is
writing the multixact immediately following.  It certainly should not
sleep for long.  I'm not sure what's happening here but this is probably
where the problem is.  While waiting, it holds the buffer content lock
in exclusive mode.

Can you install debug symbols and pageinspect?  I'm curious why this is
stalling.  If the tuple has a FOR SHARE or FOR UPDATE marking from
before the upgrade, it shouldn't be inquiring the multixact members at
all.  I'm curious about

Latest checkpoint's NextMultiXactId:  23431
                                                                                           
Latest checkpoint's NextMultiOffset:  5678
                                                                                           
Latest checkpoint's oldestMultiXid:   1
                                                                                           
Latest checkpoint's oldestMulti's DB: 16423
                                                                                           

It says the oldest multi is 1, so the database should not have any
values between 1 and 23431 that correspond to pg_upgrade'd multixact
values ... so what is going on here?  Unless the recent mucking with
pg_upgrade to handle multixact's got something wrong.


> 10.33.11.11 |                 |       40735 | 2014-08-25 10:38:02.944083+00
> | 2014-08-25 10:38:02.964562+00 | 2014-08-25 10:38:22.805859+00 | 2014-08-25
> 10:38:22.805861+00 | f       | active | COPY
> settings.room_contract_service_code (room_contract_service_code,
> room_contract_service_description
> , room_contract_service_code_id, supplierid, inactive) TO stdout;
>
>  (gdb) bt
> #0  0x00007fb9740f5187 in semop () from /lib64/libc.so.6
> #1  0x0000000000609397 in PGSemaphoreLock ()
> #2  0x000000000064e871 in LWLockAcquire ()
> #3  0x000000000047e1a9 in ?? ()
> #4  0x000000000047e66a in ?? ()
> #5  0x000000000047f0a6 in heap_getnext ()
> #6  0x00000000005343e6 in ?? ()
> #7  0x000000000053521a in DoCopy ()
> #8  0x000000000066112d in standard_ProcessUtility ()
> #9  0x00007fb96c4df261 in ?? () from /usr/pgsql-9.3/lib/pg_stat_statements.so

This one is stalling on the buffer content lock trying to read the
table to dump it.

> SELECTS:
>
> query            |
>                  |         SELECT room_contract_service_code,
> room_contract_service_description, room_contract_service_code_id
>                  |         FROM room_contract_service_code
>                  |         WHERE room_contract_service_code.supplierid = $1
>                  |                 AND NOT inactive
>                  |         ORDER by room_contract_service_description
> (gdb) bt
> #0  0x00007fb9740f5187 in semop () from /lib64/libc.so.6
> #1  0x0000000000609397 in PGSemaphoreLock ()
> #2  0x000000000064e871 in LWLockAcquire ()
> #3  0x000000000048d05d in index_fetch_heap ()
> #4  0x000000000048d22e in index_getnext ()

And this one is stalling in the same buffer lock most likely.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

pgsql-bugs by date:

From: Matheus de Oliveira
Date: 27 August 2014, 11:30:18
Subject: Re: BUG #11271: Out of memory while reading tuples.

From: Alvaro Herrera
Date: 27 August 2014, 15:39:33
Subject: Re: BUG #11264: Auto vacuum wraparound job blocking everything

Re: BUG #11264: Auto vacuum wraparound job blocking everything - Mailing list pgsql-bugs

Previous

Next