Re: BUG #6200: standby bad memory allocations on SELECT - Mailing list pgsql-bugs
From | Robert Haas |
---|---|
Subject | Re: BUG #6200: standby bad memory allocations on SELECT |
Date | |
Msg-id | CA+TgmoZ6S5e46xThsHKv6-vV58f==D4_TH_ECB2sQsZRngL+8Q@mail.gmail.com |
In response to | Re: BUG #6200: standby bad memory allocations on SELECT (Bridget Frey <bridget.frey@redfin.com>) |
Responses |
Re: BUG #6200: standby bad memory allocations on SELECT
|
List | pgsql-bugs |
On Mon, Jan 23, 2012 at 3:22 PM, Bridget Frey <bridget.frey@redfin.com> wrote:
> Hello,
> We upgraded to postgres 9.1.2 two weeks ago, and we are also experiencing an
> issue that seems very similar to the one reported as bug 6200. We see
> approximately 2 dozen alloc errors per day across 3 slaves, and we are
> getting one segfault approximately every 3 days. We did not experience this
> issue before our upgrade (we were on version 8.4, and used skytools for
> replication).
>
> We are attempting to get a core dump on segfault (our last attempt did not
> work due to a config issue for the core dump). We're also attempting to
> repro the alloc errors on a test setup, but it seems like we may need quite
> a bit of load to trigger the issue. We're not certain that the alloc issues
> and the segfaults are "the same issue" - but it seems that it may be since
> the OP for bug 6200 sees the same behavior. We have seen no issues on the
> master, all alloc errors and segfaults have been on the slaves.
>
> We've seen the alloc errors on a few different tables, but most frequently
> on logins. Rows are added to the logins table one-by-one, and updates
> generally happen one row at a time. The table is pretty basic, it looks
> like this...
>
> CREATE TABLE logins
> (
>   login_id bigserial NOT NULL,
>   <snip - a bunch of columns>
>   CONSTRAINT logins_pkey PRIMARY KEY (login_id ),
>   <snip - some other constraints...>
> )
> WITH (
>   FILLFACTOR=80,
>   OIDS=FALSE
> );
>
> The queries that trigger the alloc error on this table look like this (we
> use hibernate hence the funny underscoring...)
> select login0_.login_id as login1_468_0_, l...  from logins login0_ where
> login0_.login_id=$1
>
> The alloc error in the logs looks like this:
> -01-12_080925.log:2012-01-12 17:33:46 PST [16034]: [7-1] [24/25934] ERROR:
> invalid memory alloc request size 18446744073709551613
>
> The alloc error is nearly always for size 18446744073709551613 - though we
> have seen one time where it was a different amount...

Hmm, that number in hex works out to 0xfffffffffffffffd, which makes
it sound an awful lot like the system (for some unknown reason)
attempted to allocate -3 bytes of memory. I've seen something like
this once before on a customer system running a modified version of
PostgreSQL. In that case, the problem turned out to be page
corruption. Circumstances didn't permit determination of the root
cause of the page corruption, however, nor was I able to figure out
exactly how the corruption I saw resulted in an allocation request
like this.

It would be nice to figure out where in the code this is happening and
put in a higher-level guard so that we get a better error message.

You may want to compile a modified PostgreSQL executable that puts an
extremely long sleep (like a year) just before this error is reported.
Then, when the system hangs at that point, you can attach a debugger
and pull a stack backtrace. Or you could insert an abort() at that
point in the code and get a backtrace from the core dump.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
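The hex reading of the reported size, and the abort-for-a-backtrace idea, can be sketched in a few lines of standalone C. This is only an illustration, not PostgreSQL source: MAX_ALLOC_SIZE is assumed to mirror the server's cap of 1 GB - 1 on ordinary allocations, and as_alloc_size, alloc_size_is_valid, format_size_hex and check_alloc_request are hypothetical names made up for this example.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: mirrors the server's cap of 1 GB - 1 on ordinary allocations. */
#define MAX_ALLOC_SIZE ((uint64_t) 0x3fffffff)

/*
 * Reinterpret a signed length as the unsigned 64-bit size an allocator sees.
 * Signed-to-unsigned conversion is well defined in C: it wraps modulo 2^64.
 */
uint64_t
as_alloc_size(int64_t signed_len)
{
    return (uint64_t) signed_len;
}

/* Returns 1 if a request of this size is within the assumed cap, else 0. */
int
alloc_size_is_valid(uint64_t size)
{
    return size <= MAX_ALLOC_SIZE;
}

/* Format a size as 16 zero-padded hex digits with a 0x prefix. */
void
format_size_hex(uint64_t size, char *buf, size_t buflen)
{
    snprintf(buf, buflen, "0x%016" PRIx64, size);
}

/*
 * Hypothetical stand-in for the check that raises the error. With
 * abort_on_invalid set, it calls abort() right where the error would be
 * reported, so the process dumps core and a backtrace can be pulled from it.
 * Returns 1 for an acceptable size, 0 for a rejected one.
 */
int
check_alloc_request(uint64_t size, int abort_on_invalid)
{
    if (alloc_size_is_valid(size))
        return 1;

    fprintf(stderr, "invalid memory alloc request size %" PRIu64 "\n", size);
    if (abort_on_invalid)
        abort();
    return 0;
}
```

Calling as_alloc_size(-3) yields 18446744073709551613, the size from the log, which is far above the assumed cap and is therefore rejected. In a real server build the abort() (or a long sleep) would go at the point where the error is actually raised, which this sketch does not attempt to locate.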