Re: BUG #6200: standby bad memory allocations on SELECT - Mailing list pgsql-bugs
From | Michael Brauwerman |
---|---|
Subject | Re: BUG #6200: standby bad memory allocations on SELECT |
Date | |
Msg-id | CAHDXJ6jes_Zv1OFo=EZn-HGOrjKoy2uLz3Sg4ShXhb0yMY_-5A@mail.gmail.com |
In response to | Re: BUG #6200: standby bad memory allocations on SELECT (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: BUG #6200: standby bad memory allocations on SELECT |
List | pgsql-bugs |
I work with Bridget at Redfin. We have a core dump from a hot standby segfault in pg 9.1.2 that happens roughly once every 5 days (multi-million queries). It might or might not be the same root issue as the "alloc" errors; if I should file a new bug report, let me know.

The postgres executable that crashed did not have debugging symbols installed, and we were unable to debug the core file (with gdb) using a debug build of postgres, because the symbols didn't match. Running gdb against a non-debug postgres executable gave us this stack trace:

    [root@query-7 core]# gdb -q -c /postgres/core/query-9.core.19678 /usr/pgsql-9.1/bin/postgres-non-debug
    Reading symbols from /usr/pgsql-9.1/bin/postgres-non-debug...(no debugging symbols found)...done.
    warning: core file may not match specified executable file.
    [New Thread 19678]
    warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7fffdcd58000
    Core was generated by `postgres: datamover stingray_prod 10.11.0.134(54140) SELEC'.
    Program terminated with signal 11, Segmentation fault.
    #0 0x000000000045694c in nocachegetattr ()
    (gdb) bt
    #0 0x000000000045694c in nocachegetattr ()
    #1 0x00000000006f93c9 in ?? ()
    #2 0x00000000006fa231 in tuplesort_puttupleslot ()
    #3 0x0000000000573ad1 in ExecSort ()
    #4 0x000000000055cdda in ExecProcNode ()
    #5 0x000000000055bcd1 in standard_ExecutorRun ()
    #6 0x0000000000623594 in ?? ()
    #7 0x0000000000624ae0 in PortalRun ()
    #8 0x00000000006220f2 in PostgresMain ()
    #9 0x00000000005e6ba4 in ?? ()
    #10 0x00000000005e791c in PostmasterMain ()
    #11 0x000000000058b9ae in main ()

We have the core file (5 GB) and are happy to do any further forensics anyone can suggest. Please advise.

I hope this helps point to a root cause and resolution.

Thank you,
Mike Brauwerman
Data Team, Redfin

On Fri, Jan 27, 2012 at 10:53 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jan 27, 2012 at 1:31 PM, Bridget Frey <bridget.frey@redfin.com> wrote:
> > Thanks for the info - that's very helpful.
> > We had also noted that the alloc seems to be -3 bytes. We have run
> > pg_check and it found no instances of corruption. We've also replayed
> > queries that have failed, and have never been able to get the same
> > query to fail twice. In the case you investigated, was there permanent
> > page corruption - e.g. you could run the same query over and over and
> > get the same result?
>
> Yes. I observed that the infomask bits on several tuples had somehow
> been overwritten by nonsense. I am not sure whether there were other
> kinds of corruption as well - I suspect probably so - but that's the
> only one I saw with my own eyes, courtesy of pg_filedump.
>
> > It really does seem like this is an issue either in Hot Standby or
> > very closely related to that feature, where there is temporary
> > corruption of a btree index that then disappears. Our master is not
> > experiencing any malloc issues, while the 3 slaves get about a dozen
> > per day, despite similar workloads. We haven't had a slave segfault
> > since we set it up to produce a core dump, but we're expecting to have
> > that within the next few days (assuming we'll continue to get a
> > segfault every 3-4 days). We're also planning to set up one slave that
> > will panic when it gets a malloc issue, as you (and other posters on
> > 6400) had suggested.
> >
> > Thanks again for the help, and we'll keep you posted as we learn
> > more...
>
> The case I investigated involved corruption on the master, and I think
> it predated Hot Standby. However, the symptom is generic enough that
> it seems quite possible that there's more than one way for it to
> happen. :-(
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

--
Mike Brauwerman
Data Team, Redfin