Re: BUG #13490: Segmentation fault on pg_stat_activity - Mailing list pgsql-bugs
From | Michael Paquier |
---|---|
Subject | Re: BUG #13490: Segmentation fault on pg_stat_activity |
Date | |
Msg-id | CAB7nPqRurz+i5pUc=AFz7W-QJ-9x66TO8qRNCn5LtoR5voS5vQ@mail.gmail.com |
In response to | Re: BUG #13490: Segmentation fault on pg_stat_activity (Michael Bommarito <michael@bommaritollc.com>) |
Responses | Re: BUG #13490: Segmentation fault on pg_stat_activity |
List | pgsql-bugs |
On Mon, Jul 13, 2015 at 4:16 AM, Michael Bommarito <michael@bommaritollc.com> wrote:
> This particular instance is from pghero, which is a monitoring tool. It
> can be reproduced simply by querying stat_activity in psql as well. Pghero
> is using prepared statements via ruby from a quick skim on their github
> repo.
>
> We have pg_stat_statements enabled, and can reproduce without pghero setup
> as well. No other extensions loaded.
>
> On Jul 12, 2015 2:37 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>>
>> Michael Bommarito <michael@bommaritollc.com> writes:
>> > Here's the session with debug_query_string:
>> > (gdb) printf "%s\n", debug_query_string
>> > SELECT application_name AS source, client_addr AS ip, COUNT(*) AS
>> > total_connections FROM pg_stat_activity WHERE pid <> pg_backend_pid()
>> > GROUP
>> > BY application_name, ip ORDER BY COUNT(*) DESC, application_name ASC,
>> > client_addr ASC
>>
>> Thanks. This still doesn't match the stack trace: in particular, this
>> stack frame
>>
>> #3 0x00007fd0d478152c in expression_tree_mutator (node=0x7fd0d5d9e908,
>> mutator=0x7fd0d481c390 <replace_rte_variables_mutator>,
>> context=0x7fff52170620) at
>>
>> /tmp/buildd/postgresql-9.5-9.5~alpha1/build/../src/backend/nodes/nodeFuncs.c:2769
>>
>> indicates that we found a PlaceHolderInfo node in the expression tree that
>> pullup_replace_vars() was applied to, but so far as I can see no such node
>> should exist in the query tree generated by this query. The most likely
>> theory seems to be that something clobbered the query tree while it was
>> sitting in the plancache, causing this recursive function to follow a
>> bogus pointer. But that doesn't leave us with a lot to go on.
>>
>> What can you tell us about the environment this is happening in?
>> How is the client-side code executing the failing queries? (We know
>> it's using extended query protocol, but is it preparing a statement
>> and then executing it repeatedly, or just using a one-shot unnamed
>> prepared statement?) What nondefault settings are in use on the
>> server side? Do you have any extensions loaded, such as
>> pg_stat_statements or auto_explain?

FWIW, I have been fooling around with the query reported in the back
trace upthread by playing a bit with the extended query protocol to
send BIND messages with PQdescribePrepared and PQsendDescribePrepared,
as well as with psql. While I am able to reproduce stack traces close
to what you had, I am not seeing any crashes. I have also played a bit
with pghero while pgbench was running in parallel, and there were no
problems, with and without pg_stat_statements loaded.

In the backtrace you sent previously
(http://www.postgresql.org/message-id/CAN=rtBipwKdHCtmXH3r4GNfUhF9e4ZfJbqcj7s_Ec9e2Mbf_LA@mail.gmail.com),
what is the value of MyProcPid? Is it 12803 or 20696? If it is the
former, do you have a backtrace for process 20696?

What we may be looking at now is actually a side effect of the real
problem, and as long as we do not have a real test case, I am afraid
that finding the root problem is rather difficult.
--
Michael
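
For readers following the thread, the distinction asked about above (a named statement prepared once and executed repeatedly, versus a one-shot unnamed statement) can be sketched at the wire level. The following is only an illustration under stated assumptions: it builds the raw Parse, Bind, Execute and Sync frames of the extended query protocol as byte strings without connecting to any server. All helper names, the statement name `s1`, and the sample query are hypothetical; none of this is taken from pghero or from the reporter's setup.

```python
import struct


def _cstr(s: str) -> bytes:
    """Encode a protocol string: UTF-8 bytes followed by a NUL terminator."""
    return s.encode("utf-8") + b"\x00"


def _msg(tag: bytes, payload: bytes) -> bytes:
    """Frame a message: 1-byte tag, then an Int32 length that counts
    itself plus the payload (but not the tag), then the payload."""
    return tag + struct.pack("!I", len(payload) + 4) + payload


def parse_msg(stmt_name: str, query: str) -> bytes:
    """Parse ('P'): statement name, query text, zero declared parameter types."""
    return _msg(b"P", _cstr(stmt_name) + _cstr(query) + struct.pack("!H", 0))


def bind_msg(portal: str, stmt_name: str) -> bytes:
    """Bind ('B') with no parameter format codes, no parameter values
    and no result format codes (three zero Int16 counts)."""
    return _msg(b"B", _cstr(portal) + _cstr(stmt_name) + struct.pack("!HHH", 0, 0, 0))


def execute_msg(portal: str) -> bytes:
    """Execute ('E') on a portal; a row limit of 0 means no limit."""
    return _msg(b"E", _cstr(portal) + struct.pack("!I", 0))


def sync_msg() -> bytes:
    """Sync ('S'): ends the current extended-query sequence."""
    return _msg(b"S", b"")


# Hypothetical sample query, not the one from the bug report.
QUERY = "SELECT count(*) FROM pg_stat_activity"


def one_shot(query: str) -> bytes:
    """One-shot style: the unnamed statement ("") is parsed, bound and
    executed in a single round, and is replaced by the next unnamed Parse."""
    return parse_msg("", query) + bind_msg("", "") + execute_msg("") + sync_msg()


def prepare_once(name: str, query: str) -> bytes:
    """Named style, step 1: parse once under a name that persists in the session."""
    return parse_msg(name, query) + sync_msg()


def execute_prepared(name: str) -> bytes:
    """Named style, step 2: each later execution sends only Bind/Execute/Sync."""
    return bind_msg("", name) + execute_msg("") + sync_msg()


if __name__ == "__main__":
    print(one_shot(QUERY).hex())
    print(prepare_once("s1", QUERY).hex())
    print(execute_prepared("s1").hex())
```

In the named case the Parse step happens once and later executions refer to the statement by name, so the server keeps the parsed statement around between executions. In the one-shot case every execution carries its own Parse.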