[BUGS] BUG #14901: Canceled queries missing from pg_stat_statements - Mailing list pgsql-bugs
From | gregburek@heroku.com |
---|---|
Subject | [BUGS] BUG #14901: Canceled queries missing from pg_stat_statements |
Date | |
Msg-id | 20171112222811.1464.28388@wrigleys.postgresql.org Whole thread Raw |
Responses |
Re: [BUGS] BUG #14901: Canceled queries missing frompg_stat_statements
|
List | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 14901 Logged by: Greg Burek Email address: gregburek@heroku.com PostgreSQL version: 9.5.10 Operating system: linux Description: Hello, I recently worked with an app that hit a mysterious wall in Postgres query performance, where overall query latency had increased dramatically, with sporadic statement_timeout producing app errors. Examining pg_stat_statements showed some queries that were consuming resources, which were then optimized, but the overall query latency and app error rate stayed constant. Only after examining logs did we find a class of query that was being consistently canceled by statement_timeout=30s. This query was found mostly by chance as it was a fraction of the total statement_timeout rate, as some sessions have much more aggressive statement_timeout=2s. Running explain analyze on this 30s timeout query, we found it to be extremely resource heavy and ultimately causing the overall degredation of response time. This was a surprise to the app's developers that a query, which had a large execution time but was being canceled by a relatively sane statement_timeout, was missing from pg_stat_statements. That one would need to access, consume and analyze postgres logs was unexpected, given Postgres documents (https://www.postgresql.org/docs/current/static/pgstatstatements.html) that "The pg_stat_statements module provides a means for tracking execution statistics of *all* SQL statements executed by a server." That canceled queries are missing from this view seems like an easy to miss, but vital exception. I think this is either a bug in how pgss does not account for canceled queries that still exert load on a database or in the documentation of pgss that it only accounts for completed queries and that statement_timeout or app timeouts can result in query undercounting or omissions. Below is a reproduction of this issue, where a pathologically bad query, which is responsible for the bulk of the load executed on the db, does not increment or add execution statistics to pg_stat_statements. I would expect that the final `select * from test order by random() limit ?;` entry of pg_stat_statements would have a count equal 7 and total_time around 142623.038. =# create extension "uuid-ossp"; CREATE EXTENSION Time: 10.053 ms =# create extension "pg_stat_statements"; CREATE EXTENSION Time: 23.607 ms =# create table test as ( SELECT id as id, uuid_generate_v4() as uuid, uuid_generate_v1() as name FROM generate_series(1,(10^8)::int) as id); SELECT 100000000 Time: 276467.422 ms (04:36.467) =# select pg_stat_statements_reset();pg_stat_statements_reset -------------------------- (1 row) Time: 0.867 ms =# select query, calls from pg_stat_statements; query | calls ------------------------------------+-------select pg_stat_statements_reset(); | 1 (1 row) Time: 1.221 ms =# select * from test order by random() limit 1; id | uuid | name ----------+--------------------------------------+--------------------------------------79600490 | 46844417-f3ee-4add-b5c5-1ba3e6db5da5| e4c118b2-c5a3-11e7-be4b-8c8590305665 (1 row) Time: 22627.277 ms (00:22.627) =# select query, calls from pg_stat_statements; query | calls -----------------------------------------------+-------select query, calls from pg_stat_statements; | 1select pg_stat_statements_reset(); | 1select * from test order by random() limit ?; | 1 (3 rows) Time: 1.439 ms =# SET statement_timeout='20s'; SET Time: 8.936 ms =# select * from test order by random() limit 1; ERROR: 57014: canceling statement due to statement timeout LOCATION: ProcessInterrupts, postgres.c:2971 Time: 20073.747 ms (00:20.074) =# select * from test order by random() limit 1; ERROR: 57014: canceling statement due to statement timeout LOCATION: ProcessInterrupts, postgres.c:2971 Time: 20000.455 ms (00:20.000) =# select * from test order by random() limit 1; ERROR: 57014: canceling statement due to statement timeout LOCATION: ProcessInterrupts, postgres.c:2971 Time: 20065.561 ms (00:20.066) =# select * from test order by random() limit 1; ERROR: 57014: canceling statement due to statement timeout LOCATION: ProcessInterrupts, postgres.c:2971 Time: 20019.753 ms (00:20.020) =# select * from test order by random() limit 1; ERROR: 57014: canceling statement due to statement timeout LOCATION: ProcessInterrupts, postgres.c:2971 Time: 20056.721 ms (00:20.057) =# select * from test order by random() limit 1; ERROR: 57014: canceling statement due to statement timeout LOCATION: ProcessInterrupts, postgres.c:2971 Time: 20001.994 ms (00:20.002) =# select query, calls, total_time from pg_stat_statements; query | calls | total_time -----------------------------------------------+-------+------------select query, calls from pg_stat_statements; | 2| 2.063SET statement_timeout='20s'; | 1 | 0.857select pg_stat_statements_reset(); | 1 | 2.753select * from test order by random() limit ?; | 1 | 22623.038 (4 rows) Time: 2.536 ms -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
pgsql-bugs by date: