Storing pg_stat_statements query texts externally, pg_stat_statements in core - Mailing list pgsql-hackers

From:      Peter Geoghegan
Subject:   Storing pg_stat_statements query texts externally, pg_stat_statements in core
Date:
Msg-id:    CAM3SWZRYYnfwXi3r3eDAKWBOYtaf1_PwGJxTAyddNSbjAfDzjA@mail.gmail.com
Responses: Re: Storing pg_stat_statements query texts externally, pg_stat_statements in core
List:      pgsql-hackers
Attached patch allows pg_stat_statements to store arbitrarily long query texts, using an external, transparently managed lookaside file. This is of great practical benefit to certain types of users who need to understand the execution costs of queries with excessively long query texts, often because the query was generated by an ORM. This is a common complaint of Heroku customers, though I'm sure they're not alone there.

As I mentioned on a previous occasion, since query fingerprinting was added, the query text is nothing more than a way of displaying entries to interested users. As such, it seems perfectly reasonable to store those texts externally, out of shared memory - the pg_stat_statements() function itself (which the view is defined as a call to) isn't particularly performance sensitive. Creating a brand new pg_stat_statements entry is already costly, since it necessitates an exclusive lock that blocks everything, so the additional cost of storing the query text at that point is assumed to be of marginal concern. Better to have more entries available, so that eviction doesn't need to happen very frequently - using far less memory per entry helps the user to increase pg_stat_statements.max, making excessive exclusive locking less likely in the first place. Having said all that, the code bends over backwards to make sure that physical I/O is as unlikely as possible while the exclusive lock is held. There are numerous tricks employed where cache pressure is a concern, which I won't go into, since they're all well commented in the patch. Maybe we should have a warning when eviction occurs too frequently?

I also think that we might want to take another look at making the normalization logic less strict about differentiating queries that a reasonable person would consider equivalent. This is obviously rather fuzzy, but I've had complaints about things like pg_stat_statements being overly fussy in seeing row and array comparisons as distinct from each other. That is most probably a discussion for another thread, but it is informed by one of the same concerns - making expensive cache pressure less likely.

This incidentally furthers the case for pg_stat_statements in core (making it similar to pg_stat_user_functions - not turned on by default, but easily available, even on busy production systems that cannot afford a restart and didn't know they needed it until performance tanked 5 minutes ago). The amount of shared memory now used (and therefore arguably wasted by having a little bit of shared memory set aside just in case pg_stat_statements becomes useful) is greatly reduced. That's really a separate discussion, though, which is why I haven't done the additional work of integrating pg_stat_statements into core here. Doing a restart to enable pg_stat_statements is, in my experience, only acceptable infrequently - restarts are generally to be avoided at all costs. I can see a day when the query hash is actually a general query cache identifier, at which time the query text will also be stored outside of the pgss hash table proper.

Refactoring
===========

I've decided to remove the encoding id from the pgss entry key (it's now just in the main entry struct) - I don't think it makes sense there anymore. It is clearly no longer true that it's a "notational convenience" (and hasn't been since 9.1). Like query texts themselves, it is something that exists for the sole purpose of displaying statistics to users, and as such cannot possibly result in incorrectly failing to differentiate, or spuriously differentiating, two entries. Now, I suppose that since we sometimes hash query texts directly (e.g. utility statements), the encoding might matter if it could vary. But since the encoding invariably comes from the database encoding anyway, and we always differentiate on databaseid to begin with, I fail to see how that could possibly matter.

I've bumped PGSS_FILE_HEADER, just in case a pg_stat_statements temp file survives a pg_upgrade. I think that some minor tweaks made by Noah to pg_stat_statements (as part of commit b560ec1b) ought to have necessitated doing so anyway, but no matter.
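To make the shape of things a little more concrete, here is a rough sketch of the two ideas together - query texts referenced in an external file rather than stored inline, and the encoding living in the entry proper rather than the key. These are illustrative definitions of mine, not the structs from the patch, and the names are made up:

    #include "postgres.h"

    /*
     * Illustrative sketch only.  The hash key identifies an entry by user,
     * database and query fingerprint alone; the query text itself no longer
     * lives in shared memory.  Each entry just remembers where its text can
     * be found in the external, transparently managed lookaside file, plus
     * the encoding needed to display it correctly.
     */
    typedef struct SketchHashKey
    {
        Oid         userid;         /* user OID */
        Oid         dbid;           /* database OID */
        uint32      queryid;        /* query fingerprint */
        /* note: no encoding here - it cannot affect an entry's identity */
    } SketchHashKey;

    typedef struct SketchEntry
    {
        SketchHashKey key;          /* hash key of entry - MUST BE FIRST */
        Size        query_offset;   /* offset of text in lookaside file */
        int         query_len;      /* length of query text, in bytes */
        int         encoding;       /* encoding of text, for display only */
        /* ... counters, spinlock, and so on, as before ... */
    } SketchEntry;

The point being that only a fixed-size reference to the text has to live in shared memory; the text itself can be arbitrarily long.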
The role of time-series aggregation
===================================

Over on the min/max pg_stat_statements storage thread, I've stated that I think the way forward is for third party tools to aggregate deltas fairly aggressively - maybe even as aggressively as every 3 seconds or so. So I'm slightly concerned that I may be hampering a future tool that needs to aggregate statistics very aggressively. For that reason, we should probably provide an alternative function/view that identifies pg_stat_statements entries by hashid + databaseid + userid only, so that any overhead from reading big query strings from disk with a shared lock held is eliminated.
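For illustration only, the lightweight variant might be exposed along these lines - pg_stat_statements_light() is a made-up name standing in for a new C-language function, and the exact column list is not something the patch defines:

    -- Hypothetical extension-script sketch: a function that returns each
    -- entry's identifying columns and counters without ever reading the
    -- external query text file.
    CREATE FUNCTION pg_stat_statements_light(
        OUT userid oid,
        OUT dbid oid,
        OUT queryid bigint,
        OUT calls bigint,
        OUT total_time double precision,
        OUT rows bigint
    )
    RETURNS SETOF record
    AS 'MODULE_PATHNAME'
    LANGUAGE C;

    CREATE VIEW pg_stat_statements_light AS
      SELECT * FROM pg_stat_statements_light();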
Thoughts?

--
Peter Geoghegan