Re: Planning counters in pg_stat_statements (using pgss_store) - Mailing list pgsql-hackers
From | Julien Rouhaud |
---|---|
Subject | Re: Planning counters in pg_stat_statements (using pgss_store) |
Date | |
Msg-id | CAOBaU_YekXNsQ819w=eBTM_aQ+BYPcVWLX0sDO-5xRNhqipbRA@mail.gmail.com Whole thread Raw |
In response to | Re: Planning counters in pg_stat_statements (using pgss_store) (Marco Slot <marco.slot@gmail.com>) |
Responses |
RE: Planning counters in pg_stat_statements (using pgss_store)
|
List | pgsql-hackers |
On Thu, Mar 12, 2020 at 1:11 PM Marco Slot <marco.slot@gmail.com> wrote: > > On Thu, Mar 12, 2020 at 11:31 AM Julien Rouhaud <rjuju123@gmail.com> wrote: > > There's at least the current version of IVM patchset that lacks the > > querytext. Looking at various extensions, I see that pg_background > > and pglogical call pg_plan_query internally but shouldn't have any > > issue providing the query text. But there's also citus extension, > > which don't keep around the query string at least when distributing > > plans, which makes sense since it's of no use and they're heavily > > modifying the original Query. I think that citus folks opinion on the > > subject would be useful, so I'm Cc-ing Marco. > > Most of the time we keep our Query * data structures in a form that > can be deparsed back into a query string by a modified copy of > ruleutils.c, so we could generate a correct query string if absolutely > necessary, though there are performance-sensitive cases where we'd > rather not have the deparsing overhead. Yes, deparsing is probably too expensive for that use case. > In case of INSERT..SELECT into a distributed table, we might call > pg_plan_query on the SELECT part (Query *) and send the output into a > DestReceiver that sends tuples into shards of the distributed table > via COPY. The fact that SELECT does not show up in pg_stat_statements > separately is generally fine because it's an implementation detail, > and it would probably be a little confusing because the user never ran > the SELECT query. Moreover, the call to pg_plan_query would already be > reflected in the planning or execution time of the top-level query, so > it would be double counted if it had its own entry. Well, surprising statements can already appears in pg_stat_statements when you use some psql features, or if you have triggers as those will run additional queries under the hood. The difference here is that since citus is a CustomNode, underlying calls to planner will be accounted for that node, and that's indeed annoying. I can see that citus is doing some calls to spi_exec or Executor* (in ExecuteLocalTaskPlan), which could also trigger pg_stat_statements, but I don't know if a queryid is present there. > Another case is when some of the shards turn out to be local to the > server that handles the distributed query. In that case we plan the > queries on those shards via pg_plan_query instead of deparsing and > sending the query string to a remote server. It would be less > confusing for these queries to show in pg_stat_statements, because the > queries on the shards on remote servers will show up as well. However, > this is a performance-sensitive code path where we'd rather avoid > deparsing. Agreed. > In general, I'd prefer if there was no requirement to pass a correct > query string. I'm ok with passing "SELECT 'citus_internal'" or just "" > if that does not lead to issues. Passing NULL to signal that the > planner call should not be tracked separately does seem a bit cleaner. That's very interesting feedback, thanks! I'm not a fan of giving a way for queries to say that they want to be ignored by pg_stat_statements, but double counting the planning counters seem even worse, so I'm +0.5 to accept NULL query string in the planner, incidentally making pgss ignore those.
pgsql-hackers by date: