Re: RFC: Logging plan of the running query - Mailing list pgsql-hackers

From Robert Haas
Subject Re: RFC: Logging plan of the running query
Date
Msg-id CA+TgmoYBLNSNGSumcDZB4ymVfy-aHyFa_i4n9MHRpPOLp5m+tQ@mail.gmail.com
Whole thread Raw
In response to Re: RFC: Logging plan of the running query  (torikoshia <torikoshia@oss.nttdata.com>)
List pgsql-hackers
On Mon, Sep 1, 2025 at 9:35 AM torikoshia <torikoshia@oss.nttdata.com> wrote:
> Rebased just because of doc refactoring.

I haven't had time to review this in a while -- sorry about that --
and I have only limited time now, but let me try to give you some
comments in the time that I do have.

I bet some users would really like a feature that allows them to get
the plans for their own running queries. They will be sad with the
limitation of this feature, that it only writes to the logs. We should
probably just live with that limitation, because doing anything else
seems a lot more complicated. Still, it's not great.

+ ereport(LOG_SERVER_ONLY,
+ errmsg("backend with PID %d is finishing query execution and cannot
log the plan.",
+    MyProcPid),
+ errhidestmt(true),
+ errhidecontext(true));

+ ereport(LOG_SERVER_ONLY,
+ errmsg("backend with PID %d is not running a query or a
subtransaction is aborted",
+    MyProcPid),
+ errhidestmt(true),
+ errhidecontext(true));

+ ereport(LOG_SERVER_ONLY,
+ errmsg("query plan running on backend with PID %d is:\n%s",
+    MyProcPid, es->str->data),
+ errhidestmt(true),
+ errhidecontext(true));

The first of these messages ends with a period, which is contrary to
style guidelines. All of them do errhidestmt() and errhidecontext(),
which is an unusual choice. I don't see why it's appropriate here.
Wanting to see both the query and the query plan seems pretty
reasonable to me. The first two messages seem fairly unhelpful to me:
the user isn't going to understand the distinction between those two
states and it's unclear why we should give them that information. I'm
not sure if we should log a generic message in these kinds of cases or
log nothing at all, but I feel like this is too much technical
information. Also, I think that we don't normally put the PID of the
process that is performing an action in the primary error message,
because the user can include %p in log_line_prefix if they so desire.

But there's another, subtler point here, which is that I feel like "is
not running a query or a subtransaction is aborted" is pointing to a
design defect in the patch. When we enter an inner query, we switch
activeQueryDesc to point to the inner QueryDesc. When we abort a
subtransaction, instead of resetting it to the outer query's
QueryDesc, we reset it to NULL. I don't really think that's
acceptable. Consider this:

SELECT some_slow_function_that_uses_a_subtransaction_which_aborts(g)
FROM generate_series(1,1000000) g;

What's going to happen here is that for 99.9999% of the execution time
of this function, you can't print the query plan. And that won't be
because of any fundamental thing is preventing you from so doing -- it
will just be because the patch doesn't have the right bookkeeping. If
you added a QueryDesc * pointer to TransactionStateData, then
ExecutorRun could (1) save the current value of that pointer from the
topmost element of the transaction state stack, (2) update the pointer
value to the new QueryDesc, and (3) put the old pointer back at the
end. Then, you wouldn't need any cleanup in AbortSubTransaction, and
this value would always be right, even after an outer query continues
after an abort, because the subtransaction abort would pop the
transaction state stack, leaving the right thing on top.

The changes to miscadmin.h delete a blank line. They probably shouldn't.

The only change to pquery.h is to delete a blank line. That hunk needs
reverting.

+-- Note that we're using a slightly more complex query here because
+-- a simple 'SELECT pg_log_query_plan(pg_backend_pid())' would finish
+-- before it reaches the code path that actually outputs the plan.
+SELECT result FROM (SELECT pg_log_query_plan(pg_backend_pid())) result;
+ result
+--------
+ (t)
+(1 row)

I seriously doubt that this will be stable across the entire
buildfarm. You're going to need a different approach here.

Is there any real reason to rename regress_log_memory to
regress_log_function, vs. just introducing a separate regress_log_plan
role?

--
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Melanie Plageman
Date:
Subject: Re: Checkpointer write combining
Next
From: "Sophie Alpert"
Date:
Subject: Re: Fix missing EvalPlanQual recheck for TID scans