Re: RFC: Logging plan of the running query - Mailing list pgsql-hackers

From torikoshia
Subject Re: RFC: Logging plan of the running query
Date
Msg-id 6143f00af4bfdba95a85cf866f4acb41@oss.nttdata.com
In response to Re: RFC: Logging plan of the running query  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: RFC: Logging plan of the running query
List pgsql-hackers
On 2025-09-10 02:59, Robert Haas wrote:
> I haven't had time to review this in a while -- sorry about that --
> and I have only limited time now, but let me try to give you some
> comments in the time that I do have.

Thank you very much for taking the time to review this!

> I bet some users would really like a feature that allows them to get
> the plans for their own running queries. They will be sad with the
> limitation of this feature, that it only writes to the logs. We should
> probably just live with that limitation, because doing anything else
> seems a lot more complicated. Still, it's not great.

I agree.
Regarding output in a form other than the logs, I think the idea
proposed in [1] — making it possible to inspect memory contexts of other
backends via a view — would be a useful reference.

[1]
https://www.postgresql.org/message-id/flat/CAH2L28v8mc9HDt8QoSJ8TRmKau_8FM_HKS41NeO9-6ZAkuZKXw@mail.gmail.com

> The first of these messages ends with a period, which is contrary to
> style guidelines.

Removed the period.

> All of them do errhidestmt() and errhidecontext(),
> which is an unusual choice. I don't see why it's appropriate here.
> Wanting to see both the query and the query plan seems pretty
> reasonable to me.

Removed errhidestmt() and errhidecontext().

> The first two messages seem fairly unhelpful to me:
> the user isn't going to understand the distinction between those two
> states and it's unclear why we should give them that information. I'm
> not sure if we should log a generic message in these kinds of cases or
> log nothing at all, but I feel like this is too much technical
> information.

I'm not sure if this is the best approach, but I changed them to log
nothing in these cases.
In addition to the reason you mentioned, I found that when
pg_log_query_plan() is executed repeatedly at very short intervals
(e.g., every 0.01 seconds), ereport() could lead to
ProcessLogQueryPlanInterrupt() being invoked again inside errfinish()
via CHECK_FOR_INTERRUPTS(), which resulted in a stack overflow.
While this could be guarded against with check_stack_depth(), it
doesn't seem worthwhile to do so just to emit a message of limited
usefulness.
Additionally, there are other cases where the plan cannot be logged
(e.g., when the query is in a state in which ExecProcNode will not be
invoked any further).
So instead, I added the following note to the documentation:

   Note that depending on the execution state of the query,
   it may not be possible to log the plan.
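To illustrate the recursion hazard described above, here is a minimal,
hypothetical sketch (not the actual patch code, which simply stops
emitting the message): a guard flag keeps the interrupt handler from
re-entering itself when the message-emitting path runs the interrupt
check again. All names here are illustrative stand-ins for the real
PostgreSQL routines.

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch only: stand-ins for the real interrupt machinery. */
static bool LogQueryPlanPending = true;    /* set by the signal handler   */
static bool LogQueryPlanInProgress = false;
static int  handler_entries = 0;

static void emit_message(void);

static void
ProcessLogQueryPlanInterrupt_sketch(void)
{
    if (LogQueryPlanInProgress)
        return;                 /* already handling it; avoid recursion */
    LogQueryPlanInProgress = true;
    handler_entries++;
    emit_message();             /* may re-run the interrupt check */
    LogQueryPlanPending = false;
    LogQueryPlanInProgress = false;
}

/* Stand-in for errfinish() reaching CHECK_FOR_INTERRUPTS(). */
static void
emit_message(void)
{
    if (LogQueryPlanPending)
        ProcessLogQueryPlanInterrupt_sketch();  /* would recurse forever without the guard */
}
```

Without the guard, the handler and the message path call each other
until the stack overflows; with it, the handler runs exactly once.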

> Also, I think that we don't normally put the PID of the
> process that is performing an action in the primary error message,
> because the user can include %p in log_line_prefix if they so desire.

That makes sense, but I had used the following message from
pg_log_backend_memory_contexts() as a reference and put the PID in the
message:

   errmsg("logging memory contexts of PID %d", MyProcPid)));

Since both pg_log_query_plan() and pg_log_backend_memory_contexts()
output information about another backend, it feels slightly odd that
their log messages would be inconsistent.
Perhaps we should consider changing pg_log_backend_memory_contexts()
for consistency as well?

> But there's another, subtler point here, which is that I feel like "is
> not running a query or a subtransaction is aborted" is pointing to a
> design defect in the patch. When we enter an inner query, we switch
> activeQueryDesc to point to the inner QueryDesc. When we abort a
> subtransaction, instead of resetting it to the outer query's
> QueryDesc, we reset it to NULL. I don't really think that's
> acceptable. Consider this:
>
> SELECT some_slow_function_that_uses_a_subtransaction_which_aborts(g)
> FROM generate_series(1,1000000) g;
>
> What's going to happen here is that for 99.9999% of the execution time
> of this function, you can't print the query plan. And that won't be
> because of any fundamental thing is preventing you from so doing -- it
> will just be because the patch doesn't have the right bookkeeping. If
> you added a QueryDesc * pointer to TransactionStateData, then
> ExecutorRun could (1) save the current value of that pointer from the
> topmost element of the transaction state stack, (2) update the pointer
> value to the new QueryDesc, and (3) put the old pointer back at the
> end.

Thank you for pointing out the issue and suggesting a fix.
The attached patch adds a QueryDesc pointer to TransactionStateData and
updates it accordingly.
With this change, while running the query below, pg_log_query_plan() is
now able to output the plan for SELECT gen_subtxn(g) FROM
generate_series(1,1000000) g, as shown in the log output:

   (session1)=# CREATE OR REPLACE FUNCTION gen_subtxn(i int)
   RETURNS int LANGUAGE plpgsql AS $$
   DECLARE
       result int;
   BEGIN
       BEGIN
           PERFORM 1 / 0;
       EXCEPTION
           WHEN division_by_zero THEN
               result := -i;
       END;
       RETURN result;
   END;
   $$;

   =# SELECT gen_subtxn(g) from generate_series(1,1000000) g;

   (session2)=# SELECT pg_log_query_plan(pid) FROM pg_stat_activity
                WHERE backend_type = 'client backend';
   (session2)=# \watch 0.1

   (in the log)
    LOG:  00000: query and its plan running on the backend are:
    Query Text: SELECT gen_subtxn(g) from generate_series(1,1000000) g;
     Function Scan on pg_catalog.generate_series g  (cost=0.00..260000.00 rows=1000000 width=4)
      Output: gen_subtxn(g)
      Function Call: generate_series(1, 1000000)
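The save/update/restore bookkeeping suggested upthread can be sketched
as follows. This is a simplified, hypothetical model, not the patch
itself: the structure and function names merely mimic the real
TransactionStateData and ExecutorRun to show the three steps (save the
outer pointer, point at the inner QueryDesc, restore on the way out).

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins for the real PostgreSQL structures. */
typedef struct QueryDesc { const char *query_text; } QueryDesc;

typedef struct TransactionStateData
{
    QueryDesc *activeQueryDesc;     /* query currently running here */
} TransactionStateData;

static TransactionStateData TopTransactionStateData;
static TransactionStateData *CurrentTransactionState = &TopTransactionStateData;

static void
ExecutorRun_sketch(QueryDesc *queryDesc, void (*run)(void))
{
    TransactionStateData *s = CurrentTransactionState;
    QueryDesc *save = s->activeQueryDesc;   /* (1) save outer query's desc */

    s->activeQueryDesc = queryDesc;         /* (2) point at inner query    */
    run();                                  /*     execute the query       */
    s->activeQueryDesc = save;              /* (3) restore on the way out  */
}

/* A nested execution: an outer query that runs an inner one. */
static QueryDesc outer_qd = { "outer query" };
static QueryDesc inner_qd = { "inner query" };
static const char *seen_during_inner;

static void
run_inner(void)
{
    /* What an interrupt arriving here would see and log. */
    seen_during_inner = CurrentTransactionState->activeQueryDesc->query_text;
}

static void
run_outer(void)
{
    ExecutorRun_sketch(&inner_qd, run_inner);
}
```

During the inner query the pointer names the inner QueryDesc, and after
both queries finish it is restored to its original (NULL) value, which
is what makes the cleanup in AbortSubTransaction unnecessary in the
common case.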

> Then, you wouldn't need any cleanup in AbortSubTransaction, and
> this value would always be right, even after an outer query continues
> after an abort, because the subtransaction abort would pop the
> transaction state stack, leaving the right thing on top.

It seems that simply making the above change is not sufficient.
In particular, when raising an error inside a subtransaction, I
sometimes observed segfaults:

   BEGIN;
   savepoint x;
   CREATE TABLE koju (a INT UNIQUE);
   INSERT INTO koju VALUES (1);
   INSERT INTO koju VALUES (1);  -- key duplicate error

Inspecting the crash showed that the argument *ps to
ExecSetExecProcNodeRecurse() was 0x7f7f7f.... (the CLOBBER_FREED_MEMORY
fill pattern, i.e., a dangling pointer into already-freed memory).

   (lldb) f 0
   frame #0: 0x000000010102ec40
postgres`ExecSetExecProcNodeRecurse(ps=0x7f7f7f7f7f7f7f7f) at
execProcnode.c:601:30
      600  {
   -> 601          ExecSetExecProcNode(ps, ps->ExecProcNodeReal);

This happens when the plan-logging function is called after an ERROR is
raised but before the transaction state stack is popped.
So in the attached patch, as in the previous version, I reset QueryDesc
to NULL during that interval.
I have also confirmed that even with this change, the above test (SELECT
gen_subtxn(g) from generate_series(1,1000000) g;) can still have its
plan logged successfully.
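The handling of that error window can be sketched like this. Again a
hypothetical model with illustrative names, not the patch code: the
active QueryDesc is cleared when an ERROR is raised, so an interrupt
arriving between the ERROR and the transaction state stack pop sees
NULL and bails out instead of chasing a dangling pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct QueryDesc { int dummy; } QueryDesc;

static QueryDesc *activeQueryDesc;   /* NULL when no plan can be logged */
static bool plan_logged;

/* Interrupt handler sketch: refuse to walk the plan tree when no
 * query is safely active. */
static void
ProcessLogQueryPlanInterrupt_sketch(void)
{
    if (activeQueryDesc == NULL)
        return;                 /* ERROR in progress, or no query running */
    plan_logged = true;         /* stands in for printing the plan */
}

/* Called when an ERROR is raised, before the transaction state stack
 * is popped: the executor state is about to be freed, so stop pointing
 * at it. */
static void
clear_active_query_on_error(void)
{
    activeQueryDesc = NULL;
}
```

With this, an interrupt landing inside the window is a harmless no-op,
while the save/restore bookkeeping still makes the outer query's plan
loggable once the subtransaction abort has been cleaned up.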

> The changes to miscadmin.h delete a blank line. They probably
> shouldn't.
Fixed.

> The only change to pquery.h is to delete a blank line. That hunk needs
> reverting.
Fixed.

> +-- Note that we're using a slightly more complex query here because
> +-- a simple 'SELECT pg_log_query_plan(pg_backend_pid())' would finish
> +-- before it reaches the code path that actually outputs the plan.
> +SELECT result FROM (SELECT pg_log_query_plan(pg_backend_pid()))
> result;
> + result
> +--------
> + (t)
> +(1 row)
>
> I seriously doubt that this will be stable across the entire
> buildfarm. You're going to need a different approach here.

Changed it to the following query:

   WITH t AS MATERIALIZED (SELECT pg_log_query_plan(pg_backend_pid()))
     SELECT * FROM t;

Also, since pg_log_query_plan() previously had little chance to log the
plan when the target query uses a CTE, the attached patch added
T_CteScan handling.

> Is there any real reason to rename regress_log_memory to
> regress_log_function, vs. just introducing a separate regress_log_plan
> role?

I don’t have any particular reason to insist on unifying them.
In the attached patch, I created a regress_log_plan role and used that
to run the same test.


--
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA Japan Corporation to SRA OSS K.K.
