Re: RFC: Logging plan of the running query - Mailing list pgsql-hackers

From torikoshia
Subject Re: RFC: Logging plan of the running query
Date
Msg-id 9886a201a5da64eab1472db6c70473a8@oss.nttdata.com
Whole thread Raw
In response to Re: RFC: Logging plan of the running query  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: RFC: Logging plan of the running query
List pgsql-hackers
On Wed, Sep 17, 2025 at 12:06 AM Robert Haas <robertmhaas@gmail.com>
wrote:
Thanks for the comments!

> On Tue, Sep 16, 2025 at 9:30 AM torikoshia <torikoshia@oss.nttdata.com>
> wrote:
> > I'm not sure if this is the best approach, but I changed them to log
> > nothing in these cases.
> > In addition to the reason you mentioned, I found that when
> > pg_log_query_plan() is executed repeatedly at very short intervals
> > (e.g., every 0.01 seconds), ereport() could lead to
> > ProcessLogQueryPlanInterrupt() being invoked again inside errfinish via
> > CFI(), which resulted in a stack overflow.
> > While this could be processed with check_stack_depth(), it doesn’t seem
> > worthwhile to use it just to emit a message of limited usefulness.
>
> Yes, it seems as though we should set a flag to prevent reentrancy
> here -- if we are already in the code path where we log the query
> plan, we shouldn't accept an interrupt telling us to log the query
> plan. I haven't looked at the updated patch so maybe that's not
> necessary for some reason, but in general reentrancy is a concern with
> features of this type.

Agreed. In the attached patch added a flag.

> > Additionally, there are other cases where the plan cannot be logged
> > (e.g., when the query is in a state which ExecProcNode will not be
> > invoked any further).
> > So instead, I added the following note to the documentation:
> >
> >    Note that depending on the execution state of the query,
> >    it may not be possible to log the plan.
>
> This seems to me to be too vague to be useful. I expect readers to
> read this to mean "sometimes this feature may not work." But that
> seems too pessimistic if the only case in which it doesn't work is
> when we are the very tail end of the query and it was just about to
> stop running, we probably don't need to document anything, as it will
> happen rarely and will look about the same as if the query finished
> very slightly earlier and we just missed it. If there are more cases
> than that in which this feature won't work, we should talk about
> those, and maybe fix them.

As you described, I think it's a rare case to have no chance to log
plan, removed the note.

> > > Also, I think that we don't normally put the PID of the
> > > process that is performing an action in the primary error message,
> > > because the user can include %p in log_line_prefix if they so desire.
> >
> > That makes sense, but
> > I had used the following message from pg_log_backend_memory_contexts()
> > as a reference
> > and put the PID in the message:
> >
> >    errmsg("logging memory contexts of PID %d", MyProcPid)));
> >
> > Since both pg_log_query_plan() and pg_log_backend_memory_contexts()
> > output information about another other backend, it feels slightly odd
> > that their log messages would be inconsistent.
> > Perhaps should we consider changing pg_log_backend_memory_contexts() for
> > consistency as well?
>
> Yeah, maybe. I don't know what the reason behind the unusual framing
> of that message is, so perhaps there is an argument for being
> consistent with it, rather than changing it. However, it looks unusual
> to me.

Looking back at the discussion on the development of
pg_log_backend_memory_contexts(), it seems that I had already proposed
outputting the PID in the log message from the very beginning [1].
Originally, I had suggested making the memory context available directly
as a function result rather than logging it, but we gave up on that idea
and switched to logging, so I think the PID output may be a leftover
from that change.

  [1]
https://www.postgresql.org/message-id/70ae4b79eb8b0dcf42161c80a00e3f22%40oss.nttdata.com

It’s possible that users who don’t include %p in log_line_prefix could
have trouble identifying which PID’s memory context is being logged when
pg_log_backend_memory_contexts() is called concurrently for multiple
processes.
However, I suspect that few users run without %p in log_line_prefix.
So I think it would be fine to remove the PID from
pg_log_backend_memory_contexts()’s log message when this patch is
merged.

> > With this change, while running the query below, pg_log_query_plan() is
> > now able to output the plan for SELECT gen_subtxn(g) FROM
> > generate_series(1,1000000) g as below:
>
> Excellent!
>
> > This happens when the plan-logging function is called after an ERROR is
> > raised but before the transaction state stack is popped.
> > So in the attached patch, as in the previous version, I reset QueryDesc
> > to NULL during that interval.
>
> I think a better fix would be to dump nothing when the current
> (sub)transaction is (sub)aborted. If you don't do that, then you're
> hoping that you can manage to set QueryDesc to null quickly enough
> after the transaction is no longer in a valid state, which does not
> seem like a great way to make things robust.

To detect “when the current (sub)transaction is (sub)aborted,” attached
patch uses IsTransactionState().
With that, the logging code is not executed when the transaction state
is anything other than TRANS_INPROGRESS, and in some cases this works as
expected.
On the other hand, when calling pg_log_query_plan() repeatedly during
make installcheck, I still encountered cases where segfaults occurred.

Specifically, this seems to happen when, after a (sub)abort, the next
query starts in the same session and a CFI() occurs after
StartTransaction() has been executed but before Executor() runs.  In
that case, the transaction state has already been changed to
TRANS_INPROGRESS by StartTransaction(), but the QueryDesc from the
aborted transaction is still present, which causes the problem.

To address this, attached patch initializes QueryDesc within
CleanupTransaction().
Since this function already resets various members of
CurrentTransactionState, I feel it’s not so unreasonable to initialize
QueryDesc there as well.
Does this approach make sense, or is it problematic?

> > > +-- Note that we're using a slightly more complex query here because
> > > +-- a simple 'SELECT pg_log_query_plan(pg_backend_pid())' would finish
> > > +-- before it reaches the code path that actually outputs the plan.
> > > +SELECT result FROM (SELECT pg_log_query_plan(pg_backend_pid()))
> > > result;
> > > + result
> > > +--------
> > > + (t)
> > > +(1 row)
> > >
> > > I seriously doubt that this will be stable across the entire
> > > buildfarm. You're going to need a different approach here.
> >
> > Changed it to the following query:
> >
> >    WITH t AS MATERIALIZED (SELECT pg_log_query_plan(pg_backend_pid()))
> >      SELECT * FROM t;
>
> I don't see why that wouldn't have a race condition?

The idea is that (1) pg_log_query_plan() in the WITH clause wraps
ExecProcNode, and then (2) SELECT * FROM t invokes ExecProcNode, which
causes the plan to be logged.  I think that’s what currently happens on
HEAD.
When you mention a "race condition", do you mean that if the timing of
the wrapping in step (1) -- that is, when CFI() is called -- changes in
the future, then the logging might not work?

>> few chance to log the plan, attached patch added T_CteScan handling.
>
> Shouldn't all node types be handled equally?

IIUC the node types that would need to be handled in the same way are
those PlanState subclasses that satisfy both of the following:
- Structs in execnodes.h whose first field is a PlanState, and
- Subclasses that also contain another PlanState as a member in addition
to the first field.

If that’s the case, it seems to me that we already have the full list.

BTW I haven’t looked into this in detail yet, but I’m a little curious
whether the absence of T_CteScan in functions like
planstate_tree_walker_impl() is intentional.


--
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA Japan Corporation to SRA OSS K.K.
Attachment

pgsql-hackers by date:

Previous
From: Arseniy Mukhin
Date:
Subject: Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue
Next
From: Álvaro Herrera
Date:
Subject: Re: REPACK and naming