Re: RFC: Logging plan of the running query - Mailing list pgsql-hackers
From:            torikoshia
Subject:         Re: RFC: Logging plan of the running query
Msg-id:          9886a201a5da64eab1472db6c70473a8@oss.nttdata.com
In response to:  Re: RFC: Logging plan of the running query (Robert Haas <robertmhaas@gmail.com>)
Responses:       Re: RFC: Logging plan of the running query
List:            pgsql-hackers
On Wed, Sep 17, 2025 at 12:06 AM Robert Haas <robertmhaas@gmail.com> wrote:

Thanks for the comments!

> On Tue, Sep 16, 2025 at 9:30 AM torikoshia <torikoshia@oss.nttdata.com>
> wrote:
> > I'm not sure if this is the best approach, but I changed them to log
> > nothing in these cases.
> > In addition to the reason you mentioned, I found that when
> > pg_log_query_plan() is executed repeatedly at very short intervals
> > (e.g., every 0.01 seconds), ereport() could lead to
> > ProcessLogQueryPlanInterrupt() being invoked again inside errfinish via
> > CFI(), which resulted in a stack overflow.
> > While this could be processed with check_stack_depth(), it doesn’t seem
> > worthwhile to use it just to emit a message of limited usefulness.
>
> Yes, it seems as though we should set a flag to prevent reentrancy
> here -- if we are already in the code path where we log the query
> plan, we shouldn't accept an interrupt telling us to log the query
> plan. I haven't looked at the updated patch so maybe that's not
> necessary for some reason, but in general reentrancy is a concern with
> features of this type.

Agreed. I added such a flag in the attached patch.
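A minimal sketch of that guard, just for illustration (the flag name here is made up, and the actual patch may structure this differently):

#include "postgres.h"

/* Assumed flag name; set while a plan is being logged so that a nested
 * request arriving via CHECK_FOR_INTERRUPTS() is ignored. */
static bool LogQueryPlanInProgress = false;

void
ProcessLogQueryPlanInterrupt(void)
{
    if (LogQueryPlanInProgress)
        return;                 /* already logging: do not re-enter */

    LogQueryPlanInProgress = true;
    PG_TRY();
    {
        /*
         * ... build the EXPLAIN output and ereport() it here; errfinish()
         * can run CHECK_FOR_INTERRUPTS() again, which is exactly what the
         * flag protects against ...
         */
    }
    PG_FINALLY();
    {
        LogQueryPlanInProgress = false;
    }
    PG_END_TRY();
}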
> > Additionally, there are other cases where the plan cannot be logged
> > (e.g., when the query is in a state which ExecProcNode will not be
> > invoked any further).
> > So instead, I added the following note to the documentation:
> >
> > Note that depending on the execution state of the query,
> > it may not be possible to log the plan.
>
> This seems to me to be too vague to be useful. I expect readers to
> read this to mean "sometimes this feature may not work." But that
> seems too pessimistic if the only case in which it doesn't work is
> when we are the very tail end of the query and it was just about to
> stop running, we probably don't need to document anything, as it will
> happen rarely and will look about the same as if the query finished
> very slightly earlier and we just missed it. If there are more cases
> than that in which this feature won't work, we should talk about
> those, and maybe fix them.

As you described, it should be rare that there is no chance to log the
plan, so I removed the note.

> > > Also, I think that we don't normally put the PID of the
> > > process that is performing an action in the primary error message,
> > > because the user can include %p in log_line_prefix if they so desire.
> >
> > That makes sense, but
> > I had used the following message from pg_log_backend_memory_contexts()
> > as a reference
> > and put the PID in the message:
> >
> > errmsg("logging memory contexts of PID %d", MyProcPid)));
> >
> > Since both pg_log_query_plan() and pg_log_backend_memory_contexts()
> > output information about another backend, it feels slightly odd
> > that their log messages would be inconsistent.
> > Perhaps we should consider changing pg_log_backend_memory_contexts()
> > for consistency as well?
>
> Yeah, maybe. I don't know what the reason behind the unusual framing
> of that message is, so perhaps there is an argument for being
> consistent with it, rather than changing it. However, it looks unusual
> to me.

Looking back at the discussion on the development of
pg_log_backend_memory_contexts(), it seems that I had already proposed
outputting the PID in the log message from the very beginning [1].
Originally, I had suggested making the memory contexts available
directly as a function result rather than logging them, but we gave up
on that idea and switched to logging, so I think the PID output may be
a leftover from that change.

[1] https://www.postgresql.org/message-id/70ae4b79eb8b0dcf42161c80a00e3f22%40oss.nttdata.com

It’s possible that users who don’t include %p in log_line_prefix could
have trouble identifying which PID’s memory contexts are being logged
when pg_log_backend_memory_contexts() is called concurrently for
multiple processes. However, I suspect that few users run without %p in
log_line_prefix, so I think it would be fine to remove the PID from
pg_log_backend_memory_contexts()’s log message when this patch is
merged.

> > With this change, while running the query below, pg_log_query_plan()
> > is now able to output the plan for SELECT gen_subtxn(g) FROM
> > generate_series(1,1000000) g as below:
>
> Excellent!

> > This happens when the plan-logging function is called after an ERROR is
> > raised but before the transaction state stack is popped.
> > So in the attached patch, as in the previous version, I reset QueryDesc
> > to NULL during that interval.
>
> I think a better fix would be to dump nothing when the current
> (sub)transaction is (sub)aborted. If you don't do that, then you're
> hoping that you can manage to set QueryDesc to null quickly enough
> after the transaction is no longer in a valid state, which does not
> seem like a great way to make things robust.

To detect “when the current (sub)transaction is (sub)aborted,” the
attached patch uses IsTransactionState(). With that, the logging code is
not executed when the transaction state is anything other than
TRANS_INPROGRESS, and in some cases this works as expected.

On the other hand, when calling pg_log_query_plan() repeatedly during
make installcheck, I still encountered cases where segfaults occurred.
Specifically, this seems to happen when, after a (sub)abort, the next
query starts in the same session and a CFI() occurs after
StartTransaction() has been executed but before the executor runs. In
that case, the transaction state has already been changed to
TRANS_INPROGRESS by StartTransaction(), but the QueryDesc from the
aborted transaction is still present, which causes the problem.

To address this, the attached patch initializes the QueryDesc within
CleanupTransaction(). Since this function already resets various members
of CurrentTransactionState, I feel it’s not unreasonable to initialize
the QueryDesc there as well. Does this approach make sense, or is it
problematic?
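For concreteness, the two guards might look roughly like the sketch below, building on the earlier ProcessLogQueryPlanInterrupt() sketch; ActiveQueryDesc is only an assumed name for wherever the patch keeps the running query's QueryDesc, and the reentrancy flag is omitted for brevity:

#include "postgres.h"
#include "access/xact.h"        /* IsTransactionState() */
#include "executor/execdesc.h"  /* QueryDesc */

QueryDesc *ActiveQueryDesc = NULL;  /* assumed name */

void
ProcessLogQueryPlanInterrupt(void)
{
    /*
     * IsTransactionState() is true only for TRANS_INPROGRESS, so this
     * skips (sub)aborted states, in which the executor tree may no
     * longer be safe to walk.
     */
    if (!IsTransactionState() || ActiveQueryDesc == NULL)
        return;

    /* ... build and log the EXPLAIN output from ActiveQueryDesc ... */
}

/*
 * In addition, CleanupTransaction() in xact.c would clear the leftover
 * pointer so that a CHECK_FOR_INTERRUPTS() early in the next transaction
 * cannot see the QueryDesc of the aborted one:
 *
 *     ActiveQueryDesc = NULL;
 */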
> > > +-- Note that we're using a slightly more complex query here because
> > > +-- a simple 'SELECT pg_log_query_plan(pg_backend_pid())' would finish
> > > +-- before it reaches the code path that actually outputs the plan.
> > > +SELECT result FROM (SELECT pg_log_query_plan(pg_backend_pid())) result;
> > > + result
> > > +--------
> > > + (t)
> > > +(1 row)
> > >
> > > I seriously doubt that this will be stable across the entire
> > > buildfarm. You're going to need a different approach here.
> >
> > Changed it to the following query:
> >
> > WITH t AS MATERIALIZED (SELECT pg_log_query_plan(pg_backend_pid()))
> > SELECT * FROM t;
>
> I don't see why that wouldn't have a race condition?

The idea is that (1) pg_log_query_plan() in the WITH clause wraps
ExecProcNode, and then (2) SELECT * FROM t invokes ExecProcNode, which
causes the plan to be logged. I think that’s what currently happens on
HEAD.

When you mention a "race condition", do you mean that if the timing of
the wrapping in step (1) -- that is, when CFI() is called -- changes in
the future, then the logging might not work?
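(For readers following the thread: step (1) refers to a wrapper along the lines of the sketch below. This is a simplified illustration rather than the patch's actual code; in particular, restoring the previous callback from ExecProcNodeReal is an assumption here, and the real code has to be more careful, e.g. about instrumentation and about wrapping multiple nodes.)

#include "postgres.h"
#include "executor/executor.h"

/* Illustrative wrapper installed on a PlanState's ExecProcNode: it logs
 * the plan once, puts the previous callback back, and then continues
 * normal execution. */
static TupleTableSlot *
ExecProcNodeWithPlanLogging(PlanState *ps)
{
    /* Assumes ExecProcNodeReal still holds the pre-wrap callback. */
    ps->ExecProcNode = ps->ExecProcNodeReal;

    /* ... build the EXPLAIN text for the running query and log it ... */

    return ps->ExecProcNode(ps);
}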
> > few chance to log the plan, attached patch added T_CteScan handling.
>
> Shouldn't all node types be handled equally?

IIUC, the node types that would need to be handled in the same way are
those PlanState subclasses that satisfy both of the following:

- structs in execnodes.h whose first field is a PlanState, and
- structs that also contain another PlanState as a member in addition
  to that first field.

If that's the case, it seems to me that we already have the full list.

BTW, I haven't looked into this in detail yet, but I'm a little curious
whether the absence of T_CteScan in functions like
planstate_tree_walker_impl() is intentional.

-- 
Regards,

--
Atsushi Torikoshi
Seconded from NTT DATA Japan Corporation to SRA OSS K.K.