Re: RFC: Logging plan of the running query - Mailing list pgsql-hackers
From | Kyotaro Horiguchi |
---|---|
Subject | Re: RFC: Logging plan of the running query |
Date | |
Msg-id | 20220201.172757.1480996662235658750.horikyota.ntt@gmail.com Whole thread Raw |
In response to | Re: RFC: Logging plan of the running query (Fujii Masao <masao.fujii@oss.nttdata.com>) |
Responses |
Re: RFC: Logging plan of the running query
|
List | pgsql-hackers |
At Tue, 1 Feb 2022 01:51:11 +0900, Fujii Masao <masao.fujii@oss.nttdata.com> wrote in > > > On 2022/01/28 17:45, torikoshia wrote: > >> There is the case where the request to log a query plan is skipped > >> even while the target backend is running a query. If this happens, > >> users can just retry pg_log_query_plan(). These things should be > >> documented? > > Agreed. > > Added following: > > + Note that there is the case where the request to log a > > query > > + plan is skipped even while the target backend is running a > > + query due to lock conflict avoidance. > > + If this happens, users can just retry pg_log_query_plan(). > > This may cause users to misunderstand that pg_log_query_plan() fails > while the target backend is holding *any* locks? Isn't it better to > mention "page-level locks", instead? So how about the following? > > -------------------------- > Note that the request to log the query plan will be ignored if it's > received during a short period while the target backend is holding a > page-level lock. > -------------------------- > > > >> + ereport(LOG_SERVER_ONLY, > >> + errmsg("backend with PID %d is holding a page > >> lock. Try again", > >> + MyProcPid)); > >> > >> It seems more proper to output this message in DETAIL or HINT, > >> instead. So how about something like the following messages? > >> > >> LOG: could not log the query plan > >> DETAIL: query plan cannot be logged while page level lock is being > >> held > >> HINT: Try pg_log_query_plan() after a few .... > > Agreed. > > I felt the HINT message 'after a few ...' is difficult to describe, > > and wrote as below. > > | HINT: Retrying pg_log_query_plan() might succeed since the lock > > | duration of page level locks are usually short > > How do you think? > > Or we don't need HINT message? > > > + errmsg("could not log the query plan"), > + errdetail("query plan cannot be logged while page level lock is > being held"), > > In detail message, the first word of sentences should be > capitalized. How about "Cannot log the query plan while holding > page-level lock.", instead? > > > Thanks for updating the patch! Here are some review comments. > > + <row> > + <entry role="func_table_entry"><para role="func_signature"> > + <indexterm> > + <primary>pg_log_query_plan</primary> > + </indexterm> > > This entry is placed before one for > pg_log_backend_memory_contexts(). But it should be *after* that since > those entries seem to be placed in alphabetical order in the table? > > > + Requests to log the plan of the query currently running on the > + backend with specified process ID along with the untruncated > + query string. > > Other descriptions about logging of query string seem not to mention > something like "untruncated query string". For example, auto_explain, > log_statement, etc. Why do we need to mention "along with the > untruncated query string" specially for pg_log_query_plan()? > > > + Note that nested statements (statements executed inside a function) > are not > + considered for logging. Only the plan of the most deeply nested > query is logged. > > Now the plan of even nested statement can be logged. So this > description needs to be updated? > > > @@ -440,6 +450,7 @@ standard_ExecutorFinish(QueryDesc *queryDesc) > MemoryContextSwitchTo(oldcontext); > + ActiveQueryDesc = NULL; > > ActiveQueryDesc seems unnecessary. Why does ActiveQueryDesc need to be > reset to NULL in standard_ExecutorFinish()? > > > Currently even during ProcessLogQueryPlanInterrupt(), > pg_log_query_plan() can be call and another > ProcessLogQueryPlanInterrupt() can be executed. So repeatable > re-entrances to ProcessLogQueryPlanInterrupt() could cause "stack > depth limit exceeded" error. To avoid this, shouldn't we make > ProcessLogQueryPlanInterrupt() do nothing and return immediately, if > it's called during another ProcessLogQueryPlanInterrupt()? > > pg_log_backend_memory_contexts() also might have the same issue. Good catch. By the way, I'm anxious about the following part and I'd like to remove it. + * Ensure no page lock is held on this process. It seems to me what is wrong is ginInsertCleanup(), not this feature. As I read the comment for the assertion, I don't believe we want to allow CFI while holding a page lock. And AFAICS the function is the only point where doing that. (It is the alone user of LockPage()...) This is the assertion. lock.c: 902 > /* > * We don't acquire any other heavyweight lock while holding the page lock > * except for relation extension. > */ > Assert(!IsPageLockHeld || > (locktag->locktag_type == LOCKTAG_RELATION_EXTEND)); It is added by a recent commit 72e78d831ab5550c39f2dcc7cc5d44c406ec3dc2. > * Similar to relation extension, page locks are also held for a short > * duration, so imposing such a restriction won't hurt. I don't believe a path involving vacuum_delay_point() calls is short-duration'ed. https://www.postgresql.org/message-id/CAH2-WznzCPUKnOV%2Bre30_rHLCkqQWm2JTSVjTCAei9LySTc2pw%40mail.gmail.com > One thing that really bothers me about commit e2c79e14 is that > LockPage() is called, not LockBuffer(). GIN had no LockPage() calls > before that commit, and is now the only code in the entire system that > calls LockPage()/ConditionalLockPage() (the hash am no longer uses > page heavyweight locks following recent work there). I agree to the discussion. Can't we use other mechanism here to get rid of the Lockpage()? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
pgsql-hackers by date: