Thread: [BUG] temporary file usage report with extended protocol and unnamed portals
[BUG] temporary file usage report with extended protocol and unnamed portals
From
Frédéric Yhuel
Date:
Hi, It seems there's a bug in the logging of temporary file usage when the extended protocol is used with unnamed portals. For example, with the attached Java / pgJDBC programs, we get the following logs: [...] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp525566.0", size 2416640 [..] STATEMENT: SELECT 1 but it should be: [...] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp538230.0", size 2416640 [...] STATEMENT: SELECT * FROM foo ORDER BY a OFFSET $1 LIMIT 2 It has been tested with HEAD and REL_17_STABLE. My guess is that there's a race somewhere, probably when the global variable "debug_query_string" is set. The unnamed portal is closed when the BIND of the next query arrives (here: SELECT 1), and I suspect that the variable is set before the temporary file is deleted (and logged). pgJDBC uses unnamed portals, but I don't think this is specific to JDBC. I see the same problem with the attached Python / psycopg3 program. I think it would be better if the drivers used named portals all the time (and an explicit close message), but this seems to be a postgres bug. What do you think? Best regards, Frédéric PS : the dataset is created like this on the server: CREATE UNLOGGED TABLE foo(a int); INSERT INTO foo SELECT * FROM generate_series(1, 200000); ALTER SYSTEM SET log_temp_files = 0; ALTER SYSTEM SET log_min_duration_statement = -1 SELECT pg_reload_conf();
Attachment
Re: [BUG] temporary file usage report with extended protocol and unnamed portals
From
Frédéric Yhuel
Date:
On 4/18/25 10:49, Frédéric Yhuel wrote: > Hi, > > It seems there's a bug in the logging of temporary file usage when the > extended protocol is used with unnamed portals. > FWIW, the attached patch seems to fix the problem. Best regards, Frédéric
Attachment
Re: [BUG] temporary file usage report with extended protocol and unnamed portals
From
Sami Imseih
Date:
> [...] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp525566.0", > size 2416640 > [..] STATEMENT: SELECT 1 > but it should be: > [...] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp538230.0", > size 2416640 > [...] STATEMENT: SELECT * FROM foo ORDER BY a OFFSET $1 LIMIT 2 hmm, looking at the repro and your patch, I was really confused by why the STATEMENT: ends up being SELECT 1 even though the query with temp being tracked is the ORDER BY query. So, I ran a slowed down test where there is a 15 second thread sleep between the ORDER BY and SELECT 1 ( see attached ) and I also ran it with the following GUCs enabled ``` log_temp_files = '0' work_mem = '64kB' max_parallel_workers_per_gather = '0' log_min_duration_statement = '0' ``` 2025-04-18 13:31:45.572 CDT [95918] LOG: duration: 0.046 ms bind <unnamed>: select pg_sleep(0) 2025-04-18 13:31:45.572 CDT [95918] LOG: duration: 0.009 ms execute <unnamed>: select pg_sleep(0) 2025-04-18 13:31:45.574 CDT [95918] LOG: duration: 0.454 ms parse <unnamed>: SELECT * FROM foo ORDER BY a 2025-04-18 13:31:45.575 CDT [95918] LOG: duration: 0.278 ms bind <unnamed>: SELECT * FROM foo ORDER BY a 2025-04-18 13:31:46.396 CDT [95918] LOG: duration: 821.076 ms execute <unnamed>: SELECT * FROM foo ORDER BY a 2025-04-18 13:32:01.403 CDT [95918] LOG: duration: 0.438 ms parse <unnamed>: SELECT 1 2025-04-18 13:32:01.454 CDT [95918] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp95918.0", size 202039296 2025-04-18 13:32:01.454 CDT [95918] STATEMENT: SELECT 1 2025-04-18 13:32:01.454 CDT [95918] LOG: duration: 50.818 ms bind <unnamed>: SELECT 1 2025-04-18 13:32:01.454 CDT [95918] LOG: duration: 0.022 ms execute <unnamed>: SELECT 1 2025-04-18 13:32:01.457 CDT [95918] LOG: duration: 0.064 ms parse S_1: COMMIT 2025-04-18 13:32:01.457 CDT [95918] LOG: duration: 0.017 ms bind S_1: COMMIT 2025-04-18 13:32:01.457 CDT [95918] LOG: duration: 0.105 ms execute S_1: COMMIT In the log above, it shows that the temporary file logging does not actually occur until the next query is executed, so at that point the query string is already "SELECT 1" what is even more interesting is when running the commands directly from psql using extended query protocol, the STATEMENT is not logged after the temporary file message. ``` SELECT FROM foo ORDER BY a \bind ; SELECT 1 \bind ; ``` ``` istance=0 kB, estimate=0 kB; lsn=0/E55A088, redo lsn=0/E55A030 2025-04-18 13:36:39.275 CDT [96237] LOG: duration: 1.447 ms parse <unnamed>: SELECT FROM foo ORDER BY a ; 2025-04-18 13:36:39.275 CDT [96237] LOG: duration: 0.719 ms bind <unnamed>: SELECT FROM foo ORDER BY a ; 2025-04-18 13:36:39.822 CDT [96237] LOG: duration: 546.803 ms execute <unnamed>: SELECT FROM foo ORDER BY a ; 2025-04-18 13:36:39.852 CDT [96237] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp96237.0", size 202039296 2025-04-18 13:36:40.736 CDT [96237] LOG: duration: 0.394 ms parse <unnamed>: SELECT 1 ; 2025-04-18 13:36:40.736 CDT [96237] LOG: duration: 0.133 ms bind <unnamed>: SELECT 1 ; 2025-04-18 13:36:40.736 CDT [96237] LOG: duration: 0.023 ms execute <unnamed>: SELECT 1 ; ``` I think the DropPortal is happening later in the JDBC case, and only after the next query is executed, maybe? So your patch does take care of the problem because it ensures that the drop portal, when it occurs, is referencing the correct sql. I am not yet sure if the patch the right solution yet, it maybe the best answer. I don't have a better answer, but wanted to share this research as well. -- Sami Imseih Amazon Web Services (AWS)
Attachment
Re: [BUG] temporary file usage report with extended protocol and unnamed portals
From
Sami Imseih
Date:
I looked into this a bit more. What occurs during exec_bind_message is that the debug_query_string of the query is set early on. Later in that routine, a new portal is created via CreatePortal, which also drops the existing unnamed portal from the previous execution and which also logs the temp file information. So the logging is now using the debug_query_string of the current query and not the one referenced by the to-be-dropped portal. This behavior means that there is a delay in log_temp_files logging, since the portal must be dropped, which can occur after the statement has completed execution. We can look into improving that, but I am not sure whether it is possible or can be done safely. I think the solution proposed by Frédéric seems reasonable: to switch the debug_query_string inside PortalDrop. However, I do not like the way the debug_query_string changes values after the CreatePortal call inside exec_bind_message; that seems incorrect. So, I believe we should temporarily switch the debug_query_string value only while running PortalDrop. Attached is what I think could be safer to do. What do you think? -- Sami Imseih Amazon Web Services (AWS)
Attachment
Re: [BUG] temporary file usage report with extended protocol and unnamed portals
From
Tom Lane
Date:
Sami Imseih <samimseih@gmail.com> writes: > I think the solution proposed by Frédéric seems reasonable: to switch > the debug_query_string > inside PortalDrop. However, I do not like the way the > debug_query_string changes values > after the CreatePortal call inside exec_bind_message; that seems incorrect. > So, I believe we should temporarily switch the debug_query_string > value only while > running PortalDrop. Attached is what I think could be safer to do. > What do you think? I don't think this is safe at all. The portal's query string is potentially shorter-lived than the portal, see in particular exec_simple_query which just passes a pointer to the original string in MessageContext. I'm frankly inclined to do nothing, but if we must do something, the way to fix it here would be to transiently set debug_query_string to NULL so that the actions of PortalDrop aren't blamed on any particular query. (In my testing, the "temporary file:" message comes out without any attached STATEMENT most of the time already, so this isn't losing much as far as that's concerned.) The whole thing is just a band-aid, though. debug_query_string is not the only indicator of what the backend is currently doing. If somebody comes along and complains that the pg_stat_activity entry doesn't reflect what's happening, are we going to take that seriously? Perhaps a cleaner answer is to rearrange things in postgres.c so that if there's a pre-existing unnamed portal, we drop that before we ever set debug_query_string and friends at all. This would add an extra portal hashtable lookup, but I'm not sure if that would be measurable or not. (Possibly we could get that back by simplifying CreatePortal's API so that it doesn't need to perform an initial lookup.) regards, tom lane
Re: [BUG] temporary file usage report with extended protocol and unnamed portals
From
Sami Imseih
Date:
> Sami Imseih <samimseih@gmail.com> writes: > > I think the solution proposed by Frédéric seems reasonable: to switch > > the debug_query_string > > inside PortalDrop. However, I do not like the way the > > debug_query_string changes values > > after the CreatePortal call inside exec_bind_message; that seems incorrect. > > So, I believe we should temporarily switch the debug_query_string > > value only while > > running PortalDrop. Attached is what I think could be safer to do. > > What do you think? > > I don't think this is safe at all. The portal's query string > is potentially shorter-lived than the portal, see in particular > exec_simple_query which just passes a pointer to the original > string in MessageContext. Yes, you are correct. The comments about MessageContect in mmgr/README give me pause. ".... This is kept separate from per-transaction and per-portal contexts because a query string might need to live either a longer or shorter time than any single transaction or portal. """ > I'm frankly inclined to do nothing, but if we must do something, > the way to fix it here would be to transiently set debug_query_string > to NULL so that the actions of PortalDrop aren't blamed on any > particular query. (In my testing, the "temporary file:" message comes > out without any attached STATEMENT most of the time already, so this > isn't losing much as far as that's concerned.) moreover, as I was looking into why the case mentioned earlier ``` SELECT FROM foo ORDER BY a \bind ; SELECT 1 \bind ; ``` does not show STATEMENT after the temp file logging, I realized it's because the temp files are cleaned up and reported at the end of transaction, which means that debug_query_string is NULL at the time the portal is dropped in the next query. This causes check_log_of_query to return false. /* query string available? */ if (debug_query_string == NULL) return false; > Perhaps a cleaner answer is to rearrange things in postgres.c > so that if there's a pre-existing unnamed portal, we drop that > before we ever set debug_query_string and friends at all. That did cross my mind as well, but I was trying to avoid doing this type of rearranging. I still rather not go down that path considering the case mentioned above will still not display the query text in a STATEMENT log. > I'm frankly inclined to do nothing, but if we must do something, > the way to fix it here would be to transiently set debug_query_string > to NULL so that the actions of PortalDrop aren't blamed on any > particular query. I think this is better, because I rather we avoid lines like the below in which there are temp files being reported all with STATEMENT logging of a different query. It's better to just not show STATEMENT at all. ``` 2025-04-19 16:44:38.082 CDT [38963] STATEMENT: SELECT 1 2025-04-19 16:44:38.115 CDT [38963] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp38963.1", size 1073741824 2025-04-19 16:44:38.115 CDT [38963] STATEMENT: SELECT 1 2025-04-19 16:44:38.149 CDT [38963] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp38963.2", size 1073741824 2025-04-19 16:44:38.149 CDT [38963] STATEMENT: SELECT 1 2025-04-19 16:44:38.305 CDT [38963] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp38963.3", size 1073741824 2025-04-19 16:44:38.305 CDT [38963] STATEMENT: SELECT 1 2025-04-19 16:44:38.558 CDT [38963] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp38963.4", size 1073741824 2025-04-19 16:44:38.558 CDT [38963] STATEMENT: SELECT 1 2025-04-19 16:44:38.744 CDT [38963] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp38963.5", size 1073741824 ``` -- Sami Imseih Amazon Web Services (AWS)
Re: [BUG] temporary file usage report with extended protocol and unnamed portals
From
Guillaume Lelarge
Date:
Hi, On 20/04/2025 00:42, Sami Imseih wrote: > [...] >> I'm frankly inclined to do nothing, but if we must do something, >> the way to fix it here would be to transiently set debug_query_string >> to NULL so that the actions of PortalDrop aren't blamed on any >> particular query. > > I think this is better, because I rather we avoid lines like the below in which > there are temp files being reported all with STATEMENT logging > of a different query. It's better to just not show STATEMENT at all. > We should definitely avoid blaming the wrong query. I'd prefer we always log queries using temp files but if this isn't doable easily or without mistakes, then don't log the query at all. -- Guillaume Lelarge Consultant https://dalibo.com
Re: [BUG] temporary file usage report with extended protocol and unnamed portals
From
Frédéric Yhuel
Date:
On 4/20/25 00:42, Sami Imseih wrote: >> (In my testing, the "temporary file:" message comes >> out without any attached STATEMENT most of the time already, so this >> isn't losing much as far as that's concerned.) Indeed, this happens when using autocommit / implicit transactions. But if you disable autocommit, and I think this is the most common setting, we do have the attached statement. > moreover, as I was looking into why the case mentioned earlier > > ``` > SELECT FROM foo ORDER BY a \bind > ; > SELECT 1 \bind > ; > ``` > > does not show STATEMENT after the temp file logging, I realized > it's because the temp files are cleaned up and reported at > the end of transaction, which means that debug_query_string is NULL at the > time the portal is dropped in the next query. This causes > check_log_of_query to return false. > > /* query string available? */ > if (debug_query_string == NULL) > return false; > Yes, we actually have two related bugs. >> Perhaps a cleaner answer is to rearrange things in postgres.c >> so that if there's a pre-existing unnamed portal, we drop that >> before we ever set debug_query_string and friends at all. > That did cross my mind as well, but I was trying to avoid doing this > type of rearranging. I still rather not go down that path considering the > case mentioned above will still not display the query text in a STATEMENT log. IMHO, it would be nice to fix the explicit transactions case. The bug with implicit transactions is less serious. I can try to implement Tom's idea if we have a consensus. Best regards, Frédéric
Re: [BUG] temporary file usage report with extended protocol and unnamed portals
From
Frédéric Yhuel
Date:
On 4/21/25 07:46, Frédéric Yhuel wrote: > I can try to implement Tom's idea if we have a consensus. v3 attached. Would that do?
Attachment
Re: [BUG] temporary file usage report with extended protocol and unnamed portals
From
Frédéric Yhuel
Date:
On 4/21/25 07:46, Frédéric Yhuel wrote: > On 4/20/25 00:42, Sami Imseih wrote: >>> (In my testing, the "temporary file:" message comes >>> out without any attached STATEMENT most of the time already, so this >>> isn't losing much as far as that's concerned.) > > > Indeed, this happens when using autocommit / implicit transactions. Actually, this also happens with Java-style cursors, i.e. using the setFetchSize(<count>) method, which pgJDBC converts to using named portals and EXECUTE <name> <count> protocol messages. The explanation is probably very similar to the one Sami gave for the implicit transaction case. In any case, my v3 patch seems to fix all these cases. (I'm not saying it's good enough to be committed as is. I think I should at least add some comments. Anything else?)
Re: [BUG] temporary file usage report with extended protocol and unnamed portals
From
Sami Imseih
Date:
> In any case, my v3 patch seems to fix all these cases. > > (I'm not saying it's good enough to be committed as is. I think I should > at least add some comments. Anything else?) the patch relies on looking up queryDesc->sourceText inside DropPortal, which Tom raised concerns about earlier in the thread [0] So, It seems to me we are better off just setting debug_query_string to NULL in DropPortal, or alternatively why not just log the statement automatically at the start of execution whenever we have log_temp_files > 0. That will allow us to safely capture the statement to blame for the temp files and will cover all cases? [0] https://www.postgresql.org/message-id/CAA5RZ0ssqRTz_9T0Gy74SiTViiX3V0rSFxc4N_4GNcbEBK9wow%40mail.gmail.com -- Sami Imseih Amazon Web Services (AWS)
Re: [BUG] temporary file usage report with extended protocol and unnamed portals
From
Sami Imseih
Date:
> So, It seems to me we are better off just setting debug_query_string > to NULL in DropPortal, or alternatively why not just log the statement > automatically at the start of execution whenever we have log_temp_files > 0. > That will allow us to safely capture the statement to blame for the > temp files and will cover all cases? of course we only know about the temp file usage after execution, so this means we will probably end up with unnecessary statement logging, so this may not be such a good idea after all. So this tells me not logging STATEMENT along with log_temp_files is a better alternative and a statement can be logged with other existing mechanisms like log_min_duration_statement. -- Sami Imseih Amazon Web Services (AWS)
Re: [BUG] temporary file usage report with extended protocol and unnamed portals
From
Frédéric Yhuel
Date:
On 4/22/25 19:37, Sami Imseih wrote: > the patch relies on looking up queryDesc->sourceText inside DropPortal, > which Tom raised concerns about earlier in the thread [0] Yes, I think I had misunderstood what Tom said. Thank you for pointing that out. However, is it really unsafe? In exec_bind_message, the portal's query string comes from a duplicate of the original string (see CreateCachedPlan). So we are safe in this case. In exec_simple_query, the portal is dropped towards the end of this function, so we are safe here too. Am I missing something?
Re: [BUG] temporary file usage report with extended protocol and unnamed portals
From
Frédéric Yhuel
Date:
On 4/23/25 09:41, Frédéric Yhuel wrote: > > > On 4/22/25 19:37, Sami Imseih wrote: >> the patch relies on looking up queryDesc->sourceText inside DropPortal, >> which Tom raised concerns about earlier in the thread [0] > > Yes, I think I had misunderstood what Tom said. Thank you for pointing > that out. > > However, is it really unsafe? > > In exec_bind_message, the portal's query string comes from a duplicate > of the original string (see CreateCachedPlan). So we are safe in this case. > > In exec_simple_query, the portal is dropped towards the end of this > function, so we are safe here too. > > Am I missing something? Note: the patch doesn't work well with server-side prepared statements: the PREPARE query is blamed instead of the EXECUTE one. But this is maybe something that can be fixed easily.
Re: [BUG] temporary file usage report with extended protocol and unnamed portals
From
Sami Imseih
Date:
> Yes, I think I had misunderstood what Tom said. Thank you for pointing > that out. > > However, is it really unsafe? I have not been able to find a case where this is unsafe, but the documentation in mmgr/README does indicate that the query string may live shorter than the portal in some cases. ".... This is kept separate from per-transaction and per-portal contexts because a query string might need to live either a longer or shorter time than any single transaction or portal. """ Also, another strange behavior of the way portal cleanup occurs is that in extended-query-protocol and within a transaction, ExecutorEnd for the last query is not actually called until the next command. This just seems odd to me especially for extensions that rely on ExecutorEnd. So, Can we do something like this? This drops the portal as soon as execution completes ( the portal is fetched to completion ). This will ensure that there is no delay in ExecutorEnd getting called and in the case of log_temp_files, the message will be logged while debug_query_string is still pointing to the correct query. diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index dc4c600922d..efe0151ca8f 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -2327,6 +2327,9 @@ exec_execute_message(const char *portal_name, long max_rows) /* Send appropriate CommandComplete to client */ EndCommand(&qc, dest, false); + + if (!portal->portalPinned) + PortalDrop(portal, false); } else { ``` postgres=# begin; BEGIN postgres=*# select from foo order by a \bind postgres-*# ; -- (1000000 rows) postgres=*# select 1 \bind postgres-*# ; ?column? ---------- 1 (1 row) postgres=*# commit; COMMIT ``` ``` 2025-04-23 11:11:47.777 CDT [67362] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp67362.0", size 1009983488 2025-04-23 11:11:47.777 CDT [67362] STATEMENT: select from foo order by a ; ``` thoughts? -- Sami Imseih Amazon Web Services (AWS)
Re: [BUG] temporary file usage report with extended protocol and unnamed portals
From
Frédéric Yhuel
Date:
On 4/23/25 18:13, Sami Imseih wrote: > Also, another strange behavior of the way portal cleanup occurs is that > in extended-query-protocol and within a transaction, ExecutorEnd for the > last query is not actually called until the next command. This just seems > odd to me especially for extensions that rely on ExecutorEnd. > > So, Can we do something like this? This drops the portal as soon as > execution completes ( the portal is fetched to completion ). This will > ensure that there is no delay in ExecutorEnd getting called and in the > case of log_temp_files, the message will be logged while debug_query_string > is still pointing to the correct query. > > > diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c > index dc4c600922d..efe0151ca8f 100644 > --- a/src/backend/tcop/postgres.c > +++ b/src/backend/tcop/postgres.c > @@ -2327,6 +2327,9 @@ exec_execute_message(const char *portal_name, > long max_rows) > > /* Send appropriate CommandComplete to client */ > EndCommand(&qc, dest, false); > + > + if (!portal->portalPinned) > + PortalDrop(portal, false); > } > else > { I don't know if it is the correct solution, but it seems good to me (FWIW), and I've tested it and it works well in all the following cases: * Java program: extended protocol used for the two queries (the one that use the temp file and the SELECT 1). * Python program: the SELECT 1 is using the simple protocol. * SQL PREPARE / EXECUTE * Another version of the Java program using the setFetchSize() method (and named portals).
Re: [BUG] temporary file usage report with extended protocol and unnamed portals
From
Sami Imseih
Date:
Thanks for testing. I also tested it a bit more today with other patterns like different fetch sizes, named portal, etc. and I can't find an issue with this, but I could be missing something. I will go ahead and attach this change in patch form. -- Sami Imseih Amazon Web Services (AWS)
Attachment
Re: [BUG] temporary file usage report with extended protocol and unnamed portals
From
Sami Imseih
Date:
I found several issues with v4. It does not deal correctly with pipelining, and we should only really be concerned with dropping an unnamed portal only. So, v5 now moves the DropPortal call after the unnamed portal was executed to completion ( as v4 was doing ), but does so only in the case in which we are not inside a transaction-control statement or the portal was executing a command that can be run inside a transaction block. Also, I realize that explicit cursors ( DECLARE CURSOR ) will only log temp file at cursor close and in which case, the statement associated with the temp file logging is the CLOSE command: i.e. ``` 2025-04-26 18:46:38.084 UTC [10415] LOG: temporary file: path "base/pgsql_tmp/pgsql_tmp10415.0", size 1014030336 2025-04-26 18:46:38.084 UTC [10415] STATEMENT: close mycursor_1; ``` I don't think there is much we can do there, or should we. -- Sami Imseih