Thread: small temp files
What would be the cause of small temp files? For example:

temporary file: path "base/pgsql_tmp/pgsql_tmp1261242.83979", size 7452

work_mem is set to 128MB

--
Scott Ribe
scott_ribe@elevated-dev.com
https://www.linkedin.com/in/scottribe/
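For context, log lines like the one above are emitted when the log_temp_files setting is enabled; a minimal sketch of turning it on (the threshold shown is illustrative, 0 simply logs every temp file):

ALTER SYSTEM SET log_temp_files = 0;   -- size threshold in kB; 0 = log all temp files
SELECT pg_reload_conf();               -- takes effect without a restart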
On 22.07.24 at 14:50, Scott Ribe wrote:
> What would be the cause of small temp files? For example:
>
> temporary file: path "base/pgsql_tmp/pgsql_tmp1261242.83979", size 7452
>
> work_mem is set to 128MB

Typically, queries which need a lot of memory (RAM) create temp files if work_mem isn't sufficient for some sorting or hash algorithms.

Increasing work_mem will help, but small temp files don't create any trouble.

You can set work_mem within each session; don't set it high globally.

Regards,
Holger

--
Holger Jakobs, Bergisch Gladbach
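A minimal sketch of the per-session override Holger describes (the value is illustrative, not a recommendation):

SET work_mem = '256MB';            -- applies only to the current session
-- or scope it even more narrowly, to a single transaction:
BEGIN;
SET LOCAL work_mem = '256MB';
-- ... run the memory-hungry query here ...
COMMIT;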
> On Jul 22, 2024, at 7:00 AM, Holger Jakobs <holger@jakobs.com> wrote:
>
> Typically, queries which need a lot of memory (RAM) create temp files if work_mem isn't sufficient for some sorting or hash algorithms.
>
> Increasing work_mem will help, but small temp files don't create any trouble.
>
> You can set work_mem within each session; don't set it high globally.

I understand those things--my question is why, with work_mem set to 128MB, I would see tiny temp files (7452 is common, as is 102, and I've seen as small as 51).
On 22/07/2024 14:28, Scott Ribe wrote:
> I understand those things--my question is why, with work_mem set to 128MB, I would see tiny temp files (7452 is common, as is 102, and I've seen as small as 51).

From the manual:

"Note that a complex query might perform several sort and hash operations at the same time, with each operation generally being allowed to use as much memory as this value specifies before it starts to write data into temporary files. Also, several running sessions could be doing such operations concurrently. Therefore, the total memory used could be many times the value of work_mem; it is necessary to keep this fact in mind when choosing the value. Sort operations are used for ORDER BY, DISTINCT, and merge joins. Hash tables are used in hash joins, hash-based aggregation, memoize nodes and hash-based processing of IN subqueries."

So, if it's doing lots of joins, there may be lots of bits of temporary data which together add up to more than work_mem. AIUI PostgreSQL doesn't necessarily shove all the temporary data for one query into one file; it may have multiple smaller files for the data for a single query.
On Monday, July 22, 2024, Scott Ribe <scott_ribe@elevated-dev.com> wrote:
> > On Jul 22, 2024, at 7:00 AM, Holger Jakobs <holger@jakobs.com> wrote:
> >
> > Typically, queries which need a lot of memory (RAM) create temp files if work_mem isn't sufficient for some sorting or hash algorithms.
> >
> > Increasing work_mem will help, but small temp files don't create any trouble.
> >
> > You can set work_mem within each session; don't set it high globally.
>
> I understand those things--my question is why, with work_mem set to 128MB, I would see tiny temp files (7452 is common, as is 102, and I've seen as small as 51).
You expect the smallest temporary file to be 128MB? I.e., if the memory used exceeds work_mem, all of it gets put into the temp file at that point? Versus only the amount of data that exceeds work_mem getting pushed out to the temporary file. The overflow-only design seems much more reasonable: why write to disk that which fits, and already exists, in memory?
David J.
> ...with each operation generally being allowed to use as much memory as this value specifies before it starts to write data into temporary files.

So, that doesn't explain the 7452-byte files. Unless an operation can use a temporary file as an addendum to work_mem, instead of spilling the RAM contents to disk as is my understanding.

> So, if it's doing lots of joins, there may be lots of bits of temporary data which together add up to more than work_mem.

If it's doing lots of joins, each will get work_mem--there is no "adding up" among operations using work_mem.

> You expect the smallest temporary file to be 128MB? I.e., if the memory used exceeds work_mem, all of it gets put into the temp file at that point? Versus only the amount of data that exceeds work_mem getting pushed out to the temporary file. The overflow-only design seems much more reasonable: why write to disk that which fits, and already exists, in memory?

Well, I don't know of an algorithm which can effectively sort 128MB + 7KB of data using 128MB of RAM and a 7KB file. Same for many of the other operations which use work_mem, so yes, I expected spill-over to start with a 128MB file and grow it as needed. If I'm wrong and there are operations which can effectively use temp files as an adjunct, then that would be the answer to my question. Does anybody know for sure that this is the case?
Scott Ribe <scott_ribe@elevated-dev.com> writes:
>> You expect the smallest temporary file to be 128MB? I.e., if the memory used exceeds work_mem, all of it gets put into the temp file at that point? Versus only the amount of data that exceeds work_mem getting pushed out to the temporary file. The overflow-only design seems much more reasonable: why write to disk that which fits, and already exists, in memory?

> Well, I don't know of an algorithm which can effectively sort 128MB + 7KB of data using 128MB of RAM and a 7KB file. Same for many of the other operations which use work_mem, so yes, I expected spill-over to start with a 128MB file and grow it as needed. If I'm wrong and there are operations which can effectively use temp files as an adjunct, then that would be the answer to my question. Does anybody know for sure that this is the case?

You would get more specific answers if you provided an example of the queries that cause this, with EXPLAIN ANALYZE output. But I think a likely bet is that it's doing a hash join that overruns work_mem. What will happen is that the join gets divided into batches based on hash codes, and each batch gets dumped into its own temp files (one per batch for each side of the join). It would not be too surprising if some of the batches are small, thanks to the vagaries of hash values. Certainly they could be less than work_mem, since the only thing we can say for sure is that the sum of the temp file sizes for the inner side of the join should exceed work_mem.

regards, tom lane
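For anyone who wants to confirm this on their own workload, a rough sketch of what the batching looks like in EXPLAIN (ANALYZE, BUFFERS) output; the query, table names, and numbers here are illustrative, not taken from this thread:

EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM fact f
JOIN dim d ON d.id = f.dim_id;

--   Hash Join  (actual time=... rows=... loops=1)
--     Hash Cond: (f.dim_id = d.id)
--     ->  Seq Scan on fact f ...
--     ->  Hash  (actual time=... rows=... loops=1)
--           Buckets: ...  Batches: 8 (originally 1)  Memory Usage: ...kB
--           ->  Seq Scan on dim d ...
--
-- A "Batches" count greater than 1 means the inner side overflowed work_mem
-- and was split into per-batch temp files (one per batch for each side of the
-- join); their sizes follow the hash-value distribution, so some can be tiny.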
> On Jul 22, 2024, at 8:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> You would get more specific answers if you provided an example of the
> queries that cause this, with EXPLAIN ANALYZE output. But I think a
> likely bet is that it's doing a hash join that overruns work_mem.
> What will happen is that the join gets divided into batches based on
> hash codes, and each batch gets dumped into its own temp files (one
> per batch for each side of the join). It would not be too surprising
> if some of the batches are small, thanks to the vagaries of hash
> values. Certainly they could be less than work_mem, since the only
> thing we can say for sure is that the sum of the temp file sizes for
> the inner side of the join should exceed work_mem.

OK, that makes total sense, and fits our usage patterns. (Lots of complex queries, lots of hash joins.)

Thanks.