BUG #18909: Query creates millions of temporary files and stalls

From:      PG Bug reporting form
Subject:   BUG #18909: Query creates millions of temporary files and stalls
Msg-id:    18909-e5e1b702c9441b8a@postgresql.org
List:      pgsql-bugs
Responses: Re: BUG #18909: Query creates millions of temporary files and stalls
The following bug has been logged on the website:

Bug reference:      18909
Logged by:          Sergey Koposov
Email address:      skoposov@ed.ac.uk
PostgreSQL version: 15.12
Operating system:   Linux, Debian 10

Description:

Hi,

I have a query:

    select phot_g_mean_mag, phot_bp_mean_mag, phot_rp_mean_mag,
           g.source_id, pmra, pmdec, parallax, parallax_error,
           d.ra, d.dec, d.ebv, d.gmeanpsfmag, rmeanpsfmag, imeanpsfmag,
           gkronmag, rkronmag
    from disk_sample1 as d
    left join gaia_edr3_aux.panstarrs1bestneighbour as gaps1
        on (gaps1.original_ext_source_id = d.objid)
    left join gaia_edr3.gaia_source as g
        on (g.source_id = gaps1.source_id)

that I ran from a Python client. After it didn't finish in about an hour, I started to investigate and found that the pgsql_tmp folder contained 15 million temporary files from that query.

If I try to EXPLAIN it in psql, it has this plan:

 ------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=14443102.62..111487693.90 rows=65470868 width=100)
   Workers Planned: 4
   ->  Nested Loop Left Join  (cost=14442102.62..104939607.10 rows=16367717 width=100)
         ->  Parallel Hash Left Join  (cost=14442102.04..22124798.60 rows=16367717 width=60)
               Hash Cond: (d.objid = gaps1.original_ext_source_id)
               ->  Parallel Seq Scan on disk_sample1 d  (cost=0.00..1218371.17 rows=16367717 width=60)
               ->  Parallel Hash  (cost=10307380.24..10307380.24 rows=237862624 width=16)
                     ->  Parallel Seq Scan on panstarrs1bestneighbour gaps1  (cost=0.00..10307380.24 rows=237862624 width=16)
         ->  Index Scan using gaia_sourcex_source_id_idx on gaia_source g  (cost=0.58..5.05 rows=1 width=48)
               Index Cond: (source_id = gaps1.source_id)
 JIT:
   Functions: 14
   Options: Inlining true, Optimization true, Expressions true, Deforming true
 (13 rows)

And if I just do a CREATE TABLE:

    create table disk_sample1_gaia as
    select phot_g_mean_mag, phot_bp_mean_mag, phot_rp_mean_mag,
           g.source_id, pmra, pmdec, parallax, parallax_error,
           d.ra, d.dec, d.ebv, d.gmeanpsfmag, rmeanpsfmag, imeanpsfmag,
           gkronmag, rkronmag
    from disk_sample1 as d
    left join gaia_edr3_aux.panstarrs1bestneighbour as gaps1
        on (gaps1.original_ext_source_id = d.objid)
    left join gaia_edr3.gaia_source as g
        on (g.source_id = gaps1.source_id);

that command finishes fine without creating many temporary files. The resulting table has 65 million rows. However, when I ran the query from Python again, I very quickly saw PostgreSQL creating 300k+ temporary files, at which point I interrupted the query.
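(As a side note, besides watching base/pgsql_tmp directly, the growth should also show up in the standard cumulative per-database counters; they only update as temporary files are released, so they can lag a still-running query:)

    SELECT datname, temp_files, temp_bytes
    FROM pg_stat_database
    WHERE datname = current_database();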
The way the query is run from Python is via https://github.com/segasai/sqlutilpy, which basically does this (using psycopg):

1) creates a named cursor
2) sets cursor_tuple_fraction TO 1
3) FETCHes the result in 100,000-row chunks
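In SQL terms, the server ends up executing roughly the following (a sketch reconstructed from the trace below; the cursor name is the one visible in the trace, and the exact DECLARE options psycopg adds, e.g. NO SCROLL / WITHOUT HOLD, may differ):

    BEGIN;
    SET cursor_tuple_fraction TO 1;
    -- the named cursor created by the driver
    DECLARE "sqlutilcursor" CURSOR FOR
        select phot_g_mean_mag, phot_bp_mean_mag, phot_rp_mean_mag,
               g.source_id, pmra, pmdec, parallax, parallax_error,
               d.ra, d.dec, d.ebv, d.gmeanpsfmag, rmeanpsfmag, imeanpsfmag,
               gkronmag, rkronmag
        from disk_sample1 as d
        left join gaia_edr3_aux.panstarrs1bestneighbour as gaps1
            on (gaps1.original_ext_source_id = d.objid)
        left join gaia_edr3.gaia_source as g
            on (g.source_id = gaps1.source_id);
    -- repeated until no more rows come back
    FETCH FORWARD 100000 FROM "sqlutilcursor";
    CLOSE "sqlutilcursor";
    COMMIT;

Here's the gdb trace of the postgres backend from the situation where it tries to create millions of tmp files:

**********
#0  0x00007fcd4ce6cc8b in __libc_open64 (file=file@entry=0x7ffcce70e3e0 "base/pgsql_tmp/pgsql_tmp78857.224669", oflag=oflag@entry=578) at ../sysdeps/unix/sysv/linux/open64.c:48
#1  0x000056430b609eac in BasicOpenFilePerm (fileName=0x7ffcce70e3e0 "base/pgsql_tmp/pgsql_tmp78857.224669", fileFlags=578, fileMode=384) at fd.c:1125
#2  0x000056430b60a2f3 in PathNameOpenFilePerm (fileName=0x7ffcce70e3e0 "base/pgsql_tmp/pgsql_tmp78857.224669", fileFlags=578, fileMode=384) at fd.c:1603
#3  0x000056430b60a4fa in PathNameOpenFile (fileFlags=578, fileName=0x7ffcce70e3e0 "base/pgsql_tmp/pgsql_tmp78857.224669") at fd.c:1806
#4  OpenTemporaryFileInTablespace (tblspcOid=<optimized out>, rejectError=<optimized out>) at fd.c:1806
#5  0x000056430b60a6d9 in OpenTemporaryFile (interXact=interXact@entry=false) at fd.c:1743
#6  0x000056430b6082c4 in BufFileCreateTemp (interXact=interXact@entry=false) at buffile.c:204
#7  0x000056430b4ccaeb in ExecHashJoinSaveTuple (tuple=tuple@entry=0x56430c243cd8, hashvalue=<optimized out>, hashvalue@entry=1412196535, fileptr=0x7fc500720d98) at nodeHashjoin.c:1248
#8  0x000056430b4c9b3b in ExecHashTableInsert (hashtable=hashtable@entry=0x56430c2434a8, slot=slot@entry=0x56430c208178, hashvalue=hashvalue@entry=1412196535) at nodeHash.c:1714
#9  0x000056430b4ca2fc in MultiExecPrivateHash (node=<optimized out>) at nodeHash.c:188
#10 MultiExecHash (node=node@entry=0x56430c10e098) at nodeHash.c:115
#11 0x000056430b4b213e in MultiExecProcNode (node=node@entry=0x56430c10e098) at execProcnode.c:520
#12 0x000056430b4cb5dc in ExecHashJoinImpl (parallel=false, pstate=0x56430c10ca68) at nodeHashjoin.c:297
#13 ExecHashJoin (pstate=0x56430c10ca68) at nodeHashjoin.c:621
#14 0x000056430b4aa92a in ExecProcNode (node=0x56430c10ca68) at ../../../src/include/executor/executor.h:262
#15 ExecutePlan (queryDesc=queryDesc@entry=0x56430c121928, operation=operation@entry=CMD_SELECT, sendTuples=sendTuples@entry=true, numberTuples=numberTuples@entry=100000, direction=direction@entry=ForwardScanDirection, dest=dest@entry=0x56430bfb70b8) at execMain.c:1640
#16 0x000056430b4ab03b in standard_ExecutorRun (queryDesc=0x56430c121928, direction=ForwardScanDirection, count=100000, execute_once=<optimized out>) at execMain.c:362
#17 0x00007fcd4d2556d5 in explain_ExecutorRun (queryDesc=0x56430c121928, direction=ForwardScanDirection, count=100000, execute_once=<optimized out>) at auto_explain.c:322
#18 0x000056430b641989 in PortalRunSelect (portal=0x56430c062378, forward=<optimized out>, count=100000, dest=<optimized out>) at pquery.c:922
#19 0x000056430b643356 in PortalRunFetch (portal=0x56430c062378, fdirection=FETCH_FORWARD, count=100000, dest=dest@entry=0x56430bfb70b8) at pquery.c:1425
#20 0x000056430b45ad19 in PerformPortalFetch (stmt=stmt@entry=0x56430bf5e710, dest=dest@entry=0x56430bfb70b8, qc=qc@entry=0x7ffcce70ee00) at portalcmds.c:198
#21 0x000056430b644462 in standard_ProcessUtility (pstmt=0x56430bf5ea30, queryString=0x56430bf5dd08 "FETCH FORWARD 100000 FROM \"sqlutilcursor\"", readOnlyTree=<optimized out>, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x56430bfb70b8, qc=0x7ffcce70ee00) at utility.c:710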
#22 0x000056430b641e90 in PortalRunUtility (portal=portal@entry=0x56430c062268, pstmt=0x56430bf5ea30, isTopLevel=isTopLevel@entry=true, setHoldSnapshot=setHoldSnapshot@entry=true, dest=dest@entry=0x56430bfb70b8, qc=qc@entry=0x7ffcce70ee00) at pquery.c:1156
#23 0x000056430b642517 in FillPortalStore (portal=portal@entry=0x56430c062268, isTopLevel=isTopLevel@entry=true) at ../../../src/include/nodes/pg_list.h:260
#24 0x000056430b64318d in PortalRun (portal=portal@entry=0x56430c062268, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x56430bf5eb10, altdest=altdest@entry=0x56430bf5eb10, qc=0x7ffcce70eff0) at pquery.c:761
#25 0x000056430b63ee91 in exec_simple_query (query_string=0x56430bf5dd08 "FETCH FORWARD 100000 FROM \"sqlutilcursor\"") at postgres.c:1254
#26 0x000056430b63fbd5 in PostgresMain (dbname=<optimized out>, username=<optimized out>) at postgres.c:4691
#27 0x000056430b5ac569 in BackendRun (port=<optimized out>, port=<optimized out>) at postmaster.c:4515
#28 BackendStartup (port=<optimized out>) at postmaster.c:4243
#29 ServerLoop () at postmaster.c:1811
#30 0x000056430b5ad627 in PostmasterMain (argc=3, argv=0x56430bf574f0) at postmaster.c:1483
#31 0x000056430b2f2d84 in main (argc=3, argv=0x56430bf574f0) at main.c:204
*************

Interestingly, when I ran the same query with JIT disabled (i.e. set jit_above_cost = -1 before the query), it finished fine without creating millions of tmp files.

I also tried running the query directly in psql. (I didn't let it finish, since it returns 60+ million rows.) But I think it works fine; I don't see an explosion in the number of tmp files. So I think the fact that a named cursor was used is important.

I'm happy to provide more info if I can / if needed.

Thanks,
Sergey