open/lseek overhead with many partitions (including with "fasterpartitioning pruning") - Mailing list pgsql-hackers
From | Justin Pryzby |
---|---|
Subject | open/lseek overhead with many partitions (including with "fasterpartitioning pruning") |
Date | |
Msg-id | 20180403194613.GY28454@telsasoft.com Whole thread Raw |
Responses |
Re: open/lseek overhead with many partitions (including with "fasterpartitioning pruning")
Re: open/lseek overhead with many partitions (including with "fasterpartitioning pruning") |
List | pgsql-hackers |
TLDR: even with "faster pruning" patch, pg11dev open()+lseek() every file backing every table being queried, even for those partitions eventually "excluded". One of our customers (with the largest number of child tables, currently of relkind='r') takes a long time to plan query, and I wondered whether we'd benefit from ongoing work, if I should plan to convert to relkind='p', and if I could start testing that for our use cases. I found by stracing that the backend is open()ing + lseek(SEEK_END) each file in each relation being UNIONed in this view, including relations which were excluded by constraints. Their tables are partitioned by day (some are still 40+GB), so there's only 10-20 tables being included in the above query, total size perhaps ~50GB and under 50 backing files, but accessing a LOT more files than needed: cold backend: [pryzbyj@database ~]$ grep open /tmp/strace-pg12-0|cut -d/ -f1,2 |sort |uniq -c |sort -nr |head 26593 open("pg_tblspc/16400 6894 open("pg_tblspc/16401 482 open("base/16402 warm backend: [pryzbyj@database ~]$ grep open /tmp/strace-pg12-1|cut -d/ -f1,2 |sort |uniq -c |sort -nr |head 6545 open("pg_tblspc/16401 1338 open("pg_tblspc/16400 254 open("base/16402 I was curious if that was improved in pg11dev, and if it applied to relkind='r' or only to relkind='p'. I applied v47 patches for "path toward faster partition pruning", and was able to quickly create synthetic "huge table" using truncate (and verified that smgr logic actually opens all 1e5 files). $ psql -d postgres -h /tmp --port 5678 -c "CREATE TABLE t(i int) PARTITION BY range(i)" $ for a in `seq 1 999`; do psql -d postgres -h /tmp --port 5678 -c "CREATE TABLE t$a PARTITION OF t FOR VALUES FROM ($a)TO($((1+a)))";done $ time for a in ./test11dev/base/13236/1???[0-9]; do echo "$a..."; truncate -s 1G $a; done # make existing, empty files 1GB.. $ time for a in ./test11dev/base/13236/1???[0-9]; do echo "$a..."; for b in `seq 1 99`; do truncate -s 1G $a.$b; done; done# make new 1GB tables with .n extension postgres=# explain SELECT COUNT(1) FROM t WHERE i=-99 LIMIT 1; Limit (cost=0.00..0.01 rows=1 width=8) -> Aggregate (cost=0.00..0.01 rows=1 width=8) -> Result (cost=0.00..0.00 rows=0 width=0) One-Time Filter: false Time: 63268.474 ms (01:03.268) => zfs/fuse is certainly exacerbating the issue here due to context switches; but.. [pryzbyj@database postgresql]$ time cut -d\( -f1 /tmp/strace-pg11dev-faster-15|sort |uniq -c |sort -nr |head 100899 open 99900 lseek 98918 close 118 brk postgres is still open()+lseek() on 100k files, most of which are for relations which were clearly excluded: (gdb) bt #0 0x00000032b2adb580 in __open_nocancel () from /lib64/libc.so.6 #1 0x0000000000711827 in BasicOpenFilePerm (fileName=0x1e2c9d0 "base/13236/16759.10", fileFlags=2, fileMode=384) at fd.c:965 #2 0x00000000007121ea in PathNameOpenFilePerm (fileName=0x1e2c9d0 "base/13236/16759.10", fileFlags=2, fileMode=384) at fd.c:1394 #3 0x0000000000733f6e in _mdfd_openseg (reln=0x1e14d98, forknum=MAIN_FORKNUM, segno=10, oflags=0) at md.c:1783 #4 0x0000000000734026 in mdnblocks (reln=0x1e14d98, forknum=MAIN_FORKNUM) at md.c:918 #5 0x00000000006b26bc in estimate_rel_size (rel=0x7f3f707f4b98, attr_widths=0x1e2cd24, pages=<value optimized out>, tuples=0x1e2cb98,allvisfrac=<value optimized out>) at plancat.c:946 #6 0x00000000006b3a47 in get_relation_info (root=0x1bd9030, relationObjectId=16759, inhparent=false, rel=0x1e2cae0) at plancat.c:144 #7 0x00000000006b78fa in build_simple_rel (root=0x1bd9030, relid=126, parent=0x1bda578) at relnode.c:185 #8 0x00000000006b7990 in build_simple_rel (root=0x1bd9030, relid=1, parent=0x0) at relnode.c:251 #9 0x0000000000691be3 in add_base_rels_to_query (root=0x1bd9030, jtnode=<value optimized out>) at initsplan.c:121 #10 0x00000000006924af in query_planner (root=<value optimized out>, tlist=0x1d34fd8, qp_callback=0x693940 <standard_qp_callback>,qp_extra=0x7ffe0a077290) at planmain.c:138 #11 0x000000000069777e in grouping_planner (root=<value optimized out>, inheritance_update=false, tuple_fraction=<value optimizedout>) at planner.c:1892 #12 0x0000000000698ef7 in subquery_planner (glob=<value optimized out>, parse=<value optimized out>, parent_root=<value optimizedout>, hasRecursion=<value optimized out>, tuple_fraction=0) at planner.c:966 #13 0x0000000000699d97 in standard_planner (parse=0x1bd9778, cursorOptions=256, boundParams=<value optimized out>) at planner.c:405 #14 0x0000000000739d2a in pg_plan_query (querytree=<value optimized out>, cursorOptions=<value optimized out>, boundParams=<valueoptimized out>) at postgres.c:808 #15 0x00000000005a0bd6 in ExplainOneQuery (query=0x1bd9778, cursorOptions=<value optimized out>, into=0x0, es=0x1bd8f58,queryString=0x1ae0fb0 "explain SELECT COUNT(1) FROM t WHERE i=-99 LIMIT 1;", params=0x0, queryEnv=0x0) at explain.c:365 #16 0x00000000005a0e5d in ExplainQuery (pstate=0x1bd8d80, stmt=0x1ae1fa0, queryString=0x1ae0fb0 "explain SELECT COUNT(1)FROM t WHERE i=-99 LIMIT 1;", params=0x0, queryEnv=0x0, dest=0x1bd8e90) at explain.c:254 #17 0x00000000007408f2 in standard_ProcessUtility (pstmt=0x1ae2050, queryString=0x1ae0fb0 "explain SELECT COUNT(1) FROM tWHERE i=-99 LIMIT 1;", context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x1bd8e90, completionTag=0x7ffe0a077720 "") at utility.c:670 #18 0x000000000073c789 in PortalRunUtility (portal=0x1b46780, pstmt=0x1ae2050, isTopLevel=<value optimized out>, setHoldSnapshot=<valueoptimized out>, dest=0x1bd8e90, completionTag=<value optimized out>) at pquery.c:1178 #19 0x000000000073d9aa in FillPortalStore (portal=0x1b46780, isTopLevel=true) at pquery.c:1038 #20 0x000000000073de80 in PortalRun (portal=0x1b46780, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x1ae2618,altdest=0x1ae2618, completionTag=0x7ffe0a077920 "") at pquery.c:768 #21 0x000000000073a39d in exec_simple_query (query_string=0x1ae0fb0 "explain SELECT COUNT(1) FROM t WHERE i=-99 LIMIT 1;")at postgres.c:1121 #22 0x000000000073b341 in PostgresMain (argc=<value optimized out>, argv=<value optimized out>, dbname=0x1b0ac80 "postgres",username=<value optimized out>) at postgres.c:4149 #23 0x00000000006cd4f3 in BackendRun (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:4409 #24 BackendStartup (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:4081 #25 ServerLoop (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1754 #26 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1362 #27 0x00000000006477d0 in main (argc=7, argv=0x1adba20) at main.c:228 I haven't been following the development patch, so don't know if that's expected to be improved, or a possible future improvement, or ... Also, I wondered if it'd be preferable for mdnblocks/estimate_rel_size to l/stat() rather than open+lseek, which can require opening large number of rarely-queried files and mess up fd.c LRU cache. Justin
pgsql-hackers by date: