BUG #16624: Query Optimizer - Performance bug related to predicate simplification - Mailing list pgsql-bugs
From | PG Bug reporting form |
---|---|
Subject | BUG #16624: Query Optimizer - Performance bug related to predicate simplification |
Date | |
Msg-id | 16624-1375434d97659203@postgresql.org Whole thread Raw |
Responses |
Re: BUG #16624: Query Optimizer - Performance bug related to predicate simplification
|
List | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 16624 Logged by: XINYU LIU Email address: XINYULIU@UMICH.EDU PostgreSQL version: 13rc1 Operating system: Ubuntu 20.04 Description: Hello, We are developing a tool for automatically finding performance bugs in PostgreSQL. Our key insight is that given a pair of semantic equivalent queries, a robust DBMS should return the same result within a similar execution time. Significant time difference suggests a potential performance bug in the DBMS. We are sharing a pair of TPC-H queries that exhibit a potential performance bug in this report: First query: SELECT "ps_suppkey" FROM "partsupp" WHERE "ps_partkey" = 1486; Second query: SELECT "ps_suppkey" FROM "partsupp" WHERE "ps_partkey" + 1486 = 2972; [Actual Behavior] We executed both queries on the TPC-H benchmark of scale factor 5: the first query takes only 1.059 millisecond, while the second query takes 247.176 millisecond. We think the time difference results from different plans selected. [Query Execution Plan] * First query: QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------- Index Only Scan using partsupp_pkey on partsupp (cost=0.43..4.59 rows=9 width=4) (actual time=0.692..0.694 rows=4 loops=1) Index Cond: (ps_partkey = 1486) Heap Fetches: 0 Planning Time: 4.748 ms Execution Time: 1.059 ms (5 rows) * Second query: QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------- Gather (cost=1000.43..91865.94 rows=19994 width=4) (actual time=2.032..246.821 rows=4 loops=1) Workers Planned: 2 Workers Launched: 2 -> Parallel Index Only Scan using partsupp_pkey on partsupp (cost=0.43..88866.54 rows=8331 width=4) (actual time=159.371..240.012 rows=1 loops=3) Filter: ((ps_partkey + 1486) = 2972) Rows Removed by Filter: 1333332 Heap Fetches: 0 Planning Time: 4.556 ms Execution Time: 247.176 ms (9 rows) [Expected Behavior] I would have expected the DBMS to run these two queries with similar execution time, given that they both have the same semantics. Notably, the execution time difference between these two queries will grow significantly when the size of the database grows. On the TPC-H benchmark of scale factor 100, the first query takes 1.9 millisecond, while the second query takes 83 seconds. [Test Environment] Ubuntu 20.04 machine "Linux panda 5.4.0-40-generic #44-Ubuntu SMP Tue Jun 23 00:01:04 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux" PostgreSQL v13 beta3 Database: TPC-H benchmark (with scale factor 5) [Steps for reproducing our observations] * Download the dataset from the link: https://drive.google.com/file/d/13rFa1BNDi4e2RmXBn-yEQkcqt6lsBu1c/view?usp=sharing * Set up TPC-H benchmark tar xzvf tpch5_postgresql.tar.gz cd tpch5_postgresql db=tpch5 createdb $db psql -d $db < dss.ddl for i in `ls *.tbl` do echo $i name=`echo $i|cut -d'.' -f1` psql -d $db -c "COPY $name FROM '`pwd`/$i' DELIMITER '|' ENCODING 'LATIN1';" done psql -d $db < dss_postgres.ri * Execute the queries
pgsql-bugs by date: