BUG #5543: Poor performance - Index scan backwards not used for order by desc with partitioned tables - Mailing list pgsql-bugs
From | Ranga Gopalan |
---|---|
Subject | BUG #5543: Poor performance - Index scan backwards not used for order by desc with partitioned tables |
Date | |
Msg-id | 201007061820.o66IK6Xd099664@wwwmaster.postgresql.org Whole thread Raw |
Responses |
Re: BUG #5543: Poor performance - Index scan backwards not
used for order by desc with partitioned tables
|
List | pgsql-bugs |
The following bug has been logged online: Bug reference: 5543 Logged by: Ranga Gopalan Email address: ranga_gopalan@hotmail.com PostgreSQL version: 8.4.4 Operating system: Linux x86-64 Description: Poor performance - Index scan backwards not used for order by desc with partitioned tables Details: My problem is regarding ORDER BY / LIMIT query behavior when using partitioning. I have a large table (about 100 columns, several million rows) partitioned by a column called day (which is the date stored as yyyymmdd - say 20100502 for May 2nd 2010 etc.). Say the main table is called FACT_TABLE and each child table is called FACT_TABLE_yyyymmdd (e.g. FACT_TABLE_20100502, FACT_TABLE_20100503 etc.) and has an appropriate CHECK constraint created on it to CHECK (day = yyyymmdd). The query pattern I am looking at is (I have tried to simplify the column names for readability): SELECT F1 from FACT_TABLE where day >= 20100502 and day <= 20100507 # selecting for a week ORDER BY F2 desc LIMIT 100 This is what is happening: When I query from the specific day's (child) table, I get what I expect - a descending Index scan and good performance. # explain select F1 from FACT_TABLE_20100502 where day = 20100502 order by F2 desc limit 100; QUERY PLAN ---------------------------------------------------------------------------- -------------------------------------------------------------------- -- Limit (cost=0.00..4.81 rows=100 width=41) -> Index Scan Backward using F2_20100502 on FACT_TABLE_20100502 (cost=0.00..90355.89 rows=1876985 width=41 ) Filter: (day = 20100502) BUT: When I do the same query against the parent table it is much slower - two things seem to happen - one is that the descending scan of the index is not done and secondly there seems to be a separate sort/limit at the end - i.e. all data from all partitions is retrieved and then sorted and limited - This seems to be much less efficient than doing a descending scan on each partition and limiting the results and then combining and reapplying the limit at the end. explain select F1 from FACT_TABLE where day = 20100502 order by F2 desc limit 100; QUERY PLAN ---------------------------------------------------------------------------- -------------------------------------------------------------------- --- Limit (cost=20000084948.01..20000084948.01 rows=100 width=41) -> Sort (cost=20000084948.01..20000084994.93 rows=1876986 width=41) Sort Key: public.FACT_TABLE.F2 -> Result (cost=10000000000.00..20000084230.64 rows=1876986 width=41) -> Append (cost=10000000000.00..20000084230.64 rows=1876986 width=41) -> Seq Scan on FACT_TABLE (cost=10000000000.00..10000000010.02 rows=1 width=186) Filter: (day = 20100502) -> Seq Scan on FACT_TABLE_20100502 FACT_TABLE (cost=10000000000.00..10000084220.62 rows=1876985 width=4 1) Filter: (day = 20100502) (9 rows)
pgsql-bugs by date: