Thread: Partitions and the optimizer.
Optimizer doesn't behave well when the tables are partitioned. Here is the problem: news=# explain select max("document#") from moreover_documents_y2010m08; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------ Result (cost=0.15..0.16 rows=1 width=0) InitPlan 1 (returns $0) -> Limit (cost=0.00..0.15 rows=1 width=8) -> Index Scan Backward using pk_moreover_documents_y2010m08 on moreover_documents_y2010m08 (cost=0.00..2169871.61 rows=14615132 width=8) Filter: ("document#" IS NOT NULL) (5 rows) Time: 31.191 ms news=# explain select max("document#") from moreover_documents; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------- Aggregate (cost=4227019.40..4227019.41 rows=1 width=8) -> Append (cost=0.00..4145103.32 rows=32766432 width=8) -> Seq Scan on moreover_documents (cost=0.00..10.20 rows=20 width=8) -> Seq Scan on moreover_documents_y2010m06 moreover_documents (cost=0.00..236523.53 rows=1856853 width=8) -> Seq Scan on moreover_documents_y2010m07 moreover_documents (cost=0.00..2073778.27 rows=16294327 width=8) -> Seq Scan on moreover_documents_y2010m08 moreover_documents (cost=0.00..1834740.32 rows=14615132 width=8) -> Seq Scan on moreover_documents_y2010m09 moreover_documents (cost=0.00..10.20 rows=20 width=8) -> Seq Scan on moreover_documents_y2010m10 moreover_documents (cost=0.00..10.20 rows=20 width=8) -> Seq Scan on moreover_documents_y2010m11 moreover_documents (cost=0.00..10.20 rows=20 width=8) -> Seq Scan on moreover_documents_y2010m12 moreover_documents (cost=0.00..10.20 rows=20 width=8) -> Seq Scan on moreover_documents_y2011m01 moreover_documents (cost=0.00..10.20 rows=20 width=8) (11 rows) Time: 31.961 ms news=# I have several partitions for the table, 3 of them are loaded. When I search for the maximum value of the primary key column on a single partition, the access path is as it should be, using the PK index. Each of the partitions has the same PK. When I try selecting the maximum on the entire table, the indexes on the partitions are not used, Postgres does the full scan instead. The difference in timing is drastic: news=# select max("document#") from moreover_documents_y2010m08; max ------------ 1175107508 (1 row) Time: 56.778 ms news=# select max("document#") from moreover_documents; max ------------ 1175107508 (1 row) Time: 200490.228 ms news=# -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions
Hello, > Optimizer doesn't behave well when the tables are partitioned. Here is > the problem: > > news=# explain > select max("document#") from moreover_documents_y2010m08; > > QUERY > PLAN > ------------------------------------------------------------------------------------------------------------------------------------------------------ > Result (cost=0.15..0.16 rows=1 width=0) > InitPlan 1 (returns $0) > -> Limit (cost=0.00..0.15 rows=1 width=8) > -> Index Scan Backward using pk_moreover_documents_y2010m08 > on moreover_documents_y2010m08 (cost=0.00..2169871.61 rows=14615132 > width=8) > Filter: ("document#" IS NOT NULL) > (5 rows) > > Time: 31.191 ms > news=# explain > select max("document#") from moreover_documents; > QUERY > PLAN > ----------------------------------------------------------------------------------------------------------------------- > Aggregate (cost=4227019.40..4227019.41 rows=1 width=8) > -> Append (cost=0.00..4145103.32 rows=32766432 width=8) > -> Seq Scan on moreover_documents (cost=0.00..10.20 rows=20 > width=8) > -> Seq Scan on moreover_documents_y2010m06 moreover_documents > (cost=0.00..236523.53 rows=1856853 width=8) > -> Seq Scan on moreover_documents_y2010m07 moreover_documents > (cost=0.00..2073778.27 rows=16294327 width=8) > -> Seq Scan on moreover_documents_y2010m08 moreover_documents > (cost=0.00..1834740.32 rows=14615132 width=8) > -> Seq Scan on moreover_documents_y2010m09 moreover_documents > (cost=0.00..10.20 rows=20 width=8) > -> Seq Scan on moreover_documents_y2010m10 moreover_documents > (cost=0.00..10.20 rows=20 width=8) > -> Seq Scan on moreover_documents_y2010m11 moreover_documents > (cost=0.00..10.20 rows=20 width=8) > -> Seq Scan on moreover_documents_y2010m12 moreover_documents > (cost=0.00..10.20 rows=20 width=8) > -> Seq Scan on moreover_documents_y2011m01 moreover_documents > (cost=0.00..10.20 rows=20 width=8) > (11 rows) > > Time: 31.961 ms > news=# > > I have several partitions for the table, 3 of them are loaded. When I > search for the maximum value of the primary key column on a single > partition, the access path is as it should be, using the PK index. Each > of the partitions has the same PK. When I try selecting the maximum on > the entire table, the indexes on the partitions are not used, Postgres > does the full scan instead. The difference in timing is drastic: > news=# select max("document#") from moreover_documents_y2010m08; > max > ------------ > 1175107508 > (1 row) > > Time: 56.778 ms > news=# select max("document#") from moreover_documents; > max > ------------ > 1175107508 > (1 row) > > Time: 200490.228 ms > news=# > > -- I am a PostgreSQL novice myself. So please try my suggestions only if you have enough time to experiment. If you try select max("document#") from ( select "document#" from moreover_documents_y2010m08 union select "document#" from moreover_documents_xxx ) would that improve performance? Regards, Jayadevan DISCLAIMER: "The information in this e-mail and any attachment is intended only for the person to whom it is addressed and may contain confidential and/or privileged material. If you have received this e-mail in error, kindly contact the sender and destroy all copies of the original communication. IBS makes no warranty, express or implied, nor guarantees the accuracy, adequacy or completeness of the information contained in this email or any attachment and is not liable for any errors, defects, omissions, viruses or for resultant loss or damage, if any, direct or indirect."
Jayadevan M wrote: > > I am a PostgreSQL novice myself. So please try my suggestions only if you > have enough time to experiment. > Good ideas can come from anybody. Being a novice is legal in almost all states. Not sure about AZ, though. > If you try > select max("document#") from ( > select "document#" from moreover_documents_y2010m08 > union > select "document#" from moreover_documents_xxx > ) > would that improve performance? > > Regards, > Jayadevan > Yes, this does improve performance: news=> explain select max(doc) from ( select max("document#") as doc from moreover_documents_y2010m08 union all select max("document#") as doc from moreover_documents_y2010m07) as dc; QUER Y PLAN -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -- Aggregate (cost=0.34..0.35 rows=1 width=8) -> Append (cost=0.15..0.34 rows=2 width=8) -> Result (cost=0.15..0.16 rows=1 width=0) InitPlan 1 (returns $0) -> Limit (cost=0.00..0.15 rows=1 width=8) -> Index Scan Backward using pk_moreover_documents_y2010 m08 on moreover_documents_y2010m08 (cost=0.00..2169871.61 rows=14615132 width=8 ) Filter: ("document#" IS NOT NULL) -> Result (cost=0.15..0.16 rows=1 width=0) InitPlan 2 (returns $1) -> Limit (cost=0.00..0.15 rows=1 width=8) -> Index Scan Backward using pk_moreover_documents_y2010 m07 on moreover_documents_y2010m07 (cost=0.00..2419975.91 rows=16294327 width=8 ) Filter: ("document#" IS NOT NULL) (12 rows) The point of my post is that the optimizer should be doing that, not me. I recently saw a presentation about the Postgres optimizer which begun with: "don't assume that you're smarter than the optimizer". Well, I apparently an smarter than the optimizer and I am not even particularly smart. -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 www.vmsinfo.com