Home > mailing lists

Re: A Guide to Constraint Exclusion (Partitioning) - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: A Guide to Constraint Exclusion (Partitioning)
Date	July 22, 2005 19:33:05
Msg-id	17011.1122071572@sss.pgh.pa.us Whole thread Raw
In response to	A Guide to Constraint Exclusion (Partitioning) (Simon Riggs <simon@2ndquadrant.com>)
Responses	Re: A Guide to Constraint Exclusion (Partitioning) Re: A Guide to Constraint Exclusion (Partitioning)
List	pgsql-hackers

Tree view

Simon Riggs <simon@2ndquadrant.com> writes:
> In summary, the CE feature will be a huge performance gain for
> qualifying queries against large tables in PostgreSQL databases.

BTW, before we spend too much time chasing an emperor that may have no
clothes, it's worth asking whether this feature is really going to buy
anything in the real world.  What is bothering me at the moment is the
thought that the partitioning key would normally be indexed within
each table, and that an indexscan that's off the end of the indexed
range is cheap.

For example, you write

> Now, if we run our example query again
>     SELECT sum(soldqty) FROM Sales_DateItemOutlet
>     WHERE DateKey between 20050101 and 20050101
> we find that the query will
>     Scan all rows in Sales_DateItemOutlet (which is empty)
>     Scan all rows in Sales_Jan_DateItemOutlet
>     Scan all rows in Sales_Feb_DateItemOutlet
>     Scan all rows in Sales_Mar_DateItemOutlet

but the "scan all rows" will only happen if no index is provided on
DateKey in the child tables.  Otherwise the planner will probably
select plans like this:
       ->  Index Scan using i1 on sales_jan_dateitemoutlet sales_dateitemoutlet  (cost=0.00..5.98 rows=1 width=0)
      Index Cond: ((datekey >= 20050101) AND (datekey <= 20050101))

for each child table for which the statistics indicate that no rows are
likely to be selected.  This will fall through quite quickly in
practice, meaning that the "huge performance gain" from not doing it at
all is a bit oversold.  (Note that it's already true that each child
table is planned separately, so the plan for the partition that *is*
targeted by the query may be different.)

AFAICS, CE as proposed is only worth bothering with if the partitioning
key is something you would not want to create indexes on; which does not
strike me as a major use-case.

It'd be more attractive if the methodology allowed an inheritance or
union-all collection to be reduced to one single table (ie, CE gets rid
of all but one collection member) and then that could be planned as if
it were a primitive table entry (ie, no Append plan node).  This doesn't
help much for simple queries on the fact table but it is interesting for
join cases, because if the Append is in the way there's no way to handle
inner indexscan join plans.

Unfortunately the patch as proposed is a long way from being able to do
that, and given the current semantics of inherited constraints it's not
even remotely feasible, since as you note we can't put a constraint on
just the parent table.  We could maybe do it for UNION ALL views, but
the patch doesn't handle that case.

So at the moment I'm feeling a bit dubious about the real value.
        regards, tom lane

pgsql-hackers by date:

From: Dawid Kuroczko
Date: 22 July 2005, 17:49:06
Subject: Re: [PERFORM] Planner doesn't look at LIMIT?

From: Alvaro Herrera
Date: 22 July 2005, 19:54:03
Subject: Re: Autovacuum loose ends

Re: A Guide to Constraint Exclusion (Partitioning) - Mailing list pgsql-hackers

Previous

Next