Thread: TODO idea - implicit constraints across child tables with a common column as primary key (but obviously not a shared index)

From

Andrew Hammond

Date:

23 April 2007, 15:10:34

If you have a table with a bunch of children, and these children all
have a primary key which is generated from the same sequence, assuming
that you're partitioning based on date (ie, this is a transaction
record table), it would be nice if the planner could spot that all
tables have a primary key on a column used as a join condition, check
the min / max to see if there is overlap between tables, then apply
CBE as if constraints existed.
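
A minimal sketch of the check described above, written in Python rather than as planner code. The table names, the `KeyRange` type, and both helper functions are hypothetical; the min/max values stand in for whatever would be read from each child's primary key index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyRange:
    """Observed min and max of the primary key in one child table (hypothetical)."""
    table: str
    lo: int
    hi: int


def ranges_disjoint(ranges):
    """True if no two child tables have overlapping key ranges."""
    ordered = sorted(ranges, key=lambda r: r.lo)
    # After sorting by lower bound, checking neighbours covers every pair.
    return all(a.hi < b.lo for a, b in zip(ordered, ordered[1:]))


def children_to_scan(ranges, key):
    """Children that could hold `key`, treating min/max as an implicit constraint."""
    return [r.table for r in ranges if r.lo <= key <= r.hi]


children = [
    KeyRange("txn_2007_02", 1, 1000),
    KeyRange("txn_2007_03", 1001, 2500),
    KeyRange("txn_2007_04", 2501, 4000),
]

# Only apply the implicit constraint when the ranges really are disjoint.
if ranges_disjoint(children):
    print(children_to_scan(children, 1700))  # ['txn_2007_03']
```

The min/max values are only safe to rely on if they cannot change between planning and execution.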

This strikes me as a pretty common situation, certainly we're seeing
it here.

Andrew

Re: TODO idea - implicit constraints across child tables with a common column as primary key (but obviously not a shared index)

From

Gregory Stark

Date:

23 April 2007, 15:46:59

"Andrew Hammond" <andrew.george.hammond@gmail.com> writes:

> If you have a table with a bunch of children, and these children all
> have a primary key which is generated from the same sequence, assuming
> that you're partitioning based on date (ie, this is a transaction
> record table), it would be nice if the planner could spot that all
> tables have a primary key on a column used as a join condition, check
> the min / max to see if there is overlap between tables, then apply
> CBE as if constraints existed.

The problem is that it's not really true that sequences and time move
together. It's quite possible to have two transactions which both start just
before the date-based partition cutoff but have one land in each partition
with the greater sequence number landing in the old partition.

It would be rare (but still possible) if you always insert using quick
autocommitted inserts with nextval() in a values list. But it would be quite
likely if you use one of the other coding styles such as doing one query to
look up the nextval() and then doing various inserts based on that value in
multiple statements within a single transaction.
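
The race can be shown with a small model. This is a sketch under assumptions, not a trace from a real system: the sequence is a plain counter, the cutoff and timestamps are made up, and a row is routed to a partition by the time its insert runs.

```python
from datetime import datetime
from itertools import count

seq = count(1)                     # stand-in for a sequence; next(seq) plays nextval()
CUTOFF = datetime(2007, 5, 1)      # hypothetical partition boundary


def partition_for(ts):
    """Route a row by its timestamp, as a date-partitioned table would."""
    return "old" if ts < CUTOFF else "new"


id_early = next(seq)   # 1: an ordinary row well before the cutoff
id_a = next(seq)       # 2: transaction A fetches its id just before the cutoff
id_b = next(seq)       # 3: transaction B does the same
id_late = next(seq)    # 4: an ordinary row after the cutoff

rows = [
    (id_early, datetime(2007, 4, 30, 12, 0, 0)),
    (id_a, datetime(2007, 5, 1, 0, 0, 5)),       # A's insert is delayed past the cutoff
    (id_b, datetime(2007, 4, 30, 23, 59, 59)),   # B inserts immediately
    (id_late, datetime(2007, 5, 1, 8, 0, 0)),
]

# Track the min/max id seen in each partition.
ranges = {}
for key, ts in rows:
    part = partition_for(ts)
    lo, hi = ranges.get(part, (key, key))
    ranges[part] = (min(lo, key), max(hi, key))

print(ranges)  # old: (1, 3), new: (2, 4): the key ranges overlap
```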

What I've been considering instead is using the statistics. If we provided a
way to mark partitions read-only, then once a table (or partition) is so marked,
a subsequent VACUUM ANALYZE could mark the resulting statistics as
"authoritative". Now that we have plan invalidation, we could use this kind of
information in planning.

The main data from the statistics that's of interest here are the extreme
values of the histogram. If we're not interested in any values in that range
then we can exclude the partition entirely.

This has a number of nice properties. It requires little additional work for
the DBA and "read-only" is a nice simple concept for a DBA to understand. It's
even a useful feature for other purposes. It also can catch a lot more cases
than the one you describe. In particular it would eliminate the parent table
if it has no rows which gives us a chance to eliminate the Append node
altogether.
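
A rough sketch of how such exclusion might look, assuming each partition carries exact bounds recorded by a full scan of a read-only table. Every name here (`PartitionStats`, `authoritative`, `prune`) is hypothetical; nothing below corresponds to an existing catalog or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PartitionStats:
    name: str
    authoritative: bool          # True only for read-only tables scanned in full
    lo: Optional[int] = None     # exact minimum, or None if the table is empty
    hi: Optional[int] = None     # exact maximum, or None if the table is empty


def prune(partitions, want_lo, want_hi):
    """Keep partitions that might hold a value in [want_lo, want_hi]."""
    kept = []
    for p in partitions:
        if not p.authoritative:
            kept.append(p.name)      # bounds not trustworthy: must scan
        elif p.lo is None:
            continue                 # known empty, e.g. the parent table
        elif p.hi < want_lo or p.lo > want_hi:
            continue                 # no value of interest in this range
        else:
            kept.append(p.name)
    return kept


parts = [
    PartitionStats("txn", authoritative=True),          # empty parent
    PartitionStats("txn_2007_03", True, 1001, 2500),    # read-only, exact bounds
    PartitionStats("txn_2007_04", False),               # still being written to
]

survivors = prune(parts, 100, 900)
# With a single survivor there is nothing left to append.
plan = survivors[0] if len(survivors) == 1 else ("Append", survivors)
print(plan)  # txn_2007_04
```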

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com

Re: TODO idea - implicit constraints across child tables with a common column as primary key (but obviously not a shared index)

From

Tom Lane

Date:

24 April 2007, 02:11:56

Gregory Stark <stark@enterprisedb.com> writes:
> The main data from the statistics that's of interest here are the extreme
> values of the histogram. If we're not interested in any values in that range
> then we can exclude the partition entirely.

Except that there is *no* guarantee that the histogram includes the
extreme values --- to promise that would require ANALYZE to scan every
table row.
        regards, tom lane
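
The objection is easy to reproduce outside the database. The following is only an illustration with made-up data: `random.sample` stands in for a row sample, and the table size and the sample size of 3000 are arbitrary assumptions.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

rows = list(range(1, 100_001))        # every value in the column, all distinct
sample = random.sample(rows, 3000)    # a row sample standing in for ANALYZE

true_lo, true_hi = min(rows), max(rows)
est_lo, est_hi = min(sample), max(sample)

# Sample bounds always lie inside the true bounds and usually fall short of them.
print(true_lo, true_hi, est_lo, est_hi)

# Values are consecutive integers, so the gaps count the rows outside the sampled range.
missed = (est_lo - true_lo) + (true_hi - est_hi)
print("rows outside the sampled range:", missed)
```

Any row outside the sampled range would be wrongly skipped if the sample's extremes were used to exclude a partition.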

Re: TODO idea - implicit constraints across child tables with a common column as primary key (but obviously not a shared index)

From

Gregory Stark

Date:

24 April 2007, 10:23:00

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Gregory Stark <stark@enterprisedb.com> writes:
>> The main data from the statistics that's of interest here are the extreme
>> values of the histogram. If we're not interested in any values in that range
>> then we can exclude the partition entirely.
>
> Except that there is *no* guarantee that the histogram includes the
> extreme values --- to promise that would require ANALYZE to scan every
> table row.

That's why I said:
 a subsequent VACUUM ANALYZE could mark the resulting statistics as "authoritative"

Not just a plain ANALYZE.

There's another issue here too. One of the other motivations is to be able to
put read-only tables on read-only media. Doing that would require freezing
every tuple, which at the very least involves looking at every tuple. (It
would also mean waiting until all tuples are freezable.)

So there's a natural step in which to gather these authoritative statistics
anyways.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com