Thread: TODO idea - implicit constraints across child tables with a common column as primary key (but obviously not a shared index)
TODO idea - implicit constraints across child tables with a common column as primary key (but obviously not a shared index)
From
Andrew Hammond
Date:
If you have a table with a bunch of children, and these children all have a primary key which is generated from the same sequence, assuming that you're partitioning based on date (i.e., this is a transaction record table), it would be nice if the planner could spot that all tables have a primary key on a column used as a join condition, check the min / max to see if there is overlap between tables, then apply CBE as if constraints existed.

This strikes me as a pretty common situation; certainly we're seeing it here.

Andrew
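A minimal sketch of the check proposed above, written in Python rather than as planner code. The table names, the `Range` type and both function names are hypothetical; the sketch assumes each child's exact primary-key min / max is already known. If no two children overlap, a lookup only needs the one child whose range contains the key; otherwise it falls back to scanning every child.

```python
from typing import Dict, List, Tuple

# Inclusive (min_id, max_id) of a child table's primary key. Hypothetical type.
Range = Tuple[int, int]


def any_overlap(children: Dict[str, Range]) -> bool:
    """Return True if any two children have overlapping key ranges."""
    ordered = sorted(children.values())
    # After sorting by lower bound, any overlap shows up between neighbours.
    return any(nxt[0] <= cur[1] for cur, nxt in zip(ordered, ordered[1:]))


def children_to_scan(children: Dict[str, Range], wanted_id: int) -> List[str]:
    """Pick the children a lookup on wanted_id has to visit."""
    if any_overlap(children):
        # Ranges are not disjoint: behave as if no constraints existed.
        return sorted(children)
    return sorted(
        name for name, (lo, hi) in children.items() if lo <= wanted_id <= hi
    )


if __name__ == "__main__":
    monthly = {
        "txn_2007_01": (1, 1000),
        "txn_2007_02": (1001, 2500),
        "txn_2007_03": (2501, 4000),
    }
    print(children_to_scan(monthly, 1500))
```

The fallback branch is the conservative choice: the sketch never excludes a child unless the ranges are provably disjoint.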
Re: TODO idea - implicit constraints across child tables with a common column as primary key (but obviously not a shared index)
From
Gregory Stark
Date:
"Andrew Hammond" <andrew.george.hammond@gmail.com> writes: > If you have a table with a bunch of children, and these children all > have a primary key which is generated from the same sequence, assuming > that you're partitioning based on date (ie, this is a transaction > record table), it would be nice if the planner could spot that all > tables have a primary key on a column used as a join condition, check > the min / max to see if there is overlap between tables, then apply > CBE as if constraints existed. The problem is that it's not really true that sequences and time move together. It's quite possible to have two transactions which both start just before the date-based partition cutoff but have one land in each partition with the greater sequence number landing in the old partition. It would be rare (but still possible) if you always insert using quick autocommitted inserts with nextval() in a values list. But it would be quite likely if you use one of the other coding styles such as doing one query to look up the nextval() and then doing various inserts based on that value in multiple statements within a single transaction. What I've been considering instead was using the statistics. If we provided a way to mark partitions read-only then once a table (or partition) is marked then a subsequent VACUUM ANALYZE could mark the resulting statistics as "authoritative". Now that we have plan invalidation we could use this kind of information in the planning. The main data from the statistics that's of interest here are the extreme values of the histogram. If we're not interested in any values in that range then we can exclude the partition entirely. This has a number of nice properties. It requires little additional work for the DBA and "read-only" is a nice simple concept for a DBA to understand. It's even a useful feature for other purposes. It also can catch a lot more cases than the one you describe. 
In particular it would eliminate the parent table if it has no rows which gives us a chance to eliminate the Append node altogether. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
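The ordering problem described above can be illustrated with a small deterministic simulation. This is only a sketch under stated assumptions: the `Sequence` class, the cutoff date and the two timestamps are made up for illustration, and routing by timestamp stands in for date-based partitioning. Transaction A draws its sequence value first but inserts after the cutoff, while transaction B draws the later value and inserts before it.

```python
from datetime import datetime
from typing import Dict, List

# Hypothetical partition boundary: rows before it go to "old", others to "new".
CUTOFF = datetime(2007, 2, 1)


class Sequence:
    """Toy stand-in for a database sequence."""

    def __init__(self) -> None:
        self._value = 0

    def nextval(self) -> int:
        self._value += 1
        return self._value


def partition_for(ts: datetime) -> str:
    return "old" if ts < CUTOFF else "new"


def simulate() -> Dict[str, List[int]]:
    seq = Sequence()
    partitions: Dict[str, List[int]] = {"old": [], "new": []}

    # Both transactions start just before the cutoff and fetch their ids.
    id_a = seq.nextval()  # transaction A looks up its id, then does other work
    id_b = seq.nextval()  # transaction B gets the greater sequence number

    # B inserts right away, one second before the cutoff.
    partitions[partition_for(datetime(2007, 1, 31, 23, 59, 59))].append(id_b)
    # A inserts only after the cutoff has passed.
    partitions[partition_for(datetime(2007, 2, 1, 0, 0, 1))].append(id_a)
    return partitions


if __name__ == "__main__":
    print(simulate())
```

In this run the old partition ends up holding a larger key than the new one, so the key ranges of the two partitions overlap even though every row was routed correctly by date.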
Re: TODO idea - implicit constraints across child tables with a common column as primary key (but obviously not a shared index)
From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:

> The main data from the statistics that's of interest here are the extreme
> values of the histogram. If we're not interested in any values in that range
> then we can exclude the partition entirely.

Except that there is *no* guarantee that the histogram includes the extreme values --- to promise that would require ANALYZE to scan every table row.

regards, tom lane
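The objection can be made concrete with a rough Python sketch. It is not how ANALYZE is implemented; it only assumes that statistics are built from a random sample of rows. The function names and the sample size are hypothetical. Bounds taken from a sample can never be wider than the true bounds, so a value that really exists in the table may fall outside them.

```python
import random
from typing import List, Tuple


def sampled_bounds(values: List[int], sample_size: int, seed: int = 0) -> Tuple[int, int]:
    """Min and max of a random sample, standing in for histogram end points."""
    sample = random.Random(seed).sample(values, sample_size)
    return min(sample), max(sample)


def would_wrongly_exclude(values: List[int], bounds: Tuple[int, int], wanted: int) -> bool:
    """True if wanted is present in the table but lies outside the bounds."""
    lo, hi = bounds
    return wanted in values and not (lo <= wanted <= hi)


if __name__ == "__main__":
    rows = list(range(1, 100_001))
    bounds = sampled_bounds(rows, sample_size=100)
    print("sampled bounds:", bounds, "true bounds:", (min(rows), max(rows)))
    print("would drop the row holding the true max:",
          would_wrongly_exclude(rows, bounds, max(rows)))
```

Excluding a partition on the strength of such bounds would silently lose rows, which is why the sampled histogram alone cannot be treated as a constraint.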
Re: TODO idea - implicit constraints across child tables with a common column as primary key (but obviously not a shared index)
From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes: > Gregory Stark <stark@enterprisedb.com> writes: >> The main data from the statistics that's of interest here are the extreme >> values of the histogram. If we're not interested in any values in that range >> then we can exclude the partition entirely. > > Except that there is *no* guarantee that the histogram includes the > extreme values --- to promise that would require ANALYZE to scan every > table row. That's why I said: a subsequent VACUUM ANALYZE could mark the resulting statistics as "authoritative" Not just plain analyze. There's another issue here too. One of the other motivations is to be able to put read-only tables on read-only media. To do that would require freezing every tuple which would at the very least involve looking at every tuple. (It would also involve waiting until all tuples are freezable too.) So there's a natural step in which to gather these authoritative statistics anyways. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com