Re: On partitioning - Mailing list pgsql-hackers
From | Stephen Frost |
---|---|
Subject | Re: On partitioning |
Date | |
Msg-id | 20141113063944.GY28859@tamriel.snowman.net |
In response to | Re: On partitioning (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: On partitioning, Re: On partitioning |
List | pgsql-hackers |
* Robert Haas (robertmhaas@gmail.com) wrote:
> On Wed, Nov 12, 2014 at 5:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> >> Maybe as anyarray, but I think pg_node_tree
> >> might even be better. That can also represent data of some arbitrary
> >> type, but it doesn't enforce that everything is uniform.
> >
> > Of course, the more general you make it, the more likely that it'll be
> > impossible to optimize well.

Agreed- a node tree seems a bit too far to make this really work well..
But, I'm curious what you were thinking specifically?

A node tree which accepts an "argument" of the constant used in the
original query and then spits back a table might work reasonably well
for that case- but with declarative partitioning, I expect us to
eventually be able to eliminate complete partitions from consideration
on both sides of a partition-table join and optimize cases where we have
two partitioned tables being joined with a compatible join key and only
actually do joins between the partitions which overlap each other. I
don't see those happening if we're allowing a node tree (only). If
having a node tree is just one option among other partitioning options,
then we can provide users with the ability to choose what suits their
particular needs.

> The point for me is just that range and list partitioning probably
> need different structure, and hash partitioning, if we want to support
> that, needs something else again. Range partitioning needs an array
> of partition boundaries and an array of child OIDs. List partitioning
> needs an array of specific values and a child table OID for each.
> Hash partitioning needs something probably quite different. We might
> be able to do it as a pair of arrays - one of type anyarray and one of
> type OID - and meet all needs that way.

I agree that these will require different structures in the catalog..
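To make the quoted structures concrete, here is a minimal Python sketch of the "pair of arrays" idea for range and list partitioning, plus the overlap test that a partition-wise join between two range-partitioned tables would rely on. It is an illustration only, not PostgreSQL code: the class and function names, the OID values, and the convention that each boundary is an exclusive upper bound (with the first partition unbounded below) are all assumptions made for this example.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class RangePartitioning:
    """Hypothetical layout: sorted boundaries plus one child OID per boundary.

    Assumption for this sketch: each boundary is an exclusive upper bound,
    and the first partition has no lower bound.
    """
    upper_bounds: List[Any]
    child_oids: List[int]

    def route(self, key: Any) -> Optional[int]:
        # Binary search for the first partition whose upper bound exceeds key.
        idx = bisect_right(self.upper_bounds, key)
        return self.child_oids[idx] if idx < len(self.child_oids) else None

    def ranges(self) -> List[Tuple[Any, Any, int]]:
        # (lower, upper, oid) triples; lower is None for the first partition.
        lowers = [None] + self.upper_bounds[:-1]
        return list(zip(lowers, self.upper_bounds, self.child_oids))


@dataclass
class ListPartitioning:
    """Hypothetical layout: one specific value and one child OID per entry."""
    values: List[Any]
    child_oids: List[int]

    def route(self, key: Any) -> Optional[int]:
        for value, oid in zip(self.values, self.child_oids):
            if value == key:
                return oid
        return None


def overlapping_pairs(left: RangePartitioning,
                      right: RangePartitioning) -> List[Tuple[int, int]]:
    """Return (left_oid, right_oid) pairs whose key ranges intersect.

    Only these pairs would need to be joined when both tables are
    partitioned on a compatible join key.
    """
    pairs = []
    for l_lo, l_hi, l_oid in left.ranges():
        for r_lo, r_hi, r_oid in right.ranges():
            # Half-open ranges [lo, hi) overlap when each starts before the
            # other ends; None stands for "unbounded below".
            if (r_lo is None or r_lo < l_hi) and (l_lo is None or l_lo < r_hi):
                pairs.append((l_oid, r_oid))
    return pairs


# Made-up OIDs and bounds, purely for illustration.
orders = RangePartitioning(upper_bounds=[10, 20, 30], child_oids=[101, 102, 103])
items = RangePartitioning(upper_bounds=[15, 30], child_oids=[201, 202])
regions = ListPartitioning(values=["eu", "us"], child_oids=[301, 302])

print(orders.route(10))
print(regions.route("us"))
print(overlapping_pairs(orders, items))
```

Because the boundaries are explicit, typed values, the overlap check can rule out two of the six possible partition pairs here without running any user-defined expression. An opaque node tree that merely maps a constant to a child table would not expose the bounds needed for that comparison.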
While reviewing the superuser checks, I expected to have a similar need
and discussed various options- having multiple catalog tables, having a
single table with multiple columns, having a single table with a 'type'
column and then a bytea blob. In the end, it wasn't really necessary as
the only thing which I expected to need more than 'yes/no' was the
directory permissions (which it looks like might end up killed anyway,
much to my sadness..), but while considering the options, I continued to
feel like anything but independent tables was hacking around to try and
reduce the number of inodes used for folks who don't actually use these
features, and that's a terrible reason to complicate the catalog and
code, in my view.

It occurs to me that we might be able to come up with a better way to
address the inode concern and therefore ignore it. There are other
considerations to having more catalog tables, but declarative
partitioning is an important enough feature, in my view, that I wouldn't
care if it required 10 catalog tables to implement. Misrepresenting it
with a catalog that's got a bunch of columns, all but one of which are
NULL, or by essentially removing the knowledge of the data type from the
system by using a type column with some binary blob, isn't doing
ourselves or our users any favors.

That's not to say that I'm against a solution which only needs one
catalog table, but let's not completely throw away proper structure
because of inode or other resource consideration issues. We have quite
a few other catalog tables which are rarely used and it'd be good to
address the issue with those consuming resources independently.

I'm not a fan of using pg_class- there are a number of columns in there
which I would *not* wish to be allowed to be different per partition
(starting with relowner and relacl...). Making those NULL would be just
as bad (probably worse, really, since we'd also need to add new columns
to pg_class to indicate the partitioning...) as having a sparsely
populated new catalog table.

Thanks!

Stephen