Storage Model for Partitioning - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Storage Model for Partitioning |
Date | |
Msg-id | 1200047140.4266.972.camel@ebony.site |
Responses |
Re: Storage Model for Partitioning
Re: Storage Model for Partitioning |
List | pgsql-hackers |
In my striving towards more effective partitioning for Postgres, I see we have one main decision to make, and all other sub-tasks are driven directly by this one issue. The issue is: at what point do we store the data within the existing storage model? We discussed this in 2005 when I started to discuss what became constraint exclusion, and it remains the core issue from which all other tasks are driven.

If we can establish the basics of how a table can be split into partitions, then that allows work to progress on the other issues. I'd like some guidance from the senior crew on this, which is hopefully possible without getting embroiled in all the details of partitioning, most of which are more straightforward technical issues.

The current storage model for a table is that above the smgr layer a table looks like a single continuous range of blocks, while below the smgr layer it is in fact a set of segment files.

Given where we are now, how should we change the storage model to support partitioning? If at all.

We have two types of requirement:
- manageability - tablespaces for each partition etc.
- performance - excluding partitions etc.

I've argued from multiple sides of the fence, so I'm trying to present a neutral view to allow us to take the best route forward.

The basic options are these:

0. Do Nothing - we don't want any of the other options.

1. Partitions are Contiguous Ranges of Blocks
As proposed for segment exclusion based partitioning

2. Partitions are Tables
As used by current constraint exclusion based partitioning.

3. Partitions are RelFileNodes, but not Tables

4. Some Other Choice

In more detail...

1. Partitions are Contiguous Ranges of Blocks

Partitions are a simple subset of a table, i.e. a contiguous range of blocks within the main block range of the table. That allows us to maintain the current smgr model, which then
- allows RI via SHARE locks
- avoids the need for complex planner changes
- allows unique indexes
- allows global indexes (because we only have one table)
- works naturally with synchronous scans and buffer recycling

Doing partitioning this way means we would (as a trivial example) assign blocks 1-10 as partition 1, blocks 11-20 as partition 2 etc.

There are two sub-options of this basic idea:

a) Dynamic Partitioning - we define partitioning based around what is already in the table, rather than trying to force the data to a "correct" partition. No changes to the current storage model. Useful, but it doesn't do all that everybody wants.
- allows automated table extension, so works automatically with Slony
- allows partition wise merge joins
- but not easily usable with declarative partitioning

b) Fixed partitioning - we define partitions as static ranges of blocks, which may leave us with holes in the range of BlockNumbers, plus each partition has a maximum size that it cannot expand beyond. Probably unacceptable.

2. Partitions are Tables

Current situation. This means we have to change
- nested loop joins, so they work with partitions: an IndexScan must be able to cross partitions within the target table
- indexes, so they can refer to more than one partition
- share locking in the executor
- synchronous scans and buffer recycling, so they work across partitions
- partition creation, which needs to be automatic
- DDL, so there is a single declaration from the Parent table

3. Partitions are RelFileNodes, but not Tables

We allow a table to have multiple RelFileNodes, when explicitly declared that way. This means we have to change
- DDL, so that TABLE level changes apply to all RelFileNodes, while PARTITION level changes apply to only one RelFileNode
- indexes, so they can refer to more than one partition
- share locking in the executor
- synchronous scans and buffer recycling, so they work across partitions

There *are* other changes not mentioned here that are required for partitioning, which although complex are less doubtful.

--
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
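
For illustration only, here is a minimal Python sketch of the two mappings discussed above: the existing one below smgr (block number to segment file) and the one option 1 would add (block number to partition). It is not PostgreSQL code. The constants assume a default build (8 kB blocks, 1 GB segments), and `PARTITION_STARTS` and both function names are hypothetical, chosen to mirror the trivial "blocks 1-10, blocks 11-20" example.

```python
from bisect import bisect_right

# Assumption: default PostgreSQL build values (8 kB blocks, 1 GB segment files).
BLCKSZ = 8192
RELSEG_SIZE = 131072  # blocks per segment file = 1 GB / 8 kB


def segment_for_block(blkno):
    """Below smgr: map a table-wide block number to
    (segment file number, block offset within that segment)."""
    return divmod(blkno, RELSEG_SIZE)


# Hypothetical partition map for option 1: the first block of each
# partition, in ascending order. Follows the email's trivial example
# (blocks 1-10 are partition 1, blocks 11-20 are partition 2, ...).
PARTITION_STARTS = [1, 11, 21]


def partition_for_block(blkno, starts=PARTITION_STARTS):
    """Above smgr: return the 1-based partition number whose contiguous
    block range contains blkno. The last partition is open-ended, which
    corresponds to the dynamic (1a) variant where the table can extend."""
    if blkno < starts[0]:
        raise ValueError("block %d precedes the first partition" % blkno)
    # bisect_right counts how many partition start blocks are <= blkno.
    return bisect_right(starts, blkno)
```

Under these assumptions, `partition_for_block(10)` gives partition 1 and `partition_for_block(11)` gives partition 2, while the table still presents one continuous block range to everything above smgr. The fixed variant (1b) would additionally need an upper bound per partition, which is where the holes in the BlockNumber range come from.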