Thread: Partitioning vs. schema
Hi List,

I am looking for some general advice about the best way of splitting a large data table; I have 2 different choices, partitioning or different schemas.

The data table refers to the number of houses that can be included in a city; as such, there are a large number of records.

I am wondering, if I decided to partition the table, whether the update/access speed might be faster than just declaring a different schema per city.

Under the partition the data table would appear to be smaller, so I should get an increase in speed, but the database still has to do some sort of indexing.

If I used different schemas, it resolves data protection issues, but doing a backup might become a nightmare.

In general, which is the faster access method?

regards

Dave.
If I were you I would use partitioning. In my experience, partitioning is easier and transparent: I just have to set it up once and from then on refer to a single table, and I'm done.

About speed: if you set "constraint_exclusion" = partition, Postgres will examine constraints only for inheritance child tables and UNION ALL subqueries, and that will improve the performance of your queries.
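For concreteness, here is a minimal sketch of that kind of inheritance-based setup (the table, column and city names are invented purely for illustration):

CREATE TABLE houses (
    house_id  bigserial,
    city      text NOT NULL,
    postcode  text,
    address   text
);

-- One child table per city; the CHECK constraint is what
-- constraint_exclusion uses to rule children out of a query.
CREATE TABLE houses_london (CHECK (city = 'london')) INHERITS (houses);
CREATE TABLE houses_paris  (CHECK (city = 'paris'))  INHERITS (houses);

-- Indexes are not inherited, so each child gets its own.
CREATE INDEX houses_london_postcode_idx ON houses_london (postcode);
CREATE INDEX houses_paris_postcode_idx  ON houses_paris  (postcode);

SET constraint_exclusion = partition;

-- With the setting above, the planner should skip houses_paris entirely here.
SELECT count(*) FROM houses WHERE city = 'london';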
Thanks
-----------------
Agustín Larreinegabe
Hi Dave,

How many rows of data are we talking here, and how much information (GiB)? Are you able to provide the table definition? (Can normalisation partition off some of this data?) Have you considered dedicated options for the lookup data, tuning the database appropriately, and keeping that single table?

With Postgres we have schemas, which can provide some convenience and design options. So look at it in terms of how your queries will look:

SELECT schema.table.column FROM schema.table;

vs

SELECT schema_table.column FROM schema_table;

Not much in it?

However, I tend to go with partitions when they are required to be generated on demand, dynamically and automatically (which probably isn't the case here). Schemas have other uses: they provide a level of security (GRANT) and are useful in design when partitioning off blocks of related datasets completely.

Regards,
Julian
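To make the schema-per-city and GRANT point concrete, here is a minimal sketch (the schema, table and role names are invented purely for illustration):

CREATE SCHEMA london;
CREATE SCHEMA paris;

CREATE TABLE london.houses (house_id bigserial, postcode text, address text);
CREATE TABLE paris.houses  (house_id bigserial, postcode text, address text);

-- The schema split gives per-city access control almost for free.
CREATE ROLE london_office LOGIN;
GRANT USAGE ON SCHEMA london TO london_office;
GRANT SELECT ON ALL TABLES IN SCHEMA london TO london_office;
-- london_office has no access to paris.houses at all.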
On Thu, Sep 19, 2013 at 12:02 AM, Dave Potts <dave.potts@pinan.co.uk> wrote:
> Hi List
>
> I am looking for some general advice about the best way of splitting a large data table; I have 2 different choices, partitioning or different schemas.
I don't think there is much of a choice there. If you put them in different schemas, then you are inherently partitioning the data. It's just a question of how you name your partitions, which is more of a naming issue than a performance issue.
> The data table refers to the number of houses that can be included in a city; as such, there are a large number of records.
>
> I am wondering, if I decided to partition the table, whether the update/access speed might be faster than just declaring a different schema per city.
If you partition based on city, then there should be no meaningful difference. If you partition based on something else, you would have to describe what it is partitioned on, and what your access patterns are like.
Cheers,
Jeff
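If the split is by city using the inheritance approach sketched above, inserts are usually routed to the right child with a trigger on the parent table; a rough sketch (reusing the invented table and city names from earlier):

CREATE OR REPLACE FUNCTION houses_insert_router() RETURNS trigger AS $$
BEGIN
    IF NEW.city = 'london' THEN
        INSERT INTO houses_london VALUES (NEW.*);
    ELSIF NEW.city = 'paris' THEN
        INSERT INTO houses_paris VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'no partition for city %', NEW.city;
    END IF;
    RETURN NULL;  -- the row has already gone to a child, so nothing is stored in the parent
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER houses_insert
    BEFORE INSERT ON houses
    FOR EACH ROW EXECUTE PROCEDURE houses_insert_router();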
I would look towards how PostGIS handles the TIGER census data for guidance. It's a similar, massive data set.
Greg Haase
On 21/09/13 02:51, Gregory Haase wrote:
> I would look towards how PostGIS handles the TIGER census data for
> guidance. It's a similar, massive data set.
>
> Greg Haase

I'm not sure why it wouldn't handle it fine. The question is at what point third-party "imported" datasets, required for local lookup, warrant their own dedicated solution (database and/or server) - especially when we are talking about a large amount of data that comes with its own unique global identifiers. Something like ISO 3166, even though small, would do well with its own database, but it is not large enough to require a dedicated server. This is the question I put to the OP; I would be interested to know.

Regards,
Julian
On Fri, Sep 20, 2013 at 4:38 AM, Julian <tempura@internode.on.net> wrote:
> However, I tend to go with partitions when they are required to be
> generated on demand, dynamically and automatically (which probably isn't
> the case here). Schemas have other uses: they provide a level of security
> (GRANT) and are useful in design when partitioning off blocks of related
> datasets completely.

I would partition in this case for the same reason: schemas are much more logical divisions of data, due to the different grants, search paths, users, and so on. Partitioning is more of a data-level split, so it makes sense when you want the data to stay at a single logical point but split it across multiple tables for performance reasons.

Luca
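A small sketch of the schema-side conveniences mentioned here (search paths per role), building on the invented per-city schemas from the earlier sketch:

CREATE ROLE paris_office LOGIN;
GRANT USAGE ON SCHEMA paris TO paris_office;
GRANT SELECT ON ALL TABLES IN SCHEMA paris TO paris_office;

-- Each office's role resolves unqualified table names to its own schema first.
ALTER ROLE london_office SET search_path = london, public;
ALTER ROLE paris_office  SET search_path = paris, public;

-- Connected as london_office:
--   SELECT count(*) FROM houses;   -- resolves to london.houses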