Thread: Partitioning vs. schema
Hi List,

I am looking for some general advice about the best way of splitting a large data table; I have 2 different choices, partitioning or different schemas.

The data table refers to the number of houses that can be included in a city; as such, there are a large number of records.

I am wondering, if I decided to partition the table, whether the update/access speed might be faster than just declaring a different schema per city.

Under the partition the data table would appear to be smaller, so I should get an increase in speed, but the database still has to do some sort of indexing.

If I used different schemas, it resolves data protection issues, but doing a backup might become a nightmare.

In general, which is the faster access method?

regards

Dave.
If I were you I would use partitioning. In my experience, partitioning is easier and transparent: I just have to set it up once and from then on refer to a single table, and I'm done.

About speed: if you set "constraint_exclusion" = partition, Postgres will examine constraints only for inheritance child tables and UNION ALL subqueries, and that will improve the performance of your queries.
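For concreteness, here is a minimal sketch of that kind of inheritance-based setup (the table, column and city names are invented purely for illustration):

CREATE TABLE houses (
    house_id  bigserial,
    city      text NOT NULL,
    postcode  text,
    address   text
);

-- One child table per city; the CHECK constraint is what
-- constraint_exclusion uses to rule children out of a query.
CREATE TABLE houses_london (CHECK (city = 'london')) INHERITS (houses);
CREATE TABLE houses_paris  (CHECK (city = 'paris'))  INHERITS (houses);

-- Indexes are not inherited, so each child gets its own.
CREATE INDEX houses_london_postcode_idx ON houses_london (postcode);
CREATE INDEX houses_paris_postcode_idx  ON houses_paris  (postcode);

SET constraint_exclusion = partition;

-- With the setting above, the planner should skip houses_paris entirely here.
SELECT count(*) FROM houses WHERE city = 'london';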
Thanks
-----------------
Agustín Larreinegabe
Hi Dave,

How many rows of data are we talking here, and how much information (GiB)? Are you able to provide the table definition? (Can normalisation partition off some of this data?) Have you considered dedicated options for the lookup data, tuning the database appropriately, and keeping that single table?

With Postgres we have schemas, which can provide some convenience and design options. So look at it in terms of how your queries will look:

SELECT schema.table.column FROM schema.table;

vs

SELECT schema_table.column FROM schema_table;

Not much in it?

However, I tend to go with partitions when they are required to be generated on demand, dynamically and automatically (which probably isn't the case here). Schemas have other uses: they provide a level of security (GRANT) and are useful in design when partitioning off blocks of related datasets completely.

Regards,
Julian
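To make the schema-per-city and GRANT point concrete, here is a minimal sketch (the schema, table and role names are invented purely for illustration):

CREATE SCHEMA london;
CREATE SCHEMA paris;

CREATE TABLE london.houses (house_id bigserial, postcode text, address text);
CREATE TABLE paris.houses  (house_id bigserial, postcode text, address text);

-- The schema split gives per-city access control almost for free.
CREATE ROLE london_office LOGIN;
GRANT USAGE ON SCHEMA london TO london_office;
GRANT SELECT ON ALL TABLES IN SCHEMA london TO london_office;
-- london_office has no access to paris.houses at all.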
On Thu, Sep 19, 2013 at 12:02 AM, Dave Potts <dave.potts@pinan.co.uk> wrote:
> Hi List
>
> I am looking for some general advice about the best way of splitting a large data table; I have 2 different choices, partitioning or different schemas.
I don't think there is much of a choice there. If you put them in different schemas, then you are inherently partitioning the data. It's just a question of how you name your partitions, which is more of a naming issue than a performance issue.
> The data table refers to the number of houses that can be included in a city; as such, there are a large number of records.
>
> I am wondering, if I decided to partition the table, whether the update/access speed might be faster than just declaring a different schema per city.
If you partition based on city, then there should be no meaningful difference. If you partition based on something else, you would have to describe what it is partitioned on, and what your access patterns are like.
Cheers,
Jeff
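If the split is by city using the inheritance approach sketched above, inserts are usually routed to the right child with a trigger on the parent table; a rough sketch (reusing the invented table and city names from earlier):

CREATE OR REPLACE FUNCTION houses_insert_router() RETURNS trigger AS $$
BEGIN
    IF NEW.city = 'london' THEN
        INSERT INTO houses_london VALUES (NEW.*);
    ELSIF NEW.city = 'paris' THEN
        INSERT INTO houses_paris VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'no partition for city %', NEW.city;
    END IF;
    RETURN NULL;  -- the row has already gone to a child, so nothing is stored in the parent
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER houses_insert
    BEFORE INSERT ON houses
    FOR EACH ROW EXECUTE PROCEDURE houses_insert_router();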
I would look towards how PostGIS handles the TIGER census data for guidance. It's a similar, massive data set.
Greg Haase
On 21/09/13 02:51, Gregory Haase wrote:
> I would look towards how PostGIS handles the TIGER census data for
> guidance. It's a similar, massive data set.
>
> Greg Haase

I'm not sure why it wouldn't handle it fine. The question is at what point third-party "imported" datasets, required for local lookup, warrant their own dedicated solution (database and/or server) - especially when we are talking about a large amount of data that comes with its own unique global identifiers. Something like ISO 3166, even though small, would do well with its own database, but it is not large enough to require a dedicated server. This is the question I put to the OP; I would be interested to know.

Regards,
Julian
On Fri, Sep 20, 2013 at 4:38 AM, Julian <tempura@internode.on.net> wrote:
> However, I tend to go with partitions when they are required to be
> generated on demand, dynamically and automatically (which probably isn't
> the case here). Schemas have other uses: they provide a level of security
> (GRANT) and are useful in design when partitioning off blocks of related
> datasets completely.

I would partition in this case for the same reason: schemas are much more logical divisions of data, due to the different grants, search paths, users, and so on. Partitioning is more of a data-level split, so it makes sense when you want the data to stay at a single logical point but split it across multiple tables for performance reasons.

Luca
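A small sketch of the schema-side conveniences mentioned here (search paths per role), building on the invented per-city schemas from the earlier sketch:

CREATE ROLE paris_office LOGIN;
GRANT USAGE ON SCHEMA paris TO paris_office;
GRANT SELECT ON ALL TABLES IN SCHEMA paris TO paris_office;

-- Each office's role resolves unqualified table names to its own schema first.
ALTER ROLE london_office SET search_path = london, public;
ALTER ROLE paris_office  SET search_path = paris, public;

-- Connected as london_office:
--   SELECT count(*) FROM houses;   -- resolves to london.houses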