Thread: Choosing an index on partitioned tables.

Choosing an index on partitioned tables.

From

Tim Uckun

Date:

07 September 2021, 03:44:35

I have a series of tables which are going to be queries mostly on two
columns. A timestamp table and a metric type column.

My plan is to partition by date ranges which means the primary key has
to include the timestamp column and the id column  As far as I know
there is no way to specify an index type for those columns.

The metric type is a text column and will not be very selective. It
will have somewhere around 200 types of metrics and they will all be
short, less than ten characters.

Given that there will be a lot of records I was wondering what type of
index would be ideal for that column. Seems like hash indexes would be
ideal because only comparison will be = and they are smaller than
Btrees but for a while they were not recommended.

Would hash be the best or would something work better?

Re: Choosing an index on partitioned tables.

From

Brent Wood

Date:

07 September 2021, 04:06:27

Hi Tim,

I've had good success with TimescaleDB for large timesries databases (40b readings).

https://www.timescale.com/

You turn your timestamp table into a Timescale hypertable and it looks after the indexing and partitioning automatically, with the table accessed like a normal postgres table, but very quickly.

It also adds some SQL functions to add a bit of time based query functionality.

Cheers

Brent Wood

Principal Technician, Fisheries
NIWA
DDI: +64 (4) 3860529

From: Tim Uckun <timuckun@gmail.com>
Sent: Tuesday, September 7, 2021 15:44
To: pgsql-general <pgsql-general@postgresql.org>
Subject: Choosing an index on partitioned tables.

I have a series of tables which are going to be queries mostly on two
columns. A timestamp table and a metric type column.

My plan is to partition by date ranges which means the primary key has
to include the timestamp column and the id column As far as I know
there is no way to specify an index type for those columns.

The metric type is a text column and will not be very selective. It
will have somewhere around 200 types of metrics and they will all be
short, less than ten characters.

Given that there will be a lot of records I was wondering what type of
index would be ideal for that column. Seems like hash indexes would be
ideal because only comparison will be = and they are smaller than
Btrees but for a while they were not recommended.

Would hash be the best or would something work better?

Brent Wood
Principal Technician - GIS and Spatial Data Management
Programme Leader - Environmental Information Delivery
+64-4-386-0529

National Institute of Water & Atmospheric Research Ltd (NIWA)
301 Evans Bay Parade Hataitai Wellington New Zealand
Connect with NIWA: niwa.co.nz Facebook LinkedIn Twitter Instagram

To ensure compliance with legal requirements and to maintain cyber security standards, NIWA's IT systems are subject to ongoing monitoring, activity logging and auditing. This monitoring and auditing service may be provided by third parties. Such third parties can access information transmitted to, processed by and stored on NIWA's IT systems

Re: Choosing an index on partitioned tables.

From

Tim Uckun

Date:

07 September 2021, 04:26:52

Hi Brent.

I looked at timescaledb. It does make partitioning on date ranges automatic which is awesome and as you said it does add a couple of extra features on top of postgres but their cloud offering are much more expensive than buying a generic postgres instance from AWS. A generic t3.medium on amazon costs ~90 dollars per month in singapore and ~140 per month from timescale.

Is it really worth the extra expense?

On Tue, Sep 7, 2021 at 4:06 PM Brent Wood <Brent.Wood@niwa.co.nz> wrote:

Hi Tim,

I've had good success with TimescaleDB for large timesries databases (40b readings).
https://www.timescale.com/

You turn your timestamp table into a Timescale hypertable and it looks after the indexing and partitioning automatically, with the table accessed like a normal postgres table, but very quickly.

It also adds some SQL functions to add a bit of time based query functionality.

Cheers

Brent Wood

Principal Technician, Fisheries
NIWA
DDI: +64 (4) 3860529

From: Tim Uckun <timuckun@gmail.com>
Sent: Tuesday, September 7, 2021 15:44
To: pgsql-general <pgsql-general@postgresql.org>
Subject: Choosing an index on partitioned tables.

I have a series of tables which are going to be queries mostly on two
columns. A timestamp table and a metric type column.

My plan is to partition by date ranges which means the primary key has
to include the timestamp column and the id column As far as I know
there is no way to specify an index type for those columns.

The metric type is a text column and will not be very selective. It
will have somewhere around 200 types of metrics and they will all be
short, less than ten characters.

Given that there will be a lot of records I was wondering what type of
index would be ideal for that column. Seems like hash indexes would be
ideal because only comparison will be = and they are smaller than
Btrees but for a while they were not recommended.

Would hash be the best or would something work better?

Brent Wood
Principal Technician - GIS and Spatial Data Management
Programme Leader - Environmental Information Delivery
+64-4-386-0529

National Institute of Water & Atmospheric Research Ltd (NIWA)
301 Evans Bay Parade Hataitai Wellington New Zealand
Connect with NIWA: niwa.co.nz Facebook LinkedIn Twitter Instagram
To ensure compliance with legal requirements and to maintain cyber security standards, NIWA's IT systems are subject to ongoing monitoring, activity logging and auditing. This monitoring and auditing service may be provided by third parties. Such third parties can access information transmitted to, processed by and stored on NIWA's IT systems

Re: Choosing an index on partitioned tables.

From

Laurenz Albe

Date:

07 September 2021, 07:21:24

On Tue, 2021-09-07 at 15:44 +1200, Tim Uckun wrote:
> I have a series of tables which are going to be queries mostly on two
> columns. A timestamp table and a metric type column.
> 
> My plan is to partition by date ranges which means the primary key has
> to include the timestamp column and the id column  As far as I know
> there is no way to specify an index type for those columns.
> 
> The metric type is a text column and will not be very selective. It
> will have somewhere around 200 types of metrics and they will all be
> short, less than ten characters.
> 
> Given that there will be a lot of records I was wondering what type of
> index would be ideal for that column. Seems like hash indexes would be
> ideal because only comparison will be = and they are smaller than
> Btrees but for a while they were not recommended.
> 
> Would hash be the best or would something work better?

If you don't need to speed up searches by "id", you could define
the primary key on (timestamp_col, id), which can be used to speed
up searches by the timestamp column without defining an extra index.

I would choose a B-tree index for the metrics column.
With the B-tree deduplication feature added in v13, the index will
be small, and I doubt that hash indexes would perform much better.

If there is a dominant value, you could consider a partial index
that excludes that value.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com

Re: Choosing an index on partitioned tables.

From

Laurenz Albe

Date:

07 September 2021, 07:24:21

On Tue, 2021-09-07 at 04:06 +0000, Brent Wood wrote:
> From: Tim Uckun <timuckun@gmail.com>
> > My plan is to partition by date ranges which means the primary key has
> > to include the timestamp column and the id column  As far as I know
> > there is no way to specify an index type for those columns.
> >
> > The metric type is a text column and will not be very selective. It
> > will have somewhere around 200 types of metrics and they will all be
> > short, less than ten characters.
> >
> > Given that there will be a lot of records I was wondering what type of
> > index would be ideal for that column. Seems like hash indexes would be
> > ideal because only comparison will be = and they are smaller than
> > Btrees but for a while they were not recommended.
> >
> > Would hash be the best or would something work better?
>
> I've had good success with TimescaleDB for large timesries databases (40b readings).

That has nothing to do with indexing, and I would think twice to install
an invasive extension like that and add a dependency on third-party code,
just because I want to partition a table.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com

Re: Choosing an index on partitioned tables.

From

Tim Uckun

Date:

07 September 2021, 08:46:38

Thanks!

That's great about the Btree deduplication feature in 13.


On Tue, Sep 7, 2021 at 7:21 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> On Tue, 2021-09-07 at 15:44 +1200, Tim Uckun wrote:
> > I have a series of tables which are going to be queries mostly on two
> > columns. A timestamp table and a metric type column.
> >
> > My plan is to partition by date ranges which means the primary key has
> > to include the timestamp column and the id column  As far as I know
> > there is no way to specify an index type for those columns.
> >
> > The metric type is a text column and will not be very selective. It
> > will have somewhere around 200 types of metrics and they will all be
> > short, less than ten characters.
> >
> > Given that there will be a lot of records I was wondering what type of
> > index would be ideal for that column. Seems like hash indexes would be
> > ideal because only comparison will be = and they are smaller than
> > Btrees but for a while they were not recommended.
> >
> > Would hash be the best or would something work better?
>
> If you don't need to speed up searches by "id", you could define
> the primary key on (timestamp_col, id), which can be used to speed
> up searches by the timestamp column without defining an extra index.
>
> I would choose a B-tree index for the metrics column.
> With the B-tree deduplication feature added in v13, the index will
> be small, and I doubt that hash indexes would perform much better.
>
> If there is a dominant value, you could consider a partial index
> that excludes that value.
>
> Yours,
> Laurenz Albe
> --
> Cybertec | https://www.cybertec-postgresql.com
>

Re: Choosing an index on partitioned tables.

From

Tim Uckun

Date:

07 September 2021, 08:51:00

To be fair Timescale also adds some other features which might be
useful.  For example they add some SQL enhancements like last value
and auto maintaining materialized views and such. The automatic
management of partitions is also pretty big plus in my opinion.  You
can get some of the equivalent functionality by writing stored procs
and deploying pg_cron but it's nice to have those things built in.

It's open source so you can just deploy their docker image which I did
for development but in the end I wanted to try and do the same thing
in plain jane postgres.

On Tue, Sep 7, 2021 at 7:24 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> On Tue, 2021-09-07 at 04:06 +0000, Brent Wood wrote:
> > From: Tim Uckun <timuckun@gmail.com>
> > > My plan is to partition by date ranges which means the primary key has
> > > to include the timestamp column and the id column  As far as I know
> > > there is no way to specify an index type for those columns.
> > >
> > > The metric type is a text column and will not be very selective. It
> > > will have somewhere around 200 types of metrics and they will all be
> > > short, less than ten characters.
> > >
> > > Given that there will be a lot of records I was wondering what type of
> > > index would be ideal for that column. Seems like hash indexes would be
> > > ideal because only comparison will be = and they are smaller than
> > > Btrees but for a while they were not recommended.
> > >
> > > Would hash be the best or would something work better?
> >
> > I've had good success with TimescaleDB for large timesries databases (40b readings).
>
> That has nothing to do with indexing, and I would think twice to install
> an invasive extension like that and add a dependency on third-party code,
> just because I want to partition a table.
>
> Yours,
> Laurenz Albe
> --
> Cybertec | https://www.cybertec-postgresql.com
>

Re: Choosing an index on partitioned tables.

From

Magnus Hagander

Date:

07 September 2021, 09:16:03

On Tue, Sep 7, 2021 at 10:51 AM Tim Uckun <timuckun@gmail.com> wrote:
>
> To be fair Timescale also adds some other features which might be
> useful.  For example they add some SQL enhancements like last value
> and auto maintaining materialized views and such. The automatic
> management of partitions is also pretty big plus in my opinion.  You
> can get some of the equivalent functionality by writing stored procs
> and deploying pg_cron but it's nice to have those things built in.

If you want automatic partition management, look at pg_partman. No
need to write your own procs and deploy with cron.

And FWIW, in reference to the discussions about AWS, it is supported on RDS.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/

Re: Choosing an index on partitioned tables.

From

Tim Uckun

Date:

07 September 2021, 09:52:32

It's weird that it's supported on AWS and many other providers but not
in the official docker images.

On Tue, Sep 7, 2021 at 9:16 PM Magnus Hagander <magnus@hagander.net> wrote:
>
> On Tue, Sep 7, 2021 at 10:51 AM Tim Uckun <timuckun@gmail.com> wrote:
> >
> > To be fair Timescale also adds some other features which might be
> > useful.  For example they add some SQL enhancements like last value
> > and auto maintaining materialized views and such. The automatic
> > management of partitions is also pretty big plus in my opinion.  You
> > can get some of the equivalent functionality by writing stored procs
> > and deploying pg_cron but it's nice to have those things built in.
>
> If you want automatic partition management, look at pg_partman. No
> need to write your own procs and deploy with cron.
>
> And FWIW, in reference to the discussions about AWS, it is supported on RDS.
>
> --
>  Magnus Hagander
>  Me: https://www.hagander.net/
>  Work: https://www.redpill-linpro.com/

Re: Choosing an index on partitioned tables.

From

Magnus Hagander

Date:

07 September 2021, 09:59:44

On Tue, Sep 7, 2021 at 11:52 AM Tim Uckun <timuckun@gmail.com> wrote:
>
> It's weird that it's supported on AWS and many other providers but not
> in the official docker images.

That'd be something to talk to the docker people about I guess? There
are no official docker images published by *postgresql*. (And of
course, AWS or Azure or whomever do whatever they want, but I assume
they're including pg_partman because it's a very popular extension)

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/

Re: Choosing an index on partitioned tables.

From

Tim Uckun

Date:

07 September 2021, 10:15:35

There is an image marked as official:  https://hub.docker.com/_/postgres

I presumed this was maintained by the postgres team.


On Tue, Sep 7, 2021 at 9:59 PM Magnus Hagander <magnus@hagander.net> wrote:
>
> On Tue, Sep 7, 2021 at 11:52 AM Tim Uckun <timuckun@gmail.com> wrote:
> >
> > It's weird that it's supported on AWS and many other providers but not
> > in the official docker images.
>
> That'd be something to talk to the docker people about I guess? There
> are no official docker images published by *postgresql*. (And of
> course, AWS or Azure or whomever do whatever they want, but I assume
> they're including pg_partman because it's a very popular extension)
>
> --
>  Magnus Hagander
>  Me: https://www.hagander.net/
>  Work: https://www.redpill-linpro.com/

Re: Choosing an index on partitioned tables.

From

Magnus Hagander

Date:

07 September 2021, 10:28:09

On Tue, Sep 7, 2021 at 12:15 PM Tim Uckun <timuckun@gmail.com> wrote:
>
> There is an image marked as official:  https://hub.docker.com/_/postgres
>
> I presumed this was maintained by the postgres team.

It is official *docker*, just not official *postgresql*. If you click
their "maintained by" link you get to
https://github.com/docker-library/postgres which clearly explains what
"type of official" it is.

That said, as long as you use the debian based version of their
container, it should be trivial to add any extension that's supported
on Debian, which definitely includes pg_partman.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/