Thread: Choosing an index on partitioned tables.
I have a series of tables which are going to be queried mostly on two columns: a timestamp column and a metric type column. My plan is to partition by date ranges, which means the primary key has to include the timestamp column and the id column. As far as I know there is no way to specify an index type for those columns. The metric type is a text column and will not be very selective. It will have somewhere around 200 types of metrics and they will all be short, less than ten characters. Given that there will be a lot of records, I was wondering what type of index would be ideal for that column. Seems like hash indexes would be ideal because the only comparison will be = and they are smaller than B-trees, but for a while they were not recommended. Would hash be the best or would something work better?
Hi Tim,
I've had good success with TimescaleDB for large time-series databases (40b readings).
You turn your timestamp table into a Timescale hypertable and it looks after the indexing and partitioning automatically, with the table accessed like a normal postgres table, but very quickly.
It also adds some SQL functions to add a bit of time based query functionality.
Cheers
Brent Wood
Principal Technician, Fisheries
NIWA
DDI: +64 (4) 3860529
From: Tim Uckun <timuckun@gmail.com>
Sent: Tuesday, September 7, 2021 15:44
To: pgsql-general <pgsql-general@postgresql.org>
Subject: Choosing an index on partitioned tables.
I have a series of tables which are going to be queried mostly on two
columns: a timestamp column and a metric type column.
My plan is to partition by date ranges, which means the primary key has
to include the timestamp column and the id column. As far as I know
there is no way to specify an index type for those columns.
The metric type is a text column and will not be very selective. It
will have somewhere around 200 types of metrics and they will all be
short, less than ten characters.
Given that there will be a lot of records, I was wondering what type of
index would be ideal for that column. Seems like hash indexes would be
ideal because the only comparison will be = and they are smaller than
B-trees, but for a while they were not recommended.
Would hash be the best or would something work better?
Hi Brent.
Is it really worth the extra expense?
On Tue, Sep 7, 2021 at 4:06 PM Brent Wood <Brent.Wood@niwa.co.nz> wrote:
> Hi Tim,
> I've had good success with TimescaleDB for large time-series databases (40b readings).
> You turn your timestamp table into a Timescale hypertable and it looks after the indexing and partitioning automatically, with the table accessed like a normal postgres table, but very quickly.
> It also adds some SQL functions to add a bit of time based query functionality.
> Cheers
> Brent Wood
On Tue, 2021-09-07 at 15:44 +1200, Tim Uckun wrote:
> I have a series of tables which are going to be queried mostly on two
> columns: a timestamp column and a metric type column.
>
> My plan is to partition by date ranges, which means the primary key has
> to include the timestamp column and the id column. As far as I know
> there is no way to specify an index type for those columns.
>
> The metric type is a text column and will not be very selective. It
> will have somewhere around 200 types of metrics and they will all be
> short, less than ten characters.
>
> Given that there will be a lot of records, I was wondering what type of
> index would be ideal for that column. Seems like hash indexes would be
> ideal because the only comparison will be = and they are smaller than
> B-trees, but for a while they were not recommended.
>
> Would hash be the best or would something work better?
If you don't need to speed up searches by "id", you could define
the primary key on (timestamp_col, id), which can be used to speed
up searches by the timestamp column without defining an extra index.
I would choose a B-tree index for the metrics column.
With the B-tree deduplication feature added in v13, the index will
be small, and I doubt that hash indexes would perform much better.
If there is a dominant value, you could consider a partial index
that excludes that value.
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
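The layout suggested in this reply might look roughly like the sketch below. It assumes PostgreSQL 13 or later; the table name `metrics`, the columns `id`, `recorded_at`, `metric_type` and `value`, and the dominant value 'cpu' are all hypothetical and do not come from the thread.

```sql
-- Hypothetical schema: every name here is illustrative only.
CREATE TABLE metrics (
    id          bigint           NOT NULL,
    recorded_at timestamptz      NOT NULL,
    metric_type text             NOT NULL,
    value       double precision,
    -- The primary key must contain the partition key. Putting the
    -- timestamp first lets the same index serve time-range searches.
    PRIMARY KEY (recorded_at, id)
) PARTITION BY RANGE (recorded_at);

-- One range partition per month (the granularity is an assumption).
CREATE TABLE metrics_2021_09 PARTITION OF metrics
    FOR VALUES FROM ('2021-09-01') TO ('2021-10-01');

-- Plain B-tree index on the low-cardinality column. Created on the
-- parent, it is cascaded to existing and future partitions.
CREATE INDEX metrics_metric_type_idx ON metrics (metric_type);

-- Possible alternative to the index above if one value dominates:
-- a partial index that leaves that value out ('cpu' is made up).
CREATE INDEX metrics_metric_type_partial_idx ON metrics (metric_type)
    WHERE metric_type <> 'cpu';
```

Whether the full or the partial index is the better fit depends on the real value distribution, so this is only a starting point to test against actual data.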
On Tue, 2021-09-07 at 04:06 +0000, Brent Wood wrote:
> I've had good success with TimescaleDB for large time-series databases (40b readings).
That has nothing to do with indexing, and I would think twice before
installing an invasive extension like that and adding a dependency on
third-party code, just because I want to partition a table.
Yours,
Laurenz Albe
--
Cybertec | https://www.cybertec-postgresql.com
Thanks! That's great about the B-tree deduplication feature in 13.
On Tue, Sep 7, 2021 at 7:21 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> If you don't need to speed up searches by "id", you could define
> the primary key on (timestamp_col, id), which can be used to speed
> up searches by the timestamp column without defining an extra index.
>
> I would choose a B-tree index for the metrics column.
> With the B-tree deduplication feature added in v13, the index will
> be small, and I doubt that hash indexes would perform much better.
>
> If there is a dominant value, you could consider a partial index
> that excludes that value.
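One way to check the size claim on your own data is to build both index types on a sample and compare them. The sketch below assumes PostgreSQL 13 or later and uses a made-up temporary table, `metrics_sample`, filled with synthetic rows spread over 200 metric types; real data may behave differently.

```sql
-- Synthetic sample: one million rows, 200 distinct short metric names.
-- Table and column names are hypothetical.
CREATE TEMP TABLE metrics_sample AS
SELECT g                                  AS id,
       now() - g * interval '1 second'    AS recorded_at,
       'm' || (g % 200)::text             AS metric_type
FROM generate_series(1, 1000000) AS g;

-- Build one index of each type on the same column.
CREATE INDEX metrics_sample_btree ON metrics_sample USING btree (metric_type);
CREATE INDEX metrics_sample_hash  ON metrics_sample USING hash  (metric_type);

-- Compare the on-disk size of the two indexes.
SELECT relname,
       pg_size_pretty(pg_relation_size(oid)) AS index_size
FROM pg_class
WHERE relname IN ('metrics_sample_btree', 'metrics_sample_hash')
ORDER BY relname;
```

Running `EXPLAIN (ANALYZE, BUFFERS)` on a typical equality query with each index in place would show whether the lookup cost differs enough to matter.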
To be fair, Timescale also adds some other features which might be
useful. For example, they add some SQL enhancements like last value
and auto-maintaining materialized views and such. The automatic
management of partitions is also a pretty big plus in my opinion. You
can get some of the equivalent functionality by writing stored procs
and deploying pg_cron, but it's nice to have those things built in.
It's open source so you can just deploy their docker image, which I did
for development, but in the end I wanted to try and do the same thing
in plain jane postgres.
On Tue, Sep 7, 2021 at 7:24 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
> That has nothing to do with indexing, and I would think twice to install
> an invasive extension like that and add a dependency on third-party code,
> just because I want to partition a table.
On Tue, Sep 7, 2021 at 10:51 AM Tim Uckun <timuckun@gmail.com> wrote:
> To be fair Timescale also adds some other features which might be
> useful. For example they add some SQL enhancements like last value
> and auto maintaining materialized views and such. The automatic
> management of partitions is also pretty big plus in my opinion. You
> can get some of the equivalent functionality by writing stored procs
> and deploying pg_cron but it's nice to have those things built in.
If you want automatic partition management, look at pg_partman. No
need to write your own procs and deploy with cron.
And FWIW, in reference to the discussions about AWS, it is supported on RDS.
--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
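For reference, handing a natively partitioned table over to pg_partman might look like the sketch below. It assumes pg_partman 4.x (current at the time of this thread) installed in a schema named `partman`; the table `public.metrics` and its columns are hypothetical. Later major versions of pg_partman changed the accepted values for `p_type` and `p_interval`, so check the documentation for the installed version before relying on this.

```sql
-- Assumption: pg_partman 4.x is available on the server.
CREATE SCHEMA IF NOT EXISTS partman;
CREATE EXTENSION IF NOT EXISTS pg_partman SCHEMA partman;

-- Hypothetical parent table, declared with native range partitioning.
CREATE TABLE public.metrics (
    id          bigint      NOT NULL,
    recorded_at timestamptz NOT NULL,
    metric_type text        NOT NULL,
    PRIMARY KEY (recorded_at, id)
) PARTITION BY RANGE (recorded_at);

-- Register the table so that daily partitions are created for it.
SELECT partman.create_parent(
    p_parent_table := 'public.metrics',
    p_control      := 'recorded_at',
    p_type         := 'native',
    p_interval     := 'daily'
);

-- Run periodically (for example from a scheduler or pg_partman's
-- background worker) to pre-create upcoming partitions.
SELECT partman.run_maintenance();
```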
It's weird that it's supported on AWS and many other providers but not
in the official docker images.
On Tue, Sep 7, 2021 at 9:16 PM Magnus Hagander <magnus@hagander.net> wrote:
> If you want automatic partition management, look at pg_partman. No
> need to write your own procs and deploy with cron.
>
> And FWIW, in reference to the discussions about AWS, it is supported on RDS.
On Tue, Sep 7, 2021 at 11:52 AM Tim Uckun <timuckun@gmail.com> wrote:
> It's weird that it's supported on AWS and many other providers but not
> in the official docker images.
That'd be something to talk to the docker people about I guess? There
are no official docker images published by *postgresql*. (And of
course, AWS or Azure or whomever do whatever they want, but I assume
they're including pg_partman because it's a very popular extension)
--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
There is an image marked as official: https://hub.docker.com/_/postgres
I presumed this was maintained by the postgres team.
On Tue, Sep 7, 2021 at 9:59 PM Magnus Hagander <magnus@hagander.net> wrote:
> That'd be something to talk to the docker people about I guess? There
> are no official docker images published by *postgresql*. (And of
> course, AWS or Azure or whomever do whatever they want, but I assume
> they're including pg_partman because it's a very popular extension)
On Tue, Sep 7, 2021 at 12:15 PM Tim Uckun <timuckun@gmail.com> wrote:
> There is an image marked as official: https://hub.docker.com/_/postgres
>
> I presumed this was maintained by the postgres team.
It is official *docker*, just not official *postgresql*. If you click
their "maintained by" link you get to
https://github.com/docker-library/postgres which clearly explains what
"type of official" it is.
That said, as long as you use the Debian-based version of their
container, it should be trivial to add any extension that's supported
on Debian, which definitely includes pg_partman.
--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/