Thread: UUIDs & Clustered Indexes
I'm trying to decide on which UUID generator to use for my Postgres database, and I've narrowed it down to gen_random & uuid_generate_v1mc.

There's a fascinating article that discusses performance implications between gen_random_uuid & uuid_generate_v1mc:
https://www.starkandwayne.com/blog/uuid-primary-keys-in-postgresql/

TL;DR, the article suggests: "Random produces very fragmented inserts that destroy tables. Use uuid_generate_v1mc() [instead].."

However, according to a message on this mailing list, Postgres doesn't have clustered indexes:
"But Postgres doesn't _have_ clustered indexes, so that article doesn't apply at all. The other authors appear to have missed this important point."
https://www.postgresql.org/message-id/56798352.7060902%40uchicago.edu

But, doing a quick check, it appears Postgres does indeed have a mechanism for a clustered index:
https://www.postgresql.org/docs/9.5/static/sql-cluster.html

So, does anyone know which is best? Or are the performance differences so minute they'd only matter in extremely unique circumstances?

Regards,
Luke Gordon

PS I attempted to reply to the above thread, but downloading the "raw" link resulted in a basic auth challenge. If I entered my mailing list credentials, I'd just get rechallenged with basic auth...
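For concreteness, the two generators under discussion come from different contrib extensions; a minimal sketch of how they'd be used (the table and column names here are only placeholders):

-- gen_random_uuid() ships with the pgcrypto extension and returns a fully
-- random (version 4) UUID.
CREATE EXTENSION IF NOT EXISTS pgcrypto;

-- uuid_generate_v1mc() ships with the uuid-ossp extension and returns a
-- time-based (version 1) UUID using a random multicast MAC address.
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

-- Two placeholder tables, one per generator, for comparing insert behaviour:
CREATE TABLE demo_random (
    id      uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    payload text
);

CREATE TABLE demo_v1mc (
    id      uuid PRIMARY KEY DEFAULT uuid_generate_v1mc(),
    payload text
);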

Luke Gordon <gordysc@gmail.com> writes:
> However, according to a message on this mailing list, Postgres doesn't have
> clustered indexes:
> "But Postgres doesn't _have_ clustered indexes, so that article doesn't
> apply at all. The other authors appear to have missed this important point."
> https://www.postgresql.org/message-id/56798352.7060902%40uchicago.edu
> But, doing a quick check, it appears Postgres does indeed have a mechanism
> for a clustered index:
> https://www.postgresql.org/docs/9.5/static/sql-cluster.html
CLUSTER just does a one-time sort to put the table into index order.
There is no mechanism that would cause subsequent insertions of new keys
to respect that ordering, so it's pretty much irrelevant to the argument
about whether new UUID keys need to be generated in some ordered fashion.
Do you actually *need* UUID keys, and if so why? A plain old bigint
column is smaller, cheaper to index, and the natural mechanism for
generating it (ie a sequence) will tend to preserve ordering for free.
regards, tom lane
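A minimal sketch of the plain-bigint alternative described above (table and column names are placeholders):

-- bigserial is shorthand for a bigint column fed by a sequence; successive
-- inserts get increasing ids, so new index entries land at the right-hand
-- edge of the btree rather than in random leaf pages.
CREATE TABLE accounts (
    id   bigserial PRIMARY KEY,
    name text NOT NULL
);

-- 8-byte keys instead of 16-byte uuids, and the id index stays densely packed.
INSERT INTO accounts (name) VALUES ('alice'), ('bob') RETURNING id;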
Luke Gordon <gordysc@gmail.com> writes:
> PS I attempted to reply to the above thread, but downloading the "raw" link
> resulted in a basic auth challenge. If I entered my mailing list
> credentials, I'd just get rechallenged with basic auth...

BTW, so far as that goes: it's just a weak anti-bot measure. If you try it
in Firefox you'll get an explanation:

    A username and password are being requested by https://www.postgresql.org.
    The site says: "Please authenticate with user archives and password antispam"

Unfortunately, we've found that most other browsers don't present that
message :-(

regards, tom lane
Luke:

On Tue, Aug 30, 2016 at 3:59 PM, Luke Gordon <gordysc@gmail.com> wrote:
> I'm trying to decide on which UUID generator to use for my Postgres
> database, and I've narrowed it down to gen_random & uuid_generate_v1mc.
>
> There's a fascinating article that discusses performance implications
> between gen_random_uuid & uuid_generate_v1mc:
> https://www.starkandwayne.com/blog/uuid-primary-keys-in-postgresql/
> TL;DR, the article suggests:
> "Random produces very fragmented inserts that destroy tables. Use
> uuid_generate_v1mc() [instead].."

He probably means destroy indexes, not tables; tables are not index-ordered
in Postgres (unlike clustered tables in some other products).

> However, according to a message on this mailing list, Postgres doesn't have
> clustered indexes:
> "But Postgres doesn't _have_ clustered indexes, so that article doesn't
> apply at all. The other authors appear to have missed this important point."
> https://www.postgresql.org/message-id/56798352.7060902%40uchicago.edu

No, he hasn't missed it entirely. If you are generating them to use as a kind
of primary key, like a serial, having them timestamp-based means they are
generated in ascending order, which means you append at the end of the table
and of the index, so both end up better filled and perform better. Especially
in the index: ordered insertions tend to be well optimized, being a common
case, and perform quite well. It's not going to be as advantageous as on a
clustered-table system, but it will help (though not that much).

> But, doing a quick check, it appears Postgres does indeed have a mechanism
> for a clustered index:
> https://www.postgresql.org/docs/9.5/static/sql-cluster.html
> So, does anyone know which is best? Or are the performance differences so
> minute they'd only matter in extremely unique circumstances?

CLUSTER just re-sorts the table and rebuilds the index, as has already been
pointed out in other messages. It's equivalent to: create temp table tt as
select * from the_table, truncate the_table, insert into the_table select *
from tt order by index_expression, drop table tt. It is nice to do for tables
that are normally ordered but have somehow lost it, like a log table with an
indexed insertion-timestamp column that gets updated a lot or has many old
records purged. As you would typically query such a table with a range
condition on the timestamp, a CLUSTER will help.

Francisco Olarte.
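Spelled out as SQL, that equivalence looks roughly like this (table, column, and index names are placeholders; the real CLUSTER command also rebuilds every index on the table and takes an ACCESS EXCLUSIVE lock while it runs):

-- One-time rewrite of the table in index order:
CLUSTER the_table USING the_table_created_at_idx;

-- Roughly the same effect done by hand:
CREATE TEMP TABLE tt AS SELECT * FROM the_table;
TRUNCATE the_table;
INSERT INTO the_table SELECT * FROM tt ORDER BY created_at;
DROP TABLE tt;

-- Either way it is a one-off sort; later inserts do not maintain the order.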
From: Tom Lane Sent: Tuesday, August 30, 2016 7:16 AM
Luke Gordon <gordysc@gmail.com> writes:
> However, according to a message on this mailing list, Postgres doesn't
> have clustered indexes:
> "But Postgres doesn't _have_ clustered indexes, so that article
> doesn't apply at all. The other authors appear to have missed this important point."
> https://www.postgresql.org/message-id/56798352.7060902%40uchicago.edu
> But, doing a quick check, it appears Postgres does indeed have a
> mechanism for a clustered index:
> https://www.postgresql.org/docs/9.5/static/sql-cluster.html
CLUSTER just does a one-time sort to put the table into index order.
There is no mechanism that would cause subsequent insertions of new keys to respect that ordering, so it's pretty much irrelevant to the argument about whether new UUID keys need to be generated in some ordered fashion.
Do you actually *need* UUID keys, and if so why? A plain old bigint column is smaller, cheaper to index, and the natural mechanism for generating it (ie a sequence) will tend to preserve ordering for free.
regards, tom lane
I agree with Tom for a “normal” application – I would always use bigints (bigserial) as a PK column. The app I’m currently working on is a high-security web app for which the app coders require guids for all identifiers flowing around the system. So in this scenario, I’m using BOTH bigserials as the PK and uuids as AKs in the core tables. I reference the bigints for all joins and (have to) use the uuids for the filters. It’s been working ok so far: lookup performance on a table with a few million rows, using the (indexed) uuid, is instantaneous. I’ll soon have 100 million+ rows loaded into a single table and will know a bit more.
The uuids are also design insurance for me in case I need to shard, since I’ll need/want that uniqueness across servers.
Mike Sofen
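The two-key pattern described above looks roughly like this (a sketch only; the names are invented, and the uuid default is left to whichever generator the application settles on):

CREATE TABLE core_entity (
    id        bigserial PRIMARY KEY,   -- surrogate key, used for all joins
    public_id uuid NOT NULL UNIQUE,    -- alternate key handed to the app, used in filters
    payload   text
);

-- Child tables reference the cheap 8-byte key...
CREATE TABLE core_entity_detail (
    id             bigserial PRIMARY KEY,
    core_entity_id bigint NOT NULL REFERENCES core_entity (id),
    detail         text
);

-- ...while requests arriving from the application filter on the uuid:
--   SELECT * FROM core_entity WHERE public_id = $1;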
On Tue, 30 Aug 2016 09:40:33 -0700, "Mike Sofen" <msofen@runbox.com> wrote:

>From: Tom Lane Sent: Tuesday, August 30, 2016 7:16 AM
>
>>Do you actually *need* UUID keys, and if so why? A plain old bigint column
>>is smaller, cheaper to index, and the natural mechanism for generating it
>>(ie a sequence) will tend to preserve ordering for free.
>
>I agree with Tom for a "normal" application - I would always use bigints
>(bigserial) as a PK column. The app I'm currently working on is a high
>security web app for which the app coders require guids for all identifiers
>flowing around the system. So in this scenario, I'm using BOTH bigserials
>as the PK and uuids as AKs in the core tables. I reference the bigints for
>all joins and (have to) use the uuids for the filters. It's been working ok
>so far, lookup performance on a table with a few million rows, using the
>uuid (indexed) is instantaneous. I'll soon have 100 million+ rows loaded
>into a single table and know a bit more.
>
>The uuids are also design insurance for me in case I need to shard, since
>I'll need/want that uniqueness across servers.

FYI: articles about sharding using bigint keys.

http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram
http://rob.conery.io/2014/05/29/a-better-id-generator-for-postgresql/

George
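For reference, the scheme in the first link packs a timestamp, a shard id, and a per-shard sequence into a single bigint; a rough sketch of that kind of generator (the epoch, sequence name, and shard id below are only illustrative):

CREATE SEQUENCE global_id_seq;

CREATE OR REPLACE FUNCTION next_id(OUT result bigint) AS $$
DECLARE
    our_epoch  bigint := 1464825600000;   -- illustrative custom epoch, in milliseconds
    seq_id     bigint;
    now_millis bigint;
    shard_id   int    := 1;               -- must be unique per shard/database
BEGIN
    SELECT nextval('global_id_seq') % 1024 INTO seq_id;
    SELECT FLOOR(EXTRACT(EPOCH FROM clock_timestamp()) * 1000) INTO now_millis;
    -- 41 bits of milliseconds since the epoch, 13 bits of shard id, 10 bits of sequence
    result := (now_millis - our_epoch) << 23;
    result := result | (shard_id << 10);
    result := result | (seq_id);
END;
$$ LANGUAGE plpgsql;

-- e.g.  CREATE TABLE events (id bigint PRIMARY KEY DEFAULT next_id(), ...);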
From: George Neuner Sent: Tuesday, August 30, 2016 5:54 PM
>Mike Sofen wrote:
>So in this scenario, I'm using
>BOTH bigserials as the PK and uuids as AKs in the core tables. I
>reference the bigints for all joins and (have to) use the uuids for the
>filters. It's been working ok so far, lookup performance on a table
>with a few million rows, using the uuid (indexed) is instantaneous.
>I'll soon have a 100 million+ rows loaded into a single table and know a bit more.
>
>The uuids are also design insurance for me in case I need to shard,
>since I'll need/want that uniqueness across servers.
FYI: articles about sharding using bigint keys.
http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram
http://rob.conery.io/2014/05/29/a-better-id-generator-for-postgresql/
George
I remember reading these articles a long time ago, forgot about them...and appreciate the reminder!
I really liked the enhanced Instagram function from Rob Conery in the second link, but so far haven’t needed to deal with it. However, an upcoming project may require huge data storage – approaching hundreds of billions of rows, and I’m sticking with Postgres – so this will be a great way to test drive the function. And I may try my hand at a further enhancement, time permitting. Thanks for the links!
Mike