Re: [HACKERS] Can ICU be used for a database's default sort order? - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: [HACKERS] Can ICU be used for a database's default sort order? |
Date | |
Msg-id | CAEepm=11rd6jh=oXn+WgumHT_DExE7PkYH4wfViOdUr8n1LEQA@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Can ICU be used for a database's default sort order? (Peter Geoghegan <pg@bowt.ie>) |
Responses |
Re: [HACKERS] Can ICU be used for a database's default sort order?
|
List | pgsql-hackers |
On Sat, Jun 24, 2017 at 10:55 AM Peter Geoghegan <pg@bowt.ie> wrote: > On Fri, Jun 23, 2017 at 11:32 AM, Peter Eisentraut > <peter.eisentraut@2ndquadrant.com> wrote: > > It's something I hope to address soon. > > I hope you do. I think that we'd realize significant benefits by > having ICU become the defacto standard collation provider, that most > users get without even realizing it. As things stand, you have to make > a point of specifying an ICU collation as your per-column collation > within every CREATE TABLE. That's a significant barrier to adoption. > > > 1) Associate by name only. That is, you can create a database with any > > COLLATION "foo" that you want, and it's only checked when you first > > connect to or do anything in the database. > > > > 2) Create shared collations. Then we'd need a way to manage having a > > mix of shared and non-shared collations around. > > > > There are significant pros and cons to all of these ideas. Some people > > I talked to appeared to prefer the shared collations approach. > > I strongly prefer the second approach. The only downside that occurs > to me is that that approach requires more code. Is there something > that I've missed? Sorry to join this thread late. I was redirected here from another one[1]. I like the shared catalog idea, but here's one objection I thought about: it makes it a bit harder to track whether you've sorted out all your indexes after a version change. Say collation fr_CA's version changes according to the provider, so that it no longer matches the stored collversion. Now you'll need to be careful to connect to every database in the cluster and run REINDEX, before you run ALTER COLLATION "fr_CA" REFRESH VERSION to update the single shared pg_collation row's collversion. With the non-shared pg_collation scheme we have currently, you'd need to refresh the collation row in each database after reindexing the whole database, which is IMHO a bit nicer (you track which databases you've dealt with as you go through them). In other words, using a shared catalog moves the "scope" of the version tracking even further away from the ideal scope, and requires humans to actually get the cleanup right, and it's extra confusing because you can only be connected to one database at a time so there is no "REINDEX MY CLUSTER" and no possibility of making a command that reindexes dependent indexes and then refreshes the collation version. The ideal scope would be to track all referenced collation versions on every index, and only update them at CREATE INDEX or REINDEX time (also, as discussed in some other thread, CHECK constraints and partition keys might be invalidated and should in theory also carry versions that can only be updated by running a hypothetical RECHECK or REPARTITION command). Then a shared pg_collation catalog would make perfect sense, and there would be no need for it to have a collversion column at all, or an ALTER COLLATION ... REFRESH VERSION command, and therefore there would be no way to screw it up by REFRESHing the VERSION without having really fixed the problem. [1] https://www.postgresql.org/message-id/242e081c-aec8-a20a-510c-f4d0f183cebd%402ndquadrant.com -- Thomas Munro http://www.enterprisedb.com
pgsql-hackers by date: