Re: WIP patch: Collation support - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: WIP patch: Collation support |
Date | |
Msg-id | 48D3ABE8.3030802@enterprisedb.com Whole thread Raw |
In response to | Re: WIP patch: Collation support (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>) |
Responses |
Re: WIP patch: Collation support
Re: WIP patch: Collation support |
List | pgsql-hackers |
Here's an updated version of the stripped-down patch, now with documentation changes, plus a couple of minor bug fixes. I think this is looking pretty good now, and I would appreciate review from others before I go ahead committing this. Is there any more places in the documentation that needs updating, or any grammar mistakes? Further opinions on whether to allow using a database with different collation order as template, requiring a manual reindex and facing weird results if that's neglected, or to error out with a hint suggesting to use template0 as template (as is in the patch now)? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml index f484db8..1e1786a 100644 --- a/doc/src/sgml/charset.sgml +++ b/doc/src/sgml/charset.sgml @@ -130,23 +130,24 @@ initdb --locale=sv_SE <para> The nature of some locale categories is that their value has to be - fixed for the lifetime of a database cluster. That is, once - <command>initdb</command> has run, you cannot change them anymore. - <literal>LC_COLLATE</literal> and <literal>LC_CTYPE</literal> are - those categories. They affect the sort order of indexes, so they - must be kept fixed, or indexes on text columns will become corrupt. - <productname>PostgreSQL</productname> enforces this by recording - the values of <envar>LC_COLLATE</> and <envar>LC_CTYPE</> that are - seen by <command>initdb</>. The server automatically adopts - those two values when it is started. + fixed when the database is created. You can use different settings + for different databases, but once a database is created, you cannot + change them for that database anymore. <literal>LC_COLLATE</literal> + and <literal>LC_CTYPE</literal> are those categories. They affect + the sort order of indexes, so they must be kept fixed, or indexes on + text columns will become corrupt. The default values for these + categories are defined when <command>initdb</command> is run, and + those values are used when new databases are created, unless + explicitly specified otherwise in the <command>CREATE + DATABASE</command> command. </para> <para> The other locale categories can be changed as desired whenever the server is running by setting the run-time configuration variables that have the same name as the locale categories (see <xref - linkend="runtime-config-client-format"> for details). The defaults that are - chosen by <command>initdb</command> are actually only written into + linkend="runtime-config-client-format"> for details). The defaults + that are chosen by <command>initdb</command> are actually only written into the configuration file <filename>postgresql.conf</filename> to serve as defaults when the server is started. If you delete these assignments from <filename>postgresql.conf</filename> then the @@ -261,7 +262,7 @@ initdb --locale=sv_SE <para> Check that <productname>PostgreSQL</> is actually using the locale - that you think it is. <envar>LC_COLLATE</> and <envar>LC_CTYPE</> + that you think it is. The default <envar>LC_COLLATE</> and <envar>LC_CTYPE</> settings are determined at <command>initdb</> time and cannot be changed without repeating <command>initdb</>. Other locale settings including <envar>LC_MESSAGES</> and <envar>LC_MONETARY</> @@ -320,16 +321,10 @@ initdb --locale=sv_SE <para> An important restriction, however, is that each database character set - must be compatible with the server's <envar>LC_CTYPE</> setting. + must be compatible with the database's <envar>LC_CTYPE</> setting. When <envar>LC_CTYPE</> is <literal>C</> or <literal>POSIX</>, any character set is allowed, but for other settings of <envar>LC_CTYPE</> there is only one character set that will work correctly. - Since the <envar>LC_CTYPE</> setting is frozen by <command>initdb</>, the - apparent flexibility to use different encodings in different databases - of a cluster is more theoretical than real, except when you select - <literal>C</> or <literal>POSIX</> locale (thus disabling any real locale - awareness). It is likely that these mechanisms will be revisited in future - versions of <productname>PostgreSQL</productname>. </para> <sect2 id="multibyte-charset-supported"> @@ -734,19 +729,19 @@ initdb -E EUC_JP </para> <para> - If you have selected <literal>C</> or <literal>POSIX</> locale, - you can create a database with a different character set: + You can specify a non-default encoding at database creation time, + provided that the encoding is compatible with the selected locale: <screen> -createdb -E EUC_KR korean +createdb -E EUC_KR -T template0 --lc-collate=ko_KR.euckr --lc-ctype=ko_KR.euckr korean </screen> This will create a database named <literal>korean</literal> that - uses the character set <literal>EUC_KR</literal>. Another way to - accomplish this is to use this SQL command: + uses the character set <literal>EUC_KR</literal>, and locale <literal>ko_KR</literal>. + Another way to accomplish this is to use this SQL command: <programlisting> -CREATE DATABASE korean WITH ENCODING 'EUC_KR'; +CREATE DATABASE korean WITH ENCODING 'EUC_KR' COLLATE='ko_KR.euckr' CTYPE='ko_KR.euckr' TEMPLATE=template0; </programlisting> The encoding for a database is stored in the system catalog @@ -754,6 +749,8 @@ CREATE DATABASE korean WITH ENCODING 'EUC_KR'; <option>-l</option> option or the <command>\l</command> command of <command>psql</command>. +XXX It would be nice to show the "korean" database here. And we should +modify psql to show locale as well <screen> $ <userinput>psql -l</userinput> List of databases diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml index 2d4a7bf..f02a957 100644 --- a/doc/src/sgml/indices.sgml +++ b/doc/src/sgml/indices.sgml @@ -157,7 +157,7 @@ CREATE INDEX test1_id_index ON test1 (id); <emphasis>if</emphasis> the pattern is a constant and is anchored to the beginning of the string — for example, <literal>col LIKE 'foo%'</literal> or <literal>col ~ '^foo'</literal>, but not - <literal>col LIKE '%bar'</literal>. However, if your server does not + <literal>col LIKE '%bar'</literal>. However, if your database does not use the C locale you will need to create the index with a special operator class to support indexing of pattern-matching queries. See <xref linkend="indexes-opclass"> below. It is also possible to use @@ -922,7 +922,7 @@ CREATE INDEX <replaceable>name</replaceable> ON <replaceable>table</replaceable> according to the locale-specific collation rules. This makes these operator classes suitable for use by queries involving pattern matching expressions (<literal>LIKE</literal> or POSIX - regular expressions) when the server does not use the standard + regular expressions) when the database does not use the standard <quote>C</quote> locale. As an example, you might index a <type>varchar</type> column like this: <programlisting> diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml index 95350c4..ed048b0 100644 --- a/doc/src/sgml/ref/create_database.sgml +++ b/doc/src/sgml/ref/create_database.sgml @@ -24,6 +24,8 @@ CREATE DATABASE <replaceable class="PARAMETER">name</replaceable> [ [ WITH ] [ OWNER [=] <replaceable class="parameter">dbowner</replaceable> ] [ TEMPLATE [=] <replaceable class="parameter">template</replaceable> ] [ ENCODING [=] <replaceable class="parameter">encoding</replaceable> ] + [ COLLATE [=] <replaceable class="parameter">collation</replaceable> ] + [ CTYPE [=] <replaceable class="parameter">ctype</replaceable> ] [ TABLESPACE [=] <replaceable class="parameter">tablespace</replaceable> ] [ CONNECTION LIMIT [=] <replaceable class="parameter">connlimit</replaceable> ] ] </synopsis> @@ -113,6 +115,29 @@ CREATE DATABASE <replaceable class="PARAMETER">name</replaceable> </listitem> </varlistentry> <varlistentry> + <term><replaceable class="parameter">collation</replaceable></term> + <listitem> + <para> + Collation order (<literal>LC_COLLATE</>) to use in the new database. + This affects the sort order applied to strings, e.g in queries with + ORDER BY, as well as the order used in indexes on text columns. + The default is to use the collation order of the template database. + See below for additional restrictions. + </para> + </listitem> + </varlistentry> + <varlistentry> + <term><replaceable class="parameter">ctype</replaceable></term> + <listitem> + <para> + Character classification (<literal>LC_CTYPE</>) to use in the new + database. This affects the categorization of characters, e.g. lower, + upper and digit. The default is to use the character classification of + the template database. See below for additional restrictions. + </para> + </listitem> + </varlistentry> + <varlistentry> <term><replaceable class="parameter">tablespace</replaceable></term> <listitem> <para> @@ -181,12 +206,10 @@ CREATE DATABASE <replaceable class="PARAMETER">name</replaceable> <para> Any character set encoding specified for the new database must be - compatible with the server's <envar>LC_CTYPE</> locale setting. + compatible with the chosen COLLATE and CTYPE settings. If <envar>LC_CTYPE</> is <literal>C</> (or equivalently <literal>POSIX</>), then all encodings are allowed, but for other - locale settings there is only one encoding that will work properly, - and so the apparent freedom to specify an encoding is illusory if - you didn't initialize the database cluster in <literal>C</> locale. + locale settings there is only one encoding that will work properly. <command>CREATE DATABASE</> will allow superusers to specify <literal>SQL_ASCII</> encoding regardless of the locale setting, but this choice is deprecated and may result in misbehavior of @@ -195,6 +218,16 @@ CREATE DATABASE <replaceable class="PARAMETER">name</replaceable> </para> <para> + The <literal>COLLATE</> and <literal>CTYPE</> settings must match + those of the template database, except when template0 is used as + template. This is because <literal>COLLATE</> and <literal>CTYPE</> + affects the ordering in indexes, so that any indexes copied from the + template database would be invalid in the new database with different + settings. <literal>template0</literal>, however, is known to not + contain any indexes that would be affected. + </para> + + <para> The <literal>CONNECTION LIMIT</> option is only enforced approximately; if two new sessions start at about the same time when just one connection <quote>slot</> remains for the database, it is possible that diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml index 10a9d1c..b313b1f 100644 --- a/doc/src/sgml/ref/initdb.sgml +++ b/doc/src/sgml/ref/initdb.sgml @@ -76,25 +76,30 @@ PostgreSQL documentation <para> <command>initdb</command> initializes the database cluster's default - locale and character set encoding. The collation order - (<literal>LC_COLLATE</>) and character set classes - (<literal>LC_CTYPE</>, e.g. upper, lower, digit) are fixed for all - databases and cannot be changed. Collation orders other than - <literal>C</> or <literal>POSIX</> also have a performance penalty. - For these reasons it is important to choose the right locale when - running <command>initdb</command>. The remaining locale categories - can be changed later when the server is started. All server locale - values (<literal>lc_*</>) can be displayed via <command>SHOW ALL</>. + locale and character set encoding. + + The character set encoding, collation order (<literal>LC_COLLATE</>) + and character set classes (<literal>LC_CTYPE</>, e.g. upper, lower, + digit) can be set separately for a database when it is created. + <command>initdb</command> determines those settings for the + <literal>template1</literal> database, which will serve as the + default for all other databases. + + To alter the default collation order or character set classes, use the + <option>--lc-collate</option> and <option>--lc-ctype</option> options. + Collation orders other than <literal>C</> or <literal>POSIX</> also have + a performance penalty. For these reasons it is important to choose the + right locale when running <command>initdb</command>. + + The remaining locale categories can be changed later when the server + is started. You can also use <option>--locale</option> to set the + default for all locale categories, including collation order and + character set classes. All server locale values (<literal>lc_*</>) can + be displayed via <command>SHOW ALL</>. More details can be found in <xref linkend="locale">. - </para> - <para> - The character set encoding can be set separately for a database when - it is created. <command>initdb</command> determines the encoding for - the <literal>template1</literal> database, which will serve as the - default for all other databases. To alter the default encoding use - the <option>--encoding</option> option. More details can be found in - <xref linkend="multibyte">. + To alter the default encoding, use the <option>--encoding</option>. + More details can be found in <xref linkend="multibyte">. </para> </refsect1> diff --git a/doc/src/sgml/ref/pg_controldata.sgml b/doc/src/sgml/ref/pg_controldata.sgml index b379df0..7f50b7b 100644 --- a/doc/src/sgml/ref/pg_controldata.sgml +++ b/doc/src/sgml/ref/pg_controldata.sgml @@ -30,7 +30,7 @@ PostgreSQL documentation <title>Description</title> <para> <command>pg_controldata</command> prints information initialized during - <command>initdb</>, such as the catalog version and server locale. + <command>initdb</>, such as the catalog version. It also shows information about write-ahead logging and checkpoint processing. This information is cluster-wide, and not specific to any one database. diff --git a/doc/src/sgml/ref/pg_resetxlog.sgml b/doc/src/sgml/ref/pg_resetxlog.sgml index 1933455..f7331e4 100644 --- a/doc/src/sgml/ref/pg_resetxlog.sgml +++ b/doc/src/sgml/ref/pg_resetxlog.sgml @@ -62,12 +62,9 @@ PostgreSQL documentation by specifying the <literal>-f</> (force) switch. In this case plausible values will be substituted for the missing data. Most of the fields can be expected to match, but manual assistance might be needed for the next OID, - next transaction ID and epoch, next multitransaction ID and offset, - WAL starting address, and database locale fields. - The first six of these can be set using the switches discussed below. - <command>pg_resetxlog</command>'s own environment is the source for its - guess at the locale fields; take care that <envar>LANG</> and so forth - match the environment that <command>initdb</> was run in. + next transaction ID and epoch, next multitransaction ID and offset, and + WAL starting address fields. + The first five of these can be set using the switches discussed below. If you are not able to determine correct values for all these fields, <literal>-f</> can still be used, but the recovered database must be treated with even more suspicion than diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml index e463d15..a1be7cb 100644 --- a/doc/src/sgml/ref/select.sgml +++ b/doc/src/sgml/ref/select.sgml @@ -747,8 +747,7 @@ SELECT name FROM distributors ORDER BY code; <para> Character-string data is sorted according to the locale-specific - collation order that was established when the database cluster - was initialized. + collation order that was established when the database was created. </para> </refsect2> diff --git a/doc/src/sgml/ref/show.sgml b/doc/src/sgml/ref/show.sgml index 3d238c3..461a9c8 100644 --- a/doc/src/sgml/ref/show.sgml +++ b/doc/src/sgml/ref/show.sgml @@ -82,8 +82,8 @@ SHOW ALL <para> Shows the database's locale setting for collation (text ordering). At present, this parameter can be shown but not - set, because the setting is determined at - <command>initdb</> time. + set, because the setting is determined at database creation + time. </para> </listitem> </varlistentry> @@ -94,8 +94,8 @@ SHOW ALL <para> Shows the database's locale setting for character classification. At present, this parameter can be shown but - not set, because the setting is determined at - <command>initdb</> time. + not set, because the setting is determined at database creation + time. </para> </listitem> </varlistentry> diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml index 5ce6163..4b7f5eb 100644 --- a/doc/src/sgml/runtime.sgml +++ b/doc/src/sgml/runtime.sgml @@ -145,11 +145,12 @@ postgres$ <userinput>initdb -D /usr/local/pgsql/data</userinput> Normally, it will just take the locale settings in the environment and apply them to the initialized database. It is possible to specify a different locale for the database; more information about - that can be found in <xref linkend="locale">. The sort order used - within a particular database cluster is set by - <command>initdb</command> and cannot be changed later, short of - dumping all data, rerunning <command>initdb</command>, and reloading - the data. There is also a performance impact for using locales + that can be found in <xref linkend="locale">. The default sort order used + within the particular database cluster is set by + <command>initdb</command>, and while you can create new databases using + different sort order, the order used in the template databases that initdb + creates cannot be changed without dropping and recreating them. + There is also a performance impact for using locales other than <literal>C</> or <literal>POSIX</>. Therefore, it is important to make this choice correctly the first time. </para> diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml index e48f82b..98ac7f8 100644 --- a/doc/src/sgml/textsearch.sgml +++ b/doc/src/sgml/textsearch.sgml @@ -1896,7 +1896,7 @@ LIMIT 10; <note> <para> - The parser's notion of a <quote>letter</> is determined by the server's + The parser's notion of a <quote>letter</> is determined by the database's locale setting, specifically <varname>lc_ctype</>. Words containing only the basic ASCII letters are reported as a separate token type, since it is sometimes useful to distinguish them. In most European diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 5645271..edd1d74 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -3847,7 +3847,6 @@ WriteControlFile(void) { int fd; char buffer[PG_CONTROL_SIZE]; /* need not be aligned */ - char *localeptr; /* * Initialize version and compatibility-check fields @@ -3876,18 +3875,6 @@ WriteControlFile(void) ControlFile->float4ByVal = FLOAT4PASSBYVAL; ControlFile->float8ByVal = FLOAT8PASSBYVAL; - ControlFile->localeBuflen = LOCALE_NAME_BUFLEN; - localeptr = setlocale(LC_COLLATE, NULL); - if (!localeptr) - ereport(PANIC, - (errmsg("invalid LC_COLLATE setting"))); - StrNCpy(ControlFile->lc_collate, localeptr, LOCALE_NAME_BUFLEN); - localeptr = setlocale(LC_CTYPE, NULL); - if (!localeptr) - ereport(PANIC, - (errmsg("invalid LC_CTYPE setting"))); - StrNCpy(ControlFile->lc_ctype, localeptr, LOCALE_NAME_BUFLEN); - /* Contents are protected with a CRC */ INIT_CRC32(ControlFile->crc); COMP_CRC32(ControlFile->crc, @@ -4126,34 +4113,6 @@ ReadControlFile(void) " but the server was compiled without USE_FLOAT8_BYVAL."), errhint("It looks like you need to recompile or initdb."))); #endif - - if (ControlFile->localeBuflen != LOCALE_NAME_BUFLEN) - ereport(FATAL, - (errmsg("database files are incompatible with server"), - errdetail("The database cluster was initialized with LOCALE_NAME_BUFLEN %d," - " but the server was compiled with LOCALE_NAME_BUFLEN %d.", - ControlFile->localeBuflen, LOCALE_NAME_BUFLEN), - errhint("It looks like you need to recompile or initdb."))); - if (pg_perm_setlocale(LC_COLLATE, ControlFile->lc_collate) == NULL) - ereport(FATAL, - (errmsg("database files are incompatible with operating system"), - errdetail("The database cluster was initialized with LC_COLLATE \"%s\"," - " which is not recognized by setlocale().", - ControlFile->lc_collate), - errhint("It looks like you need to initdb or install locale support."))); - if (pg_perm_setlocale(LC_CTYPE, ControlFile->lc_ctype) == NULL) - ereport(FATAL, - (errmsg("database files are incompatible with operating system"), - errdetail("The database cluster was initialized with LC_CTYPE \"%s\"," - " which is not recognized by setlocale().", - ControlFile->lc_ctype), - errhint("It looks like you need to initdb or install locale support."))); - - /* Make the fixed locale settings visible as GUC variables, too */ - SetConfigOption("lc_collate", ControlFile->lc_collate, - PGC_INTERNAL, PGC_S_OVERRIDE); - SetConfigOption("lc_ctype", ControlFile->lc_ctype, - PGC_INTERNAL, PGC_S_OVERRIDE); } void diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c index 4dd6262..2d5e27b 100644 --- a/src/backend/commands/dbcommands.c +++ b/src/backend/commands/dbcommands.c @@ -53,6 +53,7 @@ #include "utils/fmgroids.h" #include "utils/guc.h" #include "utils/lsyscache.h" +#include "utils/pg_locale.h" #include "utils/syscache.h" #include "utils/tqual.h" @@ -69,7 +70,7 @@ static bool get_db_info(const char *name, LOCKMODE lockmode, Oid *dbIdP, Oid *ownerIdP, int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP, Oid *dbLastSysOidP, TransactionId *dbFrozenXidP, - Oid *dbTablespace); + Oid *dbTablespace, char **dbCollation, char **dbCtype); static bool have_createdb_privilege(void); static void remove_dbtablespaces(Oid db_id); static bool check_db_file_conflict(Oid db_id); @@ -87,6 +88,8 @@ createdb(const CreatedbStmt *stmt) Oid src_dboid; Oid src_owner; int src_encoding; + char *src_collation; + char *src_ctype; bool src_istemplate; bool src_allowconn; Oid src_lastsysoid; @@ -104,10 +107,14 @@ createdb(const CreatedbStmt *stmt) DefElem *downer = NULL; DefElem *dtemplate = NULL; DefElem *dencoding = NULL; + DefElem *dcollation = NULL; + DefElem *dctype = NULL; DefElem *dconnlimit = NULL; char *dbname = stmt->dbname; char *dbowner = NULL; const char *dbtemplate = NULL; + char *lc_collate = NULL; + char *lc_ctype = NULL; int encoding = -1; int dbconnlimit = -1; int ctype_encoding; @@ -152,6 +159,22 @@ createdb(const CreatedbStmt *stmt) errmsg("conflicting or redundant options"))); dencoding = defel; } + else if (strcmp(defel->defname, "collate") == 0) + { + if (dcollation) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("conflicting or redundant options"))); + dcollation = defel; + } + else if (strcmp(defel->defname, "ctype") == 0) + { + if (dctype) + ereport(ERROR, + (errcode(ERRCODE_SYNTAX_ERROR), + errmsg("conflicting or redundant options"))); + dctype = defel; + } else if (strcmp(defel->defname, "connectionlimit") == 0) { if (dconnlimit) @@ -205,6 +228,11 @@ createdb(const CreatedbStmt *stmt) elog(ERROR, "unrecognized node type: %d", nodeTag(dencoding->arg)); } + if (dcollation && dcollation->arg) + lc_collate = strVal(dcollation->arg); + if (dctype && dctype->arg) + lc_ctype = strVal(dctype->arg); + if (dconnlimit && dconnlimit->arg) dbconnlimit = intVal(dconnlimit->arg); @@ -243,7 +271,8 @@ createdb(const CreatedbStmt *stmt) if (!get_db_info(dbtemplate, ShareLock, &src_dboid, &src_owner, &src_encoding, &src_istemplate, &src_allowconn, &src_lastsysoid, - &src_frozenxid, &src_deftablespace)) + &src_frozenxid, &src_deftablespace, + &src_collation, &src_ctype)) ereport(ERROR, (errcode(ERRCODE_UNDEFINED_DATABASE), errmsg("template database \"%s\" does not exist", @@ -262,9 +291,13 @@ createdb(const CreatedbStmt *stmt) dbtemplate))); } - /* If encoding is defaulted, use source's encoding */ + /* If encoding or locales are defaulted, use source's setting */ if (encoding < 0) encoding = src_encoding; + if (lc_collate == NULL) + lc_collate = src_collation; + if (lc_ctype == NULL) + lc_ctype = src_ctype; /* Some encodings are client only */ if (!PG_VALID_BE_ENCODING(encoding)) @@ -272,6 +305,16 @@ createdb(const CreatedbStmt *stmt) (errcode(ERRCODE_WRONG_OBJECT_TYPE), errmsg("invalid server encoding %d", encoding))); + /* Check that the chosen locales are valid */ + if (!check_locale(LC_COLLATE, lc_collate)) + ereport(ERROR, + (errcode(ERRCODE_WRONG_OBJECT_TYPE), + errmsg("invalid locale name %s", lc_collate))); + if (!check_locale(LC_CTYPE, lc_ctype)) + ereport(ERROR, + (errcode(ERRCODE_WRONG_OBJECT_TYPE), + errmsg("invalid locale name %s", lc_ctype))); + /* * Check whether encoding matches server locale settings. We allow * mismatch in three cases: @@ -290,7 +333,7 @@ createdb(const CreatedbStmt *stmt) * * Note: if you change this policy, fix initdb to match. */ - ctype_encoding = pg_get_encoding_from_locale(NULL); + ctype_encoding = pg_get_encoding_from_locale(lc_ctype); if (!(ctype_encoding == encoding || ctype_encoding == PG_SQL_ASCII || @@ -299,12 +342,32 @@ createdb(const CreatedbStmt *stmt) #endif (encoding == PG_SQL_ASCII && superuser()))) ereport(ERROR, - (errmsg("encoding %s does not match server's locale %s", + (errmsg("encoding %s does not match locale %s", pg_encoding_to_char(encoding), - setlocale(LC_CTYPE, NULL)), - errdetail("The server's LC_CTYPE setting requires encoding %s.", + lc_ctype), + errdetail("The chosen LC_CTYPE setting requires encoding %s.", pg_encoding_to_char(ctype_encoding)))); + /* + * Check that the new locale is compatible with the source database. + * + * We know that template0 doesn't contain any indexes that depend on + * collation or ctype, so template0 can be used as template for + * any locale. + */ + if (strcmp(dbtemplate, "template0") != 0) + { + if (strcmp(lc_collate, src_collation)) + ereport(ERROR, + (errmsg("new collation is incompatible with the collation of the template database (%s)", src_collation), + errhint("Use the same collation as in the template database, or use template0 as template"))); + + if (strcmp(lc_ctype, src_ctype)) + ereport(ERROR, + (errmsg("new ctype is incompatible with the ctype of the template database (%s)", src_ctype), + errhint("Use the same ctype as in the template database, or use template0 as template"))); + } + /* Resolve default tablespace for new database */ if (dtablespacename && dtablespacename->arg) { @@ -421,6 +484,8 @@ createdb(const CreatedbStmt *stmt) DirectFunctionCall1(namein, CStringGetDatum(dbname)); new_record[Anum_pg_database_datdba - 1] = ObjectIdGetDatum(datdba); new_record[Anum_pg_database_encoding - 1] = Int32GetDatum(encoding); + new_record[Anum_pg_database_collation - 1] = DirectFunctionCall1(namein, CStringGetDatum(lc_collate)); + new_record[Anum_pg_database_ctype - 1] = DirectFunctionCall1(namein, CStringGetDatum(lc_ctype)); new_record[Anum_pg_database_datistemplate - 1] = BoolGetDatum(false); new_record[Anum_pg_database_datallowconn - 1] = BoolGetDatum(true); new_record[Anum_pg_database_datconnlimit - 1] = Int32GetDatum(dbconnlimit); @@ -629,7 +694,7 @@ dropdb(const char *dbname, bool missing_ok) pgdbrel = heap_open(DatabaseRelationId, RowExclusiveLock); if (!get_db_info(dbname, AccessExclusiveLock, &db_id, NULL, NULL, - &db_istemplate, NULL, NULL, NULL, NULL)) + &db_istemplate, NULL, NULL, NULL, NULL, NULL, NULL)) { if (!missing_ok) { @@ -781,7 +846,7 @@ RenameDatabase(const char *oldname, const char *newname) rel = heap_open(DatabaseRelationId, RowExclusiveLock); if (!get_db_info(oldname, AccessExclusiveLock, &db_id, NULL, NULL, - NULL, NULL, NULL, NULL, NULL)) + NULL, NULL, NULL, NULL, NULL, NULL, NULL)) ereport(ERROR, (errcode(ERRCODE_UNDEFINED_DATABASE), errmsg("database \"%s\" does not exist", oldname))); @@ -1168,7 +1233,7 @@ get_db_info(const char *name, LOCKMODE lockmode, Oid *dbIdP, Oid *ownerIdP, int *encodingP, bool *dbIsTemplateP, bool *dbAllowConnP, Oid *dbLastSysOidP, TransactionId *dbFrozenXidP, - Oid *dbTablespace) + Oid *dbTablespace, char **dbCollation, char **dbCtype) { bool result = false; Relation relation; @@ -1259,6 +1324,11 @@ get_db_info(const char *name, LOCKMODE lockmode, /* default tablespace for this database */ if (dbTablespace) *dbTablespace = dbform->dattablespace; + /* default locale settings for this database */ + if (dbCollation) + *dbCollation = pstrdup(NameStr(dbform->collation)); + if (dbCtype) + *dbCtype = pstrdup(NameStr(dbform->ctype)); ReleaseSysCache(tuple); result = true; break; diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 5ff353d..0831a94 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -398,7 +398,7 @@ static TypeName *TableFuncTypeName(List *columns); CLUSTER COALESCE COLLATE COLUMN COMMENT COMMIT COMMITTED CONCURRENTLY CONFIGURATION CONNECTION CONSTRAINT CONSTRAINTS CONTENT_P CONTINUE_P CONVERSION_P COPY COST CREATE CREATEDB - CREATEROLE CREATEUSER CROSS CSV CURRENT_P CURRENT_DATE CURRENT_ROLE + CREATEROLE CREATEUSER CROSS CSV CTYPE CURRENT_P CURRENT_DATE CURRENT_ROLE CURRENT_TIME CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS @@ -5458,6 +5458,22 @@ createdb_opt_item: { $$ = makeDefElem("encoding", NULL); } + | COLLATE opt_equal Sconst + { + $$ = makeDefElem("collate", (Node *)makeString($3)); + } + | COLLATE opt_equal DEFAULT + { + $$ = makeDefElem("collate", NULL); + } + | CTYPE opt_equal Sconst + { + $$ = makeDefElem("ctype", (Node *)makeString($3)); + } + | CTYPE opt_equal DEFAULT + { + $$ = makeDefElem("ctype", NULL); + } | CONNECTION LIMIT opt_equal SignedIconst { $$ = makeDefElem("connectionlimit", (Node *)makeInteger($4)); @@ -9216,6 +9232,7 @@ unreserved_keyword: | CREATEROLE | CREATEUSER | CSV + | CTYPE | CURRENT_P | CURSOR | CYCLE diff --git a/src/backend/parser/keywords.c b/src/backend/parser/keywords.c index b30a478..e2fe4bf 100644 --- a/src/backend/parser/keywords.c +++ b/src/backend/parser/keywords.c @@ -114,6 +114,7 @@ const ScanKeyword ScanKeywords[] = { {"createuser", CREATEUSER, UNRESERVED_KEYWORD}, {"cross", CROSS, TYPE_FUNC_NAME_KEYWORD}, {"csv", CSV, UNRESERVED_KEYWORD}, + {"ctype", CTYPE, UNRESERVED_KEYWORD}, {"current", CURRENT_P, UNRESERVED_KEYWORD}, {"current_date", CURRENT_DATE, RESERVED_KEYWORD}, {"current_role", CURRENT_ROLE, RESERVED_KEYWORD}, diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c index c0a01ae..4a55be5 100644 --- a/src/backend/utils/adt/pg_locale.c +++ b/src/backend/utils/adt/pg_locale.c @@ -189,6 +189,30 @@ pg_perm_setlocale(int category, const char *locale) } +/* + * Is the locale name valid for the locale category? + */ +bool +check_locale(int category, const char *value) +{ + char *save; + bool ret; + + save = setlocale(category, NULL); + if (!save) + return false; /* won't happen, we hope */ + + /* save may be pointing at a modifiable scratch variable, see above */ + save = pstrdup(save); + + ret = (setlocale(category, value) != NULL); + + setlocale(category, save); /* assume this won't fail */ + pfree(save); + + return ret; +} + /* GUC assign hooks */ /* @@ -203,21 +227,9 @@ pg_perm_setlocale(int category, const char *locale) static const char * locale_xxx_assign(int category, const char *value, bool doit, GucSource source) { - char *save; - - save = setlocale(category, NULL); - if (!save) - return NULL; /* won't happen, we hope */ - - /* save may be pointing at a modifiable scratch variable, see above */ - save = pstrdup(save); - - if (!setlocale(category, value)) + if (!check_locale(category, value)) value = NULL; /* set failure return marker */ - setlocale(category, save); /* assume this won't fail */ - pfree(save); - /* need to reload cache next time? */ if (doit && value != NULL) { diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c index 461edd9..08fbf3b 100644 --- a/src/backend/utils/init/postinit.c +++ b/src/backend/utils/init/postinit.c @@ -159,6 +159,8 @@ CheckMyDatabase(const char *name, bool am_superuser) { HeapTuple tup; Form_pg_database dbform; + char *collate; + char *ctype; /* Fetch our real pg_database row */ tup = SearchSysCache(DATABASEOID, @@ -240,6 +242,28 @@ CheckMyDatabase(const char *name, bool am_superuser) /* If we have no other source of client_encoding, use server encoding */ SetConfigOption("client_encoding", GetDatabaseEncodingName(), PGC_BACKEND, PGC_S_DEFAULT); + + /* assign locale variables */ + collate = NameStr(dbform->collation); + ctype = NameStr(dbform->ctype); + + if (setlocale(LC_COLLATE, collate) == NULL) + ereport(FATAL, + (errmsg("database locale is incompatible with operating system"), + errdetail("The database was initialized with LC_COLLATE \"%s\", " + " which is not recognized by setlocale().", collate), + errhint("Recreate the database with another locale or install the missing locale."))); + + if (setlocale(LC_CTYPE, ctype) == NULL) + ereport(FATAL, + (errmsg("database locale is incompatible with operating system"), + errdetail("The database was initialized with LC_CTYPE \"%s\", " + " which is not recognized by setlocale().", ctype), + errhint("Recreate the database with another locale or install the missing locale."))); + + /* Make the locale settings visible as GUC variables, too */ + SetConfigOption("lc_collate", collate, PGC_INTERNAL, PGC_S_DATABASE); + SetConfigOption("lc_ctype", ctype, PGC_INTERNAL, PGC_S_DATABASE); /* * Lastly, set up any database-specific configuration variables. diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c index 3effafb..4b85281 100644 --- a/src/bin/initdb/initdb.c +++ b/src/bin/initdb/initdb.c @@ -1353,6 +1353,10 @@ bootstrap_template1(char *short_version) bki_lines = replace_token(bki_lines, "ENCODING", encodingid); + bki_lines = replace_token(bki_lines, "LC_COLLATE", lc_collate); + + bki_lines = replace_token(bki_lines, "LC_CTYPE", lc_ctype); + /* * Pass correct LC_xxx environment to bootstrap. * @@ -2378,12 +2382,12 @@ usage(const char *progname) printf(_("\nOptions:\n")); printf(_(" [-D, --pgdata=]DATADIR location for this database cluster\n")); printf(_(" -E, --encoding=ENCODING set default encoding for new databases\n")); - printf(_(" --locale=LOCALE initialize database cluster with given locale\n")); + printf(_(" --locale=LOCALE set default locale for new databases\n")); printf(_(" --lc-collate, --lc-ctype, --lc-messages=LOCALE\n" " --lc-monetary, --lc-numeric, --lc-time=LOCALE\n" - " initialize database cluster with given locale\n" - " in the respective category (default taken from\n" - " environment)\n")); + " set default locale in the respective\n" + " category for new databases (default\n" + " taken from environment)\n")); printf(_(" --no-locale equivalent to --locale=C\n")); printf(_(" -T, --text-search-config=CFG\n" " default text search configuration\n")); @@ -2806,10 +2810,12 @@ main(int argc, char *argv[]) strcmp(lc_ctype, lc_numeric) == 0 && strcmp(lc_ctype, lc_monetary) == 0 && strcmp(lc_ctype, lc_messages) == 0) - printf(_("The database cluster will be initialized with locale %s.\n"), lc_ctype); + printf(_("The template databases will be initialized with locale %s.\n"), lc_ctype); else { - printf(_("The database cluster will be initialized with locales\n" + /* XXX only collate and ctype are actually set in stone here, others + * are userset gucs */ + printf(_("The template databases will be initialized with locales\n" " COLLATE: %s\n" " CTYPE: %s\n" " MESSAGES: %s\n" diff --git a/src/bin/pg_controldata/pg_controldata.c b/src/bin/pg_controldata/pg_controldata.c index 7970c32..08e102f 100644 --- a/src/bin/pg_controldata/pg_controldata.c +++ b/src/bin/pg_controldata/pg_controldata.c @@ -220,12 +220,5 @@ main(int argc, char *argv[]) (ControlFile.float4ByVal ? _("by value") : _("by reference"))); printf(_("Float8 argument passing: %s\n"), (ControlFile.float8ByVal ? _("by value") : _("by reference"))); - printf(_("Maximum length of locale name: %u\n"), - ControlFile.localeBuflen); - printf(_("LC_COLLATE: %s\n"), - ControlFile.lc_collate); - printf(_("LC_CTYPE: %s\n"), - ControlFile.lc_ctype); - return 0; } diff --git a/src/bin/pg_resetxlog/pg_resetxlog.c b/src/bin/pg_resetxlog/pg_resetxlog.c index 345b89c..0df796b 100644 --- a/src/bin/pg_resetxlog/pg_resetxlog.c +++ b/src/bin/pg_resetxlog/pg_resetxlog.c @@ -493,22 +493,6 @@ GuessControlValues(void) #endif ControlFile.float4ByVal = FLOAT4PASSBYVAL; ControlFile.float8ByVal = FLOAT8PASSBYVAL; - ControlFile.localeBuflen = LOCALE_NAME_BUFLEN; - - localeptr = setlocale(LC_COLLATE, ""); - if (!localeptr) - { - fprintf(stderr, _("%s: invalid LC_COLLATE setting\n"), progname); - exit(1); - } - strlcpy(ControlFile.lc_collate, localeptr, sizeof(ControlFile.lc_collate)); - localeptr = setlocale(LC_CTYPE, ""); - if (!localeptr) - { - fprintf(stderr, _("%s: invalid LC_CTYPE setting\n"), progname); - exit(1); - } - strlcpy(ControlFile.lc_ctype, localeptr, sizeof(ControlFile.lc_ctype)); /* * XXX eventually, should try to grovel through old XLOG to develop more @@ -584,12 +568,6 @@ PrintControlValues(bool guessed) (ControlFile.float4ByVal ? _("by value") : _("by reference"))); printf(_("Float8 argument passing: %s\n"), (ControlFile.float8ByVal ? _("by value") : _("by reference"))); - printf(_("Maximum length of locale name: %u\n"), - ControlFile.localeBuflen); - printf(_("LC_COLLATE: %s\n"), - ControlFile.lc_collate); - printf(_("LC_CTYPE: %s\n"), - ControlFile.lc_ctype); } diff --git a/src/bin/scripts/createdb.c b/src/bin/scripts/createdb.c index 286667e..37d41de 100644 --- a/src/bin/scripts/createdb.c +++ b/src/bin/scripts/createdb.c @@ -32,6 +32,8 @@ main(int argc, char *argv[]) {"tablespace", required_argument, NULL, 'D'}, {"template", required_argument, NULL, 'T'}, {"encoding", required_argument, NULL, 'E'}, + {"lc-collate", required_argument, NULL, 1}, + {"lc-ctype", required_argument, NULL, 2}, {NULL, 0, NULL, 0} }; @@ -50,6 +52,8 @@ main(int argc, char *argv[]) char *tablespace = NULL; char *template = NULL; char *encoding = NULL; + char *lc_collate = NULL; + char *lc_ctype = NULL; PQExpBufferData sql; @@ -95,6 +99,12 @@ main(int argc, char *argv[]) case 'E': encoding = optarg; break; + case 1: + lc_collate = optarg; + break; + case 2: + lc_ctype = optarg; + break; default: fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname); exit(1); @@ -152,6 +162,11 @@ main(int argc, char *argv[]) appendPQExpBuffer(&sql, " ENCODING '%s'", encoding); if (template) appendPQExpBuffer(&sql, " TEMPLATE %s", fmtId(template)); + if (lc_collate) + appendPQExpBuffer(&sql, " COLLATE '%s'", lc_collate); + if (lc_ctype) + appendPQExpBuffer(&sql, " CTYPE '%s'", lc_ctype); + appendPQExpBuffer(&sql, ";\n"); conn = connectDatabase(strcmp(dbname, "postgres") == 0 ? "template1" : "postgres", @@ -209,6 +224,9 @@ help(const char *progname) printf(_("\nOptions:\n")); printf(_(" -D, --tablespace=TABLESPACE default tablespace for the database\n")); printf(_(" -E, --encoding=ENCODING encoding for the database\n")); + printf(_(" --lc-collate=LOCALE LC_COLLATE setting for the database\n")); + printf(_(" --lc-ctype=LOCALE LC_CTYPE setting for the database\n")); + printf(_(" -O, --owner=OWNER database user to own the new database\n")); printf(_(" -T, --template=TEMPLATE template database to copy\n")); printf(_(" -e, --echo show the commands being sent to the server\n")); diff --git a/src/include/catalog/pg_control.h b/src/include/catalog/pg_control.h index 38b5a84..aee6934 100644 --- a/src/include/catalog/pg_control.h +++ b/src/include/catalog/pg_control.h @@ -144,11 +144,6 @@ typedef struct ControlFileData bool float4ByVal; /* float4 pass-by-value? */ bool float8ByVal; /* float8, int8, etc pass-by-value? */ - /* active locales */ - uint32 localeBuflen; - char lc_collate[LOCALE_NAME_BUFLEN]; - char lc_ctype[LOCALE_NAME_BUFLEN]; - /* CRC of all above ... MUST BE LAST! */ pg_crc32 crc; } ControlFileData; diff --git a/src/include/catalog/pg_database.h b/src/include/catalog/pg_database.h index 6e9e5d2..1b10e60 100644 --- a/src/include/catalog/pg_database.h +++ b/src/include/catalog/pg_database.h @@ -33,6 +33,8 @@ CATALOG(pg_database,1262) BKI_SHARED_RELATION NameData datname; /* database name */ Oid datdba; /* owner of database */ int4 encoding; /* character encoding */ + NameData collation; /* LC_COLLATE of database */ + NameData ctype; /* LC_CTYPE of database */ bool datistemplate; /* allowed as CREATE DATABASE template? */ bool datallowconn; /* new connections allowed? */ int4 datconnlimit; /* max connections allowed (-1=no limit) */ @@ -54,20 +56,22 @@ typedef FormData_pg_database *Form_pg_database; * compiler constants for pg_database * ---------------- */ -#define Natts_pg_database 11 +#define Natts_pg_database 13 #define Anum_pg_database_datname 1 #define Anum_pg_database_datdba 2 #define Anum_pg_database_encoding 3 -#define Anum_pg_database_datistemplate 4 -#define Anum_pg_database_datallowconn 5 -#define Anum_pg_database_datconnlimit 6 -#define Anum_pg_database_datlastsysoid 7 -#define Anum_pg_database_datfrozenxid 8 -#define Anum_pg_database_dattablespace 9 -#define Anum_pg_database_datconfig 10 -#define Anum_pg_database_datacl 11 +#define Anum_pg_database_collation 4 +#define Anum_pg_database_ctype 5 +#define Anum_pg_database_datistemplate 6 +#define Anum_pg_database_datallowconn 7 +#define Anum_pg_database_datconnlimit 8 +#define Anum_pg_database_datlastsysoid 9 +#define Anum_pg_database_datfrozenxid 10 +#define Anum_pg_database_dattablespace 11 +#define Anum_pg_database_datconfig 12 +#define Anum_pg_database_datacl 13 -DATA(insert OID = 1 ( template1 PGUID ENCODING t t -1 0 0 1663 _null_ _null_ )); +DATA(insert OID = 1 ( template1 PGUID ENCODING "LC_COLLATE" "LC_CTYPE" t t -1 0 0 1663 _null_ _null_)); SHDESCR("default template database"); #define TemplateDbOid 1 diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h index 5a49823..2b60027 100644 --- a/src/include/utils/pg_locale.h +++ b/src/include/utils/pg_locale.h @@ -39,6 +39,7 @@ extern const char *locale_numeric_assign(const char *value, extern const char *locale_time_assign(const char *value, bool doit, GucSource source); +extern bool check_locale(int category, const char *locale); extern char *pg_perm_setlocale(int category, const char *locale); extern bool lc_collate_is_c(void); diff --git a/src/interfaces/ecpg/preproc/preproc.y b/src/interfaces/ecpg/preproc/preproc.y index 0a8b62b..949e76b 100644 --- a/src/interfaces/ecpg/preproc/preproc.y +++ b/src/interfaces/ecpg/preproc/preproc.y @@ -428,7 +428,7 @@ add_typedef(char *name, char * dimension, char * length, enum ECPGttype type_enu CLUSTER COALESCE COLLATE COLUMN COMMENT COMMIT COMMITTED CONCURRENTLY CONFIGURATION CONNECTION CONSTRAINT CONSTRAINTS CONTENT_P CONTINUE_P CONVERSION_P COPY COST CREATE CREATEDB - CREATEROLE CREATEUSER CROSS CSV CURRENT_P CURRENT_DATE CURRENT_ROLE + CREATEROLE CREATEUSER CROSS CSV CTYPE CURRENT_P CURRENT_DATE CURRENT_ROLE CURRENT_TIME CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS
pgsql-hackers by date: