Re: Proposal - Collation at database level - Mailing list pgsql-hackers
From | Radek Strnad |
---|---|
Subject | Re: Proposal - Collation at database level |
Date | |
Msg-id | 483FFBDE.9040504@gmail.com Whole thread Raw |
In response to | Re: Proposal - Collation at database level (Zdenek Kotala <Zdenek.Kotala@Sun.COM>) |
Responses |
Re: Proposal - Collation at database level
|
List | pgsql-hackers |
Zdenek Kotala wrote: > Radek Strnad napsal(a): > > <snip> > >> >> I'm thinking of dividing the problem into two parts - in beginning >> pg_collation will contain two functions. One will have hard-coded rules >> for these basic collations (SQL_CHARACTER, GRAPHIC_IRV, LATIN1, ISO8BIT, >> UCS_BASIC). It will compare each string character bitwise and guarantee >> that the implementation will meet the SQL standard implemented in >> PostgreSQL. >> Second one will allow the user to use installed system locales. The set >> of these collations will obviously vary between systems. Catalogs will >> contain encoding and collation for calling the system locale function. >> This will allow us to use collations such as en_US.utf8, cs_CZ.iso88592 >> etc. if they will be availible. >> >> We will also need to change the way how strings are compared. Regarding >> the set database collation the right function will be used. >> http://doxygen.postgresql.org/varlena_8c.html#4c7af81f110f9be0bd8eb2bd99525675 >> >> >> This design will make possible switch to ICU or any other implementation >> quite simple and will not cause any major rewriting of what I'm coding >> right now. > > > Collation function is main point here. How you mentioned one will be > only wrapper about strcmp and second one about strcoll. (maybe you > need four - char/wchar) Which function will be used it is defined in > pg_collation catalog by CREATE COLLATION command. But you need specify > name of locale for system locales. It means you need attribute for > storing locale name. > You're right. I've extended pg_collation for system locale columns. In the first stage we actually don't need any other catalogs such as encoding, etc. and we can build this functionality only on following pg_collation catalog. Used collation function (system or built-in) will be decided on existing collation name. CATALOG(pg_collations, ###) { NameData colname; /* collation name */ Oid colschema; /* collation schema */ NameData colcharset; /* character set specification */ Oid colexistingcollation; /* existing collation */ bool colpadattribute; /* pad attribute */ bool colcasesensitive; /* case sensitive */ bool colaccent; /* accent sensitive */ NameData colsyslccollate; /* lc_collate */ NameData colsyslcctype; /*lc_ctype */ regproc colfunc; /* used collation function */ } FormData_pg_collations; >> FormData_pg_collations; > It would be good to send list of new and modified SQL commands (like > CREATE COLLATION) for wide discussion. > CREATE COLLATION <collation name> FOR <character set specification> FROM <existing collation name> [ <pad characteristic> ] [ <case sensitive> ] [ <accent sensitive> ] [ LC_COLLATE <lc_collate> ] [ LC_CTYPE <lc_ctype> ] <pad characteristic> := NO PAD | PAD SPACE <case sensitive> := CASE SENSITIVE | CASE INSENSITIVE <accent sensitive> := ACCENT SENSITIVE | ACCENT INSENSITIVE Since you can specify order by in select clause there's no need for adding ascending and descending type of collation. They will allways be ascending. DROP COLLATION <collation name> CREATE DATABASE ... [ COLLATE <collation name> ] ... ALTER DATABASE ... [ COLLATE <collation name> ] ... Any thoughts? Radek
pgsql-hackers by date: