Thread: [BUGS] Problem with PostgreSQL string sorting
Hello All,
I am a Database administrator and we are using PostgreSQL-9.6.1 version in RHEL-7 linux machine. Current server Encoding setup is UTF8 and LC_COLLATE and LC_CTYPE is en_US.UTF-8.
Lets say table person has field name with values likes
name
------
Abc
abc
.dcb
Dcb
$sdf
EDF
en_US.UTF-8 collate sorting this field as:
name
------
abc
Abc
.dcb
Dcb
EDF
$sdf
It ignore's Special characters.
"C" collate sorting this field as:
name
------
$sdf
.dcb
Abc
Dcb
EDF
abc
It does case sensitive sort.
My expected sorting should be: (Case insensitive and special character)
name
------
$sdf
.dcb
Abc
abc
Dcb
EDF
I can use lower() and collate 'C' to get this result. But I need a default collate to support this.
Is there any collation that support both (Case insensitive and special character) sorting in utf8?
your help would be really appreciated.
Thanks,
Hari
I am a Database administrator and we are using PostgreSQL-9.6.1 version in RHEL-7 linux machine. Current server Encoding setup is UTF8 and LC_COLLATE and LC_CTYPE is en_US.UTF-8.
Lets say table person has field name with values likes
name
------
Abc
abc
.dcb
Dcb
$sdf
EDF
en_US.UTF-8 collate sorting this field as:
name
------
abc
Abc
.dcb
Dcb
EDF
$sdf
It ignore's Special characters.
"C" collate sorting this field as:
name
------
$sdf
.dcb
Abc
Dcb
EDF
abc
It does case sensitive sort.
My expected sorting should be: (Case insensitive and special character)
name
------
$sdf
.dcb
Abc
abc
Dcb
EDF
I can use lower() and collate 'C' to get this result. But I need a default collate to support this.
Is there any collation that support both (Case insensitive and special character) sorting in utf8?
your help would be really appreciated.
Thanks,
Hari
On Wed, Feb 15, 2017 at 7:44 AM, Hari Sankar A <hsshanthamhari@gmail.com> wrote: > I am a Database administrator and we are using PostgreSQL-9.6.1 version in > RHEL-7 linux machine. Current server Encoding setup is UTF8 and LC_COLLATE > and LC_CTYPE is en_US.UTF-8. The behavior of each collation comes from the operating system's own libc, except for the C collation, which is based on the ordering implied by strcmp() comparisons. Generally, most implementations have the behavior you describe, in that they assign least weight of all to caseness and whitespace, and somewhat more weight to punctuation. I don't think that there is much that can be done about it in practice, though in principal there could be a collation that has all the properties you want. Future versions of PostgreSQL may have better support for this kind of thing. -- Peter Geoghegan -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Peter Geoghegan <pg@bowt.ie> writes: > The behavior of each collation comes from the operating system's own > libc, except for the C collation, which is based on the ordering > implied by strcmp() comparisons. Generally, most implementations have > the behavior you describe, in that they assign least weight of all to > caseness and whitespace, and somewhat more weight to punctuation. I > don't think that there is much that can be done about it in practice, > though in principal there could be a collation that has all the > properties you want. I believe most versions of libc use declarative specifications of locale properties, so if you wanted to spend some time learning the language in which those are written, you could make and install a locale that acts like you want. It's possible there's one out there already, because a lot of people seem to want that. In any case, this is not a bug, this is a feature request; and it's not even for a Postgres feature, it's for a libc feature. regards, tom lane -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
On Wed, Feb 15, 2017 at 9:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I believe most versions of libc use declarative specifications of locale > properties, so if you wanted to spend some time learning the language in > which those are written, you could make and install a locale that acts > like you want. It's possible there's one out there already, because > a lot of people seem to want that. True. And, the declarative language glibc uses doesn't look that complicated to me. However, you'd really have to vendor your own glibc to make this work, since they are compiled to some binary representation when glibc is built. That's probably prohibitively complicated to manage. -- Peter Geoghegan -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Thank you all for your valuable comments.
I will work on glibc to get this feature.
Appreciate your help.
Thanks,
Hari
On Wed, Feb 15, 2017 at 12:14 PM Peter Geoghegan <pg@bowt.ie> wrote:
On Wed, Feb 15, 2017 at 9:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I believe most versions of libc use declarative specifications of locale
> properties, so if you wanted to spend some time learning the language in
> which those are written, you could make and install a locale that acts
> like you want. It's possible there's one out there already, because
> a lot of people seem to want that.
True. And, the declarative language glibc uses doesn't look that
complicated to me. However, you'd really have to vendor your own glibc
to make this work, since they are compiled to some binary
representation when glibc is built. That's probably prohibitively
complicated to manage.
--
Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > On Wed, Feb 15, 2017 at 9:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I believe most versions of libc use declarative specifications of locale >> properties, so if you wanted to spend some time learning the language in >> which those are written, you could make and install a locale that acts >> like you want. It's possible there's one out there already, because >> a lot of people seem to want that. > True. And, the declarative language glibc uses doesn't look that > complicated to me. However, you'd really have to vendor your own glibc > to make this work, since they are compiled to some binary > representation when glibc is built. That's probably prohibitively > complicated to manage. Really? "man localedef" certainly gives the impression that you can just compile some more locales if you want. Admittedly, your system's package manager might have a hissy fit about nonstandard files suddenly appearing in /usr/lib/locale, but that's not glibc's fault. regards, tom lane -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
On Wed, Feb 15, 2017 at 9:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Really? "man localedef" certainly gives the impression that you can > just compile some more locales if you want. Admittedly, your system's > package manager might have a hissy fit about nonstandard files > suddenly appearing in /usr/lib/locale, but that's not glibc's fault. I agree, but the apparent lack of any standard, high-level guidance on creating a custom locale would leave me hesitant to take that course of action. -- Peter Geoghegan -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs