Thread: order by different on mac vs linux
I have postgresql 9.1.3 on a mac and on linux. On the mac the results come out: ! *`-=[];',./~@#$%^&()_+{}|:"<>?\ then \--\ On ubuntu 12.04 x64 it comes out (compiled and installed postgres from tbz2 from postgresql.org repo): \--\ then ! *`-=[];',./~@#$%^&()_+{}|:"<>?\ Why is there a different order on the different platforms. I even copied the data folder on the mac to the linux box instead of doing pg_dump and then \i pg.sql file on Linux to move the data and the order is still different like this. Any idea why? the ! line should come before the \ line, lexically. ! is char 33 and \ is char 92. This is done with select * from table order by field; on both systems and "field" is character varying on both systems. Thanks, Wes
Wes James <comptekki@gmail.com> writes: > Why is there a different order on the different platforms. This is not exactly unusual. You should first check to see if lc_collate is set differently in the two installations --- but even if it's the same, there are often platform-specific interpretations of the sorting rules. (Not to mention that OS X is flat out broken when it comes to sorting UTF8 data ...) If you want consistent cross-platform results, "C" locale will get that for you, but it's pretty stupid about non-ASCII characters. For more info read http://www.postgresql.org/docs/9.1/static/charset.html regards, tom lane
On Mon, May 14, 2012 at 5:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Wes James <comptekki@gmail.com> writes: >> Why is there a different order on the different platforms. > > This is not exactly unusual. You should first check to see if > lc_collate is set differently in the two installations --- but even if > it's the same, there are often platform-specific interpretations of > the sorting rules. (Not to mention that OS X is flat out broken when > it comes to sorting UTF8 data ...) > > If you want consistent cross-platform results, "C" locale will get > that for you, but it's pretty stupid about non-ASCII characters. > > For more info read > http://www.postgresql.org/docs/9.1/static/charset.html > > regards, tom lane Thanks, I'll take a look. -wes
On Mon, May 14, 2012 at 5:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Wes James <comptekki@gmail.com> writes: >> Why is there a different order on the different platforms. > > This is not exactly unusual. You should first check to see if > lc_collate is set differently in the two installations --- but even if > it's the same, there are often platform-specific interpretations of > the sorting rules. (Not to mention that OS X is flat out broken when > it comes to sorting UTF8 data ...) > > If you want consistent cross-platform results, "C" locale will get > that for you, but it's pretty stupid about non-ASCII characters. > > For more info read > http://www.postgresql.org/docs/9.1/static/charset.html > > regards, tom lane I checked both and they both say this: show lc_collate;lc_collate -------------en_US.UTF-8 (1 row) You mention OS X is broken, but it looks like the linux box is broken in this case. With the current database I have, it looks like everything is sorting correctly on both platforms, except those two records (the two lines from my first email) which I added later for testing characters and escaping, etc. on a web app. -wes
On Mon, May 14, 2012 at 5:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Wes James <comptekki@gmail.com> writes: >> Why is there a different order on the different platforms. > > This is not exactly unusual. You should first check to see if > lc_collate is set differently in the two installations --- but even if > it's the same, there are often platform-specific interpretations of > the sorting rules. (Not to mention that OS X is flat out broken when > it comes to sorting UTF8 data ...) > > If you want consistent cross-platform results, "C" locale will get > that for you, but it's pretty stupid about non-ASCII characters. > > For more info read > http://www.postgresql.org/docs/9.1/static/charset.html > > regards, tom lane I tried using the postgres that comes with ubuntu (sudo apt-get install postgresql). With my app I kept getting invalid password. I went in to the database sudo -u postgres database and did \password and set a password, but I still got invalid password error from the app api trying to make a connection. I then went back to the source installed version and now the output is correct. I'm not sure what changed. hmmm. -wes
On Tue, May 15, 2012 at 10:06 AM, Wes James <comptekki@gmail.com> wrote: > On Mon, May 14, 2012 at 5:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Wes James <comptekki@gmail.com> writes: >>> Why is there a different order on the different platforms. >> >> This is not exactly unusual. You should first check to see if >> lc_collate is set differently in the two installations --- but even if >> it's the same, there are often platform-specific interpretations of >> the sorting rules. (Not to mention that OS X is flat out broken when >> it comes to sorting UTF8 data ...) >> >> If you want consistent cross-platform results, "C" locale will get >> that for you, but it's pretty stupid about non-ASCII characters. >> >> For more info read >> http://www.postgresql.org/docs/9.1/static/charset.html >> >> regards, tom lane > > I tried using the postgres that comes with ubuntu (sudo apt-get > install postgresql). > > With my app I kept getting invalid password. I went in to the > database sudo -u postgres database and did \password and set a > password, but I still got invalid password error from the app api > trying to make a connection. > > I then went back to the source installed version and now the output is > correct. I'm not sure what changed. hmmm. It's most likely an issue with the settings in your pg_hba.conf file.
On Tue, May 15, 2012 at 10:16 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > On Tue, May 15, 2012 at 10:06 AM, Wes James <comptekki@gmail.com> wrote: >> On Mon, May 14, 2012 at 5:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Wes James <comptekki@gmail.com> writes: >>>> Why is there a different order on the different platforms. >>> >>> This is not exactly unusual. You should first check to see if >>> lc_collate is set differently in the two installations --- but even if >>> it's the same, there are often platform-specific interpretations of >>> the sorting rules. (Not to mention that OS X is flat out broken when >>> it comes to sorting UTF8 data ...) >>> >>> If you want consistent cross-platform results, "C" locale will get >>> that for you, but it's pretty stupid about non-ASCII characters. >>> >>> For more info read >>> http://www.postgresql.org/docs/9.1/static/charset.html >>> >>> regards, tom lane >> >> I tried using the postgres that comes with ubuntu (sudo apt-get >> install postgresql). >> >> With my app I kept getting invalid password. I went in to the >> database sudo -u postgres database and did \password and set a >> password, but I still got invalid password error from the app api >> trying to make a connection. >> >> I then went back to the source installed version and now the output is >> correct. I'm not sure what changed. hmmm. > > It's most likely an issue with the settings in your pg_hba.conf file. Ok - if I try the apt-get version, I'll look at that. The apt-get version is a cluster version. Looks like that file is in /etc/postgresql/9.1/dbname I see local all postgres peer local all all peer host all all 127.0.0.1/32 md5 host all all ::1/128 md5 in there. -wes
On Mon, May 14, 2012 at 5:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
on linux
# SELECT CASE WHEN 'apache' > '!yada' THEN 'TRUE' ELSE 'FALSE' END FROM pg_user;
case
-------
FALSE
(1 row)
# show lc_collate;
lc_collate
-------------
en_US.UTF-8
Wes James <comptekki@gmail.com> writes:This is not exactly unusual. You should first check to see if
> Why is there a different order on the different platforms.
lc_collate is set differently in the two installations --- but even if
it's the same, there are often platform-specific interpretations of
the sorting rules. (Not to mention that OS X is flat out broken when
it comes to sorting UTF8 data ...)
I just ran these:
linux:
# SELECT CASE WHEN 'apache' > '!yada' THEN 'TRUE' ELSE 'FALSE' END FROM pg_user;
case
-------
FALSE
(1 row)
# show lc_collate;
lc_collate
-------------
en_US.UTF-8
(1 row)
------------------------
on mac os x:
# SELECT CASE WHEN 'apache' > '!yada' THEN 'TRUE' ELSE 'FALSE' END FROM pg_user;
case
------
TRUE
(1 row)
# show lc_collate;
lc_collate
-------------
en_US.UTF-8
(1 row)
-----------------------
Why is the linux postgres saying false with the lc_collage set the way it is?
-wes
On Wed, May 16, 2012 at 3:46 PM, Wes James <comptekki@gmail.com> wrote:
On Mon, May 14, 2012 at 5:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:Wes James <comptekki@gmail.com> writes:This is not exactly unusual. You should first check to see if
> Why is there a different order on the different platforms.
lc_collate is set differently in the two installations --- but even if
it's the same, there are often platform-specific interpretations of
the sorting rules. (Not to mention that OS X is flat out broken when
it comes to sorting UTF8 data ...)I just ran these:linux:on linux
# SELECT CASE WHEN 'apache' > '!yada' THEN 'TRUE' ELSE 'FALSE' END FROM pg_user;
case
-------
FALSE
(1 row)
# show lc_collate;
lc_collate
-------------
en_US.UTF-8(1 row)------------------------on mac os x:# SELECT CASE WHEN 'apache' > '!yada' THEN 'TRUE' ELSE 'FALSE' END FROM pg_user;case------TRUE(1 row)# show lc_collate;lc_collate-------------en_US.UTF-8(1 row)-----------------------Why is the linux postgres saying false with the lc_collage set the way it is?
That's the point - UTF-8 collation is just completely broken under OS X. There's much previous discussion of the topic on this list and elsewhere. If you're developing on OS X but running linux and you are mostly using an ascii character set in your test dataset, set your development OS X boxes to use C collation, which will basically do what you expect it do do until you start throwing multibyte characters at it. If you can't constrain your testing/development dataset in such a manner and collation order really matters during development, then you probably shouldn't develop on OS X. I spent a fair amount of time investigating how to define a new charset in what proved to ultimately be a futile attempt to redefine UTF-8 on OSX to behave just like ti does on Linux. I just gave it up after wasting a few too many hours on it. It may be possible to do it, but the return on invested time was non-existent for me so I abandoned my effort.
On Wed, May 16, 2012 at 5:00 PM, Samuel Gendler <sgendler@ideasculptor.com> wrote:
On Wed, May 16, 2012 at 3:46 PM, Wes James <comptekki@gmail.com> wrote:On Mon, May 14, 2012 at 5:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:Wes James <comptekki@gmail.com> writes:This is not exactly unusual. You should first check to see if
> Why is there a different order on the different platforms.
lc_collate is set differently in the two installations --- but even if
it's the same, there are often platform-specific interpretations of
the sorting rules. (Not to mention that OS X is flat out broken when
it comes to sorting UTF8 data ...)I just ran these:linux:on linux
# SELECT CASE WHEN 'apache' > '!yada' THEN 'TRUE' ELSE 'FALSE' END FROM pg_user;
case
-------
FALSE
(1 row)
# show lc_collate;
lc_collate
-------------
en_US.UTF-8(1 row)------------------------on mac os x:# SELECT CASE WHEN 'apache' > '!yada' THEN 'TRUE' ELSE 'FALSE' END FROM pg_user;case------TRUE(1 row)# show lc_collate;lc_collate-------------en_US.UTF-8(1 row)-----------------------Why is the linux postgres saying false with the lc_collage set the way it is?That's the point - UTF-8 collation is just completely broken under OS X. There's much previous discussion of the topic on this list and elsewhere. If you're developing on OS X but running linux and you are mostly using an ascii character set in your test dataset, set your development OS X boxes to use C collation, which will basically do what you expect it do do until you start throwing multibyte characters at it. If you can't constrain your testing/development dataset in such a manner and collation order really matters during development, then you probably shouldn't develop on OS X. I spent a fair amount of time investigating how to define a new charset in what proved to ultimately be a futile attempt to redefine UTF-8 on OSX to behave just like ti does on Linux. I just gave it up after wasting a few too many hours on it. It may be possible to do it, but the return on invested time was non-existent for me so I abandoned my effort.
Why are people saying os x is broken in my case? Looking at http://www.utf8-chartable.de/ and http://www.asciitable.com/ for sort order, ! should come before apache. On os x it is correct, on ubuntu linux it is not. In the order by output per my previous emails, it is correct on os x, but no on linux. Why do people keep saying os x is broken, when it is doing the correct thing from what I've seen?
-wes
On Wed, May 16, 2012 at 7:58 PM, Wes James <comptekki@gmail.com> wrote: > > > On Wed, May 16, 2012 at 5:00 PM, Samuel Gendler <sgendler@ideasculptor.com> > wrote: >> >> >> >> On Wed, May 16, 2012 at 3:46 PM, Wes James <comptekki@gmail.com> wrote: >>> >>> >>> >>> On Mon, May 14, 2012 at 5:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>> >>>> Wes James <comptekki@gmail.com> writes: >>>> > Why is there a different order on the different platforms. >>>> >>>> This is not exactly unusual. You should first check to see if >>>> lc_collate is set differently in the two installations --- but even if >>>> it's the same, there are often platform-specific interpretations of >>>> the sorting rules. (Not to mention that OS X is flat out broken when >>>> it comes to sorting UTF8 data ...) >>>> >>> >>> I just ran these: >>> >>> linux: >>> >>> on linux >>> >>> # SELECT CASE WHEN 'apache' > '!yada' THEN 'TRUE' ELSE 'FALSE' END FROM >>> pg_user; >>> case >>> ------- >>> FALSE >>> (1 row) >>> >>> # show lc_collate; >>> >>> lc_collate >>> ------------- >>> en_US.UTF-8 >>> (1 row) >>> >>> ------------------------ >>> >>> on mac os x: >>> >>> # SELECT CASE WHEN 'apache' > '!yada' THEN 'TRUE' ELSE 'FALSE' END FROM >>> pg_user; >>> case >>> ------ >>> TRUE >>> (1 row) >>> >>> # show lc_collate; >>> lc_collate >>> ------------- >>> en_US.UTF-8 >>> (1 row) >>> >>> >>> ----------------------- >>> >>> Why is the linux postgres saying false with the lc_collage set the way it >>> is? >> >> >> That's the point - UTF-8 collation is just completely broken under OS X. >> There's much previous discussion of the topic on this list and elsewhere. >> If you're developing on OS X but running linux and you are mostly using an >> ascii character set in your test dataset, set your development OS X boxes to >> use C collation, which will basically do what you expect it do do until you >> start throwing multibyte characters at it. If you can't constrain your >> testing/development dataset in such a manner and collation order really >> matters during development, then you probably shouldn't develop on OS X. I >> spent a fair amount of time investigating how to define a new charset in >> what proved to ultimately be a futile attempt to redefine UTF-8 on OSX to >> behave just like ti does on Linux. I just gave it up after wasting a few >> too many hours on it. It may be possible to do it, but the return on >> invested time was non-existent for me so I abandoned my effort. >> > > Why are people saying os x is broken in my case? Looking > at http://www.utf8-chartable.de/ and http://www.asciitable.com/ for sort > order, ! should come before apache. On os x it is correct, on ubuntu linux > it is not. In the order by output per my previous emails, it is correct on > os x, but no on linux. Why do people keep saying os x is broken, when it is > doing the correct thing from what I've seen? You're confusing encoding with locale. UTF-8 is an encoding. If there were no Locale (i.e. it was set to C, simple byte ordering) then you'd be correct. HOWEVER, a locale is a different animal altogether.For instance, most locales ignore many characters whenit comes to sort ordering. Such as spaces: smarlowe=# create table test (i text); CREATE TABLE smarlowe=# insert into test values ('abc'),(' abc'),('def'),(' def'); INSERT 0 4 smarlowe=# select * from test order by i; i ------abc abcdef def (4 rows) Note the spaces are ignored for sorting purposes.
On Mon, May 14, 2012 at 4:42 PM, Wes James <comptekki@gmail.com> wrote:
I have postgresql 9.1.3 on a mac and on linux.
On the mac the results come out:
! *`-=[];',./~@#$%^&()_+{}|:"<>?\
then
\--\
On ubuntu 12.04 x64 it comes out (compiled and installed postgres from
tbz2 from postgresql.org repo):
\--\
then
! *`-=[];',./~@#$%^&()_+{}|:"<>?\
Why is there a different order on the different platforms. I even
copied the data folder on the mac to the linux box instead of doing
pg_dump and then \i pg.sql file on Linux to move the data and the
order is still different like this.
Any idea why? the ! line should come before the \ line, lexically. !
is char 33 and \ is char 92.
This is done with
select * from table order by field;
on both systems and "field" is character varying on both systems.
I'd like to bring up my question again after testing on windows xp. Why does xp and mac os x sort properly when linux does not? I tested this last week with ubuntu 11.10 and it is doing the same thing as 12.04. UTF8 encoding and lc_collate = en_US.UTF-8 on mac and linux and american_usa on windows which is the same thing as en_US.UTF-8. All are using UTF8 encoding.
I'd really like to use linux to host this DB (and have it hosted in vmware ESXi), but the order is not coming out correct. I've got foreign titles of books in this db, so I need the UTF8 all the way through. "C" won't cut it.
Here is a dump of a db that you can test on ubuntu.
do:
select * from dbooks order by title;
in psql
and it won't sort correctly. Add a record with something like "!a" in the title and run the query above and it won't sort properly. (This is unverified as I don't have linux in front of me right now - but I did add records and they did not sort properly with "order by title")
-wes
Back to the drawing board. Windows sorts these lines thus (postgresql 9.1.3):
! *`-=[];',./~@#$%^&()_+{}|:"<>?\
!!
!a
!A
!B
!ia
!test
\--\
African agricultural research and technological development
on the ascii table here:
upper case letters should sort before lowercase letters.
-wes
On 2012-05-27, Wes James <comptekki@gmail.com> wrote: >> >> Why is there a different order on the different platforms. postgres uses the text comparison operators provided by the host platform. for linux these are defined in the locales package this may be part of the glibc package. > I'd like to bring up my question again after testing on windows xp. Why > does xp and mac os x sort properly when linux does not? osx and windows hare broken sorting, > I tested this last > week with ubuntu 11.10 and it is doing the same thing as 12.04. UTF8 > encoding and lc_collate = en_US.UTF-8 on mac and linux and american_usa on > windows which is the same thing as en_US.UTF-8. All are using UTF8 > encoding. > > I'd really like to use linux to host this DB (and have it hosted in vmware > ESXi), but the order is not coming out correct. I've got foreign titles of > books in this db, so I need the UTF8 all the way through. "C" won't cut it. I briefly worked for a library software company. They had defined their own ordering operator in the form of a function that mangled strings (they were using paradox database) As I understand it postgres allows you to define your own types with their own ordering operators even ignoring the issue of non-alphabetic symbols there are problems that should be addressed for sorting titles and names. Many of these are due to abbreviations which shouls be sorted asif expanded. some examples: St. Stephen of HungarySaint Stephen the MartyrSt. RitaStreet art.Wall StickersWall st.Wall StreetMcArthurMacArthur, DouglassMaccaroni Wye RoadY Road -- ⚂⚃ 100% natural
On Mon, May 14, 2012 at 5:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Wes James <comptekki@gmail.com> writes:This is not exactly unusual. You should first check to see if
> Why is there a different order on the different platforms.
lc_collate is set differently in the two installations --- but even if
it's the same, there are often platform-specific interpretations of
the sorting rules. (Not to mention that OS X is flat out broken when
it comes to sorting UTF8 data ...)
If you want consistent cross-platform results, "C" locale will get
that for you, but it's pretty stupid about non-ASCII characters.
For more info read
http://www.postgresql.org/docs/9.1/static/charset.html
regards, tom lane
I did pg_dump of data, set LC_ALL=C, did initdb and createdb, pgsql -f pgdump.sql db, etc, and now it is sorting the way I'd expect and for the needs I have.
Thanks all for your responses.
-wes
On lör, 2012-05-26 at 21:47 -0600, Wes James wrote: > on the ascii table here: > > http://www.ascii-code.com/ > > upper case letters should sort before lowercase letters. ASCII has nothing to do with how letters "should" be sorted. It is not a sorting standard, it is a character encoding standard. If you open a dictionary, you won't (usually) find all upper case letters before all lower case letters. The sort orders provided by glibc are, individual bugs aside, correct, where "correct" means, they implement various national and international standards. (Equally, the sort orders on Mac OS X are, degenerate cases aside, broken, because they don't implement any recognized sorting standard.) Of course, you are not required to like them, and there are many people who have particular issues with them, but you won't get far claiming they are wrong, if you don't have a recognized alternative source to point to that defines correctness.