BUG #8821: pg_trgm segfault with Turkish locale database - Mailing list pgsql-bugs
From | ufuk@paralaus.com |
---|---|
Subject | BUG #8821: pg_trgm segfault with Turkish locale database |
Date | |
Msg-id | 20140113125641.30125.40578@wrigleys.postgresql.org Whole thread Raw |
Responses |
Re: BUG #8821: pg_trgm segfault with Turkish locale database
|
List | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 8821 Logged by: Ufuk Kayserilioglu Email address: ufuk@paralaus.com PostgreSQL version: 9.3.2 Operating system: Linux - Ubuntu x86_64 Description: PLATFORM: -------- PostgreSQL 9.3.2 installed from precise-pgdg apt repository on Ubuntu 12.04.3 LTS and 13.10 x86_64 SETUP: ----- Given a database with encoding UTF-8, locale tr_TR.UTF-8, 'pg_trgm' extension enabled, and the following setup: CREATE TABLE car_makers (maker TEXT); INSERT INTO car_makers VALUES ('AUDI'), ('MINI'); Run any one of the following commands: - SELECT maker <-> 'MAZDA' FROM car_makers; - SELECT similarity(maker, 'MAZDA') FROM car_makers; - SELECT show_trgm('III'); ACTUAL OUTCOME: -------------- The database will crash and the log will output something along the lines of: LOG: server process (PID 24941) was terminated by signal 11: Segmentation fault DETAIL: Failed process was running: SELECT maker <-> 'MAZDA' FROM car_makers; LOG: terminating any other active server processes WARNING: terminating connection because of crash of another server process DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. HINT: In a moment you should be able to reconnect to the database and repeat your command. FATAL: the database system is in recovery mode LOG: all server processes terminated; reinitializing EXPECTED OUTCOME: ---------------- The commands return a result and the database process stays up. ANALYSIS: -------- I initially ran into this issue on a table with more records and have narrowed the case down to rows that have 4 letter words that contain the uppercase I character (that's why only 'AUDI' and 'MINI' in the example). I am not sure why comparison with 'MAZDA' triggers the crash (for example, a comparison with 'MAZDA ' does not), but it seems other 5 char strings also trigger it (eg. 'ABCDE' or '12345'). Also, there are other, longer comparison strings like 'RENAULT-OYAK ' that also trigger it. As I was trying to understand what exactly causes the problem, I tried to understand the trigrams generated by the values in the table and various test strings. I happened on the other crash during that experimentation. As a result, it seems there is a problem with trigrams generation and/or comparison when the Turkish locale is being used. The issue ultimately seems to be linked to the fact that the lowercase equivalent of "I" is "ı" (i without the dot), and the trigram codes does the proper conversion to lowercase which turns the string into a multibyte string. The same problem does NOT exist on Mac OSX 10.9.1 with PostgreSQL installed via Postgres.app. However, OSX has broken tr_TR locale support to begin with (for example, sorting does not work properly.) WORKAROUND: ---------- I was not able to find a reliable workaround since the code that does the lowercase conversion seems to only take into consideration the locale of the working database (which in my case needs to be tr_TR). Specifying an 'en_US' collation on the 'maker' column or an explicit collation specifier in the query do not work.
pgsql-bugs by date: