Thread: BUG #5219: Segfault in to_tsvector
The following bug has been logged online:

Bug reference:      5219
Logged by:          Kenaniah Cerny
Email address:      kenaniah@gmail.com
PostgreSQL version: 8.4.1
Operating system:   Centos5.2 -- Linux 2.6.18-92.1.10.el5 #1 SMP i686 athlon i386 GNU/Linux
Description:        Segfault in to_tsvector
Details:

Full backtrace: http://pgsql.privatepaste.com/5411abf8f3

The issue takes place running this query:
http://pgsql.privatepaste.com/35064cbba8

Crash is attributed to this index definition:

CREATE INDEX "anime_titles_idx_name_simple_text" ON "public"."anime_titles"
USING gin ((to_tsvector('simple'::regconfig, name)));

I believe the issue is caused by possibly non-UTF-8 data. Both the server and the client (a PHP script using PDO's pgsql driver) are using UTF-8.

The string causing this issue is stored in the database in a text field and looks like this:
http://s801.photobucket.com/albums/yy299/kenaniah972/?action=view&current=issue.png

After output into an HTML input field and resubmission through firefox, the string that is passed through to the DB looks like this:
http://s801.photobucket.com/albums/yy299/kenaniah972/?action=view&current=submitted.png

(The characters were manually omitted in submission)

I don't profess to know anything about encodings, but I don't think this is valid UTF-8 input. I might be wrong. All I do know is that this causes the to_tsvector part of the gin index to throw a segfault in the insert statement, rather than returning an invalid UTF-8 input error or just plain working.
"Kenaniah Cerny" <kenaniah@gmail.com> writes:
> Description:        Segfault in to_tsvector
> Full backtrace: http://pgsql.privatepaste.com/5411abf8f3

This looks like the known problem that ts_stat fails on an empty
tsvector. Can you try this patch
http://archives.postgresql.org/pgsql-committers/2009-10/msg00056.php
or just pick up 8.4 branch tip from CVS?

If that does fix it, I don't think this is an encoding problem,
but rather that the name doesn't contain anything that is recognized
as a word by the textsearch configuration you're using.

regards, tom lane
Thanks,

The patch took some massaging, but took care of the issue when applied to the 8.4.1 source.

Kenaniah Cerny

On Sat, Nov 28, 2009 at 7:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> "Kenaniah Cerny" <kenaniah@gmail.com> writes:
> > Description:        Segfault in to_tsvector
> > Full backtrace: http://pgsql.privatepaste.com/5411abf8f3
>
> This looks like the known problem that ts_stat fails on an empty
> tsvector. Can you try this patch
> http://archives.postgresql.org/pgsql-committers/2009-10/msg00056.php
> or just pick up 8.4 branch tip from CVS?
>
> If that does fix it, I don't think this is an encoding problem,
> but rather that the name doesn't contain anything that is recognized
> as a word by the textsearch configuration you're using.
>
> regards, tom lane
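
For anyone hitting the same crash, the diagnosis in this thread (a name containing nothing the text search configuration recognizes as a word, so to_tsvector yields an empty tsvector) can be checked with a query along these lines. This is only a sketch: it assumes the table and column from the index definition in the report, and the literal string in the first query is a hypothetical sample, not the reporter's data.

```sql
-- Hypothetical sample input: a string with no word-like tokens should
-- produce an empty tsvector under the 'simple' configuration.
SELECT to_tsvector('simple'::regconfig, '!!! ???') AS vec,
       length(to_tsvector('simple'::regconfig, '!!! ???')) AS lexeme_count;

-- List the rows whose name yields no lexemes at all, i.e. the rows
-- that end up as empty tsvectors in the expression index.
-- Table and column names are taken from the CREATE INDEX in the report.
SELECT name
FROM public.anime_titles
WHERE length(to_tsvector('simple'::regconfig, name)) = 0;
```

If the second query returns the problem string, that points to an empty tsvector rather than invalid UTF-8 as the trigger.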