Thread: Slow fulltext query plan
Hi, I would like to understand why the following query execution don't use any fulltext indexes and takes more than 300s (using lot of temporary files): EXPLAIN ANALYZE SELECT hierarchy.id FROM hierarchy JOIN fulltext ON fulltext.id = hierarchy.id, TO_TSQUERY('whatever') query1, TO_TSQUERY('whatever') query2 WHERE (query1 @@ nx_to_tsvector(fulltext.fulltext)) OR (query2 @@ nx_to_tsvector(fulltext.fulltext_title)); The query plan is here: http://explain.depesz.com/s/YgP While if I replace the query2 by query1 in the second clause: EXPLAIN ANALYZE SELECT hierarchy.id FROM hierarchy JOIN fulltext ON fulltext.id = hierarchy.id, TO_TSQUERY('whatever') query1, TO_TSQUERY('whatever') query2 WHERE (query1 @@ nx_to_tsvector(fulltext.fulltext)) OR (query1 @@ nx_to_tsvector(fulltext.fulltext_title)); It is 5 order of magniude faster (15ms) using the gin indexes: http://explain.depesz.com/s/RLa The nx_to_tsvector is an immutable function with the following code: SELECT TO_TSVECTOR('english', SUBSTR($1, 1, 250000)) Here is the list of indexes: hierarchy: "hierarchy_pk" PRIMARY KEY, btree (id) fulltext: "fulltext_fulltext_idx" gin (nx_to_tsvector(fulltext::character varying)) fulltext: "fulltext_fulltext_title_idx" gin (nx_to_tsvector(fulltext_title::character varying)) fulltext and fulltext_title are text type. And some PostgreSQL configuration: PostgreSQL 9.1.2 on x86_64-unknown-linux-gnu shared_buffers: 4GB effective_cache_size: 10GB work_mem: 20MB Thanks for your work and enlightenment ben
Benoit Delbosc <bdelbosc@nuxeo.com> writes: > EXPLAIN ANALYZE SELECT hierarchy.id > FROM hierarchy > JOIN fulltext ON fulltext.id = hierarchy.id, > TO_TSQUERY('whatever') query1, > TO_TSQUERY('whatever') query2 > WHERE (query1 @@ nx_to_tsvector(fulltext.fulltext)) OR (query2 @@ > nx_to_tsvector(fulltext.fulltext_title)); Is there a reason why you're writing the query in such a non-straightforward way, rather than just EXPLAIN ANALYZE SELECT hierarchy.id FROM hierarchy JOIN fulltext ON fulltext.id = hierarchy.id WHERE (TO_TSQUERY('whatever') @@ nx_to_tsvector(fulltext.fulltext)) OR (TO_TSQUERY('whatever') @@ nx_to_tsvector(fulltext.fulltext_title)); ? regards, tom lane
On 13/04/2012 00:25, Tom Lane wrote: > Benoit Delbosc<bdelbosc@nuxeo.com> writes: >> EXPLAIN ANALYZE SELECT hierarchy.id >> FROM hierarchy >> JOIN fulltext ON fulltext.id = hierarchy.id, >> TO_TSQUERY('whatever') query1, >> TO_TSQUERY('whatever') query2 >> WHERE (query1 @@ nx_to_tsvector(fulltext.fulltext)) OR (query2 @@ >> nx_to_tsvector(fulltext.fulltext_title)); > Is there a reason why you're writing the query in such a > non-straightforward way, rather than just > > EXPLAIN ANALYZE SELECT hierarchy.id > FROM hierarchy > JOIN fulltext ON fulltext.id = hierarchy.id > WHERE (TO_TSQUERY('whatever') @@ nx_to_tsvector(fulltext.fulltext)) > OR (TO_TSQUERY('whatever') @@ nx_to_tsvector(fulltext.fulltext_title)); > > ? > This query is written by a framework, also I thought that is a common pattern that can be found in the documentation: http://www.postgresql.org/docs/9.1/interactive/textsearch-controls.html if you think this a wrong way to do it then I will try to fix the framework. btw your version takes 15ms :) Thanks ben
Benoit Delbosc <bdelbosc@nuxeo.com> writes: > On 13/04/2012 00:25, Tom Lane wrote: >> Is there a reason why you're writing the query in such a >> non-straightforward way, rather than just >> >> EXPLAIN ANALYZE SELECT hierarchy.id >> FROM hierarchy >> JOIN fulltext ON fulltext.id = hierarchy.id >> WHERE (TO_TSQUERY('whatever') @@ nx_to_tsvector(fulltext.fulltext)) >> OR (TO_TSQUERY('whatever') @@ nx_to_tsvector(fulltext.fulltext_title)); > This query is written by a framework, also I thought that is a common > pattern that can be found in the documentation: > http://www.postgresql.org/docs/9.1/interactive/textsearch-controls.html Well, "common pattern" would be stretching it. Anyway I've concluded that this is in fact a planner bug. There will be a fix in 9.2, but I'm not going to take the risk of back-patching it, so you might want to think about changing that framework. regards, tom lane
Tom Lane wrote: > Benoit Delbosc<bdelbosc@nuxeo.com> writes: >> On 13/04/2012 00:25, Tom Lane wrote: >>> Is there a reason why you're writing the query in such a >>> non-straightforward way, rather than just >>> >>> EXPLAIN ANALYZE SELECT hierarchy.id >>> FROM hierarchy >>> JOIN fulltext ON fulltext.id = hierarchy.id >>> WHERE (TO_TSQUERY('whatever') @@ nx_to_tsvector(fulltext.fulltext)) >>> OR (TO_TSQUERY('whatever') @@ nx_to_tsvector(fulltext.fulltext_title)); >> >> This query is written by a framework, also I thought that is a common >> pattern that can be found in the documentation: >> http://www.postgresql.org/docs/9.1/interactive/textsearch-controls.html > > Well, "common pattern" would be stretching it. Anyway I've concluded > that this is in fact a planner bug. There will be a fix in 9.2, but I'm > not going to take the risk of back-patching it, so you might want to > think about changing that framework. FYI the reason why we have queries that look like what Benoit describes is that we often use the query alias twice, once for TO_TSVECTOR and once for TS_RANK_CD, for instance: SELECT hierarchy.id, TS_RANK_CD(fulltext, query1, 32) as nxscore FROM hierarchy JOIN fulltext ON fulltext.id = hierarchy.id, TO_TSQUERY('whatever') query1, TO_TSQUERY('whatever') query2 WHERE (query1 @@ nx_to_tsvector(fulltext.fulltext)) OR (query2 @@ nx_to_tsvector(fulltext.fulltext_title)) ORDER BY nxscore DESC; (as is also described in the doc mentioned btw). Florent -- Florent Guillaume, Director of R&D, Nuxeo Open Source, Java EE based, Enterprise Content Management (ECM) http://www.nuxeo.com http://www.nuxeo.org +33 1 40 33 79 87