Thread: Searching a gin index
Hi again,
I have set my configuration as default and I have created a GIN index on three columns, like this
create index textsearch_index on products using gin(strip( to_tsvector( 'my_config', title || '' || description || '' || name)))
Searching these columns the way I have
... AND (title || '' || description || '' || name) @@ plainto_tsquery('car')
seems not to be correct since it's taking as much time as non-indexed. Also changing the word car to cars returns nothing even though
select * from plainto_tsquery('cars')
returns 'car'
Could you please point me in the right direction ?
James
I have set my configuration as default and I have created a GIN index on three columns, like this
create index textsearch_index on products using gin(strip( to_tsvector( 'my_config', title || '' || description || '' || name)))
Searching these columns the way I have
... AND (title || '' || description || '' || name) @@ plainto_tsquery('car')
seems not to be correct since it's taking as much time as non-indexed. Also changing the word car to cars returns nothing even though
select * from plainto_tsquery('cars')
returns 'car'
Could you please point me in the right direction ?
James
James, you provide us a little useful information. If you really want help you need to provide all information about your setup, sample queries and explain analyze, test data illustrated your problem. I, personally, usually prefer to have copy-pasteable information. I think we all are quite busy, so don't expect to get help if you can't give yours attention. For example, this post would be reduced to the simple test query test=# select version(); version ------------------------------------------------------------------------ PostgreSQL 8.3.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.2.3 (1 row) test=# select strip(to_tsvector('english', 'cars and vehicles')) @@ plainto_tsquery('english', 'cars'); ?column? ---------- t If not, you have problem. Oleg On Fri, 6 Feb 2009, James Dooley wrote: > Hi again, > > I have set my configuration as default and I have created a GIN index on > three columns, like this > > create index textsearch_index on products using gin(strip( to_tsvector( > 'my_config', title || '' || description || '' || name))) > > Searching these columns the way I have > > ... AND (title || '' || description || '' || name) @@ plainto_tsquery('car') > > seems not to be correct since it's taking as much time as non-indexed. Also > changing the word car to cars returns nothing even though > > select * from plainto_tsquery('cars') > > returns 'car' > > Could you please point me in the right direction ? > > James > Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
James Dooley wrote: > Hi again, > > I have set my configuration as default and I have created a GIN index on > three columns, like this > > create index textsearch_index on products using gin(strip( to_tsvector( > 'my_config', title || '' || description || '' || name))) > > Searching these columns the way I have > > ... AND (title || '' || description || '' || name) @@ plainto_tsquery('car') > > seems not to be correct since it's taking as much time as non-indexed. PG's planner isn't smart enough to transform a complex expression so as to use a functional index (which is what you've got). You need to mention the function explicitly. So, if you had: CREATE INDEX lowercase_idx ON mytable ( lower(mycolumn) ) You then need to search against "lower(mycolumn)" and not just expect the planner to notice that mycolumn="abc" could use the index. The following should work for you as a starting point: CREATE TABLE tsearch_tbl (id SERIAL, title text, body text, PRIMARY KEY (id)); INSERT INTO tsearch_tbl (title, body) SELECT 'title number ' || n, 'This is body number ' || n FROM generate_series(1,9999) n; ANALYSE tsearch_tbl; CREATE INDEX tsearch_tbl_words_idx ON tsearch_tbl USING gin ( to_tsvector('english', title || body) ); EXPLAIN ANALYSE SELECT * FROM tsearch_tbl WHERE to_tsvector('english', title || body) @@ to_tsquery('17'); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on tsearch_tbl (cost=4.34..34.76 rows=10 width=45) (actual time=0.067..0.067 rows=1 loops=1) Recheck Cond: (to_tsvector('english'::regconfig, (title || body)) @@ to_tsquery('17'::text)) -> Bitmap Index Scan on tsearch_tbl_words_idx (cost=0.00..4.34 rows=10 width=0) (actual time=0.059..0.059 rows=1 loops=1) Index Cond: (to_tsvector('english'::regconfig, (title || body)) @@ to_tsquery('17'::text)) Total runtime: 0.121 ms Note that you'll have problems if any of your text-fields contain nulls (since null || anything = null). Personally, unless I'm dealing with a large table, I like to have a separate tsvector column which I keep up to date with triggers. It makes it easier to debug problems. -- Richard Huxton Archonet Ltd
Oleg, but I am only interested in whether or not the syntax of my search-query is correct.
Having created the index as I mentioned above, would the correct way of searching and using that index be
... AND (title || '' || description || '' || name) @@ plainto_tsquery('car')
or should it be as Richard just mentioned
... AND to_tsvector(title || '' || description || '' || name) @@ plainto_tsquery('car')
or some other way ?
Having created the index as I mentioned above, would the correct way of searching and using that index be
... AND (title || '' || description || '' || name) @@ plainto_tsquery('car')
or should it be as Richard just mentioned
... AND to_tsvector(title || '' || description || '' || name) @@ plainto_tsquery('car')
or some other way ?
On Fri, Feb 6, 2009 at 3:30 PM, Richard Huxton <dev@archonet.com> wrote:
James Dooley wrote:PG's planner isn't smart enough to transform a complex expression so as
> Hi again,
>
> I have set my configuration as default and I have created a GIN index on
> three columns, like this
>
> create index textsearch_index on products using gin(strip( to_tsvector(
> 'my_config', title || '' || description || '' || name)))
>
> Searching these columns the way I have
>
> ... AND (title || '' || description || '' || name) @@ plainto_tsquery('car')
>
> seems not to be correct since it's taking as much time as non-indexed.
to use a functional index (which is what you've got). You need to
mention the function explicitly.
So, if you had:
CREATE INDEX lowercase_idx ON mytable ( lower(mycolumn) )
You then need to search against "lower(mycolumn)" and not just expect
the planner to notice that mycolumn="abc" could use the index.
The following should work for you as a starting point:
CREATE TABLE tsearch_tbl (id SERIAL, title text, body text,
PRIMARY KEY (id));
INSERT INTO tsearch_tbl (title, body)
SELECT 'title number ' || n, 'This is body number ' || n
FROM generate_series(1,9999) n;
ANALYSE tsearch_tbl;
CREATE INDEX tsearch_tbl_words_idx ON tsearch_tbl USING gin (
to_tsvector('english', title || body) );
EXPLAIN ANALYSE SELECT * FROM tsearch_tbl WHERE to_tsvector('english',
title || body) @@ to_tsquery('17');
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tsearch_tbl (cost=4.34..34.76 rows=10 width=45)
(actual time=0.067..0.067 rows=1 loops=1)
Recheck Cond: (to_tsvector('english'::regconfig, (title || body)) @@
to_tsquery('17'::text))
-> Bitmap Index Scan on tsearch_tbl_words_idx (cost=0.00..4.34
rows=10 width=0) (actual time=0.059..0.059 rows=1 loops=1)
Index Cond: (to_tsvector('english'::regconfig, (title || body))
@@ to_tsquery('17'::text))
Total runtime: 0.121 ms
Note that you'll have problems if any of your text-fields contain nulls
(since null || anything = null).
Personally, unless I'm dealing with a large table, I like to have a
separate tsvector column which I keep up to date with triggers. It makes
it easier to debug problems.
--
Richard Huxton
Archonet Ltd
James - don't repeat the whole of the previous message in your replies, trim it to the relevant part. The message is already archived on the list. James Dooley wrote: > Oleg, but I am only interested in whether or not the syntax of my > search-query is correct. > > Having created the index as I mentioned above, would the correct way of > searching and using that index be > > ... AND (title || '' || description || '' || name) @@ plainto_tsquery('car') That looks like a correct syntax, but it's unlikely to use your index - your index was created on to_tsvector('my_config', ...) > or should it be as Richard just mentioned > > ... AND to_tsvector(title || '' || description || '' || name) @@ > plainto_tsquery('car') > > or some other way ? Try playing around with the example I gave, and try variations on it. Do you understand why you have to supply a config when creating the functional index? Do you understand the difference between plainto_tsquery() and to_tsquery()? -- Richard Huxton Archonet Ltd
James, syntax is documented on http://www.postgresql.org/docs/8.3/static/textsearch-tables.html#TEXTSEARCH-TABLES-SEARCH and in the Introduction http://www.postgresql.org/docs/8.3/static/textsearch-intro.html#TEXTSEARCH-MATCHING text-search operator was specified for tsvector @@ tsquery. You did wrong twice, you didn't specified type tsvector and you forgot about coalesce. There is general rule for partial indexes - you should use the same expression in query as you used in create index command. Oleg On Fri, 6 Feb 2009, James Dooley wrote: > Oleg, but I am only interested in whether or not the syntax of my > search-query is correct. > > Having created the index as I mentioned above, would the correct way of > searching and using that index be > > ... AND (title || '' || description || '' || name) @@ plainto_tsquery('car') > > or should it be as Richard just mentioned > > ... AND to_tsvector(title || '' || description || '' || name) @@ > plainto_tsquery('car') > > or some other way ? > > > > On Fri, Feb 6, 2009 at 3:30 PM, Richard Huxton <dev@archonet.com> wrote: > >> James Dooley wrote: >>> Hi again, >>> >>> I have set my configuration as default and I have created a GIN index on >>> three columns, like this >>> >>> create index textsearch_index on products using gin(strip( to_tsvector( >>> 'my_config', title || '' || description || '' || name))) >>> >>> Searching these columns the way I have >>> >>> ... AND (title || '' || description || '' || name) @@ >> plainto_tsquery('car') >>> >>> seems not to be correct since it's taking as much time as non-indexed. >> >> PG's planner isn't smart enough to transform a complex expression so as >> to use a functional index (which is what you've got). You need to >> mention the function explicitly. >> >> So, if you had: >> CREATE INDEX lowercase_idx ON mytable ( lower(mycolumn) ) >> You then need to search against "lower(mycolumn)" and not just expect >> the planner to notice that mycolumn="abc" could use the index. >> >> The following should work for you as a starting point: >> >> CREATE TABLE tsearch_tbl (id SERIAL, title text, body text, >> PRIMARY KEY (id)); >> >> INSERT INTO tsearch_tbl (title, body) >> SELECT 'title number ' || n, 'This is body number ' || n >> FROM generate_series(1,9999) n; >> >> ANALYSE tsearch_tbl; >> >> CREATE INDEX tsearch_tbl_words_idx ON tsearch_tbl USING gin ( >> to_tsvector('english', title || body) ); >> >> EXPLAIN ANALYSE SELECT * FROM tsearch_tbl WHERE to_tsvector('english', >> title || body) @@ to_tsquery('17'); >> QUERY PLAN >> >> >> ------------------------------------------------------------------------------------------------------------------------------- >> Bitmap Heap Scan on tsearch_tbl (cost=4.34..34.76 rows=10 width=45) >> (actual time=0.067..0.067 rows=1 loops=1) >> Recheck Cond: (to_tsvector('english'::regconfig, (title || body)) @@ >> to_tsquery('17'::text)) >> -> Bitmap Index Scan on tsearch_tbl_words_idx (cost=0.00..4.34 >> rows=10 width=0) (actual time=0.059..0.059 rows=1 loops=1) >> Index Cond: (to_tsvector('english'::regconfig, (title || body)) >> @@ to_tsquery('17'::text)) >> Total runtime: 0.121 ms >> >> >> Note that you'll have problems if any of your text-fields contain nulls >> (since null || anything = null). >> >> Personally, unless I'm dealing with a large table, I like to have a >> separate tsvector column which I keep up to date with triggers. It makes >> it easier to debug problems. >> >> -- >> Richard Huxton >> Archonet Ltd >> > Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
On Fri, Feb 6, 2009 at 4:01 PM, Richard Huxton <dev@archonet.com> wrote:
Got it, I will use the same function and parameters as the one created on the index. As a matter a fact now the index seems to be used after I added the strip(to_tsvector(...))
Good!!
Yes, I do understand the difference.
James - don't repeat the whole of the previous message in your replies,
trim it to the relevant part. The message is already archived on the list.That looks like a correct syntax, but it's unlikely to use your index -
James Dooley wrote:
> Oleg, but I am only interested in whether or not the syntax of my
> search-query is correct.
>
> Having created the index as I mentioned above, would the correct way of
> searching and using that index be
>
> ... AND (title || '' || description || '' || name) @@ plainto_tsquery('car')
your index was created on to_tsvector('my_config', ...)
Got it, I will use the same function and parameters as the one created on the index. As a matter a fact now the index seems to be used after I added the strip(to_tsvector(...))
Good!!
Try playing around with the example I gave, and try variations on it. Do
> or should it be as Richard just mentioned
>
> ... AND to_tsvector(title || '' || description || '' || name) @@
> plainto_tsquery('car')
>
> or some other way ?
you understand why you have to supply a config when creating the
functional index? Do you understand the difference between
plainto_tsquery() and to_tsquery()?
Yes, I do understand the difference.
--
Richard Huxton
Archonet Ltd
Now everything is working, I added my config to
plainto_tsquery('my_config', 'cars')
and I get results back. I find that strange though since I set the default to be by config, but appearantly that was just temporary and for the PID that changed it. Running from my app seemed to keep using the old configuration.
Thanks all for your help.
Now I have a complete and working fulltext search!
PostgreSQL rules!!
James Dooley wrote: > > Now everything is working, I added my config to > > plainto_tsquery('my_config', 'cars') > > and I get results back. I find that strange though since I set the default > to be by config, but appearantly that was just temporary and for the PID > that changed it. Running from my app seemed to keep using the old > configuration. Not quite clear on what you mean here, but you always need to specify the config to use a functional index. The default config when you run the query isn't guaranteed to be the same as the default when you create the index. Actually, you can change it mid-transaction if you like. If you want to set the default permanently do something like "ALTER DATABASE mydb SET .... = ..." -- Richard Huxton Archonet Ltd
BTW, Oleg I don't need colasque since those values can't be null.
On Fri, Feb 6, 2009 at 4:11 PM, Oleg Bartunov <oleg@sai.msu.su> wrote:
James,
syntax is documented on
http://www.postgresql.org/docs/8.3/static/textsearch-tables.html#TEXTSEARCH-TABLES-SEARCH
and in the Introduction http://www.postgresql.org/docs/8.3/static/textsearch-intro.html#TEXTSEARCH-MATCHING
text-search operator was specified for tsvector @@ tsquery.
You did wrong twice, you didn't specified type tsvector and you forgot about coalesce.
There is general rule for partial indexes - you should use the same expression in query as you used in create index command.
Oleg
On Fri, 6 Feb 2009, James Dooley wrote:Oleg, but I am only interested in whether or not the syntax of my
search-query is correct.
Having created the index as I mentioned above, would the correct way of
searching and using that index be
... AND (title || '' || description || '' || name) @@ plainto_tsquery('car')
or should it be as Richard just mentioned
... AND to_tsvector(title || '' || description || '' || name) @@
plainto_tsquery('car')
or some other way ?
On Fri, Feb 6, 2009 at 3:30 PM, Richard Huxton <dev@archonet.com> wrote:James Dooley wrote:Hi again,plainto_tsquery('car')
I have set my configuration as default and I have created a GIN index on
three columns, like this
create index textsearch_index on products using gin(strip( to_tsvector(
'my_config', title || '' || description || '' || name)))
Searching these columns the way I have
... AND (title || '' || description || '' || name) @@
seems not to be correct since it's taking as much time as non-indexed.
PG's planner isn't smart enough to transform a complex expression so as
to use a functional index (which is what you've got). You need to
mention the function explicitly.
So, if you had:
CREATE INDEX lowercase_idx ON mytable ( lower(mycolumn) )
You then need to search against "lower(mycolumn)" and not just expect
the planner to notice that mycolumn="abc" could use the index.
The following should work for you as a starting point:
CREATE TABLE tsearch_tbl (id SERIAL, title text, body text,
PRIMARY KEY (id));
INSERT INTO tsearch_tbl (title, body)
SELECT 'title number ' || n, 'This is body number ' || n
FROM generate_series(1,9999) n;
ANALYSE tsearch_tbl;
CREATE INDEX tsearch_tbl_words_idx ON tsearch_tbl USING gin (
to_tsvector('english', title || body) );
EXPLAIN ANALYSE SELECT * FROM tsearch_tbl WHERE to_tsvector('english',
title || body) @@ to_tsquery('17');
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tsearch_tbl (cost=4.34..34.76 rows=10 width=45)
(actual time=0.067..0.067 rows=1 loops=1)
Recheck Cond: (to_tsvector('english'::regconfig, (title || body)) @@
to_tsquery('17'::text))
-> Bitmap Index Scan on tsearch_tbl_words_idx (cost=0.00..4.34
rows=10 width=0) (actual time=0.059..0.059 rows=1 loops=1)
Index Cond: (to_tsvector('english'::regconfig, (title || body))
@@ to_tsquery('17'::text))
Total runtime: 0.121 ms
Note that you'll have problems if any of your text-fields contain nulls
(since null || anything = null).
Personally, unless I'm dealing with a large table, I like to have a
separate tsvector column which I keep up to date with triggers. It makes
it easier to debug problems.
--
Richard Huxton
Archonet LtdRegards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
Richard, I was refereing to the plainto_tsquery
Oleg, I only need colasque if any of those columns can be null, which is not the case here. Correct?
Oleg, I only need colasque if any of those columns can be null, which is not the case here. Correct?
On Fri, 6 Feb 2009, James Dooley wrote: > Richard, I was refereing to the plainto_tsquery > > Oleg, I only need colasque if any of those columns can be null, which is not > the case here. Correct? yes, but for safety and generality I'd use coalesce. Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83