Re: Searching for pgweb - Mailing list pgsql-www
From | Magnus Hagander |
---|---|
Subject | Re: Searching for pgweb |
Date | |
Msg-id | CABUevEzPfCtbH1Qg9nDQNkwgzw2vUqg7yQgCEgygpRy4f45_HQ@mail.gmail.com |
In response to | Re: Searching for pgweb (Oleg Bartunov <obartunov@gmail.com>) |
Responses |
Re: Searching for pgweb
|
List | pgsql-www |
On Fri, Mar 31, 2017 at 2:46 PM, Oleg Bartunov <obartunov@gmail.com> wrote:
On Fri, Mar 31, 2017 at 8:04 AM, Magnus Hagander <magnus@hagander.net> wrote:
On Wed, Mar 29, 2017 at 3:55 PM, Oleg Bartunov <obartunov@gmail.com> wrote:
On 29 Mar 2017 09:49, "Magnus Hagander" <magnus@hagander.net> wrote:
On Fri, Mar 24, 2017 at 8:56 AM, Oleg Bartunov <obartunov@gmail.com> wrote:
On Wed, Mar 22, 2017 at 7:51 PM, Magnus Hagander <magnus@hagander.net> wrote:
Right now our main website search uses plainto_tsquery() to generate the searches.
Should we consider switching that to phraseto_tsquery() now that we have phrase searching?

+1
Also, I suggest using the new parser, which handles _ and - better, for example:
1.
select ts_parse('tsparser', 'btree_gin');
ts_parse
----------------
(16,btree_gin)
(11,btree)
(12,_)
(11,gin)
(4 rows)
select ts_parse('default', 'btree_gin');
ts_parse
-----------
(1,btree)
(12,_)
(1,gin)
(3 rows)
Default parser produces too much noise, just check the difference:
https://postgrespro.ru/search/?area=version&q=btree_gin&product=postgresql&version=9.6
https://www.postgresql.org/search/?u=%2Fdocs%2F9.6%2F&q=btree_gin
2.
select ts_parse('tsparser', 'utc-5');
ts_parse
------------
(15,utc-5)
(11,utc)
(12,-)
(9,5)
(4 rows)
select ts_parse('default', 'utc-5');
ts_parse
----------
(1,utc)
(21,-5)
(2 rows)
again, compare
https://postgrespro.ru/search/?area=version&q=utc-5&product=postgresql&version=9.6
https://www.postgresql.org/search/?u=%2Fdocs%2F9.6%2F&q=utc-5

We have also better parsing of email, but I'm not sure we need it on postgres site.
We'll publish soon on github, let me know if you know it.

That sounds interesting. Two questions:
1. Do you have plans for contributing this one for upstream postgres, or is it intended to be run separately?

We would love to do this, but currently it's there

Right, found that one. But if your long term plan is to contribute it upstream, that makes it easier to rely on :)

I'd love if you test it, give us feedback what to improve, what to fix. Then we could try to convince community to accept it.

I've applied this one for testing on the main website search.
At the same time I realized we didn't setweight() on the title on regular webpages, so I fixed that too (setting title to weight A).
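For anyone wanting to see what the title weighting does, here is a minimal sketch. The sample rows, the column names (title, body, fti) and the choice of weight D for the body are assumptions for illustration, not the actual pgweb schema. It uses the built-in 'english' configuration so it runs on a stock install; substitute 'pg' once that configuration exists.

```sql
-- Hypothetical sample pages; the real site reads these from its own tables.
WITH page(title, body) AS (
    VALUES ('Phrase search', 'How to match words that are adjacent.'),
           ('Release notes', 'This release adds phrase search.')
),
indexed AS (
    SELECT title,
           -- Title lexemes get weight A, body lexemes keep the default weight D.
           setweight(to_tsvector('english', title), 'A') ||
           setweight(to_tsvector('english', body),  'D') AS fti
    FROM page
)
SELECT title, ts_rank(fti, q) AS rank
FROM indexed, to_tsquery('english', 'phrase') AS q
WHERE fti @@ q
ORDER BY rank DESC;
```

With the default ts_rank() weights, a hit in the title should sort ahead of a hit that only appears in the body.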
Basically the conf is:
CREATE TEXT SEARCH DICTIONARY english_ispell (
    TEMPLATE = pg_catalog.ispell,
    dictfile = 'en_us', afffile = 'en_us', stopwords = 'english' );
CREATE TEXT SEARCH DICTIONARY pg_dict (
    TEMPLATE = pg_catalog.synonym,
    synonyms = 'pg_dict' );
CREATE TEXT SEARCH CONFIGURATION pg (
    PARSER = tsparser );
ALTER TEXT SEARCH CONFIGURATION pg
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH pg_dict, english_ispell, english_stem;
ALTER TEXT SEARCH CONFIGURATION pg
    ALTER MAPPING FOR email, file, float, host, hword_numpart, int,
                      numhword, numword, sfloat, uint, url, url_path, version
    WITH simple;
If you have any other suggestions of things we should change there, please let me know!
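A quick way to sanity-check a configuration like the one above is ts_debug(), which shows which token type and dictionary each piece of the input ends up with. This is only a sketch: it assumes the pg configuration and the tsparser parser are already installed in the database, and the sample string is simply the two terms from the examples earlier in the thread.

```sql
-- Assumes the "pg" text search configuration has been created as described.
SELECT alias, token, dictionary, lexemes
FROM ts_debug('pg', 'btree_gin utc-5');
```

Running the same call with 'english' instead of 'pg' gives a side-by-side view of how the default parser splits the same input.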
So far, this is on the main website search and *not* on the archives search. Let's try it there first, but in the long run we should use similar configurations.
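For reference, the difference between the two query functions from the question that started this thread can be sketched as follows. The input strings are made up for illustration and the built-in 'english' configuration is used; phraseto_tsquery() requires PostgreSQL 9.6 or later.

```sql
-- plainto_tsquery() ANDs the lexemes together, ignoring their order.
SELECT plainto_tsquery('english', 'The Fat Rats');   -- 'fat' & 'rat'

-- phraseto_tsquery() requires the lexemes to be adjacent and in order.
SELECT phraseto_tsquery('english', 'The Fat Rats');  -- 'fat' <-> 'rat'

-- Same document, same words: only the AND query matches,
-- because "fat" does not directly precede "rats" in the text.
SELECT to_tsvector('english', 'rats are fat') @@ plainto_tsquery('english', 'fat rats')  AS plain_match,
       to_tsvector('english', 'rats are fat') @@ phraseto_tsquery('english', 'fat rats') AS phrase_match;
-- plain_match = t, phrase_match = f
```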