Re: Improving docs for strict_word_similarity() - Mailing list pgsql-docs
From | Alexander Korotkov |
---|---|
Subject | Re: Improving docs for strict_word_similarity() |
Date | |
Msg-id | CAPpHfdumsXfLUhtuiwDWU+Gf-KYkkqHCvMvRggYOugt-FBjfFg@mail.gmail.com |
In response to | Improving docs for strict_word_similarity() (Bruce Momjian <bruce@momjian.us>) |
List | pgsql-docs |
Hi, Bruce!
On Sat, May 26, 2018 at 7:56 PM Bruce Momjian <bruce@momjian.us> wrote:
While creating the release notes, I was confused by the description for
strict_word_similarity(), particularly "extent boundaries". The
attached patch clarifies, at least for me, how word_similarity() and
strict_word_similarity() differ.
Thank you for your efforts to improve the pg_trgm documentation.
However, I don't think all of the edits are correct. My notes on the
proposed changes follow.
--- 112,119 ----
</entry>
<entry><type>real</type></entry>
<entry>
! Same as <function>word_similarity(text, text)</function>, but
! considers the set of trigrams to be of the same length.
</entry>
</row>
<row>
This doesn't look like a correct description. In short, strict_word_similarity() searches
the second string for the extent of words that best matches the first string.
So this function makes sure to use whole words from the second string,
not parts of words. However, this is not a matter of the length of the trigram sets.
--- 164,182 ----
This function returns a value that can be approximately understood as the
greatest similarity between the first string and any substring of the second
string. However, this function does not add padding to the boundaries of
! the extent. Thus, the number of additional characters present in the
! second string is not considered, except for the mismatched word boundary.
</para>
This looks correct to me.
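A rough Python model of the word_similarity() behaviour described in that paragraph may help when checking the wording. It is only a sketch under simplifying assumptions: words are split on ASCII letters and digits, every continuous run of the second string's trigrams is tried by brute force, and the helper names (trigrams, word_similarity_sketch) are hypothetical. It is not the pg_trgm implementation.

```python
import re


def trigrams(text):
    """Return the trigrams of text in positional order.

    Assumption: each word is lowercased and padded with two leading
    spaces and one trailing space, as the pg_trgm docs describe.
    """
    out = []
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        out.extend(padded[i:i + 3] for i in range(len(padded) - 2))
    return out


def word_similarity_sketch(first, second):
    """Best set similarity between the trigrams of `first` and any
    continuous extent of the ordered trigrams of `second`.

    The extent may start and stop in the middle of a word, so no
    padding trigrams are forced onto its boundaries.
    """
    a = set(trigrams(first))
    seq = trigrams(second)
    best = 0.0
    for lo in range(len(seq)):
        for hi in range(lo + 1, len(seq) + 1):
            extent = set(seq[lo:hi])
            best = max(best, len(a & extent) / len(a | extent))
    return best


print(round(word_similarity_sketch("word", "two words"), 6))  # 0.8
```

For 'word' against 'two words' the best extent in this model is {"  w"," wo","wor","ord"}, which stops inside 'words'; the trailing "rds" and "ds " trigrams are simply left out, so the extra characters cost nothing beyond the one unmatched boundary trigram "rd ".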
! The function <function>strict_word_similarity(text, text)</function>
! does consider additional characters in the second string. In the
! example above, <function>strict_word_similarity(text, text)</function>
! would use the full trigram for the second string when computing
! similarity, not just the part of the trigram that matches the
! first string. For example, it would use the <literal>{" w","
! wo","wor","ord","rds","ds "}</literal>, which corresponds to the whole
! word <literal>'words'</literal>.
After your edits, it reads as if strict_word_similarity() matches the full
set of first-string trigrams against the full set of second-string trigrams. However,
that describes the plain similarity() function. Actually,
strict_word_similarity() matches the set of trigrams of the first string against
the set of trigrams of a contiguous subset of the second string's words.
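The distinction drawn here can be put side by side in a small Python sketch. It assumes ASCII-only word splitting and a brute-force search over every contiguous run of whole words; the function names are hypothetical and the code only models the behaviour described above, it does not reproduce the pg_trgm source.

```python
import re


def split_words(text):
    # Assumption: words are runs of ASCII letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def word_trigrams(word):
    # Pad with two leading spaces and one trailing space, then slide.
    padded = "  " + word + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def text_trigrams(text):
    result = set()
    for word in split_words(text):
        result |= word_trigrams(word)
    return result


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_sketch(first, second):
    """Full trigram set against full trigram set."""
    return jaccard(text_trigrams(first), text_trigrams(second))


def strict_word_similarity_sketch(first, second):
    """Full trigram set of `first` against the best contiguous run of
    whole words in `second`."""
    a = text_trigrams(first)
    words = split_words(second)
    best = 0.0
    for lo in range(len(words)):
        extent = set()
        for hi in range(lo, len(words)):
            extent |= word_trigrams(words[hi])  # grow the run by one whole word
            best = max(best, jaccard(a, extent))
    return best


print(round(similarity_sketch("word", "two words"), 6))              # 0.363636
print(round(strict_word_similarity_sketch("word", "two words"), 6))  # 0.571429
```

In this model the strict variant picks the single word 'words' as its extent, giving 4 shared trigrams out of 7 distinct ones, whereas matching against the whole second string gives 4 out of 11.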
--- 189,197 ----
<para>
Thus, the <function>strict_word_similarity(text, text)</function> function
! is useful for finding the similarity to whole words, while
<function>word_similarity(text, text)</function> is more suitable for
! finding the similarity for parts of words.
</para>
This also looks correct to me.
------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company