BUG #18654: From fuzzystrmatch, levenshtein function with costs parameters produce incorrect results - Mailing list pgsql-bugs

From PG Bug reporting form
Subject BUG #18654: From fuzzystrmatch, levenshtein function with costs parameters produce incorrect results
Msg-id 18654-c09f568d3ba6dfcd@postgresql.org
Responses Re: BUG #18654: From fuzzystrmatch, levenshtein function with costs parameters produce incorrect results
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      18654
Logged by:          bjdev
Email address:      bjdev.gthb@laposte.net
PostgreSQL version: 15.4
Operating system:   Ubuntu 22.04.5 LTS
Description:

Hi,

The fuzzystrmatch extension provides an implementation of the levenshtein
function.
One variant accepts cost parameters:
levenshtein(text source, text target, int ins_cost, int del_cost, int
sub_cost) returns int

However, when this variant is called with costs other than 1 (the default),
the result is incorrect:

SELECT levenshtein('horses','shorse',1,1,1) => 2 (correct)

SELECT levenshtein('horses','shorse',100,10,1) => 101 (INCORRECT)
The correct result is 6: every letter has to be substituted (6 x 1), and no
lower score is possible with the other operations (the strings have the same
length, so any insertion needs a matching deletion, costing at least 110).
This is easy to verify manually, but it can also be checked with the Python
implementation:
from Levenshtein import distance
distance("horses","shorse",weights=(100,10,1))
# => 6

SELECT levenshtein('horses','shorse',1,10,100) => 12 (INCORRECT)
The correct result is 11: insert "s" at the start (+1) and remove the last
"s" (+10).
Again, this is easy to verify manually, but it can also be checked with the
Python implementation:
from Levenshtein import distance
distance("horses","shorse",weights=(1,10,100))
# => 11

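SELECT levenshtein('horses','shorse',1,10,1) => 2 (INCORRECT)
The correct result is 6 (six substitutions at cost 1 each, which is cheaper
than one insertion plus one deletion at 1 + 10 = 11).
This can be checked with the Python implementation: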
from Levenshtein import distance
distance("horses","shorse",weights=(1,10,1))
# => 6
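
For anyone who does not have the Levenshtein package installed, the expected
values above can also be cross-checked with a small dynamic-programming
sketch in plain Python. This is only an illustration of the textbook weighted
edit distance, not the code used by fuzzystrmatch or by the Python package;
the function name weighted_levenshtein is hypothetical, and the argument
order is assumed to follow the SQL signature (ins_cost, del_cost, sub_cost).

```python
def weighted_levenshtein(source, target, ins_cost=1, del_cost=1, sub_cost=1):
    """Cost of the cheapest edit script that turns source into target.

    Hypothetical reference helper (textbook dynamic programming); it is not
    the fuzzystrmatch implementation.
    """
    # prev[j] = cost of building target[:j] from an empty source prefix,
    # which takes j insertions.
    prev = [j * ins_cost for j in range(len(target) + 1)]
    for i, s_ch in enumerate(source, start=1):
        # Turning source[:i] into an empty target takes i deletions.
        curr = [i * del_cost]
        for j, t_ch in enumerate(target, start=1):
            curr.append(min(
                prev[j] + del_cost,       # delete s_ch from the source
                curr[j - 1] + ins_cost,   # insert t_ch into the source
                prev[j - 1] + (0 if s_ch == t_ch else sub_cost),  # match or substitute
            ))
        prev = curr
    return prev[-1]


# The four calls from this report, with (ins_cost, del_cost, sub_cost):
for costs in [(1, 1, 1), (100, 10, 1), (1, 10, 100), (1, 10, 1)]:
    print(costs, weighted_levenshtein("horses", "shorse", *costs))
```

Under these assumptions the sketch prints 2, 6, 11 and 6, matching the
values I expect in the four cases above.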

As a result, the cost parameters of the levenshtein function cannot be used
reliably, which is a shame.

Regards

