Thread: Calculating Minkowski distance between two rows
Greetings everyone,
--
I'm a novice plpgsql user.
For an application, I'm trying to write a user-defined function that takes a row of some table (let's say with k fields) and takes another row from another table (again with k fields); then calculate the Euclidean, Manhattan or generally Minkowski distance (with some p) and then return an integer.
I've written this:
CREATE FUNCTION euclidean_distance(row1 table_train, row2 table_test, OUT distance DOUBLE PRECISION) AS $$
DECLARE
tmp DOUBLE PRECISION;
BEGIN
FOR col IN SELECT column_name FROM information_schema.columns WHERE table_name=table_train LOOP
tmp := (row1.col - row2.col);
distance += tmp*tmp;
END LOOP;
distance := sqrt(distance);
END;
$$ LANGUAGE plpgsql;
Could anyone please help me fix this function so that I can pass any two rows of two tables (with same number of columns) and have their distance returned.
Best regards,
Babak Alipour
Babak Alipour ,
University of Florida
On 04/25/2016 07:07 AM, Babak Alipour wrote: > Greetings everyone, > > I'm a novice plpgsql user. > For an application, I'm trying to write a user-defined function that > takes a row of some table (let's say with k fields) and takes another > row from another table (again with k fields); then calculate the > Euclidean, Manhattan or generally Minkowski distance (with some p) and > then return an integer. > I've written this: > > CREATE FUNCTION euclidean_distance(row1 table_train, row2 table_test, > OUT distance DOUBLE PRECISION) AS $$ > DECLARE > tmp DOUBLE PRECISION; > BEGIN > FOR col IN SELECT column_name FROM information_schema.columns WHERE > table_name=table_train LOOP > tmp := (row1.col - row2.col); > distance += tmp*tmp; > END LOOP; > distance := sqrt(distance); > END; > $$ LANGUAGE plpgsql; > > Could anyone please help me fix this function so that I can pass any two > rows of two tables (with same number of columns) and have their distance > returned. You are already doing that, so do you mean any two rows of any two tables? > > Best regards, > Babak Alipour > > -- > */Babak Alipour ,/* > */University of Florida/* -- Adrian Klaver adrian.klaver@aklaver.com
That is correct. The function I've written only works when the two tables are named table_train and table_test; is it possible to generalize that to take in any two tables?
Thanks in advance.
>Babak
On Mon, Apr 25, 2016 at 10:24 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
You are already doing that, so do you mean any two rows of any two tables?On 04/25/2016 07:07 AM, Babak Alipour wrote:Greetings everyone,
I'm a novice plpgsql user.
For an application, I'm trying to write a user-defined function that
takes a row of some table (let's say with k fields) and takes another
row from another table (again with k fields); then calculate the
Euclidean, Manhattan or generally Minkowski distance (with some p) and
then return an integer.
I've written this:
CREATE FUNCTION euclidean_distance(row1 table_train, row2 table_test,
OUT distance DOUBLE PRECISION) AS $$
DECLARE
tmp DOUBLE PRECISION;
BEGIN
FOR col IN SELECT column_name FROM information_schema.columns WHERE
table_name=table_train LOOP
tmp := (row1.col - row2.col);
distance += tmp*tmp;
END LOOP;
distance := sqrt(distance);
END;
$$ LANGUAGE plpgsql;
Could anyone please help me fix this function so that I can pass any two
rows of two tables (with same number of columns) and have their distance
returned.
Best regards,
Babak Alipour
--
*/Babak Alipour ,/*
*/University of Florida/*
--
Adrian Klaver
adrian.klaver@aklaver.com
Babak Alipour ,
University of Florida
On 04/25/2016 07:26 AM, Babak Alipour wrote: > That is correct. The function I've written only works when the two > tables are named table_train and table_test; is it possible to > generalize that to take in any two tables? I'm heading out the door and off the top of my head: 1) Get tables names as text. 2) Get the row conditions as text. 3) Use EXECUTE to build a query string: http://www.postgresql.org/docs/9.5/interactive/plpgsql-statements.html#PLPGSQL-STATEMENTS-EXECUTING-DYN In particular: "A cleaner approach is to use format()'s %I specification for table or column names (strings separated by a newline are concatenated):" http://www.postgresql.org/docs/9.5/static/functions-string.html#FUNCTIONS-STRING-FORMAT 4) SELECT the result INTO a record variable: http://www.postgresql.org/docs/9.5/interactive/plpgsql-declarations.html#PLPGSQL-DECLARATION-RECORDS 5) Do your calculations 6) Not considered, validating that number of table columns are the same and the types are compatible. > > Thanks in advance. > >>Babak > > On Mon, Apr 25, 2016 at 10:24 AM, Adrian Klaver > <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>> wrote: > > On 04/25/2016 07:07 AM, Babak Alipour wrote: > > Greetings everyone, > > I'm a novice plpgsql user. > For an application, I'm trying to write a user-defined function that > takes a row of some table (let's say with k fields) and takes > another > row from another table (again with k fields); then calculate the > Euclidean, Manhattan or generally Minkowski distance (with some > p) and > then return an integer. > I've written this: > > CREATE FUNCTION euclidean_distance(row1 table_train, row2 > table_test, > OUT distance DOUBLE PRECISION) AS $$ > DECLARE > tmp DOUBLE PRECISION; > BEGIN > FOR col IN SELECT column_name FROM information_schema.columns WHERE > table_name=table_train LOOP > tmp := (row1.col - row2.col); > distance += tmp*tmp; > END LOOP; > distance := sqrt(distance); > END; > $$ LANGUAGE plpgsql; > > Could anyone please help me fix this function so that I can pass > any two > rows of two tables (with same number of columns) and have their > distance > returned. > > > You are already doing that, so do you mean any two rows of any two > tables? > > > Best regards, > Babak Alipour > > -- > */Babak Alipour ,/* > */University of Florida/* > > > > -- > Adrian Klaver > adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com> > > > > > -- > */Babak Alipour ,/* > */University of Florida/* -- Adrian Klaver adrian.klaver@aklaver.com
Hi: On Mon, Apr 25, 2016 at 4:26 PM, Babak Alipour <babak.alipour@gmail.com> wrote: > That is correct. The function I've written only works when the two tables > are named table_train and table_test; is it possible to generalize that to > take in any two tables? And only when all table_train columns are numbers AND table_test contains at least all of them AND they are numbers too. Wouldn't it be easier to use numeric arrays to represent coordinate vectors? Anyway, I'm not versed in all of this, wbut I would try to make a function to turn a table record to a numeric array and then write the numeric array version of the func and call them, divide and conquer. Francisco Olarte.
I am inclined to go with Francisco's solution Daniel -----Original Message----- From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Francisco Olarte Sent: April-25-16 10:46 To: Babak Alipour Cc: Adrian Klaver; pgsql-general@postgresql.org Subject: Re: [GENERAL] Calculating Minkowski distance between two rows Hi: On Mon, Apr 25, 2016 at 4:26 PM, Babak Alipour <babak.alipour@gmail.com> wrote: > That is correct. The function I've written only works when the two > tables are named table_train and table_test; is it possible to > generalize that to take in any two tables? And only when all table_train columns are numbers AND table_test contains at least all of them AND they are numbers too. Wouldn't it be easier to use numeric arrays to represent coordinate vectors? Anyway, I'm not versed in all of this, wbut I would try to make a function to turn a table record to a numeric array andthen write the numeric array version of the func and call them, divide and conquer. Francisco Olarte. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general