Home > mailing lists

workaround for expensive KNN? - Mailing list pgsql-hackers

From	PostgreSQL - Hans-Jürgen Schönig
Subject	workaround for expensive KNN?
Date	April 8, 2011 10:06:15
Msg-id	70FD80C7-5D7A-4EA3-9712-72DB43BE0912@cybertec.at Whole thread Raw
Responses	Re: workaround for expensive KNN?
List	pgsql-hackers

Tree view

hello all ...

given oleg's posting before i also wanted to fire up some KNN related question.
let us consider a simple example. i got some million lines and i want all rows matching a tsquery sorted by price.
i did some tests:

test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german',
title)@@ to_tsquery('german', 'iphone') ORDER BY int_price <-> 0 LIMIT 10;
                            QUERY PLAN                                                                             

-----------------------------------------------------------------------------------------------------------------------------
--------------------------------------Limit  (cost=0.00..41.11 rows=10 width=16) (actual time=36391.717..45542.590
rows=10loops=1)  Buffers: shared hit=9 read=5004  ->  Index Scan using idx_product_t_product_titleprice on t_product
(cost=0.00..13251.91rows=3224 width=16) (actual time= 
36391.715..45542.573 rows=10 loops=1)        Index Cond: (to_tsvector('german'::regconfig, title) @@
'''iphon'''::tsquery)       Order By: (int_price <-> 0::bigint)        Buffers: shared hit=9 read=5004Total runtime:
45542.676ms 
(7 rows)

test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german',
title)@@ to_tsquery('german', 'handy') ORDER BY int_price <-> 0 LIMIT 10;
                           QUERY PLAN                                                                            

-----------------------------------------------------------------------------------------------------------------------------
-------------------------------------Limit  (cost=0.00..41.03 rows=10 width=16) (actual time=7243.526..10935.227
rows=10loops=1)  Buffers: shared hit=3 read=2316  ->  Index Scan using idx_product_t_product_titleprice on t_product
(cost=0.00..29762.61rows=7255 width=16) (actual time= 
7243.524..10935.217 rows=10 loops=1)        Index Cond: (to_tsvector('german'::regconfig, title) @@
'''handy'''::tsquery)       Order By: (int_price <-> 0::bigint)        Buffers: shared hit=3 read=2316Total runtime:
10935.265ms 
(7 rows)

test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german',
title)@@ to_tsquery('german', 'handy') ORDER BY int_price <-> 0 LIMIT 1;
                       QUERY PLAN                                                                         

-----------------------------------------------------------------------------------------------------------------------------
-------------------------------Limit  (cost=0.00..4.10 rows=1 width=16) (actual time=28.527..28.528 rows=1 loops=1)
Buffers:shared hit=1 read=1577  ->  Index Scan using idx_product_t_product_titleprice on t_product
(cost=0.00..29762.61rows=7255 width=16) (actual time= 
28.525..28.525 rows=1 loops=1)        Index Cond: (to_tsvector('german'::regconfig, title) @@ '''handy'''::tsquery)
  Order By: (int_price <-> 0::bigint)        Buffers: shared hit=1 read=1577Total runtime: 28.558 ms 
(7 rows)


under any circumstances - there is no way to reduce the number of buffers needed for a query like that.
if everything is cached this is still ok but as soon as you have to take a single block from disk you will die a
painfulrandom I/O death. 
is there any alternative which does not simply die when i try to achieve what i want?

the use case is quite simple: all products with a certain word (10 cheapest or so).

is there any alternative approach to this?
i was putting some hope into KNN but it seems it needs too much random I/O :(.
many thanks,
    hans

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de

pgsql-hackers by date:

From: PostgreSQL - Hans-Jürgen Schönig
Date: 08 April 2011, 10:06:11
Subject: Re: k-neighbourhood search in databases

From: Oleg Bartunov
Date: 08 April 2011, 10:22:59
Subject: Re: workaround for expensive KNN?

workaround for expensive KNN? - Mailing list pgsql-hackers

Previous

Next