Re: workaround for expensive KNN? - Mailing list pgsql-hackers
From | Oleg Bartunov |
---|---|
Subject | Re: workaround for expensive KNN? |
Date | |
Msg-id | Pine.LNX.4.64.1104081721150.9772@sn.sai.msu.ru Whole thread Raw |
In response to | workaround for expensive KNN? (PostgreSQL - Hans-Jürgen Schönig<postgres@cybertec.at>) |
Responses |
Re: workaround for expensive KNN?
|
List | pgsql-hackers |
Probably, you miss two-columnt index. From my early post: http://www.sai.msu.su/~megera/wiki/knngist =# CREATE INDEX spots_idx ON spots USING knngist (coordinates, to_tsvector('french',address)); =# SELECT id, address, (coordinates <-> '(2.29470491409302,48.858263472125)'::point) AS dist FROM spots WHERE coordinates >< '(2.29470491409302,48.858263472125)'::point AND to_tsvector('french',address) @@ to_tsquery('french','mars') LIMIT 10; id | address | dist ---------+-------------------------------------------------------------+--------------------- 366096 | 1st Floor Tour Eiffel| Champs de Mars, Paris 75007, France | 2.32488941293945e-05 4356328 | r Champ de Mars 75007 PARIS | 0.00421854756964406 5200167 | Champ De Mars 75007 Paris | 0.004535645625872889301676 | Champ de Mars, 75007 Paris, | 0.00453564562587288 2152213 |16, ave Rapp, Champ de Mars, Tour Eiffel, Paris, France | 0.00624152097590896 1923818 | Champ de Mars Paris, France | 0.00838214733539654 5165953 | 39 Rue Champ De Mars Paris, France | 0.00874410234569529 7395870 | 39 Rue Champ De Mars Paris, France | 0.008744102345695294358671 | 32 Rue Champ De Mars Paris, France | 0.00876089659276339 1923742 |12 rue du Champ de Mars Paris, France | 0.00876764731845995 (10 rows) Time: 7.859 ms =# EXPLAIN (COSTS OFF) SELECT id, address FROM spots WHERE coordinates >< '(2.29470491409302,48.858263472125)'::point AND to_tsvector('french',address) @@ to_tsquery('french','mars') LIMIT 10; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------- Limit -> Index Scan using spots_idx on spots Index Cond: ((coordinates >< '(2.29470491409302,48.858263472125)'::point)AND (to_tsvector('french'::regconfig, address) @@ '''mar'''::tsquery)) (3 rows) On Fri, 8 Apr 2011, PostgreSQL - Hans-J?rgen Sch?nig wrote: > hello all ... > > given oleg's posting before i also wanted to fire up some KNN related question. > let us consider a simple example. i got some million lines and i want all rows matching a tsquery sorted by price. > i did some tests: > > test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german', title)@@ to_tsquery('german', 'iphone') ORDER BY int_price <-> 0 LIMIT 10; > QUERY PLAN > > ----------------------------------------------------------------------------------------------------------------------------- > -------------------------------------- > Limit (cost=0.00..41.11 rows=10 width=16) (actual time=36391.717..45542.590 rows=10 loops=1) > Buffers: shared hit=9 read=5004 > -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..13251.91 rows=3224 width=16) (actualtime= > 36391.715..45542.573 rows=10 loops=1) > Index Cond: (to_tsvector('german'::regconfig, title) @@ '''iphon'''::tsquery) > Order By: (int_price <-> 0::bigint) > Buffers: shared hit=9 read=5004 > Total runtime: 45542.676 ms > (7 rows) > > test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german', title)@@ to_tsquery('german', 'handy') ORDER BY int_price <-> 0 LIMIT 10; > QUERY PLAN > > ----------------------------------------------------------------------------------------------------------------------------- > ------------------------------------- > Limit (cost=0.00..41.03 rows=10 width=16) (actual time=7243.526..10935.227 rows=10 loops=1) > Buffers: shared hit=3 read=2316 > -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..29762.61 rows=7255 width=16) (actualtime= > 7243.524..10935.217 rows=10 loops=1) > Index Cond: (to_tsvector('german'::regconfig, title) @@ '''handy'''::tsquery) > Order By: (int_price <-> 0::bigint) > Buffers: shared hit=3 read=2316 > Total runtime: 10935.265 ms > (7 rows) > > test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german', title)@@ to_tsquery('german', 'handy') ORDER BY int_price <-> 0 LIMIT 1; > QUERY PLAN > > ----------------------------------------------------------------------------------------------------------------------------- > ------------------------------- > Limit (cost=0.00..4.10 rows=1 width=16) (actual time=28.527..28.528 rows=1 loops=1) > Buffers: shared hit=1 read=1577 > -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..29762.61 rows=7255 width=16) (actualtime= > 28.525..28.525 rows=1 loops=1) > Index Cond: (to_tsvector('german'::regconfig, title) @@ '''handy'''::tsquery) > Order By: (int_price <-> 0::bigint) > Buffers: shared hit=1 read=1577 > Total runtime: 28.558 ms > (7 rows) > > > under any circumstances - there is no way to reduce the number of buffers needed for a query like that. > if everything is cached this is still ok but as soon as you have to take a single block from disk you will die a painfulrandom I/O death. > is there any alternative which does not simply die when i try to achieve what i want? > > the use case is quite simple: all products with a certain word (10 cheapest or so). > > is there any alternative approach to this? > i was putting some hope into KNN but it seems it needs too much random I/O :(. > > many thanks, > > hans > > -- > Cybertec Sch?nig & Sch?nig GmbH > Gr?hrm?hlgasse 26 > A-2700 Wiener Neustadt, Austria > Web: http://www.postgresql-support.de > > > Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
pgsql-hackers by date: