workaround for expensive KNN? - Mailing list pgsql-hackers
From | PostgreSQL - Hans-Jürgen Schönig |
---|---|
Subject | workaround for expensive KNN? |
Date | |
Msg-id | 70FD80C7-5D7A-4EA3-9712-72DB43BE0912@cybertec.at Whole thread Raw |
Responses |
Re: workaround for expensive KNN?
|
List | pgsql-hackers |
hello all ... given oleg's posting before i also wanted to fire up some KNN related question. let us consider a simple example. i got some million lines and i want all rows matching a tsquery sorted by price. i did some tests: test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german', title)@@ to_tsquery('german', 'iphone') ORDER BY int_price <-> 0 LIMIT 10; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------- --------------------------------------Limit (cost=0.00..41.11 rows=10 width=16) (actual time=36391.717..45542.590 rows=10loops=1) Buffers: shared hit=9 read=5004 -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..13251.91rows=3224 width=16) (actual time= 36391.715..45542.573 rows=10 loops=1) Index Cond: (to_tsvector('german'::regconfig, title) @@ '''iphon'''::tsquery) Order By: (int_price <-> 0::bigint) Buffers: shared hit=9 read=5004Total runtime: 45542.676ms (7 rows) test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german', title)@@ to_tsquery('german', 'handy') ORDER BY int_price <-> 0 LIMIT 10; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------- -------------------------------------Limit (cost=0.00..41.03 rows=10 width=16) (actual time=7243.526..10935.227 rows=10loops=1) Buffers: shared hit=3 read=2316 -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..29762.61rows=7255 width=16) (actual time= 7243.524..10935.217 rows=10 loops=1) Index Cond: (to_tsvector('german'::regconfig, title) @@ '''handy'''::tsquery) Order By: (int_price <-> 0::bigint) Buffers: shared hit=3 read=2316Total runtime: 10935.265ms (7 rows) test=# explain (analyze true, buffers true, costs true) SELECT id FROM product.t_product WHERE to_tsvector('german', title)@@ to_tsquery('german', 'handy') ORDER BY int_price <-> 0 LIMIT 1; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------- -------------------------------Limit (cost=0.00..4.10 rows=1 width=16) (actual time=28.527..28.528 rows=1 loops=1) Buffers:shared hit=1 read=1577 -> Index Scan using idx_product_t_product_titleprice on t_product (cost=0.00..29762.61rows=7255 width=16) (actual time= 28.525..28.525 rows=1 loops=1) Index Cond: (to_tsvector('german'::regconfig, title) @@ '''handy'''::tsquery) Order By: (int_price <-> 0::bigint) Buffers: shared hit=1 read=1577Total runtime: 28.558 ms (7 rows) under any circumstances - there is no way to reduce the number of buffers needed for a query like that. if everything is cached this is still ok but as soon as you have to take a single block from disk you will die a painfulrandom I/O death. is there any alternative which does not simply die when i try to achieve what i want? the use case is quite simple: all products with a certain word (10 cheapest or so). is there any alternative approach to this? i was putting some hope into KNN but it seems it needs too much random I/O :(. many thanks, hans -- Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de
pgsql-hackers by date: