Join problem - Mailing list pgsql-performance
From | Silke Trissl |
---|---|
Subject | Join problem |
Date | |
Msg-id | 408E9934.8060509@informatik.hu-berlin.de Whole thread Raw |
Responses |
Re: Join problem
Re: Join problem |
List | pgsql-performance |
Hi, I came across a very intriguing thing: I had to join two tables and in both tables I wanted to restrict the result set by some (text/varchar) attributes. Here is an example: Table "item" # 147 000 entries Column | Type | Modifiers ---------------+-----------------------+------------ id | integer | not null description | text | comment | text | not null order_id | integer | Table "orders" # 210 000 entries Column | Type | Modifiers -----------------+------------------------+----------- order_id | integer | order_name | character varying(255) | The tables have 147 000 and 210 000 entries, respectively. First I tried the following query, which took ages: (Query 1) EXPLAIN ANALYZE SELECT item.id FROM item, orders WHERE orders.order_name ~* 'Smit' AND item.description ~* 'CD' and orders.order_id = item.order_id; I found out, that the change of the operator from '~*' to '=' for the item.description brought a great boost in performance (425 secs to 1 sec!), but not in cost (Query plans at the end). (Query 2) EXPLAIN ANALYZE SELECT item.id FROM item, orders WHERE orders.order_name ~* 'Smit' AND item.description = 'CD' and orders.order_id = item.order_id; The main difference was that Query 2 used the Hash join instead of the Nested Loop, so I disabled the option 'NESTED LOOP' and got for Query 1 a similar time as for Query 2. Can anyone tell me, why in one case the Hash join and in the other the much worse Nested Loop is prefered? And my second question is, is there any possibility to execute the first query without disabling the Nested Loop first, but get the good performance of the Hash join? Many thanks in advance for your help or suggestions Silke QUERY PLANS: ##################################### Query 1: QUERY PLAN ------------------------------------------------------------------------------------------------------------------ Nested Loop (cost=0.00..28836.75 rows=1 width=4) (actual time=65350.780..452130.702 rows=6 loops=1) Join Filter: ("inner".order_id = "outer".order_id) -> Seq Scan on item (cost=0.00..28814.24 rows=1 width=8) (actual time=33.180..1365.190 rows=716 loops=1) Filter: (description ~* 'CD'::text) -> Seq Scan on orders (cost=0.00..22.50 rows=1 width=4) (actual time=21.644..629.500 rows=18 loops=716) Filter: ((order_name)::text ~* 'Smith'::text) Total runtime: 452130.782 ms ########################################################################### Query 2: QUERY PLAN ------------------------------------------------------------------------------------------------------------------ Hash Join (cost=22.50..28840.44 rows=4 width=4) (actual time=1187.798..1187.798 rows=0 loops=1) Hash Cond: ("outer".order_id = "inner".order_id) -> Seq Scan on item (cost=0.00..28814.24 rows=733 width=8) (actual time=542.737..542.737 rows=0 loops=1) Filter: (description = 'CD'::text) -> Hash (cost=22.50..22.50 rows=1 width=4) (actual time=645.042..645.042 rows=0 loops=1) -> Seq Scan on orders (cost=0.00..22.50 rows=1 width=4) (actual time=22.373..644.996 rows=18 loops=1) Filter: ((order_name)::text ~* 'Smith'::text) Total runtime: 1187.865 ms ############################################################################ Query 1 with 'set enable_nestloop to false' QUERY PLAN ----------------------------------------------------------------------------------------------------------------------- Hash Join (cost=22.50..28836.75 rows=1 width=4) (actual time=1068.593..2003.330 rows=6 loops=1) Hash Cond: ("outer".item_id = "inner".item_id) -> Seq Scan on item (cost=0.00..28814.24 rows=1 width=8) (actual time=33.347..1357.073 rows=716 loops=1) Filter: (description ~* 'CD'::text) -> Hash (cost=22.50..22.50 rows=1 width=4) (actual time=645.287..645.287 rows=0 loops=1) -> Seq Scan on orders (cost=0.00..22.50 rows=1 width=4) (actual time=22.212..645.239 rows=18 loops=1) Filter: ((order_name)::text ~* 'CD'::text) Total runtime: 2003.409 ms
pgsql-performance by date: