LIKE query problem - Mailing list pgsql-performance
From | Marc McIntyre |
---|---|
Subject | LIKE query problem |
Date | |
Msg-id | 450F303A.6090603@squiz.net Whole thread Raw |
Responses |
Re: LIKE query problem
|
List | pgsql-performance |
I'm having a problem with a simple query, that finds children of a node, using a materialized path to the node. The query: select n1.id from nodes n1, nodes n2 where n1.path like n2.path || '%' and n2.id = 14; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.00..120256.56 rows=17517 width=4) (actual time=0.901..953.485 rows=7 loops=1) Join Filter: (("inner".path)::text ~~ (("outer".path)::text || '%'::text)) -> Index Scan using nodes_id on nodes n2 (cost=0.00..35.08 rows=11 width=34) (actual time=0.050..0.059 rows=1 loops=1) Index Cond: (id = 14) -> Seq Scan on nodes n1 (cost=0.00..6151.89 rows=318489 width=38) (actual time=0.010..479.479 rows=318489 loops=1) Total runtime: 953.551 ms (6 rows) I've tried re-writing the query, which results in a different plan: select id from nodes where path like ( select path from nodes where id = 14 limit 1 ) || '%'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------ Seq Scan on nodes (cost=3.19..7747.52 rows=1592 width=4) (actual time=0.230..226.311 rows=7 loops=1) Filter: ((path)::text ~~ (($0)::text || '%'::text)) InitPlan -> Limit (cost=0.00..3.19 rows=1 width=34) (actual time=0.018..0.019 rows=1 loops=1) -> Index Scan using nodes_id on nodes (cost=0.00..35.08 rows=11 width=34) (actual time=0.016..0.016 rows=1 loops=1) Index Cond: (id = 14) Total runtime: 226.381 ms (7 rows) While the plan looks a little better, the estimated rows are woefully inaccurate for some reason, resulting in a seq scan on nodes. If I perform the nested select in the second query separately, then use the result in the outer select, it's extremely fast: test=# select path from nodes where id = 14; path -------- /3/13/ (1 row) Time: 0.555 ms test=# select id from nodes where path like '/3/13/%'; id --------- 14 169012 15 16 17 169219 169220 (7 rows) Time: 1.062 ms I've vacuum full analyzed. PG version is 8.1.4 The nodes table is as follows: test=# \d nodes Table "public.nodes" Column | Type | Modifiers --------+-------------------------+----------- id | integer | not null path | character varying(2000) | not null depth | integer | not null Indexes: "nodes_pkey" PRIMARY KEY, btree (id, path) "nodes_id" btree (id) "nodes_path" btree (path) test# select count(*) from nodes; count -------- 318489 Is there a way to perform this efficiently in one query ?
pgsql-performance by date: