Home > mailing lists

LIKE query problem - Mailing list pgsql-performance

From	Marc McIntyre
Subject	LIKE query problem
Date	September 18, 2006 21:07:41
Msg-id	450F303A.6090603@squiz.net Whole thread Raw
Responses	Re: LIKE query problem
List	pgsql-performance

Tree view

I'm having a problem with a simple query, that finds children of a node,
using a materialized path to the node. The query:

select n1.id
from nodes n1, nodes n2
where n1.path like n2.path || '%'
and n2.id = 14;

                                                                QUERY
PLAN

-------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..120256.56 rows=17517 width=4) (actual
time=0.901..953.485 rows=7 loops=1)
   Join Filter: (("inner".path)::text ~~ (("outer".path)::text ||
'%'::text))
   ->  Index Scan using nodes_id on nodes n2  (cost=0.00..35.08 rows=11
width=34) (actual time=0.050..0.059 rows=1 loops=1)
         Index Cond: (id = 14)
   ->  Seq Scan on nodes n1  (cost=0.00..6151.89 rows=318489 width=38)
(actual time=0.010..479.479 rows=318489 loops=1)
 Total runtime: 953.551 ms
(6 rows)

I've tried re-writing the query, which results in a different plan:

select id
from nodes
where path like (
    select path
    from nodes
    where id = 14
    limit 1
) || '%';

                                                                   QUERY
PLAN

------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on nodes  (cost=3.19..7747.52 rows=1592 width=4) (actual
time=0.230..226.311 rows=7 loops=1)
   Filter: ((path)::text ~~ (($0)::text || '%'::text))
   InitPlan
     ->  Limit  (cost=0.00..3.19 rows=1 width=34) (actual
time=0.018..0.019 rows=1 loops=1)
           ->  Index Scan using nodes_id on nodes  (cost=0.00..35.08
rows=11 width=34) (actual time=0.016..0.016 rows=1 loops=1)
                 Index Cond: (id = 14)
 Total runtime: 226.381 ms
(7 rows)

While the plan looks a little better, the estimated rows are woefully
inaccurate for some reason, resulting in a seq scan on nodes.
If I perform the nested select in the second query separately, then use
the result in the outer select, it's extremely fast:

test=# select path from nodes where id = 14;
  path
--------
 /3/13/
(1 row)

Time: 0.555 ms

test=# select id from nodes where path like '/3/13/%';
id
---------
      14
  169012
      15
      16
      17
  169219
  169220
(7 rows)

Time: 1.062 ms

I've vacuum full analyzed. PG version is 8.1.4

The nodes table is as follows:

test=# \d nodes
           Table "public.nodes"
 Column |          Type           | Modifiers
--------+-------------------------+-----------
 id     | integer                 | not null
 path   | character varying(2000) | not null
 depth  | integer                 | not null
Indexes:
    "nodes_pkey" PRIMARY KEY, btree (id, path)
    "nodes_id" btree (id)
    "nodes_path" btree (path)

test# select count(*) from nodes;
 count
--------
 318489

Is there a way to perform this efficiently in one query ?

pgsql-performance by date:

From: Michael Stone
Date: 18 September 2006, 20:58:46
Subject: Re: Large tables (was: RAID 0 not as fast as expected)

From: Francisco Reyes
Date: 18 September 2006, 21:21:29
Subject: Re: Vacuums on large busy databases

LIKE query problem - Mailing list pgsql-performance

Previous

Next