Re: Very specialised query - Mailing list pgsql-performance
| From | Matthew Wakeling |
|---|---|
| Subject | Re: Very specialised query |
| Date | |
| Msg-id | alpine.DEB.2.00.0903271708080.21772@aragorn.flymine.org |
| In response to | Re: Very specialised query (Віталій Тимчишин <tivv00@gmail.com>) |
| Responses | Re: Very specialised query |
| List | pgsql-performance |
On Fri, 27 Mar 2009, Віталій Тимчишин wrote:

> ...an index on (objectid, start) would help...

Definitely.

> You could try adding "AND l2.start > l1.start" to the first query.
> This will drop symmetric half of intersections (the ones that will
> remain are l2 inside or to the left of l1), but you can redo results by
> id1,id2 union all id2, id1 and may allow to use start index for
> "between"

That's remarkably clever, and it should have been obvious. Thanks. Adding
the constraint as you suggest does indeed make the query fast.

However, there seems to be a problem with the planner: it does not
distinguish between the costs of two alternative plans whose real costs
are vastly different. Consider the following:

SELECT l1.id AS id1, l2.id AS id2
FROM location l1, location l2
WHERE l1.objectid = 228000093
  AND l2.objectid = 228000093
  AND l1.id <> l2.id
  AND l1.start < l2.end
  AND l1.end > l2.start
  AND l1.start < l2.start
UNION ALL
SELECT l1.id AS id1, l2.id AS id2
FROM location l1, location l2
WHERE l1.objectid = 228000093
  AND l2.objectid = 228000093
  AND l1.id <> l2.id
  AND l1.start < l2.end
  AND l1.end > l2.start
  AND l1.start >= l2.start;

                                                                    QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
 Append  (cost=0.00..20479179.74 rows=138732882 width=8)
   ->  Nested Loop  (cost=0.00..9545925.46 rows=69366441 width=8)
         Join Filter: ((l1.id <> l2.id) AND (l1.start < l2.end))
         ->  Index Scan using location_test_obj_end on location l1  (cost=0.00..55966.07 rows=43277 width=12)
               Index Cond: (objectid = 228000093)
         ->  Index Scan using location_test_obj_start on location l2  (cost=0.00..123.10 rows=4809 width=12)
               Index Cond: ((l2.objectid = 228000093) AND (l1.end > l2.start) AND (l1.start < l2.start))
   ->  Nested Loop  (cost=0.00..9545925.46 rows=69366441 width=8)
         Join Filter: ((l1.id <> l2.id) AND (l1.start < l2.end))
         ->  Index Scan using location_test_obj_end on location l1  (cost=0.00..55966.07 rows=43277 width=12)
               Index Cond: (objectid = 228000093)
         ->  Index Scan using location_test_obj_start on location l2  (cost=0.00..123.10 rows=4809 width=12)
               Index Cond: ((l2.objectid = 228000093) AND (l1.end > l2.start) AND (l1.start >= l2.start))
(13 rows)

Notice the two different index conditions:

(l1.end > l2.start) AND (l1.start < l2.start)    - "between"
(l1.end > l2.start) AND (l1.start >= l2.start)   - open-ended

Both have a cost of (cost=0.00..123.10 rows=4809 width=12).

Postgres estimates these two index scans to be equivalent in cost, whereas
they are actually vastly different in real cost. Shouldn't Postgres favour
a "between" index scan over an open-ended one?

Matthew

--
[About NP-completeness] These are the problems that make efficient use of
the Fairy Godmother.                       -- Computer Science Lecturer
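The index definitions behind that plan are not shown in the thread; a minimal sketch consistent with the index names and index conditions above (the real DDL may well differ) would be:

-- Hypothetical definitions, inferred from the plan rather than taken from
-- the thread. The composite (objectid, start) index is what allows the
-- "between" branch to turn (l1.end > l2.start AND l1.start < l2.start)
-- into a bounded range scan on l2.start, whereas the open-ended branch
-- can only bound one side of that range.
CREATE INDEX location_test_obj_start ON location (objectid, start);
CREATE INDEX location_test_obj_end   ON location (objectid, "end");

With both halves of the UNION ALL in place, every intersecting pair still comes out as both (id1, id2) and (id2, id1), just as in the original symmetric query.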