Re: 8.4 optimization regression? - Mailing list pgsql-performance
From | Mark Kirkwood |
---|---|
Subject | Re: 8.4 optimization regression? |
Date | |
Msg-id | 4E601B2F.30107@catalyst.net.nz Whole thread Raw |
In response to | Re: 8.4 optimization regression? (Mark Kirkwood <mark.kirkwood@catalyst.net.nz>) |
Responses |
Re: 8.4 optimization regression?
|
List | pgsql-performance |
On 02/09/11 11:18, Mark Kirkwood wrote: > On 02/09/11 11:13, Tom Lane wrote: >> I wrote: >>> Mark Kirkwood<mark.kirkwood@catalyst.net.nz> writes: >>>> [ assorted examples showing that commit >>>> 7f3eba30c9d622d1981b1368f2d79ba0999cdff2 has got problems ] >>> ... >>> So, not only are you correct that we should revert the changes to >>> eqjoinsel_inner, but what's happening in eqjoinsel_semi is wrong too. >> I've retested these examples with the patches I committed yesterday. >> Six of the eight examples are estimated pretty nearly dead on, while the >> other two are estimated about 50% too high (still a lot better than >> before). AFAICT there's no easy way to improve those estimates further; >> eqjoinsel_semi just plain hasn't got enough information to know how many >> matches there will be. >> >> > > Just noticed your two commits this morning and ran them through the > examples too - results look really good! Not only are the plain join > queries looking way better but that last semi join that was way off is > now being estimated pretty close. Should be interesting to see how > much this improves more complex queries! > > While this is still fresh in your mind, a couple of additional anti join queries are still managing to sneak past estimation: EXPLAIN ANALYZE SELECT 1 FROM nodekeyword nk WHERE nk.keywordid < 100000 AND NOT EXISTS (SELECT 1 FROM node n WHERE n.nodeid = nk.nodeid AND n.updated > '2011-01-01'::timestamp ); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------ Hash Anti Join (cost=501666.88..851597.05 rows=1 width=0) (actual time=29956.971..50933.702 rows=5914 loops=1) Hash Cond: (nk.nodeid = n.nodeid) -> Seq Scan on nodekeyword nk (cost=0.00..297414.03 rows=1991560 width=4) (actual time=13.352..13765.749 rows=1999780 loops=1) Filter: (keywordid < 100000) -> Hash (cost=419643.00..419643.00 rows=4999510 width=4) (actual time=29345.238..29345.238 rows=4985269 loops=1) Buckets: 4096 Batches: 256 Memory Usage: 699kB -> Seq Scan on node n (cost=0.00..419643.00 rows=4999510 width=4) (actual time=0.010..22731.316 rows=4985269 loops=1) Filter: (updated > '2011-01-01 00:00:00'::timestamp without time zone) EXPLAIN ANALYZE SELECT 1 FROM nodekeyword nk WHERE nk.keywordid < 10000 AND NOT EXISTS (SELECT 1 FROM node n WHERE n.nodeid = nk.nodeid AND n.updated > '2011-01-01'::timestamp ); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------- Hash Anti Join (cost=501666.88..821806.96 rows=1 width=0) (actual time=46497.231..49196.057 rows=566 loops=1) Hash Cond: (nk.nodeid = n.nodeid) -> Seq Scan on nodekeyword nk (cost=0.00..297414.03 rows=192921 width=4) (actual time=19.916..16250.224 rows=199616 loops=1) Filter: (keywordid < 10000) -> Hash (cost=419643.00..419643.00 rows=4999510 width=4) (actual time=29901.178..29901.178 rows=4985269 loops=1) Buckets: 4096 Batches: 256 Memory Usage: 699kB -> Seq Scan on node n (cost=0.00..419643.00 rows=4999510 width=4) (actual time=0.008..23207.964 rows=4985269 loops=1) Filter: (updated > '2011-01-01 00:00:00'::timestamp without time zone)
pgsql-performance by date: