Re: To use a VIEW or not to use a View..... - Mailing list pgsql-sql
From | Tom Lane |
---|---|
Subject | Re: To use a VIEW or not to use a View..... |
Date | |
Msg-id | 26724.1043342499@sss.pgh.pa.us Whole thread Raw |
In response to | Re: To use a VIEW or not to use a View..... (Stephan Szabo <sszabo@megazone23.bigpanda.com>) |
Responses |
Re: To use a VIEW or not to use a View.....
|
List | pgsql-sql |
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > On Wed, 22 Jan 2003, Tom Lane wrote: >> It could still use more eyeballs looking at it. One thing I'm concerned >> about is whether the extra (derived) conditions lead to double-counting >> restrictivity and thus underestimating the number of result rows. I >> haven't had time to really test that, but I suspect there may be a problem. > I haven't looked at code yet but tried examples like Tomasz's and some > simple ones and have gotten reasonable seeming output for the estimates > given accurate statistics I realized this morning that there definitely is a problem. Consider this example using the regression database: regression=# explain analyze select * from tenk1 a join tenk1 b using(ten) regression-# where ten = 3; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------Merge Join (cost=1055.45..2102.12 rows=83006 width=488) (actual time=582.97..65486.57 rows=1000000 loops=1) Merge Cond: ("outer".ten= "inner".ten) -> Sort (cost=527.73..530.00 rows=910 width=244) (actual time=373.57..382.48 rows=1000 loops=1) Sort Key: a.ten -> Seq Scan on tenk1 a (cost=0.00..483.00 rows=910 width=244) (actual time=8.98..330.39rows=1000 loops=1) Filter: (ten = 3) -> Sort (cost=527.73..530.00 rows=910 width=244) (actualtime=209.19..8057.64 rows=999001 loops=1) Sort Key: b.ten -> Seq Scan on tenk1 b (cost=0.00..483.00rows=910 width=244) (actual time=0.40..193.93 rows=1000 loops=1) Filter: (3 = ten)Total runtime:73291.01 msec (11 rows) The condition "ten=3" will select 1000 rows out of the 10000 in the table. But, once we have applied that condition to both sides of the join, the join condition "a.ten = b.ten" is a no-op --- it will not reject any pair of rows coming out of the seqscans. Presently we count its restrictivity anyway, so the estimated row count at the merge is a bad underestimate. Not only should we ignore the join condition for selectivity purposes, but it's a waste of time for execution as well. We could have implemented the above query as a nestloop with no join condition, and saved the effort of the sort and merge logic. What I was thinking was that any time the code sees a "var = const" clause as part of a mergejoin equivalence set, we could mark all the "var = var" clauses in the same set as no-ops. For example, given WHERE a.f1 = b.f2 AND b.f2 = c.f3 AND c.f3 = 42 then after we finish deducing a.f1 = 42 and b.f2 = 42, there is no longer any value in either of the original clauses a.f1 = b.f2 and b.f2 = c.f3, nor in the other deduced clause a.f1 = c.f3. This would take a little bit of restructuring of generate_implied_equalities() and process_implied_equality(), but it doesn't seem too difficult to do. Thoughts? Are there any holes in that logic? regards, tom lane