Re: estimates for nested loop very wrong? - Mailing list pgsql-sql
From | joostje@komputilo.org |
---|---|
Subject | Re: estimates for nested loop very wrong? |
Date | |
Msg-id | 20030410194553.GB28537@co.uea.org Whole thread Raw |
In response to | Re: estimates for nested loop very wrong? (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: estimates for nested loop very wrong?
Re: estimates for nested loop very wrong? Re: estimates for nested loop very wrong? |
List | pgsql-sql |
Je 2003/04/10(4)/12:04, Tom Lane skribis: > joostje@komputilo.org writes: > > Unless I'm mistaken, pg_nstats.n_distinct should be (aproximately) the same as > > count(distinct(id)), but it obviously isn't. Also the most_common_freqs > > values are about a 100 times higher than in reality, and, even tough about > > 900 values of id occur more often than 40 times, in the 'most_common_vals' > > list are 7 (of the 10) vals that occur less than 40 times, and the real > > top two isn't even represented. > > Please try increasing the statistics target (see ALTER TABLE) for db.id, then > re-analyze and see if the estimates get better. The default setting is > 10 --- try 20, 50, 100 to see what happens. Well, the n_distinct estimates get better, but the cost estimates still don't quite add up: `actual cost' is 23.24, cost estimate never gets below 49930. stat.targ n_distinct| correlation cost estimate 5 1917 | 0.43189 3621794.92 10 1998 | 0.3909 3618363.33 20 4330 | -0.247617 1981594.38 50 9708 | 0.0762642 975847.15100 14604 | 0.030706 657631.41200 21855 | 0.0446929 204335.70500 39980 | -0.0497829 121000.31 1000 29468 | 0.0366528 49930.08 1000 29453 | 0.0367673 49954.08 Table 1: various estimates as a function of statistical target actualdistinct values: 42226 actual cost: varies from 5.0 to 27.8 So, the planner still prefers the mergejoin and hashjoin plans, causing the select to take tens of seconds (60 for the mergejoin, I beleve), wheras the Nested Loop takes only 0.024 seconds: For example, for the stat.targ=500 run: => explain analyse SELECT id from db, tmp0 WHERE valida AND poseda='uea' AND tab='pers' AND tmp0.v0=id ; NOTICE: QUERY PLAN: Nested Loop (cost=0.00..121000.31 rows=28184 width=39) (actual time=1.05..23.24 rows=415 loops=1) -> Seq Scan on tmp0 (cost=0.00..20.00 rows=1000 width=32) (actual time=0.22..0.40 rows=29 loops=1) -> Index Scan using db_id_idx on db (cost=0.00..120.63rows=28 width=7) (actual time=0.27..0.75 rows=14 loops=29) Total runtime: 23.92 msec In the above example, tmp0 had 29 values, that correspond to 415 rows in table db. Table db has 586157 rows. The shown select statement is the one used to get all cost estimates in table 1. postgresql: 7.2.1 (debian release 3)