Thread: Estimates not taking null_frac element into account with @@ operator? (8.4 .. git-head)
From
Jesper Krogh
Date:
Hi All.

The NULL element always surprises me in unpleasant ways; my brain simply can't really understand the logic, so please let me know if this is one of the cases where I should just spend way more effort on fixing that instead.

I have a table with a "null_frac" of 0.5, and I have tested that a where clause that evaluates to NULL isn't included in the result:

testdb=# select id from testtable where null @@ to_tsquery('testterm80');
 id
----
(0 rows)

Then I'd expect the null_frac to be taken into account when computing the estimates for the query:

testdb=# explain select id from testtable where fts @@ to_tsquery('testterm80');
                          QUERY PLAN
---------------------------------------------------------------
 Seq Scan on testtable  (cost=0.00..1985.03 rows=1966 width=4)
   Filter: (fts @@ to_tsquery('testterm80'::text))
(2 rows)

Whereas it actually does so if I explicitly add the "fts is not null" clause to the query:

testdb=# explain select id from testtable where fts @@ to_tsquery('testterm80') and fts is not null;
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Bitmap Heap Scan on testtable  (cost=130.34..1735.19 rows=983 width=4)
   Recheck Cond: ((fts @@ to_tsquery('testterm80'::text)) AND (fts IS NOT NULL))
   ->  Bitmap Index Scan on testtable_fts_idx  (cost=0.00..130.09 rows=983 width=0)
         Index Cond: ((fts @@ to_tsquery('testterm80'::text)) AND (fts IS NOT NULL))
(4 rows)

When something that evaluates to NULL isn't included in the result, shouldn't the query planner take the null_frac into account when computing the estimate? Trying to do the same thing using integers and the < operator does seem to take the null_frac into account.

The snippet below reproduces the dataset.
create table testtable (id serial primary key, document text, fts tsvector);
create index on testtable using gist(fts);

CREATE OR REPLACE FUNCTION public.filltable(rows integer)
 RETURNS boolean
 LANGUAGE plpgsql
AS $function$
DECLARE
    count integer;
BEGIN
    count := 0;
    LOOP
        EXIT WHEN count = rows;
        count := count + 1;
        insert into testtable(document, fts)
        select document, to_tsvector('english', document)
        from (select string_agg(concat, ' ') as document
              from (select concat('testterm' || generate_series(1, floor(random()*100)::integer))) as foo) as bar;
    END LOOP;
    RETURN TRUE;
END;
$function$;

select filltable(10000);

testdb=# update testtable set fts = null where id % 2 = 0;
UPDATE 5001
testdb=# ANALYZE verbose testtable;
INFO:  analyzing "public.testtable"
INFO:  "testtable": scanned 1835 of 1835 pages, containing 10002 live rows and 5001 dead rows; 10002 rows in sample, 10002 estimated total rows
ANALYZE
testdb=# select null_frac from pg_stats where attname = 'fts';
 null_frac
-----------
       0.5
(1 row)
...
Trying with integers:

testdb=# ALTER TABLE testtable add column testint integer;
ALTER TABLE
testdb=# update testtable set testint = floor(random()*100);
UPDATE 10002
testdb=# ANALYZE verbose testtable;
INFO:  analyzing "public.testtable"
INFO:  "testtable": scanned 2186 of 2186 pages, containing 10002 live rows and 10002 dead rows; 10002 rows in sample, 10002 estimated total rows
ANALYZE
testdb=# update testtable set testint = null where id % 2 = 0;
UPDATE 5001
testdb=# ANALYZE verbose testtable;
INFO:  analyzing "public.testtable"
INFO:  "testtable": scanned 2282 of 2282 pages, containing 10002 live rows and 13335 dead rows; 10002 rows in sample, 10002 estimated total rows
ANALYZE
testdb=# explain select id from testtable where testint = 50;
                         QUERY PLAN
-------------------------------------------------------------
 Seq Scan on testtable  (cost=0.00..2407.03 rows=64 width=4)
   Filter: (testint = 50)
(2 rows)

testdb=# explain select id from testtable where testint = 1;
                         QUERY PLAN
-------------------------------------------------------------
 Seq Scan on testtable  (cost=0.00..2407.03 rows=48 width=4)
   Filter: (testint = 1)
(2 rows)

testdb=# explain select id from testtable where testint < 50;
                          QUERY PLAN
---------------------------------------------------------------
 Seq Scan on testtable  (cost=0.00..2407.03 rows=2470 width=4)
   Filter: (testint < 50)
(2 rows)

(Found on 8.4 and reproduced on git-head.)

The attached patch tries to align the behaviour.

Thanks.

-- 
Jesper
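The gap between the two tsvector plans above can be expressed as back-of-envelope arithmetic (a sketch only, not PostgreSQL's actual selectivity code; the 0.1966 match fraction is inferred from the EXPLAIN output):

```python
# Back-of-envelope model of the row estimate (hypothetical, not the planner's
# actual code). Since the stored match frequency is computed over non-null
# rows only, a correct estimate must scale it by (1 - null_frac).

def estimated_rows(reltuples, match_freq, null_frac):
    """Rows matching 'fts @@ to_tsquery(...)' once NULL rows are excluded."""
    return reltuples * match_freq * (1.0 - null_frac)

# Numbers from the EXPLAIN output above: 10002 rows, null_frac = 0.5,
# and an inferred match fraction of about 0.1966.
uncorrected = 10002 * 0.1966                     # what the bare @@ clause shows
corrected = estimated_rows(10002, 0.1966, 0.5)   # matches the IS NOT NULL plan
print(round(uncorrected), round(corrected))      # 1966 983
```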
Attachment
Re: Estimates not taking null_frac element into account with @@ operator? (8.4 .. git-head)
From
Tom Lane
Date:
Jesper Krogh <jesper@krogh.cc> writes:
> When something evaluates to "null" isn't included in the result,
> shouldn't the query-planner then take the null_frac into account
> when computing the estimate?

The proposed patch seems wrong to me: if we're estimating on the basis of most-common-value fractions, the null_frac is already accounted for, because it's not part of the MCV selectivity fractions.  IOW, aren't you double-counting the null fraction?

			regards, tom lane
Re: Estimates not taking null_frac element into account with @@ operator? (8.4 .. git-head)
From
Jesper Krogh
Date:
On 2011-02-17 23:20, Tom Lane wrote:
> Jesper Krogh <jesper@krogh.cc> writes:
>> When something evaluates to "null" isn't included in the result,
>> shouldn't the query-planner then take the null_frac into account
>> when computing the estimate?
> The proposed patch seems wrong to me: if we're estimating on the basis
> of most-common-value fractions, the null_frac is already accounted for,
> because it's not part of the MCV selectivity fractions. IOW, aren't you
> double-counting the null fraction?

It might be the wrong place to fix, but here it seems like we're only counting MCE freqs based on non-null elements:

http://git.postgresql.org/gitweb?p=postgresql.git;a=blob;f=src/backend/tsearch/ts_typanalyze.c;h=2654d644579fd1959282d83919474f42540ca703;hb=HEAD#l396

And the test data confirms the behaviour.

-- 
Jesper
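To illustrate the mechanism being pointed at (a hypothetical sketch mirroring the behaviour described, not the actual ts_typanalyze.c code): if the stored lexeme frequencies are relative to non-null documents only, a consumer that multiplies them by the total row count overestimates whenever null_frac > 0:

```python
# Sketch of the mismatch (hypothetical numbers). Frequencies are divided by
# the count of non-null documents, as the linked ts_typanalyze code does.

def mce_freq(lexeme_count, nonnull_docs):
    # Frequency among non-null rows only.
    return lexeme_count / nonnull_docs

sample_rows = 10000
null_frac = 0.5
nonnull = int(sample_rows * (1 - null_frac))  # 5000 docs have a tsvector

freq = mce_freq(1000, nonnull)  # a lexeme appearing in 1000 of the 5000 docs

naive_estimate = sample_rows * freq                    # ignores NULLs
fixed_estimate = sample_rows * freq * (1 - null_frac)  # the true count
print(naive_estimate, fixed_estimate)                  # 2000.0 1000.0
```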
Re: Estimates not taking null_frac element into account with @@ operator? (8.4 .. git-head)
From
Tom Lane
Date:
Jesper Krogh <jesper@krogh.cc> writes:
> On 2011-02-17 23:20, Tom Lane wrote:
>> The proposed patch seems wrong to me: if we're estimating on the basis
>> of most-common-value fractions, the null_frac is already accounted for,
>> because it's not part of the MCV selectivity fractions. IOW, aren't you
>> double-counting the null fraction?

> It might be the wrong place to fix, but here it seems like we're only
> counting MCE freqs based on non-null elements:
> http://git.postgresql.org/gitweb?p=postgresql.git;a=blob;f=src/backend/tsearch/ts_typanalyze.c;h=2654d644579fd1959282d83919474f42540ca703;hb=HEAD#l396

Hmm, you're right, and the specification in pg_statistic.h neglects to say that.  This does need work.

			regards, tom lane
Re: Estimates not taking null_frac element into account with @@ operator? (8.4 .. git-head)
From
Tom Lane
Date:
Jesper Krogh <jesper@krogh.cc> writes:
> Attached patch tries to align the behaviour

Applied with a bit of editorialization.

			regards, tom lane