Estimates not taking null_frac element into account with @@ operator? (8.4 .. git-head) - Mailing list pgsql-hackers
From | Jesper Krogh
Subject | Estimates not taking null_frac element into account with @@ operator? (8.4 .. git-head)
Date |
Msg-id | 4D5D9224.7080105@krogh.cc
Responses | Re: Estimates not taking null_frac element into account with @@ operator? (8.4 .. git-head)
| Re: Estimates not taking null_frac element into account with @@ operator? (8.4 .. git-head)
List | pgsql-hackers
Hi all.

NULL handling always surprises me in unpleasant ways; my brain simply can't quite follow the logic, so please let me know if this is one of the cases where I should just spend more effort on fixing my own understanding instead.

I have a table whose column has a "null_frac" of 0.5, and I have verified that a WHERE clause that evaluates to NULL excludes the row from the result:

testdb=# select id from testtable where null @@ to_tsquery('testterm80');
 id
----
(0 rows)

So I'd expect null_frac to be taken into account when computing the estimate for this query:

testdb=# explain select id from testtable where fts @@ to_tsquery('testterm80');
                           QUERY PLAN
---------------------------------------------------------------
 Seq Scan on testtable  (cost=0.00..1985.03 rows=1966 width=4)
   Filter: (fts @@ to_tsquery('testterm80'::text))
(2 rows)

Whereas it actually is taken into account if I explicitly add an "fts is not null" clause to the query:

testdb=# explain select id from testtable where fts @@ to_tsquery('testterm80') and fts is not null;
                                      QUERY PLAN
-------------------------------------------------------------------------------------
 Bitmap Heap Scan on testtable  (cost=130.34..1735.19 rows=983 width=4)
   Recheck Cond: ((fts @@ to_tsquery('testterm80'::text)) AND (fts IS NOT NULL))
   ->  Bitmap Index Scan on testtable_fts_idx  (cost=0.00..130.09 rows=983 width=0)
         Index Cond: ((fts @@ to_tsquery('testterm80'::text)) AND (fts IS NOT NULL))
(4 rows)

When rows for which the clause evaluates to NULL aren't included in the result, shouldn't the query planner take null_frac into account when computing the estimate? Doing the same thing with integers and the < operator does seem to take null_frac into account (see further down).

The snippet below reproduces the dataset:

create table testtable (id serial primary key, document text, fts tsvector);
create index on testtable using gist(fts);

CREATE OR REPLACE FUNCTION public.filltable(rows integer)
 RETURNS boolean
 LANGUAGE plpgsql
AS $function$
DECLARE
    count integer;
BEGIN
    count := 0;
    LOOP
        EXIT WHEN count = rows;
        count := count + 1;
        insert into testtable(document,fts)
            select document, to_tsvector('english',document)
            from (select string_agg(concat,' ') as document
                  from (select concat('testterm' || generate_series(1,floor(random()*100)::integer))) as foo) as bar;
    END LOOP;
    RETURN TRUE;
END;
$function$;

select filltable(10000);

testdb=# update testtable set fts = null where id % 2 = 0;
UPDATE 5001
testdb=# ANALYZE verbose testtable;
INFO:  analyzing "public.testtable"
INFO:  "testtable": scanned 1835 of 1835 pages, containing 10002 live rows and 5001 dead rows; 10002 rows in sample, 10002 estimated total rows
ANALYZE
testdb=# select null_frac from pg_stats where attname = 'fts';
 null_frac
-----------
       0.5
(1 row)

...
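Just to spell out the arithmetic I would expect (this is only my own back-of-the-envelope cross-check against pg_stats, not anything the planner actually runs; the 1966 constant is the raw @@ estimate from the plan above):

-- Rough cross-check: with null_frac = 0.5 this yields an expected_rows
-- of about 983, matching the rows=983 estimate in the plan that carries
-- the explicit "fts IS NOT NULL" clause.
select null_frac,
       round(1966 * (1 - null_frac)) as expected_rows
  from pg_stats
 where tablename = 'testtable'
   and attname = 'fts';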
Trying with integers:

testdb=# ALTER TABLE testtable add column testint integer;
ALTER TABLE
testdb=# update testtable set testint = floor(random()*100);
UPDATE 10002
testdb=# ANALYZE verbose testtable;
INFO:  analyzing "public.testtable"
INFO:  "testtable": scanned 2186 of 2186 pages, containing 10002 live rows and 10002 dead rows; 10002 rows in sample, 10002 estimated total rows
ANALYZE
testdb=# update testtable set testint = null where id % 2 = 0;
UPDATE 5001
testdb=# ANALYZE verbose testtable;
INFO:  analyzing "public.testtable"
INFO:  "testtable": scanned 2282 of 2282 pages, containing 10002 live rows and 13335 dead rows; 10002 rows in sample, 10002 estimated total rows
ANALYZE
testdb=# explain select id from testtable where testint = 50;
                         QUERY PLAN
-------------------------------------------------------------
 Seq Scan on testtable  (cost=0.00..2407.03 rows=64 width=4)
   Filter: (testint = 50)
(2 rows)

testdb=# explain select id from testtable where testint = 1;
                         QUERY PLAN
-------------------------------------------------------------
 Seq Scan on testtable  (cost=0.00..2407.03 rows=48 width=4)
   Filter: (testint = 1)
(2 rows)

testdb=# explain select id from testtable where testint < 50;
                          QUERY PLAN
---------------------------------------------------------------
 Seq Scan on testtable  (cost=0.00..2407.03 rows=2470 width=4)
   Filter: (testint < 50)
(2 rows)

(Found on 8.4 and reproduced on git-head.)

The attached patch tries to align the behaviour.

Thanks.

--
Jesper