Re: Indexing on JSONB field not working - Mailing list pgsql-bugs
From: Zhihong Zhang
Subject: Re: Indexing on JSONB field not working
Date:
Msg-id: 0AA828F3-5BA5-48CD-B229-8351ACB308D3@gmail.com
In response to: Re: Indexing on JSONB field not working (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: RE: Indexing on JSONB field not working
List: pgsql-bugs
Just came back from the long break and I couldn't quite follow the threads. Is this a bug or something I am doing wrong? If it's a bug, can I have a tracking or ticket number?

Thanks!

Zhihong

> On Dec 21, 2019, at 10:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> On Fri, Dec 20, 2019 at 08:15:39PM -0500, Zhihong Zhang wrote:
>>> "Limit (cost=0.43..313.25 rows=100 width=53) (actual time=0.092..0.236 rows=7 loops=1)"
>>> "  -> Index Scan using assets_floatvalue_idx on assets (cost=0.43..2714072.57 rows=867607 width=53) (actual time=0.089..0.230 rows=7 loops=1)"
>>> "        Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)"
>
>> Well, this confirms what I suspected before - the optimizer believes the
>> seqscan plan is a bit cheaper compared to index scan (107 vs. 313) but
>> that mismatches the actual performance. The question is why ...
>
> The planner evidently believes that 867607 rows will match the query
> condition, so it expects that the scan will stop (after collecting
> 100 rows) very quickly. In reality only 7 rows match, so the scan
> has to run to completion. This is what's bollixing the plan choice.
>
> I suspect that 867607 is just a default estimate, but if ANALYZE has
> been run then there should be stats for the index column, so why isn't
> it doing better?
> When I try a similar case here, I get good estimates:
>
> regression=# select version();
>                                                  version
> ---------------------------------------------------------------------------------------------------------
>  PostgreSQL 11.6 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23), 64-bit
> (1 row)
>
> regression=# create table public.assets(_doc jsonb);
> CREATE TABLE
> regression=# CREATE INDEX on assets (((_doc #> '{floatValue}'::text[])::double precision) ASC NULLS LAST);
> CREATE INDEX
> regression=# insert into assets select ('{"floatValue": ' || x || '}')::jsonb from generate_series(1,10000) x;
> INSERT 0 10000
> regression=# explain select _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision <3.0 limit 100;
>                                            QUERY PLAN
> ------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..7.71 rows=100 width=32)
>    ->  Seq Scan on assets  (cost=0.00..293.44 rows=3808 width=32)
>          Filter: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)
> (3 rows)
>
> regression=# analyze assets;
> ANALYZE
> regression=# explain select _doc->>'floatValue' from assets where (_doc #> '{floatValue}'::text[])::double precision <3.0 limit 100;
>                                              QUERY PLAN
> ----------------------------------------------------------------------------------------------------
>  Limit  (cost=0.29..8.33 rows=2 width=32)
>    ->  Index Scan using assets_float8_idx on assets  (cost=0.29..8.33 rows=2 width=32)
>          Index Cond: (((_doc #> '{floatValue}'::text[]))::double precision < '3'::double precision)
> (3 rows)
>
> The "3808" estimate is just a default for '<' with no stats, but
> with stats I get a dead-on estimate.
>
> regards, tom lane
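A quick way to follow up on Tom's question of whether ANALYZE actually produced statistics for the expression index: PostgreSQL stores stats for an expression index under the index's own name in pg_stats, and they only exist after the table has been analyzed with the index in place. A diagnostic sketch (the index name `assets_floatvalue_idx` is taken from the plan quoted above; on another setup, substitute your actual index name):

```sql
-- Re-collect statistics for the table (also covers its expression indexes).
ANALYZE assets;

-- Statistics for an expression index appear in pg_stats with the
-- index's name in the tablename column, one row per indexed expression.
-- No rows here would explain the default 867607-row estimate.
SELECT tablename, attname, n_distinct, histogram_bounds
FROM pg_stats
WHERE tablename = 'assets_floatvalue_idx';
```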