Home > mailing lists

Re: Unable to make use of "deep" JSONB index - Mailing list pgsql-bugs

From	Shaheed Haque
Subject	Re: Unable to make use of "deep" JSONB index
Date	June 12, 2022 19:31:19
Msg-id	CAHAc2jfQvVMUKQ4rZ16G3Nsn+Rn-5_NFabzXnJ+q41OXmH+pOg@mail.gmail.com Whole thread
In response to	Re: Unable to make use of "deep" JSONB index (Erik Rijkers <er@xs4all.nl>)
Responses	Re: Unable to make use of "deep" JSONB index Re: Unable to make use of "deep" JSONB index
List	pgsql-bugs

Tree view

(Resend, wrong version was sent before)

Thanks Erik. Is the point that the index has to be on the JSON field
as a whole (i.e. "snapshot") rather than deep inside it (e.g.
"snapshot.something.further[down]")?

In my case, the snapshot is several MB in size (perhaps 10MB or even
20MB), dominated by the snapshot.employee (cardinality 10k, each sized
as a dict 1-2kB as text). My expectation/guess is that an index of
"snapshot" will itself be of a size of similar order. However the
design as-is works very well except for this one case where to speed
it up, in principle, the index need contain no more than one boolean
per employee. So that's what I'd like to achieve, if possible.

I've seen no hint in the documentation that creating the index on
"snapshot.something.further[down]" should not work, and PG certainly
allows it to be created. Also, Tom has suggested that I should not
look to some magical ability to infer the use of the index from a
differently structured query, and I've taken that on board with the
updated query + index.

AFAIK, there are 3 possibilities:

- I've not done things right, in which case I'd love to know my mistake.
- It is not supposed to work, in which case it would be good to have
that stated, and maybe have PG not allow useless indices to be
created.
- It is a bug.

All input much appreciated,

Thanks, Shaheed


On Sun, 12 Jun 2022 at 11:53, Erik Rijkers <er@xs4all.nl> wrote:
>
> Op 12-06-2022 om 11:34 schreef Shaheed Haque:
> > OK, I have corrected and simplified the test case (including switching
> > to a btree index). The WHERE clause and the inex now look like this:
> >
> >      ...WHERE         ((snapshot -> 'employee' -> '999' ->>
> > 'pay_graph')::integer != 0);
> >      ...USING btree (((snapshot -> 'employee' -> '$.*'  ->>
> > 'pay_graph')::integer != 0));
> >
> > But the index is still not being used (test case below). I have
> > confirmed that the equality operator is listed for "search" (I assume
> > inequality is the same as equality, but I tried both):
> >
> > # \dAo btree jsonb*
> >              List of operators of operator families
> >   AM   | Operator family |    Operator     | Strategy | Purpose
> > -------+-----------------+-----------------+----------+---------
> > btree | jsonb_ops       | <(jsonb,jsonb)  |        1 | search
> > btree | jsonb_ops       | <=(jsonb,jsonb) |        2 | search
> > btree | jsonb_ops       | =(jsonb,jsonb)  |        3 | search
> > btree | jsonb_ops       | >=(jsonb,jsonb) |        4 | search
> > btree | jsonb_ops       | >(jsonb,jsonb)  |        5 | search
> > (5 rows)
> >
> > If this is not a bug, then how should the query or the index be
> > changed to make this work?
> >
> >
> > === begin test case ===
> >
> > CREATE TABLE payrun (
> >      id serial primary key,
> >      snapshot JSONB
> > );
> >
> > INSERT INTO payrun(snapshot)
> > VALUES
> >    ('{"employee": {"999": {"id": 999, "state": {"employment":
> > [["1920-01-01", null, 3]]}, "pay_graph": 0, "last_run_of_employment":
> > false}, "111": {"id": 111, "state": {"employment": [["1920-01-01",
> > null, 5]]}, "pay_graph": 3, "last_run_of_employment": true}}}'),
> >    ('{"employee": {"999": {"id": 999, "state": {"employment":
> > [["1970-01-01", null, 3]]}, "pay_graph": 6, "last_run_of_employment":
> > true}, "222": {"id": 222, "state": {"employment": [["1920-01-01",
> > null, 5]]}, "pay_graph": 5, "last_run_of_employment": true}}}'),
> >    ('{"employee": {"998": {"id": 998, "state": {"employment":
> > [["1980-01-01", null, 3]]}, "pay_graph": 7, "last_run_of_employment":
> > false}, "333": {"id": 333, "state": {"employment": [["1920-01-01",
> > null, 5]]}, "pay_graph": 3, "last_run_of_employment": true}}}')
> > ;
> >
> > SELECT id,snapshot #>'{employee,999,state,employment}' FROM "payrun"
> > WHERE ((snapshot -> 'employee' -> '999' ->> 'pay_graph')::integer !=
> > 0);
> >
> > --
> > -- Create index designed to match the query.
> > --
> > CREATE INDEX idx1 ON payrun USING btree (((snapshot -> 'employee' ->
> > '$.*' ->> 'pay_graph')::integer != 0));
> >
> > set enable_seqscan = OFF;
> >
>
> How is this?
>
> I took the triple condition from your earlier email.
> I did not use your index.
> I added one index using gin jsonb_path_ops.
>
> create index payrun_jspathop_idx ON payrun using gin (snapshot
> jsonb_path_ops);
> set enable_seqscan = OFF;
> \timing on
>      select id, snapshot #>'{employee,999,state,employment}' from payrun
> where snapshot @? '$."employee"."999" ?
>     ( @.pay_graph <> 0
>    || @.last_run_of_employment == true
>    || @.state.employment[last][2] == 0
> )';
>   id |         ?column?
> ----+---------------------------
>    2 | [["1970-01-01", null, 3]]
> (1 row)
>
> Time: 0.897 ms
> explain analyze  select id, snapshot #>'{employee,999,state,employment}'
> from payrun
> where snapshot @? '$."employee"."999" ?
>     ( @.pay_graph <> 0
>    || @.last_run_of_employment == true
>    || @.state.employment[last][2] == 0
> )';
>
>         QUERY PLAN
>
>
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
>   Bitmap Heap Scan on payrun  (cost=136.00..140.02 rows=1 width=36)
> (actual time=0.018..0.019 rows=1 loops=1)
>     Recheck Cond: (snapshot @? '$."employee"."999"?((@."pay_graph" != 0
> || @."last_run_of_employment" == true) ||
> @."state"."employment"[last][2] == 0)'::jsonpath)
>     Rows Removed by Index Recheck: 2
>     Heap Blocks: exact=1
>     ->  Bitmap Index Scan on payrun_jspathop_idx  (cost=0.00..136.00
> rows=1 width=0) (actual time=0.007..0.007 rows=3 loops=1)
>           Index Cond: (snapshot @? '$."employee"."999"?((@."pay_graph"
> != 0 || @."last_run_of_employment" == true) ||
> @."state"."employment"[last][2] == 0)'::jsonpath)
>   Planning Time: 0.034 ms
>   Execution Time: 0.033 ms
> (8 rows)
>
> Time: 0.284 ms
>
>
> hope that helps.
>
> Erik Rijkers

pgsql-bugs by date:

From: Shaheed Haque
Date: 12 June 2022, 11:12:55
Subject: Re: Unable to make use of "deep" JSONB index

From: Jeff Janes
Date: 12 June 2022, 21:24:48
Subject: Re: Unable to make use of "deep" JSONB index

Re: Unable to make use of "deep" JSONB index - Mailing list pgsql-bugs

Previous

Next