Re: querying with index on jsonb slower than standard column. Why? - Mailing list pgsql-sql
| From | Adrian Klaver |
|---|---|
| Subject | Re: querying with index on jsonb slower than standard column. Why? |
| Date | |
| Msg-id | 5485C449.4020204@aklaver.com Whole thread Raw |
| In response to | Re: querying with index on jsonb slower than standard column. Why? (Tom Lane <tgl@sss.pgh.pa.us>) |
| Responses |
Re: querying with index on jsonb slower than standard column. Why?
|
| List | pgsql-sql |
On 12/07/2014 05:28 PM, Tom Lane wrote:
> Tim Dudgeon <tdudgeon.ml@gmail.com> writes:
>> The index created is not a gin index. Its a standard btree index on the
>> data extracted from the json. So the indexes on the standard columns and
>> the ones on the 'fields' extracted from the json seem to be equivalent.
>> But perform differently.
>
> I don't see any particular difference ...
>
> regression=# explain analyze select count(*) from json_test where (data->>'assay1_ic50')::float > 90
> and (data->>'assay2_ic50')::float < 10;
> QUERY PLAN
>
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Aggregate (cost=341613.79..341613.80 rows=1 width=0) (actual time=901.207..901.208 rows=1 loops=1)
> -> Bitmap Heap Scan on json_test (cost=123684.69..338836.02 rows=1111111 width=0) (actual time=497.982..887.128
rows=100690loops=1)
> Recheck Cond: ((((data ->> 'assay2_ic50'::text))::double precision < 10::double precision) AND (((data ->>
'assay1_ic50'::text))::doubleprecision > 90::double precision))
> Heap Blocks: exact=77578
> -> BitmapAnd (cost=123684.69..123684.69 rows=1111111 width=0) (actual time=476.585..476.585 rows=0
loops=1)
> -> Bitmap Index Scan on idx_data_json_assay2_ic50 (cost=0.00..61564.44 rows=3333333 width=0)
(actualtime=219.287..219.287 rows=999795 loops=1)
> Index Cond: (((data ->> 'assay2_ic50'::text))::double precision < 10::double precision)
> -> Bitmap Index Scan on idx_data_json_assay1_ic50 (cost=0.00..61564.44 rows=3333333 width=0)
(actualtime=208.197..208.197 rows=1000231 loops=1)
> Index Cond: (((data ->> 'assay1_ic50'::text))::double precision > 90::double precision)
> Planning time: 0.128 ms
> Execution time: 904.196 ms
> (11 rows)
>
> regression=# explain analyze select count(*) from json_test where assay1_ic50 > 90 and assay2_ic50 < 10;
> QUERY PLAN
>
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
> Aggregate (cost=197251.24..197251.25 rows=1 width=0) (actual time=895.238..895.238 rows=1 loops=1)
> -> Bitmap Heap Scan on json_test (cost=36847.25..197003.24 rows=99197 width=0) (actual time=495.427..881.033
rows=100690loops=1)
> Recheck Cond: ((assay2_ic50 < 10::double precision) AND (assay1_ic50 > 90::double precision))
> Heap Blocks: exact=77578
> -> BitmapAnd (cost=36847.25..36847.25 rows=99197 width=0) (actual time=474.201..474.201 rows=0 loops=1)
> -> Bitmap Index Scan on idx_data_col_assay2_ic50 (cost=0.00..18203.19 rows=985434 width=0) (actual
time=219.060..219.060rows=999795 loops=1)
> Index Cond: (assay2_ic50 < 10::double precision)
> -> Bitmap Index Scan on idx_data_col_assay1_ic50 (cost=0.00..18594.21 rows=1006637 width=0) (actual
time=206.066..206.066rows=1000231 loops=1)
> Index Cond: (assay1_ic50 > 90::double precision)
> Planning time: 0.129 ms
> Execution time: 898.237 ms
> (11 rows)
>
> regression=# \timing
> Timing is on.
> regression=# select count(*) from json_test where (data->>'assay1_ic50')::float > 90
> and (data->>'assay2_ic50')::float < 10;
> count
> --------
> 100690
> (1 row)
>
> Time: 882.607 ms
> regression=# select count(*) from json_test where assay1_ic50 > 90 and assay2_ic50 < 10;
> count
> --------
> 100690
> (1 row)
>
> Time: 881.071 ms
>
> regards, tom lane
>
>
Running the above on my machine I do see the slow down the OP reports. I ran it several times
and it stayed around 3.5x. It might be interesting to get the OS and architecture information
from the OP.
test=# select version();
version
------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 9.4rc1 on i686-pc-linux-gnu, compiled by gcc (SUSE
Linux)4.8.1 20130909 [gcc-4_8-branch revision 202388], 32-bit
(1 row)
test=# \timing
Timing is on.
test=# select count(*) from json_test where (data->>'assay1_ic50')::float > 90
test-# and (data->>'assay2_ic50')::float < 10;count
-------99288
(1 row)
Time: 9092.966 ms
test=# select count(*) from json_test where assay1_ic50 > 90 and assay2_ic50 < 10;count
-------99288
(1 row)
Time: 2542.294 ms
explain analyze select count(*) from json_test where (data->>'assay1_ic50')::float > 90
and (data->>'assay2_ic50')::float < 10;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------Aggregate
(cost=332209.79..332209.80 rows=1 width=0) (actual time=8980.009..8980.009 rows=1 loops=1) -> Bitmap Heap Scan on
json_test (cost=123684.69..329432.02 rows=1111111 width=0) (actual time=538.688..8960.308 rows=99288 loops=1)
RecheckCond: ((((data ->> 'assay2_ic50'::text))::double precision < 10::double precision) AND (((data ->>
'assay1_ic50'::text))::doubleprecision > 90::double precision)) Rows Removed by Index Recheck: 7588045
HeapBlocks: exact=20894 lossy=131886 -> BitmapAnd (cost=123684.69..123684.69 rows=1111111 width=0) (actual
time=531.066..531.066rows=0 loops=1) -> Bitmap Index Scan on idx_data_json_assay2_ic50
(cost=0.00..61564.44rows=3333333 width=0) (actual time=258.717..258.717 rows=998690 loops=1) Index
Cond:(((data ->> 'assay2_ic50'::text))::double precision < 10::double precision) -> Bitmap Index Scan on
idx_data_json_assay1_ic50 (cost=0.00..61564.44 rows=3333333 width=0) (actual time=251.664..251.664 rows=997880 loops=1)
Index Cond: (((data ->> 'assay1_ic50'::text))::double precision > 90::double precision)Planning time:
0.391msExecution time: 8980.391 ms
(12 rows)
test=# explain analyze select count(*) from json_test where assay1_ic50 > 90 and assay2_ic50 < 10;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------Aggregate
(cost=196566.38..196566.39 rows=1 width=0) (actual time=2609.545..2609.545 rows=1 loops=1) -> Bitmap Heap Scan on
json_test (cost=37869.00..196304.39 rows=104796 width=0) (actual time=550.273..2590.093 rows=99288 loops=1)
RecheckCond: ((assay2_ic50 < 10::double precision) AND (assay1_ic50 > 90::double precision)) Rows Removed by
IndexRecheck: 7588045 Heap Blocks: exact=20894 lossy=131886 -> BitmapAnd (cost=37869.00..37869.00
rows=104796width=0) (actual time=542.666..542.666 rows=0 loops=1) -> Bitmap Index Scan on
idx_data_col_assay2_ic50 (cost=0.00..18871.73 rows=1021773 width=0) (actual time=263.959..263.959 rows=998690 loops=1)
Index Cond: (assay2_ic50 < 10::double precision) -> Bitmap Index Scan on
idx_data_col_assay1_ic50 (cost=0.00..18944.62 rows=1025624 width=0) (actual time=257.912..257.912 rows=997880 loops=1)
Index Cond: (assay1_ic50 > 90::double precision)Planning time: 0.834 msExecution time: 2609.960 ms
(12 rows)
--
Adrian Klaver
adrian.klaver@aklaver.com