[GENERAL] jsonb case insensitive search - Mailing list pgsql-general
From | armand pirvu |
---|---|
Subject | [GENERAL] jsonb case insensitive search |
Date | |
Msg-id | 287C7A09-A660-49AB-ADA8-866121D5156C@gmail.com Whole thread Raw |
Responses |
Re: [GENERAL] jsonb case insensitive search
|
List | pgsql-general |
Hi
The goal would be to be able to search case insensitive by any key:value combined with some other columns like outlined below, but initially would be Company:CompuTestSystems
A sample would be
SELECT *
FROM cfg_files_data
WHERE cfg_files_data.show_id = 32
AND cfg_files_data.file_id = 123
AND lower((file_data_record ->> 'Company')) = lower('CompuTestSystems');
;
- Table definition
Table "csischema.cfg_files_data"
Column | Type | Modifiers
--------------------+--------------------------+-----------------------------------------------------------------------
file_data_id | bigint | not null default nextval('cfg_files_data_file_data_id_seq'::regclass)
file_id | bigint | not null
show_id | bigint | not null
file_data_record | jsonb | not null
additional_info | jsonb |
file_data_add_by | character varying(100) | not null
file_data_add_date | timestamp with time zone | not null default now()
Indexes:
"cfg_files_data_pkey" PRIMARY KEY, btree (file_data_id, show_id)
- show_id, file_id data distribution
select show_id, file_id, count(*) from cfg_files_data group by show_id, file_id order by 1;
show_id | file_id | count
---------+---------+-------
4 | 9 | 3
4 | 68 | 22
4 | 2 | 6
6 | 3 | 13
6 | 5 | 215
13 | 13 | 13
13 | 8 | 22
21 | 11 | 13
21 | 10 | 22
26 | 12 | 13
30 | 16 | 6
32 | 123 | 53731
35 | 125 | 5
40 | 122 | 53731
46 | 69 | 4
46 | 64 | 4
46 | 67 | 4
46 | 70 | 4
For the example mentioned
SELECT *
FROM cfg_files_data
WHERE cfg_files_data.show_id = 32
AND cfg_files_data.file_id = 123
AND lower((file_data_record ->> 'Company')) = lower('CompuTestSystems');
;
create index cfg_files_data_record_idx on cfg_files_data (show_id, file_id, lower(file_data_record::text));
SELECT *
FROM cfg_files_data
WHERE cfg_files_data.show_id = 35
AND cfg_files_data.file_id = 125
AND lower((file_data_record ->> 'Company')) = lower('CompuTestSystems');
;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using cfg_files_data_record_idx on cfg_files_data (cost=0.54..8.57 rows=1 width=359) (actual time=0.287..0.305 rows=5 loops=1)
Index Cond: ((show_id = 35) AND (file_id = 125))
Filter: (lower((file_data_record ->> 'Company'::text)) = 'computestsystems'::text)
Planning time: 0.271 ms
Execution time: 0.370 ms
(5 rows)
SELECT *
FROM cfg_files_data
WHERE cfg_files_data.show_id = 32
AND cfg_files_data.file_id = 123
AND lower((file_data_record ->> 'Company')) = lower('CompuTestSystems');
;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Seq Scan on cfg_files_data (cost=0.00..7276.20 rows=134 width=359) (actual time=194.817..194.848 rows=2 loops=1)
Filter: ((show_id = 32) AND (file_id = 123) AND (lower((file_data_record ->> 'Company'::text)) = 'computestsystems'::text))
Rows Removed by Filter: 107829
Planning time: 1.006 ms
Execution time: 194.905 ms
Shall I understand that prior to apply the lower((file_data_record ->> 'Company'::text)) = 'computestsystems'::text) filtering, it goes by show_id and field_id and for 32,123 since it is way above 5% it gets into a seq scan ? Although the end result is just 2 rows
SET enable_seqscan = OFF;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on cfg_files_data (cost=2485.18..8544.40 rows=134 width=359) (actual time=173.314..173.317 rows=2 loops=1)
Recheck Cond: (show_id = 32)
Filter: ((file_id = 123) AND (lower((file_data_record ->> 'Company'::text)) = 'computestsystems'::text))
Rows Removed by Filter: 53729
Heap Blocks: exact=2437
-> Bitmap Index Scan on cfg_files_data_pkey (cost=0.00..2485.15 rows=53743 width=0) (actual time=12.195..12.195 rows=53731 loops=1)
Index Cond: (show_id = 32)
Planning time: 0.232 ms
Execution time: 173.392 ms
(9 rows)
And that got me to look at
SELECT *
FROM cfg_files_data
WHERE
lower((file_data_record ->> 'Company')) = lower('CompuTestSystems');
;
drop index cfg_files_data_record_idx ;
create index cfg_files_data_record_idx on cfg_files_data (lower(file_data_record::text));
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Seq Scan on cfg_files_data (cost=0.00..6737.04 rows=539 width=359) (actual time=194.382..194.468 rows=8 loops=1)
Filter: (lower((file_data_record ->> 'Company'::text)) = 'computestsystems'::text)
Rows Removed by Filter: 107823
Planning time: 0.421 ms
Execution time: 194.539 ms
Well the end result is 8 rows from a total of 100k+
Not sure why the index is ignored
But is is possible to go for a broader search, aka being able to search by any key:value , efficient and case insensitive ? What am I missing in this picture ?
Thanks for help
Armand
pgsql-general by date: