Home > mailing lists

[GENERAL] jsonb case insensitive search - Mailing list pgsql-general

From	armand pirvu
Subject	[GENERAL] jsonb case insensitive search
Date	May 31, 2017 22:18:21
Msg-id	287C7A09-A660-49AB-ADA8-866121D5156C@gmail.com Whole thread Raw
Responses	Re: [GENERAL] jsonb case insensitive search
List	pgsql-general

Tree view

The goal would be to be able to search case insensitive by any key:value combined with some other columns like outlined below, but initially would be Company:CompuTestSystems

A sample would be

SELECT *

FROM cfg_files_data

WHERE cfg_files_data.show_id = 32

AND cfg_files_data.file_id = 123

AND lower((file_data_record ->> 'Company')) = lower('CompuTestSystems');

;

- Table definition

Table "csischema.cfg_files_data"

Column | Type | Modifiers

--------------------+--------------------------+-----------------------------------------------------------------------

file_data_id | bigint | not null default nextval('cfg_files_data_file_data_id_seq'::regclass)

file_id | bigint | not null

show_id | bigint | not null

file_data_record | jsonb | not null

additional_info | jsonb |

file_data_add_by | character varying(100) | not null

file_data_add_date | timestamp with time zone | not null default now()

Indexes:

"cfg_files_data_pkey" PRIMARY KEY, btree (file_data_id, show_id)

- show_id, file_id data distribution

select show_id, file_id, count(*) from cfg_files_data group by show_id, file_id order by 1;

show_id | file_id | count

---------+---------+-------

4 | 9 | 3

4 | 68 | 22

4 | 2 | 6

6 | 3 | 13

6 | 5 | 215

13 | 13 | 13

13 | 8 | 22

21 | 11 | 13

21 | 10 | 22

26 | 12 | 13

30 | 16 | 6

32 | 123 | 53731

35 | 125 | 5

40 | 122 | 53731

46 | 69 | 4

46 | 64 | 4

46 | 67 | 4

46 | 70 | 4

For the example mentioned

SELECT *

FROM cfg_files_data

WHERE cfg_files_data.show_id = 32

AND cfg_files_data.file_id = 123

AND lower((file_data_record ->> 'Company')) = lower('CompuTestSystems');

;

create index cfg_files_data_record_idx on cfg_files_data (show_id, file_id, lower(file_data_record::text));

SELECT *

FROM cfg_files_data

WHERE cfg_files_data.show_id = 35

AND cfg_files_data.file_id = 125

AND lower((file_data_record ->> 'Company')) = lower('CompuTestSystems');

;

QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------

Index Scan using cfg_files_data_record_idx on cfg_files_data (cost=0.54..8.57 rows=1 width=359) (actual time=0.287..0.305 rows=5 loops=1)

Index Cond: ((show_id = 35) AND (file_id = 125))

Filter: (lower((file_data_record ->> 'Company'::text)) = 'computestsystems'::text)

Planning time: 0.271 ms

Execution time: 0.370 ms

(5 rows)

SELECT *

FROM cfg_files_data

WHERE cfg_files_data.show_id = 32

AND cfg_files_data.file_id = 123

AND lower((file_data_record ->> 'Company')) = lower('CompuTestSystems');

;

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------

Seq Scan on cfg_files_data (cost=0.00..7276.20 rows=134 width=359) (actual time=194.817..194.848 rows=2 loops=1)

Filter: ((show_id = 32) AND (file_id = 123) AND (lower((file_data_record ->> 'Company'::text)) = 'computestsystems'::text))

Rows Removed by Filter: 107829

Planning time: 1.006 ms

Execution time: 194.905 ms

Shall I understand that prior to apply the lower((file_data_record ->> 'Company'::text)) = 'computestsystems'::text) filtering, it goes by show_id and field_id and for 32,123 since it is way above 5% it gets into a seq scan ? Although the end result is just 2 rows

SET enable_seqscan = OFF;

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------

Bitmap Heap Scan on cfg_files_data (cost=2485.18..8544.40 rows=134 width=359) (actual time=173.314..173.317 rows=2 loops=1)

Recheck Cond: (show_id = 32)

Filter: ((file_id = 123) AND (lower((file_data_record ->> 'Company'::text)) = 'computestsystems'::text))

Rows Removed by Filter: 53729

Heap Blocks: exact=2437

-> Bitmap Index Scan on cfg_files_data_pkey (cost=0.00..2485.15 rows=53743 width=0) (actual time=12.195..12.195 rows=53731 loops=1)

Index Cond: (show_id = 32)

Planning time: 0.232 ms

Execution time: 173.392 ms

(9 rows)

And that got me to look at

SELECT *

FROM cfg_files_data

WHERE

lower((file_data_record ->> 'Company')) = lower('CompuTestSystems');

;

drop index cfg_files_data_record_idx ;

create index cfg_files_data_record_idx on cfg_files_data (lower(file_data_record::text));

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------

Seq Scan on cfg_files_data (cost=0.00..6737.04 rows=539 width=359) (actual time=194.382..194.468 rows=8 loops=1)

Filter: (lower((file_data_record ->> 'Company'::text)) = 'computestsystems'::text)

Rows Removed by Filter: 107823

Planning time: 0.421 ms

Execution time: 194.539 ms

Well the end result is 8 rows from a total of 100k+

Not sure why the index is ignored

But is is possible to go for a broader search, aka being able to search by any key:value , efficient and case insensitive ? What am I missing in this picture ?

Thanks for help

Armand

pgsql-general by date:

From: Julien Rouhaud
Date: 31 May 2017, 19:32:30
Subject: Re: [GENERAL] Ora2Pg-Database migration report

From: "David G. Johnston"
Date: 31 May 2017, 22:32:20
Subject: Re: [GENERAL] jsonb case insensitive search

[GENERAL] jsonb case insensitive search - Mailing list pgsql-general

Previous

Next