Thread: get counts of multiple field values in a jsonb column
Hi!
I'm using Postgres to store unstructured fields in a jsonb column. I also have a quite complicated query on the table, joining it with other tables and so on. Given that query, I want to get a count of all the values for a number of keys in the data.
Currently I'm doing one query for each key, like this:
select data->>'field1', count(*) from COMPLICATED QUERY group by 1
select data->>'field2', count(*) from COMPLICATED QUERY group by 1
...
select data->>'fieldN', count(*) from COMPLICATED QUERY group by 1
field1, field2, ..., fieldN are known at query time.
But as the number of keys I want to count increases, so does the time it takes to run all these queries. I think the main problem is that COMPLICATED QUERY is expensive and gets executed once per key. I would very much like to run only one query that counts all the values of all the fields, but I'm not quite sure how to do that. I'm looking at all the aggregate functions but can't quite find one that suits this purpose.
I would love to get some input on ways to make this faster.
Regards,
Martin
On Sat, Oct 17, 2020 at 8:00 AM Martin Norbäck Olivers <martin@norpan.org> wrote:
> I'm looking at all the aggregation functions but can't quite find one that suits this purpose.
> I would love to get some input on ways to make this faster.
If you want counts, the count function is your go-to aggregate function. What you are missing is conditional counting:
SELECT fld, count(*) FILTER (WHERE expression) FROM query GROUP BY fld;
David J.
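A concrete sketch of that shape, with COMPLICATED QUERY standing in for the expensive subquery; the field names and the compared value are placeholders for whatever is known at query time (the FILTER clause requires PostgreSQL 9.4 or later):

```sql
-- One pass over the expensive query: group on one key's value and,
-- in the same pass, conditionally count rows matching another key.
SELECT
    data->>'field1' AS field1_value,
    count(*) AS total,
    count(*) FILTER (WHERE data->>'field2' = 'some_value') AS field2_matches
FROM (COMPLICATED QUERY) AS q
GROUP BY 1;
```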
> On Oct 17, 2020, at 9:00 AM, Martin Norbäck Olivers <martin@norpan.org> wrote:
>
> But as the number of keys I want to count for increases, so does the time it takes to run all these queries. I think the main problem is that COMPLICATED QUERY is complicated and takes time to run each time.

If the COMPLICATED QUERY is repeated verbatim each time, run it once into a temp table and do the counts against the temp table:

create temp table complicated as COMPLICATED QUERY;
/* maybe add an index */
select data->>'field1', count(*) from complicated group by 1;

You can put the whole thing in a function and drop the temp table each run. A CTE based on COMPLICATED QUERY might work too.
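The CTE variant mentioned above might look like the sketch below. MATERIALIZED (available from PostgreSQL 12) forces the subquery to be evaluated once; COMPLICATED QUERY is again a placeholder:

```sql
WITH complicated AS MATERIALIZED (
    COMPLICATED QUERY
)
SELECT data->>'field1' AS value, count(*)
FROM complicated
GROUP BY 1;
```

Unlike a temp table, a CTE only lives for the duration of one statement, so all of the counts would have to be folded into that single query (for example with conditional aggregation or jsonb_each_text).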
On Sat, 17 Oct 2020 at 16:00, Martin Norbäck Olivers <martin@norpan.org> wrote:
> I would very much like to run only one query that counts all the values of all the fields, but I'm not quite sure how to do that. I'm looking at all the aggregation functions but can't quite find one that suits this purpose.
>
> I would love to get some input on ways to make this faster.

You want something like this...

select key, count(*)
from (select (jsonb_each_text(data)).key from COMPLICATED QUERY) as kv
group by 1;

--
Simon Riggs http://www.EnterpriseDB.com/
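Since the goal is a count per value rather than per key, the same idea can be extended to emit key/value pairs and restrict them to the fields of interest. A sketch, with COMPLICATED QUERY again standing in for the expensive subquery and field1..fieldN as placeholder key names:

```sql
-- Expand each row's jsonb into (key, value) pairs once,
-- keep only the keys of interest, then count in a single pass.
SELECT kv.key, kv.value, count(*)
FROM (COMPLICATED QUERY) AS q
CROSS JOIN LATERAL jsonb_each_text(q.data) AS kv
WHERE kv.key IN ('field1', 'field2', 'fieldN')
GROUP BY kv.key, kv.value;
```

This runs the expensive query exactly once and returns one result set covering every key and value, which can then be split up client-side.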