Thread: get counts of multiple field values in a jsonb column
Hi!
I'm using Postgres to store unstructured fields in a jsonb column. I also have a quite complicated query on the table, joining it with other tables and so on. Given that query, I want to get a count of all the values for a number of keys in the data.
Currently I'm doing one query for each key, like this:
select data->>'field1', count(*) from COMPLICATED QUERY group by 1
select data->>'field2', count(*) from COMPLICATED QUERY group by 1
...
select data->>'fieldN', count(*) from COMPLICATED QUERY group by 1
field1, field2, ..., fieldN are known at query time.
But as the number of keys I want to count increases, so does the time it takes to run all these queries. I think the main problem is that COMPLICATED QUERY is expensive and gets executed once per key. I would very much like to run only one query that counts all the values of all the fields, but I'm not quite sure how to do that. I'm looking at all the aggregate functions but can't quite find one that suits this purpose.
I would love to get some input on ways to make this faster.
Regards,
Martin
On Sat, Oct 17, 2020 at 8:00 AM Martin Norbäck Olivers <martin@norpan.org> wrote:
> I'm looking at all the aggregation functions but can't quite find one that suits this purpose.
> I would love to get some input on ways to make this faster.
If you want counts, the count function is your go-to aggregate function. What you are missing is conditional counting:
SELECT fld, count(*) FILTER (WHERE expression) FROM query GROUP BY fld;
David J.
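A concrete sketch of that shape, with COMPLICATED QUERY standing in for the expensive subquery; the field names and the compared value are placeholders for whatever is known at query time (the FILTER clause requires PostgreSQL 9.4 or later):

```sql
-- One pass over the expensive query: group on one key's value and,
-- in the same pass, conditionally count rows matching another key.
SELECT
    data->>'field1' AS field1_value,
    count(*) AS total,
    count(*) FILTER (WHERE data->>'field2' = 'some_value') AS field2_matches
FROM (COMPLICATED QUERY) AS q
GROUP BY 1;
```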
> On Oct 17, 2020, at 9:00 AM, Martin Norbäck Olivers <martin@norpan.org> wrote:
>
> But as the number of keys I want to count for increases, so does the time it takes to run all these queries. I think the main problem is that COMPLICATED QUERY is complicated and takes time to run each time.

If the COMPLICATED QUERY is repeated verbatim each time, run it once into a temp table and do the counts against the temp table:

create temp table complicated as COMPLICATED QUERY;
/* maybe add an index */
select data->>'field1', count(*) from complicated group by 1;

You can put the whole thing in a function and drop the temp table each run. A CTE based on COMPLICATED QUERY might work too.
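The CTE variant mentioned above might look like the sketch below. MATERIALIZED (available from PostgreSQL 12) forces the subquery to be evaluated once; COMPLICATED QUERY is again a placeholder:

```sql
WITH complicated AS MATERIALIZED (
    COMPLICATED QUERY
)
SELECT data->>'field1' AS value, count(*)
FROM complicated
GROUP BY 1;
```

Unlike a temp table, a CTE only lives for the duration of one statement, so all of the counts would have to be folded into that single query (for example with conditional aggregation or jsonb_each_text).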
On Sat, 17 Oct 2020 at 16:00, Martin Norbäck Olivers <martin@norpan.org> wrote:
> I would very much like to run only one query that counts all the values of all the fields, but I'm not quite sure how to do that. I'm looking at all the aggregation functions but can't quite find one that suits this purpose.
>
> I would love to get some input on ways to make this faster.

You want something like this...

select key, count(*)
from (select (jsonb_each_text(data)).key from COMPLICATED QUERY) as kv
group by 1;

--
Simon Riggs http://www.EnterpriseDB.com/
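Since the goal is a count per value rather than per key, the same idea can be extended to emit key/value pairs and restrict them to the fields of interest. A sketch, with COMPLICATED QUERY again standing in for the expensive subquery and field1..fieldN as placeholder key names:

```sql
-- Expand each row's jsonb into (key, value) pairs once,
-- keep only the keys of interest, then count in a single pass.
SELECT kv.key, kv.value, count(*)
FROM (COMPLICATED QUERY) AS q
CROSS JOIN LATERAL jsonb_each_text(q.data) AS kv
WHERE kv.key IN ('field1', 'field2', 'fieldN')
GROUP BY kv.key, kv.value;
```

This runs the expensive query exactly once and returns one result set covering every key and value, which can then be split up client-side.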