Thread: Jsonb extraction very slow
Hi all
I am using jsonb to store key-value pair information (500 keys), and it is a very big data set with some 10M rows. Whenever I try to extract some keys (say 10 keys and their values) it is really very slow.
Is this due to jsonb parsing, or is the full json loaded from disk to memory each time for the 10 keys (mainly, if my keys are at the end of the 500, it is very slow)?
cheers
- Harry
"hari.prasath" <hari.prasath@zohocorp.com> writes: > I am using jsonb for storing key-value pair information(500 keys) and it was a very big data set with some 10M rows.Whenever i try to extract some keys(let say some 10 keys and its values) its really very slow. > Is this due to jsonb parsing (or) each time json will be loaded from disk to memory for 10keys(mainly if my keys are atend of 500 this is very slow).? It's probably mostly the cost to fetch and decompress the very wide json field. jsonb is pretty quick at finding an object key once it's got the value available to look at. You could possibly alleviate some of the speed issue by storing the column uncompressed (see ALTER TABLE ... SET STORAGE EXTERNAL), but that would bloat your disk space requirements so I'm not really sure it'd be a win. regards, tom lane
On 8/9/16 9:29 AM, Tom Lane wrote:
> "hari.prasath" <hari.prasath@zohocorp.com> writes:
>> I am using jsonb to store key-value pair information (500 keys), and it is a very big data set with some 10M rows. Whenever I try to extract some keys (say 10 keys and their values) it is really very slow.
>> Is this due to jsonb parsing, or is the full json loaded from disk to memory each time for the 10 keys (mainly, if my keys are at the end of the 500, it is very slow)?
>
> It's probably mostly the cost to fetch and decompress the very wide json
> field. jsonb is pretty quick at finding an object key once it's got the
> value available to look at.
>
> You could possibly alleviate some of the speed issue by storing the
> column uncompressed (see ALTER TABLE ... SET STORAGE EXTERNAL), but that
> would bloat your disk space requirements so I'm not really sure it'd be
> a win.

Actually I've done some testing with this and there is a *significant*
overhead in getting multiple keys from a large document. There's a
significant extra cost for the first key, but there's also a non-trivial
cost for every key after that.

I suspect the issue is the goofy logic used to store key name offsets
(to improve compression), but I never got around to actually tracing it.

I suspect there's a win to be had by having both json types use the
ExpandedObject stuff.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461
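A rough sketch of how to reproduce that comparison (hypothetical table docs with a jsonb column doc and made-up key names; timings via psql):

    \timing on

    -- One key per row: one detoast/decompression per row.
    SELECT count(doc->>'k1') FROM docs;

    -- Five keys per row: five separate ->> operator calls per row.
    SELECT count(doc->>'k1'), count(doc->>'k2'), count(doc->>'k3'),
           count(doc->>'k4'), count(doc->>'k5')
    FROM docs;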
Please CC the list.

On 8/11/16 2:19 AM, hari.prasath wrote:
>> Actually I've done some testing with this and there is a *significant*
>> overhead in getting multiple keys from a large document. There's a
>> significant extra cost for the first key, but there's also a non-trivial
>> cost for every key after that.
>
> Why does it take some extra cost for the first key and less for the keys
> after that? Is there any specific reason for this? If so, please explain.

I never dug into why. As Tom posited, decompression might explain the
time to get a single key out. Getting 10 keys instead of just 1 wasn't
10x more expensive, but it was significantly more expensive than just
getting a single key.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461
Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
> I never dug into why. As Tom posited, decompression might explain the
> time to get a single key out. Getting 10 keys instead of just 1 wasn't
> 10x more expensive, but it was significantly more expensive than just
> getting a single key.

What were you doing to "get ten keys out"? If those were ten separate
JSON operators, they'd likely have done ten separate decompressions.
You'd have saved something by having the TOAST data already fetched into
shared buffers, but it'd still hardly be free.

			regards, tom lane
> What were you doing to "get ten keys out"? If those were ten separate
> JSON operators, they'd likely have done ten separate decompressions.
> You'd have saved something by having the TOAST data already fetched into
> shared buffers, but it'd still hardly be free.
Now I get the point. Initially I thought that, to extract n keys from the json, the full json would be decompressed only once. But it actually decompresses n times for n keys.
Thanks
- Harry
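One way to pay the decompression cost only once per row is to expand the document a single time into a record, rather than issuing one operator per key. A sketch, assuming PostgreSQL 9.4+ and made-up table/type/key names:

    -- Hypothetical composite type listing the wanted keys.
    CREATE TYPE wanted_keys AS (k1 text, k2 text, k3 text);

    -- jsonb_populate_record reads the document once per row, so the
    -- value is detoasted and decompressed a single time for all keys.
    SELECT d.id, r.*
    FROM docs d,
         LATERAL jsonb_populate_record(NULL::wanted_keys, d.doc) AS r;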
On Thu, Aug 11, 2016 at 8:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
>> I never dug into why. As Tom posited, decompression might explain the
>> time to get a single key out. Getting 10 keys instead of just 1 wasn't
>> 10x more expensive, but it was significantly more expensive than just
>> getting a single key.
>
> What were you doing to "get ten keys out"? If those were ten separate
> JSON operators, they'd likely have done ten separate decompressions.
> You'd have saved something by having the TOAST data already fetched into
> shared buffers, but it'd still hardly be free.

Huh -- FWICT there is no way to pull N values from a jsonb holding M
items, for any value of N other than 1 or M, with a single operation.

merlin
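A workaround consistent with that observation is to pull all M pairs in one pass with jsonb_each_text and filter down to the N wanted keys in SQL. A sketch with made-up names (FILTER requires 9.4+):

    -- One expansion of the document per row, regardless of key count.
    SELECT d.id,
           max(e.value) FILTER (WHERE e.key = 'k1') AS k1,
           max(e.value) FILTER (WHERE e.key = 'k2') AS k2
    FROM docs d
    CROSS JOIN LATERAL jsonb_each_text(d.doc) AS e(key, value)
    GROUP BY d.id;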
On 8/11/16 8:45 AM, Tom Lane wrote:
> Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
>> I never dug into why. As Tom posited, decompression might explain the
>> time to get a single key out. Getting 10 keys instead of just 1 wasn't
>> 10x more expensive, but it was significantly more expensive than just
>> getting a single key.
>
> What were you doing to "get ten keys out"? If those were ten separate
> JSON operators, they'd likely have done ten separate decompressions.
> You'd have saved something by having the TOAST data already fetched into
> shared buffers, but it'd still hardly be free.

Multiple -> or ->> operators, but all operating on the same field (which
I thought would mean a single datum that would end up detoasted?). Some
of these would have been nested ->/->>.

In essence, this was a set of nested views that ultimately pulled from a
single JSONB field.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461
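The shape described might look something like this (hypothetical view and key names; each layer adds more ->/->> calls against values derived from the same jsonb column):

    CREATE VIEW base_v AS
      SELECT id,
             doc->'meta'  AS meta,   -- nested object, still jsonb
             doc->>'name' AS name
      FROM docs;

    CREATE VIEW detail_v AS
      SELECT id, name,
             meta->>'created_at' AS created_at   -- nested ->> one level down
      FROM base_v;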
Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
> On 8/11/16 8:45 AM, Tom Lane wrote:
>> What were you doing to "get ten keys out"? If those were ten separate
>> JSON operators, they'd likely have done ten separate decompressions.
>> You'd have saved something by having the TOAST data already fetched into
>> shared buffers, but it'd still hardly be free.

> Multiple -> or ->> operators, but all operating on the same field (which
> I thought would mean a single datum that would end up detoasted?).

No, that's going to work as I said. It'd be a useful thing to be able to
amortize the decompression work across multiple references to the field,
but currently there's no way to do that.

[ thinks for a bit... ] In principle we could have the planner notice
whether there are multiple references to the same Var of a varlena type,
and then cue the executor to do a pre-emptive detoasting of that field
of the input tuple slot. But it would be hard to avoid introducing some
regressions along with the benefits, I'm afraid.

> Some of these would have been nested ->/->>.

In a chain of functions only the first one would be paying the overhead
we're talking about here; though I'm not sure how efficient the case is
overall in JSONB.

			regards, tom lane
On 8/16/16 10:19 AM, Tom Lane wrote:
> [ thinks for a bit... ] In principle we could have the planner notice
> whether there are multiple references to the same Var of a varlena type,
> and then cue the executor to do a pre-emptive detoasting of that field
> of the input tuple slot. But it would be hard to avoid introducing some
> regressions along with the benefits, I'm afraid.

I suspect that the ExpandedObject stuff makes this even more
appealing... it would certainly be nice if we only needed to pay the
expansion cost once (assuming no one dirtied the expanded object). I
certainly think there's more need for this kind of thing as the use of
JSON expands. Perhaps that's part of what Robert was suggesting recently
with moving datums around the executor instead of tuples.

>> Some of these would have been nested ->/->>.
> In a chain of functions only the first one would be paying the overhead
> we're talking about here; though I'm not sure how efficient the case is
> overall in JSONB.

I've since heard that chaining -> is a really bad idea compared to #>,
which is unfortunate because -> is the normal idiom in other languages
(and what I suspect everyone will use by default).

I've wondered if an expanded-object version of json might expand only
top-level keys (and maybe only as needed), with -> then being just a
pointer into the originally expanded data. A chained -> wouldn't need to
duplicate everything... and in fact it might be able to do its expansion
in the original object, so that subsequent references to that key
wouldn't need to re-expand it. I don't think the current EO framework
supports that, but it doesn't seem impossible to add...
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461
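For reference, the two idioms being compared (document layout here is hypothetical). The chained form evaluates one operator per step, each producing an intermediate jsonb value, while the path form walks the whole path in a single operator call:

    SELECT doc->'a'->'b'->>'c' FROM docs;   -- three operator calls
    SELECT doc #>> '{a,b,c}'   FROM docs;   -- one operator call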