Thread: Jsonb extraction very slow
Hi all
I am using jsonb to store key-value pair information (500 keys), and it is a very big data set with some 10M rows. Whenever I try to extract some keys (say 10 keys and their values) it is really very slow.
Is this due to jsonb parsing, or is the full json loaded from disk to memory each time for the 10 keys (mainly, if my keys are at the end of the 500, it is very slow)?
cheers
- Harry
"hari.prasath" <hari.prasath@zohocorp.com> writes: > I am using jsonb for storing key-value pair information(500 keys) and it was a very big data set with some 10M rows.Whenever i try to extract some keys(let say some 10 keys and its values) its really very slow. > Is this due to jsonb parsing (or) each time json will be loaded from disk to memory for 10keys(mainly if my keys are atend of 500 this is very slow).? It's probably mostly the cost to fetch and decompress the very wide json field. jsonb is pretty quick at finding an object key once it's got the value available to look at. You could possibly alleviate some of the speed issue by storing the column uncompressed (see ALTER TABLE ... SET STORAGE EXTERNAL), but that would bloat your disk space requirements so I'm not really sure it'd be a win. regards, tom lane
On 8/9/16 9:29 AM, Tom Lane wrote:
> "hari.prasath" <hari.prasath@zohocorp.com> writes:
>> I am using jsonb to store key-value pair information (500 keys), and it is a very big data set with some 10M rows. Whenever I try to extract some keys (say 10 keys and their values) it is really very slow.
>> Is this due to jsonb parsing, or is the full json loaded from disk to memory each time for the 10 keys (mainly, if my keys are at the end of the 500, it is very slow)?
>
> It's probably mostly the cost to fetch and decompress the very wide json
> field. jsonb is pretty quick at finding an object key once it's got the
> value available to look at.
>
> You could possibly alleviate some of the speed issue by storing the
> column uncompressed (see ALTER TABLE ... SET STORAGE EXTERNAL), but that
> would bloat your disk space requirements so I'm not really sure it'd be
> a win.

Actually I've done some testing with this and there is a *significant*
overhead in getting multiple keys from a large document. There's a
significant extra cost for the first key, but there's also a non-trivial
cost for every key after that.

I suspect the issue is the goofy logic used to store key name offsets
(to improve compression), but I never got around to actually tracing it.

I suspect there's a win to be had by having both json types use the
ExpandedObject stuff.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461
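A rough sketch of how to reproduce that comparison (hypothetical table docs with a jsonb column doc and made-up key names; timings via psql):

    \timing on

    -- One key per row: one detoast/decompression per row.
    SELECT count(doc->>'k1') FROM docs;

    -- Five keys per row: five separate ->> operator calls per row.
    SELECT count(doc->>'k1'), count(doc->>'k2'), count(doc->>'k3'),
           count(doc->>'k4'), count(doc->>'k5')
    FROM docs;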
Please CC the list.

On 8/11/16 2:19 AM, hari.prasath wrote:
>> Actually I've done some testing with this and there is a *significant*
>> overhead in getting multiple keys from a large document. There's a
>> significant extra cost for the first key, but there's also a non-trivial
>> cost for every key after that.
>
> Why does it take some extra cost for the first key and less for the keys
> after that? Is there any specific reason for this? If so, please explain.

I never dug into why. As Tom posited, decompression might explain the
time to get a single key out. Getting 10 keys instead of just 1 wasn't
10x more expensive, but it was significantly more expensive than just
getting a single key.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461
Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
> I never dug into why. As Tom posited, decompression might explain the
> time to get a single key out. Getting 10 keys instead of just 1 wasn't
> 10x more expensive, but it was significantly more expensive than just
> getting a single key.

What were you doing to "get ten keys out"? If those were ten separate
JSON operators, they'd likely have done ten separate decompressions.
You'd have saved something by having the TOAST data already fetched into
shared buffers, but it'd still hardly be free.

			regards, tom lane
> What were you doing to "get ten keys out"? If those were ten separate
> JSON operators, they'd likely have done ten separate decompressions.
> You'd have saved something by having the TOAST data already fetched into
> shared buffers, but it'd still hardly be free.
Now I get the point. Initially I thought that, to extract n keys from the json, the full json would be decompressed only once. But it actually decompresses n times for n keys.
Thanks
- Harry
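One way to pay the decompression cost only once per row is to expand the document a single time into a record, rather than issuing one operator per key. A sketch, assuming PostgreSQL 9.4+ and made-up table/type/key names:

    -- Hypothetical composite type listing the wanted keys.
    CREATE TYPE wanted_keys AS (k1 text, k2 text, k3 text);

    -- jsonb_populate_record reads the document once per row, so the
    -- value is detoasted and decompressed a single time for all keys.
    SELECT d.id, r.*
    FROM docs d,
         LATERAL jsonb_populate_record(NULL::wanted_keys, d.doc) AS r;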
On Thu, Aug 11, 2016 at 8:45 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
>> I never dug into why. As Tom posited, decompression might explain the
>> time to get a single key out. Getting 10 keys instead of just 1 wasn't
>> 10x more expensive, but it was significantly more expensive than just
>> getting a single key.
>
> What were you doing to "get ten keys out"? If those were ten separate
> JSON operators, they'd likely have done ten separate decompressions.
> You'd have saved something by having the TOAST data already fetched into
> shared buffers, but it'd still hardly be free.

Huh -- FWICT there is no way to pull N values from a jsonb holding M
items, for any value of N other than 1 or M, with a single operation.

merlin
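A workaround consistent with that observation is to pull all M pairs in one pass with jsonb_each_text and filter down to the N wanted keys in SQL. A sketch with made-up names (FILTER requires 9.4+):

    -- One expansion of the document per row, regardless of key count.
    SELECT d.id,
           max(e.value) FILTER (WHERE e.key = 'k1') AS k1,
           max(e.value) FILTER (WHERE e.key = 'k2') AS k2
    FROM docs d
    CROSS JOIN LATERAL jsonb_each_text(d.doc) AS e(key, value)
    GROUP BY d.id;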
On 8/11/16 8:45 AM, Tom Lane wrote:
> Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
>> I never dug into why. As Tom posited, decompression might explain the
>> time to get a single key out. Getting 10 keys instead of just 1 wasn't
>> 10x more expensive, but it was significantly more expensive than just
>> getting a single key.
>
> What were you doing to "get ten keys out"? If those were ten separate
> JSON operators, they'd likely have done ten separate decompressions.
> You'd have saved something by having the TOAST data already fetched into
> shared buffers, but it'd still hardly be free.

Multiple -> or ->> operators, but all operating on the same field (which
I thought would mean a single datum that would end up detoasted?). Some
of these would have been nested ->/->>.

In essence, this was a set of nested views that ultimately pulled from a
single JSONB field.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461
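The shape described might look something like this (hypothetical view and key names; each layer adds more ->/->> calls against values derived from the same jsonb column):

    CREATE VIEW base_v AS
      SELECT id,
             doc->'meta'  AS meta,   -- nested object, still jsonb
             doc->>'name' AS name
      FROM docs;

    CREATE VIEW detail_v AS
      SELECT id, name,
             meta->>'created_at' AS created_at   -- nested ->> one level down
      FROM base_v;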
Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
> On 8/11/16 8:45 AM, Tom Lane wrote:
>> What were you doing to "get ten keys out"? If those were ten separate
>> JSON operators, they'd likely have done ten separate decompressions.
>> You'd have saved something by having the TOAST data already fetched into
>> shared buffers, but it'd still hardly be free.

> Multiple -> or ->> operators, but all operating on the same field (which
> I thought would mean a single datum that would end up detoasted?).

No, that's going to work as I said. It'd be a useful thing to be able to
amortize the decompression work across multiple references to the field,
but currently there's no way to do that.

[ thinks for a bit... ] In principle we could have the planner notice
whether there are multiple references to the same Var of a varlena type,
and then cue the executor to do a pre-emptive detoasting of that field
of the input tuple slot. But it would be hard to avoid introducing some
regressions along with the benefits, I'm afraid.

> Some of these would have been nested ->/->>.

In a chain of functions only the first one would be paying the overhead
we're talking about here; though I'm not sure how efficient the case is
overall in JSONB.

			regards, tom lane
On 8/16/16 10:19 AM, Tom Lane wrote:
> [ thinks for a bit... ] In principle we could have the planner notice
> whether there are multiple references to the same Var of a varlena type,
> and then cue the executor to do a pre-emptive detoasting of that field
> of the input tuple slot. But it would be hard to avoid introducing some
> regressions along with the benefits, I'm afraid.

I suspect that the ExpandedObject stuff makes this even more
appealing... it would certainly be nice if we only needed to pay the
expansion cost once (assuming no one dirtied the expanded object). I
certainly think there's more need for this kind of thing as the use of
JSON expands. Perhaps that's part of what Robert was suggesting recently
with moving datums around the executor instead of tuples.

>> Some of these would have been nested ->/->>.
> In a chain of functions only the first one would be paying the overhead
> we're talking about here; though I'm not sure how efficient the case is
> overall in JSONB.

I've since heard that chaining -> is a really bad idea compared to #>,
which is unfortunate because -> is the normal idiom in other languages
(and what I suspect everyone will use by default).

I've wondered if an expanded-object version of json might expand only
top-level keys (and maybe only as needed), with -> then being just a
pointer into the originally expanded data. A chained -> wouldn't need to
duplicate everything... and in fact it might be able to do its expansion
in the original object, so that subsequent references to that key
wouldn't need to re-expand it. I don't think the current EO framework
supports that, but it doesn't seem impossible to add...
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461
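For reference, the two idioms being compared (document layout here is hypothetical). The chained form evaluates one operator per step, each producing an intermediate jsonb value, while the path form walks the whole path in a single operator call:

    SELECT doc->'a'->'b'->>'c' FROM docs;   -- three operator calls
    SELECT doc #>> '{a,b,c}'   FROM docs;   -- one operator call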