Thread: jsonb access operators inefficiency
Hi! jsonb operators ->text, ->>text,->int, ->>int use inefficient methods to access to needed field, proportional O(N/2). Attached patch suggests for text operators O(log(N)) and for integer - O(1). The fuctions with fast access already are existed in current code and are used in contains operation, for example. Attached patch uses that functions instead of sequentual loop over object/array. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
Attachment
On 05/30/2014 09:35 AM, Teodor Sigaev wrote: > Hi! > > jsonb operators ->text, ->>text,->int, ->>int use inefficient methods > to access to needed field, proportional O(N/2). Attached patch > suggests for text operators O(log(N)) and for integer - O(1). The > fuctions with fast access already are existed in current code and are > used in contains operation, for example. Attached patch uses that > functions instead of sequentual loop over object/array. Teodor, thanks for this. My only worry is this sort of code, which in a quick search I didn't find used elsewhere - (void) JsonbToCString(jtext, &tjb->root , -1); - result = cstring_to_text_with_len(jtext->data, jtext->len); + appendStringInfoSpaces(jtext, VARHDRSZ); + (void) JsonbToCString(jtext, v->val.binary.data, -1); + + result = (text*)jtext->data; + SET_VARSIZE(result, jtext->len); If we're going to construct varlena objects inside a StringInfo, maybe we need a proper API for it. Isn't there a danger that data member of the StringInfo won't be properly aligned to allow us to do this? In any case, we should get most of the benefit of your patch without this "optimization". cheers andrew
On 05/30/2014 11:08 AM, Andrew Dunstan wrote: > > On 05/30/2014 09:35 AM, Teodor Sigaev wrote: >> Hi! >> >> jsonb operators ->text, ->>text,->int, ->>int use inefficient methods >> to access to needed field, proportional O(N/2). Attached patch >> suggests for text operators O(log(N)) and for integer - O(1). The >> fuctions with fast access already are existed in current code and are >> used in contains operation, for example. Attached patch uses that >> functions instead of sequentual loop over object/array. > > Teodor, thanks for this. > > My only worry is this sort of code, which in a quick search I didn't > find used elsewhere > > - (void) JsonbToCString(jtext, &tjb->root , -1); > - result = cstring_to_text_with_len(jtext->data, > jtext->len); > + appendStringInfoSpaces(jtext, VARHDRSZ); > + (void) JsonbToCString(jtext, v->val.binary.data, > -1); > + > + result = (text*)jtext->data; > + SET_VARSIZE(result, jtext->len); > > > If we're going to construct varlena objects inside a StringInfo, maybe > we need a proper API for it. Isn't there a danger that data member of > the StringInfo won't be properly aligned to allow us to do this? In > any case, we should get most of the benefit of your patch without this > "optimization". I see that palloc.h says: The result of palloc() is always word-aligned so maybe my alignment fear is misplaced. So my remaining question is whether this is OK stylistically. cheers andrew
> > If we're going to construct varlena objects inside a StringInfo, maybe > we need a proper API for it. Isn't there a danger that data member of > the StringInfo won't be properly aligned to allow us to do this? In any > case, we should get most of the benefit of your patch without this > "optimization". I believe that StringInfo->data is palloc'ed, it means it's MAXALIGNed. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
> I see that palloc.h says: > > The result of palloc() is always word-aligned void * repalloc(void *pointer, Size size) { ... /* * Try to detect bogus pointers handed to us, poorly though we can. * Presumably, a pointer that isn't MAXALIGNEDisn't pointing at an * allocated chunk. */ Assert(pointer != NULL); Assert(pointer == (void *) MAXALIGN(pointer)); ... > > > so maybe my alignment fear is misplaced. So my remaining question is > whether this is OK stylistically. Something like this? initStringInfoVarlena()/makeStringInfoVarlena() {initStringInfo()appendStringInfoSpaces(jtext, VARHDRSZ); } char* formStringInfoVarlena() {SET_VARSIZE(jtext->data, jtext->len);return jtext->data; } -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
The patch really improves access performance to jsonb. On the delicious bookmarks I got 5 times better performance.Now jsonb outperforms json on simple access (slide 12 of pgcon presentation) by 103 times ! Oleg On Fri, May 30, 2014 at 9:35 AM, Teodor Sigaev <teodor@sigaev.ru> wrote: > Hi! > > jsonb operators ->text, ->>text,->int, ->>int use inefficient methods to > access to needed field, proportional O(N/2). Attached patch suggests for > text operators O(log(N)) and for integer - O(1). The fuctions with fast > access already are existed in current code and are used in contains > operation, for example. Attached patch uses that functions instead of > sequentual loop over object/array. > -- > Teodor Sigaev E-mail: teodor@sigaev.ru > WWW: > http://www.sigaev.ru/ > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
On 05/30/2014 01:30 PM, Oleg Bartunov wrote: > The patch really improves access performance to jsonb. On the > delicious bookmarks I got 5 times better performance.Now jsonb > outperforms json on simple access (slide 12 of pgcon presentation) by > 103 times ! > > (Oleg, please try not to top-post) The question is whether the speedup comes from the reduction in lookup times or the reduction in string copying. I have a strong suspicion that it's mostly the first, not the second. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > If we're going to construct varlena objects inside a StringInfo, maybe > we need a proper API for it. Isn't there a danger that data member of > the StringInfo won't be properly aligned to allow us to do this? In any > case, we should get most of the benefit of your patch without this > "optimization". As noted, the data buffer is maxaligned; but nonetheless I agree that this is a serious stylistic abuse, and it's not buying much compared to the rest of the patch. I'd stick with the cstring_to_text_with_len coding. At the other end of the process, why are we using PG_GETARG_TEXT_P and not PG_GETARG_TEXT_PP to avoid a "detoast" cycle on short-header inputs? The function body is using VARDATA_ANY/VARSIZE_ANY_EXHDR, so it's already prepared for unaligned input. regards, tom lane
> The question is whether the speedup comes from the reduction in lookup > times or the reduction in string copying. I have a strong suspicion that > it's mostly the first, not the second. Agree with about Oleg's test, but some users put huge values into json... Actually, I don't insist, ITSM, that's easy way to prevent extra copying. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
On 05/30/2014 01:41 PM, Tom Lane wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >> If we're going to construct varlena objects inside a StringInfo, maybe >> we need a proper API for it. Isn't there a danger that data member of >> the StringInfo won't be properly aligned to allow us to do this? In any >> case, we should get most of the benefit of your patch without this >> "optimization". > As noted, the data buffer is maxaligned; but nonetheless I agree that > this is a serious stylistic abuse, and it's not buying much compared > to the rest of the patch. I'd stick with the cstring_to_text_with_len > coding. > > At the other end of the process, why are we using PG_GETARG_TEXT_P > and not PG_GETARG_TEXT_PP to avoid a "detoast" cycle on short-header > inputs? The function body is using VARDATA_ANY/VARSIZE_ANY_EXHDR, > so it's already prepared for unaligned input. > > Committed with these changes. cheers andrew