Re: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values - Mailing list pgsql-bugs
From | Tom Lane |
---|---|
Subject | Re: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values |
Date | |
Msg-id | 9397.1502049651@sss.pgh.pa.us Whole thread Raw |
In response to | Re: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: [BUGS] Crash report for some ICU-52 (debian8) COLLATE andwork_mem values
Re: [BUGS] Crash report for some ICU-52 (debian8) COLLATE andwork_mem values |
List | pgsql-bugs |
I wrote: > "Daniel Verite" <daniel@manitou-mail.org> writes: >> If someone wants to reproduce the problem, I've made >> a custom dump (40MB) available at >> http://www.manitou-mail.org/vrac/words_test.dump > Thanks for providing this test data. I've attempted, so far > unsuccessfully, to reproduce the crash. OK, so after installing ICU 52.1 from upstream sources, I can reproduce these problems, and I think there's nothing to see here except ICU bugs. In the case where Daniel reports huge core dumps, the problem seems to be that ucol_strcollRegular() gets into an infinite loop here: // We fetch CEs until we hit a non ignorable primary or end. uint32_t sPrimary; do { // We get the next CE sOrder = ucol_IGetNextCE(coll, sColl, status); // Stuff it inthe buffer UCOL_CEBUF_PUT(&sCEs, sOrder, sColl, status); // And keep just the primary part. sPrimary = sOrder & UCOL_PRIMARYMASK; } while(sPrimary == 0); It keeps on stuffing tokens into "sCEs", which is an expansible buffer in the same spirit as our StringInfos, and once that gets past 2^31 tokens all hell breaks loose, because the expansion code is entirely unconcerned about the possibility of integer overflow: void ucol_CEBuf_Expand(ucol_CEBuf *b, collIterate *ci, UErrorCode *status) { uint32_t oldSize; uint32_t newSize; uint32_t *newBuf; ci->flags |= UCOL_ITER_ALLOCATED; oldSize = (uint32_t)(b->pos - b->buf); newSize = oldSize * 2; newBuf = (uint32_t*)uprv_malloc(newSize * sizeof(uint32_t)); if(newBuf == NULL) { *status = U_MEMORY_ALLOCATION_ERROR; } else { uprv_memcpy(newBuf, b->buf, oldSize * sizeof(uint32_t)); ... The buffer size seems to always be a power of 2. After enough iterations oldSize is 2^31, so newSize overflows to zero, and it allocates a zero-length newBuf and proceeds to copy ~8GB of data there. No surprise that we get a SIGSEGV, but before that happens it'll have trashed a ton of memory. I suspect whatever problems valgrind is reporting are just false positives induced by that massive memory stomp. So there are two separate bugs there --- one is that that loop fails to terminate, for reasons that are doubtless specific to particular collation data, and the other is that the expansible-buffer code is not robust. The other class of failures amounts to this loop iterating till it falls off the end of memory: while(schar > (tchar = *(UCharOffset+offset))) { /* since the contraction codepoints should be ordered, we skip allthat are smaller */ offset++; } which is unsurprising, because (in my core dump) schar is 1113834 which is larger than any possible UChar value, so the loop cannot terminate except by crashing. The crash occurred while trying to process this string: buf2 = 0x1614d20 "requ\364\217\273\252te", and I do not think it's coincidence that that multibyte character there corresponds to U+10FEEA or decimal 1113834. Apparently they've got some bugs with dealing with characters beyond U+FFFF, at least in certain locales. In short then, ICU 52.1 is just too buggy to contemplate using. I haven't compared versions to see when these issues were introduced or fixed. But I'm now thinking that trying to support old ICU versions is a mistake, and we'd better serve our users by telling them not to use ICU before some-version-after-52. regards, tom lane -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
pgsql-bugs by date: