Thread: sortsupport for text
I decided to investigate the possible virtues of allowing "text" to use the sortsupport infrastructure, since strings are something people often want to sort. I generated 100,000 random alphanumeric strings, each 30 characters in length, loaded them into a single-column table, froze it, ran SELECT SUM(1) FROM tablename on it until it was fully cached, and then repeatedly quicksorted the table contents using my default locale (en_US.UTF8). I repeated this test a number of times, removing and recreating the data directory via initdb each time. The test was performed on my home desktop, which is running Fedora 14 (yeah, I know I should reinstall) and equipped with an AMD Athlon 5000 Dual-Core Processor. Here's the exact test query:

SELECT SUM(1) FROM (SELECT * FROM randomtext ORDER BY t) x;

On unpatched master, this takes about 416 ms (plus or minus a few). With the attached patch, it takes about 389 ms (plus or minus a very few), a speedup of about 7%.

I repeated the experiment using the C locale, like this:

SELECT SUM(1) FROM (SELECT * FROM randomtext ORDER BY t COLLATE "C") x;

Here, it takes about 202 ms with the patch, and about 231 ms on unpatched master, a savings of about 13%.

I also tried a larger data set of 5 million strings, with a heap sort using work_mem=256MB. Unfortunately, there was so much noise there that it was hard to get any useful measurements: the exact same code base, using the exact same test script (which started with an initdb), could perform quite differently on consecutive runs, perhaps because the choice of blocks chosen to contain the database itself affected the efficiency of reading and writing temporary files. I think it may be faster, and the results on the smaller data set argue that it should be faster, but I was unable to gather reproducible numbers. I did observe the following oprofile results on a run on this larger data set, on master:

12789  28.2686  libc-2.13.so  strcoll_l
 6802  15.0350  postgres      text_cmp
 5081  11.2310  postgres      comparetup_heap
 3510   7.7584  postgres      comparison_shim
 2892   6.3924  postgres      lc_collate_is_c
 2722   6.0167  no-vmlinux    /no-vmlinux
 2596   5.7382  postgres      varstr_cmp
 2517   5.5635  libc-2.13.so  __strlen_sse2
 2515   5.5591  libc-2.13.so  __memcpy_sse2
  968   2.1397  postgres      tuplesort_heap_siftup
  710   1.5694  postgres      bttextcmp
  664   1.4677  postgres      pg_detoast_datum_packed

Clearly, a lot of that is unnecessary. Doing lc_collate_is_c for every tuple is a complete waste; as is translating the collation OID to collate_t; this patch arranges to do those things just once per sort. The comparison_shim is also a waste. Considering all that, I had hoped for more like a 15-20% gain from this approach, but it didn't happen, I suppose because some of the instructions saved just resulted in more processor stalls. All the same, I'm inclined to think it's still worth doing.

I didn't attempt to handle the weirdness that is UTF-8 on Windows, since I don't develop on Windows. I thought when I wrote this code that I could just leave the comparator uninitialized and let the caller default to the shim implementation if the sort-support function didn't do anything. But I see now that PrepareSortSupportFromOrderingOp() feels that it's entitled to assume that the sort-support function will always fill in a comparator. Either that assumption needs to be changed, or the corresponding Windows code needs to be written, or the sort support function needs to call PrepareSortSupportComparisonShim() in this case.
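The core idea here - hoisting the collation decision out of the per-comparison path and into a once-per-sort setup step - can be illustrated outside PostgreSQL with a small standalone C sketch. The helper names and the "C"/"POSIX" check below are assumptions for illustration; this is not the patch code itself:

#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Per-comparison cost is now just the comparison itself. */
static int
cmp_strcoll(const void *a, const void *b)
{
    return strcoll(*(const char *const *) a, *(const char *const *) b);
}

static int
cmp_strcmp(const void *a, const void *b)
{
    return strcmp(*(const char *const *) a, *(const char *const *) b);
}

/*
 * Decide once, before sorting, whether the collation behaves like "C";
 * the per-tuple comparator then performs no collation lookups at all.
 * This is the sort of work the patch moves into the sortsupport setup
 * function instead of repeating it for every tuple.
 */
void
sort_strings(char **strings, size_t n, const char *collname)
{
    bool collate_is_c = (strcmp(collname, "C") == 0 ||
                         strcmp(collname, "POSIX") == 0);

    qsort(strings, n, sizeof(char *),
          collate_is_c ? cmp_strcmp : cmp_strcoll);
}

In the real code the comparator is also called directly rather than through the fmgr shim, which is the other per-comparison overhead visible in the profile above.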
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Mar 02, 2012 at 03:45:38PM -0500, Robert Haas wrote:
> SELECT SUM(1) FROM (SELECT * FROM randomtext ORDER BY t) x;
>
> On unpatched master, this takes about 416 ms (plus or minus a few).
> With the attached patch, it takes about 389 ms (plus or minus a very
> few), a speedup of about 7%.
>
> I repeated the experiment using the C locale, like this:
>
> SELECT SUM(1) FROM (SELECT * FROM randomtext ORDER BY t COLLATE "C") x;
>
> Here, it takes about 202 ms with the patch, and about 231 ms on
> unpatched master, a savings of about 13%.
>
> [oprofile report, further discussion]

Thanks for looking into this. Your patch is also a nice demonstration of sortsupport's ability to help with more than just fmgr overhead.

> Considering all that, I
> had hoped for more like a 15-20% gain from this approach, but it
> didn't happen, I suppose because some of the instructions saved just
> resulted in more processor stalls. All the same, I'm inclined to
> think it's still worth doing.

This is a border case, but I suggest that a 13% speedup on a narrowly-tailored benchmark, degrading to 7% in common configurations, is too meager to justify adopting this patch.

nm
Noah Misch <noah@leadboat.com> wrote:
> On Fri, Mar 02, 2012 at 03:45:38PM -0500, Robert Haas wrote:
>> SELECT SUM(1) FROM (SELECT * FROM randomtext ORDER BY t) x;
>> [13% faster with patch for C collation; 7% faster for UTF8]
>> I had hoped for more like a 15-20% gain from this approach, but
>> it didn't happen, I suppose because some of the instructions
>> saved just resulted in more processor stalls. All the same, I'm
>> inclined to think it's still worth doing.
>
> This is a border case, but I suggest that a 13% speedup on a
> narrowly-tailored benchmark, degrading to 7% in common
> configurations, is too meager to justify adopting this patch.

We use the C collation and have character strings in most indexes and ORDER BY clauses. Unless there are significant contraindications, I'm in favor of adopting this patch.

-Kevin
On Fri, Mar 2, 2012 at 8:45 PM, Robert Haas <robertmhaas@gmail.com> wrote: > 12789 28.2686 libc-2.13.so strcoll_l > 6802 15.0350 postgres text_cmp I'm still curious how it would compare to call strxfrm and sort the resulting binary blobs. I don't think the sortsupport stuff actually makes this any easier though. Since using it requires storing the binary blob somewhere I think the support would have to be baked into tuplesort (or hacked into the sortkey as an expr that was evaluated earlier somehow). It's a tradeoff and not an obvious one. The binary blobs are larger and it would mean reading and copying more data around memory. But it would mean doing the work that strcoll_l does only n times instead of nlogn times. That might be a pretty significant gain. -- greg
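For reference, the transform-then-compare pattern Greg is describing looks roughly like the following standalone C sketch. It is not tied to tuplesort, buffer handling is simplified, and the helper names are invented for illustration:

#include <stdlib.h>
#include <string.h>

/*
 * Transform each string once with strxfrm(); after that, plain strcmp()
 * on the transformed blobs gives the same ordering as strcoll() on the
 * originals.  The O(n) transforms replace the O(n log n) strcoll() calls
 * a comparison sort would otherwise make.
 */
static char *
make_sort_key(const char *s)
{
    size_t  needed = strxfrm(NULL, s, 0);   /* ask for the required size */
    char   *key = malloc(needed + 1);

    if (key != NULL)
        strxfrm(key, s, needed + 1);
    return key;
}

static int
cmp_keys(const void *a, const void *b)
{
    return strcmp(*(char *const *) a, *(char *const *) b);
}

One would build an array of keys with make_sort_key() up front and qsort() that array with cmp_keys(); the space cost of keeping every transformed blob resident for the whole sort is exactly the trade-off discussed in the replies below.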
Greg Stark <stark@mit.edu> writes: > I'm still curious how it would compare to call strxfrm and sort the > resulting binary blobs. In principle that should be a win; it's hard to believe that strxfrm would have gotten into the standards if it were not a win for sorting applications. > I don't think the sortsupport stuff actually > makes this any easier though. Since using it requires storing the > binary blob somewhere I think the support would have to be baked into > tuplesort (or hacked into the sortkey as an expr that was evaluated > earlier somehow). Well, obviously something has to be done, but I think it might be possible to express this as another sortsupport API function rather than doing anything as ugly as hardwiring strxfrm into the callers. However, it occurred to me that we could pretty easily jury-rig something that would give us an idea about the actual benefit available here. To wit: make a C function that wraps strxfrm, basically strxfrm(text) returns bytea. Then compare the performance of ORDER BY text_col to ORDER BY strxfrm(text_col). (You would need to have either both or neither of text and bytea using the sortsupport code paths for this to be a fair comparison.) One other thing I've always wondered about in this connection is the general performance of sorting toasted datums. Is it better to detoast them in every comparison, or pre-detoast to save comparison cycles at the cost of having to push much more data around? I didn't see any discussion of this point in Robert's benchmarks, but I don't think we should go very far towards enabling sortsupport for text until we understand the issue and know whether we need to add more infrastructure for it. If you cross your eyes a little bit, this is very much like the strxfrm question... regards, tom lane
On Sat, Mar 17, 2012 at 6:58 PM, Greg Stark <stark@mit.edu> wrote:
> On Fri, Mar 2, 2012 at 8:45 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> 12789 28.2686 libc-2.13.so strcoll_l
>> 6802 15.0350 postgres text_cmp
>
> I'm still curious how it would compare to call strxfrm and sort the
> resulting binary blobs. I don't think the sortsupport stuff actually
> makes this any easier though. Since using it requires storing the
> binary blob somewhere I think the support would have to be baked into
> tuplesort (or hacked into the sortkey as an expr that was evaluated
> earlier somehow).

Well, the real problem here is that the strxfrm'd representations aren't just bigger - they are huge. On MacBook Pro, if the input representation is n characters, the strxfrm'd representation is 9n+3 characters. If we're sorting very wide tuples of which the sort key is only a small part, maybe that would be acceptable, but if the sort key makes up the bulk of the tuple, then caching the strxfrm() representation works out to slashing work_mem tenfold. That might be just fine if the sort is going to fit in work_mem either way, but if it turns a quicksort into a heap sort then I feel pretty confident that it's going to be a loser. Keep in mind that even if the strxfrm'd representation were no larger at all, it would still amount to an additional copy of the data, so you'd still potentially be eating up lots of work_mem that way. The fact that it's an order of magnitude larger is just making a bad problem worse.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sun, Mar 18, 2012 at 11:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > However, it occurred to me that we could pretty easily jury-rig > something that would give us an idea about the actual benefit available > here. To wit: make a C function that wraps strxfrm, basically > strxfrm(text) returns bytea. Then compare the performance of > ORDER BY text_col to ORDER BY strxfrm(text_col). > > (You would need to have either both or neither of text and bytea > using the sortsupport code paths for this to be a fair comparison.) Since the index will be ~9x bigger at least on this machine, I think I know the answer, but I suppose it doesn't hurt to test it. It's not that much work. > One other thing I've always wondered about in this connection is the > general performance of sorting toasted datums. Is it better to detoast > them in every comparison, or pre-detoast to save comparison cycles at > the cost of having to push much more data around? I didn't see any > discussion of this point in Robert's benchmarks, but I don't think we > should go very far towards enabling sortsupport for text until we > understand the issue and know whether we need to add more infrastructure > for it. If you cross your eyes a little bit, this is very much like > the strxfrm question... It would be surprising to me if there is a one-size-fits-all answer to this question. For example, if the tuple got toasted because it's got lots of columns and we had to take desperate measures to get it to fit into an 8kB block at all, chances are that detoasting will work out well. We'll use a bit more memory, but hopefully that'll be repaid by much faster comparisons. OTOH, if you have a data set with many relatively short strings and a few very long ones, detoasting up-front could turn a quicksort into a heapsort. Since only a small fraction of the comparisons would have involved one of the problematic long strings anyway, it's unlikely to be worth the expense of keeping those strings around in detoasted form for the entire sort (unless maybe reconstructing them even a few times is problematic because we're under heavy cache pressure and we get lots of disk seeks as a result). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Mar 19, 2012 at 12:19:53PM -0400, Robert Haas wrote:
> On Sat, Mar 17, 2012 at 6:58 PM, Greg Stark <stark@mit.edu> wrote:
> > I'm still curious how it would compare to call strxfrm and sort the
> > resulting binary blobs. I don't think the sortsupport stuff actually
> > makes this any easier though. Since using it requires storing the
> > binary blob somewhere I think the support would have to be baked into
> > tuplesort (or hacked into the sortkey as an expr that was evaluated
> > earlier somehow).
>
> Well, the real problem here is that the strxfrm'd representations
> aren't just bigger - they are huge. On MacBook Pro, if the input
> representation is n characters, the strxfrm'd representation is 9n+3
> characters.

Ouch. I was holding out hope that you could get a meaningful improvement if we could use the first X bytes of the strxfrm output, so you only need to do a strcoll on strings that actually nearly match. But with an information density of 9 bytes for one character it doesn't seem worthwhile.

That, and this gem in the strxfrm manpage:

    RETURN VALUE
        The strxfrm() function returns the number of bytes required to
        store the transformed string in dest, excluding the terminating
        '\0' character. If the value returned is n or more, the contents
        of dest are indeterminate.

Which means that you have to take the entire transformed string; you can't just ask for the first bit. I think that kind of leaves the whole idea dead in the water.

Just for interest I looked at the ICU API for this and they have the same restriction. There is another function which you can use to return partial sort keys (ucol_nextSortKeyPart), but it produces "uncompressed sortkeys", which it seems is what Mac OS X is doing, and which seems useless for our purposes.

Either this is a hard problem or we're nowhere near a target use case.

Have a nice day,
-- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts. -- Arthur Schopenhauer
On Mon, Mar 19, 2012 at 9:23 PM, Martijn van Oosterhout <kleptog@svana.org> wrote: > Ouch. I was holding out hope that you could get a meaningful > improvement if we could use the first X bytes of the strxfrm output so > you only need to do a strcoll on strings that actually nearly match. > But with an information density of 9 bytes for one 1 character it > doesn't seem worthwhile. When I was playing with glibc it was 4n. I think what they do is have n bytes for the high order bits, then n bytes for low order bits like capitalization or whitespace differences. I suspect they used to use 16 bits for each and have gone to some larger size. > That and this gem in the strxfrm manpage: > > RETURN VALUE > The strxfrm() function returns the number of bytes required to > store the transformed string in dest excluding the terminating > '\0' character. If the value returned is n or more, the > contents of dest are indeterminate. > > Which means that you have to take the entire transformed string, you > can't just ask for the first bit. I think that kind of leaves the whole > idea dead in the water. I believe the intended API is that you allocate a buffer with your guess of the right size, call strxfrm and if it returns a larger number you realloc your buffer and call it again. -- greg
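The retry pattern Greg describes - guess a size, call strxfrm(), and call it again with a bigger buffer if the return value says the guess was too small - looks roughly like this minimal sketch (plain C, function name invented, error handling kept to the essentials):

#include <stdlib.h>
#include <string.h>

/*
 * Transform "src" into a freshly allocated sort key, retrying once if
 * the initial guess at the buffer size turns out to be too small.
 */
static char *
xfrm_with_retry(const char *src, size_t guess)
{
    char   *buf = malloc(guess);
    size_t  needed;

    if (buf == NULL)
        return NULL;

    needed = strxfrm(buf, src, guess);
    if (needed >= guess)
    {
        /* Contents of buf are indeterminate; grow it and transform again. */
        char *bigger = realloc(buf, needed + 1);

        if (bigger == NULL)
        {
            free(buf);
            return NULL;
        }
        buf = bigger;
        strxfrm(buf, src, needed + 1);
    }
    return buf;
}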
On 18 March 2012 15:08, Tom Lane <tgl@sss.pgh.pa.us> wrote: > One other thing I've always wondered about in this connection is the > general performance of sorting toasted datums. Is it better to detoast > them in every comparison, or pre-detoast to save comparison cycles at > the cost of having to push much more data around? I didn't see any > discussion of this point in Robert's benchmarks, but I don't think we > should go very far towards enabling sortsupport for text until we > understand the issue and know whether we need to add more infrastructure > for it. If you cross your eyes a little bit, this is very much like > the strxfrm question... I see the parallels. I note that glibc's strcoll_l() is implemented entirely in C (strcoll() itself is implemented in terms of strcoll_l() ), whereas the various strcmp.S are written in hand-optimized assembler, with SSE3 instructions in the "Highly optimized version for x86-64", for example. I wonder just how important a factor that is. I suppose the reason why the glibc guys haven't just done something equivalent internally might be that they much prefer to perform the comparison in-place, due to the need to target a conservative lowest common denominator...or it could be because it just doesn't matter that much. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On Thu, Jun 14, 2012 at 11:36 AM, Peter Geoghegan <peter@2ndquadrant.com> wrote: > On 18 March 2012 15:08, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> One other thing I've always wondered about in this connection is the >> general performance of sorting toasted datums. Is it better to detoast >> them in every comparison, or pre-detoast to save comparison cycles at >> the cost of having to push much more data around? I didn't see any >> discussion of this point in Robert's benchmarks, but I don't think we >> should go very far towards enabling sortsupport for text until we >> understand the issue and know whether we need to add more infrastructure >> for it. If you cross your eyes a little bit, this is very much like >> the strxfrm question... > > I see the parallels. The problem with pre-detoasting to save comparison cycles is that you can now fit many, many fewer tuples in work_mem. There might be cases where it wins (for example, because the entire data set fits even after decompressing everything) but in most cases it seems like a loser. Also, my guess is that most values people sort by are pretty short, making this concern mostly academic. Suppose you are sorting a bunch of strings which might be either 100 characters in length or 1MB. If they're all 100 characters, you probably don't need to detoast. If they're all 1MB, you probably can't detoast without eating up a ton of memory (and even if you have it, this might not be the best use for it). If you have a mix, detoasting might be affordable provided that the percentage of long strings is small, but it's also not going to save you much, because if the percentage of long strings is small, then most comparisons will be between two short strings where we don't save anything anyway. All things considered, this seems to me to be aiming at a pretty narrow target, but maybe I'm just not thinking about it creatively enough. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 14 June 2012 17:35, Robert Haas <robertmhaas@gmail.com> wrote: > The problem with pre-detoasting to save comparison cycles is that you > can now fit many, many fewer tuples in work_mem. There might be cases > where it wins (for example, because the entire data set fits even > after decompressing everything) but in most cases it seems like a > loser. If I had to guess, I'd say you're probably right about that - optimising sorting toasted text doesn't seem like a terribly sensible use of your time. What about the strxfrm suggestion of Greg's? You might find that the added benefit of being able to avail of a highly optimised strcmp() tipped the balance in favour of that idea, beyond the simple fact that there's only a linear number of what you might loosely call "strcoll_l units of work" rather than as many as O(n ^ 2). Furthermore, I'd speculate that if you were to interlace the strxfrm() calls with copying each text string, somewhere like within a specialised datumCopy(), that would make the approach more efficient still, as you specify a location for the blob in the just-palloc()'d leading-key private memory directly, rather than just using memcpy. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On 2 March 2012 20:45, Robert Haas <robertmhaas@gmail.com> wrote:
> I decided to investigate the possible virtues of allowing "text" to
> use the sortsupport infrastructure, since strings are something people
> often want to sort.

I should mention up-front that I agree with the idea that it is worth optimising text sorting because it is a very common thing to have to do, and therefore the standard for inclusion ought to be lower. I don't intend to talk about tapesort though - that isn't really fair, not least because I have some serious doubts about the quality of our implementation. Furthermore, I think that it is logical that doing things like resolving collations occur within a preparatory function in advance of sorting, rather than redundantly doing that for each and every comparison.

Why have you made the reusable buffer managed by sortsupport TEXTBUFLEN-aligned? The existing rationale for that constant (whose value is 1024) does not seem to carry forward here:

 * This should be large enough that most strings will fit, but small
 * enough that we feel comfortable putting it on the stack.

ISTM it would be on average worth the hit of having to repalloc a few more times for larger strings by making that buffer much smaller initially, and doubling its size each time that proved insufficient, rather than increasing its size to the smallest possible TEXTBUFLEN-aligned size that you can get away with for the immediately subsequent memcpy. Realistically, any database I've ever worked with would probably be able to fit a large majority of its text strings into 16 chars of memory - you yourself said that sorting toasted text isn't at all common.

-- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On Thu, Jun 14, 2012 at 1:56 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote: > On 14 June 2012 17:35, Robert Haas <robertmhaas@gmail.com> wrote: >> The problem with pre-detoasting to save comparison cycles is that you >> can now fit many, many fewer tuples in work_mem. There might be cases >> where it wins (for example, because the entire data set fits even >> after decompressing everything) but in most cases it seems like a >> loser. > > If I had to guess, I'd say you're probably right about that - > optimising sorting toasted text doesn't seem like a terribly sensible > use of your time. > > What about the strxfrm suggestion of Greg's? You might find that the > added benefit of being able to avail of a highly optimised strcmp() > tipped the balance in favour of that idea, beyond the simple fact that > there's only a linear number of what you might loosely call "strcoll_l > units of work" rather than as many as O(n ^ 2). Furthermore, I'd > speculate that if you were to interlace the strxfrm() calls with > copying each text string, somewhere like within a specialised > datumCopy(), that would make the approach more efficient still, as you > specify a location for the blob in the just-palloc()'d leading-key > private memory directly, rather than just using memcpy. Well, it's still got the problem of blowing up memory usage. I just can't get excited about optimizing for the case where we can consume 10x the memory and still fit in work_mem. If we've got that case, the sort is gonna be pretty fast anyway. The case where preprocessing wins is when there are going to be a large number of comparisons against each tuple - i.e. lg(N) is large. But the cases where we could pre-transform with strxfrm are those where the data fits in a small percentage of work mem - i.e. lg(N) is small. I'm open to somebody showing up with a test result that demonstrates that it's worthwhile, but to me it seems like it's chasing diminishing returns. The point of this patch isn't really to improve things for the collation-aware case, although it's nice that it does. The point is rather to shave off a double-digit percentage off the time it takes to do the sort required to build a C-collation index, which is what people should be using when they don't care about < and >, which most don't. Despite Tom's concerns, I don't think there's anything in this patch that can't be fairly easily revised at a later date if we decide we want a different API. I think it's worth picking the low-hanging fruit in the meantime. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Jun 14, 2012 at 2:10 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
> Why have you made the reusable buffer managed by sortsupport
> TEXTBUFLEN-aligned? The existing rationale for that constant (whose
> value is 1024) does not seem to carry forward here:
>
> * This should be large enough that most strings will fit, but small
> * enough that we feel comfortable putting it on the stack.

Well, as the comments explain:

+	/*
+	 * We avoid repeated palloc/pfree on long strings by keeping the buffers
+	 * we allocate around for the duration of the sort.  When we expand them,
+	 * we round off to the next multiple of TEXTBUFLEN in order to avoid
+	 * repeatedly expanding them by very small amounts.
+	 */

> ISTM it would be on average worth the hit of having to repalloc a few
> more times for larger strings by making that buffer much smaller
> initially, and doubling its size each time that proved insufficient,
> rather than increasing its size to the smallest possible
> TEXTBUFLEN-aligned size that you can get away with for the immediately
> subsequent memcpy. Realistically, any database I've ever worked with
> would probably be able to fit a large majority of its text strings
> into 16 chars of memory - you yourself said that sorting toasted text
> isn't at all common.

I thought that doubling repeatedly would be overly aggressive in terms of memory usage. Blowing the buffers out to 8kB because we hit a string that's a bit over 4kB isn't so bad, but blowing them out to 256MB because we hit a string that's a bit over 128MB seems a bit excessive. Also, I don't think it really saves anything from a performance point of view.

The worst case for this is if we're asked to repeatedly compare strings that expand the buffer by a kilobyte each time. First, how likely is that to happen in a real world test case? In many cases, most of the input strings will be of approximately equal length; also in many cases, that length will be short; even if the lengths take the worst possible distribution, we have to hit them in the worst possible order for this to be a problem. Second, even if it does happen, does it really matter? Suppose we expand the buffer a kilobyte at a time from an initial size of 1kB all the way out to 256MB. That's 256,000 palloc calls, so we must be sorting at least 256,000 datums, at least 128,000 of which are longer than 128MB. I think the cost of calling memcpy() and strcoll() repeatedly on all those long datums - not to mention repeatedly detoasting them - is going to bludgeon the palloc overhead into complete insignificance.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
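To make the policy Robert is describing concrete, here is a rough sketch of the growth rule; the function name and the use of malloc/free (rather than palloc/pfree) are simplifications for illustration, not the patch code:

#include <stdlib.h>

#define TEXTBUFLEN 1024

/*
 * Grow "buf" just enough to hold "needed" bytes, rounding the new size up
 * to the next multiple of TEXTBUFLEN so a run of slightly-longer strings
 * doesn't force an enlargement for every single tuple.
 */
static char *
ensure_capacity(char *buf, size_t *capacity, size_t needed)
{
    if (needed > *capacity)
    {
        size_t newcap = ((needed + TEXTBUFLEN - 1) / TEXTBUFLEN) * TEXTBUFLEN;

        free(buf);              /* old contents need not be preserved */
        buf = malloc(newcap);
        *capacity = newcap;
    }
    return buf;
}

Note that the old contents are simply discarded rather than copied, because each comparison copies the datums into the buffers afresh anyway; this is the point Robert makes below about there being no std::vector-style copying cost.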
On 14 June 2012 19:28, Robert Haas <robertmhaas@gmail.com> wrote: > I thought that doubling repeatedly would be overly aggressive in terms > of memory usage. Blowing the buffers out to 8kB because we hit a > string that's a bit over 4kB isn't so bad, but blowing them out to > 256MB because we hit a string that's a bit over 128MB seems a bit > excessive. That's pretty much what all popular dynamic array implementations do, from C++'s std::vector to Python's list (it's a misnomer). Having to allocate 256MB for a buffer to contain a string a bit over 128MB may seem excessive, until you later get a 250MB string. Even if doubling is generally excessive, which I doubt, that's beside the point, which is that expanding the array by some constant proportion results in each insertion taking amortized constant time. > Suppose we expand the buffer a > kilobyte at a time from an initial size of 1kB all the way out to > 256MB. That's 256,000 palloc calls, so we must be sorting at least > 256,000 datums, at least 128,000 of which are longer than 128MB. I > think the cost of calling memcpy() and strcoll() repeatedly on all > those long datums - not to mention repeatedly detoasting them - is > going to bludgeon the palloc overhead into complete insignificance. I fail to understand how this sortsupport buffer fundamentally differs from a generic dynamic array abstraction built to contain chars. That being the case, I see no reason not to just do what everyone else does when expanding dynamic arrays, and no reason why we shouldn't make essentially the same time-space trade-off here as others do elsewhere. Another concern is that it seems fairly pointless to have two buffers. Wouldn't it be more sensible to have a single buffer that was partitioned to make two logical, equally-sized buffers, given that in general each buffer is expected to grow at exactly the same rate? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
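The alternative Peter is advocating is the usual geometric expansion, roughly as follows (again a sketch with invented names, not patch code); doubling bounds the number of enlargements at lg(max length) and gives amortized constant cost per enlargement:

#include <stdlib.h>

/* Grow "buf" by doubling until it can hold "needed" bytes. */
static char *
ensure_capacity_doubling(char *buf, size_t *capacity, size_t needed)
{
    if (needed > *capacity)
    {
        size_t newcap = (*capacity > 0) ? *capacity : 16;

        while (newcap < needed)
            newcap *= 2;

        free(buf);              /* contents still need not be preserved */
        buf = malloc(newcap);
        *capacity = newcap;
    }
    return buf;
}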
On Thu, Jun 14, 2012 at 3:24 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
> On 14 June 2012 19:28, Robert Haas <robertmhaas@gmail.com> wrote:
>> I thought that doubling repeatedly would be overly aggressive in terms
>> of memory usage. Blowing the buffers out to 8kB because we hit a
>> string that's a bit over 4kB isn't so bad, but blowing them out to
>> 256MB because we hit a string that's a bit over 128MB seems a bit
>> excessive.
>
> That's pretty much what all popular dynamic array implementations do,
> from C++'s std::vector to Python's list (it's a misnomer). Having to
> allocate 256MB for a buffer to contain a string a bit over 128MB may
> seem excessive, until you later get a 250MB string. Even if doubling
> is generally excessive, which I doubt, that's beside the point, which
> is that expanding the array by some constant proportion results in
> each insertion taking amortized constant time.

Yeah, but *it doesn't matter*. If you test this on strings that are long enough that they get pushed out to TOAST, you'll find that it doesn't measurably improve performance, because the overhead of detoasting so completely dominates any savings on the palloc side that you can't pick them out of the inter-run noise. Risking eating up an extra 100MB of memory that we don't really need in order to obtain a performance optimization that is far too small to measure does not make sense.

The case with std::vector is not analogous; they don't have any way of knowing what other overhead you are incurring between insertions into the vector, so it's reasonable to suppose that the cost of the vector insertions themselves might be material. Here we know that it doesn't matter, so the application of Knuth's first law of optimization is appropriate.

>> Suppose we expand the buffer a
>> kilobyte at a time from an initial size of 1kB all the way out to
>> 256MB. That's 256,000 palloc calls, so we must be sorting at least
>> 256,000 datums, at least 128,000 of which are longer than 128MB. I
>> think the cost of calling memcpy() and strcoll() repeatedly on all
>> those long datums - not to mention repeatedly detoasting them - is
>> going to bludgeon the palloc overhead into complete insignificance.
>
> I fail to understand how this sortsupport buffer fundamentally differs
> from a generic dynamic array abstraction built to contain chars. That
> being the case, I see no reason not to just do what everyone else does
> when expanding dynamic arrays, and no reason why we shouldn't make
> essentially the same time-space trade-off here as others do elsewhere.
>
> Another concern is that it seems fairly pointless to have two buffers.
> Wouldn't it be more sensible to have a single buffer that was
> partitioned to make two logical, equally-sized buffers, given that in
> general each buffer is expected to grow at exactly the same rate?

Sure, but it would be making the code more complicated in return for no measurable performance benefit. We generally avoid that.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 14 June 2012 20:32, Robert Haas <robertmhaas@gmail.com> wrote: > Yeah, but *it doesn't matter*. If you test this on strings that are > long enough that they get pushed out to TOAST, you'll find that it > doesn't measurably improve performance, because the overhead of > detoasting so completely dominates any savings on the palloc side that > you can't pick them out of the inter-run noise. That's probably true, but it's also beside the point. As recently as a few hours ago, you yourself said "my guess is that most values people sort by are pretty short, making this concern mostly academic". Why are you getting hung up on toasting now? > Here we know that it doesn't matter, so the application of Knuth's first law > of optimization is appropriate. I'm not advocating some Byzantine optimisation, or even something that could reasonably be described as an optimisation at all here. I'm questioning why you've unnecessarily complicated the code by having the buffer size just big enough to fit the biggest value seen so far, but arbitrarily aligned to a value that is completely irrelevant to bttextfastcmp_locale(), rather than using simple geometric expansion, which is more or less the standard way of managing the growth of a dynamic array. You have to grow the array in some way. The basic approach I've outlined has something to recommend it - why does it make sense to align the size of the buffer to TEXTBUFLEN in particular though? It's quite easy to imagine what you've done here resulting in an excessive number of allocations (and pfree()s), which *could* be expensive. If you're so conservative about allocating memory, don't grow the array at quite so aggressive a rate as doubling it each time. There is a trade-off between space and time to be made here, but I don't know why you think that the right choice is to use almost the smallest possible amount of memory in all cases. >> Another concern is that it seems fairly pointless to have two buffers. >> Wouldn't it be more sensible to have a single buffer that was >> partitioned to make two logical, equally-sized buffers, given that in >> general each buffer is expected to grow at exactly the same rate? > > Sure, but it would be making the code more complicated in return for > no measurable performance benefit. We generally avoid that. Fair enough. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On Thu, Jun 14, 2012 at 6:30 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
>> Here we know that it doesn't matter, so the application of Knuth's first law
>> of optimization is appropriate.
>
> I'm not advocating some Byzantine optimisation, or even something that
> could reasonably be described as an optimisation at all here. I'm
> questioning why you've unnecessarily complicated the code by having
> the buffer size just big enough to fit the biggest value seen so far,
> but arbitrarily aligned to a value that is completely irrelevant to
> bttextfastcmp_locale(), rather than using simple geometric expansion,
> which is more or less the standard way of managing the growth of a
> dynamic array.

Well, so... palloc, and most malloc implementations, don't actually just double every time forever. They double up to some relatively small number like 1MB, and then they expand in 1MB chunks. std::vector may well behave as you're describing, but that's doing something completely different. We're not accumulating a string of unknown length into a buffer, with the risk of having to copy the whole buffer every time we reallocate. We know the exact size of the buffer we need for any given string, so the cost of a pfree/palloc is only the cost of releasing and allocating memory; there is no additional copying cost, as there would be for std::vector.

Look at it another way. Right now, the worst case number of temporary buffer allocations is about 2N lg N, assuming we do about N lg N comparisons during a sort. If we simply implemented the most naive buffer reallocation algorithm possible, we might in the worst case reallocate each buffer up to N times. That is already better than what we do now by a factor of lg N. If we suppose N is about a million, that's an improvement of 20x *in the worst case* - typically, the improvement will be much larger, because all the strings will be of similar length or we won't hit them in exactly increasing order of length. The algorithm I've actually implemented bounds the worst case 1024 times more tightly - given N strings, we can't need to enlarge the buffer more than N/1024 times. So with N of around a million, this algorithm should eliminate *at least* 99.995% of the pallocs we currently do. How much better does it need to be?

> You have to grow the array in some way. The basic approach I've
> outlined has something to recommend it - why does it make sense to
> align the size of the buffer to TEXTBUFLEN in particular though?

There's nothing magic about TEXTBUFLEN as far as that goes. We could round to the nearest multiple of 8kB or whatever if that seemed better. But if we rounded to, say, the nearest 1MB, then someone sorting strings that are 2kB long would use 2MB of memory for these buffers. Considering that we ship with work_mem = 1MB, that could easily end up wasting almost twice work_mem. I don't see how you can view that as a trivial problem. It might be worth sucking that up if it seemed likely to dramatically improve performance, but it doesn't.

> It's
> quite easy to imagine what you've done here resulting in an excessive
> number of allocations (and pfree()s), which *could* be expensive.

Actually, I can't imagine that at all, per the above analysis. If I could, I might agree with you. :-)

> There is a trade-off between space and time to be made here, but I
> don't know why you think that the right choice is to use almost the
> smallest possible amount of memory in all cases.
Because I think that with the current implementation I can have my cake and eat it, too. I believe I've gotten essentially all the available speedup for essentially no memory wastage. If you can convince me that I've missed something and there is still a meaningful amount of palloc overhead left to be squeezed out, I'm all ears. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Peter Geoghegan <peter@2ndquadrant.com> writes: > On 14 June 2012 19:28, Robert Haas <robertmhaas@gmail.com> wrote: >> I thought that doubling repeatedly would be overly aggressive in terms >> of memory usage. > I fail to understand how this sortsupport buffer fundamentally differs > from a generic dynamic array abstraction built to contain chars. That > being the case, I see no reason not to just do what everyone else does > when expanding dynamic arrays, and no reason why we shouldn't make > essentially the same time-space trade-off here as others do elsewhere. I agree with Peter on this one; not only is double-each-time the most widespread plan, but it is what we do in just about every other place in Postgres that needs a dynamically expansible buffer. If you do it randomly differently here, readers of the code will be constantly stopping to wonder why it's different here and if that's a bug or not. (And from a performance standpoint, I'm not entirely convinced it's not a bug, anyway. Worst-case behavior could be pretty bad.) regards, tom lane
On Fri, Jun 15, 2012 at 12:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Peter Geoghegan <peter@2ndquadrant.com> writes: >> On 14 June 2012 19:28, Robert Haas <robertmhaas@gmail.com> wrote: >>> I thought that doubling repeatedly would be overly aggressive in terms >>> of memory usage. > >> I fail to understand how this sortsupport buffer fundamentally differs >> from a generic dynamic array abstraction built to contain chars. That >> being the case, I see no reason not to just do what everyone else does >> when expanding dynamic arrays, and no reason why we shouldn't make >> essentially the same time-space trade-off here as others do elsewhere. > > I agree with Peter on this one; not only is double-each-time the most > widespread plan, but it is what we do in just about every other place > in Postgres that needs a dynamically expansible buffer. If you do it > randomly differently here, readers of the code will be constantly > stopping to wonder why it's different here and if that's a bug or not. That could, of course, be addressed by adding a comment. > (And from a performance standpoint, I'm not entirely convinced it's not > a bug, anyway. Worst-case behavior could be pretty bad.) Instead of simply asserting that, could you respond to the specific points raised in my analysis? I think there's no way it can be bad. I am happy to be proven wrong, but I like to understand why it is that I am wrong before changing things. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Jun 15, 2012 at 12:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> (And from a performance standpoint, I'm not entirely convinced it's not >> a bug, anyway. Worst-case behavior could be pretty bad.) > Instead of simply asserting that, could you respond to the specific > points raised in my analysis? I think there's no way it can be bad. > I am happy to be proven wrong, but I like to understand why it is that > I am wrong before changing things. Maybe I missed something, but as far as I saw your argument was not that the performance wasn't bad but that the rest of the sort code would dominate the runtime anyway. I grant that entirely, but that doesn't mean that it's good for this piece of it to possibly have bad behavior. In any case, it seems to me that if the bottom line is that the performance of this piece of code isn't going to matter overall, then we might as well use the simplest, least surprising implementation we can. And I concur with Peter that a doubling-based approach meets that description, while what you've got here doesn't. regards, tom lane
On Fri, Jun 15, 2012 at 1:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Fri, Jun 15, 2012 at 12:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> (And from a performance standpoint, I'm not entirely convinced it's not >>> a bug, anyway. Worst-case behavior could be pretty bad.) > >> Instead of simply asserting that, could you respond to the specific >> points raised in my analysis? I think there's no way it can be bad. >> I am happy to be proven wrong, but I like to understand why it is that >> I am wrong before changing things. > > Maybe I missed something, but as far as I saw your argument was not that > the performance wasn't bad but that the rest of the sort code would > dominate the runtime anyway. I grant that entirely, but that doesn't > mean that it's good for this piece of it to possibly have bad behavior. That, plus the fact that not wasting memory in code paths where memory is at a premium seems important to me. I'm shocked that either of you think it's OK to overallocate by as much as 2X in a code path that's only going to be used when we're going through fantastic gyrations to make memory usage fit inside work_mem. The over-allocation by itself could easily exceed work_mem. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 15 June 2012 21:06, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Jun 15, 2012 at 1:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> On Fri, Jun 15, 2012 at 12:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>> (And from a performance standpoint, I'm not entirely convinced it's not >>>> a bug, anyway. Worst-case behavior could be pretty bad.) >> >>> Instead of simply asserting that, could you respond to the specific >>> points raised in my analysis? I think there's no way it can be bad. >>> I am happy to be proven wrong, but I like to understand why it is that >>> I am wrong before changing things. >> >> Maybe I missed something, but as far as I saw your argument was not that >> the performance wasn't bad but that the rest of the sort code would >> dominate the runtime anyway. I grant that entirely, but that doesn't >> mean that it's good for this piece of it to possibly have bad behavior. > > That, plus the fact that not wasting memory in code paths where memory > is at a premium seems important to me. I'm shocked that either of you > think it's OK to overallocate by as much as 2X in a code path that's > only going to be used when we're going through fantastic gyrations to > make memory usage fit inside work_mem. The over-allocation by itself > could easily exceed work_mem. That seems pretty thin to me. We're talking about a couple of buffers whose ultimate size is only approximately just big enough to hold the largest text datum seen when sorting. Meanwhile, if it's the leading key we're dealing with (and of course, it usually will be), before exceeding work_mem all of the *entire* set of strings to be sorted are sitting in palloc()'d memory anyway. I'm surprised that you didn't immediately concede the point, to be honest. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Jun 15, 2012 at 1:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Maybe I missed something, but as far as I saw your argument was not that
>> the performance wasn't bad but that the rest of the sort code would
>> dominate the runtime anyway. I grant that entirely, but that doesn't
>> mean that it's good for this piece of it to possibly have bad behavior.
>
> That, plus the fact that not wasting memory in code paths where memory
> is at a premium seems important to me. I'm shocked that either of you
> think it's OK to overallocate by as much as 2X in a code path that's
> only going to be used when we're going through fantastic gyrations to
> make memory usage fit inside work_mem. The over-allocation by itself
> could easily exceed work_mem.

I would be concerned about this if it were per-sort-tuple wastage, but what I understood to be happening was that this was a single instance of an expansible buffer (per sort, perhaps, but still just one buffer). And, as you keep pointing out when it suits your argument, it's relatively uncommon to be sorting enormous values anyway. So no, I am not particularly worried about that. If you are, there are more important places to be concerned about allocation pad wastage, starting with palloc itself.

regards, tom lane
On 18 March 2012 15:08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> However, it occurred to me that we could pretty easily jury-rig
> something that would give us an idea about the actual benefit available
> here. To wit: make a C function that wraps strxfrm, basically
> strxfrm(text) returns bytea. Then compare the performance of
> ORDER BY text_col to ORDER BY strxfrm(text_col).
>
> (You would need to have either both or neither of text and bytea
> using the sortsupport code paths for this to be a fair comparison.)

I thought this was an interesting idea, so decided to try it out for myself. I tried this out against master (not Robert's patch, per Tom's direction). The results were interesting:

[peter@peterlaptop strxfrm_test]$ pgbench postgres -T 60 -f sort_strxfrm.sql -n
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 2795
tps = 46.563970 (including connections establishing)
tps = 46.568234 (excluding connections establishing)

[peter@peterlaptop strxfrm_test]$ pgbench postgres -T 60 -f sort_reg.sql -n
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 2079
tps = 34.638838 (including connections establishing)
tps = 34.640665 (excluding connections establishing)

The first test executed the following query against the dellstore database:

select * from products order by strxfrm_test(actor) offset 10001;

The second:

select * from products order by actor offset 10001;

So, this was pretty good - an improvement that is completely independent of Robert's. Bear in mind, this simple demonstration adds additional fmgr overhead, which we have plenty of reason to believe could hurt things, besides which each call must allocate memory that could perhaps be avoided. In addition, I don't know enough about locale-aware sorting and related algorithms to have devised a test that would stress strxfrm()/strcoll() - these were all strings that could be represented as ASCII.

In light of this, I think there is a pretty strong case to be made for pre-processing text via strxfrm() as part of this patch. Thoughts?

-- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
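Peter's email does not include the wrapper itself, but following Tom's description, a plausible sketch of such a strxfrm_test(text) returns bytea function might look like the following. This is an assumption about how the attached test function could be written, not the actual attachment:

#include "postgres.h"
#include "fmgr.h"
#include "utils/builtins.h"
#include <string.h>

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(strxfrm_test);

/* strxfrm_test(text) returns bytea: the strxfrm() image of the input. */
Datum
strxfrm_test(PG_FUNCTION_ARGS)
{
	text	   *src = PG_GETARG_TEXT_PP(0);
	char	   *str = text_to_cstring(src);
	size_t		needed = strxfrm(NULL, str, 0);
	bytea	   *result = (bytea *) palloc(VARHDRSZ + needed + 1);

	strxfrm(VARDATA(result), str, needed + 1);
	SET_VARSIZE(result, VARHDRSZ + needed);
	pfree(str);
	PG_RETURN_BYTEA_P(result);
}

With a function like this declared via CREATE FUNCTION ... LANGUAGE C IMMUTABLE STRICT, the two pgbench scripts would simply contain the two queries shown above, so that bytea comparisons on the transformed blobs stand in for collation-aware text comparisons.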
The fly in the ointment for strxfrm() adoption may be the need to be consistent with this earlier behaviour:

commit 656beff59033ccc5261a615802e1a85da68e8fad
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Thu Dec 22 22:50:00 2005 +0000

    Adjust string comparison so that only bitwise-equal strings are considered
    equal: if strcoll claims two strings are equal, check it with strcmp, and
    sort according to strcmp if not identical.  This fixes inconsistent
    behavior under glibc's hu_HU locale, and probably under some other locales
    as well.  Also, take advantage of the now-well-defined behavior to speed
    up texteq, textne, bpchareq, bpcharne: they may as well just do a bitwise
    comparison and not bother with strcoll at all.

    NOTE: affected databases may need to REINDEX indexes on text columns to be
    sure they are self-consistent.

Here is the relevant code:

    /*
     * In some locales strcoll() can claim that nonidentical strings are
     * equal.  Believing that would be bad news for a number of reasons,
     * so we follow Perl's lead and sort "equal" strings according to
     * strcmp().
     */
    if (result == 0)
        result = strcmp(a1p, a2p);

I'm not sure I agree with this decision; why should we presume to know better than the glibc locale what constitutes equality? What are the number of reasons referred to? It seems very likely that the main one was the then-need to guard against poor quality qsort() implementations that went quadratic in the face of lots of duplicates, but we already removed a bunch of other such hacks, because of course we now control the qsort implementation used, and have since the year after this commit was made, 2006.

Obviously this decision was made a number of years ago now, and at least one person went on to rely on this behaviour, so it can only be revisited with that in mind. However, provided we are able to say "here is a compatibility ordering operator" to those that complain about this, and provided it is appropriately listed as a compatibility issue in the 9.3 release notes, I think it would be worth reverting this commit to facilitate strxfrm(). How many people: A) are using hu_HU or some other locale where this can happen? and B) will care?

Now, I'm sure that there is another workaround too, so this doesn't need to be a blocker even if it is absolutely unacceptable to revert - but I have to wonder if that's worth it. People don't have any business relying on a sort order that is consistent in any way other than the one they actually asked for. A few people still do even as we go blue in the face telling them not to, of course, but that's fairly normal.

-- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
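The behaviour in question boils down to a comparator of roughly this shape; this is a simplified sketch of what varstr_cmp does after copying the datums into NUL-terminated buffers, not the actual function:

#include <string.h>

/*
 * Compare two NUL-terminated strings in the current LC_COLLATE locale,
 * falling back to a bitwise comparison when the locale claims they are
 * "equal", so that only bitwise-identical strings compare as equal.
 */
static int
locale_cmp_with_tiebreak(const char *a, const char *b)
{
    int result = strcoll(a, b);

    if (result == 0)
        result = strcmp(a, b);   /* the 2005 tiebreak being questioned here */
    return result;
}

Dropping that strcmp() tiebreak is effectively what a pure strxfrm()-based sort would do, which is why the compatibility question above arises.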
Peter Geoghegan <peter@2ndquadrant.com> writes: > The fly in the ointment for strxfrm() adoption may be the need to be > consistent with this earlier behaviour: > if strcoll claims two strings are equal, check it with strcmp, and > sort according to strcmp if not identical. > I'm not sure I agree with this decision; why should we presume to know > better than the glibc locale what constitutes equality? The killer reason why it must be like that is that you can't use hash methods on text if text equality is some unknown condition subtly different from bitwise equality. My recollection is that there were some other problems as well, but I'm too lazy to search the archives for you. > It's seems very likely that the main > one was the then-need to guard against poor quality qsort() > implementations that went quadratic in the face of lots of duplicates, No, I don't recall that that had anything to do with it. regards, tom lane
On 17 June 2012 17:01, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I'm not sure I agree with this decision; why should we presume to know >> better than the glibc locale what constitutes equality? > > The killer reason why it must be like that is that you can't use hash > methods on text if text equality is some unknown condition subtly > different from bitwise equality. Fair enough, but I doubt that we need to revert the changes made in this commit to texteq in addition to the changes I'd like to see in order to be semantically self-consistent. That is because there is often a distinction made between equality and equivalence, and we could adopt this distinction. strcoll() could be said to be just making a representation that its two arguments are equivalent (and not necessarily equal) when it returns 0. This distinction is explicitly made in the C++ standard library, and failing to understand it can result in bugs: http://www.cplusplus.com/reference/algorithm/equal_range/ Note the use of the word "equivalent" rather than "equal" in the text. equal_range is a bit of a misnomer. This distinction is important enough to have warranted an entire subsection of the book "Effective STL" by Scott Meyers, a well-respected expert on the language. This comes up more often than you'd think - "std::set::insert" determines if an element already exists (to know if it must replace it) based on equivalency (usually, though not necessarily, defined in terms of operator< ), whereas the "find" algorithm finds elements based on equality (operator==). > My recollection is that there were > some other problems as well, but I'm too lazy to search the archives > for you. Fair enough. I'll search for it myself later. I'm about to head out now. >> It's seems very likely that the main >> one was the then-need to guard against poor quality qsort() >> implementations that went quadratic in the face of lots of duplicates, > > No, I don't recall that that had anything to do with it. Oh, okay. It looked very much like the "avoid equality at all costs" thing you still see some of in tuplesort.c . -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Peter Geoghegan <peter@2ndquadrant.com> writes: > On 17 June 2012 17:01, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The killer reason why it must be like that is that you can't use hash >> methods on text if text equality is some unknown condition subtly >> different from bitwise equality. > Fair enough, but I doubt that we need to revert the changes made in > this commit to texteq in addition to the changes I'd like to see in > order to be semantically self-consistent. That is because there is > often a distinction made between equality and equivalence, and we > could adopt this distinction. How exactly do you plan to shoehorn that into SQL? You could invent some nonstandard "equivalence" operator I suppose, but what will be the value? We aren't going to set things up in such a way that we can't use hash join or hash aggregation in queries that use the regular "=" operator. IMO there just aren't going to be enough people who care to use a non-default operator. regards, tom lane
On Jun 17, 2012 5:50 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>
> Peter Geoghegan <peter@2ndquadrant.com> writes:
> > On 17 June 2012 17:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> How exactly do you plan to shoehorn that into SQL? You could invent
> some nonstandard "equivalence" operator I suppose, but what will be the
> value? We aren't going to set things up in such a way that we can't
> use hash join or hash aggregation in queries that use the regular "="
> operator.

Right, most people won't care. You may or may not want a new operator for equivalency. The regular operator for equality doesn't have to and shouldn't change. It is both useful and conceptually clean to not guarantee that a comparator can be relied upon to indicate equality and not just equivalency.
Peter Geoghegan <peter@2ndquadrant.com> writes: > Right, most people won't care. You may or may not want a new > operator for equivalency. The regular operator for equality doesn't have to > and shouldn't change. It is both useful and conceptually clean to not > guarantee that a comparator can be relied upon to indicate equality and not > just equivalency. Sure, and in general we only expect that "=" operators mean equivalency; a concrete example is float8 "=", which on IEEE-spec machines will say that zero and minus zero are equal. The trick for hashing such datatypes is to be able to guarantee that "equal" values hash to the same hash code, which is typically possible as long as you know the equality rules well enough. We could possibly do that for text with pure-strcoll equality if we knew all the details of what strcoll would consider "equal", but we do not. See also citext for an example of a datatype where we can manage to treat distinct textual values as equal and still hash them. regards, tom lane
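[A small C illustration of the hashing requirement described here, in the style of citext (case-insensitive equality): whatever normalisation the equality rule applies must also be applied before hashing, so "equal" values cannot hash differently. hash_bytes() is a stand-in for whichever hash function is actually used, and the helper name is made up:]

#include <ctype.h>
#include <stdlib.h>
#include <string.h>

extern unsigned int hash_bytes(const unsigned char *k, int keylen); /* assumed helper */

static unsigned int
case_insensitive_hash(const char *s)
{
    size_t        len = strlen(s);
    char         *folded = malloc(len + 1);
    unsigned int  h;
    size_t        i;

    /* Normalise exactly the way the equality rule does (here: case folding). */
    for (i = 0; i <= len; i++)
        folded[i] = (char) tolower((unsigned char) s[i]);

    h = hash_bytes((const unsigned char *) folded, (int) len);
    free(folded);
    return h;
}

The analogous trick for pure-strcoll equality would require knowing precisely which strings the locale considers equal, which is the missing piece.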
On 17 June 2012 21:26, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Sure, and in general we only expect that "=" operators mean equivalency; > a concrete example is float8 "=", which on IEEE-spec machines will say > that zero and minus zero are equal. Right; the spec says that, and we punt to the spec. No one sensible thinks that minus zero and zero floats are not equal, by virtue of the fact that if you compare them in C using ==, it evaluates to true. Equivalency as I use the term can't really be talked about without talking about sorting or less than operators. > The trick for hashing such datatypes is to be able to guarantee that > "equal" values hash to the same hash code, which is typically possible > as long as you know the equality rules well enough. We could possibly > do that for text with pure-strcoll equality if we knew all the details > of what strcoll would consider "equal", but we do not. I see. I am tentatively suggesting that we don't change the definition of equal, but allow that equivalent text values may not be equal. > See also citext for an example of a datatype where we can manage to > treat distinct textual values as equal and still hash them. I'm not talking about textually distinct values being equal/equivalent. That doesn't quite capture it (although I suppose a corollary of what I am trying to express is that equivalent though non-equal values should invariably have different textual representations in Postgres). I think a good example that illustrates the difference between equivalency and equality is the German ß character (esszet), which is regarded as equivalent to 'ss' (two 's' characters). The following German words are sorted correctly as required by de_DE.UTF-8:

Arg
Ärgerlich
Arm
Assistant
Aßlar
Assoziation

So if you take the word "Aßlar" here - that is equivalent to "Asslar", and so strcoll("Aßlar", "Asslar") will return 0 if you have the right LC_COLLATE (if you tried this out for yourself and found that I was actually lying through my teeth, pretend I said Hungarian instead of German and "some really obscure character" rather than ß). It seems a bit unsatisfactory to me that a unique constraint will happily accept these two equivalent strings because they're not bitwise identical, when the collation was supposed to have that normalisation baked in. At the same time, I find it intuitively obvious that "Aßlar" == "Asslar" is false, and at the very least I think that very few people would not grant that it is a not unreasonable state of affairs for both of those two things to hold, at least on the face of it (presuming that I wasn't actually lying about the particulars of this, which I was, since this is apparently confined to less popular locales and I'm too lazy to research the exact details). Now, I do realise that there is what might appear to be a tension in what I'm saying; it is precisely the fact that we can traverse a btree index using comparators that allows a btree to satisfy an equality condition (or an equivalency condition; however you choose to characterise whatever it is that the '=' operator does). To restate the problem: The '=' operator implies equivalency and not equality. Or does it? Look at this code from nbtutils.c's _bt_checkkeys() function:

test = FunctionCall2Coll(&key->sk_func, key->sk_collation,
                         datum, key->sk_argument);

if (!DatumGetBool(test))
{
    /*
     * Tuple fails this qual. If it's a required qual for the current
     * scan direction, then we can conclude no further tuples will
     * pass, either.
     *
     * Note: because we stop the scan as soon as any required equality
     * qual fails, it is critical that equality quals be used for the
     * initial positioning in _bt_first() when they are available. See
     * comments in _bt_first().
     */
    ***SNIP***

The qual is verified for the index tuple itself (the test return value lets us know if it matches) - the equality operator is actually called, and is actually re-verified via texteq(). So what appears to happen is that the btree code finds equivalent tuples, and then, knowing that all pairs of equal tuples are equivalent (but not necessarily the inverse), checks that it actually has a tuple that's equal/satisfies the qual. Makes sense, and by this standard I'd judge that '=' was actually an equality operator that sometimes took advantage of equivalency purely as an implementation detail, but I'm not familiar enough with that part of the code to have any degree of confidence that I haven't made a leap here that I shouldn't have. ISTM if '=' was really a mere equivalency operator, we'd only ever check (a < b && b < a) in the btree code. It occurs to me that we might also have a new equivalency operator (=== or something) so that we actually could answer questions like "show me the tuples with the word Aßlar in some non-unique column - both spellings will do". Maybe that's a bit off the wall, I'm not sure. Simple question: if you were to just remove the strcmp tie-breaker for strcoll() in varstr_cmp(), but not touch anything else, would Postgres exhibit objectively incorrect behaviour? Incidentally, I think it's interesting that the SQL-exposed interface to all of this merely presents the user with a boolean:

postgres=# select '4'::text < '4'::text;
 ?column?
----------
 f
(1 row)

This is pretty much what C++ does; operator< and friends conventionally return a bool, not an int. This might have something to do with the need to make a break from the qsort() convention of returning an int that may indicate equality. So the formal definition of equivalency there may be that neither of two equivalent values is less than the other - you have no business expecting them to come out of a sort in any particular order relative to each other (except with a stable sort). Which is not the same as equal or "textually distinct but equal", which I think is how I'd classify your IEEE 754 example. We can decree that equivalency implies equality, or make all this internal (which, perversely, I suppose the C++ committee people cannot). So, I may have lost sight of why I started on about equivalency, which is that it sure would be nice if we could use strxfrm to prepare strings for sorting, which looks to be a fairly significant win. Does anyone have any simpler ideas, assuming this one is no good? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
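[A rough C sketch of the two-step behaviour described above - traversal and positioning by equivalence, with the operator's own notion of equality applied afterwards. This is only an illustration of the idea, not the actual _bt_checkkeys() logic, and the helper name is made up:]

#include <stdbool.h>
#include <string.h>

static bool
satisfies_equality_qual(const char *index_value, const char *scankey_value)
{
    /* Comparator-level test: are the two values equivalent under the collation? */
    if (strcoll(index_value, scankey_value) != 0)
        return false;           /* not even equivalent; the scan can stop here */

    /* Operator-level test: re-check with bitwise equality, as texteq() does. */
    return strcmp(index_value, scankey_value) == 0;
}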
On 17 June 2012 23:58, Peter Geoghegan <peter@2ndquadrant.com> wrote: > We can decree that equivalency implies equality, or make all this > internal (which, perversely, I suppose the C++ committee people > cannot). Sorry, that should obviously read "equality implies equivalency". We may not have to decree it, because it may already be a tacit assumption - I'm not sure. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Peter Geoghegan <peter@2ndquadrant.com> writes: > ISTM if '=' was really a mere equivalency operator, we'd only ever > check (a < b && b < a) in the btree code. You're not really making a lot of sense here, or at least I'm not grasping the distinction you want to draw. btree indexes (and sorting in general) require the trichotomy law to hold, so what you say above is tautological. The only reason we test "a = b" and not "a < b || a > b" is that the latter is at least twice as expensive to evaluate. The last section of src/backend/access/nbtree/README has some notes that you might find relevant. > Simple question: if you were to just remove the strcmp tie-breaker for > strcoll() in varstr_cmp(), but not touch anything else, would Postgres > exhibit objectively incorrect behaviour? Yes, it would, and did, before we put that in; see the archives for the discussions that led up to the patch you mentioned earlier. > So, I may have lost sight of why I started on about equivalency, > which is that it sure would be nice if we could use strxfrm to prepare > strings for sorting, which looks to be a fairly significant win. We could still do that as long as we were willing to store the original strings as well as the strxfrm output. Given the memory bloat already implied by strxfrm, I don't think that's necessarily ridiculous on its face --- it just makes the bar a bit higher for whether this is a win. regards, tom lane
On 18 June 2012 00:38, Tom Lane <tgl@sss.pgh.pa.us> wrote: > The only reason we test "a = b" and not "a < b || a > b" > is that the latter is at least twice as expensive to evaluate. Perhaps I've been unclear. I meant to write "(!(a < b) && !(b < a))", and not "(a < b && b < a)". The former is not tautological when the trichotomy law holds, but the latter is. My mistake - I was a little tired when I wrote that (though you seemed to know what I meant). You can sort things just fine if you only have a less than comparator that returns a boolean, which is what I sought to illustrate with that example: There is no requirement that the comparator indicate anything more than whether one element is less than the other. In particular, it need not tell the sort function that two elements are equal, though the sort function will still (almost invariably in practice) put equal elements beside each other. Consider std::set, a highly-generic template class that enforces uniqueness, usually implemented using a red-black tree (so its elements are sorted), that only requires that the datatype it is instantiated with have an operator<. The datatype used may not even have an operator== for std::set to consult if it wanted to. So, in principle, std::set could figure out if any two given elements were equivalent. That is to say, it could determine: if (!(a < b) && !(b < a)) It could not, even in principle given its implicit interface/requirements for datatypes, know for sure that: if (a == b) Despite the fact that equivalency and equality can be expected to coincide the vast majority of the time, they are not quite the same thing. It's perfectly reasonable that it might actually not hold that a and b are equal, even though they are equivalent, for certain datatypes. Certainly not for integers. But, for example, it could make sense to have a string datatype where with a certain locale a certain Hungarian word was equivalent to the same string with a minor variation in diacritical marks or whatever (an entirely culturally defined equivalency), even though it was not equal according to this datatype's own idiosyncratic notion of equality, which is bitwise equality. That would be a datatype that std::set would be entirely capable of rolling with, because it doesn't care about equality. It might be better if our text type had the same semantics as this hypothetical string datatype, because: 1. A culture has gone so far as to make these variations of diacritical marks or whatever equivalent. The Unicode consortium consulted with the Hungarian government, or maybe the Hungarian version of the Académie française, and they asked for this, which seems pretty serious to me. Maybe they'd like to have a unique constraint reject what they consider to be equivalent strings. If this sounds pedantic to you, google "Han unification controversy", because this sounds like that in reverse (Hungarian separation controversy). The fact that when a unique constraint is violated, it isn't actually necessary to check the equality operator function (equivalence will do; no need to call texteq as when selecting with the same value in the qual) suggests that this wouldn't be too hard to facilitate, though again, I admit my understanding here is more shallow than I'd like. Now, I'll grant you that I'm not currently losing any sleep about our cultural insensitivity, and apparently neither are any Hungarians; this just serves to illustrate that my position is The Right Thing (I hope).
Here is a description of the Hungarian problem: http://blogs.msdn.com/b/michkap/archive/2005/08/10/449909.aspx It says "Basically, the feature is such that since dzs is a compression within Hungarian, that if you see ddzs it should be treated as equivalent to dzsdzs." We're not the first people to have to deal with this: http://bugs.mysql.com/bug.php?id=12519 Here is Tom's original analysis, that led to this patch: http://archives.postgresql.org/pgsql-general/2005-12/msg00803.php 2. This way, it's still possible to do the strxfrm() optimisation more or less for free, without breaking hash methods on text. That is not to be sniffed at - sorting text is very common and important. Most people don't know about the C locale, and couldn't use it if they did. > The last section of src/backend/access/nbtree/README has some notes > that you might find relevant. I assume that you're referring to "Notes to Operator Class Implementors". It says:

"""
An = operator must be an equivalence relation; that is, for all non-null values A,B,C of the datatype:

	A = A is true				reflexive law
	if A = B, then B = A			symmetric law
	if A = B and B = C, then A = C		transitive law
"""

This would still hold; we'd just have to change the = operators here to something representing equivalency, I suppose (or, more likely, don't bring up equivalency vs. equality here for pedagogical reasons). If it didn't hold, then it would be impossible to write a strxfrm() implementation, since the two variations of the textually distinct though equivalent Hungarian strings would produce identical strxfrm() blobs that were on the one hand bitwise identical, but on the other not reflexive, symmetric or transitive when bitwise compared. This is obviously a logical contradiction. So, we'd still be using a comparison that returns an integer under the hood, but we'd interpret 0 as equivalency and not equality. >> Simple question: if you were to just remove the strcmp tie-breaker for >> strcoll() in varstr_cmp(), but not touch anything else, would Postgres >> exhibit objectively incorrect behaviour? > > Yes, it would, and did, before we put that in; see the archives for the > discussions that led up to the patch you mentioned earlier. It wouldn't be the same though, because you also changed the definition of texteq, textne, bpchareq, bpcharne in those commits (to just use strncmp(), not varstr_cmp()), and I am not suggesting that you consider changing them back too. Incidentally, I should mention that I tried removing the strcmp() tie-breaker in varstr_cmp() on master, and unsurprisingly the regression tests don't break. Now, there might be a better way of explaining all of this that is more clear and succinct, and that's what I'll attempt to do: tuplesort.c tells lies to the btree code: That two index tuples that are logically equal are not. Now, the btree code doesn't "repeat" those lies to anyone else, like the user, which is unsurprising because that would lead to incorrect answers. Like all good liars, it is consistent in the lies that it tells (because the lies are derived from the item-pointers of index tuples, which are used as a tie-breaker).
I'm tentatively suggesting that it might be to our advantage to tell a similar though different lie to the btree code that it would never have to know about, a lie originating higher-up and reverberating more places, within the comparator proper (that is, varstr_cmp()), rather than the index-tuple encapsulating comparator within tuplesort (that is, comparetup_index_btree()): That two index tuples are equal, when in fact it might just be that they're equivalent. Now, for most people, this is exactly the same thing, since no datatype other than text has a separate notion of equivalence, and text basically only has that notion when using certain locales like hu_HU.UTF-8 (hardly surprising that the regression tests didn't fail on me). The Hungarians get to have uniqueness enforced for textually distinct variants that strcoll() returns 0 for (equivalent not equal values) - in other words, they get what they asked for. However, they still must specify the correct exact variation in predicates, since the equality operator only cares about bitwise equality, which is consistent with general Postgres convention. I believe that that could be the most correct behaviour for them. So, the original complainant saw this behaviour, a clear violation of the trichotomy law:

mage=# select 'potyty'::varchar = 'potty'::varchar;
 ?column?
----------
 f
(1 row)

mage=# select 'potyty'::varchar < 'potty'::varchar;
 ?column?
----------
 f
(1 row)

mage=# select 'potyty'::varchar > 'potty'::varchar;
 ?column?
----------
 f
(1 row)

The thing is that I think that the above user-visible behaviour might actually be desirable. 'potyty'::varchar is *not* bitwise equal (our general definition of text equality) to 'potty'::varchar. So, suppose there was an equivalency operator, say ===, all would be right with the world:

mage=# select 'potyty'::varchar === 'potty'::varchar;
 ?column?
----------
 t
(1 row)

mage=# select 'potyty'::varchar < 'potty'::varchar;
 ?column?
----------
 f
(1 row)

mage=# select 'potyty'::varchar > 'potty'::varchar;
 ?column?
----------
 f
(1 row)

The reason everything wasn't right with the world, and this example got screwed up for the complainant at the time, was that texteq() still returned false via the len1 != len2 fastpath - otherwise Tom's testcase here would not have shown a problem. So at the time and under the exact circumstances, the '=' operator was actually equality-orientated (if only by accident, because len1 != len2 here), whereas varstr_cmp() was entirely equivalency-orientated. So that explains this test-case of Tom's. It does not explain the confusing behaviour that led to the complaint though, since few people are in the habit of verifying that the trichotomy law holds just for the fun of it.

online=# select * from common_logins where username = 'potyty';
 uid | username | password | lastlogin | status | usertype | loginnum
-----+----------+----------+-----------+--------+----------+----------
(0 rows)

online=# select * from common_logins where username like 'potyty';
  uid   | username | password |         lastlogin          | status | usertype | loginnum
--------+----------+----------+----------------------------+--------+----------+----------
 155505 | potyty   | board    | 2004-08-16 17:45:55.723829 | A      | S        |        1
  60067 | potyty   | board    | 2004-07-07 20:22:17.68699  | A      | S        |        3
 174041 | potyty   | board    | 2005-02-17 00:00:13.706144 | A      | S        |        3
(3 rows)

online=# select username, username = 'potyty' from common_logins where username like 'potyty';
 username | ?column?
----------+----------
 potyty   | t
 potyty   | t
 potyty   | t
(3 rows)

That isn't something that I've been able to figure out yet. I think that maybe an index got corrupted because... Incidentally, I should point out that right now this code appears in like_match.c:

MatchText(char *t, int tlen, char *p, int plen, pg_locale_t locale, bool locale_is_c)

The only way that the locale and locale_is_c are used within this function is via a recursive call to MatchText() - they're entirely vestigial, it seems. Now, again I must admit that it isn't obvious to me that my white lie to the btree code about something being equal that is actually just equivalent won't come back to haunt me - lies have short legs. On the face of it though, it looks like I might get away with it, since equality in a qual is re-checked when I execute a select statement, but isn't checked when I attempt to violate a unique constraint (a behavioural difference which the Hungarians will be happy about, since they took the unusual step of representing that the strings 'potyty' and 'potty' were exactly equivalent to the appropriate authorities). Perhaps more importantly, I cannot recreate any of these problems on my Fedora 16 machine. Even with hu_HU on LATIN2, Tom's original test case (from 2005, on a Fedora 4 machine) cannot be recreated. So it may be that they've tightened these things up in some way. It's far from clear why that should be. It could be worth -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On 18 June 2012 16:59, Peter Geoghegan <peter@2ndquadrant.com> wrote: > Perhaps more importantly, I cannot recreate any of these problems on > my Fedora 16 machine. Even with hu_HU on LATIN2, Tom's original test > case (from 2005, on a Fedora 4 machine) cannot be recreated. So it may > be that they've tightened these things up in some way. It's far from > clear why that should be. > > It could be worth Ugh, hit send too soon. I meant to close with the observation that it may be safe to rely on strcoll() / strxfrm() + strcmp() not ever returning 0 on certain platforms, if my inability to recreate this is anything to go by. There could be a test run by configure that determines this, enabling us to then use the strxfrm() optimisation. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
So, just to give a bit more weight to my argument that we should recognise that equivalent strings ought to be treated identically, I direct your attention to conformance requirement C9 of Unicode 3.0: http://www.unicode.org/unicode/standard/versions/enumeratedversions.html#Unicode_3_0_0 This "requires that canonically equivalent text be treated identically. See also C10 and several other places in The Unicode Standard." This page, "DETERMINISTIC SORTING", was also interesting: http://www.unicode.org/notes/tn9/ The same hack that we use in varstr_cmp() appears as pseudo-code, under "Forcing Deterministic Comparisons". I'm not sure in what general sense what *they'd* call a non-deterministic comparison (i.e. plain old strcoll()) is actually non-deterministic. Clearly a "non-deterministic comparison" is deterministic to the extent that given the same two binary strings and encoding and collation, the comparison function will always give the same answer. The article goes on to describe how we could bolt on a strxfrm() type value to a string, to ensure "deterministic comparison" (i.e. identical behaviour to Postgres master) with pre-processing. Tom did think that this might still be worth it. I'd rather avoid it. The article goes on to say of so-called deterministic comparisons: "However, a deterministic comparison is generally not best practice. First, it has a certain performance cost in comparison, and a quite substantial impact on sort key size. (For example, [ICU] language-sensitive sort keys are generally about the size of the original string, so appending a copy of the original string generally doubles the size of the sort key.) A database using these sort keys can have substantially increased disk footprint and memory footprint, and consequently reduced performance." I note also that the Single UNIX Specification, Version 2 says of strcoll(): "The strxfrm() and strcmp() functions should be used for sorting large lists". Why has it taken us this long to research this? I am aware that Greg Stark suggested it back in 2005, but I doubt that that was the first time. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
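[A hedged C sketch of the "forced deterministic comparison" referred to above: order by the strxfrm() image, falling back to the original bytes only on ties. This mirrors what strcoll() plus the strcmp() tie-breaker in varstr_cmp() achieve; the helper names are illustrative, not anything in the tree:]

#include <stdlib.h>
#include <string.h>

/* Build the collation-weight image once per string, e.g. when preparing a sort. */
static char *
make_xfrm_blob(const char *s)
{
    size_t  n = strxfrm(NULL, s, 0) + 1;    /* space needed, including the NUL */
    char   *blob = malloc(n);

    strxfrm(blob, s, n);
    return blob;
}

/* Compare by collation weight first; break ties bitwise on the original strings. */
static int
deterministic_cmp(const char *a_blob, const char *a_orig,
                  const char *b_blob, const char *b_orig)
{
    int c = strcmp(a_blob, b_blob);

    return (c != 0) ? c : strcmp(a_orig, b_orig);
}

The sort-key-size complaint quoted above is about the variant that physically appends the original string to the blob; keeping the two parts separate, as here, has the same ordering behaviour.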
Peter Geoghegan <peter@2ndquadrant.com> wrote: > So, just to give a bit more weight to my argument that we should > recognise that equivalent strings ought to be treated identically Since we appear to be questioning everything in this area, I'll raise something which has been bugging me for a while: in some other systems I've used, the "tie-breaker" comparison for equivalent values comes after equivalence sorting on *all* sort keys, rather than *each* sort key. For example, this much makes sense with lc_collate = 'en_US.UTF-8':

test=# create table c (last_name text not null, first_name text);
CREATE TABLE
test=# insert into c values ('smith', 'bob'), ('smith', 'peter'), ('SMITH', 'EDWARD');
INSERT 0 3
test=# select * from c order by 2;
 last_name | first_name
-----------+------------
 smith     | bob
 SMITH     | EDWARD
 smith     | peter
(3 rows)

This seems completely wrong:

test=# select * from c order by 1,2;
 last_name | first_name
-----------+------------
 smith     | bob
 smith     | peter
 SMITH     | EDWARD
(3 rows)

I have seen other databases which get it in the order I would expect -- where the C compare only matters within groups of equivalent rows. It seems that PostgreSQL effectively orders by:

  last_name using collation 'en_US.UTF-8'
  last_name using collation 'C'
  first_name using collation 'en_US.UTF-8'
  first_name using collation 'C'

while some other products order by:

  last_name using collation 'en_US.UTF-8'
  first_name using collation 'en_US.UTF-8'
  last_name using collation 'C'
  first_name using collation 'C'

I'm sure the latter is harder to do and slower to execute; but the former just doesn't seem defensible as correct. -Kevin
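[A short C sketch contrasting the two multi-key orderings described above, with strcoll() standing in for the locale-aware comparison and strcmp() for the C-collation tie-breaker; illustrative only:]

#include <string.h>

/* What PostgreSQL effectively does: tie-break within each key. */
static int
cmp_tiebreak_per_key(const char **a, const char **b, int nkeys)
{
    int i, c;

    for (i = 0; i < nkeys; i++)
    {
        c = strcoll(a[i], b[i]);
        if (c == 0)
            c = strcmp(a[i], b[i]);
        if (c != 0)
            return c;
    }
    return 0;
}

/* What the other systems described do: tie-break only after all keys. */
static int
cmp_tiebreak_whole_row(const char **a, const char **b, int nkeys)
{
    int i, c;

    for (i = 0; i < nkeys; i++)
        if ((c = strcoll(a[i], b[i])) != 0)
            return c;
    for (i = 0; i < nkeys; i++)
        if ((c = strcmp(a[i], b[i])) != 0)
            return c;
    return 0;
}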
On 19 June 2012 16:17, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > Peter Geoghegan <peter@2ndquadrant.com> wrote: > >> So, just to give a bit more weight to my argument that we should >> recognise that equivalent strings ought to be treated identically > > Since we appear to be questioning everything in this area, I'll > raise something which has been bugging me for a while: in some other > systems I've used, the "tie-breaker" comparison for equivalent > values comes after equivalence sorting on *all* sort keys, rather > than *each* sort key. Are you sure that they actually have a tie-breaker, and don't just make the distinction between equality and equivalence (if only internally)? I would have checked that myself already, but I don't have access to any other RDBMS that I'd expect to care about these kinds of distinctions. They make sense for ensuring that the text comparator's notion of equality is consistent with text's general notion (if that's bitwise equality, which I suspect it is in these other products too for the same reasons it is for us). I don't see why you'd want a tie-breaker across multiple keys. I mean, you could, I just don't see any reason to. > test=# select * from c order by 2; > last_name | first_name > -----------+------------ > smith | bob > SMITH | EDWARD > smith | peter > (3 rows) > > This seems completely wrong: > > test=# select * from c order by 1,2; > last_name | first_name > -----------+------------ > smith | bob > smith | peter > SMITH | EDWARD > (3 rows) Agreed. Definitely a POLA violation. > I'm sure the latter is harder to do and slower to execute; but the > former just doesn't seem defensible as correct. This same gripe is held by the author of that sorting document I linked to from the Unicode consortium, with a very similar example. So it seems like this could be a win from several perspectives, as it would enable the strxfrm() optimisation. I'm pretty sure that pg_upgrade wouldn't be very happy about this, so we'd have to have a legacy compatibility mode. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Peter Geoghegan <peter@2ndquadrant.com> wrote: > Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: >> Since we appear to be questioning everything in this area, I'll >> raise something which has been bugging me for a while: in some >> other systems I've used, the "tie-breaker" comparison for >> equivalent values comes after equivalence sorting on *all* sort >> keys, rather than *each* sort key. > > Are you sure that they actually have a tie-breaker, and don't just > make the distinction between equality and equivalence (if only > internally)? I'm pretty sure that when I was using Sybase ASE the order for non-equal values was always predictable, and it behaved in the manner I describe below. I'm less sure about any other product. > I don't see why you'd want a tie-breaker across multiple keys. I > mean, you could, I just don't see any reason to. So that you can have entirely repeatable results across multiple runs? >> test=# select * from c order by 2; >> last_name | first_name >> -----------+------------ >> smith | bob >> SMITH | EDWARD >> smith | peter >> (3 rows) >> >> This seems completely wrong: >> >> test=# select * from c order by 1,2; >> last_name | first_name >> -----------+------------ >> smith | bob >> smith | peter >> SMITH | EDWARD >> (3 rows) > > Agreed. Definitely a POLA violation. > >> I'm sure the latter is harder to do and slower to execute; but >> the former just doesn't seem defensible as correct. > > This same gripe is held by the author of that sorting document I > linked to from the Unicode consortium, with a very similar > example. So it seems like this could be a win from several > perspectives, as it would enable the strxfrm() optimisation. I'm > pretty sure that pg_upgrade wouldn't be very happy about this, so > we'd have to have a legacy compatibility mode. At a minimum, it could require rebuilding indexes with multiple columns where any indexed value before the last index column can be equivalent but not equal. -Kevin
On 19 June 2012 17:45, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > Peter Geoghegan <peter@2ndquadrant.com> wrote: >> Are you sure that they actually have a tie-breaker, and don't just >> make the distinction between equality and equivalence (if only >> internally)? > > I'm pretty sure that when I was using Sybase ASE the order for > non-equal values was always predictable, and it behaved in the > manner I describe below. I'm less sure about any other product. Maybe it used a physical row identifier as a tie-breaker? Note that we use ItemPointers as a tie-breaker for sorting index tuples. I imagine that it was at least predictable among columns being sorted, if only because en_US.UTF-8 doesn't have any notion of equivalence (that is, it just so happens that there are no two strings that are equivalent but not bitwise equal). It would surely be impractical to do a comparison for the entire row, as that could be really expensive. >> I don't see why you'd want a tie-breaker across multiple keys. I >> mean, you could, I just don't see any reason to. > > So that you can have entirely repeatable results across multiple > runs? I suppose that isn't an unreasonable use-case, but it would be unreasonable to require that everyone pay for it, particularly if it implied another strcmp(). You don't need me to tell you that we generally discourage the idea that the order that tuples are returned in is defined in any way beyond that asked for by the user. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Peter Geoghegan <peter@2ndquadrant.com> wrote: > Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: >> I'm pretty sure that when I was using Sybase ASE the order for >> non-equal values was always predictable, and it behaved in the >> manner I describe below. I'm less sure about any other product. > > Maybe it used a physical row identifier as a tie-breaker? Note > that we use ItemPointers as a tie-breaker for sorting index > tuples. > > I imagine that it was at least predictable among columns being > sorted, if only because en_US.UTF-8 doesn't have any notion of > equivalence (that is, it just so happens that there are no two > strings that are equivalent but not bitwise equal). It would > surely be impractical to do a comparison for the entire row, as > that could be really expensive. We weren't using en_US.UTF-8 collation (or any other "proper" collation) on Sybase -- I'm not sure whether they even supported proper collation sequences on the versions we used. I'm thinking of when we were using their "case insensitive" sorting. I don't know the implementation details, but the behavior was consistent with including each character-based column twice: once in the requested position in the ORDER BY clause but folded to a consistent case, and again after all the columns in the ORDER BY clause in original form, with C collation. I wasn't aware that en_US.UTF-8 doesn't have equivalence without equality. I guess that surprising result in my last post is just plain inevitable with that collation then. Bummer. Is there actually anyone who finds that to be a useful behavior? For a collation which considered upper-case and lower-case to be equivalent, would PostgreSQL sort as I wanted, or is it doing some tie-break per column within equivalent values? -Kevin
On 19 June 2012 18:57, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > We weren't using en_US.UTF-8 collation (or any other "proper" > collation) on Sybase -- I'm not sure whether they even supported > proper collation sequences on the versions we used. I'm thinking of > when we were using their "case insensitive" sorting. I don't know > the implementation details, but the behavior was consistent with > including each character-based column twice: once in the requested > position in the ORDER BY clause but folded to a consistent case, and > again after all the columns in the ORDER BY clause in original form, > with C collation. > > I wasn't aware that en_US.UTF-8 doesn't have equivalence without > equality. Not that many do. The underlying cause of the problem back in 2005 was the tacit assumption that none do, which presumably we got away with for a while. I mentioned Hungarian, but it happens a bit in Swedish too. PostgreSQL supported Unicode before 2005, when the tie-breaker was introduced. I know at least one Swede who used Postgres95. I just took a look at the REL6_4 branch, and it looks much the same in 1999 as it did in 2005, in that there is no tie-breaker after the strcoll(). Now, that being the case, and Hungarian in particular having a whole bunch of these equivalencies, I have to wonder if the original complainant's problem really was diagnosed correctly. It could have had something to do with the fact that texteq() was confused about whether it reported equality or equivalency - it may have taken that long for the (len1 != len2) fastpath thing (only holds for equality, not equivalence, despite the fact that the 2005-era strcoll() call checks equivalence within texteq()) to trip someone out, because texteq() would have thereby given inconsistent answers in a very subtle way, that were correct neither according to the Hungarian locale nor according to simple bitwise equality. That's mostly speculation, but the question must be asked. > I guess that surprising result in my last post is just > plain inevitable with that collation then. Bummer. Is there > actually anyone who finds that to be a useful behavior? Looking at the details again and assuming a US locale, yeah, it is. The substance of your complaint holds though. > For a collation which considered upper-case and lower-case to be > equivalent, would PostgreSQL sort as I wanted, or is it doing some > tie-break per column within equivalent values? You could do that, and some people do use custom collations for various reasons. That's obviously very much of minority interest though. Most people will just use citext or something. However, since citext is itself a client of varstr_cmp(), this won't help you. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > I wasn't aware that en_US.UTF-8 doesn't have equivalence without > equality. I guess that surprising result in my last post is just > plain inevitable with that collation then. Bummer. Is there > actually anyone who finds that to be a useful behavior? For a > collation which considered upper-case and lower-case to be > equivalent, would PostgreSQL sort as I wanted, or is it doing some > tie-break per column within equivalent values? Well, there are two different questions there. As far as the overall structure of an ORDER BY or btree index is concerned, all it knows is what the datatype comparator functions tell it. There is no such thing as a second pass to reconsider values found to be "equal"; that's all the info there is. As far as the behavior of the comparator function for text is concerned, we choose to break ties reported by strcoll via strcmp. But that's not visible from outside the comparator, so there's no notion of an "equivalence" behavior different from "equality". I'm not exactly convinced that we could have a second-pass tiebreak mechanism without creating serious problems for btree indexes; in particular it seems like ordering considering only some leading keys might not match the actual index ordering. regards, tom lane
On 19 June 2012 19:44, Peter Geoghegan <peter@2ndquadrant.com> wrote: > You could do that, and some people do use custom collations for > various reasons. That's obviously very much of minority interest > though. Most people will just use citext or something. However, since > citext is itself a client of varstr_cmp(), this won't help you. I spoke too soon - that would work fine in the U.S., since the text is converted to lower case on-the-fly in a locale and collation aware fashion. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On sön, 2012-06-17 at 23:58 +0100, Peter Geoghegan wrote: > So if you take the word "Aßlar" here - that is equivalent to "Asslar", > and so strcoll("Aßlar", "Asslar") will return 0 if you have the right > LC_COLLATE This is not actually correct. glibc will sort Asslar before Aßlar, and that is correct in my mind. When a Wikipedia page on some particular language's alphabet says something like "$letterA and $letterB are equivalent", what it really means is that they are sorted the same compared to other letters, but are distinct when ties are broken. > (if you tried this out for yourself and found that I was > actually lying through my teeth, pretend I said Hungarian instead of > German and "some really obscure character" rather than ß). Yeah, there are obviously exceptions, which led to the original change being made, but they are not as wide-spread as they appear to be. The real issue in this area, I suspect, will be dealing with Unicode combining sequences versus equivalent precombined characters. But support for that is generally crappy, so it's not urgent to deal with it.
On 20 June 2012 11:00, Peter Eisentraut <peter_e@gmx.net> wrote: > On sön, 2012-06-17 at 23:58 +0100, Peter Geoghegan wrote: >> So if you take the word "Aßlar" here - that is equivalent to "Asslar", >> and so strcoll("Aßlar", "Asslar") will return 0 if you have the right >> LC_COLLATE > > This is not actually correct. glibc will sort Asslar before Aßlar, and > that is correct in my mind. Uh, what happened here was that I assumed that it was correct, and then went to verify it and found that it wasn't before sending the mail, and, since I couldn't immediately find any hard data about what characters this did apply to, I decided to turn it into a joke. I say this, and yet you've included that bit of the e-mail inline in your reply, so maybe it just wasn't a very good joke. > When a Wikipedia page on some particular language's alphabet says > something like "$letterA and $letterB are equivalent", what it really > means is that they are sorted the same compared to other letters, but > are distinct when ties are broken. I know. >> (if you tried this out for yourself and found that I was >> actually lying through my teeth, pretend I said Hungarian instead of >> German and "some really obscure character" rather than ß). > > Yeah, there are obviously exceptions, which led to the original change > being made, but they are not as wide-spread as they appear to be. True. > The real issue in this area, I suspect, will be dealing with Unicode > combining sequences versus equivalent precombined characters. But > support for that is generally crappy, so it's not urgent to deal with > it. I agree that it isn't urgent. However, I have an ulterior motive, which is that in allowing for this, we remove the need to strcmp() after each strcoll(), and consequently it becomes possible to use strxfrm() instead. Now, we could also use a hack that's going to make the strxfrm() blobs even bulkier still (basically, concatenate the original text to the blob before strcmp()), but I don't want to go there if it can possibly be avoided. I should also point out that we pride ourselves on following the letter of the standard when that makes sense, and we are currently not doing that in respect of the Unicode standard. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On 19 June 2012 19:44, Peter Geoghegan <peter@2ndquadrant.com> wrote: > PostgreSQL supported Unicode before 2005, when the tie-breaker was > introduced. I know at least one Swede who used Postgres95. I just took > a look at the REL6_4 branch, and it looks much the same in 1999 as it > did in 2005, in that there is no tie-breaker after the strcoll(). Now, > that being the case, and Hungarian in particular having a whole bunch > of these equivalencies, I have to wonder if the original complainant's > problem really was diagnosed correctly. It could have had something to > do with the fact that texteq() was confused about whether it reported > equality or equivalency - it may have taken that long for the (len1 != > len2) fastpath thing (only holds for equality, not equivalence, > despite the fact that the 2005-era strcoll() call checks equivalence > within texteq()) to trip someone out, because texteq() would have > thereby given inconsistent answers in a very subtle way, that were > correct neither according to the Hungarian locale nor according to > simple bitwise equality. It seems likely that this is more-or-less correct. The two equivalent strings had a variable number of characters, so the fastpath made texteq not accord with varstr_cmp(), even though texteq() itself also only had a single strcoll() call. So there were some tuples with the Hungarian string "potty" in the 2005 bug report. They were not visible for any of the queries seen in test cases, even the "good" ones. There were a few tuples with equivalent strings like "potyty" that were visible. The index scans didn't fail to return the expected tuples because the indexes were internally inconsistent or otherwise corrupt. Rather, the _bt_checkkeys() function returned early because the faulty "half equivalence, half equality" texteq() comparator reported that the tuple returned didn't satisfy the qual, and on that basis the index scan stopped. This usually wouldn't happen with Swedish, because their equivalencies tend to be one character long, and are less common. So far, so good, but how did this not blow up sooner? Did Hungarians only start using Postgres in late 2005, immediately after the 8.1 release? Hardly. Commit c1d62bfd00f4d1ea0647e12947ca1de9fea39b33, made in late 2003, "Add operator strategy and comparison-value datatype fields to ScanKey", may be part of the problem here. Consider this test within _bt_checkkeys(), that was changed by that commit:

-   if (key->sk_flags & SK_COMMUTE)
-       test = FunctionCall2(&key->sk_func,
-                            key->sk_argument, datum);
-   else
-       test = FunctionCall2(&key->sk_func,
-                            datum, key->sk_argument);
+   test = FunctionCall2(&key->sk_func, datum, key->sk_argument);

-   if (DatumGetBool(test) == !!(key->sk_flags & SK_NEGATE))
+   if (!DatumGetBool(test))

I think that this change may have made the difference between the Hungarians getting away with it and not getting away with it. Might it have been that for text, they were using some operator that wasn't '=' (perhaps one which has no fastpath, and thus correctly made a representation about equivalency) rather than texteq prior to this commit? I didn't eyeball the pg_amop entries of the era myself, but it seems quite possible. In any case, I find it hard to believe that it took at least ten years for this problem to manifest itself just because it took that long for a Hungarian with a strcoll() implementation that correctly represented equivalency to use Postgres. A year is a more plausible window.
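[A hedged C sketch of the inconsistency being described - a simplification, not the historical code: a length fast path that is only justified for bitwise equality, sitting in front of a comparison that actually reports collation equivalence (inputs assumed NUL-terminated here):]

#include <stdbool.h>
#include <string.h>

static bool
old_style_texteq(const char *a, size_t len1, const char *b, size_t len2)
{
    if (len1 != len2)
        return false;           /* valid only if "equal" means bitwise equal... */

    /* ...yet this reports "equal" whenever the locale says the strings are
     * merely equivalent, since there is no strcmp() tie-breaker. */
    return strcoll(a, b) == 0;
}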
If we do introduce an idea of equivalency to make all this work, that means there'll have to be equivalency verification when equality verification returns false in a number of places, including the above. For GiST, there is an equivalent test that will still vary based on the strategy number used, dubbed "the consistent function", which seems analogous to the above. So, you're going to have an extra strcoll()/strxfrm() + strcmp() here, as part of a "not-equal-but-maybe-equivalent" test, which is bad. However, if that means that we can cache a text constant as a strxfrm() blob, and compare in a strxfrm()-wise fashion, that will more than pay for itself, even for btree traversal alone. For a naive strxfrm() + strcoll() implementation, that will save just under half of the work, and everyone knows that the cost of the comparison is what dominates here, particularly for certain collations. We'd probably formalise it to the point where there'd be a btree strategy number and fully-fledged equivalency operator that the user could conceivably use themselves. There seems to be scope-creep here. I'm not sure that I should continue with this as part of this review. Maybe this should be something that I work on for the next commitfest. It would be nice to hear what others thought of these ideas before I actually start writing a patch that fixes these problems (our behaviour is incorrect for some locales according to the Unicode standard), facilitates a strxfrm() optimisation, and actually adds a strxfrm() optimisation. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On Sun, Jun 17, 2012 at 9:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > The trick for hashing such datatypes is to be able to guarantee that > "equal" values hash to the same hash code, which is typically possible > as long as you know the equality rules well enough. We could possibly > do that for text with pure-strcoll equality if we knew all the details > of what strcoll would consider "equal", but we do not. It occurs to me that strxfrm would answer this question. If we made the hash function hash the result of strxfrm then we could make equality use strcoll and not fall back to strcmp. I suspect in a green field that's what we would do, though the cpu cost might be enough to think hard about it. I'm not sure it's worth considering switching though. The cases where it matters to users incidentally are when you have a multi-column sort order and have values that are supposed to sort equal in the first column but print differently. Given that there seems to be some controversy in the locale definitions -- most locales seem to use "insignificant" factors like accents or ligatures as tie-breakers and avoid claiming different sequences are equal even when the language usually treats them as equivalent -- it doesn't seem super important to maintain the property for the few locales that fall the other way. Unless my impression is wrong and there's a good principled reason why some locales treat nearly equivalent strings one way and some treat them the other. -- greg
On 20 June 2012 15:10, Greg Stark <stark@mit.edu> wrote: > On Sun, Jun 17, 2012 at 9:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The trick for hashing such datatypes is to be able to guarantee that >> "equal" values hash to the same hash code, which is typically possible >> as long as you know the equality rules well enough. We could possibly >> do that for text with pure-strcoll equality if we knew all the details >> of what strcoll would consider "equal", but we do not. > > It occurs to me that strxfrm would answer this question. If we made > the hash function hash the result of strxfrm then we could make > equality use strcoll and not fall back to strcmp. What about per-column collations? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On Wed, Jun 20, 2012 at 3:19 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote: >> It occurs to me that strxfrm would answer this question. If we made >> the hash function hash the result of strxfrm then we could make >> equality use strcoll and not fall back to strcmp. > > What about per-column collations? Well, collations aren't really per-column, they're specific to the comparison in the expression at hand. The per-column collation is just the default for comparisons against that column. So for a hash join for example you would build the hash table using strxfrm for the collation being used for the join expression. For hash indexes the index would only be valid for a specific collation, just like a btree index is. But this all seems like a lot of work for a case that most locales seem to think isn't worth worrying about. -- greg
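[A C sketch of the idea being discussed, under the assumption that it is the strxfrm() image (for the collation in use) that gets hashed; hash_bytes() is a stand-in for whatever hash function is actually used:]

#include <stdlib.h>
#include <string.h>

extern unsigned int hash_bytes(const unsigned char *k, int keylen); /* assumed helper */

static unsigned int
collation_aware_hash(const char *s)
{
    size_t        n = strxfrm(NULL, s, 0) + 1;
    char         *blob = malloc(n);
    unsigned int  h;

    strxfrm(blob, s, n);
    /* Any two strings strcoll() reports as equal transform identically,
     * so they necessarily hash identically too. */
    h = hash_bytes((const unsigned char *) blob, (int) (n - 1));
    free(blob);
    return h;
}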
Peter Geoghegan <peter@2ndquadrant.com> writes: > I think that this change may have made the difference between the > Hungarians getting away with it and not getting away with it. Might it > have been that for text, they were using some operator that wasn't '=' > (perhaps one which has no fastpath, and thus correctly made a > representation about equivalency) rather than texteq prior to this > commit? Uh, no. There aren't any magic variants of equality now, and there were not back then either. I'm inclined to think that the "Hungarian problem" did exist long before we fixed it. > So, you're going to have an extra strcoll()/strxfrm() + strcmp() here, > as part of a "not-equal-but-maybe-equivalent" test, which is bad. > However, if that means that we can cache a text constant as a > strxfrm() blob, and compare in a strxfrm()-wise fashion, that will > more than pay for itself, even for btree traversal alone. Um ... are you proposing to replace text btree index entries with strxfrm values? Because if you aren't, I don't see that this is likely to win anything. And if you are, it loses on the grounds of (a) index bloat and (b) loss of ability to do index-only scans. > It would be nice to hear what others thought of these ideas before I > actually start writing a patch that both fixes these problems (our > behaviour is incorrect for some locales according to the Unicode > standard), facilitates a strxfrm() optimisation, and actually adds a > strxfrm() optimisation. Personally I think this is not a direction we want to go in. I think that it's going to end up a significant performance loss in many cases, break backwards compatibility in numerous ways, and provide a useful behavioral improvement to only a small minority of users. If you check the archives, we get far more complaints from people who think their locale definition is wrong than from people who are unhappy because we don't hew exactly to the corner cases of their locale. regards, tom lane
On 20 June 2012 15:55, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Peter Geoghegan <peter@2ndquadrant.com> writes: >> I think that this change may have made the difference between the >> Hungarians getting away with it and not getting away with it. Might it >> have been that for text, they were using some operator that wasn't '=' >> (perhaps one which has no fastpath, and thus correctly made a >> representation about equivalency) rather than texteq prior to this >> commit? > > Uh, no. There aren't any magic variants of equality now, and there were > not back then either. I'm inclined to think that the "Hungarian > problem" did exist long before we fixed it. I was suggesting that an operator other than equality may have been used for some reason. It probably isn't a significant issue though. > Um ... are you proposing to replace text btree index entries with > strxfrm values? Because if you aren't, I don't see that this is likely > to win anything. And if you are, it loses on the grounds of (a) index > bloat and (b) loss of ability to do index-only scans. No, I'm suggesting it would probably be at least a bit of a win here to cache the constant, and only have to do a strxfrm() + strcmp() per comparison. Not enough to justify all the added complexity of course, but I wouldn't seek to justify this on the basis of improvements to the speed of btree traversal. It would obviously be much more valuable for tuple sorting. > Personally I think this is not a direction we want to go in. I think > that it's going to end up a significant performance loss in many cases, > break backwards compatibility in numerous ways, and provide a useful > behavioral improvement to only a small minority of users. I may have over-emphasized the improvement in correctness that this would bring, which I personally consider to be very much a secondary benefit. The fact is that this is likely to be a fairly significant performance win, because strxfrm() is quite simply the way you're supposed to do collation-aware sorting, and is documented as such. For that reason, C standard library implementations should not be expected to emphasize its performance - they assume that you're using strxfrm() + their highly optimised strcmp() (as I've said, the glibc implementation is written in ASM, and makes use of hand-written SSSE3 instructions on x86_64 for example). The only performance downside that I can think of is the added check for equivalence for each tuple within _bt_checkkeys() - perhaps you were thinking of something else that hasn't occurred to me though. Were you? The added overhead is only going to be paid once per index scan, and not once per tuple returned by an index scan. Since equality implies equivalence for us, there is no need to check equivalence until something is unequal, and when that happens, and equivalence doesn't hold, we're done. Meanwhile, that entire index scan just got measurably faster from caching the constant's strxfrm() blob. Now, maybe it is a bit funky that this equivalence test has to happen even though the vast majority of locales don't care. If the only alternative is to bloat up the strxfrm() blobs with the original string, I'd judge that the funkiness is well worth it. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
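[A C sketch of the caching being suggested - transform the scan key's constant once, after which each comparison against an index tuple costs one strxfrm() plus a cheap strcmp(). All names and the buffer handling here are illustrative, not actual PostgreSQL APIs:]

#include <stdlib.h>
#include <string.h>

typedef struct CachedTextKey
{
    char   *const_xfrm;     /* strxfrm() image of the constant, built once per scan */
    char   *tuple_buf;      /* reusable buffer for per-tuple transforms */
    size_t  tuple_buf_len;
} CachedTextKey;

static int
compare_tuple_to_const(CachedTextKey *key, const char *tuple_value)
{
    size_t needed = strxfrm(NULL, tuple_value, 0) + 1;

    if (needed > key->tuple_buf_len)
    {
        key->tuple_buf = realloc(key->tuple_buf, needed);
        key->tuple_buf_len = needed;
    }
    strxfrm(key->tuple_buf, tuple_value, needed);

    /* 0 here means "equivalent under the collation", not necessarily equal. */
    return strcmp(key->tuple_buf, key->const_xfrm);
}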
Peter Geoghegan <peter@2ndquadrant.com> writes: > No, I'm suggesting it would probably be at least a bit of a win here > to cache the constant, and only have to do a strxfrm() + strcmp() per > comparison. Um, have you got any hard evidence to support that notion? The traditional advice is that strcoll is faster than using strxfrm unless the same strings are to be compared repeatedly. I'm not convinced that saving the strxfrm on just one side will swing the balance. > The fact is that this is likely to be a fairly significant > performance win, because strxfrm() is quite simply the way you're > supposed to do collation-aware sorting, and is documented as such. For > that reason, C standard library implementations should not be expected > to emphasize its performance - they assume that you're using strxfrm() > + their highly optimised strcmp() Have you got any evidence in support of this claim, or is it just wishful thinking about what's likely to be inside libc? I'd also note that any comparisons you may have seen about this are certainly not accounting for the effects of data bloat from strxfrm (ie, possible spill to disk, more merge passes, etc). In any case, if you have to redefine the meaning of equality in order to justify a performance patch, I'm prepared to walk away at the start. The range of likely performance costs/benefits across different locales and different implementations is so wide that if you can't show it to be a win even with the strcmp tiebreaker, it's not likely to be a reliable win without that. regards, tom lane
On Wed, Jun 20, 2012 at 12:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The fact is that this is likely to be a fairly significant >> performance win, because strxfrm() is quite simply the way you're >> supposed to do collation-aware sorting, and is documented as such. For >> that reason, C standard library implementations should not be expected >> to emphasize its performance - they assume that you're using strxfrm() >> + their highly optimised strcmp() > > Have you got any evidence in support of this claim, or is it just > wishful thinking about what's likely to be inside libc? I'd also note > that any comparisons you may have seen about this are certainly not > accounting for the effects of data bloat from strxfrm (ie, possible > spill to disk, more merge passes, etc). > > In any case, if you have to redefine the meaning of equality in order > to justify a performance patch, I'm prepared to walk away at the start. > The range of likely performance costs/benefits across different locales > and different implementations is so wide that if you can't show it to be > a win even with the strcmp tiebreaker, it's not likely to be a reliable > win without that. On the testing I've done, the strcmp() tie-breaking rarely gets run anyway. Unless you're sorting data with only a few distinct values, most comparisons are between values that are distinct under any choice of collation, and therefore strcoll() returns 0 very rarely, and therefore the additional runtime it consumes does not matter very much. Also, it's quite a bit faster than strcoll() anyway, so even when it does run it doesn't add much to the total time. I think the elephant in the room here is that we're relying on the OS to do everything for us, and the OS API we use (strcoll) requires an extra memcpy and is also dreadfully slow. If we could solve that problem, it would save us a lot more than worrying about the extra strcmp(). Of course, solving that problem is hard: we either have to get the glibc and FreeBSD libc folks to provide a better API (that takes lengths for each input instead of relying on trailing nul bytes), or reimplement locales within PG, or store trailing NUL bytes that we don't really need in the index so we can apply strcoll directly, none of which are very appealing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
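[A rough C sketch of the extra copying being described: because text datums are not NUL-terminated, each strcoll() call first needs both inputs copied into terminated buffers. Buffer management is simplified here; the real code sizes and reuses its buffers rather than using fixed arrays:]

#include <string.h>

static int
compare_nonterminated(const char *a, int alen, const char *b, int blen)
{
    char a_buf[1024];       /* assumes short strings, purely for illustration */
    char b_buf[1024];

    memcpy(a_buf, a, alen);
    a_buf[alen] = '\0';
    memcpy(b_buf, b, blen);
    b_buf[blen] = '\0';

    /* The real comparator also applies a strcmp() tie-breaker when this is 0. */
    return strcoll(a_buf, b_buf);
}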
On 20 June 2012 17:41, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Peter Geoghegan <peter@2ndquadrant.com> writes: >> No, I'm suggesting it would probably be at least a bit of a win here >> to cache the constant, and only have to do a strxfrm() + strcmp() per >> comparison. > > Um, have you got any hard evidence to support that notion? The > traditional advice is that strcoll is faster than using strxfrm unless > the same strings are to be compared repeatedly. I'm not convinced that > saving the strxfrm on just one side will swing the balance. Fair enough, but I'd suggest that that traditional advice assumes that strcoll()'s space-efficiency and general avoidance of dynamic allocation is an important factor, which, for us, it clearly isn't, since we can re-use a single buffer in the manner of Robert's text sortsupport patch for each and every per-tuple strxfrm() blob (when traversing an index). This is a direct quote from glibc's strcoll_l.c: Perform the first pass over the string and while doing this find and store the weights for each character. Since we want this to be as fast as possible we are using `alloca' to store the temporary values. But since there is no limit on the length of the string we have to use `malloc' if the string is too long. We should be very conservative here. Here, alloca is used to allocate space in a stack frame. I believe that this is an entirely inappropriate trade-off for Postgres to be making. strxfrm(), in contrast, leaves buffer sizing and management up to the caller. That has to be a big part of the problem here. (A rough sketch of that caller-managed buffer reuse appears after this message.) >> The fact is that this is likely to be a fairly significant >> performance win, because strxfrm() is quite simply the way you're >> supposed to do collation-aware sorting, and is documented as such. For >> that reason, C standard library implementations should not be expected >> to emphasize its performance - they assume that you're using strxfrm() >> + their highly optimised strcmp() > > Have you got any evidence in support of this claim, or is it just > wishful thinking about what's likely to be inside libc? According to the Single UNIX Specification's strcoll() documentation, "The strxfrm() and strcmp() functions should be used for sorting large lists". If that isn't convincing enough for you, there is the fact that glibc's strcmp() is clearly highly optimised for each and every architecture, and that we are currently throwing away an extra strcmp() in the event of strcoll() equality. > I'd also note that any comparisons you may have seen about this are certainly not > accounting for the effects of data bloat from strxfrm (ie, possible > spill to disk, more merge passes, etc). What about the fact that strcoll() may be repeatedly allocating and freeing memory per comparison? The blobs really aren't that much larger than the strings to be sorted, which are typically quite short. > In any case, if you have to redefine the meaning of equality in order > to justify a performance patch, I'm prepared to walk away at the start. The advantage of my proposed implementation is precisely that I won't have to redefine the meaning of equality, and that only the text datatype will have to care about equivalency, so you can just skip over an explanation of equivalency for most audiences. If you feel that strongly about it, and I have no possible hope of getting this accepted, I'm glad that I know now rather than after completing a significant amount of work on this. I would like to hear other people's opinions before I drop it though. 
> The range of likely performance costs/benefits across different locales > and different implementations is so wide that if you can't show it to be > a win even with the strcmp tiebreaker, it's not likely to be a reliable > win without that. glibc is the implementation that really matters. My test-case used en_US.UTF-8 as its locale, which has to be one of the least stressful to strcoll() - I probably could have shown a larger improvement just by selecting a locale that was known to have to make more passes, like, say, hu_HU.UTF-8. I expect to have access to a 16 core server next week, which Bull have made available to me. Maybe I'll get some interesting performance numbers from it. The reason that I don't want to use the blob with original string hack is because it's ugly, space-inefficient, unnecessary and objectively incorrect, since it forces us to violate the conformance requirement C9 of Unicode 3.0, marginal though that may be. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
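For illustration, the caller-managed buffer reuse referred to above might look roughly like this in outline. The names xfrm_buf and xfrm_buflen are placeholders, not from any posted patch, and error handling is omitted.

    #include <stdlib.h>
    #include <string.h>

    static char   *xfrm_buf = NULL;
    static size_t  xfrm_buflen = 0;

    /*
     * strxfrm() leaves buffer management to the caller: asking for the
     * required length first lets a single buffer be grown as needed and
     * then reused for every subsequent transformation, rather than the
     * per-comparison alloca()/malloc() that glibc's strcoll() performs
     * internally.
     */
    static const char *
    transform(const char *s)
    {
        size_t  needed = strxfrm(NULL, s, 0) + 1;

        if (needed > xfrm_buflen)
        {
            xfrm_buf = realloc(xfrm_buf, needed);
            xfrm_buflen = needed;
        }
        strxfrm(xfrm_buf, s, xfrm_buflen);
        return xfrm_buf;
    }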
On 20 June 2012 17:41, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Peter Geoghegan <peter@2ndquadrant.com> writes: >> No, I'm suggesting it would probably be at least a bit of a win here >> to cache the constant, and only have to do a strxfrm() + strcmp() per >> comparison. > > Um, have you got any hard evidence to support that notion? The > traditional advice is that strcoll is faster than using strxfrm unless > the same strings are to be compared repeatedly. I'm not convinced that > saving the strxfrm on just one side will swing the balance. I've written a very small C++ program, which I've attached, that basically proves that this can still make a fairly large difference - I hope it's okay that it's C++, but that allowed me to write the program quickly, with no dependencies for anyone who cares to try this out, other than a C++ compiler and the standard library. It just inserts 500,000 elements (the same succession of pseudo-random strings again and again) into a std::set, which is implemented using a red-black tree on my machine. In one case, we just use strcoll() (there is actually no tie-breaker). In the other case, the comparator allocates and fills memory for a strxfrm blob when needed, caches it, and uses strxfrm() to generate blobs for other elements into a dedicated buffer which persists across comparisons (a rough sketch of this scheme appears after this message). It prints the duration of each run, and a running average for each case until terminated. Here is what the output looks like on my machine: [peter@peterlaptop strxfrm_test]$ g++ -O2 set_test.cpp [peter@peterlaptop strxfrm_test]$ ./a.out Time elapsed with strcoll: 1.485290 seconds Time elapsed with strxfrm optimization: 1.070211 seconds Time elapsed with strcoll: 1.813725 seconds Time elapsed with strxfrm optimization: 1.097950 seconds Time elapsed with strcoll: 1.813381 seconds Time elapsed with strxfrm optimization: 1.102769 seconds Time elapsed with strcoll: 1.826708 seconds Time elapsed with strxfrm optimization: 1.105093 seconds Time elapsed with strcoll: 1.842156 seconds Time elapsed with strxfrm optimization: 1.103198 seconds *****strcoll average: 1.756252 strxfrm average: 1.095844***** Time elapsed with strcoll: 1.834785 seconds Time elapsed with strxfrm optimization: 1.105369 seconds Time elapsed with strcoll: 1.817449 seconds Time elapsed with strxfrm optimization: 1.110386 seconds Time elapsed with strcoll: 1.812446 seconds Time elapsed with strxfrm optimization: 1.101266 seconds Time elapsed with strcoll: 1.813595 seconds Time elapsed with strxfrm optimization: 1.099444 seconds Time elapsed with strcoll: 1.814584 seconds Time elapsed with strxfrm optimization: 1.099542 seconds *****strcoll average: 1.787412 strxfrm average: 1.099523***** Time elapsed with strcoll: 1.820218 seconds Time elapsed with strxfrm optimization: 1.102984 seconds Time elapsed with strcoll: 1.817549 seconds Time elapsed with strxfrm optimization: 1.100526 seconds Time elapsed with strcoll: 1.817020 seconds Time elapsed with strxfrm optimization: 1.099273 seconds Time elapsed with strcoll: 1.820983 seconds Time elapsed with strxfrm optimization: 1.100501 seconds Time elapsed with strcoll: 1.813270 seconds Time elapsed with strxfrm optimization: 1.098904 seconds *****strcoll average: 1.797544 strxfrm average: 1.099828***** Time elapsed with strcoll: 1.815952 seconds Time elapsed with strxfrm optimization: 1.102583 seconds If you'd like to print the contents of the set, there are a couple of lines you can uncomment. 
It should be straightforward to understand the program, even if you don't know any C++. Note that I still think that the compelling case for strxfrm() caching is sorting tuples, not btree index traversal. All that I ask is that you allow that on the whole, this will pay for the cost of verifying equivalency within _bt_checkkeys() once per index-scan. I am aware that a red-black tree isn't generally an excellent proxy for how a btree will perform, but I don't think that that really matters here. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
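As a rough illustration of the comparison scheme the program models - a sketch only, not the attached C++ code - one side's blob is cached while the other side is transformed into a persistent scratch buffer. The type and function names are made up here, and the scratch-buffer sizing is hand-waved.

    #include <stdlib.h>
    #include <string.h>

    typedef struct
    {
        const char *str;    /* original NUL-terminated string */
        char       *blob;   /* cached strxfrm() output, or NULL */
    } keyed_string;

    static char scratch[1024];  /* persists across comparisons; assumed big enough */

    /*
     * The "constant" side's blob is computed once and cached; the other
     * side is transformed into the persistent scratch buffer; ordering
     * then falls out of a plain strcmp() on the two blobs.
     */
    static int
    compare_cached(keyed_string *constant, const char *other)
    {
        if (constant->blob == NULL)
        {
            size_t  len = strxfrm(NULL, constant->str, 0) + 1;

            constant->blob = malloc(len);
            strxfrm(constant->blob, constant->str, len);
        }
        strxfrm(scratch, other, sizeof(scratch));
        return strcmp(constant->blob, scratch);
    }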
On 21 June 2012 01:22, Peter Geoghegan <peter@2ndquadrant.com> wrote: > I've written a very small C++ program, which I've attached, that > basically proves that this can still make a fairly large difference - > I hope it's okay that that it's C++, but that allowed me to write the > program quickly, with no dependencies for anyone who cares to try this > out, other than a C++ compiler and the standard library. So, it turns out that with a relatively expensive to sort collation like hu_HU.UTF-8, the difference here is huge: [peter@peterlaptop strxfrm_test]$ ./a.out locale set: hu_HU.UTF-8 Time elapsed with strcoll: 11.122695 seconds Time elapsed with strxfrm optimization: 2.328365 seconds Time elapsed with strcoll: 11.404160 seconds Time elapsed with strxfrm optimization: 2.355684 seconds Time elapsed with strcoll: 11.411680 seconds Time elapsed with strxfrm optimization: 2.342553 seconds Time elapsed with strcoll: 11.415693 seconds Time elapsed with strxfrm optimization: 2.343593 seconds Time elapsed with strcoll: 11.428686 seconds Time elapsed with strxfrm optimization: 2.345204 seconds *****strcoll average: 11.356583 strxfrm average: 2.343080***** -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On Jun20, 2012, at 19:38 , Peter Geoghegan wrote: > On 20 June 2012 17:41, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> In any case, if you have to redefine the meaning of equality in order >> to justify a performance patch, I'm prepared to walk away at the start. > > The advantage of my proposed implementation is precisely that I won't > have to redefine the meaning of equality, and that only the text > datatype will have to care about equivalency, so you can just skip > over an explanation of equivalency for most audiences. Ok, so what exactly is it that you're proposing then? AFAICS, you're proposing to make the less-than operator rely solely on strcoll(). If you don't also redefine the equality operator to match, you'll end up with cases where !(a < b) and !(b < a), yet also !(a == b). This breaks the trichotomy law, no? Which in turn breaks merge-joins - I believe we'd output the cartesian product {potty, potyty} x {potyty, potty} when merge-joining {potty, potyty} with itself, unless the comparison operator contains a tie-breaker. Which is obviously wrong unless you decree that 'potty' = 'potyty'. I do agree that there's a use-case for having a textual type which treats equivalent strings as equal (and which should then also treat equivalent Unicode representations of the same character as equal). But it should be a separate type, not bolted onto the existing textual types. best regards, Florian Pflug
On Jun21, 2012, at 02:22 , Peter Geoghegan wrote: > I've written a very small C++ program, which I've attached, that > basically proves that this can still make a fairly large difference - > I hope it's okay that that it's C++, but that allowed me to write the > program quickly, with no dependencies for anyone who cares to try this > out, other than a C++ compiler and the standard library. Uh, are you sure the results aren't swapped? string_wrapper takes a parameter called use_strcoll, and it indeed uses strcoll() if that parameter is true, and strxfrm() otherwise. But then it passes *false* to determine the speed of strcoll(), and *true* to determine the speed of strxfrm()... Also, place_random_string() needs to insert a terminating zero byte, otherwise set_test segfaults, at least on my OSX machine. best regards, Florian Pflug
On 21 June 2012 10:24, Florian Pflug <fgp@phlo.org> wrote: > On Jun21, 2012, at 02:22 , Peter Geoghegan wrote: >> I've written a very small C++ program, which I've attached, that >> basically proves that this can still make a fairly large difference - >> I hope it's okay that that it's C++, but that allowed me to write the >> program quickly, with no dependencies for anyone who cares to try this >> out, other than a C++ compiler and the standard library. > > Uh, are you sure the results aren't swapped? string_wrapper takes a > parameter called use_strcoll, and it indeed uses strcoll() if that > parameter is true, and strxfm() otherwise. But then it passes *false* > to determine the speed of strcoll(), and *true* to determine the speed > of strxfm()... Egads, you're right. Apparently in my haste I gave the wrong boolean argument to the class constructor in each case. I am completely wrong about this. I apologise for the careless mistake. At least I advanced the debate, though - we don't want to do any ad-hoc generation of strxfrm() output within comparators, even when one side can have a cached value. You're right about merge-joins; I cannot answer that without arbitrarily making a convenient exception about the conditions that they join on, which is itself unacceptable. I cannot change the semantics of equality to do something with strxfrm either, since that would have unacceptable performance implications, as is evident from my test-case. I wish someone had a better idea, but I suppose at this point the idea of concatenating strxfrm() output with the original string begins to look like the least-worst option. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On Jun21, 2012, at 11:55 , Peter Geoghegan wrote: > On 21 June 2012 10:24, Florian Pflug <fgp@phlo.org> wrote: >> On Jun21, 2012, at 02:22 , Peter Geoghegan wrote: >>> I've written a very small C++ program, which I've attached, that >>> basically proves that this can still make a fairly large difference - >>> I hope it's okay that that it's C++, but that allowed me to write the >>> program quickly, with no dependencies for anyone who cares to try this >>> out, other than a C++ compiler and the standard library. >> >> Uh, are you sure the results aren't swapped? string_wrapper takes a >> parameter called use_strcoll, and it indeed uses strcoll() if that >> parameter is true, and strxfm() otherwise. But then it passes *false* >> to determine the speed of strcoll(), and *true* to determine the speed >> of strxfm()... > > Egads, you're right. Apparently in my haste I gave the wrong boolean > argument to the class constructor in each case. I am completely wrong > about this. I apologise for the careless mistake. At least I advanced > the debate, though - we don't want to do any ad-hoc generation of > strxfrm() output within comparators, even when one side can have a > cached value. FWIW, I've played around a bit with a fixed version of your set_test.cpp. I made it use the cached sort key of the RHS also, made it preserve sort keys during copy construction, even used a boost::shared_ptr to avoid actually copying the cached sort key. The strxfrm() version still performed about 30% worse than the strcoll() version. From that, I figured that tree inserts don't trigger enough comparisons to make strxfrm() worthwhile. So I mangled set_test.cpp a bit more and instead of a std::set, I created a (C-style) array of string_wrapper instances, and sorted them with std::sort(). Which made strcoll() twice as fast as strxfrm()... At this point, my theory is that your choice of "random" strings prevents strxfrm() from ever winning over strcoll(). The reason being that you pick each letter uniformly distributed from a-z, resulting in a probability of two strings differing in the first character of 1 - 1/26 =~ 96%. Thus, even for extremely complex collation rules, strcoll() probably only needs to compare a few characters to determine the order. Whereas strxfrm() has to compute the whole sort key, no matter what. The question is thus, how good a model are your "random" strings for the input of a typical sorting step in postgres? My guess is, a quite good one actually, since people probably don't deal with a lot of very similar strings very often. Which makes me wonder if using strxfrm() during sorting wouldn't be a net loss, all things considered… My modified set_test.cpp (which should now be called array_test.cpp) is attached. best regards, Florian Pflug
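For reference, the string model in question is roughly the following; this is illustrative only, the actual generator is in the attached program, and the name here is made up.

    #include <stdlib.h>

    /*
     * Each character drawn uniformly from 'a'..'z': two independent
     * strings already differ at the first character with probability
     * 1 - 1/26 (about 96%), so strcoll() rarely has to look past it,
     * while strxfrm() must always transform the whole string.
     */
    static void
    make_random_string(char *buf, int len)
    {
        int     i;

        for (i = 0; i < len; i++)
            buf[i] = 'a' + rand() % 26;
        buf[len] = '\0';
    }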
On 21 June 2012 11:40, Florian Pflug <fgp@phlo.org> wrote: > At this point, my theory is that your choice of "random" strings prevents > strxfrm() from ever winning over strcoll(). The reason being that you pick > each letter uniformly distributed from a-z, resulting in a probability of > two string differing in the first of 1 - 1/26 =~ 96%. Thus, even for > extremely complex collation rules, strcoll() probably only needs to compare > a few characters to determine the order. Whereas strxfrm() has to compute > the whole sort key, no matter what. Good point. > The question is thus, how good a model are your "random" strings for the > input of a typical sorting step in postgres? My guess is, a quite good one > actually, since people probably don't deal with a lot of very similar strings > very often. Which makes we wonder if using strxfrm() during sorting wouldn't > be a net loss, all things considered… Well, I think the answer to that has to be no. I posted a simple postgres text -> strxfrm blob bytea SQL-callable wrapper function a few days ago that clearly wins by quite a bit on some sample queries against the dellstore sample database, which has a representative sample set. Completely uniformly-distributed data actually isn't representative of the real world at all. Normal distributions abound. This C++ program was never intended to justify the general utility of using strxfrm() for sorting, which I don't believe is in question. I just wanted to get some idea about the performance characteristics of using strxfrm() to traverse an index. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On Fri, Jun 15, 2012 at 6:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I would be concerned about this if it were per-sort-tuple wastage, but > what I understood to be happening was that this was a single instance of > an expansible buffer (per sort, perhaps, but still just one buffer). > And, as you keep pointing out when it suits your argument, it's > relatively uncommon to be sorting enormous values anyway. So no, I am > not particularly worried about that. If you are, there are more > important places to be concerned about allocation pad wastage, starting > with palloc itself. And, of course, palloc doesn't use power-of-two allocation up to infinity either, as proposed here; beyond a certain limit it switches to calling malloc(), which doesn't use power-of-two allocations for large requests either, at least not in glibc - for anything 128kB or above, it uses mmap to grab something of exactly the requested size (rounded up to the OS page size) and returns it to the OS unconditionally when it's freed. However, what this really boils down to is that you and Peter don't like this line of code: + tss->buflen1 = TYPEALIGN(TEXTBUFLEN, len1); What would you like it to say instead? The obvious formulation is: while (tss->buflen1 < len1) tss->buflen1 *= 2; Or perhaps the following, which will normally be more efficient, though possibly not as efficient as what I've got there now: tss->buflen = 1 << ffs(len1); Of course if len1 happens to be large enough that the high-order bit is set, the first proposal above will fail to terminate and the second one will produce 0. That can't really happen in practice because a toasted datum is limited to 1GB, of course, and I suppose a lot of code would need to be adjusted if we ever wanted to raise that limit, so maybe it's not worth worrying about it. So, is one of the above code fragments what people want? Or something else? Thanks, -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
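For comparison, the two growth strategies under discussion look roughly like this when spelled out. TEXTBUFLEN and TYPEALIGN are reproduced here so the sketch stands alone; the helper names are made up.

    #include <stddef.h>
    #include <stdint.h>

    #define TEXTBUFLEN  1024
    #define TYPEALIGN(ALIGNVAL, LEN) \
        (((uintptr_t) (LEN) + ((ALIGNVAL) - 1)) & ~((uintptr_t) ((ALIGNVAL) - 1)))

    /* Patch as posted: round the request up to the next multiple of TEXTBUFLEN. */
    static size_t
    grow_round_up(size_t len)
    {
        return TYPEALIGN(TEXTBUFLEN, len);
    }

    /* Proposed alternative: keep doubling the current allocation until it fits. */
    static size_t
    grow_doubling(size_t buflen, size_t len)
    {
        while (buflen < len)
            buflen *= 2;
        return buflen;
    }

For a 600kB datum, for example, the first form asks for a shade over 600kB and the second (doubling up from TEXTBUFLEN) for 1MB, which is essentially the wastage being argued about.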
On 23 July 2012 16:09, Robert Haas <robertmhaas@gmail.com> wrote: > However, what this really boils down to is that you and Peter don't > like this line of code: > > + tss->buflen1 = TYPEALIGN(TEXTBUFLEN, len1); I can only speak for myself, though I agree with your summary here. > What would you like it to say instead? > > The obvious formulation is: > > while (tss->buflen1 < len1) > tss->buflen1 *= 2; That's what I had in mind. +1. > Or perhaps the following, which will normally be more efficient, > though possibly not as efficient as what I've got there now: > > tss->buflen = 1 << ffs(len1); I'm sorry, I don't follow you here. What is ffs() ? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On Mon, Jul 23, 2012 at 11:34 AM, Peter Geoghegan <peter@2ndquadrant.com> wrote: > On 23 July 2012 16:09, Robert Haas <robertmhaas@gmail.com> wrote: >> However, what this really boils down to is that you and Peter don't >> like this line of code: >> >> + tss->buflen1 = TYPEALIGN(TEXTBUFLEN, len1); > > I can only speak for myself, though I agree with your summary here. > >> What would you like it to say instead? >> >> The obvious formulation is: >> >> while (len1 < tss->buflen1) >> tss->buflen *= 2; > > That's what I had in mind. +1. > >> Or perhaps the following, which will normally be more efficient, >> though possibly not as efficient as what I've got there now: >> >> tss->buflen = 1 << ffs(len1); > > I'm sorry, I don't follow you here. What is ffs() ? Sorry, fls, not ffs. I always get those mixed up. See src/port/fls.c -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 23 July 2012 16:36, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Jul 23, 2012 at 11:34 AM, Peter Geoghegan <peter@2ndquadrant.com> wrote: >>> tss->buflen = 1 << ffs(len1); >> >> I'm sorry, I don't follow you here. What is ffs() ? > > Sorry, fls, not ffs. I always get those mixed up. > > See src/port/fls.c Oh, okay. Since, I infer, we're starting from a buffer-size that's a power-of-two anyway, is there really any advantage in doing this rather than just doubling the buffer size each time? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On Mon, Jul 23, 2012 at 12:07 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote: > On 23 July 2012 16:36, Robert Haas <robertmhaas@gmail.com> wrote: >> On Mon, Jul 23, 2012 at 11:34 AM, Peter Geoghegan <peter@2ndquadrant.com> wrote: >>>> tss->buflen = 1 << ffs(len1); >>> >>> I'm sorry, I don't follow you here. What is ffs() ? >> >> Sorry, fls, not ffs. I always get those mixed up. >> >> See src/port/fls.c > > Oh, okay. Since, I infer, we're starting from a buffer-size that's a > power-of-two anyway, is there really any advantage in doing this > rather than just doubling the buffer size each time? Well, if you're using a builtin fls rather than our src/port implementation, it's probably a single machine language instruction instead of a loop. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
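For what it's worth, the builtin route can be sketched as follows; this is GCC/Clang specific, it is not the src/port fallback (which loops), and the function name is made up.

    #include <stddef.h>

    /*
     * Next power of two via "find last set bit": with a compiler builtin
     * the bit scan compiles to a single instruction, so no loop is needed.
     * __builtin_clzll is undefined for 0, hence the small-length guard.
     */
    static size_t
    next_pow2(size_t len)
    {
        if (len <= 1)
            return 1;
        return (size_t) 1 << (sizeof(unsigned long long) * 8 -
                              __builtin_clzll((unsigned long long) (len - 1)));
    }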
Do you think you can follow through on this soon, Robert? I don't believe that there are any outstanding issues. I'm not going to make an issue of the fact that strxfrm() hasn't been taken advantage of. If you could please post a new revision, with the suggested alterations (that you agree with), I'd be happy to take another look. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On Mon, Oct 8, 2012 at 8:47 AM, Peter Geoghegan <peter@2ndquadrant.com> wrote: > Do you think you can follow through on this soon, Robert? I don't > believe that there are any outstanding issues. I'm not going to make > an issue of the fact that strxfrm() hasn't been taken advantage of. If > you could please post a new revision, with the suggested alterations > (that you agree with), I'd be happy to take another look. I don't have any plans to work on this further. I think all of the criticism that has been leveled at this patch is 100% bogus, and I greatly dislike the changes that have been proposed. That may not be fair, but it's how I feel, and in light of that feeling it does not make sense for me to pursue this. Sorry. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 8 October 2012 16:30, Robert Haas <robertmhaas@gmail.com> wrote: > I don't have any plans to work on this further. I think all of the > criticism that has been leveled at this patch is 100% bogus, and I > greatly dislike the changes that have been proposed. That may not be > fair, but it's how I feel, and in light of that feeling it does not > make sense for me to pursue this. Sorry. If it was the case that you were only 50% of the way to getting something committable, I guess I'd understand; this is, after all, a volunteer effort, and you are of course free to pursue or not pursue whatever you like. It's more like 95% though, so I have a hard time understanding this attitude. The only criticism that I imagine that you could be referring to is my criticism (which was seconded by Tom) concerning the approach to sizing a buffer that is used as state between successive calls to the sortsupport routine. This was a fairly trivial point even within the context of this patch. It was regrettable that it got so out of hand, but that certainly wasn't my intention. I must say that I find this disheartening as a reviewer, particularly since it comes from someone who reviews a lot of patches (including most of my own), and has often complained about the difficulties that reviewers face. Is it really so hard for you to accept a criticism from me? I didn't expect you to fully accept my criticism; I only expected you to take it into consideration, and possibly let it go for the sake of progress. For what it's worth, I have high regard for your work generally. In all sincerity, this appears to me to be a slight, and I find it upsetting, both on a personal level, and because it creates a concern about being able to work with you in the future, which is certainly something that I've benefited from in the past, and hope to continue to benefit from. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On Mon, Oct 8, 2012 at 12:26 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote: > If it was the case that you were only 50% of the way to getting > something committable, I guess I'd understand; this is, after all, a > volunteer effort, and you are of course free to pursue or not pursue > whatever you like. It's more like 95% though, so I have a hard time > understanding this attitude. > > The only criticism that I imagine that you could be referring to is my > criticism (which was seconded by Tom) concerning the approach to > sizing a buffer that is used as state between successive calls to the > sortsupport routine. This was a fairly trivial point even within the > context of this patch. It was regrettable that it got so out of hand, > but that certainly wasn't my intention. > > I must say that I find this disheartening as a reviewer, particularly > since it comes from someone who reviews a lot of patches (including > most of my own), and has often complained about the difficulties that > reviewers face. Is it really so hard for you to accept a criticism > from me? I didn't expect you to fully accept my criticism; I only > expected you to take it into consideration, and possibly let it go for > the sake of progress. > > For what it's worth, I have high regard for your work generally. In > all sincerity, this appears to me to be a slight, and I find it > upsetting, both on a personal level, and because it creates a concern > about being able to work with you in the future, which is certainly > something that I've benefited from in the past, and hope to continue > to benefit from. Hey, if me deciding I don't want to work on a patch any more is going to make you feel slighted, then you're out of luck. The archives are littered with people who have decided to stop working on things because the consensus position on list was different than the approach that they personally favored, and I have as much right to do that as anyone else. I think that you are perfectly entitled to feel disappointed that I don't like your suggestions, even as I am disappointed that the patch encountered so much resistance. But let's not turn it into a personal issue. I don't think that you're a bad person because you don't like my approach, and I hope you don't think I'm a bad person because I don't like yours. If you do, then that's just something we're both going to have to live with, because I am not going to make a policy of working on things because other people will be offended if I don't. As to what the substantive issue is, well, yeah, I don't like the memory allocation strategy that you and Tom are proposing. In the worst case it chews up a lot of extra memory, and in the best case there's no measurable performance benefit - or at least nobody has done any work to demonstrate that there is one. I've already pointed out that, in fact, the strategy of doing power-of-two allocations is rarely followed for very large allocations, a point to which I believe no one has responded substantively. Now I don't think anyone else is required to respond to that point, but neither am I required to revise the patch into a form that I don't agree with. Equally, I will not presume to commit it in a form that you and Tom do not agree with. That would be asinine, and I try (not always successfully) not to be an ass. It is not as if anyone has phrased this as a maybe-we-should sort of argument; there have been quite definite arguments on both sides over apparently strongly-held positions. 
I had hoped that this was going to be a quick and non-controversial win, but 74 email messages later it has become clear that it will be neither of those things. I do not have a problem with accepting review feedback from you or from anyone else. There are many examples in the archives of cases where I have improved my code based on review feedback; indeed, the possibility of getting high-quality feedback from the very smart developers who inhabit this mailing list is for me a principal attraction of working on PostgreSQL, not to mention a major reason why PostgreSQL is as good a product as it is. However, that general truth does not oblige me to accept any particular piece of feedback as well-founded, and this is hardly the first suggestion that I have argued against in four years as a PostgreSQL developer, nor the first of my patches to land on the cutting-room floor. Nor do all of my suggestions for other people's patches get adopted either. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 8 October 2012 21:35, Robert Haas <robertmhaas@gmail.com> wrote: > Hey, if me deciding I don't want to work on a patch any more is going > to make you feel slighted, then you're out of luck. The archives are > littered with people who have decided to stop working on things > because the consensus position on list was different than the approach > that they personally favored, and I have as much right to do that as > anyone else. Sure you do. However, I doubt very many of those who gave up did so over so trivial an issue as to how to grow a buffer somewhere, and those that did usually did not go on to become major contributors, and certainly not committers. The buffer thing is, as far as I know, the single solitary point of contention with this patch. We're talking about 2 lines of code. To give up now is just petulant. There is no other way of looking at it. > It is not as if anyone has phrased this as a maybe-we-should > sort of argument; there have been quite definite arguments on both > sides over apparently strongly-held positions. I had hoped that this > was going to be a quick and non-controversial win, but 74 email > messages later it has become clear that it will be neither of those > things. Many of those 74 emails concerned my completely unrelated digression into exploiting strxfrm(); we spent a ridiculously long time discussing how to size this buffer, but it still wasn't anything like 74 messages. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Peter Geoghegan <peter@2ndquadrant.com> writes: > On 8 October 2012 21:35, Robert Haas <robertmhaas@gmail.com> wrote: Gentlemen, you can't fight here. This is the War Room. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support