Thread: sortsupport for text
I decided to investigate the possible virtues of allowing "text" to use the sortsupport infrastructure, since strings are something people often want to sort. I generated 100,000 random alphanumeric strings, each 30 characters in length, loaded them into a single-column table, froze it, ran SELECT SUM(1) FROM tablename on it until it was fully cached, and then repeatedly quicksorted the table contents using my default locale (en_US.UTF8). I repeated this test a number of times, removing and recreating the data directory via initdb each time. The test was performed on my home desktop, which is running Fedora 14 (yeah, I know I should reinstall) and equipped with an AMD Athlon 5000 Dual-Core Processor. Here's the exact test query:

SELECT SUM(1) FROM (SELECT * FROM randomtext ORDER BY t) x;

On unpatched master, this takes about 416 ms (plus or minus a few). With the attached patch, it takes about 389 ms (plus or minus a very few), a speedup of about 7%.

I repeated the experiment using the C locale, like this:

SELECT SUM(1) FROM (SELECT * FROM randomtext ORDER BY t COLLATE "C") x;

Here, it takes about 202 ms with the patch, and about 231 ms on unpatched master, a savings of about 13%.

I also tried a larger data set of 5 million strings, with a heap sort using work_mem=256MB. Unfortunately, there was so much noise there that it was hard to get any useful measurements: the exact same code base, using the exact same test script (which started with an initdb), could perform quite differently on consecutive runs, perhaps because the choice of blocks chosen to contain the database itself affected the efficiency of reading and writing temporary files. I think it may be faster, and the results on the smaller data set argue that it should be faster, but I was unable to gather reproducible numbers. I did observe the following oprofile results on a run on this larger data set, on master:

12789  28.2686  libc-2.13.so  strcoll_l
 6802  15.0350  postgres      text_cmp
 5081  11.2310  postgres      comparetup_heap
 3510   7.7584  postgres      comparison_shim
 2892   6.3924  postgres      lc_collate_is_c
 2722   6.0167  no-vmlinux    /no-vmlinux
 2596   5.7382  postgres      varstr_cmp
 2517   5.5635  libc-2.13.so  __strlen_sse2
 2515   5.5591  libc-2.13.so  __memcpy_sse2
  968   2.1397  postgres      tuplesort_heap_siftup
  710   1.5694  postgres      bttextcmp
  664   1.4677  postgres      pg_detoast_datum_packed

Clearly, a lot of that is unnecessary. Doing lc_collate_is_c for every tuple is a complete waste; as is translating the collation OID to collate_t; this patch arranges to do those things just once per sort. The comparison_shim is also a waste. Considering all that, I had hoped for more like a 15-20% gain from this approach, but it didn't happen, I suppose because some of the instructions saved just resulted in more processor stalls. All the same, I'm inclined to think it's still worth doing.

I didn't attempt to handle the weirdness that is UTF-8 on Windows, since I don't develop on Windows. I thought when I wrote this code that I could just leave the comparator uninitialized and let the caller default to the shim implementation if the sort-support function didn't do anything. But I see now that PrepareSortSupportFromOrderingOp() feels that it's entitled to assume that the sort-support function will always fill in a comparator. Either that assumption needs to be changed, or the corresponding Windows code needs to be written, or the sort support function needs to call PrepareSortSupportComparisonShim() in this case.
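The core idea here - hoisting the collation decision out of the per-comparison path and into a once-per-sort setup step - can be illustrated outside PostgreSQL with a small standalone C sketch. The helper names and the "C"/"POSIX" check below are assumptions for illustration; this is not the patch code itself:

#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Per-comparison cost is now just the comparison itself. */
static int
cmp_strcoll(const void *a, const void *b)
{
    return strcoll(*(const char *const *) a, *(const char *const *) b);
}

static int
cmp_strcmp(const void *a, const void *b)
{
    return strcmp(*(const char *const *) a, *(const char *const *) b);
}

/*
 * Decide once, before sorting, whether the collation behaves like "C";
 * the per-tuple comparator then performs no collation lookups at all.
 * This is the sort of work the patch moves into the sortsupport setup
 * function instead of repeating it for every tuple.
 */
void
sort_strings(char **strings, size_t n, const char *collname)
{
    bool collate_is_c = (strcmp(collname, "C") == 0 ||
                         strcmp(collname, "POSIX") == 0);

    qsort(strings, n, sizeof(char *),
          collate_is_c ? cmp_strcmp : cmp_strcoll);
}

In the real code the comparator is also called directly rather than through the fmgr shim, which is the other per-comparison overhead visible in the profile above.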
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Mar 02, 2012 at 03:45:38PM -0500, Robert Haas wrote:
> SELECT SUM(1) FROM (SELECT * FROM randomtext ORDER BY t) x;
>
> On unpatched master, this takes about 416 ms (plus or minus a few).
> With the attached patch, it takes about 389 ms (plus or minus a very
> few), a speedup of about 7%.
>
> I repeated the experiment using the C locale, like this:
>
> SELECT SUM(1) FROM (SELECT * FROM randomtext ORDER BY t COLLATE "C") x;
>
> Here, it takes about 202 ms with the patch, and about 231 ms on
> unpatched master, a savings of about 13%.
>
> [oprofile report, further discussion]

Thanks for looking into this. Your patch is also a nice demonstration of sortsupport's ability to help with more than just fmgr overhead.

> Considering all that, I
> had hoped for more like a 15-20% gain from this approach, but it
> didn't happen, I suppose because some of the instructions saved just
> resulted in more processor stalls. All the same, I'm inclined to
> think it's still worth doing.

This is a border case, but I suggest that a 13% speedup on a narrowly-tailored benchmark, degrading to 7% in common configurations, is too meager to justify adopting this patch.

nm
Noah Misch <noah@leadboat.com> wrote:
> On Fri, Mar 02, 2012 at 03:45:38PM -0500, Robert Haas wrote:
>> SELECT SUM(1) FROM (SELECT * FROM randomtext ORDER BY t) x;
>> [13% faster with patch for C collation; 7% faster for UTF8]
>> I had hoped for more like a 15-20% gain from this approach, but
>> it didn't happen, I suppose because some of the instructions
>> saved just resulted in more processor stalls. All the same, I'm
>> inclined to think it's still worth doing.
>
> This is a border case, but I suggest that a 13% speedup on a
> narrowly-tailored benchmark, degrading to 7% in common
> configurations, is too meager to justify adopting this patch.

We use the C collation and have character strings in most indexes and ORDER BY clauses. Unless there are significant contraindications, I'm in favor of adopting this patch.

-Kevin
On Fri, Mar 2, 2012 at 8:45 PM, Robert Haas <robertmhaas@gmail.com> wrote: > 12789 28.2686 libc-2.13.so strcoll_l > 6802 15.0350 postgres text_cmp I'm still curious how it would compare to call strxfrm and sort the resulting binary blobs. I don't think the sortsupport stuff actually makes this any easier though. Since using it requires storing the binary blob somewhere I think the support would have to be baked into tuplesort (or hacked into the sortkey as an expr that was evaluated earlier somehow). It's a tradeoff and not an obvious one. The binary blobs are larger and it would mean reading and copying more data around memory. But it would mean doing the work that strcoll_l does only n times instead of nlogn times. That might be a pretty significant gain. -- greg
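For reference, the transform-then-compare pattern Greg is describing looks roughly like the following standalone C sketch. It is not tied to tuplesort, buffer handling is simplified, and the helper names are invented for illustration:

#include <stdlib.h>
#include <string.h>

/*
 * Transform each string once with strxfrm(); after that, plain strcmp()
 * on the transformed blobs gives the same ordering as strcoll() on the
 * originals.  The O(n) transforms replace the O(n log n) strcoll() calls
 * a comparison sort would otherwise make.
 */
static char *
make_sort_key(const char *s)
{
    size_t  needed = strxfrm(NULL, s, 0);   /* ask for the required size */
    char   *key = malloc(needed + 1);

    if (key != NULL)
        strxfrm(key, s, needed + 1);
    return key;
}

static int
cmp_keys(const void *a, const void *b)
{
    return strcmp(*(char *const *) a, *(char *const *) b);
}

One would build an array of keys with make_sort_key() up front and qsort() that array with cmp_keys(); the space cost of keeping every transformed blob resident for the whole sort is exactly the trade-off discussed in the replies below.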
Greg Stark <stark@mit.edu> writes: > I'm still curious how it would compare to call strxfrm and sort the > resulting binary blobs. In principle that should be a win; it's hard to believe that strxfrm would have gotten into the standards if it were not a win for sorting applications. > I don't think the sortsupport stuff actually > makes this any easier though. Since using it requires storing the > binary blob somewhere I think the support would have to be baked into > tuplesort (or hacked into the sortkey as an expr that was evaluated > earlier somehow). Well, obviously something has to be done, but I think it might be possible to express this as another sortsupport API function rather than doing anything as ugly as hardwiring strxfrm into the callers. However, it occurred to me that we could pretty easily jury-rig something that would give us an idea about the actual benefit available here. To wit: make a C function that wraps strxfrm, basically strxfrm(text) returns bytea. Then compare the performance of ORDER BY text_col to ORDER BY strxfrm(text_col). (You would need to have either both or neither of text and bytea using the sortsupport code paths for this to be a fair comparison.) One other thing I've always wondered about in this connection is the general performance of sorting toasted datums. Is it better to detoast them in every comparison, or pre-detoast to save comparison cycles at the cost of having to push much more data around? I didn't see any discussion of this point in Robert's benchmarks, but I don't think we should go very far towards enabling sortsupport for text until we understand the issue and know whether we need to add more infrastructure for it. If you cross your eyes a little bit, this is very much like the strxfrm question... regards, tom lane
On Sat, Mar 17, 2012 at 6:58 PM, Greg Stark <stark@mit.edu> wrote:
> On Fri, Mar 2, 2012 at 8:45 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> 12789 28.2686 libc-2.13.so strcoll_l
>> 6802 15.0350 postgres text_cmp
>
> I'm still curious how it would compare to call strxfrm and sort the
> resulting binary blobs. I don't think the sortsupport stuff actually
> makes this any easier though. Since using it requires storing the
> binary blob somewhere I think the support would have to be baked into
> tuplesort (or hacked into the sortkey as an expr that was evaluated
> earlier somehow).

Well, the real problem here is that the strxfrm'd representations aren't just bigger - they are huge. On MacBook Pro, if the input representation is n characters, the strxfrm'd representation is 9n+3 characters. If we're sorting very wide tuples of which the sort key is only a small part, maybe that would be acceptable, but if the sort key makes up the bulk of the tuple, then caching the strxfrm() representation works out to slashing work_mem tenfold. That might be just fine if the sort is going to fit in work_mem either way, but if it turns a quicksort into a heap sort then I feel pretty confident that it's going to be a loser. Keep in mind that even if the strxfrm'd representation were no larger at all, it would still amount to an additional copy of the data, so you'd still potentially be eating up lots of work_mem that way. The fact that it's an order of magnitude larger is just making a bad problem worse.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sun, Mar 18, 2012 at 11:08 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > However, it occurred to me that we could pretty easily jury-rig > something that would give us an idea about the actual benefit available > here. To wit: make a C function that wraps strxfrm, basically > strxfrm(text) returns bytea. Then compare the performance of > ORDER BY text_col to ORDER BY strxfrm(text_col). > > (You would need to have either both or neither of text and bytea > using the sortsupport code paths for this to be a fair comparison.) Since the index will be ~9x bigger at least on this machine, I think I know the answer, but I suppose it doesn't hurt to test it. It's not that much work. > One other thing I've always wondered about in this connection is the > general performance of sorting toasted datums. Is it better to detoast > them in every comparison, or pre-detoast to save comparison cycles at > the cost of having to push much more data around? I didn't see any > discussion of this point in Robert's benchmarks, but I don't think we > should go very far towards enabling sortsupport for text until we > understand the issue and know whether we need to add more infrastructure > for it. If you cross your eyes a little bit, this is very much like > the strxfrm question... It would be surprising to me if there is a one-size-fits-all answer to this question. For example, if the tuple got toasted because it's got lots of columns and we had to take desperate measures to get it to fit into an 8kB block at all, chances are that detoasting will work out well. We'll use a bit more memory, but hopefully that'll be repaid by much faster comparisons. OTOH, if you have a data set with many relatively short strings and a few very long ones, detoasting up-front could turn a quicksort into a heapsort. Since only a small fraction of the comparisons would have involved one of the problematic long strings anyway, it's unlikely to be worth the expense of keeping those strings around in detoasted form for the entire sort (unless maybe reconstructing them even a few times is problematic because we're under heavy cache pressure and we get lots of disk seeks as a result). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Mon, Mar 19, 2012 at 12:19:53PM -0400, Robert Haas wrote:
> On Sat, Mar 17, 2012 at 6:58 PM, Greg Stark <stark@mit.edu> wrote:
> > I'm still curious how it would compare to call strxfrm and sort the
> > resulting binary blobs. I don't think the sortsupport stuff actually
> > makes this any easier though. Since using it requires storing the
> > binary blob somewhere I think the support would have to be baked into
> > tuplesort (or hacked into the sortkey as an expr that was evaluated
> > earlier somehow).
>
> Well, the real problem here is that the strxfrm'd representations
> aren't just bigger - they are huge. On MacBook Pro, if the input
> representation is n characters, the strxfrm'd representation is 9n+3
> characters.

Ouch. I was holding out hope that you could get a meaningful improvement if we could use the first X bytes of the strxfrm output, so you only need to do a strcoll on strings that actually nearly match. But with an information density of 9 bytes for one character it doesn't seem worthwhile.

That, and this gem in the strxfrm manpage:

    RETURN VALUE
        The strxfrm() function returns the number of bytes required to
        store the transformed string in dest, excluding the terminating
        '\0' character. If the value returned is n or more, the contents
        of dest are indeterminate.

Which means that you have to take the entire transformed string; you can't just ask for the first bit. I think that kind of leaves the whole idea dead in the water.

Just for interest I looked at the ICU API for this and they have the same restriction. There is another function which you can use to return partial sort keys (ucol_nextSortKeyPart), but it produces "uncompressed sortkeys", which it seems is what Mac OS X is doing, and which seems useless for our purposes.

Either this is a hard problem or we're nowhere near a target use case.

Have a nice day,
-- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts. -- Arthur Schopenhauer
On Mon, Mar 19, 2012 at 9:23 PM, Martijn van Oosterhout <kleptog@svana.org> wrote: > Ouch. I was holding out hope that you could get a meaningful > improvement if we could use the first X bytes of the strxfrm output so > you only need to do a strcoll on strings that actually nearly match. > But with an information density of 9 bytes for one 1 character it > doesn't seem worthwhile. When I was playing with glibc it was 4n. I think what they do is have n bytes for the high order bits, then n bytes for low order bits like capitalization or whitespace differences. I suspect they used to use 16 bits for each and have gone to some larger size. > That and this gem in the strxfrm manpage: > > RETURN VALUE > The strxfrm() function returns the number of bytes required to > store the transformed string in dest excluding the terminating > '\0' character. If the value returned is n or more, the > contents of dest are indeterminate. > > Which means that you have to take the entire transformed string, you > can't just ask for the first bit. I think that kind of leaves the whole > idea dead in the water. I believe the intended API is that you allocate a buffer with your guess of the right size, call strxfrm and if it returns a larger number you realloc your buffer and call it again. -- greg
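The retry pattern Greg describes - guess a size, call strxfrm(), and call it again with a bigger buffer if the return value says the guess was too small - looks roughly like this minimal sketch (plain C, function name invented, error handling kept to the essentials):

#include <stdlib.h>
#include <string.h>

/*
 * Transform "src" into a freshly allocated sort key, retrying once if
 * the initial guess at the buffer size turns out to be too small.
 */
static char *
xfrm_with_retry(const char *src, size_t guess)
{
    char   *buf = malloc(guess);
    size_t  needed;

    if (buf == NULL)
        return NULL;

    needed = strxfrm(buf, src, guess);
    if (needed >= guess)
    {
        /* Contents of buf are indeterminate; grow it and transform again. */
        char *bigger = realloc(buf, needed + 1);

        if (bigger == NULL)
        {
            free(buf);
            return NULL;
        }
        buf = bigger;
        strxfrm(buf, src, needed + 1);
    }
    return buf;
}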
On 18 March 2012 15:08, Tom Lane <tgl@sss.pgh.pa.us> wrote: > One other thing I've always wondered about in this connection is the > general performance of sorting toasted datums. Is it better to detoast > them in every comparison, or pre-detoast to save comparison cycles at > the cost of having to push much more data around? I didn't see any > discussion of this point in Robert's benchmarks, but I don't think we > should go very far towards enabling sortsupport for text until we > understand the issue and know whether we need to add more infrastructure > for it. If you cross your eyes a little bit, this is very much like > the strxfrm question... I see the parallels. I note that glibc's strcoll_l() is implemented entirely in C (strcoll() itself is implemented in terms of strcoll_l() ), whereas the various strcmp.S are written in hand-optimized assembler, with SSE3 instructions in the "Highly optimized version for x86-64", for example. I wonder just how important a factor that is. I suppose the reason why the glibc guys haven't just done something equivalent internally might be that they much prefer to perform the comparison in-place, due to the need to target a conservative lowest common denominator...or it could be because it just doesn't matter that much. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On Thu, Jun 14, 2012 at 11:36 AM, Peter Geoghegan <peter@2ndquadrant.com> wrote: > On 18 March 2012 15:08, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> One other thing I've always wondered about in this connection is the >> general performance of sorting toasted datums. Is it better to detoast >> them in every comparison, or pre-detoast to save comparison cycles at >> the cost of having to push much more data around? I didn't see any >> discussion of this point in Robert's benchmarks, but I don't think we >> should go very far towards enabling sortsupport for text until we >> understand the issue and know whether we need to add more infrastructure >> for it. If you cross your eyes a little bit, this is very much like >> the strxfrm question... > > I see the parallels. The problem with pre-detoasting to save comparison cycles is that you can now fit many, many fewer tuples in work_mem. There might be cases where it wins (for example, because the entire data set fits even after decompressing everything) but in most cases it seems like a loser. Also, my guess is that most values people sort by are pretty short, making this concern mostly academic. Suppose you are sorting a bunch of strings which might be either 100 characters in length or 1MB. If they're all 100 characters, you probably don't need to detoast. If they're all 1MB, you probably can't detoast without eating up a ton of memory (and even if you have it, this might not be the best use for it). If you have a mix, detoasting might be affordable provided that the percentage of long strings is small, but it's also not going to save you much, because if the percentage of long strings is small, then most comparisons will be between two short strings where we don't save anything anyway. All things considered, this seems to me to be aiming at a pretty narrow target, but maybe I'm just not thinking about it creatively enough. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 14 June 2012 17:35, Robert Haas <robertmhaas@gmail.com> wrote: > The problem with pre-detoasting to save comparison cycles is that you > can now fit many, many fewer tuples in work_mem. There might be cases > where it wins (for example, because the entire data set fits even > after decompressing everything) but in most cases it seems like a > loser. If I had to guess, I'd say you're probably right about that - optimising sorting toasted text doesn't seem like a terribly sensible use of your time. What about the strxfrm suggestion of Greg's? You might find that the added benefit of being able to avail of a highly optimised strcmp() tipped the balance in favour of that idea, beyond the simple fact that there's only a linear number of what you might loosely call "strcoll_l units of work" rather than as many as O(n ^ 2). Furthermore, I'd speculate that if you were to interlace the strxfrm() calls with copying each text string, somewhere like within a specialised datumCopy(), that would make the approach more efficient still, as you specify a location for the blob in the just-palloc()'d leading-key private memory directly, rather than just using memcpy. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On 2 March 2012 20:45, Robert Haas <robertmhaas@gmail.com> wrote:
> I decided to investigate the possible virtues of allowing "text" to
> use the sortsupport infrastructure, since strings are something people
> often want to sort.

I should mention up-front that I agree with the idea that it is worth optimising text sorting because it is a very common thing to have to do, and therefore the standard for inclusion ought to be lower. I don't intend to talk about tapesort though - that isn't really fair, not least because I have some serious doubts about the quality of our implementation. Furthermore, I think that it is logical that doing things like resolving collations occur within a preparatory function in advance of sorting, rather than redundantly doing that for each and every comparison.

Why have you made the reusable buffer managed by sortsupport TEXTBUFLEN-aligned? The existing rationale for that constant (whose value is 1024) does not seem to carry forward here:

 * This should be large enough that most strings will fit, but small
 * enough that we feel comfortable putting it on the stack.

ISTM it would be on average worth the hit of having to repalloc a few more times for larger strings by making that buffer much smaller initially, and doubling its size each time that proved insufficient, rather than increasing its size to the smallest possible TEXTBUFLEN-aligned size that you can get away with for the immediately subsequent memcpy. Realistically, any database I've ever worked with would probably be able to fit a large majority of its text strings into 16 chars of memory - you yourself said that sorting toasted text isn't at all common.

-- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On Thu, Jun 14, 2012 at 1:56 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote: > On 14 June 2012 17:35, Robert Haas <robertmhaas@gmail.com> wrote: >> The problem with pre-detoasting to save comparison cycles is that you >> can now fit many, many fewer tuples in work_mem. There might be cases >> where it wins (for example, because the entire data set fits even >> after decompressing everything) but in most cases it seems like a >> loser. > > If I had to guess, I'd say you're probably right about that - > optimising sorting toasted text doesn't seem like a terribly sensible > use of your time. > > What about the strxfrm suggestion of Greg's? You might find that the > added benefit of being able to avail of a highly optimised strcmp() > tipped the balance in favour of that idea, beyond the simple fact that > there's only a linear number of what you might loosely call "strcoll_l > units of work" rather than as many as O(n ^ 2). Furthermore, I'd > speculate that if you were to interlace the strxfrm() calls with > copying each text string, somewhere like within a specialised > datumCopy(), that would make the approach more efficient still, as you > specify a location for the blob in the just-palloc()'d leading-key > private memory directly, rather than just using memcpy. Well, it's still got the problem of blowing up memory usage. I just can't get excited about optimizing for the case where we can consume 10x the memory and still fit in work_mem. If we've got that case, the sort is gonna be pretty fast anyway. The case where preprocessing wins is when there are going to be a large number of comparisons against each tuple - i.e. lg(N) is large. But the cases where we could pre-transform with strxfrm are those where the data fits in a small percentage of work mem - i.e. lg(N) is small. I'm open to somebody showing up with a test result that demonstrates that it's worthwhile, but to me it seems like it's chasing diminishing returns. The point of this patch isn't really to improve things for the collation-aware case, although it's nice that it does. The point is rather to shave off a double-digit percentage off the time it takes to do the sort required to build a C-collation index, which is what people should be using when they don't care about < and >, which most don't. Despite Tom's concerns, I don't think there's anything in this patch that can't be fairly easily revised at a later date if we decide we want a different API. I think it's worth picking the low-hanging fruit in the meantime. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Thu, Jun 14, 2012 at 2:10 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
> Why have you made the reusable buffer managed by sortsupport
> TEXTBUFLEN-aligned? The existing rationale for that constant (whose
> value is 1024) does not seem to carry forward here:
>
> * This should be large enough that most strings will fit, but small
> * enough that we feel comfortable putting it on the stack.

Well, as the comments explain:

+	/*
+	 * We avoid repeated palloc/pfree on long strings by keeping the buffers
+	 * we allocate around for the duration of the sort.  When we expand them,
+	 * we round off to the next multiple of TEXTBUFLEN in order to avoid
+	 * repeatedly expanding them by very small amounts.
+	 */

> ISTM it would be on average worth the hit of having to repalloc a few
> more times for larger strings by making that buffer much smaller
> initially, and doubling its size each time that proved insufficient,
> rather than increasing its size to the smallest possible
> TEXTBUFLEN-aligned size that you can get away with for the immediately
> subsequent memcpy. Realistically, any database I've ever worked with
> would probably be able to fit a large majority of its text strings
> into 16 chars of memory - you yourself said that sorting toasted text
> isn't at all common.

I thought that doubling repeatedly would be overly aggressive in terms of memory usage. Blowing the buffers out to 8kB because we hit a string that's a bit over 4kB isn't so bad, but blowing them out to 256MB because we hit a string that's a bit over 128MB seems a bit excessive. Also, I don't think it really saves anything from a performance point of view.

The worst case for this is if we're asked to repeatedly compare strings that expand the buffer by a kilobyte each time. First, how likely is that to happen in a real world test case? In many cases, most of the input strings will be of approximately equal length; also in many cases, that length will be short; even if the lengths take the worst possible distribution, we have to hit them in the worst possible order for this to be a problem. Second, even if it does happen, does it really matter? Suppose we expand the buffer a kilobyte at a time from an initial size of 1kB all the way out to 256MB. That's 256,000 palloc calls, so we must be sorting at least 256,000 datums, at least 128,000 of which are longer than 128MB. I think the cost of calling memcpy() and strcoll() repeatedly on all those long datums - not to mention repeatedly detoasting them - is going to bludgeon the palloc overhead into complete insignificance.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
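To make the policy Robert is describing concrete, here is a rough sketch of the growth rule; the function name and the use of malloc/free (rather than palloc/pfree) are simplifications for illustration, not the patch code:

#include <stdlib.h>

#define TEXTBUFLEN 1024

/*
 * Grow "buf" just enough to hold "needed" bytes, rounding the new size up
 * to the next multiple of TEXTBUFLEN so a run of slightly-longer strings
 * doesn't force an enlargement for every single tuple.
 */
static char *
ensure_capacity(char *buf, size_t *capacity, size_t needed)
{
    if (needed > *capacity)
    {
        size_t newcap = ((needed + TEXTBUFLEN - 1) / TEXTBUFLEN) * TEXTBUFLEN;

        free(buf);              /* old contents need not be preserved */
        buf = malloc(newcap);
        *capacity = newcap;
    }
    return buf;
}

Note that the old contents are simply discarded rather than copied, because each comparison copies the datums into the buffers afresh anyway; this is the point Robert makes below about there being no std::vector-style copying cost.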
On 14 June 2012 19:28, Robert Haas <robertmhaas@gmail.com> wrote: > I thought that doubling repeatedly would be overly aggressive in terms > of memory usage. Blowing the buffers out to 8kB because we hit a > string that's a bit over 4kB isn't so bad, but blowing them out to > 256MB because we hit a string that's a bit over 128MB seems a bit > excessive. That's pretty much what all popular dynamic array implementations do, from C++'s std::vector to Python's list (it's a misnomer). Having to allocate 256MB for a buffer to contain a string a bit over 128MB may seem excessive, until you later get a 250MB string. Even if doubling is generally excessive, which I doubt, that's beside the point, which is that expanding the array by some constant proportion results in each insertion taking amortized constant time. > Suppose we expand the buffer a > kilobyte at a time from an initial size of 1kB all the way out to > 256MB. That's 256,000 palloc calls, so we must be sorting at least > 256,000 datums, at least 128,000 of which are longer than 128MB. I > think the cost of calling memcpy() and strcoll() repeatedly on all > those long datums - not to mention repeatedly detoasting them - is > going to bludgeon the palloc overhead into complete insignificance. I fail to understand how this sortsupport buffer fundamentally differs from a generic dynamic array abstraction built to contain chars. That being the case, I see no reason not to just do what everyone else does when expanding dynamic arrays, and no reason why we shouldn't make essentially the same time-space trade-off here as others do elsewhere. Another concern is that it seems fairly pointless to have two buffers. Wouldn't it be more sensible to have a single buffer that was partitioned to make two logical, equally-sized buffers, given that in general each buffer is expected to grow at exactly the same rate? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
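The alternative Peter is advocating is the usual geometric expansion, roughly as follows (again a sketch with invented names, not patch code); doubling bounds the number of enlargements at lg(max length) and gives amortized constant cost per enlargement:

#include <stdlib.h>

/* Grow "buf" by doubling until it can hold "needed" bytes. */
static char *
ensure_capacity_doubling(char *buf, size_t *capacity, size_t needed)
{
    if (needed > *capacity)
    {
        size_t newcap = (*capacity > 0) ? *capacity : 16;

        while (newcap < needed)
            newcap *= 2;

        free(buf);              /* contents still need not be preserved */
        buf = malloc(newcap);
        *capacity = newcap;
    }
    return buf;
}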
On Thu, Jun 14, 2012 at 3:24 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
> On 14 June 2012 19:28, Robert Haas <robertmhaas@gmail.com> wrote:
>> I thought that doubling repeatedly would be overly aggressive in terms
>> of memory usage. Blowing the buffers out to 8kB because we hit a
>> string that's a bit over 4kB isn't so bad, but blowing them out to
>> 256MB because we hit a string that's a bit over 128MB seems a bit
>> excessive.
>
> That's pretty much what all popular dynamic array implementations do,
> from C++'s std::vector to Python's list (it's a misnomer). Having to
> allocate 256MB for a buffer to contain a string a bit over 128MB may
> seem excessive, until you later get a 250MB string. Even if doubling
> is generally excessive, which I doubt, that's beside the point, which
> is that expanding the array by some constant proportion results in
> each insertion taking amortized constant time.

Yeah, but *it doesn't matter*. If you test this on strings that are long enough that they get pushed out to TOAST, you'll find that it doesn't measurably improve performance, because the overhead of detoasting so completely dominates any savings on the palloc side that you can't pick them out of the inter-run noise. Risking eating up an extra 100MB of memory that we don't really need in order to obtain a performance optimization that is far too small to measure does not make sense.

The case with std::vector is not analogous; they don't have any way of knowing what other overhead you are incurring between insertions into the vector, so it's reasonable to suppose that the cost of the vector insertions themselves might be material. Here we know that it doesn't matter, so the application of Knuth's first law of optimization is appropriate.

>> Suppose we expand the buffer a
>> kilobyte at a time from an initial size of 1kB all the way out to
>> 256MB. That's 256,000 palloc calls, so we must be sorting at least
>> 256,000 datums, at least 128,000 of which are longer than 128MB. I
>> think the cost of calling memcpy() and strcoll() repeatedly on all
>> those long datums - not to mention repeatedly detoasting them - is
>> going to bludgeon the palloc overhead into complete insignificance.
>
> I fail to understand how this sortsupport buffer fundamentally differs
> from a generic dynamic array abstraction built to contain chars. That
> being the case, I see no reason not to just do what everyone else does
> when expanding dynamic arrays, and no reason why we shouldn't make
> essentially the same time-space trade-off here as others do elsewhere.
>
> Another concern is that it seems fairly pointless to have two buffers.
> Wouldn't it be more sensible to have a single buffer that was
> partitioned to make two logical, equally-sized buffers, given that in
> general each buffer is expected to grow at exactly the same rate?

Sure, but it would be making the code more complicated in return for no measurable performance benefit. We generally avoid that.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 14 June 2012 20:32, Robert Haas <robertmhaas@gmail.com> wrote: > Yeah, but *it doesn't matter*. If you test this on strings that are > long enough that they get pushed out to TOAST, you'll find that it > doesn't measurably improve performance, because the overhead of > detoasting so completely dominates any savings on the palloc side that > you can't pick them out of the inter-run noise. That's probably true, but it's also beside the point. As recently as a few hours ago, you yourself said "my guess is that most values people sort by are pretty short, making this concern mostly academic". Why are you getting hung up on toasting now? > Here we know that it doesn't matter, so the application of Knuth's first law > of optimization is appropriate. I'm not advocating some Byzantine optimisation, or even something that could reasonably be described as an optimisation at all here. I'm questioning why you've unnecessarily complicated the code by having the buffer size just big enough to fit the biggest value seen so far, but arbitrarily aligned to a value that is completely irrelevant to bttextfastcmp_locale(), rather than using simple geometric expansion, which is more or less the standard way of managing the growth of a dynamic array. You have to grow the array in some way. The basic approach I've outlined has something to recommend it - why does it make sense to align the size of the buffer to TEXTBUFLEN in particular though? It's quite easy to imagine what you've done here resulting in an excessive number of allocations (and pfree()s), which *could* be expensive. If you're so conservative about allocating memory, don't grow the array at quite so aggressive a rate as doubling it each time. There is a trade-off between space and time to be made here, but I don't know why you think that the right choice is to use almost the smallest possible amount of memory in all cases. >> Another concern is that it seems fairly pointless to have two buffers. >> Wouldn't it be more sensible to have a single buffer that was >> partitioned to make two logical, equally-sized buffers, given that in >> general each buffer is expected to grow at exactly the same rate? > > Sure, but it would be making the code more complicated in return for > no measurable performance benefit. We generally avoid that. Fair enough. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On Thu, Jun 14, 2012 at 6:30 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
>> Here we know that it doesn't matter, so the application of Knuth's first law
>> of optimization is appropriate.
>
> I'm not advocating some Byzantine optimisation, or even something that
> could reasonably be described as an optimisation at all here. I'm
> questioning why you've unnecessarily complicated the code by having
> the buffer size just big enough to fit the biggest value seen so far,
> but arbitrarily aligned to a value that is completely irrelevant to
> bttextfastcmp_locale(), rather than using simple geometric expansion,
> which is more or less the standard way of managing the growth of a
> dynamic array.

Well, so... palloc, and most malloc implementations, don't actually just double every time forever. They double up to some relatively small number like 1MB, and then they expand in 1MB chunks. std::vector may well behave as you're describing, but that's doing something completely different. We're not accumulating a string of unknown length into a buffer, with the risk of having to copy the whole buffer every time we reallocate. We know the exact size of the buffer we need for any given string, so the cost of a pfree/palloc is only the cost of releasing and allocating memory; there is no additional copying cost, as there would be for std::vector.

Look at it another way. Right now, the worst case number of temporary buffer allocations is about 2N lg N, assuming we do about N lg N comparisons during a sort. If we simply implemented the most naive buffer reallocation algorithm possible, we might in the worst case reallocate each buffer up to N times. That is already better than what we do now by a factor of lg N. If we suppose N is about a million, that's an improvement of 20x *in the worst case* - typically, the improvement will be much larger, because all the strings will be of similar length or we won't hit them in exactly increasing order of length. The algorithm I've actually implemented bounds the worst case 1024 times more tightly - given N strings, we can't need to enlarge the buffer more than N/1024 times. So with N of around a million, this algorithm should eliminate *at least* 99.995% of the pallocs we currently do. How much better does it need to be?

> You have to grow the array in some way. The basic approach I've
> outlined has something to recommend it - why does it make sense to
> align the size of the buffer to TEXTBUFLEN in particular though?

There's nothing magic about TEXTBUFLEN as far as that goes. We could round to the nearest multiple of 8kB or whatever if that seemed better. But if we rounded to, say, the nearest 1MB, then someone sorting strings that are 2kB long would use 2MB of memory for these buffers. Considering that we ship with work_mem = 1MB, that could easily end up wasting almost twice work_mem. I don't see how you can view that as a trivial problem. It might be worth sucking that up if it seemed likely to dramatically improve performance, but it doesn't.

> It's
> quite easy to imagine what you've done here resulting in an excessive
> number of allocations (and pfree()s), which *could* be expensive.

Actually, I can't imagine that at all, per the above analysis. If I could, I might agree with you. :-)

> There is a trade-off between space and time to be made here, but I
> don't know why you think that the right choice is to use almost the
> smallest possible amount of memory in all cases.
Because I think that with the current implementation I can have my cake and eat it, too. I believe I've gotten essentially all the available speedup for essentially no memory wastage. If you can convince me that I've missed something and there is still a meaningful amount of palloc overhead left to be squeezed out, I'm all ears. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Peter Geoghegan <peter@2ndquadrant.com> writes: > On 14 June 2012 19:28, Robert Haas <robertmhaas@gmail.com> wrote: >> I thought that doubling repeatedly would be overly aggressive in terms >> of memory usage. > I fail to understand how this sortsupport buffer fundamentally differs > from a generic dynamic array abstraction built to contain chars. That > being the case, I see no reason not to just do what everyone else does > when expanding dynamic arrays, and no reason why we shouldn't make > essentially the same time-space trade-off here as others do elsewhere. I agree with Peter on this one; not only is double-each-time the most widespread plan, but it is what we do in just about every other place in Postgres that needs a dynamically expansible buffer. If you do it randomly differently here, readers of the code will be constantly stopping to wonder why it's different here and if that's a bug or not. (And from a performance standpoint, I'm not entirely convinced it's not a bug, anyway. Worst-case behavior could be pretty bad.) regards, tom lane
On Fri, Jun 15, 2012 at 12:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Peter Geoghegan <peter@2ndquadrant.com> writes: >> On 14 June 2012 19:28, Robert Haas <robertmhaas@gmail.com> wrote: >>> I thought that doubling repeatedly would be overly aggressive in terms >>> of memory usage. > >> I fail to understand how this sortsupport buffer fundamentally differs >> from a generic dynamic array abstraction built to contain chars. That >> being the case, I see no reason not to just do what everyone else does >> when expanding dynamic arrays, and no reason why we shouldn't make >> essentially the same time-space trade-off here as others do elsewhere. > > I agree with Peter on this one; not only is double-each-time the most > widespread plan, but it is what we do in just about every other place > in Postgres that needs a dynamically expansible buffer. If you do it > randomly differently here, readers of the code will be constantly > stopping to wonder why it's different here and if that's a bug or not. That could, of course, be addressed by adding a comment. > (And from a performance standpoint, I'm not entirely convinced it's not > a bug, anyway. Worst-case behavior could be pretty bad.) Instead of simply asserting that, could you respond to the specific points raised in my analysis? I think there's no way it can be bad. I am happy to be proven wrong, but I like to understand why it is that I am wrong before changing things. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Jun 15, 2012 at 12:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> (And from a performance standpoint, I'm not entirely convinced it's not >> a bug, anyway. Worst-case behavior could be pretty bad.) > Instead of simply asserting that, could you respond to the specific > points raised in my analysis? I think there's no way it can be bad. > I am happy to be proven wrong, but I like to understand why it is that > I am wrong before changing things. Maybe I missed something, but as far as I saw your argument was not that the performance wasn't bad but that the rest of the sort code would dominate the runtime anyway. I grant that entirely, but that doesn't mean that it's good for this piece of it to possibly have bad behavior. In any case, it seems to me that if the bottom line is that the performance of this piece of code isn't going to matter overall, then we might as well use the simplest, least surprising implementation we can. And I concur with Peter that a doubling-based approach meets that description, while what you've got here doesn't. regards, tom lane
On Fri, Jun 15, 2012 at 1:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Fri, Jun 15, 2012 at 12:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> (And from a performance standpoint, I'm not entirely convinced it's not >>> a bug, anyway. Worst-case behavior could be pretty bad.) > >> Instead of simply asserting that, could you respond to the specific >> points raised in my analysis? I think there's no way it can be bad. >> I am happy to be proven wrong, but I like to understand why it is that >> I am wrong before changing things. > > Maybe I missed something, but as far as I saw your argument was not that > the performance wasn't bad but that the rest of the sort code would > dominate the runtime anyway. I grant that entirely, but that doesn't > mean that it's good for this piece of it to possibly have bad behavior. That, plus the fact that not wasting memory in code paths where memory is at a premium seems important to me. I'm shocked that either of you think it's OK to overallocate by as much as 2X in a code path that's only going to be used when we're going through fantastic gyrations to make memory usage fit inside work_mem. The over-allocation by itself could easily exceed work_mem. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 15 June 2012 21:06, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Jun 15, 2012 at 1:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> On Fri, Jun 15, 2012 at 12:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>>> (And from a performance standpoint, I'm not entirely convinced it's not >>>> a bug, anyway. Worst-case behavior could be pretty bad.) >> >>> Instead of simply asserting that, could you respond to the specific >>> points raised in my analysis? I think there's no way it can be bad. >>> I am happy to be proven wrong, but I like to understand why it is that >>> I am wrong before changing things. >> >> Maybe I missed something, but as far as I saw your argument was not that >> the performance wasn't bad but that the rest of the sort code would >> dominate the runtime anyway. I grant that entirely, but that doesn't >> mean that it's good for this piece of it to possibly have bad behavior. > > That, plus the fact that not wasting memory in code paths where memory > is at a premium seems important to me. I'm shocked that either of you > think it's OK to overallocate by as much as 2X in a code path that's > only going to be used when we're going through fantastic gyrations to > make memory usage fit inside work_mem. The over-allocation by itself > could easily exceed work_mem. That seems pretty thin to me. We're talking about a couple of buffers whose ultimate size is only approximately just big enough to hold the largest text datum seen when sorting. Meanwhile, if it's the leading key we're dealing with (and of course, it usually will be), before exceeding work_mem all of the *entire* set of strings to be sorted are sitting in palloc()'d memory anyway. I'm surprised that you didn't immediately concede the point, to be honest. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Jun 15, 2012 at 1:45 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Maybe I missed something, but as far as I saw your argument was not that
>> the performance wasn't bad but that the rest of the sort code would
>> dominate the runtime anyway. I grant that entirely, but that doesn't
>> mean that it's good for this piece of it to possibly have bad behavior.
>
> That, plus the fact that not wasting memory in code paths where memory
> is at a premium seems important to me. I'm shocked that either of you
> think it's OK to overallocate by as much as 2X in a code path that's
> only going to be used when we're going through fantastic gyrations to
> make memory usage fit inside work_mem. The over-allocation by itself
> could easily exceed work_mem.

I would be concerned about this if it were per-sort-tuple wastage, but what I understood to be happening was that this was a single instance of an expansible buffer (per sort, perhaps, but still just one buffer). And, as you keep pointing out when it suits your argument, it's relatively uncommon to be sorting enormous values anyway. So no, I am not particularly worried about that. If you are, there are more important places to be concerned about allocation pad wastage, starting with palloc itself.

regards, tom lane
On 18 March 2012 15:08, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> However, it occurred to me that we could pretty easily jury-rig
> something that would give us an idea about the actual benefit available
> here. To wit: make a C function that wraps strxfrm, basically
> strxfrm(text) returns bytea. Then compare the performance of
> ORDER BY text_col to ORDER BY strxfrm(text_col).
>
> (You would need to have either both or neither of text and bytea
> using the sortsupport code paths for this to be a fair comparison.)

I thought this was an interesting idea, so decided to try it out for myself. I tried this out against master (not Robert's patch, per Tom's direction). The results were interesting:

[peter@peterlaptop strxfrm_test]$ pgbench postgres -T 60 -f sort_strxfrm.sql -n
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 2795
tps = 46.563970 (including connections establishing)
tps = 46.568234 (excluding connections establishing)

[peter@peterlaptop strxfrm_test]$ pgbench postgres -T 60 -f sort_reg.sql -n
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 2079
tps = 34.638838 (including connections establishing)
tps = 34.640665 (excluding connections establishing)

The first test executed the following query against the dellstore database:

select * from products order by strxfrm_test(actor) offset 10001;

The second:

select * from products order by actor offset 10001;

So, this was pretty good - an improvement that is completely independent of Robert's. Bear in mind, this simple demonstration adds additional fmgr overhead, which we have plenty of reason to believe could hurt things, besides which each call must allocate memory that could perhaps be avoided. In addition, I don't know enough about locale-aware sorting and related algorithms to have devised a test that would stress strxfrm()/strcoll() - these were all strings that could be represented as ASCII.

In light of this, I think there is a pretty strong case to be made for pre-processing text via strxfrm() as part of this patch. Thoughts?

-- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
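Peter's email does not include the wrapper itself, but following Tom's description, a plausible sketch of such a strxfrm_test(text) returns bytea function might look like the following. This is an assumption about how the attached test function could be written, not the actual attachment:

#include "postgres.h"
#include "fmgr.h"
#include "utils/builtins.h"
#include <string.h>

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(strxfrm_test);

/* strxfrm_test(text) returns bytea: the strxfrm() image of the input. */
Datum
strxfrm_test(PG_FUNCTION_ARGS)
{
	text	   *src = PG_GETARG_TEXT_PP(0);
	char	   *str = text_to_cstring(src);
	size_t		needed = strxfrm(NULL, str, 0);
	bytea	   *result = (bytea *) palloc(VARHDRSZ + needed + 1);

	strxfrm(VARDATA(result), str, needed + 1);
	SET_VARSIZE(result, VARHDRSZ + needed);
	pfree(str);
	PG_RETURN_BYTEA_P(result);
}

With a function like this declared via CREATE FUNCTION ... LANGUAGE C IMMUTABLE STRICT, the two pgbench scripts would simply contain the two queries shown above, so that bytea comparisons on the transformed blobs stand in for collation-aware text comparisons.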
The fly in the ointment for strxfrm() adoption may be the need to be consistent with this earlier behaviour:

commit 656beff59033ccc5261a615802e1a85da68e8fad
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Thu Dec 22 22:50:00 2005 +0000

    Adjust string comparison so that only bitwise-equal strings are considered
    equal: if strcoll claims two strings are equal, check it with strcmp, and
    sort according to strcmp if not identical.  This fixes inconsistent
    behavior under glibc's hu_HU locale, and probably under some other locales
    as well.  Also, take advantage of the now-well-defined behavior to speed
    up texteq, textne, bpchareq, bpcharne: they may as well just do a bitwise
    comparison and not bother with strcoll at all.

    NOTE: affected databases may need to REINDEX indexes on text columns to be
    sure they are self-consistent.

Here is the relevant code:

    /*
     * In some locales strcoll() can claim that nonidentical strings are
     * equal.  Believing that would be bad news for a number of reasons,
     * so we follow Perl's lead and sort "equal" strings according to
     * strcmp().
     */
    if (result == 0)
        result = strcmp(a1p, a2p);

I'm not sure I agree with this decision; why should we presume to know better than the glibc locale what constitutes equality? What are the number of reasons referred to? It seems very likely that the main one was the then-need to guard against poor quality qsort() implementations that went quadratic in the face of lots of duplicates, but we already removed a bunch of other such hacks, because of course we now control the qsort implementation used, and have since the year after this commit was made, 2006.

Obviously this decision was made a number of years ago now, and at least one person went on to rely on this behaviour, so it can only be revisited with that in mind. However, provided we are able to say "here is a compatibility ordering operator" to those that complain about this, and provided it is appropriately listed as a compatibility issue in the 9.3 release notes, I think it would be worth reverting this commit to facilitate strxfrm(). How many people: A) are using hu_HU or some other locale where this can happen? and B) will care?

Now, I'm sure that there is another workaround too, so this doesn't need to be a blocker even if it is absolutely unacceptable to revert - but I have to wonder if that's worth it. People don't have any business relying on a sort order that is consistent in any way other than the one they actually asked for. A few people still do even as we go blue in the face telling them not to, of course, but that's fairly normal.

-- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
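The behaviour in question boils down to a comparator of roughly this shape; this is a simplified sketch of what varstr_cmp does after copying the datums into NUL-terminated buffers, not the actual function:

#include <string.h>

/*
 * Compare two NUL-terminated strings in the current LC_COLLATE locale,
 * falling back to a bitwise comparison when the locale claims they are
 * "equal", so that only bitwise-identical strings compare as equal.
 */
static int
locale_cmp_with_tiebreak(const char *a, const char *b)
{
    int result = strcoll(a, b);

    if (result == 0)
        result = strcmp(a, b);   /* the 2005 tiebreak being questioned here */
    return result;
}

Dropping that strcmp() tiebreak is effectively what a pure strxfrm()-based sort would do, which is why the compatibility question above arises.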
Peter Geoghegan <peter@2ndquadrant.com> writes: > The fly in the ointment for strxfrm() adoption may be the need to be > consistent with this earlier behaviour: > if strcoll claims two strings are equal, check it with strcmp, and > sort according to strcmp if not identical. > I'm not sure I agree with this decision; why should we presume to know > better than the glibc locale what constitutes equality? The killer reason why it must be like that is that you can't use hash methods on text if text equality is some unknown condition subtly different from bitwise equality. My recollection is that there were some other problems as well, but I'm too lazy to search the archives for you. > It's seems very likely that the main > one was the then-need to guard against poor quality qsort() > implementations that went quadratic in the face of lots of duplicates, No, I don't recall that that had anything to do with it. regards, tom lane
On 17 June 2012 17:01, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I'm not sure I agree with this decision; why should we presume to know >> better than the glibc locale what constitutes equality? > > The killer reason why it must be like that is that you can't use hash > methods on text if text equality is some unknown condition subtly > different from bitwise equality. Fair enough, but I doubt that we need to revert the changes made in this commit to texteq in addition to the changes I'd like to see in order to be semantically self-consistent. That is because there is often a distinction made between equality and equivalence, and we could adopt this distinction. strcoll() could be said to be just making a representation that its two arguments are equivalent (and not necessarily equal) when it returns 0. This distinction is explicitly made in the C++ standard library, and failing to understand it can result in bugs: http://www.cplusplus.com/reference/algorithm/equal_range/ Note the use of the word "equivalent" rather than "equal" in the text. equal_range is a bit of a misnomer. This distinction is important enough to have warranted an entire subsection of the book "Effective STL" by Scott Meyers, a well-respected expert on the language. This comes up more often than you'd think - "std::set::insert" determines if an element already exists (to know if it must replace it) based on equivalency (usually, though not necessarily, defined in terms of operator< ), whereas the "find" algorithm finds elements based on equality (operator==). > My recollection is that there were > some other problems as well, but I'm too lazy to search the archives > for you. Fair enough. I'll search for it myself later. I'm about to head out now. >> It's seems very likely that the main >> one was the then-need to guard against poor quality qsort() >> implementations that went quadratic in the face of lots of duplicates, > > No, I don't recall that that had anything to do with it. Oh, okay. It looked very much like the "avoid equality at all costs" thing you still see some of in tuplesort.c . -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Peter Geoghegan <peter@2ndquadrant.com> writes: > On 17 June 2012 17:01, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The killer reason why it must be like that is that you can't use hash >> methods on text if text equality is some unknown condition subtly >> different from bitwise equality. > Fair enough, but I doubt that we need to revert the changes made in > this commit to texteq in addition to the changes I'd like to see in > order to be semantically self-consistent. That is because there is > often a distinction made between equality and equivalence, and we > could adopt this distinction. How exactly do you plan to shoehorn that into SQL? You could invent some nonstandard "equivalence" operator I suppose, but what will be the value? We aren't going to set things up in such a way that we can't use hash join or hash aggregation in queries that use the regular "=" operator. IMO there just aren't going to be enough people who care to use a non-default operator. regards, tom lane
On Jun 17, 2012 5:50 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
>
> Peter Geoghegan <peter@2ndquadrant.com> writes:
> > On 17 June 2012 17:01, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> How exactly do you plan to shoehorn that into SQL? You could invent
> some nonstandard "equivalence" operator I suppose, but what will be the
> value? We aren't going to set things up in such a way that we can't
> use hash join or hash aggregation in queries that use the regular "="
> operator.

Right, most people won't care. You may or may not want a new operator for equivalency. The regular operator for equality doesn't have to and shouldn't change. It is both useful and conceptually clean to not guarantee that a comparator can be relied upon to indicate equality and not just equivalency.
Peter Geoghegan <peter@2ndquadrant.com> writes: > Right, most people won't care. You may or may not want a new > operator for equivalency. The regular operator for equality doesn't have to > and shouldn't change. It is both useful and conceptually clean to not > guarantee that a comparator can be relied upon to indicate equality and not > just equivalency. Sure, and in general we only expect that "=" operators mean equivalency; a concrete example is float8 "=", which on IEEE-spec machines will say that zero and minus zero are equal. The trick for hashing such datatypes is to be able to guarantee that "equal" values hash to the same hash code, which is typically possible as long as you know the equality rules well enough. We could possibly do that for text with pure-strcoll equality if we knew all the details of what strcoll would consider "equal", but we do not. See also citext for an example of a datatype where we can manage to treat distinct textual values as equal and still hash them. regards, tom lane
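[A small C illustration of the hashing requirement described here, in the style of citext (case-insensitive equality): whatever normalisation the equality rule applies must also be applied before hashing, so "equal" values cannot hash differently. hash_bytes() is a stand-in for whichever hash function is actually used, and the helper name is made up:]

#include <ctype.h>
#include <stdlib.h>
#include <string.h>

extern unsigned int hash_bytes(const unsigned char *k, int keylen); /* assumed helper */

static unsigned int
case_insensitive_hash(const char *s)
{
    size_t        len = strlen(s);
    char         *folded = malloc(len + 1);
    unsigned int  h;
    size_t        i;

    /* Normalise exactly the way the equality rule does (here: case folding). */
    for (i = 0; i <= len; i++)
        folded[i] = (char) tolower((unsigned char) s[i]);

    h = hash_bytes((const unsigned char *) folded, (int) len);
    free(folded);
    return h;
}

The analogous trick for pure-strcoll equality would require knowing precisely which strings the locale considers equal, which is the missing piece.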
On 17 June 2012 21:26, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Sure, and in general we only expect that "=" operators mean equivalency; > a concrete example is float8 "=", which on IEEE-spec machines will say > that zero and minus zero are equal. Right; the spec says that, and we punt to the spec. No one sensible thinks that minus zero and zero floats are not equal, by virtue of the fact that if you compare them in C using ==, it evaluates to true. Equivalency as I use the term can't really be talked about without talking about sorting or less than operators. > The trick for hashing such datatypes is to be able to guarantee that > "equal" values hash to the same hash code, which is typically possible > as long as you know the equality rules well enough. We could possibly > do that for text with pure-strcoll equality if we knew all the details > of what strcoll would consider "equal", but we do not. I see. I am tentatively suggesting that we don't change the definition of equal, but allow that equivalent text values may not be equal. > See also citext for an example of a datatype where we can manage to > treat distinct textual values as equal and still hash them. I'm not talking about textually distinct values being equal/equivalent. That doesn't quite capture it (although I suppose a corollary of what I am trying to express is that equivalent though non-equal values should invariably have different textual representations in Postgres). I think a good example that illustrates the difference between equivalency and equality is the German ß character (esszet), which is regarded as equivalent to 'ss' (two 's' characters). The following German words are sorted correctly as required by de_DE.UTF-8:

Arg
Ärgerlich
Arm
Assistant
Aßlar
Assoziation

So if you take the word "Aßlar" here - that is equivalent to "Asslar", and so strcoll("Aßlar", "Asslar") will return 0 if you have the right LC_COLLATE (if you tried this out for yourself and found that I was actually lying through my teeth, pretend I said Hungarian instead of German and "some really obscure character" rather than ß). It seems a bit unsatisfactory to me that a unique constraint will happily accept these two equivalent strings because they're not bitwise identical, when the collation was supposed to have that normalisation baked in. At the same time, I find it intuitively obvious that "Aßlar" == "Asslar" is false, and at the very least I think that very few people would not grant that it is a not unreasonable state of affairs for both of those two things to hold, at least on the face of it (presuming that I wasn't actually lying about the particulars of this, which I was, since this is apparently confined to less popular locales and I'm too lazy to research the exact details). Now, I do realise that there is what might appear to be a tension in what I'm saying; it is precisely the fact that we can traverse a btree index using comparators that allows a btree to satisfy an equality condition (or an equivalency condition; however you choose to characterise whatever it is that the '=' operator does). To restate the problem: The '=' operator implies equivalency and not equality. Or does it? Look at this code from nbtutils.c's _bt_checkkeys() function:

test = FunctionCall2Coll(&key->sk_func, key->sk_collation,
                         datum, key->sk_argument);

if (!DatumGetBool(test))
{
    /*
     * Tuple fails this qual. If it's a required qual for the current
     * scan direction, then we can conclude no further tuples will
     * pass, either.
     *
     * Note: because we stop the scan as soon as any required equality
     * qual fails, it is critical that equality quals be used for the
     * initial positioning in _bt_first() when they are available. See
     * comments in _bt_first().
     */
    ***SNIP***

The qual is verified for the index tuple itself (the test return value lets us know if it matches) - the equality operator is actually called, and is actually re-verified via texteq(). So what appears to happen is that the btree code finds equivalent tuples, and then, knowing that all pairs of equal tuples are equivalent (but not necessarily the inverse), checks that it actually has a tuple that's equal/satisfies the qual. Makes sense, and by this standard I'd judge that '=' was actually an equality operator that sometimes took advantage of equivalency purely as an implementation detail, but I'm not familiar enough with that part of the code to have any degree of confidence that I haven't made a leap here that I shouldn't have. ISTM if '=' was really a mere equivalency operator, we'd only ever check (a < b && b < a) in the btree code. It occurs to me that we might also have a new equivalency operator (=== or something) so that we actually could answer questions like "show me the tuples with the word Aßlar in some non-unique column - both spellings will do". Maybe that's a bit off the wall, I'm not sure. Simple question: if you were to just remove the strcmp tie-breaker for strcoll() in varstr_cmp(), but not touch anything else, would Postgres exhibit objectively incorrect behaviour? Incidentally, I think it's interesting that the SQL-exposed interface to all of this merely presents the user with a boolean:

postgres=# select '4'::text < '4'::text;
 ?column?
----------
 f
(1 row)

This is pretty much what C++ does; operator< and friends conventionally return a bool, not an int. This might have something to do with the need to make a break from the qsort() convention of returning an int that may indicate equality. So the formal definition of equivalency there may be that neither of two equivalent values is less than the other - you have no business expecting them to come out of a sort in any particular order relative to each other (except with a stable sort). Which is not the same as equal or "textually distinct but equal", which I think is how I'd classify your IEEE 754 example. We can decree that equivalency implies equality, or make all this internal (which, perversely, I suppose the C++ committee people cannot). So, I may have lost sight of why I started on about equivalency, which is that it sure would be nice if we could use strxfrm to prepare strings for sorting, which looks to be a fairly significant win. Does anyone have any simpler ideas, assuming this one is no good? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
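[A rough C sketch of the two-step behaviour described above - traversal and positioning by equivalence, with the operator's own notion of equality applied afterwards. This is only an illustration of the idea, not the actual _bt_checkkeys() logic, and the helper name is made up:]

#include <stdbool.h>
#include <string.h>

static bool
satisfies_equality_qual(const char *index_value, const char *scankey_value)
{
    /* Comparator-level test: are the two values equivalent under the collation? */
    if (strcoll(index_value, scankey_value) != 0)
        return false;           /* not even equivalent; the scan can stop here */

    /* Operator-level test: re-check with bitwise equality, as texteq() does. */
    return strcmp(index_value, scankey_value) == 0;
}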
On 17 June 2012 23:58, Peter Geoghegan <peter@2ndquadrant.com> wrote: > We can decree that equivalency implies equality, or make all this > internal (which, perversely, I suppose the C++ committee people > cannot). Sorry, that should obviously read "equality implies equivalency". We may not have to decree it, because it may already be a tacit assumption - I'm not sure. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Peter Geoghegan <peter@2ndquadrant.com> writes: > ISTM if '=' was really a mere equivalency operator, we'd only ever > check (a < b && b < a) in the btree code. You're not really making a lot of sense here, or at least I'm not grasping the distinction you want to draw. btree indexes (and sorting in general) require the trichotomy law to hold, so what you say above is tautological. The only reason we test "a = b" and not "a < b || a > b" is that the latter is at least twice as expensive to evaluate. The last section of src/backend/access/nbtree/README has some notes that you might find relevant. > Simple question: if you were to just remove the strcmp tie-breaker for > strcoll() in varstr_cmp(), but not touch anything else, would Postgres > exhibit objectively incorrect behaviour? Yes, it would, and did, before we put that in; see the archives for the discussions that led up to the patch you mentioned earlier. > So, I may have lost sight of why I started on about equivalency, > which is that it sure would be nice if we could use strxfrm to prepare > strings for sorting, which looks to be a fairly significant win. We could still do that as long as we were willing to store the original strings as well as the strxfrm output. Given the memory bloat already implied by strxfrm, I don't think that's necessarily ridiculous on its face --- it just makes the bar a bit higher for whether this is a win. regards, tom lane
On 18 June 2012 00:38, Tom Lane <tgl@sss.pgh.pa.us> wrote: > The only reason we test "a = b" and not "a < b || a > b" > is that the latter is at least twice as expensive to evaluate. Perhaps I've been unclear. I meant to write "(!(a < b) && !(b < a))", and not "(a < b && b < a)". The former is not tautological when the trichotomy law holds, but the latter is. My mistake - I was a little tired when I wrote that (though you seemed to know what I meant). You can sort things just fine if you only have a less than comparator that returns a boolean, which is what I sought to illustrate with that example: There is no requirement that the comparator indicate anything more than whether one element is less than the other. In particular, it need not tell the sort function that two elements are equal, though the sort function will still (almost invariably in practice) put equal elements beside each other. Consider std::set, a highly-generic template class that enforces uniqueness, usually implemented using a red-black tree (so its elements are sorted), that only requires that the datatype it is instantiated with have an operator<. The datatype used may not even have an operator== for std::set to consult if it wanted to. So, in principle, std::set could figure out if any two given elements were equivalent. That is to say, it could determine: if (!(a < b) && !(b < a)) It could not, even in principle given its implicit interface/requirements for datatypes, know for sure that: if (a == b) Despite the fact that equivalency and equality can be expected to coincide the vast majority of the time, they are not quite the same thing. It's perfectly reasonable that it might actually not hold that a and b are equal, even though they are equivalent, for certain datatypes. Certainly not for integers. But, for example, it could make sense to have a string datatype where with a certain locale a certain Hungarian word was equivalent to the same string with a minor variation in diacritical marks or whatever (an entirely culturally defined equivalency), even though it was not equal according to this datatype's own idiosyncratic notion of equality, which is bitwise equality. That would be a datatype that std::set would be entirely capable of rolling with, because it doesn't care about equality. It might be better if our text type had the same semantics as this hypothetical string datatype, because: 1. A culture has gone so far as to make these variations of diacritical marks or whatever equivalent. The Unicode consortium consulted with the Hungarian government, or maybe the Hungarian version of the Académie française, and they asked for this, which seems pretty serious to me. Maybe they'd like to have a unique constraint reject what they consider to be equivalent strings. If this sounds pedantic to you, google "Han unification controversy", because this sounds like that in reverse (Hungarian separation controversy). The fact that when a unique constraint is violated, it isn't actually necessary to check the equality operator function (equivalence will do; no need to call texteq as when selecting with the same value in the qual) suggests that this wouldn't be too hard to facilitate, though again, I admit my understanding here is more shallow than I'd like. Now, I'll grant you that I'm not currently losing any sleep about our cultural insensitivity, and apparently neither are any Hungarians; this just serves to illustrate that my position is The Right Thing (I hope).
Here is a description of the Hungarian problem: http://blogs.msdn.com/b/michkap/archive/2005/08/10/449909.aspx It says "Basically, the feature is such that since dzs is a compression within Hungarian, that if you see ddzs it should be treated as equivalent to dzsdzs." We're not the first people to have to deal with this: http://bugs.mysql.com/bug.php?id=12519 Here is Tom's original analysis, that led to this patch: http://archives.postgresql.org/pgsql-general/2005-12/msg00803.php 2. This way, it's still possible to do the strxfrm() optimisation more or less for free, without breaking hash methods on text. That is not to be sniffed at - sorting text is very common and important. Most people don't know about the C locale, and couldn't use it if they did. > The last section of src/backend/access/nbtree/README has some notes > that you might find relevant. I assume that you're referring to "Notes to Operator Class Implementors". It says:

"""
An = operator must be an equivalence relation; that is, for all non-null values A,B,C of the datatype:

	A = A is true				reflexive law
	if A = B, then B = A			symmetric law
	if A = B and B = C, then A = C		transitive law
"""

This would still hold; we'd just have to change the = operators here to something representing equivalency, I suppose (or, more likely, don't bring up equivalency vs. equality here for pedagogical reasons). If it didn't hold, then it would be impossible to write a strxfrm() implementation, since the two variations of the textually distinct though equivalent Hungarian strings would produce identical strxfrm() blobs that were on the one hand bitwise identical, but on the other not reflexive, symmetric or transitive when bitwise compared. This is obviously a logical contradiction. So, we'd still be using a comparison that returns an integer under the hood, but we'd interpret 0 as equivalency and not equality. >> Simple question: if you were to just remove the strcmp tie-breaker for >> strcoll() in varstr_cmp(), but not touch anything else, would Postgres >> exhibit objectively incorrect behaviour? > > Yes, it would, and did, before we put that in; see the archives for the > discussions that led up to the patch you mentioned earlier. It wouldn't be the same though, because you also changed the definition of texteq, textne, bpchareq, bpcharne in those commits (to just use strncmp(), not varstr_cmp()), and I am not suggesting that you consider changing them back too. Incidentally, I should mention that I tried removing the strcmp() tie-breaker in varstr_cmp() on master, and unsurprisingly the regression tests don't break. Now, there might be a better way of explaining all of this that is more clear and succinct, and that's what I'll attempt to do: tuplesort.c tells lies to the btree code: That two index tuples that are logically equal are not. Now, the btree code doesn't "repeat" those lies to anyone else, like the user, which is unsurprising because that would lead to incorrect answers. Like all good liars, it is consistent in the lies that it tells (because the lies are derived from the item-pointers of index tuples, which are used as a tie-breaker).
I'm tentatively suggesting that it might be to our advantage to tell a similar though different lie to the btree code that it would never have to know about, a lie originating higher-up and reverberating more places, within the comparator proper (that is, varstr_cmp()), rather than the index-tuple encapsulating comparator within tuplesort (that is, comparetup_index_btree()): That two index tuples are equal, when in fact it might just be that they're equivalent. Now, for most people, this is exactly the same thing, since no datatype other than text has a separate notion of equivalence, and text basically only has that notion when using certain locales like hu_HU.UTF-8 (hardly surprising that the regression tests didn't fail on me). The Hungarians get to have uniqueness enforced for textually distinct variants that strcoll() returns 0 for (equivalent not equal values) - in other words, they get what they asked for. However, they still must specify the correct exact variation in predicates, since the equality operator only cares about bitwise equality, which is consistent with general Postgres convention. I believe that that could be the most correct behaviour for them. So, the original complainant saw this behaviour, a clear violation of the trichotomy law:

mage=# select 'potyty'::varchar = 'potty'::varchar;
 ?column?
----------
 f
(1 row)

mage=# select 'potyty'::varchar < 'potty'::varchar;
 ?column?
----------
 f
(1 row)

mage=# select 'potyty'::varchar > 'potty'::varchar;
 ?column?
----------
 f
(1 row)

The thing is that I think that the above user-visible behaviour might actually be desirable. 'potyty'::varchar is *not* bitwise equal (our general definition of text equality) to 'potty'::varchar. So, suppose there was an equivalency operator, say ===, all would be right with the world:

mage=# select 'potyty'::varchar === 'potty'::varchar;
 ?column?
----------
 t
(1 row)

mage=# select 'potyty'::varchar < 'potty'::varchar;
 ?column?
----------
 f
(1 row)

mage=# select 'potyty'::varchar > 'potty'::varchar;
 ?column?
----------
 f
(1 row)

The reason everything wasn't right with the world, and this example got screwed up for the complainant at the time, was that texteq() still returned false via the len1 != len2 fastpath - otherwise Tom's testcase here would not have shown a problem. So at the time and under the exact circumstances, the '=' operator was actually equality-orientated (if only by accident, because len1 != len2 here), whereas varstr_cmp() was entirely equivalency-orientated. So that explains this test-case of Tom's. It does not explain the confusing behaviour that led to the complaint though, since few people are in the habit of verifying that the trichotomy law holds just for the fun of it.

online=# select * from common_logins where username = 'potyty';
 uid | username | password | lastlogin | status | usertype | loginnum
-----+----------+----------+-----------+--------+----------+----------
(0 rows)

online=# select * from common_logins where username like 'potyty';
  uid   | username | password |         lastlogin          | status | usertype | loginnum
--------+----------+----------+----------------------------+--------+----------+----------
 155505 | potyty   | board    | 2004-08-16 17:45:55.723829 | A      | S        |        1
  60067 | potyty   | board    | 2004-07-07 20:22:17.68699  | A      | S        |        3
 174041 | potyty   | board    | 2005-02-17 00:00:13.706144 | A      | S        |        3
(3 rows)

online=# select username, username = 'potyty' from common_logins where username like 'potyty';
 username | ?column?
----------+----------
 potyty   | t
 potyty   | t
 potyty   | t
(3 rows)

That isn't something that I've been able to figure out yet. I think that maybe an index got corrupted because... Incidentally, I should point out that right now this code appears in like_match.c:

MatchText(char *t, int tlen, char *p, int plen, pg_locale_t locale, bool locale_is_c)

The only way that the locale and locale_is_c are used within this function is via a recursive call to MatchText() - they're entirely vestigial, it seems. Now, again I must admit that it isn't obvious to me that my white lie to the btree code about something being equal that is actually just equivalent won't come back to haunt me - lies have short legs. On the face of it though, it looks like I might get away with it, since equality in a qual is re-checked when I execute a select statement, but isn't checked when I attempt to violate a unique constraint (a behavioural difference which the Hungarians will be happy about, since they took the unusual step of representing that the strings 'potyty' and 'potty' were exactly equivalent to the appropriate authorities). Perhaps more importantly, I cannot recreate any of these problems on my Fedora 16 machine. Even with hu_HU on LATIN2, Tom's original test case (from 2005, on a Fedora 4 machine) cannot be recreated. So it may be that they've tightened these things up in some way. It's far from clear why that should be. It could be worth -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On 18 June 2012 16:59, Peter Geoghegan <peter@2ndquadrant.com> wrote: > Perhaps more importantly, I cannot recreate any of these problems on > my Fedora 16 machine. Even with hu_HU on LATIN2, Tom's original test > case (from 2005, on a Fedora 4 machine) cannot be recreated. So it may > be that they've tightened these things up in some way. It's far from > clear why that should be. > > It could be worth Ugh, hit send too soon. I meant to close with the observation that it may be safe to rely on strcoll() / strxfrm() + strcmp() not ever returning 0 on certain platforms, if my inability to recreate this is anything to go by. There could be a test run by configure that determines this, enabling us to then use the strxfrm() optimisation. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
So, just to give a bit more weight to my argument that we should recognise that equivalent strings ought to be treated identically, I direct your attention to conformance requirement C9 of Unicode 3.0: http://www.unicode.org/unicode/standard/versions/enumeratedversions.html#Unicode_3_0_0 This "requires that canonically equivalent text be treated identically. See also C10 and several other places in The Unicode Standard." This page, "DETERMINISTIC SORTING", was also interesting: http://www.unicode.org/notes/tn9/ The same hack that we use in varstr_cmp() appears as pseudo-code, under "Forcing Deterministic Comparisons". I'm not sure in what general sense what *they'd* call a non-deterministic comparison (i.e. plain old strcoll()) is actually non-deterministic. Clearly a "non-deterministic comparison" is deterministic to the extent that given the same two binary strings and encoding and collation, the comparison function will always give the same answer. The article goes on to describe how we could bolt on a strxfrm() type value to a string, to ensure "deterministic comparison" (i.e. identical behaviour to Postgres master) with pre-processing. Tom did think that this might still be worth it. I'd rather avoid it. The article goes on to say of so-called deterministic comparisons: "However, a deterministic comparison is generally not best practice. First, it has a certain performance cost in comparison, and a quite substantial impact on sort key size. (For example, [ICU] language-sensitive sort keys are generally about the size of the original string, so appending a copy of the original string generally doubles the size of the sort key.) A database using these sort keys can have substantially increased disk footprint and memory footprint, and consequently reduced performance." I note also that the Single UNIX Specification, Version 2 says of strcoll(): "The strxfrm() and strcmp() functions should be used for sorting large lists". Why has it taken us this long to research this? I am aware that Greg Stark suggested it back in 2005, but I doubt that that was the first time. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
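[A hedged C sketch of the "forced deterministic comparison" referred to above: order by the strxfrm() image, falling back to the original bytes only on ties. This mirrors what strcoll() plus the strcmp() tie-breaker in varstr_cmp() achieve; the helper names are illustrative, not anything in the tree:]

#include <stdlib.h>
#include <string.h>

/* Build the collation-weight image once per string, e.g. when preparing a sort. */
static char *
make_xfrm_blob(const char *s)
{
    size_t  n = strxfrm(NULL, s, 0) + 1;    /* space needed, including the NUL */
    char   *blob = malloc(n);

    strxfrm(blob, s, n);
    return blob;
}

/* Compare by collation weight first; break ties bitwise on the original strings. */
static int
deterministic_cmp(const char *a_blob, const char *a_orig,
                  const char *b_blob, const char *b_orig)
{
    int c = strcmp(a_blob, b_blob);

    return (c != 0) ? c : strcmp(a_orig, b_orig);
}

The sort-key-size complaint quoted above is about the variant that physically appends the original string to the blob; keeping the two parts separate, as here, has the same ordering behaviour.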
Peter Geoghegan <peter@2ndquadrant.com> wrote: > So, just to give a bit more weight to my argument that we should > recognise that equivalent strings ought to be treated identically Since we appear to be questioning everything in this area, I'll raise something which has been bugging me for a while: in some other systems I've used, the "tie-breaker" comparison for equivalent values comes after equivalence sorting on *all* sort keys, rather than *each* sort key. For example, this much makes sense with lc_collate = 'en_US.UTF-8':

test=# create table c (last_name text not null, first_name text);
CREATE TABLE
test=# insert into c values ('smith', 'bob'), ('smith', 'peter'), ('SMITH', 'EDWARD');
INSERT 0 3
test=# select * from c order by 2;
 last_name | first_name
-----------+------------
 smith     | bob
 SMITH     | EDWARD
 smith     | peter
(3 rows)

This seems completely wrong:

test=# select * from c order by 1,2;
 last_name | first_name
-----------+------------
 smith     | bob
 smith     | peter
 SMITH     | EDWARD
(3 rows)

I have seen other databases which get it in the order I would expect -- where the C compare only matters within groups of equivalent rows. It seems that PostgreSQL effectively orders by:

  last_name using collation 'en_US.UTF-8'
  last_name using collation 'C'
  first_name using collation 'en_US.UTF-8'
  first_name using collation 'C'

while some other products order by:

  last_name using collation 'en_US.UTF-8'
  first_name using collation 'en_US.UTF-8'
  last_name using collation 'C'
  first_name using collation 'C'

I'm sure the latter is harder to do and slower to execute; but the former just doesn't seem defensible as correct. -Kevin
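[A short C sketch contrasting the two multi-key orderings described above, with strcoll() standing in for the locale-aware comparison and strcmp() for the C-collation tie-breaker; illustrative only:]

#include <string.h>

/* What PostgreSQL effectively does: tie-break within each key. */
static int
cmp_tiebreak_per_key(const char **a, const char **b, int nkeys)
{
    int i, c;

    for (i = 0; i < nkeys; i++)
    {
        c = strcoll(a[i], b[i]);
        if (c == 0)
            c = strcmp(a[i], b[i]);
        if (c != 0)
            return c;
    }
    return 0;
}

/* What the other systems described do: tie-break only after all keys. */
static int
cmp_tiebreak_whole_row(const char **a, const char **b, int nkeys)
{
    int i, c;

    for (i = 0; i < nkeys; i++)
        if ((c = strcoll(a[i], b[i])) != 0)
            return c;
    for (i = 0; i < nkeys; i++)
        if ((c = strcmp(a[i], b[i])) != 0)
            return c;
    return 0;
}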
On 19 June 2012 16:17, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > Peter Geoghegan <peter@2ndquadrant.com> wrote: > >> So, just to give a bit more weight to my argument that we should >> recognise that equivalent strings ought to be treated identically > > Since we appear to be questioning everything in this area, I'll > raise something which has been bugging me for a while: in some other > systems I've used, the "tie-breaker" comparison for equivalent > values comes after equivalence sorting on *all* sort keys, rather > than *each* sort key. Are you sure that they actually have a tie-breaker, and don't just make the distinction between equality and equivalence (if only internally)? I would have checked that myself already, but I don't have access to any other RDBMS that I'd expect to care about these kinds of distinctions. They make sense for ensuring that the text comparator's notion of equality is consistent with text's general notion (if that's bitwise equality, which I suspect it is in these other products too for the same reasons it is for us). I don't see why you'd want a tie-breaker across multiple keys. I mean, you could, I just don't see any reason to. > test=# select * from c order by 2; > last_name | first_name > -----------+------------ > smith | bob > SMITH | EDWARD > smith | peter > (3 rows) > > This seems completely wrong: > > test=# select * from c order by 1,2; > last_name | first_name > -----------+------------ > smith | bob > smith | peter > SMITH | EDWARD > (3 rows) Agreed. Definitely a POLA violation. > I'm sure the latter is harder to do and slower to execute; but the > former just doesn't seem defensible as correct. This same gripe is held by the author of that sorting document I linked to from the Unicode consortium, with a very similar example. So it seems like this could be a win from several perspectives, as it would enable the strxfrm() optimisation. I'm pretty sure that pg_upgrade wouldn't be very happy about this, so we'd have to have a legacy compatibility mode. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Peter Geoghegan <peter@2ndquadrant.com> wrote: > Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: >> Since we appear to be questioning everything in this area, I'll >> raise something which has been bugging me for a while: in some >> other systems I've used, the "tie-breaker" comparison for >> equivalent values comes after equivalence sorting on *all* sort >> keys, rather than *each* sort key. > > Are you sure that they actually have a tie-breaker, and don't just > make the distinction between equality and equivalence (if only > internally)? I'm pretty sure that when I was using Sybase ASE the order for non-equal values was always predictable, and it behaved in the manner I describe below. I'm less sure about any other product. > I don't see why you'd want a tie-breaker across multiple keys. I > mean, you could, I just don't see any reason to. So that you can have entirely repeatable results across multiple runs? >> test=# select * from c order by 2; >> last_name | first_name >> -----------+------------ >> smith | bob >> SMITH | EDWARD >> smith | peter >> (3 rows) >> >> This seems completely wrong: >> >> test=# select * from c order by 1,2; >> last_name | first_name >> -----------+------------ >> smith | bob >> smith | peter >> SMITH | EDWARD >> (3 rows) > > Agreed. Definitely a POLA violation. > >> I'm sure the latter is harder to do and slower to execute; but >> the former just doesn't seem defensible as correct. > > This same gripe is held by the author of that sorting document I > linked to from the Unicode consortium, with a very similar > example. So it seems like this could be a win from several > perspectives, as it would enable the strxfrm() optimisation. I'm > pretty sure that pg_upgrade wouldn't be very happy about this, so > we'd have to have a legacy compatibility mode. At a minimum, it could require rebuilding indexes with multiple columns where any indexed value before the last index column can be equivalent but not equal. -Kevin
On 19 June 2012 17:45, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > Peter Geoghegan <peter@2ndquadrant.com> wrote: >> Are you sure that they actually have a tie-breaker, and don't just >> make the distinction between equality and equivalence (if only >> internally)? > > I'm pretty sure that when I was using Sybase ASE the order for > non-equal values was always predictable, and it behaved in the > manner I describe below. I'm less sure about any other product. Maybe it used a physical row identifier as a tie-breaker? Note that we use ItemPointers as a tie-breaker for sorting index tuples. I imagine that it was at least predictable among columns being sorted, if only because en_US.UTF-8 doesn't have any notion of equivalence (that is, it just so happens that there are no two strings that are equivalent but not bitwise equal). It would surely be impractical to do a comparison for the entire row, as that could be really expensive. >> I don't see why you'd want a tie-breaker across multiple keys. I >> mean, you could, I just don't see any reason to. > > So that you can have entirely repeatable results across multiple > runs? I suppose that isn't an unreasonable use-case, but it would be unreasonable to require that everyone pay for it, particularly if it implied another strcmp(). You don't need me to tell you that we generally discourage the idea that the order that tuples are returned in is defined in any way beyond that asked for by the user. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Peter Geoghegan <peter@2ndquadrant.com> wrote: > Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: >> I'm pretty sure that when I was using Sybase ASE the order for >> non-equal values was always predictable, and it behaved in the >> manner I describe below. I'm less sure about any other product. > > Maybe it used a physical row identifier as a tie-breaker? Note > that we use ItemPointers as a tie-breaker for sorting index > tuples. > > I imagine that it was at least predictable among columns being > sorted, if only because en_US.UTF-8 doesn't have any notion of > equivalence (that is, it just so happens that there are no two > strings that are equivalent but not bitwise equal). It would > surely be impractical to do a comparison for the entire row, as > that could be really expensive. We weren't using en_US.UTF-8 collation (or any other "proper" collation) on Sybase -- I'm not sure whether they even supported proper collation sequences on the versions we used. I'm thinking of when we were using their "case insensitive" sorting. I don't know the implementation details, but the behavior was consistent with including each character-based column twice: once in the requested position in the ORDER BY clause but folded to a consistent case, and again after all the columns in the ORDER BY clause in original form, with C collation. I wasn't aware that en_US.UTF-8 doesn't have equivalence without equality. I guess that surprising result in my last post is just plain inevitable with that collation then. Bummer. Is there actually anyone who finds that to be a useful behavior? For a collation which considered upper-case and lower-case to be equivalent, would PostgreSQL sort as I wanted, or is it doing some tie-break per column within equivalent values? -Kevin
On 19 June 2012 18:57, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > We weren't using en_US.UTF-8 collation (or any other "proper" > collation) on Sybase -- I'm not sure whether they even supported > proper collation sequences on the versions we used. I'm thinking of > when we were using their "case insensitive" sorting. I don't know > the implementation details, but the behavior was consistent with > including each character-based column twice: once in the requested > position in the ORDER BY clause but folded to a consistent case, and > again after all the columns in the ORDER BY clause in original form, > with C collation. > > I wasn't aware that en_US.UTF-8 doesn't have equivalence without > equality. Not that many do. The underlying cause of the problem back in 2005 was the tacit assumption that none do, which presumably we got away with for a while. I mentioned Hungarian, but it happens a bit in Swedish too. PostgreSQL supported Unicode before 2005, when the tie-breaker was introduced. I know at least one Swede who used Postgres95. I just took a look at the REL6_4 branch, and it looks much the same in 1999 as it did in 2005, in that there is no tie-breaker after the strcoll(). Now, that being the case, and Hungarian in particular having a whole bunch of these equivalencies, I have to wonder if the original complainant's problem really was diagnosed correctly. It could have had something to do with the fact that texteq() was confused about whether it reported equality or equivalency - it may have taken that long for the (len1 != len2) fastpath thing (only holds for equality, not equivalence, despite the fact that the 2005-era strcoll() call checks equivalence within texteq()) to trip someone out, because texteq() would have thereby given inconsistent answers in a very subtle way, that were correct neither according to the Hungarian locale nor according to simple bitwise equality. That's mostly speculation, but the question must be asked. > I guess that surprising result in my last post is just > plain inevitable with that collation then. Bummer. Is there > actually anyone who finds that to be a useful behavior? Looking at the details again and assuming a US locale, yeah, it is. The substance of your complaint holds though. > For a collation which considered upper-case and lower-case to be > equivalent, would PostgreSQL sort as I wanted, or is it doing some > tie-break per column within equivalent values? You could do that, and some people do use custom collations for various reasons. That's obviously very much of minority interest though. Most people will just use citext or something. However, since citext is itself a client of varstr_cmp(), this won't help you. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > I wasn't aware that en_US.UTF-8 doesn't have equivalence without > equality. I guess that surprising result in my last post is just > plain inevitable with that collation then. Bummer. Is there > actually anyone who finds that to be a useful behavior? For a > collation which considered upper-case and lower-case to be > equivalent, would PostgreSQL sort as I wanted, or is it doing some > tie-break per column within equivalent values? Well, there are two different questions there. As far as the overall structure of an ORDER BY or btree index is concerned, all it knows is what the datatype comparator functions tell it. There is no such thing as a second pass to reconsider values found to be "equal"; that's all the info there is. As far as the behavior of the comparator function for text is concerned, we choose to break ties reported by strcoll via strcmp. But that's not visible from outside the comparator, so there's no notion of an "equivalence" behavior different from "equality". I'm not exactly convinced that we could have a second-pass tiebreak mechanism without creating serious problems for btree indexes; in particular it seems like ordering considering only some leading keys might not match the actual index ordering. regards, tom lane
On 19 June 2012 19:44, Peter Geoghegan <peter@2ndquadrant.com> wrote: > You could do that, and some people do use custom collations for > various reasons. That's obviously very much of minority interest > though. Most people will just use citext or something. However, since > citext is itself a client of varstr_cmp(), this won't help you. I spoke too soon - that would work fine in the U.S., since the text is converted to lower case on-the-fly in a locale and collation aware fashion. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On sön, 2012-06-17 at 23:58 +0100, Peter Geoghegan wrote: > So if you take the word "Aßlar" here - that is equivalent to "Asslar", > and so strcoll("Aßlar", "Asslar") will return 0 if you have the right > LC_COLLATE This is not actually correct. glibc will sort Asslar before Aßlar, and that is correct in my mind. When a Wikipedia page on some particular language's alphabet says something like "$letterA and $letterB are equivalent", what it really means is that they are sorted the same compared to other letters, but are distinct when ties are broken. > (if you tried this out for yourself and found that I was > actually lying through my teeth, pretend I said Hungarian instead of > German and "some really obscure character" rather than ß). Yeah, there are obviously exceptions, which led to the original change being made, but they are not as wide-spread as they appear to be. The real issue in this area, I suspect, will be dealing with Unicode combining sequences versus equivalent precombined characters. But support for that is generally crappy, so it's not urgent to deal with it.
On 20 June 2012 11:00, Peter Eisentraut <peter_e@gmx.net> wrote: > On sön, 2012-06-17 at 23:58 +0100, Peter Geoghegan wrote: >> So if you take the word "Aßlar" here - that is equivalent to "Asslar", >> and so strcoll("Aßlar", "Asslar") will return 0 if you have the right >> LC_COLLATE > > This is not actually correct. glibc will sort Asslar before Aßlar, and > that is correct in my mind. Uh, what happened here was that I assumed that it was correct, and then went to verify it and found that it wasn't before sending the mail, and, since I couldn't immediately find any hard data about what characters this did apply to, I decided to turn it into a joke. I say this, and yet you've included that bit of the e-mail inline in your reply, so maybe it just wasn't a very good joke. > When a Wikipedia page on some particular language's alphabet says > something like "$letterA and $letterB are equivalent", what it really > means is that they are sorted the same compared to other letters, but > are distinct when ties are broken. I know. >> (if you tried this out for yourself and found that I was >> actually lying through my teeth, pretend I said Hungarian instead of >> German and "some really obscure character" rather than ß). > > Yeah, there are obviously exceptions, which led to the original change > being made, but they are not as wide-spread as they appear to be. True. > The real issue in this area, I suspect, will be dealing with Unicode > combining sequences versus equivalent precombined characters. But > support for that is generally crappy, so it's not urgent to deal with > it. I agree that it isn't urgent. However, I have an ulterior motive, which is that in allowing for this, we remove the need to strcmp() after each strcoll(), and consequently it becomes possible to use strxfrm() instead. Now, we could also use a hack that's going to make the strxfrm() blobs even bulkier still (basically, concatenate the original text to the blob before strcmp()), but I don't want to go there if it can possibly be avoided. I should also point out that we pride ourselves on following the letter of the standard when that makes sense, and we are currently not doing that in respect of the Unicode standard. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On 19 June 2012 19:44, Peter Geoghegan <peter@2ndquadrant.com> wrote: > PostgreSQL supported Unicode before 2005, when the tie-breaker was > introduced. I know at least one Swede who used Postgres95. I just took > a look at the REL6_4 branch, and it looks much the same in 1999 as it > did in 2005, in that there is no tie-breaker after the strcoll(). Now, > that being the case, and Hungarian in particular having a whole bunch > of these equivalencies, I have to wonder if the original complainant's > problem really was diagnosed correctly. It could have had something to > do with the fact that texteq() was confused about whether it reported > equality or equivalency - it may have taken that long for the (len1 != > len2) fastpath thing (only holds for equality, not equivalence, > despite the fact that the 2005-era strcoll() call checks equivalence > within texteq()) to trip someone out, because texteq() would have > thereby given inconsistent answers in a very subtle way, that were > correct neither according to the Hungarian locale nor according to > simple bitwise equality. It seems likely that this is more-or-less correct. The two equivalent strings had a variable number of characters, so the fastpath made texteq not accord with varstr_cmp(), even though texteq() itself also only had a single strcoll() call. So there were some tuples with the Hungarian string "potty" in the 2005 bug report. They were not visible for any of the queries seen in test cases, even the "good" ones. There were a few tuples with equivalent strings like "potyty" that were visible. The index scans didn't fail to return the expected tuples because the indexes were internally inconsistent or otherwise corrupt. Rather, the _bt_checkkeys() function returned early because the faulty "half equivalence, half equality" texteq() comparator reported that the tuple returned didn't satisfy the qual, and on that basis the index scan stopped. This usually wouldn't happen with Swedish, because their equivalencies tend to be one character long, and are less common. So far, so good, but how did this not blow up sooner? Did Hungarians only start using Postgres in late 2005, immediately after the 8.1 release? Hardly. Commit c1d62bfd00f4d1ea0647e12947ca1de9fea39b33, made in late 2003, "Add operator strategy and comparison-value datatype fields to ScanKey", may be part of the problem here. Consider this test within _bt_checkkeys(), that was changed by that commit:

-   if (key->sk_flags & SK_COMMUTE)
-       test = FunctionCall2(&key->sk_func,
-                            key->sk_argument, datum);
-   else
-       test = FunctionCall2(&key->sk_func,
-                            datum, key->sk_argument);
+   test = FunctionCall2(&key->sk_func, datum, key->sk_argument);

-   if (DatumGetBool(test) == !!(key->sk_flags & SK_NEGATE))
+   if (!DatumGetBool(test))

I think that this change may have made the difference between the Hungarians getting away with it and not getting away with it. Might it have been that for text, they were using some operator that wasn't '=' (perhaps one which has no fastpath, and thus correctly made a representation about equivalency) rather than texteq prior to this commit? I didn't eyeball the pg_amop entries of the era myself, but it seems quite possible. In any case, I find it hard to believe that it took at least ten years for this problem to manifest itself just because it took that long for a Hungarian with a strcoll() implementation that correctly represented equivalency to use Postgres. A year is a more plausible window.
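[A hedged C sketch of the inconsistency being described - a simplification, not the historical code: a length fast path that is only justified for bitwise equality, sitting in front of a comparison that actually reports collation equivalence (inputs assumed NUL-terminated here):]

#include <stdbool.h>
#include <string.h>

static bool
old_style_texteq(const char *a, size_t len1, const char *b, size_t len2)
{
    if (len1 != len2)
        return false;           /* valid only if "equal" means bitwise equal... */

    /* ...yet this reports "equal" whenever the locale says the strings are
     * merely equivalent, since there is no strcmp() tie-breaker. */
    return strcoll(a, b) == 0;
}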
If we do introduce an idea of equivalency to make all this work, that means there'll have to be equivalency verification when equality verification returns false in a number of places, including the above. For GiST, there is an equivalent test that will still vary based on the strategy number used, dubbed "the consistent function", which seems analogous to the above. So, you're going to have an extra strcoll()/strxfrm() + strcmp() here, as part of a "not-equal-but-maybe-equivalent" test, which is bad. However, if that means that we can cache a text constant as a strxfrm() blob, and compare in a strxfrm()-wise fashion, that will more than pay for itself, even for btree traversal alone. For a naive strxfrm() + strcoll() implementation, that will save just under half of the work, and everyone knows that the cost of the comparison is what dominates here, particularly for certain collations. We'd probably formalise it to the point where there'd be a btree strategy number and fully-fledged equivalency operator that the user could conceivably use themselves. There seems to be scope-creep here. I'm not sure that I should continue with this as part of this review. Maybe this should be something that I work on for the next commitfest. It would be nice to hear what others thought of these ideas before I actually start writing a patch that fixes these problems (our behaviour is incorrect for some locales according to the Unicode standard), facilitates a strxfrm() optimisation, and actually adds a strxfrm() optimisation. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On Sun, Jun 17, 2012 at 9:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > The trick for hashing such datatypes is to be able to guarantee that > "equal" values hash to the same hash code, which is typically possible > as long as you know the equality rules well enough. We could possibly > do that for text with pure-strcoll equality if we knew all the details > of what strcoll would consider "equal", but we do not. It occurs to me that strxfrm would answer this question. If we made the hash function hash the result of strxfrm then we could make equality use strcoll and not fall back to strcmp. I suspect in a green field that's what we would do, though the cpu cost might be enough to think hard about it. I'm not sure it's worth considering switching though. The cases where it matters to users incidentally are when you have a multi-column sort order and have values that are supposed to sort equal in the first column but print differently. Given that there seems to be some controversy in the locale definitions -- most locales seem to use "insignificant" factors like accents or ligatures as tie-breakers and avoid claiming different sequences are equal even when the language usually treats them as equivalent -- it doesn't seem super important to maintain the property for the few locales that fall the other way. Unless my impression is wrong and there's a good principled reason why some locales treat nearly equivalent strings one way and some treat them the other. -- greg
On 20 June 2012 15:10, Greg Stark <stark@mit.edu> wrote: > On Sun, Jun 17, 2012 at 9:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The trick for hashing such datatypes is to be able to guarantee that >> "equal" values hash to the same hash code, which is typically possible >> as long as you know the equality rules well enough. We could possibly >> do that for text with pure-strcoll equality if we knew all the details >> of what strcoll would consider "equal", but we do not. > > It occurs to me that strxfrm would answer this question. If we made > the hash function hash the result of strxfrm then we could make > equality use strcoll and not fall back to strcmp. What about per-column collations? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On Wed, Jun 20, 2012 at 3:19 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote: >> It occurs to me that strxfrm would answer this question. If we made >> the hash function hash the result of strxfrm then we could make >> equality use strcoll and not fall back to strcmp. > > What about per-column collations? Well, collations aren't really per-column, they're specific to the comparison in the expression at hand. The per-column collation is just the default for comparisons against that column. So for a hash join for example you would build the hash table using strxfrm for the collation being used for the join expression. For hash indexes the index would only be valid for a specific collation, just like a btree index is. But this all seems like a lot of work for a case that most locales seem to think isn't worth worrying about. -- greg
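[A C sketch of the idea being discussed, under the assumption that it is the strxfrm() image (for the collation in use) that gets hashed; hash_bytes() is a stand-in for whatever hash function is actually used:]

#include <stdlib.h>
#include <string.h>

extern unsigned int hash_bytes(const unsigned char *k, int keylen); /* assumed helper */

static unsigned int
collation_aware_hash(const char *s)
{
    size_t        n = strxfrm(NULL, s, 0) + 1;
    char         *blob = malloc(n);
    unsigned int  h;

    strxfrm(blob, s, n);
    /* Any two strings strcoll() reports as equal transform identically,
     * so they necessarily hash identically too. */
    h = hash_bytes((const unsigned char *) blob, (int) (n - 1));
    free(blob);
    return h;
}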
Peter Geoghegan <peter@2ndquadrant.com> writes: > I think that this change may have made the difference between the > Hungarians getting away with it and not getting away with it. Might it > have been that for text, they were using some operator that wasn't '=' > (perhaps one which has no fastpath, and thus correctly made a > representation about equivalency) rather than texteq prior to this > commit? Uh, no. There aren't any magic variants of equality now, and there were not back then either. I'm inclined to think that the "Hungarian problem" did exist long before we fixed it. > So, you're going to have an extra strcoll()/strxfrm() + strcmp() here, > as part of a "not-equal-but-maybe-equivalent" test, which is bad. > However, if that means that we can cache a text constant as a > strxfrm() blob, and compare in a strxfrm()-wise fashion, that will > more than pay for itself, even for btree traversal alone. Um ... are you proposing to replace text btree index entries with strxfrm values? Because if you aren't, I don't see that this is likely to win anything. And if you are, it loses on the grounds of (a) index bloat and (b) loss of ability to do index-only scans. > It would be nice to hear what others thought of these ideas before I > actually start writing a patch that both fixes these problems (our > behaviour is incorrect for some locales according to the Unicode > standard), facilitates a strxfrm() optimisation, and actually adds a > strxfrm() optimisation. Personally I think this is not a direction we want to go in. I think that it's going to end up a significant performance loss in many cases, break backwards compatibility in numerous ways, and provide a useful behavioral improvement to only a small minority of users. If you check the archives, we get far more complaints from people who think their locale definition is wrong than from people who are unhappy because we don't hew exactly to the corner cases of their locale. regards, tom lane
On 20 June 2012 15:55, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Peter Geoghegan <peter@2ndquadrant.com> writes: >> I think that this change may have made the difference between the >> Hungarians getting away with it and not getting away with it. Might it >> have been that for text, they were using some operator that wasn't '=' >> (perhaps one which has no fastpath, and thus correctly made a >> representation about equivalency) rather than texteq prior to this >> commit? > > Uh, no. There aren't any magic variants of equality now, and there were > not back then either. I'm inclined to think that the "Hungarian > problem" did exist long before we fixed it. I was suggesting that an operator other than equality may have been used for some reason. It probably isn't a significant issue though. > Um ... are you proposing to replace text btree index entries with > strxfrm values? Because if you aren't, I don't see that this is likely > to win anything. And if you are, it loses on the grounds of (a) index > bloat and (b) loss of ability to do index-only scans. No, I'm suggesting it would probably be at least a bit of a win here to cache the constant, and only have to do a strxfrm() + strcmp() per comparison. Not enough to justify all the added complexity of course, but I wouldn't seek to justify this on the basis of improvements to the speed of btree traversal. It would obviously be much more valuable for tuple sorting. > Personally I think this is not a direction we want to go in. I think > that it's going to end up a significant performance loss in many cases, > break backwards compatibility in numerous ways, and provide a useful > behavioral improvement to only a small minority of users. I may have over-emphasized the improvement in correctness that this would bring, which I personally consider to be very much a secondary benefit. The fact is that this is likely to be a fairly significant performance win, because strxfrm() is quite simply the way you're supposed to do collation-aware sorting, and is documented as such. For that reason, C standard library implementations should not be expected to emphasize its performance - they assume that you're using strxfrm() + their highly optimised strcmp() (as I've said, the glibc implementation is written in ASM, and makes use of hand-written SSSE3 instructions on x86_64 for example). The only performance downside that I can think of is the added check for equivalence for each tuple within _bt_checkkeys() - perhaps you were thinking of something else that hasn't occurred to me though. Were you? The added overhead is only going to be paid once per index scan, and not once per tuple returned by an index scan. Since equality implies equivalence for us, there is no need to check equivalence until something is unequal, and when that happens, and equivalence doesn't hold, we're done. Meanwhile, that entire index scan just got measurably faster from caching the constant's strxfrm() blob. Now, maybe it is a bit funky that this equivalence test has to happen even though the vast majority of locales don't care. If the only alternative is to bloat up the strxfrm() blobs with the original string, I'd judge that the funkiness is well worth it. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
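[A C sketch of the caching being suggested - transform the scan key's constant once, after which each comparison against an index tuple costs one strxfrm() plus a cheap strcmp(). All names and the buffer handling here are illustrative, not actual PostgreSQL APIs:]

#include <stdlib.h>
#include <string.h>

typedef struct CachedTextKey
{
    char   *const_xfrm;     /* strxfrm() image of the constant, built once per scan */
    char   *tuple_buf;      /* reusable buffer for per-tuple transforms */
    size_t  tuple_buf_len;
} CachedTextKey;

static int
compare_tuple_to_const(CachedTextKey *key, const char *tuple_value)
{
    size_t needed = strxfrm(NULL, tuple_value, 0) + 1;

    if (needed > key->tuple_buf_len)
    {
        key->tuple_buf = realloc(key->tuple_buf, needed);
        key->tuple_buf_len = needed;
    }
    strxfrm(key->tuple_buf, tuple_value, needed);

    /* 0 here means "equivalent under the collation", not necessarily equal. */
    return strcmp(key->tuple_buf, key->const_xfrm);
}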
Peter Geoghegan <peter@2ndquadrant.com> writes: > No, I'm suggesting it would probably be at least a bit of a win here > to cache the constant, and only have to do a strxfrm() + strcmp() per > comparison. Um, have you got any hard evidence to support that notion? The traditional advice is that strcoll is faster than using strxfrm unless the same strings are to be compared repeatedly. I'm not convinced that saving the strxfrm on just one side will swing the balance. > The fact is that this is likely to be a fairly significant > performance win, because strxfrm() is quite simply the way you're > supposed to do collation-aware sorting, and is documented as such. For > that reason, C standard library implementations should not be expected > to emphasize its performance - they assume that you're using strxfrm() > + their highly optimised strcmp() Have you got any evidence in support of this claim, or is it just wishful thinking about what's likely to be inside libc? I'd also note that any comparisons you may have seen about this are certainly not accounting for the effects of data bloat from strxfrm (ie, possible spill to disk, more merge passes, etc). In any case, if you have to redefine the meaning of equality in order to justify a performance patch, I'm prepared to walk away at the start. The range of likely performance costs/benefits across different locales and different implementations is so wide that if you can't show it to be a win even with the strcmp tiebreaker, it's not likely to be a reliable win without that. regards, tom lane
On Wed, Jun 20, 2012 at 12:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> The fact is that this is likely to be a fairly significant >> performance win, because strxfrm() is quite simply the way you're >> supposed to do collation-aware sorting, and is documented as such. For >> that reason, C standard library implementations should not be expected >> to emphasize its performance - they assume that you're using strxfrm() >> + their highly optimised strcmp() > > Have you got any evidence in support of this claim, or is it just > wishful thinking about what's likely to be inside libc? I'd also note > that any comparisons you may have seen about this are certainly not > accounting for the effects of data bloat from strxfrm (ie, possible > spill to disk, more merge passes, etc). > > In any case, if you have to redefine the meaning of equality in order > to justify a performance patch, I'm prepared to walk away at the start. > The range of likely performance costs/benefits across different locales > and different implementations is so wide that if you can't show it to be > a win even with the strcmp tiebreaker, it's not likely to be a reliable > win without that. On the testing I've done, the strcmp() tie-breaking rarely gets run anyway. Unless you're sorting data with only a few distinct values, most comparisons are between values that are distinct under any choice of collation, and therefore strcoll() returns 0 very rarely, and therefore the additional runtime it consumes does not matter very much. Also, it's quite a bit faster than strcoll() anyway, so even when it does run it doesn't add much to the total time. I think the elephant in the room here is that we're relying on the OS to do everything for us, and the OS API we use (strcoll) requires an extra memcpy and is also dreadfully slow. If we could solve that problem, it would save us a lot more than worrying about the extra strcmp(). Of course, solving that problem is hard: we either have to get the glibc and FreeBSD libc folks to provide a better API (that takes lengths for each input instead of relying on trailing nul bytes), or reimplement locales within PG, or store trailing NUL bytes that we don't really need in the index so we can apply strcoll directly, none of which are very appealing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
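[A rough C sketch of the extra copying being described: because text datums are not NUL-terminated, each strcoll() call first needs both inputs copied into terminated buffers. Buffer management is simplified here; the real code sizes and reuses its buffers rather than using fixed arrays:]

#include <string.h>

static int
compare_nonterminated(const char *a, int alen, const char *b, int blen)
{
    char a_buf[1024];       /* assumes short strings, purely for illustration */
    char b_buf[1024];

    memcpy(a_buf, a, alen);
    a_buf[alen] = '\0';
    memcpy(b_buf, b, blen);
    b_buf[blen] = '\0';

    /* The real comparator also applies a strcmp() tie-breaker when this is 0. */
    return strcoll(a_buf, b_buf);
}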
On 20 June 2012 17:41, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Peter Geoghegan <peter@2ndquadrant.com> writes: >> No, I'm suggesting it would probably be at least a bit of a win here >> to cache the constant, and only have to do a strxfrm() + strcmp() per >> comparison. > > Um, have you got any hard evidence to support that notion? The > traditional advice is that strcoll is faster than using strxfrm unless > the same strings are to be compared repeatedly. I'm not convinced that > saving the strxfrm on just one side will swing the balance. Fair enough, but I'd suggest that that traditional advice assumes that strcoll()'s space-efficiency and general avoidance of dynamic allocation is an important factor, which, for us, it clearly isn't, since we can re-use a single buffer in the manner of Robert's text sortsupport patch for each and every per-tuple strxfrm() blob (when traversing an index). This is a direct quote from glibc's strcoll_l.c: Perform the first pass over the string and while doing this find and store the weights for each character. Since we want this to be as fast as possible we are using `alloca' to store the temporary values. But since there is no limit on the length of the string we have to use `malloc' if the string is too long. We should be very conservative here. Here, alloca is used to allocate space in a stack frame. I believe that this is an entirely inappropriate trade-off for Postgres to be making. strxfrm(), in contrast, leaves buffer sizing and management up to the caller. That has to be a big part of the problem here. (A rough sketch of that caller-managed buffer reuse appears after this message.) >> The fact is that this is likely to be a fairly significant >> performance win, because strxfrm() is quite simply the way you're >> supposed to do collation-aware sorting, and is documented as such. For >> that reason, C standard library implementations should not be expected >> to emphasize its performance - they assume that you're using strxfrm() >> + their highly optimised strcmp() > > Have you got any evidence in support of this claim, or is it just > wishful thinking about what's likely to be inside libc? According to the Single UNIX Specification's strcoll() documentation, "The strxfrm() and strcmp() functions should be used for sorting large lists". If that isn't convincing enough for you, there is the fact that glibc's strcmp() is clearly highly optimised for each and every architecture, and that we are currently throwing away an extra strcmp() in the event of strcoll() equality. > I'd also note that any comparisons you may have seen about this are certainly not > accounting for the effects of data bloat from strxfrm (ie, possible > spill to disk, more merge passes, etc). What about the fact that strcoll() may be repeatedly allocating and freeing memory per comparison? The blobs really aren't that much larger than the strings to be sorted, which are typically quite short. > In any case, if you have to redefine the meaning of equality in order > to justify a performance patch, I'm prepared to walk away at the start. The advantage of my proposed implementation is precisely that I won't have to redefine the meaning of equality, and that only the text datatype will have to care about equivalency, so you can just skip over an explanation of equivalency for most audiences. If you feel that strongly about it, and I have no possible hope of getting this accepted, I'm glad that I know now rather than after completing a significant amount of work on this. I would like to hear other people's opinions before I drop it though. 
> The range of likely performance costs/benefits across different locales > and different implementations is so wide that if you can't show it to be > a win even with the strcmp tiebreaker, it's not likely to be a reliable > win without that. glibc is the implementation that really matters. My test-case used en_US.UTF-8 as its locale, which has to be one of the least stressful to strcoll() - I probably could have shown a larger improvement just by selecting a locale that was known to have to make more passes, like, say, hu_HU.UTF-8. I expect to have access to a 16 core server next week, which Bull have made available to me. Maybe I'll get some interesting performance numbers from it. The reason that I don't want to use the blob with original string hack is because it's ugly, space-inefficient, unnecessary and objectively incorrect, since it forces us to violate the conformance requirement C9 of Unicode 3.0, marginal though that may be. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
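For illustration, the caller-managed buffer reuse referred to above might look roughly like this in outline. The names xfrm_buf and xfrm_buflen are placeholders, not from any posted patch, and error handling is omitted.

    #include <stdlib.h>
    #include <string.h>

    static char   *xfrm_buf = NULL;
    static size_t  xfrm_buflen = 0;

    /*
     * strxfrm() leaves buffer management to the caller: asking for the
     * required length first lets a single buffer be grown as needed and
     * then reused for every subsequent transformation, rather than the
     * per-comparison alloca()/malloc() that glibc's strcoll() performs
     * internally.
     */
    static const char *
    transform(const char *s)
    {
        size_t  needed = strxfrm(NULL, s, 0) + 1;

        if (needed > xfrm_buflen)
        {
            xfrm_buf = realloc(xfrm_buf, needed);
            xfrm_buflen = needed;
        }
        strxfrm(xfrm_buf, s, xfrm_buflen);
        return xfrm_buf;
    }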
On 20 June 2012 17:41, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Peter Geoghegan <peter@2ndquadrant.com> writes: >> No, I'm suggesting it would probably be at least a bit of a win here >> to cache the constant, and only have to do a strxfrm() + strcmp() per >> comparison. > > Um, have you got any hard evidence to support that notion? The > traditional advice is that strcoll is faster than using strxfrm unless > the same strings are to be compared repeatedly. I'm not convinced that > saving the strxfrm on just one side will swing the balance. I've written a very small C++ program, which I've attached, that basically proves that this can still make a fairly large difference - I hope it's okay that it's C++, but that allowed me to write the program quickly, with no dependencies for anyone who cares to try this out, other than a C++ compiler and the standard library. It just inserts 500,000 elements (the same succession of pseudo-random strings again and again) into a std::set, which is implemented using a red-black tree on my machine. In one case, we just use strcoll() (there is actually no tie-breaker). In the other case, the comparator allocates and fills memory for a strxfrm blob when needed, caches it, and uses strxfrm() to generate blobs for other elements into a dedicated buffer which persists across comparisons (a rough sketch of this scheme appears after this message). It prints the duration of each run, and a running average for each case until terminated. Here is what the output looks like on my machine: [peter@peterlaptop strxfrm_test]$ g++ -O2 set_test.cpp [peter@peterlaptop strxfrm_test]$ ./a.out Time elapsed with strcoll: 1.485290 seconds Time elapsed with strxfrm optimization: 1.070211 seconds Time elapsed with strcoll: 1.813725 seconds Time elapsed with strxfrm optimization: 1.097950 seconds Time elapsed with strcoll: 1.813381 seconds Time elapsed with strxfrm optimization: 1.102769 seconds Time elapsed with strcoll: 1.826708 seconds Time elapsed with strxfrm optimization: 1.105093 seconds Time elapsed with strcoll: 1.842156 seconds Time elapsed with strxfrm optimization: 1.103198 seconds *****strcoll average: 1.756252 strxfrm average: 1.095844***** Time elapsed with strcoll: 1.834785 seconds Time elapsed with strxfrm optimization: 1.105369 seconds Time elapsed with strcoll: 1.817449 seconds Time elapsed with strxfrm optimization: 1.110386 seconds Time elapsed with strcoll: 1.812446 seconds Time elapsed with strxfrm optimization: 1.101266 seconds Time elapsed with strcoll: 1.813595 seconds Time elapsed with strxfrm optimization: 1.099444 seconds Time elapsed with strcoll: 1.814584 seconds Time elapsed with strxfrm optimization: 1.099542 seconds *****strcoll average: 1.787412 strxfrm average: 1.099523***** Time elapsed with strcoll: 1.820218 seconds Time elapsed with strxfrm optimization: 1.102984 seconds Time elapsed with strcoll: 1.817549 seconds Time elapsed with strxfrm optimization: 1.100526 seconds Time elapsed with strcoll: 1.817020 seconds Time elapsed with strxfrm optimization: 1.099273 seconds Time elapsed with strcoll: 1.820983 seconds Time elapsed with strxfrm optimization: 1.100501 seconds Time elapsed with strcoll: 1.813270 seconds Time elapsed with strxfrm optimization: 1.098904 seconds *****strcoll average: 1.797544 strxfrm average: 1.099828***** Time elapsed with strcoll: 1.815952 seconds Time elapsed with strxfrm optimization: 1.102583 seconds If you'd like to print the contents of the set, there are a couple of lines you can uncomment. 
It should be straightforward to understand the program, even if you don't know any C++. Note that I still think that the compelling case for strxfrm() caching is sorting tuples, not btree index traversal. All that I ask is that you allow that on the whole, this will pay for the cost of verifying equivalency within _bt_checkkeys() once per index-scan. I am aware that a red-black tree isn't generally an excellent proxy for how a btree will perform, but I don't think that that really matters here. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
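As a rough illustration of the comparison scheme the program models - a sketch only, not the attached C++ code - one side's blob is cached while the other side is transformed into a persistent scratch buffer. The type and function names are made up here, and the scratch-buffer sizing is hand-waved.

    #include <stdlib.h>
    #include <string.h>

    typedef struct
    {
        const char *str;    /* original NUL-terminated string */
        char       *blob;   /* cached strxfrm() output, or NULL */
    } keyed_string;

    static char scratch[1024];  /* persists across comparisons; assumed big enough */

    /*
     * The "constant" side's blob is computed once and cached; the other
     * side is transformed into the persistent scratch buffer; ordering
     * then falls out of a plain strcmp() on the two blobs.
     */
    static int
    compare_cached(keyed_string *constant, const char *other)
    {
        if (constant->blob == NULL)
        {
            size_t  len = strxfrm(NULL, constant->str, 0) + 1;

            constant->blob = malloc(len);
            strxfrm(constant->blob, constant->str, len);
        }
        strxfrm(scratch, other, sizeof(scratch));
        return strcmp(constant->blob, scratch);
    }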
On 21 June 2012 01:22, Peter Geoghegan <peter@2ndquadrant.com> wrote: > I've written a very small C++ program, which I've attached, that > basically proves that this can still make a fairly large difference - > I hope it's okay that that it's C++, but that allowed me to write the > program quickly, with no dependencies for anyone who cares to try this > out, other than a C++ compiler and the standard library. So, it turns out that with a relatively expensive to sort collation like hu_HU.UTF-8, the difference here is huge: [peter@peterlaptop strxfrm_test]$ ./a.out locale set: hu_HU.UTF-8 Time elapsed with strcoll: 11.122695 seconds Time elapsed with strxfrm optimization: 2.328365 seconds Time elapsed with strcoll: 11.404160 seconds Time elapsed with strxfrm optimization: 2.355684 seconds Time elapsed with strcoll: 11.411680 seconds Time elapsed with strxfrm optimization: 2.342553 seconds Time elapsed with strcoll: 11.415693 seconds Time elapsed with strxfrm optimization: 2.343593 seconds Time elapsed with strcoll: 11.428686 seconds Time elapsed with strxfrm optimization: 2.345204 seconds *****strcoll average: 11.356583 strxfrm average: 2.343080***** -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On Jun20, 2012, at 19:38 , Peter Geoghegan wrote: > On 20 June 2012 17:41, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> In any case, if you have to redefine the meaning of equality in order >> to justify a performance patch, I'm prepared to walk away at the start. > > The advantage of my proposed implementation is precisely that I won't > have to redefine the meaning of equality, and that only the text > datatype will have to care about equivalency, so you can just skip > over an explanation of equivalency for most audiences. Ok, so what exactly is it that you're proposing then? AFAICS, you're proposing to make the less-than operator rely solely on strcoll(). If you don't also redefine the equality operator to match, you'll end up with cases where !(a < b) and !(b < a), yet also !(a == b). This breaks the trichotomy law, no? Which in turn breaks merge-joins - I believe we'd output the cartesian product {potty, potyty} x {potyty, potty} when merge-joining {potty, potyty} with itself, unless the comparison operator contains a tie-breaker. Which is obviously wrong unless you decree that 'potty' = 'potyty'. I do agree that there's a use-case for having a textual type which treats equivalent strings as equal (and which should then also treat equivalent Unicode representations of the same character as equal). But it should be a separate type, not bolted onto the existing textual types. best regards, Florian Pflug
On Jun21, 2012, at 02:22 , Peter Geoghegan wrote: > I've written a very small C++ program, which I've attached, that > basically proves that this can still make a fairly large difference - > I hope it's okay that that it's C++, but that allowed me to write the > program quickly, with no dependencies for anyone who cares to try this > out, other than a C++ compiler and the standard library. Uh, are you sure the results aren't swapped? string_wrapper takes a parameter called use_strcoll, and it indeed uses strcoll() if that parameter is true, and strxfrm() otherwise. But then it passes *false* to determine the speed of strcoll(), and *true* to determine the speed of strxfrm()... Also, place_random_string() needs to insert a terminating zero byte, otherwise set_test segfaults, at least on my OSX machine. best regards, Florian Pflug
On 21 June 2012 10:24, Florian Pflug <fgp@phlo.org> wrote: > On Jun21, 2012, at 02:22 , Peter Geoghegan wrote: >> I've written a very small C++ program, which I've attached, that >> basically proves that this can still make a fairly large difference - >> I hope it's okay that that it's C++, but that allowed me to write the >> program quickly, with no dependencies for anyone who cares to try this >> out, other than a C++ compiler and the standard library. > > Uh, are you sure the results aren't swapped? string_wrapper takes a > parameter called use_strcoll, and it indeed uses strcoll() if that > parameter is true, and strxfm() otherwise. But then it passes *false* > to determine the speed of strcoll(), and *true* to determine the speed > of strxfm()... Egads, you're right. Apparently in my haste I gave the wrong boolean argument to the class constructor in each case. I am completely wrong about this. I apologise for the careless mistake. At least I advanced the debate, though - we don't want to do any ad-hoc generation of strxfrm() output within comparators, even when one side can have a cached value. You're right about merge-joins; I cannot answer that without arbitrarily making a convenient exception about the conditions that they join on, which is itself unacceptable. I cannot change the semantics of equality to do something with strxfrm either, since that would have unacceptable performance implications, as is evident from my test-case. I wish someone had a better idea, but I suppose at this point the idea of concatenating strxfrm() output with the original string begins to look like the least-worst option. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On Jun21, 2012, at 11:55 , Peter Geoghegan wrote: > On 21 June 2012 10:24, Florian Pflug <fgp@phlo.org> wrote: >> On Jun21, 2012, at 02:22 , Peter Geoghegan wrote: >>> I've written a very small C++ program, which I've attached, that >>> basically proves that this can still make a fairly large difference - >>> I hope it's okay that that it's C++, but that allowed me to write the >>> program quickly, with no dependencies for anyone who cares to try this >>> out, other than a C++ compiler and the standard library. >> >> Uh, are you sure the results aren't swapped? string_wrapper takes a >> parameter called use_strcoll, and it indeed uses strcoll() if that >> parameter is true, and strxfm() otherwise. But then it passes *false* >> to determine the speed of strcoll(), and *true* to determine the speed >> of strxfm()... > > Egads, you're right. Apparently in my haste I gave the wrong boolean > argument to the class constructor in each case. I am completely wrong > about this. I apologise for the careless mistake. At least I advanced > the debate, though - we don't want to do any ad-hoc generation of > strxfrm() output within comparators, even when one side can have a > cached value. FWIW, I've played around a bit with a fixed version of your set_test.cpp. I made it use the cached sort key of the RHS also, made it preserve sort keys during copy construction, even used a boost::shared_ptr to avoid actually copying the cached sort key. The strxfrm() version still performed about 30% worse than the strcoll() version. From that, I figured that tree inserts don't trigger enough comparisons to make strxfrm() worthwhile. So I mangled set_test.cpp a bit more and instead of a std::set, I created a (C-style) array of string_wrapper instances, and sorted them with std::sort(). Which made strcoll() twice as fast as strxfrm()... At this point, my theory is that your choice of "random" strings prevents strxfrm() from ever winning over strcoll(). The reason being that you pick each letter uniformly distributed from a-z, resulting in a probability of two strings differing in the first character of 1 - 1/26 =~ 96%. Thus, even for extremely complex collation rules, strcoll() probably only needs to compare a few characters to determine the order. Whereas strxfrm() has to compute the whole sort key, no matter what. The question is thus, how good a model are your "random" strings for the input of a typical sorting step in postgres? My guess is, a quite good one actually, since people probably don't deal with a lot of very similar strings very often. Which makes me wonder if using strxfrm() during sorting wouldn't be a net loss, all things considered… My modified set_test.cpp (which should now be called array_test.cpp) is attached. best regards, Florian Pflug
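For reference, the string model in question is roughly the following; this is illustrative only, the actual generator is in the attached program, and the name here is made up.

    #include <stdlib.h>

    /*
     * Each character drawn uniformly from 'a'..'z': two independent
     * strings already differ at the first character with probability
     * 1 - 1/26 (about 96%), so strcoll() rarely has to look past it,
     * while strxfrm() must always transform the whole string.
     */
    static void
    make_random_string(char *buf, int len)
    {
        int     i;

        for (i = 0; i < len; i++)
            buf[i] = 'a' + rand() % 26;
        buf[len] = '\0';
    }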
On 21 June 2012 11:40, Florian Pflug <fgp@phlo.org> wrote: > At this point, my theory is that your choice of "random" strings prevents > strxfrm() from ever winning over strcoll(). The reason being that you pick > each letter uniformly distributed from a-z, resulting in a probability of > two string differing in the first of 1 - 1/26 =~ 96%. Thus, even for > extremely complex collation rules, strcoll() probably only needs to compare > a few characters to determine the order. Whereas strxfrm() has to compute > the whole sort key, no matter what. Good point. > The question is thus, how good a model are your "random" strings for the > input of a typical sorting step in postgres? My guess is, a quite good one > actually, since people probably don't deal with a lot of very similar strings > very often. Which makes we wonder if using strxfrm() during sorting wouldn't > be a net loss, all things considered… Well, I think the answer to that has to be no. I posted a simple postgres text -> strxfrm blob bytea SQL-callable wrapper function a few days ago that clearly wins by quite a bit on some sample queries against the dellstore sample database, which has a representative sample set. Completely uniformly-distributed data actually isn't representative of the real world at all. Normal distributions abound. This C++ program was never intended to justify the general utility of using strxfrm() for sorting, which I don't believe is in question. I just wanted to get some idea about the performance characteristics of using strxfrm() to traverse an index. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On Fri, Jun 15, 2012 at 6:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I would be concerned about this if it were per-sort-tuple wastage, but > what I understood to be happening was that this was a single instance of > an expansible buffer (per sort, perhaps, but still just one buffer). > And, as you keep pointing out when it suits your argument, it's > relatively uncommon to be sorting enormous values anyway. So no, I am > not particularly worried about that. If you are, there are more > important places to be concerned about allocation pad wastage, starting > with palloc itself. And, of course, palloc doesn't use power-of-two allocation up to infinity either, as proposed here; beyond a certain limit it switches to calling malloc(), which doesn't use power-of-two allocations for large requests either, at least not in glibc - for anything 128kB or above, it uses mmap to grab something of exactly the requested size (rounded up to the OS page size) and returns it to the OS unconditionally when it's freed. However, what this really boils down to is that you and Peter don't like this line of code: + tss->buflen1 = TYPEALIGN(TEXTBUFLEN, len1); What would you like it to say instead? The obvious formulation is: while (tss->buflen1 < len1) tss->buflen1 *= 2; Or perhaps the following, which will normally be more efficient, though possibly not as efficient as what I've got there now: tss->buflen = 1 << ffs(len1); Of course if len1 happens to be large enough that the high-order bit is set, the first proposal above will fail to terminate and the second one will produce 0. That can't really happen in practice because a toasted datum is limited to 1GB, of course, and I suppose a lot of code would need to be adjusted if we ever wanted to raise that limit, so maybe it's not worth worrying about it. So, is one of the above code fragments what people want? Or something else? Thanks, -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
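For comparison, the two growth strategies under discussion look roughly like this when spelled out. TEXTBUFLEN and TYPEALIGN are reproduced here so the sketch stands alone; the helper names are made up.

    #include <stddef.h>
    #include <stdint.h>

    #define TEXTBUFLEN  1024
    #define TYPEALIGN(ALIGNVAL, LEN) \
        (((uintptr_t) (LEN) + ((ALIGNVAL) - 1)) & ~((uintptr_t) ((ALIGNVAL) - 1)))

    /* Patch as posted: round the request up to the next multiple of TEXTBUFLEN. */
    static size_t
    grow_round_up(size_t len)
    {
        return TYPEALIGN(TEXTBUFLEN, len);
    }

    /* Proposed alternative: keep doubling the current allocation until it fits. */
    static size_t
    grow_doubling(size_t buflen, size_t len)
    {
        while (buflen < len)
            buflen *= 2;
        return buflen;
    }

For a 600kB datum, for example, the first form asks for a shade over 600kB and the second (doubling up from TEXTBUFLEN) for 1MB, which is essentially the wastage being argued about.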
On 23 July 2012 16:09, Robert Haas <robertmhaas@gmail.com> wrote: > However, what this really boils down to is that you and Peter don't > like this line of code: > > + tss->buflen1 = TYPEALIGN(TEXTBUFLEN, len1); I can only speak for myself, though I agree with your summary here. > What would you like it to say instead? > > The obvious formulation is: > > while (tss->buflen1 < len1) > tss->buflen1 *= 2; That's what I had in mind. +1. > Or perhaps the following, which will normally be more efficient, > though possibly not as efficient as what I've got there now: > > tss->buflen = 1 << ffs(len1); I'm sorry, I don't follow you here. What is ffs() ? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On Mon, Jul 23, 2012 at 11:34 AM, Peter Geoghegan <peter@2ndquadrant.com> wrote: > On 23 July 2012 16:09, Robert Haas <robertmhaas@gmail.com> wrote: >> However, what this really boils down to is that you and Peter don't >> like this line of code: >> >> + tss->buflen1 = TYPEALIGN(TEXTBUFLEN, len1); > > I can only speak for myself, though I agree with your summary here. > >> What would you like it to say instead? >> >> The obvious formulation is: >> >> while (len1 < tss->buflen1) >> tss->buflen *= 2; > > That's what I had in mind. +1. > >> Or perhaps the following, which will normally be more efficient, >> though possibly not as efficient as what I've got there now: >> >> tss->buflen = 1 << ffs(len1); > > I'm sorry, I don't follow you here. What is ffs() ? Sorry, fls, not ffs. I always get those mixed up. See src/port/fls.c -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 23 July 2012 16:36, Robert Haas <robertmhaas@gmail.com> wrote: > On Mon, Jul 23, 2012 at 11:34 AM, Peter Geoghegan <peter@2ndquadrant.com> wrote: >>> tss->buflen = 1 << ffs(len1); >> >> I'm sorry, I don't follow you here. What is ffs() ? > > Sorry, fls, not ffs. I always get those mixed up. > > See src/port/fls.c Oh, okay. Since, I infer, we're starting from a buffer-size that's a power-of-two anyway, is there really any advantage in doing this rather than just doubling the buffer size each time? -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On Mon, Jul 23, 2012 at 12:07 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote: > On 23 July 2012 16:36, Robert Haas <robertmhaas@gmail.com> wrote: >> On Mon, Jul 23, 2012 at 11:34 AM, Peter Geoghegan <peter@2ndquadrant.com> wrote: >>>> tss->buflen = 1 << ffs(len1); >>> >>> I'm sorry, I don't follow you here. What is ffs() ? >> >> Sorry, fls, not ffs. I always get those mixed up. >> >> See src/port/fls.c > > Oh, okay. Since, I infer, we're starting from a buffer-size that's a > power-of-two anyway, is there really any advantage in doing this > rather than just doubling the buffer size each time? Well, if you're using a builtin fls rather than our src/port implementation, it's probably a single machine language instruction instead of a loop. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
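For what it's worth, the builtin route can be sketched as follows; this is GCC/Clang specific, it is not the src/port fallback (which loops), and the function name is made up.

    #include <stddef.h>

    /*
     * Next power of two via "find last set bit": with a compiler builtin
     * the bit scan compiles to a single instruction, so no loop is needed.
     * __builtin_clzll is undefined for 0, hence the small-length guard.
     */
    static size_t
    next_pow2(size_t len)
    {
        if (len <= 1)
            return 1;
        return (size_t) 1 << (sizeof(unsigned long long) * 8 -
                              __builtin_clzll((unsigned long long) (len - 1)));
    }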
Do you think you can follow through on this soon, Robert? I don't believe that there are any outstanding issues. I'm not going to make an issue of the fact that strxfrm() hasn't been taken advantage of. If you could please post a new revision, with the suggested alterations (that you agree with), I'd be happy to take another look. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On Mon, Oct 8, 2012 at 8:47 AM, Peter Geoghegan <peter@2ndquadrant.com> wrote: > Do you think you can follow through on this soon, Robert? I don't > believe that there are any outstanding issues. I'm not going to make > an issue of the fact that strxfrm() hasn't been taken advantage of. If > you could please post a new revision, with the suggested alterations > (that you agree with), I'd be happy to take another look. I don't have any plans to work on this further. I think all of the criticism that has been leveled at this patch is 100% bogus, and I greatly dislike the changes that have been proposed. That may not be fair, but it's how I feel, and in light of that feeling it does not make sense for me to pursue this. Sorry. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 8 October 2012 16:30, Robert Haas <robertmhaas@gmail.com> wrote: > I don't have any plans to work on this further. I think all of the > criticism that has been leveled at this patch is 100% bogus, and I > greatly dislike the changes that have been proposed. That may not be > fair, but it's how I feel, and in light of that feeling it does not > make sense for me to pursue this. Sorry. If it was the case that you were only 50% of the way to getting something committable, I guess I'd understand; this is, after all, a volunteer effort, and you are of course free to pursue or not pursue whatever you like. It's more like 95% though, so I have a hard time understanding this attitude. The only criticism that I imagine that you could be referring to is my criticism (which was seconded by Tom) concerning the approach to sizing a buffer that is used as state between successive calls to the sortsupport routine. This was a fairly trivial point even within the context of this patch. It was regrettable that it got so out of hand, but that certainly wasn't my intention. I must say that I find this disheartening as a reviewer, particularly since it comes from someone who reviews a lot of patches (including most of my own), and has often complained about the difficulties that reviewers face. Is it really so hard for you to accept a criticism from me? I didn't expect you to fully accept my criticism; I only expected you to take it into consideration, and possibly let it go for the sake of progress. For what it's worth, I have high regard for your work generally. In all sincerity, this appears to me to be a slight, and I find it upsetting, both on a personal level, and because it creates a concern about being able to work with you in the future, which is certainly something that I've benefited from in the past, and hope to continue to benefit from. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
On Mon, Oct 8, 2012 at 12:26 PM, Peter Geoghegan <peter@2ndquadrant.com> wrote: > If it was the case that you were only 50% of the way to getting > something committable, I guess I'd understand; this is, after all, a > volunteer effort, and you are of course free to pursue or not pursue > whatever you like. It's more like 95% though, so I have a hard time > understanding this attitude. > > The only criticism that I imagine that you could be referring to is my > criticism (which was seconded by Tom) concerning the approach to > sizing a buffer that is used as state between successive calls to the > sortsupport routine. This was a fairly trivial point even within the > context of this patch. It was regrettable that it got so out of hand, > but that certainly wasn't my intention. > > I must say that I find this disheartening as a reviewer, particularly > since it comes from someone who reviews a lot of patches (including > most of my own), and has often complained about the difficulties that > reviewers face. Is it really so hard for you to accept a criticism > from me? I didn't expect you to fully accept my criticism; I only > expected you to take it into consideration, and possibly let it go for > the sake of progress. > > For what it's worth, I have high regard for your work generally. In > all sincerity, this appears to me to be a slight, and I find it > upsetting, both on a personal level, and because it creates a concern > about being able to work with you in the future, which is certainly > something that I've benefited from in the past, and hope to continue > to benefit from. Hey, if me deciding I don't want to work on a patch any more is going to make you feel slighted, then you're out of luck. The archives are littered with people who have decided to stop working on things because the consensus position on list was different than the approach that they personally favored, and I have as much right to do that as anyone else. I think that you are perfectly entitled to feel disappointed that I don't like your suggestions, even as I am disappointed that the patch encountered so much resistance. But let's not turn it into a personal issue. I don't think that you're a bad person because you don't like my approach, and I hope you don't think I'm a bad person because I don't like yours. If you do, then that's just something we're both going to have to live with, because I am not going to make a policy of working on things because other people will be offended if I don't. As to what the substantive issue is, well, yeah, I don't like the memory allocation strategy that you and Tom are proposing. In the worst case it chews up a lot of extra memory, and in the best case there's no measurable performance benefit - or at least nobody has done any work to demonstrate that there is one. I've already pointed out that, in fact, the strategy of doing power-of-two allocations is rarely followed for very large allocations, a point to which I believe no one has responded substantively. Now I don't think anyone else is required to respond to that point, but neither am I required to revise the patch into a form that I don't agree with. Equally, I will not presume to commit it in a form that you and Tom do not agree with. That would be asinine, and I try (not always successfully) not to be an ass. It is not as if anyone has phrased this as a maybe-we-should sort of argument; there have been quite definite arguments on both sides over apparently strongly-held positions. 
I had hoped that this was going to be a quick and non-controversial win, but 74 email messages later it has become clear that it will be neither of those things. I do not have a problem with accepting review feedback from you or from anyone else. There are many examples in the archives of cases where I have improved my code based on review feedback; indeed, the possibility of getting high-quality feedback from the very smart developers who inhabit this mailing list is for me a principal attraction of working on PostgreSQL, not to mention a major reason why PostgreSQL is as good a product as it is. However, that general truth does not oblige me to accept any particular piece of feedback as well-founded, and this is hardly the first suggestion that I have argued against in four years as a PostgreSQL developer, nor the first of my patches to land on the cutting-room floor. Nor do all of my suggestions for other people's patches get adopted either. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 8 October 2012 21:35, Robert Haas <robertmhaas@gmail.com> wrote: > Hey, if me deciding I don't want to work on a patch any more is going > to make you feel slighted, then you're out of luck. The archives are > littered with people who have decided to stop working on things > because the consensus position on list was different than the approach > that they personally favored, and I have as much right to do that as > anyone else. Sure you do. However, I doubt very many of those who gave up did so over so trivial an issue as to how to grow a buffer somewhere, and those that did usually did not go on to become major contributors, and certainly not committers. The buffer thing is, as far as I know, the single solitary point of contention with this patch. We're talking about 2 lines of code. To give up now is just petulant. There is no other way of looking at it. > It is not as if anyone has phrased this as a maybe-we-should > sort of argument; there have been quite definite arguments on both > sides over apparently strongly-held positions. I had hoped that this > was going to be a quick and non-controversial win, but 74 email > messages later it has become clear that it will be neither of those > things. Many of those 74 emails concerned my completely unrelated digression into exploiting strxfrm(); we spent a ridiculously long time discussing how to size this buffer, but it still wasn't anything like 74 messages. -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Peter Geoghegan <peter@2ndquadrant.com> writes: > On 8 October 2012 21:35, Robert Haas <robertmhaas@gmail.com> wrote: Gentlemen, you can't fight here. This is the War Room. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support