Re: B-Tree support function number 3 (strxfrm() optimization) - Mailing list pgsql-hackers
From: Peter Geoghegan
Subject: Re: B-Tree support function number 3 (strxfrm() optimization)
Msg-id: CAM3SWZS3ZcYM7eYZw0dUu6CieJ8W4B3CYv1f=j2PgGxhvr8O_A@mail.gmail.com
In response to: Re: B-Tree support function number 3 (strxfrm() optimization) (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Thu, Sep 4, 2014 at 9:19 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Sep 2, 2014 at 10:27 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> * Still doesn't address the open question of whether or not we should
>> optimistically always try "memcmp() == 0" on tiebreak. I still lean
>> towards "yes".
>
> Let m be the cost of a memcmp() that fails near the end of the
> strings; and let s be the cost of a strcoll() that does likewise.
> Clearly s > m. But approximately what is s/m on platforms where you
> can test? Say, with 100 byte strings, in a few different locales.

Just to be clear: I imagine you're more or less sold on the idea of
testing equality in the event of a tiebreak, where the leading 8
primary weight bytes are already known to be equal (and the full text
string lengths also match); the theory of operation behind testing how
good a proxy abbreviated key cardinality is for full key cardinality
is very much predicated on that. We can still win big with very low
cardinality sets this way, which are an important case. What I
consider an open question is whether or not we should do that on the
first call, when there has been no abbreviated comparison (such as on
the second or subsequent attribute in a multi-column sort), in the
hope that equality will just happen to be indicated.

> If for example s/m > 100 then it's a no-brainer, because in the worst
> case we're adding 1% overhead, and in the best case we're saving 99%.
> OTOH, if s/m < 2 then I almost certainly wouldn't do it, because in
> the worst case we're adding >50% overhead, and in the best case we're
> saving <50%. That seems like it's doubling down on the abbreviated
> key stuff to work mostly all the time, and I'm not prepared to make
> that bet. There is of course a lot of daylight between a 2-to-1 ratio
> and a 100-to-1 ratio and I expect the real value is somewhere in the
> middle (probably closer to 2); I haven't at this time made up my mind
> what value would make this worthwhile, but I'd like to know what the
> real numbers are.

Well, we can only lose when the strings happen to be the same size, so
that's something. But I'm willing to consider the possibility that the
memcmp() is virtually free, and I would only proceed with this extra
optimization if that is actually the case. Modern CPUs are odd things.
Branch prediction/instruction pipelining, and the fact that we're
frequently stalled on cache misses, might combine to make the
opportunistic memcmp() effectively free. I could be wrong about that,
and I'm certainly wrong if you test large enough strings with
differences only towards the very end, but it seems reasonable to
speculate that it would work well with appropriate precautions (in
particular, don't do it when the strings are huge).

Let me try to come up with some numbers for a really unsympathetic
case, since you've already seen sympathetic numbers. I think the
sympathetic country/province/city sort test case [1] is actually
fairly representative; sort keys *are* frequently correlated like
that, implying that there are lots of savings to be had by being
"memcmp() == 0 optimistic" when resolving comparisons using the second
or subsequent attribute.
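To make that concrete, here is a minimal sketch of the kind of
tiebreak I mean (illustrative only; this is not the patch, and the
function name is made up). In the real comparator this would only be
reached once the abbreviated keys had already compared equal:

    #include <stdlib.h>
    #include <string.h>

    /*
     * Sketch of the proposed tiebreak, not the actual patch: compare
     * two non-NUL-terminated strings, trying to prove equality with
     * a cheap memcmp() before paying for a locale-aware strcoll().
     */
    int
    tiebreak_sketch(const char *a, size_t lena, const char *b, size_t lenb)
    {
        char   *ta;
        char   *tb;
        int     result;

        /*
         * Opportunistic equality test, only possible when the lengths
         * match.  If it succeeds we skip strcoll() entirely; if it
         * fails we have wasted one (hopefully cheap) memcmp().
         */
        if (lena == lenb && memcmp(a, b, lena) == 0)
            return 0;

        /*
         * Slow path: strcoll() needs NUL-terminated input (error
         * handling omitted for brevity).
         */
        ta = malloc(lena + 1);
        tb = malloc(lenb + 1);
        memcpy(ta, a, lena);
        memcpy(tb, b, lenb);
        ta[lena] = '\0';
        tb[lenb] = '\0';
        result = strcoll(ta, tb);
        free(ta);
        free(tb);
        return result;
    }

Relative to always calling strcoll(), the worst case here adds only
the failed memcmp() on the equal-length path, and the best case skips
strcoll() entirely.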
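As for the s/m ratio itself, a crude micro-benchmark along the
following lines would give ballpark numbers (a sketch, not a rigorous
harness; the collation locale comes from the environment, and the
numbers will obviously vary by platform and locale):

    #include <locale.h>
    #include <stdio.h>
    #include <string.h>
    #include <time.h>

    #define LEN     100
    #define ITERS   10000000L

    /*
     * Rough s/m micro-benchmark.  Build with -O0 (or add a compiler
     * barrier) so the loop-invariant calls aren't hoisted; the time
     * is dominated by libc either way.
     */
    int
    main(void)
    {
        char            a[LEN + 1];
        char            b[LEN + 1];
        volatile int    sink = 0;   /* defeat dead-code elimination */
        clock_t         start;
        double          m;
        double          s;
        long            i;

        setlocale(LC_COLLATE, "");  /* collation locale from environment */

        memset(a, 'x', LEN);
        memset(b, 'x', LEN);
        a[LEN] = b[LEN] = '\0';
        b[LEN - 1] = 'y';           /* difference only at the very end */

        start = clock();
        for (i = 0; i < ITERS; i++)
            sink += memcmp(a, b, LEN);
        m = (double) (clock() - start) / CLOCKS_PER_SEC;

        start = clock();
        for (i = 0; i < ITERS; i++)
            sink += strcoll(a, b);
        s = (double) (clock() - start) / CLOCKS_PER_SEC;

        printf("memcmp: %.2fs  strcoll: %.2fs  s/m: %.1f\n", m, s, s / m);
        return 0;
    }

Strings built this way force both memcmp() and strcoll() to do close
to maximal work before failing, which is exactly the unsympathetic
case for the opportunistic memcmp().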
>> * Leaves open the question of what to do when we can't use the
>> abbreviated keys optimization just because a datum tuplesort is
>> preferred when sorting single-attribute tuples (recall that datum
>> case tuplesorts cannot use abbreviated keys). We want to avail of
>> tuplesort datum sorting where we can, except when abbreviated keys
>> are available, which presumably tips the balance in favor of heap
>> tuple sorting even when sorting on only one attribute, simply
>> because then we can use abbreviated keys. I'm thinking in particular
>> of nodeAgg.c, which is an important case.
>
> I favor leaving this issue to a future patch. The last thing this
> patch needs is more changes that someone might potentially dislike.
> Let's focus on getting the core thing working, and then you can
> enhance it once we all agree that it is.

Makes sense. I think we should make a separate pass to enable sort
support for B-Tree sorting; that's probably the most compelling case,
after all, and certainly the thing that I've heard complaints about.
There could be as many as 2-3 follow-up commits.

> On the substance of this issue, I suspect that for pass-by-value data
> types it can hardly be wrong to use the datum tuplesort approach; but
> it's possible we will want to disable it for pass-by-reference data
> types when the abbreviated-key infrastructure is available. That will
> lose if it turns out that the abbreviated keys aren't capturing
> enough of the entropy, but maybe we'll decide that's OK. Or maybe
> not. But I don't think it's imperative that this patch make a change
> in that area, and indeed, in the interest of keeping separate changes
> isolated, I think it's better if it doesn't.

Right. I had presumed that we'd want to figure that out each time. I
wasn't sure how best to go about doing that, which is why it's an open
item.

[1] http://www.postgresql.org/message-id/CAM3SWZQTYv3KP+CakZJZV3RwB1OJjaHwPCZ9cOYJXPkhbtcBVg@mail.gmail.com

--
Peter Geoghegan