Re: Progress on fast path sorting, btree index creation time - Mailing list pgsql-hackers
| From | Peter Geoghegan | 
|---|---|
| Subject | Re: Progress on fast path sorting, btree index creation time | 
| Date | |
| Msg-id | CAEYLb_XbGFOXVDztbP+S0LfoTYdsjN43oVCim+2xi412ULSHFw@mail.gmail.com | 
| In response to | Re: Progress on fast path sorting, btree index creation time (Robert Haas <robertmhaas@gmail.com>) | 
| Responses | Re: Progress on fast path sorting, btree index creation time | 
| List | pgsql-hackers | 
On 27 January 2012 14:37, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jan 27, 2012 at 9:27 AM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
>> Well, I don't think it's all that subjective - it's more the case that
>> it is just difficult, or it gets that way as you consider more
>> specialisations.
>
> Sure it's subjective. Two well-meaning people could have different
> opinions without either of them being "wrong". If you do a lot of
> small, in-memory sorts, more of this stuff is going to seem worthwhile
> than if you don't.

But if you don't, then you're not going to have your cache compromised, so the cost is limited to having to store a few tens of kilobytes of extra binary executable data on disk, and perhaps in main memory, that you wouldn't otherwise have to - a cost that is virtually indistinguishable from zero. When you do eventually need to do some in-memory sort, you get to have that go significantly faster, and since you don't have much use for the specialisations anyway, you get that with essentially no down-side.

The concern is a perfect storm of all specialisations being simultaneously used such that it'd be more efficient to use a generic qsort. I think that's pretty implausible - the general assumption is that database applications are not frequently CPU bound. They're assumed to be frequently memory-bound though, so any effort to reduce memory consumption - which this patch effectively does - is probably going to be more valuable. Even if we suppose that the perfect storm can and does occur, on a chip that is so starved of instruction cache that it turns out to be a net loss, surely even then the perfect storm is a rare occurrence, and the aggregate effect is still a net benefit. Besides, Postgres performance optimisations for which you can contrive a case that results in a net loss in performance are well precedented.
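As a rough illustration of the trade-off being discussed, and not the code in the attached patch, the sketch below contrasts a generic `qsort()` call, which reaches its comparator through a function pointer, with per-type sort routines stamped out by a macro template so that the comparison can be inlined. The names `DEFINE_SORT_SPECIALISATION`, `sort_int32`, `sort_int64` and `sort_int32_generic` are hypothetical, and a simple insertion sort stands in for the real quicksort to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Generic path: qsort() calls this comparator through a function pointer. */
int cmp_int32(const void *a, const void *b)
{
    int32_t x = *(const int32_t *) a;
    int32_t y = *(const int32_t *) b;

    return (x > y) - (x < y);
}

void sort_int32_generic(int32_t *a, size_t n)
{
    qsort(a, n, sizeof(int32_t), cmp_int32);
}

/*
 * Hypothetical template: each instantiation emits one type-specific sort
 * in which the comparison is a plain inline expression. Insertion sort is
 * used here purely as a stand-in for the real sorting algorithm.
 */
#define DEFINE_SORT_SPECIALISATION(SUFFIX, TYPE)        \
void sort_##SUFFIX(TYPE *a, size_t n)                   \
{                                                       \
    for (size_t i = 1; i < n; i++)                      \
    {                                                   \
        TYPE   key = a[i];                              \
        size_t j = i;                                   \
                                                        \
        while (j > 0 && a[j - 1] > key)                 \
        {                                               \
            a[j] = a[j - 1];                            \
            j--;                                        \
        }                                               \
        a[j] = key;                                     \
    }                                                   \
}

/* One line of source per specialisation; each adds its own object code. */
DEFINE_SORT_SPECIALISATION(int32, int32_t)
DEFINE_SORT_SPECIALISATION(int64, int64_t)
```

In this sketch, each instantiation line adds another copy of the sort body to the binary, which is the instruction-cache cost weighed above, while dropping a specialisation amounts to deleting its instantiation line and routing callers back to the generic path.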

>> As for what types/specialisations may not make the cut, I'm
>> increasingly convinced that floats (in the following order: float4,
>> float8) should be the first to go. Aside from the fact that we cannot
>> use their specialisations for anything like dates and timestamps,
>> floats are just way less useful than integers in the context of
>> database applications, or at least those that I've been involved with.
>> As important as floats are in the broad context of computing, it's
>> usually only acceptable to store data in a database as floats within
>> scientific applications, and only then when their limitations are
>> well-understood and acceptable. I think we've all heard anecdotes at
>> one time or another, involving their limitations not being well
>> understood.
>
> While we're waiting for anyone else to weigh in with an opinion on the
> right place to draw the line here, do you want to post an updated
> patch with the changes previously discussed?

Patch is attached. I have not changed the duplicate functions. This is because I concluded that keeping them was the lesser of two evils: the alternative was to have the template generate both declarations in the header file and definitions in the .c file, which seemed particularly obscure. We're never going to have to expose/duplicate any more comparators anyway. Do you agree?

It's pretty easy to remove a specialisation at any time - just remove fewer than 10 lines of code. It's also pretty difficult to determine, with everyone's absolute confidence, where the right balance lies. Perhaps the sensible thing to do is to not be so conservative in what we initially commit, while clearly acknowledging that we may not have the balance right, and that it may have to change. We then have the entire beta part of the cycle in which to decide to roll back from that position, without any plausible downside.

If, on the other hand, we conservatively lean towards fewer specialisations in the initial commit, no one will complain about the lack of an improvement in performance that they never had.

Tom's objections related to the total number of specialisations, and their distributed costs - the very idea of full specialisations was not objected to. I think it's fair to say that there is no controversy at all remaining about whether or not we should have *some* number of specialisations. Therefore, assuming you have no further objections to the style of the code, and no one else voices any other objections in the next couple of days, I suggest that you provisionally commit this latest revision with all of its specialisations, while putting people on notice about this.

I think that possibly the one remaining blocker to tentatively committing this with all specialisations intact is that I haven't tested this on Windows, as I don't currently have access to a Windows development environment. I have set one up before, but it's a huge pain. Can anyone help me out?

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
Attachment