Re: Progress on fast path sorting, btree index creation time - Mailing list pgsql-hackers

From	Peter Geoghegan
Subject	Re: Progress on fast path sorting, btree index creation time
Date	January 27, 2012 16:34:17
Msg-id	CAEYLb_XbGFOXVDztbP+S0LfoTYdsjN43oVCim+2xi412ULSHFw@mail.gmail.com
In response to	Re: Progress on fast path sorting, btree index creation time (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: Progress on fast path sorting, btree index creation time
List	pgsql-hackers

On 27 January 2012 14:37, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jan 27, 2012 at 9:27 AM, Peter Geoghegan <peter@2ndquadrant.com> wrote:
>> Well, I don't think it's all that subjective - it's more the case that
>> it is just difficult, or it gets that way as you consider more
>> specialisations.
>
> Sure it's subjective.  Two well-meaning people could have different
> opinions without either of them being "wrong".  If you do a lot of
> small, in-memory sorts, more of this stuff is going to seem worthwhile
> than if you don't.

But if you don't, then your cache isn't going to be compromised, so the
cost is limited to storing a few tens of kilobytes of extra executable
code on disk, and perhaps in main memory, that you wouldn't otherwise
need: a cost that is virtually indistinguishable from zero. When you do
eventually need an in-memory sort, it goes significantly faster, and
since you have little use for the specialisations anyway, you get that
with essentially no downside.

The concern is a perfect storm in which all specialisations are used
simultaneously, such that it would be more efficient to use a generic
qsort. I think that's pretty implausible: the general assumption is
that database applications are not frequently CPU-bound. They are
assumed to be frequently memory-bound, though, so any effort to reduce
memory consumption, which this patch effectively makes, is probably
going to be more valuable.
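To make the generic-versus-specialised distinction concrete, here is a minimal sketch assuming int32 keys. It is not the code in the attached patch, and all function names are hypothetical. The generic path calls its comparator through a function pointer on every comparison; the specialised path fixes the element type and ordering at compile time, so the compiler can emit a plain integer compare, at the price of one more copy of the sort's machine code per specialised type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Generic path: qsort() invokes the comparator through a function pointer
 * for every comparison, so the compiler cannot inline it. */
static int
cmp_int32(const void *a, const void *b)
{
    int32_t x = *(const int32_t *) a;
    int32_t y = *(const int32_t *) b;

    return (x > y) - (x < y);   /* avoids the overflow risk of x - y */
}

void
sort_int32_generic(int32_t *a, size_t n)
{
    qsort(a, n, sizeof(int32_t), cmp_int32);
}

/* Specialised path (hypothetical): element type and ordering are fixed at
 * compile time, so each comparison is an inlined integer compare. */
static void
quicksort_int32(int32_t *a, ptrdiff_t lo, ptrdiff_t hi)
{
    while (lo < hi)
    {
        int32_t   pivot = a[lo + (hi - lo) / 2];
        ptrdiff_t i = lo - 1;
        ptrdiff_t j = hi + 1;

        /* Hoare partition around the middle element. */
        for (;;)
        {
            do
                i++;
            while (a[i] < pivot);
            do
                j--;
            while (a[j] > pivot);
            if (i >= j)
                break;
            int32_t tmp = a[i];
            a[i] = a[j];
            a[j] = tmp;
        }

        /* Recurse into the smaller part and loop on the larger one,
         * keeping stack depth logarithmic. */
        if (j - lo < hi - j)
        {
            quicksort_int32(a, lo, j);
            lo = j + 1;
        }
        else
        {
            quicksort_int32(a, j + 1, hi);
            hi = j;
        }
    }
}

void
sort_int32_specialised(int32_t *a, size_t n)
{
    assert(a != NULL || n == 0);
    if (n > 1)
        quicksort_int32(a, 0, (ptrdiff_t) n - 1);
}
```

Both functions produce the same ordering; the difference is only in how each comparison is dispatched, which is where the speedup for small in-memory sorts comes from.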

Even if we suppose that the perfect storm can and does occur, on a
chip so starved of instruction cache that the specialisations turn out
to be a net loss, surely even then the perfect storm is a rare
occurrence, and in aggregate users still benefit. Besides, there is
ample precedent for Postgres performance optimisations for which one
can contrive a case that results in a net loss in performance.

>> As for what types/specialisations may not make the cut, I'm
>> increasingly convinced that floats (in the following order: float4,
>> float8) should be the first to go. Aside from the fact that we cannot
>> use their specialisations for anything like dates and timestamps,
>> floats are just way less useful than integers in the context of
>> database applications, or at least those that I've been involved with.
>> As important as floats are in the broad context of computing, it's
>> usually only acceptable to store data in a database as floats within
>> scientific applications, and only then when their limitations are
>> well-understood and acceptable. I think we've all heard anecdotes at
>> one time or another, involving their limitations not being well
>> understood.
>
> While we're waiting for anyone else to weigh in with an opinion on the
> right place to draw the line here, do you want to post an updated
> patch with the changes previously discussed?

Patch is attached. I have not changed the duplicate functions, because
I concluded that keeping them was the lesser of two evils: the
alternative, getting the template to generate both the declarations in
the header file and the definitions in the .c file, seemed particularly
obscure. We're never going to have to expose or duplicate any more
comparators anyway. Do you agree?

It's pretty easy to remove a specialisation at any time: just remove
fewer than 10 lines of code. It's also pretty difficult to determine,
with a confidence that satisfies everyone, where the right balance lies.
Perhaps the sensible thing to do is to not be so conservative in what
we initially commit, while clearly acknowledging that we may not have
the balance right, and that it may have to change. We then have the
entire beta part of the cycle in which to decide to roll back from
that position, without any plausible downside. If, on the other hand,
we conservatively lean towards fewer specialisations in the initial
commit, no one will complain about the lack of an improvement in
performance that they never had.
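As a rough illustration of why removing a specialisation is cheap, here is a minimal sketch of a macro template in which each specialisation is a single instantiation line. The macro name DEFINE_SPECIALISED_SORT, the LT_SCALAR helper and the use of insertion sort are hypothetical choices made to keep the example short; the attached patch's actual template is not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical template: DEFINE_SPECIALISED_SORT(NAME, TYPE, LT) expands to
 * a sort function whose "less than" test is LT(x, y), an expression the
 * compiler can inline.  Insertion sort keeps the sketch short.
 */
#define DEFINE_SPECIALISED_SORT(NAME, TYPE, LT)           \
    static void                                            \
    NAME(TYPE *a, size_t n)                                \
    {                                                      \
        for (size_t i = 1; i < n; i++)                     \
        {                                                  \
            TYPE   key = a[i];                             \
            size_t j = i;                                  \
            while (j > 0 && LT(key, a[j - 1]))             \
            {                                              \
                a[j] = a[j - 1];                           \
                j--;                                       \
            }                                              \
            a[j] = key;                                    \
        }                                                  \
    }

#define LT_SCALAR(x, y) ((x) < (y))

/* One line per specialisation: dropping a specialisation means deleting
 * its line here, plus whatever code dispatches to it. */
DEFINE_SPECIALISED_SORT(sort_int16, int16_t, LT_SCALAR)
DEFINE_SPECIALISED_SORT(sort_int32, int32_t, LT_SCALAR)
DEFINE_SPECIALISED_SORT(sort_int64, int64_t, LT_SCALAR)
```

Each instantiation adds another copy of the sort's machine code, which is exactly the distributed cost being weighed against the speedup.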

Tom's objections related to the total number of specialisations and
their distributed costs; the very idea of full specialisations was not
objected to. I think it's fair to say that no controversy at all
remains about whether or not we should have *some* number of
specialisations. Therefore, I suggest that, assuming you have no
further objections to the style of the code and no one else voices any
other objections in the next couple of days, you provisionally commit
this latest revision with all of its specialisations, while putting
people on notice about this.

I think that possibly the one remaining blocker to tentatively
committing this with all specialisations intact is that I haven't
tested this on Windows, as I don't currently have access to a Windows
development environment. I have set one up before, but it's a huge
pain. Can anyone help me out?

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

Attachment

fastpath_sort_2012_01_27.patch