Re: Parallel tuplesort (for parallel B-Tree index creation) - Mailing list pgsql-hackers
| From | Claudio Freire |
|---|---|
| Subject | Re: Parallel tuplesort (for parallel B-Tree index creation) |
| Date | |
| Msg-id | CAGTBQpaxaJ9t95E18WTQdXKPjA7zp4g81+rS-FApZb+jEtvT1w@mail.gmail.com Whole thread Raw |
| In response to | Re: Parallel tuplesort (for parallel B-Tree index creation) (Peter Geoghegan <pg@heroku.com>) |
| Responses |
Re: Parallel tuplesort (for parallel B-Tree index creation)
|
| List | pgsql-hackers |
On Tue, Sep 6, 2016 at 8:28 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> In the meanwhile, I'll go and do some perf testing.
>>
>> Assuming the speedup is realized during testing, LGTM.
>
> Thanks. I suggest spending at least as much time on unsympathetic
> cases (e.g., only 2 or 3 tapes must be merged). At the same time, I
> suggest focusing on a type that has relatively expensive comparisons,
> such as collated text, to make differences clearer.
The tests are still running (the benchmark script I came up with runs
for a lot longer than I anticipated, about 2 days), but preliminar
results are very promising, I can see a clear and consistent speedup.
We'll have to wait for the complete results to see if there's any
significant regression, though. I'll post the full results when I have
them, but until now it all looks like this:
setup:
create table lotsofitext(i text, j text, w text, z integer, z2 bigint);
insert into lotsofitext select cast(random() * 1000000000.0 as text)
|| 'blablablawiiiiblabla', cast(random() * 1000000000.0 as text) ||
'blablablawjjjblabla', cast(random() * 1000000000.0 as text) ||
'blablabl
awwwabla', random() * 1000000000.0, random() * 1000000000000.0 from
generate_series(1, 10000000);
timed:
select count(*) FROM (select * from lotsofitext order by i, j, w, z, z2) t;
Unpatched Time: 100351.251 ms
Patched Time: 75180.787 ms
That's like a 25% speedup on random input. As we say over here, rather
badly translated, not a turkey's boogers (meaning "nice!")
On Tue, Sep 6, 2016 at 9:50 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Tue, Sep 6, 2016 at 9:19 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> On Tue, Sep 6, 2016 at 4:55 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>>> I noticed, but here n = state->memtupcount
>>>
>>> + Assert(memtuples[0].tupindex == newtup->tupindex);
>>> +
>>> + CHECK_FOR_INTERRUPTS();
>>> +
>>> + n = state->memtupcount; /* n is heap's size,
>>> including old root */
>>> + imin = 0; /*
>>> start with caller's "hole" in root */
>>> + i = imin;
>>
>> I'm fine with using "n" in the later assertion you mentioned, if
>> that's clearer to you. memtupcount is broken out as "n" simply because
>> that's less verbose, in a place where that makes things far clearer.
>>
>>> In fact, the assert on the patch would allow writing memtuples outside
>>> the heap, as in calling tuplesort_heap_root_displace if
>>> memtupcount==0, but I don't think that should be legal (memtuples[0]
>>> == memtuples[imin] would be outside the heap).
>>
>> You have to have a valid heap (i.e. there must be at least one
>> element) to call tuplesort_heap_root_displace(), and it doesn't
>> directly compact the heap, so it must remain valid on return. The
>> assertion exists to make sure that everything is okay with a
>> one-element heap, a case which is quite possible.
>
> More than using "n" or "memtupcount" what I'm saying is to assert that
> memtuples[imin] is inside the heap, which would catch the same errors
> the original assert would, and more.
>
> Assert(imin < state->memtupcount)
>
> If you prefer.
>
> The original asserts allows any value of imin for memtupcount>1, and
> that's my main concern. It shouldn't.
So, for the assertions to properly avoid clobbering/reading out of
bounds memory, you need both the above assert:
+ */+ memtuples[i] = memtuples[imin];+ i = imin;+ }+
>+ Assert(imin < state->memtupcount);+ memtuples[imin] = *newtup;+}
And another one at the beginning, asserting:
+ SortTuple *memtuples = state->memtuples;+ int n,+ imin,+
i;+
>+ Assert(state->memtupcount > 0 && memtuples[0].tupindex == newtup->tupindex);++ CHECK_FOR_INTERRUPTS();
It's worth making that change, IMHO, unless I'm missing something.
pgsql-hackers by date: