Home > mailing lists

Re: Parallel tuplesort (for parallel B-Tree index creation) - Mailing list pgsql-hackers

From	Claudio Freire
Subject	Re: Parallel tuplesort (for parallel B-Tree index creation)
Date	September 8, 2016 15:53:33
Msg-id	CAGTBQpaxaJ9t95E18WTQdXKPjA7zp4g81+rS-FApZb+jEtvT1w@mail.gmail.com Whole thread Raw
In response to	Re: Parallel tuplesort (for parallel B-Tree index creation) (Peter Geoghegan <pg@heroku.com>)
Responses	Re: Parallel tuplesort (for parallel B-Tree index creation)
List	pgsql-hackers

Tree view

On Tue, Sep 6, 2016 at 8:28 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> In the meanwhile, I'll go and do some perf testing.
>>
>> Assuming the speedup is realized during testing, LGTM.
>
> Thanks. I suggest spending at least as much time on unsympathetic
> cases (e.g., only 2 or 3 tapes must be merged). At the same time, I
> suggest focusing on a type that has relatively expensive comparisons,
> such as collated text, to make differences clearer.

The tests are still running (the benchmark script I came up with runs
for a lot longer than I anticipated, about 2 days), but preliminar
results are very promising, I can see a clear and consistent speedup.
We'll have to wait for the complete results to see if there's any
significant regression, though. I'll post the full results when I have
them, but until now it all looks like this:

setup:

create table lotsofitext(i text, j text, w text, z integer, z2 bigint);
insert into lotsofitext select cast(random() * 1000000000.0 as text)
|| 'blablablawiiiiblabla', cast(random() * 1000000000.0 as text) ||
'blablablawjjjblabla', cast(random() * 1000000000.0 as text) ||
'blablabl
awwwabla', random() * 1000000000.0, random() * 1000000000000.0 from
generate_series(1, 10000000);

timed:

select count(*) FROM (select * from lotsofitext order by i, j, w, z, z2) t;

Unpatched Time: 100351.251 ms
Patched Time: 75180.787 ms

That's like a 25% speedup on random input. As we say over here, rather
badly translated, not a turkey's boogers (meaning "nice!")


On Tue, Sep 6, 2016 at 9:50 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Tue, Sep 6, 2016 at 9:19 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> On Tue, Sep 6, 2016 at 4:55 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>>> I noticed, but here n = state->memtupcount
>>>
>>> +       Assert(memtuples[0].tupindex == newtup->tupindex);
>>> +
>>> +       CHECK_FOR_INTERRUPTS();
>>> +
>>> +       n = state->memtupcount;                 /* n is heap's size,
>>> including old root */
>>> +       imin = 0;                                               /*
>>> start with caller's "hole" in root */
>>> +       i = imin;
>>
>> I'm fine with using "n" in the later assertion you mentioned, if
>> that's clearer to you. memtupcount is broken out as "n" simply because
>> that's less verbose, in a place where that makes things far clearer.
>>
>>> In fact, the assert on the patch would allow writing memtuples outside
>>> the heap, as in calling tuplesort_heap_root_displace if
>>> memtupcount==0, but I don't think that should be legal (memtuples[0]
>>> == memtuples[imin] would be outside the heap).
>>
>> You have to have a valid heap (i.e. there must be at least one
>> element) to call tuplesort_heap_root_displace(), and it doesn't
>> directly compact the heap, so it must remain valid on return. The
>> assertion exists to make sure that everything is okay with a
>> one-element heap, a case which is quite possible.
>
> More than using "n" or "memtupcount" what I'm saying is to assert that
> memtuples[imin] is inside the heap, which would catch the same errors
> the original assert would, and more.
>
> Assert(imin < state->memtupcount)
>
> If you prefer.
>
> The original asserts allows any value of imin for memtupcount>1, and
> that's my main concern. It shouldn't.

So, for the assertions to properly avoid clobbering/reading out of
bounds memory, you need both the above assert:
+                */+               memtuples[i] = memtuples[imin];+               i = imin;+       }+
>+       Assert(imin < state->memtupcount);+       memtuples[imin] = *newtup;+}

And another one at the beginning, asserting:
+       SortTuple  *memtuples = state->memtuples;+       int             n,+                               imin,+
                       i;+
 
>+       Assert(state->memtupcount > 0 && memtuples[0].tupindex == newtup->tupindex);++       CHECK_FOR_INTERRUPTS();

It's worth making that change, IMHO, unless I'm missing something.

pgsql-hackers by date:

From: Alvaro Herrera
Date: 08 September 2016, 15:48:28
Subject: Re: CF app and patch series

From: Peter Eisentraut
Date: 08 September 2016, 15:55:19
Subject: Re: [PATCH v2] Add overflow checks to money type input function

Re: Parallel tuplesort (for parallel B-Tree index creation) - Mailing list pgsql-hackers

Previous

Next