Re: Parallel tuplesort (for parallel B-Tree index creation) - Mailing list pgsql-hackers
From | Claudio Freire |
---|---|
Subject | Re: Parallel tuplesort (for parallel B-Tree index creation) |
Date | |
Msg-id | CAGTBQpaxaJ9t95E18WTQdXKPjA7zp4g81+rS-FApZb+jEtvT1w@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel tuplesort (for parallel B-Tree index creation) (Peter Geoghegan <pg@heroku.com>) |
Responses |
Re: Parallel tuplesort (for parallel B-Tree index creation)
|
List | pgsql-hackers |
On Tue, Sep 6, 2016 at 8:28 PM, Peter Geoghegan <pg@heroku.com> wrote: >> In the meanwhile, I'll go and do some perf testing. >> >> Assuming the speedup is realized during testing, LGTM. > > Thanks. I suggest spending at least as much time on unsympathetic > cases (e.g., only 2 or 3 tapes must be merged). At the same time, I > suggest focusing on a type that has relatively expensive comparisons, > such as collated text, to make differences clearer. The tests are still running (the benchmark script I came up with runs for a lot longer than I anticipated, about 2 days), but preliminar results are very promising, I can see a clear and consistent speedup. We'll have to wait for the complete results to see if there's any significant regression, though. I'll post the full results when I have them, but until now it all looks like this: setup: create table lotsofitext(i text, j text, w text, z integer, z2 bigint); insert into lotsofitext select cast(random() * 1000000000.0 as text) || 'blablablawiiiiblabla', cast(random() * 1000000000.0 as text) || 'blablablawjjjblabla', cast(random() * 1000000000.0 as text) || 'blablabl awwwabla', random() * 1000000000.0, random() * 1000000000000.0 from generate_series(1, 10000000); timed: select count(*) FROM (select * from lotsofitext order by i, j, w, z, z2) t; Unpatched Time: 100351.251 ms Patched Time: 75180.787 ms That's like a 25% speedup on random input. As we say over here, rather badly translated, not a turkey's boogers (meaning "nice!") On Tue, Sep 6, 2016 at 9:50 PM, Claudio Freire <klaussfreire@gmail.com> wrote: > On Tue, Sep 6, 2016 at 9:19 PM, Peter Geoghegan <pg@heroku.com> wrote: >> On Tue, Sep 6, 2016 at 4:55 PM, Claudio Freire <klaussfreire@gmail.com> wrote: >>> I noticed, but here n = state->memtupcount >>> >>> + Assert(memtuples[0].tupindex == newtup->tupindex); >>> + >>> + CHECK_FOR_INTERRUPTS(); >>> + >>> + n = state->memtupcount; /* n is heap's size, >>> including old root */ >>> + imin = 0; /* >>> start with caller's "hole" in root */ >>> + i = imin; >> >> I'm fine with using "n" in the later assertion you mentioned, if >> that's clearer to you. memtupcount is broken out as "n" simply because >> that's less verbose, in a place where that makes things far clearer. >> >>> In fact, the assert on the patch would allow writing memtuples outside >>> the heap, as in calling tuplesort_heap_root_displace if >>> memtupcount==0, but I don't think that should be legal (memtuples[0] >>> == memtuples[imin] would be outside the heap). >> >> You have to have a valid heap (i.e. there must be at least one >> element) to call tuplesort_heap_root_displace(), and it doesn't >> directly compact the heap, so it must remain valid on return. The >> assertion exists to make sure that everything is okay with a >> one-element heap, a case which is quite possible. > > More than using "n" or "memtupcount" what I'm saying is to assert that > memtuples[imin] is inside the heap, which would catch the same errors > the original assert would, and more. > > Assert(imin < state->memtupcount) > > If you prefer. > > The original asserts allows any value of imin for memtupcount>1, and > that's my main concern. It shouldn't. So, for the assertions to properly avoid clobbering/reading out of bounds memory, you need both the above assert: + */+ memtuples[i] = memtuples[imin];+ i = imin;+ }+ >+ Assert(imin < state->memtupcount);+ memtuples[imin] = *newtup;+} And another one at the beginning, asserting: + SortTuple *memtuples = state->memtuples;+ int n,+ imin,+ i;+ >+ Assert(state->memtupcount > 0 && memtuples[0].tupindex == newtup->tupindex);++ CHECK_FOR_INTERRUPTS(); It's worth making that change, IMHO, unless I'm missing something.
pgsql-hackers by date: