Re: Parallel tuplesort (for parallel B-Tree index creation) - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Parallel tuplesort (for parallel B-Tree index creation) |
Date | |
Msg-id | CAM3SWZTpeTcdZ1nFPuokKrFcvsoxMzeyw0ts3EppJ6xQysWnmw@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel tuplesort (for parallel B-Tree index creation) (Claudio Freire <klaussfreire@gmail.com>) |
Responses |
Re: Parallel tuplesort (for parallel B-Tree index creation)
|
List | pgsql-hackers |
On Tue, Sep 6, 2016 at 4:55 PM, Claudio Freire <klaussfreire@gmail.com> wrote: > I noticed, but here n = state->memtupcount > > + Assert(memtuples[0].tupindex == newtup->tupindex); > + > + CHECK_FOR_INTERRUPTS(); > + > + n = state->memtupcount; /* n is heap's size, > including old root */ > + imin = 0; /* > start with caller's "hole" in root */ > + i = imin; I'm fine with using "n" in the later assertion you mentioned, if that's clearer to you. memtupcount is broken out as "n" simply because that's less verbose, in a place where that makes things far clearer. > In fact, the assert on the patch would allow writing memtuples outside > the heap, as in calling tuplesort_heap_root_displace if > memtupcount==0, but I don't think that should be legal (memtuples[0] > == memtuples[imin] would be outside the heap). You have to have a valid heap (i.e. there must be at least one element) to call tuplesort_heap_root_displace(), and it doesn't directly compact the heap, so it must remain valid on return. The assertion exists to make sure that everything is okay with a one-element heap, a case which is quite possible. If you want to see a merge involving one input tape, apply the entire parallel CREATE INDEX patch set, set "force_parallal_mode = regress", and note that the leader merge merges only 1 input tape, making the heap only ever contain one element. In general, most use of the heap for k-way merging will eventually end up as a one element heap, at the very end. Maybe that assertion you mention is overkill, but I like to err on the side of overkill with assertions. It doesn't seem that important, though. > Sure, that's a weird enough case (that assert up there already reads > memtuples[0] which would be equally illegal if memtupcount==0), but it > goes on to show that the assert expression just seems odd for its > intent. > > BTW, I know it's not the scope of the patch, but shouldn't > root_displace be usable on the TSS_BOUNDED phase? I don't think it should be, no. With a top-n heap sort, the expectation is that after a little while, we can immediately determine that most tuples do not belong in the heap (this will require more than one comparison per tuple when the tuple that may be entered into the heap will in fact go in the heap, which should be fairly rare after a time). That's why that general strategy can be so much faster, of course. Note that that heap is "reversed" -- the sort order is inverted, so that we can use a minheap. The top of the heap is the most marginal tuple in the top-n heap so far, and so is the next to be removed from consideration entirely (not the next to be returned to caller, when merging). Anyway, I just don't think that this is important enough to change -- it couldn't possibly be worth much of any risk. I can see the appeal of consistency, but I also see the appeal of sticking to how things work there: continually and explicitly inserting into and compacting the heap seems like a good enough way of framing what a top-n heap does, since there are no groupings of tuples (tapes) involved there. -- Peter Geoghegan
pgsql-hackers by date: