Re: Tuplesort merge pre-reading - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Tuplesort merge pre-reading |
Date | |
Msg-id | CAM3SWZTpaORV=yQGVCG8Q4axcZ3MvF-05xe39ZvORdU9JcD6hQ@mail.gmail.com Whole thread Raw |
In response to | Re: Tuplesort merge pre-reading (Heikki Linnakangas <hlinnaka@iki.fi>) |
Responses |
Re: Tuplesort merge pre-reading
|
List | pgsql-hackers |
On Thu, Sep 29, 2016 at 4:10 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote: > Bah, I fumbled the initSlabAllocator() function, attached is a fixed > version. This looks much better. It's definitely getting close. Thanks for being considerate of my more marginal concerns. More feedback: * Should say "fixed number of...": > + * we start merging. Merging only needs to keep a small, fixed number tuples * Minor concern about these new macros: > +#define IS_SLAB_SLOT(state, tuple) \ > + ((char *) tuple >= state->slabMemoryBegin && \ > + (char *) tuple < state->slabMemoryEnd) > + > +/* > + * Return the given tuple to the slab memory free list, or free it > + * if it was palloc'd. > + */ > +#define RELEASE_SLAB_SLOT(state, tuple) \ > + do { \ > + SlabSlot *buf = (SlabSlot *) tuple; \ > + \ > + if (IS_SLAB_SLOT(state, tuple)) \ > + { \ > + buf->nextfree = state->slabFreeHead; \ > + state->slabFreeHead = buf; \ > + } else \ > + pfree(tuple); \ > + } while(0) I suggest duplicating the paranoia seen elsewhere around what "state" macro argument could expand to. You know, by surrounding "state" with parenthesis each time it is used. This is what we see with existing, similar macros. * Should cast to int64 here (for the benefit of win64): > + elog(LOG, "using " INT64_FORMAT " KB of memory for read buffers among %d input tapes", > + (long) (availBlocks * BLCKSZ) / 1024, numInputTapes); * FWIW, I still don't love this bit: > + * numTapes and numInputTapes reflect the actual number of tapes we will > + * use. Note that the output tape's tape number is maxTapes - 1, so the > + * tape numbers of the used tapes are not consecutive, so you cannot > + * just loop from 0 to numTapes to visit all used tapes! > + */ > + if (state->Level == 1) > + { > + numInputTapes = state->currentRun; > + numTapes = numInputTapes + 1; > + FREEMEM(state, (state->maxTapes - numTapes) * TAPE_BUFFER_OVERHEAD); > + } But I can see how the verbosity of almost-duplicating the activeTapes stuff seems unappealing. That said, I think that you should point out in comments that you're calculating the number of maybe-active-in-some-merge tapes. They're maybe-active in that they have some number of real tapes. Not going to insist on that, but something to think about. * Shouldn't this use state->tapeRange?: > + else > + { > + numInputTapes = state->maxTapes - 1; > + numTapes = state->maxTapes; > + } * Doesn't it also set numTapes without it being used? Maybe that variable can be declared within "if (state->Level == 1)" block. * Minor issues with initSlabAllocator(): You call the new function initSlabAllocator() as follows: > + if (state->tuples) > + initSlabAllocator(state, numInputTapes + 1); > + else > + initSlabAllocator(state, 0); Isn't the number of slots (the second argument to initSlabAllocator()) actually just numInputTapes when we're "state->tuples"? And so, shouldn't the "+ 1" bit happen within initSlabAllocator() itself? It can just inspect "state->tuples" itself. In short, maybe push a bit more into initSlabAllocator(). Making the arguments match those passed to initTapeBuffers() a bit later would be nice, perhaps. * This could be simpler, I think: > + /* Release the read buffers on all the other tapes, by rewinding them. */ > + for (tapenum = 0; tapenum < state->maxTapes; tapenum++) > + { > + if (tapenum == state->result_tape) > + continue; > + LogicalTapeRewind(state->tapeset, tapenum, true); > + } Can't you just use state->tapeRange, and remove the "continue"? I recommend referring to "now-exhausted input tapes" here, too. * I'm not completely prepared to give up on using MemoryContextAllocHuge() within logtape.c just yet. Maybe you're right that it couldn't possibly matter that we impose a MaxAllocSize cap within logtape.c (per tape), but I have slight reservations that I need to address. Maybe a better way of putting it would be that I have some reservations about possible regressions at the very high end, with very large workMem. Any thoughts on that? -- Peter Geoghegan
pgsql-hackers by date: