Re: Implementing Sorting Refinements - Mailing list pgsql-hackers
From | Decibel! |
---|---|
Subject | Re: Implementing Sorting Refinements |
Date | |
Msg-id | E8E2C8B9-EA19-41A1-B477-F36D0DAC8413@decibel.org Whole thread Raw |
In response to | Implementing Sorting Refinements (Manolo _ <mac_man2005@hotmail.it>) |
Responses |
Re: Implementing Sorting Refinements
|
List | pgsql-hackers |
You'll get better response if you don't hijack threads. :) Also, there's nothing in here that describes what the benefits of this change are. On Jan 1, 2008, at 2:09 PM, Manolo _ wrote: > > Hi to all. > > This mail is aimed at asking some suggestion to face PostgreSQL > (PG) development to implement a refinement to PG source code. I'll > briefly summarize the idea of the "2-Way Replacement > Selection" (2WRS) refinement, just in case. If you already remember > what 2WRS is, you can please jump directly to the IMPLEMENTATION > ISSUES part of this mail. > > > 2WRS. > Refinement of the Replacement Selection (RS) technique currently > used by PG in src/backend/utils/sort/tuplesort.c . > The 2WRS uses two heaps instead of just one in order to create the > current (logical) run. Here there are some fundamental points of > the 2WRS technique: > - 'qsort' the initial unsorted 'memtuples' array > - divide the 'memtuples' array into two parts and each of those > will be managed as a heap > - one of the heaps will arrange its elements in ascending order, > while the other heap in descending order > - each heap will spill its current root in its corresponding run > (i.e.: we have a run per each of those two heaps), so we are > actually creating 2 physical current runs > - those 2 current physical runs could "theoretically" be merged > into the same logical run, actually we can make 'mergesort' think > they do belong to the same physical run. That reduces the number of > comparisons 'mergesort' has to do at each merge step (that means > less seek delay time on mass storage). We can also think the > average length of logical runs produced by 2WRS will probably be > greater than the average length produced by RS (again less seek > delay time: the longer each run the less number of final runs > produced, that means the less number of comparisons at each > 'mergesort' step). > > > IMPLEMENTATION ISSUES. > Where to place those heaps? > 1) I think that both heaps could be arranged on the same > 'memtuples' array. That allows easily subsequent resizing those > heaps according to their effective use or according to some > heuristic, without reallocating memory. > > How to arrange those heaps? > 2a) It's convenient to arrange those heaps "root to root". That is > arranging those heaps with their roots toward the center of > 'memtuples' (in a way we can say they lay "face to face", or "root > to root" as said before) while their leaves lay towards the extreme > indexes of the 'memtuples' array (that is the last leaf of one heap > will lay at index 0, the last leaf of the other heap laying at > index memtupsize-1. This arrangement prevents overlapping elements > between those physical runs associated to the same current logical > run. > PRO: once we qsort memtuples and divide it into 2 parts we already > get those two heaps, no need to build them. > CONTRA: ??? > > 2b) As in 2a) but arranging heaps "leaf to leaf", that is their > roots will lay at the extreme indexes of 'memtuples' while their > leaves towards the center of the 'memtuples' array. > Or even start building heaps as soon as we get initial elements, > instead of qsort the whole 'memtuples' array. > Any PRO/CONTRA compared to 2a)??? > > Current run numbers > I think I should duplicate the 'int currentRun' variable in the > Tuplesortstate struct. I'll replace it with a 'int currentRunUP' > and 'int currentRunDOWN' variables in order to distinguish those > two physical runs associated to those 2 heaps. In this case I will > give a run number (max{currentRunUP,currentRunDOWN} + 1) to > elements not belonging to the current logical run. I suppose no > need to touch 'long availMem' nor 'long allowedMem' variables nor > any others. > > Heap functions > I will duplicate all the heap management functions in order to > adapt them to the kind of heap they should be applied to (for > example, the tuplesort_heap_siftup function should be replaced with > tuplesort_heap_siftupUP and tuplesort_heap_siftupDOWN functions). > > Merge Plan > This technique would use a sort of "merge plan" to instruct > mergesort on how to use those physical runs. Actually mergesort > should consider at first "odd runs" before "pair runs". That is, > for example, mergesort should start merging runs with run number > 1,3,5,7,... and when run number X terminates start considering run > number X+1. Obviously that doesn't need any merge plan, but I > remember someone else (as Simon Riggs) was interested in sorting > improvements so it could be a good thing to know if I should > consider any conventions or paramethers in order to possibly create > that merge plan. > > > DEVELOPMENT CONTEXT > I preferred to use the "last stable release" at the moment, that is > 8.2. > > > Any comment/suggestion/advice ? > > Thanks for your attention and for your time. > Regards, Manolo. > _________________________________________________________________ > Express yourself instantly with MSN Messenger! Download today it's > FREE! > http://messenger.msn.click-url.com/go/onm00200471ave/direct/01/ > ---------------------------(end of > broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org > -- Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828
pgsql-hackers by date: