Re: polyphase merge? - Mailing list pgsql-hackers
From | Don Marvick |
---|---|
Subject | Re: polyphase merge? |
Date | |
Msg-id | d18e24870902032242v4d868296o39774f928ad76ff0@mail.gmail.com Whole thread Raw |
In response to | polyphase merge? (Don Marvick <donmarvick@gmail.com>) |
Responses |
Re: polyphase merge?
Re: polyphase merge? |
List | pgsql-hackers |
Dear All,<br /><br />Since nobody replied, I would give it a try. I am going to implement the merge pattern described inKnuth Page 365 (5.4.9), essentially it is as follow:<br />- create initial runs using replacement selection (basicallythis is as in the current implementation)<br /> - add enough dummy runs of size 0 so that the number of sortedrun minus one can be divided by k-1 (k is merge fan in)<br />- repetitively merge k smallest runs until only 1 runleft<br /><br />I am new to postgresql, hence any advice would be appreciated.<br /><br />Can anybody give me some adviceon how it can done? <br />1. how a run should be represented? should I reuse the tape mechanism? e.g. 1 tape 1 run<br/>- or should I use a temp buffer?<br /><br />2. How memory should be allocated? I assume I divide the memory equallyto k runs, hence each run will get M/k memory. Each read of a run would be of size M/k bytes. <br /><br />3. Prefetching.Then, we can precompute the read sequence of blocks of run during the entire merge process. Based on this, weknow the blocks of run to be prefetched at a point of time.<br /><br />3. Is it possible to perform read/write I/O whilethe merge is being performed? Hence we overlap I/O and computation.<br /><br />4. any other issue needs consideration?<br/><br />Best regards,<br /><br />Don<br /><br /><div class="gmail_quote">On Thu, Jan 29, 2009 at 4:11 PM,Don Marvick <span dir="ltr"><<a href="mailto:donmarvick@gmail.com">donmarvick@gmail.com</a>></span> wrote:<br /><blockquoteclass="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left:1ex;">Dear All,<br /><br />I apologize if this has been discussed before. I have tried to search to the mailinglist and could not find an exact answer.<br /><br />Currently, Postgres uses Knuth's Algorithm 5.4.2D to perform externalmerge sort. IMHO the algorithm is designed for tapes. <br /><br />Why don't the simple text book merge pattern describedin Knuth Page 365 (5.4.9) is used for disk based system? The same merge pattern is also described in Ramakrishnantext book and it is optimal if seek time is not counted, which of course not the real-world case.<br /><br />Bestregards,<br /><font color="#888888"><br />Don<br /><br /></font></blockquote></div><br />
pgsql-hackers by date: