Re: Sorting Improvements for 8.4 - Mailing list pgsql-hackers
From | Mark Mielke |
---|---|
Subject | Re: Sorting Improvements for 8.4 |
Date | |
Msg-id | 47698466.4070609@mark.mielke.cc Whole thread Raw |
In response to | Re: Sorting Improvements for 8.4 (Jeff Davis <pgsql@j-davis.com>) |
Responses |
Re: Sorting Improvements for 8.4
Re: Sorting Improvements for 8.4 Re: Sorting Improvements for 8.4 |
List | pgsql-hackers |
Jeff Davis wrote: <blockquote cite="mid:1198088313.28804.387.camel@dogma.ljc.laika.com" type="cite"><pre wrap="">On Wed,2007-12-19 at 12:08 -0500, Mark Mielke wrote: </pre><blockquote type="cite"><pre wrap="">Stupid question #2: Is it wellrecognized that the CPU is the bottleneck in the PostgreSQL sorting mechanism? Or might it be memory bandwidth and I/O? </pre></blockquote><pre wrap="">I think it depends a lot on several factors. It's probably a different bottleneck for integers versus localized text, and depends on the available memory and I/O characteristics. </pre></blockquote> Makes sense.<br /><br /><blockquote cite="mid:1198088313.28804.387.camel@dogma.ljc.laika.com"type="cite"><blockquote type="cite"><pre wrap="">It would seem tome that any sort worth parallelizing (administrative and synchronization overhead), must have data larger than the L2 cache. If larger than the L2 cache, it becomes real memory speed. If real memory speed, wouldn't one CPU without hardware synchronization, be able to fill the memory read/write pipe? </pre></blockquote><pre wrap="">We do an external merge sort, which involvesmerging M runs together. You seem to be implying that we can generate the output run at disk speed, and therefore the CPU speed is unimportant. </pre></blockquote> Correct. Or, alternatively, you could achieve thesame effect using asychronous I/O or read ahead.<br /><blockquote cite="mid:1198088313.28804.387.camel@dogma.ljc.laika.com"type="cite"><pre wrap="">I suspect that the comparison costs areenough that the above statement isn't true in all cases, particularly in the case of localized text.</pre></blockquote> That sounds possible, but I stillfeel myself suspecting that disk reads will be much slower than localized text comparison. Perhaps I am overestimatingthe performance of the comparison function?<br /><br /><blockquote cite="mid:1198088313.28804.387.camel@dogma.ljc.laika.com"type="cite"><pre wrap=""> Also, there is probably a lot of memory copying going on, and that probably destroys a lot of the effectiveness of L2 caching. When L2 caching is ineffective, the CPU spends a lot of time just waiting on memory. In that case, it's better to have P threads of execution all waiting on memory operations in parallel. </pre></blockquote> I didn't consider the high throughput / high latency effect.This could be true if the CPU prefetch isn't effective enough.<br /><br /><blockquote cite="mid:1198088313.28804.387.camel@dogma.ljc.laika.com"type="cite"><pre wrap=""> This would explain why "1p2t" would outperform a "1p1t" in Ron's reference above. These are just my first thoughts, however. There is a lot of existing research out there that we can look into, and also a lot of tests that we can run before jumping into this. I think parallel sorting can be looked into separately from the other sorting improvements. </pre></blockquote> Yep - I started to read up on it. It still sounds like it's a hard-ish problem(to achieve near N times speedup for N CPU cores without degrading performance for existing loads), but that doesn'tmean impossible. :-)<br /><br /> Cheers,<br /> mark<br /><br /><pre class="moz-signature" cols="72">-- Mark Mielke <a class="moz-txt-link-rfc2396E" href="mailto:mark@mielke.cc"><mark@mielke.cc></a> </pre>
pgsql-hackers by date: