Re: Using quicksort for every external sort run - Mailing list pgsql-hackers
| From | Jeff Janes |
|---|---|
| Subject | Re: Using quicksort for every external sort run |
| Date | |
| Msg-id | CAMkU=1wufLoS-rL_+tnBzNG_LpMj1rp1aMqnCEwL0qXxmRgmyA@mail.gmail.com |
| In response to | Re: Using quicksort for every external sort run (Peter Geoghegan <pg@heroku.com>) |
| Responses | Re: Using quicksort for every external sort run |
| List | pgsql-hackers |
On Sat, Nov 28, 2015 at 4:05 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Sat, Nov 28, 2015 at 2:04 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
...
>>
>> The final merging is intermixed with whatever other work goes on to
>> build the actual index files out of the sorted data, so I don't know
>> exactly what the timing of just the merge part was. But it was
>> certainly a minority of the time, even if you assume the actual index
>> build were free. For the patched code, the majority of the time goes
>> to the quick sorting stages.
>
> I'm not sure what you mean here.

I had no point to make here; I was just trying to answer one of your
questions about how much time was spent merging. I don't know, because it
is interleaved with, and not separately instrumented from, the index build.

>
> I would generally expect that the merge phase takes significantly less
> than sorting runs, regardless of how we sort runs, unless parallelism
> is involved, where merging could dominate. The master branch has a
> faster merge step, at least proportionally, because it has larger
> runs.
>
>> When I test each version of the code at its own most efficient
>> maintenance_work_mem, I get 3007.2 seconds at 1GB for patched and
>> 3836.46 seconds at 64MB for unpatched.
>
> As I said, it seems a little bit unfair to hand-tune work_mem or
> maintenance_work_mem like that. Who can afford to do that? I think you
> agree that it's untenable to have DBAs allocate work_mem differently
> for cases where an internal sort or external sort is expected;
> workloads are just far too complicated and changeable.

Right, I agree with all that. But I think it is important to know where
the benefits come from. It looks like about half comes from being more
robust to overly-large memory usage, and half from absolute improvements
which you get at each implementation's own best setting.

Also, if someone had previously restricted work_mem (or more likely
maintenance_work_mem) simply to avoid the large-memory penalty, they need
to know to revisit that decision. That said, they still don't get any
actual benefit from using too much memory, just a reduced penalty.

I'm kind of curious why the optimum for the patched code appears at 1GB
and not lower. If I get a chance to rebuild the test, I will look into
that more.

>
>> I'm attaching the trace_sort output from the client log for all 4 of
>> those scenarios. "sort_0005" means all 5 of your patches were
>> applied, "origin" means none of them were.
>
> Thanks for looking at this. This is very helpful. It looks like the
> server you used here had fairly decent disks, and that we tended to be
> CPU bound more often than not. That's a useful testing ground.

It has a Perc H710 RAID controller with 15,000 RPM drives, but it is also
a virtualized system that has other stuff going on. The disks are
definitely better than your average household computer's, but I don't
think they are anything special as far as real database hardware goes. It
is hard to saturate the disks for sequential reads.

It will be interesting to see what parallel builds can do.

What would be next in reviewing the patches? Digging into the C-level
implementation?

Cheers,

Jeff
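A minimal sketch of how trace_sort output like that discussed above can be captured for an index build, assuming a table and index with placeholder names and reusing the 1GB memory budget from the test:

```sql
-- trace_sort is a developer GUC that logs per-phase sort timings
-- (run building, merging) when client_min_messages is lowered to LOG.
SET trace_sort = on;
SET client_min_messages = log;

-- Same memory budget as the patched-code best case reported above.
SET maintenance_work_mem = '1GB';

-- Hypothetical table and index names; the LOG lines emitted during this
-- build are the kind of output attached to the message.
CREATE INDEX test_idx ON test_table (some_column);
```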