Re: Using quicksort for every external sort run - Mailing list pgsql-hackers
From | Greg Stark |
---|---|
Subject | Re: Using quicksort for every external sort run |
Date | |
Msg-id | CAM-w4HNVaKk3q50=yWEAU-yGiV_mwQGZNBvBj7UhZW6C9z6ccw@mail.gmail.com |
In response to | Re: Using quicksort for every external sort run (Peter Geoghegan <pg@heroku.com>) |
Responses | Re: Using quicksort for every external sort run |
List | pgsql-hackers |
On Thu, Nov 19, 2015 at 6:56 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Yes, I really do mean it when I say that the DBA is not supposed to
> see this message, no matter how much or how little memory or data is
> involved. There is no nuance intended here; it isn't sensible to allow
> a multi-pass sort, just as it isn't sensible to allow checkpoints
> every 5 seconds. Both of those things can be thought of as thrashing.

Hm. So a bit of back-of-envelope calculation. Say we want to buffer at
least 1MB for each run -- I think we currently do more, actually -- and
say that 1GB of work_mem ought to be enough to run reasonably (that's
per sort, after all, and there might be multiple sorts, to say nothing
of other users on the system). That means we can merge about 1,000 runs
in the final merge. Each run will be about 2GB currently, but 1GB if we
quicksort the runs. So the largest table we can sort in a single pass
is 1-2 TB.

If we go above those limits we have the choice of buffering less per
run or doing a whole second pass through the data. I suspect we would
get more horsepower out of buffering less, though I'm not sure where
the break-even point is. Certainly doing random I/O for every I/O is
much more expensive than a factor of 2 over sequential I/O. We could
probably do the math based on random_page_cost and seq_page_cost to
calculate the minimum amount of buffering before it's worth doing an
extra pass.

So I think you're kind of right and kind of wrong. The vast majority of
use cases are either sub-1TB, or are in environments designed
specifically for data warehouse queries where a user can obtain much
more memory for their queries. However, I think it's within the
intended use cases that Postgres should be able to handle a few
terabytes of data on a moderately sized machine in a shared environment
too.

Our current defaults are particularly bad for this, though. If you
initdb a new Postgres database today, create a table of even a few
gigabytes, and try to build an index on it, it takes forever. The last
time I did a test I canceled it after it had run for hours, raised
maintenance_work_mem, and built the index in a few minutes. The problem
is that if we just raise those limits then people will use more
resources when they don't need them. If it were safer to have those
limits be much higher, we could make the defaults reflect what people
want when they do bigger jobs rather than just what they want for
normal queries or indexes.

> I think that if the DBA ever sees the multipass_warning message, he
> or she does not have an OLTP workload.

Hm, that's pretty convincing. I guess this isn't the usual sort of
warning, given the time it would take to trigger.

-- 
greg
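A rough sketch of the back-of-envelope arithmetic above, as a throwaway
C program. This is not PostgreSQL code: the per-run buffer, run sizes,
and cost constants are just the assumptions from the mail (1MB buffered
per run, 1GB of work_mem, quicksorted runs of roughly work_mem versus
roughly twice that with replacement selection, and the stock
seq_page_cost = 1.0, random_page_cost = 4.0), and the break-even model
is deliberately crude.

```c
/*
 * Back-of-envelope sketch only, not PostgreSQL code.  All constants are
 * assumptions taken from the discussion above.
 */
#include <stdio.h>

int main(void)
{
    const double work_mem_mb    = 1024.0; /* 1GB work_mem (assumed)        */
    const double run_buffer_mb  = 1.0;    /* ~1MB buffered per run at merge */
    const double run_gb_quick   = 1.0;    /* quicksorted run ~ work_mem     */
    const double run_gb_replsel = 2.0;    /* replacement selection ~ 2x     */

    /* How many runs fit in one merge pass, and how big a sort that allows. */
    double max_runs = work_mem_mb / run_buffer_mb;   /* ~1000 */
    printf("runs mergeable in one pass: %.0f\n", max_runs);
    printf("largest single-pass sort:   %.0f GB - %.0f GB\n",
           max_runs * run_gb_quick, max_runs * run_gb_replsel);

    /*
     * Crude break-even for shrinking the per-run buffer versus doing a
     * whole extra pass: with a buffer of b pages per run, each refill
     * costs roughly one random seek amortized over b sequentially read
     * pages (random_page_cost / b extra per page), while an extra merge
     * pass costs roughly one sequential read plus one sequential write
     * per page (2 * seq_page_cost).
     */
    const double seq_page_cost    = 1.0;  /* stock setting (assumed) */
    const double random_page_cost = 4.0;  /* stock setting (assumed) */
    double min_buffer_pages = random_page_cost / (2.0 * seq_page_cost);
    printf("buffer (pages/run) below which an extra pass wins: %.1f\n",
           min_buffer_pages);
    return 0;
}
```

Under those assumptions the single-pass ceiling comes out at roughly
1-2 TB, and the crude cost model only starts to favor a second pass
once the per-run buffer shrinks to a couple of pages, which lines up
with the suspicion above that buffering less usually wins.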