Re: Using quicksort for every external sort run - Mailing list pgsql-hackers
From | Greg Stark |
---|---|
Subject | Re: Using quicksort for every external sort run |
Date | |
Msg-id | CAM-w4HNVaKk3q50=yWEAU-yGiV_mwQGZNBvBj7UhZW6C9z6ccw@mail.gmail.com |
In response to | Re: Using quicksort for every external sort run (Peter Geoghegan <pg@heroku.com>) |
Responses | Re: Using quicksort for every external sort run |
List | pgsql-hackers |
On Thu, Nov 19, 2015 at 6:56 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Yes, I really do mean it when I say that the DBA is not supposed to
> see this message, no matter how much or how little memory or data is
> involved. There is no nuance intended here; it isn't sensible to allow
> a multi-pass sort, just as it isn't sensible to allow checkpoints
> every 5 seconds. Both of those things can be thought of as thrashing.

Hm. So a bit of back-of-envelope calculation. Say we want to buffer at
least 1MB for each run -- I think we currently do more, actually -- and
say that 1GB of work_mem ought to be enough to run reasonably (that's
per sort, after all, and there might be multiple sorts, to say nothing
of other users on the system). That means we can merge about 1,000 runs
in the final merge. Each run will be about 2GB currently, but 1GB if we
quicksort the runs. So the largest table we can sort in a single pass
is 1-2 TB.

If we go above those limits we have the choice of buffering less per
run or doing a whole second pass through the data. I suspect we would
get more horsepower out of buffering less, though I'm not sure where
the break-even point is. Certainly doing random I/O for every I/O is
much more expensive than a factor of 2 over sequential I/O. We could
probably do the math based on random_page_cost and seq_page_cost to
calculate the minimum amount of buffering before it's worth doing an
extra pass.

So I think you're kind of right and kind of wrong. The vast majority of
use cases are either sub-1TB, or are in environments designed
specifically for data warehouse queries where a user can obtain much
more memory for their queries. However, I think it's within the
intended use cases that Postgres should be able to handle a few
terabytes of data on a moderately sized machine in a shared environment
too.

Our current defaults are particularly bad for this, though. If you
initdb a new Postgres database today, create a table of even a few
gigabytes, and try to build an index on it, it takes forever. The last
time I did a test I canceled it after it had run for hours, raised
maintenance_work_mem, and built the index in a few minutes. The problem
is that if we just raise those limits then people will use more
resources when they don't need them. If it were safer to have those
limits be much higher, we could make the defaults reflect what people
want when they do bigger jobs rather than just what they want for
normal queries or indexes.

> I think that if the DBA ever sees the multipass_warning message, he
> or she does not have an OLTP workload.

Hm, that's pretty convincing. I guess this isn't the usual sort of
warning, given the time it would take to trigger.

-- 
greg
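A rough sketch of the back-of-envelope arithmetic above, as a throwaway
C program. This is not PostgreSQL code: the per-run buffer, run sizes,
and cost constants are just the assumptions from the mail (1MB buffered
per run, 1GB of work_mem, quicksorted runs of roughly work_mem versus
roughly twice that with replacement selection, and the stock
seq_page_cost = 1.0, random_page_cost = 4.0), and the break-even model
is deliberately crude.

```c
/*
 * Back-of-envelope sketch only, not PostgreSQL code.  All constants are
 * assumptions taken from the discussion above.
 */
#include <stdio.h>

int main(void)
{
    const double work_mem_mb    = 1024.0; /* 1GB work_mem (assumed)        */
    const double run_buffer_mb  = 1.0;    /* ~1MB buffered per run at merge */
    const double run_gb_quick   = 1.0;    /* quicksorted run ~ work_mem     */
    const double run_gb_replsel = 2.0;    /* replacement selection ~ 2x     */

    /* How many runs fit in one merge pass, and how big a sort that allows. */
    double max_runs = work_mem_mb / run_buffer_mb;   /* ~1000 */
    printf("runs mergeable in one pass: %.0f\n", max_runs);
    printf("largest single-pass sort:   %.0f GB - %.0f GB\n",
           max_runs * run_gb_quick, max_runs * run_gb_replsel);

    /*
     * Crude break-even for shrinking the per-run buffer versus doing a
     * whole extra pass: with a buffer of b pages per run, each refill
     * costs roughly one random seek amortized over b sequentially read
     * pages (random_page_cost / b extra per page), while an extra merge
     * pass costs roughly one sequential read plus one sequential write
     * per page (2 * seq_page_cost).
     */
    const double seq_page_cost    = 1.0;  /* stock setting (assumed) */
    const double random_page_cost = 4.0;  /* stock setting (assumed) */
    double min_buffer_pages = random_page_cost / (2.0 * seq_page_cost);
    printf("buffer (pages/run) below which an extra pass wins: %.1f\n",
           min_buffer_pages);
    return 0;
}
```

Under those assumptions the single-pass ceiling comes out at roughly
1-2 TB, and the crude cost model only starts to favor a second pass
once the per-run buffer shrinks to a couple of pages, which lines up
with the suspicion above that buffering less usually wins.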