Re: Using quicksort for every external sort run - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: Using quicksort for every external sort run
Date:
Msg-id: CA+Tgmob9ssG0XB4-GzPARCe0C8gXzHBCrAzNyout1oWj+9Nc9Q@mail.gmail.com
In response to: Re: Using quicksort for every external sort run (Peter Geoghegan <pg@heroku.com>)
Responses: Re: Using quicksort for every external sort run
List: pgsql-hackers
On Thu, Nov 19, 2015 at 3:43 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> I'd be interested in seeing this analysis in some detail.
>
> Sure. Jeff mentioned 8MB as a work_mem setting, so let's examine a
> case where that's the work_mem setting, and see experimentally where
> the crossover point for a multi-pass sort ends up.
>
> If this table is created:
>
> postgres=# create unlogged table bar as select (random() * 1e9)::int4
> idx, 'payload xyz'::text payload from generate_series(1, 10100000) i;
> SELECT 10100000
>
> Then, on my system, a work_mem setting of 8MB *just about* avoids
> seeing the multipass_warning message with this query:
>
> postgres=# select count(distinct idx) from bar ;
>    count
> ------------
>    10047433
> (1 row)
>
> A work_mem setting of 235MB is just enough to make the query's sort
> fully internal.
>
> Let's see how things change with a higher work_mem setting of 16MB. I
> mentioned quadratic growth: having doubled work_mem, let's *quadruple*
> the number of tuples, to see where this leaves a 16MB setting WRT a
> multi-pass merge:
>
> postgres=# drop table bar ;
> DROP TABLE
> postgres=# create unlogged table bar as select (random() * 1e9)::int4
> idx, 'payload xyz'::text payload from generate_series(1, 10100000 * 4)
> i;
> SELECT 40400000
>
> Further experiments show that this is the exact point at which the
> 16MB work_mem setting similarly narrowly avoids a multi-pass warning.
> This should be the dominant consideration, because now a fully
> internal sort requires 4X the work_mem of my original example
> table/query.
>
> The quadratic growth in a simple hybrid sort-merge strategy's ability
> to avoid a multi-pass merge phase (growth relative to linear increases
> in work_mem) can be demonstrated with simple experiments.

OK, so reversing this analysis: with the default work_mem of 4MB, we'd
need a multi-pass merge for anything more than 235MB/4 ≈ 58MB of data
(halving work_mem halves the size of each run *and* halves the number
of runs a single merge pass can combine, so single-pass capacity drops
by a factor of four). That is very, very far from being a can't-happen
scenario, and I would not at all think it acceptable to ignore such a
case. Even ignoring the possibility that someone with work_mem = 8MB
will try to sort 235MB of data strikes me as out of the question.
Those seem like entirely reasonable things for users to do. Greg's
example of someone with work_mem = 1GB trying to sort 4TB does not
seem like a crazy thing to me. Yeah, in all of those cases you might
think that users should set work_mem higher, but that doesn't mean
that they actually do. Most systems have to set work_mem very
conservatively to make sure they don't start swapping under heavy
load.

I think you need to revisit your assumptions here.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
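The arithmetic running through this exchange can be sanity-checked
directly in psql. The sketch below is not from the thread itself: it
assumes the standard model in which the amount of data one merge pass
can handle grows with the square of work_mem, calibrated on the single
measured data point above (8MB of work_mem just barely avoiding a
multi-pass merge on a sort that needs ~235MB to run fully in memory):

    -- Assumed quadratic model: single-pass capacity ~ work_mem^2.
    -- The constant 235 is the one 8MB/235MB crossover measured above,
    -- not a universal figure.
    select work_mem_mb,
           round(235 * (work_mem_mb / 8.0) ^ 2) as approx_single_pass_mb
    from (values (4), (8), (16), (1024)) as t(work_mem_mb);

Under that model a single merge pass tops out around 59MB at the
default 4MB (Robert's 235MB/4 figure), 940MB at 16MB (matching the 4X
tuple count Peter tested), and roughly 3.7TB at 1GB of work_mem, which
would put Greg's 1GB/4TB example just past the single-pass limit.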