Re: [HACKERS] Cost model for parallel CREATE INDEX - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: [HACKERS] Cost model for parallel CREATE INDEX
Date:
Msg-id: CA+TgmoakYL5wfcpg8bnQViNiqwUbZnSpRYkNPVXT9-fxtvRzJw@mail.gmail.com
In response to: Re: [HACKERS] Cost model for parallel CREATE INDEX (Peter Geoghegan <pg@bowt.ie>)
Responses: Re: [HACKERS] Cost model for parallel CREATE INDEX
List: pgsql-hackers
On Thu, Mar 2, 2017 at 10:38 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> I'm glad. This justifies the lack of much of any "veto" on the
> logarithmic scaling. The only thing that can do that is
> max_parallel_workers_maintenance, the storage parameter
> parallel_workers (maybe this isn't a storage parameter in V9), and
> insufficient maintenance_work_mem per worker (as judged by
> min_parallel_relation_size being greater than workMem per worker).
>
> I guess that the workMem scaling threshold thing could be
> min_parallel_index_scan_size, rather than min_parallel_relation_size
> (which we now call min_parallel_table_scan_size)?

No, it should be based on min_parallel_table_scan_size, because that
is the size of the parallel heap scan that will be done as input to
the sort.

>> I think it's totally counter-intuitive that any hypothetical index
>> storage parameter would affect the degree of parallelism involved in
>> creating the index and also the degree of parallelism involved in
>> scanning it. Whether or not other systems do such crazy things seems
>> to me to be beside the point. I think if CREATE INDEX allows an
>> explicit specification of the degree of parallelism (a decision I
>> would favor), it should have a syntactically separate place for
>> unsaved build options vs. persistent storage parameters.
>
> I can see both sides of it.
>
> On the one hand, it's weird that you might have query performance
> adversely affected by what you thought was a storage parameter that
> only affected the index build. On the other hand, it's useful that
> you retain that as a parameter, because you may want to periodically
> REINDEX, or have a way of ensuring that pg_restore does go on to use
> parallelism, since it generally won't otherwise. (As mentioned
> already, pg_restore does not trust the cost model due to issues with
> the availability of statistics.)

If you make the changes I'm proposing above, this parenthetical issue
goes away, because the only statistic you need is the table size,
which is what it is.

As to the rest, I think a bare REINDEX should just use the cost model
as if it were CREATE INDEX, and if you want to override that behavior,
you can do so with explicit syntax. I see very little utility in a
setting that fixes the number of workers to be used for future
reindexes: there won't be many of them, and it's kinda confusing. But
even if we decide to have that, I see no justification at all for
conflating it with the number of workers to be used for a scan, which
is something else altogether.

> To be clear, I don't have any strong feelings on all this. I just
> think it's worth pointing out that there are reasons to not do what
> you suggest, that you might want to consider if you haven't already.

I have considered them. I also acknowledge that other people may view
the situation differently than I do. I'm just telling you my opinion
on the topic.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
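
For readers following the thread, a rough sketch of the heuristic under
discussion might look like the C function below: logarithmic worker scaling
driven by table size relative to min_parallel_table_scan_size, capped by
max_parallel_workers_maintenance, with a back-off when maintenance_work_mem
per participant would be too small. The function name, parameter names, and
the exact growth factor are illustrative assumptions, not the actual patch.

    #include <math.h>

    /*
     * Sketch only: compute_index_build_workers() and its parameters are
     * invented for illustration; the GUCs named in the comments are the
     * ones discussed in the thread.
     */
    static int
    compute_index_build_workers(double heap_pages,         /* table size, in pages */
                                double min_scan_pages,     /* min_parallel_table_scan_size, in pages */
                                int max_workers,           /* max_parallel_workers_maintenance */
                                long maintenance_work_mem, /* in kilobytes */
                                long min_mem_per_worker)   /* minimum useful workMem per worker, in kilobytes */
    {
        int         nworkers = 0;

        if (min_scan_pages < 1)
            min_scan_pages = 1;

        /*
         * Logarithmic scaling: one more worker each time the heap grows by
         * another factor of three past the parallel-scan threshold.
         */
        while (heap_pages >= min_scan_pages * pow(3.0, nworkers + 1))
            nworkers++;

        /* The main "veto": never exceed the maintenance worker limit. */
        if (nworkers > max_workers)
            nworkers = max_workers;

        /*
         * Back off if splitting maintenance_work_mem across the leader and
         * workers would leave each participant with too little memory.
         */
        while (nworkers > 0 &&
               maintenance_work_mem / (nworkers + 1) < min_mem_per_worker)
            nworkers--;

        return nworkers;
    }

The point of basing the threshold on the table rather than the index size is
visible here: heap_pages describes the parallel heap scan that feeds the sort,
which is the work actually being divided among workers.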