Re: Disable parallel query by default - Mailing list pgsql-hackers
From | Scott Mead |
---|---|
Subject | Re: Disable parallel query by default |
Date | |
Msg-id | d30e2e13-0105-4afb-9e05-e4755bd92932@app.fastmail.com |
In response to | Re: Disable parallel query by default (Laurenz Albe <laurenz.albe@cybertec.at>) |
List | pgsql-hackers |
On Wed, May 21, 2025, at 3:50 AM, Laurenz Albe wrote:
> On Tue, 2025-05-20 at 16:58 -0400, Scott Mead wrote:
> > On Wed, May 14, 2025, at 4:06 AM, Laurenz Albe wrote:
> > > On Tue, 2025-05-13 at 17:53 -0400, Scott Mead wrote:
> > > > On Tue, May 13, 2025, at 5:07 PM, Greg Sabino Mullane wrote:
> > > > > On Tue, May 13, 2025 at 4:37 PM Scott Mead <scott@meads.us> wrote:
> > > > > > I'll open by proposing that we prevent the planner from automatically
> > > > > > selecting parallel plans by default
> > >
> > > > > > What is the fallout? When a high-volume, low-latency query flips to
> > > > > > parallel execution on a busy system, we end up in a situation where
> > > > > > the database is effectively DDOSing itself with a very high rate of
> > > > > > connection establish and tear-down requests.
> > >
> > > You are painting a bleak picture indeed. I get to see PostgreSQL databases
> > > in trouble regularly, but I have not seen anything like what you describe.
> > >
> > > With an argument like that, you may as well disable nested loop joins.
> > > I have seen enough cases where disabling nested loop joins, without any
> > > deeper analysis, made very slow queries reasonably fast.
> >
> > My argument is that parallel query should not be allowed to be invoked without
> > user intervention. Yes, nestedloop can have a similar impact, but let's take
> > a look at the breakdown at scale of PQ:
> >
> > [pgbench run that shows that parallel query is bad for throughput]
>
> I think that your experiment is somewhat misleading. Sure, if you
> overload the machine with parallel workers, that will eventually also
> harm the query response time. But many databases out there are not
> overloaded, and the shorter response time that parallel query offers
> makes many users happy.

It's not intended to be misleading, sorry for that.
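To put rough numbers on the process churn described above (and on the concurrency-of-5 case discussed next), here is a minimal back-of-the-envelope sketch. It assumes the stock defaults (max_parallel_workers_per_gather = 2, max_parallel_workers = 8), exactly one Gather node per query, all worker slots free for parallel query, and that each parallel execution starts its own worker processes, which exit when the query finishes. The helper names and the 500 queries/second input are hypothetical illustrations, not figures from the benchmark in this thread.

```python
"""Back-of-the-envelope numbers for parallel-worker churn and slot demand.

Hypothetical assumptions, for illustration only:
  * every execution of a parallel plan launches fresh worker processes
    that exit when the query finishes (no pool of workers is reused);
  * each query has exactly one Gather node;
  * all max_parallel_workers slots are available to parallel query.
"""

# Stock PostgreSQL defaults for the settings discussed in the thread.
MAX_PARALLEL_WORKERS_PER_GATHER = 2
MAX_PARALLEL_WORKERS = 8


def worker_launches_per_second(parallel_qps: float,
                               workers_per_query: int = MAX_PARALLEL_WORKERS_PER_GATHER) -> float:
    """Worker process start/exit cycles per second caused by parallel plans."""
    return parallel_qps * workers_per_query


def worker_slot_demand(concurrency: int,
                       workers_per_query: int = MAX_PARALLEL_WORKERS_PER_GATHER,
                       pool_size: int = MAX_PARALLEL_WORKERS) -> tuple[int, int, int]:
    """Return (requested, granted, shortfall) worker counts at a given concurrency."""
    requested = concurrency * workers_per_query
    granted = min(requested, pool_size)
    return requested, granted, requested - granted


if __name__ == "__main__":
    # Hypothetical: a query running 500 times per second flips to a 2-worker plan.
    print(worker_launches_per_second(500))
    # Hypothetical: five concurrent sessions all running that parallel plan.
    print(worker_slot_demand(5))
```

Under these assumptions, five concurrent sessions request 10 workers against a pool of 8. Workers that don't get a slot are simply not launched and the Gather runs with fewer workers (or the leader alone); EXPLAIN ANALYZE reports this as "Workers Planned" versus "Workers Launched".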
I agree that PQ can have a positive effect; the point is that our current defaults will very quickly take a basic workload on a modest box (16 CPUs) and swamp it at a concurrency of 5, which is counter-intuitive, hard to debug, and usually not desired (again, in the case of a plan that silently invokes parallelism).

FWIW, setting max_parallel_workers_per_gather to 0 by default only disables automatic PQ selection behind a SIGHUP (or a user-context setting), so users can easily re-enable it if they want, without having to restart (similar to parallel_setup_cost, but without the uncertainty).

During my testing, I actually found (again, at concurrency = 5) that the defaults of 8 for max_parallel_workers and max_worker_processes are not high enough. If the default max_parallel_workers_per_gather were 0, we'd be able to crank those defaults up (especially max_worker_processes, which requires a restart).

>
> It is well known that what is beneficial for response time is detrimental
> for the overall throughput and vice versa.

It is well-known. What's not well known is that the postgres defaults will quickly swamp a machine with parallelism. That's a lesson that many only learn after it's happened to them. ISTM that the better path is to let someone try to optimize with parallelism rather than have to fight with it during an emergent event. IOW: I'd rather know that I'm walking into a marsh with rattlesnakes than find out after I've been bitten.

> Now parallel query clearly is a feature that is good for response time
> and bad for throughput, but that is not necessarily wrong.

Agreed, I do like and use parallel query. I just don't think it's wise to allow the planner to make that decision on a user's behalf when the overhead is this high and the concurrency behavior falls apart so spectacularly fast.

>
> Essentially, you are arguing that the default configuration should favor
> throughput over response time.
That's one take on it; I'm actually saying that the default configuration should protect medium-sized systems from unintended behavior that quickly degrades performance while being very hard to identify and quantify.

>
> > Going back to the original commit which enabled PQ by default[1], it was
> > done so that the feature would be tested during beta. I think it's time
> > that we limit the accidental impact this can have to users by disabling
> > the feature by default.
>
> I disagree.
> My experience is that parallel query often improves the user experience.
> Sure, there are cases where I recommend disabling it, but I think that
> disabling it by default would be a move in the wrong direction.
>
> On the other hand, I have also seen cases where bad estimates trigger
> parallel query by mistake, making queries slower. So I'd support an
> effort to increase the default value for "parallel_setup_cost".

I'm open to discussing a value for parallel_setup_cost that protects users from runaway parallelism here; I just haven't been able to find a value that protects users while still allowing those who want automatic parallel-plan selection to take advantage of it.

What I've found (and it sounds somewhat similar to what you are saying) is that if you use parallelism intentionally and design for it (hardware, concurrency model, etc.), it's very, very powerful. In cases where it 'just kicks in', I haven't seen an example that makes users happy.

>
> Yours,
> Laurenz Albe

--
Scott Mead
Amazon Web Services
scott@meads.us
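The parallel_setup_cost trade-off discussed above can be illustrated with a deliberately simplified cost model, loosely based on how the planner charges parallel plans: a parallel divisor that includes a shrinking leader contribution, plus parallel_setup_cost and parallel_tuple_cost added at the Gather node. This is a sketch under stated assumptions, not the planner: it pretends the entire serial cost divides across processes, ignores min_parallel_table_scan_size and similar gates, and the function names and example costs are hypothetical. What it shows is that raising parallel_setup_cost only moves the break-even point; a plan whose estimated cost lands well above that point, for example because of a bad row estimate, still goes parallel.

```python
"""Simplified model: when does a parallel plan look cheaper than a serial one?

Hypothetical sketch, not PostgreSQL's planner. It assumes the whole serial
cost is divisible across processes and ignores min_parallel_table_scan_size
and related settings.
"""

PARALLEL_TUPLE_COST = 0.1  # stock default for parallel_tuple_cost


def parallel_divisor(workers: int) -> float:
    """Workers plus the leader's share, which shrinks as workers are added."""
    divisor = float(workers)
    leader_contribution = 1.0 - 0.3 * workers
    if leader_contribution > 0:
        divisor += leader_contribution
    return divisor


def parallel_cost(serial_cost: float, rows_out: float, workers: int,
                  setup_cost: float) -> float:
    """Divided scan cost plus Gather overheads (setup + per-tuple transfer)."""
    return (serial_cost / parallel_divisor(workers)
            + setup_cost + PARALLEL_TUPLE_COST * rows_out)


def break_even_serial_cost(rows_out: float, workers: int,
                           setup_cost: float) -> float:
    """Serial cost above which the modelled parallel plan becomes cheaper."""
    saved_fraction = 1.0 - 1.0 / parallel_divisor(workers)
    return (setup_cost + PARALLEL_TUPLE_COST * rows_out) / saved_fraction


if __name__ == "__main__":
    # Stock default (1000) versus a hypothetical 10x higher setting.
    for setup in (1000.0, 10000.0):
        threshold = break_even_serial_cost(rows_out=100, workers=2, setup_cost=setup)
        print(f"parallel_setup_cost={setup:.0f}: parallel wins above ~{threshold:.0f}")
```

In this model the threshold scales linearly with parallel_setup_cost, so any fixed value trades false positives (misestimated queries that flip to parallel) against blocking the intentional parallel plans described above.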