Re: Disable parallel query by default - Mailing list pgsql-hackers

From Scott Mead
Subject Re: Disable parallel query by default
Date
Msg-id 947e64fd-3e1b-40f2-acf3-2b77a358512c@app.fastmail.com
Whole thread Raw
In response to Re: Disable parallel query by default  (Laurenz Albe <laurenz.albe@cybertec.at>)
Responses Re: Disable parallel query by default
List pgsql-hackers
On Wed, May 14, 2025, at 4:06 AM, Laurenz Albe wrote:
> On Tue, 2025-05-13 at 17:53 -0400, Scott Mead wrote:
> > On Tue, May 13, 2025, at 5:07 PM, Greg Sabino Mullane wrote:
> > > On Tue, May 13, 2025 at 4:37 PM Scott Mead <scott@meads.us> wrote:
> > > > I'll open by proposing that we prevent the planner from automatically
> > > > selecting parallel plans by default
> > >
> > > That seems a pretty heavy hammer, when we have things like
> > > parallel_setup_cost that should be tweaked first.
> >
> > I agree it's a big hammer and I thought through parallel_setup_cost
> > quite a bit myself.  The problem with parallel_setup_cost is that it
> > doesn't actually represent the overhead of a setting up parallel
> > query for a busy system.  It does define the cost of setup for a
> > *single* parallel session, but it cannot accurately express the
> > cost of CPU and other overhead associated with the second, third,
> > fourth, etc... query that is executed as parallel.  The expense to
> > the operating system is a function of the _rate_ of parallel query
> > executions being issued.  Without new infrastructure, there's no way
> > to define something that will give me a true representation of the
> > cost of issuing a query with parallelism.
>
> There is no way for the optimizer to represent that your system is
> under CPU overload currently.  But I agree with Greg that
> parallel_setup_cost is the setting that should be adjusted.
> If PostgreSQL is more reluctant to even start considering a parallel plan,
> that would be a move in the right direction in a case like this:
>
> > > > What is the fallout?  When a high-volume, low-latency query flips to
> > > > parallel execution on a busy system, we end up in a situation where
> > > > the database is effectively DDOSing itself with a very high rate of
> > > > connection establish and tear-down requests.  Even if the query ends
> > > > up being faster (it generally does not), the CPU requirements for the
> > > > same workload rapidly double or worse, with most of it being spent
> > > > in the OS (context switch, fork(), destroy()).  When looking at the
> > > > database, you'll see a high load average, and high wait for CPU with
> > > > very little actual work being done within the database.
>
> You are painting a bleak picture indeed.  I get to see PostgreSQL databases
> in trouble regularly, but I have not seen anything like what you describe.
> If a rather cheap, very frequent query is suddenly estimated to be
> expensive enough to warrant a parallel plan, I'd suspect that the estimates
> must be seriously off.
>
> With an argument like that, you may as well disable nested loop joins.
> I have seen enough cases where disabling nested loop joins, without any
> deeper analysis, made very slow queries reasonably fast.

My argument is that parallel query should not be allowed to be invoked without user intervention.  Yes, nestedloop can
havea similar impact, but let's take a look at the breakdown at scale of PQ:  

1. pgbench -i -s 100

2. Make a query that will execute in parallel

SELECT aid, a.bid, bbalance
   FROM pgbench_accounts a, pgbench_branches b
WHERE a.bid = b.bid
ORDER BY bbalance desc;

Non Parallel query = 4506.559 ms
Parallel query = 2849.073

Arguably, much better.

3. Use pgbench to execute these with a concurrency of 10, rate limit of 1 tps

pgbench -R 1 -r -T 120 -P 5 --no-vacuum -f pselect.sql -c 10

4. The parallel query was executing ~ 2.8 seconds in isolation, but when running with 10 concurrent sessions, breaks
downto 5.8 seconds the non-parallel version executes on average of 5.5 seconds.  You've completely erased the gains and
onlyhave a concurrency of 5 (that's with max_parallel_workers = 8).  If you increase max_parallel_workers, this quickly
becomesworse. 

Even though parallel query is faster in isolation, even a small amount of concurrency has a quickly compounding effect
thedegrades very quickly (again, defaults with a 16 processor machine). 

Concurrency - Non Parallel Runtime - Parallel Runtime
              1        -       5003.951              -   3681.452
              5        -      4936.565              -   4565.171
             10        -      5573.239              -   5894.397
             15       -       6224.292              -   8470.982
             20      -       5632.948              -   13277.857

Even with max_parallel_workers protecting us with '8' (default), we erase our advantage by the time we go to
concurrencyof 5 clients.   

Going back to the original commit which enabled PQ by default[1], it was done so that the feature would be tested
duringbeta.  I think it's time that we limit the accidental impact this can have to users by disabling the feature by
default. 



[1]-
https://github.com/postgres/postgres/commit/77cd477c4ba885cfa1ba67beaa82e06f2e182b85

"
Enable parallel query by default.
Change max_parallel_degree default from 0 to 2.  It is possible that
this is not a good idea, or that we should go with 1 worker rather
than 2, but we won't find out without trying it.  Along the way,
reword the documentation for max_parallel_degree a little bit to
hopefully make it more clear.

Discussion: 20160420174631.3qjjhpwsvvx5bau5@alap3.anarazel.de
"

>
> Sure enough, I often see systems where I recommend disabling parallel
> query - in fact, whenever throughput is more important than response time.
> But I also see many cases where parallel query works just like it should
> and leads to a better user experience.
>
> I have come to disable JIT by default, but not parallel query.
>
> The primary problem that I encounter with parallel query is that dynamic
> shared memory segments grow to a size where they cause OOM errors.
> That's the most frequent reason for me to recommend disabling parallel query.
>
> Yours,
> Laurenz Albe
>

--
Scott Mead
Amazon Web Services
scott@meads.us



pgsql-hackers by date:

Previous
From: Nico Williams
Date:
Subject: Re: Violation of principle that plan trees are read-only
Next
From: Bennie Swart
Date:
Subject: Re: Join removal and attr_needed cleanup