Re: dynamic background workers - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: dynamic background workers |
Date | |
Msg-id | CA+TgmoZ73afXXADphkvD0jgwDj7A8+RsyPxr4KNXLW--qonoxA@mail.gmail.com |
In response to | Re: dynamic background workers (Markus Wanner <markus@bluegap.ch>) |
Responses | Re: dynamic background workers |
List | pgsql-hackers |
On Thu, Jun 20, 2013 at 10:59 AM, Markus Wanner <markus@bluegap.ch> wrote:
> On 06/20/2013 04:41 PM, Robert Haas wrote:
>> The constant factor is also very small. Generally, I would expect
>> num_worker_processes <~ # CPUs
>
> That assumption might hold for parallel querying, yes. In case of
> Postgres-R, it doesn't. In the worst case, i.e. with a 100% write load,
> a cluster of n nodes, each with m backends performing transactions, all
> of them replicated to all other (n-1) nodes, you end up with ((n-1) * m)
> bgworkers. Which is pretty likely to be way above the # CPUs on any
> single node.
>
> I can imagine other extensions or integral features like autonomous
> transactions that might possibly want many more bgworkers as well.

Yeah, maybe. I think in general it's not going to work great to have
zillions of backends floating around, because eventually the OS scheduler
overhead - and the memory overhead - are going to become pain points. And
I'm hopeful that autonomous transactions can be implemented without
needing to start a new backend for each one, because that sounds pretty
expensive. Some users of other database products will expect autonomous
transactions to be cheap; aside from that, cheap is better than expensive.
But we will see. At any rate, I think your basic point is that people
might end up creating a lot more background workers than I'm imagining,
which is certainly a fair point.

>> and scanning a 32, 64, or even 128
>> element array is not a terribly time-consuming operation.
>
> I'd extend that to say scanning an array with a few thousand elements is
> not terribly time-consuming, either. IMO the simplicity is worth it,
> ATM. It's all relative to your definition of ... eh ... "terribly".
>
> .oO( ... premature optimization ... all evil ... )

Yeah, that thing.

>> One thing I think we probably want to explore in the future, for both
>> worker backends and regular backends, is pre-forking. We could avoid
>> some of the latency associated with starting up a new backend or
>> opening a new connection in that way. However, there are quite a few
>> details to be thought through there, so I'm not eager to pursue that
>> just yet. Once we have enough infrastructure to implement meaningful
>> parallelism, we can benchmark it and find out where the bottlenecks
>> are, and which solutions actually help most.
>
> Do you mean pre-forking and connecting to a specific database? Or really
> just the forking?

I've considered both at various times, although in this context I was
mostly thinking about just the forking. Pre-connecting to a specific
database would save an unknown but possibly significant amount of
additional latency. Against that, it's more complex (because we've got to
track which preforked workers are associated with which databases) and
there's some cost to guessing wrong (because then we're keeping workers
around that we can't use, or maybe even having to turn around and kill
them to make slots for the workers we actually need). I suspect we'll want
to pursue the idea at some point but it's not near the top of my list.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
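For concreteness, here is a minimal sketch of the kind of fixed-size slot
array and linear scan being discussed above. The struct layout, constant,
and function names are hypothetical and chosen only for illustration; they
are not the actual PostgreSQL bgworker data structures or registration API.

```c
/*
 * Illustrative sketch only: a fixed-size table of worker slots and the
 * linear scan discussed in the thread.  Layout and names are hypothetical,
 * not the real PostgreSQL bgworker structures.
 */
#include <stdbool.h>
#include <stdio.h>

#define MAX_WORKER_SLOTS 128    /* e.g. 32, 64, or 128 as in the thread */

typedef struct WorkerSlot
{
    bool in_use;        /* slot currently claimed by a worker? */
    char name[64];      /* human-readable worker name */
    int  pid;           /* filled in once the worker is forked */
} WorkerSlot;

static WorkerSlot worker_slots[MAX_WORKER_SLOTS];

/*
 * Find and claim a free slot by scanning the whole array.  This is
 * O(MAX_WORKER_SLOTS), which is cheap for a few dozen or even a few
 * thousand entries.  Returns the slot index, or -1 if the table is full.
 */
static int
claim_worker_slot(const char *name)
{
    for (int i = 0; i < MAX_WORKER_SLOTS; i++)
    {
        if (!worker_slots[i].in_use)
        {
            worker_slots[i].in_use = true;
            snprintf(worker_slots[i].name, sizeof(worker_slots[i].name),
                     "%s", name);
            worker_slots[i].pid = 0;
            return i;
        }
    }
    return -1;
}

int
main(void)
{
    int slot = claim_worker_slot("example worker");

    printf("claimed slot %d\n", slot);
    return 0;
}
```

The point the sketch makes is simply that, for arrays of this size, the
cost of the scan is dominated by everything else involved in starting a
worker, which is why the thread treats the linear search as acceptable
for now.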