Re: dynamic background workers - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: dynamic background workers |
Date | |
Msg-id | CA+TgmoZ73afXXADphkvD0jgwDj7A8+RsyPxr4KNXLW--qonoxA@mail.gmail.com |
In response to | Re: dynamic background workers (Markus Wanner <markus@bluegap.ch>) |
Responses | Re: dynamic background workers |
List | pgsql-hackers |
On Thu, Jun 20, 2013 at 10:59 AM, Markus Wanner <markus@bluegap.ch> wrote:
> On 06/20/2013 04:41 PM, Robert Haas wrote:
>> The constant factor is also very small. Generally, I would expect
>> num_worker_processes <~ # CPUs
>
> That assumption might hold for parallel querying, yes. In case of
> Postgres-R, it doesn't. In the worst case, i.e. with a 100% write load,
> a cluster of n nodes, each with m backends performing transactions, all
> of them replicated to all other (n-1) nodes, you end up with ((n-1) * m)
> bgworkers. Which is pretty likely to be way above the # CPUs on any
> single node.
>
> I can imagine other extensions or integral features like autonomous
> transactions that might possibly want many more bgworkers as well.

Yeah, maybe. I think in general it's not going to work great to have
zillions of backends floating around, because eventually the OS scheduler
overhead - and the memory overhead - are going to become pain points. And
I'm hopeful that autonomous transactions can be implemented without
needing to start a new backend for each one, because that sounds pretty
expensive. Some users of other database products will expect autonomous
transactions to be cheap; aside from that, cheap is better than expensive.
But we will see. At any rate, I think your basic point is that people
might end up creating a lot more background workers than I'm imagining,
which is certainly a fair point.

>> and scanning a 32, 64, or even 128
>> element array is not a terribly time-consuming operation.
>
> I'd extend that to say scanning an array with a few thousand elements is
> not terribly time-consuming, either. IMO the simplicity is worth it,
> ATM. It's all relative to your definition of ... eh ... "terribly".
>
> .oO( ... premature optimization ... all evil ... )

Yeah, that thing.

>> One thing I think we probably want to explore in the future, for both
>> worker backends and regular backends, is pre-forking. We could avoid
>> some of the latency associated with starting up a new backend or
>> opening a new connection in that way. However, there are quite a few
>> details to be thought through there, so I'm not eager to pursue that
>> just yet. Once we have enough infrastructure to implement meaningful
>> parallelism, we can benchmark it and find out where the bottlenecks
>> are, and which solutions actually help most.
>
> Do you mean pre-forking and connecting to a specific database? Or really
> just the forking?

I've considered both at various times, although in this context I was
mostly thinking about just the forking. Pre-connecting to a specific
database would save an unknown but possibly significant amount of
additional latency. Against that, it's more complex (because we've got to
track which preforked workers are associated with which databases) and
there's some cost to guessing wrong (because then we're keeping workers
around that we can't use, or maybe even having to turn around and kill
them to make slots for the workers we actually need). I suspect we'll want
to pursue the idea at some point but it's not near the top of my list.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
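For concreteness, here is a minimal sketch of the kind of fixed-size slot
array and linear scan being discussed above. The struct layout, constant,
and function names are hypothetical and chosen only for illustration; they
are not the actual PostgreSQL bgworker data structures or registration API.

```c
/*
 * Illustrative sketch only: a fixed-size table of worker slots and the
 * linear scan discussed in the thread.  Layout and names are hypothetical,
 * not the real PostgreSQL bgworker structures.
 */
#include <stdbool.h>
#include <stdio.h>

#define MAX_WORKER_SLOTS 128    /* e.g. 32, 64, or 128 as in the thread */

typedef struct WorkerSlot
{
    bool in_use;        /* slot currently claimed by a worker? */
    char name[64];      /* human-readable worker name */
    int  pid;           /* filled in once the worker is forked */
} WorkerSlot;

static WorkerSlot worker_slots[MAX_WORKER_SLOTS];

/*
 * Find and claim a free slot by scanning the whole array.  This is
 * O(MAX_WORKER_SLOTS), which is cheap for a few dozen or even a few
 * thousand entries.  Returns the slot index, or -1 if the table is full.
 */
static int
claim_worker_slot(const char *name)
{
    for (int i = 0; i < MAX_WORKER_SLOTS; i++)
    {
        if (!worker_slots[i].in_use)
        {
            worker_slots[i].in_use = true;
            snprintf(worker_slots[i].name, sizeof(worker_slots[i].name),
                     "%s", name);
            worker_slots[i].pid = 0;
            return i;
        }
    }
    return -1;
}

int
main(void)
{
    int slot = claim_worker_slot("example worker");

    printf("claimed slot %d\n", slot);
    return 0;
}
```

The point the sketch makes is simply that, for arrays of this size, the
cost of the scan is dominated by everything else involved in starting a
worker, which is why the thread treats the linear search as acceptable
for now.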