Thread: Database disconnection and switch inside a single bgworker
Hi all,

Currently, bgworkers offer the possibility to connect to a given database using BackgroundWorkerInitializeConnection in bgworker.h, but there is actually no way to disconnect from a given database inside the same bgworker process.

One of the use cases for that would be to have the same bgworker first perform some analysis on a database A, for example an analysis of statistics using a common database like postgres, and then perform some actions on another database, let's say an ANALYSE on a given relation that exists only on database B. Using the infrastructure of 9.4 as of now, it would of course be possible to have a bgworker process launch other child processes dynamically on different databases, but users (including me) might not want to do that all the time.

Database disconnection would also be pretty cool for things like parallel query processing using a pool of bgworker processes that all the backends could use in parallel as a single resource.

Note that I haven't looked at the code yet to see how this could be done (it looks tricky though), whether any new infrastructure is needed, or whether it could be done without modifying the core code of Postgres. So, opinions about that as well as additional thoughts are welcome!

Regards,
--
Michael

Michael Paquier <michael.paquier@gmail.com> writes:
> Currently, bgworkers offer the possibility to connect to a given
> database using BackgroundWorkerInitializeConnection in bgworker.h, but
> there is actually no way to disconnect from a given database inside
> the same bgworker process.

That's isomorphic to having a backend switch to a different database, which occasionally gets requested, but there is no real likelihood that we'll ever implement. The problem is, how can you be sure you have flushed all the database-specific state that's been built up? The relcache and catcaches are only the tip of the iceberg; we've got caches all over the place. And once you had flushed that data, you'd have to recreate it --- but the code for doing so is intimately intertwined with connection startup tasks that you'd most likely not want to repeat.

And, once you'd done all that work, what would you have? A database switch methodology that would save a fork(), but not much else. The time to warm up the caches wouldn't be any better than in a fresh process.

The cost/benefit ratio for making this work just doesn't look very promising. That's why autovacuum is built the way it is.

regards, tom lane

On Fri, Nov 15, 2013 at 10:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> Currently, bgworkers offer the possibility to connect to a given
>> database using BackgroundWorkerInitializeConnection in bgworker.h, but
>> there is actually no way to disconnect from a given database inside
>> the same bgworker process.
>
> That's isomorphic to having a backend switch to a different database,
> which occasionally gets requested, but there is no real likelihood
> that we'll ever implement. The problem is, how can you be sure you
> have flushed all the database-specific state that's been built up?
> The relcache and catcaches are only the tip of the iceberg; we've
> got caches all over the place. And once you had flushed that data,
> you'd have to recreate it --- but the code for doing so is intimately
> intertwined with connection startup tasks that you'd most likely not
> want to repeat.
>
> And, once you'd done all that work, what would you have? A database
> switch methodology that would save a fork(), but not much else.
> The time to warm up the caches wouldn't be any better than in a
> fresh process.

Well, you'd have whatever backend-local state you had accumulated apart from stuff in the caches. It's clearly not useless, especially for background workers. And you might actually save a little bit, because I think we established previously that a good fraction of the startup cost was actually page faults, which would not need to be re-incurred. But that having been said...

> The cost/benefit ratio for making this work just doesn't look very
> promising. That's why autovacuum is built the way it is.

...yeah.

From a performance point of view, what's a bit frustrating is that we have to reload a bunch of information that probably isn't different, like pg_am entries, and the pg_class and pg_attribute entries for pg_class and pg_attribute themselves. If we could figure out some way to avoid that, the potential performance win here would be bigger. But it's not obvious to me that it's a good place to spend development time; even if it worked perfectly, the overhead of forking new backends just doesn't seem like our biggest problem right now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
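
For readers who want to see the shape of the process-per-database approach discussed in this thread (one fresh process per database instead of switching in place), here is a rough toy model in Python. It is not PostgreSQL code and does not use the bgworker API. The names `toy_connect`, `connected_db`, and `run_in_fresh_process` are hypothetical and exist only for this sketch; `toy_connect` merely mimics the restriction that a process binds to one database and cannot switch. It assumes a POSIX system where `os.fork` is available.

```python
import os

# Toy stand-in for the database-specific state a backend builds up
# (relcache, catcaches, and so on). Here it is just the database name.
_state = {"db": None}


def connected_db():
    """Return the database this process is bound to, or None."""
    return _state["db"]


def toy_connect(dbname):
    """Bind this process to one database; a second call is refused.

    Hypothetical helper: it only models "connect once, never switch".
    """
    if _state["db"] is not None:
        raise RuntimeError("already connected to " + _state["db"])
    _state["db"] = dbname


def run_in_fresh_process(dbname, task):
    """Fork a child, connect it to dbname, run task(dbname) there.

    Returns the child's exit code: 0 on success, 1 if connecting or the
    task raised, -1 if the child did not exit normally.
    """
    pid = os.fork()
    if pid == 0:
        # Child: all database-specific state lives and dies here.
        code = 1
        try:
            toy_connect(dbname)
            task(dbname)
            code = 0
        finally:
            # Leave immediately so the child never runs the parent's code.
            os._exit(code)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
```

A coordinating process could call `run_in_fresh_process("postgres", gather_stats)` and then `run_in_fresh_process("db_b", analyse_relation)`, where both task functions are placeholders. The parent itself never connects, so there is no per-database state to flush between the two calls; the price is one fork() per database, which is the trade-off weighed above.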