Thread: Database disconnection and switch inside a single bgworker

Database disconnection and switch inside a single bgworker

From: Michael Paquier
Date: 15 November 2013, 02:55:21

Hi all,

Currently, bgworkers offer the possibility to connect to a given
database using BackgroundWorkerInitializeConnection in bgworker.h, but
there is actually no way to disconnect from a given database inside
the same bgworker process.

One use case would be a single bgworker that first performs some
analysis on a database A, for example examining statistics in a common
database like postgres, and then acts on another database, say by
running ANALYSE on a given relation in database B only.

Using the infrastructure of 9.4 as of now, it would be possible of
course to have a bgworker process launching some other child processes
dynamically on different databases, but users (including me) might not
want to do that all the time.
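
For illustration only, the one-process-per-database pattern described
above can be modelled in plain POSIX C. This sketch does not use the
PostgreSQL bgworker API: fork() merely stands in for launching a dynamic
worker, and run_one_child_per_database and work_on_database are
hypothetical names invented for the example. Each child is bound to
exactly one database name for its whole life, which is the constraint
under discussion.

```c
/* Model only: plain POSIX, NOT the PostgreSQL bgworker API. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical per-database task.  A real worker would connect to the
 * database once here and could never switch to another one afterwards.
 */
static int
work_on_database(const char *dbname)
{
    printf("pid %d: bound to database \"%s\" for its whole life\n",
           (int) getpid(), dbname);
    fflush(stdout);
    return 0;
}

/*
 * Fork one short-lived child per database and wait for each in turn.
 * Returns the number of children that exited with status 0.
 */
int
run_one_child_per_database(const char *const *dbnames, int ndbs)
{
    int ok = 0;

    for (int i = 0; i < ndbs; i++)
    {
        /* Flush first so the child does not inherit buffered output. */
        fflush(stdout);

        pid_t pid = fork();

        if (pid < 0)
        {
            perror("fork");
            continue;
        }

        if (pid == 0)
        {
            /* Child: handle exactly one database, then exit. */
            _exit(work_on_database(dbnames[i]) == 0 ? 0 : 1);
        }

        /* Parent: wait for this child before moving to the next one. */
        int status;

        if (waitpid(pid, &status, 0) == pid &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0)
            ok++;
    }

    return ok;
}
```

Given the two names "postgres" and "db_b", the function forks two
children in sequence and returns 2 when both exit cleanly. The price
paid per database is a new process, which is exactly what a
disconnect-and-switch facility would try to avoid.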

Database disconnection would also be pretty cool for things like
parallel query processing using a pool of bgworker processes that all
the backends could use in parallel as a single resource.

Note that I have not looked at the code yet to see how this could be
done (it looks tricky, though), whether any new infrastructure is
needed, or whether it could be done without modifying the core code of
Postgres. So, opinions about that as well as additional thoughts are
welcome!

Regards,
-- 
Michael

Re: Database disconnection and switch inside a single bgworker

From: Tom Lane
Date: 15 November 2013, 15:14:56

Michael Paquier <michael.paquier@gmail.com> writes:
> Currently, bgworkers offer the possibility to connect to a given
> database using BackgroundWorkerInitializeConnection in bgworker.h, but
> there is actually no way to disconnect from a given database inside
> the same bgworker process.

That's isomorphic to having a backend switch to a different database,
which occasionally gets requested, but there is no real likelihood
that we'll ever implement it.  The problem is, how can you be sure you
have flushed all the database-specific state that's been built up?
The relcache and catcaches are only the tip of the iceberg; we've
got caches all over the place.  And once you had flushed that data,
you'd have to recreate it --- but the code for doing so is intimately
intertwined with connection startup tasks that you'd most likely not
want to repeat.

And, once you'd done all that work, what would you have?  A database
switch methodology that would save a fork(), but not much else.
The time to warm up the caches wouldn't be any better than in a
fresh process.

The cost/benefit ratio for making this work just doesn't look very
promising.  That's why autovacuum is built the way it is.
        regards, tom lane

Re: Database disconnection and switch inside a single bgworker

From: Robert Haas
Date: 19 November 2013, 13:25:43

On Fri, Nov 15, 2013 at 10:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> Currently, bgworkers offer the possibility to connect to a given
>> database using BackgroundWorkerInitializeConnection in bgworker.h, but
>> there is actually no way to disconnect from a given database inside
>> the same bgworker process.
>
> That's isomorphic to having a backend switch to a different database,
> which occasionally gets requested, but there is no real likelihood
> that we'll ever implement it.  The problem is, how can you be sure you
> have flushed all the database-specific state that's been built up?
> The relcache and catcaches are only the tip of the iceberg; we've
> got caches all over the place.  And once you had flushed that data,
> you'd have to recreate it --- but the code for doing so is intimately
> intertwined with connection startup tasks that you'd most likely not
> want to repeat.
>
> And, once you'd done all that work, what would you have?  A database
> switch methodology that would save a fork(), but not much else.
> The time to warm up the caches wouldn't be any better than in a
> fresh process.

Well, you'd have whatever backend-local state you had accumulated
apart from stuff in the caches.  It's clearly not useless, especially
for background workers.  And you might actually save a little bit,
because I think we established previously that a good fraction of the
startup cost was actually page faults, which would not need to be
re-incurred.  But that having been said...

> The cost/benefit ratio for making this work just doesn't look very
> promising.  That's why autovacuum is built the way it is.

...yeah.

From a performance point of view, what's a bit frustrating is that we
have to reload a bunch of information that probably isn't different,
like pg_am entries, and the pg_class and pg_attribute entries for
pg_class and pg_attribute themselves.  If we could figure out some way
to avoid that, the potential performance win here would be bigger.
But it's not obvious to me that it's a good place to spend development
time; even if it worked perfectly, the overhead of forking new
backends just doesn't seem like our biggest problem right now.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company