Home > mailing lists

Re: Refactoring the checkpointer's fsync request queue - Mailing list pgsql-hackers

From	Andres Freund
Subject	Re: Refactoring the checkpointer's fsync request queue
Date	February 22, 2019 22:48:53
Msg-id	20190222224853.ahvm3bf52zqr7k7h@alap3.anarazel.de Whole thread Raw
In response to	Re: Refactoring the checkpointer's fsync request queue (Thomas Munro <thomas.munro@gmail.com>)
Responses	Re: Refactoring the checkpointer's fsync request queue
List	pgsql-hackers

Tree view

Hi,

On 2019-02-23 11:42:49 +1300, Thomas Munro wrote:
> On Sat, Feb 23, 2019 at 11:15 AM Andres Freund <andres@anarazel.de> wrote:
> > On 2019-02-22 10:18:57 -0800, Shawn Debnath wrote:
> > > I think using callbacks is the better path forward than having md or
> > > other components issue an invalidate request for each and every segment
> > > which can get quite heavy handed for large databases.
> >
> > I'm not sure I buy this. Unlinking files isn't cheap, involves many disk
> > writes, etc. The cost of an inval request in comparison isn't
> > particularly large.  Especially for relation-level (rather than database
> > level) truncation, per-segment invals will likely commonly be faster
> > than the sequential scan.
> 
> Well even if you do it with individual segment cancel messages for
> relations, you still need a way to deal with whole-database drops
> (generating the cancels for every segment in every relation in the
> database would be nuts), and that means either exposing some structure
> to the requests, right?  So the requests would have { request type,
> callback ID, db, opaque tag }, where request type is SYNC, CANCEL,
> CANCEL_WHOLE_DB, callback ID is used to find the function that
> converts opaque tags to paths, and db is used for handling
> CANCEL_WHOLE_DB requests where you need to scan the whole hash table.
> Right?

I'm ok with using callbacks to allow pruning for things like droping
databases. If we use callbacks, I don't see a need to explicitly include
the db in the request however? The callback can look into the opaque
tag, no?  Also, why do we need a separation between request type and
callback? That seems like it'll commonly be entirely redundant?


> > > At the time of smgrinit(), mdinit() would call into sync and register
> > > it's callbacks with an ID. We can use the same value that we are using
> > > for smgr_which to map the callbacks. Each fsync request will then also
> > > accompany this ID which the sync mechanism will use to call handlers for
> > > resolving forget requests or obtaining paths for files.
> >
> > I'm not seeing a need to do this dynamically at runtime. Given that smgr
> > isn't extensible, why don't we just map callbacks (or even just some
> > switch based logic) based on some enum?  Doing things at *init time has
> > more potential to go wrong, because say a preload_shared_library does
> > different things in postmaster than normal backends (in EXEC_BACKEND
> > cases).
> 
> Yeah I suggested dynamic registration to avoid the problem that eg
> src/backend/storage/sync.c otherwise needs to forward declare
> md_tagtopath(), undofile_tagtopath(), slru_tagtopath(), ..., or maybe
> #include <storage/md.h> etc, which seemed like exactly the sort of
> thing up with which you would not put.

I'm not sure I understand. If we have a few known tag types, what's the
problem with including the headers with knowledge of how to implement
them into sync.c file?

Greetings,

Andres Freund

pgsql-hackers by date:

From: Thomas Munro
Date: 22 February 2019, 22:42:49
Subject: Re: Refactoring the checkpointer's fsync request queue

From: Thomas Munro
Date: 22 February 2019, 22:59:04
Subject: Re: Refactoring the checkpointer's fsync request queue

Re: Refactoring the checkpointer's fsync request queue - Mailing list pgsql-hackers

Previous

Next