Re: Refactoring the checkpointer's fsync request queue - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Refactoring the checkpointer's fsync request queue |
Date | |
Msg-id | 20190222224853.ahvm3bf52zqr7k7h@alap3.anarazel.de Whole thread Raw |
In response to | Re: Refactoring the checkpointer's fsync request queue (Thomas Munro <thomas.munro@gmail.com>) |
Responses |
Re: Refactoring the checkpointer's fsync request queue
|
List | pgsql-hackers |
Hi, On 2019-02-23 11:42:49 +1300, Thomas Munro wrote: > On Sat, Feb 23, 2019 at 11:15 AM Andres Freund <andres@anarazel.de> wrote: > > On 2019-02-22 10:18:57 -0800, Shawn Debnath wrote: > > > I think using callbacks is the better path forward than having md or > > > other components issue an invalidate request for each and every segment > > > which can get quite heavy handed for large databases. > > > > I'm not sure I buy this. Unlinking files isn't cheap, involves many disk > > writes, etc. The cost of an inval request in comparison isn't > > particularly large. Especially for relation-level (rather than database > > level) truncation, per-segment invals will likely commonly be faster > > than the sequential scan. > > Well even if you do it with individual segment cancel messages for > relations, you still need a way to deal with whole-database drops > (generating the cancels for every segment in every relation in the > database would be nuts), and that means either exposing some structure > to the requests, right? So the requests would have { request type, > callback ID, db, opaque tag }, where request type is SYNC, CANCEL, > CANCEL_WHOLE_DB, callback ID is used to find the function that > converts opaque tags to paths, and db is used for handling > CANCEL_WHOLE_DB requests where you need to scan the whole hash table. > Right? I'm ok with using callbacks to allow pruning for things like droping databases. If we use callbacks, I don't see a need to explicitly include the db in the request however? The callback can look into the opaque tag, no? Also, why do we need a separation between request type and callback? That seems like it'll commonly be entirely redundant? > > > At the time of smgrinit(), mdinit() would call into sync and register > > > it's callbacks with an ID. We can use the same value that we are using > > > for smgr_which to map the callbacks. Each fsync request will then also > > > accompany this ID which the sync mechanism will use to call handlers for > > > resolving forget requests or obtaining paths for files. > > > > I'm not seeing a need to do this dynamically at runtime. Given that smgr > > isn't extensible, why don't we just map callbacks (or even just some > > switch based logic) based on some enum? Doing things at *init time has > > more potential to go wrong, because say a preload_shared_library does > > different things in postmaster than normal backends (in EXEC_BACKEND > > cases). > > Yeah I suggested dynamic registration to avoid the problem that eg > src/backend/storage/sync.c otherwise needs to forward declare > md_tagtopath(), undofile_tagtopath(), slru_tagtopath(), ..., or maybe > #include <storage/md.h> etc, which seemed like exactly the sort of > thing up with which you would not put. I'm not sure I understand. If we have a few known tag types, what's the problem with including the headers with knowledge of how to implement them into sync.c file? Greetings, Andres Freund
pgsql-hackers by date: