Re: Refactoring the checkpointer's fsync request queue - Mailing list pgsql-hackers
From: Shawn Debnath
Subject: Re: Refactoring the checkpointer's fsync request queue
Msg-id: 20190216193905.GA53174@f01898859afd.ant.amazon.com
In response to: Re: Refactoring the checkpointer's fsync request queue (Andres Freund <andres@anarazel.de>)
Responses: Re: Refactoring the checkpointer's fsync request queue
List: pgsql-hackers
On Fri, Feb 15, 2019 at 06:45:02PM -0800, Andres Freund wrote:
> > One of the advantages of that approach is that there are probably
> > other files that need to be fsync'd for each checkpoint that could
> > benefit from being offloaded to the checkpointer. Another is that
> > you break the strange cycle mentioned above.
>
> The other issue is that I think your approach moves the segmentation
> logic basically out of md into smgr. I think that's wrong. We
> shouldn't presume that every type of storage is going to have
> segmentation that's representable in a uniform way imo.

I had a discussion with Thomas on this and am working on a new version
of the patch that incorporates what you guys discussed at FOSDEM, but
avoids passing pathnames to the checkpointer. The mdsync machinery will
be moved out of md.c, and the pending-ops table will incorporate the
segment number as part of its key. I am still deciding how to cleanly
refactor _mdfd_getseg, which mdsync uses during the file sync
operations. The ultimate goal is to hand the checkpointer a file
descriptor it can use to issue the fsync via FileSync -- so perhaps a
function in smgr that returns just that for a given RelFileNode, fork,
and segno combination (see the sketch below). Dealing only with file
descriptors will also allow us to pass FDs to the checkpointer directly
as part of the request in the future.

The goal is to encapsulate relation-specific knowledge within md.c,
while allowing undo and the generic block store (ex-SLRU) to do their
own mapping within the smgr layer later. Yes, the checkpointer will
"call back" into smgr, but those calls would only retrieve information
that should be managed by smgr, leaving the checkpointer to focus on
its job of tracking requests and syncing files via the fd interfaces.

> > Another consideration if we do that is that the existing scheme has
> > a kind of hierarchy that allows fsync requests to be cancelled in
> > bulk when you drop relations and databases. That is, the
> > checkpointer knows about the internal hierarchy of tablespace, db,
> > rel, seg. If we get rid of that and have just paths, it seems like
> > a bad idea to teach the checkpointer about the internal structure
> > of the paths (even though we know they contain the same elements
> > encoded somehow). You'd have to send an explicit cancel for every
> > key; that is, if you're dropping a relation, you need to generate a
> > cancel message for every segment, and if you're dropping a
> > database, you need to generate a cancel message for every segment
> > of every relation.
>
> I can't see that being a problem - compared to the overhead of
> dropping a relation, that doesn't seem to be a meaningfully large
> cost?

With the scheme above, dropping a hierarchy will require scanning the
hash table for entries matching the dboid or reloid and removing them,
as we do today for FORGET_DATABASE_FSYNC in RememberFsyncRequest; the
matching function will belong in smgr (also sketched below). We can see
how scanning the whole hash table affects performance and iterate from
there if needed.

--
Shawn Debnath
Amazon Web Services (AWS)
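
To make the fd-handoff idea above concrete, here is a minimal sketch, not
actual tree code: PendingFsyncKey, PendingFsyncEntry, smgrsyncfd() and
ProcessOneFsyncRequest() are hypothetical names assumed for illustration,
while RelFileNode, ForkNumber, File, FileSync() and
WAIT_EVENT_DATA_FILE_SYNC are existing PostgreSQL definitions.

/*
 * Hypothetical sketch only: PendingFsyncKey, PendingFsyncEntry and
 * smgrsyncfd() do not exist in the tree; RelFileNode, ForkNumber, File,
 * FileSync() and the wait event do.
 */
#include "postgres.h"

#include "pgstat.h"
#include "storage/block.h"
#include "storage/fd.h"
#include "storage/relfilenode.h"
#include "utils/hsearch.h"

/* Pending-ops hash key: the segment number is now part of the key. */
typedef struct PendingFsyncKey
{
    RelFileNode rnode;          /* tablespace / database / relation */
    ForkNumber  forknum;        /* MAIN_FORKNUM, FSM_FORKNUM, ... */
    BlockNumber segno;          /* segment number within the fork */
} PendingFsyncKey;

typedef struct PendingFsyncEntry
{
    PendingFsyncKey key;        /* hash key (must be first) */
    uint64      cycle_ctr;      /* sync cycle this request belongs to */
} PendingFsyncEntry;

/*
 * Hypothetical smgr entry point: hand the checkpointer a vfd for one
 * segment.  md.c would implement it on top of _mdfd_getseg(); undo and
 * the generic block store would provide their own mappings.
 */
extern File smgrsyncfd(RelFileNode rnode, ForkNumber forknum,
                       BlockNumber segno);

/* Checkpointer side: sync one pending segment via the returned fd. */
static void
ProcessOneFsyncRequest(const PendingFsyncKey *key)
{
    File        fd = smgrsyncfd(key->rnode, key->forknum, key->segno);

    if (FileSync(fd, WAIT_EVENT_DATA_FILE_SYNC) < 0)
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not fsync segment %u of relation %u/%u/%u: %m",
                        key->segno, key->rnode.spcNode,
                        key->rnode.dbNode, key->rnode.relNode)));
}

In the same hypothetical terms, bulk cancellation on a database drop could
scan the whole pending-ops table and remove matching entries, much like
RememberFsyncRequest() handles FORGET_DATABASE_FSYNC today; the pendingOps
hash and its entry layout are assumptions carried over from the sketch
above.

/*
 * Sketch: cancel all pending requests for a dropped database by scanning
 * the hash table, as RememberFsyncRequest() does for FORGET_DATABASE_FSYNC.
 * "pendingOps" and the entry layout are assumptions, not existing code.
 */
static HTAB *pendingOps;

static void
ForgetDatabaseFsyncRequests(Oid dbid)
{
    HASH_SEQ_STATUS hstat;
    PendingFsyncEntry *entry;

    hash_seq_init(&hstat, pendingOps);
    while ((entry = (PendingFsyncEntry *) hash_seq_search(&hstat)) != NULL)
    {
        if (entry->key.rnode.dbNode == dbid)
            (void) hash_search(pendingOps, &entry->key, HASH_REMOVE, NULL);
    }
}

Deleting the just-returned entry during hash_seq_search() is safe with
dynahash, which is what makes a single full-table scan per drop workable.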