Re: Refactoring the checkpointer's fsync request queue - Mailing list pgsql-hackers
From: Shawn Debnath
Subject: Re: Refactoring the checkpointer's fsync request queue
Msg-id: 20190216193905.GA53174@f01898859afd.ant.amazon.com
In response to: Re: Refactoring the checkpointer's fsync request queue (Andres Freund <andres@anarazel.de>)
Responses: Re: Refactoring the checkpointer's fsync request queue
List: pgsql-hackers
On Fri, Feb 15, 2019 at 06:45:02PM -0800, Andres Freund wrote:
> > One of the advantages of that approach is that there are probably
> > other files that need to be fsync'd for each checkpoint that could
> > benefit from being offloaded to the checkpointer. Another is that
> > you break the strange cycle mentioned above.
>
> The other issue is that I think your approach moves the segmentation
> logic basically out of md into smgr. I think that's wrong. We
> shouldn't presume that every type of storage is going to have
> segmentation that's representable in a uniform way imo.

I had a discussion with Thomas on this and am working on a new version
of the patch that incorporates what you guys discussed at FOSDEM, but
avoids passing pathnames to the checkpointer. The mdsync machinery will
be moved out of md.c, and the pending-ops table will incorporate the
segment number as part of its key. I am still deciding how to cleanly
refactor _mdfd_getseg, which mdsync uses during the file sync
operations. The ultimate goal is to hand the checkpointer a file
descriptor it can use to issue the fsync via FileSync -- so perhaps a
function in smgr that returns just that for a given RelFileNode, fork,
and segno combination (see the sketch below). Dealing only with file
descriptors will also allow us to pass FDs to the checkpointer directly
as part of the request in the future.

The goal is to encapsulate relation-specific knowledge within md.c,
while allowing undo and the generic block store (ex-SLRU) to do their
own mapping within the smgr layer later. Yes, the checkpointer will
"call back" into smgr, but those calls would only retrieve information
that should be managed by smgr, leaving the checkpointer to focus on
its job of tracking requests and syncing files via the fd interfaces.

> > Another consideration if we do that is that the existing scheme has
> > a kind of hierarchy that allows fsync requests to be cancelled in
> > bulk when you drop relations and databases. That is, the
> > checkpointer knows about the internal hierarchy of tablespace, db,
> > rel, seg. If we get rid of that and have just paths, it seems like
> > a bad idea to teach the checkpointer about the internal structure
> > of the paths (even though we know they contain the same elements
> > encoded somehow). You'd have to send an explicit cancel for every
> > key; that is, if you're dropping a relation, you need to generate a
> > cancel message for every segment, and if you're dropping a
> > database, you need to generate a cancel message for every segment
> > of every relation.
>
> I can't see that being a problem - compared to the overhead of
> dropping a relation, that doesn't seem to be a meaningfully large
> cost?

With the scheme above, dropping a hierarchy will require scanning the
hash table for entries matching the dboid or reloid and removing them,
as we do today for FORGET_DATABASE_FSYNC in RememberFsyncRequest; the
matching function will belong in smgr (also sketched below). We can see
how scanning the whole hash table affects performance and iterate from
there if needed.

--
Shawn Debnath
Amazon Web Services (AWS)
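
To make the fd-handoff idea above concrete, here is a minimal sketch, not
actual tree code: PendingFsyncKey, PendingFsyncEntry, smgrsyncfd() and
ProcessOneFsyncRequest() are hypothetical names assumed for illustration,
while RelFileNode, ForkNumber, File, FileSync() and
WAIT_EVENT_DATA_FILE_SYNC are existing PostgreSQL definitions.

/*
 * Hypothetical sketch only: PendingFsyncKey, PendingFsyncEntry and
 * smgrsyncfd() do not exist in the tree; RelFileNode, ForkNumber, File,
 * FileSync() and the wait event do.
 */
#include "postgres.h"

#include "pgstat.h"
#include "storage/block.h"
#include "storage/fd.h"
#include "storage/relfilenode.h"
#include "utils/hsearch.h"

/* Pending-ops hash key: the segment number is now part of the key. */
typedef struct PendingFsyncKey
{
    RelFileNode rnode;          /* tablespace / database / relation */
    ForkNumber  forknum;        /* MAIN_FORKNUM, FSM_FORKNUM, ... */
    BlockNumber segno;          /* segment number within the fork */
} PendingFsyncKey;

typedef struct PendingFsyncEntry
{
    PendingFsyncKey key;        /* hash key (must be first) */
    uint64      cycle_ctr;      /* sync cycle this request belongs to */
} PendingFsyncEntry;

/*
 * Hypothetical smgr entry point: hand the checkpointer a vfd for one
 * segment.  md.c would implement it on top of _mdfd_getseg(); undo and
 * the generic block store would provide their own mappings.
 */
extern File smgrsyncfd(RelFileNode rnode, ForkNumber forknum,
                       BlockNumber segno);

/* Checkpointer side: sync one pending segment via the returned fd. */
static void
ProcessOneFsyncRequest(const PendingFsyncKey *key)
{
    File        fd = smgrsyncfd(key->rnode, key->forknum, key->segno);

    if (FileSync(fd, WAIT_EVENT_DATA_FILE_SYNC) < 0)
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not fsync segment %u of relation %u/%u/%u: %m",
                        key->segno, key->rnode.spcNode,
                        key->rnode.dbNode, key->rnode.relNode)));
}

In the same hypothetical terms, bulk cancellation on a database drop could
scan the whole pending-ops table and remove matching entries, much like
RememberFsyncRequest() handles FORGET_DATABASE_FSYNC today; the pendingOps
hash and its entry layout are assumptions carried over from the sketch
above.

/*
 * Sketch: cancel all pending requests for a dropped database by scanning
 * the hash table, as RememberFsyncRequest() does for FORGET_DATABASE_FSYNC.
 * "pendingOps" and the entry layout are assumptions, not existing code.
 */
static HTAB *pendingOps;

static void
ForgetDatabaseFsyncRequests(Oid dbid)
{
    HASH_SEQ_STATUS hstat;
    PendingFsyncEntry *entry;

    hash_seq_init(&hstat, pendingOps);
    while ((entry = (PendingFsyncEntry *) hash_seq_search(&hstat)) != NULL)
    {
        if (entry->key.rnode.dbNode == dbid)
            (void) hash_search(pendingOps, &entry->key, HASH_REMOVE, NULL);
    }
}

Deleting the just-returned entry during hash_seq_search() is safe with
dynahash, which is what makes a single full-table scan per drop workable.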