Re: Load distributed checkpoint - Mailing list pgsql-hackers

From Ron Mayer
Subject Re: Load distributed checkpoint
Msg-id 45786549.2000602@cheapcomplexdevices.com
In response to Re: Load distributed checkpoint  ("Takayuki Tsunakawa" <tunakawa@soft.fujitsu.com>)
Responses Re: Load distributed checkpoint
Re: Load distributed checkpoint
List pgsql-hackers
Takayuki Tsunakawa wrote:
> Hello, Itagaki-san
>> Checkpoint consists of the following four steps, and the major
>> performance problem is 2nd step. All dirty buffers are written
>> without interval in it.
>> 1. Query information (REDO pointer, next XID etc.)
>> 2. Write dirty pages in buffer pool
>> 3. Flush all modified files
>> 4. Update control file
> 
> Hmm. Isn't it possible that step 3 affects the performance greatly?
> I'm sorry if you have already identified step 2 as disturbing
> backends.
> 
> As you know, PostgreSQL does not transfer the data to disk when
> write()ing. Actual transfer occurs when fsync()ing at checkpoints,
> unless the filesystem cache runs short. So, disk is overworked at
> fsync()s.
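To make the write()/fsync() distinction in the quote concrete, here is a minimal C sketch of steps 2 and 3 against a single data file. It is not PostgreSQL's checkpoint code. The names write_dirty_pages and checkpoint_sketch are hypothetical, steps 1 and 4 are omitted, and the "dirty pages" are just filler bytes. The write() loop usually only copies data into the OS page cache; the fsync() is where the kernel must push whatever is still dirty to the device.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define PAGE_SIZE 8192  /* PostgreSQL's default block size */

/* Step 2 (sketch): write() every dirty page. On a typical Unix kernel this
 * only copies the bytes into the page cache and returns quickly. */
static int write_dirty_pages(int fd, const char *pages, int npages)
{
    for (int i = 0; i < npages; i++) {
        const char *page = pages + (size_t) i * PAGE_SIZE;
        if (pwrite(fd, page, PAGE_SIZE, (off_t) i * PAGE_SIZE) != PAGE_SIZE)
            return -1;
    }
    return 0;
}

/* Hypothetical steps 2 and 3 for one file; steps 1 and 4 (REDO pointer,
 * control file) are left out. Returns 0 on success, -1 on error. */
static int checkpoint_sketch(const char *path, int npages)
{
    char *pages = malloc((size_t) npages * PAGE_SIZE);
    if (pages == NULL)
        return -1;
    memset(pages, 'x', (size_t) npages * PAGE_SIZE);  /* stand-in dirty data */

    int rc = -1;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd >= 0) {
        /* Step 2 is mostly memory copies; step 3's fsync() then blocks
         * until every page not yet written back has reached the device. */
        if (write_dirty_pages(fd, pages, npages) == 0 && fsync(fd) == 0)
            rc = 0;
        if (close(fd) != 0)
            rc = -1;
    }
    free(pages);
    return rc;
}
```

How much of the physical I/O happens inside the fsync() depends on how much the kernel wrote back on its own while the write() loop was running.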

It seems to me that the OS's virtual memory settings will determine
whether step 2 or step 3 causes most of the actual disk I/O.

In particular, on Linux, settings like /proc/sys/vm/dirty_expire_centisecs,
dirty_writeback_centisecs, and possibly dirty_background_ratio
would affect this.  If those numbers are high, ISTM most of the data
written with write() in step 2 would sit in the page cache until the
fsync()s in step 3 flush it.  If I understand correctly, if
dirty_expire_centisecs is low, the pdflush daemons would write most of
the data from step 2's write()s to disk before step 3 even starts.
I expect other OSes have different but similar knobs to tune this.
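As a rough illustration, assuming Linux with /proc mounted, the three knobs named above can be read from C as shown below; the helper names parse_tunable and read_vm_tunable are hypothetical. For reference: dirty_expire_centisecs is how old (in hundredths of a second) dirty page-cache data must be before the kernel's writeback threads treat it as eligible for writeout, dirty_writeback_centisecs is how often those threads wake up, and dirty_background_ratio is the percentage of memory that may be dirty before background writeback starts.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>

/* Parse the single integer a /proc/sys/vm file contains, e.g. "3000\n".
 * Returns -1 if the text is not a non-negative integer. */
static long parse_tunable(const char *text)
{
    char *end;
    long v = strtol(text, &end, 10);
    if (end == text || v < 0)
        return -1;
    while (*end == '\n' || *end == ' ')
        end++;
    return *end == '\0' ? v : -1;
}

/* Read /proc/sys/vm/<name>. Returns -1 if the file is missing or
 * unreadable, e.g. on a non-Linux system. */
static long read_vm_tunable(const char *name)
{
    char path[256], buf[64];
    snprintf(path, sizeof path, "/proc/sys/vm/%s", name);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *ok = fgets(buf, sizeof buf, f);
    fclose(f);
    return ok != NULL ? parse_tunable(buf) : -1;
}
```

For example, read_vm_tunable("dirty_expire_centisecs") returns the current expiry age, or -1 where that file does not exist.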

It seems to me that the most portable way for PostgreSQL to force
the I/O to be balanced across steps 2 and 3 would be to insert otherwise
unnecessary fsync()s into step 2; but it might (I'm not sure why) be
better to handle this through OS-specific tuning outside of postgres.
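The "insert otherwise unnecessary fsync()s into step 2" idea could look roughly like the C sketch below. It is a hypothetical illustration, not a proposed patch: the function write_pages_with_periodic_fsync and its fsync_every parameter are made up here, and a real checkpoint would span many files and would also need to decide how fast to pace the writes. The point is only that an fsync() every N pages caps how much dirty data can pile up in the page cache before step 3.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <unistd.h>

#define PAGE_SIZE 8192  /* PostgreSQL's default block size */

/* Hypothetical step 2 variant: write dirty pages, calling fsync() after
 * every fsync_every pages (0 disables the extra calls), then do the usual
 * step 3 fsync() for whatever remains.
 * Returns the total number of fsync() calls made, or -1 on error. */
static int write_pages_with_periodic_fsync(int fd, const char *pages,
                                           int npages, int fsync_every)
{
    int nsyncs = 0;
    for (int i = 0; i < npages; i++) {
        const char *page = pages + (size_t) i * PAGE_SIZE;
        if (pwrite(fd, page, PAGE_SIZE, (off_t) i * PAGE_SIZE) != PAGE_SIZE)
            return -1;
        /* The "otherwise unnecessary" fsync(): forces this batch to disk
         * now instead of leaving it for the final flush. */
        if (fsync_every > 0 && (i + 1) % fsync_every == 0) {
            if (fsync(fd) != 0)
                return -1;
            nsyncs++;
        }
    }
    /* Step 3: flush the tail written after the last batch. */
    if (fsync(fd) != 0)
        return -1;
    return nsyncs + 1;
}
```

The choice of fsync_every is the trade-off: a small value spreads the I/O more evenly but makes each batch wait on disk latency, while a large value approaches the original behavior where step 3 does most of the work.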

