
Re: [HACKERS] TODO item - Mailing list pgsql-hackers

From	Tatsuo Ishii
Subject	Re: [HACKERS] TODO item
Date	February 9, 2000 03:18:40
Msg-id	20000209172202B.t-ishii@sra.co.jp
In response to	Re: [HACKERS] TODO item (Tatsuo Ishii <t-ishii@sra.co.jp>)
Responses	Re: [HACKERS] TODO item Re: [HACKERS] TODO item Re: [HACKERS] TODO item
List	pgsql-hackers


> BTW, Hiroshi has noticed me an excellent point #3:
> 
> >Session-1
> >begin;
> >update A ...;
> >
> >Session-2
> >begin;
> >select * from B ..;
> >    There's no PostgreSQL shared buffer available.
> >    This backend has to force the flush of a free buffer
> >    page. Unfortunately the page was dirtied by the
> >    above operation of Session-1 and calls pg_fsync()
> >    for the table A. However fsync() is postponed until
> >    commit of this backend.
> >
> >Session-1
> >commit;
> >    There's no dirty buffer page for the table A.
> >    So pg_fsync() isn't called for the table A.
> 
> Seems there's no easy solution for this. Maybe now is the time to give
> up my idea...
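
To make the quoted scenario concrete, here is a minimal sketch in C of the
bookkeeping problem. It is a toy model under stated assumptions, not
PostgreSQL code: the Model struct and the model_init, backend_update,
backend_evict, backend_commit and table_durable functions are hypothetical
names invented for illustration. Each backend remembers the fsyncs it owes
and pays them only at its own commit, which is the postponed-fsync idea
described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NBACKENDS 2
#define NTABLES   2

/* Hypothetical toy model of shared buffers, kernel cache and deferred fsyncs. */
typedef struct
{
    bool dirty_in_shared[NTABLES];          /* dirty page sits in a shared buffer */
    bool unsynced_in_kernel[NTABLES];       /* written to the kernel, not yet on disk */
    bool pending_fsync[NBACKENDS][NTABLES]; /* fsyncs a backend owes at its own commit */
} Model;

void model_init(Model *m)
{
    memset(m, 0, sizeof(*m));
}

/* A backend dirties a page of the given table in the shared buffer pool. */
void backend_update(Model *m, int table)
{
    assert(table >= 0 && table < NTABLES);
    m->dirty_in_shared[table] = true;
}

/* Write a dirty page to the kernel; the fsync is postponed to the writer's commit. */
static void write_page(Model *m, int backend, int table)
{
    if (!m->dirty_in_shared[table])
        return;
    m->dirty_in_shared[table] = false;
    m->unsynced_in_kernel[table] = true;
    m->pending_fsync[backend][table] = true;
}

/* A backend needs a free buffer and evicts a dirty page of the given table. */
void backend_evict(Model *m, int backend, int table)
{
    assert(backend >= 0 && backend < NBACKENDS);
    assert(table >= 0 && table < NTABLES);
    write_page(m, backend, table);
}

/*
 * Commit: write remaining dirty pages (a simplification: the model writes
 * every dirty page), then either fsync only the tables this backend wrote,
 * or call a system-wide sync that flushes everything.
 */
void backend_commit(Model *m, int backend, bool use_sync)
{
    assert(backend >= 0 && backend < NBACKENDS);
    for (int t = 0; t < NTABLES; t++)
        write_page(m, backend, t);
    for (int t = 0; t < NTABLES; t++)
    {
        if (use_sync || m->pending_fsync[backend][t])
            m->unsynced_in_kernel[t] = false;
        m->pending_fsync[backend][t] = false;
    }
}

/* A table's changes are durable once nothing is left in either cache. */
bool table_durable(const Model *m, int table)
{
    assert(table >= 0 && table < NTABLES);
    return !m->dirty_in_shared[table] && !m->unsynced_in_kernel[table];
}
```

Replaying the scenario with Session-1 as backend 0 and Session-2 as backend 1
(update of table 0 by backend 0, eviction by backend 1, commit of backend 0)
leaves table 0 non-durable when only per-backend fsyncs are paid, because the
owed fsync sits with backend 1. In this model a system-wide sync at commit
closes that gap.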

Thinking about this a little more, I have come across yet another
possible solution. It is actually *very* simple. Details follow.

In xact.c:RecordTransactionCommit() there are two FlushBufferPool
calls. One is for relation files and the other is for pg_log. I add
sync() right after each of these FlushBufferPool calls. It will force
any pending kernel buffers to be physically written to disk, and thus
should guarantee the ACID properties of the transaction (see attached
code fragment).

There are two things that we should worry about with sync, however.

1. Does sync really wait until the data has been written to disk?

I looked into the man page of sync(2) on Linux 2.0.36:
      According to the standard specification (e.g., SVID),
      sync() schedules the writes, but may return before the
      actual writing is done. However, since version 1.3.20
      Linux does actually wait. (This still does not guarantee
      data integrity: modern disks have large caches.)
 

It seems that sync(2) blocks until data is written. So it would be ok
at least with Linux. I'm not sure about other platforms, though.
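
For reference, the difference between the two calls can be sketched as
follows. This is only an illustration, not backend code: write_with_fsync and
write_with_sync are hypothetical helper names, and the sketch assumes a POSIX
system where sync() and fsync() are declared in <unistd.h>. Note that sync()
returns void, so the caller gets no error status from it, and whether it
waits for completion depends on the platform, as discussed above.

```c
#define _XOPEN_SOURCE 700   /* assumption: expose sync() and fsync() */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Write buf to path, then flush only that file with fsync(2). 0 on success. */
int write_with_fsync(const char *path, const char *buf, size_t len)
{
    assert(path != NULL && buf != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, buf, len) != (ssize_t) len || fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Write buf to path, then ask the kernel to flush *all* dirty buffers. */
int write_with_sync(const char *path, const char *buf, size_t len)
{
    assert(path != NULL && buf != NULL);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, buf, len) != (ssize_t) len)
    {
        close(fd);
        return -1;
    }
    if (close(fd) != 0)
        return -1;
    sync();     /* system-wide; returns void, so no error can be reported */
    return 0;
}
```

The second helper does not need to know which files were written, or by
which process, which is the property the proposal relies on.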

2. Do we suffer any performance penalty from sync?

Since sync forces *all* dirty buffers on the system to be written to
disk, it might be slower than fsync. So I did some testing using
contrib/pgbench. Starting postmaster with -F on (and with the sync
modification), I ran 32 concurrent clients performing 10
transactions each, 320 transactions in total. Each
transaction contains an UPDATE and a SELECT on a table that has 1000k
tuples and an INSERT into another small table. The result showed that -F
+ sync was actually faster than the default mode (no -F, no
modifications). The system is Red Hat 5.2 with 128MB RAM.
                        -F + sync    normal mode
--------------------------------------------------------
transactions/sec        3.46         2.93
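
As a rough reading of these numbers, the arithmetic can be sketched in C.
The function names are hypothetical, and the elapsed times are derived from
the reported transactions/sec, not measured separately.

```c
#include <assert.h>

/* Total transactions in a run: clients times transactions per client. */
int total_transactions(int clients, int per_client)
{
    return clients * per_client;
}

/* Elapsed wall-clock seconds implied by a transaction count and a tps figure. */
double elapsed_seconds(int transactions, double tps)
{
    assert(tps > 0.0);
    return transactions / tps;
}

/* Throughput ratio of one mode over another. */
double throughput_ratio(double tps_a, double tps_b)
{
    assert(tps_b > 0.0);
    return tps_a / tps_b;
}
```

With 32 clients and 10 transactions each, this gives 320 transactions, about
92 seconds at 3.46 tps versus about 109 seconds at 2.93 tps, i.e. roughly 18%
higher throughput for -F + sync in this particular test.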

Of course, if there is disk activity other than PostgreSQL's, sync
would be slowed down by it. However, in most cases the system is
dedicated to PostgreSQL, so I don't think this is a big problem
in the real world.

Note that large COPY or INSERT was much faster than in normal
mode, because there is no per-page fsync.

Considering all this, I would like to propose that we add a new switch
to postgres to run with -F + sync.

------------------------------------------------------------------------
/*
 * If no one shared buffer was changed by this transaction then
 * we don't flush shared buffers and don't record commit status.
 */
if (SharedBufferChanged)
{
    FlushBufferPool();
    sync();
    if (leak)
        ResetBufferPool();

    /*
     *    have the transaction access methods record the status
     *    of this transaction id in the pg_log relation.
     */
    TransactionIdCommit(xid);

    /*
     *    Now write the log info to the disk too.
     */
    leak = BufferPoolCheckLeak();
    FlushBufferPool();
    sync();
}
