Re: [HACKERS] TODO item - Mailing list pgsql-hackers
From | Tatsuo Ishii |
---|---|
Subject | Re: [HACKERS] TODO item |
Date | |
Msg-id | 20000209172202B.t-ishii@sra.co.jp |
In response to | Re: [HACKERS] TODO item (Tatsuo Ishii <t-ishii@sra.co.jp>) |
Responses | Re: [HACKERS] TODO item; Re: [HACKERS] TODO item; Re: [HACKERS] TODO item |
List | pgsql-hackers |
> BTW, Hiroshi has noticed me an excellent point #3:
>
> >Session-1
> >begin;
> >update A ...;
> >
> >Session-2
> >begin;
> >select * from B ..;
> >    There's no PostgreSQL shared buffer available.
> >    This backend has to force the flush of a free buffer
> >    page. Unfortunately the page was dirtied by the
> >    above operation of Session-1 and calls pg_fsync()
> >    for the table A. However fsync() is postponed until
> >    commit of this backend.
> >
> >Session-1
> >commit;
> >    There's no dirty buffer page for the table A.
> >    So pg_fsync() isn't called for the table A.
>
> Seems there's no easy solution for this. Maybe now is the time to give
> up my idea...

Thinking about this a little more, I have come across yet another possible solution. It is actually *very* simple. Details follow.

In xact.c:RecordTransactionCommit() there are two FlushBufferPool() calls. One is for relation files and the other is for pg_log. I add a sync() call right after each FlushBufferPool(). This forces any pending kernel buffers to be physically written to disk, and thus should guarantee the ACID properties of the transaction (see the attached code fragment).

There are two things we should worry about with sync(), however.

1. Does sync() really wait until the data has been written to disk?

I looked at the man page of sync(2) on Linux 2.0.36:

    According to the standard specification (e.g., SVID), sync()
    schedules the writes, but may return before the actual writing
    is done. However, since version 1.3.20 Linux does actually wait.
    (This still does not guarantee data integrity: modern disks have
    large caches.)

It seems that sync(2) blocks until the data is written, so it would be OK at least on Linux. I'm not sure about other platforms, though.

2. Do we suffer any performance penalty from sync()?

Since sync() forces *all* dirty buffers on the system to be written to disk, it might be slower than fsync(). So I did some testing using contrib/pgbench.
I started the postmaster with -F on (and with the sync() modification) and ran 32 concurrent clients performing 10 transactions each, for 320 transactions in total. Each transaction consists of an UPDATE and a SELECT on a table that has 1000k tuples, and an INSERT into another small table. The result showed that -F + sync was actually faster than the default mode (no -F, no modifications). The system is Red Hat 5.2 with 128MB RAM.

                        -F + sync       normal mode
--------------------------------------------------------
transactions/sec        3.46            2.93

Of course, if there is disk activity other than PostgreSQL's, sync() would suffer from it. However, in most cases the system is dedicated to PostgreSQL, and I don't think this is a big problem in the real world.

Note that a large COPY or INSERT was much faster than in normal mode, because there is no per-page fsync().

Considering all this, I would like to propose that we add a new switch to postgres to run with -F + sync.

------------------------------------------------------------------------
/*
 * If no one shared buffer was changed by this transaction then
 * we don't flush shared buffers and don't record commit status.
 */
if (SharedBufferChanged)
{
	FlushBufferPool();
	sync();
	if (leak)
		ResetBufferPool();

	/*
	 * have the transaction access methods record the status
	 * of this transaction id in the pg_log relation.
	 */
	TransactionIdCommit(xid);

	/*
	 * Now write the log info to the disk too.
	 */
	leak = BufferPoolCheckLeak();
	FlushBufferPool();
	sync();
}
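The trade-off discussed in this message, a per-file fsync() versus plain writes followed by one system-wide sync() at commit time, can be sketched outside PostgreSQL roughly as follows. This is only an illustration under stated assumptions: the names write_all, commit_with_fsync and commit_with_sync are hypothetical and are not PostgreSQL functions, and whether sync() actually waits for the writes to complete depends on the platform, as the man page quote above points out.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write len bytes of buf to path, truncating the file.
 * If do_fsync is nonzero, fsync() the file before closing it.
 * Returns 0 on success, -1 on any failure.
 */
int write_all(const char *path, const char *buf, size_t len, int do_fsync)
{
    size_t off = 0;
    int fd;

    assert(path != NULL && buf != NULL);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    /* write() may be partial, so loop until everything is handed over. */
    while (off < len) {
        ssize_t n = write(fd, buf + off, len - off);
        if (n < 0) {
            close(fd);
            return -1;
        }
        off += (size_t) n;
    }

    if (do_fsync && fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* "Normal mode" analogue: every file is fsync()ed individually. */
int commit_with_fsync(const char *const *paths, int npaths,
                      const char *buf, size_t len)
{
    int i;

    for (i = 0; i < npaths; i++)
        if (write_all(paths[i], buf, len, 1) != 0)
            return -1;
    return 0;
}

/*
 * "-F + sync" analogue: no per-file fsync(), then a single sync() that
 * asks the kernel to flush all dirty buffers on the system.
 */
int commit_with_sync(const char *const *paths, int npaths,
                     const char *buf, size_t len)
{
    int i;

    for (i = 0; i < npaths; i++)
        if (write_all(paths[i], buf, len, 0) != 0)
            return -1;
    sync();
    return 0;
}
```

Calling commit_with_sync() with an array of file names and a buffer mimics the proposed commit path, and commit_with_fsync() mimics the default one. Timing the two over many files would give a rough, system-dependent feel for the difference, though it says nothing about PostgreSQL's own buffer manager.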