Re: [HACKERS] TODO item - Mailing list pgsql-hackers
From | Alfred Perlstein |
---|---|
Subject | Re: [HACKERS] TODO item |
Date | |
Msg-id | 20000209020448.P17536@fw.wintelcom.net |
In response to | Re: [HACKERS] TODO item (Tatsuo Ishii <t-ishii@sra.co.jp>) |
Responses | Re: [HACKERS] TODO item |
List | pgsql-hackers |
* Tatsuo Ishii <t-ishii@sra.co.jp> [000209 00:51] wrote:
> > BTW, Hiroshi has noticed me an excellent point #3:
> >
> > >Session-1
> > >begin;
> > >update A ...;
> > >
> > >Session-2
> > >begin;
> > >select * fromB ..;
> > >    There's no PostgreSQL shared buffer available.
> > >    This backend has to force the flush of a free buffer
> > >    page. Unfortunately the page was dirtied by the
> > >    above operation of Session-1 and calls pg_fsync()
> > >    for the table A. However fsync() is postponed until
> > >    commit of this backend.
> > >
> > >Session-1
> > >commit;
> > >    There's no dirty buffer page for the table A.
> > >    So pg_fsync() isn't called for the table A.
> >
> > Seems there's no easy solution for this. Maybe now is the time to give
> > up my idea...
>
> Thinking about a little bit more, I have come across yet another
> possible solution. It is actually *very* simple. Details as follows.
>
> In xact.c:RecordTransactionCommit() there are two FlushBufferPool
> calls. One is for relation files and the other is for pg_log. I add
> sync() right after these FlushBufferPool. It will force any pending
> kernel buffers physically be written onto disk, thus should guarantee
> the ACID of the transaction (see attached code fragment).
>
> There are two things that we should worry about sync, however.
>
> 1. Does sync really wait for the completion of data be written on to
> disk?
>
> I looked into the man page of sync(2) on Linux 2.0.36:
>
>     According to the standard specification (e.g., SVID),
>     sync() schedules the writes, but may return before the
>     actual writing is done. However, since version 1.3.20
>     Linux does actually wait. (This still does not guarantee
>     data integrity: modern disks have large caches.)
>
> It seems that sync(2) blocks until data is written. So it would be ok
> at least with Linux. I'm not sure about other platforms, though.

It is incorrect to assume that sync() waits until all buffers are
flushed on any platform other than Linux. I didn't think that Linux
even did so, but the kernel sources say yes. Solaris doesn't do this
and neither does FreeBSD/NetBSD.

I guess if you wanted to implement this for Linux only then it would
work. You ought to then also warn people that a non-dedicated db
server could experience different performance using this code, since
sync() flushes every dirty buffer on the machine, not just
PostgreSQL's.

-Alfred
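
For contrast with sync(), here is a minimal sketch of the per-file route. POSIX only requires sync() to schedule the writes, while fsync() on a descriptor does not return until that file's data has been handed to the device or an error is reported. The helper name write_durably and the file name in the usage note are made up for illustration; this is not PostgreSQL code and not the fragment attached to the quoted mail. As the man page excerpt notes, the disk's own write cache can still defeat either call.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper (an assumption for illustration, not PostgreSQL code):
 * write len bytes to path and return only after fsync() has reported that
 * this one file has been flushed. Returns 0 on success, -1 on any error.
 */
int write_durably(const char *path, const void *data, size_t len)
{
    const char *p = data;
    size_t left = len;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;

    /* write() may be interrupted or write fewer bytes than asked; loop. */
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            close(fd);
            return -1;
        }
        p += n;
        left -= (size_t) n;
    }

    /*
     * Unlike sync(), fsync() covers only this descriptor and blocks until
     * the kernel reports completion, so other processes' dirty buffers are
     * not dragged into the flush.
     */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

A caller would use it as write_durably("commit.dat", buf, buflen) and treat a return of -1 as "not known to be on disk". This does nothing for the Session-1/Session-2 case in the quoted text, where the dirty page was written by a different backend; it only shows the blocking behaviour that sync() does not promise outside Linux.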