Thread: RE: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

RE: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

From

"Mikheev, Vadim"

Date:

08 November 2000, 13:01:52

> > > > New CHECKPOINT command.
> > > > Auto removing of offline log files and creating new file
> > > > at checkpoint time.
> 
> Can you tell me how to use CHECKPOINT please?

You shouldn't normally use it - postmaster will start backend
each 3-5 minutes to do this automatically.

> > > Is this the same as a SAVEPOINT?
> > 
> > No. Checkpoints are to speedup after crash recovery and
> > to remove/archive log files. With WAL server doesn't write
> > any datafiles on commit, only commit record goes to log
> > (and log fsync-ed). Dirty buffers remains in memory long
> 
> Is log fsynced even I turn of -F?

Yes, though we can change this. We also can implement now
feature that Bruce wanted so long and so much -:) -
fsync log not on each commit but each ~ 5sec, if
losing some recent commits is acceptable.

Nevertheless, when bufmgr replaces a dirty buffer it must first
ensure that the log record of the last update to that buffer is
already on disk, so bufmgr forces a log fsync if required.
This cannot be changed - the rule is simple: log before applying
changes to permanent storage.

Vadim
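
The rule Vadim states (log before applying changes to permanent
storage) can be sketched as a toy buffer manager. This is only an
illustration under assumptions: the classes Log, Buffer and
BufferManager and all their fields are hypothetical names, not taken
from xlog.c or bufmgr, and "disk" is simulated with counters.

```python
class Log:
    """Toy write-ahead log; each record gets an increasing position."""

    def __init__(self):
        self.next_pos = 1       # position the next record will receive
        self.flushed_pos = 0    # highest position known to be on disk
        self.fsync_calls = 0

    def append(self, record):
        pos = self.next_pos
        self.next_pos += 1
        return pos

    def flush_up_to(self, pos):
        # Stand-in for write() + fsync() on the log file.
        if pos > self.flushed_pos:
            self.flushed_pos = pos
            self.fsync_calls += 1


class Buffer:
    def __init__(self):
        self.dirty = False
        self.last_update_pos = 0   # log position of the latest change


class BufferManager:
    def __init__(self, log):
        self.log = log
        self.pages_written = 0

    def update(self, buf, record):
        # Changing a buffer only appends to the log; nothing is forced.
        buf.last_update_pos = self.log.append(record)
        buf.dirty = True

    def evict(self, buf):
        if buf.dirty:
            # Log first, data second: force the log if it lags behind.
            self.log.flush_up_to(buf.last_update_pos)
            assert self.log.flushed_pos >= buf.last_update_pos
            self.pages_written += 1   # stand-in for writing the data page
            buf.dirty = False
```

Updating a buffer twice and then evicting it costs one simulated log
fsync and one page write; evicting a clean buffer costs nothing.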

Re: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

From

Bruce Momjian

Date:

08 November 2000, 13:35:33

> > > > > New CHECKPOINT command.
> > > > > Auto removing of offline log files and creating new file
> > > > > at checkpoint time.
> > 
> > Can you tell me how to use CHECKPOINT please?
> 
> You shouldn't normally use it - postmaster will start backend
> each 3-5 minutes to do this automatically.
> 
> > > > Is this the same as a SAVEPOINT?
> > > 
> > > No. Checkpoints are to speedup after crash recovery and
> > > to remove/archive log files. With WAL server doesn't write
> > > any datafiles on commit, only commit record goes to log
> > > (and log fsync-ed). Dirty buffers remains in memory long
> > 
> > Is log fsynced even I turn of -F?
> 
> Yes, though we can change this. We also can implement now
> feature that Bruce wanted so long and so much -:) -
> fsync log not on each commit but each ~ 5sec, if
> losing some recent commits is acceptable.

Great.  I think this middle ground is something we could never address
before.


--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

RE: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

From

Tatsuo Ishii

Date:

10 November 2000, 21:37:01

> > Can you tell me how to use CHECKPOINT please?
> 
> You shouldn't normally use it - postmaster will start backend
> each 3-5 minutes to do this automatically.

Oh, I see.

> > > > Is this the same as a SAVEPOINT?
> > > 
> > > No. Checkpoints are to speedup after crash recovery and
> > > to remove/archive log files. With WAL server doesn't write
> > > any datafiles on commit, only commit record goes to log
> > > (and log fsync-ed). Dirty buffers remains in memory long

Ok, so with CHECKPOINTs, as I understand it, we could move the
offline log files somewhere else so that we could archive them.
Now the question is how we could recover from a disaster such as
losing every table file but still having the log files. Can we do
this with WAL? If so, how can we do it?

> > Is log fsynced even I turn of -F?
> 
> Yes, though we can change this. We also can implement now
> feature that Bruce wanted so long and so much -:) -
> fsync log not on each commit but each ~ 5sec, if
> losing some recent commits is acceptable.

Sounds great.
--
Tatsuo Ishii

Re: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

From

Alfred Perlstein

Date:

10 November 2000, 22:20:25

* Tatsuo Ishii <t-ishii@sra.co.jp> [001110 18:42] wrote:
> > 
> > Yes, though we can change this. We also can implement now
> > feature that Bruce wanted so long and so much -:) -
> > fsync log not on each commit but each ~ 5sec, if
> > losing some recent commits is acceptable.
> 
> Sounds great.

Not really, I thought an ack on a commit would mean that the data
is actually in stable storage, breaking that would be pretty bad
no?  Or are you only talking about when someone is running with
async Postgresql?

Although this doesn't have an effect on my current application,
when running Postgresql with sync commits and WAL can one expect
the old behavior, ie. success only after data and meta data (log)
are written?

Another question I had was what effect a mid-fsync crash would
have on a system using WAL. Let's say someone yanks the power
while the OS is in the midst of an fsync; will all be OK?

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."

Re: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

From

Bruce Momjian

Date:

11 November 2000, 03:08:34

> * Tatsuo Ishii <t-ishii@sra.co.jp> [001110 18:42] wrote:
> > > 
> > > Yes, though we can change this. We also can implement now
> > > feature that Bruce wanted so long and so much -:) -
> > > fsync log not on each commit but each ~ 5sec, if
> > > losing some recent commits is acceptable.
> > 
> > Sounds great.
> 
> Not really, I thought an ack on a commit would mean that the data
> is actually in stable storage, breaking that would be pretty bad
> no?  Or are you only talking about when someone is running with
> async Postgresql?

The default is to sync on commit, but we need to give people the option
of a delay of several seconds for performance reasons.  Informix calls it
buffered logging, and it is used by most of the sites I know because it
has much better performance than sync on commit.

If the machine crashes five seconds after commit, many people don't have
a problem with just re-entering the data.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

From

Tatsuo Ishii

Date:

11 November 2000, 04:01:05

> * Tatsuo Ishii <t-ishii@sra.co.jp> [001110 18:42] wrote:
> > > 
> > > Yes, though we can change this. We also can implement now
> > > feature that Bruce wanted so long and so much -:) -
> > > fsync log not on each commit but each ~ 5sec, if
> > > losing some recent commits is acceptable.
> > 
> > Sounds great.
> 
> Not really, I thought an ack on a commit would mean that the data
> is actually in stable storage, breaking that would be pretty bad
> no?  Or are you only talking about when someone is running with
> async Postgresql?
> 
> Although this doesn't have an effect on my current application,
> when running Postgresql with sync commits and WAL can one expect
> the old behavior, ie. success only after data and meta data (log)
> are written?

You probably misunderstand what Bruce expected to have. He wished to
have not-every-time fsync as an *option*. I believe we will do strict
fsync by default.
--
Tatsuo Ishii

Re: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

From

Alfred Perlstein

Date:

11 November 2000, 05:24:43

* Bruce Momjian <pgman@candle.pha.pa.us> [001111 00:16] wrote:
> > * Tatsuo Ishii <t-ishii@sra.co.jp> [001110 18:42] wrote:
> > > > 
> > > > Yes, though we can change this. We also can implement now
> > > > feature that Bruce wanted so long and so much -:) -
> > > > fsync log not on each commit but each ~ 5sec, if
> > > > losing some recent commits is acceptable.
> > > 
> > > Sounds great.
> > 
> > Not really, I thought an ack on a commit would mean that the data
> > is actually in stable storage, breaking that would be pretty bad
> > no?  Or are you only talking about when someone is running with
> > async Postgresql?
> 
> The default is to sync on commit, but we need to give people options of
> several seconds delay for performance reasons.  Informix calls it
> buffered logging, and it is used by most of the sites I know because it
> has much better performance than sync on commit.
> 
> If the machine crashes five seconds after commit, many people don't have
> a problem with just re-entering the data.

We have several critical tables and running certain updates/deletes/inserts
on them in async mode worries me.  Would it be possible to add a
'set' command to force a backend into fsync mode and perhaps back
into non-fsync mode as well?

What about setting an attribute on a table that could mean
a) anyone updating me better fsync me.
b) anyone updating me better fsync me as well as fsyncing  anything else they touch.

I swear one of these days I'm going to get more familiar with the
codebase and actually submit some useful patches for the backend.
:(

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]

Re: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

From

Tom Lane

Date:

11 November 2000, 11:02:50

Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> Not really, I thought an ack on a commit would mean that the data
>> is actually in stable storage, breaking that would be pretty bad
>> no?

> The default is to sync on commit, but we need to give people options of
> several seconds delay for performance reasons.  Informix calls it
> buffered logging, and it is used by most of the sites I know because it
> has much better performance than sync on commit.

I have to agree with Alfred here: this does not sound like a feature,
it sounds like a horrid hack.  You're giving up *all* consistency
guarantees for a performance gain that is really going to be pretty
minimal in the WAL context.

Earlier, Vadim was talking about arranging to share fsyncs of the WAL
log file across transactions (after writing your commit record to the
log, sleep a few milliseconds to see if anyone else fsyncs before you
do; if not, issue the fsync yourself).  That would offer less-than-
one-fsync-per-transaction performance without giving up any guarantees.
If people feel a compulsion to have a tunable parameter, let 'em tune
the length of the pre-fsync sleep ...
        regards, tom lane
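
The shared-fsync idea described above (write your commit record, sleep
briefly, and fsync only if nobody else already has) can be sketched as
follows. This is a minimal model, not the actual backend code:
SharedLog, commit_delay and the injectable sleep function are
hypothetical names, and the log is simulated with counters.

```python
import threading
import time


class SharedLog:
    """Toy WAL shared by several committing backends."""

    def __init__(self, commit_delay=0.005, sleep=time.sleep):
        self._lock = threading.Lock()
        self.written_pos = 0     # last commit record written to the log
        self.flushed_pos = 0     # last position covered by an fsync
        self.fsync_calls = 0
        self.commit_delay = commit_delay   # the tunable pre-fsync sleep
        self._sleep = sleep

    def write_commit_record(self):
        with self._lock:
            self.written_pos += 1
            return self.written_pos

    def fsync(self):
        with self._lock:
            # One fsync covers every record written so far.
            self.flushed_pos = self.written_pos
            self.fsync_calls += 1

    def commit(self):
        pos = self.write_commit_record()
        self._sleep(self.commit_delay)     # give others a chance to fsync
        with self._lock:
            already_durable = self.flushed_pos >= pos
        if not already_durable:
            self.fsync()                   # nobody did it, so do it ourselves
        # The commit is acknowledged only once its record is durable.
        return pos
```

Two backends may occasionally both decide to fsync; that costs an
extra call but never acknowledges a commit that is not yet on disk,
which is the property the sleep-based scheme is meant to preserve.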

Re: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

From

Bruce Momjian

Date:

11 November 2000, 14:14:21

> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >> Not really, I thought an ack on a commit would mean that the data
> >> is actually in stable storage, breaking that would be pretty bad
> >> no?
> 
> > The default is to sync on commit, but we need to give people options of
> > several seconds delay for performance reasons.  Informix calls it
> > buffered logging, and it is used by most of the sites I know because it
> > has much better performance than sync on commit.
> 
> I have to agree with Alfred here: this does not sound like a feature,
> it sounds like a horrid hack.  You're giving up *all* consistency
> guarantees for a performance gain that is really going to be pretty
> minimal in the WAL context.

It does not give up consistency.  The db is still consistent, it is just
consistent from a few seconds ago, rather than commit time.  This is
standard Informix practice at most law firms I work with.

> 
> Earlier, Vadim was talking about arranging to share fsyncs of the WAL
> log file across transactions (after writing your commit record to the
> log, sleep a few milliseconds to see if anyone else fsyncs before you
> do; if not, issue the fsync yourself).  That would offer less-than-
> one-fsync-per-transaction performance without giving up any guarantees.
> If people feel a compulsion to have a tunable parameter, let 'em tune
> the length of the pre-fsync sleep ...

That would work.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

From

Tom Lane

Date:

11 November 2000, 15:03:56

Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> I have to agree with Alfred here: this does not sound like a feature,
>> it sounds like a horrid hack.  You're giving up *all* consistency
>> guarantees for a performance gain that is really going to be pretty
>> minimal in the WAL context.

> It does not give up consistency.  The db is still consistent, it is just
> consistent from a few seconds ago, rather than commit time.

No, it isn't consistent.  Without the fsync you don't know what order
the kernel will choose to plop down WAL log blocks in; you could end up
with a corrupt log.  (Actually, perhaps that could be worked around if
the log blocks are suitably marked so that you can tell where the last
sequentially valid one is.  I haven't looked at the log structure in
any detail...)
        regards, tom lane
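
The workaround hinted at in the parenthesis (mark log blocks so that
the last sequentially valid one can be found) might look roughly like
this. The record layout here, a sequence number, a length and a CRC-32
in front of each payload, is an assumption made for illustration and
is not the actual WAL format; encode_record and valid_prefix are
hypothetical helpers.

```python
import struct
import zlib

# Hypothetical header: sequence number, payload length, CRC-32.
HEADER = struct.Struct("<III")


def encode_record(seq, payload):
    crc = zlib.crc32(struct.pack("<II", seq, len(payload)) + payload)
    return HEADER.pack(seq, len(payload), crc) + payload


def valid_prefix(log_bytes, first_seq=1):
    """Return (payloads, end_offset) for the longest valid run of records."""
    payloads = []
    offset = 0
    expected = first_seq
    while offset + HEADER.size <= len(log_bytes):
        seq, length, crc = HEADER.unpack_from(log_bytes, offset)
        start = offset + HEADER.size
        payload = log_bytes[start:start + length]
        # Stop at a gap in the sequence or at a truncated record.
        if seq != expected or len(payload) != length:
            break
        # Stop at a record whose contents do not match its checksum.
        if zlib.crc32(struct.pack("<II", seq, length) + payload) != crc:
            break
        payloads.append(payload)
        offset = start + length
        expected += 1
    return payloads, offset
```

Recovery under this scheme would replay only the returned prefix and
treat everything after end_offset as never written, even if later
blocks happened to reach the disk.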

Re: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

From

Bruce Momjian

Date:

11 November 2000, 15:11:34

> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >> I have to agree with Alfred here: this does not sound like a feature,
> >> it sounds like a horrid hack.  You're giving up *all* consistency
> >> guarantees for a performance gain that is really going to be pretty
> >> minimal in the WAL context.
> 
> > It does not give up consistency.  The db is still consistent, it is just
> > consistent from a few seconds ago, rather than commit time.
> 
> No, it isn't consistent.  Without the fsync you don't know what order
> the kernel will choose to plop down WAL log blocks in; you could end up
> with a corrupt log.  (Actually, perhaps that could be worked around if
> the log blocks are suitably marked so that you can tell where the last
> sequentially valid one is.  I haven't looked at the log structure in
> any detail...)
> 

I am just suggesting that instead of flushing the log on every
transaction end, just do it every X seconds.
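
A rough model of this "flush every X seconds" option, and of what it
trades away, is sketched below. BufferedLog, flush_interval and the
injectable clock are hypothetical names; the sketch checks the timer
only when a commit arrives, whereas a real implementation would
presumably flush from a background timer as well.

```python
import time


class BufferedLog:
    """Toy log that acknowledges commits before they are flushed."""

    def __init__(self, flush_interval=5.0, clock=time.monotonic):
        self.flush_interval = flush_interval   # the "X seconds"
        self._clock = clock
        self._last_flush = clock()
        self.pending = []    # acknowledged but not yet on disk
        self.durable = []    # stand-in for records that were fsynced

    def commit(self, record):
        self.pending.append(record)
        if self._clock() - self._last_flush >= self.flush_interval:
            self.flush()
        return True          # acknowledged whether or not it was flushed

    def flush(self):
        self.durable.extend(self.pending)
        self.pending.clear()
        self._last_flush = self._clock()

    def crash(self):
        """Simulate power loss; return the acknowledged commits that vanish."""
        lost = list(self.pending)
        self.pending.clear()
        return lost
```

Whatever crash() returns is exactly the set of commits a client was
told had succeeded but would have to re-enter.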

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

From

Bruce Momjian

Date:

11 November 2000, 15:15:06

> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >> I have to agree with Alfred here: this does not sound like a feature,
> >> it sounds like a horrid hack.  You're giving up *all* consistency
> >> guarantees for a performance gain that is really going to be pretty
> >> minimal in the WAL context.
> 
> > It does not give up consistency.  The db is still consistent, it is just
> > consistent from a few seconds ago, rather than commit time.
> 
> No, it isn't consistent.  Without the fsync you don't know what order
> the kernel will choose to plop down WAL log blocks in; you could end up
> with a corrupt log.  (Actually, perhaps that could be worked around if
> the log blocks are suitably marked so that you can tell where the last
> sequentially valid one is.  I haven't looked at the log structure in
> any detail...)

Well, WAL already has to be careful in the order it plops down the log
blocks because a single transaction can span multiple log blocks.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)

From

Alfred Perlstein

Date:

12 November 2000, 02:06:32

* Tom Lane <tgl@sss.pgh.pa.us> [001111 12:06] wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >> I have to agree with Alfred here: this does not sound like a feature,
> >> it sounds like a horrid hack.  You're giving up *all* consistency
> >> guarantees for a performance gain that is really going to be pretty
> >> minimal in the WAL context.
> 
> > It does not give up consistency.  The db is still consistent, it is just
> > consistent from a few seconds ago, rather than commit time.
> 
> No, it isn't consistent.  Without the fsync you don't know what order
> the kernel will choose to plop down WAL log blocks in; you could end up
> with a corrupt log.  (Actually, perhaps that could be worked around if
> the log blocks are suitably marked so that you can tell where the last
> sequentially valid one is.  I haven't looked at the log structure in
> any detail...)

This could be fixed by using O_FSYNC on the open call for the WAL
data files on *BSD. I'm not sure of the SysV equivalent, but I know
it exists.
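
As a hedged illustration of opening a log file for synchronous writes:
the sketch below uses O_SYNC, which is the POSIX spelling and, as far
as I know, what the BSD O_FSYNC flag corresponds to. The helper names
and the file name wal.log are made up for the example, and the
fallback to an explicit fsync on platforms without the flag is my own
assumption, not something proposed in the thread.

```python
import os
import tempfile


def open_log_synchronous(path):
    # With O_SYNC each write() returns only after the data is on disk.
    sync_flag = getattr(os, "O_SYNC", 0)   # 0 where the flag is missing
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND | sync_flag
    return os.open(path, flags, 0o600)


def append_record(fd, record):
    written = os.write(fd, record)
    if not hasattr(os, "O_SYNC"):
        os.fsync(fd)   # explicit flush when the open flag is unavailable
    return written


# Small demonstration against a scratch file.
with tempfile.TemporaryDirectory() as scratch:
    log_path = os.path.join(scratch, "wal.log")
    fd = open_log_synchronous(log_path)
    try:
        sizes = [append_record(fd, b"commit 1\n"),
                 append_record(fd, b"commit 2\n")]
    finally:
        os.close(fd)
    with open(log_path, "rb") as f:
        contents = f.read()
```

This makes every write durable before it returns; it does not by
itself reduce the number of physical flushes per transaction.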

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."