Thread: RE: RE: [COMMITTERS] pgsql/src/backend/access/transam ( xact.c xlog.c)
> > > > New CHECKPOINT command.
> > > > Auto removing of offline log files and creating new file
> > > > at checkpoint time.
> >
> > Can you tell me how to use CHECKPOINT please?

You shouldn't normally need to use it - the postmaster will start a
backend every 3-5 minutes to do this automatically.

> > > Is this the same as a SAVEPOINT?
> >
> > No. Checkpoints are to speedup after crash recovery and
> > to remove/archive log files. With WAL server doesn't write
> > any datafiles on commit, only commit record goes to log
> > (and log fsync-ed). Dirty buffers remains in memory long
>
> > Is log fsynced even I turn of -F?

Yes, though we can change this. We can also now implement the
feature that Bruce has wanted so long and so much -:) -
fsync the log not on each commit but every ~5 sec, if
losing some recent commits is acceptable.

Nevertheless, when bufmgr replaces a dirty buffer it must first ensure
that the log record of the last update to that buffer is already on
disk, so bufmgr forces a log fsync if required. This cannot be
changed - the rule is simple: log before applying changes to
permanent storage.

Vadim
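The "log before data" rule described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the actual xlog.c or bufmgr code: the class names Log and BufferManager, the use of a record count as the LSN, and the in-memory "disk" are all hypothetical.

```python
class Log:
    """Toy write-ahead log: records stay in memory until flush() 'fsyncs' them."""

    def __init__(self):
        self.records = []       # every record appended so far
        self.flushed_upto = 0   # number of records known to be on disk
        self.fsync_count = 0    # how many times we had to force the log

    def append(self, record):
        self.records.append(record)
        return len(self.records)   # LSN = position just past this record

    def flush(self, upto):
        # Force the log only if the requested LSN is not durable yet.
        if upto > self.flushed_upto:
            self.flushed_upto = len(self.records)
            self.fsync_count += 1


class BufferManager:
    def __init__(self, log):
        self.log = log
        self.dirty = {}   # page id -> (contents, LSN of last update)
        self.disk = {}    # page id -> contents in the data file

    def update(self, page, contents):
        lsn = self.log.append(("update", page, contents))
        self.dirty[page] = (contents, lsn)

    def evict(self, page):
        contents, lsn = self.dirty.pop(page)
        self.log.flush(lsn)          # log first ...
        self.disk[page] = contents   # ... then the permanent storage


log = Log()
bufmgr = BufferManager(log)
bufmgr.update("page-1", "v1")
bufmgr.update("page-2", "v1")
bufmgr.evict("page-1")   # forces one log flush, which also covers page-2
```

In this model, evicting page-2 afterwards needs no further log flush, because its last log record was already made durable by the first one.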
> > > > > New CHECKPOINT command.
> > > > > Auto removing of offline log files and creating new file
> > > > > at checkpoint time.
> > >
> > > Can you tell me how to use CHECKPOINT please?
>
> You shouldn't normally use it - postmaster will start backend
> each 3-5 minutes to do this automatically.
>
> > > > Is this the same as a SAVEPOINT?
> > >
> > > No. Checkpoints are to speedup after crash recovery and
> > > to remove/archive log files. With WAL server doesn't write
> > > any datafiles on commit, only commit record goes to log
> > > (and log fsync-ed). Dirty buffers remains in memory long
> >
> > > Is log fsynced even I turn of -F?
>
> Yes, though we can change this. We also can implement now
> feature that Bruce wanted so long and so much -:) -
> fsync log not on each commit but each ~ 5sec, if
> losing some recent commits is acceptable.

Great. I think this middle ground is something we could never address
before.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
> > Can you tell me how to use CHECKPOINT please?
>
> You shouldn't normally use it - postmaster will start backend
> each 3-5 minutes to do this automatically.

Oh, I see.

> > > > Is this the same as a SAVEPOINT?
> > >
> > > No. Checkpoints are to speedup after crash recovery and
> > > to remove/archive log files. With WAL server doesn't write
> > > any datafiles on commit, only commit record goes to log
> > > (and log fsync-ed). Dirty buffers remains in memory long

Ok, so with CHECKPOINT we could move the offline log files
somewhere else so that we could archive them, in my understanding.
Now the question is how we could recover from a disaster like losing
every table file except the log files. Can we do this with WAL? If so,
how can we do it?

> > Is log fsynced even I turn of -F?
>
> Yes, though we can change this. We also can implement now
> feature that Bruce wanted so long and so much -:) -
> fsync log not on each commit but each ~ 5sec, if
> losing some recent commits is acceptable.

Sounds great.
--
Tatsuo Ishii
* Tatsuo Ishii <t-ishii@sra.co.jp> [001110 18:42] wrote:
> >
> > Yes, though we can change this. We also can implement now
> > feature that Bruce wanted so long and so much -:) -
> > fsync log not on each commit but each ~ 5sec, if
> > losing some recent commits is acceptable.
>
> Sounds great.

Not really. I thought an ack on a commit would mean that the data
is actually in stable storage; breaking that would be pretty bad,
no? Or are you only talking about when someone is running with
async PostgreSQL?

Although this doesn't have an effect on my current application,
when running PostgreSQL with sync commits and WAL, can one expect
the old behavior, i.e. success only after data and metadata (log)
are written?

Another question I had was what effect a mid-fsync crash would have
on a system using WAL. Let's say someone yanks the power while the OS
is in the midst of an fsync: will all be ok?

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."
> * Tatsuo Ishii <t-ishii@sra.co.jp> [001110 18:42] wrote:
> > >
> > > Yes, though we can change this. We also can implement now
> > > feature that Bruce wanted so long and so much -:) -
> > > fsync log not on each commit but each ~ 5sec, if
> > > losing some recent commits is acceptable.
> >
> > Sounds great.
>
> Not really, I thought an ack on a commit would mean that the data
> is actually in stable storage, breaking that would be pretty bad
> no? Or are you only talking about when someone is running with
> async Postgresql?

The default is to sync on commit, but we need to give people the option
of a delay of several seconds for performance reasons. Informix calls it
buffered logging, and it is used by most of the sites I know because it
has much better performance than sync on commit.

If the machine crashes five seconds after commit, many people don't have
a problem with just re-entering the data.

--
  Bruce Momjian
> * Tatsuo Ishii <t-ishii@sra.co.jp> [001110 18:42] wrote:
> > >
> > > Yes, though we can change this. We also can implement now
> > > feature that Bruce wanted so long and so much -:) -
> > > fsync log not on each commit but each ~ 5sec, if
> > > losing some recent commits is acceptable.
> >
> > Sounds great.
>
> Not really, I thought an ack on a commit would mean that the data
> is actually in stable storage, breaking that would be pretty bad
> no? Or are you only talking about when someone is running with
> async Postgresql?
>
> Although this doesn't have an effect on my current application,
> when running Postgresql with sync commits and WAL can one expect
> the old behavior, ie. success only after data and meta data (log)
> are written?

You have probably misunderstood what Bruce wanted. He wished
to have not-every-time fsync as an *option*. I believe we will do
strict fsync by default.
--
Tatsuo Ishii
* Bruce Momjian <pgman@candle.pha.pa.us> [001111 00:16] wrote:
> > * Tatsuo Ishii <t-ishii@sra.co.jp> [001110 18:42] wrote:
> > > >
> > > > Yes, though we can change this. We also can implement now
> > > > feature that Bruce wanted so long and so much -:) -
> > > > fsync log not on each commit but each ~ 5sec, if
> > > > losing some recent commits is acceptable.
> > >
> > > Sounds great.
> >
> > Not really, I thought an ack on a commit would mean that the data
> > is actually in stable storage, breaking that would be pretty bad
> > no? Or are you only talking about when someone is running with
> > async Postgresql?
>
> The default is to sync on commit, but we need to give people options of
> several seconds delay for performance reasons. Informix calls it
> buffered logging, and it is used by most of the sites I know because it
> has much better performance than sync on commit.
>
> If the machine crashes five seconds after commit, many people don't have
> a problem with just re-entering the data.

We have several critical tables, and running certain
updates/deletes/inserts on them in async mode worries me. Would it be
possible to add a 'set' command to force a backend into fsync mode
and perhaps back into non-fsync mode as well?

What about setting an attribute on a table that could mean
 a) anyone updating me had better fsync me.
 b) anyone updating me had better fsync me as well as fsyncing
    anything else they touch.

I swear one of these days I'm going to get more familiar with the
codebase and actually submit some useful patches for the backend. :(

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> Not really, I thought an ack on a commit would mean that the data
>> is actually in stable storage, breaking that would be pretty bad
>> no?

> The default is to sync on commit, but we need to give people options of
> several seconds delay for performance reasons. Informix calls it
> buffered logging, and it is used by most of the sites I know because it
> has much better performance than sync on commit.

I have to agree with Alfred here: this does not sound like a feature,
it sounds like a horrid hack. You're giving up *all* consistency
guarantees for a performance gain that is really going to be pretty
minimal in the WAL context.

Earlier, Vadim was talking about arranging to share fsyncs of the WAL
log file across transactions (after writing your commit record to the
log, sleep a few milliseconds to see if anyone else fsyncs before you
do; if not, issue the fsync yourself). That would offer
less-than-one-fsync-per-transaction performance without giving up any
guarantees. If people feel a compulsion to have a tunable parameter,
let 'em tune the length of the pre-fsync sleep ...

			regards, tom lane
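The shared-fsync idea in the message above (write your commit record, sleep briefly, fsync only if nobody else already covered you) could look roughly like the following. This is a minimal sketch, not the WAL implementation: the class GroupCommitLog, its counters, and the 10 ms default delay are assumptions made for illustration, and the "fsync" is simulated by advancing a counter.

```python
import threading
import time


class GroupCommitLog:
    def __init__(self, delay=0.01):
        self.delay = delay          # pre-fsync sleep: the tunable parameter
        self.lock = threading.Lock()
        self.end = 0                # LSN of the last commit record written
        self.flushed_upto = 0       # LSN covered by the most recent fsync
        self.fsync_count = 0

    def commit(self):
        with self.lock:
            self.end += 1           # write our commit record to the log
            my_lsn = self.end
        time.sleep(self.delay)      # wait to see if someone else fsyncs
        with self.lock:
            if self.flushed_upto < my_lsn:
                # Nobody covered us, so we fsync; this one flush makes every
                # record written so far durable, not just our own.
                self.flushed_upto = self.end
                self.fsync_count += 1
        return my_lsn               # returns only once our record is durable


def run(n_transactions=8):
    log = GroupCommitLog()
    threads = [threading.Thread(target=log.commit) for _ in range(n_transactions)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


log = run()
print(log.fsync_count, "fsyncs for", log.end, "commits")
```

No commit returns before its record is covered by an fsync, so the durability guarantee is kept. The exact number of fsyncs depends on thread timing, but it can never exceed the number of commits.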
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >> Not really, I thought an ack on a commit would mean that the data
> >> is actually in stable storage, breaking that would be pretty bad
> >> no?
>
> > The default is to sync on commit, but we need to give people options of
> > several seconds delay for performance reasons. Informix calls it
> > buffered logging, and it is used by most of the sites I know because it
> > has much better performance than sync on commit.
>
> I have to agree with Alfred here: this does not sound like a feature,
> it sounds like a horrid hack. You're giving up *all* consistency
> guarantees for a performance gain that is really going to be pretty
> minimal in the WAL context.

It does not give up consistency. The db is still consistent; it is just
consistent as of a few seconds ago, rather than as of commit time. This
is standard Informix practice at most law firms I work with.

> Earlier, Vadim was talking about arranging to share fsyncs of the WAL
> log file across transactions (after writing your commit record to the
> log, sleep a few milliseconds to see if anyone else fsyncs before you
> do; if not, issue the fsync yourself). That would offer less-than-
> one-fsync-per-transaction performance without giving up any guarantees.
> If people feel a compulsion to have a tunable parameter, let 'em tune
> the length of the pre-fsync sleep ...

That would work.

--
  Bruce Momjian
Bruce Momjian <pgman@candle.pha.pa.us> writes:
>> I have to agree with Alfred here: this does not sound like a feature,
>> it sounds like a horrid hack. You're giving up *all* consistency
>> guarantees for a performance gain that is really going to be pretty
>> minimal in the WAL context.

> It does not give up consistency. The db is still consistent, it is just
> consistent from a few seconds ago, rather than commit time.

No, it isn't consistent. Without the fsync you don't know what order
the kernel will choose to plop down WAL log blocks in; you could end up
with a corrupt log. (Actually, perhaps that could be worked around if
the log blocks are suitably marked so that you can tell where the last
sequentially valid one is. I haven't looked at the log structure in
any detail...)

			regards, tom lane
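One way to "suitably mark" log records so that recovery can find the last sequentially valid one is to give each record a sequence number and a checksum, and stop reading at the first record that fails either check. The sketch below assumes a made-up record layout (sequence number, length, CRC-32); it is not the actual WAL record format, which the message above explicitly had not examined.

```python
import struct
import zlib

HEADER = struct.Struct("<III")   # sequence number, payload length, CRC-32


def encode(seq, payload):
    crc = zlib.crc32(struct.pack("<II", seq, len(payload)) + payload)
    return HEADER.pack(seq, len(payload), crc) + payload


def last_valid_prefix(data):
    """Return payloads of the longest valid run of records from the start."""
    payloads, offset, expected_seq = [], 0, 1
    while offset + HEADER.size <= len(data):
        seq, length, crc = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        body = data[start:start + length]
        if seq != expected_seq or len(body) != length:
            break   # out of sequence or truncated: stop here
        if zlib.crc32(struct.pack("<II", seq, length) + body) != crc:
            break   # contents never reached disk intact
        payloads.append(body)
        offset = start + length
        expected_seq += 1
    return payloads


log = b"".join(encode(i, b"commit %d" % i) for i in range(1, 5))

# Simulate a crash where the kernel wrote record 4 but not record 3:
# overwrite the first bytes of the third record with zeros.
third = 2 * len(encode(1, b"commit 1"))
damaged = log[:third] + b"\x00" * 8 + log[third + 8:]
```

Scanning `damaged` yields only the first two payloads, even though the fourth record is intact on disk; everything after the first invalid record is ignored.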
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >> I have to agree with Alfred here: this does not sound like a feature,
> >> it sounds like a horrid hack. You're giving up *all* consistency
> >> guarantees for a performance gain that is really going to be pretty
> >> minimal in the WAL context.
>
> > It does not give up consistency. The db is still consistent, it is just
> > consistent from a few seconds ago, rather than commit time.
>
> No, it isn't consistent. Without the fsync you don't know what order
> the kernel will choose to plop down WAL log blocks in; you could end up
> with a corrupt log. (Actually, perhaps that could be worked around if
> the log blocks are suitably marked so that you can tell where the last
> sequentially valid one is. I haven't looked at the log structure in
> any detail...)

I am just suggesting that instead of flushing the log at the end of
every transaction, we do it every X seconds.

--
  Bruce Momjian
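To make the trade-off of "flush every X seconds" concrete, here is a small deterministic model. It is a sketch only: the class BufferedLog is hypothetical, time is passed in explicitly instead of read from a clock, and the flush is checked at commit time rather than by a separate timer.

```python
class BufferedLog:
    def __init__(self, interval):
        self.interval = interval   # the "X seconds" between log flushes
        self.last_flush = 0.0
        self.pending = []          # acknowledged commits not yet on disk
        self.durable = []          # commits that would survive a crash

    def commit(self, txn, now):
        self.pending.append(txn)   # acknowledged without waiting for fsync
        if now - self.last_flush >= self.interval:
            self.flush(now)

    def flush(self, now):
        self.durable.extend(self.pending)
        self.pending.clear()
        self.last_flush = now


log = BufferedLog(interval=5.0)
transactions = ["t1", "t2", "t3", "t4", "t5", "t6", "t7"]
for second, txn in enumerate(transactions, start=1):
    log.commit(txn, now=float(second))

# A crash at this point loses whatever is still in log.pending.
```

With one commit per second and a 5-second interval, the commits acknowledged after the last flush are exactly the "recent commits" that would have to be re-entered after a crash.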
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >> I have to agree with Alfred here: this does not sound like a feature,
> >> it sounds like a horrid hack. You're giving up *all* consistency
> >> guarantees for a performance gain that is really going to be pretty
> >> minimal in the WAL context.
>
> > It does not give up consistency. The db is still consistent, it is just
> > consistent from a few seconds ago, rather than commit time.
>
> No, it isn't consistent. Without the fsync you don't know what order
> the kernel will choose to plop down WAL log blocks in; you could end up
> with a corrupt log. (Actually, perhaps that could be worked around if
> the log blocks are suitably marked so that you can tell where the last
> sequentially valid one is. I haven't looked at the log structure in
> any detail...)

Well, WAL already has to be careful about the order in which it plops
down the log blocks, because a single transaction can span multiple log
blocks.

--
  Bruce Momjian
* Tom Lane <tgl@sss.pgh.pa.us> [001111 12:06] wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> >> I have to agree with Alfred here: this does not sound like a feature,
> >> it sounds like a horrid hack. You're giving up *all* consistency
> >> guarantees for a performance gain that is really going to be pretty
> >> minimal in the WAL context.
>
> > It does not give up consistency. The db is still consistent, it is just
> > consistent from a few seconds ago, rather than commit time.
>
> No, it isn't consistent. Without the fsync you don't know what order
> the kernel will choose to plop down WAL log blocks in; you could end up
> with a corrupt log. (Actually, perhaps that could be worked around if
> the log blocks are suitably marked so that you can tell where the last
> sequentially valid one is. I haven't looked at the log structure in
> any detail...)

This could be fixed by using O_FSYNC on the open call for the WAL
data files on *BSD. I'm not sure of the SysV equivalent, but I know
it exists.

--
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
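Opening the log file for synchronous writes, as suggested above, might be sketched like this. Assumptions: the example uses the POSIX flag name O_SYNC (to my understanding O_FSYNC is the BSD spelling of the same flag), falls back to no flag on platforms where Python does not expose it, and writes to a throwaway file named wal.log, which is a hypothetical name.

```python
import os
import tempfile

# O_SYNC is the POSIX spelling; use 0 (no effect) where it is unavailable.
SYNC_FLAG = getattr(os, "O_SYNC", 0)


def append_synchronously(path, record):
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND | SYNC_FLAG
    fd = os.open(path, flags, 0o600)
    try:
        # With O_SYNC set, write() does not return until the data has been
        # handed to stable storage, so records land in the order written.
        os.write(fd, record)
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    wal = os.path.join(tmp, "wal.log")   # hypothetical file name
    append_synchronously(wal, b"commit 1\n")
    append_synchronously(wal, b"commit 2\n")
    with open(wal, "rb") as f:
        contents = f.read()
```

The cost is that every write becomes a synchronous one, so this trades the explicit fsync call for slower individual writes rather than removing the wait.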