Thread: Insert Performance with WAL and Fsync
Hi all,

I've posted this to -admin with no responses -- looks like I should have
posted it to -general, so here goes:

Our application needs to do a moderate level of inserts into the database as
it runs. We can hit 100-200 inserts a second under certain conditions, and it
is not something we can batch up for later processing. While testing out
Postgres I noticed some troubling performance data under this sort of
workload.

I've got a run of pgbench which exhibits the behavior. This variant of
pgbench is a good approximation of our application and the results mirror
those of tests I did with our app, so changing the scaling, client factors,
etc. of pgbench is not relevant. I tried changing most other parameters of
postgresql.conf (log files, buffers, etc.) and none have such dramatic
effects as illustrated below. It appears the CPU utilization on both machines
is very low (<15%), so I'm guessing it is mostly I/O overhead.

At the bottom of the message is the detailed result of pgbench -t 500 under
the following conditions:

                wal_sync_method = open_sync      wal_sync_method = fsync
  OS            FSYNC OFF       FSYNC ON         FSYNC ON
  Solaris       74/75           22/22            30/30
  Win2k         112/113         124/127          21/21

So the following questions arise:

1) How can I improve the Solaris performance without disabling fsync?
2) What are others getting for TPS on Solaris machines with fsync on? Are my
   numbers low/high/right in the ballpark?
3) Why does the Solaris performance with fsync on/off differ by a factor of
   3.4x while the Windows fsync on/off differs by only 1.1x? I thought WAL
   was supposed to dramatically reduce the cost of fsyncs?
4) Why does the Win2k behavior with fsync and open_sync differ so greatly?
   Is fsync on cygwin slow, or does open_sync not work properly (i.e. is not
   really syncing)?
5) Assuming the Win2k open_sync implementation is working, why is my Win2k
   laptop, with a much slower hard drive, 5.6x faster with fsync on than the
   Solaris box?
6) Is there anything I could have done in building Postgres from source to
   cause problems on Solaris?

Any help or insight would be greatly appreciated.

Mike

Here are the specs:

Win2k SP2
  PIII 1.2 GHz / 512 KB cache
  512 MB RAM
  1 4400 RPM IDE disk (it's a laptop)
  Postgres 7.1.3 on cygwin
  postgresql.conf default except where noted below

Solaris 8 01/01 with relatively recent patch sets
  E220R, 2 x 450 MHz UltraSPARC II
  2 GB RAM
  1 36 GB 10000 RPM SCSI disk
  Postgres 7.1.2 compiled using gcc
  postgresql.conf default except where noted below

Here are the results of pgbench:

Solaris: wal_sync_method = open_sync, fsync = false

  starting vacuum...end.
  transaction type: TPC-B (sort of)
  scaling factor: 1
  number of clients: 1
  number of transactions per client: 500
  number of transactions actually processed: 500/500
  tps = 74.825791 (including connections establishing)
  tps = 75.418729 (excluding connections establishing)

Solaris: wal_sync_method = open_sync, fsync = true

  transaction type: TPC-B (sort of)
  scaling factor: 1
  number of clients: 1
  number of transactions per client: 500
  number of transactions actually processed: 500/500
  tps = 22.691402 (including connections establishing)
  tps = 22.745163 (excluding connections establishing)

Solaris: wal_sync_method = fsync, fsync = true

  transaction type: TPC-B (sort of)
  scaling factor: 1
  number of clients: 1
  number of transactions per client: 500
  number of transactions actually processed: 500/500
  tps = 30.489359 (including connections establishing)
  tps = 30.591695 (excluding connections establishing)

Windows: wal_sync_method = open_sync, fsync = true

  starting vacuum...end.
  transaction type: TPC-B (sort of)
  scaling factor: 1
  number of clients: 1
  number of transactions per client: 500
  number of transactions actually processed: 500/500
  tps = 124.409521 (including connections establishing)
  tps = 127.591055 (excluding connections establishing)

Windows: wal_sync_method = open_sync, fsync = false

  transaction type: TPC-B (sort of)
  scaling factor: 1
  number of clients: 1
  number of transactions per client: 500
  number of transactions actually processed: 500/500
  tps = 112.103652 (including connections establishing)
  tps = 113.250950 (excluding connections establishing)

Windows: wal_sync_method = fsync, fsync = true

  scaling factor: 1
  number of clients: 1
  number of transactions per client: 500
  number of transactions actually processed: 500/500
  tps = 21.191655 (including connections establishing)
  tps = 21.232517 (excluding connections establishing)
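For anyone wanting to reproduce the runs above, they boil down to something
like the following sketch. The initialization step and the database name
"bench" are assumptions; the client and transaction counts match the output
shown:

  # build the pgbench tables at scaling factor 1 (assumed init step)
  pgbench -i -s 1 bench

  # single client, 500 transactions, as in every run above
  pgbench -c 1 -t 500 bench

The settings varied between runs live in postgresql.conf (7.1-era names) and
take effect after restarting the postmaster:

  fsync = true                  # or false
  wal_sync_method = open_sync   # or fsync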
Mike Schroepfer <mike@raplix.com> writes:
> It appears the CPU utilization on both machines is very low (<15%), so I'm
> guessing it is mostly I/O overhead.

It looks like your Solaris box is a dual CPU machine?  PG 7.1.* suffers
from pretty awful performance on multiprocessors, due to a rather
braindead implementation of spinlocks.  If vmstat shows that neither the
CPUs nor the disks are working real hard, then I'd suspect this to be
the problem --- cf
http://archives.postgresql.org/pgsql-hackers/2002-01/msg00449.php
and other recent pghackers threads.

You might care to try your tests on current development sources (*not*
7.2b4; pull from CVS or use a nightly-snapshot tarball).  I think we've
improved the SMP performance considerably since 7.1, though more could
probably be done in future.  BTW, don't put a production database on
current sources; there are at least two unpleasant known bugs.

> 3) Why does the Solaris performance with fsync on/off differ by a factor
>    of 3.4x while the Windows fsync on/off differs by only 1.1x? I thought
>    WAL was supposed to dramatically reduce the cost of fsyncs?
> 4) Why does the Win2k behavior with fsync and open_sync differ so greatly?
>    Is fsync on cygwin slow, or does open_sync not work properly (i.e. is
>    not really syncing)?

I don't know the innards of cygwin, but it would not surprise me in the
least to hear that it doesn't implement fsync & OPEN_SYNC efficiently
and/or correctly.  It has to sit atop Windows, which probably doesn't
have compatible APIs to support these behaviors reasonably.  The results
you show sure make it look like OPEN_SYNC is a no-op on cygwin...

			regards, tom lane
> It appears the CPU utilization on both machines is very low (<15%), so I'm
> guessing it is mostly I/O overhead.

Are you seeing anything due to swapping, &c?  What do memstat and
friends tell you?

> At the bottom of the message is the detailed result of pgbench -t 500
> under the following conditions:
>
>                 wal_sync_method = open_sync      wal_sync_method = fsync
>   OS            FSYNC OFF       FSYNC ON         FSYNC ON
>   Solaris       74/75           22/22            30/30
>   Win2k         112/113         124/127          21/21

We're using Solaris 7 and seeing considerably better performance than
that (mind you, we've got some honkin' big hardware underneath, and a
big RAID array with internal battery-backed smart caches, so I might
not be the right person to ask).

I note that you have IDE disks on your Win machine.  Are you sure you're
really getting an fsync?  Win2000, AFAIR, is pretty aggressive about
write-caching.  I may be wrong though -- I'm no Windows guy.

One thing I noticed is that the WAL commit_delay and commit_siblings
settings were a big help.  My theory was that WAL was costing us too
much, because we had such volume that WAL became a bottleneck -- it was
firing too quickly.  My answer was to increase those settings, and I
noticed an immediate improvement.  I had to increase the segments as
well in order to keep up; this takes slightly longer to recover in case
of a crash, of course, but not so long as to make the difference worth
worrying about.

> 1 36 GB 10000 RPM SCSI disk

I'd also worry about this 1-disk setup.  I'd be inclined to add a second
disk in order to put the WAL file on another spindle (or use RAID to
speed things up, but that's a lot more disk).

> Postgres 7.1.2 compiled using gcc

There are a couple of issues that make it worth the upgrade to 7.1.3.
See the archives.  Nothing to help your perf, though, if I recall.

If I can think of anything else, I'll let you know.

A
--
----
Andrew Sullivan                               87 Mowat Avenue
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M6K 3E3
                                         +1 416 646 3304 x110
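A quick sanity check on the write-caching suspicion, assuming roughly one
synchronous WAL flush per commit and no write cache (back-of-the-envelope
only):

  4400 RPM IDE laptop drive:   4400 / 60 ≈  73 platter revolutions/sec
  10000 RPM SCSI drive:       10000 / 60 ≈ 167 revolutions/sec

If every commit really waited on the platter, the laptop could not sustain
much more than ~73 commits/sec, yet it reports 124-127 tps with fsync on --
which is only plausible if the IDE drive (or the Win2k/cygwin layer) is
acknowledging writes out of a cache. The Solaris figures of 22-30 tps are at
least physically believable for one uncached flush per transaction on a
single spindle.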
Andrew,

Thanks for the reply -- responses below:

> > It appears the CPU utilization on both machines is very low (<15%), so
> > I'm guessing it is mostly I/O overhead.
>
> Are you seeing anything due to swapping, &c?  What do memstat and
> friends tell you?

No swapping problems at all -- lots of CPU and memory to go around (vmstat 5):

  procs     memory            page            disk          faults      cpu
  r b w   swap    free   re  mf pi po fr de sr s0 s1 s6 --   in   sy   cs us sy id
  0 0 0 2140608 1820520   2  15  1  1  1  0  0  2  0  0  0  430  130   71  1  1 99
  0 0 0 2091848 1674728  21  65  0  0  0  0  0 67  0  0  0  806 2151  633 18 15 67
  0 0 0 2091848 1673928  21  52  0  0  0  0  0 65  0  0  0  806 2081  611 20 15 64
  0 0 0 2092472 1674008  12  19  0  8  8  0  0 39  0  0  0  634 1073  333 10  6 84

> We're using Solaris 7 and seeing considerably better performance than
> that (mind you, we've got some honkin' big hardware underneath, and a
> big RAID array with internal battery-backed smart caches, so I might
> not be the right person to ask).

What kind of numbers are you getting, so I can get an idea of where I am
relative to that?

> One thing I noticed is that the WAL commit_delay and commit_siblings
> settings were a big help.  My theory was that WAL was costing us too
> much, because we had such volume that WAL became a bottleneck -- it was
> firing too quickly.  My answer was to increase those settings, and I
> noticed an immediate improvement.  I had to increase the segments as
> well in order to keep up; this takes slightly longer to recover in case
> of a crash, of course, but not so long as to make the difference worth
> worrying about.

Can you share your config?  I've played with it without much help -- I'm
probably missing something.

> > 1 36 GB 10000 RPM SCSI disk
>
> I'd also worry about this 1-disk setup.  I'd be inclined to add a second
> disk in order to put the WAL file on another spindle (or use RAID to
> speed things up, but that's a lot more disk).

Yeah -- this is not a production system; I'm trying to do some testing to
understand our app's characteristics and hardware requirements.  I certainly
would at least mirror for safety and would love to get the WAL onto another
spindle.

> > Postgres 7.1.2 compiled using gcc
>
> There are a couple of issues that make it worth the upgrade to 7.1.3.
> See the archives.  Nothing to help your perf, though, if I recall.

Hmm -- I looked through the logs a good while ago and didn't see anything
that I was too concerned about.  I'll double check though -- thanks!

> If I can think of anything else, I'll let you know.

Appreciate your help!

Mike
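Since Andrew doesn't quote actual values, here is the general shape of the
knobs he is describing -- a sketch only, with made-up numbers, for
postgresql.conf on 7.1:

  commit_delay = 100         # usec to wait before flushing a commit record,
                             # hoping to piggyback other commits on one flush
  commit_siblings = 5        # only delay if at least this many other
                             # transactions are currently active
  checkpoint_segments = 8    # more WAL segments between checkpoints
                             # (fewer checkpoints, longer crash recovery)

Note that commit_delay/commit_siblings only pay off with several concurrent
committers; a single-client pgbench run will not see much benefit. The
WAL-on-another-spindle suggestion is normally just a symlink (again a sketch:
/wal stands in for a hypothetical second disk, and the postmaster must be
stopped first):

  mv $PGDATA/pg_xlog /wal/pg_xlog
  ln -s /wal/pg_xlog $PGDATA/pg_xlog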
Tom,

Thanks for the prompt reply.  I have some more info for you.

Results:
a.1) 69 tps -- 7.1.2 (same build as before) on a uniprocessor Ultra 10 box with fsync
a.2) 52 tps -- 7.1.2 (same build as before) on a uniprocessor Ultra 10 box without fsync
b.1) 32 tps -- 7.1.2 on the dual-processor box again
b.2) 31 tps -- 7.2 tip of CVS on the dual-processor box
b.3) Output of vmstat 5 during b.1

So 7.2 doesn't appear to help.  For fun I also disabled one of the processors
and re-ran the tests.  Overall the numbers went down, not up.  So I do not
think it is the locking/SMP problem.  Any other thoughts?  I'm happy to run
tests/collect more info if it helps.

The other curious thing is that my little Ultra 10 is also beating the 220R.
Can anyone else chime in with comparative numbers so I know how good/bad
these are?

Detailed results below.  Thanks!

Mike

a.1)
  transaction type: TPC-B (sort of)
  scaling factor: 1
  number of clients: 1
  number of transactions per client: 500
  number of transactions actually processed: 500/500
  tps = 69.109284 (including connections establishing)
  tps = 69.580949 (excluding connections establishing)

a.2)
  scaling factor: 1
  number of clients: 1
  number of transactions per client: 500
  number of transactions actually processed: 500/500
  tps = 52.001209 (including connections establishing)
  tps = 52.283831 (excluding connections establishing)

b.1)
  transaction type: TPC-B (sort of)
  scaling factor: 1
  number of clients: 1
  number of transactions per client: 500
  number of transactions actually processed: 500/500
  tps = 32.517491 (including connections establishing)
  tps = 32.635141 (excluding connections establishing)

b.2)
  scaling factor: 1
  number of clients: 1
  number of transactions per client: 500
  number of transactions actually processed: 500/500
  tps = 31.648947 (including connections establishing)
  tps = 31.723237 (excluding connections establishing)

b.3) output of vmstat 5 during b.1:

  procs     memory            page            disk          faults      cpu
  r b w   swap    free   re  mf pi po fr de sr  s0 s1 s6 --   in   sy   cs us sy id
  0 0 0 2140760 1820952   2  15  1  1  1  0  0   2  0  0  0  430  130   71  1  1 99
  0 0 0 2093696 1690392  15 128  0 91 91  0  0 128  0  0  0 1232 1886  894 11  3 86
  0 0 0 2093440 1690048   5   0  0  0  0  0  0 168  0  0  0 1475 2213 1181 17  4 79
  0 0 0 2093440 1689824   3   0  0  1  1  0  0 167  0  0  0 1462 2228 1169 20  3 77
  0 0 0 2094472 1690568  27  42  0 12 12  0  0  36  0  0  0  630  646  286  4  1 94

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Thursday, January 10, 2002 1:42 PM
> To: Mike Schroepfer
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Insert Performance with WAL and Fsync
>
> Mike Schroepfer <mike@raplix.com> writes:
> > It appears the CPU utilization on both machines is very low (<15%), so
> > I'm guessing it is mostly I/O overhead.
>
> It looks like your Solaris box is a dual CPU machine?  PG 7.1.* suffers
> from pretty awful performance on multiprocessors, due to a rather
> braindead implementation of spinlocks.  If vmstat shows that neither the
> CPUs nor the disks are working real hard, then I'd suspect this to be
> the problem --- cf
> http://archives.postgresql.org/pgsql-hackers/2002-01/msg00449.php
> and other recent pghackers threads.
>
> You might care to try your tests on current development sources (*not*
> 7.2b4; pull from CVS or use a nightly-snapshot tarball).  I think we've
> improved the SMP performance considerably since 7.1, though more could
> probably be done in future.  BTW, don't put a production database on
> current sources; there are at least two unpleasant known bugs.
>
> > 3) Why does the Solaris performance with fsync on/off differ by a factor
> >    of 3.4x while the Windows fsync on/off differs by only 1.1x? I thought
> >    WAL was supposed to dramatically reduce the cost of fsyncs?
> > 4) Why does the Win2k behavior with fsync and open_sync differ so
> >    greatly? Is fsync on cygwin slow, or does open_sync not work properly
> >    (i.e. is not really syncing)?
>
> I don't know the innards of cygwin, but it would not surprise me in the
> least to hear that it doesn't implement fsync & OPEN_SYNC efficiently
> and/or correctly.  It has to sit atop Windows, which probably doesn't
> have compatible APIs to support these behaviors reasonably.  The results
> you show sure make it look like OPEN_SYNC is a no-op on cygwin...
>
> 			regards, tom lane
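A note for anyone repeating the single-CPU test: the thread doesn't show how
the second processor was disabled.  On Solaris this is normally done with
psrinfo/psradm -- an assumption about how it was done here, and the processor
id is just an example:

  psrinfo          # list processor ids and their on-line/off-line state
  psradm -f 1      # take processor 1 off-line
  psradm -n 1      # bring it back on-line afterwards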
Mike Schroepfer <mike@raplix.com> writes:
> b.1) 32 tps -- 7.1.2 on the dual-processor box again
> b.2) 31 tps -- 7.2 tip of CVS on the dual-processor box

> So 7.2 doesn't appear to help.

Mmm, yeah, I missed that you were only looking at the single-client case;
so the extra CPU has nothing to do anyway, and there's no spinlock
contention to cause a problem.  Most of the work we've been doing for 7.2
was concerned with improving multi-client performance.  But still, seems
odd that you can't do better than 32 tps on this setup.

> b.3) output of vmstat 5 during b.1:

>  procs     memory            page            disk          faults      cpu
>  r b w   swap    free   re  mf pi po fr de sr  s0 s1 s6 --   in   sy   cs us sy id
>  0 0 0 2140760 1820952   2  15  1  1  1  0  0   2  0  0  0  430  130   71  1  1 99
>  0 0 0 2093696 1690392  15 128  0 91 91  0  0 128  0  0  0 1232 1886  894 11  3 86

What are the "disk" numbers in this variant of vmstat?  Given the low
CPU usage, one would hope that your disk is going flat out ... but if
128 is page transfers per second, it's not very impressive...

			regards, tom lane
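(For reference: in Solaris vmstat the s0/s1/s6 columns under "disk" are disk
operations per second for each device, so 128 here means roughly 128 I/Os per
second on sd0.  A fuller per-device picture comes from extended iostat, e.g.

  iostat -x 5

which is essentially what the next message shows; the exact flags used there
are not stated.)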
> > b.3) output of vmstat 5 during b.1:
> >
> >  procs     memory            page            disk          faults      cpu
> >  r b w   swap    free   re  mf pi po fr de sr  s0 s1 s6 --   in   sy   cs us sy id
> >  0 0 0 2140760 1820952   2  15  1  1  1  0  0   2  0  0  0  430  130   71  1  1 99
> >  0 0 0 2093696 1690392  15 128  0 91 91  0  0 128  0  0  0 1232 1886  894 11  3 86
>
> What are the "disk" numbers in this variant of vmstat?  Given the low
> CPU usage, one would hope that your disk is going flat out ... but if
> 128 is page transfers per second, it's not very impressive...

Here's iostat: it's 184.8 I/Os per second (8K pages) = 1.1 MB/s, at 92% busy.
Not too impressive.  I'm following up to see if something is misconfigured.

  device  r/s    w/s   kr/s    kw/s  wait actv svc_t  %w  %b   tin tout  us sy wt id
  sd0     0.0  184.6    0.0  1145.3   0.0  1.7   9.0   0  92     0  116  13  3 20 64

Thanks again,

Mike
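Working those numbers against the ~32 tps from run b.1 (rough arithmetic,
assuming iostat was sampled during that run):

  184.6 writes/sec / ~32 transactions/sec  ≈ 6 writes per transaction
  1145.3 KB/sec    / 184.6 writes/sec      ≈ 6 KB per write

So the disk is ~92% busy doing many small synchronous writes at only about
1.1 MB/s: it is bound by per-write latency, not bandwidth.  That is
consistent with every commit forcing its own flush to a single spindle, and
it is exactly the pattern that commit_delay (with concurrent clients), a
separate WAL spindle, or a battery-backed write cache is meant to relieve.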