Re: Analysis of ganged WAL writes - Mailing list pgsql-hackers
From: Tom Lane
Subject: Re: Analysis of ganged WAL writes
Date:
Msg-id: 14533.1033945650@sss.pgh.pa.us
In response to: Analysis of ganged WAL writes (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Analysis of ganged WAL writes
           Re: Analysis of ganged WAL writes
List: pgsql-hackers
I said:
> There is a simple error
> in the current code that is easily corrected: in XLogFlush(), the
> wait to acquire WALWriteLock should occur before, not after, we try
> to acquire WALInsertLock and advance our local copy of the write
> request pointer.  (To be exact, xlog.c lines 1255-1269 in CVS tip
> ought to be moved down to before line 1275, inside the "if" that
> tests whether we are going to call XLogWrite.)

That patch was not quite right, as it didn't actually flush the
later-arriving data.  The correct patch is

*** src/backend/access/transam/xlog.c.orig	Thu Sep 26 18:58:33 2002
--- src/backend/access/transam/xlog.c	Sun Oct 6 18:45:57 2002
***************
*** 1252,1279 ****
  	/* done already? */
  	if (!XLByteLE(record, LogwrtResult.Flush))
  	{
- 		/* if something was added to log cache then try to flush this too */
- 		if (LWLockConditionalAcquire(WALInsertLock, LW_EXCLUSIVE))
- 		{
- 			XLogCtlInsert *Insert = &XLogCtl->Insert;
- 			uint32		freespace = INSERT_FREESPACE(Insert);
-
- 			if (freespace < SizeOfXLogRecord)	/* buffer is full */
- 				WriteRqstPtr = XLogCtl->xlblocks[Insert->curridx];
- 			else
- 			{
- 				WriteRqstPtr = XLogCtl->xlblocks[Insert->curridx];
- 				WriteRqstPtr.xrecoff -= freespace;
- 			}
- 			LWLockRelease(WALInsertLock);
- 		}
  		/* now wait for the write lock */
  		LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);
  		LogwrtResult = XLogCtl->Write.LogwrtResult;
  		if (!XLByteLE(record, LogwrtResult.Flush))
  		{
! 			WriteRqst.Write = WriteRqstPtr;
! 			WriteRqst.Flush = record;
  			XLogWrite(WriteRqst);
  		}
  		LWLockRelease(WALWriteLock);
--- 1252,1284 ----
  	/* done already? */
  	if (!XLByteLE(record, LogwrtResult.Flush))
  	{
  		/* now wait for the write lock */
  		LWLockAcquire(WALWriteLock, LW_EXCLUSIVE);
  		LogwrtResult = XLogCtl->Write.LogwrtResult;
  		if (!XLByteLE(record, LogwrtResult.Flush))
  		{
! 			/* try to write/flush later additions to XLOG as well */
! 			if (LWLockConditionalAcquire(WALInsertLock, LW_EXCLUSIVE))
! 			{
! 				XLogCtlInsert *Insert = &XLogCtl->Insert;
! 				uint32		freespace = INSERT_FREESPACE(Insert);
!
! 				if (freespace < SizeOfXLogRecord)	/* buffer is full */
! 					WriteRqstPtr = XLogCtl->xlblocks[Insert->curridx];
! 				else
! 				{
! 					WriteRqstPtr = XLogCtl->xlblocks[Insert->curridx];
! 					WriteRqstPtr.xrecoff -= freespace;
! 				}
! 				LWLockRelease(WALInsertLock);
! 				WriteRqst.Write = WriteRqstPtr;
! 				WriteRqst.Flush = WriteRqstPtr;
! 			}
! 			else
! 			{
! 				WriteRqst.Write = WriteRqstPtr;
! 				WriteRqst.Flush = record;
! 			}
  			XLogWrite(WriteRqst);
  		}
  		LWLockRelease(WALWriteLock);

To test this, I made a modified version of pgbench in which each
transaction consists of a simple
	insert into table_NNN values(0);
where each client thread has a separate insertion target table.
This is about the simplest transaction I could think of that would
generate a WAL record each time.

Running this modified pgbench with postmaster parameters
	postmaster -i -N 120 -B 1000 --wal_buffers=250
and all other configuration settings at default, CVS tip code gives me
a pretty consistent 115-118 transactions per second for anywhere from
1 to 100 pgbench client threads.  This is exactly what I expected,
since the database (including WAL file) is on a 7200 RPM SCSI drive.
The theoretical maximum rate of sync'd writes to the WAL file is
therefore 120 per second (one per disk revolution), but we lose a
little because once in awhile the disk has to seek to a data file.
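To spell out the arithmetic behind that ceiling (a minimal
back-of-the-envelope sketch, purely illustrative and not part of the
patch):

/*
 * At 7200 RPM the platter passes under the head 7200/60 = 120 times a
 * second, so a single stream of synchronous WAL flushes can commit at
 * most ~120 transactions per second.  If the backend holding
 * WALWriteLock can gang k later-arriving commit records into the same
 * write, the aggregate ceiling scales toward 120 * k.
 */
#include <stdio.h>

int
main(void)
{
	const double rpm = 7200.0;
	const double flushes_per_sec = rpm / 60.0;	/* one sync'd write per revolution */
	int			k;

	for (k = 1; k <= 5; k++)
		printf("%d commit(s) ganged per flush -> ceiling ~%.0f tps\n",
			   k, flushes_per_sec * k);
	return 0;
}

In other words, the only way past ~120 tps on this drive is to get more
than one backend's commit record into each physical write, which is
exactly what the patched XLogFlush() tries to do.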
Inserting the above patch, and keeping all else the same, I get:

$ mybench -c 1 -t 10000 bench1
number of clients: 1
number of transactions per client: 10000
number of transactions actually processed: 10000/10000
tps = 116.694205 (including connections establishing)
tps = 116.722648 (excluding connections establishing)

$ mybench -c 5 -t 2000 -S -n bench1
number of clients: 5
number of transactions per client: 2000
number of transactions actually processed: 10000/10000
tps = 282.808341 (including connections establishing)
tps = 283.656898 (excluding connections establishing)

$ mybench -c 10 -t 1000 bench1
number of clients: 10
number of transactions per client: 1000
number of transactions actually processed: 10000/10000
tps = 443.131083 (including connections establishing)
tps = 447.406534 (excluding connections establishing)

$ mybench -c 50 -t 200 bench1
number of clients: 50
number of transactions per client: 200
number of transactions actually processed: 10000/10000
tps = 416.154173 (including connections establishing)
tps = 436.748642 (excluding connections establishing)

$ mybench -c 100 -t 100 bench1
number of clients: 100
number of transactions per client: 100
number of transactions actually processed: 10000/10000
tps = 336.449110 (including connections establishing)
tps = 405.174237 (excluding connections establishing)

CPU loading goes from 80% idle at 1 client to 50% idle at 5 clients
to <10% idle at 10 or more.

So this does seem to be a nice win, and unless I hear objections
I will apply it ...

			regards, tom lane
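As a rough cross-check (again only an illustrative sketch, replaying
the per-run tps figures above, rounded, against the 120-writes/sec
ceiling), the implied number of commit records ganged into each WAL
flush comes out to roughly 3-4 once there are 10 or more clients:

#include <stdio.h>

int
main(void)
{
	const double flushes_per_sec = 120.0;	/* ceiling of the 7200 RPM drive */
	const int	clients[] = {1, 5, 10, 50, 100};
	const double tps[] = {116.7, 283.7, 447.4, 436.7, 405.2};	/* rounded from the runs above */
	int			i;

	for (i = 0; i < 5; i++)
		printf("%3d clients: %6.1f tps -> ~%.1f commits per flush\n",
			   clients[i], tps[i], tps[i] / flushes_per_sec);
	return 0;
}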