V2 of PITR performance improvement for 8.4 - Mailing list pgsql-hackers
From: Koichi Suzuki
Subject: V2 of PITR performance improvement for 8.4
Msg-id: a778a7260811270404g49254640x8ed58b12b7c65d0b@mail.gmail.com
Responses: Re: V2 of PITR performance improvement for 8.4
List: pgsql-hackers
Please find enclosed a revised version of pg_readahead and a patch to invoke it. Changes from the previous version are as follows: pg_readahead no longer returns a prefetched point. It simply prefetches all the data pages referred to by WAL records in a given WAL segment, except for those whose first WAL record includes a full page write. Because of this change, the core patch was changed so that pg_readahead is invoked when a WAL segment is opened. Details can be found in the README.

I've done a benchmark to see the effect of the prefetch. Here's the report.

--------------------------------
Benchmark: DBT-2
Database size: 20GB
We used fewer transactions than the DBT-2 default to avoid an overload condition. We ran the benchmark for one hour with checkpoint_timeout = 30min and checkpoint_completion_target = 0.5, then collected all the archived WAL and ran PITR.
Disks: RAID 0 array (8 disks, 7200rpm)
Detailed conditions are given at the end.

Measurement results are as follows (for readability, a PDF chart is also attached):

-----------------------+------------+--------------------+---------------
WAL conditions         | Recovery   | Amount of          | Recovery rate
                       | time (sec) | physical read (MB) | (TX/min)
-----------------------+------------+--------------------+---------------
w/o prefetch,          |      6,611 |              5,435 |           402
archived with cp,      |            |                    |
FPW=off                |            |                    |
-----------------------+------------+--------------------+---------------
w/o prefetch,          |      1,683 |                801 |         1,458
archived with cp,      |            |                    |
FPW=on (8.3)           |            |                    |
-----------------------+------------+--------------------+---------------
w/o prefetch,          |      6,644 |              5,090 |           369
archived with lesslog, |            |                    |
FPW=on                 |            |                    |
-----------------------+------------+--------------------+---------------
With prefetch,         |      1,161 |              5,543 |         2,290
archived with cp,      |            |                    |
FPW=off                |            |                    |
-----------------------+------------+--------------------+---------------
With prefetch,         |      1,415 |              2,157 |         1,733
archived with cp,      |            |                    |
FPW=on                 |            |                    |
-----------------------+------------+--------------------+---------------
With prefetch,         |      1,196 |              5,369 |         2,051
archived with lesslog, |            |                    |
FPW=on (this proposal) |            |                    |
-----------------------+------------+--------------------+---------------

* lesslog means pg_compresslog
** DBT-2 throughput: 682TPM (FPW=on), 739TPM (FPW=off)

This shows that although prefetch does not reduce the amount of physical read, it tremendously improves the time to read. As a result, if the WAL archive is taken with pg_lesslog and prefetch is done, recovery duration is somewhat shorter than the current FPW=on score. The important point is that the recovery rate is much higher than the DBT-2 throughput. Therefore, this can be combined with synchronous replication and hot standby, tremendously reducing the amount of logs to be shipped (to as little as one tenth), improving recovery time, and maintaining the chance of successful crash recovery. Without prefetch, recovery with FPW=off or with pg_compresslog does not catch up.

Because the current pg_readahead only works on Linux, I'd like the patch to go into the core and pg_readahead into contrib.

Other (major) environment details are given below.

----<< H/W and OS >>-------------------
CPU: Pentium D, 2.8GHz
Memory: 2GB
Internal disk: SATA 150GB, used to archive WAL
External disk: RAID 0 (Ultra Wide SCSI), 8 disks (SATA 7200rpm)
OS: RHEL ES 5.1 (64bit)

----<< Other PostgreSQL configuration >>--------
PostgreSQL: 8.4 dev. head, as of Oct. 28th
max_connections: 100
shared_buffers: 32MB
checkpoint_segments: 1000
checkpoint_timeout: 30min
checkpoint_completion_target: 0.5
archive_mode: on
autovacuum: on
logging_collector: on

--
------
Koichi Suzuki