Re: [Slony1-general] WAL partition overloaded--by autovacuum? - Mailing list pgsql-performance

From Greg Smith
Subject Re: [Slony1-general] WAL partition overloaded--by autovacuum?
Date
Msg-id 4C37AF89.1070103@2ndquadrant.com
In response to Re: [Slony1-general] WAL partition overloaded--by autovacuum?  (Richard Yen <richyen@iparadigms.com>)
List pgsql-performance
Richard Yen wrote:
> I figured that pg_xlog and data/base could both be on the FusionIO
> drive, since there would be no latency when there are no spindles.
>


(Rolls eyes) Please be careful about how much SSD Kool-Aid you drink,
and be skeptical of vendor claims. SSDs don't just make latency go away,
particularly on heavy write workloads where the technology is at its
weakest.

Also, random note, I'm seeing way too many FusionIO drive setups where
people don't have any redundancy to cope with a drive failure, because
the individual drives are so expensive they don't have more than one.
Make sure that if you lose one of the drives, you won't have a massive
data loss. Replication might help with that, if you can stand a little
bit of data loss when the SSD dies. Not if--when. Even a good one
doesn't last forever.

> This means my pg_xlog partition should be
> (2 + checkpoint_completion_target) * checkpoint_segments + 1 = 41
> files, or 656MB. Then, if there are more than 49 files, unneeded
> segment files will be deleted, but in this case all segment files are
> needed, so they never got deleted. Perhaps we should add in the docs
> that pg_xlog should be the size of the DB or larger?
>

Excessive write volume beyond the capacity of the hardware can end up
delaying the normal checkpoint that would have cleaned up all the xlog
files. There's a nasty spiral this can get into; I've seen it a couple
of times in a form similar to what you reported. The pg_xlog directory
should never exceed the size computed by that formula for very long, but
it can burst above its normal size limits for a little bit. This is
already mentioned as a possibility in the manual: "If, due to a
short-term peak of log
output rate, there are more than 3 * checkpoint_segments + 1 segment
files, the unneeded segment files will be deleted instead of recycled
until the system gets back under this limit." Autovacuum is an easy way
to get the sort of activity needed to cause this problem, but I don't
know if it's a necessary component to see the problem. You have to be in
an unusual situation before the sum of the xlog files is anywhere close
to the size of the database though.
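
To make the arithmetic concrete, here is a rough Python sketch of the two
limits discussed above. The settings checkpoint_segments = 16 and
checkpoint_completion_target = 0.5 are not stated anywhere in this thread;
they are only inferred from the 41-file and 49-file figures quoted above.
The 16MB segment size is the stock default, and rounding the fractional
term up is an assumption of this sketch. The function names are made up
for illustration.

```python
import math

# Default WAL segment size in MB; assumes a build that has not changed it.
WAL_SEGMENT_MB = 16


def normal_max_segments(checkpoint_segments, checkpoint_completion_target):
    """Normal ceiling: (2 + checkpoint_completion_target) * checkpoint_segments + 1.

    Rounding the product up is an assumption made for this sketch.
    """
    return math.ceil((2 + checkpoint_completion_target) * checkpoint_segments) + 1


def recycle_limit_segments(checkpoint_segments):
    """Limit from the manual quote: 3 * checkpoint_segments + 1.

    Above this count, unneeded segments are deleted instead of recycled.
    """
    return 3 * checkpoint_segments + 1


def segments_to_mb(segment_count):
    """Convert a segment file count into megabytes of pg_xlog space."""
    return segment_count * WAL_SEGMENT_MB


if __name__ == "__main__":
    # Inferred settings, see the note above; substitute your own values.
    segments, target = 16, 0.5
    normal = normal_max_segments(segments, target)
    limit = recycle_limit_segments(segments)
    print(f"normal ceiling: {normal} files, {segments_to_mb(normal)} MB")
    print(f"recycle limit:  {limit} files, {segments_to_mb(limit)} MB")
```

Neither number is a hard cap. If checkpoints are being delayed, the
directory can keep growing past both until a checkpoint finally completes,
so leave real headroom on the partition.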

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us

