Thread: recovery from xid wraparound
Hi
I'm running 7.4 on RHAS 4, and I think I've had a transaction id
wraparound issue in a stats database we have. Running the command below
gives a suitably worrying negative number:
[dbname]=# SELECT datname, age(datfrozenxid) FROM pg_database;
datname | age
------------------+-------------
[maindbname] | -2081610471
[otherdbname] | 1075601025
[otherdbname] | 1257289757
[otherdbname] | 1074582099
[otherdbname] | 1257289757
Which is weird, because I have vacuumed the database quite a lot -
individual tables regularly, and (I thought) a vacuum of the whole
database a month or so ago.
Anyway - I've not noticed any data loss yet, and was hoping that if all
tables had been vacuumed recently (including system catalog tables),
there would be no remaining rows that would appear to have a future xid,
and so the database should be ok?
Obviously I'm now doing the right thing with a vacuumdb -a - however
this has been running 9 hours now and it looks like there are at least
7 hours to go just on this one monstrous table.
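For reference, the invocation is roughly this (a sketch - flags from
memory; -a covers every database in the cluster and -v logs each table
as it's done, which is handy for checking coverage afterwards):

  # Database-wide vacuum of the whole cluster, with per-table
  # progress captured for later checking.
  vacuumdb -a -v > vacuum.log 2>&1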
In the interests of risk reduction I've just knocked up a script to run
ahead and quickly vacuum all the other tables (sketch below).
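It's nothing clever - roughly this (a sketch; database and table names
are placeholders, and it assumes all table names are visible on the
default search path):

  #!/bin/sh
  # Vacuum every ordinary table except the monster one, so the rest
  # of the database is covered while the big vacuum grinds on.
  DB=maindbname
  BIG=monstrous_table
  psql -At -c "SELECT relname FROM pg_class WHERE relkind = 'r';" $DB |
  while read t; do
      [ "$t" = "$BIG" ] && continue
      psql -c "VACUUM \"$t\";" $DB
  done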
But my questions are thus...
a) is my assumption about the database being ok correct - assuming all
tables have been vacuumed recently, including catalog tables?
b) is it possible to safely abort my whole table vacuum now so I can
run it at the weekend when there's less traffic?
c) if I have experienced data loss, on the assumption all the table
structure remains (looks like it does), and I have a working backup
from before the xid wraparound (I do), can I just reinsert any
detected-missing data at the application level without needing a
dump/reload?
Any help appreciated in this really not-fun time,
thanks
S
On Tue, Oct 24, 2006 at 07:43:15AM +0100, Shane Wright wrote:
> Anyway - not noticed any data loss yet and was hoping it would be such
> that if all tables had been vacuumed recently (including system catalog
> tables), that there would be no remaining rows that would appear to
> have a future xid and so the database should be ok?

Running vacuum is the right solution, but I think you have to let it
finish. In particular, in that version a database-wide vacuum has to
complete before it will update the datfrozenxid (it's not tracked per
table).

> a) is my assumption about the database being ok correct - assuming all
> tables have been vacuumed recently, including catalog tables?

Should be ok, but apparently you missed one, or didn't do a database
wide vacuum.

> b) is it possible to safely abort my whole table vacuum now so I can
> run it at the weekend when there's less traffic?

Aborting vacuum is safe, but you have to do a database-wide vacuum at
some point.

> c) if I have experienced data loss, on the assumption all the table
> structure remains (looks like it does), and I have a working backup
> from before the xid wraparound (I do), can I just reinsert any
> detected-missing data at the application level without needing a
> dump/reload?

A VACUUM will recover any data that slipped beyond the horizon less
than 1 billion transactions ago, which I think covers you completely.
The only issue is that unique indexes may be confused because new
conflicting data may have been inserted while the old data was
invisible. Only you can say if that's going to be an issue.

Hope this helps,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability
> to litigate.
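PS: once the database-wide vacuum does complete, rerunning your original
check should show the age back at a sane positive value, well under a
billion:

  SELECT datname, age(datfrozenxid) FROM pg_database;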
Martijn,

Thanks :)

> Running vacuum is the right solution, but I think you have to let it
> finish. In particular, in that version a database-wide vacuum has to
> complete before it will update the datfrozenxid (it's not tracked per
> table).

> > a) is my assumption about the database being ok correct - assuming all
> > tables have been vacuumed recently, including catalog tables?

> Should be ok, but apparently you missed one, or didn't do a database
> wide vacuum.

Yes, probably missed this 220 million row beast that's still running
now...

If I was to abort this vacuum, given that all other tables are vacuumed
(including system catalog tables), what's the worst case scenario? -
given that more transactions are happening on the database.

If I understand correctly, it would be that some rows could disappear
from this large unvacuumed table if their xid was too old - but no other
consequence?

(fully aware that a db-wide vacuum is needed, but if it can [safely]
wait for the weekend that would be preferable)

Many thanks,
S
On Tue, Oct 24, 2006 at 11:06:01AM +0100, Shane Wright wrote:
> If I was to abort this vacuum, given that all other tables are vacuumed
> (including system catalog tables), what's the worst case scenario? -
> given that more transactions are happening on the database

Only tables that haven't been vacuumed in the last billion transactions
are at risk. It's possible that if you've vacuumed that large table
recently by itself that all the data is actually safe, just the system
doesn't realise it.

Just make sure you've really covered *all* the system tables. If they
go you get really weird results.

> If I understand correctly, it would be that some rows could disappear
> from this large unvacuumed table if their xid was too old - but no
> other consequence?

The VACUUM would make them reappear. To truly disappear they would have
to be 3 billion transactions old. That leaves the unique index issue I
mentioned.

> (fully aware that a db-wide vacuum is needed, but if it can [safely]
> wait for the weekend that would be preferable)

That's risk management. For example, all the really old tuples are
possibly near the beginning of the table, so this current vacuum may
have fixed them already. But to get a handle on that you need to
analyse your tuple turnover and usage ratio.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability
> to litigate.
Martijn,

Thanks.

> Just make sure you've really covered *all* the system tables. If they
> go you get really weird results.

I've been under the impression system tables get done first, then
user(me)-created tables after - which means my previous [aborted]
attempts at vacuuming them would have covered it, unless I'm missing
something?

(db was created by initdb, then pg_restore to load data into it; the
database was then vacuumed before production work began)

I've looked at the list of catalog tables in 7.4's docs (URL below),
and all 28 have been processed in this vacuum, so presumably the same
order applied on previous attempts:

http://www.postgresql.org/docs/7.4/static/catalogs.html

Checked with:

  grep vacuuming vacuum.log | grep pg_ | grep -v toast | sort | uniq

Does this sound like a fair assumption? (it is on the first database in
the cluster; these aren't coming up from other databases)

Many thanks for your help!
S
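PS: as a sanity check on that count of 28, I believe something like this
should list the plain tables in the system schema (a sketch - relkind =
'r' restricts it to ordinary tables, so indexes and toast tables are
excluded):

  SELECT relname
    FROM pg_class
   WHERE relnamespace = (SELECT oid FROM pg_namespace
                          WHERE nspname = 'pg_catalog')
     AND relkind = 'r'
   ORDER BY relname;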
Martijn van Oosterhout <kleptog@svana.org> writes:
> A VACUUM will recover any data that slipped beyond the horizon less
> than 1 billion transactions ago, which I think covers you completely.
> The only issue is that unique indexes may be confused because new
> conflicting data may have been inserted while the old data was
> invisible. Only you can say if that's going to be an issue.

I don't think there's a risk there. Uniqueness checks use SnapshotDirty
so they aren't time-sensitive.

			regards, tom lane
"Shane Wright" <shane.wright@edigitalresearch.com> writes: >> Just make sure you've really covered *all* the system tables. > I've been under the impression system tables get done first, then > user(me)-created tables after - No, there's no such guarantee. A database-wide vacuum just does the tables in the order it finds 'em in pg_class. regards, tom lane
Tom,

Thanks.

But are there just 28 (the 28 that have been vacuumed), or are there
more (in 7.4)?

Happy there's no guarantee, but it would help to know any possible
danger in my current situation.

Thanks
S