Thread: recovery from xid wraparound
Hi
I'm running 7.4 on RHAS 4, and I think I've had a transaction id
wraparound issue in a stats database we have. Running the command below
gives a suitably worrying negative number:
[dbname]=# SELECT datname, age(datfrozenxid) FROM pg_database;
datname | age
------------------+-------------
[maindbname] | -2081610471
[otherdbname] | 1075601025
[otherdbname] | 1257289757
[otherdbname] | 1074582099
[otherdbname] | 1257289757
Which is weird, because I have vacuumed the database quite a lot -
individual tables regularly, and (I thought) a vacuum of the whole
database a month or so ago.
Anyway - I've not noticed any data loss yet, and was hoping that if all
tables had been vacuumed recently (including system catalog tables),
there would be no remaining rows that would appear to have a future xid,
and so the database should be ok?
Obviously I'm now doing the right thing with a vacuumdb -a - however
this has been running 9 hours now and it looks like there are at least
7 hours to go just on this one monstrous table.
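For reference, the invocation is roughly this (a sketch - flags from
memory; -a covers every database in the cluster and -v logs each table
as it's done, which is handy for checking coverage afterwards):

  # Database-wide vacuum of the whole cluster, with per-table
  # progress captured for later checking.
  vacuumdb -a -v > vacuum.log 2>&1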
In the interests of risk reduction I've just knocked up a script to run
ahead and quickly vacuum all the other tables (sketch below).
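It's nothing clever - roughly this (a sketch; database and table names
are placeholders, and it assumes all table names are visible on the
default search path):

  #!/bin/sh
  # Vacuum every ordinary table except the monster one, so the rest
  # of the database is covered while the big vacuum grinds on.
  DB=maindbname
  BIG=monstrous_table
  psql -At -c "SELECT relname FROM pg_class WHERE relkind = 'r';" $DB |
  while read t; do
      [ "$t" = "$BIG" ] && continue
      psql -c "VACUUM \"$t\";" $DB
  done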
But my questions are thus...
a) is my assumption about the database being ok correct - assuming all
tables have been vacuumed recently, including catalog tables?
b) is it possible to safely abort my whole table vacuum now so I can
run it at the weekend when there's less traffic?
c) if I have experienced data loss, on the assumption all the table
structure remains (looks like it does), and I have a working backup
from before the xid wraparound (I do), can I just reinsert any
detected-missing data at the application level without needing a
dump/reload?
Any help appreciated in this really not-fun time,
thanks
S
On Tue, Oct 24, 2006 at 07:43:15AM +0100, Shane Wright wrote:
> Anyway - not noticed any data loss yet and was hoping it would be such
> that if all tables had been vacuumed recently (including system catalog
> tables), that there would be no remaining rows that would appear to
> have a future xid and so the database should be ok?

Running vacuum is the right solution, but I think you have to let it
finish. In particular, in that version a database-wide vacuum has to
complete before it will update the datfrozenxid (it's not tracked per
table).

> a) is my assumption about the database being ok correct - assuming all
> tables have been vacuumed recently, including catalog tables?

Should be ok, but apparently you missed one, or didn't do a database
wide vacuum.

> b) is it possible to safely abort my whole table vacuum now so I can
> run it at the weekend when there's less traffic?

Aborting vacuum is safe, but you have to do a database-wide vacuum at
some point.

> c) if I have experienced data loss, on the assumption all the table
> structure remains (looks like it does), and I have a working backup
> from before the xid wraparound (I do), can I just reinsert any
> detected-missing data at the application level without needing a
> dump/reload?

A VACUUM will recover any data that slipped beyond the horizon less
than 1 billion transactions ago, which I think covers you completely.
The only issue is that unique indexes may be confused because new
conflicting data may have been inserted while the old data was
invisible. Only you can say if that's going to be an issue.

Hope this helps,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability
> to litigate.
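PS: once the database-wide vacuum does complete, rerunning your original
check should show the age back at a sane positive value, well under a
billion:

  SELECT datname, age(datfrozenxid) FROM pg_database;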
Martijn,

Thanks :)

> Running vacuum is the right solution, but I think you have to let it
> finish. In particular, in that version a database-wide vacuum has to
> complete before it will update the datfrozenxid (it's not tracked per
> table).

> > a) is my assumption about the database being ok correct - assuming all
> > tables have been vacuumed recently, including catalog tables?

> Should be ok, but apparently you missed one, or didn't do a database
> wide vacuum.

Yes, probably missed this 220 million row beast that's still running
now...

If I was to abort this vacuum, given that all other tables are vacuumed
(including system catalog tables), what's the worst case scenario? -
given that more transactions are happening on the database.

If I understand correctly, it would be that some rows could disappear
from this large unvacuumed table if their xid was too old - but no other
consequence?

(fully aware that a db-wide vacuum is needed, but if it can [safely]
wait for the weekend that would be preferable)

Many thanks,
S
On Tue, Oct 24, 2006 at 11:06:01AM +0100, Shane Wright wrote:
> If I was to abort this vacuum, given that all other tables are vacuumed
> (including system catalog tables), what's the worst case scenario? -
> given that more transactions are happening on the database

Only tables that haven't been vacuumed in the last billion transactions
are at risk. It's possible that if you've vacuumed that large table
recently by itself that all the data is actually safe, just the system
doesn't realise it.

Just make sure you've really covered *all* the system tables. If they
go you get really weird results.

> If I understand correctly, it would be that some rows could disappear
> from this large unvacuumed table if their xid was too old - but no
> other consequence?

The VACUUM would make them reappear. To truly disappear they would have
to be 3 billion transactions old. That leaves the unique index issue I
mentioned.

> (fully aware that a db-wide vacuum is needed, but if it can [safely]
> wait for the weekend that would be preferable)

That's risk management. For example, all the really old tuples are
possibly near the beginning of the table, so this current vacuum may
have fixed them already. But to get a handle on that you need to
analyse your tuple turnover and usage ratio.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability
> to litigate.
Martijn,

Thanks.

> Just make sure you've really covered *all* the system tables. If they
> go you get really weird results.

I've been under the impression system tables get done first, then
user(me)-created tables after - which means my previous [aborted]
attempts at vacuuming them would have covered it, unless I'm missing
something?

(db was created by initdb, then pg_restore to load data into it; the
database was then vacuumed before production work began)

I've looked at the list of catalog tables in 7.4's docs (URL below),
and all 28 have been processed in this vacuum, so presumably the same
order applied on previous attempts:

http://www.postgresql.org/docs/7.4/static/catalogs.html

Checked with:

  grep vacuuming vacuum.log | grep pg_ | grep -v toast | sort | uniq

Does this sound like a fair assumption? (it is on the first database in
the cluster; these aren't coming up from other databases)

Many thanks for your help!
S
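PS: as a sanity check on that count of 28, I believe something like this
should list the plain tables in the system schema (a sketch - relkind =
'r' restricts it to ordinary tables, so indexes and toast tables are
excluded):

  SELECT relname
    FROM pg_class
   WHERE relnamespace = (SELECT oid FROM pg_namespace
                          WHERE nspname = 'pg_catalog')
     AND relkind = 'r'
   ORDER BY relname;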
Martijn van Oosterhout <kleptog@svana.org> writes:
> A VACUUM will recover any data that slipped beyond the horizon less
> than 1 billion transactions ago, which I think covers you completely.
> The only issue is that unique indexes may be confused because new
> conflicting data may have been inserted while the old data was
> invisible. Only you can say if that's going to be an issue.

I don't think there's a risk there. Uniqueness checks use SnapshotDirty
so they aren't time-sensitive.

			regards, tom lane
"Shane Wright" <shane.wright@edigitalresearch.com> writes: >> Just make sure you've really covered *all* the system tables. > I've been under the impression system tables get done first, then > user(me)-created tables after - No, there's no such guarantee. A database-wide vacuum just does the tables in the order it finds 'em in pg_class. regards, tom lane
Tom,

Thanks.

But are there just 28 (the 28 that have been vacuumed), or are there
more (in 7.4)?

Happy there's no guarantee, but it would help to know any possible
danger in my current situation.

Thanks
S