Thread: 9.0.4 Data corruption issue
Hello,

Had a DB that had bloated due to lots of updates and insufficient autovacuum settings. Attempted the following, in this order:

VACUUM FULL VERBOSE tablename;
CLUSTER tablename;

Did the FULL first to ensure that the CLUSTER had enough space to run. The db had already been previously clustered to an index. At some point during the VACUUM I started seeing the following errors in the logs:

CONTEXT: writing block 12125253 of relation base/2651908/652397108
WARNING: could not write block 12125253 of base/2651908/652397108
DETAIL: Multiple failures --- write error might be permanent.
Also seeing the same error for block 12125253, 2978.
And the cluster failed as well. I have attempted a check point using pg_start_backup() and that failed as well.
The system is still running and I have yet to shut it down. I am hoping to find out what possibilities I have if any to recover from this. Also how do I avoid this in the future?
-Ken
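
For reference, the sequence described above would look roughly like this from psql; "tablename" is a placeholder, and the CHECKPOINT and pg_start_backup()/pg_stop_backup() calls are simply two ways of forcing a checkpoint (the backup label is arbitrary):

    -- maintenance sequence that preceded the write errors
    VACUUM FULL VERBOSE tablename;    -- rewrite the table to reclaim bloat
    CLUSTER tablename;                -- re-sort it on its clustering index

    -- forcing a checkpoint to see whether the dirty block can be flushed
    CHECKPOINT;                       -- superuser only
    SELECT pg_start_backup('probe');  -- performs a checkpoint as a side effect
    SELECT pg_stop_backup();          -- end backup mode again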
Ken Caruso <ken@ipl31.net> writes:
> Had a DB that had bloated due to lots of updates and insufficient autovacuum
> settings. Attempted the following in the following order:
> VACUUM FULL VERBOSE tablename;
> CLUSTER tablename;
> Did the full first to ensure that the cluster had enough space to run.
> The db had already been previously clustered to an index. At some point
> during the VACUUM I started seeing the following errors in the logs:
> CONTEXT: writing block 12125253 of relation base/2651908/652397108
> WARNING: could not write block 12125253 of base/2651908/652397108
> DETAIL: Multiple failures --- write error might be permanent.
Ummm .... what's the actual ERROR?

regards, tom lane
Ken Caruso <ken@ipl31.net> writes:
> Sorry, the actual error reported by CLUSTER is:
> gpup=> cluster verbose tablename;
> INFO: clustering "dbname.tablename"
> WARNING: could not write block 12125253 of base/2651908/652397108
> DETAIL: Multiple failures --- write error might be permanent.
> ERROR: could not open file "base/2651908/652397108.1" (target block
> 12125253): No such file or directory
> CONTEXT: writing block 12125253 of relation base/2651908/652397108
Hmm ... it looks like you've got a dirty buffer in shared memory that
corresponds to a block that no longer exists on disk; in fact, the whole
table segment it belonged to is gone. Or maybe the block or file number
in the shared buffer header is corrupted somehow.
I imagine you're seeing errors like this during each checkpoint attempt?
I can't think of any very good way to clean that up. What I'd try here
is a forced database shutdown (immediate-mode stop) and see if it starts
up cleanly. It might be that whatever caused this has also corrupted
the back WAL and so WAL replay will result in the same or similar error.
In that case you'll be forced to do a pg_resetxlog to get the DB to come
up again. If so, a dump and reload and some manual consistency checking
would be indicated :-(
regards, tom lane
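
Sketched as shell commands, the recovery path described above might look roughly like the following; the data directory path is a placeholder, and pg_resetxlog discards WAL, so it is strictly a last resort to be followed by a dump, reload, and manual consistency checks:

    # force an immediate-mode stop (no shutdown checkpoint is attempted)
    pg_ctl stop -D /path/to/data -m immediate

    # start again and watch the server log to see whether WAL replay completes
    pg_ctl start -D /path/to/data

    # only if WAL replay keeps failing with the same kind of error:
    pg_resetxlog /path/to/data          # throws away WAL; data loss is possible
    pg_ctl start -D /path/to/data
    pg_dumpall > everything.sql         # then reload into a freshly initdb'd cluster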
2011/7/17 Ken Caruso <ken@ipl31.net>:
>
> On Sat, Jul 16, 2011 at 2:30 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>
>> Ken Caruso <ken@ipl31.net> writes:
>> > Sorry, the actual error reported by CLUSTER is:
>>
>> > gpup=> cluster verbose tablename;
>> > INFO: clustering "dbname.tablename"
>> > WARNING: could not write block 12125253 of base/2651908/652397108
>> > DETAIL: Multiple failures --- write error might be permanent.
>> > ERROR: could not open file "base/2651908/652397108.1" (target block
>> > 12125253): No such file or directory
>> > CONTEXT: writing block 12125253 of relation base/2651908/652397108
>>
>> Hmm ... it looks like you've got a dirty buffer in shared memory that
>> corresponds to a block that no longer exists on disk; in fact, the whole
>> table segment it belonged to is gone. Or maybe the block or file number
>> in the shared buffer header is corrupted somehow.
>>
>> I imagine you're seeing errors like this during each checkpoint attempt?
>
> Hi Tom,
> Thanks for the reply.
> Yes, I tried a pg_start_backup() to force a checkpoint and it failed due to
> similar error.
>
>>
>> I can't think of any very good way to clean that up. What I'd try here
>> is a forced database shutdown (immediate-mode stop) and see if it starts
>> up cleanly. It might be that whatever caused this has also corrupted
>> the back WAL and so WAL replay will result in the same or similar error.
>> In that case you'll be forced to do a pg_resetxlog to get the DB to come
>> up again. If so, a dump and reload and some manual consistency checking
>> would be indicated :-(
>
> Before seeing this message, I restarted Postgres and it was able to get to a
> consistent state at which point I reclustered the db without error and
> everything appears to be fine. Any idea what caused this? Was it something
> to do with the Vacuum Full?
Block number 12125253 is bigger than any block we can find in
base/2651908/652397108.1

Should the table size be in the 100GB range or the 2-3 GB range? This
should help us decide: in the former case at least one segment has
probably disappeared; in the latter, the shared buffers have become
corrupted.

Ken, you didn't change RELSEG_SIZE, right? (It needs to be changed in
the source code before compiling PostgreSQL yourself.)

In either case a hardware check is advisable, I believe.
--
Cédric Villemain 2ndQuadrant
http://2ndQuadrant.fr/ PostgreSQL : Expertise, Training and Support
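
As a rough check of which scenario applies: with the default 8kB block size and 1GB segment size (RELSEG_SIZE), each segment file holds 131072 blocks, so block 12125253 would fall in segment 92, about 92GiB into the table; the block can only be legitimate if the relation really is in the 100GB range. Assuming the relfilenode still maps to a live table (CLUSTER assigns a new one), its current size could be checked with something like:

    -- look up the relation behind relfilenode 652397108 and its on-disk size
    SELECT n.nspname, c.relname,
           pg_size_pretty(pg_relation_size(c.oid)) AS size
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relfilenode = 652397108;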
2011/7/20 Tom Lane <tgl@sss.pgh.pa.us>:
> Ken Caruso <ken@ipl31.net> writes:
>> On Sun, Jul 17, 2011 at 3:04 AM, Cédric Villemain <
>> cedric.villemain.debian@gmail.com> wrote:
>>>> Block number 12125253 is bigger than any block we can find in
>>>> base/2651908/652397108.1
>
>>> Should the table size be in the 100GB range or 2-3 GB range ?
>
>> The DB was in the 200GB-300GB range when this happened.
>
> Cédric was asking about the particular table's size, not the whole DB...
> the table in question is the one with relfilenode = 652397108.

Yes, the error probably originates in a failure outside PostgreSQL. It
should not happen, but... well, I was wondering how a table could end up
partially truncated by mistake (truncation of a segment file, not the SQL
TRUNCATE) by PostgreSQL...

--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/ PostgreSQL: Support 24x7 - Development, Expertise and Training
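
To see whether segment files really did go missing, one could also look at the relation's files on disk directly. A quick sketch, with $PGDATA standing in for the data directory (and bearing in mind that the later CLUSTER will have given the table a new relfilenode, so these are the files from before the rewrite):

    # list the segment files for the old relfilenode; with default settings
    # every segment except the last should be exactly 1GB
    ls -l $PGDATA/base/2651908/652397108*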