Corrupted database's files (linux RAID5 + PostgreSQL 8.3.0) - Mailing list pgsql-general
From | Peter Petrov |
---|---|
Subject | Corrupted database's files (linux RAID5 + PostgreSQL 8.3.0) |
Date | |
Msg-id | 48331F9F.9030508@demabg.com |
Responses | Re: Corrupted database's files (linux RAID5 + PostgreSQL 8.3.0) |
List | pgsql-general |
Hi,

Today one of the disks was marked as failed, and now some files are corrupted. I've decided to copy the pgsqldata directory, try to fix the PG_VERSION files (see below for what PostgreSQL doesn't like), and see whether the database will come up. While the files are being copied, I'm open to any other ideas on how to deal with the problem ;)

PostgreSQL's log suggests running initdb (the HINT message in the log file). What will happen if I then copy the rest of the structure into the newly created database cluster?

Linux (Slackware 12.0.0), software RAID5 (partition based) + PostgreSQL 8.3.0. Here's what happened (from dmesg):

    ---------------------------------------
    # uname -a
    Linux xeonito 2.6.21.5 #3 SMP Tue Oct 2 16:20:48 EEST 2007 i686 Intel(R) Xeon(R) CPU E5335 @ 2.00GHz GenuineIntel GNU/Linux
    ---------------------------------------
    # dmesg
    sd 0:0:3:0: SCSI error: return code = 0x08000002
    sdd: Current: sense key=0x4 ASC=0x44 ASCQ=0x0
    Info fld=0x0
    end_request: I/O error, dev sdd, sector 159620863
    sd 0:0:3:0: SCSI error: return code = 0x08000002
    sdd: Current: sense key=0x4 ASC=0x44 ASCQ=0x0
    Info fld=0x0
    end_request: I/O error, dev sdd, sector 159617119
    raid5: Disk failure on sdd1, disabling device. Operation continuing on 4 devices
    ......
    RAID5 conf printout:
     --- rd:5 wd:4
     disk 0, o:1, dev:sdb1
     disk 1, o:1, dev:sdc1
     disk 2, o:0, dev:sdd1
     disk 3, o:1, dev:sde1
     disk 4, o:1, dev:sdf1
    RAID5 conf printout:
     --- rd:5 wd:4
     disk 0, o:1, dev:sdb1
     disk 1, o:1, dev:sdc1
     disk 3, o:1, dev:sde1
     disk 4, o:1, dev:sdf1
    ---------------------------------------
    # cat /proc/mdstat
    Personalities : [linear] [raid0] [raid1] [raid10] [raid6] [raid5] [raid4] [multipath] [faulty]
    md1 : active raid5 sdb1[0] sdf1[4] sde1[3] sdd1[5](F) sdc1[1]
          585924608 blocks level 5, 8192k chunk, algorithm 2 [5/4] [UU_UU]
    md0 : active raid5 sdb2[0] sdf2[4] sde2[3] sdd2[5](F) sdc2[1]
          390053888 blocks level 5, 1024k chunk, algorithm 2 [5/4] [UU_UU]
    unused devices: <none>
    ---------------------------------------

And here's what the partitions look like:

    # fdisk -l /dev/sdb
    Disk /dev/sdb: 249.8 GB, 249865175040 bytes
    255 heads, 63 sectors/track, 30377 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes
       Device Boot      Start         End      Blocks   Id  System
    /dev/sdb1               1       18237   146488671   83  Linux
    /dev/sdb2           18238       30377    97514550   83  Linux
    ---------------------------------------

Kernel parameters:

    echo 4200000000 > /proc/sys/kernel/shmmax
    echo 4200000000 > /proc/sys/kernel/shmall
    sysctl -w vm.overcommit_memory=2
    echo 8192 > /sys/block/md0/md/stripe_cache_size
    echo 8192 > /sys/block/md1/md/stripe_cache_size
    ---------------------------------------

Both md0 and md1 are used by PostgreSQL. Initially the setup was not designed to use the whole of disks sdb-sdf, but because of the size requirements I also added the remaining unused space for PostgreSQL.

And here's the PostgreSQL log (the FATAL message appears when I try to connect to the database; of course this happens with the most interesting database, while some other small databases are working fine):

    LOG: received smart shutdown request
    LOG: autovacuum launcher shutting down
    LOG: shutting down
    LOG: database system is shut down
    LOG: could not create IPv6 socket: Address family not supported by protocol
    LOG: database system was shut down at 2008-05-20 17:54:17 EEST
    LOG: autovacuum launcher started
    LOG: database system is ready to accept connections
    FATAL: "base/16399" is not a valid data directory
    DETAIL: File "base/16399/PG_VERSION" does not contain valid data.
    HINT: You might need to initdb.

Of course, base/16399/PG_VERSION contains something strange instead of the version information:

    # cat base/16399/PG_VERSION
    X
    ---------------------------------------
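Since only some databases refuse connections, it may help to list which per-database directories under base/ (one per database, named by OID, each with its own PG_VERSION) are damaged before choosing between patching files and running initdb. Below is a minimal sketch, assuming a POSIX shell, that it is run against the copy of the data directory (or with the server stopped), and that a healthy PG_VERSION holds just the major version string (8.3 for this cluster). The function name check_pg_version is hypothetical. Keep in mind that a valid PG_VERSION only gets a database past this particular startup check; it says nothing about whether the table files in the same directory survived.

```shell
# Hypothetical helper: report per-database directories whose PG_VERSION
# is missing or does not hold the expected major version.
#   $1 = data directory (ideally a copy, or with the server stopped)
#   $2 = expected major version string, e.g. "8.3"
check_pg_version() {
    datadir=$1
    expected=$2
    # The trailing slash makes the glob match directories only.
    for dir in "$datadir"/base/*/; do
        # If nothing matched, the loop sees the literal pattern; skip it.
        [ -d "$dir" ] || continue
        f=${dir}PG_VERSION
        if [ ! -f "$f" ]; then
            echo "missing PG_VERSION: $f"
        # A healthy file contains a single line such as "8.3".
        elif [ "$(head -n 1 "$f")" != "$expected" ]; then
            echo "bad PG_VERSION: $f"
        fi
    done
}
```

For example, `check_pg_version /path/to/pgsqldata-copy 8.3` (the path is only a placeholder) would report base/16399 from the log above as a bad PG_VERSION.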
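In the /proc/mdstat output above, a member tagged (F) is one the md driver has marked faulty, and a status of [5/4] [UU_UU] means the array expects 5 devices but only 4 are in use, with the underscore marking the missing slot. Below is a minimal sketch that pulls the faulty members out of that output with awk. The function name failed_md_members is hypothetical, and the sketch assumes the usual "mdN : state level member[n]..." layout of the array lines.

```shell
# Hypothetical helper: print "<array> <member>" for every RAID member that
# /proc/mdstat (or a saved copy passed as $1) flags as faulty with "(F)".
failed_md_members() {
    mdstat=${1:-/proc/mdstat}
    awk '
        # Array lines look like: "md1 : active raid5 sdb1[0] sdd1[5](F) ..."
        $1 ~ /^md[0-9]+$/ && $2 == ":" {
            for (i = 3; i <= NF; i++) {
                # Members look like "sdd1[5]" or, when faulty, "sdd1[5](F)".
                if ($i ~ /\[[0-9]+\]\(F\)$/) {
                    dev = $i
                    sub(/\[.*$/, "", dev)   # "sdd1[5](F)" becomes "sdd1"
                    print $1, dev
                }
            }
        }
    ' "$mdstat"
}
```

Run against the mdstat shown in the post, it would print md1 sdd1 and md0 sdd2, i.e. the same physical disk (sdd) has dropped out of both arrays.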
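On Linux, kernel.shmmax is expressed in bytes while kernel.shmall is expressed in pages, so echoing the same number (4200000000) into both sets two very different limits, with shmall far larger than the shmmax value alone would require. Below is a small sketch of the bytes-to-pages conversion. The helper name bytes_to_pages is hypothetical, and the page size defaults to `getconf PAGE_SIZE` (typically 4096 on i686).

```shell
# Hypothetical helper: convert a byte count to whole pages, rounding up,
# e.g. to derive a kernel.shmall value (pages) from kernel.shmmax (bytes).
#   $1 = bytes
#   $2 = page size in bytes (defaults to the running system's page size)
bytes_to_pages() {
    bytes=$1
    page_size=${2:-$(getconf PAGE_SIZE)}
    # Integer ceiling division: (a + b - 1) / b
    echo $(( (bytes + page_size - 1) / page_size ))
}
```

For example, `bytes_to_pages 4200000000 4096` prints 1025391, which is the number of 4 KiB pages needed to cover a 4200000000-byte shmmax.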