Thread: Database corruption
Postgres version: 8.4.4

Hardware:
12 CPU Intel
sda: 15K 600GB RAID 1
sdb: SSD 173GB RAID 1 (most tables and indexes on sdb)
Adaptec controller
24GB of memory, shared buffers 12GB
Machine age: 3 months
Normal load: 2-4
200-300 transactions per second

We had a database failure last night after one of the tables had a corrupted block. After we noticed the corruption, all available memory plus swap was used up. The database died (or rather was killed by the kernel) with an out of memory error. We switched to the warm standby, which doesn't have any corruption.

In the postmortem I found 4 tables with corruption. The only thing that links these tables is that autovacuum (to prevent wraparound) was either running or had run on them. All of the tables are in the gigabyte or multi-gigabyte range. The vacuum of some of the tables had been going on for days.

The errors from the log file were of the form:

ERROR: invalid page header in block 290125 of relation pg_tblspc/16385/18674/205612

After an attempted vacuum we had this error:

2010-06-24 17:31:09 UTC [31766]: [36-1]WARNING: PD_ALL_VISIBLE flag was incorrectly set in relation "org_crawl_page_scrape_result" page 128902

The first error was logged at 10:15 pm. At 1 am a pg_dump was run from cron and failed after 20 minutes while trying to allocate an immense amount of memory while attempting to dump one of the corrupted tables. At 2:00 am all memory was used up, the CPU was maxed, and the load average was 56. We transferred to the standby and rebooted the machine.

At this time the database is sitting there, though we'll need to remake the database and turn it into the warm standby.
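
For anyone who wants to look at the damaged page on disk, the "invalid page header" error gives a block number and a relation path, and those can be turned into a file name and byte offset. The following is only a rough sketch: it assumes a default build (8192-byte blocks, 1GB relation segments), which I have not confirmed for this server, and the function name locate_block is made up for illustration. The path is relative to the data directory.

```python
# Sketch: map a block number from an "invalid page header" error to the
# on-disk segment file and byte offset.
# ASSUMPTION: default PostgreSQL build settings (BLCKSZ = 8192 bytes,
# relation segment size = 1GB). A custom build may differ.

BLOCK_SIZE = 8192                                   # assumed default BLCKSZ
SEGMENT_SIZE = 1024 ** 3                            # assumed default 1GB segments
BLOCKS_PER_SEGMENT = SEGMENT_SIZE // BLOCK_SIZE     # 131072 blocks per segment file


def locate_block(relation_path, block_number):
    """Return (segment_file_path, byte_offset) for a block (hypothetical helper)."""
    segment, block_in_segment = divmod(block_number, BLOCKS_PER_SEGMENT)
    # The first segment has no suffix; later segments are named path.1, path.2, ...
    if segment == 0:
        file_path = relation_path
    else:
        file_path = f"{relation_path}.{segment}"
    return file_path, block_in_segment * BLOCK_SIZE


if __name__ == "__main__":
    # Values taken from the error message quoted above.
    path, offset = locate_block("pg_tblspc/16385/18674/205612", 290125)
    print(path, offset)  # pg_tblspc/16385/18674/205612.2 229220352
```

Under those assumptions, block 290125 falls in the third segment file (suffix .2) at byte offset 229220352, so that 8192-byte page could be copied out with a read-only tool and inspected on a copy of the files without touching the original cluster.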