Home > mailing lists

Re: Allow WAL information to recover corrupted pg_controldata - Mailing list pgsql-hackers

From	Amit Kapila
Subject	Re: Allow WAL information to recover corrupted pg_controldata
Date	July 5, 2012 02:01:28
Msg-id	002801cd5a69$bd528460$37f78d20$@kapila@huawei.com Whole thread Raw
In response to	Re: Allow WAL information to recover corrupted pg_controldata (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: Allow WAL information to recover corrupted pg_controldata
List	pgsql-hackers

Tree view

From: Robert Haas [mailto:robertmhaas@gmail.com]
Sent: Friday, June 22, 2012 8:59 PM
On Fri, Jun 22, 2012 at 5:25 AM, Amit Kapila <amit.kapila@huawei.com> wrote:
>> Based on the discussion and suggestions in this mail chain, following
features can be implemented:
>>
>> 1. To compute the value of max LSN in data pages based on user input
whether he wants it for an individual file,
>>   a particular directory or whole database.
>>
>> 2a. To search the available WAL files for the latest checkpoint record
and prints the value.
>> 2b. To search the available WAL files for the latest checkpoint record
and recreates a pg_control file pointing at that checkpoint.
>>
>> I have kept both options to address different kind of corruption
scenarios.

> I think I can see all of those things being potentially useful.  There
> are a couple of pending patches that will revise the WAL format
> slightly; not sure how much those are likely to interfere with any
> development you might do on (2) in the meantime.

Below is the details of Option-2, for Option-1, I will send mail separately

New option for pg_resetxlog:
-----------------------------
1. Introduce option -r to restore the control file if possible and print
those values.
3. User need to give option -f along with -r to rewrite the control file
from WAL files.
2. If not able to get the control information from WAL files then the   control data will be guessed and proceedes as
normalreset xlog.  
4. If the control information is restored, then the option -l is ignored.

Design for new Option:
----------------------

1.  Validate the pg_xlog directory before proceeding of restoring control
values. if the directory    is invalid then the control values will be guessed.
2.  Read the pg_xlog directory and read all the existing files.
3.  If it is a valid xlog file then add it to a list in an increasing order,
Otherwise the file    is ignored and continue to the next file.
4.  Try to find the last timestamp file from the list to start reading for a
checkpoint record.
5.  Read the first page from the file and validate it. if the validation
fails the restore happens with    guessed values.
6.  Read the first record as start of the record from the identified first
xlog file.
7.  If the first record is a continuation record from a previous record then
ignore the record    and continue to the next record.
8.  After getting the entire record then the record is validated, if it is
not a valid record    searching for the next record will be stopped and the control values
will be guessed.
9.  Search all the files to the end of the last file to get the latest
checkpoint record.
10. While searching for the record, if it is not reaching the last file
(there is missing file or invalid record)    then treat this scenario as a failure of finding the checkpoint record
and go for guessing the control values.
11. After finding the last checkpoint record, update the checkpoint record
information in the control file.

Implementation:
----------------
1. We need to use most of the functionality of functions mentioned below.
One way is to duplicate the code of these   functions related to functionality required by pg_resetxlog in
pg_resetxlog module. I have checked other modules also   but didn't find how we can use common functionality in server
utility
from backend code.   Could you please point me for the appropriate way for doing it.
  The list of functions:  1. ValidateXLOGDirectoryStructure   2. XLogPageRead   3. ReadRecord   4. RecordIsValid   5.
ValidXLOGPageHeader  6. ValidXLogRecordHeader 

Suggestions/Comments/Thoughts?

With Regards,
Amit Kapila.

pgsql-hackers by date:

From: Peter Geoghegan
Date: 04 July 2012, 22:49:58
Subject: Re: embedded list v2

From: Craig Ringer
Date: 05 July 2012, 02:34:07
Subject: Query from VMWare re use of PostgreSQL in new TPC-V benchmark

Re: Allow WAL information to recover corrupted pg_controldata - Mailing list pgsql-hackers

Previous

Next