Re: Cannot rebuild a standby server - Mailing list pgsql-admin
From: John Scalia
Subject: Re: Cannot rebuild a standby server
Date:
Msg-id: 53A484E8.2060200@gmail.com
In response to: Re: Cannot rebuild a standby server (Kevin Grittner <kgrittn@ymail.com>)
Responses: Re: Cannot rebuild a standby server
List: pgsql-admin
Well, I did finally get it working by adding -X s -c fast to the pg_basebackup command. Kevin, if I didn't copy WALs over, the database still refused to start, as it claimed it was looking for one of the specific files. Also, I've not seen any references to removing certain files, like a backup_label file, in the standby's data directory causing problems. The other files I removed were the old postgresql.pid file from the primary and a file called archiving_active, which I use for controlling whether PostgreSQL writes WAL files or not. It seems a little funny to me that I've done this same procedure for over 4 months with no problems, and today was the first time it bit me.

On 6/20/2014 2:09 PM, Kevin Grittner wrote:
> John Scalia <jayknowsunix@gmail.com> wrote:
>
>> In the true definition of insanity, I've tried to rebuild a standby
>> streaming replication server using the following steps several times:
>>
>> 1) ensure the postgresql data directory, /var/lib/pgsql/9.3/data, is empty.
>> 2) run: pg_basebackup -h <primary server> -D /var/lib/pgsql/9.3/data
>> 3) manually copy the WALs from the primary server's pg_xlog directory
>>    to the directory specified in the standby's recovery.conf restore_command.
>
> Step 3 is enough to cause database corruption on the replica.
>
>> 4) rm any artifacts from the standby's new data directory like the
>>    backup_label file.
>
> So is that.
>
>> 5) copy the saved recovery.conf into the standby's data directory and check
>>    it is accurate.
>> 6) Start the database using "service postgresql-9.3 start"
>>
>> Every time, however, the following appears in the pg_log/postgresql-Fri.log:
>> <timestamp> LOG: entering standby mode
>> <timestamp> LOG: restored log file "00000003.history"
>> <timestamp> LOG: invalid secondary checkpoint record
>> <timestamp> PANIC: could not locate a valid checkpoint record
>
> Yep, that's about the best result you can expect with the above
> procedure; it is also occasionally possible to get it to start, but
> if it did there would almost certainly be data loss or corruption.
>
>> All this was originally caused by testing the failover mechanism in pgpool.
>> That didn't succeed and I'm trying to get the servers back to their original
>> states. I've done this kind of thing before, but don't know what's wrong
>> with this effort. What have I missed?
>
> You should enable WAL archiving and the restore_command in
> recovery.conf should copy WAL files from the archive. The pg_xlog
> directory should be empty when starting recovery unless the primary
> is stopped and you only copy pg_xlog files from the stopped server
> into the pg_xlog directory of the recovery cluster. Don't delete
> the backup_label file, because it has the information recovery
> needs about the point from which it should start WAL replay --
> without it, it will have to guess, and is very likely to get that
> wrong.
>
> The documentation is your friend. It gives pretty specific
> instructions for what to do.
>
> --
> Kevin Grittner
> EDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
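For reference, here is a minimal sketch of the rebuild procedure as it ends up looking with the advice above applied: pg_basebackup streams the WAL it needs (-X s) with an immediate checkpoint (-c fast), backup_label is left alone, and restore_command pulls from a WAL archive instead of files copied out of the primary's pg_xlog. The hostname, replication user, and archive path are assumptions for illustration, not values from this thread.

    # Sketch only -- hostname, user, and archive path are hypothetical.
    # Run on the standby as the postgres user.

    PGDATA=/var/lib/pgsql/9.3/data

    # 1) Stop the standby and clear its data directory.
    service postgresql-9.3 stop
    rm -rf "${PGDATA:?}"/*

    # 2) Take a fresh base backup. -X s (stream) pulls the WAL needed for a
    #    consistent start along with the backup, so nothing has to be copied
    #    by hand from the primary's pg_xlog; -c fast requests an immediate
    #    checkpoint instead of waiting for the next scheduled one.
    pg_basebackup -h primary.example.com -U replication -D "$PGDATA" -X s -c fast -P

    # 3) Do NOT remove backup_label -- recovery reads it to find the correct
    #    starting checkpoint for WAL replay.

    # 4) Drop in a 9.3-era recovery.conf, restoring any archived WAL from a
    #    shared archive rather than from pg_xlog copies.
    cat > "$PGDATA/recovery.conf" <<'EOF'
    standby_mode = 'on'
    primary_conninfo = 'host=primary.example.com user=replication'
    restore_command = 'cp /mnt/wal_archive/%f "%p"'
    EOF

    # 5) Start the standby.
    service postgresql-9.3 start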