Thread: New Slave - timeline ERROR
I've set up a new SLAVE PostgreSQL server.
* NOTE: I ran the pg_basebackup from another STANDBY SERVER, not from the MASTER
1 - screen -t basebackup
2 - su - postgres
3 - cd ~/9.2/data/
4 - ssh postgres@slave01 'pg_basebackup --pgdata=- --format=tar --label=bb_master --progress --host=localhost --port=5432 --username=replicator --xlog | pv --quiet --rate-limit 100M' | tar -x --no-same-owner
5 - I commented out "primary_conninfo =" and "standby_mode =" so the slave can get the files from WAL_ARCHIVE
6 - After that, I got the logs:
postgres(iostreams)[10037]: 2016-01-09 00:07:26.604 UTC|10085|LOG: database system is ready to accept read only connections
7 - After the server finished replaying the WAL_ARCHIVE, I turned replication from the MASTER back on in recovery.conf:
recovery.conf on the New Slave:
restore_command = 'exec nice -n 19 ionice -c 2 -n 7 ../../bin/restore_wal_segment.bash "../wal_archive/%f" "%p"'
archive_cleanup_command = 'exec nice -n 19 ionice -c 2 -n 7 ../../bin/pg_archivecleaup_mv.bash -d "../wal_archive" "%r"'
recovery_target_timeline = 'latest'
standby_mode = on
primary_conninfo = 'host=192.168.100.XX port=5432 user=replicator application_name=replication_slave02'
But once I restarted PostgreSQL, I got this error:
WAL segment `../wal_archive/00000005.history` not found
2016-01-09 01:13:39.183 UTC|774|FATAL: timeline 2 of the primary does not match recovery target timeline 4
What can I do to solve the problem?
It's really important as it's a production New Slave. Thank you!
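The "timeline 2 of the primary does not match recovery target timeline 4" error means recovery_target_timeline = 'latest' resolved to timeline 4 from history files found in the archive, while the server named in primary_conninfo is still on timeline 2. One way to see where the target timeline came from is to inspect the NNNNNNNN.history files in wal_archive/ (or pg_xlog/). A minimal sketch, using a fabricated history file rather than a real cluster's:

```shell
# Fabricated example of a timeline history file (real ones live in
# pg_xlog/ or the WAL archive, e.g. 00000004.history). Each line is:
# parent-timeline  switchpoint-LSN  reason.
cat > /tmp/00000004.history <<'EOF'
1	0/B000090	no recovery target specified
2	0/C000060	no recovery target specified
3	0/D0000F0	no recovery target specified
EOF

# The file 0000000N.history describes how timeline N was created: its
# last line names N's direct parent. Here, parent 3 -> timeline 4.
latest_parent=$(awk 'NF {tli=$1} END {print tli}' /tmp/00000004.history)
echo "this file describes timeline $((latest_parent + 1))"
```

If the archive holds history files for timelines the intended primary never switched to (for example, left over from a promotion of a different standby), 'latest' will chase them, and streaming from the timeline-2 primary fails with exactly this mismatch.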
time pg_basebackup --pgdata=$PGDATA --host=IP_OF_MASTER --port=5432 --username=replication --password --xlog-method=stream --format=plain --progress --verbose
--xlog
Using this option is equivalent of using -X with method fetch.
On Fri, Jan 8, 2016 at 8:44 PM, drum.lucas@gmail.com <drum.lucas@gmail.com> wrote:
Hi, I'm a bit too lazy to try to suss out the exact reasons for your failure, but here is a reasonably thorough guide to setting up replication:
http://dba.stackexchange.com/a/53546/24393

A few tips:
- Having the master ship WALs to the slaves is handy if you can pull it off. If you are doing it over the wire and using rsync, "-z" for compression is recommended. If you are doing the tar format of the pg_basebackup, you *must* have the master ship the WALs to the slave, otherwise it won't be able to synchronize (the "stream" method ships WALs over the wire, so the end result is a synchronized system).
- I always run pg_basebackup from the slave I am building, for simplicity.
- I create new slaves almost every day (we have thousands of databases) using a bash script, and it almost never fails. In essence it is a big wrapper around the pg_basebackup command (though we are using pg93 mostly).

The base backup command that I run from the slave I am building:
pg_basebackup --pgdata=$PGDATA --host=$MASTER_IP --port=$PGPORT --username=replication --no-password --xlog-method=stream --format=plain --progress --verbose

The recovery.conf:
standby_mode = 'on'
primary_conninfo = 'user=replication host=$IP_OF_UPSTREAM_SLAVE_OR_MASTER port=5432 sslmode=prefer sslcompression=1 krbsrvname=postgres'
recovery_target_timeline = 'latest'
archive_cleanup_command = '/usr/pgsql-9.3/bin/pg_archivecleanup /path/to/WALs %r'
restore_command = 'cp /path/to/WALs/%f "%p" 2>> /your/PGDATA/path/pg_log/standby.log'
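The restore_command above relies on Postgres substituting %f (the WAL file name) and %p (the destination path). A throwaway simulation of that substitution, using scratch paths instead of a real cluster:

```shell
# Scratch stand-ins for the WAL archive and the data directory; no real
# cluster involved. Postgres itself would expand %f and %p at run time.
mkdir -p /tmp/WALs /tmp/pgdata/pg_xlog /tmp/pgdata/pg_log
echo "fake wal payload" > /tmp/WALs/000000010000000000000001

f=000000010000000000000001           # what Postgres passes as %f
p=/tmp/pgdata/pg_xlog/RECOVERYXLOG   # what Postgres passes as %p

# Same shape as the restore_command above, stderr appended to a log:
cp /tmp/WALs/$f "$p" 2>> /tmp/pgdata/pg_log/standby.log
ls "$p"
```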
Do you have any other idea? Do you know if --xlog is the problem, and whether I should re-run pg_basebackup with the --xlog-method=stream option?
If the master is successfully shipping WALs to the slave you are setting up, you do not need the "stream" option.
Hi,
> If the master is successfully shipping WALs to the slave you are setting up, you do not need the "stream" option.
Yes, the master is successfully shipping the WALs. Is there anything else? Help, please hehehehe
If you are able to stop Postgres on the slave you are taking the base backup from, you could do this:
1) Stop Postgres on slave1.
2) Rsync slave1 to slave2 to copy only the deltas.
3) When you start up slave2, the WALs that the master has shipped to slave2 should apply and bring your system up to consistency.
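Step 2 works because rsync transfers only files that differ. A toy illustration of that, with two scratch directories standing in for slave1 and slave2 (rsync is assumed to be installed; -c compares by checksum so identical content is skipped regardless of timestamps):

```shell
mkdir -p /tmp/slave1_data /tmp/slave2_data
echo v1 > /tmp/slave1_data/unchanged   # identical content on both sides
echo v1 > /tmp/slave2_data/unchanged
echo v2 > /tmp/slave1_data/changed     # only this delta gets sent
echo v1 > /tmp/slave2_data/changed

# -a preserves attributes, -c skips files with identical checksums,
# --itemize-changes shows which files were actually transferred.
rsync -ac --itemize-changes /tmp/slave1_data/ /tmp/slave2_data/
cat /tmp/slave2_data/changed
```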
Hi,
> If you are able to stop Postgres on the slave you are taking the base backup from, you could do this:
I'm not... the database is 2 TB. So an rsync would take DAYS... and I'm not able to stop the SLAVE for that long.
Depending on when you took the base backup and how many changes have occurred at your source (slave1) database cluster, the rsync execution time may or may not take as long as a new base backup, since it is only shipping deltas (changed files).
I could stop the slave, then... But I'm afraid of bringing it back online and getting some other errors.
At this point I think your options are slim. If you are feeling adventurous, you can try doing the rsync with the slave running, then doing a second rsync with the slave stopped, or doing it from the master after putting the master in backup mode (e.g. executing "pg_start_backup('slave_backup')").
rsync would be something like, from slave1:
rsync -av data/pg_xlog slave2:/var/lib/postgresql/data/
Normally I run something like the following from the slave I am setting up:
rsync -azr --progress --partial postgres@$MASTER_IP:/var/lib/postgresql/data /var/lib/postgresql/data/ --exclude postmaster.pid
I didn't understand why I'd do the rsync twice... sorry
Unless the source db cluster you are rsync'ing from is stopped, there will be changes to data files replicated from the master. The second rsync might not be necessary given the WALs are shipping from the master to slave2.
Ah ok! So this will do an incremental copy, right? It's not supposed to copy ALL the base/ again?
On Sunday, 10 January 2016, bricklen <bricklen@gmail.com> wrote:
Yes, this is for incremental copying from the upstream source. Actually, you don't need the -r with -a (it is implied), and you can run it first with --dry-run to see what it _would_ do. If you are not shipping over the WAN, then omit the -z flag, as you do not need compression.
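The --dry-run flag is cheap to sanity-check. A scratch example showing both points at once: -a alone picks up a nested directory (so -r is indeed redundant), and the dry run writes nothing:

```shell
mkdir -p /tmp/dr_src/nested /tmp/dr_dst
echo hi > /tmp/dr_src/nested/file

# -a implies -r, so the nested dir is listed; --dry-run transfers nothing.
rsync -a --dry-run --itemize-changes /tmp/dr_src/ /tmp/dr_dst/
ls -A /tmp/dr_dst    # still empty after the dry run
```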
--
Should I point of replication new slave to same DB?
I can't even guess what you're asking here.
-- john r pierce, recycling bits in santa cruz
Bottom-posting is the convention in the postgresql lists, and makes it easier to follow a long thread.
On Sat, Jan 9, 2016 at 3:16 PM, drum.lucas@gmail.com <drum.lucas@gmail.com> wrote:
> My servers are not in the same network. A new pg_basebackup would take 30 hours to complete, as I use --rate-limit 100MB.
If you had enough bandwidth, you could do some shell magic to parallelize the rsync commands, or use something like http://moo.nac.uci.edu/~hjm/parsync/ to do that. If you are limited by bandwidth, then a single rsync run is probably what you're stuck with.
> I really need to put this server up! =\
If you were running zfs, you could also take a snapshot of the fs and use that for your base backup, but I assume you would have mentioned that if it was an option.
> I don't think that running pg_basebackup one more time will solve the problem, because I've already done that! I could run it, actually, but the problem is that it takes 30h! hahahahah
What is the --pgdata=- in your original command? Are you perhaps in the wrong directory and not getting all the required files?
I'm out of fresh ideas. The rsync command is what I would go with, given that I can't think of any other commands to try.
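The "shell magic to parallelize the rsync commands" can be as simple as fanning subdirectories out to concurrent jobs with xargs -P. A toy version using cp between scratch directories (in real use each job would be an rsync of one subdirectory, e.g. each base/<oid>):

```shell
mkdir -p /tmp/par_src/a /tmp/par_src/b /tmp/par_src/c /tmp/par_dst
echo data > /tmp/par_src/a/f
echo data > /tmp/par_src/b/f
echo data > /tmp/par_src/c/f

# -P 3 runs up to three copy jobs concurrently, one per subdirectory.
ls /tmp/par_src | xargs -P 3 -I{} cp -r /tmp/par_src/{} /tmp/par_dst/{}
ls /tmp/par_dst
```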
Have a look. Note that there are some limitations in an online backup from the standby:
- The backup history file is not created in the database cluster backed up.
- There is no guarantee that all WAL files required for the backup are archived at the end of backup. If you are planning to use the backup for an archive recovery and want to ensure that all required files are available at that moment, you need to include them in the backup by using the -x option.
You had that in your original command, I believe.
> What is the --pgdata=- in your original command? Are you perhaps in the wrong directory and not getting all the required files?
I run the pg_basebackup from the Slave on /var/lib/pgsql/9.2/data, so I'm not in the wrong directory.
> I'm out of fresh ideas. The rsync command is what I would go with, given that I can't think of any other commands to try.
I chose the pg_basebackup command just so I wouldn't have to stop any database. It's out of the question to stop even the slave one... sorry. I really don't know what else to do. I have tried everything!
Lucas