Re: Issues with log-shipping replication - Mailing list pgsql-admin
From | Rural Hunter |
---|---|
Subject | Re: Issues with log-shipping replication |
Date | |
Msg-id | 4EEA9705.6090806@gmail.com Whole thread Raw |
In response to | Issues with log-shipping replication (Khusro Jaleel <mailing-lists@kerneljack.com>) |
Responses |
Re: Issues with log-shipping replication
|
List | pgsql-admin |
well, is pgsql capable to parse shell variables in postgresql.conf? anway you should check your master log. If it can not archive the wal, there will be errors reported in it. 于2011年12月16日 2:02:03,Khusro Jaleel写到: > Hello, I'm trying out a simple example from the Postgresql 9 > Administration Cookbook about File-based log shipping replication, but > I can't get it to work between 2 CentOS 5.7 VMs running Postgresql > 9.1. Here are my settings: > > .bash_profile on master: > export PGARCHIVE=/var/lib/pgsql/pgarchive > export STANDBYNODE=172.16.1.56 > > .bash_profile on slave: > export PGARCHIVE=/var/lib/pgsql/pgarchive > > postgresql.conf on master: > =================================================================== > # - Settings - > > wal_level = archive # minimal, archive, or hot_standby > # (change requires restart) > #fsync = on # turns forced synchronization on or off > #synchronous_commit = on # synchronization level; on, off, or local > #wal_sync_method = fsync # the default is the first option > # supported by the operating system: > # open_datasync > # fdatasync (default on Linux) > # fsync > # fsync_writethrough > # open_sync > #full_page_writes = on # recover from partial page writes > #wal_buffers = -1 # min 32kB, -1 sets based on shared_buffers > # (change requires restart) > #wal_writer_delay = 200ms # 1-10000 milliseconds > > #commit_delay = 0 # range 0-100000, in microseconds > #commit_siblings = 5 # range 1-1000 > > # - Checkpoints - > > #checkpoint_segments = 3 # in logfile segments, min 1, 16MB each > #checkpoint_timeout = 5min # range 30s-1h > #checkpoint_completion_target = 0.5 # checkpoint target duration, 0.0 > - 1.0 > #checkpoint_warning = 30s # 0 disables > > # - Archiving - > > archive_mode = on # allows archiving to be done > # (change requires restart) > archive_command = 'scp %p $STANDBYNODE:$PGARCHIVE/%f' > archive_timeout = 30 # force a logfile segment switch after this # > number of seconds; 0 disables > =================================================================== > > recovery.conf on slave: > =================================================================== > standby_mode = 'on' > restore_command = 'cp $PGARCHIVE/%f %p' > archive_cleanup_command = '/usr/pgsql-9.1/bin/pg_archivecleanup > $PGARCHIVE %r' > trigger_file = '/tmp/postgresql.trigger.5432 > =================================================================== > > do_backup.sh script to make an initial copy of master onto slave: > =================================================================== > #!/bin/bash > > psql -c "select pg_start_backup('base backup for log shipping')" > rsync -cva --inplace --exclude=*pg_xlog* ${PGDATA}/ $STANDBYNODE:$PGDATA > psql -c "select pg_stop_backup(), current_timestamp" > =================================================================== > > The procedure I am using to test this is: > > 1. Clear out the data dir on the slave, put only the recovery.conf > file there > 2. Run the 'do_backup.sh' script to copy the master data dir to the > slave, excluding pg_xlog > 3. Create the pg_xlog dir on the slave > 4. Make sure the $PGARCHIVE dir exists on the slave. Note that this is > completely outside the normal 'data' dir > 5. Start up the master. As soon as I do this, I see some archive files > appear in the $PGARCHIVE dir on the slave. This means the 'scp' > 'archive_command' I am using IS working. > 6. Now start up the slave, which prints the following to the log in > 9.1/data/pg_log: > > ======================================================================== > LOG: startup process (PID 4771) exited with exit code 1 > LOG: aborting startup due to startup process failure > LOG: database system was interrupted; last known up at 2011-12-08 > 19:45:10 UTC > LOG: creating missing WAL directory "pg_xlog/archive_status" > LOG: entering standby mode > LOG: restored log file "000000010000000000000027" from archive > LOG: redo starts at 0/27000078 > LOG: consistent recovery state reached at 0/28000000 > cp: cannot stat `/var/lib/pgsql/pgarchive/000000010000000000000028': > No such file or directory > cp: cannot stat `/var/lib/pgsql/pgarchive/000000010000000000000028': > No such file or directory > ======================================================================== > > Looking at the master, the file '000000010000000000000028' does exist > in 9.1/data/pg_xlog, but for some reason the master is NOT copying it > over to the slave. The slave has all the files prior to this, but not > anything after. > > Anybody got any ideas why? > > > >
pgsql-admin by date: