I'm in the single-slave scenario, with hot standby capabilities, meaning I want to run queries on the slave. I'm
running some tests to evaluate pgbarman, on Ubuntu 11.10. I used only packaged PostgreSQL, and I'm running version
"PostgreSQL 9.1.5 on x86_64-pc-linux-gnu, compiled by gcc-4.6.real (Ubuntu/Linaro 4.6.1-9ubuntu3) 4.6.1, 64-bit". Both
the master and the slave are running on the same host.
master/postgresql.conf:
port = 5432
archive_mode = on
wal_level = hot_standby
max_wal_senders = 3
wal_keep_segments = 256
archive_command = '/bin/cp --verbose %p /var/pgexchange/%f'
master/pg_hba.conf (as I said, testing config only):
host replication postgres 127.0.0.1/32 trust
slave/postgresql.conf:
port = 5433
hot_standby = on
hot_standby_feedback = on
max_standby_archive_delay = -1
max_standby_streaming_delay = -1
slave/pg_hba.conf -- all at default
/var/lib/postgresql/9.1/slave0/recovery.conf:
standby_mode = on
restore_command = '/bin/cp --verbose /var/pgexchange/%f %p'
primary_conninfo = 'host=localhost port=5432 user=postgres password=supersecretpassword'
The slave's log says it's connected to the master, but I can't connect.
# psql -h localhost -p 5433 -U postgres -d mydb
psql: FATAL: the database system is starting up
FATAL: the database system is starting up
The slave's log, after a fresh pg_basebackup + restore for the slave, contains:
==> /var/log/postgresql/postgresql-9.1-slave0.log <==
2012-09-25 00:46:22 UTC LOG: database system was interrupted; last known up at 2012-09-25 00:44:20 UTC
2012-09-25 00:46:22 UTC LOG: creating missing WAL directory "pg_xlog/archive_status"
2012-09-25 00:46:22 UTC LOG: entering standby mode
`/var/pgexchange/000000010000000000000016' -> `pg_xlog/RECOVERYXLOG'
2012-09-25 00:46:22 UTC LOG: restored log file "000000010000000000000016" from archive
2012-09-25 00:46:23 UTC LOG: redo starts at 0/16000020
2012-09-25 00:46:23 UTC LOG: consistent recovery state reached at 0/17000000
/bin/cp: cannot stat `/var/pgexchange/000000010000000000000017': No such file or directory
2012-09-25 00:46:23 UTC LOG: incomplete startup packet
2012-09-25 00:46:23 UTC LOG: streaming replication successfully connected to primary
2012-09-25 00:46:23 UTC FATAL: the database system is starting up
2012-09-25 00:46:24 UTC FATAL: the database system is starting up
2012-09-25 00:46:24 UTC FATAL: the database system is starting up
The "system is starting up" messages are the result of the pg_ctlcluster script, which attempts to connect to the database to
check if the server's up and available. According to the log above, a consistent state is reached, and the slave
connects to the primary. During the slave's reconnection, the master emits no messages.
On the master, pg_stat_replication looks fine:
# select * from pg_stat_replication ;
 procpid | usesysid | usename  | application_name | client_addr | client_hostname | client_port |         backend_start         |   state   | sent_location | write_location | flush_location | replay_location | sync_priority | sync_state
---------+----------+----------+------------------+-------------+-----------------+-------------+-------------------------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
   27920 |       10 | postgres | walreceiver      | 127.0.0.1   |                 |       52193 | 2012-09-25 00:46:23.100631+00 | streaming | 0/17000000    | 0/17000000     | 0/17000000     | 0/17000000      |             0 | async
state == streaming; sent == write == flush == replay, so the slave seems to be consistent.
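The same comparison can be written as a query on the master, so it doesn't have to be read off by eye. This is only a sketch: it uses the 9.1 column names from the output above, and the "caught_up" alias is just a name I picked for illustration.

```sql
-- Run on the master (port 5432).
-- caught_up is true when everything sent to the slave has also been replayed there.
SELECT procpid,
       state,
       sent_location = replay_location AS caught_up
FROM pg_stat_replication;
```

For the row shown above, all four locations are 0/17000000, so caught_up would be true.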
What am I missing here?
Thanks!
François