Thread: Question about wal files / pg_xlogs
On 08/03/2016 07:21 PM, Patrick B wrote:

Hi all,

I'm currently using PostgreSQL 9.2. I noticed that the WAL files are being generated fine by the master, no problems. But on the slaves there seems to be a delay in the delivery of those WAL files.

I have two slaves using streaming replication plus WAL file shipment from the master.

On the master:

ls -ltr /var/lib/pgsql/9.2/data/pg_xlogs/
Aug 4 02:18 000000020000159D000000D1
Aug 4 02:18 000000020000159D000000D2
Aug 4 02:18 000000020000159D000000D3
select * from pg_current_xlog_location();
159D/D6C8DAF8
So, that seems to be OK.

On the slave:
ls -ltr /var/lib/pgsql/9.2/wal_archive:
Aug 4 00:58 000000020000159C00000071
Aug 4 00:58 000000020000159C00000072
Aug 4 00:58 000000020000159C00000073
See the time difference? 2 hours? It seems the files are being delivered with a 2-hour delay.

The streaming replication is working fine... but if that goes down, I'll need the wal_files up to date to recover the database.

How can I see what's going on? What would be the steps? Any tips?

Cheers,
Patrick.

Adrian Klaver (adrian.klaver@aklaver.com) replied with a few questions, which Patrick answered inline:
> Both machines have same timezone?

Yes! It shouldn't be showing 2 hours behind... I just checked and both servers have the same date and timezone.

> How fast are you generating WALs?
checkpoint_segments = 64
checkpoint_timeout = 5min
checkpoint_completion_target = 0.6
checkpoint_warning = 30s
archive_timeout = 1800
max_wal_senders = 8
wal_keep_segments = 256
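For reference, one quick way to gauge how fast segments are actually being produced on 9.2 is to watch which segment the master is currently writing and how many segment files were touched recently; a sketch using the path from the listings above (a stock install names the directory pg_xlog, so adjust if needed):

# which segment is the master writing right now? (run on the master as a superuser)
psql -c "SELECT pg_xlogfile_name(pg_current_xlog_location());"

# roughly how many segment files were written to in the last 30 minutes
find /var/lib/pgsql/9.2/data/pg_xlogs/ -maxdepth 1 -type f -mmin -30 | wc -l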
> How are you shipping the WALs?
I use a bash script to ship them. The script hasn't been changed, so it isn't the problem.

postgresql.conf:

archive_command = 'exec nice -n 19 ionice -c 2 -n 7 archive_command.bash "%p" slave01 slave02'

archive_command.bash:

Basically we use tar to ship through ssh:
# we use tar over SSH as I don't fully trust scp's exit status. The added benefit is that tar preserves all attributes
# the downside is that it's a little tricky to make the remote path relative
tar -c -O --no-same-owner -C "${WAL_SEGMENT%/*}" "${WAL_SEGMENT##*/}" | ssh -p ${NEXT_PORT} -C -o 'BatchMode=yes' -o 'CompressionLevel=3' "${USER}@${NEXT_HOST}" "exec tar -x --no-same-owner --overwrite -C '${WAL_ARCHIVE_PATH}'";
PS_CONCAT="${PIPESTATUS[*]}";
The script is complex, but as I said, nothing has been changed on it.
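For reference, a minimal sketch of the tar-over-ssh shipping approach described above; the variable names follow the posted fragment, but the surrounding script structure, the ssh user, and the port are assumptions, not Patrick's actual script:

#!/bin/bash
# archive_command.bash <path-to-wal-segment> <slave-host>...
# Sketch: ships one WAL segment to each listed host via tar over ssh, as in the
# fragment above. Exits non-zero on any failure so PostgreSQL retries the segment.
set -u

WAL_SEGMENT="$1"; shift                              # %p, relative to the data directory
WAL_ARCHIVE_PATH="/var/lib/pgsql/9.2/wal_archive"    # destination on the slaves (from the thread)
NEXT_PORT=22                                         # assumed ssh port
USER="postgres"                                      # assumed ssh user

for NEXT_HOST in "$@"; do
    tar -c -O --no-same-owner -C "${WAL_SEGMENT%/*}" "${WAL_SEGMENT##*/}" \
        | ssh -p "${NEXT_PORT}" -C -o 'BatchMode=yes' -o 'CompressionLevel=3' \
              "${USER}@${NEXT_HOST}" \
              "exec tar -x --no-same-owner --overwrite -C '${WAL_ARCHIVE_PATH}'"
    PS_CONCAT="${PIPESTATUS[*]}"
    if [ "${PS_CONCAT}" != "0 0" ]; then             # both sides of the pipe must exit 0
        echo "shipping ${WAL_SEGMENT} to ${NEXT_HOST} failed (pipestatus: ${PS_CONCAT})" >&2
        exit 1
    fi
done
exit 0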
normally, you would ship the archived wal files to a file server via cp-over-nfs or scp, and have the slaves access them as needed via the recovery.conf
-- john r pierce, recycling bits in santa cruz
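A minimal recovery.conf along the lines John describes might look like this (a sketch; the connection details are placeholders, the archive path is the one from the thread):

# recovery.conf on a 9.2 standby
standby_mode = 'on'
primary_conninfo = 'host=master.example.com port=5432 user=replicator'    # placeholder connection info
restore_command = 'cp /var/lib/pgsql/9.2/wal_archive/%f "%p"'             # fetch archived segments as needed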
Adrian Klaver replied:
Seems to me the settings for nice and ionice above would, on a busy machine, slow down the transfer. Has there always been a notable time difference in the transfer or has it gotten worse over time?
On 08/04/2016 12:55 PM, Patrick B wrote:

@Adrian,

> Seems to me the settings for nice and ionice above would, on a busy machine, slow down the transfer.

Yep... I also thought about that, especially because the master is constantly at 100% I/O (we still use SATA disks)... I'm thinking about removing that ionice command... I don't need to restart Postgres, eh? Just reload the confs?

@John R Pierce,

> normally, you would ship the archived wal files to a file server via cp-over-nfs or scp, and have the slaves access them as needed via the recovery.conf
What if the NFS server goes down? Or the network goes down? We have had that kind of problem in the past; that's why I'm shipping the wal_files to each slave separately, and also to have an extra copy of them.
@Venkata Balaji N,
> Not sure why the script is so complex. Do you see any messages in the postgresql log file on the master, or on the slave, which indicate the reason for the delayed shipping of WAL archives? Did you notice any network-level issues?
Yes, the script is complex... I've hidden almost all of it for privacy purposes, sorry.

I don't see any messages in the log files, neither on the master nor on the slaves. I just see the messages saying the wal_files were successfully shipped to the slaves.

Also, no network-level issues: I've got four slaves with streaming replication and all of them are working fine. Also, my backup server has never failed, so no networking issues.
Thanks,
Patrick
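For what it's worth, on 9.2 a quick way to tell whether the archiver itself is falling behind (rather than the files merely arriving with odd timestamps) is to look at the .ready files under archive_status on the master; a sketch using the directory name from the listings above (a stock install calls it pg_xlog, so adjust the path to match):

# segments still waiting to be archived; a growing count means archive_command cannot keep up
ls /var/lib/pgsql/9.2/data/pg_xlogs/archive_status/ | grep -c '\.ready$'

# the newest .done entries show when the archiver last finished a segment
ls -ltr /var/lib/pgsql/9.2/data/pg_xlogs/archive_status/ | tail -5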
Adrian Klaver replied:
https://www.postgresql.org/docs/9.5/static/continuous-archiving.html#BACKUP-ARCHIVING-WAL
"However, archive_command can be changed with a configuration file reload."
On 08/04/2016 01:16 PM, Patrick B wrote:

Cheers... I removed the IONICE command from the archive_command. However, I did not see any difference.

Any idea?

Adrian Klaver replied:

Well, you just did it, so how would it be possible to notice whether it took two hours or not?
Just out of curiosity, are the slaves in the same physical location, or by some chance are they on a remote site?
--
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.

> Just out of curiosity, are the slaves in the same physical location, or by some chance are they on a remote site?

Two of them are in the same physical location, and the other two are in a different country.
> two of them in the same physical location, and the other two different country.

And the time difference is on ALL slaves, or just the two in a different country?

--
I reserve the right to fantasize. Whether or not you
wish to share my fantasy is entirely up to you.
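Since two of the slaves are in another country, it may also be worth timing how long one 16 MB segment takes to cross that link the same way archive_command.bash sends it; a rough sketch (the remote host name is a placeholder, any current segment can be substituted, and the WAL path is the one from the listings above):

# ship one segment the same way archive_command.bash does, just to /dev/null, and time it
time tar -c -O -C /var/lib/pgsql/9.2/data/pg_xlogs 000000020000159D000000D1 \
    | ssh -C -o 'CompressionLevel=3' postgres@remote-slave.example.com 'cat > /dev/null'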
Hello!
As far as I can see, the files are being delivered not with a delay but with a time shift.

1. Can you show me the restore_command on the slave?
2. Also, can you check the creation time of the archived WAL on the slaves, in the archive location, after they have been copied by archive_command? Is it near the WAL creation time in pg_xlogs, or different?
3. How do you check timezone equivalence between master and slave? What
--
Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
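For reference, checks 2 and 3 could be run along these lines on both machines and then compared (the paths and the segment name are taken from the listings earlier in the thread):

# 2. modification time of the same segment where it was produced and where it arrived
stat -c '%y %n' /var/lib/pgsql/9.2/data/pg_xlogs/000000020000159D000000D1    # on the master
stat -c '%y %n' /var/lib/pgsql/9.2/wal_archive/000000020000159D000000D1      # on the slave, after shipping

# 3. clocks and timezones as the OS and PostgreSQL see them, on both machines
date; date -u
psql -c "SHOW timezone;"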