Thread: Replication/cloning: rsync vs modification dates?

Replication/cloning: rsync vs modification dates?

From

Chris Angelico

Date:

16 July 2012, 05:28:50

I'm speccing up a three-node database for reliability, making use of
streaming replication, and it's all working but I have a bit of a
performance concern.

Suppose a node dies and is removed from the cluster, but then returns
(say, a day or two later). I could, of course, utterly wipe the
existing data on that node and take a fresh copy from the master, but
that would entail transferring the entire content of the database. The
recommended option appears to be rsync, which saves on network
traffic, but still has to read and hash every byte of data.
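For files rsync does decide to transfer, it saves network traffic with its delta algorithm. The receiver splits its old copy into fixed-size blocks and sends a checksum for each one. The sender then slides a window over its new copy, using a cheap "rolling" checksum that updates in O(1) per byte, to find blocks the receiver already has. Both sides still read the whole file, which is the cost Chris describes. Below is a minimal Python sketch of the weak rolling checksum from the rsync technical report. The 2^16 modulus is assumed and the function names are hypothetical. Real rsync also keeps a hash table of block checksums and confirms each candidate match with a strong hash.

```python
M = 1 << 16  # modulus for the two 16-bit halves (assumed, following the rsync paper)


def weak_checksum(block: bytes) -> tuple:
    """Return the (a, b) pair for one block, computed from scratch."""
    n = len(block)
    a = sum(block) % M
    # b weights each byte by its distance from the end of the block (n, n-1, ..., 1)
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple:
    """Slide an n-byte window one position: drop out_byte, append in_byte."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def find_block(new_data: bytes, old_block: bytes) -> list:
    """Offsets in new_data where old_block occurs, located via the rolling checksum."""
    n = len(old_block)
    if n == 0 or len(new_data) < n:
        return []
    target = weak_checksum(old_block)
    a, b = weak_checksum(new_data[:n])
    hits = []
    for k in range(len(new_data) - n + 1):
        # A weak match is only a candidate; rsync confirms it with a strong hash,
        # replaced here by a direct byte comparison.
        if (a, b) == target and new_data[k:k + n] == old_block:
            hits.append(k)
        if k + n < len(new_data):
            a, b = roll(a, b, new_data[k], new_data[k + n], n)
    return hits
```

For example, `find_block(b"hello world, hello rsync", b"hello")` finds the block at offsets 0 and 13. Only the window's checksum is recomputed at each step, never the whole block.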

Can the individual files' modification timestamps be relied upon? If
so, it'd potentially mean a lot of savings, as the directory entries
can be read fairly efficiently. I could still then use rsync to
transfer those files (so if it's only a small part that's changed, we
take advantage of its optimizations too).
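Size and modification time live in the inode, so collecting them takes one `stat()` per file and reads no file contents. That is where the saving Chris has in mind comes from. Here is a minimal sketch, assuming a Python helper runs against a data directory. The names `metadata_manifest` and `changed_since` are hypothetical.

```python
import os
from typing import Dict, List, Tuple


def metadata_manifest(root: str) -> Dict[str, Tuple[int, int]]:
    """Map each file under root (relative path) to (size, mtime_ns) via stat() only."""
    manifest: Dict[str, Tuple[int, int]] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)  # metadata only: no file contents are read
            manifest[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return manifest


def changed_since(current: Dict[str, Tuple[int, int]],
                  previous: Dict[str, Tuple[int, int]]) -> List[str]:
    """Paths that are new, or whose (size, mtime) differs from the previous manifest."""
    return sorted(p for p, sig in current.items() if previous.get(p) != sig)
```

Only the paths returned by `changed_since` would then need rsync's delta transfer. Everything else is skipped without being opened.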

This may be digging too deep into the internals to be dependable for
future versions. If so, I'd rather put the extra load on the servers
than risk a future upgrade breaking replication subtly.

Chris Angelico

Re: Replication/cloning: rsync vs modification dates?

From

Michael Nolan

Date:

16 July 2012, 12:41:07

On 7/16/12, Chris Angelico <rosuav@gmail.com> wrote:
> I'm speccing up a three-node database for reliability, making use of
> streaming replication, and it's all working but I have a bit of a
> performance concern.
>
>
> Can the individual files' modification timestamps be relied upon? If
> so, it'd potentially mean a lot of savings, as the directory entries
> can be read fairly efficiently. I could still then use rsync to
> transfer those files (so if it's only a small part that's changed, we
> take advantage of its optimizations too).

I did several weeks of tests on 9.1.3 using mod time and file size
rather than checksumming the files. That did not appear to cause any
problems, and it sped up the rsync considerably. (This was about a 40 GB
database.)
--
Mike Nolan

Re: Replication/cloning: rsync vs modification dates?

From

Chris Angelico

Date:

16 July 2012, 12:45:33

On Tue, Jul 17, 2012 at 1:40 AM, Michael Nolan <htfoot@gmail.com> wrote:
> I did several weeks of tests on 9.1.3 using mod time and file size
> rather than checksumming the files. That did not appear to cause any
> problems, and it sped up the rsync considerably. (This was about a 40 GB
> database.)

Thanks! Is file size a necessary part of the check, or can mod time
alone cover it?

I'm looking at having my monitoring application automatically bring
database nodes up, so it looks like the simplest way to handle it will
be to have the new slave always do the backup/rsync, even if
it's been down for only a couple of minutes. With a mod time check, I
could hopefully do this without too much hassle.

ChrisA

Re: Replication/cloning: rsync vs modification dates?

From

Michael Nolan

Date:

16 July 2012, 12:59:11

On 7/16/12, Chris Angelico <rosuav@gmail.com> wrote:
> On Tue, Jul 17, 2012 at 1:40 AM, Michael Nolan <htfoot@gmail.com> wrote:
>> I did several weeks of tests on 9.1.3 using mod time and file size
>> rather than checksumming the files. That did not appear to cause any
>> problems, and it sped up the rsync considerably. (This was about a 40 GB
>> database.)
>
> Thanks! Is file size a necessary part of the check, or can mod time
> alone cover it?
>
> I'm looking at having my monitoring application automatically bring
> database nodes up, so it looks like the simplest way to handle it will
> be to have the new slave always do the backup/rsync, even if
> it's been down for only a couple of minutes. With a mod time check, I
> could hopefully do this without too much hassle.

As I understand the docs for rsync, it will use both mod time and file size
if told not to do checksums.
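Here is a sketch of the comparison rules under discussion, using hypothetical names. rsync's default "quick check" treats a file as changed if either its size or its mtime differs. `--checksum` (`-c`) compares full-file checksums instead, and MD5 below stands in for whatever digest a given rsync version uses. `mtime_only_differs` is a hypothetical rule that rsync does not offer. It is included to answer Chris's question of what dropping the size check would lose.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    size: int
    mtime: int   # seconds since the epoch, for simplicity
    data: bytes  # contents; only the checksum rule reads these


def quick_check_differs(src: FileInfo, dst: FileInfo) -> bool:
    """rsync's default: changed if the size OR the mtime differs."""
    return src.size != dst.size or src.mtime != dst.mtime


def checksum_differs(src: FileInfo, dst: FileInfo) -> bool:
    """--checksum style: compare sizes, then full-file digests (MD5 as a stand-in)."""
    if src.size != dst.size:
        return True
    return hashlib.md5(src.data).digest() != hashlib.md5(dst.data).digest()


def mtime_only_differs(src: FileInfo, dst: FileInfo) -> bool:
    """Hypothetical rule (not an rsync option): look at the mtime alone."""
    return src.mtime != dst.mtime
```

Two versions can end up sharing an mtime, for example when they fall within one timestamp tick. In that case a differing size still flags the change. When size and mtime both match, only the checksum rule notices that the bytes differ.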

Re: Replication/cloning: rsync vs modification dates?

From

Chris Angelico

Date:

16 July 2012, 13:01:26

On Tue, Jul 17, 2012 at 1:58 AM, Michael Nolan <htfoot@gmail.com> wrote:
> As I understand the docs for rsync, it will use both mod time and file size
> if told not to do checksums.

Oh, so it does, I misread. Thanks! Time+size it is.

ChrisA

Re: Replication/cloning: rsync vs modification dates?

From

Sergey Konoplev

Date:

16 July 2012, 15:36:28

On Mon, Jul 16, 2012 at 8:01 PM, Chris Angelico <rosuav@gmail.com> wrote:
> On Tue, Jul 17, 2012 at 1:58 AM, Michael Nolan <htfoot@gmail.com> wrote:
>> As I understand the docs for rsync, it will use both mod time and file size
>> if told not to do checksums.

I wonder if it is correct in general to use mtime and size to perform
these checks from the point of view of PostgreSQL.

If it works with the current version, is there a guarantee that it
will work with future versions?

>
> Oh, so it does, I misread. Thanks! Time+size it is.
>
> ChrisA
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general



--
Sergey Konoplev

a database architect, software developer at PostgreSQL-Consulting.com
http://www.postgresql-consulting.com

Jabber: gray.ru@gmail.com Skype: gray-hemp Phone: +79160686204

Re: Replication/cloning: rsync vs modification dates?

From

Michael Nolan

Date:

16 July 2012, 15:53:17

On 7/16/12, Sergey Konoplev <sergey.konoplev@postgresql-consulting.com> wrote:
> On Mon, Jul 16, 2012 at 8:01 PM, Chris Angelico <rosuav@gmail.com> wrote:
>> On Tue, Jul 17, 2012 at 1:58 AM, Michael Nolan <htfoot@gmail.com> wrote:
>>> As I understand the docs for rsync, it will use both mod time and file
>>> size
>>> if told not to do checksums.
>
> I wonder if it is correct in general to use mtime and size to perform
> these checks from the point of view of PostgreSQL.
>
> If it works with the current version, is there a guarantee that it
> will work with future versions?

There are many things for which there is no guarantee of future
compatibility (or sufficiency).

For that matter, there's really no assurance that timestamp+size is
sufficient NOW.

But checksums aren't 100% reliable, either. Without doing a byte-by-byte
comparison of two files, there's no way to ensure they are
identical.
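A sketch contrasting the two approaches on local files, with hypothetical function names. Equal digests make identical content overwhelmingly likely but do not prove it. A chunked byte-by-byte comparison is exact, but it needs both copies readable in one place, and across a network that means shipping the data anyway. Python's standard library also covers the byte-comparison case with `filecmp.cmp(a, b, shallow=False)`.

```python
import hashlib

CHUNK = 1 << 20  # read 1 MiB at a time


def file_digest(path: str) -> bytes:
    """SHA-256 of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK), b""):
            h.update(chunk)
    return h.digest()


def same_by_digest(path_a: str, path_b: str) -> bool:
    """Probabilistic: equal digests almost certainly mean equal contents."""
    return file_digest(path_a) == file_digest(path_b)


def same_by_bytes(path_a: str, path_b: str) -> bool:
    """Exact: compare the two files chunk by chunk."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            ca, cb = fa.read(CHUNK), fb.read(CHUNK)
            if ca != cb:
                return False  # differing bytes or differing lengths
            if not ca:
                return True   # both files ended together
```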

Re: Replication/cloning: rsync vs modification dates?

From

Chris Angelico

Date:

16 July 2012, 18:43:00

On Tue, Jul 17, 2012 at 4:35 AM, Sergey Konoplev
<sergey.konoplev@postgresql-consulting.com> wrote:
> On Mon, Jul 16, 2012 at 8:01 PM, Chris Angelico <rosuav@gmail.com> wrote:
>> On Tue, Jul 17, 2012 at 1:58 AM, Michael Nolan <htfoot@gmail.com> wrote:
>>> As I understand the docs for rsync, it will use both mod time and file size
>>> if told not to do checksums.
>
> I wonder if it is correct in general to use mtime and size to perform
> these checks from the point of view of PostgreSQL.
>
> If it works with the current version, is there a guarantee that it
> will work with future versions?

That was my exact question. Ideally, I'd like to hear from someone who
works with the Postgres internals, but the question may not even be
possible to answer.

ChrisA

Re: Replication/cloning: rsync vs modification dates?

From

Steven Schlansker

Date:

16 July 2012, 18:52:54

I think it's pretty easy to show that timestamp+size isn't good enough to do this 100% reliably.

Imagine that your timestamps have a millisecond resolution. I assume this will vary based on OS / filesystem, but the
point remains the same whatever the resolution is.

You can have multiple writes occur in the same quantized "instant".

If the prior rsync just happened to catch the first write (at T+0.1ms) in that instant but not the second (which
happened at T+0.4ms), the second may not be transferred, because the modification time is the same for both writes (and, for an in-place overwrite, so is the size).
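Steven's scenario can be reproduced by pinning the mtime with `os.utime()`. This simulates a filesystem clock that cannot tell the two writes apart. A minimal sketch follows; the file name, timestamp and function names are arbitrary.

```python
import os


def quick_check_signature(path: str) -> tuple:
    """The (size, mtime) pair that a time+size comparison looks at."""
    st = os.stat(path)
    return st.st_size, st.st_mtime_ns


def same_instant_write_is_missed(directory: str) -> bool:
    """True if a second same-size write in the same tick is invisible to time+size."""
    path = os.path.join(directory, "datafile")
    tick_ns = 1_500_000_000 * 10**9  # one arbitrary timestamp "instant"

    # First write (T+0.1ms); the clock can only record the tick.
    with open(path, "wb") as f:
        f.write(b"row v1")
    os.utime(path, ns=(tick_ns, tick_ns))
    seen_by_last_rsync = quick_check_signature(path)

    # Second write (T+0.4ms): same length, different bytes, same tick.
    with open(path, "r+b") as f:
        f.write(b"row v2")
    os.utime(path, ns=(tick_ns, tick_ns))

    return quick_check_signature(path) == seen_by_last_rsync
```

The function returns `True`: the contents changed, but the signature a quick check compares did not.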

All that said, I think the chances of this actually happening are vanishingly small. I personally use rsync without
checksums and have had no problems.

On Jul 16, 2012, at 2:42 PM, Chris Angelico wrote:

> On Tue, Jul 17, 2012 at 4:35 AM, Sergey Konoplev
> <sergey.konoplev@postgresql-consulting.com> wrote:
>> On Mon, Jul 16, 2012 at 8:01 PM, Chris Angelico <rosuav@gmail.com> wrote:
>>> On Tue, Jul 17, 2012 at 1:58 AM, Michael Nolan <htfoot@gmail.com> wrote:
>>>> As I understand the docs for rsync, it will use both mod time and file size
>>>> if told not to do checksums.
>>
>> I wonder if it is correct in general to use mtime and size to perform
>> these checks from the point of view of PostgreSQL.
>>
>> If it works with the current version, is there a guarantee that it
>> will work with future versions?
>
> That was my exact question. Ideally, I'd like to hear from someone who
> works with the Postgres internals, but the question may not even be
> possible to answer.
>
> ChrisA

Re: Replication/cloning: rsync vs modification dates?

From

John R Pierce

Date:

16 July 2012, 19:03:58

On 07/16/12 2:42 PM, Chris Angelico wrote:
> On Tue, Jul 17, 2012 at 4:35 AM, Sergey Konoplev
> <sergey.konoplev@postgresql-consulting.com> wrote:
>> I wonder if it is correct in general to use mtime and size to perform
>> these checks from the point of view of PostgreSQL.
>>
>> If it works with the current version, is there a guarantee that it
>> will work with future versions?
> That was my exact question. Ideally, I'd like to hear from someone who
> works with the Postgres internals, but the question may not even be
> possible to answer.

as much as anything else, this is dependent on your OS properly updating
mtime on an open file that's getting random writes.
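You can test this empirically on a particular OS and filesystem. The sketch below is Unix-only because it uses `os.pwrite`, and its function name is hypothetical. It first moves the file's mtime into the past. It then overwrites one byte in place through an already-open descriptor without changing the size, and reports whether the kernel bumped the mtime. A `True` result on one system says nothing about other filesystems or other write paths.

```python
import os


def inplace_write_bumps_mtime(path: str, offset: int = 0) -> bool:
    """Overwrite one byte in place via an open fd; report whether st_mtime_ns changed."""
    past_ns = 1_000_000_000 * 10**9      # an mtime safely in the past
    fd = os.open(path, os.O_RDWR)        # keep the file open, as a server process would
    try:
        os.utime(path, ns=(past_ns, past_ns))
        os.pwrite(fd, b"\x01", offset)   # random-access write; size unchanged if offset < size
        return os.fstat(fd).st_mtime_ns != past_ns
    finally:
        os.close(fd)
```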



--
john r pierce                            N 37, W 122
santa cruz ca                         mid-left coast

Re: Replication/cloning: rsync vs modification dates?

From

Michael Nolan

Date:

16 July 2012, 19:24:53

On 7/16/12, Steven Schlansker <steven@likeness.com> wrote:
> I think it's pretty easy to show that timestamp+size isn't good enough to do
> this 100% reliably.

That may not be a problem if the slave server synchronization code
always starts to play back WAL entries at a time before the worst case
for timestamp precision.

I'm assuming here that the WAL playback process works something like this:

Look at a WAL entry, see if the disk block it references matches the
'before' indicators for that block in the WAL.   If so, update it to
the 'after' data content.

There are two non-matching conditions:

If the disk block information indicates that it should match a later
update, then that block does not need to be updated.

But if the disk block information indicates that it should match an
earlier update than the one in the WAL entry, then the synchronization
fails.
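Michael's assumed replay rule can be written as a toy model. Each block carries a hypothetical integer "version", and a WAL entry names the version it expects to find ("before") and the one it produces ("after"). This encodes only his description. PostgreSQL's actual redo compares each page's LSN with the WAL record's LSN, and full-page images logged after a checkpoint overwrite stale pages. Treat the "failed" branch as part of his hypothesis, not a description of PostgreSQL.

```python
from enum import Enum


class Outcome(Enum):
    APPLIED = "applied"  # block matched the entry's 'before' state and was updated
    SKIPPED = "skipped"  # block already reflects a later update
    FAILED = "failed"    # block is older than the entry expects


def replay_entry(blocks: dict, block_id: int, before: int, after: int) -> Outcome:
    """Apply one hypothetical WAL entry to a dict of block_id -> version."""
    current = blocks[block_id]
    if current == before:
        blocks[block_id] = after
        return Outcome.APPLIED
    if current > before:
        return Outcome.SKIPPED
    return Outcome.FAILED
```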

Re: Replication/cloning: rsync vs modification dates?

From

Bruce Momjian

Date:

26 July 2012, 20:54:11

On Tue, Jul 17, 2012 at 07:42:38AM +1000, Chris Angelico wrote:
> On Tue, Jul 17, 2012 at 4:35 AM, Sergey Konoplev
> <sergey.konoplev@postgresql-consulting.com> wrote:
> > On Mon, Jul 16, 2012 at 8:01 PM, Chris Angelico <rosuav@gmail.com> wrote:
> >> On Tue, Jul 17, 2012 at 1:58 AM, Michael Nolan <htfoot@gmail.com> wrote:
> >>> As I understand the docs for rsync, it will use both mod time and file size
> >>> if told not to do checksums.
> >
> > I wonder if it is correct in general to use mtime and size to perform
> > these checks from the point of view of PostgreSQL.
> >
> > If it works with the current version, is there a guarantee that it
> > will work with future versions?
>
> That was my exact question. Ideally, I'd like to hear from someone who
> works with the Postgres internals, but the question may not even be
> possible to answer.

You might want to look at the hackers list thread I started about the
same topic a week before your post:

    http://archives.postgresql.org/pgsql-hackers/2012-07/msg00416.php

Basically, you can only use mtime/size if you are replaying WAL.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +

Re: Replication/cloning: rsync vs modification dates?

From

Chris Angelico

Date:

26 July 2012, 20:58:10

On Fri, Jul 27, 2012 at 9:53 AM, Bruce Momjian <bruce@momjian.us> wrote:
> You might want to look at the hackers list thread I started about the
> same topic a week before your post:
>
>         http://archives.postgresql.org/pgsql-hackers/2012-07/msg00416.php
>
> Basically, you can only use mtime/size if you are replaying WAL.

I'll check that out in a bit; but hot standby includes replaying WAL,
right? That's what we're doing - full live replication with
possibility to "pg_ctl promote" a slave straight up to master.

ChrisA

Re: Replication/cloning: rsync vs modification dates?

From

Bruce Momjian

Date:

26 July 2012, 22:12:43

On Fri, Jul 27, 2012 at 09:57:55AM +1000, Chris Angelico wrote:
> On Fri, Jul 27, 2012 at 9:53 AM, Bruce Momjian <bruce@momjian.us> wrote:
> > You might want to look at the hackers list thread I started about the
> > same topic a week before your post:
> >
> >         http://archives.postgresql.org/pgsql-hackers/2012-07/msg00416.php
> >
> > Basically, you can only use mtime/size if you are replaying WAL.
>
> I'll check that out in a bit; but hot standby includes replaying WAL,
> right? That's what we're doing - full live replication with
> possibility to "pg_ctl promote" a slave straight up to master.

Yes, WAL is replayed in that case and any sub-second changes are going
to be replayed from the WAL log.


Re: Replication/cloning: rsync vs modification dates?

From

Chris Angelico

Date:

27 July 2012, 01:13:49

On Fri, Jul 27, 2012 at 9:57 AM, Chris Angelico <rosuav@gmail.com> wrote:
> On Fri, Jul 27, 2012 at 9:53 AM, Bruce Momjian <bruce@momjian.us> wrote:
>> You might want to look at the hackers list thread I started about the
>> same topic a week before your post:
>>
>>         http://archives.postgresql.org/pgsql-hackers/2012-07/msg00416.php
>>
>> Basically, you can only use mtime/size if you are replaying WAL.
>
> I'll check that out in a bit; but hot standby includes replaying WAL,
> right? That's what we're doing - full live replication with
> possibility to "pg_ctl promote" a slave straight up to master.

Hi, thanks for that link. Just got a chance to read through the thread.

In this post[1] the script executes "checkpoint" before
"pg_start_backup" - is that important? According to the docs[2]:

"There is an optional second parameter of type boolean. If true, it
specifies executing pg_start_backup as quickly as possible. This
forces an immediate checkpoint which will cause a spike in I/O
operations, slowing any concurrently executing queries."

Is "checkpoint; select pg_start_backup('foo');" the same as "select
pg_start_backup('foo',true);"? And what are the consequences of not
calling for a checkpoint that way? My understanding of the docs is
that the pg_start_backup call will hang until a checkpoint happens
organically, i.e. delaying the backup rather than other clients, but I'm
not really sure and haven't a sample database big or busy enough to
test this on.

Other than that, I think our current setup is fine. I have a script
that, every time a computer attempts to join the cluster, redoes the
"start backup, rsync, stop backup" sequence. I'm depending on (and
assuming) the correct transfer of the last bit of log via the
replication link, as soon as the new slave starts up - presumably
this'll all be provided from wal_keep_segments.
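The "start backup, rsync, stop backup" sequence might look roughly like the sketch below. Everything in it is an assumption for illustration: the host name, the data directory, the backup label, and the exclude list. The excludes cover postmaster.pid, the pg_xlog contents that streaming replication is expected to supply, and the standby's own recovery.conf, which `--delete` would otherwise remove. The `-a` flag matters for the mtime approach. It implies `-t`, so modification times are preserved and the next quick check can match. The sketch also assumes the standby's server is stopped while it runs.

```python
import shlex
import subprocess
from typing import List


def resync_commands(master_host: str, pgdata: str, label: str = "resync",
                    fast_checkpoint: bool = True) -> List[List[str]]:
    """Build start-backup / rsync / stop-backup commands, to be run on the standby."""
    safe_label = label.replace("'", "''")          # escape for an SQL string literal
    fast = "true" if fast_checkpoint else "false"  # true = immediate checkpoint
    return [
        ["psql", "-h", master_host, "-c",
         f"SELECT pg_start_backup('{safe_label}', {fast});"],
        ["rsync", "-a", "--delete",                # -a implies -t: keep mtimes
         "--exclude=/postmaster.pid",
         "--exclude=/pg_xlog/*",                   # WAL assumed to arrive via streaming
         "--exclude=/recovery.conf",               # keep the standby's own config
         f"{master_host}:{pgdata}/", f"{pgdata}/"],
        ["psql", "-h", master_host, "-c", "SELECT pg_stop_backup();"],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Execute (or just print) the three steps, always ending backup mode."""
    start, copy, stop = commands

    def execute(cmd: List[str]) -> None:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)

    execute(start)
    try:
        execute(copy)
    finally:
        execute(stop)  # leave backup mode on the master even if rsync fails
```

For example, `run(resync_commands("master.example", "/srv/pgdata"))` only prints the three commands. Pass `dry_run=False` to execute them.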

Again, thanks for the pointer! A good read.

ChrisA

[1] http://archives.postgresql.org/pgsql-hackers/2012-07/msg00417.php
[2] http://www.postgresql.org/docs/9.1/static/functions-admin.html

Re: Replication/cloning: rsync vs modification dates?

From

Bruce Momjian

Date:

28 July 2012, 18:54:53

On Fri, Jul 27, 2012 at 02:13:31PM +1000, Chris Angelico wrote:
> On Fri, Jul 27, 2012 at 9:57 AM, Chris Angelico <rosuav@gmail.com> wrote:
> > On Fri, Jul 27, 2012 at 9:53 AM, Bruce Momjian <bruce@momjian.us> wrote:
> >> You might want to look at the hackers list thread I started about the
> >> same topic a week before your post:
> >>
> >>         http://archives.postgresql.org/pgsql-hackers/2012-07/msg00416.php
> >>
> >> Basically, you can only use mtime/size if you are replaying WAL.
> >
> > I'll check that out in a bit; but hot standby includes replaying WAL,
> > right? That's what we're doing - full live replication with
> > possibility to "pg_ctl promote" a slave straight up to master.
>
> Hi, thanks for that link. Just got a chance to read through the thread.
>
> In this post[1] the script executes "checkpoint" before
> "pg_start_backup" - is that important? According to the docs[2]:
>
> "There is an optional second parameter of type boolean. If true, it
> specifies executing pg_start_backup as quickly as possible. This
> forces an immediate checkpoint which will cause a spike in I/O
> operations, slowing any concurrently executing queries."

A checkpoint is always issued by pg_start_backup(). The boolean
controls whether that checkpoint is immediate or smoothed (spread out
over time); a smoothed checkpoint means the call can take a while to return.

> Is "checkpoint; select pg_start_backup('foo');" the same as "select
> pg_start_backup('foo',true);"? And what are the consequences of not
> calling for a checkpoint that way? My understanding of the docs is
> that the pg_start_backup call will hang until a checkpoint happens
> organically, i.e. delaying the backup rather than other clients, but I'm
> not really sure and haven't a sample database big or busy enough to
> test this on.

Right, checkpoint is started by pg_start_backup() but is smoothed by
default.
