Thread: [GENERAL] Incremental / Level -1 backup in PG

[GENERAL] Incremental / Level -1 backup in PG

From

Rakesh Kumar

Date:

22 March 2017, 03:27:31

PG does not have a concept of incremental  backup. The way it works in Oracle and other RDBMS is that incremental
backuponly backups up changed blocks since the last full backup.  So if only 10% of blocks changed since the last full
backup,incremental backup will be only for 10%. 
I am wondering whether it is technically feasible to implement it like this:

1 - At the time of full backup, note the last modified time of each data file in a repository.
2 - Next time when incremental backup runs, for every data file it will check the last modified time of it with the one
inthe repository to determine whether it has changed since last full backup. If yes, back it up. 

Now on to restore:

1 - First restore full backup.
2 - Restore incremental backup.

My question: Will it work in PG?

Re: [GENERAL] Incremental / Level -1 backup in PG

From

Adrian Klaver

Date:

22 March 2017, 03:35:32

On 03/21/2017 05:27 PM, Rakesh Kumar wrote:
> PG does not have a concept of incremental  backup. The way it works in Oracle and other RDBMS is that incremental
backuponly backups up changed blocks since the last full backup.  So if only 10% of blocks changed since the last full
backup,incremental backup will be only for 10%. 
> I am wondering whether it is technically feasible to implement it like this:
>
> 1 - At the time of full backup, note the last modified time of each data file in a repository.
> 2 - Next time when incremental backup runs, for every data file it will check the last modified time of it with the
onein the repository to determine whether it has changed since last full backup. If yes, back it up. 
>
> Now on to restore:
>
> 1 - First restore full backup.
> 2 - Restore incremental backup.
>
> My question: Will it work in PG?

?:
https://www.postgresql.org/docs/9.6/static/continuous-archiving.html

https://www.postgresql.org/docs/9.6/static/warm-standby.html#STREAMING-REPLICATION



--
Adrian Klaver
adrian.klaver@aklaver.com

Re: [GENERAL] Incremental / Level -1 backup in PG

From

John R Pierce

Date:

22 March 2017, 03:35:39

On 3/21/2017 5:27 PM, Rakesh Kumar wrote:
> PG does not have a concept of incremental  backup. The way it works in Oracle and other RDBMS is that incremental
backuponly backups up changed blocks since the last full backup.  So if only 10% of blocks changed since the last full
backup,incremental backup will be only for 10%. 
> I am wondering whether it is technically feasible to implement it like this:
>
> 1 - At the time of full backup, note the last modified time of each data file in a repository.
> 2 - Next time when incremental backup runs, for every data file it will check the last modified time of it with the
onein the repository to determine whether it has changed since last full backup. If yes, back it up. 
>
> Now on to restore:
>
> 1 - First restore full backup.
> 2 - Restore incremental backup.
>
> My question: Will it work in PG?

basebackup + WAL archive lets you do just exactly this.   you can
restore to any transaction between when that basebackup was taken, and
the latest entry in the WAL archive, its referred in the documentation
as PITR, Point in Time Recovery.


--
john r pierce, recycling bits in santa cruz

Re: [GENERAL] Incremental / Level -1 backup in PG

From

Stephen Frost

Date:

22 March 2017, 03:36:52

Greetings,

* Rakesh Kumar (rakeshkumar464@outlook.com) wrote:
> PG does not have a concept of incremental  backup. The way it works in Oracle and other RDBMS is that incremental
backuponly backups up changed blocks since the last full backup.  So if only 10% of blocks changed since the last full
backup,incremental backup will be only for 10%. 
> I am wondering whether it is technically feasible to implement it like this:
>
> 1 - At the time of full backup, note the last modified time of each data file in a repository.
> 2 - Next time when incremental backup runs, for every data file it will check the last modified time of it with the
onein the repository to determine whether it has changed since last full backup. If yes, back it up. 
>
> Now on to restore:
>
> 1 - First restore full backup.
> 2 - Restore incremental backup.
>
> My question: Will it work in PG?

The short answer is 'no'.  There are complications around this,
particularly at the edges and because files can be written and rewritten
as you're reading them.  Basically, no file with a timestamp after the
checkpoint before the backup can be omitted from an incremental backup.

I strongly recommend you use one of the existing backup solutions for
PostgreSQL which know how to properly perform incremental backups.  I
know at least pgBackrest and barman do, I'm not sure about others.

Thanks!

Stephen

Attachment

signature.asc

Re: [GENERAL] Incremental / Level -1 backup in PG

From

Stephen Frost

Date:

22 March 2017, 03:43:00

John,

* John R Pierce (pierce@hogranch.com) wrote:
> On 3/21/2017 5:27 PM, Rakesh Kumar wrote:
> >PG does not have a concept of incremental  backup. The way it works in Oracle and other RDBMS is that incremental
backuponly backups up changed blocks since the last full backup.  So if only 10% of blocks changed since the last full
backup,incremental backup will be only for 10%. 
> >I am wondering whether it is technically feasible to implement it like this:
> >
> >1 - At the time of full backup, note the last modified time of each data file in a repository.
> >2 - Next time when incremental backup runs, for every data file it will check the last modified time of it with the
onein the repository to determine whether it has changed since last full backup. If yes, back it up. 
> >
> >Now on to restore:
> >
> >1 - First restore full backup.
> >2 - Restore incremental backup.
> >
> >My question: Will it work in PG?
>
> basebackup + WAL archive lets you do just exactly this.   you can
> restore to any transaction between when that basebackup was taken,
> and the latest entry in the WAL archive, its referred in the
> documentation as PITR, Point in Time Recovery.

WAL must always be kept for file-level backups, of course, but it does
not allow the kind of incremental backup the OP was suggesting.

It's important to understand that you might start reading a file whose
timestamp is X, read half of it, and then PG starts writing to the first
half of the file, and you finish reading the file, all within the same
second.

A later incremental backup might assume that file hadn't been changed
from the version you have and therefore not back it up.  The WAL for the
change which was written by PG would be in the first 'full' backup, but
would not be included in the WAL which is generated during the
incremental backup, leading to a situation where that write would be
lost and you have a corrupted backup.

Do not try to implement an incremental backup solution using
simple/naive tools like rsync with timestamp-based incrementals.  It is
not safe.

Thanks!

Stephen

Attachment

signature.asc

Re: [GENERAL] Incremental / Level -1 backup in PG

From

Oleg Bartunov

Date:

22 March 2017, 09:17:48

On Wed, Mar 22, 2017 at 3:27 AM, Rakesh Kumar <rakeshkumar464@outlook.com> wrote:

PG does not have a concept of incremental backup. The way it works in Oracle and other RDBMS is that incremental backup only backups up changed blocks since the last full backup. So if only 10% of blocks changed since the last full backup, incremental backup will be only for 10%.
I am wondering whether it is technically feasible to implement it like this:

Have you seen pg_probackup ? It's in development stage and needs somebody to test it.

1 - At the time of full backup, note the last modified time of each data file in a repository.
2 - Next time when incremental backup runs, for every data file it will check the last modified time of it with the one in the repository to determine whether it has changed since last full backup. If yes, back it up.

Now on to restore:

1 - First restore full backup.
2 - Restore incremental backup.

My question: Will it work in PG?

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Incremental / Level -1 backup in PG

From

rakeshkumar464

Date:

22 March 2017, 11:40:49


basebackup + WAL archive lets you do just exactly this.   you can
restore to any transaction between when that basebackup was taken, and
the latest entry in the WAL archive, its referred in the documentation
as PITR, Point in Time Recovery.

Yes John I do know about using WAL archive.  IMO that will not be as fast as
restoring
using the incremental backup. Eg:

It is common to take a full backup on weekends and incremental on
weeknights.  If we have to restore
upto Thu afternoon, which one do you think will be faster :-

1 -  Restore from basebackup.
2 -  Restore from wed night backup
3 - Apply WAL logs after wed night backup until the time we want to restore.
vs
1 - Restore from basebackup
2 - Apply WAL logs from weekend until the time we want to restore.

If first choice is lot faster in Oracle,DB2, I have reasons to believe that
the same should be true for PG also. But as someone explained, the PG
technology can not support this.

Anyhow it was an academical question.




--
View this message in context: http://www.postgresql-archive.org/Incremental-Level-1-backup-in-PG-tp5951072p5951147.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

Re: [GENERAL] Incremental / Level -1 backup in PG

From

rakeshkumar464

Date:

22 March 2017, 11:49:30

Greetings,

>The short answer is 'no'.  There are complications around this,
>particularly at the edges and because files can be written and rewritten
>as you're reading them.
>Basically, no file with a timestamp after the
>checkpoint before the backup can be omitted from an incremental backup.

what you have written above applies to oracle/db2 too.  In case you are not
aware, during backup, those products have a mechanism to save the image
of any changing block as it existed before the start of the backup. that is
used
to reconstruct the PIT image of the block.

Anyhow looks like this can't be done in PG.

thanks all.






--
View this message in context: http://www.postgresql-archive.org/Incremental-Level-1-backup-in-PG-tp5951072p5951148.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.

Re: [GENERAL] Incremental / Level -1 backup in PG

From

Francisco Olarte

Date:

22 March 2017, 13:30:04

On Wed, Mar 22, 2017 at 9:40 AM, rakeshkumar464
<rakeshkumar464@outlook.com> wrote:
> basebackup + WAL archive lets you do just exactly this.
.....
> Yes John I do know about using WAL archive.  IMO that will not be as fast as
> restoring using the incremental backup.

That's an opinion, have you tried measuring? Because normally I've found that

1.- Incremental backups are slow and impose a greater runtime penalty
on the system than log-change-archiving methods.

2.- Incremental restores are not that fast.

> Eg:
> It is common to take a full backup on weekends and incremental on
> weeknights.  If we have to restore
> upto Thu afternoon, which one do you think will be faster :-
>
> 1 -  Restore from basebackup.
> 2 -  Restore from wed night backup
> 3 - Apply WAL logs after wed night backup until the time we want to restore.

You are assuming your backup product does direct-diff to base. Those
are gonna be costly when friday arrives.

> vs
> 1 - Restore from basebackup
> 2 - Apply WAL logs from weekend until the time we want to restore.

> If first choice is lot faster in Oracle,DB2,

Is it really testable / a lot faster ? ( bear in mind if a product
just supports one strategy there is a huge interest in telling it is
the faster one )

> I have reasons to believe that
> the same should be true for PG also. But as someone explained, the PG
> technology can not support this.

I fear incremental backup capabilities will make postgres slower.

Anyway, with base backup + wal archive you always have the option of
making incremental. Just start a recovery on the backup each time you
receive a wal segment wal and you are done. In fact, you can treat a
replication slave as a very low lag backup.

Francisco Olarte.

Re: [GENERAL] Incremental / Level -1 backup in PG

From

Karsten Hilbert

Date:

22 March 2017, 13:45:03

On Tue, Mar 21, 2017 at 08:43:00PM -0400, Stephen Frost wrote:

> Do not try to implement an incremental backup solution using
> simple/naive tools like rsync with timestamp-based incrementals.  It is
> not safe.

... as long as the server is *running*.

So, "stop" the server when using $RSYNC for $BACKUP.

After which the OPs question becomes entirely
independant from PostgreSQL as such, of course.

Karsten
--
GPG key ID E4071346 @ eu.pool.sks-keyservers.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346

Re: [GENERAL] Incremental / Level -1 backup in PG

From

Karsten Hilbert

Date:

22 March 2017, 13:47:07

On Wed, Mar 22, 2017 at 01:40:49AM -0700, rakeshkumar464 wrote:

> upto Thu afternoon, which one do you think will be faster :-

All in all, perhaps it is more a question of

    which one *came out* to be faster
    on your hardware
    with your load
    with your data
    after testing

> I have reasons to believe that
> the same should be true for PG also.

As would be ?

Best regards,
Karsten
--
GPG key ID E4071346 @ eu.pool.sks-keyservers.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346

Re: [GENERAL] Incremental / Level -1 backup in PG

From

Stephen Frost

Date:

22 March 2017, 16:04:19

Greetings,

* rakeshkumar464 (rakeshkumar464@outlook.com) wrote:
> If first choice is lot faster in Oracle,DB2, I have reasons to believe that
> the same should be true for PG also. But as someone explained, the PG
> technology can not support this.

This statement isn't correct.  There are, in fact, tools for doing
incremental backup and restore with PG and they work quite well, as I
mentioned.  Currently they operate at a file-level, which means 1G
chunks instead of block-level, but there is work being done to change
that too.  There isn't anything in "PG technology" that makes this
impossible or, really, even that difficult, it's more that there hasn't
been effort put into it simply because the file-level incremental
solution works quite well in most cases.

Thanks!

Stephen

Attachment

signature.asc

Re: [GENERAL] Incremental / Level -1 backup in PG

From

Stephen Frost

Date:

22 March 2017, 16:07:25

Greetings,

* rakeshkumar464 (rakeshkumar464@outlook.com) wrote:
> >The short answer is 'no'.  There are complications around this,
> >particularly at the edges and because files can be written and rewritten
> >as you're reading them.
> >Basically, no file with a timestamp after the
> >checkpoint before the backup can be omitted from an incremental backup.
>
> what you have written above applies to oracle/db2 too.  In case you are not
> aware, during backup, those products have a mechanism to save the image
> of any changing block as it existed before the start of the backup. that is
> used
> to reconstruct the PIT image of the block.

That is WAL archiving, which PG already does, but is different from
backups.

> Anyhow looks like this can't be done in PG.

PG does support WAL archiving, as discussed on this thread, and it works
exactly as you describe above.  As I mentioned, there are also tools for
performing incremental backups, which isn't quite the same as straight
WAL archiving.

Thanks!

Stephen

Attachment

signature.asc

Re: [GENERAL] Incremental / Level -1 backup in PG

From

Thomas Kellerer

Date:

22 March 2017, 16:42:57

Rakesh Kumar schrieb am 22.03.2017 um 01:27:
> PG does not have a concept of incremental  backup.

Postgres doesn't, but external tools can.

e.g. Barman can do incremental backups:

https://blog.2ndquadrant.com/incremental-backup-barman-1-4-0/

Re: [GENERAL] Incremental / Level -1 backup in PG

From

Rakesh Kumar

Date:

22 March 2017, 20:34:57

>> Yes John I do know about using WAL archive.  IMO that will not be as fast as
>> restoring using the incremental backup.

>That's an opinion, have you tried measuring? Because normally I've found that
>1.- Incremental backups are slow and impose a greater runtime penalty
>on the system than log-change-archiving methods.

I assume you are talking about other RDBMS where we can do PITR restore either:

         Restore base backup
         Apply  incremental backup
         Rollfoward WAL logs to the point you want to recover
                vs
         Restore base backup
        Rollfoward WAL logs to the point you want to recover

When the PITR is far apart from the time of base backup (far apart as in, let us
say 4 to 5 days), the first approach beats the second approach hands down. This
coming from experience. Reason is simple. In the second approach every transaction
(from the time of base backup) has to applied to roll-foward to PIT. In incremental backup,
a block is only applied once, regardless of how many times it changed after the base backup.

The diff may not be much if PITR is close to the time of base backup.

Note: I have never tried option (1) for PG.

>You are assuming your backup product does direct-diff to base. Those
>are gonna be costly when friday arrives.

You mean costly as in finding more and more blocks changed since weekend. that is
correct. However Oracle keeps track of location of all changed blocks since last base
backup and it helps in quick backup. It does not scan entire tablespace to figure which
blocks changed.


>Is it really testable / a lot faster ? ( bear in mind if a product
>just supports one strategy there is a huge interest in telling it is
>the faster one )

Nope. Incremental backup is not the only way to reach PITR until Thu afternoon
from a base backup on Sunday. You can always apply redo logs after restoring
from the base backup.
And yes, it is much faster.  We do it to prove to our clients why incremental backup
will benefit them more.

Re: [GENERAL] Incremental / Level -1 backup in PG

From

John R Pierce

Date:

22 March 2017, 20:45:56

On 3/22/2017 10:34 AM, Rakesh Kumar wrote:
> When the PITR is far apart from the time of base backup (far apart as in, let us
> say 4 to 5 days), the first approach beats the second approach hands down. This
> coming from experience. Reason is simple. In the second approach every transaction
> (from the time of base backup) has to applied to roll-foward to PIT. In incremental backup,
> a block is only applied once, regardless of how many times it changed after the base backup.

note postgres' WAL archive is by block, not by transaction.  also note
that postgres effectively does copy-on-write, since update's are treated
as insert+delete, so the same blocks aren't written over and over nearly
as much as they might be in the oracle storage model.

--
john r pierce, recycling bits in santa cruz

Re: [GENERAL] Incremental / Level -1 backup in PG

From

rakeshkumar464

Date:

22 March 2017, 21:11:06


>note postgres' WAL archive is by block, not by transaction.

My understanding is that only the first time a block is updated after a
checkpoint,
is the entire block is written to the WAL logs.  And for that
full_page_writes has to be set to ON.
The only other time PG writes entire block to the WAL is during the time of
backup regardless of full_page_writes setting.

AFAIK rest of the time, WAL takes only row changes.  Otherwise PG will be
generating large number
of WAL logs.

I hope I am right :-)

> also note that postgres effectively does copy-on-write, since update's are
> treated
>as insert+delete, so the same blocks aren't written over and over nearly
>as much as they might be in the oracle storage model.

Good point.



--
View this message in context: http://www.postgresql-archive.org/Incremental-Level-1-backup-in-PG-tp5951072p5951343.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.