Thread: RES: Fastest way to duplicate a quite large database

RES: Fastest way to duplicate a quite large database

From
Edson Richter
Date:

 

From: Adrian Klaver
Sent: Tuesday, April 12, 2016 12:04
To: Edson Richter; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Fastest way to duplicate a quite large database

 

On 04/12/2016 07:51 AM, Edson Richter wrote:
> Same machine, same cluster - just different database name.

Hmm, running tests against the same cluster you are running the
production database would seem to be a performance hit against the
production database and potentially dangerous should the tests trip a
bug that crashes the server.

>
> Best regards,
>
> Edson Carlos Ericksson Richter
>
> On 12/04/2016 11:46, John R Pierce wrote:
>> On 4/12/2016 7:25 AM, Edson Richter wrote:
>>>
>>> I have a database "Customer" with about 60Gb of data.
>>> I know I can backup and restore, but this seems too slow.
>>>
>>> Is there any other option to duplicate this database as
>>> "CustomerTest" as fast as possible (even fastar than backup/restore)
>>> - better if in one operation (something like "copy database A to B")?
>>> I would like to run this everyday, overnight, with minimal impact to
>>> prepare a test environment based on production data.
>>
>>
>> copy to the same machine, or copy to a different test server?
>> different answers.
>>
>>
>>
>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com

 

 

Hi Adrian,

 

Thanks for your insight. This is not a “test system” in the sense that I’m testing the database server code.

This is more of a “pre-production evaluation”: the stage where the customer will say “yes” or “no” to publishing a new version of our system into production.

 

Also, the server has plenty of RAM and processor cores, so I don’t foresee any trouble here.

 

The risk is lower than running a heavy reporting system against the database server.

 

The point is that customers want to test the new version of our system as close as possible to the production environment.

 

Thanks,

 

Edson
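
On the same-cluster point: Postgres can also clone a database in a single operation by using the source as a template, which is usually much faster than a dump/restore because it copies at the file level. A minimal sketch, using the database names from this thread (CREATE DATABASE ... TEMPLATE requires that no other sessions are connected to the source database while it runs):

psql -d postgres -c 'CREATE DATABASE "CustomerTest" TEMPLATE "Customer"'

Note that on a binary-replicated cluster the file-level copy is still shipped to the standby, so this does not avoid the replication traffic discussed later in the thread.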

Re: Fastest way to duplicate a quite large database

From
Edson Richter
Date:


Another problem I've found: I used "pg_dump" and "pg_restore" to create the new CustomerTest database in my cluster. Immediately, replication started copying the 60Gb of data to the slave, causing big trouble.
Does marking it as "template" avoid replication of that "copied" database?
How can I mark a database as "do not replicate"?

Thanks,

Edson
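
To see how far such a copy has propagated, the primary's pg_stat_replication view shows each standby's state and replay position. A small sketch (column names as in the 9.x series; run as a superuser, since other roles see most columns as null):

psql -d postgres -c 'SELECT application_name, state, sent_location, replay_location FROM pg_stat_replication;'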


Re: Fastest way to duplicate a quite large database

From
Adrian Klaver
Date:
On 04/13/2016 06:58 AM, Edson Richter wrote:

>
>
> Another problem I've found: I used "pg_dump" and "pg_restore" to
> create the new CustomerTest database in my cluster. Immediately,
> replication started copying the 60Gb of data to the slave, causing
> big trouble.
> Does marking it as "template" avoid replication of that "copied" database?
> How can I mark a database as "do not replicate"?

With Postgres's built-in binary replication you can't; it replicates
the entire cluster. There are third-party solutions that offer that choice:

http://www.postgresql.org/docs/9.5/interactive/different-replication-solutions.html

Table 25-1. High Availability, Load Balancing, and Replication Feature
Matrix


As has been mentioned before, running a non-production database on the
same cluster as the production database is generally not a good idea.
Per previous suggestions, I would host your CustomerTest database on
another instance/cluster of Postgres listening on a different port. Then
all your customers have to do is create a connection that points at the
new port.

>
> Thanks,
>
> Edson
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com
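
A minimal sketch of that separate-instance setup (the data directory and port number are illustrative, not from the thread; adjust for your packaging):

initdb -D /var/lib/pgsql/staging
echo "port = 5433" >> /var/lib/pgsql/staging/postgresql.conf
pg_ctl -D /var/lib/pgsql/staging -l /tmp/staging.log start
createdb -p 5433 CustomerTest
pg_dump Customer | psql -p 5433 CustomerTest

Customers then connect exactly as before, only on port 5433, and the production cluster's replication never sees the staging data.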


Re: Fastest way to duplicate a quite large database

From
Edson Richter
Date:
On 13/04/2016 11:18, Adrian Klaver wrote:
> On 04/13/2016 06:58 AM, Edson Richter wrote:
>
>>
>>
>> Another problem I've found: I used "pg_dump" and "pg_restore" to
>> create the new CustomerTest database in my cluster. Immediately,
>> replication started copying the 60Gb of data to the slave, causing
>> big trouble.
>> Does marking it as "template" avoid replication of that "copied" database?
>> How can I mark a database as "do not replicate"?
>
> With Postgres's built-in binary replication you can't; it replicates
> the entire cluster. There are third-party solutions that offer that
> choice:
>
> http://www.postgresql.org/docs/9.5/interactive/different-replication-solutions.html
>
>
> Table 25-1. High Availability, Load Balancing, and Replication Feature
> Matrix

Thanks, I'll look at that.

> As has been mentioned before, running a non-production database on the
> same cluster as the production database is generally not a good
> idea. Per previous suggestions, I would host your CustomerTest database
> on another instance/cluster of Postgres listening on a different port.
> Then all your customers have to do is create a connection that points
> at the new port.

Thanks for the concern.
This "CustomerTest" database is a staging environment, for customer
approval before upgrading the production system.
I bet the users will only open the system and say it is ok. As busy as
people are these days, I doubt they will validate something that is
already validated by our development team.
But our contractor requires it, and we provide it.
Since we have "express delivery of new versions" (almost 2 per week), we
would like to automate the staging environment.

Thanks,

Edson

>
>>
>> Thanks,
>>
>> Edson
>>
>>
>
>



Re: Fastest way to duplicate a quite large database

From
Adrian Klaver
Date:
On 04/13/2016 07:46 AM, Edson Richter wrote:
> On 13/04/2016 11:18, Adrian Klaver wrote:
>> On 04/13/2016 06:58 AM, Edson Richter wrote:
>>
>>>
>>>
>>> Another problem I've found: I used "pg_dump" and "pg_restore" to
>>> create the new CustomerTest database in my cluster. Immediately,
>>> replication started copying the 60Gb of data to the slave, causing
>>> big trouble.
>>> Does marking it as "template" avoid replication of that "copied" database?
>>> How can I mark a database as "do not replicate"?
>>
>> With Postgres's built-in binary replication you can't; it replicates
>> the entire cluster. There are third-party solutions that offer that
>> choice:
>>
>> http://www.postgresql.org/docs/9.5/interactive/different-replication-solutions.html
>>
>>
>> Table 25-1. High Availability, Load Balancing, and Replication Feature
>> Matrix
>
> Thanks, I'll look at that.
>
>> As has been mentioned before, running a non-production database on the
>> same cluster as the production database is generally not a good
>> idea. Per previous suggestions, I would host your CustomerTest database
>> on another instance/cluster of Postgres listening on a different port.
>> Then all your customers have to do is create a connection that points
>> at the new port.
>
> Thanks for the concern.
> This "CustomerTest" database is a staging environment, for customer
> approval before upgrading the production system.
> I bet the users will only open the system and say it is ok. As busy as
> people are these days, I doubt they will validate something that is

Not necessarily a bet I would count on:)

> already validated by our development team.
> But our contractor requires it, and we provide it.
> Since we have "express delivery of new versions" (almost 2 per week), we
> would like to automate the staging environment.

My guess is that setting up a different cluster and using pg_basebackup (as
John suggested) will be a lot easier to automate than creating a
differential replication setup. Anyway, I have beaten this drum long enough.

>
> Thanks,
>
> Edson
>
>>
>>>
>>> Thanks,
>>>
>>> Edson
>>>
>>>
>>
>>
>
>
>


--
Adrian Klaver
adrian.klaver@aklaver.com
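
A rough sketch of the pg_basebackup route (host, role, and paths are illustrative; it needs a replication-capable role and a matching pg_hba.conf entry on the primary, and it clones the entire cluster, not a single database):

pg_basebackup -h prod_host -U repl_user -D /var/lib/pgsql/staging -X stream -P
echo "port = 5433" >> /var/lib/pgsql/staging/postgresql.conf
pg_ctl -D /var/lib/pgsql/staging start

Since the clone keeps the production database names, the staging connection differs from production only by port, which makes a nightly refresh easy to script.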


Re: Fastest way to duplicate a quite large database

From
CS DBA
Date:

On 04/13/2016 08:46 AM, Edson Richter wrote:
> On 13/04/2016 11:18, Adrian Klaver wrote:
>> On 04/13/2016 06:58 AM, Edson Richter wrote:
>>
>>>
>>>
>>> Another problem I've found: I used "pg_dump" and "pg_restore" to
>>> create the new CustomerTest database in my cluster. Immediately,
>>> replication started copying the 60Gb of data to the slave, causing
>>> big trouble.
>>> Does marking it as "template" avoid replication of that "copied"
>>> database?
>>> How can I mark a database as "do not replicate"?
>>
>> With Postgres's built-in binary replication you can't; it
>> replicates the entire cluster. There are third-party solutions that
>> offer that choice:
>>
>> http://www.postgresql.org/docs/9.5/interactive/different-replication-solutions.html
>>
>>
>> Table 25-1. High Availability, Load Balancing, and Replication
>> Feature Matrix
>
> Thanks, I'll look at that.
>
>> As has been mentioned before, running a non-production database on
>> the same cluster as the production database is generally not a good
>> idea. Per previous suggestions, I would host your CustomerTest
>> database on another instance/cluster of Postgres listening on a
>> different port. Then all your customers have to do is create a
>> connection that points at the new port.
>
> Thanks for the concern.
> This "CustomerTest" database is a staging environment, for customer
> approval before upgrading the production system.
> I bet the users will only open the system and say it is ok. As busy
> as people are these days, I doubt they will validate something that
> is already validated by our development team.
> But our contractor requires it, and we provide it.
> Since we have "express delivery of new versions" (almost 2 per week),
> we would like to automate the staging environment.
>
> Thanks,
>
> Edson
>
>>
>>>
>>> Thanks,
>>>
>>> Edson
>>>
>>>
>>
>>
>
>
>

Have you tried this?

- To copy a database named prod to a database named prod_test:

On the same cluster (same server), create the empty target first, then
pipe the dump into it (don't use --create/--clean here, or the dump
script will reconnect to and overwrite "prod" itself instead of
restoring into prod_test):
createdb prod_test
pg_dump prod | psql prod_test

To copy from the prod cluster to another cluster/server (i.e. the test
server):
createdb -h [test_server_ip] prod_test
pg_dump prod | psql -h [test_server_ip] prod_test