Re: pg_rewind in contrib - Mailing list pgsql-hackers

From	Amit Kapila
Subject	Re: pg_rewind in contrib
Date	March 10, 2015 06:46:20
Msg-id	CAA4eK1+yOG0H-U3xToa4N=av=d5WU25CQKH6jcREEGygciGDaw@mail.gmail.com
In response to	Re: pg_rewind in contrib (Heikki Linnakangas <hlinnaka@iki.fi>)
Responses	Re: pg_rewind in contrib
List	pgsql-hackers

On Mon, Mar 9, 2015 at 7:32 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> Attached is a new patch version, fixing all the little things you listed. I believe this is pretty much ready for commit. I'm going to read it through myself one more time before committing, but I don't have anything in mind now that needs fixing anymore. I just pushed the change to split dbcommands.h into dbcommands.h and dbcommands_xlog.h, as that seems like a nice-to-have anyway.
>

A few assorted comments:

+ <step>
+ <para>
+ Copy all those changed blocks from the new cluster to the old cluster.
+ </para>
+ </step>

Isn't it possible, in the case of async replication, that the old cluster has some blocks which the new cluster doesn't have? What will it do in such a case?

I have tried to test some form of such a case, and it seems to fail with the error below:

pg_rewind.exe -D ..\..\Data\ --source-pgdata=..\..\Database1
The servers diverged at WAL position 0/16DE858 on timeline 1.
Rewinding from last common checkpoint at 0/16B8A70 on timeline 1
could not open file "..\..\Data\/base/12706/16391" for truncation: No such file or directory
Failure, exiting
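For readers decoding the output above: PostgreSQL prints a WAL position as two hexadecimal halves, the high 32 bits and the low 32 bits separated by a slash. The sketch below converts the two positions from the log into 64-bit values and checks that the last common checkpoint precedes the divergence point, as expected since the rewind starts from a checkpoint both servers share. The helper names `parse_lsn` and `lsn_distance` are hypothetical, not pg_rewind functions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: parse a WAL position printed as "HI/LO" (both hex)
 * into a single 64-bit value. Returns 1 on success, 0 on malformed input. */
static int parse_lsn(const char *text, uint64_t *out)
{
    unsigned int hi, lo;

    assert(text != NULL && out != NULL);
    if (sscanf(text, "%X/%X", &hi, &lo) != 2)
        return 0;
    *out = ((uint64_t) hi << 32) | (uint64_t) lo;
    return 1;
}

/* Hypothetical helper: bytes of WAL between two positions (from <= to). */
static uint64_t lsn_distance(uint64_t from, uint64_t to)
{
    assert(from <= to);
    return to - from;
}
```

With the two positions from the output, the last common checkpoint (0/16B8A70) lies 155112 bytes, roughly 151 kB, of WAL before the divergence point (0/16DE858).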

The exact scenario is:

Node-1 (master):

Step-1
Create table t1(c1 int, c2 char(500)) with (fillfactor=10);
insert into t1 values(generate_series(1,110),'aaaa');
Stop Node-1 (pg_ctl stop ..)

Step-2
Manually copy the data directory (it contains the WAL as well) to a new location, say Database1.

Node-2 (standby):

Step-3
Change the settings to make it a standby (recovery.conf and postgresql.conf).
Start the node and verify that all data exists.

Step-4
Use pg_ctl promote to make Node-2 the master.

Step-5
Start Node-1 and insert a few more records:
insert into t1 values(generate_series(110,115),'aaaa');

Step-6
On Node-2, insert one more record into table t1:
insert into t1 values(116,'aaaa');
Stop both nodes.

Now if I run pg_rewind on the old master (Node-1), it fails with the above error.

I think the above scenario is possible with async replication.

If in Step-6 I insert more records than I inserted in Step-5, then pg_rewind works fine.
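To make the suspected failure mode concrete, here is a minimal sketch of the per-file decision a rewind tool faces once the clusters have diverged, assuming it knows only each file's size in blocks on the source and on the target, with -1 meaning the file is absent. The enum and function names are hypothetical and this is not the logic in the patch; blocks changed in place since the divergence are deliberately not modelled. It shows that when the target (old master) has more blocks than the source, the extra blocks have to be truncated away rather than copied, and that a file missing on the target is a case to handle explicitly before any size comparison.

```c
#include <assert.h>

/* Hypothetical actions; a real rewind tool tracks far more per file. */
typedef enum
{
    ACTION_NONE,      /* sizes match (changed blocks not modelled here) */
    ACTION_COPY,      /* file exists only on the source: copy it whole */
    ACTION_REMOVE,    /* file exists only on the target: delete it */
    ACTION_TRUNCATE,  /* target is longer: cut it back to the source size */
    ACTION_COPY_TAIL  /* source is longer: copy the missing trailing blocks */
} FileAction;

/* Sizes are in blocks; -1 means the file does not exist in that cluster.
 * Missing-file cases are checked first, so a truncation is only ever
 * chosen for a file that actually exists on the target. */
static FileAction decide_action(long source_blocks, long target_blocks)
{
    assert(source_blocks >= -1 && target_blocks >= -1);

    if (source_blocks < 0 && target_blocks < 0)
        return ACTION_NONE;
    if (target_blocks < 0)
        return ACTION_COPY;       /* nothing on the target to truncate */
    if (source_blocks < 0)
        return ACTION_REMOVE;
    if (source_blocks < target_blocks)
        return ACTION_TRUNCATE;
    if (source_blocks > target_blocks)
        return ACTION_COPY_TAIL;
    return ACTION_NONE;
}
```

In the scenario above, Node-1 inserted six rows after the divergence and Node-2 only one; with fillfactor=10 and char(500) rows each insert probably lands on a new page, so Node-1's copy of t1 would most likely end up longer and fall into the truncate case, while inserting more rows on Node-2 would flip it to the copy-tail case. A truncation attempted on a file missing from the target, as in the reported error, is a combination this sketch never produces.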

diff --git a/src/bin/pg_rewind/RewindTest.pm b/src/bin/pg_rewind/RewindTest.pm

+# To run a test, the test script (in t/ subdirectory) calls the functions

What do you mean by the t/ subdirectory?

+ <application>pg_rewind</> was run, and thereforce could not be copied by

Typo: thereforce -> therefore

+static void
+sanityChecks(void)
+ /* Check that there's no backup_label in either cluster */

I could not see such a check in the code. Am I missing something?
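For reference, a check of the kind that comment describes could look like the sketch below, for a data directory on the local filesystem. It assumes only that the file is named backup_label at the top of the data directory, as the snprintf call later in the patch uses. The function name `backup_label_exists` and the `LABEL_PATH_MAX` constant are hypothetical, and it uses plain POSIX stat() rather than whatever file-access layer pg_rewind itself uses.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>

#define LABEL_PATH_MAX 1024   /* hypothetical stand-in for MAXPGPATH */

/* Hypothetical helper: returns 1 if <datadir>/backup_label exists,
 * 0 if it does not, and -1 on any other error (e.g. permission denied
 * or a path too long for the buffer). */
static int backup_label_exists(const char *datadir)
{
    char path[LABEL_PATH_MAX];
    struct stat st;
    int n;

    assert(datadir != NULL);
    n = snprintf(path, sizeof(path), "%s/backup_label", datadir);
    if (n < 0 || (size_t) n >= sizeof(path))
        return -1;            /* path would have been truncated */

    if (stat(path, &st) == 0)
        return 1;
    return (errno == ENOENT) ? 0 : -1;
}
```

A sanity check would call this for the target directory (and the source directory, when it is local) and refuse to proceed when either call returns 1.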

+ /*
+ * TODO: move old file out of the way, if any. And perhaps create the
+ * file with temporary name first and rename in place after it's done.
+ */
+ snprintf(BackupLabelFilePath, MAXPGPATH,
+ "%s/backup_label" /* BACKUP_LABEL_FILE */, datadir_target);

There are a couple of other TODOs in the patch; are these for the future?
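The TODO's second idea, writing the file under a temporary name and renaming it into place, is a common way to make a file appear atomically. A minimal sketch, assuming POSIX rename() semantics, where renaming over an existing file replaces it atomically on the same filesystem (on Windows, rename() fails if the destination already exists, so real code would need a platform-specific path). The function name `write_file_atomically`, the ".tmp" suffix and `LABEL_PATH_MAX` are hypothetical, and this is not the patch's code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LABEL_PATH_MAX 1024   /* hypothetical stand-in for MAXPGPATH */

/* Hypothetical helper: write `contents` to <dir>/<name> by first writing
 * <dir>/<name>.tmp and then renaming it over the final name, so a reader
 * never sees a half-written file. Returns 0 on success, -1 on failure.
 * A durable version would also fsync() the file and its directory. */
static int write_file_atomically(const char *dir, const char *name,
                                 const char *contents)
{
    char final_path[LABEL_PATH_MAX];
    char tmp_path[LABEL_PATH_MAX];
    FILE *fp;
    size_t len;
    int n1, n2;

    assert(dir != NULL && name != NULL && contents != NULL);
    n1 = snprintf(final_path, sizeof(final_path), "%s/%s", dir, name);
    n2 = snprintf(tmp_path, sizeof(tmp_path), "%s/%s.tmp", dir, name);
    if (n1 < 0 || n2 < 0 ||
        (size_t) n1 >= sizeof(final_path) || (size_t) n2 >= sizeof(tmp_path))
        return -1;            /* a path would have been truncated */

    fp = fopen(tmp_path, "w");
    if (fp == NULL)
        return -1;
    len = strlen(contents);
    if (fwrite(contents, 1, len, fp) != len || fflush(fp) != 0)
    {
        fclose(fp);
        remove(tmp_path);     /* do not leave a partial temp file behind */
        return -1;
    }
    if (fclose(fp) != 0 || rename(tmp_path, final_path) != 0)
    {
        remove(tmp_path);
        return -1;
    }
    return 0;
}
```

Called as write_file_atomically(datadir_target, "backup_label", contents), this would replace any existing backup_label rather than move it aside; keeping a copy of the old file, as the TODO's first sentence suggests, would need an extra rename beforehand.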

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
