Re: 9.3 to 9.5 upgrade problems - Mailing list pgsql-general
From | Adrian Klaver |
---|---|
Subject | Re: 9.3 to 9.5 upgrade problems |
Date | |
Msg-id | e9bfcb25-47b2-b862-c829-4f4dc704453e@aklaver.com |
In response to | 9.3 to 9.5 upgrade problems (Andy Colson <andy@squeakycode.net>) |
Responses | Re: 9.3 to 9.5 upgrade problems |
List | pgsql-general |
On 07/03/2016 08:06 AM, Andy Colson wrote:
> Hi all,
>
> I have a master (web1) and two slaves (web2, webserv), one slave is
> quite far from the master, the db is 112 Gig, so pg_basebackup is my
> last resort.
>
> I followed the page here:
> https://www.postgresql.org/docs/9.5/static/pgupgrade.html
>
> including the rsync stuff. I practiced it _twice_, once in PG 9.5 beta,
> and again a week ago, on two VM's I created locally. Both practice
> sessions worked perfect.
>
> I just ran it on the live databases. The master seems ok, its running
> PG 9.5 now, I can login to it, and no errors in the log.
>
> Neither slave works. After I'd gotten done with the pgupgrade steps,
> both slaves gave me this error:
>
> FATAL: database system identifier differs between the primary and standby
>
> Sure enough pg_controldata show'd their database system id different
> (all three web1, web2, webserv were different. no matches at all), so
> I'm assuming the rsync didnt rsync right, or I missed a step and ran it
> to early, or something ... I'm not quite sure.
>
> I needed to get the live website back up and running again, so I let the
> master go, ran analyze, and when it was finished, used the steps here to
> try and resync:
>
> https://wiki.postgresql.org/wiki/Binary_Replication_Tutorial
>
> on Master:
> select pg_start_backup('clone',true);
> rsync -av --exclude pg_xlog --exclude postgresql.conf /pub/pg95/*
> web2:/pub/pg95/
> select pg_stop_backup();
> rsync -av /pub/pg95/pg_xlog web2:/pub/pg95/

Not sure about the above rsync; that seems to undo what you did previously.
Also, was the remote directory empty when you did this?

>
>
> That ran pretty quick, and pg_controldata shows matching numbers, but
> when I start the slave I get:
>
> ,,2016-07-03 06:06:57.173 CDT,: LOG: entering standby mode
> ,,2016-07-03 06:06:57.205 CDT,: LOG: redo starts at 369/D6002228
> ,,2016-07-03 06:06:57.984 CDT,: LOG: consistent recovery state reached
> at 369/DCC5DB90
> ,,2016-07-03 06:06:57.984 CDT,: LOG: database system is ready to accept
> read only connections
> ,,2016-07-03 06:06:57.984 CDT,: LOG: invalid record length at 369/DD038ED0
> ,,2016-07-03 06:06:58.344 CDT,: LOG: started streaming WAL from primary
> at 369/DD000000 on timeline 1
> web,[unknown],2016-07-03 06:07:11.176 CDT,[local]: FATAL: role "andy"
> does not exist
>
> I can login as myself on the master, but not on the slave. when I "psql
> -U postgres" on the slave I get:
>
> psql: FATAL: cache lookup failed for database 16401
>
> This is only on web2, its close to web1, so I'm hoping I can get it
> fixed and then rsync it quickly to the far away slave.
>
> I'm at a loss here, any hints or suggestions would be appreciated.
>
> Thanks,
>
> -Andy
>
>

--
Adrian Klaver
adrian.klaver@aklaver.com
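
For anyone repeating the pg_controldata comparison described in the quoted message, here is a minimal sketch of how that check could be scripted. It is not from the thread. It assumes you have already captured the text output of pg_controldata from each host; the helper names (`system_identifier`, `mismatched_hosts`) and the identifier values in `SAMPLE` are made up for illustration. Only the host names come from the original message.

```python
# Label that pg_controldata prints in front of the cluster's system identifier.
SYSID_LABEL = "Database system identifier"


def system_identifier(controldata_text):
    """Return the system identifier found in pg_controldata output, or None."""
    for line in controldata_text.splitlines():
        label, sep, value = line.partition(":")
        if sep and label.strip() == SYSID_LABEL:
            return value.strip()
    return None


def mismatched_hosts(outputs, primary):
    """Return a sorted list of hosts whose identifier differs from the primary's.

    `outputs` maps a host name to the pg_controldata text captured on it.
    """
    expected = system_identifier(outputs[primary])
    return sorted(
        host
        for host, text in outputs.items()
        if host != primary and system_identifier(text) != expected
    )


# Hypothetical captured output; the identifier values are invented.
SAMPLE = {
    "web1": "Database system identifier:           6300000000000000001\n"
            "Database cluster state:               in production\n",
    "web2": "Database system identifier:           6300000000000000001\n"
            "Database cluster state:               in archive recovery\n",
    "webserv": "Database system identifier:           6300000000000000002\n"
               "Database cluster state:               in archive recovery\n",
}

if __name__ == "__main__":
    for host in mismatched_hosts(SAMPLE, "web1"):
        print(host, "does not match the primary")
```

A standby listed by `mismatched_hosts` would be expected to fail with the "database system identifier differs" error above. A matching identifier only shows that the copy came from the same cluster; it does not by itself show that the copied data directory is consistent.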