Re: Switching timeline over streaming replication - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Switching timeline over streaming replication |
Msg-id | 001701cda237$723d8db0$56b8a910$@kapila@huawei.com |
In response to | Re: Switching timeline over streaming replication (Amit Kapila <amit.kapila@huawei.com>) |
Responses | Re: Switching timeline over streaming replication |
List | pgsql-hackers |
> On Wednesday, October 03, 2012 8:45 PM Heikki Linnakangas wrote:
> On Tuesday, October 02, 2012 4:21 PM Heikki Linnakangas wrote:
> > Thanks for the thorough review! I committed the xlog.c refactoring patch
> > now. Attached is a new version of the main patch, comments on specific
> > points below. I didn't adjust the docs per your comments yet, will do
> > that next.
>
> I have some doubts regarding the comments fixed by you and some more new
> review comments.
> After this I shall focus majorly towards testing of this Patch.

Testing
-----------

Failed Case
--------------
1. Promote the standby to master, and make the standby follow the new master.
2. Stop standby and master. Restart the standby first and then the master.
3. Restarting the standby gives the errors below:

E:\pg_git_code\installation\bin>LOG: database system was shut down in recovery at 2012-10-04 18:36:00 IST
LOG: entering standby mode
LOG: consistent recovery state reached at 0/176B800
LOG: redo starts at 0/176B800
LOG: record with zero length at 0/176BD68
LOG: database system is ready to accept read only connections
LOG: streaming replication successfully connected to primary
LOG: out-of-sequence timeline ID 1 (after 2) in log segment 000000020000000000000001, offset 0
FATAL: terminating walreceiver process due to administrator command
LOG: out-of-sequence timeline ID 1 (after 2) in log segment 000000020000000000000001, offset 0
LOG: out-of-sequence timeline ID 1 (after 2) in log segment 000000020000000000000001, offset 0
LOG: out-of-sequence timeline ID 1 (after 2) in log segment 000000020000000000000001, offset 0
LOG: out-of-sequence timeline ID 1 (after 2) in log segment 000000020000000000000001, offset 0

Once this error occurs, it keeps appearing on the standby: restarting master and standby in any order, or doing some operations on the master, does not clear it.

Passed Cases
-------------
1. After promoting the standby as the new master, try to make the previous master (having the same WAL as the new master) a standby.
   In this case recovery_target_timeline in recovery.conf is set to latest. It is able to connect to the new master and starts streaming as expected.
   - As per expected behavior.
2. After promoting the standby as the new master, try to make the previous master (having more WAL compared to the new master) a standby; an error is displayed.
   - As per expected behavior.
3. After promoting the standby as the new master, try to make the previous master (having the same WAL as the new master) a standby. In this case recovery_target_timeline in recovery.conf is not set. The following LOG is displayed:

   LOG: fetching timeline history file for timeline 2 from primary server
   LOG: replication terminated by primary server
   DETAIL: End of WAL reached on timeline 1
   LOG: walreceiver ended streaming and awaits new instructions
   LOG: re-handshaking at position 0/1000000 on tli 1
   LOG: replication terminated by primary server
   DETAIL: End of WAL reached on timeline 1
   LOG: walreceiver ended streaming and awaits new instructions
   LOG: re-handshaking at position 0/1000000 on tli 1
   LOG: replication terminated by primary server
   DETAIL: End of WAL reached on timeline 1

   - As per expected behavior.

Pending Cases which need to be tested (these are scenarios; I will do some more testing based on them)
---------------------------------------
1. a. Master M-1
   b. Standby S-1 follows M-1
   c. Standby S-2 follows M-1
   d. Promote S-1 as master
   e. Make S-2 follow S-1 -- the operation should succeed.
2. a. Master M-1
   b. Standby S-1 follows M-1
   c. Stop S-1 and M-1
   d. Do PITR on M-1 2 times, to increment the timeline of M-1
   e. Make standby S-1 follow M-1 -- it should succeed.
3. a. Master M-1
   b. Standbys S-1, S-2 follow M-1
   c. Standbys S-3, S-4 follow S-1
   d. Promote the standby which has the highest WAL
   e. Make all standbys follow the new master.
4. a. Master M-1
   b. Synchronous standbys S-1, S-2
   c. Promote S-1
   d. Make M-1 and S-2 follow S-1 -- this operation should succeed.

Concurrent Operations
---------------------------
1. a. Master M-1, Standby S-1 follows M-1, Standby S-2 follows M-1
   b. Many concurrent operations on master M-1
   c. During the concurrent operations, promote S-1
   d. Make S-2 follow S-1 -- it should happen successfully.
2. During promotion, call pg_basebackup.
3. During promotion, try to connect a client.

Resource Testing
------------------
1. a. Make a standby follow a master which is many timelines ahead
   b. Observe if there is any resource leak
   c. Let streaming replication run for 30 mins
   d. Observe if there is any resource leak

Code Review
-------------
Libpqrcv_readtimelinehistoryfile()
{
..
..
+    if (PQnfields(res) != 2 || PQntuples(res) != 1)
+    {
+        int        ntuples = PQntuples(res);
+        int        nfields = PQnfields(res);
+
+        PQclear(res);
+        ereport(ERROR,
+                (errmsg("invalid response from primary server"),
+                 errdetail("Expected 1 tuple with 3 fields, got %d tuples with %d fields.",
+                           ntuples, nfields)));
+    }
..
}

The error detail says 3 fields are expected in the timeline history response, but the check is done for 2 fields.

Kindly let me know if you want me to focus on any other areas for testing this feature.

With Regards,
Amit Kapila.
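Regarding the mismatch in the Code Review section: one way to keep a result-shape check and its error detail from disagreeing is to build both from the same constants. The following is a minimal standalone sketch, not code from the patch. The names `check_timeline_history_shape`, `TLHISTORY_EXPECTED_TUPLES` and `TLHISTORY_EXPECTED_FIELDS` are hypothetical, plain `snprintf` stands in for libpq and `ereport`, and it assumes the 2-field check (history file name and file contents) is the intended shape, so the "3" in the message is the part in error.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical constants (assumption): the primary answers a timeline
 * history request with exactly one row of two fields, the history file
 * name and the file contents.
 */
#define TLHISTORY_EXPECTED_TUPLES 1
#define TLHISTORY_EXPECTED_FIELDS 2

/*
 * Hypothetical helper mirroring the check in the quoted patch.
 * Returns 1 if the result shape matches and leaves an empty detail.
 * Otherwise returns 0 and writes an error detail built from the same
 * constants the check uses, so the check and the message cannot drift.
 */
static int
check_timeline_history_shape(int ntuples, int nfields,
                             char *detail, size_t detail_len)
{
    assert(detail != NULL && detail_len > 0);

    if (ntuples == TLHISTORY_EXPECTED_TUPLES &&
        nfields == TLHISTORY_EXPECTED_FIELDS)
    {
        detail[0] = '\0';   /* shape is valid: no error detail */
        return 1;
    }

    snprintf(detail, detail_len,
             "Expected %d tuple with %d fields, got %d tuples with %d fields.",
             TLHISTORY_EXPECTED_TUPLES, TLHISTORY_EXPECTED_FIELDS,
             ntuples, nfields);
    return 0;
}
```

For example, `check_timeline_history_shape(1, 3, buf, sizeof(buf))` returns 0 and leaves "Expected 1 tuple with 2 fields, got 1 tuples with 3 fields." in `buf`. Within the patch itself, the smallest fix under the same assumption is simply to change the "3" to "2" in the errdetail string.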