Re: [HACKERS] Proposal for changes to recovery.conf API - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: [HACKERS] Proposal for changes to recovery.conf API |
Date | |
Msg-id | CAB7nPqSwJJ4MfD+WPaiZN_eTKHWyuy1t1jVEV4SD0TZkKggtdA@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Proposal for changes to recovery.conf API (Simon Riggs <simon@2ndquadrant.com>) |
Responses |
Re: [HACKERS] Proposal for changes to recovery.conf API
Re: [HACKERS] Proposal for changes to recovery.conf API |
List | pgsql-hackers |
On Fri, Feb 24, 2017 at 7:39 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > New patch version implementing everything you requested, incl docs and > tap tests. The TAP changes look good to me. I thought that more diffs would have been necessary. > The patch as offered here is what I've been asked to do by everybody > as well as I can do it. I'm very happy to receive comments and to > rework the design based upon further feedback. This meritates a summary after all has happened on this thread and this topic. After reading this patch, here is what this is doing: - The existing standby_mode in recovery.conf is removed, replaced by standby.trigger to decide if a node in recovery should be put in standby mode. - recovery.trigger can be used to put a node in recovery. When both standby.trigger and recovery.trigger are specified, standby.trigger takes priority. - Two GUC parameters, recovery_target_type and recovery_target_value, replace all the existing recovery target parameters. - pg_basebackup -R generates recovery.conf.auto. - trigger_file is removed. FWIW, my only complain is about the removal of trigger_file, this is useful to detect a trigger file on a different partition that PGDATA! Keeping it costs also nothing.. > I'm not completely convinced this is a great design, so I'm happy to > hear input. pg_basebackup -R is the main wrinkle. Yeah, I can imagine that. It is easy to append a new file to a tarball, harder to add data to an existing file perhaps? > The timeline handling has a bug at present that I'm working on, but > I'm not worried it constitutes a major problem. Obviously it will be > fixed before commit, but the patch needs more discussion > now/yesterday. Running the tests I can see failures in 004_timeline_switch.pl, which is what you likely mention here, as well as a failure in 008_fsm_truncation.pl. I can also see that 009_twophase.pl is freezing. > All parameters are set at PGC_POSTMASTER for now. Thanks for considering that, this makes the review of this patch easier, and that's complicated enough as-is. -test_recovery_standby('XID + time + name + LSN', - 'standby_9', $node_master, \@recovery_params, "5000", $lsn5); +# Tests for multiple targets no longer needed from 10.0 onwards I would be fine without any comments as well. What's the point of mentioning a past state here? # Switch standby 2 to replay from standby 1 -rmtree($node_standby_2->data_dir . '/recovery.conf'); +#----still needed? rmtree($node_standby_2->data_dir . '/recovery.conf');my $connstr_1 = $node_standby_1->connstr; No need to worry here, this can be removed. osx:func.sgml:18159:19:X: reference to non-existent ID "RECOVERY-TARGET-NAME" osx:release-9.1.sgml:9814:23:X: reference to non-existent ID "RECOVERY-TARGET-NAME" osx:recovery-config.sgml:311:26:X: reference to non-existent ID "RECOVERY-TARGET-XID" osx:recovery-config.sgml:310:43:X: reference to non-existent ID "RECOVERY-TARGET-TIME" osx:release-9.4.sgml:8034:27:X: reference to non-existent ID "RECOVERY-TARGET" Documentation fails to compile. + The database server can also be started "in recovery" a term that covers + using the server as a standby or for executing a targeted recovery. Typically + standby mode would be used to provide high availability and/or read + scalability, whereas a targeted recovery is used to recover from data loss. Double quotes directly used in a documentation paragraph are not nice. You should add a comma after "in recovery". + To start the server in standby mode create a zero-length file + called <filename>standby.signal</><indexterm><primary>standby.signal</></> Zero-length file is not mandatory. Only creating a file is. + <para> + Parameter settings may be changed in <filename>postgresql.conf</filename> or + by executing the <command>ALTER SYSTEM</command>. Changes will take some + time to take effect, so changes made while not in paused state may not + have the desired effect in all cases. + </para> But those parameters are PGC_POSTMASTER?! + By default, recovery will recover to the end of the WAL log. An earlier + stopping point may be specified using <varname>recovery_target_type</> + and in most cases also <varname>recovery_target_value</>, plus the optional + parameters <varname>recovery_target_inclusive</>, + <varname>recovery_targeti_timeline</> and <varname>recovery_target_action</>. + </para> "recovery will recover" is a bad phrasing, I would change that to "recovery will process". There is also a typo => s/targeti/target/. + <para> + Parameter settings may be changed only at server start, though later + patches may add this capability. + </para> I would just say that "new values for those parameters are considered only at restart of the server". There is no need to speculate about a potential future in the documentation. If nothing happens this would remain incorrect. By default, a standby server restores WAL records from the - primary as soon as possible. It may be useful to have a time-delayed + sending server as soon as possible. It may be useful to have a time-delayed Perhaps that's a separate patch? Cascading servers can have a delay even now. Perhaps it would be worth mentioning promote.signal in the docs? - * recoveryTargetTLI: the desired timeline that we want to end in. + * recoveryTargetTimeLineGoal: what the user requested, if any + * + * recoveryTargetTLIRequested: numeric value of requested timeline, if constant + * + * recoveryTargetTLI: the currently understood target timeline; changes Hm.. This looks like too much complication... + fd = BasicOpenFile(StandbySignalFile, O_RDWR | PG_BINARY | get_sync_bit(sync_method), + S_IRUSR | S_IWUSR); + pg_fsync(fd); + close(fd); + standby_signal_file_found = true; Forgetting to sync the parent directory here, no? + int normal_log_level = LOG; //DEBUG2; Let's not forget that. + if (unlink(StandbySignalFile) != 0 && standby_signal_file_found) + ereport(ERROR, + (errcode_for_file_access(), + errmsg("could not remove file \"%s\": %m", + StandbySignalFile))); + if (unlink(RecoverySignalFile) != 0 && recovery_signal_file_found) + ereport(ERROR, + (errcode_for_file_access(), + errmsg("could not remove file \"%s\": %m", + RecoverySignalFile))); unlink() is not durable... It may be better to rename those files using durable_rename() so as it is durable. - if (strlen(restore_name_str) >= MAXFNAMELEN) + if (strlen(restore_name_str) >= MAXRESTOREPOINTNAMELEN Not sure there is much point to have a new variable here. + /* + * If primary_conninfo has been changed while walreceiver is running, + * shut down walreceiver so that a new walreceiver is started and + * initiates replication with the new connection information. + */ + if (strcmp(conninfo, PrimaryConnInfo) != 0) + ereport(FATAL, + (errcode(ERRCODE_ADMIN_SHUTDOWN), + errmsg("closing replication connection because primary_conninfo was changed"))); This cannot happen, those are postmaster parameters. Let's remove any complications if those are not needed. + if (!ParseConfigFile(RECOVERY_AUTOCONF_FILENAME, false, + NULL, 0, 0, elevel, + &head, &tail)) + { + /* Syntax error(s) detected in the file, so bail out */ + error = true; + ConfFileWithError = RECOVERY_AUTOCONF_FILENAME; + goto bail_out; + } Surely this should be documented at least on pg_backbackup page. + else if (strcmp(*newval, "controlfile") == 0 || strcmp(*newval, "") == 0) + rttg = RECOVERY_TARGET_TIMELINE_CONTROLFILE; What the idea behind "controlfile" as value for recovery_target_timeline? +# - Archive Recovery - +#restore_command = '' # command to use to restore an archived logfile segment + # placeholders: %p = path of file to restore + # %f = file name only + # e.g. 'cp /mnt/server/archivedir/%f %p It may be good to mention that those parameters are ignored on a primary server. + * + * Note we don't skip STANDBY_SIGNAL_FILE (correct??) */ My guess is that those should be kept in backups by default, and dropped if -R is used. +#define RecoveryTargetText(t) ( \ + t == RECOVERY_TARGET_UNSET ? "unset" : ( \ + t == RECOVERY_TARGET_XID ? "xid" : ( \ + t == RECOVERY_TARGET_TIME ? "timestamp" : ( \ + t == RECOVERY_TARGET_NAME ? "name" : ( \ + t == RECOVERY_TARGET_LSN ? "lsn" : \ + "immediate" ))))) The default value should be "none", not "immediate". Perhaps it is time to introduce a xlogrecovery.c and reduce again the size of xlog.c... -- Michael
pgsql-hackers by date: