Re: Turning recovery.conf into GUCs - Mailing list pgsql-hackers
From | Alex Shulgin |
---|---|
Subject | Re: Turning recovery.conf into GUCs |
Date | |
Msg-id | 87389ctq24.fsf@commandprompt.com Whole thread Raw |
In response to | Re: Turning recovery.conf into GUCs (Andres Freund <andres@2ndquadrant.com>) |
Responses |
Re: Turning recovery.conf into GUCs
Re: Turning recovery.conf into GUCs |
List | pgsql-hackers |
Hello, Here's an attempt to revive this patch. It is rebased onto the latest master and also includes handling and documentation of newly added recovery.conf parameters such as primary_slot_name, recovery_min_apply_delay and recovery_target='immediate'. The following feedback had been addressed: Andres Freund <andres@2ndquadrant.com> writes: >> > >> > * --write-standby-enable seems to loose quite some functionality in >> > comparison to --write-recovery-conf since it doesn't seem to set >> > primary_conninfo, standby anymore. >> >> we can add code that looks for postgresql.conf in $PGDATA but if >> postgresql.conf is not there (like the case in debian, there is not >> much we can do about it) or we can write a file ready to be included >> in postgresql.conf, any sugestion? > > People might not like me for the suggestion, but I think we should > simply always include a 'recovery.conf' in $PGDATA > unconditionally. That'd make this easier. > Alternatively we could pass a filename to --write-recovery-conf. Well, the latest version of this patch fails to start when it sees 'recovery.conf' in PGDATA: FATAL: "recovery.conf" is not supported anymore as a recovery method DETAIL: Refer to appropriate documentation about migration methods I've missed all the discussion behind this decision and after reading the ALTER SYSTEM/conf.d thread I'm even more confused so I'd like someone more knowledgeable to speak up on the status of this. Do we want to keep this behavior of the patch? >> > * CheckRecoveryReadyFile() doesn't seem to be a very descriptive >> > function name. >> >> I left it as CheckStartingAsStandby() but i still have a problem of >> this not being completely clear. this function is useful for standby >> or pitr. There's not much left for this function in the current patch version, so maybe we should just move it to StartupXLOG (it's not called from anywhere else either way). >> which leads me to the other problem i have: the recovery trigger file, >> i have left it as standby.enabled but i still don't like it. > >> recovery.trigger (Andres objected on this name) >> forced_recovery.trigger >> user_forced_recovery.trigger > > stay_in_recovery.trigger? That'd be pretty clear for anybody involved in > pg, but otherwise... The docs use the term "continuous recovery". Either way, from the code it is clear that we only stay in recovery if standby_mode is directly turned on. This makes the whole check for a specially named file unnecessary, IMO: we should just check the value of standby_mode (which is off by default). By the way, is there any use in setting standby_mode=on and any of the recovery_target* GUCs at the same time? I think it can only play together if you set the target farther than the latest point you've got in the archive locally. So that's sort of "Point-in-Future-Recovery". Does that make any sense at all? >> > * the description of archive_cleanup_command seems wrong to me. >> >> why? it seems to be the same that was in recovery.conf. where did you >> see the description you're complaining at? > > I dislike the description in guc.c > >> + {"archive_cleanup_command", PGC_POSTMASTER, WAL_ARCHIVE_RECOVERY, >> + gettext_noop("Sets the shell command that will be executed at every restartpoint."), >> + NULL >> + }, >> + &archive_cleanup_command, > > previously it was: > >> -# specifies an optional shell command to execute at every restartpoint. >> -# This can be useful for cleaning up the archive of a standby server. Expanded the GUC desc. >> > * Why the PG_TRY/PG_CATCH in check_recovery_target_time? Besides being >> > really strangely formatted (multiline :? inside a function?) it >> > doesn't a) seem to be correct to ignore potential memory allocation >> > errors by just switching back into the context that just errored out, >> > and continue to work there b) forgo cleanup by just continuing as if >> > nothing happened. That's unlikely to be acceptable. >> >> the code that read recovery.conf didn't has that, so i just removed it > > Well, that's not necessarily correct. recovery.conf was only read during > startup, while this is read on SIGHUP. > [copied from the bottom, related] > > I don't think that's correct. Afaics it will cause the postmaster to > crash on a SIGHUP with invalid data. I think you're unfortunately going > to have to copy a fair bit of timestamptz_in() and even > DateTimeParseError(), replacing the ereport()s by the guc error > reporting. The use of PG_TRY/CATCH does protect from FATALs in SIGHUP indeed. Using CopyErrorData() we can also fetch the actual error message from timestamptz_in, though I wonder we really have to make a full copy. >> > * Why do you include xlog_internal.h in guc.c and not xlog.h? > >> we actually need both but including xlog_internal.h also includes xlog.h >> i added xlog.h and if someone things is enough only putting >> xlog_internal.h let me know > > What's required from xlog_internal.h? Looks like this was addressed before me. >> diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c >> index b53ae87..54f6a0d 100644 >> --- a/src/backend/access/transam/xlog.c >> +++ b/src/backend/access/transam/xlog.c >> @@ -64,11 +64,12 @@ >> extern uint32 bootstrap_data_checksum_version; >> >> /* File path names (all relative to $PGDATA) */ >> -#define RECOVERY_COMMAND_FILE "recovery.conf" >> -#define RECOVERY_COMMAND_DONE "recovery.done" >> +#define RECOVERY_ENABLE_FILE "standby.enabled" > > Imo the file and variable names should stay coherent. Yes, once we settle on the name (and if we really need that extra trigger file.) >> +/* recovery.conf is not supported anymore */ >> +#define RECOVERY_COMMAND_FILE "recovery.conf" > >> +bool StandbyModeRequested = false; >> +static TimestampTz recoveryDelayUntilTime; > > This imo should be lowercase now, the majority of GUC variables are. This is not a GUC variable, though it's calculated based on a GUC recovery_min_apply_delay. >> +/* are we currently in standby mode? */ >> +bool StandbyMode = false; > > Why did you move this? It was easy to move it back though. >> - if (rtliGiven) >> + if (strcmp(recovery_target_timeline_string, "") != 0) >> { > > Why not have the convention that NULL indicates a unset target_timeline > like you use for other GUCs? Mixing things like this is confusing. > > Why is recovery_target_timeline stored as a string? Because it's a > unsigned int? If so, you should have an assign hook setting up a) > rtliGiven, b) properly typed variable. Yes, I believe setting these to NULL by default makes a lot more sense. Then one can check if there was a non-default setting by checking the *_string variable, which is not possible with int or TimestampTz. >> - if (rtli) >> + if (recovery_target_timeline) >> { >> /* Timeline 1 does not have a history file, all else should */ >> - if (rtli != 1 && !existsTimeLineHistory(rtli)) >> + if (recovery_target_timeline != 1 && >> + !existsTimeLineHistory(recovery_target_timeline)) >> ereport(FATAL, >> (errmsg("recovery target timeline %u does not exist", >> - rtli))); >> - recoveryTargetTLI = rtli; >> + recovery_target_timeline))); >> + recoveryTargetTLI = recovery_target_timeline; >> recoveryTargetIsLatest = false; > > So, now we have a recoveryTargetTLI and recovery_target_timeline > variable? Really? Why do we need recoveryTargetTLI at all now? Looks like we still do need both of them. The initial value of recoveryTargetTLI is derived from pg_control, but later it can be overriden by this setting. However, if the recovery_target_timeline was "latest" we need to find the latest TLI, based on the initial value from pg_control. But we can't set final recoveryTargetTLI value in the GUC assign hook, because we might need to fetch some history files first. >> +static void >> +assign_recovery_target_time(const char *newval, void *extra) >> +{ >> + recovery_target_time = *((TimestampTz *) extra); >> + >> + if (recovery_target_xid != InvalidTransactionId) >> + recovery_target = RECOVERY_TARGET_XID; >> + else if (recovery_target_name[0]) >> + recovery_target = RECOVERY_TARGET_NAME; >> + else if (recovery_target_time != 0) >> + recovery_target = RECOVERY_TARGET_TIME; >> + else >> + recovery_target = RECOVERY_TARGET_UNSET; >> +} >> + > > I don't think it's correct to do such hangups in the assign hook - you > have no ideas in which order they will be called and such. Imo that > should happen at startup, like we also do it for other interdependent > variables like wal_buffers. Yeah, that looked weird. Moved to StartupXLOG(). I disliked the strtoul handling in the earlier version of the patch, especially given that with the base=0 it can parse 0x-prefixed hex strings. I would rather error out on non-hex digit instead of stopping and calling it OK. This change is included in the new version. Should we really allow specifying negative values for XID/timeline? Right now it will happily consume "-1" for recovery_target_xid and complain if it's out of range, like this: LOG: starting point-in-time recovery to XID 4294967295 LOG: invalid primary checkpoint record LOG: invalid secondary checkpoint record Allowing negative values makes even less sense for timelines, IMO. -- Alex
Attachment
pgsql-hackers by date: