Re: [PATCH] Exponential backoff for auth_delay - Mailing list pgsql-hackers
From | Michael Banck |
---|---|
Subject | Re: [PATCH] Exponential backoff for auth_delay |
Date | |
Msg-id | 65aa8265.170a0220.1ecef.91df@mx.google.com Whole thread Raw |
In response to | Re: [PATCH] Exponential backoff for auth_delay (Abhijit Menon-Sen <ams@toroid.org>) |
Responses |
Re: [PATCH] Exponential backoff for auth_delay
Re: [PATCH] Exponential backoff for auth_delay |
List | pgsql-hackers |
Hi, many thanks for the review! I went through your comments (a lot of which pertained to the original larger patch I took code from), attached is a reworked version 2. Other changes: 1. Ignore STATUS_EOF, this led to auth_delay being applied twice (maybe due to the gss/kerberos auth psql is trying by default? Is that legit and should this change be reverted?) - i.e. handle STATUS_OK and STATUS_ERROR explicitly. 2. Guard ShmemInitStruct() with LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE) / LWLockRelease(AddinShmemInitLock), as is done in pg_prewarm and pg_stat_statements as well. 3. Added an additional paragraph discussing the value of auth_delay.milliseconds when auth_delay.exponential_backoff is on, see below. I wonder whether maybe auth_delay.max_seconds should either be renamed to auth_delay.exponential_backoff_max_seconds (but then it is rather long) in order to make it clearer it only applies in that context or alternatively just apply to auth_delay.milliseconds as well (though that would be somewhat weird). Further comments to your comments: On Tue, Jan 16, 2024 at 12:00:49PM +0530, Abhijit Menon-Sen wrote: > At 2024-01-04 08:30:36 +0100, mbanck@gmx.net wrote: > > > > +typedef struct AuthConnRecord > > +{ > > + char remote_host[NI_MAXHOST]; > > + bool used; > > + double sleep_time; /* in milliseconds */ > > +} AuthConnRecord; > > Do we really need a "used" field here? You could avoid it by setting > remote_host[0] = '\0' in cleanup_conn_record. Ok, removed. > > static void > > auth_delay_checks(Port *port, int status) > > { > > + double delay; > > I would just initialise this to auth_delay_milliseconds here, instead of > the harder-to-notice "else" below. Done. > > @@ -43,8 +69,150 @@ auth_delay_checks(Port *port, int status) > > */ > > if (status != STATUS_OK) > > { > > - pg_usleep(1000L * auth_delay_milliseconds); > > + if (auth_delay_exp_backoff) > > + { > > + /* > > + * Exponential backoff per remote host. > > + */ > > + delay = record_failed_conn_auth(port); > > + if (auth_delay_max_seconds > 0) > > + delay = Min(delay, 1000L * auth_delay_max_seconds); > > + } > > I would make this comment more specific, maybe something like "Delay by > 2^n seconds after each authentication failure from a particular host, > where n is the number of consecutive authentication failures". Done. > It's slightly discordant to see the new delay being returned by a > function named "record_<something>" (rather than "calculate_delay" or > similar). Maybe a name like "backoff_after_failed_auth" would be better? > Or "increase_delay_on_auth_fail"? I called it increase_delay_after_failed_conn_auth() now. > > +static double > > +record_failed_conn_auth(Port *port) > > +{ > > + AuthConnRecord *acr = NULL; > > + int j = -1; > > + > > + acr = find_conn_record(port->remote_host, &j); > > + > > + if (!acr) > > + { > > + if (j == -1) > > + > > + /* > > + * No free space, MAX_CONN_RECORDS reached. Wait as long as the > > + * largest delay for any remote host. > > + */ > > + return find_conn_max_delay(); > > In this extraordinary situation (where there are lots of hosts with lots > of authentication failures), why not delay by auth_delay_max_seconds > straightaway, especially when the default is only 10s? I don't see much > point in coordinating the delay between fifty known hosts and an unknown > number of others. I was a bit worried about legitimate users suffering here if (for some reason) a lot of different hosts try to guess passwords, but only once or twice or something. But I have changed it now as you suggested as that makes it simpler and I guess the problem I mentioned above is rather contrived. > > + elog(DEBUG1, "new connection: %s, index: %d", acr->remote_host, j); > > I think this should be removed, but if you want to leave it in, the > message should be more specific about what this is actually about, and > where the message is from, so as to not confuse debug-log readers. I left it in but mentioned auth_delay in it now. I wonder though whether this might be a useful message to have at some more standard level like INFO? > (The same applies to the other debug messages.) Those are all gone. > > +static AuthConnRecord * > > +find_conn_record(char *remote_host, int *free_index) > > +{ > > + int i; > > + > > + *free_index = -1; > > + for (i = 0; i < MAX_CONN_RECORDS; i++) > > + { > > + if (!acr_array[i].used) > > + { > > + if (*free_index == -1) > > + /* record unused element */ > > + *free_index = i; > > + continue; > > + } > > + if (strcmp(acr_array[i].remote_host, remote_host) == 0) > > + return &acr_array[i]; > > + } > > + > > + return NULL; > > +} > > It's not a big deal, but to be honest, I would much prefer to (get rid > of used, as suggested earlier, and) have separate find_acr_for_host() > and find_free_acr() functions. Done. > > +static void > > +record_conn_failure(AuthConnRecord *acr) > > +{ > > + if (acr->sleep_time == 0) > > + acr->sleep_time = (double) auth_delay_milliseconds; > > + else > > + acr->sleep_time *= 2; > > +} > > I would just roll this function into record_failed_conn_auth (or > whatever it's named), Done. > In terms of the logic, it would have been slightly clearer to store the > number of failures and calculate the delay, but it's easier to double > the sleep time that way you've written it. I think it's fine. I kept it as-is for now. > It's worth noting that there is no time-based reset of the delay with > this feature, which I think is something that people might expect to go > hand-in-hand with exponential backoff. I think that's probably OK too. You mean something like "after 5 minutes, reset the delay to 0 again"? I agree that this would probably be useful, but would also make the change more complex. > > +static void > > +auth_delay_shmem_startup(void) > > +{ > > + Size required; > > + bool found; > > + > > + if (shmem_startup_next) > > + shmem_startup_next(); > > + > > + required = sizeof(AuthConnRecord) * MAX_CONN_RECORDS; > > + acr_array = ShmemInitStruct("Array of AuthConnRecord", required, &found); > > + if (found) > > + /* this should not happen ? */ > > + elog(DEBUG1, "variable acr_array already exists"); > > + /* all fileds are set to 0 */ > > + memset(acr_array, 0, required); > > } > > I think you can remove the elog and just do the memset if (!found). Also > if you're changing it anyway, I'd suggest something like "total_size" > instead of "required". Done. > > + DefineCustomBoolVariable("auth_delay.exp_backoff", > > + "Exponential backoff for failed connections, per remote host", > > + NULL, > > + &auth_delay_exp_backoff, > > + false, > > + PGC_SIGHUP, > > + 0, > > + NULL, > > + NULL, > > + NULL); > > Maybe "Double the delay after each authentication failure from a > particular host". (Note: authentication failed, not connection.) Done. > I would also mildly prefer to spell out "exponential_backoff" (but leave > auth_delay_exp_backoff as-is). Done. > > + DefineCustomIntVariable("auth_delay.max_seconds", > > + "Maximum seconds to wait when login fails during exponential backoff", > > + NULL, > > + &auth_delay_max_seconds, > > + 10, > > + 0, INT_MAX, > > + PGC_SIGHUP, > > + GUC_UNIT_S, > > + NULL, NULL, NULL); > > + > > Maybe just "Maximum delay when exponential backoff is enabled". Done. > (Parameter indentation doesn't match the earlier block.) I noticed that as well, but pgindent keeps changing it back to this, not sure why... > I'm not able to make up my mind if I think 10s is a good default or not. > In practice, it means that after the first three consecutive failures, > we'll delay by 10s for every subsequent failure. That sounds OK. But is > is much more useful than, for example, delaying the first three failures > by auth_delay_milliseconds and then jumping straight to max_seconds? What I had in mind is that admins would lower auth_delay.milliseconds to something like 100 or 125 when exponential_backoff is on, so that the first few (possibley honest) auth failures do not get an annoying 1 seconds penalty, but later ones then wil. In that case, 10 seconds is probably ok cause you'd need more than a handful of auth failures. I added a paragraph to the documentation to this end. > I can't really imagine wanting to increase max_seconds to, say, 128 and > keep a bunch of backends sleeping while someone's trying to brute-force > a password. And with a reasonably short max_seconds, I'm not sure if > having the backoff be _exponential_ is particularly important. > > Or maybe because this is a contrib module, we don't have to think about > it to that extent? Well, not sure. I think something like 10 seconds should be fine for most brute-force attacks in practise, and it is configurable (and turned off by default). > > diff --git a/doc/src/sgml/auth-delay.sgml b/doc/src/sgml/auth-delay.sgml > > index 0571f2a99d..2ca9528011 100644 > > --- a/doc/src/sgml/auth-delay.sgml > > +++ b/doc/src/sgml/auth-delay.sgml > > @@ -16,6 +16,17 @@ > > connection slots. > > </para> > > > > + <para> > > + It is optionally possible to let <filename>auth_delay</filename> wait longer > > + for each successive authentication failure from a particular remote host, if > > + the configuration parameter <varname>auth_delay.exp_backoff</varname> is > > + active. Once an authentication succeeded from a remote host, the > > + authentication delay is reset to the value of > > + <varname>auth_delay.milliseconds</varname> for this host. The parameter > > + <varname>auth_delay.max_seconds</varname> sets an upper bound for the delay > > + in this case. > > + </para> > > How about something like this… > > If you enable exponential_backoff, auth_delay will double the delay > after each consecutive authentication failure from a particular > host, up to the given max_seconds (default: 10s). If the host > authenticates successfully, the delay is reset. Done, mostly. > > + <varlistentry> > > + <term> > > + <varname>auth_delay.max_seconds</varname> (<type>integer</type>) > > + <indexterm> > > + <primary><varname>auth_delay.max_seconds</varname> configuration parameter</primary> > > + </indexterm> > > + </term> > > + <listitem> > > + <para> > > + How many seconds to wait at most if exponential backoff is active. > > + Setting this parameter to 0 disables it. The default is 10 seconds. > > + </para> > > + </listitem> > > + </varlistentry> > > I suggest "The maximum delay, in seconds, when exponential backoff is > enabled." Done. Michael
Attachment
pgsql-hackers by date: