Re: Redacting information from logs - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Redacting information from logs |
Date | |
Msg-id | 20190803224757.6egkzussvkswnymk@alap3.anarazel.de |
In response to | Redacting information from logs (Jeff Davis <pgsql@j-davis.com>) |
Responses | Re: Redacting information from logs |
List | pgsql-hackers |
Hi,

On 2019-07-30 11:54:55 -0700, Jeff Davis wrote:
> My proposal is:
>
> * redact every '%s' in an ereport by having a special mode for
>   snprintf.c (this is possible because we now own snprintf)

I'm extremely doubtful this is a sane approach. We use snprintf for a
heck of a lot of things. The likelihood of this having unintended
consequences seems high (consider an error being thrown while trying to
report another error message and such).

Nor do I think that snprintf.c is a good layer to perform redaction -
it's too low level. It's used for both frontend and backend. It's used
for both non-error and error purposes.

I also don't think you're actually going to get that far with it -
there are plenty of places where we concatenate error messages without
using *printf, but with e.g. appendStringInfoString().

> But I don't see a better solution. Right now, it's a pain to treat log
> files as sensitive things when there are so many ways they can help
> with smooth operations and so many tools available to analyze them.
> This proposal seems like a practical solution to enable better use of
> log files while protecting potentially-sensitive information.

I don't really see a low-effort way either. But I'm fairly certain that
this will cause at least as many problems as it'll help solve.

I think incrementally moving to messages where portions of information
are separated out (e.g. the things we'd inline with %s) is, although a
lengthy process, the better approach. It'll make richer output formats
possible, it'll allow for proper redaction, etc.

I.e. something very roughly like

    ereport(ERROR,
            errmsg_rich("string with %{named}s references to %{parameter}s"),
            errparam("named", somevar),
            errparam("parameter", othervar, .redact = CONTEXT));

Which would allow us to annotate whether a specific parameter needs to
be redacted for certain contexts.

I'd probably add an errredact(bool) to annotate whether a message needs
to be redacted, mostly so we can easily flag a lot of current messages
as OK. When not present, I'd redact the entire message when errmsg() is
being used, and redact nothing if errmsg_rich() is used, and none of
the parameters flag an error.

That'd then also allow us to reference parameters that clients /
exception handlers may not see, e.g. the arguments to leakproof
functions. Not having those currently makes a lot of issues harder to
debug, because we don't get the values for e.g. overflows, input syntax
errors etc.

By allowing errparam()s to be specified that are not used in the error
messages, we can provide more detail to errors for people using richer
log outputs. I'd assume we'd fairly quickly have a logfmt/json logging
target/format.

Greetings,

Andres Freund
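
To make the %{name}s idea above a bit more concrete, here is a minimal standalone sketch in C. It is not PostgreSQL code and not what an actual patch would look like: LogParam and format_rich() are hypothetical names standing in for the proposed errparam() and errmsg_rich(), and a plain boolean takes the place of the `.redact = CONTEXT` annotation. It only shows that keeping parameters separate from the message text lets the formatter decide per parameter whether to print or hide the value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the proposed errparam(): a named value plus
 * a flag saying whether it must be hidden when redaction is on. */
typedef struct LogParam
{
    const char *name;
    const char *value;
    bool        redact;
} LogParam;

/* Find the parameter whose name matches the text between "%{" and "}". */
static const LogParam *
find_param(const LogParam *params, size_t nparams,
           const char *name, size_t namelen)
{
    for (size_t i = 0; i < nparams; i++)
        if (strlen(params[i].name) == namelen &&
            strncmp(params[i].name, name, namelen) == 0)
            return &params[i];
    return NULL;
}

/*
 * Expand "%{name}s" references in fmt into out.  When redacting is true,
 * parameters flagged with redact are printed as "<redacted>".  References
 * to unknown names are copied verbatim.  Returns false if out is too small.
 */
bool
format_rich(char *out, size_t outsz, const char *fmt,
            const LogParam *params, size_t nparams, bool redacting)
{
    size_t      used = 0;

    assert(out != NULL && outsz > 0);
    out[0] = '\0';

    while (*fmt != '\0')
    {
        const char *piece = fmt;    /* default: copy one literal char */
        size_t      piecelen = 1;
        const char *close = NULL;

        if (fmt[0] == '%' && fmt[1] == '{' &&
            (close = strchr(fmt + 2, '}')) != NULL && close[1] == 's')
        {
            size_t      reflen = (size_t) (close + 2 - fmt);
            const LogParam *p = find_param(params, nparams, fmt + 2,
                                           (size_t) (close - (fmt + 2)));

            if (p != NULL)
            {
                piece = (redacting && p->redact) ? "<redacted>" : p->value;
                piecelen = strlen(piece);
            }
            else
                piecelen = reflen;  /* unknown name: keep the reference */
            fmt += reflen;
        }
        else
            fmt++;

        if (used + piecelen >= outsz)
            return false;           /* no room for piece plus terminator */
        memcpy(out + used, piece, piecelen);
        used += piecelen;
        out[used] = '\0';
    }
    return true;
}
```

With parameters {"named", "abc", false} and {"parameter", "secret", true}, formatting the message from the example above with redaction on hides only the second value, and with redaction off both values appear. A real implementation would presumably hang this off ErrorData and the elog.c machinery instead of a caller-supplied buffer.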