Re: Redacting information from logs - Mailing list pgsql-hackers

From	Andres Freund
Subject	Re: Redacting information from logs
Date	August 3, 2019 22:47:57
Msg-id	20190803224757.6egkzussvkswnymk@alap3.anarazel.de
In response to	Redacting information from logs (Jeff Davis <pgsql@j-davis.com>)
Responses	Re: Redacting information from logs
List	pgsql-hackers

Hi,

On 2019-07-30 11:54:55 -0700, Jeff Davis wrote:
> My proposal is:
>
>  * redact every '%s' in an ereport by having a special mode for
> snprintf.c (this is possible because we now own snprintf)

I'm extremely doubtful this is a sane approach. We use snprintf for a
heck of a lot of things. The likelihood of this having unintended
consequences seems high (consider an error being thrown while trying to
report another error message and such). Nor do I think that snprintf.c
is a good layer to perform redaction - it's too low level. It's used by
both frontend and backend code, and for both error and non-error purposes.
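
To make the concern about unintended consequences concrete, here is a minimal sketch in C, assuming the redaction mode is a process-global flag consulted by the formatter. Everything in it (redact_mode, fmt_arg, build_path, report_error) is hypothetical and far simpler than the real snprintf.c; it only shows that a mode switched on for one error report also applies to unrelated formatting, and stays on if a nested failure escapes before it is reset.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical process-wide switch read by a redaction-aware formatter;
 * not a real PostgreSQL variable. */
static int redact_mode = 0;

/* Stand-in for how such a formatter would treat one "%s" argument. */
static const char *fmt_arg(const char *arg)
{
    return redact_mode ? "[REDACTED]" : arg;
}

/* Ordinary, non-error formatting that happens to share the formatter. */
static void build_path(char *buf, size_t len, const char *dir)
{
    snprintf(buf, len, "%s/pg_wal", fmt_arg(dir));
}

/* Simulated error report (hypothetical). fail_midway models a nested
 * error escaping before the mode is switched back off. */
static int report_error(char *msg, size_t len, const char *value,
                        int fail_midway)
{
    redact_mode = 1;
    snprintf(msg, len, "invalid input: %s", fmt_arg(value));
    if (fail_midway)
        return -1;              /* redact_mode is left on */
    redact_mode = 0;
    return 0;
}
```

After a report that fails midway, a later build_path(path, sizeof path, "/data") produces "[REDACTED]/pg_wal", although the path has nothing to do with the error.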

I also don't think you're actually going to get that far with it -
there are plenty of places where we concatenate error messages without
using *printf, e.g. via appendStringInfoString().
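
A minimal sketch of that gap, assuming a formatter that redacts its %s argument (redacting_format, hypothetical) and a message assembled partly by direct concatenation, with strncat() standing in for appendStringInfoString(). The formatted key is hidden, but the same value appended verbatim afterwards is not.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical redacting formatter: like snprintf with a single "%s"
 * argument, except the argument is replaced by a placeholder.
 * fmt must contain exactly one "%s" and no other conversions. */
static void redacting_format(char *buf, size_t len, const char *fmt,
                             const char *arg)
{
    (void) arg;                 /* the real value never reaches the output */
    snprintf(buf, len, fmt, "[REDACTED]");
}

/* Builds a message partly by formatting and partly by plain
 * concatenation; strncat() stands in for appendStringInfoString(). */
static void build_message(char *buf, size_t len, const char *key,
                          const char *detail)
{
    redacting_format(buf, len, "duplicate key %s", key);
    strncat(buf, " detail: ", len - strlen(buf) - 1);
    strncat(buf, detail, len - strlen(buf) - 1);   /* not redacted */
}
```

Passing the same e-mail address as key and detail yields "duplicate key [REDACTED] detail: alice@example.com", so the sensitive value still reaches the log.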

> But I don't see a better solution. Right now, it's a pain to treat log
> files as sensitive things when there are so many ways they can help
> with smooth operations and so many tools available to analyze them.
> This proposal seems like a practical solution to enable better use of
> log files while protecting potentially-sensitive information.

I don't really see a low-effort way either. But I'm fairly certain that
this will cause at least as many problems as it'll help solve.

I think incrementally moving to messages where portions of information
are separated out (e.g. the things we'd inline with %s) is, although a
lengthy process, the better approach. It'll make richer output formats
possible, it'll allow for proper redaction, etc.

I.e. something very roughly like

ereport(ERROR,
        errmsg_rich("string with %{named}s references to %{parameter}s"),
        errparam("named", somevar),
        errparam("parameter", othervar, .redact = CONTEXT));

That would allow us to annotate whether a specific parameter needs
to be redacted for certain contexts.
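
One way the named-parameter idea might look, as a minimal sketch: ErrParam, render_rich() and the DEST_* bitmask (standing in for ".redact = CONTEXT") are all hypothetical, not an existing PostgreSQL API. The same template is rendered once per destination, and a parameter flagged for a destination is replaced by a placeholder only there.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical destinations a parameter can be redacted for; this
 * bitmask stands in for ".redact = CONTEXT" in the proposal. */
enum { DEST_SERVER_LOG = 1 << 0, DEST_CLIENT = 1 << 1 };

/* Hypothetical counterpart of errparam(name, value, .redact = ...). */
typedef struct {
    const char *name;
    const char *value;
    unsigned    redact;         /* destinations that must not see value */
} ErrParam;

static const ErrParam *find_param(const ErrParam *params, size_t n,
                                  const char *name, size_t namelen)
{
    for (size_t i = 0; i < n; i++)
        if (strlen(params[i].name) == namelen &&
            strncmp(params[i].name, name, namelen) == 0)
            return &params[i];
    return NULL;
}

/* Expands "%{name}s" references for one destination, substituting
 * "[REDACTED]" for parameters flagged for that destination. */
static void render_rich(char *out, size_t outlen, const char *tmpl,
                        const ErrParam *params, size_t n, unsigned dest)
{
    size_t len = 0;

    if (outlen == 0)
        return;
    out[0] = '\0';
    while (*tmpl && len + 1 < outlen) {
        const char *end = NULL;

        if (tmpl[0] == '%' && tmpl[1] == '{')
            end = strstr(tmpl + 2, "}s");
        if (end) {
            const ErrParam *p = find_param(params, n, tmpl + 2,
                                           (size_t) (end - (tmpl + 2)));
            /* "???" marks a reference to an unknown parameter */
            const char *val = !p ? "???"
                            : (p->redact & dest) ? "[REDACTED]" : p->value;
            int w = snprintf(out + len, outlen - len, "%s", val);

            if (w < 0 || (size_t) w >= outlen - len)
                return;         /* stop on error or truncation */
            len += (size_t) w;
            tmpl = end + 2;
        } else {
            out[len++] = *tmpl++;   /* copy literal text */
            out[len] = '\0';
        }
    }
}
```

With "parameter" flagged for DEST_SERVER_LOG, the server log gets "string with accounts references to [REDACTED]" while the client rendering still carries the value; flagging it for DEST_CLIENT instead gives the reverse.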

I'd probably add an errredact(bool) to annotate whether a message needs
to be redacted, mostly so we can easily flag a lot of current messages
as OK. When it's not present, I'd redact the entire message when
errmsg() is used, and redact nothing when errmsg_rich() is used and
none of the parameters are flagged for redaction.
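
The defaulting rules above can be written as a small decision function. This is a sketch under stated assumptions: ReportInfo, decide_redaction() and the enums are hypothetical, and treating errredact(true) on a rich message as "redact the whole message" is an interpretation rather than something the proposal spells out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of what an ereport() call supplied. */
typedef enum { MSG_PLAIN, MSG_RICH } MsgKind;   /* errmsg() vs errmsg_rich() */
typedef enum {
    REDACT_NONE,
    REDACT_FLAGGED_PARAMS,
    REDACT_WHOLE_MESSAGE
} RedactAction;

typedef struct {
    MsgKind kind;
    bool    has_errredact;      /* was errredact(...) given? */
    bool    errredact_value;    /* its argument, if given */
    size_t  n_flagged_params;   /* parameters carrying a redaction flag */
} ReportInfo;

/* An explicit errredact() wins; otherwise plain errmsg() is redacted
 * wholesale, and errmsg_rich() only redacts flagged parameters. */
static RedactAction decide_redaction(const ReportInfo *r)
{
    if (r->has_errredact)       /* assumption: true => whole message */
        return r->errredact_value ? REDACT_WHOLE_MESSAGE : REDACT_NONE;
    if (r->kind == MSG_PLAIN)
        return REDACT_WHOLE_MESSAGE;
    return r->n_flagged_params > 0 ? REDACT_FLAGGED_PARAMS : REDACT_NONE;
}
```

The errredact(false) branch is what lets existing errmsg() calls be marked as safe one by one, without first converting them to errmsg_rich().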

That would then also allow us to reference parameters that clients /
exception handlers may not see, e.g. the arguments to leakproof
functions. Not being able to do that currently makes a lot of issues
harder to debug, because we don't get the values for e.g. overflows,
input syntax errors, etc.

By allowing errparam()s that are not referenced in the error message
itself, we can provide more detail for people using richer log output
formats. I'd assume we'd fairly quickly get a logfmt/json logging
target/format.
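
As a rough illustration of such a target, here is a minimal sketch of a logfmt-style formatter in C. LogParam and format_logfmt() are hypothetical, and the quoting rules (quote values that are empty or contain a space, '=', '"' or '\', and backslash-escape quotes and backslashes) are a simplification of common logfmt practice, which is only loosely specified. The "input" parameter becomes its own field even though the human-readable message never mentions it.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical parameter attached to an error report. */
typedef struct {
    const char *name;
    const char *value;
} LogParam;

/* Appends one character if room remains; keeps out NUL-terminated. */
static void put_char(char *out, size_t cap, size_t *len, char c)
{
    if (*len + 1 < cap) {
        out[(*len)++] = c;
        out[*len] = '\0';
    }
}

static void put_str(char *out, size_t cap, size_t *len, const char *s)
{
    while (*s)
        put_char(out, cap, len, *s++);
}

/* Appends "key=value", quoting and escaping the value when needed. */
static void put_pair(char *out, size_t cap, size_t *len,
                     const char *key, const char *val)
{
    int quote = *val == '\0' || strpbrk(val, " =\"\\") != NULL;

    if (*len > 0)
        put_char(out, cap, len, ' ');
    put_str(out, cap, len, key);
    put_char(out, cap, len, '=');
    if (quote)
        put_char(out, cap, len, '"');
    for (; *val; val++) {
        if (quote && (*val == '"' || *val == '\\'))
            put_char(out, cap, len, '\\');
        put_char(out, cap, len, *val);
    }
    if (quote)
        put_char(out, cap, len, '"');
}

/* Formats one log line: level and message first, then every parameter,
 * including ones the message text never references. */
static void format_logfmt(char *out, size_t cap, const char *level,
                          const char *msg, const LogParam *params, size_t n)
{
    size_t len = 0;

    if (cap == 0)
        return;
    out[0] = '\0';
    put_pair(out, cap, &len, "level", level);
    put_pair(out, cap, &len, "msg", msg);
    for (size_t i = 0; i < n; i++)
        put_pair(out, cap, &len, params[i].name, params[i].value);
}
```

For an input syntax error with parameters column=balance and input="12 34", this produces: level=ERROR msg="invalid input syntax for type integer" column=balance input="12 34". A JSON target would follow the same shape, with one key per parameter.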

Greetings,

Andres Freund
