Re: Proposal: More structured logging - Mailing list pgsql-hackers
From | Greg Stark |
---|---|
Subject | Re: Proposal: More structured logging |
Date | |
Msg-id | CAM-w4HOG6JxdbkKJXjfGtAtrqizZYVdkrKTX_BKsHGkzQ+NYYw@mail.gmail.com Whole thread Raw |
In response to | Proposal: More structured logging (Ronan Dunklau <ronan.dunklau@aiven.io>) |
Responses |
Re: Proposal: More structured logging
|
List | pgsql-hackers |
1) I would like an interface which more or less guarantees that *every* parameter of the log message is included in the structured data. Ideally there should be no actual need to generate the formatted messages for destinations like elastic search, just record the message id and the parameters. To that end I would thing something like: ereport(DEBUG2, (errmsg("table \"%s\": removed %lld dead item identifiers in %u pages", vacrel->relname, (long long) index, vacuumed_pages))); Would end up like: ereport(DEBUG2, (errmsg("table \"%s\": removed %lld dead item identifiers in %u pages", errtag("relname", vacrel->relname), errtag("dead_item_pointers", (long long) index), errtag("vacuumed_pages", vacuumed_pages)))); And any untagged parameters could be included in the structured data with unknown tag names. (Or perhaps some macro magic involving # but I'm not sure that would be possible with va_args.) 2) I was kind of hoping we could have some sort of list of known tag labels that could be used. So instead of an arbitrary string for the label name you just referenced the known labels. Then we could document that whenever "relid" appears as a tag id it's the oid of a relation, for example. This would help with things like "search elastic search for all log entries for relid x, block p". Which wouldn't work if block is sometimes "page" and sometimes "blocknum" and sometimes "block".... So I was imagining something like errtag(TAG_RELID, relid) 3) When it comes to things like queryid it would be helpful to be able to inject tags from higher in the call stack so that invididual ereport sites don't need to include it and we can decide to add new things like it in the future. This will be useful for things like distributed tracing tools who need to inject a span id into the every log message. I imagine it could also be useful more generally. Application developers might want to inject things like a pipeline identifier from their CI tools and then pull any logs with that identifier into their CI reports. Yes, they could be added as new columns in CSV and top level json attributes but we may well have new tags that are not always there or are even created on user request. Or we may have cases where there can be more than one active at a time. 4) As far as code ... this is probably premature microoptimization but I do find the tags all being individually palloced and all going through string formatting seems like a lot of overhead. Like the actual number of errtag() calls in the ereport is always going to be the same and the values are probably always going to be a single parameter, not really using the formatting functionality. I don't necessarily have a better alternative though. The only thing I can think of isn't really any better: errtag_str(RELNAME, relname) errtag_int(RELID, relid) (possibly with some magic where there's an errtag() function that looks up which data type to expect for a given tag id). It looks strange to me that the errorTag struct has a "const char *tagname" but a "char *tagvalue". I think this is a side effect of the code and not actually a sensible way to define the struct. Surely they should both be const?
pgsql-hackers by date: