Thread: Auditing via logical decoding
Hi Postgres Hackers, We have been using our own trigger-based audit system at my firm successfully for some years, but the performance penalty is starting to grate a bit and so I have been tasked with seeing if we can make use of the new logical decoding functions to achieve the same thing. I thought that someone must already have written something that would satisfy our use-case but my internet searches have come up short so far so I am considering writing a logical decoding plugin to do what we want. I thought I would run the idea past you all here just in case my plan is crazy; I’ve browsed around the postgres source code a bit before but I’ve never really gotten my hands dirty and am a little bit nervous about putting my own C code into the heart of our DBMS so if this comes to anything I would like to offer my code up for review and/or possible inclusion as a contributed module. A quick summary of requirements: We want to log (to a separate, remote database) - One row for every transaction that changes the state of the database. We call this table ‘audit_entry’ and contains the xid, transaction timestamp, username, client hostname, and application name of the session that caused the change. - One row for each change made by each transaction which records the state of the tuple before the change. We call this table ‘audit_detail’ and contains xid, statement timestamp, table name & schema, event_type, primary_key (hstore), old_row (hstore), and the text of the query that was responsible for the change. A lot of that information is available already by listening to the pgoutput decoding, and my first thought was that I could just write a receiver for that. However, application name, username, client hostname and current_query() are not available. This is understandable as they aren’t useful for logical replication. I was about to give up, when I discovered pg_logical_emit_message. My current thoughts are to: - Write this extra data into a logical message while the transaction is still in progess Either with a deferred trigger per table or, perhaps better Find some global commit-time (or xid-assigment time) hook emit it there - Then get the information out of the database: Either modify the existing pgoutput plugin & protocol to forward such messages in its stream, Or write a dedicated ‘audit’ decoding plugin with its own protocol - Then get the information into the ‘auditing’ database: Either with some standalone process that connects to both, consumes the output created above, translates it to SQL to run in the auditing DB. Figure out how to create a proper postgres background process to do it, in a similar fashion to the logical replication worker Any input you folks have would be very much appreciated. Kinds Regards, Philip PS: If there is someone out there who is willing & able to build this for less than my company will have to pay me to do it, please drop me a line ☺
On Fri, Jul 27, 2018 at 5:41 AM Philip Scott <from_postgres@safetyphil.com> wrote:
Hi Postgres Hackers,
We have been using our own trigger-based audit system at my firm
successfully for some years, but the performance penalty is starting to
grate a bit and so I have been tasked with seeing if we can make use of
the new logical decoding functions to achieve the same thing. I thought
that someone must already have written something that would satisfy our
use-case but my internet searches have come up short so far so I am
considering writing a logical decoding plugin to do what we want.
I thought I would run the idea past you all here just in case my plan is
crazy; I’ve browsed around the postgres source code a bit before but
I’ve never really gotten my hands dirty and am a little bit nervous
about putting my own C code into the heart of our DBMS so if this comes
to anything I would like to offer my code up for review and/or possible
inclusion as a contributed module.
A quick summary of requirements:
We want to log (to a separate, remote database)
- One row for every transaction that changes the state of the
database.
We call this table ‘audit_entry’ and contains the xid, transaction
timestamp, username, client hostname, and application name of the
session that caused the change.
- One row for each change made by each transaction which records the
state of the tuple before the change.
We call this table ‘audit_detail’ and contains xid, statement
timestamp, table name & schema, event_type, primary_key (hstore),
old_row (hstore), and the text of the query that was responsible for the
change.
A lot of that information is available already by listening to the
pgoutput decoding, and my first thought was that I could just write a
receiver for that. However, application name, username, client hostname
and current_query() are not available. This is understandable as they
aren’t useful for logical replication.
I was about to give up, when I discovered pg_logical_emit_message.
My current thoughts are to:
- Write this extra data into a logical message while the transaction
is still in progess
Either with a deferred trigger per table or, perhaps better
Find some global commit-time (or xid-assigment time) hook emit it there
- Then get the information out of the database:
Either modify the existing pgoutput plugin & protocol to forward
such messages in its stream,
Or write a dedicated ‘audit’ decoding plugin with its own protocol
- Then get the information into the ‘auditing’ database:
Either with some standalone process that connects to both, consumes
the output created above, translates it to SQL to run in the auditing
DB.
Figure out how to create a proper postgres background process to do
it, in a similar fashion to the logical replication worker
Any input you folks have would be very much appreciated.
Kinds Regards,
Philip
PS: If there is someone out there who is willing & able to build this
for less than my company will have to pay me to do it, please drop me a
line ☺
All I can say is +1 this would be an awesome feature to have and I hope to see it someday.
On 07/27/2018 12:44 PM, Philip Scott wrote: > Hi Postgres Hackers, > > We have been using our own trigger-based audit system at my firm > successfully for some years, but the performance penalty is starting to > grate a bit and so I have been tasked with seeing if we can make use of > the new logical decoding functions to achieve the same thing. I thought > that someone must already have written something that would satisfy our > use-case but my internet searches have come up short so far so I am > considering writing a logical decoding plugin to do what we want. > > I thought I would run the idea past you all here just in case my plan is > crazy; I’ve browsed around the postgres source code a bit before but > I’ve never really gotten my hands dirty and am a little bit nervous > about putting my own C code into the heart of our DBMS so if this comes > to anything I would like to offer my code up for review and/or possible > inclusion as a contributed module. > > A quick summary of requirements: > > We want to log (to a separate, remote database) > - One row for every transaction that changes the state of the database. > We call this table ‘audit_entry’ and contains the xid, transaction > timestamp, username, client hostname, and application name of the > session that caused the change. > - One row for each change made by each transaction which records the > state of the tuple before the change. > We call this table ‘audit_detail’ and contains xid, statement > timestamp, table name & schema, event_type, primary_key (hstore), > old_row (hstore), and the text of the query that was responsible for the > change. > Have you checked pgaudit [1]? I haven't checked if it matches all your requirements, but considering it's an extension aimed at auditing use cases it might. And it's already available, of course. [1] https://www.pgaudit.org/ regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>> We have been using our own trigger-based audit system at my firm >> successfully for some years, but the performance penalty is starting to >> grate a bit and so I have been tasked with seeing if we can make use of >> the new logical decoding functions to achieve the same thing. I thought >> that someone must already have written something that would satisfy our >> use-case but my internet searches have come up short so far so I am >> considering writing a logical decoding plugin to do what we want. > Have you checked pgaudit [1]? I haven't checked if it matches all your > requirements, but considering it's an extension aimed at auditing use > cases it might. And it's already available, of course. Actually no, I hadn't come across this before, thanks for the heads up. It is important for us to be able to get the auditdata back into a different database, but it looks like I could scrape the logs and do that. At the very least it showsthat it is possible to hook into postgres in the right places to emit extra logical messages if that turns out to bethe better way to do it. It doesn't appear to support application_name, nor the connection details of the user doing the update - but perhaps thatwould be just as sensible to add it here than in a logical output plugin. It's a bit tricky coming to a big codebase like postgres and trying to decide the best route of doing something; I don'thave much of a mental model about how complicated the various systems are :) Kind Regards, Phil
2018-07-27 16:43 GMT+02:00 <from_postgres@safetyphil.com>:
>> We have been using our own trigger-based audit system at my firm
>> successfully for some years, but the performance penalty is starting to
>> grate a bit and so I have been tasked with seeing if we can make use of
>> the new logical decoding functions to achieve the same thing. I thought
>> that someone must already have written something that would satisfy our
>> use-case but my internet searches have come up short so far so I am
>> considering writing a logical decoding plugin to do what we want.
> Have you checked pgaudit [1]? I haven't checked if it matches all your
> requirements, but considering it's an extension aimed at auditing use
> cases it might. And it's already available, of course.
Actually no, I hadn't come across this before, thanks for the heads up. It is important for us to be able to get the audit data back into a different database, but it looks like I could scrape the logs and do that. At the very least it shows that it is possible to hook into postgres in the right places to emit extra logical messages if that turns out to be the better way to do it.
It doesn't appear to support application_name, nor the connection details of the user doing the update - but perhaps that would be just as sensible to add it here than in a logical output plugin.
it is very simple to show application name or any other info
Few years ago I customized pgAudit and it was not hard work - almost all time
Regards
Pavel
It's a bit tricky coming to a big codebase like postgres and trying to decide the best route of doing something; I don't have much of a mental model about how complicated the various systems are :)
Kind Regards,
Phil
On 07/27/2018 04:43 PM, from_postgres@safetyphil.com wrote: >>> We have been using our own trigger-based audit system at my firm >>> successfully for some years, but the performance penalty is >>> starting to grate a bit and so I have been tasked with seeing if >>> we can make use of the new logical decoding functions to achieve >>> the same thing. I thought that someone must already have written >>> something that would satisfy our use-case but my internet >>> searches have come up short so far so I am considering writing a >>> logical decoding plugin to do what we want. > >> Have you checked pgaudit [1]? I haven't checked if it matches all >> your requirements, but considering it's an extension aimed at >> auditing use cases it might. And it's already available, of >> course. > > Actually no, I hadn't come across this before, thanks for the heads > up. It is important for us to be able to get the audit data back into > a different database, but it looks like I could scrape the logs and > do that. At the very least it shows that it is possible to hook into > postgres in the right places to emit extra logical messages if that > turns out to be the better way to do it. > > It doesn't appear to support application_name, nor the connection > details of the user doing the update - but perhaps that would be just > as sensible to add it here than in a logical output plugin. > > It's a bit tricky coming to a big codebase like postgres and trying > to decide the best route of doing something; I don't have much of a > mental model about how complicated the various systems are :) > It may not support everything, but in general when a feature can be done in an extension, it's generally faster to deliver that way. One of the reasons is the ad hoc release cycle (compared to the 1-per-year cycle of PostgreSQL). So I'd suggest investigating this option first. It may not meet all your requirements immediately (or at all), but well, no solution probably will. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Greetings, * from_postgres@safetyphil.com (from_postgres@safetyphil.com) wrote: > >> We have been using our own trigger-based audit system at my firm > >> successfully for some years, but the performance penalty is starting to > >> grate a bit and so I have been tasked with seeing if we can make use of > >> the new logical decoding functions to achieve the same thing. I thought > >> that someone must already have written something that would satisfy our > >> use-case but my internet searches have come up short so far so I am > >> considering writing a logical decoding plugin to do what we want. > > > Have you checked pgaudit [1]? I haven't checked if it matches all your > > requirements, but considering it's an extension aimed at auditing use > > cases it might. And it's already available, of course. > > Actually no, I hadn't come across this before, thanks for the heads up. It is important for us to be able to get the auditdata back into a different database, but it looks like I could scrape the logs and do that. At the very least it showsthat it is possible to hook into postgres in the right places to emit extra logical messages if that turns out to bethe better way to do it. > > It doesn't appear to support application_name, nor the connection details of the user doing the update - but perhaps thatwould be just as sensible to add it here than in a logical output plugin. I've not been following this very closely but when it comes to pgAudit, the log lines will include whatever is in log_line_prefix, where you can include information like the application_name, login user, etc, which is why you don't explicitly see options for those in pgAudit itself. This is because pgAudit logs messages the same way PG itself does (which also means it goes to the same place, such as to a CSV log that can then be imported into a PG database). Thanks! Stephen