Thread: Auditing via logical decoding

Auditing via logical decoding

From

Philip Scott

Date:

27 July 2018, 13:44:28

Hi Postgres Hackers,

We have been using our own trigger-based audit system at my firm 
successfully for some years, but the performance penalty is starting to 
grate a bit and so I have been tasked with seeing if we can make use of 
the new logical decoding functions to achieve the same thing. I thought 
that someone must already have written something that would satisfy our 
use-case but my internet searches have come up short so far so I am 
considering writing a logical decoding plugin to do what we want.

I thought I would run the idea past you all here just in case my plan is 
crazy; I’ve browsed around the postgres source code a bit before but 
I’ve never really gotten my hands dirty and am a little bit nervous 
about putting my own C code into the heart of our DBMS so if this comes 
to anything I would like to offer my code up for review and/or possible 
inclusion as a contributed module.

A quick summary of requirements:

We want to log (to a separate, remote database)
   - One row for every transaction that changes the state of the 
database.
     We call this table ‘audit_entry’ and contains the xid, transaction 
timestamp, username, client hostname, and application name of the 
session that caused the change.
   - One row for each change made by each transaction which records the 
state of the tuple before the change.
     We call this table ‘audit_detail’ and contains xid, statement 
timestamp, table name & schema, event_type, primary_key (hstore), 
old_row (hstore), and the text of the query that was responsible for the 
change.

A lot of that information is available already by listening to the 
pgoutput decoding, and my first thought was that I could just write a 
receiver for that. However, application name, username, client hostname 
and current_query() are not available. This is understandable as they 
aren’t useful for logical replication.

I was about to give up, when I discovered pg_logical_emit_message.

My current thoughts are to:
   - Write this extra data into a logical message while the transaction 
is still in progess
     Either with a deferred trigger per table or, perhaps better
    Find some global commit-time (or xid-assigment time) hook emit it there

   - Then get the information out of the database:
     Either modify the existing pgoutput plugin & protocol to forward 
such messages in its stream,
     Or write a dedicated ‘audit’ decoding plugin with its own protocol

   - Then get the information into the ‘auditing’ database:
     Either with some standalone process that connects to both, consumes 
the output created above, translates it to SQL to run in the auditing 
DB.
     Figure out how to create a proper postgres background process to do 
it, in a similar fashion to the logical replication worker

Any input you folks have would be very much appreciated.

Kinds Regards,

Philip

PS: If there is someone out there who is willing & able to build this 
for less than my company will have to pay me to do it, please drop me a 
line ☺

Re: Auditing via logical decoding

From

Jeremy Finzel

Date:

27 July 2018, 14:11:03

On Fri, Jul 27, 2018 at 5:41 AM Philip Scott <from_postgres@safetyphil.com> wrote:

Hi Postgres Hackers,

We have been using our own trigger-based audit system at my firm
successfully for some years, but the performance penalty is starting to
grate a bit and so I have been tasked with seeing if we can make use of
the new logical decoding functions to achieve the same thing. I thought
that someone must already have written something that would satisfy our
use-case but my internet searches have come up short so far so I am
considering writing a logical decoding plugin to do what we want.

I thought I would run the idea past you all here just in case my plan is
crazy; I’ve browsed around the postgres source code a bit before but
I’ve never really gotten my hands dirty and am a little bit nervous
about putting my own C code into the heart of our DBMS so if this comes
to anything I would like to offer my code up for review and/or possible
inclusion as a contributed module.

A quick summary of requirements:

We want to log (to a separate, remote database)
- One row for every transaction that changes the state of the
database.
We call this table ‘audit_entry’ and contains the xid, transaction
timestamp, username, client hostname, and application name of the
session that caused the change.
- One row for each change made by each transaction which records the
state of the tuple before the change.
We call this table ‘audit_detail’ and contains xid, statement
timestamp, table name & schema, event_type, primary_key (hstore),
old_row (hstore), and the text of the query that was responsible for the
change.

A lot of that information is available already by listening to the
pgoutput decoding, and my first thought was that I could just write a
receiver for that. However, application name, username, client hostname
and current_query() are not available. This is understandable as they
aren’t useful for logical replication.

I was about to give up, when I discovered pg_logical_emit_message.

My current thoughts are to:
- Write this extra data into a logical message while the transaction
is still in progess
Either with a deferred trigger per table or, perhaps better
Find some global commit-time (or xid-assigment time) hook emit it there

- Then get the information out of the database:
Either modify the existing pgoutput plugin & protocol to forward
such messages in its stream,
Or write a dedicated ‘audit’ decoding plugin with its own protocol

- Then get the information into the ‘auditing’ database:
Either with some standalone process that connects to both, consumes
the output created above, translates it to SQL to run in the auditing
DB.
Figure out how to create a proper postgres background process to do
it, in a similar fashion to the logical replication worker

Any input you folks have would be very much appreciated.

Kinds Regards,

Philip

PS: If there is someone out there who is willing & able to build this
for less than my company will have to pay me to do it, please drop me a
line ☺

All I can say is +1 this would be an awesome feature to have and I hope to see it someday.

Re: Auditing via logical decoding

From

Tomas Vondra

Date:

27 July 2018, 16:36:31


On 07/27/2018 12:44 PM, Philip Scott wrote:
> Hi Postgres Hackers,
> 
> We have been using our own trigger-based audit system at my firm 
> successfully for some years, but the performance penalty is starting to 
> grate a bit and so I have been tasked with seeing if we can make use of 
> the new logical decoding functions to achieve the same thing. I thought 
> that someone must already have written something that would satisfy our 
> use-case but my internet searches have come up short so far so I am 
> considering writing a logical decoding plugin to do what we want.
> 
> I thought I would run the idea past you all here just in case my plan is 
> crazy; I’ve browsed around the postgres source code a bit before but 
> I’ve never really gotten my hands dirty and am a little bit nervous 
> about putting my own C code into the heart of our DBMS so if this comes 
> to anything I would like to offer my code up for review and/or possible 
> inclusion as a contributed module.
> 
> A quick summary of requirements:
> 
> We want to log (to a separate, remote database)
>    - One row for every transaction that changes the state of the database.
>      We call this table ‘audit_entry’ and contains the xid, transaction 
> timestamp, username, client hostname, and application name of the 
> session that caused the change.
>    - One row for each change made by each transaction which records the 
> state of the tuple before the change.
>      We call this table ‘audit_detail’ and contains xid, statement 
> timestamp, table name & schema, event_type, primary_key (hstore), 
> old_row (hstore), and the text of the query that was responsible for the 
> change.
> 

Have you checked pgaudit [1]? I haven't checked if it matches all your 
requirements, but considering it's an extension aimed at auditing use 
cases it might. And it's already available, of course.

[1] https://www.pgaudit.org/

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

RE: Auditing via logical decoding

From

Date:

27 July 2018, 17:43:02

>> We have been using our own trigger-based audit system at my firm
>> successfully for some years, but the performance penalty is starting to
>> grate a bit and so I have been tasked with seeing if we can make use of
>> the new logical decoding functions to achieve the same thing. I thought
>> that someone must already have written something that would satisfy our
>> use-case but my internet searches have come up short so far so I am
>> considering writing a logical decoding plugin to do what we want.

> Have you checked pgaudit [1]? I haven't checked if it matches all your
> requirements, but considering it's an extension aimed at auditing use
> cases it might. And it's already available, of course.

Actually no, I hadn't come across this before, thanks for the heads up. It is important for us to be able to get the
auditdata back into a different database, but it looks like I could scrape the logs and do that. At the very least it
showsthat it is possible to hook into postgres in the right places to emit extra logical messages if that turns out to
bethe better way to do it. 

It doesn't appear to support application_name, nor the connection details of the user doing the update - but perhaps
thatwould be just as sensible to add it here than in a logical output plugin. 

It's a bit tricky coming to a big codebase like postgres and trying to decide the best route of doing something; I
don'thave much of a mental model about how complicated the various systems are :) 

Kind Regards,

Phil

Re: Auditing via logical decoding

From

Pavel Stehule

Date:

27 July 2018, 17:48:35

2018-07-27 16:43 GMT+02:00 <from_postgres@safetyphil.com>:

>> We have been using our own trigger-based audit system at my firm
>> successfully for some years, but the performance penalty is starting to
>> grate a bit and so I have been tasked with seeing if we can make use of
>> the new logical decoding functions to achieve the same thing. I thought
>> that someone must already have written something that would satisfy our
>> use-case but my internet searches have come up short so far so I am
>> considering writing a logical decoding plugin to do what we want.

> Have you checked pgaudit [1]? I haven't checked if it matches all your
> requirements, but considering it's an extension aimed at auditing use
> cases it might. And it's already available, of course.

Actually no, I hadn't come across this before, thanks for the heads up. It is important for us to be able to get the audit data back into a different database, but it looks like I could scrape the logs and do that. At the very least it shows that it is possible to hook into postgres in the right places to emit extra logical messages if that turns out to be the better way to do it.

It doesn't appear to support application_name, nor the connection details of the user doing the update - but perhaps that would be just as sensible to add it here than in a logical output plugin.

it is very simple to show application name or any other info

Few years ago I customized pgAudit and it was not hard work - almost all time

Regards

Pavel

It's a bit tricky coming to a big codebase like postgres and trying to decide the best route of doing something; I don't have much of a mental model about how complicated the various systems are :)

Kind Regards,

Phil

Re: Auditing via logical decoding

From

Tomas Vondra

Date:

27 July 2018, 18:25:31


On 07/27/2018 04:43 PM, from_postgres@safetyphil.com wrote:
>>> We have been using our own trigger-based audit system at my firm 
>>> successfully for some years, but the performance penalty is
>>> starting to grate a bit and so I have been tasked with seeing if
>>> we can make use of the new logical decoding functions to achieve
>>> the same thing. I thought that someone must already have written
>>> something that would satisfy our use-case but my internet
>>> searches have come up short so far so I am considering writing a
>>> logical decoding plugin to do what we want.
> 
>> Have you checked pgaudit [1]? I haven't checked if it matches all
>> your requirements, but considering it's an extension aimed at
>> auditing use cases it might. And it's already available, of
>> course.
> 
> Actually no, I hadn't come across this before, thanks for the heads
> up. It is important for us to be able to get the audit data back into
> a different database, but it looks like I could scrape the logs and
> do that. At the very least it shows that it is possible to hook into
> postgres in the right places to emit extra logical messages if that
> turns out to be the better way to do it.
> 
> It doesn't appear to support application_name, nor the connection
> details of the user doing the update - but perhaps that would be just
> as sensible to add it here than in a logical output plugin.
> 
> It's a bit tricky coming to a big codebase like postgres and trying
> to decide the best route of doing something; I don't have much of a
> mental model about how complicated the various systems are :)
> 

It may not support everything, but in general when a feature can be done 
in an extension, it's generally faster to deliver that way. One of the 
reasons is the ad hoc release cycle (compared to the 1-per-year cycle of 
PostgreSQL). So I'd suggest investigating this option first.

It may not meet all your requirements immediately (or at all), but well, 
no solution probably will.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Auditing via logical decoding

From

Stephen Frost

Date:

27 July 2018, 18:29:37

Greetings,

* from_postgres@safetyphil.com (from_postgres@safetyphil.com) wrote:
> >> We have been using our own trigger-based audit system at my firm
> >> successfully for some years, but the performance penalty is starting to
> >> grate a bit and so I have been tasked with seeing if we can make use of
> >> the new logical decoding functions to achieve the same thing. I thought
> >> that someone must already have written something that would satisfy our
> >> use-case but my internet searches have come up short so far so I am
> >> considering writing a logical decoding plugin to do what we want.
>
> > Have you checked pgaudit [1]? I haven't checked if it matches all your
> > requirements, but considering it's an extension aimed at auditing use
> > cases it might. And it's already available, of course.
>
> Actually no, I hadn't come across this before, thanks for the heads up. It is important for us to be able to get the
auditdata back into a different database, but it looks like I could scrape the logs and do that. At the very least it
showsthat it is possible to hook into postgres in the right places to emit extra logical messages if that turns out to
bethe better way to do it. 
>
> It doesn't appear to support application_name, nor the connection details of the user doing the update - but perhaps
thatwould be just as sensible to add it here than in a logical output plugin. 

I've not been following this very closely but when it comes to pgAudit,
the log lines will include whatever is in log_line_prefix, where you can
include information like the application_name, login user, etc, which is
why you don't explicitly see options for those in pgAudit itself.

This is because pgAudit logs messages the same way PG itself does (which
also means it goes to the same place, such as to a CSV log that can then
be imported into a PG database).

Thanks!

Stephen

Attachment

signature.asc