Re: Command Triggers patch v18 - Mailing list pgsql-hackers

From:           Andres Freund
Subject:        Re: Command Triggers patch v18
Msg-id:         201203272205.08556.andres@2ndquadrant.com
In response to: Re: Command Triggers patch v18 (Robert Haas <robertmhaas@gmail.com>)
Responses:      Re: Command Triggers patch v18
List:           pgsql-hackers
Hi,

On Tuesday, March 27, 2012 07:34:46 PM Robert Haas wrote:
> On Tue, Mar 27, 2012 at 11:58 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > I still think the likeliest way towards that is defining some behaviour
> > we want regarding
> > * naming conflict handling
> > * locking
> > And then religiously make sure the patch adheres to that. That might need
> > some refactoring of existing code (like putting a part of the utility.c
> > handling of CREATE TABLE into its own function and such), but it should
> > be doable.
> >
> > I think BEFORE command triggers ideally should run
> > * before permission checks
> > * before locking
> > * before internal checks are done (naming conflicts, type checks and
> >   such)
>
> It is possible to do this, and it would actually be much easier and
> less invasive to implement than what Dimitri has done here, but the
> downside is that you won't have done the name lookup either. See my
> last email to Dimitri for a long explanation of why that restriction
> is not easily circumventable: name lookups, locking, and permission
> checks are necessarily and inextricably intertwined.

I read your other mail. Coming back to it, I think the above might be too
restrictive to be useful for a large part of the use-cases, so your idea of
more hook points below has some merit.

> Still, if others agree this is useful, I think it would be a lot easier
> to get right than what we have now.

I think it's pretty important to have something that is usable rather
easily, which requires names to be resolved and thus permission checks
already performed and (some) locks already acquired. Quite a few of the
possible usages need the facility to be as simple as possible and won't
care about already-acquired locks/names.

> Some of what we're now doing as part of parse analysis should really
> be reclassified.
> For example, the ProcessUtility branch that handles
> T_CreateStmt and T_CreateForeignTableStmt should really be split out
> as a separate function that lives in tablecmds.c, and I think at least
> some of what's in transformCreateStmt should be moved to tablecmds.c
> as well. The current arrangement makes it really hard to guarantee
> things like "the name gets looked up just once", which seems obviously
> desirable, since strange things will surely happen if the whole
> command doesn't agree on which OID it's operating on.

Yes, I agree, most of that should go from utility.c.

> > Looking up oids and such before calling the trigger doesn't seem to be
> > problematic from my pov. Command triggers are a superuser-only facility;
> > the additional information they gain is no problem.
> > Thinking about that, I think we should enforce command trigger functions
> > to be security definer functions.
>
> I don't see any benefit from that restriction.

The reason I was thinking it might be a good idea is that they might get
more knowledge passed to them than the user could otherwise obtain
directly, especially if we extend the scheme to more places/information.

> >> I actually think that, to really meet all needs here, we may need the
> >> ability to get control in more than one place.
> >
> > Not sure what you mean by that. Before/after permission checks,
> > before/after consistency checks? That sort of thing?
>
> Yes. For example, above you proposed a kind of trigger that fires
> really early - basically after parsing but before everything else.
> What Dimitri has implemented is a much later trigger that is still
> before the meat of the command, but not before *everything*. And then
> there's an AFTER trigger, which could fire either after an individual
> piece or stage of the command, or after the whole command is complete,
> either of which seems potentially useful depending on the
> circumstances.
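As an aside, the "looked up just once" guarantee above is hard precisely because lookup and locking chase each other: a lock can only be taken once the name is resolved, but the resolution is only stable once the lock is held. A minimal, self-contained C sketch of that retry loop (all names invented; this is not the actual backend code, which lives around RangeVarGetRelid):

```c
#include <assert.h>

typedef unsigned int Oid;

/* Stand-in for a catalog lookup: maps a name to an OID, 0 if not found.
 * In the real backend this is a syscache lookup whose answer can change
 * between calls if a concurrent DDL transaction commits. */
static Oid current_mapping = 42;

static Oid lookup_name(const char *name)
{
    (void) name;
    return current_mapping;
}

static void lock_relation(Oid oid)   { (void) oid; }
static void unlock_relation(Oid oid) { (void) oid; }

/* Resolve a name to a locked OID.  The lock can only be taken after the
 * name is resolved, but the resolution only becomes stable under the
 * lock -- hence the re-check loop. */
Oid resolve_and_lock(const char *name)
{
    for (;;)
    {
        Oid oid = lookup_name(name);

        if (oid == 0)
            return 0;           /* no such object */
        lock_relation(oid);
        if (lookup_name(name) == oid)
            return oid;         /* mapping still valid under the lock */
        unlock_relation(oid);   /* concurrent DDL changed it; retry */
    }
}
```

Any hook point fired before this loop has run necessarily sees an unresolved, unlocked name.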
> I almost think that the BEFORE/AFTER name serves to
> confuse rather than to clarify, in this case. It's really a series of
> specific hook points: whenever we parse a command (but before we
> execute it), after security and sanity checks but before we begin
> executing the command, before or after various subcommands, after the
> whole command is done, and maybe a few others. When we say BEFORE or
> AFTER, we implicitly assume that we want at most two of the things
> from that list, and I am not at all sure that's going to be
> enough.

You might have a point there. I am not sure the complexity of explaining
all the different hook points is worth the pain, though. The phases we
have are:

* after parsing
  * logging wants to hook here
* after resolving names
* after checking permissions (intertwined with the former)
  * override the permission check? INSTEAD?
* after locking (intertwined with the former)
  * since it needs to be connected to name resolution/permission checking,
    this doesn't seem to be an attractive hook point on its own
* after validation
  * additional validation will likely want to hook here
* after execution
  * replication might want to hook here

Am I missing an interesting phase? Allowing all of that seems to require a
substantial refactoring of commands/ and catalog/.

> One thing I had thought about doing, in the context of sepgsql, and we
> may yet do it, is create a hook that gets invoked whenever we need to
> decide whether it's OK to perform an action on an object. For
> example, consider ALTER TABLE .. ADD FOREIGN KEY: we'd ask the hook
> both "is it OK to add a foreign key to table X?" and also "is it OK to
> make a foreign key refer to table Y"? This doesn't fit into the
> command-trigger framework at all, but it's definitely useful for
> sepgsql, and I bet it's good for other things, too - maybe not that
> specific example, but that kind of thing.

It's a neat feature, yes, but it seems to be orthogonal to this.
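The phase list above could be made explicit as a hook-point enum with one firing call per phase; each command implementation then invokes the firing function at the matching point. A self-contained, illustrative C sketch (all identifiers invented, not taken from the patch):

```c
#include <assert.h>

/* One value per phase from the list above (names illustrative). */
typedef enum CommandHookPhase
{
    HOOK_AFTER_PARSE,
    HOOK_AFTER_NAME_RESOLUTION,
    HOOK_AFTER_PERMISSION_CHECK,
    HOOK_AFTER_LOCKING,
    HOOK_AFTER_VALIDATION,
    HOOK_AFTER_EXECUTION,
    HOOK_NUM_PHASES
} CommandHookPhase;

typedef void (*command_hook)(void *cmd);

static command_hook hooks[HOOK_NUM_PHASES];
static int fired[HOOK_NUM_PHASES];

/* Example consumers: a logger hooking early, replication hooking late. */
static void log_hook(void *cmd)  { (void) cmd; fired[HOOK_AFTER_PARSE]++; }
static void repl_hook(void *cmd) { (void) cmd; fired[HOOK_AFTER_EXECUTION]++; }

/* Each command implementation calls this at the matching point. */
static void fire_hooks(CommandHookPhase phase, void *cmd)
{
    if (hooks[phase])
        hooks[phase](cmd);
}

void run_command(void *cmd)
{
    fire_hooks(HOOK_AFTER_PARSE, cmd);            /* logging here        */
    fire_hooks(HOOK_AFTER_NAME_RESOLUTION, cmd);
    fire_hooks(HOOK_AFTER_PERMISSION_CHECK, cmd);
    fire_hooks(HOOK_AFTER_LOCKING, cmd);
    fire_hooks(HOOK_AFTER_VALIDATION, cmd);       /* extra validation    */
    /* ... actual execution ... */
    fire_hooks(HOOK_AFTER_EXECUTION, cmd);        /* replication here    */
}
```

The cost hinted at above is visible even in the sketch: every command body in commands/ has to be threaded with the firing calls at the right spots.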
> >> I agree that it's not a very helpful decision, but I'm not the one who
> >> said we wanted command triggers rather than event triggers. :-)
> >
> > Well, but on the other hand what will you do with a "pg_attribute row
> > added" event? It doesn't even have an oid ;)
>
> We refer to everything using ObjectAddress notation - the OID of the
> system catalog that contains objects of that type, the OID of the
> object itself, and a "sub-object ID" which is 0 in general but the
> column number in that specific case. What you can do is go fetch the
> pg_attribute row and then do whatever you like - log it, throw an
> error, etc. It's a hook that gets invoked when you add a column. If
> it's important to differentiate between a column that gets added at
> CREATE TABLE time and one that gets added by ALTER TABLE .. ADD
> COLUMN, that could be done. It's a lot easier to deal with triggers
> that fire in cases you don't care about than to deal with triggers
> that don't fire in cases you do care about. The former you can just
> ignore.

I don't believe that distinction can be made that easily without more
impact than the command trigger patch has. You need a surprisingly large
amount of context to know when to ignore a trigger, especially as it's not
exactly easy to transfer knowledge from one invocation to the next.

> > But how would you do something like logging the originating statement
> > with the object access hook machinery?
>
> You can't, but you can do a lot of other useful stuff that command
> triggers as currently envisioned won't allow. I'm starting to think
> that maybe BEFORE triggers need to fire on commands and AFTER triggers
> on events? Can we define "start of command execution" as an event? I
> don't think we need two completely different facilities, but I really
> want to make sure we have a plan for capturing events. I can't
> imagine how replication is gonna work if we can't log exactly what we
> did, rather than just the raw query string.
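For readers following along: the ObjectAddress notation Robert describes is a three-field struct (catalog OID, object OID, sub-object ID). A self-contained sketch of how a column would be addressed (the field layout follows the backend's objectaddress.h; the helper function is invented for illustration):

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned int Oid;

/* Mirrors the shape of the backend's ObjectAddress. */
typedef struct ObjectAddress
{
    Oid     classId;        /* OID of the catalog the object lives in */
    Oid     objectId;       /* OID of the object itself */
    int32_t objectSubId;    /* 0 in general; column number for columns */
} ObjectAddress;

/* pg_class's own fixed OID; relations are addressed via pg_class. */
#define RelationRelationId 1259

/* Address of column number attnum of relation relid.  The whole table
 * is the same address with objectSubId = 0. */
ObjectAddress column_address(Oid relid, int32_t attnum)
{
    ObjectAddress addr = { RelationRelationId, relid, attnum };

    return addr;
}
```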
> Statement-based
> replication is sometimes useful, especially in heterogeneous
> environments, but it's always kinda sucky, in that there are edge
> cases that break.

I don't think creating *new* DDL from the parse tree can really count as
statement-based replication. And again, I don't think implementing that
will cost that much effort. How would it help us to know - as individual
events! - which tuples were created where? Reassembling that will be a
huge mess. I basically fail to see *any* use case besides access checking.

> If you want to know what schema the server is going
> to use for a newly created object before it creates it, you've got to
> do that calculation yourself, and now you've created a risk of getting
> a different answer. If you want to capture all the drops and replay
> them on the slave, you have to be able to handle CASCADE.

I would like to have CASCADE handled, yes.

> > Well, the problem is that with just catalog access it's just about
> > impossible to generate differential changes. Also, the catalogs
> > between clusters *won't be the same* in a large part of the
> > use-cases; oids will basically never match.
>
> True. ALTER commands are going to need additional information. But
> some of that information also won't be available until quite late.
> For example, an ALTER on a parent might cascade down to children, and
> a command trigger might want to know exactly which tables were
> affected. That trigger isn't really BEFORE or AFTER or INSTEAD OF;
> it's more like DURING, and at a very specific point DURING.

Well, *I* would like a separate command trigger to be fired for each
sub-action, but to see them qualified as such...

> > Matching the intent, which is mostly visible in the parse tree, seems
> > to be a whole lot more helpful.
>
> But note that there are a lot of possible things you can't tell from
> the parse-tree without deducing them. Sure, for LISTEN, the parse
> tree is fine.
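Firing a separate trigger per sub-action, qualified as such, only requires the invocation context to carry a nesting marker that the trigger can inspect. A self-contained C sketch of the idea (purely illustrative; none of these names come from the patch):

```c
#include <assert.h>

/* Context passed to each trigger invocation: the command tag plus a
 * marker telling top-level commands apart from cascaded sub-actions. */
typedef struct CommandContext
{
    const char *tag;          /* e.g. "ALTER TABLE" */
    int         nesting;      /* 0 = typed by the user, >0 = sub-action */
} CommandContext;

static int top_level_seen;
static int sub_action_seen;

/* A trigger can filter on the qualification without extra bookkeeping. */
static void trigger(const CommandContext *ctx)
{
    if (ctx->nesting == 0)
        top_level_seen++;
    else
        sub_action_seen++;
}

/* An ALTER on a parent cascades to its inheritance children; each child
 * gets its own trigger invocation, qualified as a sub-action. */
void alter_with_children(int nchildren)
{
    CommandContext parent = { "ALTER TABLE", 0 };

    trigger(&parent);
    for (int i = 0; i < nchildren; i++)
    {
        CommandContext child = { "ALTER TABLE", 1 };

        trigger(&child);
    }
}
```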
> But for ALTER TABLE or DROP the parse tree doesn't even
> tell you what object you're operating on (at least, not unless the
> user includes an explicit schema-qualification), let alone the whole
> list of objects the command is ultimately going to process due to
> cascading, inheritance, etc. You can reimplement that logic in your
> trigger but that's both fragile and laborious.

Resolving objects to their schema-qualified variant isn't that much of a
problem. Inheritance is; I can definitely see that. But for lots of simple
use-cases, even something with some restrictions attached would be *way*
better than the current situation, where all of that is not possible at
all. I fear a bit that this discussion is leading to something that is
never going to materialize, because it would require a huge amount of work
to get there.

> >> That can be fixed using the optional argument to
> >> InvokeObjectAccessHook, similar to what we've done for differentiating
> >> internal drops from other drops.
> >
> > Several of the points where those hooks are invoked don't even seem to
> > have the necessary information for that... Doing that seems to involve
> > passing heaps of additional state through the backend.
>
> I've been trying pretty hard to avoid that, but I agree that there are
> possible stumbling blocks there. I don't think it's an insurmountable
> problem, though. We just need to start with a clean definition of
> what we want the behavior to be; then, we may need to refactor
> somewhat to get there. The problem with the current code is that it's
> just accepting whatever fits nicely into the current code as a
> specification-in-fact, and I don't think that's a good basis for a
> feature on which people will want to build complex and robust systems.

I agree that we need consistency there.

> I find it rather regrettable that there were so few people doing review
> this CommitFest.
> I spent a lot of time on things that could have been done by
> other people, and the result is that some things like this have gone
> rather late.

I find that sad too, even though I am part of the crowd that didn't do
enough. Thanks for the work!

Andres

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services