Re: Command Triggers patch v18 - Mailing list pgsql-hackers

From:           Andres Freund
Subject:        Re: Command Triggers patch v18
Msg-id:         201203272205.08556.andres@2ndquadrant.com
In response to: Re: Command Triggers patch v18 (Robert Haas <robertmhaas@gmail.com>)
Responses:      Re: Command Triggers patch v18
List:           pgsql-hackers
Hi,

On Tuesday, March 27, 2012 07:34:46 PM Robert Haas wrote:
> On Tue, Mar 27, 2012 at 11:58 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > I still think the likeliest way towards that is defining some behaviour
> > we want regarding
> > * naming conflict handling
> > * locking
> > And then religiously make sure the patch adheres to that. That might need
> > some refactoring of existing code (like putting a part of the utility.c
> > handling of CREATE TABLE into its own function and such), but it should
> > be doable.
> >
> > I think BEFORE command triggers ideally should run
> > * before permission checks
> > * before locking
> > * before internal checks are done (naming conflicts, type checks and
> >   such)
>
> It is possible to do this, and it would actually be much easier and
> less invasive to implement than what Dimitri has done here, but the
> downside is that you won't have done the name lookup either. See my
> last email to Dimitri for a long explanation of why that restriction
> is not easily circumventable: name lookups, locking, and permission
> checks are necessarily and inextricably intertwined.

I read your other mail. Coming back to it, I think the above might be too
restrictive to be useful for a large part of the use-cases, so your idea of
more hook points below has some merit.

> Still, if others agree this is useful, I think it would be a lot easier
> to get right than what we have now.

I think it's pretty important to have something that is usable rather
easily, which requires names to be resolved and thus permission checks
already performed and (some) locks already acquired. Quite a few of the
possible usages need the facility to be as simple as possible and won't
care about already-acquired locks/names.

> Some of what we're now doing as part of parse analysis should really
> be reclassified.
> For example, the ProcessUtility branch that handles
> T_CreateStmt and T_CreateForeignTableStmt should really be split out
> as a separate function that lives in tablecmds.c, and I think at least
> some of what's in transformCreateStmt should be moved to tablecmds.c
> as well. The current arrangement makes it really hard to guarantee
> things like "the name gets looked up just once", which seems obviously
> desirable, since strange things will surely happen if the whole
> command doesn't agree on which OID it's operating on.

Yes, I agree, most of that should go from utility.c.

> > Looking up oids and such before calling the trigger doesn't seem to be
> > problematic from my pov. Command triggers are a superuser-only facility;
> > the additional information they gain is no problem.
> > Thinking about that, I think we should enforce command trigger functions
> > to be security definer functions.
>
> I don't see any benefit from that restriction.

The reason I was thinking it might be a good idea is that they might get
more knowledge passed to them than the user could otherwise obtain
directly, especially if we extend the scheme to more places/information.

> >> I actually think that, to really meet all needs here, we may need the
> >> ability to get control in more than one place.
> >
> > Not sure what you mean by that. Before/after permission checks,
> > before/after consistency checks? That sort of thing?
>
> Yes. For example, above you proposed a kind of trigger that fires
> really early - basically after parsing but before everything else.
> What Dimitri has implemented is a much later trigger that is still
> before the meat of the command, but not before *everything*. And then
> there's an AFTER trigger, which could fire either after an individual
> piece or stage of the command, or after the whole command is complete,
> either of which seems potentially useful depending on the
> circumstances.
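As an aside, the "looked up just once" guarantee above is hard precisely because lookup and locking chase each other: a lock can only be taken once the name is resolved, but the resolution is only stable once the lock is held. A minimal, self-contained C sketch of that retry loop (all names invented; this is not the actual backend code, which lives around RangeVarGetRelid):

```c
#include <assert.h>

typedef unsigned int Oid;

/* Stand-in for a catalog lookup: maps a name to an OID, 0 if not found.
 * In the real backend this is a syscache lookup whose answer can change
 * between calls if a concurrent DDL transaction commits. */
static Oid current_mapping = 42;

static Oid lookup_name(const char *name)
{
    (void) name;
    return current_mapping;
}

static void lock_relation(Oid oid)   { (void) oid; }
static void unlock_relation(Oid oid) { (void) oid; }

/* Resolve a name to a locked OID.  The lock can only be taken after the
 * name is resolved, but the resolution only becomes stable under the
 * lock -- hence the re-check loop. */
Oid resolve_and_lock(const char *name)
{
    for (;;)
    {
        Oid oid = lookup_name(name);

        if (oid == 0)
            return 0;           /* no such object */
        lock_relation(oid);
        if (lookup_name(name) == oid)
            return oid;         /* mapping still valid under the lock */
        unlock_relation(oid);   /* concurrent DDL changed it; retry */
    }
}
```

Any hook point fired before this loop has run necessarily sees an unresolved, unlocked name.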
> I almost think that the BEFORE/AFTER name serves to
> confuse rather than to clarify, in this case. It's really a series of
> specific hook points: whenever we parse a command (but before we
> execute it), after security and sanity checks but before we begin
> executing the command, before or after various subcommands, after the
> whole command is done, and maybe a few others. When we say BEFORE or
> AFTER, we implicitly assume that we want at most two of the things
> from that list, and I am not at all sure that's going to be
> enough.

You might have a point there. I am not sure the complexity of explaining
all the different hook points is worth the pain, though. The phases we
have are:

* after parsing
  * logging wants to hook here
* after resolving names
* after checking permissions (intertwined with the former)
  * override the permission check? INSTEAD?
* after locking (intertwined with the former)
  * since it needs to be connected to name resolution/permission checking,
    this doesn't seem to be an attractive hook point on its own
* after validation
  * additional validation will likely want to hook here
* after execution
  * replication might want to hook here

Am I missing an interesting phase? Allowing all of that seems to require a
substantial refactoring of commands/ and catalog/.

> One thing I had thought about doing, in the context of sepgsql, and we
> may yet do it, is create a hook that gets invoked whenever we need to
> decide whether it's OK to perform an action on an object. For
> example, consider ALTER TABLE .. ADD FOREIGN KEY: we'd ask the hook
> both "is it OK to add a foreign key to table X?" and also "is it OK to
> make a foreign key refer to table Y"? This doesn't fit into the
> command-trigger framework at all, but it's definitely useful for
> sepgsql, and I bet it's good for other things, too - maybe not that
> specific example, but that kind of thing.

It's a neat feature, yes, but it seems to be orthogonal to this.
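The phase list above could be made explicit as a hook-point enum with one firing call per phase; each command implementation then invokes the firing function at the matching point. A self-contained, illustrative C sketch (all identifiers invented, not taken from the patch):

```c
#include <assert.h>

/* One value per phase from the list above (names illustrative). */
typedef enum CommandHookPhase
{
    HOOK_AFTER_PARSE,
    HOOK_AFTER_NAME_RESOLUTION,
    HOOK_AFTER_PERMISSION_CHECK,
    HOOK_AFTER_LOCKING,
    HOOK_AFTER_VALIDATION,
    HOOK_AFTER_EXECUTION,
    HOOK_NUM_PHASES
} CommandHookPhase;

typedef void (*command_hook)(void *cmd);

static command_hook hooks[HOOK_NUM_PHASES];
static int fired[HOOK_NUM_PHASES];

/* Example consumers: a logger hooking early, replication hooking late. */
static void log_hook(void *cmd)  { (void) cmd; fired[HOOK_AFTER_PARSE]++; }
static void repl_hook(void *cmd) { (void) cmd; fired[HOOK_AFTER_EXECUTION]++; }

/* Each command implementation calls this at the matching point. */
static void fire_hooks(CommandHookPhase phase, void *cmd)
{
    if (hooks[phase])
        hooks[phase](cmd);
}

void run_command(void *cmd)
{
    fire_hooks(HOOK_AFTER_PARSE, cmd);            /* logging here        */
    fire_hooks(HOOK_AFTER_NAME_RESOLUTION, cmd);
    fire_hooks(HOOK_AFTER_PERMISSION_CHECK, cmd);
    fire_hooks(HOOK_AFTER_LOCKING, cmd);
    fire_hooks(HOOK_AFTER_VALIDATION, cmd);       /* extra validation    */
    /* ... actual execution ... */
    fire_hooks(HOOK_AFTER_EXECUTION, cmd);        /* replication here    */
}
```

The cost hinted at above is visible even in the sketch: every command body in commands/ has to be threaded with the firing calls at the right spots.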
> >> I agree that it's not a very helpful decision, but I'm not the one who
> >> said we wanted command triggers rather than event triggers. :-)
> >
> > Well, but on the other hand what will you do with a "pg_attribute row
> > added" event? It doesn't even have an oid ;)
>
> We refer to everything using ObjectAddress notation - the OID of the
> system catalog that contains objects of that type, the OID of the
> object itself, and a "sub-object ID" which is 0 in general but the
> column number in that specific case. What you can do is go fetch the
> pg_attribute row and then do whatever you like - log it, throw an
> error, etc. It's a hook that gets invoked when you add a column. If
> it's important to differentiate between a column that gets added at
> CREATE TABLE time and one that gets added by ALTER TABLE .. ADD
> COLUMN, that could be done. It's a lot easier to deal with triggers
> that fire in cases you don't care about than to deal with triggers
> that don't fire in cases you do care about. The former you can just
> ignore.

I don't believe that distinction can be made that easily without more
impact than the command trigger patch has. You need a surprisingly large
amount of context to know when to ignore a trigger, especially as it's not
exactly easy to transfer knowledge from one invocation to the next.

> > But how would you do something like logging the originating statement
> > with the object access hook machinery?
>
> You can't, but you can do a lot of other useful stuff that command
> triggers as currently envisioned won't allow. I'm starting to think
> that maybe BEFORE triggers need to fire on commands and AFTER triggers
> on events? Can we define "start of command execution" as an event? I
> don't think we need two completely different facilities, but I really
> want to make sure we have a plan for capturing events. I can't
> imagine how replication is gonna work if we can't log exactly what we
> did, rather than just the raw query string.
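For readers following along: the ObjectAddress notation Robert describes is a three-field struct (catalog OID, object OID, sub-object ID). A self-contained sketch of how a column would be addressed (the field layout follows the backend's objectaddress.h; the helper function is invented for illustration):

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned int Oid;

/* Mirrors the shape of the backend's ObjectAddress. */
typedef struct ObjectAddress
{
    Oid     classId;        /* OID of the catalog the object lives in */
    Oid     objectId;       /* OID of the object itself */
    int32_t objectSubId;    /* 0 in general; column number for columns */
} ObjectAddress;

/* pg_class's own fixed OID; relations are addressed via pg_class. */
#define RelationRelationId 1259

/* Address of column number attnum of relation relid.  The whole table
 * is the same address with objectSubId = 0. */
ObjectAddress column_address(Oid relid, int32_t attnum)
{
    ObjectAddress addr = { RelationRelationId, relid, attnum };

    return addr;
}
```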
> Statement-based
> replication is sometimes useful, especially in heterogeneous
> environments, but it's always kinda sucky, in that there are edge
> cases that break.

I don't think creating *new* DDL from the parse tree can really count as
statement-based replication. And again, I don't think implementing that
will cost that much effort. How would it help us to know - as individual
events! - which tuples were created where? Reassembling that will be a
huge mess. I basically fail to see *any* use case besides access checking.

> If you want to know what schema the server is going
> to use for a newly created object before it creates it, you've got to
> do that calculation yourself, and now you've created a risk of getting
> a different answer. If you want to capture all the drops and replay
> them on the slave, you have to be able to handle CASCADE.

I would like to have CASCADE handled, yes.

> > Well, the problem is that with just catalog access it's just about
> > impossible to generate differential changes. Also, the catalogs
> > between clusters *won't be the same* in a large part of the
> > use-cases; oids will basically never match.
>
> True. ALTER commands are going to need additional information. But
> some of that information also won't be available until quite late.
> For example, an ALTER on a parent might cascade down to children, and
> a command trigger might want to know exactly which tables were
> affected. That trigger isn't really BEFORE or AFTER or INSTEAD OF;
> it's more like DURING, and at a very specific point DURING.

Well, *I* would like a separate command trigger to be fired for each
sub-action, but to see them qualified as such...

> > Matching the intent, which is mostly visible in the parse tree, seems
> > to be a whole lot more helpful.
>
> But note that there are a lot of possible things you can't tell from
> the parse-tree without deducing them. Sure, for LISTEN, the parse
> tree is fine.
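Firing a separate trigger per sub-action, qualified as such, only requires the invocation context to carry a nesting marker that the trigger can inspect. A self-contained C sketch of the idea (purely illustrative; none of these names come from the patch):

```c
#include <assert.h>

/* Context passed to each trigger invocation: the command tag plus a
 * marker telling top-level commands apart from cascaded sub-actions. */
typedef struct CommandContext
{
    const char *tag;          /* e.g. "ALTER TABLE" */
    int         nesting;      /* 0 = typed by the user, >0 = sub-action */
} CommandContext;

static int top_level_seen;
static int sub_action_seen;

/* A trigger can filter on the qualification without extra bookkeeping. */
static void trigger(const CommandContext *ctx)
{
    if (ctx->nesting == 0)
        top_level_seen++;
    else
        sub_action_seen++;
}

/* An ALTER on a parent cascades to its inheritance children; each child
 * gets its own trigger invocation, qualified as a sub-action. */
void alter_with_children(int nchildren)
{
    CommandContext parent = { "ALTER TABLE", 0 };

    trigger(&parent);
    for (int i = 0; i < nchildren; i++)
    {
        CommandContext child = { "ALTER TABLE", 1 };

        trigger(&child);
    }
}
```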
> But for ALTER TABLE or DROP the parse tree doesn't even
> tell you what object you're operating on (at least, not unless the
> user includes an explicit schema-qualification), let alone the whole
> list of objects the command is ultimately going to process due to
> cascading, inheritance, etc. You can reimplement that logic in your
> trigger but that's both fragile and laborious.

Resolving objects to their schema-qualified variant isn't that much of a
problem. Inheritance is; I can definitely see that. But for lots of simple
use-cases, even something with some restrictions attached would be *way*
better than the current situation, where all of that is not possible at
all. I fear a bit that this discussion is leading to something that is
never going to materialize, because it would require a huge amount of work
to get there.

> >> That can be fixed using the optional argument to
> >> InvokeObjectAccessHook, similar to what we've done for differentiating
> >> internal drops from other drops.
> >
> > Several of the points where those hooks are invoked don't even seem to
> > have the necessary information for that... Doing that seems to involve
> > passing heaps of additional state through the backend.
>
> I've been trying pretty hard to avoid that, but I agree that there are
> possible stumbling blocks there. I don't think it's an insurmountable
> problem, though. We just need to start with a clean definition of
> what we want the behavior to be; then, we may need to refactor
> somewhat to get there. The problem with the current code is that it's
> just accepting whatever fits nicely into the current code as a
> specification-in-fact, and I don't think that's a good basis for a
> feature on which people will want to build complex and robust systems.

I agree that we need consistency there.

> I find it rather regrettable that there were so few people doing review
> this CommitFest.
> I spent a lot of time on things that could have been done by
> other people, and the result is that some things like this have gone
> rather late.

I find that sad too, even though I am part of the crowd that didn't do
enough. Thanks for the work!

Andres

--
Andres Freund                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services