Re: Add CREATE support to event triggers - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Add CREATE support to event triggers |
Msg-id | CA+TgmoZ=vZriJMxLkqi_V0jg4k4LEAPmwUSC6RWXS5MquXUJNA@mail.gmail.com |
In response to | Add CREATE support to event triggers (Alvaro Herrera <alvherre@2ndquadrant.com>) |
List | pgsql-hackers |
On Fri, Nov 8, 2013 at 10:33 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:

> Attached you can find a very-much-WIP patch to add CREATE info support
> for event triggers (normalized commands). This patch builds mainly on
> two things:
>
> 1. Dimitri's "DDL rewrite" patch he submitted way back, in
> http://www.postgresql.org/message-id/m2zk1j9c44.fsf@2ndQuadrant.fr
>
> I borrowed the whole ddl_rewrite.c code, and tweaked it a bit. There
> are several things still wrong with it and which will need to be fixed
> before a final patch can even be contemplated; but there are some
> questions that require a consensus answer before I go and fix it all,
> because what it will look like will depend on said answers.

I'm still unhappy with this whole concept. It adds a significant maintenance burden that must be carried by everyone who adds new DDL syntax in the future, and it's easy to imagine this area of the code ending up very poorly tested and rife with bugs. After all, event triggers, as nifty as they are, are only going to be used by a small minority of users. And new DDL features are typically going to be things that are fairly obscure, so again they will only be used by a small minority of users. I think we need to avoid the situation where we have bugs that can only get found when those minorities happen to intersect.

If we're going to have DDL rewrite code, then we need a way of making sure that it gets tested in a comprehensive way on a regular basis. Here's one idea: create a contrib module that (somehow, via APIs to be invented) runs every DDL command that gets executed through the deparsing code, and then parses the result and executes *that* instead of the original command. Then, add a build target that runs the regression test suite in that mode, and get the buildfarm configured to run that build target regularly on at least some machines. That way, adding syntax to the regular regression test suite also serves to test that the deparsing logic for that syntax is working. If we do this, there's still some maintenance burden associated with having DDL deparsing code, but at least our chances of noticing when we've failed to maintain it should be pretty good.

The other thing that bothers me here is that, while a normalized command string sounds great in theory, as soon as you want to allow (for example) mapping schema A on node 1 to schema B on node 2, the wheels come off: you'll have to parse that normalized command string so you can change out the schema name, and then reassemble it back into a command string again. So we're going to parse the user input, then deparse it, hand over the results to the application code, which will then parse it, modify that, and deparse it again. At every step of that process, any mistake will lead to subtle bugs in the resulting system. Larry Wall once wrote (approximately) that a good programming language makes simple things simple and hard things possible; I think this design fails the second prong of that test.

Now, I guess I can live with that if it's what everyone else wants, but I don't have a great feeling about the long-term utility of it. Exposing the data from the DDL statement in some structured way - like what we've done with drops, or a JSON blob, or something like that - feels much more robust and useful than a normalized command string to me. If the normalized command string can easily be built up from that data, then you don't really need to expose the command string separately. If it can't, then you're not doing a good job exposing the data in a usable form.
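To be concrete about the drops precedent: a sql_drop trigger today gets back structured rows, not SQL text. A minimal sketch using the existing pg_event_trigger_dropped_objects() function:

```sql
CREATE FUNCTION report_drops() RETURNS event_trigger
LANGUAGE plpgsql AS $$
DECLARE
    obj record;
BEGIN
    -- Each dropped object arrives as a row of structured fields;
    -- no SQL parsing is needed to inspect or act on it.
    FOR obj IN SELECT * FROM pg_event_trigger_dropped_objects()
    LOOP
        RAISE NOTICE 'dropped %: %', obj.object_type, obj.object_identity;
    END LOOP;
END;
$$;

CREATE EVENT TRIGGER log_drops ON sql_drop
    EXECUTE PROCEDURE report_drops();
```

A CREATE interface with that flavor - or a JSON document carrying the same fields - would let the schema-mapping case above be handled by swapping out one field, rather than by reparsing a command string.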
Saying "well, people can always parse the normalized command string" is a cop-out. Parsing SQL is *hard*; the only external project I know of that parses our SQL syntax well is pgpool, and that's because they copy our parser wholesale - surely not the sort of solution we want to foist off on event trigger authors.

Finally, I'm very skeptical of the word "normalized". To me, that sounds like an alias for "modifying the command string in unspecified ways that big brother thinks will be useful to event trigger authors". Color me skeptical. What if somebody doesn't want their command string normalized? What if they want it normalized in a way that's different from the way that we've chosen to normalize it? I fear that this whole direction amounts to "we don't know how to design a real API, so let's just do surgery on the command string and call whatever pops out the API". Maybe that's harsh, but if it is, I don't know why. The normalization steps we build into this process constitute assumptions about how the feature will be used, and they back the user into using that feature in just that way and no other.

> 2. The ideas we used to build DROP support. Mainly, the interesting
> thing here is the fact that we use a SRF to report, at
> ddl_command_end, all the objects that were created during execution
> of that command. We do this by collecting them in a list in some raw
> form somewhere during ProcessUtility, and then spitting them out if
> the SRF is called. I think the general idea is sound, although of
> course I admit there might be bugs in the implementation.

Agreed.

> Note this patch doesn't try to add any kind of ALTER support. I think
> this is fine in principle, because we agreed that we would attack each
> kind of command separately (divide to conquer and all that);

Also agreed.

> but there is a slight problem for some kind of objects that are
> represented partly as ALTER state during creation; for example creating
> a table with a sequence uses ALTER SEQ/OWNED BY internally at some
> point. There might be other cases I'm missing, also. (The REFRESH
> command is nominally also supported.)

There are lots of places in the DDL code where we pass around constructed parse trees as a substitute for real argument lists. I expect that many of those places will eventually get refactored away, so it's important that this feature does not end up relying on accidents of the current code structure. For example, an AlterTableStmt can actually do a whole bunch of different things in a single statement: SOME of those are handled by a loop in ProcessUtilitySlow() and OTHERS are handled internally by AlterTable. I'm pretty well convinced that that division of labor is a bad design, and I think it's important that this feature doesn't turn that dubious design decision into documented behavior.
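The sequence case Alvaro mentions shows the hazard. As I read the current code, one user-visible command is rewritten into several parse trees before anything is executed (a rough sketch; the exact expansion is an implementation detail, which is precisely the point):

```sql
-- What the user types:
ALTER TABLE orders ADD COLUMN id serial;

-- Roughly what ProcessUtility actually sees and executes:
CREATE SEQUENCE orders_id_seq;                       -- run via ProcessUtility()
ALTER TABLE orders ADD COLUMN id integer NOT NULL
    DEFAULT nextval('orders_id_seq');                -- run inside AlterTable()
ALTER SEQUENCE orders_id_seq OWNED BY orders.id;     -- run via ProcessUtility()
```

A deparse facility that simply reports whatever parse trees happen to reach ProcessUtility will report the rewritten form rather than the command the user typed, and its output will silently change whenever we refactor that division of labor.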
> Now about the questions I mentioned above:
>
> a) It doesn't work to reverse-parse the statement nodes in all cases;
> there are several unfixable bugs if we only do that. In order to create
> always-correct statements, we need access to the catalogs for the
> created objects. But if we are doing catalog access, then it seems to
> me that we can do away with the statement parse nodes completely and
> just reconstruct the objects from catalog information. Shall we go
> that route?

That works well for CREATE and is definitely appealing in some ways; it probably means needing a whole lot of what's in pg_dump in the backend also. Of course, converting the pg_dump code to a library that can be linked into either a client or the server would make a lot of people happy. Making pg_dump much dumber (at least as regards future versions) and relying on new backend code to serve the same purpose would perhaps be reasonable as well, although I know Tom is against it. But having two copies of that code doesn't sound very good; and we'd need some way of thoroughly testing the one living in the backend.

> c) The current patch stashes all objects in a list, whenever there's an
> event trigger function. But perhaps some users want to have event
> triggers and not have any use for the CREATE statements. So one idea is
> to require users that want the objects reported to call a special
> function in a ddl_command_start event trigger which enables collection;
> if it's not called, objects are not reported. This eliminates
> performance impact for people not using it, but then maybe it will be
> surprising for people that call the SRF and find that it always returns
> empty.

This seems like premature optimization to me, but I think I lost the last iteration of this argument.
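To spell out what question c) proposes from the user's side, I think it would look something like this (a hypothetical sketch: neither function name below exists today, and the reporting SRF is whatever the patch ends up calling it):

```sql
-- Hypothetical: opt in to collection before the command runs.
CREATE FUNCTION enable_create_collection() RETURNS event_trigger
LANGUAGE plpgsql AS $$
BEGIN
    PERFORM pg_event_trigger_enable_create_collection();  -- hypothetical name
END;
$$;

CREATE EVENT TRIGGER collect_creates ON ddl_command_start
    EXECUTE PROCEDURE enable_create_collection();
```

A ddl_command_end trigger would then call the reporting SRF to fetch the collected objects - and, per the surprise Alvaro anticipates, would get zero rows if the opt-in trigger above had never run.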
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company