Re: Oracle Style packages on postgres - Mailing list pgsql-hackers
From | Dave Held |
---|---|
Subject | Re: Oracle Style packages on postgres |
Date | |
Msg-id | 49E94D0CFCD4DB43AFBA928DDD20C8F902618505@asg002.asg.local Whole thread Raw |
In response to | Oracle Style packages on postgres (rmm@sqlisor.com) |
Responses |
Re: Oracle Style packages on postgres
Re: Oracle Style packages on postgres |
List | pgsql-hackers |
> -----Original Message----- > From: Bruce Momjian [mailto:pgman@candle.pha.pa.us] > Sent: Tuesday, May 10, 2005 8:43 AM > To: Thomas Hallgren > Cc: Tom Lane; rmm@sqlisor.com; pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] Oracle Style packages on postgres > > [...] > I suppose. I think we should focus on the use cases for Oracle > packages, rather than the specific functionality it provides. > What things do people need PostgreSQL to do that it already > doesn't do? Is that really the best way to go about things? Already RDBMSes are patchwork quilts of functionality. Is merely adding another patch the most elegant way to evolve the database? The problem is that Oracle et al are trying to be ORDBMSes and aren't exactly sure what the best way to go is. Instead of trying to formulate a rational plan for what an ORDBMS should even look like, they simply look at what would work with their existing infrastructure and tack on features. Then Postgres plays the copycat game. Instead of trying to play catch-up with Oracle, why not beat them at their own game? What packages provide is encapsulation. Hiding the data from the user and forcing him/her to use the public interface (methods). That is an important and admirable OO feature. Some people think that using the DB's security model can achieve the same thing. It can't, exactly, but there's an important lesson to be learned from the suggestion. The problem is that OOP is a *programming* paradigm, and a database is not a *programming language*. In a programming language, there really is no such thing as "security". There is only "visibility" and "accessibility". Private methods in an OOP language do not provide *security*; they only limit *accessibility*. Like so many other differences between the relational model and the OOP model, there is an impedance mismatch here. However, there is also opportunity. In an OOPL, you can say: "Users can call this method from here, but not from there." What you *can't* say is: "User X can call this method, but User Y cannot." As you can see, these are orthogonal concepts. You could call the first "accessibility by location" and the second "accessibility by authentication". An ORDBMS should support both. "Private" does not respect your identity, only your calling location. An ACL does not respect your calling scope, only your identity. A system that has both is clearly more flexible than one that only has one or the other. Now what you need to keep in mind is that each visibility model serves a different purpose. The purpose of a security model is to limit *who* can see/touch certain data because the data has intrinsic value. The purpose of an accessibility model is to limit *where* and *how* data can be seen/touched in order to preserve *program invariants*. So if you have an object (or tuple!) that records the start and stop time of some process, it is probably a logical invariant that the stop time is greater than or equal to the start time. For this reason, in a PL, you would encapsulate these fields (attributes) and only provide controlled access to update them that checks and preserves the invariant, *no matter who you are*. You don't want a superuser violating this invariant any more than Sue User. Now you might object that constraints allow you to preserve invariants as well, and indeed they do. But constraints do not respect calling scope. Suppose there is a process that needs to update the timestamps in a way that temporarily breaks the invariant but restores it afterwards. The only way to effect this in a constraint environment is to drop the constraint, perform the operation, and restore it. However, dropping a constraint is not an ideal solution because there may be other unprivileged processes operating on the relation that still need the constraint to be enforced. There is no way to say: "There is a priviledged class of methods that is allowed to violate this constraint because they are trusted to restore it upon completion." Note that this is different from saying "There is a priviledged class of users that is allowed to violate this constraint." If you try to do something like give read-only access to everybody and only write access to one user and define that user to be the owner of the methods that update the data, you have to follow the convention that that user only operates through the defined interface, and doesn't hack the data directly. That's because user-level accessibility is not the same as scope- level accessibility. Whereas, if you define something like a package, and say: "Package X is allowed full and complete access to relation Y", and stick the interface methods in X, you still have all the user-level security you want while preserving the invariants in the most elegant way. So you can think of a package as a scope in a programming language. It's like a user, but it is not a user. A user has privileges that cut across scopes. Now, whether packages should be different from schemas is a whole different ballgame. The purpose of a schema in Postgres is not entirely clear to me. There's lots of different ways to use schemas, and there is no obvious best way to use them. In order to implement the accessibility features of packages, schemas would have to be changed considerably. Probably a lot of users would be unhappy if schemas were changed in that way. My guess is that this would not be a good idea. I think we can get some guidance from PLs. C++ is what you call a "multi-paradigm language". You can do everything from assembly to metaprogramming in C++. As such, it is very loose and open in some respects. C++ has two kinds of scopes: it has classes and namespaces. Members of a class are encapsulated and support data hiding. Members of a namespace are only loosely grouped and do not support data hiding explicitly. Namespaces exist primarily to avoid name collisions. Java, on the other hand, decided that for OOP purity, everything must be a class. That would be like making schemas into packages and imposing accessibility rules on them. At the end of the day, I think many PL design experts agree that making everything a class is not necessarily the best way to go. So schemas can be like C++ namespaces - they provide a means to loosely group related objects and help avoid name collisions. So the package could be like a class - they provide OOP-like encapsulation via accessibility rules. However, that doesn't mean that nested schemas wouldn't also be a good thing. In C++, nested namespaces are extremely useful when one layer of scoping does not sufficiently partition the namespace to avoid frequent name collisions. I think the same is true of Postgres. I certainly would like to be able to use nested schema names in several contexts. Instead, I have to make a choice between making different schemas, or making different name prefixes. I wouldn't even mind if nested schemas were only allowed to contain schemas except at the leaves of the tree. Another feature that is very useful is the "using clause". Combined with nested namespaces, this is a very powerful way to give programmers/dbas control over names. You can give everything the most natural name, and just put it in the appropriate namespace, and use the namespace that is relevant to the given task at hand. So consider this example: Tables: etl.import.record etl.export.record As you can imagine, I don't really want to make an 'import' and 'export' schema at the top level. There's several tables in each schema, but that should illustrate the point. Then, when constructing queries, it would be nice to be able to do this: USING etl.import ; SELECT * FROM record JOIN header ON ... JOIN file ON ... ; The effect of a USING clause would be to import the schema names into the public namespace for the duration of the transaction. If that leads to ambiguous names, then the parser/planner should emit an error. __ David B. Held Software Engineer/Array Services Group 200 14th Ave. East, Sartell, MN 56377 320.534.3637 320.253.7800 800.752.8129
pgsql-hackers by date: