Re: API change advice: Passing plan invalidation info from the rewriter into the planner? - Mailing list pgsql-hackers
From | Stephen Frost |
---|---|
Subject | Re: API change advice: Passing plan invalidation info from the rewriter into the planner? |
Date | |
Msg-id | 20140624162733.GO16098@tamriel.snowman.net Whole thread Raw |
In response to | Re: API change advice: Passing plan invalidation info from the rewriter into the planner? (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: API change advice: Passing plan invalidation info from
the rewriter into the planner?
Re: API change advice: Passing plan invalidation info from the rewriter into the planner? |
List | pgsql-hackers |
Robert, I feel like we are getting to the point of simply talking past each other and so I'll try anew, and I'll include my understanding of how the different approaches would address the specific use-case you outlined up-thread. Single policy ------------- The current implementation approach only allows a single policy to be included. The concern raised with this approach is that it won't be very performant due to the qual complexity, which you outlined (reformatted a bit) as: WHERE sales_rep_id = (SELECT oid FROM pg_roles WHERE rolname = current_user AND oid IN (SELECT id FROM person WHERE is_sales_rep)) OR partner_id = (SELECT p.org_id FROM pg_roles a,person p WHERE a.rolname = current_user AND a.oid = p.id) Which I take to mean there is a 'person' table which looks like: id, is_sales_rep, org_id and a table which has the RLS qual which looks like: pk_id, sales_rep_id, partner_id Then, if the individual is_sales_rep and it's their account by sales_rep_id, or if the individual's org_id matches the partner_id, they can see the record. Using this example with security barrier views and indexes on person.id, data.pk_id, data.sales_rep_id, and data.partner_id, we'll get a bitmap heap scan across the 'data' table by having the two OR's run as InitPlan 1 and InitPlan 2. Does that address the concern you had around multi-branch OR policies? This works with more than two OR branches also, though of course we need appropriate indexes to make use of a Bitmap Heap Scan. Even with per-user policies, we would define a policy along these lines, for the "sfrost" role: WHERE sales_rep_id = 16384 OR partner_id = 1 Which also ends up doing a Bitmap Heap Scan across the data table. For the case where a sales rep isn't also a partner, you could simplify this to: WHERE sales_rep_id = 16384 but I'm not sure that really buys you much? With the bitmap heap scan, if one side of the OR ends up not returning anything then it doesn't contribute to the blocks which have to be scanned. The index might still need to be scanned, although I think you could avoid even that with an EXISTS check to see if the user is a partner at all. That's not to say that a bitmap scan is equivilant to an index scan, but it's certainly likely to be far better than a sequential scan. Now, if the query is "select * from data_view with pk_id = 1002;", then we get an indexed lookup on the data table based on the PK. That's what I was trying to point out previously regarding leakproof functions (which comprise about half of the boolean functions we provide, if I recall my previous analysis correctly). We also get indexed lookups with "pk_id < 10" or similar as those are also leakproof. Multiple, Overlapping policies ------------------------------ Per discussion, these would generally be OR'd together. Building up the overall qual which has to include an OR branch for each individual policy qual(s) looks like a complicated bit of work and one which might be better left to the user (and, as just pointed out, the user may actually want AND instead of OR in some cases..). Managing the plan cache in a sensible way is certainly made more complicated by this and might mean that it can't be used at all, which has already been raised as a show-stopper issue. In the example which you provided, while we could represent that the two policies exist (sales representatives vs partners) and that they are to be OR'd together in the catalog, but I don't immediately see how that would change the qual which ends up being added to the query in this case or really improving the overall query plan; at least, not without eliminating one of the OR branches somehow- which I discuss below. Multiple, Non-overlapping policies ---------------------------------- Preventing the overlap of policies ends up being very complicated if many dimensions are allowed. For the simple case, perhaps only the 'current role' dimension is useful. I expect that going down that route would very quickly lead to requests for other dimensions (client IP, etc) which is why I'm not a big fan of it, but if that's the concensus then let's work out the syntax and update the patch and move on. Another option might be to have a qual for each policy which the user can define that indicates if that policy is to be applied or not and then simply pick the first policy for which that qual which returns 'true'. We would require an ordering to be defined in this case, which I believe was an issue up-thread. If we allow all policies matching the quals then we run into the complications mentioned under "Overlapping policies" above. If we decide that per-role policies need to be supported, I very quickly see the need to have "groups" of roles to which a policy is to be applied. This would differ from roles today as they would not be allowed to overlap (otherwise we are into overlapping policies again, or having to figure out which of the overlapping policies should be applied for each query; another option would be to error at run-time, but that seems pretty ugly). In this case we would still need to support "all" as an option, which is what I would expect to have implemented for 9.5, or at least in the early part of 9.5 (I really don't want to wait until the last CF or even the CF before that to get anything in as I suspect it will have grown by that point to be large enough to be an issue..), adding the per-role(s) option could be for 9.6/10.0. In your example, if sales representatives have distinct roles from partners, then the specific policy could be chosen and used based on which role is running the query, which might lead to, perhaps only maginal, improved performance in those specific cases, as discussed above. General multi-policy concerns ----------------------------- Choosing which policy or policies to apply for a given query gets very complicated very quickly if we're to do so in an automated way. Dean suggests that the user would pick which policy to use, to which I argued that roles could be used to manage that instead (a user could 'set role' to a role which has the access requested). That mechanism would also work in the existing single-policy approach by having a policy which depends on the calling role (eg: by looking up that role in a table which defines what access that role should have). It would also work in the above proposal for multiple non-overlapping policies where the policy to use is based on the current role. Overall, while I'm interested in defining where this is going in a way which allows us implement an initial RLS capability while avoiding future upgrade issues, I am perfectly happy to say that the 9.5 RLS implementation may not be exactly syntax-compatible with 9.6 or 10.0. What I wish to avoid is a case where what's in 9.5 includes RLS definitions which can't be implemented in 9.6/10.0 and as would cause upgrade problems. As long as what's in 9.5 can be represented and supported in 9.6/10.0, we can implement the necessary logic to migrate from one to the other in pg_dump. We do not guarantee syntax compatibility between major versions and we often warn users of newer features that there may be some changes in subsequent releases which they'll need to address when they upgrade (and, of course, these are noted in the release notes). Hopefully this will help us move the discussion forward to a point where we have a long-term design as well as a short-term goal which is actionable for 9.5. The current work is around adding the GUCs discussed previously to the RLS patch and modifying pg_dump to use them, to address the concerns raised previously about pg_dump running user code and possibly not having a complete copy of the data. Thanks, Stephen
pgsql-hackers by date: