Thread: Re: [SQL] cursor and update + view
Jan Wieck wrote: > > > My other problem is with the views: it seems not work if there is an > > aggregate command in the SELECT. It's only my experience, or others > > noticed it also? (The details are on the pgsql-novice (subject: view > > and aggregate command problem)) > > Aggregates in view's are still buggy. It is possible to > change the grouping when doing a join from such a view and > another relation in a way, that crashes the backend. We'll have to implement subqueries in FROM someday - this would allow handle aggregates in views... Vadim
Vadim wrote: > > Jan Wieck wrote: > > > > > My other problem is with the views: it seems not work if there is an > > > aggregate command in the SELECT. It's only my experience, or others > > > noticed it also? (The details are on the pgsql-novice (subject: view > > > and aggregate command problem)) > > > > Aggregates in view's are still buggy. It is possible to > > change the grouping when doing a join from such a view and > > another relation in a way, that crashes the backend. > > We'll have to implement subqueries in FROM someday - this would > allow handle aggregates in views... You're right. In the current implementation, any rangetable entry (RTE) that is finally (after rewrite) used in any Var node of the querytree results in a scan of that relation. Having an RTE that contains its own subquery would materialize the view internally and help us out. This kind of subquery RTE will also be the base for functions that return result sets (SETOF complex type). These are broken too. It will be a little tricky to pull out the qualifications that restrict the subquery RTE, so the view must not entirely get materialized. At least any constant expression compared against one of the views attributes must go into. There is another thing that I would like to have. The current rule system tries to turn a qualification that uses an aggregate column of a view into a subquery. This is because the planner doesn't support plain aggregate expressions in the qualification. If it would be possible to have another type of Var node that points to a targetlist entry, we could put the aggregates from the qualification into junk TLE's. TGL - are you listening - I think it's your code I'm ulyfying here :-). Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #======================================== jwieck@debis.com (Jan Wieck) #
Jan Wieck wrote: > > There is another thing that I would like to have. The current > rule system tries to turn a qualification that uses an > aggregate column of a view into a subquery. This is because > the planner doesn't support plain aggregate expressions in > the qualification. If it would be possible to have another > type of Var node that points to a targetlist entry, we could > put the aggregates from the qualification into junk TLE's. This thing would also handled by subqueries in FROM! Having support in planner/executor for queries like this: select * from A, (select c, max(d) as m from B group by c) SQ where SQ.c = A.x and SQ.m = A.y rule system will be able to put _any_ VIEW' query into FROM clause... Vadim
Vadim wrote: > This thing would also handled by subqueries in FROM! > Having support in planner/executor for queries like this: > > select * from A, (select c, max(d) as m from B group by c) SQ > where SQ.c = A.x and SQ.m = A.y > > rule system will be able to put _any_ VIEW' query into > FROM clause... Possible - but IMHO the wrong thing to do. As it is now for a view that has no aggregate, the rule system rewrites the query to something that is the same as if the user resolved the view definition by hand and used all the real tables instead. Have the following: CREATE TABLE t1 (a int4, b int4); CREATE TABLE t2 (a int4, c int4); CREATE TABLE t3 (a int4); CREATE VIEW v1 AS SELECT t1.a, t1.b, t2.c FROM t1, t2 WHERE t1.a = t2.a; Now do a SELECT t3.a, v1.b, v1.c FROM t3, v1 WHERE t3.a = v1.a; The current rewrite system builds a querytree that is exactly that what would have been produced by the parser if you had typed SELECT t3.a, t1.b, t2.c FROM t3, t1, t2 WHERE t3.a = t1.a AND t1.a = t2.a; Now the planner/optimizer has _ALL_ the tables that need to be scanned and _ALL_ the qualifications in _ONE_ querytree. It is the job of the optimizer to decide which is the best join path for this access. To make a good decision, it needs all this information plus the VACUUM statistics. If we put any view into a subquery RTE, we force the planner to materialize the view and do a nestloop over t3 and materialized v1 where possibly using t1 or t2 as the outermost scanrelation would be better. Stonebraker & Co where absolutely right when they spoke about productional rule systems. And what PostgreSQL does now is how I understood them. "Production rule systems are conceptually simple, but there are many subtle points involved in actually using them." -- Stonebraker I think the different grouping requirements for subsets of data when using aggregate columns in views is just one of the the problems he addressed with the above statement. We should build a subquery RTE only if Query->hasAggs is true. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #======================================== jwieck@debis.com (Jan Wieck) #
Jan Wieck wrote: > > > rule system will be able to put _any_ VIEW' query into > > FROM clause... > > Possible - but IMHO the wrong thing to do. As it is now for a "Will be able" doesn't mean "will DO" -:)) I just said that subqueries in FROM will resolve all problems with aggregates in VIEWs. > If we put any view into a subquery RTE, we force the planner > to materialize the view and do a nestloop over t3 and ^^^^^^^^^^^ Do you mean creating some tmp table etc? No - it's not required. > materialized v1 where possibly using t1 or t2 as the > outermost scanrelation would be better. SELECT t3.a, v1.b, v1.c FROM t3, (SELECT t1.a, t1.b, t2.c FROM t1, t2 WHERE t1.a = t2.a) v1 WHERE t3.a = v1.a; can be planned as Nestloop SubPlan ...what is costless for subquery... Seq/Index scan on t3 - no materialization... On the other hand, as we talk about query optimization - why rule system should do optimizer' work? Why not just put _any_ VIEW' query into FROM and let optimizer decide could query be rewritten as join or not? Ppl do strange things sometimes -:) Sometimes they use subqueries in WHERE while joins could be used and our optimizer don't try to catch this. I know that Sybase does. And, imho, we should implement this ... sometime -:)) Vadim
Vadim wrote: > > Jan Wieck wrote: > > > > If we put any view into a subquery RTE, we force the planner > > to materialize the view and do a nestloop over t3 and > ^^^^^^^^^^^ > Do you mean creating some tmp table etc? > No - it's not required. Sometimes a sortset is required (grouping, nesting etc.). With materialize I meant the same thing the executor does for a scan, merge or iter node. They return in memory tuples from a relation or a temp file. In our new case it's mostly the same as a scan node that does the view selection inside. And it returs the same tuples as a SELECT * from the view would. That's internal, on the fly materialization of the view. > On the other hand, as we talk about query optimization - why > rule system should do optimizer' work? Why not just put > _any_ VIEW' query into FROM and let optimizer decide > could query be rewritten as join or not? Ppl do strange > things sometimes -:) Sometimes they use subqueries in > WHERE while joins could be used and our optimizer don't > try to catch this. I know that Sybase does. > And, imho, we should implement this ... sometime -:)) Depends on where the optimization is done. If we do it on the parsetree (Query struct), it's the job of the rule system. The optimizer does not have to modify the parsetree. If it is done on the way from the parsetree to the plan, it is the job of the optimizer. If it is possible to do it on the parsetree, I would do it there. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #======================================== jwieck@debis.com (Jan Wieck) #
Jan Wieck wrote: > > > On the other hand, as we talk about query optimization - why > > rule system should do optimizer' work? Why not just put > > _any_ VIEW' query into FROM and let optimizer decide > > could query be rewritten as join or not? Ppl do strange > > things sometimes -:) Sometimes they use subqueries in > > WHERE while joins could be used and our optimizer don't > > try to catch this. I know that Sybase does. > > And, imho, we should implement this ... sometime -:)) > > Depends on where the optimization is done. If we do it on the > parsetree (Query struct), it's the job of the rule system. > The optimizer does not have to modify the parsetree. If it is > done on the way from the parsetree to the plan, it is the job > of the optimizer. > > If it is possible to do it on the parsetree, I would do it > there. Subquery --> Join transformation/optimization implemented in rule system will be used for Views only. Being implemented in optimizer it will be used in all cases. Vadim
Vadim wrote: > Subquery --> Join transformation/optimization implemented in > rule system will be used for Views only. Being implemented > in optimizer it will be used in all cases. Right for the current rule system, because it looks only for pg_rewrite entries to apply. Since it is called for every optimizable statement, it could do this as a last step on the querylist to be returned. Even if there where no rules to apply. I still think that it's the right place to do. Transforming a subselect into a join means to modify the users input, doing something different finally. This is kind of rewriting like for view rules. Reading the debug output "After rewriting" someone should be able to see which relations get scanned, where and which of their attributes are used for what. "On the other hand" I thought a little deeper about the transformation itself. On the first thought it looked so easy but on the third I confused myself a little. Let's take an easy subquery SELECT A.f1 FROM A WHERE A.f2 IN (SELECT B.f1 FROM B WHERE B.f2 = 'x'); This will return any A.f1 where f2 is referenced by a B.f1 WHERE B.f2 = 'x'. Regardless how often it is referenced, it will only be returned once. I cannot think of a join that can do this. The join SELECT A.f1 FROM A, B WHERE A.f2 = B.f1 AND B.f2 = 'x'; will return A.f1 as many times as there are duplicates in B that match. And DISTINCT doesn't help here because it would remove duplicate A.f1's too (what isn't the same as the initial subselect does). Could you give me an example where a subquery could get translated into a join that produces exactly the same output, no matter if there are duplicates or not? Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #======================================== jwieck@debis.com (Jan Wieck) #
Jan Wieck wrote: > > Vadim wrote: > > > Subquery --> Join transformation/optimization implemented in > > rule system will be used for Views only. Being implemented > > in optimizer it will be used in all cases. > > Right for the current rule system, because it looks only for > pg_rewrite entries to apply. Since it is called for every > optimizable statement, it could do this as a last step on the > querylist to be returned. Even if there where no rules to > apply. > > I still think that it's the right place to do. Transforming a > subselect into a join means to modify the users input, doing > something different finally. This is kind of rewriting like > for view rules. Reading the debug output "After rewriting" > someone should be able to see which relations get scanned, > where and which of their attributes are used for what. ... > Could you give me an example where a subquery could get > translated into a join that produces exactly the same output, > no matter if there are duplicates or not? Sybase' example: select title, price from titles where price = (select price from titles where title = "Straight Talk About Computers") select t1.title, t1.price from titles t1, titles t2 where t1.price = t2.price and t2.title = "Straight Talk About Computers" - yes, executor should ensure that there was only one record with t2.title equal "Straight Talk About Computers", this could be flagged in some way... On the other hand, I'm not sure that we have to concern about such cases and about "unwise" queries at all -:) So, forgive me for noise -:) There are many other things to do and handling subqueries in FROM is one of them. Vadim