Re: [HACKERS] subselect - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: [HACKERS] subselect |
Date | |
Msg-id | 199801051528.KAA10375@candle.pha.pa.us Whole thread Raw |
In response to | Re: [HACKERS] subselect ("Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>) |
List | pgsql-hackers |
> Yes, this is a way to handle subqueries by joining to temp table. > After getting plan we could change temp table access path to > node material. On the other hand, it could be useful to let optimizer > know about cost of temp table creation (have to think more about it)... > Unfortunately, not all subqueries can be handled by "normal" joins: NOT IN > is one example of this - joining by <> will give us invalid results. > Setting special NOT EQUAL flag is not enough: subquery plan must be > always inner one in this case. The same for handling ALL modifier. > Note, that we generaly can't use aggregates here: we can't add MAX to > subquery in the case of > ALL (subquery), because of > ALL should return FALSE > if subquery returns NULL(s) but aggregates don't take NULLs into account. OK, here are my ideas. First, I think you have to handle subselects in the outer node because a subquery could have its own subquery. Also, we now have a field in Aggreg to all us to 'usenulls'. OK, here it is. I recommend we pass the outer and subquery through the parser and optimizer separately. We parse the subquery first. If the subquery is not correlated, it should parse fine. If it is correlated, any columns we find in the subquery that are not already in the FROM list, we add the table to the subquery FROM list, and add the referenced column to the target list of the subquery. When we are finished parsing the subquery, we create a catalog cache entry for it called 'sub1' and make its fields match the target list of the subquery. In the outer query, we add 'sub1' to its target list, and change the subquery reference to point to the new range table. We also add WHERE clauses to do any correlated joins. Here is a simple example: select * from taba where col1 = (select col2 from tabb) This is not correlated, and the subquery parser easily. We create a 'sub1' catalog cache entry, and add 'sub1' to the outer query FROM clause. We also replace 'col1 = (subquery)' with 'col1 = sub1.col2'. Here is a more complex correlated subquery: select * from taba where col1 = (select col2 from tabb where taba.col3 = tabb.col4) Here we must add 'taba' to the subquery's FROM list, and add col3 to the target list of the subquery. After we parse the subquery, add 'sub1' to the FROM list of the outer query, change 'col1 = (subquery)' to 'col1 = sub1.col2', and add to the outer WHERE clause 'AND taba.col3 = sub1.col3'. THe optimizer will do the correlation for us. In the optimizer, we can parse the subquery first, then the outer query, and then replace all 'sub1' references in the outer query to use the subquery plan. I realize making merging the two plans and doing IN and NOT IN is the real challenge, but I hoped this would give us a start. What do you think? -- Bruce Momjian maillist@candle.pha.pa.us
pgsql-hackers by date: