Home > mailing lists

Re: [HACKERS] subselect - Mailing list pgsql-hackers

From	Bruce Momjian
Subject	Re: [HACKERS] subselect
Date	January 5, 1998 10:31:08
Msg-id	199801051528.KAA10375@candle.pha.pa.us Whole thread Raw
In response to	Re: [HACKERS] subselect ("Vadim B. Mikheev" <vadim@sable.krasnoyarsk.su>)
List	pgsql-hackers

Tree view

> Yes, this is a way to handle subqueries by joining to temp table.
> After getting plan we could change temp table access path to
> node material. On the other hand, it could be useful to let optimizer
> know about cost of temp table creation (have to think more about it)...
> Unfortunately, not all subqueries can be handled by "normal" joins: NOT IN
> is one example of this - joining by <> will give us invalid results.
> Setting special NOT EQUAL flag is not enough: subquery plan must be
> always inner one in this case. The same for handling ALL modifier.
> Note, that we generaly can't use aggregates here: we can't add MAX to
> subquery in the case of > ALL (subquery), because of > ALL should return FALSE
> if subquery returns NULL(s) but aggregates don't take NULLs into account.

OK, here are my ideas.  First, I think you have to handle subselects in
the outer node because a subquery could have its own subquery.  Also, we
now have a field in Aggreg to all us to 'usenulls'.

OK, here it is.  I recommend we pass the outer and subquery through
the parser and optimizer separately.

We parse the subquery first.  If the subquery is not correlated, it
should parse fine.  If it is correlated, any columns we find in the
subquery that are not already in the FROM list, we add the table to the
subquery FROM list, and add the referenced column to the target list of
the subquery.

When we are finished parsing the subquery, we create a catalog cache
entry for it called 'sub1' and make its fields match the target
list of the subquery.

In the outer query, we add 'sub1' to its target list, and change
the subquery reference to point to the new range table.  We also add
WHERE clauses to do any correlated joins.

Here is a simple example:

    select *
    from taba
    where col1 = (select col2
              from tabb)

This is not correlated, and the subquery parser easily.  We create a
'sub1' catalog cache entry, and add 'sub1' to the outer query FROM
clause.  We also replace 'col1 = (subquery)' with 'col1 = sub1.col2'.

Here is a more complex correlated subquery:

    select *
    from taba
    where col1 = (select col2
              from tabb
              where taba.col3 = tabb.col4)

Here we must add 'taba' to the subquery's FROM list, and add col3 to the
target list of the subquery.  After we parse the subquery, add 'sub1' to
the FROM list of the outer query, change 'col1 = (subquery)' to 'col1 =
sub1.col2', and add to the outer WHERE clause 'AND taba.col3 = sub1.col3'.
THe optimizer will do the correlation for us.

In the optimizer, we can parse the subquery first, then the outer query,
and then replace all 'sub1' references in the outer query to use the
subquery plan.

I realize making merging the two plans and doing IN and NOT IN is the
real challenge, but I hoped this would give us a start.

What do you think?

--
Bruce Momjian
maillist@candle.pha.pa.us

pgsql-hackers by date:

From: Peter T Mount
Date: 05 January 1998, 10:23:07
Subject: Re: [HACKERS] Re: development

From: Bruce Momjian
Date: 05 January 1998, 10:42:35
Subject: Re: [HACKERS] Re: development

Re: [HACKERS] subselect - Mailing list pgsql-hackers

Previous

Next