Thread: Scalar subquery
Hi,
Apparently scalar subquery when used as a part of SELECT statement and when it does not depend on outer query columns
is executed only once per statement, e.g.:
postgres=# select i, (select random()) rand from generate_series(1, 3) i;
i | rand
---+-------------------
1 | 0.992319826036692
2 | 0.992319826036692
3 | 0.992319826036692
(Though term "depend" is subtle, compare these:
postgres=# select i, (select random() + case when false then i else 0 end ) rand from generate_series(1, 3) i;
i | rand
---+-------------------
1 | 0.806265413761139
2 | 0.806265413761139
3 | 0.806265413761139
(3 rows)
postgres=# select i, (select random() where i=i ) rand from generate_series(1, 3) i;
i | rand
---+-------------------
1 | 0.426443862728775
2 | 0.133071997668594
3 | 0.751982506364584
(3 rows)
postgres=# select i, (select random() where i=i or i is null ) rand from generate_series(1, 3) i;
i | rand
---+-------------------
1 | 0.320982406847179
2 | 0.996762252878398
3 | 0.076554249972105
(3 rows)
Looks like dependence is not there anymore if PG is smart enough to simplify boolean expressions)
Anyway, as some older PG versions and Oracle behave similarly I suppose this result is expected and desired (correct?),
but unfortunately not well-documented (did I miss it mentioned?).
Can anyone shed some light on this and/or probably update docs?
P.S.
I got bitten by a statement like this:
select (select nextval('someseq') * a + b from somefunc()), col1, ....
with a and b being OUT parameters of somefunc().
I just got my hands on mysql (5.0.something) and it does not cache the scalar subquery result.
So... now I'm completely puzzled whether this is a bug, a desired result or just a loosely standardized thing.
Help anyone?
On Fri, Aug 27, 2010 at 5:41 PM, Vyacheslav Kalinin <vka@mgcp.com> wrote:
Hi,Apparently scalar subquery when used as a part of SELECT statement and when it does not depend on outer query columnsis executed only once per statement, e.g.:postgres=# select i, (select random()) rand from generate_series(1, 3) i;i | rand---+-------------------1 | 0.9923198260366922 | 0.9923198260366923 | 0.992319826036692(Though term "depend" is subtle, compare these:postgres=# select i, (select random() + case when false then i else 0 end ) rand from generate_series(1, 3) i;i | rand---+-------------------1 | 0.8062654137611392 | 0.8062654137611393 | 0.806265413761139(3 rows)postgres=# select i, (select random() where i=i ) rand from generate_series(1, 3) i;i | rand---+-------------------1 | 0.4264438627287752 | 0.1330719976685943 | 0.751982506364584(3 rows)postgres=# select i, (select random() where i=i or i is null ) rand from generate_series(1, 3) i;i | rand---+-------------------1 | 0.3209824068471792 | 0.9967622528783983 | 0.076554249972105(3 rows)Looks like dependence is not there anymore if PG is smart enough to simplify boolean expressions)Anyway, as some older PG versions and Oracle behave similarly I suppose this result is expected and desired (correct?),but unfortunately not well-documented (did I miss it mentioned?).Can anyone shed some light on this and/or probably update docs?P.S.I got bitten by a statement like this:select (select nextval('someseq') * a + b from somefunc()), col1, ....with a and b being OUT parameters of somefunc().
Vyacheslav Kalinin <vka@mgcp.com> writes: > I just got my hands on mysql (5.0.something) and it does not cache the > scalar subquery result. > So... now I'm completely puzzled whether this is a bug, a desired result or > just a loosely standardized thing. It's loosely standardized. AFAICS, the spec doesn't address the detailed semantics of subqueries at all, except in wording to this effect: Each <subquery> in the <search condition> is effectively executed for each row of T and the results used in the ap- plication of the <search condition> to the given row of T. If any executed <subquery> contains an outer reference to a column of T, the reference is to the value of that column in the given row of T. There is wording like this for subqueries in WHERE and HAVING, but I haven't found anything at all that mentions the behavior for subqueries in the SELECT targetlist. In any case, the fact that they said "effectively executed" and not simply "executed" seems to be meant to leave implementors a lot of wiggle room. In particular, there isn't any wording that I can find suggesting that the presence of volatile (or in the spec's classification, nondeterministic) functions ought to affect the behavior. PG's interpretation is that if there is no outer reference in a subquery, it's okay to implement it as an initplan, meaning it gets evaluated at most once per call of the containing query. We don't pay attention to whether there are volatile functions in there. regards, tom lane
Thanks, Tom
Can this be clarified in docs? It is stated there now that scalar subquery is one of the kinds of expressions
and it is somewhat counter-intuitive that an expression may sometimes not respect its own degree of volatility.
On Wed, Sep 1, 2010 at 2:16 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Vyacheslav Kalinin <vka@mgcp.com> writes:It's loosely standardized.
> I just got my hands on mysql (5.0.something) and it does not cache the
> scalar subquery result.
> So... now I'm completely puzzled whether this is a bug, a desired result or
> just a loosely standardized thing.
AFAICS, the spec doesn't address the detailed semantics of subqueries at
all, except in wording to this effect:
Each <subquery> in the <search condition> is effectively
executed for each row of T and the results used in the ap-
plication of the <search condition> to the given row of T.
If any executed <subquery> contains an outer reference to a
column of T, the reference is to the value of that column in
the given row of T.
There is wording like this for subqueries in WHERE and HAVING, but I
haven't found anything at all that mentions the behavior for subqueries
in the SELECT targetlist. In any case, the fact that they said
"effectively executed" and not simply "executed" seems to be meant to
leave implementors a lot of wiggle room.
In particular, there isn't any wording that I can find suggesting
that the presence of volatile (or in the spec's classification,
nondeterministic) functions ought to affect the behavior.
PG's interpretation is that if there is no outer reference in a
subquery, it's okay to implement it as an initplan, meaning it gets
evaluated at most once per call of the containing query. We don't
pay attention to whether there are volatile functions in there.
regards, tom lane