Thread: CROSS JOIN performance

CROSS JOIN performance

From
Andy Chambers
Date:
Hi,

In our porting of a big mysql app to postgres, we're finding lots of
queries like

select foo
  from (foo f, bar b)
left join caz c on f.id = f.caz_id
where f.id = b.foo_id

I've seen the message where Tom explains why this is invalid in ANSI
SQL so I converted it to

select foo
  from foo f CROSS JOIN bar b
left join caz c on f.id = f.caz_id
where f.id = b.foo_id

...and it works.  However, sometimes quite slowly.  When we've looked
into the slow ones, we've found that changing it again to

select foo
  from foo f INNER JOIN bar b ON f.id = b.foo_id
left join caz c on f.id = f.caz_id

makes it perform much better.

Furthermore, we're starting to find that performance of the 3rd is
significantly better than the 2nd, *ONLY* when the CROSS JOINs are
followed by more joins (like in this case).  If there are no more
tables being joined, changing to the 3rd version yields no performance
gain.

Are these three queries logically equivalent (well, at least the
latter two since the first isn't valid SQL)?  If so, does it make
sense that the optimizer has difficulty with the second case.

Cheers,
Andy

--
Andy Chambers

Re: CROSS JOIN performance

From
Tom Lane
Date:
Andy Chambers <achambers@mcna.net> writes:
> In our porting of a big mysql app to postgres, we're finding lots of
> queries like

> select foo
>   from (foo f, bar b)
> left join caz c on f.id = f.caz_id
> where f.id = b.foo_id

> I've seen the message where Tom explains why this is invalid in ANSI
> SQL so I converted it to

> select foo
>   from foo f CROSS JOIN bar b
> left join caz c on f.id = f.caz_id
> where f.id = b.foo_id

> ...and it works.  However, sometimes quite slowly.  When we've looked
> into the slow ones, we've found that changing it again to

> select foo
>   from foo f INNER JOIN bar b ON f.id = b.foo_id
> left join caz c on f.id = f.caz_id

> makes it perform much better.

Those formulations look equivalent to me.  Could you provide a concrete
test case, or at least some specific queries and their EXPLAIN ANALYZE
results?  Which PG version are we talking about?  Oh, and are the LEFT
JOIN conditions really not constraining table c at all?

            regards, tom lane