
Badly planned queries with JOIN syntax - Mailing list pgsql-general

From	Phil Mayers
Subject	Badly planned queries with JOIN syntax
Date	April 4, 2003 10:04:39
Msg-id	1049468639.3e8d9edf9fcd9@wildfire0.net.ic.ac.uk
List	pgsql-general


All (apologies if this gets posted twice - my outgoing email had changed since I
last posted and Majordomo got confused),

I have a requirement for some rather complex multi-table queries involving
inner, outer and full joins. However, I'm running into some problems because the
planner always JOINs in the order I give them (as documented) - which is not the
optimal plan. The query is (very) dynamically generated, so it's not as simple
as "order the JOINs right" because there are some 40,000 possible queries (and
that's just with the current data and table set).

What I would like to do is push all JOIN constraints down into a WHERE clause,
and for INNER joins specified this way the planner seems to generate the optimal
query each time (since it has freedom to re-order). However, under PostgreSQL,
I'm not aware of any way of doing OUTER joins with a WHERE clause (I believe
some other databases, such as Sybase and Microsoft SQL Server, accept a
non-standard "table.column *= otable.ocolumn", which equates to "table LEFT
OUTER JOIN otable on column=ocolumn").
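
To make the two spellings concrete, here is a minimal sketch written against the example tables a, b and c defined further down in this message. The filter value is only a placeholder, and the comments restate the behaviour described above rather than anything measured.

```sql
-- Explicit JOIN syntax: per the behaviour described above,
-- the tables are joined in the order they are written.
select *
from a
  join b on b.pid = a.id
  join c on c.pid = b.id
where c.moredata like 'blah%';

-- The same inner joins with the join constraints pushed down
-- into WHERE, which leaves the join order to the planner.
select *
from a, b, c
where b.pid = a.id
  and c.pid = b.id
  and c.moredata like 'blah%';
```

Both statements return the same rows; the only difference is how much freedom the planner has. There is no equivalent WHERE-only spelling for the OUTER and FULL cases.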

An example:

create table a (id text, somedata text, somedata2 text, primary key (id));
create table b (id text, pid text not null, extradata text, primary key (id));
create index b_pid on b(pid);
create table c (id text, pid text not null, moredata text, primary key (id));
create index c_pid on c(pid);

a, b, c contain tens of thousands of rows. The search function can search on any
field, but if the user searches on "moredata", you can do:

select * from a join b on b.pid = a.id join c on c.pid = b.id where
moredata like 'blah%';

This gives me a query plan that does a sequential scan over a and b (usually
with a hash join) before joining to c, which it will index scan. However,
reordering that:

select * from c join b on c.pid = b.id join a on b.pid = a.id where
moredata like 'blah%';

or doing a query with a like on table a:

select * from a join b on b.pid = a.id join c on c.pid = b.id where
a.somedata like 'foo%';

will do an indexed scan, which is the optimal plan (I have verified this and
am aware that indexed scans are not always the optimal plan - they *are* in this
case, I assure you).
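
For anyone wanting to reproduce the comparison, the plans can be inspected by prefixing each query with EXPLAIN. This is only a sketch: no plan output is shown here because it depends on the data, the statistics and the server version.

```sql
-- Plan for the a -> b -> c ordering (the slow case described above).
explain
select * from a join b on b.pid = a.id join c on c.pid = b.id
where moredata like 'blah%';

-- Plan for the reordered c -> b -> a version.
explain
select * from c join b on c.pid = b.id join a on b.pid = a.id
where moredata like 'blah%';
```

EXPLAIN ANALYZE can be used instead to run each query and report actual timings alongside the estimates.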




Put another way - I *don't* want to use the order of the JOINs as an explicit
command to the planner, but *do* need to use the JOIN syntax since I need OUTER
and FULL joins in some or all queries (which you can't specify with WHERE).
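
As an illustration of the kind of query meant here, the following sketch mixes inner, outer and full joins on the example tables. These are hypothetical queries made up for this message, not ones produced by the real generator.

```sql
-- Inner join a to b, then keep every a/b pair even when no
-- matching row exists in c (c's columns come back as NULL).
select *
from a
  join b on b.pid = a.id
  left outer join c on c.pid = b.id
where a.somedata like 'foo%';

-- Full join: rows from b without a match in c, and rows from c
-- without a match in b, are both preserved.
select *
from b
  full outer join c on c.pid = b.id;
```

Neither result can be expressed by listing the tables in FROM and moving the join conditions into WHERE, which is why the JOIN syntax is unavoidable for these queries.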

I could re-order the JOINs such that the LIKEd tables come first, but that's
really the job of the planner, and some of these queries involve large numbers
of tables and very complex join conditions (as I said, 40,000+ possible query
formats), so it's not obvious to me *how* to order them programmatically - but of
course, the planner knows.

Suggestions?

--
Regards,
Phil

+------------------------------------------+
| Phil Mayers                              |
| Network & Infrastructure Group           |
| Information & Communication Technologies |
| Imperial College                         |
+------------------------------------------+





