Thread: Joining bug????
I've been attempting to get my DB tuned to handle joins as well as
possible. However, no matter which way I try to perform the joins, if I
attempt to join more than two or three tables the joins become
unbearably slow, no matter how many indexes I create in however many
ways.

My only solution was to create a hybrid table that contains the join of
all of the tables I'm searching on, with multi-keyed indexes. This is a
VERY kludgy solution that makes the keys to my DB hard to change.

Is there a problem with postgresql in performing joins? Is there a fix?
What gives?

-=pierre
> I've been attempting to get my DB tuned to handle joins as well as
> possible. However, no matter which way I try to perform the joins, if I
> attempt to join more than two or three tables the joins become
> unbearably slow, no matter how many indexes I create in however many ways.

Ditto. I've never seen a quadruple join succeed, even with relatively
small tables.

> My only solution was to create a hybrid table that contains the join of
> all of the tables I'm searching on, with multi-keyed indexes. This is a
> VERY kludgy solution that makes the keys to my DB hard to change.

The solution I use may be the worst imaginable kludge, but it works
great: I do the joins (as well as other set operations) on the client
side. Perl hashes are very good for that.

--Gene
pierre wrote:
> I've been attempting to get my DB tuned to handle joins as well as
> possible. However, no matter which way I try to perform the joins, if I
> attempt to join more than two or three tables the joins become
> unbearably slow, no matter how many indexes I create in however many ways.
>
> My only solution was to create a hybrid table that contains the join of
> all of the tables I'm searching on, with multi-keyed indexes. This is a
> VERY kludgy solution that makes the keys to my DB hard to change.
>
> Is there a problem with postgresql in performing joins? Is there a fix?
> What gives?

There are many reasons for a query to take more time than expected.
PostgreSQL can do reasonably well joining many tables. It is impossible
for anyone to even guess without more specifics.

Try submitting the specific query and the result of an EXPLAIN. Example:

    EXPLAIN SELECT foo FROM bar;

Table sizes, indices, and time elapsed are also helpful.

BTW, in case you don't know, the "VACUUM ANALYZE" statement needs to be
run occasionally to keep the query planner's statistics current.
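It is a one-liner; you can run it for the whole database or, borrowing
one of your table names as an example, for a single table:

    -- refresh planner statistics for every table in the database
    VACUUM ANALYZE;

    -- or for just one table
    VACUUM ANALYZE prod;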
> pierre wrote:
> > I've been attempting to get my DB tuned to handle joins as well as
> > possible. However, no matter which way I try to perform the joins, if I
> > attempt to join more than two or three tables the joins become
> > unbearably slow, no matter how many indexes I create in however many ways.
> >
> > My only solution was to create a hybrid table that contains the join of
> > all of the tables I'm searching on, with multi-keyed indexes. This is a
> > VERY kludgy solution that makes the keys to my DB hard to change.
> >
> > Is there a problem with postgresql in performing joins? Is there a fix?
> > What gives?
>
> There are many reasons for a query to take more time than expected.
> PostgreSQL can do reasonably well joining many tables. It is impossible
> for anyone to even guess without more specifics.
>
> Try submitting the specific query and the result of an EXPLAIN. Example:
>
>     EXPLAIN SELECT foo FROM bar;
>
> Table sizes, indices, and time elapsed are also helpful.

This query takes about 30 seconds...way too long for my needs. I've got
the query down to 2-5 seconds by using a hybrid table that is the join
between the pos and pcat tables. All _id fields are indexed, as is the
keyword field.

explain select p.prod_name from prod p, pos o, pcat c, pkey k
    where p.prod_id = o.pos_prod_id and
          o.pos_os_id = 2 and
          o.pos_prod_id = c.pcat_prod_id and
          c.pcat_cat_id = 6 and
          c.pcat_prod_id = k.pkey_prod_id and
          k.pkey_keyword = 'photoshop';
NOTICE:  QUERY PLAN:

Nested Loop  (cost=8.10 size=2 width=28)
  ->  Nested Loop  (cost=6.05 size=1 width=12)
        ->  Nested Loop  (cost=4.05 size=1 width=8)
              ->  Index Scan on o  (cost=2.05 size=1 width=4)
              ->  Index Scan on c  (cost=2.00 size=2 width=4)
        ->  Index Scan on k  (cost=2.00 size=2 width=4)
  ->  Index Scan on p  (cost=2.05 size=85442 width=16)

select count(*) from prod;
 count
-------
 85443
(1 row)

select count(*) from pos;
 count
--------
 132564
(1 row)

select count(*) from pcat;
 count
--------
 337251
(1 row)

select count(*) from pkey;
 count
--------
 492550
(1 row)

> BTW, in case you don't know, the "VACUUM ANALYZE" statement needs to be
> run occasionally to keep the query planner's statistics current.

All data is newly imported and indexed, so I shouldn't have to perform
this. The web site that is using this DB is still in debug mode.

Thanks for taking a look.

-=pierre
Nice query. You mentioned having indexes on id columns etc. Are the
indices single-column or multi-column? You may want to consider making
some of them multi-column. The EXPLAIN does indicate usage of your
indices, but it may not be optimal. If the distribution of your data in
the indices is such that each index scan must go through large chunks
of data to qualify the other attributes in the join, it will run
slowly. By adding some well-placed multi-part indices, the index scans
will be narrower in their scan sets. Or not.

Also, I could be wrong, but I have found that even batch-loaded data
needs a VACUUM ANALYZE to gather distribution statistics.

pierre@desertmoon.com wrote:
> This query takes about 30 seconds...way too long for my needs. I've got
> the query down to 2-5 seconds by using a hybrid table that is the join
> between the pos and pcat tables. All _id fields are indexed, as is the
> keyword field.
>
> [query, EXPLAIN output, and table counts snipped]
>
> All data is newly imported and indexed, so I shouldn't have to perform
> this. The web site that is using this DB is still in debug mode.
> Nice query. You mentioned having indexes on id columns etc. Are the
> indices single-column or multi-column? You may want to consider making
> some of them multi-column. The EXPLAIN does indicate usage of your
> indices, but it may not be optimal. If the distribution of your data in
> the indices is such that each index scan must go through large chunks
> of data to qualify the other attributes in the join, it will run
> slowly. By adding some well-placed multi-part indices, the index scans
> will be narrower in their scan sets. Or not.

My apologies, I should have been a bit more specific. The indices are
multi-column, with the _id column first and the prod_id column second.
Also, a search for just the keyword 'photoshop' returns only 35 or so
rows. I was under the impression that postgresql used the bottom-up
method of optimization, i.e. the narrowest criterion should be declared
last. Why doesn't the EXPLAIN show the keyword being used first?

> Also, I could be wrong, but I have found that even batch-loaded data
> needs a VACUUM ANALYZE to gather distribution statistics.

I'll try just about anything at this point....oh, and most of these
tables have been clustered by their major indexes.

-=pierre
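P.S. In case it helps, this is how I checked the keyword selectivity,
and (with a made-up index name) the sort of clustering I mean:

    -- only ~35 of the ~492k pkey rows match the keyword
    select count(*) from pkey where pkey_keyword = 'photoshop';

    -- each table clustered on its major index, e.g.
    cluster pkey_keyword_idx on pkey;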
pierre@desertmoon.com wrote:
> This query takes about 30 seconds...way too long for my needs. I've got
> the query down to 2-5 seconds by using a hybrid table that is the join
> between the pos and pcat tables. All _id fields are indexed, as is the
> keyword field.
>
> [query, EXPLAIN output, and table counts snipped]

I would try the following indices:

    prod - (prod_id)
    pos  - (pos_os_id, pos_prod_id)
    pcat - (pcat_prod_id, pcat_cat_id)
    pkey - (pkey_keyword, pkey_prod_id)

Various permutations of column order may also affect the outcome.
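Spelled out as DDL, that would be something like the following (the
index names are just placeholders, pick your own):

    CREATE INDEX prod_id_idx       ON prod (prod_id);
    CREATE INDEX pos_os_prod_idx   ON pos  (pos_os_id, pos_prod_id);
    CREATE INDEX pcat_prod_cat_idx ON pcat (pcat_prod_id, pcat_cat_id);
    CREATE INDEX pkey_kw_prod_idx  ON pkey (pkey_keyword, pkey_prod_id);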
On Mon, 26 Oct 1998, Gene Selkov Jr. wrote:

> > I've been attempting to get my DB tuned to handle joins as well as
> > possible. However, no matter which way I try to perform the joins, if I
> > attempt to join more than two or three tables the joins become
> > unbearably slow, no matter how many indexes I create in however many ways.
>
> Ditto. I've never seen a quadruple join succeed, even with relatively
> small tables.

I've been playing with the idea of moving a client from MySQL to
Postgres (views, triggers, and subselects would _really_ make my life
easier).

---[ CUT ]---
SELECT account.accnum, account.accname,
       ((customers.firstname || ' ') || customers.lastname) AS Customer,
       acctype.descr, account.balance, account.status, billdate.next
FROM account, customers, acctype, billdate
WHERE account.custnum = customers.custnum
  AND account.acctype = acctype.accid
  AND account.accnum = billdate.accnum
  AND account.accnum < 20
ORDER BY account.accnum
---[ CUT ]---

works perfectly and takes about 2 seconds to complete; granted, I have
(sometimes unique) indexes on all the join columns. account has 7k
rows, customers has 5k rows, acctype has 12 rows, billdate has 7k rows.

A somewhat modified, real-world query ends up crashing postgres,
though:

---[ CUT ]---
SELECT account.accnum, account.accname, account.totalfee, billdate.next,
       ((customers.firstname || ' ') || customers.lastname) AS Customer,
       customers.company, customers.street, acctype.yearly
FROM account, customers, acctype, billdate
WHERE account.totalfee > 0.00
  AND billtype = 1
  OR (billcc1stmo = 1 AND account.created = billdate.last)
  AND balance <> 0.00
  AND billdate.next >= '1998-01-01'
  AND billdate.next <= '1998-01-05'
  AND account.status <> 'C'
  AND billdate.accnum = account.accnum
  AND account.custnum = customers.custnum
  AND account.acctype = acctype.accid
---[ CUT ]---

> > My only solution was to create a hybrid table that contains the join of
> > all of the tables I'm searching on, with multi-keyed indexes. This is a
> > VERY kludgy solution that makes the keys to my DB hard to change.
>
> The solution I use may be the worst imaginable kludge, but it works
> great: I do the joins (as well as other set operations) on the client
> side. Perl hashes are very good for that.

Try making a view; there's a sketch in the P.S. below. From what others
have said, views save the query plan and have usually, at least for me,
been a lot faster than normal queries.

---
Howie <caffeine@toodarkpark.org>  URL: http://www.toodarkpark.org
"Oh my god, they killed init! YOU BASTARDS!"
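P.S. Using pierre's tables as the example (the view name is invented),
I mean something like:

    CREATE VIEW photoshop_prods AS
        SELECT p.prod_name
        FROM prod p, pos o, pcat c, pkey k
        WHERE p.prod_id = o.pos_prod_id
          AND o.pos_os_id = 2
          AND o.pos_prod_id = c.pcat_prod_id
          AND c.pcat_cat_id = 6
          AND c.pcat_prod_id = k.pkey_prod_id
          AND k.pkey_keyword = 'photoshop';

    SELECT prod_name FROM photoshop_prods;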
> Try making a view. From what others have said, views save the query
> plan and have usually, at least for me, been a lot faster than normal
> queries.

Views don't save query plans. They are parsetree templates in query
rewrite rules, so a query against a view is rewritten and then planned
like any other query (see the sketch below my signature).

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #
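P.S. A minimal sketch of what CREATE VIEW actually does behind the
scenes (table, view, and rule names invented):

    -- a view is an empty relation ...
    CREATE TABLE myview (a int4);

    -- ... plus an ON SELECT DO INSTEAD rewrite rule
    CREATE RULE "_RETmyview" AS ON SELECT TO myview DO INSTEAD
        SELECT a FROM mytab;

No plan is stored anywhere; the rule just splices the view's parsetree
into the user's query before the planner ever sees it.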