Thread: Joining bug????
I've been attempting to get my DB tuned to handle joins as well as
possible. However, no matter which way I try to perform the joins, if I
attempt to join more than two or three tables the joins become
unbearably slow, no matter how many indexes I create in however many
ways.

My only solution was to create a hybrid table that contains the join of
all of the tables I'm searching on, with multi-keyed indexes. This is a
VERY kludgy solution that makes the keys to my DB hard to change.

Is there a problem with postgresql in performing joins? Is there a fix?
What gives?

-=pierre
> I've been attempting to get my DB tuned to handle joins as well as
> possible. However, no matter which way I try to perform the joins, if I
> attempt to join more than two or three tables the joins become
> unbearably slow, no matter how many indexes I create in however many ways.

Ditto. I've never seen a quadruple join succeed, even with relatively
small tables.

> My only solution was to create a hybrid table that contains the join of
> all of the tables I'm searching on, with multi-keyed indexes. This is a
> VERY kludgy solution that makes the keys to my DB hard to change.

The solution I use may be the worst imaginable kludge, but it works
great: I do the joins (as well as other set operations) on the client
side. Perl hashes are very good for that.

--Gene
pierre wrote:
> I've been attempting to get my DB tuned to handle joins as well as
> possible. However, no matter which way I try to perform the joins, if I
> attempt to join more than two or three tables the joins become
> unbearably slow, no matter how many indexes I create in however many ways.
>
> My only solution was to create a hybrid table that contains the join of
> all of the tables I'm searching on, with multi-keyed indexes. This is a
> VERY kludgy solution that makes the keys to my DB hard to change.
>
> Is there a problem with postgresql in performing joins? Is there a fix?
> What gives?

There are many reasons for a query to take more time than expected.
PostgreSQL can do reasonably well joining many tables. It is impossible
for anyone to even guess without more specifics.

Try submitting the specific query and the result of an EXPLAIN. Example:

    EXPLAIN SELECT foo FROM bar;

Table sizes, indices, and time elapsed are also helpful.

BTW, in case you don't know, the "VACUUM ANALYZE" statement needs to be
run occasionally to keep the query planner's statistics current.
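It is a one-liner; you can run it for the whole database or, borrowing
one of your table names as an example, for a single table:

    -- refresh planner statistics for every table in the database
    VACUUM ANALYZE;

    -- or for just one table
    VACUUM ANALYZE prod;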
> pierre wrote:
> > I've been attempting to get my DB tuned to handle joins as well as
> > possible. However, no matter which way I try to perform the joins, if I
> > attempt to join more than two or three tables the joins become
> > unbearably slow, no matter how many indexes I create in however many ways.
> >
> > My only solution was to create a hybrid table that contains the join of
> > all of the tables I'm searching on, with multi-keyed indexes. This is a
> > VERY kludgy solution that makes the keys to my DB hard to change.
> >
> > Is there a problem with postgresql in performing joins? Is there a fix?
> > What gives?
>
> There are many reasons for a query to take more time than expected.
> PostgreSQL can do reasonably well joining many tables. It is impossible
> for anyone to even guess without more specifics.
>
> Try submitting the specific query and the result of an EXPLAIN. Example:
>
>     EXPLAIN SELECT foo FROM bar;
>
> Table sizes, indices, and time elapsed are also helpful.

This query takes about 30 seconds...way too long for my needs. I've got
the query down to 2-5 seconds by using a hybrid table that is the join
between the pos and pcat tables. All _id fields are indexed, as is the
keyword field.

explain select p.prod_name from prod p, pos o, pcat c, pkey k
    where p.prod_id = o.pos_prod_id and
          o.pos_os_id = 2 and
          o.pos_prod_id = c.pcat_prod_id and
          c.pcat_cat_id = 6 and
          c.pcat_prod_id = k.pkey_prod_id and
          k.pkey_keyword = 'photoshop';
NOTICE:  QUERY PLAN:

Nested Loop  (cost=8.10 size=2 width=28)
  ->  Nested Loop  (cost=6.05 size=1 width=12)
        ->  Nested Loop  (cost=4.05 size=1 width=8)
              ->  Index Scan on o  (cost=2.05 size=1 width=4)
              ->  Index Scan on c  (cost=2.00 size=2 width=4)
        ->  Index Scan on k  (cost=2.00 size=2 width=4)
  ->  Index Scan on p  (cost=2.05 size=85442 width=16)

select count(*) from prod;
 count
-------
 85443
(1 row)

select count(*) from pos;
 count
--------
 132564
(1 row)

select count(*) from pcat;
 count
--------
 337251
(1 row)

select count(*) from pkey;
 count
--------
 492550
(1 row)

> BTW, in case you don't know, the "VACUUM ANALYZE" statement needs to be
> run occasionally to keep the query planner's statistics current.

All data is newly imported and indexed, so I shouldn't have to perform
this. The web site that is using this DB is still in debug mode.

Thanks for taking a look.

-=pierre
Nice query. You mentioned having indexes on id columns etc. Are the
indices single-column or multi-column? You may want to consider making
some of them multi-column. The EXPLAIN does indicate usage of your
indices, but it may not be optimal. If the distribution of your data in
the indices is such that each index scan must go through large chunks
of data to qualify the other attributes in the join, it will run
slowly. By adding some well-placed multi-part indices, the index scans
will be narrower in their scan sets. Or not.

Also, I could be wrong, but I have found that even batch-loaded data
needs a VACUUM ANALYZE to gather distribution statistics.

pierre@desertmoon.com wrote:
> This query takes about 30 seconds...way too long for my needs. I've got
> the query down to 2-5 seconds by using a hybrid table that is the join
> between the pos and pcat tables. All _id fields are indexed, as is the
> keyword field.
>
> [query, EXPLAIN output, and table counts snipped]
>
> All data is newly imported and indexed, so I shouldn't have to perform
> this. The web site that is using this DB is still in debug mode.
> Nice query. You mentioned having indexes on id columns etc. Are the
> indices single-column or multi-column? You may want to consider making
> some of them multi-column. The EXPLAIN does indicate usage of your
> indices, but it may not be optimal. If the distribution of your data in
> the indices is such that each index scan must go through large chunks
> of data to qualify the other attributes in the join, it will run
> slowly. By adding some well-placed multi-part indices, the index scans
> will be narrower in their scan sets. Or not.

My apologies, I should have been a bit more specific. The indices are
multi-column, with the _id column first and the prod_id column second.
Also, a search for just the keyword 'photoshop' returns only 35 or so
rows. I was under the impression that postgresql used the bottom-up
method of optimization, i.e. the narrowest criterion should be declared
last. Why doesn't the EXPLAIN show the keyword being used first?

> Also, I could be wrong, but I have found that even batch-loaded data
> needs a VACUUM ANALYZE to gather distribution statistics.

I'll try just about anything at this point....oh, and most of these
tables have been clustered by their major indexes.

-=pierre
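P.S. In case it helps, this is how I checked the keyword selectivity,
and (with a made-up index name) the sort of clustering I mean:

    -- only ~35 of the ~492k pkey rows match the keyword
    select count(*) from pkey where pkey_keyword = 'photoshop';

    -- each table clustered on its major index, e.g.
    cluster pkey_keyword_idx on pkey;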
pierre@desertmoon.com wrote:
> This query takes about 30 seconds...way too long for my needs. I've got
> the query down to 2-5 seconds by using a hybrid table that is the join
> between the pos and pcat tables. All _id fields are indexed, as is the
> keyword field.
>
> [query, EXPLAIN output, and table counts snipped]

I would try the following indices:

    prod - (prod_id)
    pos  - (pos_os_id, pos_prod_id)
    pcat - (pcat_prod_id, pcat_cat_id)
    pkey - (pkey_keyword, pkey_prod_id)

Various permutations of column order may also affect the outcome.
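Spelled out as DDL, that would be something like the following (the
index names are just placeholders, pick your own):

    CREATE INDEX prod_id_idx       ON prod (prod_id);
    CREATE INDEX pos_os_prod_idx   ON pos  (pos_os_id, pos_prod_id);
    CREATE INDEX pcat_prod_cat_idx ON pcat (pcat_prod_id, pcat_cat_id);
    CREATE INDEX pkey_kw_prod_idx  ON pkey (pkey_keyword, pkey_prod_id);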
On Mon, 26 Oct 1998, Gene Selkov Jr. wrote:

> > I've been attempting to get my DB tuned to handle joins as well as
> > possible. However, no matter which way I try to perform the joins, if I
> > attempt to join more than two or three tables the joins become
> > unbearably slow, no matter how many indexes I create in however many ways.
>
> Ditto. I've never seen a quadruple join succeed, even with relatively
> small tables.

I've been playing with the idea of moving a client from MySQL to
Postgres (views, triggers, and subselects would _really_ make my life
easier).

---[ CUT ]---
SELECT account.accnum, account.accname,
       ((customers.firstname || ' ') || customers.lastname) AS Customer,
       acctype.descr, account.balance, account.status, billdate.next
FROM account, customers, acctype, billdate
WHERE account.custnum = customers.custnum
  AND account.acctype = acctype.accid
  AND account.accnum = billdate.accnum
  AND account.accnum < 20
ORDER BY account.accnum
---[ CUT ]---

works perfectly and takes about 2 seconds to complete; granted, I have
(sometimes unique) indexes on all the join columns. account has 7k
rows, customers has 5k rows, acctype has 12 rows, billdate has 7k rows.

A somewhat modified, real-world query ends up crashing postgres,
though:

---[ CUT ]---
SELECT account.accnum, account.accname, account.totalfee, billdate.next,
       ((customers.firstname || ' ') || customers.lastname) AS Customer,
       customers.company, customers.street, acctype.yearly
FROM account, customers, acctype, billdate
WHERE account.totalfee > 0.00
  AND billtype = 1
  OR (billcc1stmo = 1 AND account.created = billdate.last)
  AND balance <> 0.00
  AND billdate.next >= '1998-01-01'
  AND billdate.next <= '1998-01-05'
  AND account.status <> 'C'
  AND billdate.accnum = account.accnum
  AND account.custnum = customers.custnum
  AND account.acctype = acctype.accid
---[ CUT ]---

> > My only solution was to create a hybrid table that contains the join of
> > all of the tables I'm searching on, with multi-keyed indexes. This is a
> > VERY kludgy solution that makes the keys to my DB hard to change.
>
> The solution I use may be the worst imaginable kludge, but it works
> great: I do the joins (as well as other set operations) on the client
> side. Perl hashes are very good for that.

Try making a view; there's a sketch in the P.S. below. From what others
have said, views save the query plan and have usually, at least for me,
been a lot faster than normal queries.

---
Howie <caffeine@toodarkpark.org>  URL: http://www.toodarkpark.org
"Oh my god, they killed init! YOU BASTARDS!"
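P.S. Using pierre's tables as the example (the view name is invented),
I mean something like:

    CREATE VIEW photoshop_prods AS
        SELECT p.prod_name
        FROM prod p, pos o, pcat c, pkey k
        WHERE p.prod_id = o.pos_prod_id
          AND o.pos_os_id = 2
          AND o.pos_prod_id = c.pcat_prod_id
          AND c.pcat_cat_id = 6
          AND c.pcat_prod_id = k.pkey_prod_id
          AND k.pkey_keyword = 'photoshop';

    SELECT prod_name FROM photoshop_prods;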
> Try making a view. From what others have said, views save the query
> plan and have usually, at least for me, been a lot faster than normal
> queries.

Views don't save query plans. They are parsetree templates in query
rewrite rules, so a query against a view is rewritten and then planned
like any other query (see the sketch below my signature).

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #
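P.S. A minimal sketch of what CREATE VIEW actually does behind the
scenes (table, view, and rule names invented):

    -- a view is an empty relation ...
    CREATE TABLE myview (a int4);

    -- ... plus an ON SELECT DO INSTEAD rewrite rule
    CREATE RULE "_RETmyview" AS ON SELECT TO myview DO INSTEAD
        SELECT a FROM mytab;

No plan is stored anywhere; the rule just splices the view's parsetree
into the user's query before the planner ever sees it.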