Thread: JOIN vs. LEFT JOIN
Hi !
I heard that it is always better to use a full JOIN rather than a LEFT JOIN, for performance reasons. Is that correct ?
But it's barely possible to use full JOINS all the time, since most of the lookup fields are not required.
Example : in the customer table, the customer type is a looup field to a customer_type table. But that is not required to post the record. So I was thinking to create a record in the customer_type table with ID = -1. And in case the customer type is NULL, to assign it the value -1. That way, it will be always possible to do a full JOIN. I was wondering if that is good practice or rather nonsense ?
Many thanks in advance,
Nico
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Nico Callewaert schrieb: > Hi ! > > I heard that it is always better to use a full JOIN rather than a LEFT > JOIN, for performance reasons. Is that correct ? please note that a JOIN and a LEFT JOIN are tow different things. Let's assume you have two tables (without any sens): books: headline | content - ----------+------------- politics | russia politics | middle east politics | germany interests: content - --------- germany Then fire two select statements like this: #1: SELECT a.headline,a.content,b.content as contentb FROM books a JOIN interests b ON a.content = b.content; headline | content | contentb - ----------+---------+---------- politics | germany | germany (1 row) #2: SELECT a.headline,a.content,b.content as contentb FROM books a LEFT JOIN interests b on a.content = b.content; headline | content | contentb - ----------+-------------+---------- politics | russia | politics | middle east | politics | germany | germany (3 rows) > But it's barely possible to use full JOINS all the time, since most of > the lookup fields are not required. > Example : in the customer table, the customer type is a looup field to a > customer_type table. But that is not required to post the record. So I > was thinking to create a record in the customer_type table with ID = > -1. And in case the customer type is NULL, to assign it the value -1. > That way, it will be always possible to do a full JOIN. I was wondering > if that is good practice or rather nonsense ? Hm concerning the functionality of LEFT JOIN I do not really understand what you wanna do here. You created the customer_type table to have the possibility to give more types to one customer (1:n). But therefore you do not need a LEFT JOIN. The statement could be like: SELECT name a FROM customer a, cutomer_type b WHERE a.id = b.id AND b.type = 'super customer' Or not? Cheers Andy - -- St.Pauli - Hamburg - Germany Andreas Wenk > Many thanks in advance, > Nico -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFJgD/9Va7znmSP9AwRAsyDAKC8utO2Agy0ONULuy7nIgz9pG/7rgCfa/li o98EaJSKGqkv2brcd0RcI04= =de2X -----END PGP SIGNATURE-----
> -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Nico Callewaert schrieb: >> Hi ! >> >> I heard that it is always better to use a full JOIN rather than a LEFT >> JOIN, for performance reasons. Is that correct ? > > please note that a JOIN and a LEFT JOIN are tow different things. Let's > assume you have > two tables (without any sens): > > books: > > headline | content > - ----------+------------- > politics | russia > politics | middle east > politics | germany > > interests: > > content > - --------- > germany > > Then fire two select statements like this: > > #1: > SELECT a.headline,a.content,b.content as contentb > FROM books a > JOIN interests b ON a.content = b.content; > > headline | content | contentb > - ----------+---------+---------- > politics | germany | germany > (1 row) > > #2: > SELECT a.headline,a.content,b.content as contentb > FROM books a > LEFT JOIN interests b on a.content = b.content; > headline | content | contentb > - ----------+-------------+---------- > politics | russia | > politics | middle east | > politics | germany | germany > (3 rows) > >> But it's barely possible to use full JOINS all the time, since most of >> the lookup fields are not required. >> Example : in the customer table, the customer type is a looup field to a >> customer_type table. But that is not required to post the record. So I >> was thinking to create a record in the customer_type table with ID = >> -1. And in case the customer type is NULL, to assign it the value -1. >> That way, it will be always possible to do a full JOIN. I was wondering >> if that is good practice or rather nonsense ? > > Hm concerning the functionality of LEFT JOIN I do not really understand > what you wanna do > here. You created the customer_type table to have the possibility to give > more types to > one customer (1:n). But therefore you do not need a LEFT JOIN. The > statement could be like: > > SELECT name a > FROM customer a, cutomer_type b > WHERE a.id = b.id > AND b.type = 'super customer' > > Or not? > > Cheers > > Andy Hi Andreas, Thanks for the reply. I know the difference between JOIN and LEFT JOIN. The question was just if there is a performance hit between the 2 ? I never use constructs like this : SELECT name a > FROM customer a, cutomer_type b > WHERE a.id = b.id > AND b.type = 'super customer' Always Joins. I will try to clarify. The current situation is : the customer_type has no value, so = NULL, no problem with LEFT JOIN. But I'm wondering with tables that has thousands of records, if LEFT JOINS are performing well ? Thanks again, Nico
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Nico Callewaert schrieb: >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> Nico Callewaert schrieb: >>> Hi ! >>> >>> I heard that it is always better to use a full JOIN rather than a LEFT >>> JOIN, for performance reasons. Is that correct ? >> >> please note that a JOIN and a LEFT JOIN are tow different things. >> Let's assume you have >> two tables (without any sens): >> >> books: >> >> headline | content >> - ----------+------------- >> politics | russia >> politics | middle east >> politics | germany >> >> interests: >> >> content >> - --------- >> germany >> >> Then fire two select statements like this: >> >> #1: >> SELECT a.headline,a.content,b.content as contentb >> FROM books a >> JOIN interests b ON a.content = b.content; >> >> headline | content | contentb >> - ----------+---------+---------- >> politics | germany | germany >> (1 row) >> >> #2: >> SELECT a.headline,a.content,b.content as contentb >> FROM books a >> LEFT JOIN interests b on a.content = b.content; >> headline | content | contentb >> - ----------+-------------+---------- >> politics | russia | >> politics | middle east | >> politics | germany | germany >> (3 rows) >> >>> But it's barely possible to use full JOINS all the time, since most of >>> the lookup fields are not required. >>> Example : in the customer table, the customer type is a looup field to a >>> customer_type table. But that is not required to post the record. So I >>> was thinking to create a record in the customer_type table with ID = >>> -1. And in case the customer type is NULL, to assign it the value -1. >>> That way, it will be always possible to do a full JOIN. I was wondering >>> if that is good practice or rather nonsense ? >> >> Hm concerning the functionality of LEFT JOIN I do not really >> understand what you wanna do >> here. You created the customer_type table to have the possibility to >> give more types to >> one customer (1:n). But therefore you do not need a LEFT JOIN. The >> statement could be like: >> >> SELECT name a >> FROM customer a, cutomer_type b >> WHERE a.id = b.id >> AND b.type = 'super customer' >> >> Or not? >> >> Cheers >> >> Andy > > > Hi Andreas, > > Thanks for the reply. I know the difference between JOIN and LEFT JOIN. > The question was just if there is a performance hit between the 2 ? > > I never use constructs like this : SELECT name a >> FROM customer a, cutomer_type b >> WHERE a.id = b.id >> AND b.type = 'super customer' > > Always Joins. > I will try to clarify. The current situation is : the customer_type has > no value, so = NULL, no problem with LEFT JOIN. Why do you need an empty entry in this table? Your application should take care, that this does not happen ... > But I'm wondering with tables that has thousands of records, if LEFT > JOINS are performing well ? EXPLAIN ANALYZE with my examples: #1 test=# explain analyze select a.headline,a.content,b.content as contentb from books a join interests b on a.content = b.content; QUERY PLAN - ------------------------------------------------------------------------------------------------------------------ Hash Join (cost=10.88..23.67 rows=70 width=1548) (actual time=0.051..0.058 rows=1 loops=1) Hash Cond: (("outer".content)::text = ("inner".content)::text) -> Seq Scan on interests b (cost=0.00..11.40 rows=140 width=516) (actual time=0.007..0.010 rows=1 loops=1) -> Hash (cost=10.70..10.70 rows=70 width=1032) (actual time=0.026..0.026 rows=3 loops=1) -> Seq Scan on books a (cost=0.00..10.70 rows=70 width=1032) (actual time=0.004..0.013 rows=3 loops=1) Total runtime: 0.094 ms (6 rows) #2 test=# explain analyze select a.headline,a.content,b.content as contentb from books a left join interests b on a.content = b.content; QUERY PLAN - ---------------------------------------------------------------------------------------------------------------------- Hash Left Join (cost=11.75..23.72 rows=70 width=1548) (actual time=0.038..0.062 rows=3 loops=1) Hash Cond: (("outer".content)::text = ("inner".content)::text) -> Seq Scan on books a (cost=0.00..10.70 rows=70 width=1032) (actual time=0.007..0.014 rows=3 loops=1) -> Hash (cost=11.40..11.40 rows=140 width=516) (actual time=0.015..0.015 rows=1 loops=1) -> Seq Scan on interests b (cost=0.00..11.40 rows=140 width=516) (actual time=0.004..0.007 rows=1 loops=1) Total runtime: 0.102 ms (6 rows) I fired each statement ten times. But seriously - this is more or less just a hint that a LEFT JOIN is less performant tahn a JOIN. To get that exactly it depend on which execution plan the planner is using. So the best way ist to use EXPALIN ANALYZE with your statements. Sorry for the big post with much output ... Cheers Andy > Thanks again, Nico > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFJgEWwVa7znmSP9AwRAi6NAKDdU4+b+CRzEUg8WsfN2M2P9CVl0wCg3iJn 14HzK+4/IHfdcVfqY4Jbf1g= =pNKm -----END PGP SIGNATURE-----
>>> Nico Callewaert schrieb: >>>> Hi ! >>>> >>>> I heard that it is always better to use a full JOIN rather than a LEFT >>>> JOIN, for performance reasons. Is that correct ? >>> >>> please note that a JOIN and a LEFT JOIN are tow different things. >>> Let's assume you have >>> two tables (without any sens): >>> >>> books: >>> >>> headline | content >>> - ----------+------------- >>> politics | russia >>> politics | middle east >>> politics | germany >>> >>> interests: >>> >>> content >>> - --------- >>> germany >>> >>> Then fire two select statements like this: >>> >>> #1: >>> SELECT a.headline,a.content,b.content as contentb >>> FROM books a >>> JOIN interests b ON a.content = b.content; >>> >>> headline | content | contentb >>> - ----------+---------+---------- >>> politics | germany | germany >>> (1 row) >>> >>> #2: >>> SELECT a.headline,a.content,b.content as contentb >>> FROM books a >>> LEFT JOIN interests b on a.content = b.content; >>> headline | content | contentb >>> - ----------+-------------+---------- >>> politics | russia | >>> politics | middle east | >>> politics | germany | germany >>> (3 rows) >>> >>>> But it's barely possible to use full JOINS all the time, since most of >>>> the lookup fields are not required. >>>> Example : in the customer table, the customer type is a looup field to >>>> a >>>> customer_type table. But that is not required to post the record. So >>>> I >>>> was thinking to create a record in the customer_type table with ID = >>>> -1. And in case the customer type is NULL, to assign it the value -1. >>>> That way, it will be always possible to do a full JOIN. I was >>>> wondering >>>> if that is good practice or rather nonsense ? >>> >>> Hm concerning the functionality of LEFT JOIN I do not really >>> understand what you wanna do >>> here. You created the customer_type table to have the possibility to >>> give more types to >>> one customer (1:n). But therefore you do not need a LEFT JOIN. The >>> statement could be like: >>> >>> SELECT name a >>> FROM customer a, cutomer_type b >>> WHERE a.id = b.id >>> AND b.type = 'super customer' >>> >>> Or not? >>> >>> Cheers >>> >>> Andy >> >> >> Hi Andreas, >> >> Thanks for the reply. I know the difference between JOIN and LEFT JOIN. >> The question was just if there is a performance hit between the 2 ? >> >> I never use constructs like this : SELECT name a >>> FROM customer a, cutomer_type b >>> WHERE a.id = b.id >>> AND b.type = 'super customer' >> >> Always Joins. >> I will try to clarify. The current situation is : the customer_type has >> no value, so = NULL, no problem with LEFT JOIN. > > Why do you need an empty entry in this table? Your application should take > care, that this > does not happen ... > >> But I'm wondering with tables that has thousands of records, if LEFT >> JOINS are performing well ? > > EXPLAIN ANALYZE with my examples: > > #1 > test=# explain analyze select a.headline,a.content,b.content as contentb > from books a join > interests b on a.content = b.content; > QUERY PLAN > > - > ------------------------------------------------------------------------------------------------------------------ > Hash Join (cost=10.88..23.67 rows=70 width=1548) (actual > time=0.051..0.058 rows=1 loops=1) > Hash Cond: (("outer".content)::text = ("inner".content)::text) > -> Seq Scan on interests b (cost=0.00..11.40 rows=140 width=516) > (actual > time=0.007..0.010 rows=1 loops=1) > -> Hash (cost=10.70..10.70 rows=70 width=1032) (actual > time=0.026..0.026 rows=3 loops=1) > -> Seq Scan on books a (cost=0.00..10.70 rows=70 width=1032) > (actual > time=0.004..0.013 rows=3 loops=1) > Total runtime: 0.094 ms > (6 rows) > > > #2 > test=# explain analyze select a.headline,a.content,b.content as contentb > from books a > left join interests b on a.content = b.content; > QUERY PLAN > > - > ---------------------------------------------------------------------------------------------------------------------- > Hash Left Join (cost=11.75..23.72 rows=70 width=1548) (actual > time=0.038..0.062 rows=3 > loops=1) > Hash Cond: (("outer".content)::text = ("inner".content)::text) > -> Seq Scan on books a (cost=0.00..10.70 rows=70 width=1032) (actual > time=0.007..0.014 rows=3 loops=1) > -> Hash (cost=11.40..11.40 rows=140 width=516) (actual > time=0.015..0.015 rows=1 loops=1) > -> Seq Scan on interests b (cost=0.00..11.40 rows=140 width=516) > (actual > time=0.004..0.007 rows=1 loops=1) > Total runtime: 0.102 ms > (6 rows) > > I fired each statement ten times. But seriously - this is more or less > just a hint that a > LEFT JOIN is less performant tahn a JOIN. To get that exactly it depend on > which execution > plan the planner is using. So the best way ist to use EXPALIN ANALYZE with > your statements. > > Sorry for the big post with much output ... > > Cheers > > Andy > Hi Andy, Thanks again for the fast reply ! > Why do you need an empty entry in this table? Your application should take > care, that this > does not happen ... That is the whole point of my doubts. When a user creates a new customer, it's not an obligation to enter a customer_type, and still many other fields, like tax, payment, yellow pages, payment method, etc... related things. So I always have lots of LEFT JOIN's in my queries, because of the NULL fields. You said the application should handle it. So you mean, even the field is not obligatory, you would put a value on it ? All lookup tables are of course linked through foreign keys to enforce data integrity. So what value would you enter ? Like 0, or -1 ?? And make sure there is a 0, or -1 record in the lookup table ? Otherwise the foreign key will complain. I think I almost get your point, just need a last little push :-) Thanks a lot ! Nico
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Nico Callewaert schrieb: >>>> Nico Callewaert schrieb: >>>>> Hi ! >>>>> >>>>> I heard that it is always better to use a full JOIN rather than a LEFT >>>>> JOIN, for performance reasons. Is that correct ? >>>> >>>> please note that a JOIN and a LEFT JOIN are tow different things. >>>> Let's assume you have >>>> two tables (without any sens): >>>> >>>> books: >>>> >>>> headline | content >>>> - ----------+------------- >>>> politics | russia >>>> politics | middle east >>>> politics | germany >>>> >>>> interests: >>>> >>>> content >>>> - --------- >>>> germany >>>> >>>> Then fire two select statements like this: >>>> >>>> #1: >>>> SELECT a.headline,a.content,b.content as contentb >>>> FROM books a >>>> JOIN interests b ON a.content = b.content; >>>> >>>> headline | content | contentb >>>> - ----------+---------+---------- >>>> politics | germany | germany >>>> (1 row) >>>> >>>> #2: >>>> SELECT a.headline,a.content,b.content as contentb >>>> FROM books a >>>> LEFT JOIN interests b on a.content = b.content; >>>> headline | content | contentb >>>> - ----------+-------------+---------- >>>> politics | russia | >>>> politics | middle east | >>>> politics | germany | germany >>>> (3 rows) >>>> >>>>> But it's barely possible to use full JOINS all the time, since most of >>>>> the lookup fields are not required. >>>>> Example : in the customer table, the customer type is a looup field >>>>> to a >>>>> customer_type table. But that is not required to post the record. >>>>> So I >>>>> was thinking to create a record in the customer_type table with ID = >>>>> -1. And in case the customer type is NULL, to assign it the value -1. >>>>> That way, it will be always possible to do a full JOIN. I was >>>>> wondering >>>>> if that is good practice or rather nonsense ? >>>> >>>> Hm concerning the functionality of LEFT JOIN I do not really >>>> understand what you wanna do >>>> here. You created the customer_type table to have the possibility to >>>> give more types to >>>> one customer (1:n). But therefore you do not need a LEFT JOIN. The >>>> statement could be like: >>>> >>>> SELECT name a >>>> FROM customer a, cutomer_type b >>>> WHERE a.id = b.id >>>> AND b.type = 'super customer' >>>> >>>> Or not? >>>> >>>> Cheers >>>> >>>> Andy >>> >>> >>> Hi Andreas, >>> >>> Thanks for the reply. I know the difference between JOIN and LEFT JOIN. >>> The question was just if there is a performance hit between the 2 ? >>> >>> I never use constructs like this : SELECT name a >>>> FROM customer a, cutomer_type b >>>> WHERE a.id = b.id >>>> AND b.type = 'super customer' >>> >>> Always Joins. >>> I will try to clarify. The current situation is : the customer_type has >>> no value, so = NULL, no problem with LEFT JOIN. >> >> Why do you need an empty entry in this table? Your application should >> take care, that this >> does not happen ... >> >>> But I'm wondering with tables that has thousands of records, if LEFT >>> JOINS are performing well ? >> >> EXPLAIN ANALYZE with my examples: >> >> #1 >> test=# explain analyze select a.headline,a.content,b.content as >> contentb from books a join >> interests b on a.content = b.content; >> QUERY PLAN >> >> - >> ------------------------------------------------------------------------------------------------------------------ >> >> Hash Join (cost=10.88..23.67 rows=70 width=1548) (actual >> time=0.051..0.058 rows=1 loops=1) >> Hash Cond: (("outer".content)::text = ("inner".content)::text) >> -> Seq Scan on interests b (cost=0.00..11.40 rows=140 width=516) >> (actual >> time=0.007..0.010 rows=1 loops=1) >> -> Hash (cost=10.70..10.70 rows=70 width=1032) (actual >> time=0.026..0.026 rows=3 loops=1) >> -> Seq Scan on books a (cost=0.00..10.70 rows=70 width=1032) >> (actual >> time=0.004..0.013 rows=3 loops=1) >> Total runtime: 0.094 ms >> (6 rows) >> >> >> #2 >> test=# explain analyze select a.headline,a.content,b.content as >> contentb from books a >> left join interests b on a.content = b.content; >> QUERY PLAN >> >> - >> ---------------------------------------------------------------------------------------------------------------------- >> >> Hash Left Join (cost=11.75..23.72 rows=70 width=1548) (actual >> time=0.038..0.062 rows=3 >> loops=1) >> Hash Cond: (("outer".content)::text = ("inner".content)::text) >> -> Seq Scan on books a (cost=0.00..10.70 rows=70 width=1032) (actual >> time=0.007..0.014 rows=3 loops=1) >> -> Hash (cost=11.40..11.40 rows=140 width=516) (actual >> time=0.015..0.015 rows=1 loops=1) >> -> Seq Scan on interests b (cost=0.00..11.40 rows=140 >> width=516) (actual >> time=0.004..0.007 rows=1 loops=1) >> Total runtime: 0.102 ms >> (6 rows) >> >> I fired each statement ten times. But seriously - this is more or less >> just a hint that a >> LEFT JOIN is less performant tahn a JOIN. To get that exactly it >> depend on which execution >> plan the planner is using. So the best way ist to use EXPALIN ANALYZE >> with your statements. >> >> Sorry for the big post with much output ... >> >> Cheers >> >> Andy >> > > > Hi Andy, > > Thanks again for the fast reply ! > >> Why do you need an empty entry in this table? Your application should >> take care, that this >> does not happen ... > > That is the whole point of my doubts. When a user creates a new > customer, it's not an obligation to enter a customer_type, and still > many other fields, like tax, payment, yellow pages, payment method, > etc... related things. So I always have lots of LEFT JOIN's in my > queries, because of the NULL fields. You said the application should > handle it. So you mean, even the field is not obligatory, you would put > a value on it ? All lookup tables are of course linked through foreign > keys to enforce data integrity. So what value would you enter ? Like 0, > or -1 ?? And make sure there is a 0, or -1 record in the lookup table > ? Otherwise the foreign key will complain. > I think I almost get your point, just need a last little push :-) Ah ok - now we are more in database/application design ;-). Here's another question: are you really sure that the databse design is correct? Think about having all the info like customer_type, tax, payment, yellow pages, payment method in the customer table also. Menas - do you really need for these info more than one entry for one customer? So are you sure you need the 1:n relation in all cases? I think not, so that would mean you put the info about e.g tax, payment method and so on in the customer table as well and not in one (or more ...? you were writing "all referenced tables") referenced tables. As a result you will have less LEFT JOINS ... Cheers Andy > Thanks a lot ! > Nico > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFJgEvGVa7znmSP9AwRAtaZAJ4tRyx5hQ+/f6a1itFJjQygShXAGwCfai7a 7NpGhS3tegajf/i+NZ6VxF8= =Dtb9 -----END PGP SIGNATURE-----
>>>>> Nico Callewaert schrieb: >>>>>> Hi ! >>>>>> >>>>>> I heard that it is always better to use a full JOIN rather than a >>>>>> LEFT >>>>>> JOIN, for performance reasons. Is that correct ? >>>>> >>>>> please note that a JOIN and a LEFT JOIN are tow different things. >>>>> Let's assume you have >>>>> two tables (without any sens): >>>>> >>>>> books: >>>>> >>>>> headline | content >>>>> - ----------+------------- >>>>> politics | russia >>>>> politics | middle east >>>>> politics | germany >>>>> >>>>> interests: >>>>> >>>>> content >>>>> - --------- >>>>> germany >>>>> >>>>> Then fire two select statements like this: >>>>> >>>>> #1: >>>>> SELECT a.headline,a.content,b.content as contentb >>>>> FROM books a >>>>> JOIN interests b ON a.content = b.content; >>>>> >>>>> headline | content | contentb >>>>> - ----------+---------+---------- >>>>> politics | germany | germany >>>>> (1 row) >>>>> >>>>> #2: >>>>> SELECT a.headline,a.content,b.content as contentb >>>>> FROM books a >>>>> LEFT JOIN interests b on a.content = b.content; >>>>> headline | content | contentb >>>>> - ----------+-------------+---------- >>>>> politics | russia | >>>>> politics | middle east | >>>>> politics | germany | germany >>>>> (3 rows) >>>>> >>>>>> But it's barely possible to use full JOINS all the time, since most >>>>>> of >>>>>> the lookup fields are not required. >>>>>> Example : in the customer table, the customer type is a looup field >>>>>> to a >>>>>> customer_type table. But that is not required to post the record. >>>>>> So I >>>>>> was thinking to create a record in the customer_type table with ID = >>>>>> -1. And in case the customer type is NULL, to assign it the >>>>>> value -1. >>>>>> That way, it will be always possible to do a full JOIN. I was >>>>>> wondering >>>>>> if that is good practice or rather nonsense ? >>>>> >>>>> Hm concerning the functionality of LEFT JOIN I do not really >>>>> understand what you wanna do >>>>> here. You created the customer_type table to have the possibility to >>>>> give more types to >>>>> one customer (1:n). But therefore you do not need a LEFT JOIN. The >>>>> statement could be like: >>>>> >>>>> SELECT name a >>>>> FROM customer a, cutomer_type b >>>>> WHERE a.id = b.id >>>>> AND b.type = 'super customer' >>>>> >>>>> Or not? >>>>> >>>>> Cheers >>>>> >>>>> Andy >>>> >>>> >>>> Hi Andreas, >>>> >>>> Thanks for the reply. I know the difference between JOIN and LEFT >>>> JOIN. >>>> The question was just if there is a performance hit between the 2 ? >>>> >>>> I never use constructs like this : SELECT name a >>>>> FROM customer a, cutomer_type b >>>>> WHERE a.id = b.id >>>>> AND b.type = 'super customer' >>>> >>>> Always Joins. >>>> I will try to clarify. The current situation is : the customer_type >>>> has >>>> no value, so = NULL, no problem with LEFT JOIN. >>> >>> Why do you need an empty entry in this table? Your application should >>> take care, that this >>> does not happen ... >>> >>>> But I'm wondering with tables that has thousands of records, if LEFT >>>> JOINS are performing well ? >>> >>> EXPLAIN ANALYZE with my examples: >>> >>> #1 >>> test=# explain analyze select a.headline,a.content,b.content as >>> contentb from books a join >>> interests b on a.content = b.content; >>> QUERY PLAN >>> >>> - >>> ------------------------------------------------------------------------------------------------------------------ >>> >>> Hash Join (cost=10.88..23.67 rows=70 width=1548) (actual >>> time=0.051..0.058 rows=1 loops=1) >>> Hash Cond: (("outer".content)::text = ("inner".content)::text) >>> -> Seq Scan on interests b (cost=0.00..11.40 rows=140 width=516) >>> (actual >>> time=0.007..0.010 rows=1 loops=1) >>> -> Hash (cost=10.70..10.70 rows=70 width=1032) (actual >>> time=0.026..0.026 rows=3 loops=1) >>> -> Seq Scan on books a (cost=0.00..10.70 rows=70 width=1032) >>> (actual >>> time=0.004..0.013 rows=3 loops=1) >>> Total runtime: 0.094 ms >>> (6 rows) >>> >>> >>> #2 >>> test=# explain analyze select a.headline,a.content,b.content as >>> contentb from books a >>> left join interests b on a.content = b.content; >>> QUERY PLAN >>> >>> - >>> ---------------------------------------------------------------------------------------------------------------------- >>> >>> Hash Left Join (cost=11.75..23.72 rows=70 width=1548) (actual >>> time=0.038..0.062 rows=3 >>> loops=1) >>> Hash Cond: (("outer".content)::text = ("inner".content)::text) >>> -> Seq Scan on books a (cost=0.00..10.70 rows=70 width=1032) (actual >>> time=0.007..0.014 rows=3 loops=1) >>> -> Hash (cost=11.40..11.40 rows=140 width=516) (actual >>> time=0.015..0.015 rows=1 loops=1) >>> -> Seq Scan on interests b (cost=0.00..11.40 rows=140 >>> width=516) (actual >>> time=0.004..0.007 rows=1 loops=1) >>> Total runtime: 0.102 ms >>> (6 rows) >>> >>> I fired each statement ten times. But seriously - this is more or less >>> just a hint that a >>> LEFT JOIN is less performant tahn a JOIN. To get that exactly it >>> depend on which execution >>> plan the planner is using. So the best way ist to use EXPALIN ANALYZE >>> with your statements. >>> >>> Sorry for the big post with much output ... >>> >>> Cheers >>> >>> Andy >>> >> >> >> Hi Andy, >> >> Thanks again for the fast reply ! >> >>> Why do you need an empty entry in this table? Your application should >>> take care, that this >>> does not happen ... >> >> That is the whole point of my doubts. When a user creates a new >> customer, it's not an obligation to enter a customer_type, and still >> many other fields, like tax, payment, yellow pages, payment method, >> etc... related things. So I always have lots of LEFT JOIN's in my >> queries, because of the NULL fields. You said the application should >> handle it. So you mean, even the field is not obligatory, you would put >> a value on it ? All lookup tables are of course linked through foreign >> keys to enforce data integrity. So what value would you enter ? Like 0, >> or -1 ?? And make sure there is a 0, or -1 record in the lookup table >> ? Otherwise the foreign key will complain. >> I think I almost get your point, just need a last little push :-) > > Ah ok - now we are more in database/application design ;-). Here's another > question: are > you really sure that the databse design is correct? Think about having all > the info like > customer_type, tax, payment, yellow pages, payment method in the customer > table also. > Menas - do you really need for these info more than one entry for one > customer? So are you > sure you need the 1:n relation in all cases? I think not, so that would > mean you put the > info about e.g tax, payment method and so on in the customer table as well > and not in one > (or more ...? you were writing "all referenced tables") referenced tables. > As a result you > will have less LEFT JOINS ... > > Cheers > > Andy > Hi Andy, I don't know what are the group policies here ? In case I have to cut a part of the message, just tell me, in case it gets too long. And in case it gets off topic, just send me a warning :-) Well, yes, we need all those 1:n relationships, because customer type can be prospect, customer, not active. Payment methods can be visa, cash, wire transfer, etc... Yellow pages are the category numbers. VAT rules can be 21%, 6%, etc... All of these fields are feeded through drop downs in the application, so the customer can easily select a value. Nico
> -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Nico Callewaert schrieb: >> Hi ! >> >> I heard that it is always better to use a full JOIN rather than a LEFT >> JOIN, for performance reasons. Is that correct ? > > please note that a JOIN and a LEFT JOIN are tow different things. Let's > assume you have > two tables (without any sens): > This is a bit of the nitpick, but please don't call an INNER JOIN a FULL JOIN. Those are 2 separate animals altogether. JOIN and INNER JOIN are the same, but I tend to put in the word INNER for clarity even though some may consider it redundant. So in terms of performance [INNER] JOIN -- fastest LEFT JOIN -- generally slower (but there really is no alternative if you don't want to leave out records without matches and the performance hit is not worth the effort of changing your data model to put in junk data that should be rightfully NULL, plus there are other nifty tricks you can perform with LEFTS as I have here) http://www.bostongis.com/blog/index.php?/archives/37-Explain-Analyze-Geo metry-Relation-Operators-and-Joins-Except-Where.html FULL JOIN -- rarely used and slower than INNER, LEFT, RIGHT, but it comes in handy at times. Okay enough of my preaching. Go forth and prosper. Thanks, Regina ----------------------------------------- The substance of this message, including any attachments, may be confidential, legally privileged and/or exempt from disclosure pursuant to Massachusetts law. It is intended solely for the addressee. If you received this in error, please contact the sender and delete the material from any computer.
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Nico Callewaert schrieb: >>>>>> Nico Callewaert schrieb: >>>>>>> Hi ! >>>>>>> >>>>>>> I heard that it is always better to use a full JOIN rather than a >>>>>>> LEFT >>>>>>> JOIN, for performance reasons. Is that correct ? >>>>>> >>>>>> please note that a JOIN and a LEFT JOIN are tow different things. >>>>>> Let's assume you have >>>>>> two tables (without any sens): >>>>>> >>>>>> books: >>>>>> >>>>>> headline | content >>>>>> - ----------+------------- >>>>>> politics | russia >>>>>> politics | middle east >>>>>> politics | germany >>>>>> >>>>>> interests: >>>>>> >>>>>> content >>>>>> - --------- >>>>>> germany >>>>>> >>>>>> Then fire two select statements like this: >>>>>> >>>>>> #1: >>>>>> SELECT a.headline,a.content,b.content as contentb >>>>>> FROM books a >>>>>> JOIN interests b ON a.content = b.content; >>>>>> >>>>>> headline | content | contentb >>>>>> - ----------+---------+---------- >>>>>> politics | germany | germany >>>>>> (1 row) >>>>>> >>>>>> #2: >>>>>> SELECT a.headline,a.content,b.content as contentb >>>>>> FROM books a >>>>>> LEFT JOIN interests b on a.content = b.content; >>>>>> headline | content | contentb >>>>>> - ----------+-------------+---------- >>>>>> politics | russia | >>>>>> politics | middle east | >>>>>> politics | germany | germany >>>>>> (3 rows) >>>>>> >>>>>>> But it's barely possible to use full JOINS all the time, since >>>>>>> most of >>>>>>> the lookup fields are not required. >>>>>>> Example : in the customer table, the customer type is a looup field >>>>>>> to a >>>>>>> customer_type table. But that is not required to post the record. >>>>>>> So I >>>>>>> was thinking to create a record in the customer_type table with ID = >>>>>>> -1. And in case the customer type is NULL, to assign it the >>>>>>> value -1. >>>>>>> That way, it will be always possible to do a full JOIN. I was >>>>>>> wondering >>>>>>> if that is good practice or rather nonsense ? >>>>>> >>>>>> Hm concerning the functionality of LEFT JOIN I do not really >>>>>> understand what you wanna do >>>>>> here. You created the customer_type table to have the possibility to >>>>>> give more types to >>>>>> one customer (1:n). But therefore you do not need a LEFT JOIN. The >>>>>> statement could be like: >>>>>> >>>>>> SELECT name a >>>>>> FROM customer a, cutomer_type b >>>>>> WHERE a.id = b.id >>>>>> AND b.type = 'super customer' >>>>>> >>>>>> Or not? >>>>>> >>>>>> Cheers >>>>>> >>>>>> Andy >>>>> >>>>> >>>>> Hi Andreas, >>>>> >>>>> Thanks for the reply. I know the difference between JOIN and LEFT >>>>> JOIN. >>>>> The question was just if there is a performance hit between the 2 ? >>>>> >>>>> I never use constructs like this : SELECT name a >>>>>> FROM customer a, cutomer_type b >>>>>> WHERE a.id = b.id >>>>>> AND b.type = 'super customer' >>>>> >>>>> Always Joins. >>>>> I will try to clarify. The current situation is : the >>>>> customer_type has >>>>> no value, so = NULL, no problem with LEFT JOIN. >>>> >>>> Why do you need an empty entry in this table? Your application should >>>> take care, that this >>>> does not happen ... >>>> >>>>> But I'm wondering with tables that has thousands of records, if LEFT >>>>> JOINS are performing well ? >>>> >>>> EXPLAIN ANALYZE with my examples: >>>> >>>> #1 >>>> test=# explain analyze select a.headline,a.content,b.content as >>>> contentb from books a join >>>> interests b on a.content = b.content; >>>> QUERY PLAN >>>> >>>> - >>>> ------------------------------------------------------------------------------------------------------------------ >>>> >>>> >>>> Hash Join (cost=10.88..23.67 rows=70 width=1548) (actual >>>> time=0.051..0.058 rows=1 loops=1) >>>> Hash Cond: (("outer".content)::text = ("inner".content)::text) >>>> -> Seq Scan on interests b (cost=0.00..11.40 rows=140 width=516) >>>> (actual >>>> time=0.007..0.010 rows=1 loops=1) >>>> -> Hash (cost=10.70..10.70 rows=70 width=1032) (actual >>>> time=0.026..0.026 rows=3 loops=1) >>>> -> Seq Scan on books a (cost=0.00..10.70 rows=70 width=1032) >>>> (actual >>>> time=0.004..0.013 rows=3 loops=1) >>>> Total runtime: 0.094 ms >>>> (6 rows) >>>> >>>> >>>> #2 >>>> test=# explain analyze select a.headline,a.content,b.content as >>>> contentb from books a >>>> left join interests b on a.content = b.content; >>>> QUERY PLAN >>>> >>>> - >>>> ---------------------------------------------------------------------------------------------------------------------- >>>> >>>> >>>> Hash Left Join (cost=11.75..23.72 rows=70 width=1548) (actual >>>> time=0.038..0.062 rows=3 >>>> loops=1) >>>> Hash Cond: (("outer".content)::text = ("inner".content)::text) >>>> -> Seq Scan on books a (cost=0.00..10.70 rows=70 width=1032) >>>> (actual >>>> time=0.007..0.014 rows=3 loops=1) >>>> -> Hash (cost=11.40..11.40 rows=140 width=516) (actual >>>> time=0.015..0.015 rows=1 loops=1) >>>> -> Seq Scan on interests b (cost=0.00..11.40 rows=140 >>>> width=516) (actual >>>> time=0.004..0.007 rows=1 loops=1) >>>> Total runtime: 0.102 ms >>>> (6 rows) >>>> >>>> I fired each statement ten times. But seriously - this is more or less >>>> just a hint that a >>>> LEFT JOIN is less performant tahn a JOIN. To get that exactly it >>>> depend on which execution >>>> plan the planner is using. So the best way ist to use EXPALIN ANALYZE >>>> with your statements. >>>> >>>> Sorry for the big post with much output ... >>>> >>>> Cheers >>>> >>>> Andy >>>> >>> >>> >>> Hi Andy, >>> >>> Thanks again for the fast reply ! >>> >>>> Why do you need an empty entry in this table? Your application should >>>> take care, that this >>>> does not happen ... >>> >>> That is the whole point of my doubts. When a user creates a new >>> customer, it's not an obligation to enter a customer_type, and still >>> many other fields, like tax, payment, yellow pages, payment method, >>> etc... related things. So I always have lots of LEFT JOIN's in my >>> queries, because of the NULL fields. You said the application should >>> handle it. So you mean, even the field is not obligatory, you would put >>> a value on it ? All lookup tables are of course linked through foreign >>> keys to enforce data integrity. So what value would you enter ? Like 0, >>> or -1 ?? And make sure there is a 0, or -1 record in the lookup table >>> ? Otherwise the foreign key will complain. >>> I think I almost get your point, just need a last little push :-) >> >> Ah ok - now we are more in database/application design ;-). Here's >> another question: are >> you really sure that the databse design is correct? Think about having >> all the info like >> customer_type, tax, payment, yellow pages, payment method in the >> customer table also. >> Menas - do you really need for these info more than one entry for one >> customer? So are you >> sure you need the 1:n relation in all cases? I think not, so that >> would mean you put the >> info about e.g tax, payment method and so on in the customer table as >> well and not in one >> (or more ...? you were writing "all referenced tables") referenced >> tables. As a result you >> will have less LEFT JOINS ... >> >> Cheers >> >> Andy >> > > Hi Andy, > > I don't know what are the group policies here ? In case I have to cut a > part of the message, just tell me, in case it gets too long. > And in case it gets off topic, just send me a warning :-) don't worry ;-) > Well, yes, we need all those 1:n relationships, because customer type > can be prospect, customer, not active. Payment methods can be visa, > cash, wire transfer, etc... Yellow pages are the category numbers. VAT > rules can be 21%, 6%, etc... All of these fields are feeded through Ok please don't get me wrong for asking again: one customer can have two tax rates? Why? Because he's living in the USA and has a taxrate of 25% and lives also in Germany and has 19%? Hm ... strange ... ;-) . And also the custumer is 'prospect' OR 'not active' ... but not both ... ;-) > drop downs in the application, so the customer can easily select a value. So all these dropdowns are multiselect ...? I am not sure if I can help you with this. My idea is really to think about the design of the application and / or the database ... again - please don't get me wrong ... > Nico > > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFJgFKLVa7znmSP9AwRAlGkAJ9imdvkfk3iTw146Eru3BFB1llPZQCgqPXb fYhH9zDowHAtjAYjfAHlhcM= =Kpf5 -----END PGP SIGNATURE-----
> Ok please don't get me wrong for asking again: one customer can have two > tax rates? Why? > Because he's living in the USA and has a taxrate of 25% and lives also in > Germany and has > 19%? Hm ... strange ... ;-) . And also the custumer is 'prospect' OR 'not > active' ... but > not both ... ;-) > >> drop downs in the application, so the customer can easily select a value. > > So all these dropdowns are multiselect ...? > > I am not sure if I can help you with this. My idea is really to think > about the design of > the application and / or the database ... again - please don't get me > wrong ... Hi Andreas, I think we are not understanding each other :-) It's al the time just 1 value, 1 value for tax, 1 value for customer type. No multiselect drop downs. But all these fields are of course referencing a lookup table by ID. But like Regina also said, the performance hit of the LEFT JOIN seems not worth considering putting junk data in the database that should be NULL (if the drop down is left blanc)... I don't know if I made myself clear ? Thanks ! Nico
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Obe, Regina schrieb: >> -----BEGIN PGP SIGNED MESSAGE----- >> Hash: SHA1 >> >> Nico Callewaert schrieb: >>> Hi ! >>> >>> I heard that it is always better to use a full JOIN rather than a > LEFT >>> JOIN, for performance reasons. Is that correct ? >> please note that a JOIN and a LEFT JOIN are tow different things. > Let's >> assume you have >> two tables (without any sens): >> > > This is a bit of the nitpick, but please don't call an INNER JOIN a FULL > JOIN. > > Those are 2 separate animals altogether. > > JOIN and INNER JOIN are the same, but I tend to put in the word INNER > for clarity > even though some may consider it redundant. > > So in terms of performance > > [INNER] JOIN -- fastest > LEFT JOIN -- generally slower (but there really is no alternative if you > don't want to leave out records without matches > and the performance hit is not worth the effort > of changing your data model to put in junk data that > should be rightfully NULL, plus there are other > nifty tricks you can perform with LEFTS as I have here) > > http://www.bostongis.com/blog/index.php?/archives/37-Explain-Analyze-Geo > metry-Relation-Operators-and-Joins-Except-Where.html > > FULL JOIN -- rarely used and slower than INNER, LEFT, RIGHT, but it > comes in handy at times. > > Okay enough of my preaching. Cool - this is a good "preaching" to be totally correct and academic ;-) Thank you ! Cheers Andy - -- St.Pauli - Hamburg - Germany Andreas Wenk > > Go forth and prosper. > > Thanks, > Regina > ----------------------------------------- > The substance of this message, including any attachments, may be > confidential, legally privileged and/or exempt from disclosure > pursuant to Massachusetts law. It is intended > solely for the addressee. If you received this in error, please > contact the sender and delete the material from any computer. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFJgFMvVa7znmSP9AwRAgqIAKCk8z1rJLXaS8GnK7FsWdOjl+LjawCgnGvm WXob+X3CZt0VRSgu8fM9ZaY= =3UTC -----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Nico Callewaert schrieb: > >> Ok please don't get me wrong for asking again: one customer can have >> two tax rates? Why? >> Because he's living in the USA and has a taxrate of 25% and lives also >> in Germany and has >> 19%? Hm ... strange ... ;-) . And also the custumer is 'prospect' OR >> 'not active' ... but >> not both ... ;-) >> >>> drop downs in the application, so the customer can easily select a >>> value. >> >> So all these dropdowns are multiselect ...? >> >> I am not sure if I can help you with this. My idea is really to think >> about the design of >> the application and / or the database ... again - please don't get me >> wrong ... > > > Hi Andreas, > > I think we are not understanding each other :-) sometimes I am good with that ;-) > It's al the time just 1 value, 1 value for tax, 1 value for customer > type. No multiselect drop downs. But all these fields are of course > referencing a lookup table by ID. But like Regina also said, the > performance hit of the LEFT JOIN seems not worth considering putting > junk data in the database that should be NULL (if the drop down is left > blanc)... > I don't know if I made myself clear ? yes you did - don't worry. I have just another idea how I would do that. But anyway I think the basic question was if a LEFT JOIN has less performance comparing with a JOIN. So now we have the answer for your case and that hopefully helps you to get your software work well. In all cases EXPLAIN ANALYZE is a good idea to use ... > Thanks ! > Nico > Cheers Andy - -- St.Pauli - Hamburg - Germany Andreas Wenk -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFJgFbqVa7znmSP9AwRAiqfAJ9TG2B+oqjvItHz9eIEk26oQkdBRQCgnpNc NlJbJKUmidRrHk7b8k624DM= =uliw -----END PGP SIGNATURE-----
"Obe, Regina" <robe.dnd@cityofboston.gov> writes: > So in terms of performance > [INNER] JOIN -- fastest > LEFT JOIN -- generally slower (but there really is no alternative if you > don't want to leave out records without matches This is nonsense. A left join is not inherently slower than an inner join. What *is* true is that a left join constrains the optimizer more than an inner join, ie some join reorderings are allowed for inner joins but would change the answers if an outer join is involved. So in the context of a specific query you might get a slower plan if you use a left join. But you can't say that as a blanket statement. In a lot of cases there won't be any difference at all (particularly with more recent PG versions --- our optimizer has gotten smarter about outer joins over time). > FULL JOIN -- rarely used and slower than INNER, LEFT, RIGHT, but it > comes in handy at times. Likewise, a full join isn't necessarily slow in itself, but it constrains the possible plans quite a lot. regards, tom lane
Thanks a lot Tom for explaining ! ----- Original Message ----- From: "Tom Lane" <tgl@sss.pgh.pa.us> To: "Obe, Regina" <robe.dnd@cityofboston.gov> Cc: "Nico Callewaert" <callewaert.nico@telenet.be>; "Andreas Wenk" <a.wenk@netzmeister-st-pauli.de>; <pgsql-novice@postgresql.org> Sent: Wednesday, January 28, 2009 5:49 PM Subject: Re: [NOVICE] JOIN vs. LEFT JOIN > "Obe, Regina" <robe.dnd@cityofboston.gov> writes: >> So in terms of performance > >> [INNER] JOIN -- fastest >> LEFT JOIN -- generally slower (but there really is no alternative if you >> don't want to leave out records without matches > > This is nonsense. A left join is not inherently slower than an inner > join. > > What *is* true is that a left join constrains the optimizer more than an > inner join, ie some join reorderings are allowed for inner joins but > would change the answers if an outer join is involved. So in the > context of a specific query you might get a slower plan if you use a > left join. But you can't say that as a blanket statement. In a lot > of cases there won't be any difference at all (particularly with more > recent PG versions --- our optimizer has gotten smarter about outer > joins over time). > >> FULL JOIN -- rarely used and slower than INNER, LEFT, RIGHT, but it >> comes in handy at times. > > Likewise, a full join isn't necessarily slow in itself, but it > constrains the possible plans quite a lot. > > regards, tom lane > >