Re: JOIN vs. LEFT JOIN - Mailing list pgsql-novice
From | Nico Callewaert |
---|---|
Subject | Re: JOIN vs. LEFT JOIN |
Date | |
Msg-id | 5A5C5FB6FC174396A1B165B2B282526A@etsinformatics.local Whole thread Raw |
In response to | JOIN vs. LEFT JOIN ("Nico Callewaert" <callewaert.nico@telenet.be>) |
Responses |
Re: JOIN vs. LEFT JOIN
|
List | pgsql-novice |
>>>>> Nico Callewaert schrieb: >>>>>> Hi ! >>>>>> >>>>>> I heard that it is always better to use a full JOIN rather than a >>>>>> LEFT >>>>>> JOIN, for performance reasons. Is that correct ? >>>>> >>>>> please note that a JOIN and a LEFT JOIN are tow different things. >>>>> Let's assume you have >>>>> two tables (without any sens): >>>>> >>>>> books: >>>>> >>>>> headline | content >>>>> - ----------+------------- >>>>> politics | russia >>>>> politics | middle east >>>>> politics | germany >>>>> >>>>> interests: >>>>> >>>>> content >>>>> - --------- >>>>> germany >>>>> >>>>> Then fire two select statements like this: >>>>> >>>>> #1: >>>>> SELECT a.headline,a.content,b.content as contentb >>>>> FROM books a >>>>> JOIN interests b ON a.content = b.content; >>>>> >>>>> headline | content | contentb >>>>> - ----------+---------+---------- >>>>> politics | germany | germany >>>>> (1 row) >>>>> >>>>> #2: >>>>> SELECT a.headline,a.content,b.content as contentb >>>>> FROM books a >>>>> LEFT JOIN interests b on a.content = b.content; >>>>> headline | content | contentb >>>>> - ----------+-------------+---------- >>>>> politics | russia | >>>>> politics | middle east | >>>>> politics | germany | germany >>>>> (3 rows) >>>>> >>>>>> But it's barely possible to use full JOINS all the time, since most >>>>>> of >>>>>> the lookup fields are not required. >>>>>> Example : in the customer table, the customer type is a looup field >>>>>> to a >>>>>> customer_type table. But that is not required to post the record. >>>>>> So I >>>>>> was thinking to create a record in the customer_type table with ID = >>>>>> -1. And in case the customer type is NULL, to assign it the >>>>>> value -1. >>>>>> That way, it will be always possible to do a full JOIN. I was >>>>>> wondering >>>>>> if that is good practice or rather nonsense ? >>>>> >>>>> Hm concerning the functionality of LEFT JOIN I do not really >>>>> understand what you wanna do >>>>> here. You created the customer_type table to have the possibility to >>>>> give more types to >>>>> one customer (1:n). But therefore you do not need a LEFT JOIN. The >>>>> statement could be like: >>>>> >>>>> SELECT name a >>>>> FROM customer a, cutomer_type b >>>>> WHERE a.id = b.id >>>>> AND b.type = 'super customer' >>>>> >>>>> Or not? >>>>> >>>>> Cheers >>>>> >>>>> Andy >>>> >>>> >>>> Hi Andreas, >>>> >>>> Thanks for the reply. I know the difference between JOIN and LEFT >>>> JOIN. >>>> The question was just if there is a performance hit between the 2 ? >>>> >>>> I never use constructs like this : SELECT name a >>>>> FROM customer a, cutomer_type b >>>>> WHERE a.id = b.id >>>>> AND b.type = 'super customer' >>>> >>>> Always Joins. >>>> I will try to clarify. The current situation is : the customer_type >>>> has >>>> no value, so = NULL, no problem with LEFT JOIN. >>> >>> Why do you need an empty entry in this table? Your application should >>> take care, that this >>> does not happen ... >>> >>>> But I'm wondering with tables that has thousands of records, if LEFT >>>> JOINS are performing well ? >>> >>> EXPLAIN ANALYZE with my examples: >>> >>> #1 >>> test=# explain analyze select a.headline,a.content,b.content as >>> contentb from books a join >>> interests b on a.content = b.content; >>> QUERY PLAN >>> >>> - >>> ------------------------------------------------------------------------------------------------------------------ >>> >>> Hash Join (cost=10.88..23.67 rows=70 width=1548) (actual >>> time=0.051..0.058 rows=1 loops=1) >>> Hash Cond: (("outer".content)::text = ("inner".content)::text) >>> -> Seq Scan on interests b (cost=0.00..11.40 rows=140 width=516) >>> (actual >>> time=0.007..0.010 rows=1 loops=1) >>> -> Hash (cost=10.70..10.70 rows=70 width=1032) (actual >>> time=0.026..0.026 rows=3 loops=1) >>> -> Seq Scan on books a (cost=0.00..10.70 rows=70 width=1032) >>> (actual >>> time=0.004..0.013 rows=3 loops=1) >>> Total runtime: 0.094 ms >>> (6 rows) >>> >>> >>> #2 >>> test=# explain analyze select a.headline,a.content,b.content as >>> contentb from books a >>> left join interests b on a.content = b.content; >>> QUERY PLAN >>> >>> - >>> ---------------------------------------------------------------------------------------------------------------------- >>> >>> Hash Left Join (cost=11.75..23.72 rows=70 width=1548) (actual >>> time=0.038..0.062 rows=3 >>> loops=1) >>> Hash Cond: (("outer".content)::text = ("inner".content)::text) >>> -> Seq Scan on books a (cost=0.00..10.70 rows=70 width=1032) (actual >>> time=0.007..0.014 rows=3 loops=1) >>> -> Hash (cost=11.40..11.40 rows=140 width=516) (actual >>> time=0.015..0.015 rows=1 loops=1) >>> -> Seq Scan on interests b (cost=0.00..11.40 rows=140 >>> width=516) (actual >>> time=0.004..0.007 rows=1 loops=1) >>> Total runtime: 0.102 ms >>> (6 rows) >>> >>> I fired each statement ten times. But seriously - this is more or less >>> just a hint that a >>> LEFT JOIN is less performant tahn a JOIN. To get that exactly it >>> depend on which execution >>> plan the planner is using. So the best way ist to use EXPALIN ANALYZE >>> with your statements. >>> >>> Sorry for the big post with much output ... >>> >>> Cheers >>> >>> Andy >>> >> >> >> Hi Andy, >> >> Thanks again for the fast reply ! >> >>> Why do you need an empty entry in this table? Your application should >>> take care, that this >>> does not happen ... >> >> That is the whole point of my doubts. When a user creates a new >> customer, it's not an obligation to enter a customer_type, and still >> many other fields, like tax, payment, yellow pages, payment method, >> etc... related things. So I always have lots of LEFT JOIN's in my >> queries, because of the NULL fields. You said the application should >> handle it. So you mean, even the field is not obligatory, you would put >> a value on it ? All lookup tables are of course linked through foreign >> keys to enforce data integrity. So what value would you enter ? Like 0, >> or -1 ?? And make sure there is a 0, or -1 record in the lookup table >> ? Otherwise the foreign key will complain. >> I think I almost get your point, just need a last little push :-) > > Ah ok - now we are more in database/application design ;-). Here's another > question: are > you really sure that the databse design is correct? Think about having all > the info like > customer_type, tax, payment, yellow pages, payment method in the customer > table also. > Menas - do you really need for these info more than one entry for one > customer? So are you > sure you need the 1:n relation in all cases? I think not, so that would > mean you put the > info about e.g tax, payment method and so on in the customer table as well > and not in one > (or more ...? you were writing "all referenced tables") referenced tables. > As a result you > will have less LEFT JOINS ... > > Cheers > > Andy > Hi Andy, I don't know what are the group policies here ? In case I have to cut a part of the message, just tell me, in case it gets too long. And in case it gets off topic, just send me a warning :-) Well, yes, we need all those 1:n relationships, because customer type can be prospect, customer, not active. Payment methods can be visa, cash, wire transfer, etc... Yellow pages are the category numbers. VAT rules can be 21%, 6%, etc... All of these fields are feeded through drop downs in the application, so the customer can easily select a value. Nico
pgsql-novice by date: