JOINs ... how I hate them ... - Mailing list pgsql-hackers
From | Marc G. Fournier |
---|---|
Subject | JOINs ... how I hate them ... |
Date | |
Msg-id | 20020207100506.Y57607-100000@earth.hub.org Whole thread Raw |
Responses |
Re: JOINs ... how I hate them ...
Re: JOINs ... how I hate them ... |
List | pgsql-hackers |
Okay, went back through teh archives, as I know that Tom provided a solution for this before, and found it at: http://archives.postgresql.org/pgsql-sql/2001-06/msg00329.php Plain and simple ... makes perfect sense ... doesn't work in v7.2, or, at least, not as I'm expecting it to ... I've broken what I'm trying to do down the the *basest* component I can: explain SELECT p.uid, p.handle FROM gender_f pgf JOIN profiles p ON (pgf.uid = p.uid) ; Which explains out as: Hash Join (cost=1354.80..45297.83 rows=75664 width=27) -> Seq Scan on profiles p (cost=0.00..35707.69 rows=485969 width=19)-> Hash (cost=1165.64..1165.64 rows=75664 width=8) -> Seq Scan on gender_f pgf (cost=0.00..1165.64 rows=75664width=8) Now, profiles has uid as its primary KEY, and there are no duplicates in gender_f ... so, as my HashJoin points out, I should have 75664 results returned ... that is expected ... and the SeqScan on gender_f is expected ... but the SeqScan on profiles is what I would hope to get rid of ... get uid from gender_f, find corresponding entry in profiles ... its only ever goign to pull out 75664 out of 485969 records from profiles, so why would it seqscan *through* profiles for each and every UID? Now, if I go to the next level that I'm trying to pull together: explain SELECT p.uid, p.handle FROM ( orientation_c poc JOIN gender_f pgf USING ( uid ) ) JOIN profiles p ON (pgf.uid =p.uid) ; It still explains, what I think, is wrong: Hash Join (cost=6023.92..47537.10 rows=75664 width=35) -> Seq Scan on profiles p (cost=0.00..35707.69 rows=485969 width=19)-> Hash (cost=5834.76..5834.76 rows=75664 width=16) -> Merge Join (cost=0.00..5834.76 rows=75664 width=16) -> Index Scan using poc_uid on orientation_c poc (cost=0.00..2807.82 rows=126477 width=8) -> Index Scan using pgf_uid on gender_f pgf (cost=0.00..1575.79 rows=75664 width=8) The MergeJoin between poc/pgf will only return 12000 records, and since it is a 1:1 relationship between each of those tables, there will *only* be 12000 records pulled from profiles ... yet its doing a SeqScan through all 485k records for each of those UIDs? This is after I've performed a VACUUM ANALYZE ... The final query itself is: SELECT p.uid, p.profiles_handle FROM ( ( profiles_orientation_c poc JOIN profiles_gender_f pgf USING ( uid ) ) JOIN iwantu_profilesp USING (uid ) ) LEFT JOIN iwantu_last_login ll USING ( uid ); Which explains as: Hash Join (cost=31636.40..78239.34 rows=75664 width=43) -> Hash Join (cost=6023.92..47537.10 rows=75664 width=35) -> Seq Scan on iwantu_profiles p (cost=0.00..35707.69 rows=485969 width=19) -> Hash (cost=5834.76..5834.76 rows=75664width=16) -> Merge Join (cost=0.00..5834.76 rows=75664 width=16) -> Index Scanusing poc_uid on profiles_orientation_c poc (cost=0.00..2807.82 rows=126477 width=8) -> Index Scanusing pgf_uid on profiles_gender_f pgf (cost=0.00..1575.79 rows=75664 width=8) -> Hash (cost=7955.64..7955.64 rows=485964width=8) -> Seq Scan on iwantu_last_login ll (cost=0.00..7955.64 rows=485964 width=8) EXPLAIN So, poc&pgf are MergeJoin's, leaving me with 12000 records again ... then there is the SeqScan/HashJoin wiht profiles, which will leave me with 12000 records, but with more information ... but, again, for each of *those* 12000 records, its doing a SeqScan on last_login's 485k records, instead of using the index ... again, like pgf and poc, there is only one record for every uid, so we aren't dealing with duplicates ... Now, if I 'set enable_seqscan=false;' and do the exact same explain, it definitely comes more in line with what I'd like to see, as far as index usage is concerned: Nested Loop (cost=0.00..546759.46 rows=75664 width=43) -> Nested Loop (cost=0.00..272274.75 rows=75664 width=35) -> Merge Join (cost=0.00..5834.76 rows=75664 width=16) -> Index Scan using poc_uid on profiles_orientation_cpoc (cost=0.00..2807.82 rows=126477 width=8) -> Index Scan using pgf_uid on profiles_gender_fpgf (cost=0.00..1575.79 rows=75664 width=8) -> Index Scan using iwantu_profiles_uid on iwantu_profilesp (cost=0.00..3.51 rows=1 width=19) -> Index Scan using ill_uid on iwantu_last_login ll (cost=0.00..3.62rows=1 width=8)
pgsql-hackers by date: