Thread: bug in JOIN or COUNT or ... ?
Sorry, worst Subject I've ever come up with, but this is one of those "I haven't got a clue how to describe" emails ... Simple query: SELECT distinct s.gid, s.created, i.title FROM status s LEFT JOIN images i ON (s.gid = i.gid AND i.active), personal_datapd, relationship_wanted rw WHERE s.active AND s.status != 0 AND s.gid = 17111 AND (s.gid = pd.gid ANDpd.gender = 0) AND (s.gid = rw.gid AND rw.gender = 0 ); Produces: gid | created | title -------+------------------------+--------17111 | 2000-10-19 15:20:46-04 | image117111 | 2000-10-19 15:20:46-04 | image217111| 2000-10-19 15:20:46-04 | image3 (3 rows) Great, what I expect ... But: SELECT distinct s.gid, s.created, count(i.title) AS images FROM status s LEFT JOIN images i ON (s.gid = i.gid AND i.active),personal_data pd, relationship_wanted rw WHERE s.active AND s.status != 0 AND s.gid = 17111 AND (s.gid =pd.gid AND pd.gender = 0) AND (s.gid = rw.gid AND rw.gender = 0 ) GROUP BY s.gid, s.created; Produces: /tmp/psql.edit.70.62491: 7 lines, 353 characters. gid | created | images -------+------------------------+--------17111 | 2000-10-19 15:20:46-04 | 15 (1 row) So why is it counting 12 more images then are actually found/exist: testdb=# select title from images where gid = 17111;title --------image1image3image2 (3 rows) Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
The Hermit Hacker <scrappy@hub.org> writes: > So why is it counting 12 more images then are actually found/exist: Hm. Could we see the EXPLAIN output for both of those? regards, tom lane
On Sat, 12 May 2001, Tom Lane wrote: > The Hermit Hacker <scrappy@hub.org> writes: > > So why is it counting 12 more images then are actually found/exist: > > Hm. Could we see the EXPLAIN output for both of those? without count: NOTICE: QUERY PLAN: Unique (cost=8.66..8.67 rows=1 width=37) -> Sort (cost=8.66..8.66 rows=1 width=37) -> Nested Loop (cost=0.00..8.65rows=1 width=37) -> Nested Loop (cost=0.00..6.52 rows=1 width=33) -> NestedLoop (cost=0.00..4.26 rows=1 width=29) -> Index Scan using status_gid on status s (cost=0.00..2.23rows=1 width=12) -> Index Scan using images_gid on images i (cost=0.00..2.02 rows=1width=17) -> Index Scan using personal_data_gid on personal_data pd (cost=0.00..2.25 rows=1 width=4) -> Index Scan using relationship_wanted_gid on relationship_wanted rw (cost=0.00..2.11 rows=1 width=4) EXPLAIN with count: NOTICE: QUERY PLAN: Unique (cost=8.68..8.69 rows=1 width=37) -> Sort (cost=8.68..8.68 rows=1 width=37) -> Aggregate (cost=8.66..8.67rows=1 width=37) -> Group (cost=8.66..8.67 rows=1 width=37) -> Sort (cost=8.66..8.66rows=1 width=37) -> Nested Loop (cost=0.00..8.65 rows=1 width=37) -> Nested Loop (cost=0.00..6.52 rows=1 width=33) -> Nested Loop (cost=0.00..4.26 rows=1 width=29) -> Index Scan using status_gid on status s (cost=0.00..2.23 rows=1 width=12) -> Index Scan using images_gid on images i (cost=0.00..2.02 rows=1 width=17) -> Index Scan using personal_data_gid on personal_datapd (cost=0.00..2.25 rows=1 width=4) -> Index Scan using relationship_wanted_gidon relationship_wanted rw (cost=0.00..2.11 rows=1 width=4) EXPLAIN > > regards, tom lane > Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: scrappy@hub.org secondary: scrappy@{freebsd|postgresql}.org
Ah, I see it: your join against relationship_wanted isn't unique. globalmatch=# select count(*) from personal_data pd globalmatch-# where pd.gid = 17111 AND pd.gender = 0;count ------- 1 (1 row) globalmatch=# select count(*) from relationship_wanted rw globalmatch-# where rw.gid = 17111 AND rw.gender = 0;count ------- 5 (1 row) globalmatch=# So that inflates the number of rows coming out of the join by 5. regards, tom lane
On Sat, 12 May 2001, Tom Lane wrote: > Ah, I see it: your join against relationship_wanted isn't unique. > > globalmatch=# select count(*) from personal_data pd > globalmatch-# where pd.gid = 17111 AND pd.gender = 0; > count > ------- > 1 > (1 row) > > globalmatch=# select count(*) from relationship_wanted rw > globalmatch-# where rw.gid = 17111 AND rw.gender = 0; > count > ------- > 5 > (1 row) > > globalmatch=# > > So that inflates the number of rows coming out of the join by 5. Okay, then I'm lost ... why wouldn't that show up without the COUNT()? I doubt doubt your analysis, I just want to understand why ...
On Sat, 12 May 2001, Tom Lane wrote: > The Hermit Hacker <scrappy@hub.org> writes: > >> So that inflates the number of rows coming out of the join by 5. > > > Okay, then I'm lost ... why wouldn't that show up without the COUNT()? I > > doubt doubt your analysis, I just want to understand why ... > > You had DISTINCT on your query, which hid the duplicated rows from you. > But that happens *after* aggregate processing, so it doesn't hide the > dups from COUNT(). Ahhhh, okay, that makes sense ... thanks for taking the time to check it for me ... and explaining what I was missing ...
The Hermit Hacker <scrappy@hub.org> writes: >> So that inflates the number of rows coming out of the join by 5. > Okay, then I'm lost ... why wouldn't that show up without the COUNT()? I > doubt doubt your analysis, I just want to understand why ... You had DISTINCT on your query, which hid the duplicated rows from you. But that happens *after* aggregate processing, so it doesn't hide the dups from COUNT(). regards, tom lane