Thread: bug in JOIN or COUNT or ... ?

bug in JOIN or COUNT or ... ?

From

The Hermit Hacker

Date:

12 May 2001, 19:13:01

Sorry, worst Subject I've ever come up with, but this is one of those "I
haven't got a clue how to describe" emails ...

Simple query:
 SELECT distinct s.gid, s.created, i.title   FROM status s LEFT JOIN images i ON (s.gid = i.gid AND i.active),
personal_datapd, relationship_wanted rw  WHERE s.active AND s.status != 0    AND s.gid = 17111    AND (s.gid = pd.gid
ANDpd.gender = 0)    AND (s.gid = rw.gid AND rw.gender = 0 );
 

Produces:
 gid  |        created         | title
-------+------------------------+--------17111 | 2000-10-19 15:20:46-04 | image117111 | 2000-10-19 15:20:46-04 |
image217111| 2000-10-19 15:20:46-04 | image3
 
(3 rows)

Great, what I expect ...

But:
 SELECT distinct s.gid, s.created, count(i.title) AS images   FROM status s LEFT JOIN images i ON (s.gid = i.gid AND
i.active),personal_data pd, relationship_wanted rw  WHERE s.active AND s.status != 0    AND s.gid = 17111    AND (s.gid
=pd.gid AND pd.gender = 0)    AND (s.gid = rw.gid AND rw.gender = 0 )
 
GROUP BY s.gid, s.created;

Produces:

/tmp/psql.edit.70.62491: 7 lines, 353 characters. gid  |        created         | images
-------+------------------------+--------17111 | 2000-10-19 15:20:46-04 |     15
(1 row)

So why is it counting 12 more images then are actually found/exist:

testdb=# select title from images where gid = 17111;title
--------image1image3image2
(3 rows)

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org

Re: bug in JOIN or COUNT or ... ?

From

Tom Lane

Date:

12 May 2001, 19:52:52

The Hermit Hacker <scrappy@hub.org> writes:
> So why is it counting 12 more images then are actually found/exist:

Hm.  Could we see the EXPLAIN output for both of those?
        regards, tom lane

Re: bug in JOIN or COUNT or ... ?

From

The Hermit Hacker

Date:

12 May 2001, 20:52:04

On Sat, 12 May 2001, Tom Lane wrote:

> The Hermit Hacker <scrappy@hub.org> writes:
> > So why is it counting 12 more images then are actually found/exist:
>
> Hm.  Could we see the EXPLAIN output for both of those?

without count:

NOTICE:  QUERY PLAN:

Unique  (cost=8.66..8.67 rows=1 width=37) ->  Sort  (cost=8.66..8.66 rows=1 width=37)       ->  Nested Loop
(cost=0.00..8.65rows=1 width=37)             ->  Nested Loop  (cost=0.00..6.52 rows=1 width=33)                   ->
NestedLoop  (cost=0.00..4.26 rows=1 width=29)                         ->  Index Scan using status_gid on status s
(cost=0.00..2.23rows=1 width=12)                         ->  Index Scan using images_gid on images i  (cost=0.00..2.02
rows=1width=17)                   ->  Index Scan using personal_data_gid on personal_data pd  (cost=0.00..2.25 rows=1
width=4)            ->  Index Scan using relationship_wanted_gid on relationship_wanted rw  (cost=0.00..2.11 rows=1
width=4)

EXPLAIN

with count:

NOTICE:  QUERY PLAN:

Unique  (cost=8.68..8.69 rows=1 width=37) ->  Sort  (cost=8.68..8.68 rows=1 width=37)       ->  Aggregate
(cost=8.66..8.67rows=1 width=37)             ->  Group  (cost=8.66..8.67 rows=1 width=37)                   ->  Sort
(cost=8.66..8.66rows=1 width=37)                         ->  Nested Loop  (cost=0.00..8.65 rows=1 width=37)
                 ->  Nested Loop  (cost=0.00..6.52 rows=1 width=33)                                     ->  Nested Loop
(cost=0.00..4.26 rows=1 width=29)                                           ->  Index Scan using status_gid on status s
(cost=0.00..2.23 rows=1 width=12)                                           ->  Index Scan using images_gid on images i
(cost=0.00..2.02 rows=1 width=17)                                     ->  Index Scan using personal_data_gid on
personal_datapd  (cost=0.00..2.25 rows=1 width=4)                               ->  Index Scan using
relationship_wanted_gidon relationship_wanted rw  (cost=0.00..2.11 rows=1 width=4)
 

EXPLAIN



>
>             regards, tom lane
>

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org

Re: bug in JOIN or COUNT or ... ?

From

Tom Lane

Date:

12 May 2001, 21:34:47

Ah, I see it: your join against relationship_wanted isn't unique.

globalmatch=# select count(*) from personal_data pd
globalmatch-# where pd.gid = 17111 AND pd.gender = 0;count 
-------    1
(1 row)

globalmatch=# select count(*) from relationship_wanted rw
globalmatch-# where rw.gid = 17111 AND rw.gender = 0;count 
-------    5
(1 row)

globalmatch=# 

So that inflates the number of rows coming out of the join by 5.
        regards, tom lane

Re: bug in JOIN or COUNT or ... ?

From

The Hermit Hacker

Date:

12 May 2001, 22:17:07

On Sat, 12 May 2001, Tom Lane wrote:

> Ah, I see it: your join against relationship_wanted isn't unique.
>
> globalmatch=# select count(*) from personal_data pd
> globalmatch-# where pd.gid = 17111 AND pd.gender = 0;
>  count
> -------
>      1
> (1 row)
>
> globalmatch=# select count(*) from relationship_wanted rw
> globalmatch-# where rw.gid = 17111 AND rw.gender = 0;
>  count
> -------
>      5
> (1 row)
>
> globalmatch=#
>
> So that inflates the number of rows coming out of the join by 5.

Okay, then I'm lost ... why wouldn't that show up without the COUNT()?  I
doubt doubt your analysis, I just want to understand why ...

Re: bug in JOIN or COUNT or ... ?

From

The Hermit Hacker

Date:

12 May 2001, 22:28:19

On Sat, 12 May 2001, Tom Lane wrote:

> The Hermit Hacker <scrappy@hub.org> writes:
> >> So that inflates the number of rows coming out of the join by 5.
>
> > Okay, then I'm lost ... why wouldn't that show up without the COUNT()?  I
> > doubt doubt your analysis, I just want to understand why ...
>
> You had DISTINCT on your query, which hid the duplicated rows from you.
> But that happens *after* aggregate processing, so it doesn't hide the
> dups from COUNT().

Ahhhh, okay, that makes sense ... thanks for taking the time to check it
for me ... and explaining what I was missing ...

Re: bug in JOIN or COUNT or ... ?

From

Tom Lane

Date:

12 May 2001, 23:04:09

The Hermit Hacker <scrappy@hub.org> writes:
>> So that inflates the number of rows coming out of the join by 5.

> Okay, then I'm lost ... why wouldn't that show up without the COUNT()?  I
> doubt doubt your analysis, I just want to understand why ...

You had DISTINCT on your query, which hid the duplicated rows from you.
But that happens *after* aggregate processing, so it doesn't hide the
dups from COUNT().
        regards, tom lane