JOIN between three *simple* tables ... - Mailing list pgsql-hackers
From | Marc G. Fournier |
---|---|
Subject | JOIN between three *simple* tables ... |
Date | |
Msg-id | 20020206162412.F57607-100000@earth.hub.org Whole thread Raw |
Responses |
Re: JOIN between three *simple* tables ...
|
List | pgsql-hackers |
Morning all ... First off, this is using v7.2 release ... Okay, this is going to drive me up the proverbial wall ... very simple query: SELECT p.uid, p.handle FROM orientation_c poc, profiles p, gender_f pgf WHERE (p.uid = pgf.uid ) AND (pgf.uid = poc.uid) ; profiles contains: iwantu=# select count(1) from profiles;count --------485969 (1 row) and is everyone in the system ... no problems there ... gender_f contains: iwantu=# select count(1) from gender_f;count -------75664 (1 row) And is *just* the uid's of those in profiles that are female ... finally, orientation_c: iwantu=# select count(1) from orientation_c;count --------126477 (1 row) Is again *just* the uid's of those in profiles that have a 'c' orientiation ... Now, the above wquery has an explain of: Hash Join (cost=6363.90..47877.08 rows=19692 width=35) -> Seq Scan on profiles p (cost=0.00..35707.69 rows=485969 width=19)-> Hash (cost=6174.74..6174.74 rows=75664 width=16) -> Hash Join (cost=2928.34..6174.74 rows=75664 width=16) -> Seq Scan on gender_f pgf (cost=0.00..1165.64 rows=75664 width=8) -> Hash (cost=1948.77..1948.77rows=126477 width=8) -> Seq Scan on orientation_c poc (cost=0.00..1948.77 rows=126477width=8) Now, a join between poc and pgf alone comes out to: iwantu=# select count(1) from orientation_c poc, gender_f pgf where poc.uid = pgf.uid;count -------12703 (1 row) iwantu=# explain select count(1) from orientation_c poc, gender_f pgf where poc.uid = pgf.uid; NOTICE: QUERY PLAN: Aggregate (cost=6363.90..6363.90 rows=1 width=16) -> Hash Join (cost=2928.34..6174.74 rows=75664 width=16) -> SeqScan on gender_f pgf (cost=0.00..1165.64 rows=75664 width=8) -> Hash (cost=1948.77..1948.77 rows=126477 width=8) -> Seq Scan on orientation_c poc (cost=0.00..1948.77 rows=126477 width=8) EXPLAIN Now, what I'd like to have happen is a SEQ SCAN through the smaller table (gender_f), and grab everything in orientation_c that matches (both tables have zero duplicates of uid, its purely a one of, so I would think that I should be able to take 1 uid from pgf, and use the index on poc to determine if it exists, and do that 75664 times ... That would live me with 12703 UIDs to match up with apropriate records in the almost 500+k records in profiles itself, instead of having to scan through each of thoose 500+k records themselves ... Then again, let's go one simpler: iwantu=# \d orientation_c Table "orientation_c"Column | Type | Modifiers --------+--------+-----------uid | bigint | Indexes: poc_uid iwantu=# \d poc_uidIndex "poc_uid"Column | Type --------+--------uid | bigint btree iwantu=# explain select count(1) from orientation_c poc where uid = 1; NOTICE: QUERY PLAN: Aggregate (cost=2264.97..2264.97 rows=1 width=0) -> Seq Scan on orientation_c poc (cost=0.00..2264.96 rows=1 width=0) EXPLAIN if all varlues in orientation_c are unique, and there are 127k records ... shouldn't it use the index instead of scanning through all 127k records ? Or am I missing something totallyobvious here?
pgsql-hackers by date: