Re: varchar index joins not working? - Mailing list pgsql-performance
From: Adam Gundy
Subject: Re: varchar index joins not working?
Date:
Msg-id: 47FE29AF.6010906@starsilk.net
In response to: Re: varchar index joins not working? (Richard Huxton <dev@archonet.com>)
Responses: Re: varchar index joins not working?
List: pgsql-performance
Richard Huxton wrote:
> Adam Gundy wrote:
>> I'm hitting an unexpected problem with postgres 8.3 - I have some
>> tables which use varchar(32) for their unique IDs which I'm attempting
>> to join using some simple SQL:
>>
>> select *
>> from group_access, groups
>> where group_access.groupid = groups.groupid and
>>       group_access.uid = '7275359408f44591d0717e16890ce335';
>>
>> there's a unique index on group_access.groupid, and a non-unique index
>> on groups.groupid. both are non-null.
>
> What about group_access.uid - I'd have thought that + groups pkey is
> probably the sensible combination here.

that is already an index on group_access:

  "group_access_pkey" PRIMARY KEY, btree (groupid, uid)

adding the (uid, groupid) index helps the small database: it will do an
index join if forced to. but the full database still refuses to do an
index join - it does a full index scan followed by a merge.

>>                           QUERY PLAN
>> --------------------------------------------------------------------------
>> Hash Join  (cost=8.89..41329.88 rows=119940 width=287)
>>   (actual time=0.202..935.136 rows=981 loops=1)
>
> That's because it's expecting 119,940 rows to match (rather than the
> actual 981 you do get). If you were getting that many results this is
> probably a sensible plan.

sure. but its estimate is *wildly* off.

>>   Hash Cond: ((groups.groupid)::text = (group_access.groupid)::text)
>>   ->  Seq Scan on groups  (cost=0.00..31696.48 rows=1123348 width=177)
>>         (actual time=0.011..446.091 rows=1125239 loops=1)
>
> It's got a good idea of the total number of rows in groups.

yeah.

>>   ->  Hash  (cost=8.51..8.51 rows=30 width=110)
>>         (actual time=0.148..0.148 rows=30 loops=1)
>>         ->  Seq Scan on group_access  (cost=0.00..8.51 rows=30 width=110)
>>               (actual time=0.014..0.126 rows=30 loops=1)
>
> And also group_access. Oh, the seq-scan doesn't really matter here. It
> probably *is* faster to read all 30 rows in one burst rather than go to
> the index and then back to the table.

agreed.

>> it's running an index scan across the entire table (no condition
>> applied) :-(
>>
>> so, just for the hell of it, I tried making groupid a char(32),
>> despite repeated assertions in this group that there's no performance
>> difference between the two:
>
> There's no performance difference between the two.

hah. if it makes the join with char (and runs fast), or reads the whole
table with varchar, then there *is* a performance difference - a big one!

> The char(32) thing isn't important here, what is important is that it's
> expecting ~300 rows rather than 120,000. It's still wrong, but it's
> close enough to make sense.
>
> So - the question is - why is PG expecting so many matches to your join.

more to the point, why does it get the estimate right (or close) with
char, but massively wrong with varchar? I've been vacuum analyzing after
each change.

with the smaller database and the char type, it still wants to do a
seqscan for certain joins because the tables are small enough, but if I
disable seqscan, it does an index join (usually with a small time
penalty). if I switch the types back to varchar, re-analyze, and re-run,
it *will not* do an index join!

> How many distinct values do you have in groups.groupid and
> group_access.group_id?

for the small database (since it shows the same problem):

  group_access: 280/268
  groups:       2006/139

for the large database:

  group_access: same
  groups:       1712647/140

the groupid key is an MD5 hash, so it should be uniformly distributed.
maybe that throws the stats?
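one way to check what the planner actually believes about those columns
is the standard pg_stats view, and the per-column statistics target can
be raised to get a bigger ANALYZE sample (just a sketch, using the table
and column names above):

  -- what does ANALYZE think the distinct-value counts are?
  SELECT tablename, attname, n_distinct, null_frac
  FROM pg_stats
  WHERE tablename IN ('groups', 'group_access')
    AND attname = 'groupid';

  -- collect a larger sample for the join column, then re-analyze
  ALTER TABLE groups ALTER COLUMN groupid SET STATISTICS 1000;
  ALTER TABLE group_access ALTER COLUMN groupid SET STATISTICS 1000;
  ANALYZE groups;
  ANALYZE group_access;

if n_distinct for groups.groupid comes back nowhere near the real ~1.7
million (a small positive number, or a small negative fraction of the
row count), that would account for the inflated join estimate.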
but, again, char works, varchar doesn't :-(
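for reference, the "disable seqscan" test mentioned above is nothing more
exotic than this (a per-session setting, only used to compare plans):

  -- steer the planner away from seq scans for this session only
  SET enable_seqscan = off;

  EXPLAIN ANALYZE
  SELECT *
  FROM group_access, groups
  WHERE group_access.groupid = groups.groupid
    AND group_access.uid = '7275359408f44591d0717e16890ce335';

  -- back to the default afterwards
  RESET enable_seqscan;

with char it switches to the index join; with varchar it still does the
full index scan followed by a merge on the large database.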