Re: Help with sql - Mailing list pgsql-general
From | Steve Crawford |
---|---|
Subject | Re: Help with sql |
Date | |
Msg-id | 4FF78947.5080902@pinpointresearch.com Whole thread Raw |
In response to | Help with sql (Perry Smith <pedzsan@gmail.com>) |
Responses |
Re: Help with sql
|
List | pgsql-general |
On 07/06/2012 02:34 PM, Perry Smith wrote: > Hi Guys, > > This isn't a PostgreSQL specific question but just a SQL question. If this is not an appropriate question for this list,please let me know. > > It is also, perhaps, a really silly question. > > This query (without the 'explain' keyword) , when executed takes forever and a day: > >> condor_development=> explain select id from filesets where id not in ( select fileset_id from service_pack_fileset_maps); >> QUERY PLAN >> ---------------------------------------------------------------------------------------------- >> Seq Scan on filesets (cost=0.00..71937742.00 rows=26088 width=4) >> Filter: (NOT (SubPlan 1)) >> SubPlan 1 >> -> Materialize (cost=0.00..2517.78 rows=95852 width=4) >> -> Seq Scan on service_pack_fileset_maps (cost=0.00..1663.52 rows=95852 width=4) >> (5 rows) > This query returns within a second: > >> condor_development=> explain select id from filesets where id not in ( select distinct fileset_id from service_pack_fileset_maps); >> QUERY PLAN >> ---------------------------------------------------------------------------------------------- >> Seq Scan on filesets (cost=2102.31..3153.53 rows=26088 width=4) >> Filter: (NOT (hashed SubPlan 1)) >> SubPlan 1 >> -> HashAggregate (cost=1903.15..2062.48 rows=15933 width=4) >> -> Seq Scan on service_pack_fileset_maps (cost=0.00..1663.52 rows=95852 width=4) >> (5 rows) > The difference is the "distinct" keyword in the inner select. > > What I'm confused about is why isn't the "distinct" implicit? I thought the construct "blah in ( select ... )" was using"sets" and an item (I thought) can not be in a set more than once. > > Perhaps my question is the opposite really? Why would you not always use "distinct" in the inner select when the operatoris "in" or "not in" ? > > And if I can throw in another question on top: is there a different method other than "not in" that would work better? Actually it is *very* PostgreSQL specific. In fact, it may even be PostgreSQL *version* specific as you are delving into how the planner decides how to handle a query. It appears that the planner is assuming, based on collected stats and available indexes, that there will be roughly 1/6 the records returned by the "distinct" query and thus chose a different method to join the records. One useful piece of information would be the indexes on the two tables. As to other methods, you can use: ... where not exists (select 1 from service_pack_fileset_maps where fileset_id = filesets.id)... (Note: as alluded to above, ...not in... works better in some releases and ...not exists... better in others due to improvements over time.) Still another method: select id from filesets except select fileset_id from service_pack_fileset_maps; Cheers, Steve
pgsql-general by date: