Re: Self-referencing table question - Mailing list pgsql-sql
| From | Sean Davis |
| --- | --- |
| Subject | Re: Self-referencing table question |
| Date | |
| Msg-id | 1303f92d183bfd2ff7c522d4ad5afde8@mail.nih.gov |
| In response to | Re: Self-referencing table question (Edmund Bacon <ebacon@onesystem.com>) |
| Responses | Re: Self-referencing table question |
| List | pgsql-sql |
Thanks. I thought about that a bit, and it seems likely to be expensive for a single query (though I should probably try it at some point). If I do find myself reformatting results in response to user input (i.e., reusing the query), though, then your solution is likely to be very useful.

Sean

On Mar 24, 2005, at 11:13 AM, Edmund Bacon wrote:

> Sometimes using a temp table is a better idea:
>
> e.g.
>
> -- Start by creating a temp table 'tids' that holds the to_ids that
> -- we are interested in.
> SELECT to_id
>   INTO TEMP TABLE tids
>   FROM correlation
>  WHERE from_id = 1234
>  ORDER BY val DESC LIMIT 100;
>
> -- The following temp table makes use of the primary key on
> -- the correlation table, and the stated goal from the original
> -- question that:
> --     from_id > to_id
> --     and from_id in (tids.to_id)
> --     and to_id in (tids.to_id)
>
> SELECT t1.to_id AS from_id, t2.to_id
>   INTO TEMP TABLE from_to
>   FROM tids t1, tids t2
>  WHERE t1.to_id > t2.to_id;
>
> -- Now we can use the from_to table as an index into the correlation
> -- table.
>
> SELECT c.from_id, c.to_id, c.val
>   FROM from_to
>   JOIN correlation c USING (from_id, to_id)
>  WHERE val > 0.5;
>
> The explain analyze for the final select works out to:
>
>  Nested Loop  (cost=0.00..50692.00 rows=8488 width=16) (actual time=0.171..150.095 rows=2427 loops=1)
>    ->  Seq Scan on from_to  (cost=0.00..79.38 rows=5238 width=8) (actual time=0.006..7.660 rows=4950 loops=1)
>    ->  Index Scan using correlation_pkey on correlation c  (cost=0.00..9.63 rows=2 width=16) (actual time=0.024..0.025 rows=0 loops=4950)
>          Index Cond: (("outer".from_id = c.from_id) AND ("outer".to_id = c.to_id))
>          Filter: (val > 0.5::double precision)
>  Total runtime: 152.261 ms
>
> Richard Huxton wrote:
>
>> Sean Davis wrote:
>>
>>> I'll answer my own question, if only for my own records. The following
>>> query is about 5-6 times faster than the original. Of course, if
>>> anyone else has other ideas, I'd be happy to hear them.
>>>
>>> Sean
>>>
>>> explain analyze
>>> select from_id, to_id, val
>>>   from exprsdb.correlation
>>>  where from_id in (select to_id from exprsdb.correlation
>>>                     where from_id = 2424 order by val desc limit 100)
>>>    and to_id in (select to_id from exprsdb.correlation
>>>                   where from_id = 2424 order by val desc limit 100)
>>>    and val > 0.6
>>>    and to_id < from_id;
>>
>> Might not be any faster, but you can do this as a self-join with
>> subqueries:
>>
>> SELECT c1.from_id, c1.to_id, c1.val
>> FROM
>>   correlation c1,
>>   (
>>     SELECT to_id FROM correlation WHERE from_id = 2424
>>     ORDER BY val DESC LIMIT 100
>>   ) AS c2,
>>   (
>>     SELECT to_id FROM correlation WHERE from_id = 2424
>>     ORDER BY val DESC LIMIT 100
>>   ) AS c3
>> WHERE
>>   c1.from_id = c2.to_id
>>   AND c1.to_id = c3.to_id
>>   AND c1.val > 0.5
>>   AND c1.to_id < c1.from_id
>> ;
>>
>> I think PG should be smart enough nowadays to figure out these two
>> queries are basically the same.
>
> --
> Edmund Bacon <ebacon@onesystem.com>
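A minimal sketch of the reuse Sean describes, assuming Edmund's temp tables (tids and from_to) are still present in the same session and that 0.8 is a hypothetical user-supplied cutoff; once the candidate pairs are materialized, each new request only has to re-run the cheap final join:

    -- Re-run only the final join against the already-built from_to temp table,
    -- changing nothing but the (hypothetical) user-supplied threshold.
    SELECT c.from_id, c.to_id, c.val
      FROM from_to
      JOIN correlation c USING (from_id, to_id)
     WHERE c.val > 0.8;

The expensive part, building tids and from_to for a given from_id, would be paid once per starting id; only the threshold filter changes between requests.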