Re: Performance With Joins on Large Tables - Mailing list pgsql-performance
From:           Joshua Marsh
Subject:        Re: Performance With Joins on Large Tables
Msg-id:         38242de90609130749u976295fwb10904ff4797b920@mail.gmail.com
In response to: Re: Performance With Joins on Large Tables ("Jim C. Nasby" <jim@nasby.net>)
List:           pgsql-performance
On 9/13/06, Jim C. Nasby <jim@nasby.net> wrote:
> On Tue, Sep 12, 2006 at 04:17:34PM -0600, Joshua Marsh wrote:
> > data=# EXPLAIN SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
> > v.custdate), EXTRACT (YEAR FROM s.custdate) || '-' || EXTRACT (MONTH
> > FROM s.custdate) FROM view_505 v INNER JOIN r3s169 s ON v.dsiacctno =
> > s.dsiacctno;
> >                                            QUERY PLAN
> > -----------------------------------------------------------------------------------------------
> >  Merge Join  (cost=293767607.69..305744319.52 rows=285392608 width=11)
> >    Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
> >    ->  Sort  (cost=127304933.87..127585815.71 rows=112352736 width=20)
> >          Sort Key: v.dsiacctno
> >          ->  Seq Scan on view_505 v  (cost=100000000.00..104604059.36 rows=112352736 width=20)
> >    ->  Sort  (cost=166462673.82..167176155.34 rows=285392608 width=17)
> >          Sort Key: s.dsiacctno
> >          ->  Seq Scan on r3s169 s  (cost=100000000.00..106875334.08 rows=285392608 width=17)
> > (8 rows)
> >
> > Since enable_seqscan is off, my understanding is that in order for the
> > query planner to use a sequential scan it must think there is no other
> > alternative. Both sides are indexed and analyzed, so that confuses me a little.
> >
> > I tried it on a smaller sample set of the data and it works fine:
>
> Actually, enable_seqscan=off just adds a fixed overhead to the seqscan
> cost estimate. That's why the cost for the seqscans in that plan starts
> at 100000000. I've suggested changing that to a variable overhead based
> on the expected rowcount, but the counter-argument was that anyone with
> so much data that the fixed amount wouldn't work would most likely be
> having bigger issues anyway.
>
> Other things you can try to get the index scan back would be to reduce
> random_page_cost and to analyze the join fields in those tables with a
> higher statistics target (though I'm not 100% certain the join cost
> estimator actually takes that into account).
> Or if you don't mind patching your source code, it wouldn't be
> difficult to make enable_seqscan use a bigger 'penalty value' than
> 10000000.
> --
> Jim Nasby                                  jim@nasby.net
> EnterpriseDB   http://enterprisedb.com   512.569.9461 (cell)

Thanks for the tip. I lowered random_page_cost and got these results:

data=# EXPLAIN SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
v.custdate), EXTRACT (YEAR FROM s.custdate) || '-' || EXTRACT (MONTH
FROM s.custdate) FROM view_505 v INNER JOIN r3s169 s ON v.dsiacctno =
s.dsiacctno;
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..20921221.49 rows=285230272 width=11)
   Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
   ->  Index Scan using view_505_dsiacctno on view_505 v  (cost=0.00..2838595.79 rows=112393848 width=20)
   ->  Index Scan using r3s169_dsiacctno on r3s169 s  (cost=0.00..7106203.68 rows=285230272 width=17)
(4 rows)

That seems to have done it. Are there any side effects to this change? I
read about random_page_cost in the documentation and it seems to be used
strictly for planning. All the tables in this database will be indexed
and of a size similar to these two, so I don't see it causing any other
problems. Thought I would check though :)
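
For anyone following the thread, the two knobs Jim suggested can be tried per-session before changing postgresql.conf. A sketch (table and column names taken from the thread; the cost and statistics values are illustrative, not recommendations):

```sql
-- Lower random_page_cost for this session only (default is 4.0).
-- Values closer to 1.0 tell the planner that random I/O costs nearly
-- the same as sequential I/O, which favors index scans.
SET random_page_cost = 2.0;

-- Raise the statistics target on the join columns and re-analyze,
-- giving the planner finer-grained statistics for dsiacctno.
ALTER TABLE view_505 ALTER COLUMN dsiacctno SET STATISTICS 500;
ALTER TABLE r3s169 ALTER COLUMN dsiacctno SET STATISTICS 500;
ANALYZE view_505 (dsiacctno);
ANALYZE r3s169 (dsiacctno);

-- Re-check the plan.
EXPLAIN SELECT v.phonedirect, v.editdrop
FROM view_505 v INNER JOIN r3s169 s ON v.dsiacctno = s.dsiacctno;
```

Because SET only affects the current session, this is a safe way to confirm the plan change before making random_page_cost permanent for every query on the server.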