Re: Performance With Joins on Large Tables - Mailing list pgsql-performance
From:           Joshua Marsh
Subject:        Re: Performance With Joins on Large Tables
Msg-id:         38242de90609130749u976295fwb10904ff4797b920@mail.gmail.com
In response to: Re: Performance With Joins on Large Tables ("Jim C. Nasby" <jim@nasby.net>)
List:           pgsql-performance
On 9/13/06, Jim C. Nasby <jim@nasby.net> wrote:
> On Tue, Sep 12, 2006 at 04:17:34PM -0600, Joshua Marsh wrote:
> > data=# EXPLAIN SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
> > v.custdate), EXTRACT (YEAR FROM s.custdate) || '-' || EXTRACT (MONTH
> > FROM s.custdate) FROM view_505 v INNER JOIN r3s169 s ON v.dsiacctno =
> > s.dsiacctno;
> >                                            QUERY PLAN
> > -----------------------------------------------------------------------------------------------
> >  Merge Join  (cost=293767607.69..305744319.52 rows=285392608 width=11)
> >    Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
> >    ->  Sort  (cost=127304933.87..127585815.71 rows=112352736 width=20)
> >          Sort Key: v.dsiacctno
> >          ->  Seq Scan on view_505 v  (cost=100000000.00..104604059.36 rows=112352736 width=20)
> >    ->  Sort  (cost=166462673.82..167176155.34 rows=285392608 width=17)
> >          Sort Key: s.dsiacctno
> >          ->  Seq Scan on r3s169 s  (cost=100000000.00..106875334.08 rows=285392608 width=17)
> > (8 rows)
> >
> > Since enable_seqscan is off, my understanding is that in order for the
> > query planner to use a sequential scan it must think there is no other
> > alternative. Both sides are indexed and analyzed, so that confuses me a little.
> >
> > I tried it on a smaller sample set of the data and it works fine:
>
> Actually, enable_seqscan=off just adds a fixed overhead to the seqscan
> cost estimate. That's why the cost for the seqscans in that plan starts
> at 100000000. I've suggested changing that to a variable overhead based
> on the expected rowcount, but the counter-argument was that anyone with
> so much data that the fixed amount wouldn't work would most likely be
> having bigger issues anyway.
>
> Other things you can try to get the index scan back would be to reduce
> random_page_cost and to analyze the join fields in those tables with a
> higher statistics target (though I'm not 100% certain the join cost
> estimator actually takes that into account).
> Or if you don't mind patching your source code, it wouldn't be
> difficult to make enable_seqscan use a bigger 'penalty value' than
> 10000000.
> --
> Jim Nasby                                  jim@nasby.net
> EnterpriseDB   http://enterprisedb.com   512.569.9461 (cell)

Thanks for the tip. I lowered random_page_cost and got these results:

data=# EXPLAIN SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
v.custdate), EXTRACT (YEAR FROM s.custdate) || '-' || EXTRACT (MONTH
FROM s.custdate) FROM view_505 v INNER JOIN r3s169 s ON v.dsiacctno =
s.dsiacctno;
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Merge Join  (cost=0.00..20921221.49 rows=285230272 width=11)
   Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
   ->  Index Scan using view_505_dsiacctno on view_505 v  (cost=0.00..2838595.79 rows=112393848 width=20)
   ->  Index Scan using r3s169_dsiacctno on r3s169 s  (cost=0.00..7106203.68 rows=285230272 width=17)
(4 rows)

That seems to have done it. Are there any side effects to this change? I
read about random_page_cost in the documentation and it seems to be used
strictly for planning. All the tables in this database will be indexed
and of a size similar to these two, so I don't see it causing any other
problems. Thought I would check though :)
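
For anyone following the thread, the two knobs Jim suggested can be tried per-session before changing postgresql.conf. A sketch (table and column names taken from the thread; the cost and statistics values are illustrative, not recommendations):

```sql
-- Lower random_page_cost for this session only (default is 4.0).
-- Values closer to 1.0 tell the planner that random I/O costs nearly
-- the same as sequential I/O, which favors index scans.
SET random_page_cost = 2.0;

-- Raise the statistics target on the join columns and re-analyze,
-- giving the planner finer-grained statistics for dsiacctno.
ALTER TABLE view_505 ALTER COLUMN dsiacctno SET STATISTICS 500;
ALTER TABLE r3s169 ALTER COLUMN dsiacctno SET STATISTICS 500;
ANALYZE view_505 (dsiacctno);
ANALYZE r3s169 (dsiacctno);

-- Re-check the plan.
EXPLAIN SELECT v.phonedirect, v.editdrop
FROM view_505 v INNER JOIN r3s169 s ON v.dsiacctno = s.dsiacctno;
```

Because SET only affects the current session, this is a safe way to confirm the plan change before making random_page_cost permanent for every query on the server.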