Re: Unexpected Performance for the Function simplify_function - Mailing list pgsql-performance

From Tom Lane
Subject Re: Unexpected Performance for the Function simplify_function
Date
Msg-id 2595992.1729802958@sss.pgh.pa.us
Whole thread Raw
In response to Unexpected Performance for the Function simplify_function  (Ba Jinsheng <bajinsheng@u.nus.edu>)
Responses Re: Unexpected Performance for the Function simplify_function
List pgsql-performance
Ba Jinsheng <bajinsheng@u.nus.edu> writes:
> I am not proposing a fixing patch, as the patch is incorrect. Instead, I just want to show disabling the
simplify_function()function brings performance benefit, and it seems unexpected. I am wondering whether we can optimize
simplify_function()to make the performance better for this workload? 

The improvement here seems purely accidental.  Basically, you have
broken the rowcount estimation for the "orders" scan: instead of

>                ->  Parallel Seq Scan on orders  (cost=0.00..35511.00 rows=23858 width=8) (actual time=0.033..62.907
rows=19046loops=3) 
>                      Filter: ((o_orderdate >= '1993-08-01'::date) AND (o_orderdate < '1993-11-01 00:00:00'::timestamp
withouttime zone)) 
>                      Rows Removed by Filter: 480954

we get

>                ->  Parallel Seq Scan on orders  (cost=0.00..37062.50 rows=3125 width=8) (actual time=0.029..63.908
rows=19046loops=3) 
>                   Filter: ((o_orderdate >= '1993-08-01'::date) AND (o_orderdate < ('1993-08-01'::date + '3
mons'::intervalmonth))) 
>                  Rows Removed by Filter: 480954

The rows estimate is a good deal further from reality, as a
consequence of the fact that scalarltsel is now delivering a default
guess instead of a real estimate based on a constant comparison value.
Somehow, that's nonetheless leading to a better plan choice.  It could
be that the underlying problem is a poor cost model for parallel hash
join.  Or it could be as simple as random_page_cost not being tuned
for your environment.  It looks like the better plan involves a
nestloop with inner indexscan on lineitem, which is something whose
estimated cost depends enormously on random_page_cost.  You've given
us exactly zero detail about your test conditions, so it's hard to say
more than that.

            regards, tom lane



pgsql-performance by date:

Previous
From: Shiv Iyer
Date:
Subject: Re: Unexpected Performance for the Function simplify_function
Next
From: Tom Lane
Date:
Subject: Re: Unexpected Performance for the Function simplify_function