
From Dilip Kumar
Subject Re: Re: BUG #18927: Poor query plan - HashAggregate is more than twice as fast but at a higher cost
Msg-id CAFiTN-vJfQUjxJVt_cX=v317hpy11kbRgZzVUPRqA4DPWezYZw@mail.gmail.com
In response to Re: Re: BUG #18927: Poor query plan - HashAggregate is more than twice as fast but at a higher cost  (萧鸿骏 <23031212454@stu.xidian.edu.cn>)
List pgsql-bugs
On Thu, May 15, 2025 at 6:42 PM 萧鸿骏 <23031212454@stu.xidian.edu.cn> wrote:
>
> Thank you very much for your reply! May I assume that we need to
> manually adjust the cost based on the actual situation of the system
> and testing computer?
>
> I have been conducting research on SQL performance issues in PG
> recently, and many of the problems I am currently studying are
> related to it. I would like to consult with you. If the default cost
> value of the optimizer is not set reliably, resulting in the
> selection of a poor execution plan and significant performance
> differences, is this considered a direction for optimizer
> performance optimization? I think if we stand from the user's
> perspective, this may be difficult to detect, leading to significant
> performance losses.
>

Multiple factors influence the planner parameters seq_page_cost and
random_page_cost. PostgreSQL sets these to conservative default values
that aim to strike a balance, not assuming either extreme. On one end,
you might have a system where most data is on a slow spinning disk,
while on the other, you may have a machine with large amounts of RAM
such that almost all data is cached and disk I/O is rarely needed.
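
For reference, the current settings can be checked from any session;
the values in the comments below are the stock defaults:

    SHOW seq_page_cost;      -- 1 (stock default)
    SHOW random_page_cost;   -- 4 (stock default)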

These cost parameters directly influence the planner's choice between
access paths, for example an index scan versus a sequential scan: an
index scan involves random I/O, since it must jump around the heap to
fetch matching rows by TID, while a sequential scan reads the table
linearly, which is generally faster on disk because it requires fewer
seeks.

Now, suppose your WHERE clause matches 50% of the rows. The planner
might estimate that an index scan would be expensive due to frequent
random page reads, especially since the default random_page_cost is 4
times seq_page_cost. As a result, it may choose a sequential scan as
the cheaper plan.
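
As a rough sketch (the table and data here are made up purely for
illustration), you can watch this decision with EXPLAIN:

    -- Hypothetical setup: an indexed column where the predicate
    -- matches roughly half the rows.
    CREATE TABLE t (id int, val int);
    CREATE INDEX ON t (val);
    INSERT INTO t SELECT g, g FROM generate_series(1, 1000000) g;
    ANALYZE t;

    -- With the default costs the planner will typically choose a
    -- sequential scan here, because fetching ~50% of the heap via
    -- the index would be charged at random_page_cost per page.
    EXPLAIN SELECT * FROM t WHERE val < 500000;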

However, if most of your data is already in RAM, there is no
meaningful difference between random and sequential page reads; both
are fast. In such a case, the planner’s assumptions (based on default
cost values) can lead to a suboptimal plan, not because of a bug, but
because it's working with inaccurate cost estimates relative to your
hardware.
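
You can test whether the cost assumptions are the culprit for a given
query by lowering random_page_cost for just your session and comparing
the plans (this reuses the hypothetical table from the sketch above):

    -- Session-local experiment; other sessions are unaffected.
    SET random_page_cost = 1.1;
    EXPLAIN SELECT * FROM t WHERE val < 500000;
    RESET random_page_cost;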

So while the defaults work well for many systems, if you’re noticing
suboptimal plans, especially on machines with a lot of RAM or fast
SSDs, it's worth tuning these parameters. For systems that mostly
serve static data and have high cache hit ratios, reducing both
seq_page_cost and random_page_cost (and possibly making them equal)
may help the planner make better decisions.
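
If such an experiment consistently produces better plans, the change
can be made persistent, either instance-wide or only for tables on a
fast tablespace (the value 1.1 and the tablespace name fast_ssd below
are illustrative; use whatever your own measurements support):

    -- Instance-wide; requires superuser and a config reload.
    ALTER SYSTEM SET random_page_cost = 1.1;
    SELECT pg_reload_conf();

    -- Or scoped to a tablespace on fast storage (hypothetical name).
    ALTER TABLESPACE fast_ssd SET (random_page_cost = 1.1);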

This is just my opinion, and others may think differently.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


