Thread: explanation for random_page_cost is outdated
The following documentation comment has been logged on the website: Page: https://www.postgresql.org/docs/12/runtime-config-query.html Description: Explanation for random_page_cost is rather outdated, because it did only for case of mechanical hdd. But all modern database servers, which I know, made upon SSD. Do or not do default value for random_page_cost equal to 1 is the question, but, IMHO, at list in the documentation about random_page_cost need to add in a speculation about SSD. It's important because a business programming now is mostly web programming. Most database is poorly designed by web programmer, tables looked like a primary key and a huge json (containing all) with large gin index upon it. Now I am seeing a table with a GIN index 50% of the table size. The database is on SSD, of cause. With default random_page_cost=4 GIN index don't used by planner, but with random_page_cost=1 the result may be not excellent, but acceptable for web programmers.
On Thu, Feb 27, 2020 at 02:48:44PM +0000, PG Doc comments form wrote: > The following documentation comment has been logged on the website: > > Page: https://www.postgresql.org/docs/12/runtime-config-query.html > Description: > > Explanation for random_page_cost is rather outdated, because it did only for > case of mechanical hdd. But all modern database servers, which I know, made > upon SSD. Do or not do default value for random_page_cost equal to 1 is the > question, but, IMHO, at list in the documentation about random_page_cost > need to add in a speculation about SSD. > > It's important because a business programming now is mostly web programming. > Most database is poorly designed by web programmer, tables looked like a > primary key and a huge json (containing all) with large gin index upon it. > Now I am seeing a table with a GIN index 50% of the table size. The database > is on SSD, of cause. With default random_page_cost=4 GIN index don't used > by planner, but with random_page_cost=1 the result may be not excellent, but > acceptable for web programmers. Does this sentence in the random_page_cost docs unclear or not have enough visibility: https://www.postgresql.org/docs/12/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS Storage that has a low random read cost relative to sequential, e.g. solid-state drives, might also be better modeled with a lower value for random_page_cost. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Yep. Unclear. What parameter is recommended for SSD? Lower? 3? 2? 1? Much better will be write: if you use SSD set 1. Олег > 19 марта 2020 г., в 23:56, Bruce Momjian <bruce@momjian.us> написал(а): > > On Thu, Feb 27, 2020 at 02:48:44PM +0000, PG Doc comments form wrote: >> The following documentation comment has been logged on the website: >> >> Page: https://www.postgresql.org/docs/12/runtime-config-query.html >> Description: >> >> Explanation for random_page_cost is rather outdated, because it did only for >> case of mechanical hdd. But all modern database servers, which I know, made >> upon SSD. Do or not do default value for random_page_cost equal to 1 is the >> question, but, IMHO, at list in the documentation about random_page_cost >> need to add in a speculation about SSD. >> >> It's important because a business programming now is mostly web programming. >> Most database is poorly designed by web programmer, tables looked like a >> primary key and a huge json (containing all) with large gin index upon it. >> Now I am seeing a table with a GIN index 50% of the table size. The database >> is on SSD, of cause. With default random_page_cost=4 GIN index don't used >> by planner, but with random_page_cost=1 the result may be not excellent, but >> acceptable for web programmers. > > Does this sentence in the random_page_cost docs unclear or not have enough > visibility: > > https://www.postgresql.org/docs/12/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS > > Storage that has a low random read cost relative to sequential, e.g. > solid-state drives, might also be better modeled with a lower value for > random_page_cost. > > -- > Bruce Momjian <bruce@momjian.us> https://momjian.us > EnterpriseDB https://enterprisedb.com > > + As you are, so once was I. As I am, so you will be. + > + Ancient Roman grave inscription +
Sorry to hijack the thread, I also recently have similar observation that the statement about random_page_cost on SSD is ambiguous. The current document says that
> Storage that has a low random read cost relative to sequential, e.g. solid-state drives, might also be better modeled with a lower value for random_page_cost.
However, this statement does not clarify what values might be good. For some workload, the default value 4.0 would cause bad performance and lowering random_page_cost to a value 3.0 or 2.0 does not solve the performance problem. Only when the random_page_cost is lowered to below 1.2 will the bad performance be mitigated. Thus, I would suggest elaborating on this description further as:
> Storage that has a low random read cost relative to sequential, e.g. solid-state drives, might also be better modeled with a value that is close to 1 for random_page_cost.
Detail:
I run the PostgreSQL 11 on an SSD hardware. The database has two small tables with 6MB and 16MB separately. The pgbench runs a select join query in 1 min. The result shows that when the random_page_cost is 1, the average latency is 14ms. When the random_page_cost is 1.5, 2, 3 or 4, the average latency is 26ms. This result suggests that setting random_page_cost to a value larger than 1.5 would cause almost 2x latency. If I increase the 6MB table to 60MB and rerun the sysbench, the result shows that when the random_page_cost is 1, the average latency is 13ms. When the random_page_cost is 1.5,2,3 or 4, the average latency is 17ms.
I attached my testing script, the postgresql configuration file, and planner output.
> Storage that has a low random read cost relative to sequential, e.g. solid-state drives, might also be better modeled with a lower value for random_page_cost.
However, this statement does not clarify what values might be good. For some workload, the default value 4.0 would cause bad performance and lowering random_page_cost to a value 3.0 or 2.0 does not solve the performance problem. Only when the random_page_cost is lowered to below 1.2 will the bad performance be mitigated. Thus, I would suggest elaborating on this description further as:
> Storage that has a low random read cost relative to sequential, e.g. solid-state drives, might also be better modeled with a value that is close to 1 for random_page_cost.
Detail:
I run the PostgreSQL 11 on an SSD hardware. The database has two small tables with 6MB and 16MB separately. The pgbench runs a select join query in 1 min. The result shows that when the random_page_cost is 1, the average latency is 14ms. When the random_page_cost is 1.5, 2, 3 or 4, the average latency is 26ms. This result suggests that setting random_page_cost to a value larger than 1.5 would cause almost 2x latency. If I increase the 6MB table to 60MB and rerun the sysbench, the result shows that when the random_page_cost is 1, the average latency is 13ms. When the random_page_cost is 1.5,2,3 or 4, the average latency is 17ms.
I attached my testing script, the postgresql configuration file, and planner output.
On Mon, Apr 27, 2020 at 3:19 AM Олег Самойлов <splarv@ya.ru> wrote:
Yep. Unclear. What parameter is recommended for SSD? Lower? 3? 2? 1?
Much better will be write: if you use SSD set 1.
Олег
> 19 марта 2020 г., в 23:56, Bruce Momjian <bruce@momjian.us> написал(а):
>
> On Thu, Feb 27, 2020 at 02:48:44PM +0000, PG Doc comments form wrote:
>> The following documentation comment has been logged on the website:
>>
>> Page: https://www.postgresql.org/docs/12/runtime-config-query.html
>> Description:
>>
>> Explanation for random_page_cost is rather outdated, because it did only for
>> case of mechanical hdd. But all modern database servers, which I know, made
>> upon SSD. Do or not do default value for random_page_cost equal to 1 is the
>> question, but, IMHO, at list in the documentation about random_page_cost
>> need to add in a speculation about SSD.
>>
>> It's important because a business programming now is mostly web programming.
>> Most database is poorly designed by web programmer, tables looked like a
>> primary key and a huge json (containing all) with large gin index upon it.
>> Now I am seeing a table with a GIN index 50% of the table size. The database
>> is on SSD, of cause. With default random_page_cost=4 GIN index don't used
>> by planner, but with random_page_cost=1 the result may be not excellent, but
>> acceptable for web programmers.
>
> Does this sentence in the random_page_cost docs unclear or not have enough
> visibility:
>
> https://www.postgresql.org/docs/12/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS
>
> Storage that has a low random read cost relative to sequential, e.g.
> solid-state drives, might also be better modeled with a lower value for
> random_page_cost.
>
> --
> Bruce Momjian <bruce@momjian.us> https://momjian.us
> EnterpriseDB https://enterprisedb.com
>
> + As you are, so once was I. As I am, so you will be. +
> + Ancient Roman grave inscription +
Attachment
ne 26. 4. 2020 v 21:25 odesílatel yigong hu <yigongh@gmail.com> napsal:
Sorry to hijack the thread, I also recently have similar observation that the statement about random_page_cost on SSD is ambiguous. The current document says that
> Storage that has a low random read cost relative to sequential, e.g. solid-state drives, might also be better modeled with a lower value for random_page_cost.
However, this statement does not clarify what values might be good. For some workload, the default value 4.0 would cause bad performance and lowering random_page_cost to a value 3.0 or 2.0 does not solve the performance problem. Only when the random_page_cost is lowered to below 1.2 will the bad performance be mitigated. Thus, I would suggest elaborating on this description further as:
> Storage that has a low random read cost relative to sequential, e.g. solid-state drives, might also be better modeled with a value that is close to 1 for random_page_cost.
I depends on estimation. Lot of people use random_page_cost as fix of broken estimation. Then configures this value to some strange values. Lot of other queries with good estimation can be worse then.
Detail:
I run the PostgreSQL 11 on an SSD hardware. The database has two small tables with 6MB and 16MB separately. The pgbench runs a select join query in 1 min. The result shows that when the random_page_cost is 1, the average latency is 14ms. When the random_page_cost is 1.5, 2, 3 or 4, the average latency is 26ms. This result suggests that setting random_page_cost to a value larger than 1.5 would cause almost 2x latency. If I increase the 6MB table to 60MB and rerun the sysbench, the result shows that when the random_page_cost is 1, the average latency is 13ms. When the random_page_cost is 1.5,2,3 or 4, the average latency is 17ms.
I attached my testing script, the postgresql configuration file, and planner output.On Mon, Apr 27, 2020 at 3:19 AM Олег Самойлов <splarv@ya.ru> wrote:Yep. Unclear. What parameter is recommended for SSD? Lower? 3? 2? 1?
Much better will be write: if you use SSD set 1.
Олег
> 19 марта 2020 г., в 23:56, Bruce Momjian <bruce@momjian.us> написал(а):
>
> On Thu, Feb 27, 2020 at 02:48:44PM +0000, PG Doc comments form wrote:
>> The following documentation comment has been logged on the website:
>>
>> Page: https://www.postgresql.org/docs/12/runtime-config-query.html
>> Description:
>>
>> Explanation for random_page_cost is rather outdated, because it did only for
>> case of mechanical hdd. But all modern database servers, which I know, made
>> upon SSD. Do or not do default value for random_page_cost equal to 1 is the
>> question, but, IMHO, at list in the documentation about random_page_cost
>> need to add in a speculation about SSD.
>>
>> It's important because a business programming now is mostly web programming.
>> Most database is poorly designed by web programmer, tables looked like a
>> primary key and a huge json (containing all) with large gin index upon it.
>> Now I am seeing a table with a GIN index 50% of the table size. The database
>> is on SSD, of cause. With default random_page_cost=4 GIN index don't used
>> by planner, but with random_page_cost=1 the result may be not excellent, but
>> acceptable for web programmers.
>
> Does this sentence in the random_page_cost docs unclear or not have enough
> visibility:
>
> https://www.postgresql.org/docs/12/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS
>
> Storage that has a low random read cost relative to sequential, e.g.
> solid-state drives, might also be better modeled with a lower value for
> random_page_cost.
>
> --
> Bruce Momjian <bruce@momjian.us> https://momjian.us
> EnterpriseDB https://enterprisedb.com
>
> + As you are, so once was I. As I am, so you will be. +
> + Ancient Roman grave inscription +
On Mon, Apr 27, 2020 at 06:02:41AM +0200, Pavel Stehule wrote: > > > ne 26. 4. 2020 v 21:25 odesílatel yigong hu <yigongh@gmail.com> napsal: > > Sorry to hijack the thread, I also recently have similar observation that > the statement about random_page_cost on SSD is ambiguous. The current > document says that > > > Storage that has a low random read cost relative to sequential, e.g. > solid-state drives, might also be better modeled with a lower value for > random_page_cost. > > However, this statement does not clarify what values might be good. For > some workload, the default value 4.0 would cause bad performance and > lowering random_page_cost to a value 3.0 or 2.0 does not solve the > performance problem. Only when the random_page_cost is lowered to below 1.2 > will the bad performance be mitigated. Thus, I would suggest elaborating on > this description further as: > > > Storage that has a low random read cost relative to sequential, e.g. > solid-state drives, might also be better modeled with a value that is close > to 1 for random_page_cost. > > > I depends on estimation. Lot of people use random_page_cost as fix of broken > estimation. Then configures this value to some strange values. Lot of other > queries with good estimation can be worse then. I have been recommending 1.1 as a value for random_page_cost for SSDs for years, and I think it would be helpful to suggest that value, so doc patch attached. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Attachment
Yes, I saw recommendations for 1.1 early, but why? Why such exactly precision number, why 1.1? Is here ever a theoreticalor experimental prof? As for me, random_page_cost depended not only not characteristic of a storage device (hdd or ssd), but also on assumptionsabout how much of the database is in memory cache (90% by default). And this is a very rough assumption (of causein ideal whole database must fit in the memory cache). And so I don't see any reason to recommend exactly value 1.1, simple 1 is good too, especially for an ideal server with hugememory cache. > 27 апр. 2020 г., в 19:16, Bruce Momjian <bruce@momjian.us> написал(а): > > I have been recommending 1.1 as a value for random_page_cost for SSDs > for years, and I think it would be helpful to suggest that value, so doc > patch attached. > > -- > Bruce Momjian <bruce@momjian.us> https://momjian.us > EnterpriseDB https://enterprisedb.com > > + As you are, so once was I. As I am, so you will be. + > + Ancient Roman grave inscription + > <random.diff>
On Mon, Apr 27, 2020 at 07:24:45PM +0300, Олег Самойлов wrote: > Yes, I saw recommendations for 1.1 early, but why? Why such exactly > precision number, why 1.1? Is here ever a theoretical or experimental > prof? Well, SSD random performance is slightly slower than sequential, so the value should be slighly larger than 1.1. Clearly 2.0 or higher is not recommended. Could it be 1.05 or 1.01 or 1.15? Yeah, but probably giving 1.1 is at least a good suggestion to start with. > As for me, random_page_cost depended not only not characteristic of a > storage device (hdd or ssd), but also on assumptions about how much of > the database is in memory cache (90% by default). And this is a very > rough assumption (of cause in ideal whole database must fit in the > memory cache). True. > And so I don't see any reason to recommend exactly value 1.1, simple 1 > is good too, especially for an ideal server with huge memory cache. Uh, well, 1.0 is fine for non-SSDs too if all the data is in cache, and there are no database writes. --------------------------------------------------------------------------- > > > > 27 апр. 2020 г., в 19:16, Bruce Momjian <bruce@momjian.us> > > написал(а): > > > > I have been recommending 1.1 as a value for random_page_cost for > > SSDs for years, and I think it would be helpful to suggest that > > value, so doc patch attached. > > > > -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB > > https://enterprisedb.com > > > > + As you are, so once was I. As I am, so you will be. + + Ancient > > Roman grave inscription + <random.diff> > -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On Mon, Apr 27, 2020 at 12:16:30PM -0400, Bruce Momjian wrote: > On Mon, Apr 27, 2020 at 06:02:41AM +0200, Pavel Stehule wrote: > > > Storage that has a low random read cost relative to sequential, e.g. > > solid-state drives, might also be better modeled with a value that is close > > to 1 for random_page_cost. > > > > > > I depends on estimation. Lot of people use random_page_cost as fix of broken > > estimation. Then configures this value to some strange values. Lot of other > > queries with good estimation can be worse then. > > I have been recommending 1.1 as a value for random_page_cost for SSDs > for years, and I think it would be helpful to suggest that value, so doc > patch attached. Patch applied back through 9.5. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EnterpriseDB https://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +