Thread: Rapid Seek Devices (feature request)
Hi all I have some idea/feature request. Now, there are several devices available, that can be called "rapid seek devices" (RSD in future text). I mean SSD disks, some devices like gigabyte I-RAM and other (semi)profesional "ram disk like" solutions for example Acard ANS-9010 . Rapid seek because there are not moving parts, thus non-sequential access (seeks) have no penalty (no moving heads time). I think it would be cool to have explicitly support for that kind of devices. 1: Postgresql has its tablespaces, so creating a tablespce on RSD and put some speed critical indexes/tables there, is optimal from the costs point (RSD are not cheap, so use it for the speed critical tasks within single database is reasonable) but it is only a half of the needed. Query planner act strongly "pro sequentional reads" while it try to guess best query plan. This bias is fine for classic disk based storage, but is not necesery in RSD. So, for tables/indexes on RSD, a suboptimal query plan is likely to be generated (fix me if I am wrong). So I suggest we should have "random_page_cost" and "Sequential_page_cost" configurable on per tablespace basis. And query planner that is aware of it, include situation where tables and corresponding indexes are on different speed tablespaces. Imagine this scenario: Default tablespace on sata disk, random_page_cost=8, sequential_page_cost=3 (from config file), CREATE TABLESPACE fast /path/to/multiple_15k_rpm_SCSI_hw_RAID random_page_cost=4 sequential_page_cost=2 CREATE TABLESPACE lightspeed /path/to/SSD_OR_I-RAM random_page_cost=1 sequential_page_cost=1.2 and now scatter your tables / indexes around new tablespaces :-) 2: Many of that RSD devices are not so much reliable (power outage in ramdisk, weak auxillarity battery in i-ram like devices, block "wear out" in SSD). While moving only an indexes to this device ( I found this article showing there IS a big difference having only an indexes on SSD - http://linux.com/archive/feature/142658.) may be appropriate, and just "reindex" in worst case, this is not suitable in high availability enviroment. So I suggest to have something like this to solve reliability problems on some RSD: CREATE TABLESPACE lightspeed /path/to/SSD_OR_i-RAM random_page_cost=1 sequential_page_cost=1.2 TEE name_of_the_slow_tablespace ie read from fast tablespace, write to both fast and slow, reconstruct fast from slow if appropriate. Thanx for your attention. PS: Execuse my wrong english
On 16/08/2009 9:06 PM, NTPT wrote: > So I suggest we should have "random_page_cost" and > "Sequential_page_cost" configurable on per tablespace basis. That strikes me as a REALLY good idea, personally, though I don't know enough about the planner to factor in implementation practicalities and any cost for people _not_ using the feature. > 2: Many of that RSD devices are not so much reliable (power outage in > ramdisk, weak auxillarity battery in i-ram like devices, block "wear > out" in SSD). [snip] > ie read from fast tablespace, write to both fast and slow, reconstruct > fast from slow if appropriate. This can probably be done as well or better at the OS block layer, using device-mapper or the `md' driver (on Linux). What'd be interesting, though, would be if Pg had support for auto-rebuilding indexes and (if/when explicit support is added) materialized views if it finds the backing files are missing. This would be helpful for such transient devices as RAM disks if you didn't want to bother having physical storage backing it. -- Craig Ringer
On 08/17/2009 03:24 AM, Craig Ringer wrote: > On 16/08/2009 9:06 PM, NTPT wrote: >> So I suggest we should have "random_page_cost" and >> "Sequential_page_cost" configurable on per tablespace basis. > > That strikes me as a REALLY good idea, personally, though I don't know > enough about the planner to factor in implementation practicalities and > any cost for people _not_ using the feature. Could not pgsql *measure* these costs (on a sampling basis, and with long time-constants)? - Jeremy
AFAIK postgresql measure characteristic of the data distribution in the tables and indexes (that is what vacuum ANALYSE does) , but results of that measures are **weighted by** random_page_cost and sequential_page_cost. So measurements are correct, but costs (weight) should reflect a real speed for sequentional and random operation of the storage device(s) (tablespaces) involved. Jeremy Harris napsal(a): > On 08/17/2009 03:24 AM, Craig Ringer wrote: >> On 16/08/2009 9:06 PM, NTPT wrote: >>> So I suggest we should have "random_page_cost" and >>> "Sequential_page_cost" configurable on per tablespace basis. >> >> That strikes me as a REALLY good idea, personally, though I don't know >> enough about the planner to factor in implementation practicalities and >> any cost for people _not_ using the feature. > > Could not pgsql *measure* these costs (on a sampling basis, and with long > time-constants)? > > - Jeremy >
2009/8/17 Jeremy Harris <jgh@wizmail.org>: > Could not pgsql *measure* these costs (on a sampling basis, and with long > time-constants)? In theory, sure. In practice, well, there are some engineering challenges to solve. 1) The cost model isn't perfect so the it's not clear exactly what to measure to get the best result. As the cost model stands now you would have to include some of the cpu time as well as the i/o time. 2) Timing is expensive. On many platforms getting the current time takes a long time and would make a big difference to the runtime of the query. -- greg http://mit.edu/~gsstark/resume.pdf