Re: Initial prefetch performance testing - Mailing list pgsql-hackers
From | Gregory Stark |
---|---|
Subject | Re: Initial prefetch performance testing |
Date | |
Msg-id | 873ajsx7zq.fsf@oxford.xeocode.com |
In response to | Re: Initial prefetch performance testing (Ron Mayer <rm_pg@cheapcomplexdevices.com>) |
Responses | Re: Initial prefetch performance testing; Re: Initial prefetch performance testing |
List | pgsql-hackers |
Ron Mayer <rm_pg@cheapcomplexdevices.com> writes:

> For example, on our sites hosted with Amazon's compute cloud (a great
> place to host web sites), I know nothing about spindles, but know
> about Amazon Elastic Block Store[2]'s and Instance Store's[1]. I
> have some specs and am able to run benchmarks on them; but couldn't
> guess how many spindles my X% of the N-disk device that corresponds
> to.

Well, I don't see how you're going to guess how much prefetching is optimal
for those environments either...

> For another example, some of our salesguys with SSD drives
> have 0 spindles on their demo machines.

Sounds to me like you're finding it pretty intuitive. Actually you would want
"1" because it can handle one request at a time. Actually, if you have a
multipath array I imagine you would want to think of each interface as a
spindle, because that's the bottleneck and you'll want to keep all the
interfaces busy.

> I'd rather a parameter that expressed things more in terms of
> measurable quantities -- perhaps seeks/second? perhaps
> random-access/sequential-access times?

Well, that's precisely what I'm saying. Simon et al. want a parameter to
control how much prefetching to do. That's *not* a measurable quantity. I'm
suggesting effective_spindle_count, which *is* a measurable quantity, even if
it might be a bit harder to measure in some environments than others.

The two other quantities you describe are both currently represented by our
random_page_cost (or random_page_cost/seq_page_cost). What we're dealing with
now is an entirely orthogonal property of your system: how many concurrent
requests the system can handle.

If you have ten spindles then you really want to send enough requests to
ensure there are ten concurrent requests being processed on ten different
drives (assuming you want each scan to make maximum use of the resources,
which is primarily true in DSS but might not be true in OLTP).
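The post doesn't spell out how ten drives leads to the theoretical figure of "about 30" requests. One model that reproduces it (an assumption, not necessarily the reasoning used here) treats each prefetch request as landing on a uniformly random drive. The expected number of requests needed before every one of n drives has at least one outstanding request is then the coupon-collector expectation n * H(n), where H(n) = 1 + 1/2 + ... + 1/n. A minimal sketch; the function name is hypothetical:

```python
def expected_requests_to_cover(n_drives: int) -> float:
    """Expected number of random block requests before all n_drives drives
    have at least one request queued (coupon-collector expectation).

    Hypothetical model: every request is assumed to hit a drive chosen
    uniformly at random, independently of the others.
    """
    if n_drives < 1:
        raise ValueError("need at least one drive")
    # n * H(n), written as the sum of n/k for k = 1..n
    return sum(n_drives / k for k in range(1, n_drives + 1))


for n in (1, 4, 10):
    # prints "1 1.0", then "4 8.3", then "10 29.3"
    print(n, round(expected_requests_to_cover(n), 1))
```

Under this model a single device, such as the SSD case above, needs exactly one request, and ten drives need roughly 29.3. The model ignores queueing inside each device, which may be one reason measured values can differ sharply from it.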
That's a lot more than ten requests, though, because if you sent ten requests
many of them would end up on the same devices. In theory my logic led me to
think that for ten drives it would be about 30. Experiments seem to show it's
more like 300-400. That discrepancy might be a reason to put this debate aside
for now anyway and expose the internal implementation until we understand
better what's going on there.

Ironically, I'm pretty happy to lose this argument because EDB is interested
in rolling this into its dynamic tuning module. If there's a consensus (by my
count three people have spoken up already, which is more than usual), then
I'll gladly concede.

Anyone object to going back to preread_pages? Or should it be prefetch_pages?
prefetch_blocks? Something else?

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!