Re: Sequential Scan Read-Ahead - Mailing list pgsql-hackers
From | Curt Sampson |
---|---|
Subject | Re: Sequential Scan Read-Ahead |
Date | |
Msg-id | Pine.NEB.4.43.0204251118040.445-100000@angelic.cynic.net |
In response to | Re: Sequential Scan Read-Ahead (Bruce Momjian <pgman@candle.pha.pa.us>) |
Responses |
Re: Sequential Scan Read-Ahead
Re: Sequential Scan Read-Ahead Re: Sequential Scan Read-Ahead |
List | pgsql-hackers |
On Wed, 24 Apr 2002, Bruce Momjian wrote:

> > 1. Not all systems do readahead.
>
> If they don't, that isn't our problem. We expect it to be there, and if
> it isn't, the vendor/kernel is at fault.

It is your problem when another database kicks Postgres' ass
performance-wise.

And at that point, *you're* at fault. You're the one who's knowingly
decided to do things inefficiently.

Sorry if this sounds harsh, but this, "Oh, someone else is to blame"
attitude gets me steamed. It's one thing to say, "We don't support
this." That's fine; there are often good reasons for that. It's a
completely different thing to say, "It's an unrelated entity's fault we
don't support this."

At any rate, relying on the kernel to guess how to optimise for the
workload will never work as well as having the software that knows the
workload do the optimization.

The lack of support thing is no joke. Sure, lots of systems nowadays
support unified buffer cache and read-ahead. But how many, besides
Solaris, support free-behind, which is also very important to avoid
blowing out your buffer cache when doing sequential reads? And who at
all supports read-ahead for reverse scans? (Or does Postgres not do
those, anyway? I can see the support is there.)

And even when the facilities are there, you create problems by using
them. Look at the OS buffer cache, for example. Not only do we lose
efficiency by using two layers of caching, but (as people have pointed
out recently on the lists), the optimizer can't even know how much or
what is being cached, and thus can't make decisions based on that.

> Yes, seek() in file will turn off read-ahead. Grabbing bigger chunks
> would help here, but if you have two people already reading from the
> same file, grabbing bigger chunks of the file may not be optimal.

Grabbing bigger chunks is always optimal, AFAICT, if they're not *too*
big and you use the data. A single 64K read takes very little longer
than a single 8K read.

> > 3. Even when the read-ahead does occur, you're still doing more
> > syscalls, and thus more expensive kernel/userland transitions, than
> > you have to.
>
> I would guess the performance impact is minimal.

If it were minimal, people wouldn't work so hard to build multi-level
thread systems, where multiple userland threads are scheduled on top of
kernel threads.

However, it does depend on how much CPU your particular application is
using. You may have it to spare.

> http://candle.pha.pa.us/mhonarc/todo.detail/performance/msg00009.html

Well, this message has some points in it that I feel are just incorrect.

    1. It is *not* true that you have no idea where data is when
    using a storage array or other similar system. While you
    certainly ought not worry about things such as head positions
    and so on, it's been a given for a long, long time that two
    blocks that have close index numbers are going to be close
    together in physical storage.

    2. Raw devices are quite standard across Unix systems (except
    in the unfortunate case of Linux, which I think has been
    remedied, hasn't it?). They're very portable, and have just as
    well--if not better--defined write semantics as a filesystem.

    3. My observations of OS performance tuning over the past six
    or eight years contradict the statement, "There's a considerable
    cost in complexity and code in using "raw" storage too, and
    it's not a one off cost: as the technologies change, the "fast"
    way to do things will change and the code will have to be
    updated to match." While optimizations have been removed over
    the years, the basic optimizations (order reads by block number,
    do larger reads rather than smaller, cache the data) have
    remained unchanged for a long, long time.

    4. "Better to leave this to the OS vendor where possible, and
    take advantage of the tuning they do." Well, sorry guys, but
    have a look at the tuning they do. It hasn't changed in years,
    except to remove now-unnecessary complexity related to really,
    really old and slow disk devices, and to add a few things that
    guess the workload but still do a worse job than if the workload
    generator just did its own optimisations in the first place.

> http://candle.pha.pa.us/mhonarc/todo.detail/optimizer/msg00011.html

Well, this one, with statements like "Postgres does have control over
its buffer cache," I don't know what to say. You can interpret the
statement however you like, but in the end Postgres has very little
control at all over how data is moved between memory and disk.

BTW, please don't take me as saying that all control over physical IO
should be done by Postgres. I just think that Postgres could do a
better job of managing data transfer between disk and memory than the
OS can. The rest of the things (using raw partitions, read-ahead,
free-behind, etc.) just drop out of that one idea.

cjs
-- 
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC
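
A rough illustration of two of the basic optimizations named in point 3
above (order reads by block number, do larger reads rather than
smaller). This is only a sketch, not Postgres code: coalesce_blocks,
read_blocks and the 64K cap are hypothetical names and values picked
for the example, and os.pread assumes a Unix-like system.

```python
import os

BLOCK_SIZE = 8192      # 8K blocks, as in the 8K-vs-64K comparison above
MAX_RUN_BLOCKS = 8     # assumption: cap each read at 8 blocks (64K)


def coalesce_blocks(block_numbers, max_run=MAX_RUN_BLOCKS):
    """Sort block numbers and merge adjacent ones into (start, count) runs."""
    runs = []
    for blk in sorted(set(block_numbers)):
        if runs and blk == runs[-1][0] + runs[-1][1] and runs[-1][1] < max_run:
            # Block directly follows the current run: extend that run.
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)
        else:
            # Gap in block numbers, or run already at the cap: start a new run.
            runs.append((blk, 1))
    return runs


def read_blocks(fd, block_numbers):
    """Read the requested blocks with one pread() call per run.

    Returns ({block_number: bytes}, number_of_pread_calls).
    Blocks that lie past end of file are simply left out of the result.
    """
    pages = {}
    calls = 0
    for start, count in coalesce_blocks(block_numbers):
        data = os.pread(fd, count * BLOCK_SIZE, start * BLOCK_SIZE)
        calls += 1
        for i in range(count):
            page = data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
            if page:
                pages[start + i] = page
    return pages, calls
```

Asking for sixteen consecutive 8K blocks this way costs two read calls
rather than sixteen, and scattered requests are issued in ascending
block order.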