Re: Optimizer confusion? - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Optimizer confusion? |
Date | |
Msg-id | 8209.966175378@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Optimizer confusion? (Philip Warner <pjw@rhyme.com.au>) |
Responses |
Re: Optimizer confusion?
|
List | pgsql-hackers |
Philip Warner <pjw@rhyme.com.au> writes: > There might be a way to side-step the issue here. I assume that the index > nodes contain a pointer to a record in a file, which has some kind of file > position. By comparing the file positions on one leaf node, and then > averaging the node cluster values, you might be able to get a pretty good > idea of the *real* clustering. Hmm. I have been thinking that an easier way of gathering statistics for the optimizer (on a column for which there is a btree index) is to scan the index sequentially. This makes it trivial to determine the column min, max, and most common value, whereas right now we have very little chance of getting accurate MCV stats if there are more than a few distinct values. If we do that we could also calculate some statistic about how well-ordered the pointers to main-table tuples are. The nifty thing about doing this during ANALYZE is that you'd only have to read the index, not the main table, so it should be reasonably quick. In most contexts that would be tres uncool because you'd not be able to tell index entries for deleted tuples from those for live tuples --- but for ANALYZE I think it'd be perfectly acceptable to just count 'em all. Indeed one could argue that it's *more* accurate to include the deleted index entries than not, since they'll still provoke main-table accesses when scanned, which is exactly the thing we're trying to estimate. >>> I don't suppose I can get the backend to tell me how many logical IOs and >>> how much CPU it used? >> >> Yes you can. Run psql with >> PGOPTIONS="-s" >> and look in the postmaster log. There's also -tparse, -tplan, >> -texec if you'd rather see the query time broken down by stages. > Thanks for this; I see almost no file IO, but lots of paging; is this a > feature of the way Linux does file buffering? Could be. IIRC, it's possible to tell from the stats how many page accesses are short-circuited by Postgres' own disk buffers (vs being given to the kernel) and at least on HPUX it's also possible to tell how many of the kernel requests actually resulted in physical reads (vs being satisfied out of kernel disk buffers). But it takes a certain amount of reading between the lines 'cause the numbers aren't real well labeled. Dunno about how it works on Linux --- comments anyone? regards, tom lane PS: I am leaving town in an hour to go to LinuxWorld. Will be seeing email erratically if at all this week, so don't be surprised at lack of response. Any of y'all planning to be at LinuxWorld, don't forget to stop by the Great Bridge booth!
pgsql-hackers by date: