Home > mailing lists

Re: Optimizer confusion? - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: Optimizer confusion?
Date	August 13, 2000 10:03:21
Msg-id	8209.966175378@sss.pgh.pa.us Whole thread Raw
In response to	Re: Optimizer confusion? (Philip Warner <pjw@rhyme.com.au>)
Responses	Re: Optimizer confusion?
List	pgsql-hackers

Tree view

Philip Warner <pjw@rhyme.com.au> writes:
> There might be a way to side-step the issue here. I assume that the index
> nodes contain a pointer to a record in a file, which has some kind of file
> position. By comparing the file positions on one leaf node, and then
> averaging the node cluster values, you might be able to get a pretty good
> idea of the *real* clustering.

Hmm.  I have been thinking that an easier way of gathering statistics
for the optimizer (on a column for which there is a btree index) is to
scan the index sequentially.  This makes it trivial to determine the
column min, max, and most common value, whereas right now we have very
little chance of getting accurate MCV stats if there are more than a
few distinct values.  If we do that we could also calculate some
statistic about how well-ordered the pointers to main-table tuples are.

The nifty thing about doing this during ANALYZE is that you'd only have
to read the index, not the main table, so it should be reasonably quick.
In most contexts that would be tres uncool because you'd not be able to
tell index entries for deleted tuples from those for live tuples --- but
for ANALYZE I think it'd be perfectly acceptable to just count 'em all.
Indeed one could argue that it's *more* accurate to include the deleted
index entries than not, since they'll still provoke main-table accesses
when scanned, which is exactly the thing we're trying to estimate.

>>> I don't suppose I can get the backend to tell me how many logical IOs and
>>> how much CPU it used?
>> 
>> Yes you can.  Run psql with
>> PGOPTIONS="-s"
>> and look in the postmaster log.  There's also -tparse, -tplan,
>> -texec if you'd rather see the query time broken down by stages.

> Thanks for this; I see almost no file IO, but lots of paging; is this a
> feature of the way Linux does file buffering?

Could be.  IIRC, it's possible to tell from the stats how many page
accesses are short-circuited by Postgres' own disk buffers (vs being
given to the kernel) and at least on HPUX it's also possible to tell
how many of the kernel requests actually resulted in physical reads
(vs being satisfied out of kernel disk buffers).  But it takes a certain
amount of reading between the lines 'cause the numbers aren't real well
labeled.  Dunno about how it works on Linux --- comments anyone?
        regards, tom lane

PS: I am leaving town in an hour to go to LinuxWorld.  Will be seeing
email erratically if at all this week, so don't be surprised at lack of
response.  Any of y'all planning to be at LinuxWorld, don't forget to
stop by the Great Bridge booth!

pgsql-hackers by date:

From: Philip Warner
Date: 13 August 2000, 08:41:05
Subject: Re: Optimizer confusion?

From: Philip Warner
Date: 13 August 2000, 21:43:26
Subject: Re: Optimizer confusion?

Re: Optimizer confusion? - Mailing list pgsql-hackers

Previous

Next