Home > mailing lists

Re: Optimizer confusion? - Mailing list pgsql-hackers

From	Philip Warner
Subject	Re: Optimizer confusion?
Date	August 13, 2000 08:41:05
Msg-id	3.0.5.32.20000813224119.02203d70@mail.rhyme.com.au Whole thread Raw
In response to	Re: Optimizer confusion? (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Optimizer confusion?
List	pgsql-hackers

Tree view

At 13:45 12/08/00 -0400, Tom Lane wrote:
>> But this (I think) just highlights the
>> fact that the index is sorted by date, and the rows were added in date
>> order. As a result (for this table, in this query), the index scan get's a
>> much better cache-hit rate, so the actual IO cost is low. 
>
>> Does that sound reasonable?
>
>Quite.  The cost estimates are based on the assumption that the tuples
>visited by an indexscan are scattered randomly throughout the table.

Interestingly, while testing a truly random index on a table with 4M rows,
the index estimates are actually way too optimistic (contrary to my other
example), even for a small retrieval. I'm still playing, but I'll send some
figures soon.

>
>> Does the optimizer know if I have used clustering?
>
>The killer implementation problem here is keeping track of how much the
>table ordering has been altered since the last CLUSTER command.  We have
>talked about using an assumption of "once clustered, always clustered",

This *might* be appropriate to set as an index attribute of some kind, most
particularly for time-series indexes etc (as you suggest).

>I have not yet done anything about this, mainly because I'm unwilling to
>encourage people to use CLUSTER, since it's so far from being ready for
>prime time (see TODO list).  Once we've done something about table
>versioning, we can rewrite CLUSTER so that it's actually reasonable to
>use on a regular basis, and at that point it'd make sense to make the
>optimizer CLUSTER-aware.

There might be a way to side-step the issue here. I assume that the index
nodes contain a pointer to a record in a file, which has some kind of file
position. By comparing the file positions on one leaf node, and then
averaging the node cluster values, you might be able to get a pretty good
idea of the *real* clustering.

Does this sound worthwhile?

It has the advantage of working for all tables, and is presumably updated
by Vacuum.

>> I don't suppose I can get the backend to tell me how many logical IOs and
>> how much CPU it used?
>
>Yes you can.  Run psql with
>    PGOPTIONS="-s"
>and look in the postmaster log.  There's also -tparse, -tplan,
>-texec if you'd rather see the query time broken down by stages.

Thanks for this; I see almost no file IO, but lots of paging; is this a
feature of the way Linux does file buffering?

----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/

pgsql-hackers by date:

From: Tom Lane
Date: 12 August 2000, 13:46:27
Subject: Re: Optimizer confusion?

From: Tom Lane
Date: 13 August 2000, 10:03:21
Subject: Re: Optimizer confusion?

Re: Optimizer confusion? - Mailing list pgsql-hackers

Previous

Next