Re: Optimizer confusion? - Mailing list pgsql-hackers
From | Philip Warner |
---|---|
Subject | Re: Optimizer confusion? |
Date | |
Msg-id | 3.0.5.32.20000813224119.02203d70@mail.rhyme.com.au Whole thread Raw |
In response to | Re: Optimizer confusion? (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Optimizer confusion?
|
List | pgsql-hackers |
At 13:45 12/08/00 -0400, Tom Lane wrote: >> But this (I think) just highlights the >> fact that the index is sorted by date, and the rows were added in date >> order. As a result (for this table, in this query), the index scan get's a >> much better cache-hit rate, so the actual IO cost is low. > >> Does that sound reasonable? > >Quite. The cost estimates are based on the assumption that the tuples >visited by an indexscan are scattered randomly throughout the table. Interestingly, while testing a truly random index on a table with 4M rows, the index estimates are actually way too optimistic (contrary to my other example), even for a small retrieval. I'm still playing, but I'll send some figures soon. > >> Does the optimizer know if I have used clustering? > >The killer implementation problem here is keeping track of how much the >table ordering has been altered since the last CLUSTER command. We have >talked about using an assumption of "once clustered, always clustered", This *might* be appropriate to set as an index attribute of some kind, most particularly for time-series indexes etc (as you suggest). >I have not yet done anything about this, mainly because I'm unwilling to >encourage people to use CLUSTER, since it's so far from being ready for >prime time (see TODO list). Once we've done something about table >versioning, we can rewrite CLUSTER so that it's actually reasonable to >use on a regular basis, and at that point it'd make sense to make the >optimizer CLUSTER-aware. There might be a way to side-step the issue here. I assume that the index nodes contain a pointer to a record in a file, which has some kind of file position. By comparing the file positions on one leaf node, and then averaging the node cluster values, you might be able to get a pretty good idea of the *real* clustering. Does this sound worthwhile? It has the advantage of working for all tables, and is presumably updated by Vacuum. >> I don't suppose I can get the backend to tell me how many logical IOs and >> how much CPU it used? > >Yes you can. Run psql with > PGOPTIONS="-s" >and look in the postmaster log. There's also -tparse, -tplan, >-texec if you'd rather see the query time broken down by stages. Thanks for this; I see almost no file IO, but lots of paging; is this a feature of the way Linux does file buffering? ---------------------------------------------------------------- Philip Warner | __---_____ Albatross Consulting Pty. Ltd. |----/ - \ (A.B.N. 75 008 659 498) | /(@) ______---_ Tel: (+61) 0500 83 82 81 | _________ \ Fax: (+61) 0500 83 82 82 | ___________ | Http://www.rhyme.com.au | / \| | --________-- PGP key available upon request, | / and from pgp5.ai.mit.edu:11371 |/
pgsql-hackers by date: