Thread: Doc updates for index cost estimator change
I have written the attached bit of doco about the new index cost estimator procedure definition, but I am not sure where to put it. There isn't (AFAICT) any existing documentation about how to make a new kind of index, which would be the proper place for it. May I impose on you to find/make a place for this and mark it up properly? Also, doc/src/graphics/catalogs.ag needs to be updated, but I have no idea how. (The amopselect and amopnpages fields of pg_amop are gone; pg_am has a new field amcostestimate.) regards, tom lane Index cost estimation functions ------------------------------- Every index access method must provide a cost estimation function for use by the planner/optimizer. The procedure OID of this function is given in the 'amcostestimate' field of the access method's pg_am entry. (Note: prior to Postgres 7.0, a different scheme was used for registering index-specific cost estimation functions.) The amcostestimate function is given a list of WHERE clauses that have been determined to be usable with the index. It must return estimates of the cost of accessing the index and the selectivity of the WHERE clauses (that is, the fraction of main-table tuples that will be retrieved during the index scan). For simple cases, nearly all the work of the cost estimator can be done by calling standard routines in the optimizer --- the point of having an amcostestimate function is to allow index access methods to provide index-type-specific knowledge, in case it is possible to improve on the standard estimates. Each amcostestimate function must have the signature void amcostestimate (Query *root, RelOptInfo *rel, IndexOptInfo *index, List *indexQuals, Cost *indexAccessCost, Selectivity *indexSelectivity); The first four parameters are inputs: root The query being processedrel The relation the index is onindex The index itselfindexQuals Listof index qual clauses (implicitly ANDed); NIL list indicates no qualifiers are available The last two parameters are pass-by-reference outputs: *indexAccessCost Set to cost of index processing*indexSelectivity Set to index selectivity Note that cost estimate functions must be written in C, not in SQL or any available procedural language, because they must access internal data structures of the planner/optimizer. The indexAccessCost should be computed in the units used by src/backend/optimizer/path/costsize.c: a disk block fetch has cost 1.0, and the cost of processing one index tuple should usually be taken as cpu_index_page_weight (which is a user-adjustable optimizer parameter). The access cost should include all disk and CPU costs associated with scanning the index itself, but NOT the cost of retrieving or processing the main-table tuples that are identified by the index. The indexSelectivity should be set to the estimated fraction of the main table tuples that will be retrieved during the index scan. In the case of a lossy index, this will typically be higher than the fraction of tuples that actually pass the given qual conditions. A typical cost estimator will proceed as follows: 1. Estimate and return the fraction of main-table tuples that will be visited based on the given qual conditions. In the absence of any index-type-specific knowledge, use the standard optimizer function clauselist_selec(): *indexSelectivity = clauselist_selec(root, indexQuals); 2. Estimate the number of index tuples that will be visited during the scan. For many index types this is the same as indexSelectivity times the number of tuples in the index, but it might be more. (Note that the index's size in pages and tuples is available from the IndexOptInfo struct.) 3. Estimate the number of index pages that will be retrieved during the scan. This might be just indexSelectivity times the index's size in pages. 4. Compute the index access cost as *indexAccessCost = numIndexPages + cpu_index_page_weight * numIndexTuples; Examples of cost estimator functions can be found in src/backend/utils/adt/selfuncs.c. By convention, the pg_proc entry for an amcostestimate function should show prorettype = 0 pronargs = 6 proargtypes = 0 0 0 0 0 0 We use zero ("opaque") for all the arguments since none of them have types that are known in pg_type.
> There isn't (AFAICT) any existing documentation about how to make > a new kind of index, which would be the proper place for it. > May I impose on you to find/make a place for this and mark it up > properly? OK. > Also, doc/src/graphics/catalogs.ag needs to be updated, but I have > no idea how. (The amopselect and amopnpages fields of pg_amop > are gone; pg_am has a new field amcostestimate.) The primary graphics files should be .cgm files, but I see that we only have those for one or two cases. The .ag files are Applix Graphics files, which I use to embed the stuff into hardcopy. I'll see about updating them, and committing a .cgm version of each. - Thomas -- Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California
Thomas Lockhart <lockhart@alumni.caltech.edu> writes: > The primary graphics files should be .cgm files, but I see that we > only have those for one or two cases. The .ag files are Applix > Graphics files, which I use to embed the stuff into hardcopy. > > I'll see about updating them, and committing a .cgm version of each. If the .ag files are derived files, meseems they should not be in the CVS tree at all? regards, tom lane
> > The primary graphics files should be .cgm files, but I see that we > > only have those for one or two cases. The .ag files are Applix > > Graphics files, which I use to embed the stuff into hardcopy. > > I'll see about updating them, and committing a .cgm version of each. > If the .ag files are derived files, meseems they should not be in > the CVS tree at all? Easy for you to say, you're not formatting the hardcopy :) Seriously, I'm not certain that we can use .cgm format as a lossless way for me to get figures into the hardcopy. But it seems like it should work; my recollection is that Hannu helped test out formats with me and we settled on .cgm as the most likely to succeed. As soon as we have a complete set of .cgm files, and as soon as I've tried generating hardcopy and jpegs or gifs for the .html, then we can deep-six the .ag versions. In the meantime, it makes it easier on me if we keep carrying those files in cvs. - Thomas -- Thomas Lockhart lockhart@alumni.caltech.edu South Pasadena, California
Has this been dealt with? > I have written the attached bit of doco about the new index cost > estimator procedure definition, but I am not sure where to put it. > There isn't (AFAICT) any existing documentation about how to make > a new kind of index, which would be the proper place for it. > May I impose on you to find/make a place for this and mark it up > properly? > > Also, doc/src/graphics/catalogs.ag needs to be updated, but I have > no idea how. (The amopselect and amopnpages fields of pg_amop > are gone; pg_am has a new field amcostestimate.) > > regards, tom lane > > > Index cost estimation functions > ------------------------------- > > Every index access method must provide a cost estimation function for > use by the planner/optimizer. The procedure OID of this function is > given in the 'amcostestimate' field of the access method's pg_am entry. > (Note: prior to Postgres 7.0, a different scheme was used for registering > index-specific cost estimation functions.) > > The amcostestimate function is given a list of WHERE clauses that have > been determined to be usable with the index. It must return estimates > of the cost of accessing the index and the selectivity of the WHERE > clauses (that is, the fraction of main-table tuples that will be > retrieved during the index scan). For simple cases, nearly all the > work of the cost estimator can be done by calling standard routines > in the optimizer --- the point of having an amcostestimate function is > to allow index access methods to provide index-type-specific knowledge, > in case it is possible to improve on the standard estimates. > > Each amcostestimate function must have the signature > > void > amcostestimate (Query *root, > RelOptInfo *rel, > IndexOptInfo *index, > List *indexQuals, > Cost *indexAccessCost, > Selectivity *indexSelectivity); > > The first four parameters are inputs: > > root The query being processed > rel The relation the index is on > index The index itself > indexQuals List of index qual clauses (implicitly ANDed); > NIL list indicates no qualifiers are available > > The last two parameters are pass-by-reference outputs: > > *indexAccessCost Set to cost of index processing > *indexSelectivity Set to index selectivity > > Note that cost estimate functions must be written in C, not in SQL or > any available procedural language, because they must access internal > data structures of the planner/optimizer. > > The indexAccessCost should be computed in the units used by > src/backend/optimizer/path/costsize.c: a disk block fetch has cost 1.0, > and the cost of processing one index tuple should usually be taken as > cpu_index_page_weight (which is a user-adjustable optimizer parameter). > The access cost should include all disk and CPU costs associated with > scanning the index itself, but NOT the cost of retrieving or processing > the main-table tuples that are identified by the index. > > The indexSelectivity should be set to the estimated fraction of the main > table tuples that will be retrieved during the index scan. In the case > of a lossy index, this will typically be higher than the fraction of > tuples that actually pass the given qual conditions. > > A typical cost estimator will proceed as follows: > > 1. Estimate and return the fraction of main-table tuples that will be visited > based on the given qual conditions. In the absence of any index-type-specific > knowledge, use the standard optimizer function clauselist_selec(): > > *indexSelectivity = clauselist_selec(root, indexQuals); > > 2. Estimate the number of index tuples that will be visited during the > scan. For many index types this is the same as indexSelectivity times > the number of tuples in the index, but it might be more. (Note that the > index's size in pages and tuples is available from the IndexOptInfo struct.) > > 3. Estimate the number of index pages that will be retrieved during the scan. > This might be just indexSelectivity times the index's size in pages. > > 4. Compute the index access cost as > > *indexAccessCost = numIndexPages + cpu_index_page_weight * numIndexTuples; > > Examples of cost estimator functions can be found in > src/backend/utils/adt/selfuncs.c. > > By convention, the pg_proc entry for an amcostestimate function should show > > prorettype = 0 > pronargs = 6 > proargtypes = 0 0 0 0 0 0 > > We use zero ("opaque") for all the arguments since none of them have types > that are known in pg_type. > > ************ > -- Bruce Momjian | http://www.op.net/~candle pgman@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <pgman@candle.pha.pa.us> writes: > Has this been dealt with? >> I have written the attached bit of doco about the new index cost >> estimator procedure definition, but I am not sure where to put it. Yes, that's chapter 44 these days. >> Also, doc/src/graphics/catalogs.ag needs to be updated, but I have >> no idea how. (The amopselect and amopnpages fields of pg_amop >> are gone; pg_am has a new field amcostestimate.) This part is not done, though --- we still seem to have the old fields in the catalogs graphic. regards, tom lane
> >> Also, doc/src/graphics/catalogs.ag needs to be updated, but I have > >> no idea how. (The amopselect and amopnpages fields of pg_amop > >> are gone; pg_am has a new field amcostestimate.) > This part is not done, though --- we still seem to have the old fields > in the catalogs graphic. I updated the graphic at least a little bit (the original Applix version anyway; not sure if it updated the .gif now that I think about it). But I suspect that I didn't fix it up enough. - Thomas