Thread: Enhanced containment selectivity function
Hi, I've recently had problems with slow queries caused by the selectivity of the <@ ltree operator, as you may see in my post here: http://archives.postgresql.org/pgsql-performance/2005-07/msg00473.php Someone on IRC (AndrewSN if I'm not wrong) pointed out that the restriction selectivity function for <@ is contsel, which returns a constant value of 0.001. So I started digging in the source code trying to understand how the default behaviour could be enhanced, and ended up writing a little patch which adds an alternative containment selectivity function (called "contstatsel") which is able to deliver better results. This first version is based on the eqsel function and uses only histogram values to calculate the selectivity and uses the 0.001 constant as a fallback. This also made me think: is there a reason why geometric selectivity functions return constant values rather than checking statistics for a better result? Attached you will find a patch suitable for current CVS HEAD. My C skills are a bit rusty and my knowledge of pg internals are very poor, so I'm sure it could be improved and modified to better fit the pg coding standards. Here are the results on a slow query: test=# EXPLAIN ANALYZE SELECT * FROM gw_users JOIN gw_batches USING (u_id) WHERE tree <@ '1041' AND t_stamp > '2005-07-01'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------ Nested Loop (cost=0.00..553.02 rows=8 width=364) (actual time=2.423..19787.259 rows=6785 loops=1) -> Index Scan using gw_users_gisttree_key on gw_users (cost=0.00..21.63 rows=5 width=156) (actual time=0.882..107.434 rows=4696 loops=1) Index Cond: (tree <@ '1041'::ltree) -> Index Scan using gw_batches_t_stamp_u_id_key on gw_batches (cost=0.00..106.09 rows=15 width=212) (actual time=3.898..4.171 rows=1 loops=4696) Index Cond: ((gw_batches.t_stamp > '2005-07-01 00:00:00+02'::timestamp with time zone) AND ("outer".u_id = gw_batches.u_id)) Total runtime: 19805.447 ms (6 rows) test=# EXPLAIN ANALYZE SELECT * FROM gw_users JOIN gw_batches USING (u_id) WHERE tree <<@ '1041' AND t_stamp > '2005-07-01'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------- Hash Join (cost=245.26..1151.80 rows=7671 width=364) (actual time=69.562..176.966 rows=6785 loops=1) Hash Cond: ("outer".u_id = "inner".u_id) -> Bitmap Heap Scan on gw_batches (cost=57.74..764.39 rows=8212 width=212) (actual time=8.330..39.542 rows=7819 loops=1) Recheck Cond: (t_stamp > '2005-07-01 00:00:00+02'::timestamp with time zone) -> Bitmap Index Scan on gw_batches_t_stamp_u_id_key (cost=0.00..57.74 rows=8212 width=0) (actual time=8.120..8.120 rows=7819 loops=1) Index Cond: (t_stamp > '2005-07-01 00:00:00+02'::timestamp with time zone) -> Hash (cost=175.79..175.79 rows=4692 width=156) (actual time=61.046..61.046 rows=4696 loops=1) -> Seq Scan on gw_users (cost=0.00..175.79 rows=4692 width=156) (actual time=0.083..34.200 rows=4696 loops=1) Filter: (tree <<@ '1041'::ltree) Total runtime: 194.621 ms (10 rows) The second query uses a custom <<@ operator I added to test the alternative selectivity function: CREATE FUNCTION contstatsel(internal, oid, internal, integer) RETURNS double precision AS 'contstatsel' LANGUAGE internal; CREATE OPERATOR <<@ ( LEFTARG = ltree, LEFTARG = ltree, PROCEDURE = ltree_risparent, COMMUTATOR = '@>', RESTRICT = contstatsel, JOIN = contjoinsel ); Of course any comments/feedback are welcome. Best regards -- Matteo Beccati http://phpadsnew.com/ http://phppgads.com/ Index: src/backend/utils/adt/selfuncs.c =================================================================== RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/selfuncs.c,v retrieving revision 1.187 diff -r1.187 selfuncs.c 1309a1310,1433 > * contstatsel - Selectivity of containment for any data types. > */ > Datum > contstatsel(PG_FUNCTION_ARGS) > { > PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0); > Oid operator = PG_GETARG_OID(1); > List *args = (List *) PG_GETARG_POINTER(2); > int varRelid = PG_GETARG_INT32(3); > VariableStatData vardata; > Node *other; > bool varonleft; > Datum *values; > int nvalues; > double selec = 0.0; > > /* > * If expression is not variable = something or something = variable, > * then punt and return a default estimate. > */ > if (!get_restriction_variable(root, args, varRelid, > &vardata, &other, &varonleft)) > PG_RETURN_FLOAT8(0.001); > > /* > * If the something is a NULL constant, assume operator is strict and > * return zero, ie, operator will never return TRUE. > */ > if (IsA(other, Const) && > ((Const *) other)->constisnull) > { > ReleaseVariableStats(vardata); > PG_RETURN_FLOAT8(0.0); > } > > if (HeapTupleIsValid(vardata.statsTuple)) > { > Form_pg_statistic stats; > > stats = (Form_pg_statistic) GETSTRUCT(vardata.statsTuple); > > if (IsA(other, Const)) > { > /* Variable is being compared to a known non-null constant */ > Datum constval = ((Const *) other)->constvalue; > bool match = false; > int i; > > elog(INFO, "Checking histogram"); > > /* > * Is the constant "=" to any of the column's most common > * values? (Although the given operator may not really be > * "=", we will assume that seeing whether it returns TRUE is > * an appropriate test. If you don't like this, maybe you > * shouldn't be using eqsel for your operator...) > */ > if (get_attstatsslot(vardata.statsTuple, > vardata.atttype, vardata.atttypmod, > STATISTIC_KIND_HISTOGRAM, InvalidOid, > &values, &nvalues, > NULL, NULL)) > { > FmgrInfo contproc; > > fmgr_info(get_opcode(operator), &contproc); > > elog(INFO, "Found %d values", nvalues); > > for (i = 0; i < nvalues; i++) > { > /* be careful to apply operator right way 'round */ > if (varonleft) > match = DatumGetBool(FunctionCall2(&contproc, > values[i], > constval)); > else > match = DatumGetBool(FunctionCall2(&contproc, > constval, > values[i])); > if (match) > selec++; > } > > if (selec > 0.0 && nvalues > 0) > { > selec /= nvalues; > } > > elog(INFO, "Computed selectivity: %04.3f", selec); > > } > else > { > elog(INFO, "No histogram info"); > > /* no most-common-value info available */ > values = NULL; > i = nvalues = 0; > } > > if (!selec) > selec = 0.001; > > free_attstatsslot(vardata.atttype, values, nvalues, > NULL, 0); > } > else > selec = 0.001; > } > else > selec = 0.001; > > ReleaseVariableStats(vardata); > > /* result should be in range, but make sure... */ > CLAMP_PROBABILITY(selec); > > elog(INFO, "Returned selectivity: %04.3f", selec); > > PG_RETURN_FLOAT8((float8) selec); > } > > /* Index: src/include/catalog/pg_proc.h =================================================================== RCS file: /projects/cvsroot/pgsql/src/include/catalog/pg_proc.h,v retrieving revision 1.380 diff -r1.380 pg_proc.h 3752a3753,3755 > DATA(insert OID = 2600 ( contstatsel PGNSP PGUID 12 f f t f s 4 701 "2281 26 2281 23" _null_ _null_ _null_ contstatsel - _null_ )); > DESCR("enhanced restriction selectivity for containment comparison operators"); > Index: src/include/utils/selfuncs.h =================================================================== RCS file: /projects/cvsroot/pgsql/src/include/utils/selfuncs.h,v retrieving revision 1.23 diff -r1.23 selfuncs.h 97a98,99 > extern Datum contstatsel(PG_FUNCTION_ARGS); >
Matteo Beccati <php@beccati.com> writes: > This also made me think: is there a reason why geometric selectivity > functions return constant values rather than checking statistics for a > better result? Because no one's ever bothered to work on them. You should talk to the PostGIS guys, however, because they *have* been working on real statistics and real estimation functions for geometric types. It'd be nice to get some of that work back-ported into the core database. http://www.postgis.org/ regards, tom lane
Matteo Beccati <php@beccati.com> writes: > Someone on IRC (AndrewSN if I'm not wrong) pointed out that the > restriction selectivity function for <@ is contsel, which returns a > constant value of 0.001. So I started digging in the source code trying > to understand how the default behaviour could be enhanced, and ended up > writing a little patch which adds an alternative containment selectivity > function (called "contstatsel") which is able to deliver better results. After looking at this a little, it doesn't seem like it has much to do with the ordinary 2-D notion of containment. In most of the core geometric types, the "histogram" ordering is based on area, and so testing the histogram samples against the query doesn't seem like it's able to give very meaningful containment results --- the items shown in the histogram could have any locations whatever. The approach might be sensible for ltree's isparent operator --- I don't have a very good feeling for the behavior of that operator, but it looks like it has at least some relationship to the ordering induced by the ltree < operator. So my thought is that (assuming Oleg and Teodor agree this is sensible for ltree) we should put the selectivity function into contrib/ltree, not directly into the core. It might be best to call it something like "parentsel", too, to avoid giving the impression that it has something to do with 2-D containment. Also, you should think about using the most-common-values list as well as the histogram. I would guess that many ltree applications would have enough duplicate entries that the MCV list represents a significant fraction of the total population. Keep in mind when thinking about this that the histogram describes the population of data *exclusive of the MCV entries*. regards, tom lane
Tom Lane wrote: > After looking at this a little, it doesn't seem like it has much to do > with the ordinary 2-D notion of containment. In most of the core > geometric types, the "histogram" ordering is based on area, and so > testing the histogram samples against the query doesn't seem like it's > able to give very meaningful containment results --- the items shown > in the histogram could have any locations whatever. > > The approach might be sensible for ltree's isparent operator --- I don't > have a very good feeling for the behavior of that operator, but it looks > like it has at least some relationship to the ordering induced by the > ltree < operator. Actually, this was one of my doubts. The custom function seem to work well with ltree, but this also could be dependant from the way my dataset is organized. > So my thought is that (assuming Oleg and Teodor agree this is sensible > for ltree) we should put the selectivity function into contrib/ltree, > not directly into the core. It might be best to call it something like > "parentsel", too, to avoid giving the impression that it has something > to do with 2-D containment. > > Also, you should think about using the most-common-values list as well > as the histogram. I would guess that many ltree applications would have > enough duplicate entries that the MCV list represents a significant > fraction of the total population. Keep in mind when thinking about this > that the histogram describes the population of data *exclusive of the > MCV entries*. I also agree that "parentsel" would better fit its purpose. My patch was originally using MCV without good results, until I realized that MCV was empty because the column contains unique values :) I'll look into adding a MCV check to it. Moving it in contrib/ltree would be more difficult to me because it depends on other functions declared in selfuncs.c (get_restriction_variable, etc). Thank you for your feedback Best regards -- Matteo Beccati http://phpadsnew.com/ http://phppgads.com/
Matteo Beccati <php@beccati.com> writes: > Moving it in contrib/ltree would be more difficult to me because it > depends on other functions declared in selfuncs.c > (get_restriction_variable, etc). I'd be willing to consider exporting those functions from selfuncs.c. regards, tom lane
Hi, >>Moving it in contrib/ltree would be more difficult to me because it >>depends on other functions declared in selfuncs.c >>(get_restriction_variable, etc). > > I'd be willing to consider exporting those functions from selfuncs.c. In the meanwhile here is the latest patch which uses both mcv and histogram values. BTW, when restoring my test database I've found out that there were many errors on ALTER INDEX "something" OWNER TO ... : ERROR: "something" is not a table, view, or sequence This using 8.1devel pg_restore and a 8.0.3 compressed dump. I could be wrong, but I didn't get those errors a few days ago (some cvs updates ago). Best regards -- Matteo Beccati http://phpadsnew.com/ http://phppgads.com/ Index: contrib/ltree/ltree.sql.in =================================================================== RCS file: /projects/cvsroot/pgsql/contrib/ltree/ltree.sql.in,v retrieving revision 1.9 diff -c -r1.9 ltree.sql.in *** contrib/ltree/ltree.sql.in 30 Mar 2004 15:45:32 -0000 1.9 --- contrib/ltree/ltree.sql.in 6 Aug 2005 13:10:35 -0000 *************** *** 230,236 **** RIGHTARG = ltree, PROCEDURE = ltree_isparent, COMMUTATOR = '<@', ! RESTRICT = contsel, JOIN = contjoinsel ); --- 230,236 ---- RIGHTARG = ltree, PROCEDURE = ltree_isparent, COMMUTATOR = '<@', ! RESTRICT = parentsel, JOIN = contjoinsel ); *************** *** 248,254 **** RIGHTARG = ltree, PROCEDURE = ltree_risparent, COMMUTATOR = '@>', ! RESTRICT = contsel, JOIN = contjoinsel ); --- 248,254 ---- RIGHTARG = ltree, PROCEDURE = ltree_risparent, COMMUTATOR = '@>', ! RESTRICT = parentsel, JOIN = contjoinsel ); Index: src/backend/utils/adt/selfuncs.c =================================================================== RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/selfuncs.c,v retrieving revision 1.187 diff -c -r1.187 selfuncs.c *** src/backend/utils/adt/selfuncs.c 21 Jul 2005 04:41:43 -0000 1.187 --- src/backend/utils/adt/selfuncs.c 6 Aug 2005 13:10:46 -0000 *************** *** 1306,1311 **** --- 1306,1488 ---- return (Selectivity) selec; } + #define DEFAULT_PARENT_SEL 0.001 + + /* + * parentsel - Selectivity of parent relationship for ltree data types. + */ + Datum + parentsel(PG_FUNCTION_ARGS) + { + PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0); + Oid operator = PG_GETARG_OID(1); + List *args = (List *) PG_GETARG_POINTER(2); + int varRelid = PG_GETARG_INT32(3); + VariableStatData vardata; + Node *other; + bool varonleft; + Datum *values; + int nvalues; + float4 *numbers; + int nnumbers; + double selec = 0.0; + + /* + * If expression is not variable <@ something or something <@ variable, + * then punt and return a default estimate. + */ + if (!get_restriction_variable(root, args, varRelid, + &vardata, &other, &varonleft)) + PG_RETURN_FLOAT8(DEFAULT_PARENT_SEL); + + /* + * If the something is a NULL constant, assume operator is strict and + * return zero, ie, operator will never return TRUE. + */ + if (IsA(other, Const) && + ((Const *) other)->constisnull) + { + ReleaseVariableStats(vardata); + PG_RETURN_FLOAT8(0.0); + } + + if (HeapTupleIsValid(vardata.statsTuple)) + { + Form_pg_statistic stats; + double mcvsum = 0.0; + double mcvsel = 0.0; + double hissel = 0.0; + + stats = (Form_pg_statistic) GETSTRUCT(vardata.statsTuple); + + if (IsA(other, Const)) + { + /* Variable is being compared to a known non-null constant */ + Datum constval = ((Const *) other)->constvalue; + bool match = false; + int i; + + /* + * Is the constant "<@" to any of the column's most common + * values? + */ + if (get_attstatsslot(vardata.statsTuple, + vardata.atttype, vardata.atttypmod, + STATISTIC_KIND_MCV, InvalidOid, + &values, &nvalues, + &numbers, &nnumbers)) + { + FmgrInfo contproc; + + fmgr_info(get_opcode(operator), &contproc); + + for (i = 0; i < nvalues; i++) + { + /* be careful to apply operator right way 'round */ + if (varonleft) + match = DatumGetBool(FunctionCall2(&contproc, + values[i], + constval)); + else + match = DatumGetBool(FunctionCall2(&contproc, + constval, + values[i])); + + /* calculate total selectivity of all most-common-values */ + mcvsum += numbers[i]; + + /* calculate selectivity of matching most-common-values */ + if (match) + mcvsel += numbers[i]; + } + } + else + { + /* no most-common-values info available */ + values = NULL; + numbers = NULL; + i = nvalues = nnumbers = 0; + } + + free_attstatsslot(vardata.atttype, values, nvalues, + NULL, 0); + + + /* + * Is the constant "<@" to any of the column's histogram + * values? + */ + if (get_attstatsslot(vardata.statsTuple, + vardata.atttype, vardata.atttypmod, + STATISTIC_KIND_HISTOGRAM, InvalidOid, + &values, &nvalues, + NULL, NULL)) + { + FmgrInfo contproc; + + fmgr_info(get_opcode(operator), &contproc); + + for (i = 0; i < nvalues; i++) + { + /* be careful to apply operator right way 'round */ + if (varonleft) + match = DatumGetBool(FunctionCall2(&contproc, + values[i], + constval)); + else + match = DatumGetBool(FunctionCall2(&contproc, + constval, + values[i])); + /* count matching histogram values */ + if (match) + hissel++; + } + + if (hissel > 0.0) + { + /* + * some matching values found inside histogram, divide matching entries number + * by total histogram entries to get the histogram related selectivity + */ + hissel /= nvalues; + } + } + else + { + /* no histogram info available */ + values = NULL; + i = nvalues = 0; + } + + free_attstatsslot(vardata.atttype, values, nvalues, + NULL, 0); + + + /* + * calculate selectivity based on MCV and histogram result + * histogram selectivity needs to be scaled down if there are any most-common-values + */ + selec = mcvsel + hissel * (1.0 - mcvsum); + + /* don't return 0.0 selectivity unless all table values are inside mcv */ + if (selec == 0.0 && mcvsum != 1.0) + selec = DEFAULT_PARENT_SEL; + } + else + selec = DEFAULT_PARENT_SEL; + } + else + selec = DEFAULT_PARENT_SEL; + + + ReleaseVariableStats(vardata); + + /* result should be in range, but make sure... */ + CLAMP_PROBABILITY(selec); + + PG_RETURN_FLOAT8((float8) selec); + } + /* * eqjoinsel - Join selectivity of "=" */ Index: src/include/catalog/pg_proc.h =================================================================== RCS file: /projects/cvsroot/pgsql/src/include/catalog/pg_proc.h,v retrieving revision 1.380 diff -c -r1.380 pg_proc.h *** src/include/catalog/pg_proc.h 2 Aug 2005 16:11:57 -0000 1.380 --- src/include/catalog/pg_proc.h 6 Aug 2005 13:10:59 -0000 *************** *** 3750,3755 **** --- 3750,3758 ---- DATA(insert OID = 2592 ( gist_circle_compress PGNSP PGUID 12 f f t f i 1 2281 "2281" _null_ _null_ _null_ gist_circle_compress- _null_ )); DESCR("GiST support"); + DATA(insert OID = 2600 ( parentsel PGNSP PGUID 12 f f t f s 4 701 "2281 26 2281 23" _null_ _null_ _null_ parentsel - _null_ )); + DESCR("enhanced restriction selectivity for ltree isparent comparison operators"); + /* * Symbolic values for provolatile column: these indicate whether the result Index: src/include/utils/selfuncs.h =================================================================== RCS file: /projects/cvsroot/pgsql/src/include/utils/selfuncs.h,v retrieving revision 1.23 diff -c -r1.23 selfuncs.h *** src/include/utils/selfuncs.h 5 Jun 2005 22:32:58 -0000 1.23 --- src/include/utils/selfuncs.h 6 Aug 2005 13:11:00 -0000 *************** *** 95,100 **** --- 95,102 ---- extern Datum nlikesel(PG_FUNCTION_ARGS); extern Datum icnlikesel(PG_FUNCTION_ARGS); + extern Datum parentsel(PG_FUNCTION_ARGS); + extern Datum eqjoinsel(PG_FUNCTION_ARGS); extern Datum neqjoinsel(PG_FUNCTION_ARGS); extern Datum scalarltjoinsel(PG_FUNCTION_ARGS);
This has been saved for the 8.2 release: http://momjian.postgresql.org/cgi-bin/pgpatches_hold --------------------------------------------------------------------------- Matteo Beccati wrote: > Hi, > > >>Moving it in contrib/ltree would be more difficult to me because it > >>depends on other functions declared in selfuncs.c > >>(get_restriction_variable, etc). > > > > I'd be willing to consider exporting those functions from selfuncs.c. > > In the meanwhile here is the latest patch which uses both mcv and > histogram values. > > > BTW, when restoring my test database I've found out that there were many > errors on ALTER INDEX "something" OWNER TO ... : > > ERROR: "something" is not a table, view, or sequence > > This using 8.1devel pg_restore and a 8.0.3 compressed dump. I could be > wrong, but I didn't get those errors a few days ago (some cvs updates ago). > > > Best regards > -- > Matteo Beccati > http://phpadsnew.com/ > http://phppgads.com/ > Index: contrib/ltree/ltree.sql.in > =================================================================== > RCS file: /projects/cvsroot/pgsql/contrib/ltree/ltree.sql.in,v > retrieving revision 1.9 > diff -c -r1.9 ltree.sql.in > *** contrib/ltree/ltree.sql.in 30 Mar 2004 15:45:32 -0000 1.9 > --- contrib/ltree/ltree.sql.in 6 Aug 2005 13:10:35 -0000 > *************** > *** 230,236 **** > RIGHTARG = ltree, > PROCEDURE = ltree_isparent, > COMMUTATOR = '<@', > ! RESTRICT = contsel, > JOIN = contjoinsel > ); > > --- 230,236 ---- > RIGHTARG = ltree, > PROCEDURE = ltree_isparent, > COMMUTATOR = '<@', > ! RESTRICT = parentsel, > JOIN = contjoinsel > ); > > *************** > *** 248,254 **** > RIGHTARG = ltree, > PROCEDURE = ltree_risparent, > COMMUTATOR = '@>', > ! RESTRICT = contsel, > JOIN = contjoinsel > ); > > --- 248,254 ---- > RIGHTARG = ltree, > PROCEDURE = ltree_risparent, > COMMUTATOR = '@>', > ! RESTRICT = parentsel, > JOIN = contjoinsel > ); > > Index: src/backend/utils/adt/selfuncs.c > =================================================================== > RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/selfuncs.c,v > retrieving revision 1.187 > diff -c -r1.187 selfuncs.c > *** src/backend/utils/adt/selfuncs.c 21 Jul 2005 04:41:43 -0000 1.187 > --- src/backend/utils/adt/selfuncs.c 6 Aug 2005 13:10:46 -0000 > *************** > *** 1306,1311 **** > --- 1306,1488 ---- > return (Selectivity) selec; > } > > + #define DEFAULT_PARENT_SEL 0.001 > + > + /* > + * parentsel - Selectivity of parent relationship for ltree data types. > + */ > + Datum > + parentsel(PG_FUNCTION_ARGS) > + { > + PlannerInfo *root = (PlannerInfo *) PG_GETARG_POINTER(0); > + Oid operator = PG_GETARG_OID(1); > + List *args = (List *) PG_GETARG_POINTER(2); > + int varRelid = PG_GETARG_INT32(3); > + VariableStatData vardata; > + Node *other; > + bool varonleft; > + Datum *values; > + int nvalues; > + float4 *numbers; > + int nnumbers; > + double selec = 0.0; > + > + /* > + * If expression is not variable <@ something or something <@ variable, > + * then punt and return a default estimate. > + */ > + if (!get_restriction_variable(root, args, varRelid, > + &vardata, &other, &varonleft)) > + PG_RETURN_FLOAT8(DEFAULT_PARENT_SEL); > + > + /* > + * If the something is a NULL constant, assume operator is strict and > + * return zero, ie, operator will never return TRUE. > + */ > + if (IsA(other, Const) && > + ((Const *) other)->constisnull) > + { > + ReleaseVariableStats(vardata); > + PG_RETURN_FLOAT8(0.0); > + } > + > + if (HeapTupleIsValid(vardata.statsTuple)) > + { > + Form_pg_statistic stats; > + double mcvsum = 0.0; > + double mcvsel = 0.0; > + double hissel = 0.0; > + > + stats = (Form_pg_statistic) GETSTRUCT(vardata.statsTuple); > + > + if (IsA(other, Const)) > + { > + /* Variable is being compared to a known non-null constant */ > + Datum constval = ((Const *) other)->constvalue; > + bool match = false; > + int i; > + > + /* > + * Is the constant "<@" to any of the column's most common > + * values? > + */ > + if (get_attstatsslot(vardata.statsTuple, > + vardata.atttype, vardata.atttypmod, > + STATISTIC_KIND_MCV, InvalidOid, > + &values, &nvalues, > + &numbers, &nnumbers)) > + { > + FmgrInfo contproc; > + > + fmgr_info(get_opcode(operator), &contproc); > + > + for (i = 0; i < nvalues; i++) > + { > + /* be careful to apply operator right way 'round */ > + if (varonleft) > + match = DatumGetBool(FunctionCall2(&contproc, > + values[i], > + constval)); > + else > + match = DatumGetBool(FunctionCall2(&contproc, > + constval, > + values[i])); > + > + /* calculate total selectivity of all most-common-values */ > + mcvsum += numbers[i]; > + > + /* calculate selectivity of matching most-common-values */ > + if (match) > + mcvsel += numbers[i]; > + } > + } > + else > + { > + /* no most-common-values info available */ > + values = NULL; > + numbers = NULL; > + i = nvalues = nnumbers = 0; > + } > + > + free_attstatsslot(vardata.atttype, values, nvalues, > + NULL, 0); > + > + > + /* > + * Is the constant "<@" to any of the column's histogram > + * values? > + */ > + if (get_attstatsslot(vardata.statsTuple, > + vardata.atttype, vardata.atttypmod, > + STATISTIC_KIND_HISTOGRAM, InvalidOid, > + &values, &nvalues, > + NULL, NULL)) > + { > + FmgrInfo contproc; > + > + fmgr_info(get_opcode(operator), &contproc); > + > + for (i = 0; i < nvalues; i++) > + { > + /* be careful to apply operator right way 'round */ > + if (varonleft) > + match = DatumGetBool(FunctionCall2(&contproc, > + values[i], > + constval)); > + else > + match = DatumGetBool(FunctionCall2(&contproc, > + constval, > + values[i])); > + /* count matching histogram values */ > + if (match) > + hissel++; > + } > + > + if (hissel > 0.0) > + { > + /* > + * some matching values found inside histogram, divide matching entries number > + * by total histogram entries to get the histogram related selectivity > + */ > + hissel /= nvalues; > + } > + } > + else > + { > + /* no histogram info available */ > + values = NULL; > + i = nvalues = 0; > + } > + > + free_attstatsslot(vardata.atttype, values, nvalues, > + NULL, 0); > + > + > + /* > + * calculate selectivity based on MCV and histogram result > + * histogram selectivity needs to be scaled down if there are any most-common-values > + */ > + selec = mcvsel + hissel * (1.0 - mcvsum); > + > + /* don't return 0.0 selectivity unless all table values are inside mcv */ > + if (selec == 0.0 && mcvsum != 1.0) > + selec = DEFAULT_PARENT_SEL; > + } > + else > + selec = DEFAULT_PARENT_SEL; > + } > + else > + selec = DEFAULT_PARENT_SEL; > + > + > + ReleaseVariableStats(vardata); > + > + /* result should be in range, but make sure... */ > + CLAMP_PROBABILITY(selec); > + > + PG_RETURN_FLOAT8((float8) selec); > + } > + > /* > * eqjoinsel - Join selectivity of "=" > */ > Index: src/include/catalog/pg_proc.h > =================================================================== > RCS file: /projects/cvsroot/pgsql/src/include/catalog/pg_proc.h,v > retrieving revision 1.380 > diff -c -r1.380 pg_proc.h > *** src/include/catalog/pg_proc.h 2 Aug 2005 16:11:57 -0000 1.380 > --- src/include/catalog/pg_proc.h 6 Aug 2005 13:10:59 -0000 > *************** > *** 3750,3755 **** > --- 3750,3758 ---- > DATA(insert OID = 2592 ( gist_circle_compress PGNSP PGUID 12 f f t f i 1 2281 "2281" _null_ _null_ _null_ gist_circle_compress - _null_ )); > DESCR("GiST support"); > > + DATA(insert OID = 2600 ( parentsel PGNSP PGUID 12 f f t f s 4 701 "2281 26 2281 23" _null_ _null_ _null_ parentsel - _null_ )); > + DESCR("enhanced restriction selectivity for ltree isparent comparison operators"); > + > > /* > * Symbolic values for provolatile column: these indicate whether the result > Index: src/include/utils/selfuncs.h > =================================================================== > RCS file: /projects/cvsroot/pgsql/src/include/utils/selfuncs.h,v > retrieving revision 1.23 > diff -c -r1.23 selfuncs.h > *** src/include/utils/selfuncs.h 5 Jun 2005 22:32:58 -0000 1.23 > --- src/include/utils/selfuncs.h 6 Aug 2005 13:11:00 -0000 > *************** > *** 95,100 **** > --- 95,102 ---- > extern Datum nlikesel(PG_FUNCTION_ARGS); > extern Datum icnlikesel(PG_FUNCTION_ARGS); > > + extern Datum parentsel(PG_FUNCTION_ARGS); > + > extern Datum eqjoinsel(PG_FUNCTION_ARGS); > extern Datum neqjoinsel(PG_FUNCTION_ARGS); > extern Datum scalarltjoinsel(PG_FUNCTION_ARGS); > > > ---------------------------(end of broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce Momjian ha scritto: > Your patch has been added to the PostgreSQL unapplied patches list at: > > http://momjian.postgresql.org/cgi-bin/pgpatches > > It will be applied as soon as one of the PostgreSQL committers reviews > and approves it. Great. I would just like to remind that Tom said: > I'd be willing to consider exporting those functions from selfuncs.c. so that the selector function could be moved to contrib/ltree, which is its natural place. It could also be noted that a similar feature could be useful outside ltree: I guess there are plenty of cases when scanning statistics would give a better result than using a constant selectivity. Best regards -- Matteo Beccati http://phpadsnew.com http://phppgads.com
Matteo Beccati wrote: > Bruce Momjian ha scritto: > > Your patch has been added to the PostgreSQL unapplied patches list at: > > > > http://momjian.postgresql.org/cgi-bin/pgpatches > > > > It will be applied as soon as one of the PostgreSQL committers reviews > > and approves it. > > Great. I would just like to remind that Tom said: > > > I'd be willing to consider exporting those functions from selfuncs.c. > > so that the selector function could be moved to contrib/ltree, which is > its natural place. > > It could also be noted that a similar feature could be useful outside > ltree: I guess there are plenty of cases when scanning statistics would > give a better result than using a constant selectivity. Yes, I talked to Tom today about moving that stuff into /contrib. I will work on the patch to do that. -- Bruce Momjian http://candle.pha.pa.us SRA OSS, Inc. http://www.sraoss.com + If your life is a hard drive, Christ can be your backup. +