Re: distinct estimate of a hard-coded VALUES list - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: distinct estimate of a hard-coded VALUES list |
Date | |
Msg-id | 10078.1471955305@sss.pgh.pa.us Whole thread Raw |
In response to | Re: distinct estimate of a hard-coded VALUES list (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Responses |
Re: [HACKERS] distinct estimate of a hard-coded VALUES list
|
List | pgsql-hackers |
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes: > On 08/22/2016 07:42 PM, Alvaro Herrera wrote: >> Also, if we patch it this way and somebody has a slow query because of a >> lot of duplicate values, it's easy to solve the problem by >> de-duplicating. But with the current code, people that have the >> opposite problem has no way to work around it. > I certainly agree it's better when a smart user can fix his query plan > by deduplicating the values than when we end up generating a poor plan > due to assuming some users are somewhat dumb. Well, that seems to be the majority opinion, so attached is a patch to do it. Do we want to sneak this into 9.6, or wait? > I wonder how expensive would it be to actually count the number of > distinct values - there certainly are complex data types where the > comparisons are fairly expensive, but I would not expect those to be > used in explicit VALUES lists. You'd have to sort the values before de-duping, and deal with VALUES expressions that aren't simple literals. And you'd have to do it a lot --- by my count, get_variable_numdistinct is invoked six times on the Var representing the VALUES output in Jeff's example query. Maybe you could cache the results somewhere, but that adds even more complication. Given the lack of prior complaints about this, it's hard to believe it's worth the trouble. regards, tom lane diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c index 56943f2..43e6220 100644 *** a/src/backend/utils/adt/selfuncs.c --- b/src/backend/utils/adt/selfuncs.c *************** get_variable_numdistinct(VariableStatDat *** 4768,4773 **** --- 4768,4784 ---- */ stadistinct = 2.0; } + else if (vardata->rel && vardata->rel->rtekind == RTE_VALUES) + { + /* + * If the Var represents a column of a VALUES RTE, assume it's unique. + * This could of course be very wrong, but it should tend to be true + * in well-written queries. We could consider examining the RTE + * contents to get some real statistics; but that only works if the + * expressions are constants, and it would be pretty expensive anyway. + */ + stadistinct = -1.0; /* unique (and all non null) */ + } else { /* diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h index fcfb0d4..25715c9 100644 *** a/src/include/nodes/relation.h --- b/src/include/nodes/relation.h *************** typedef struct PlannerInfo *** 399,405 **** * * relid - RTE index (this is redundant with the relids field, but * is provided for convenience of access) ! * rtekind - distinguishes plain relation, subquery, or function RTE * min_attr, max_attr - range of valid AttrNumbers for rel * attr_needed - array of bitmapsets indicating the highest joinrel * in which each attribute is needed; if bit 0 is set then --- 399,405 ---- * * relid - RTE index (this is redundant with the relids field, but * is provided for convenience of access) ! * rtekind - copy of RTE's rtekind field * min_attr, max_attr - range of valid AttrNumbers for rel * attr_needed - array of bitmapsets indicating the highest joinrel * in which each attribute is needed; if bit 0 is set then *************** typedef struct RelOptInfo *** 512,518 **** /* information about a base rel (not set for join rels!) */ Index relid; Oid reltablespace; /* containing tablespace */ ! RTEKind rtekind; /* RELATION, SUBQUERY, or FUNCTION */ AttrNumber min_attr; /* smallest attrno of rel (often <0) */ AttrNumber max_attr; /* largest attrno of rel */ Relids *attr_needed; /* array indexed [min_attr .. max_attr] */ --- 512,518 ---- /* information about a base rel (not set for join rels!) */ Index relid; Oid reltablespace; /* containing tablespace */ ! RTEKind rtekind; /* RELATION, SUBQUERY, FUNCTION, etc */ AttrNumber min_attr; /* smallest attrno of rel (often <0) */ AttrNumber max_attr; /* largest attrno of rel */ Relids *attr_needed; /* array indexed [min_attr .. max_attr] */
pgsql-hackers by date: