Re: [HACKERS] PATCH: multivariate histograms and MCV lists - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: [HACKERS] PATCH: multivariate histograms and MCV lists |
Date | |
Msg-id | b423b2fe-8aff-d64c-152a-8c77cf45faa7@2ndquadrant.com Whole thread Raw |
In response to | Re: [HACKERS] PATCH: multivariate histograms and MCV lists (Dean Rasheed <dean.a.rasheed@gmail.com>) |
Responses |
Re: [HACKERS] PATCH: multivariate histograms and MCV lists
|
List | pgsql-hackers |
On 03/28/2018 04:12 PM, Dean Rasheed wrote: > On 28 March 2018 at 01:34, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: >> Attached is a patch fixing this. In the end I've decided to keep both >> branches - one handling boolean Vars and one for NOT clauses. I think >> you're right we can only see (NOT var) cases, but I'm not sure about that. >> >> For example, what if an operator does not have a negator? Then we can't >> transform NOT (a AND b) => (NOT a OR NOT b), I guess. So I kept this for >> now, and we can remove this later. >> > > OK, but it's going to have to work harder to set "fullmatch" > correctly. If you have a boolean Var clause, which is identical to > "bool_var = true", it ought to add to "eqmatches" if true is found in > the MCV list. Likewise a boolean Var under a NOT clause is identical > to "bool_var = false", so it ought to add to "eqmatches" if false is > found in the MCV list. Both those cases would be easy to handle, if > general NOT support wasn't required, and you just special-cased "NOT > bool_var". > > If you're going to handle the general case of an arbitrary clause > under a NOT, then the recursive call to mcv_update_match_bitmap() > would seem to need to know that it's under a NOT (a new "is_not" > parameter?), to invert the logic around adding to "eqmatches". That > applies to other general OpExpr's too -- for example, "NOT (box_var = > ?)" won't be rewritten because there is no box_ne operator, but when > mcv_update_match_bitmap() is called recursively with the "box_var = > ?", it shouldn't add to "eqmatches", despite this being an EQSEL > operator. > > As mentioned before, I think this whole thing only works if > mcv_update_match_bitmap() returns the "eqmatches" list, so that if it > is called recursively, it can be merged with the caller's list. What > isn't immediately obvious to me is what happens to a NOT clause under > another NOT clause, possibly with an AND or OR in-between. Would the > "is_not" parameter just flip back to false again? > After thinking about this a bit more, I'm not sure if updating the info based on recursive calls makes sense. The fullmatch flag was supposed to answer a simple question - can there be just a single matching item? If there are equality conditions on all columns, there can be just a single matching item - if we have found it in the MCV (i.e. s1 > 0.0), then we don't need to inspect the non-MCV part. But handling this in recursive manner breaks this entirely, because with something like (a=1) AND (b=1 OR b=2) you suddenly can have multiple matching items. Which makes the fullmatch flag somewhat useless. So I think we should be looking at top-level equality clauses only, just like number_of_groups() does. > There's also an interesting question around the NullTest clause. Since > NULLs are being recorded in the MCV list, shouldn't "IS NULL" be > treated as semantically like an equality clause, and cause that > attribute to be added to "eqmatches" if NULL is found in the MCV list? > > >> I've also realized that the "fullmatch" flag is somewhat confused, >> because some places interpreted it as "there is equality on each >> attribute" but in fact it also required an actual MCV match. > > Yes, I was having similar thoughts. I think "eqmatches" / "fullmatch" > probably just wants to track whether there was an exact comparison on > all the attributes, not whether or not the value was in the MCV list, > because the latter is already available in the "matches" bitmap. > Knowing that complete, exact comparisons happened, and it wasn't in > the MCV list, makes the "(1 - mcv_totalsel)) / otherdistinct" estimate > reasonable. > I think we can remove the fullmatch flag from mcv_update_bitmap entirely. All we need to know is the presence of equality clauses and whether there was a match in MCV (which we know from s1 > 0.0). > However, I don't think that tracking "eqmatches" or "fullmatch" is > sufficient for the general case. For example, for other operators like > "!=", "<", "<=", all (or maybe half) the "1 - mcv_totalsel" ought to > count towards the selectivity, plus possibly part of the MCV list > (e.g., for "<=", using the sum of the matching MCV frequencies plus > half the sum of the non-MCV frequencies might be reasonable -- c.f. > scalarineqsel()). For an OR clause, you might want to count the number > of non-MCV matches, because logically each one adds another "(1 - > mcv_totalsel)) / otherdistinct" to the total selectivity. It's not > immediately obvious how that can be made to fit into the current code > structure. Perhaps it could be made to work by tracking the overall > selectivity as it goes along. Or perhaps it could track the > count/proportion of non-MCV matches. > Yes, ignoring the non-equality clauses in 0002 is wrong - that's pretty much why it's WIP and not merged into 0001. thanks for the feedback -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: