Re: Additional improvements to extended statistics - Mailing list pgsql-hackers
From | Dean Rasheed |
---|---|
Subject | Re: Additional improvements to extended statistics |
Date | |
Msg-id | CAEZATCVE2Jx1QB0FFazvgjKAY25tv-EWX9C5XSefEZY-gxn75A@mail.gmail.com Whole thread Raw |
In response to | Re: Additional improvements to extended statistics (Tomas Vondra <tomas.vondra@2ndquadrant.com>) |
Responses |
Re: Additional improvements to extended statistics
|
List | pgsql-hackers |
On Thu, 12 Nov 2020 at 14:18, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote: > > Here is an improved WIP version of the patch series, modified to address > the issue with repeatedly applying the extended statistics, as discussed > with Dean in this thread. It's a bit rough and not committable, but I > need some feedback so I'm posting it in this state. As it stands, it doesn't compile if 0003 is applied, because it missed one of the callers of clauselist_selectivity_simple(), but that's easily fixed. > 0001 is the original patch improving estimates of OR clauses > > 0002 adds thin wrappers for clause[list]_selectivity, with "internal" > functions allowing to specify whether to keep considering extended stats > > 0003 does the same for the "simple" functions > > > I've kept it like this to demonstrate that 0002 is not sufficient. In my > response from March 24 I wrote this: > > > Isn't it the case that clauselist_selectivity_simple (and the OR > > variant) should ignore extended stats entirely? That is, we'd need > > to add a flag (or _simple variant) to clause_selectivity, so that it > > calls causelist_selectivity_simple_or. > But that's actually wrong, as 0002 shows (as it breaks a couple of > regression tests), because of the way we handle OR clauses. At the top > level, an OR-clause is actually just a single clause and it may get > passed to clauselist_selectivity_simple. So entirely disabling extended > stats for the "simple" functions would also mean disabling extended > stats for a large number of OR clauses. Which is clearly wrong. > > So 0003 addresses that, by adding a flag to the two "simple" functions. > Ultimately, this should probably do the same thing as 0002 and add thin > wrappers, because the existing functions are part of the public API. I agree that, taken together, these patches fix the multiple-extended-stats-evaluation issue. However: I think this has ended up with too many variants of these functions, since we now have "_internal" and "_simple" variants, and you're proposing adding more. The original purpose of the "_simple" variants was to compute selectivities without looking at extended stats, and now the "_internal" variants compute selectivities with an additional "use_extended_stats" flag to control whether or not to look at extended stats. Thus they're basically the same, and could be rolled together. Additionally, it's worth noting that the "_simple" variants expose the "estimatedclauses" bitmap as an argument, which IMO is a bit messy as an API. All callers of the "_simple" functions outside of clausesel.c actually pass in estimatedclauses=NULL, so it's possible to refactor and get rid of that, turning estimatedclauses into a purely internal variable. Also, it's quite messy that clauselist_selectivity_simple_or() needs to be passed a Selectivity input (the final argument) that is the selectivity of any already-estimated clauses, or the value to return if no not-already-estimated clauses are found, and must be 0.0 when called from the extended stats code. Attached is the kind of thing I had in mind (as a single patch, since I don't think it's worth splitting up). This replaces the "_simple" and "_internal" variants of these functions with "_opt_ext_stats" variants whose signatures match the originals except for having the single extra "use_extended_stats" boolean parameter. Additionally, the "_simple" functions are merged into the originals (making them more like they were in PG11) so that the "estimatedclauses" bitmap and partial-OR-list Selectivity become internal details, no longer exposed in the API. Regards, Dean
Attachment
pgsql-hackers by date: