Re: PoC/WIP: Extended statistics on expressions - Mailing list pgsql-hackers
From | Dean Rasheed |
---|---|
Subject | Re: PoC/WIP: Extended statistics on expressions |
Date | |
Msg-id | CAEZATCXAPrrOKwEsyZKQ4uzzJQWBCt6QAvOcgqRGdWwT1zb+rQ@mail.gmail.com Whole thread Raw |
In response to | Re: PoC/WIP: Extended statistics on expressions (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Responses |
Re: PoC/WIP: Extended statistics on expressions
|
List | pgsql-hackers |
On Fri, 11 Dec 2020 at 20:17, Tomas Vondra <tomas.vondra@enterprisedb.com> wrote: > > OK. Attached is an updated version, reworking it this way. Cool. I think this is an exciting development, so I hope it makes it into the next release. I have started looking at it. So far I have only looked at the catalog, parser and client changes, but I thought it's worth posting my comments so far. > I tried tweaking the grammar to differentiate these two syntax variants, > but that led to shift/reduce conflicts with the existing ones. I tried > fixing that, but I ended up doing that in CreateStatistics(). Yeah, that makes sense. I wasn't expecting the grammar to change. > The other thing is that we probably can't tie this to just MCV, because > functional dependencies need the per-expression stats too. So I simply > build expression stats whenever there's at least one expression. Makes sense. > I also decided to keep the "expressions" statistics kind - it's not > allowed to specify it in CREATE STATISTICS, but it's useful internally > as it allows deciding whether to build the stats in a single place. > Otherwise we'd need to do that every time we build the statistics, etc. Yes, I thought that would be the easiest way to do it. Essentially the "expressions" stats kind is an internal implementation detail, hidden from the user, because it's built automatically when required, so you don't need to (and can't) explicitly ask for it. This new behaviour seems much more logical to me. > I added a brief explanation to the sgml docs, not sure if that's good > enough - maybe it needs more details. Yes, I think that could use a little tidying up, but I haven't looked too closely yet. Some other comments: * I'm not sure I understand the need for 0001. Wasn't there an earlier version of this patch that just did it by re-populating the type array, but which still had it as an array rather than turning it into a list? Making it a list falsifies some of the comments and function/variable name choices in that file. * There's a comment typo in catalog/Makefile -- "are are reputedly other...", should be "there are reputedly other...". * Looking at the pg_stats_ext view, I think perhaps expressions stats should be omitted entirely from that view, since it doesn't show any useful information about them. So it could remove "e" from the "kinds" array, and exclude rows whose only kind is "e", since such rows have no interesting data in them. Essentially, the new view pg_stats_ext_exprs makes having any expression stats in pg_stats_ext redundant. Hiding this data in pg_stats_ext would also be consistent with making the "expressions" stats kind hidden from the user. * In gram.y, it wasn't quite obvious why you converted the column list for CREATE STATISTICS from an expr_list to a stats_params list. I figured it out, and it makes sense, but I think it could use a comment, perhaps something along the lines of the one for index_elem, e.g.: /* * Statistics attributes can be either simple column references, or arbitrary * expressions in parens. For compatibility with index attributes permitted * in CREATE INDEX, we allow an expression that's just a function call to be * written without parens. */ * In parse_func.c and parse_agg.c, there are a few new error strings that use the abbreviation "stats expressions", whereas most other errors refer to "statistics expressions". For consistency, I think they should all be the latter. * In generateClonedExtStatsStmt(), I think the "expressions" stats kind needs to be explicitly excluded, otherwise CREATE TABLE (LIKE ...) fails if the source table has expression stats. * CreateStatistics() uses ShareUpdateExclusiveLock, but in tcop/utility.c the relation is opened with a ShareLock. ISTM that the latter lock mode should be made to match CreateStatistics(). * Why does the new code in tcop/utility.c not use RangeVarGetRelidExtended together with RangeVarCallbackOwnsRelation? That seems preferable to doing the ACL check in CreateStatistics(). For one thing, as it stands, it allows the lock to be taken even if the user doesn't own the table. Is it intentional that the current code allows extended stats to be created on system catalogs? That would be one thing that using RangeVarCallbackOwnsRelation would change, but I can't see a reason to allow it. * In src/bin/psql/describe.c, I think the \d output should also exclude the "expressions" stats kind and just list the other kinds (or have no kinds list at all, if there are no other kinds), to make it consistent with the CREATE STATISTICS syntax. * The pg_dump output for a stats object whose only kind is "expressions" is broken -- it includes a spurious "()" for the kinds list. That's it for now. I'll look at the optimiser changes next, and try to post more comments later this week. Regards, Dean
pgsql-hackers by date: