Re: PoC/WIP: Extended statistics on expressions - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: PoC/WIP: Extended statistics on expressions |
Date | |
Msg-id | 329203a4-7cad-a0e3-d5a8-ce70872c8156@enterprisedb.com Whole thread Raw |
In response to | Re: PoC/WIP: Extended statistics on expressions (Dean Rasheed <dean.a.rasheed@gmail.com>) |
Responses |
Re: PoC/WIP: Extended statistics on expressions
|
List | pgsql-hackers |
On 1/4/21 4:34 PM, Dean Rasheed wrote: > > ... > > Some other comments: > > * I'm not sure I understand the need for 0001. Wasn't there an earlier > version of this patch that just did it by re-populating the type > array, but which still had it as an array rather than turning it into > a list? Making it a list falsifies some of the comments and > function/variable name choices in that file. > That's a bit done to Justin - I think it's fine to use the older version repopulating the type array, but that question is somewhat unrelated to this patch. > * There's a comment typo in catalog/Makefile -- "are are reputedly > other...", should be "there are reputedly other...". > > * Looking at the pg_stats_ext view, I think perhaps expressions stats > should be omitted entirely from that view, since it doesn't show any > useful information about them. So it could remove "e" from the "kinds" > array, and exclude rows whose only kind is "e", since such rows have > no interesting data in them. Essentially, the new view > pg_stats_ext_exprs makes having any expression stats in pg_stats_ext > redundant. Hiding this data in pg_stats_ext would also be consistent > with making the "expressions" stats kind hidden from the user. > Hmmm, not sure. I'm not sure removing 'e' from the array is a good idea. On the one hand it's internal detail, on the other hand most of that view is internal detail too. Excluding rows with only 'e' seems reasonable, though. I need to think about this. > * In gram.y, it wasn't quite obvious why you converted the column list > for CREATE STATISTICS from an expr_list to a stats_params list. I > figured it out, and it makes sense, but I think it could use a > comment, perhaps something along the lines of the one for index_elem, > e.g.: > > /* > * Statistics attributes can be either simple column references, or arbitrary > * expressions in parens. For compatibility with index attributes permitted > * in CREATE INDEX, we allow an expression that's just a function call to be > * written without parens. > */ > OH, right. I'd have trouble figuring this myself, and I wrote that code myself only one or two months ago. > * In parse_func.c and parse_agg.c, there are a few new error strings > that use the abbreviation "stats expressions", whereas most other > errors refer to "statistics expressions". For consistency, I think > they should all be the latter. > OK, will fix. > * In generateClonedExtStatsStmt(), I think the "expressions" stats > kind needs to be explicitly excluded, otherwise CREATE TABLE (LIKE > ...) fails if the source table has expression stats. > Yeah, will fix. I guess this also means we're missing some tests. > * CreateStatistics() uses ShareUpdateExclusiveLock, but in > tcop/utility.c the relation is opened with a ShareLock. ISTM that the > latter lock mode should be made to match CreateStatistics(). > Not sure, will check. > * Why does the new code in tcop/utility.c not use > RangeVarGetRelidExtended together with RangeVarCallbackOwnsRelation? > That seems preferable to doing the ACL check in CreateStatistics(). > For one thing, as it stands, it allows the lock to be taken even if > the user doesn't own the table. Is it intentional that the current > code allows extended stats to be created on system catalogs? That > would be one thing that using RangeVarCallbackOwnsRelation would > change, but I can't see a reason to allow it. > I think I copied the code from somewhere - probably expression indexes, or something like that. Not a proof that it's the right/better way to do this, though. > * In src/bin/psql/describe.c, I think the \d output should also > exclude the "expressions" stats kind and just list the other kinds (or > have no kinds list at all, if there are no other kinds), to make it > consistent with the CREATE STATISTICS syntax. > Not sure I understand. Why would this make it consistent with CREATE STATISTICS? Can you elaborate? > * The pg_dump output for a stats object whose only kind is > "expressions" is broken -- it includes a spurious "()" for the kinds > list. > Will fix. Again, this suggests there are TAP tests missing. > That's it for now. I'll look at the optimiser changes next, and try to > post more comments later this week. > Thanks! regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: