Re: Statistics Import and Export - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Statistics Import and Export |
Date | |
Msg-id | bf724b21-914a-4497-84e3-49944f9776f6@enterprisedb.com Whole thread Raw |
In response to | Re: Statistics Import and Export (Corey Huinker <corey.huinker@gmail.com>) |
Responses |
Re: Statistics Import and Export
Re: Statistics Import and Export |
List | pgsql-hackers |
On 3/25/24 09:27, Corey Huinker wrote: > On Fri, Mar 22, 2024 at 9:51 PM Corey Huinker <corey.huinker@gmail.com> > wrote: > >> v12 attached. >> >> > v13 attached. All the same features as v12, but with a lot more type > checking, bounds checking, value inspection, etc. Perhaps the most notable > feature is that we're now ensuring that histogram values are in ascending > order. This could come in handy for detecting when we are applying stats to > a column of the wrong type, or the right type but with a different > collation. It's not a guarantee of validity, of course, but it would detect > egregious changes in sort order. > Hi, I did take a closer look at v13 today. I have a bunch of comments and some minor whitespace fixes in the attached review patches. 0001 ---- 1) The docs say this: <para> The purpose of this function is to apply statistics values in an upgrade situation that are "good enough" for system operation until they are replaced by the next <command>ANALYZE</command>, usually via <command>autovacuum</command> This function is used by <command>pg_upgrade</command> and <command>pg_restore</command> to convey the statistics from the old system version into the new one. </para> I find this a bit confusing, considering the pg_dump/pg_restore changes are only in 0002, not in this patch. 2) Also, I'm not sure about this: <parameter>relation</parameter>, the parameters in this are all derived from <structname>pg_stats</structname>, and the values given are most often extracted from there. How do we know where do the values come from "most often"? I mean, where else would it come from? 3) The function pg_set_attribute_stats() is veeeeery long - 1000 lines or so, that's way too many for me to think about. I agree the flow is pretty simple, but I still wonder if there's a way to maybe split it into some smaller "meaningful" steps. 4) It took me *ages* to realize the enums at the beginning of some of the functions are actually indexes of arguments in PG_FUNCTION_ARGS. That'd surely deserve a comment explaining this. 5) The comment for param_names in pg_set_attribute_stats says this: /* names of columns that cannot be null */ const char *param_names[] = { ... } but isn't that actually incorrect? I think that applies only to a couple initial arguments, but then other fields (MCV, mcelem stats, ...) can be NULL, right? 6) There's a couple minor whitespace fixes or comments etc. 0002 ---- 1) I don't understand why we have exportExtStatsSupported(). Seems pointless - nothing calls it, even if it did we don't know how to export the stats. 2) I think this condition in dumpTableSchema() is actually incorrect: if ((tbinfo->dobj.dump & DUMP_COMPONENT_STATISTICS) && (tbinfo->relkind == RELKIND_VIEW)) dumpRelationStats(fout, &tbinfo->dobj, reltypename, Aren't indexes pretty much exactly the thing for which we don't want to dump statistics? In fact this skips dumping statistics for table - if you dump a database with a single table (-Fc), pg_restore -l will tell you this: 217; 1259 16385 TABLE public t user 3403; 0 16385 TABLE DATA public t user Which is not surprising, because table is not a view. With an expression index you get this: 217; 1259 16385 TABLE public t user 3404; 0 16385 TABLE DATA public t user 3258; 1259 16418 INDEX public t_expr_idx user 3411; 0 0 STATS IMPORT public INDEX t_expr_idx Unfortunately, fixing the condition does not work: $ pg_dump -Fc test > test.dump pg_dump: warning: archive items not in correct section order This happens for a very simple reason - the statistics are marked as SECTION_POST_DATA, which for the index works, because indexes are in post-data section. But the table stats are dumped right after data, still in the "data" section. IMO that's wrong, the statistics should be delayed to the post-data section. Which probably means there needs to be a separate dumpable object for statistics on table/index, with a dependency on the object. 3) I don't like the "STATS IMPORT" description. For extended statistics we dump the definition as "STATISTICS" so why to shorten it to "STATS" here? And "IMPORT" seems more like the process of loading data, not the data itself. So I suggest "STATISTICS DATA". regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
pgsql-hackers by date: