Re: list of extended statistics on psql - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: list of extended statistics on psql |
Date | |
Msg-id | 914705d5-6dc0-817b-9405-891830f43841@enterprisedb.com Whole thread Raw |
In response to | Re: list of extended statistics on psql (Tatsuro Yamada <tatsuro.yamada.tf@nttcom.co.jp>) |
Responses |
Re: list of extended statistics on psql
|
List | pgsql-hackers |
On 1/19/21 1:44 AM, Tatsuro Yamada wrote: > Hi Tomas, > >>>> As for how to deal with this, I can think of about three ways: >>>> >>>> 1) simplify the command to only print information from >>>> pg_statistic_ext (so on information about which stats are built or >>>> sizes) >>>> >>>> 2) extend pg_stats_ext with necessary information (e.g. sizes) >>>> >>>> 3) create a new system view, with necessary information (so that >>>> pg_stats_ext does not need to be modified) >>>> >>>> 4) add functions returning the necessary information, possibly only >>>> for statistics the user can access (similarly to what pg_stats_ext >>>> does) >>>> >>>> Options 2-4 have the obvious disadvantage that this won't work on >>>> older releases (we can't add views or functions there). So I'm >>>> leaning towards #1 even if that means we have to remove some of the >>>> details. We can consider adding that for new releases, though. >>> >>> >>> Thanks for the useful advice. I go with option 1). >>> The following query is created by using pg_stats_ext instead of >>> pg_statistic_ext and pg_statistic_ext_data. However, I was confused >>> about writing a part of the query for calculating MCV size because >>> there are four columns related to MCV. For example, most_common_vals, >>> most_common_val_nulls, most_common_freqs, and most_common_base_freqs. >>> Currently, I don't know how to calculate the size of MCV by using the >>> four columns. Thoughts? :-) >> >> Well, my suggestion was to use pg_statistic_ext, because that lists >> all statistics, while pg_stats_ext is filtering statistics depending >> on access privileges. I think that's more appropriate for \dX, the >> contents should not change depending on the user. >> >> Also, let me clarify - with option (1) we'd not show the sizes at all. >> The size of the formatted statistics may be very different from the >> on-disk representation, so I see no point in showing it in \dX. >> >> We might show other stats (e.g. number of MCV items, or the fraction >> of data represented by the MCV list), but the user can inspect >> pg_stats_ext if needed. >> >> What we might do is to show those stats when a superuser is running >> this command, but I'm not sure that's a good idea (or how difficult >> would it be to implement). > > > Thanks for clarifying. > I see that your suggestion was to use pg_statistic_ext, not pg_stats_ext. > And we don't need the size of stats. > > If that's the case, we also can't get the status of stats since PG12 or > later > because we can't use pg_statistic_ext_data, as you know. Therefore, it > would be > better to replace the query with the old query that I sent five months > ago like this: > > # the old query > SELECT > stxnamespace::pg_catalog.regnamespace AS "Schema", > stxrelid::pg_catalog.regclass AS "Table", > stxname AS "Name", > (SELECT pg_catalog.string_agg(pg_catalog.quote_ident(attname),', ') > FROM pg_catalog.unnest(stxkeys) s(attnum) > JOIN pg_catalog.pg_attribute a ON (stxrelid = a.attrelid AND > a.attnum = s.attnum AND NOT attisdropped)) AS "Columns", > 'd' = any(stxkind) AS "Ndistinct", > 'f' = any(stxkind) AS "Dependencies", > 'm' = any(stxkind) AS "MCV" > FROM pg_catalog.pg_statistic_ext stat > ORDER BY 1,2; > > Schema | Table | Name | Columns | Ndistinct | Dependencies | MCV > --------+--------+------------+---------+-----------+--------------+----- > public | hoge1 | hoge1_ext | a, b | t | t | t > public | hoge_t | hoge_t_ext | a, b | t | t | t > (2 rows) > > > The above query is so simple so that we would better to use the > following query: > > # This query works on PG10 or later > SELECT > es.stxnamespace::pg_catalog.regnamespace::text AS "Schema", > es.stxname AS "Name", > pg_catalog.format('%s FROM %s', > (SELECT > pg_catalog.string_agg(pg_catalog.quote_ident(a.attname),', ') > FROM pg_catalog.unnest(es.stxkeys) s(attnum) > JOIN pg_catalog.pg_attribute a > ON (es.stxrelid = a.attrelid > AND a.attnum = s.attnum > AND NOT a.attisdropped)), > es.stxrelid::regclass) AS "Definition", > CASE WHEN 'd' = any(es.stxkind) THEN 'defined' > END AS "Ndistinct", > CASE WHEN 'f' = any(es.stxkind) THEN 'defined' > END AS "Dependencies", > CASE WHEN 'm' = any(es.stxkind) THEN 'defined' > END AS "MCV" > FROM pg_catalog.pg_statistic_ext es > ORDER BY 1, 2; > > Schema | Name | Definition | Ndistinct | Dependencies | > Dependencies > --------+------------+------------------+-----------+--------------+-------------- > > public | hoge1_ext | a, b FROM hoge1 | defined | defined | > defined > public | hoge_t_ext | a, b FROM hoge_t | defined | defined | > defined > (2 rows) > > > I'm going to create the WIP patch to use the above queriy. > Any comments welcome. :-D > Yes, I think using this simpler query makes sense. If we decide we need something more elaborate, we can improve that by in future PostgreSQL versions (after adding view/function to core), but I'd leave that as a work for the future. Apologies for all the extra work - I haven't realized this flaw when pushing for showing more stuff :-( regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: