Re: Statistics Import and Export - Mailing list pgsql-hackers

From Corey Huinker
Subject Re: Statistics Import and Export
Date
Msg-id CADkLM=frNuS6xP-PPgAis-62pzL6cF6jve9-H967003voSd6DA@mail.gmail.com
Whole thread Raw
In response to Re: Statistics Import and Export  (jian he <jian.universality@gmail.com>)
List pgsql-hackers


If the relpages option contains -1 only for partitioned tables, shouldn't pg_set_relation_stats restrict the values that can be

specified by table type? The attached patch limits the value to -1 or more if the target

is a partition table, and 0 or more otherwise.

Changing relpages to -1 on a non-partitioned table seems to significantly change the execution plan.


Short answer: It's working as intended. Significantly changing the execution plan in weird ways is part of the intention of the function, even if the execution plan changes for the worse. I appreciate 

Longer answer:

Enforcing -1 on only partitioned tables is tricky, as it seems to be a value for any table that has no local storage. So foreign data wrapper tables could, in theory, also have this value. More importantly, the -1 value seems to be situational, in my experience it only happens on partitioned tables after they have their first partition added, which means that the current valid stat range is set according to facts that can change. Like so :

chuinker=# select version();
                                                              version                                                              
------------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 16.4 (Postgres.app) on aarch64-apple-darwin21.6.0, compiled by Apple clang version 14.0.0 (clang-1400.0.29.102), 64-bit
(1 row)
chuinker=# create table part_parent (x integer) partition by range (x);
CREATE TABLE
chuinker=# select relpages from pg_class where oid = 'part_parent'::regclass;
 relpages
----------
        0
(1 row)

chuinker=# analyze part_parent;
ANALYZE
chuinker=# select relpages from pg_class where oid = 'part_parent'::regclass;
 relpages
----------
        0
(1 row)

chuinker=# create table part_child partition of part_parent for values from (0) TO (100);
CREATE TABLE
chuinker=# select relpages from pg_class where oid = 'part_parent'::regclass;
 relpages
----------
        0
(1 row)

chuinker=# analyze part_parent;
ANALYZE
chuinker=# select relpages from pg_class where oid = 'part_parent'::regclass;
 relpages
----------
       -1
(1 row)

chuinker=# drop table part_child;
DROP TABLE
chuinker=# select relpages from pg_class where oid = 'part_parent'::regclass;
 relpages
----------
       -1
(1 row)

chuinker=# analyze part_parent;
ANALYZE
chuinker=# select relpages from pg_class where oid = 'part_parent'::regclass;
 relpages
----------
       -1
(1 row)

Prior versions (March 2024 and earlier) of this patch and the pg_set_attribute_stats patch did have many checks to prevent importing stat values that were "wrong" in some way. Some examples from attribute stats import were:

* Histograms that were not monotonically nondecreasing.
* Frequency values that were out of bounds specified by other values in the array.
* Frequency values outside of the [0.0,1.0] or [-1.0,1.0] depending on the stat type.
* paired arrays of most-common-values and their attendant frequency array not having the same length

All of these checks were removed based on feedback from reviewers and committers who saw the pg_set_*_stats() functions as a fuzzing tool, so the ability to set illogical, wildly implausible, or mathematically impossible values was a feature, not a bug. I would suspect that they would view your demonstration that setting impossible values on a table as proof that the function can be used to experiment with planner scenarios. So, while I previously would have eagerly accepted this patch as another valued validation check, such checks don't fit with the new intention of the functions. Still, I greatly appreciate your helping us discover ways in which we can use this tool to make the planner do odd things.

One thing that could cause us to enforce a check like the one you submitted would be if an invalid value caused a query to fail or a session to crash, even then, that would probably spur a change to make the planner more defensive rather than more checks on the set_* side.

pgsql-hackers by date:

Previous
From: "Hayato Kuroda (Fujitsu)"
Date:
Subject: RE: Make default subscription streaming option as Parallel
Next
From: Daniil Davydov
Date:
Subject: Re: Do not lock temp relations