Re: extended stats on partitioned tables - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: extended stats on partitioned tables |
Date | |
Msg-id | e2196aa4-4a52-6168-6f13-f50b4d0edf48@enterprisedb.com Whole thread Raw |
In response to | extended stats on partitioned tables (Justin Pryzby <pryzby@telsasoft.com>) |
Responses |
Re: extended stats on partitioned tables
Re: extended stats on partitioned tables |
List | pgsql-hackers |
On 9/23/21 11:26 PM, Justin Pryzby wrote: > extended stats objects are allowed on partitioned tables since v10. > https://www.postgresql.org/message-id/flat/CAKJS1f-BmGo410bh5RSPZUvOO0LhmHL2NYmdrC_Jm8pk_FfyCA%40mail.gmail.com > 8c5cdb7f4f6e1d6a6104cb58ce4f23453891651b > > But since 859b3003de they're not populated - pg_statistic_ext(_data) is empty. > This was the consequence of a commit to avoid an error I reported with stats on > inheritence parents (not partitioned tables). > > preceding 859b3003de, stats on the parent table *did* improve the estimate, > so this part of the commit message seems to have been wrong? > |commit 859b3003de87645b62ee07ef245d6c1f1cd0cedb > | Don't build extended statistics on inheritance trees > ... > | Moreover, the current selectivity estimation code only works with individual > | relations, so building statistics on inheritance trees would be pointless > | anyway. > > |CREATE TABLE p (i int, a int, b int) PARTITION BY RANGE (i); > |CREATE TABLE pd PARTITION OF p FOR VALUES FROM (1)TO(100); > |TRUNCATE p; INSERT INTO p SELECT 1, a/100, a/100 FROM generate_series(1,999)a; > |CREATE STATISTICS pp ON (a),(b) FROM p; > |VACUUM ANALYZE p; > |SELECT * FROM pg_statistic_ext WHERE stxrelid ='p'::regclass; > > |postgres=# begin; DROP STATISTICS pp; explain analyze SELECT a,b FROM p GROUP BY 1,2; abort; > | HashAggregate (cost=20.98..21.98 rows=100 width=8) (actual time=1.088..1.093 rows=10 loops=1) > > |postgres=# explain analyze SELECT a,b FROM p GROUP BY 1,2; > | HashAggregate (cost=20.98..21.09 rows=10 width=8) (actual time=1.082..1.086 rows=10 loops=1) > > So I think this is a regression, and extended stats should be populated for > partitioned tables - I had actually done that for some parent tables and hadn't > noticed that the stats objects no longer do anything. > > That begs the question if the current behavior for inheritence parents is > correct.. > > CREATE TABLE p (i int, a int, b int); > CREATE TABLE pd () INHERITS (p); > INSERT INTO pd SELECT 1, a/100, a/100 FROM generate_series(1,999)a; > CREATE STATISTICS pp ON (a),(b) FROM p; > VACUUM ANALYZE p; > explain analyze SELECT a,b FROM p GROUP BY 1,2; > > | HashAggregate (cost=25.99..26.99 rows=100 width=8) (actual time=3.268..3.284 rows=10 loops=1) > Agreed, that seems like a regression, but I don't see how to fix that without having the extra flag in the catalog. Otherwise we can store just one version for each statistics object :-( > Since child tables can be queried directly, it's a legitimate question whether > we should collect stats for the table heirarchy or (since the catalog only > supports one) only the table itself. I'd think that stats for the table > hierarchy would be more commonly useful (but we shouldn't change the behavior > in existing releases again). Anyway it seems unfortunate that > statistic_ext_data still has no stxinherited. > Yeah, we probably need the flag - I planned to get it into 14, but then I got distracted by something else :-/ Attached is a PoC that I quickly bashed together today. It's pretty raw, but it passed "make check" and I think it does most of the things right. Can you try if this fixes the estimates with partitioned tables? Extended statistics use two catalogs, pg_statistic_ext for definition, while pg_statistic_ext_data stores the built statistics objects - the flag needs to be in the "data" catalog, and managing the records is a bit challenging - the current PoC code mostly works, but I had to relax some error checks and I'm sure there are cases when we fail to remove a row, or something like that. > Note that for partitioned tables if I enable enable_partitionwise_aggregate, > then stats objects on the child tables can be helpful (but that's also > confusing to the question at hand). > Yeah. I think it'd be helpful to assemble a script with various test cases demonstrating how we estimate various cases. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
pgsql-hackers by date: