Showing applied extended statistics in explain - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Showing applied extended statistics in explain |
Date | |
Msg-id | 8081617b-d80f-ae2b-b79f-ea7e926f9fcf@enterprisedb.com Whole thread Raw |
Responses |
Re: Showing applied extended statistics in explain
Re: Showing applied extended statistics in explain |
List | pgsql-hackers |
Hi, With extended statistics it may not be immediately obvious if they were applied and to which clauses. If you have multiple extended statistics, we may also apply them in different order, etc. And with expressions, there's also the question of matching expressions to the statistics. So it seems useful to include this into in the explain plan - show which statistics were applied, in which order. Attached is an early PoC patch doing that in VERBOSE mode. I'll add it to the next CF. A simple example demonstrating the idea: ====================================================================== create table t (a int, b int); insert into t select mod(i,10), mod(i,10) from generate_series(1,100000) s(i); create statistics s on a, b from t; analyze t; test=# explain (verbose) select * from t where a = 1 and b = 1; QUERY PLAN --------------------------------------------------------------- Seq Scan on public.t (cost=0.00..1943.00 rows=10040 width=8) Output: a, b Filter: ((t.a = 1) AND (t.b = 1)) Statistics: public.s Clauses: ((a = 1) AND (b = 1)) (4 rows) test=# explain (verbose) select 1 from t group by a, b; QUERY PLAN ---------------------------------------------------------------------- HashAggregate (cost=1943.00..1943.10 rows=10 width=12) Output: 1, a, b Group Key: t.a, t.b -> Seq Scan on public.t (cost=0.00..1443.00 rows=100000 width=8) Output: a, b Statistics: public.s Clauses: (a AND b) (6 rows) ====================================================================== The current implementation is a bit ugly PoC, with a couple annoying issues that need to be solved: 1) The information is stashed in multiple lists added to a Plan. Maybe there's a better place, and maybe we need to invent a better way to track the info (a new node stashed in a single List). 2) The deparsing is modeled (i.e. copied) from how we deal with index quals, but it's having issues with nested OR clauses, because there are nested RestrictInfo nodes and the deparsing does not expect that. 3) It does not work for functional dependencies, because we effectively "merge" all functional dependencies and apply the entries. Not sure how to display this, but I think it should show the individual dependencies actually applied. 4) The info is collected always, but I guess we should do that only when in explain mode. Not sure how expensive it is. 5) It includes just statistics name + clauses, but maybe we should include additional info (e.g estimate for that combination of clauses). 6) The clauses in the grouping query are transformed to AND list, which is wrong. This is easy to fix, I was lazy to do that in a PoC patch. 7) It does not show statistics for individual expressions. I suppose examine_variable could add it to the rel somehow, and maybe we could do that with index expressions too? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
pgsql-hackers by date: