Re: ToDo: show size of partitioned table - Mailing list pgsql-hackers
From | Pavel Stehule |
---|---|
Subject | Re: ToDo: show size of partitioned table |
Date | |
Msg-id | CAFj8pRBJetR9tk5cNGt5mLewxwB4gXK0dm7L41fxUqBJ58h2wg@mail.gmail.com Whole thread Raw |
In response to | Re: ToDo: show size of partitioned table (Michael Paquier <michael@paquier.xyz>) |
Responses |
Re: ToDo: show size of partitioned table
|
List | pgsql-hackers |
čt 22. 11. 2018 v 1:51 odesílatel Michael Paquier <michael@paquier.xyz> napsal:
On Wed, Nov 21, 2018 at 05:37:33PM +0100, Pavel Stehule wrote:
> st 21. 11. 2018 v 17:21 odesílatel Alvaro Herrera <alvherre@2ndquadrant.com>
> napsal:
>> Hmm, these tests are not going to work, because they have "pavel" in the
>> expected output.
>
> I was blind, thank you for check
+create table testtable_apple(logdate date);
+create table testtable_orange(logdate date);
+create index testtable_apple_index on testtable_apple(logdate);
+create index testtable_orange_index on testtable_orange(logdate);
There are already a bunch of partition relations with multiple levels
created as part of the regression tests, so instead of creating more of
those, I would suggest to test \dP and \dPt in create_table.sql, and
\dPi in indexing.sql (please make sure to add tests for \dP with
partitioned indexes as well).
I think that you should really add the direct parent of a partition in
at least the verbose output, now for multiple partition levels things
are confusing in my opinion. For example with such a schema:
CREATE TABLE parent_tab (id int) PARTITION BY RANGE (id);
CREATE INDEX parent_index ON parent_tab (id);
CREATE TABLE child_0_10 PARTITION OF parent_tab
FOR VALUES FROM (0) TO (10);
CREATE TABLE child_10_20 PARTITION OF parent_tab
FOR VALUES FROM (10) TO (20);
CREATE TABLE child_20_30 PARTITION OF parent_tab
FOR VALUES FROM (20) TO (30);
INSERT INTO parent_tab VALUES (generate_series(0,29));
CREATE TABLE child_30_40 PARTITION OF parent_tab
FOR VALUES FROM (30) TO (40)
PARTITION BY RANGE(id);
CREATE TABLE child_30_35 PARTITION OF child_30_40
FOR VALUES FROM (30) TO (35);
CREATE TABLE child_35_40 PARTITION OF child_30_40
FOR VALUES FROM (35) TO (40);
INSERT INTO parent_tab VALUES (generate_series(30,39));
Then with \dP+ I got that:
=# \dP+
List of partitioned relations
Schema | Name | Owner | Size | Description
--------+-------------+--------+--------+-------------
public | child_30_40 | ioltas | 48 kB |
public | parent_tab | ioltas | 120 kB |
(2 rows)
Showing the parent partition looks like a pretty important to me as I
would expect multi-level partitions to be a frequent case (perhaps it
should show up as well in the non-verbose output?). The field should be
NULL if the relation is the top of the tree.
it looks like bug for me much more.
your example - on my comp
List of relations
+--------+-------------+-------------------+-------+------------+-------------+
| Schema | Name | Type | Owner | Size | Description |
+--------+-------------+-------------------+-------+------------+-------------+
| public | child_0_10 | table | pavel | 8192 bytes | |
| public | child_10_20 | table | pavel | 8192 bytes | |
| public | child_20_30 | table | pavel | 8192 bytes | |
| public | child_30_35 | table | pavel | 8192 bytes | |
| public | child_30_40 | partitioned table | pavel | 0 bytes | |
| public | child_35_40 | table | pavel | 8192 bytes | |
| public | parent_tab | partitioned table | pavel | 0 bytes | |
+--------+-------------+-------------------+-------+------------+-------------+
(7 rows)
+--------+-------------+-------------------+-------+------------+-------------+
| Schema | Name | Type | Owner | Size | Description |
+--------+-------------+-------------------+-------+------------+-------------+
| public | child_0_10 | table | pavel | 8192 bytes | |
| public | child_10_20 | table | pavel | 8192 bytes | |
| public | child_20_30 | table | pavel | 8192 bytes | |
| public | child_30_35 | table | pavel | 8192 bytes | |
| public | child_30_40 | partitioned table | pavel | 0 bytes | |
| public | child_35_40 | table | pavel | 8192 bytes | |
| public | parent_tab | partitioned table | pavel | 0 bytes | |
+--------+-------------+-------------------+-------+------------+-------------+
(7 rows)
there is about 5x 8KB data .. 40KB
But in views I got
List of partitioned tables
+--------+-------------+-------+-------+-------------+
| Schema | Name | Owner | Size | Description |
+--------+-------------+-------+-------+-------------+
| public | child_30_40 | pavel | 16 kB | |
| public | parent_tab | pavel | 40 kB | |
+--------+-------------+-------+-------+-------------+
(2 rows)
+--------+-------------+-------+-------+-------------+
| Schema | Name | Owner | Size | Description |
+--------+-------------+-------+-------+-------------+
| public | child_30_40 | pavel | 16 kB | |
| public | parent_tab | pavel | 40 kB | |
+--------+-------------+-------+-------+-------------+
(2 rows)
there is 16KB more, what is really messy.
I think so most correct is removing child_30_40 from the report.
test=# SELECT n.nspname as "Schema",
c.relname as "Name",
pg_catalog.pg_get_userbyid(c.relowner) as "Owner",
(SELECT pg_catalog.pg_size_pretty(sum(pg_catalog.pg_table_size(relid)))
FROM pg_catalog.pg_partition_tree(c.oid)) AS "Size",
pg_catalog.obj_description(c.oid, 'pg_class') as "Description"
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('p') and not c.relispartition
AND n.nspname <> 'pg_catalog'
AND n.nspname <> 'information_schema'
AND n.nspname !~ '^pg_toast'
AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 1,2;
+--------+------------+-------+-------+-------------+
| Schema | Name | Owner | Size | Description |
+--------+------------+-------+-------+-------------+
| public | parent_tab | pavel | 40 kB | |
+--------+------------+-------+-------+-------------+
(1 row)
c.relname as "Name",
pg_catalog.pg_get_userbyid(c.relowner) as "Owner",
(SELECT pg_catalog.pg_size_pretty(sum(pg_catalog.pg_table_size(relid)))
FROM pg_catalog.pg_partition_tree(c.oid)) AS "Size",
pg_catalog.obj_description(c.oid, 'pg_class') as "Description"
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind IN ('p') and not c.relispartition
AND n.nspname <> 'pg_catalog'
AND n.nspname <> 'information_schema'
AND n.nspname !~ '^pg_toast'
AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 1,2;
+--------+------------+-------+-------+-------------+
| Schema | Name | Owner | Size | Description |
+--------+------------+-------+-------+-------------+
| public | parent_tab | pavel | 40 kB | |
+--------+------------+-------+-------+-------------+
(1 row)
I afraid of unreadable result if we allow overlap in report. I think so can be strange if some disk space will be reported 2x or more times in one report. Unfortunately It means so some information will be hidden. In this moment I prefer readability and simple meaning.
I am not strong in this topics. Another possibility is show parent (this should be displayed every time, without it it is messy).
This query is much more complex, but the result is more informative
SELECT n.nspname as "Schema",
c.relname as "Name",
n2.nspname as "Parent schema",
c2.relname as "Parent name",
pg_catalog.pg_get_userbyid(c.relowner) as "Owner",
s.max as "Hiearchy deep",
s.size as "Size",
pg_catalog.obj_description(c.oid, 'pg_class') as "Description"
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
LEFT JOIN pg_catalog.pg_inherits i ON c.oid = i.inhrelid
LEFT JOIN pg_catalog.pg_class c2 ON c2.oid = i.inhparent
LEFT JOIN pg_catalog.pg_namespace n2 ON n2.oid = c2.relnamespace,
LATERAL (SELECT max(level), pg_catalog.pg_size_pretty(sum(pg_catalog.pg_table_size(relid))) as size
FROM pg_catalog.pg_partition_tree(c.oid)) s
WHERE c.relkind IN ('p')
AND n.nspname <> 'pg_catalog'
AND n.nspname <> 'information_schema'
AND n.nspname !~ '^pg_toast'
AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 1,2;
c.relname as "Name",
n2.nspname as "Parent schema",
c2.relname as "Parent name",
pg_catalog.pg_get_userbyid(c.relowner) as "Owner",
s.max as "Hiearchy deep",
s.size as "Size",
pg_catalog.obj_description(c.oid, 'pg_class') as "Description"
FROM pg_catalog.pg_class c
LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
LEFT JOIN pg_catalog.pg_inherits i ON c.oid = i.inhrelid
LEFT JOIN pg_catalog.pg_class c2 ON c2.oid = i.inhparent
LEFT JOIN pg_catalog.pg_namespace n2 ON n2.oid = c2.relnamespace,
LATERAL (SELECT max(level), pg_catalog.pg_size_pretty(sum(pg_catalog.pg_table_size(relid))) as size
FROM pg_catalog.pg_partition_tree(c.oid)) s
WHERE c.relkind IN ('p')
AND n.nspname <> 'pg_catalog'
AND n.nspname <> 'information_schema'
AND n.nspname !~ '^pg_toast'
AND pg_catalog.pg_table_is_visible(c.oid)
ORDER BY 1,2;
+--------+-------------+---------------+-------------+-------+---------------+-------+-------------+
| Schema | Name | Parent schema | Parent name | Owner | Hiearchy deep | Size | Description |
+--------+-------------+---------------+-------------+-------+---------------+-------+-------------+
| public | child_30_40 | public | parent_tab | pavel | 1 | 16 kB | |
| public | parent_tab | | | pavel | 2 | 40 kB | |
+--------+-------------+---------------+-------------+-------+---------------+-------+-------------+
(2 rows)
| Schema | Name | Parent schema | Parent name | Owner | Hiearchy deep | Size | Description |
+--------+-------------+---------------+-------------+-------+---------------+-------+-------------+
| public | child_30_40 | public | parent_tab | pavel | 1 | 16 kB | |
| public | parent_tab | | | pavel | 2 | 40 kB | |
+--------+-------------+---------------+-------------+-------+---------------+-------+-------------+
(2 rows)
Still I prefer to not show nested partitioned tables for simplicity, readability reasons. Displaying nested objects in one table doesn't look like good idea for me. But I am ready to accept different common opinion.
Still do you think so variant with parent should be preferred?
Again, with the previous schema:
=# \dPi *idx
List of partitioned indexes
Schema | Name | Owner | Table
--------+--------------------+--------+-------------
public | child_30_40_id_idx | ioltas | child_30_40
(1 row)
=# \dP *idx
Did not find any partitioned relations named "*idx"
I would have expected in the second case to have the partitioned
*relations* showing up in the output, and a relation can be an index as
well if the pattern matches.
I think so it is correct - I don't would to see the index here, because index size is calculated by total_relation_size already.
Here my position is strong. \dP for me doesn't mean "tables or indexes" - it means "partition tables with total relation size". I don't see any sense to show tables and indexes in one report.
Regards
Pavel
Could you please address those problems first? The basic shape of the
patch with the three new sub-commands is fine I think, so we can go
ahead with that, but the two problems reported are blockers in my
opinion.
--
Michael
pgsql-hackers by date: