Home > mailing lists

a fast bloat measurement tool (was Re: Measuring relation free space) - Mailing list pgsql-hackers

From	Abhijit Menon-Sen
Subject	a fast bloat measurement tool (was Re: Measuring relation free space)
Date	April 2, 2014 21:41:53
Msg-id	20140402214144.GA28681@kea.toroid.org Whole thread Raw
In response to	Measuring relation free space (Greg Smith <greg@2ndQuadrant.com>)
Responses	Re: a fast bloat measurement tool (was Re: Measuring relation free space) Re: a fast bloat measurement tool (was Re: Measuring relation free space)
List	pgsql-hackers

Tree view

This is a follow-up to the thread at
http://www.postgresql.org/message-id/4EB5FA1B.1090305@2ndQuadrant.com

A quick summary: that thread proposed adding a relation_free_space()
function to the pageinspect extension. Various review comments were
received, among which was the suggestion that the code belonged in
pg_stattuple as a faster way to calculate free_percent.

===

I've attached an extension that produces largely pgstattuple-compatible
numbers for a table without doing a full-table scan.

It scans through the table, skipping blocks that have their visibility
map bit set. For such pages, it gets the free space from the free space
map, and assumes that all remaining space on the page is taken by live
tuples. It scans other pages tuple-by-tuple and counts live and dead
tuples and free space.

Here's a comparison of fastbloat and pgstattuple output on a 50-million
row table with some holes created with a single big DELETE statement:

ams=# select * from fastbloat('x');
 table_len  | scanned_percent | approx_tuple_count | approx_tuple_len | approx_tuple_percent | dead_tuple_count |
dead_tuple_len| dead_tuple_percent | free_space | free_percent  

------------+-----------------+--------------------+------------------+----------------------+------------------+----------------+--------------------+------------+--------------
 6714761216 |              17 |           41318301 |       5483815648 |                81.67 |          8681708 |
1111258624|              16.55 |   80972912 |         1.21 
(1 row)

Time: 639.455 ms

ams=# select * from pgstattuple('x');
 table_len  | tuple_count | tuple_len  | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent |
free_space| free_percent  

------------+-------------+------------+---------------+------------------+----------------+--------------------+------------+--------------
 6714761216 |    41318292 | 5288741376 |         78.76 |          8681708 |     1111258624 |              16.55 |
91810372|         1.37 
(1 row)

Time: 15610.651 ms

In the above, the table_len is nblocks*BLCKSZ, and the dead_tuple_count,
dead_tuple_len, dead_tuple_percent, free_space, and free_percent are all
exact. scanned_percent shows the percentage of pages that were scanned
tuple-by-tuple (the others having been skipped based on the VM bit).
The live tuple count, size, and percentage are all estimates.

The approx_tuple_count is calculated using vac_estimate_reltuples based
on the pages/tuples that were scanned. The approx_tuple_len is the exact
size of the live tuples on scanned pages, plus the approximate size from
skipped pages (BLCKSZ-GetRecordedFreeSpace()). This is an overestimate,
because it's counting the line pointer array as live tuple space.

Even in the worst case, when every page has dead tuples, fastbloat is
marginally faster than pgstattuple. The same table as the first example,
but with every even-numbered row deleted:

ams=# select * from fastbloat('x');
 table_len  | scanned_percent | approx_tuple_count | approx_tuple_len | approx_tuple_percent | dead_tuple_count |
dead_tuple_len| dead_tuple_percent | free_space | free_percent  

------------+-----------------+--------------------+------------------+----------------------+------------------+----------------+--------------------+------------+--------------
 6714761216 |             100 |           20659150 |       2644371200 |                39.38 |         20659142 |
2644370176|              39.38 | 1203068996 |        17.92 
(1 row)

Time: 8924.511 ms

ams=# select * from pgstattuple('x');
 table_len  | tuple_count | tuple_len  | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent |
free_space| free_percent  

------------+-------------+------------+---------------+------------------+----------------+--------------------+------------+--------------
 6714761216 |    20659150 | 2644371200 |         39.38 |         20659142 |     2644370176 |              39.38 |
1203068996|        17.92 
(1 row)

Time: 13338.712 ms

Since the code depends on the visibility map to determine which pages to
skip, it does not support indexes (which have no visibility map).

(Just drop the attached files into contrib/fastbloat, and "make install"
should just work. Then just "create extension fastbloat".)

Questions and suggestions welcome.

-- Abhijit

Attachment

pgsql-hackers by date:

From: Peter Geoghegan
Date: 02 April 2014, 20:33:39
Subject: Re: Doing better at HINTing an appropriate column within errorMissingColumn()

From: Andres Freund
Date: 02 April 2014, 21:50:28
Subject: Re: jsonb is also breaking the rule against nameless unions

a fast bloat measurement tool (was Re: Measuring relation free space) - Mailing list pgsql-hackers

Attachment

Previous

Next