Thread: Database takes up MUCH more disk space than it should

Database takes up MUCH more disk space than it should

From
Dan Charrois
Date:
Hi everyone.  I'm currently in the situation of administering a rather large PostgreSQL database which for some reason
seemsto be even much larger than it should be. 

I'm currently running version 8.4.5 - not the latest and greatest, I know - but this is a live database that would
problematicto take down to upgrade unless all else fails - especially considering its size if it does need to be
rebuiltsomehow. 

Anyway, I'm no stranger to SQL, but new to PostgreSQL - all my SQL administration in the past has been with MySQL.  So
I'msomewhat bumbling my way through administrative commands trying to solve this - please bear with me. 

The size of the tables reported by \dt+ add up to around 120 GB.  The size of the indexes reported with \di+ adds up to
around15 GB.  This is pretty consistent with what I would expect the data to require. 

The problem is, the disk usage of the pgsql directory where the data is kept (as reported by 'du') comes to 647 GB -
significantlymore than it should.  select pg_database_size('mydatabase') confirms this, returning 690830939920. 

Vacuuming the tables (full and otherwise) hasn't helped, but then considering how the database is used, I didn't really
expectit to.  It's strictly a read-only database, with the exception of once a month when it is refreshed by loading
newdata into newly created tables, and once that is done, vacuum analyzing the new tables, dropping the old tables,
thenrenaming the new ones to have the name of the old ones.  Vacuums never claim to recover any space, and the disk
usagestays the same. 

So how do I find out what's eating up all this extra space?

I'm not sure this is related, but in doing a bit of digging I ran across the following command to try and see where the
spaceis being used: 

SELECT relname, pg_size_pretty(relpages::bigint * 8 *1024) AS size, CASE WHEN relkind = 't' THEN (SELECT pgd.relname
FROMpg_class pgd WHERE pgd.relfilenode::text = SUBSTRING(pg.relname FROM 10)) ELSE (SELECT pgc.relname FROM pg_class
pgcWHERE pg.reltoastrelid = pgc.relfilenode) END AS refrelname, relfilenode, relkind, reltuples::bigint, relpages FROM
pg_classpg ORDER BY relpages DESC; 

The biggest culprit in this is a file named pg_toast_101748 which weighs in at 242 GB.  I understand that the toast
filesare supplemental storage files linked to tables, but I'm wondering if that particular file (and perhaps others)
havelost their links?  The reason I consider this is the third column - which typically shows database names
correspondingto most other toast files, is completely empty for that one.  There are other toast files too that don't
seemto refer to a "real" database, but they only weight in at 2 GB or less, so they're less of a problem. 

Sometimes in the past, the import process I mentioned above has crashed due to a lack of memory, as did the postgres
daemonitself on at least one occasion, which I'm wondering may have left the internal database organization structure
inan uncertain state (since our tables are created from scratch every month, I don't suspect that our particular
databaseis corrupted, though I'm wondering if PostgreSQL's inner working have become so).  Is it possible that in such
ascenario, a pg_toast file might be created but never used?  In such a case, how is that most safely deleted?  Or am I
completelybarking up the wrong tree? 

I've done a bunch of Google searching and haven't come up with anything so far to shed some light on this.  Any help
someonecould provide on how to figure out where this substantial amount of extra disk space is being used would be
greatlyappreciated! 

Thanks!

Dan
--
Syzygy Research & Technology
Box 83, Legal, AB  T0G 1L0 Canada
Phone: 780-961-2213


Re: Database takes up MUCH more disk space than it should

From
Adrian Klaver
Date:
On Saturday, January 21, 2012 12:37:17 am Dan Charrois wrote:
> Hi everyone.  I'm currently in the situation of administering a rather
> large PostgreSQL database which for some reason seems to be even much
> larger than it should be.
>
> I'm currently running version 8.4.5 - not the latest and greatest, I know -
> but this is a live database that would problematic to take down to upgrade
> unless all else fails - especially considering its size if it does need to
> be rebuilt somehow.
>

>
> The size of the tables reported by \dt+ add up to around 120 GB.  The size
> of the indexes reported with \di+ adds up to around 15 GB.  This is pretty
> consistent with what I would expect the data to require.
>
> The problem is, the disk usage of the pgsql directory where the data is
> kept (as reported by 'du') comes to 647 GB - significantly more than it
> should.  select pg_database_size('mydatabase') confirms this, returning
> 690830939920.
>

>
> So how do I find out what's eating up all this extra space?
>
> I'm not sure this is related, but in doing a bit of digging I ran across
> the following command to try and see where the space is being used:
>
> SELECT relname, pg_size_pretty(relpages::bigint * 8 *1024) AS size, CASE
> WHEN relkind = 't' THEN (SELECT pgd.relname FROM pg_class pgd WHERE
> pgd.relfilenode::text = SUBSTRING(pg.relname FROM 10)) ELSE (SELECT
> pgc.relname FROM pg_class pgc WHERE pg.reltoastrelid = pgc.relfilenode)
> END AS refrelname, relfilenode, relkind, reltuples::bigint, relpages FROM
> pg_class pg ORDER BY relpages DESC;
>
> The biggest culprit in this is a file named pg_toast_101748 which weighs in
> at 242 GB.  I understand that the toast files are supplemental storage
> files linked to tables, but I'm wondering if that particular file (and
> perhaps others) have lost their links?  The reason I consider this is the
> third column - which typically shows database names corresponding to most
> other toast files, is completely empty for that one.  There are other
> toast files too that don't seem to refer to a "real" database, but they
> only weight in at 2 GB or less, so they're less of a problem.
>

If I follow the query above correctly, it is not getting the information you
think it is. In particular this part:

...SELECT pgc.relname FROM pg_class pgc WHERE pg.reltoastrelid = pgc.relfilenode)

Per the docs:
http://www.postgresql.org/docs/8.4/interactive/catalog-pg-class.html
reltoastrelid = The OID of the TOAST table not the relfilenode
When I table is created those numbers are the same, but they can diverge over
time.

I would do something like
select oid, relfilenode, relname from pg_class where relname = 'pg_toast_101748';

This will get you the OID and also show if it differs from the relfilenode.

Then something like:
select * from pg_class where relkind='r' and reltoastrelid=[oid from above]

This should show you if the TOAST table has been orphaned and if not what table
it is associated with.

>
> I've done a bunch of Google searching and haven't come up with anything so
> far to shed some light on this.  Any help someone could provide on how to
> figure out where this substantial amount of extra disk space is being used
> would be greatly appreciated!
>
> Thanks!
>
> Dan
> --
> Syzygy Research & Technology
> Box 83, Legal, AB  T0G 1L0 Canada
> Phone: 780-961-2213

--
Adrian Klaver
adrian.klaver@gmail.com

Re: Database takes up MUCH more disk space than it should

From
Scott Marlowe
Date:
On Sat, Jan 21, 2012 at 1:37 AM, Dan Charrois <dan001@syz.com> wrote:
> Hi everyone.  I'm currently in the situation of administering a rather large PostgreSQL database which for some
reasonseems to be even much larger than it should be. 
>
> I'm currently running version 8.4.5 - not the latest and greatest, I know - but this is a live database that would
problematicto take down to upgrade unless all else fails - especially considering its size if it does need to be
rebuiltsomehow. 
>
> Anyway, I'm no stranger to SQL, but new to PostgreSQL - all my SQL administration in the past has been with MySQL.
 SoI'm somewhat bumbling my way through administrative commands trying to solve this - please bear with me. 
>
> The size of the tables reported by \dt+ add up to around 120 GB.  The size of the indexes reported with \di+ adds up
toaround 15 GB.  This is pretty consistent with what I would expect the data to require. 
>
> The problem is, the disk usage of the pgsql directory where the data is kept (as reported by 'du') comes to 647 GB -
significantlymore than it should.  select pg_database_size('mydatabase') confirms this, returning 690830939920. 
>
> Vacuuming the tables (full and otherwise) hasn't helped, but then considering how the database is used, I didn't
reallyexpect it to.  It's strictly a read-only database, with the exception of once a month when it is refreshed by
loadingnew data into newly created tables, and once that is done, vacuum analyzing the new tables, dropping the old
tables,then renaming the new ones to have the name of the old ones.  Vacuums never claim to recover any space, and the
diskusage stays the same. 
>
> So how do I find out what's eating up all this extra space?

Real quick, if you run pg_database_size(name) for each db, including
template1 and postgres, what do you get back?

Re: Database takes up MUCH more disk space than it should

From
Dan Charrois
Date:
>>
>> SELECT relname, pg_size_pretty(relpages::bigint * 8 *1024) AS size, CASE
>> WHEN relkind = 't' THEN (SELECT pgd.relname FROM pg_class pgd WHERE
>> pgd.relfilenode::text = SUBSTRING(pg.relname FROM 10)) ELSE (SELECT
>> pgc.relname FROM pg_class pgc WHERE pg.reltoastrelid = pgc.relfilenode)
>> END AS refrelname, relfilenode, relkind, reltuples::bigint, relpages FROM
>> pg_class pg ORDER BY relpages DESC;
>
> If I follow the query above correctly, it is not getting the information you
> think it is. In particular this part:
>
> ...SELECT pgc.relname FROM pg_class pgc WHERE pg.reltoastrelid = pgc.relfilenode)
>
> Per the docs:
> http://www.postgresql.org/docs/8.4/interactive/catalog-pg-class.html
> reltoastrelid = The OID of the TOAST table not the relfilenode
> When I table is created those numbers are the same, but they can diverge over
> time.
>
> I would do something like
> select oid, relfilenode, relname from pg_class where relname = 'pg_toast_101748';
>
> This will get you the OID and also show if it differs from the relfilenode.
>
> Then something like:
> select * from pg_class where relkind='r' and reltoastrelid=[oid from above]
>
> This should show you if the TOAST table has been orphaned and if not what table
> it is associated with.

Thank you Adrian.  I think that you seem to have found the trouble.  For most of the TOAST tables I have,
oid=relfilenode,but not for that one.  I found the table that has reltoastrelid linking to that huge TOAST table.. and
itmakes some sense, since it is also the largest "regular" table too (79 GB). 

So perhaps there are no orphaned TOAST tables after all, as now I know who its parent is.  The database still takes up
alot more physical storage than I'd anticipated it would, but at least it appears as though that space can be accounted
for.

It's too bad \dt+ doesn't take into account the related TOAST table too - if it had, I would have expected that much
diskspace right from the get-go, and never thought twice about it.  I suppose that's the danger of not learning enough
aboutadministration of PostgreSQL and trying to troubleshoot a perceived problem that may not even have been a problem
inthe first place.  Until a few days ago, I hadn't even heard of TOAST tables, and just presumed all the data was
stuffedinto the database I created directly.  From what I've read about them since, they sound like a great idea - but
Inever anticipated them, or their effect on trying to sort out exactly where my data went. 

Thanks a lot for shedding the light on this subject that I needed!

Dan
--
Syzygy Research & Technology
Box 83, Legal, AB  T0G 1L0 Canada
Phone: 780-961-2213


Re: Database takes up MUCH more disk space than it should

From
Dan Charrois
Date:
On 2012-Jan-21, at 6:39 PM, Scott Marlowe wrote:

> On Sat, Jan 21, 2012 at 1:37 AM, Dan Charrois <dan001@syz.com> wrote:
>> Hi everyone.  I'm currently in the situation of administering a rather large PostgreSQL database which for some
reasonseems to be even much larger than it should be. 
>>
>> I'm currently running version 8.4.5 - not the latest and greatest, I know - but this is a live database that would
problematicto take down to upgrade unless all else fails - especially considering its size if it does need to be
rebuiltsomehow. 
>>
>> Anyway, I'm no stranger to SQL, but new to PostgreSQL - all my SQL administration in the past has been with MySQL.
SoI'm somewhat bumbling my way through administrative commands trying to solve this - please bear with me. 
>>
>> The size of the tables reported by \dt+ add up to around 120 GB.  The size of the indexes reported with \di+ adds up
toaround 15 GB.  This is pretty consistent with what I would expect the data to require. 
>>
>> The problem is, the disk usage of the pgsql directory where the data is kept (as reported by 'du') comes to 647 GB -
significantlymore than it should.  select pg_database_size('mydatabase') confirms this, returning 690830939920. 
>>
>> Vacuuming the tables (full and otherwise) hasn't helped, but then considering how the database is used, I didn't
reallyexpect it to.  It's strictly a read-only database, with the exception of once a month when it is refreshed by
loadingnew data into newly created tables, and once that is done, vacuum analyzing the new tables, dropping the old
tables,then renaming the new ones to have the name of the old ones.  Vacuums never claim to recover any space, and the
diskusage stays the same. 
>>
>> So how do I find out what's eating up all this extra space?
>
> Real quick, if you run pg_database_size(name) for each db, including
> template1 and postgres, what do you get back?


Thanks for your reply, Scott.

My database: 697490323216
postgres: 5537796
template0: 5537796
template1: 5537796

pg_database_size(mydatabase) did return a value consistent with disk usage.  But that value didn't jive with what I
expectedfrom summing up the sizes I got from \dt+ 

It looks like a large TOAST table, not reported by \dt+ was the biggest culprit, but I thought it was orphaned.  Due to
somehelp by Adrian Klaver, it looks like I was mistaken - it was in fact used by one of my tables.  So it looks like
therewasn't really a problem at all - other than my data still taking up a lot more physical storage than I thought it
did.I don't think there's much I can do about it, but that's much better than assuming it was related to a problem that
neededfixing. 

Thanks again!

Dan
--
Syzygy Research & Technology
Box 83, Legal, AB  T0G 1L0 Canada
Phone: 780-961-2213


Re: Database takes up MUCH more disk space than it should

From
John R Pierce
Date:
On 01/22/12 12:32 AM, Dan Charrois wrote:
> It looks like a large TOAST table, not reported by \dt+ was the biggest culprit, but I thought it was orphaned.  Due
tosome help by Adrian Klaver, it looks like I was mistaken - it was in fact used by one of my tables.  So it looks like
therewasn't really a problem at all - other than my data still taking up a lot more physical storage than I thought it
did.I don't think there's much I can do about it, but that's much better than assuming it was related to a problem that
neededfixing. 

well, there's a good possibiltiy that clustering that table will cause a
big chunk of that space to be freed up.   first, make sure you have no
old '<IDLE> in transasction' processes, then run CLUSTER table USING
indexname;    (probably the primary key, unless there's another index
thats used more frequently).   this is a high overhead operation that
will copy all the data in that table to new disk space, then delete the
old files, ordering it by the specified key, it will then rebuild all
indexes.  naturally, there's a global lock on this table for the
duration, so this should be done during a maintenance window with your
application(s) shut down.


--
john r pierce                            N 37, W 122
santa cruz ca                         mid-left coast


Re: Database takes up MUCH more disk space than it should

From
Tom Lane
Date:
Dan Charrois <dan001@syz.com> writes:
> It's too bad \dt+ doesn't take into account the related TOAST table
> too - if it had, I would have expected that much disk space right from
> the get-go, and never thought twice about it.

FWIW, that's been changed as of 9.1.

            regards, tom lane

Re: Database takes up MUCH more disk space than it should

From
Adrian Klaver
Date:
On Sunday, January 22, 2012 12:26:22 am Dan Charrois wrote:

>
> Thank you Adrian.  I think that you seem to have found the trouble.  For
> most of the TOAST tables I have, oid=relfilenode, but not for that one.  I
> found the table that has reltoastrelid linking to that huge TOAST table..
> and it makes some sense, since it is also the largest "regular" table too
> (79 GB).

The reason for that is found here:

http://www.postgresql.org/docs/9.0/interactive/storage-file-layout.html

"
Caution
Note that while a table's filenode often matches its OID, this is not necessarily
the case; some operations, like TRUNCATE, REINDEX, CLUSTER and some forms of
ALTER TABLE, can change the filenode while preserving the OID. Avoid assuming
that filenode and table OID are the same. Also, for certain system catalogs
including pg_class itself, pg_class.relfilenode contains zero. The actual filenode
number of these catalogs is stored in a lower-level data structure, and can be
obtained using the pg_relation_filenode() function.
"

>
> So perhaps there are no orphaned TOAST tables after all, as now I know who
> its parent is.  The database still takes up a lot more physical storage
> than I'd anticipated it would, but at least it appears as though that
> space can be accounted for.
>
> It's too bad \dt+ doesn't take into account the related TOAST table too -
> if it had, I would have expected that much disk space right from the
> get-go, and never thought twice about it.

In pre 9.1 databases you  can use:

"
pg_total_relation_size accepts the OID or name of a table or toast table, and
returns the total on-disk space used for that table, including all associated
indexes. This function is equivalent to pg_table_size + pg_indexes_size.

pg_table_size accepts the OID or name of a table and returns the disk space
needed for that table, exclusive of indexes. (TOAST space, free space map, and
visibility map are included.)
"

See details here:
http://www.postgresql.org/docs/9.0/interactive/functions-admin.html

>
> Dan
> --
> Syzygy Research & Technology
> Box 83, Legal, AB  T0G 1L0 Canada
> Phone: 780-961-2213

--
Adrian Klaver
adrian.klaver@gmail.com