Re: Smaller data types use same disk space - Mailing list pgsql-general

From Steve Crawford
Subject Re: Smaller data types use same disk space
Date
Msg-id 500F43C7.9090708@pinpointresearch.com
Whole thread Raw
In response to Smaller data types use same disk space  ("McGehee, Robert" <Robert.McGehee@geodecapital.com>)
List pgsql-general
On 07/24/2012 03:21 PM, McGehee, Robert wrote:
> Hi,
> I've created two tables labeled "Big" and "Small" that both store the same 10 million rows of data using 493MB and
487MBof disk space respectively. The difference is that the "Big" table uses data types that take up more space
(integerrather than smallint, float rather than real, etc). The "Big" table should need about 27 bytes/row versus 16
bytes/rowfor the "Small" table, indicating to me that the "Big" table should be 70% bigger in actual disk size. In
reality,it's only 1% bigger or 6MB (after clustering, vacuuming and analyzing). Why is this? Shouldn't the "Small"
tablebe about 110MB smaller (11 bytes for 10 million rows)? I'm estimating table size with \d+ 
>
> Thanks, Robert
>
>           Table "Big"
>    Column  |       Type       | Bytes
> ----------+------------------+-----------
>   rmid     | integer          | 4
>   date     | date             | 4
>   rmfactor | text             | 7 (about 3 characters/cell)
>   id       | integer          | 4
>   value    | double precision | 8
> ---------------------------------
>   Total Bytes/Row               27
>   Rows                          10M
>   Actual Size                   493MB
>
>
>      Table "Small"
>   Column |   Type   | Bytes
> --------+----------+-----------
>   rmid   | smallint | 2
>   date   | date     | 4
>   rmfid  | smallint | 2 (rmfid is a smallint index into the rmfactor table)
>   id     | integer  | 4
>   value  | real     | 4
> ---------------------------------
>   Total Bytes/Row     16
>   Rows                10M
>   Actual Size         487MB
>
>
>

More questions than answers:

What version of PostgreSQL?

How are your determining the space used by a table?

Why are you assuming 7 bytes for a 3-character value? (Character values
up to 126 characters long only have 1-character overhead.)

What is the fill-factor on the tables? (Should default to 100% but don't
know how you are configured.)

Do the tables have OIDs or not?

Other considerations are that rows don't split across pages so there is
a bit of waste per page. Also there could be compression considerations
though I'm not sure that small rows like this will be compressed.

Cheers,
Steve



pgsql-general by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: Smaller data types use same disk space
Next
From: jtkells
Date:
Subject: Re: insert binary data into a table column with psql