Re: [PERFORM] Using array instead of sub table (storage and speed) - Mailing list pgsql-performance
From: Lutz Fischer
Subject: Re: [PERFORM] Using array instead of sub table (storage and speed)
Msg-id: 867b4235-95ba-6910-8224-824f75f7ce99@ed.ac.uk
In response to: Re: [PERFORM] Using array instead of sub table (storage and speed) (Stephen Frost <sfrost@snowman.net>)
Responses: Re: [PERFORM] Using array instead of sub table (storage and speed)
List: pgsql-performance
Hi Stephen,

Thanks for your reply.

The data in the sub table [sp] are only read in as a block, meaning I will always read in all entries in [sp] that belong to one entry in [s]. So I would not lose much in terms of what I can do with the data in [sp], and I could save around 2.8K per entry in [s] (just counting the overhead for each tuple in [sp]).

One thing I still wonder is how far this would affect the performance of retrieving data from [s]. I often need some data from [s] where I don't care about [sp]. So how much slower would these queries become if the arrays were part of [s]? Or would it be better to store the array data in a separate table, e.g. keep [s] as it is now but turn [sp] into an array-aggregated table?

Thanks,
Lutz

On 15/06/17 15:37, Stephen Frost wrote:
> Greetings,
>
> * Lutz Fischer (l.fischer@ed.ac.uk) wrote:
>> Data in [sp] are never changed. I can probably reduce the size by
>> changing datatypes from numeric to float but I was wondering if it
>> would be more efficient - primarily in terms of storage - to change
>> the structure to have two arrays in [s]. E.g.
>
> The short answer is 'probably', but it really depends on how wide your
> rows are.
>
>> I haven't really found anything yet how much space (e.g. how many
>> bytes) an array will use compared to a table row in postgresql.
>
> There's a 24-byte overhead for each tuple. If the width of the tuple's
> columns ends up being less than 24 bytes then half (or more) of the
> space used is for the tuple header. Arrays have a bit of overhead
> themselves but are then densely packed.
>
> In testing that I've done, a table which looks like:
>
> CREATE TABLE t1 (
>     c1 int
> );
>
> will end up with a couple hundred rows per 8k page (perhaps 250 or so),
> meaning that you get ~1k of actual data for 8k of storage.
> Changing this to an array, like so:
>
> CREATE TABLE t1 (
>     c1 int[]
> );
>
> and then storing 3-4 tuples per 8k page (keeping each tuple under the
> 2k TOAST limit) led to being able to store something like 450 ints per
> tuple, with a subsequent result of 1800 ints per page and ~7.2k worth
> of actual data for 8k of storage, which was much more efficient.
>
> Of course, the tuple header is actually useful data in many
> environments - if you go with this approach then you have to work out
> how to deal with the fact that a given tuple is either visible or not,
> that all the ints in the array for that tuple are visible together,
> and that an update to that tuple locks the entire tuple and that set
> of ints, etc. If the data isn't changed after being loaded and you're
> able to load an entire tuple all at once then this could work.
>
> Note that arrays aren't more efficient than just using individual
> columns, and this technique is only going to be helpful if the tuple
> overhead in your situation is a large portion of the data and using
> this technique allows you to reduce the number of tuples stored.
>
> Thanks!
>
> Stephen

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
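[Editor's note: the separate array-aggregated table Lutz asks about could be sketched roughly as below. The actual schemas of [s] and [sp] are not shown in the thread, so the table and column names (`s_id`, `val_a`, `val_b`) are assumptions for illustration only.]

```sql
-- Hypothetical original layout: many rows in sp per entry in s, each
-- paying the 24-byte tuple-header overhead Stephen describes.
--   CREATE TABLE s  (id bigint PRIMARY KEY /* , ... */);
--   CREATE TABLE sp (s_id bigint REFERENCES s(id), val_a float8, val_b float8);

-- Side table keeping s narrow: one row per entry in s, with sp's
-- per-row values collapsed into two densely packed arrays.
CREATE TABLE sp_arr (
    s_id  bigint PRIMARY KEY REFERENCES s(id),
    val_a float8[] NOT NULL,
    val_b float8[] NOT NULL
);

-- One-time aggregation; both arrays use the same ordering so that
-- val_a[i] and val_b[i] still describe the same original sp row.
INSERT INTO sp_arr (s_id, val_a, val_b)
SELECT s_id,
       array_agg(val_a ORDER BY val_a),
       array_agg(val_b ORDER BY val_a)
FROM sp
GROUP BY s_id;
```

With the arrays in a side table rather than in [s] itself, queries that only touch [s] are unchanged, while the whole-block reads Lutz describes become a single-row fetch from sp_arr.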