Re: cast bytea to double precision[] - Mailing list pgsql-novice
From | Mathieu Dubois |
---|---|
Subject | Re: cast bytea to double precision[] |
Date | |
Msg-id | 4E2F0F34.9080207@limsi.fr |
In response to | Re: cast bytea to double precision[] (Merlin Moncure <mmoncure@gmail.com>) |
Responses | Re: cast bytea to double precision[] |
List | pgsql-novice |
On 07/26/2011 08:40 PM, Merlin Moncure wrote:
> On Tue, Jul 26, 2011 at 12:25 PM, Mathieu Dubois
> <mathieu.dubois@limsi.fr> wrote:
>> On 07/26/2011 04:30 PM, Merlin Moncure wrote:
>>> On Tue, Jul 26, 2011 at 2:45 AM, Mathieu Dubois
>>> <mathieu.dubois@limsi.fr> wrote:
>>>> Hello,
>>>>
>>>> On 25/07/2011 17:58, Mathieu Dubois wrote:
>>>>> On 07/25/2011 05:54 PM, Merlin Moncure wrote:
>>>>>> On Sun, Jul 24, 2011 at 2:03 PM, Mathieu Dubois
>>>>>> <mathieu.dubois@limsi.fr> wrote:
>>>>>>> I have found a solution by myself for the conversion:
>>>>>>> regexp_split_to_array(sig_vector, E',')::double precision[]
>>>>>>> (the elements are in fact separated by commas).
>>>>>>>
>>>>>>> To convert my column I have used:
>>>>>>> ALTER TABLE my_table ALTER sig_vector TYPE double precision[] USING
>>>>>>> regexp_split_to_array(sig_vector, E',')::double precision[];
>>>>>>>
>>>>>>> Is that correct?
>>>>>>> Is it correct to pass the column name to regexp_split_to_array()?
>>>>>> Yeah -- you are just passing a column's data into a function as an
>>>>>> argument -- standard practice. This will work -- your bytea is really
>>>>>> a text column, so it's just a matter of breaking up the string.
>>>>>> regexp_* functions are great for that.
>>>>> Thank you very much for your reply.
>>>>>
>>>>> I will launch the conversion right now.
>>>>>
>>>> The main reason to do this was to have smaller backups.
>>>> The size of a compressed backup was around 1GB with bytea.
>>>> I have converted the columns (on a copy of the database) but the
>>>> expected gains are not there!
>>>> With double precision[] it is still around 1GB (a little bit smaller,
>>>> but only by a few MB).
>>>>
>>>> Also, the size on disk is not smaller.
>>>> I have listed the contents of /var/lib/postgres/8.4/main/base with du
>>>> and the 2 versions have the same size (3.1GB).
>>>>
>>>> Does that make sense?
>>>> My hypothesis is that the compression algorithm is able to find
>>>> regularities in the data, so it finds the same regularities in bytea
>>>> and in double precision[].
>>>>
>>>> Is there any advantage to using double precision[] over bytea in my
>>>> case?
>>> probably not -- arrays can be significantly smaller than a set of
>>> individual tuples each holding one value because of the tuple
>>> overhead, but you still have to pay for the array header and a 4 byte
>>> length/null indicator per element.
>>>
>>> A packed string is often the smallest way to store data, although not
>>> necessarily the best. A double precision[] comes with a lot of syntax
>>> advantages.
>> Thanks for your advice!
>>
>> I find the result surprising because the floats are encoded with a lot
>> of characters (something like 20), while a double is 8 bytes.
>>
>> I have tried to run VACUUM but it changed nothing...
>>
>> All of my code is based on strings, so I won't take time to modify it
>> if there is no gain.
> right. remember that with each array element there is a 4 byte
> indicator (which is the length and the null flag) for each element.
> So array element size is 12 bytes, not 8, and you have the overhead of
> the array itself which contains dimension information and type oid.
> So, it's about a wash vs 15 digits precision floating point and one
> byte delimiter.

Are you sure that the length of each element is stored? That seems like
a waste of space (if you know that you are in a double precision[], you
know that each element is 8 bytes - or maybe an array can contain
different types?).
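One way to check would be to compare pg_column_size() for the two
representations on some generated data (a minimal sketch, not run
against my real table):

SELECT pg_column_size(array_agg(v))                       AS array_bytes,
       pg_column_size(array_to_string(array_agg(v), ',')) AS text_bytes
FROM (SELECT random() AS v FROM generate_series(1, 1000)) AS s;

If the per-element length is really stored, array_bytes should come out
near 12 bytes per element plus a small header; if not, near 8.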
Hum, I forgot to say that I have several hundred or even thousands of
elements, so saving even a few bytes per element might add up to
gigabytes.

> If all the elements are known not null and the internally stored type
> is fixed length, it's hypothetically possible to not have to store the
> length/null indicator, but postgres does not do that and it might not
> be worth it even if you wanted to.

I initially wanted to store my vector directly as it is in memory.
That's bad practice, since floats can differ from one machine to the
other even if they all use IEEE 754 (byte order, 32- or 64-bit
machines, for instance). I remember having some headaches trying to
send floats over the network between old G4 Macs and PCs (the G4 was
big-endian while Intel machines are little-endian).

Moreover, I use different programming languages (ranging from Ada to
Python), so text was the most common format. I know that Ada has some
ways to specify the memory layout of your objects, but I don't have a
lot of experience with that (and this is probably not the place to
ask). C often uses the closest type implemented in hardware, so in a
mixed environment you have to deal with that too.

Maybe a simple trick is just to store less data: remove trailing zeros
and/or round to a fixed relative precision.

> merlin

Thank you again for your time. The postgres list is always helpful :)

Mathieu
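P.S. For the rounding idea, something like this sketch might work
(assuming sig_vector is, or casts to, text; note that round() here
gives a fixed number of decimal places, not a relative precision):

SELECT array_to_string(
         ARRAY(SELECT round(x::numeric, 6)
               FROM unnest(string_to_array(sig_vector, ',')::float8[]) AS u(x)),
         ',')
FROM my_table;

This rebuilds the comma-separated string with each element rounded to 6
decimal places; whether the compressed backup actually shrinks would
have to be measured.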