Re: cast bytea to double precision[] - Mailing list pgsql-novice
From | Mathieu Dubois |
---|---|
Subject | Re: cast bytea to double precision[] |
Date | |
Msg-id | 4E2F0F34.9080207@limsi.fr |
In response to | Re: cast bytea to double precision[] (Merlin Moncure <mmoncure@gmail.com>) |
Responses | Re: cast bytea to double precision[] |
List | pgsql-novice |
On 07/26/2011 08:40 PM, Merlin Moncure wrote:
> On Tue, Jul 26, 2011 at 12:25 PM, Mathieu Dubois
> <mathieu.dubois@limsi.fr> wrote:
>> On 07/26/2011 04:30 PM, Merlin Moncure wrote:
>>> On Tue, Jul 26, 2011 at 2:45 AM, Mathieu Dubois
>>> <mathieu.dubois@limsi.fr> wrote:
>>>> Hello,
>>>>
>>>> On 25/07/2011 17:58, Mathieu Dubois wrote:
>>>>> On 07/25/2011 05:54 PM, Merlin Moncure wrote:
>>>>>> On Sun, Jul 24, 2011 at 2:03 PM, Mathieu Dubois
>>>>>> <mathieu.dubois@limsi.fr> wrote:
>>>>>>> I have found a solution by myself for the conversion:
>>>>>>> regexp_split_to_array(sig_vector, E',')::double precision[]
>>>>>>> (the elements are in fact separated by commas).
>>>>>>>
>>>>>>> To convert my column I have used:
>>>>>>> ALTER TABLE my_table ALTER sig_vector TYPE double precision[] USING
>>>>>>> regexp_split_to_array(sig_vector, E',')::double precision[];
>>>>>>>
>>>>>>> Is that correct?
>>>>>>> Is it correct to pass the column name to regexp_split_to_array()?
>>>>>> Yeah -- you are just passing a column's data into a function as an
>>>>>> argument -- standard practice. This will work -- your bytea is really
>>>>>> a text column, so it's just a matter of breaking up the string.
>>>>>> regexp_* functions are great for that.
>>>>> Thank you very much for your reply.
>>>>>
>>>>> I will launch the conversion right now.
>>>>>
>>>> The main reason to do this was to have smaller backups.
>>>> The size of a compressed backup was around 1GB with bytea.
>>>> I have converted the columns (on a copy of the database) but the
>>>> expected gains are not there!
>>>> With double precision[] it is still around 1GB (a little bit smaller,
>>>> but only by a few MB).
>>>>
>>>> Also, the size on disk is not smaller.
>>>> I have listed the contents of /var/lib/postgres/8.4/main/base with du
>>>> and the 2 versions have the same size (3.1GB).
>>>>
>>>> Does that make sense?
>>>> My hypothesis is that the compression algorithm is able to find
>>>> regularities in the data, so it finds the same regularities in bytea
>>>> and in double precision[].
>>>>
>>>> Is there any advantage to using double precision[] over bytea in my
>>>> case?
>>> probably not -- arrays can be significantly smaller than a set of
>>> individual tuples each holding one value because of the tuple
>>> overhead, but you still have to pay for the array header and a 4 byte
>>> length/null indicator per element.
>>>
>>> A packed string is often the smallest way to store data, although not
>>> necessarily the best. A double precision[] comes with a lot of syntax
>>> advantages.
>> Thanks for your advice!
>>
>> I find the result surprising because the floats are encoded with a lot
>> of characters (something like 20), while a double is 8 bytes.
>>
>> I have tried to run VACUUM but it changed nothing...
>>
>> All of my code is based on strings, so I won't take time to modify it
>> if there is no gain.
> right. remember that with each array element there is a 4 byte
> indicator (which is the length and the null flag) for each element.
> So array element size is 12 bytes, not 8, and you have the overhead of
> the array itself which contains dimension information and type oid.
> So, it's about a wash vs 15 digits precision floating point and one
> byte delimiter.

Are you sure that the length of each element is stored? That seems like
a waste of space (if you know that you are in a double precision[], you
know that each element is 8 bytes - or maybe an array can contain
different types?).
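One way to check would be to compare pg_column_size() for the two
representations on some generated data (a minimal sketch, not run
against my real table):

SELECT pg_column_size(array_agg(v))                       AS array_bytes,
       pg_column_size(array_to_string(array_agg(v), ',')) AS text_bytes
FROM (SELECT random() AS v FROM generate_series(1, 1000)) AS s;

If the per-element length is really stored, array_bytes should come out
near 12 bytes per element plus a small header; if not, near 8.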
Hum, I forgot to say that I have several hundred or even thousands of
elements, so saving even a few bytes per element might add up to
gigabytes.

> If all the elements are known not null and the internally stored type
> is fixed length, it's hypothetically possible to not have to store the
> length/null indicator, but postgres does not do that and it might not
> be worth it even if you wanted to.

I initially wanted to store my vector directly as it is in memory.
That's bad practice, since floats can differ from one machine to the
other even if they all use IEEE 754 (byte order, 32- or 64-bit
machines, for instance). I remember having some headaches trying to
send floats over the network between old G4 Macs and PCs (the G4 was
big-endian while Intel machines are little-endian).

Moreover, I use different programming languages (ranging from Ada to
Python), so text was the most common format. I know that Ada has some
ways to specify the memory layout of your objects, but I don't have a
lot of experience with that (and this is probably not the place to
ask). C often uses the closest type implemented in hardware, so in a
mixed environment you have to deal with that too.

Maybe a simple trick is just to store less data: remove trailing zeros
and/or round to a fixed relative precision.

> merlin

Thank you again for your time. The postgres list is always helpful :)

Mathieu
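P.S. For the rounding idea, something like this sketch might work
(assuming sig_vector is, or casts to, text; note that round() here
gives a fixed number of decimal places, not a relative precision):

SELECT array_to_string(
         ARRAY(SELECT round(x::numeric, 6)
               FROM unnest(string_to_array(sig_vector, ',')::float8[]) AS u(x)),
         ',')
FROM my_table;

This rebuilds the comma-separated string with each element rounded to 6
decimal places; whether the compressed backup actually shrinks would
have to be measured.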