Re: [HACKERS] compression in LO and other fields - Mailing list pgsql-hackers
| From | Karel Zak - Zakkr |
|---|---|
| Subject | Re: [HACKERS] compression in LO and other fields |
| Date | |
| Msg-id | Pine.LNX.3.96.991112101708.14930B-100000@ara.zf.jcu.cz |
| In response to | Re: [HACKERS] compression in LO and other fields (wieck@debis.com (Jan Wieck)) |
| Responses | Re: [HACKERS] compression in LO and other fields |
| List | pgsql-hackers |
On Fri, 12 Nov 1999, Jan Wieck wrote:
> Just in case someone wants to implement a complete compressed
> data type (including comparison functions, operators and a
> default operator class for indexing).
>
> I already made some tests with a type I called 'lztext'
> locally. Only the input-/output-functions exist so far and
> as the name might suggest, it would be an alternative for
> 'text'. It uses a simple but fast, byte oriented LZ backward
> pointing method. No Huffman coding or variable offset/size
> tagging. The first byte of a chunk tells, bit by bit, whether each
> of the following 8 items is a raw byte to copy or a 12-bit offset,
> 4-bit size copy instruction. That means a max back offset of 4096
> and a max match size of 17 bytes.
Is this your original implementation, or are you using some existing
compression code? I tried bzip2, but its output is completely binary;
I don't know how to use that in PgSQL when all the backend's (in/out)
routines use char * (yes, I'm a newbie at PgSQL hacking :-).
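For readers unfamiliar with the problem Karel raises: the textual
input/output routines exchange NUL-terminated C strings, so a '\0'
byte inside compressed data silently shortens the value. A minimal
standalone illustration (plain C, not PostgreSQL code; the byte
values are made up):

```c
/*
 * Standalone illustration of the problem: compressed data is arbitrary
 * binary, so an embedded '\0' makes the value look shorter as soon as
 * it is handled as a C string.  The bytes below are invented.
 */
#include <stdio.h>
#include <string.h>

int
main(void)
{
    /* pretend this came out of a compressor; note the embedded NUL */
    unsigned char compressed[] = { 0x42, 0x5A, 0x68, 0x00, 0x31, 0x41 };

    printf("real length:   %zu\n", sizeof(compressed));           /* 6 */
    printf("as a C string: %zu\n", strlen((char *) compressed));  /* 3 */
    return 0;
}
```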
>
> What made it my preferred method was the fact that
> decompression is done entirely using the already decompressed
> portion of the data, so it does not need any code tables or
> the like at that time.
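To make the format Jan describes concrete, here is a hedged sketch of
a matching decompressor. The meaning of a set control bit, the bit
layout of the offset/size pair, and the +1/+2 biases that yield the
4096-byte offset and 17-byte match limits are my assumptions drawn
from his description, not taken from the actual lztext code:

```c
/*
 * Hedged sketch of a decompressor for the described scheme.  Assumes a
 * set control bit marks a 2-byte back reference, offsets carry a +1 bias
 * (1..4096), match lengths a +2 bias (2..17), the input is well formed,
 * and 'dst' is large enough for the decompressed result.
 */
#include <stddef.h>

static size_t
lz_decompress(const unsigned char *src, size_t srclen, unsigned char *dst)
{
    const unsigned char *send = src + srclen;
    unsigned char       *dp = dst;

    while (src < send)
    {
        unsigned char ctrl = *src++;        /* one control byte per 8 items */
        int           bit;

        for (bit = 0; bit < 8 && src < send; bit++, ctrl >>= 1)
        {
            if (ctrl & 1)
            {
                /* back reference: 12-bit offset, 4-bit size in two bytes */
                int off = (((src[0] & 0x0F) << 8) | src[1]) + 1;  /* 1..4096 */
                int len = (src[0] >> 4) + 2;                      /* 2..17   */

                src += 2;
                while (len-- > 0)
                {
                    /* copy strictly from already decompressed output,
                     * so no code tables are needed at this point */
                    *dp = *(dp - off);
                    dp++;
                }
            }
            else
            {
                /* literal byte, copied through unchanged */
                *dp++ = *src++;
            }
        }
    }
    return (size_t) (dp - dst);             /* decompressed length */
}
```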
>
> It is really FASTEST on decompression, which I assume would
> be the most often used operation on huge data types. With
> some care, comparison could be done on the fly while
> decompressing two values, so that the entire comparison can
> be aborted at the occurrence of the first difference.
>
> The compression rates aren't that gigantic. I've got 30-50%
Isn't it a problem that your implementation compresses all the data at
once? Compression typically works on a stream, compressing only a small
buffer in each cycle.
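To illustrate the stream-oriented style Karel has in mind, here is a
sketch using zlib's deflate (chosen only as a familiar streaming API;
it is not part of the lztext proposal). It assumes the output buffer
is large enough to hold the whole compressed result:

```c
/*
 * Sketch of stream-style compression: feed the data to the compressor
 * in small buffers instead of compressing everything in one call.
 * Assumes 'out'/'outlen' can hold the entire compressed result.
 */
#include <string.h>
#include <zlib.h>

#define CHUNK 4096

static long
compress_stream(const unsigned char *in, size_t inlen,
                unsigned char *out, size_t outlen)
{
    z_stream zs;
    size_t   done = 0;

    memset(&zs, 0, sizeof(zs));
    if (deflateInit(&zs, Z_DEFAULT_COMPRESSION) != Z_OK)
        return -1;

    zs.next_out = out;
    zs.avail_out = (uInt) outlen;

    while (done < inlen)
    {
        size_t chunk = (inlen - done > CHUNK) ? CHUNK : inlen - done;
        int    flush = (done + chunk == inlen) ? Z_FINISH : Z_NO_FLUSH;

        /* hand deflate only one small buffer per cycle */
        zs.next_in = (Bytef *) (in + done);
        zs.avail_in = (uInt) chunk;

        if (deflate(&zs, flush) == Z_STREAM_ERROR)
        {
            deflateEnd(&zs);
            return -1;
        }
        done += chunk;
    }

    deflateEnd(&zs);
    return (long) zs.total_out;            /* compressed size */
}
```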
> for rule plan strings (size limit on views!!!). And the
> method used only allows for buffer back references of 4K
> offsets at most, so the rate will not grow for larger data
> chunks. That's a heavy tradeoff between compression rate and
> no memory leakage for sure and speed, I know, but I prefer
> not to force it, instead I usually use a bigger hammer (the
> tuple size limit is still our original problem - and another
> IBM 72GB disk doing 22-37 MB/s will make any compressing data
> type obsolete then).
>
> Sorry for the compression-specific slang here. Well, anyone
> interested in the code?
Yes, for me - I'm finishing the Oracle-compatible to_char()/to_date()
routines (Thomas, still quiet?) and this would be a new and appealing
task for me :-)
Karel