Re: [HACKERS] compression in LO and other fields - Mailing list pgsql-hackers
From: Karel Zak - Zakkr
Subject: Re: [HACKERS] compression in LO and other fields
Msg-id: Pine.LNX.3.96.991112101708.14930B-100000@ara.zf.jcu.cz
In response to: Re: [HACKERS] compression in LO and other fields (wieck@debis.com (Jan Wieck))
Responses: Re: [HACKERS] compression in LO and other fields
List: pgsql-hackers
On Fri, 12 Nov 1999, Jan Wieck wrote:

> Just in case someone wants to implement a complete compressed
> data type (including comparison functions, operators and an
> indexing default operator class).
>
> I already made some tests with a type I called 'lztext'
> locally. Only the input-/output-functions exist so far and,
> as the name might suggest, it would be an alternative for
> 'text'. It uses a simple but fast, byte-oriented LZ backward
> pointing method. No Huffman coding or variable offset/size
> tagging. The first byte of a chunk tells bitwise whether each
> of the next 8 items is a raw byte to copy or a 12-bit offset,
> 4-bit size copy instruction. That gives a max back offset of
> 4096 and a max match size of 17 bytes.

Is this your original implementation, or do you use existing compression code? I tried bzip2, but its output is entirely binary, and I don't know how to use that in PostgreSQL when all the backend in/out routines work with char * (yes, I'm a newbie at PostgreSQL hacking :-).

> What made it my preferred method was the fact that
> decompression is done entirely using the already decompressed
> portion of the data, so it does not need any code tables or
> the like at that time.
>
> It is really FASTEST on decompression, which I assume would
> be the most often used operation on huge data types. With
> some care, comparison could be done on the fly while
> decompressing two values, so that the entire comparison can
> be aborted at the occurrence of the first difference.
>
> The compression rates aren't that gigantic. I've got 30-50%

Isn't it a problem that your implementation compresses all the data at once? Compression is typically done on a stream, compressing only a small buffer in each cycle.

> for rule plan strings (size limit on views!!!). And the
> method used only allows for buffer back references of 4K
> offsets at most, so the rate will not grow for larger data
> chunks. That's a heavy tradeoff between compression rate and
> no memory leakage for sure and speed, I know, but I prefer
> not to force it; instead I usually use a bigger hammer (the
> tuple size limit is still our original problem - and another
> IBM 72GB disk doing 22-37 MB/s will make any compressing data
> type obsolete then).
>
> Sorry for the compression specific slang here. Well, anyone
> interested in the code?

Yes, I am - I am finishing the Oracle-compatible to_char()/to_date() routines (Thomas, are you still quiet?) and this would be a new challenge for me :-)

Karel
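[Editor's note: a minimal sketch of a decompressor for the tag-byte LZ scheme Jan describes above - a control byte whose 8 bits say, item by item, whether the next item is a literal byte or a 2-byte copy instruction holding a 12-bit back offset and a 4-bit match length. The exact bit order, the byte order of the copy item, and the length bias (here length = nibble + 2, matching the stated maximum of 17) are assumptions for illustration only; this is not the actual lztext code.]

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Decompress src[0..srclen) into dst[0..dstcap), returning the number
 * of bytes produced.  Assumed layout: control byte, then for each set
 * bit a 2-byte big-endian item = 12-bit back offset | 4-bit length-2,
 * for each clear bit a literal byte.
 */
size_t
lz_decompress_sketch(const uint8_t *src, size_t srclen,
                     uint8_t *dst, size_t dstcap)
{
    size_t si = 0, di = 0;

    while (si < srclen)
    {
        uint8_t ctrl = src[si++];

        for (int bit = 0; bit < 8 && si < srclen; bit++)
        {
            if (ctrl & (1 << bit))
            {
                /* copy item: 12-bit offset, 4-bit length (assumed layout) */
                if (si + 1 >= srclen)
                    return di;

                uint16_t item = (uint16_t) ((src[si] << 8) | src[si + 1]);
                si += 2;

                size_t off = (item >> 4) & 0x0FFF;   /* back offset, up to 4096 */
                size_t len = (item & 0x000F) + 2;    /* 2..17 bytes (assumed bias) */

                if (off == 0 || off > di || di + len > dstcap)
                    return di;                       /* malformed input or output full */

                /* copy byte by byte so overlapping matches work correctly */
                for (size_t i = 0; i < len; i++, di++)
                    dst[di] = dst[di - off];
            }
            else
            {
                /* literal byte */
                if (di >= dstcap)
                    return di;
                dst[di++] = src[si++];
            }
        }
    }
    return di;
}
```

Note how every back reference is resolved against the already produced output (dst[di - off]); that is why no code tables are needed at decompression time, and why two values could in principle be compared on the fly while being decompressed.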