Re: [HACKERS] compression in LO and other fields - Mailing list pgsql-hackers
From:           wieck@debis.com (Jan Wieck)
Subject:        Re: [HACKERS] compression in LO and other fields
Msg-id:         m11mIp4-0003kLC@orion.SAPserv.Hamburg.dsh.de
In response to: Re: [HACKERS] compression in LO and other fields (Tom Lane <tgl@sss.pgh.pa.us>)
Responses:      Re: [HACKERS] compression in LO and other fields
List:           pgsql-hackers
Tom Lane wrote:
> wieck@debis.com (Jan Wieck) writes:
> > But it requires decompression of every tuple into palloc()'d
> > memory during heap access. AFAIK, the heap access routines
> > currently return a pointer to the tuple inside the shm
> > buffer. Don't know what its performance impact would be.
>
> Good point, but the same will be needed when a tuple is split across
> multiple blocks. I would expect that (given a reasonably fast
> decompressor) there will be a net performance *gain* due to having
> less disk I/O to do. Also, this won't be happening for "every" tuple,
> just those exceeding a size threshold --- we'd be able to tune the
> threshold value to trade off speed and space.

Right, this time it's your good point. All of these problems will come up again when tuple splitting is implemented. The major problem I see is that a palloc()'d tuple must be pfree()'d after the fetcher is done with it. Since tuples currently live in the shared buffer, the fetcher doesn't have to care.

> One thing that does occur to me is that we need to store the
> uncompressed as well as the compressed data size, so that the
> working space can be palloc'd before starting the decompression.

Yepp - and I'm doing so. Only during compression is the result size unknown in advance, but there is a well-known maximum: the header overhead plus the data size times 1.125 plus 2 bytes (the absolute worst case, for uncompressible data). A general mechanism working at the tuple level would fall back to storing the data uncompressed whenever the compressed result came out bigger.

> Also, in case it wasn't clear, I was envisioning leaving the tuple
> header uncompressed, so that time quals etc can be checked before
> decompressing the tuple data.

Of course.

Well, you asked for the rates on the smaller html files only.

78 files, 131 bytes min, 10000 bytes max, 4582 bytes avg, 357383 bytes total.

    gzip -9 outputs 145659 bytes (59.2%)
    gzip -1 outputs 155113 bytes (56.6%)
    my code outputs 184109 bytes (48.5%)

67 files, 2000 bytes min, 10000 bytes max, 5239 bytes avg, 351006 bytes total.

    gzip -9 outputs 141772 bytes (59.6%)
    gzip -1 outputs 151150 bytes (56.9%)
    my code outputs 179428 bytes (48.9%)

The threshold will surely be a tuning parameter of interest. Another tuning option should be the ability to allow or deny compression per table. Then we would have both options: a compressing field type to define which portion of a tuple to compress, or compression of entire tuples.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #
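[Editor's note: a minimal, self-contained sketch of the two points discussed above -- storing the uncompressed size next to the compressed size so working space can be allocated before decompression starts, and the worst-case output bound with the fallback to uncompressed storage. This is not the actual backend code; the names CompressedData and worst_case_size are hypothetical.]

    #include <stddef.h>   /* offsetof */
    #include <stdint.h>   /* uint32_t */
    #include <stdio.h>    /* printf */

    typedef struct CompressedData
    {
        uint32_t rawsize;   /* uncompressed size, so working space can be
                             * allocated before decompression starts */
        uint32_t compsize;  /* bytes actually stored; equals rawsize when
                             * the data is kept uncompressed */
        char     data[1];   /* compressed (or raw) payload follows */
    } CompressedData;

    /* Worst-case output size from the mail: header overhead plus the data
     * size times 1.125 (eighth rounded up) plus 2 bytes -- the total worst
     * case for uncompressible input. */
    static size_t
    worst_case_size(size_t rawsize)
    {
        return offsetof(CompressedData, data) + rawsize + (rawsize + 7) / 8 + 2;
    }

    int
    main(void)
    {
        size_t rawsize = 4582;  /* average small-html size quoted above */

        /* If a compressor run into a buffer of worst_case_size(rawsize)
         * bytes produces compsize >= rawsize, a tuple-level mechanism
         * would simply copy the raw bytes and set compsize = rawsize,
         * as described in the mail. */
        printf("raw %zu bytes -> worst-case buffer %zu bytes\n",
               rawsize, worst_case_size(rawsize));
        return 0;
    }

Keeping rawsize in an uncompressed header is what allows the reader to palloc() the decompression target (and check time quals on an uncompressed tuple header) before touching the compressed payload.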