Re: [PATCH] Compression and on-disk sorting - Mailing list pgsql-patches

From Martijn van Oosterhout
Subject Re: [PATCH] Compression and on-disk sorting
Date
Msg-id 20060519115844.GE17873@svana.org
Whole thread Raw
In response to Re: [PATCH] Compression and on-disk sorting  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-patches
On Fri, May 19, 2006 at 12:43:05PM +0100, Simon Riggs wrote:
> We need to test "SELECT aid from accounts" also, or some other scenarios
> where the data is as uncompressible as possible. We should also try this
> on a table where the rows have been inserted by different transactions,
> so that the xmin value isn't the same for all tuples. We need to see if
> there are cases where this causes a performance regression rather than
> gain.

AFAIK, the xmin is not included in the sort. The only thing is maybe
the ctid which is used in updates. Actually repeating the above tests
but doing:

select xmin,xmax,cmin,cmax,ctid,* from <blah>

Would be interesting. Even that would be compressable though.

Thinking about it, we're storing and compressing HeapTuples right.
There are a few fields there that would compress really well.

t_tableOid
t_len   (if not vairable length fields)
t_natts

Even t_hoff and the bitmask if the patterns of NULLs don't vary much
between rows.

> We still need to examine memory usage. Jim's testing so far is done on
> already sorted data, so only uses 1 out of 715 tapes. If we did utilise
> a much larger number of tapes, we could face difficulties with the
> memory used during decompression.

I'm going to see if I can make some changes to track maximum memory
usage per tape.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Attachment

pgsql-patches by date:

Previous
From: Simon Riggs
Date:
Subject: Re: [PATCH] Compression and on-disk sorting
Next
From: Bernd Helmle
Date:
Subject: Add namespace dependency for conversions