Re: ZStandard (with dictionaries) compression support for TOAST compression - Mailing list pgsql-hackers

From Nikhil Kumar Veldanda
Subject Re: ZStandard (with dictionaries) compression support for TOAST compression
Date
Msg-id CAFAfj_FE-wHQtLBkOVab4MR=1GKErsLL7=HZbWBVnAxa=xncSA@mail.gmail.com
In response to Re: ZStandard (with dictionaries) compression support for TOAST compression  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
Thanks, Michael, for the feedback.

On Fri, May 30, 2025 at 12:21 AM Michael Paquier <michael@paquier.xyz> wrote:
>
>
> > During compression, compression methods (zstd_compress_datum) will
> > determine whether to use metadata(dictionary) or not based on
> > CompressionInfo.meta.
>
> Not sure about this one.

I removed the meta field and the CompressionInfo struct. I originally
used CompressionInfo to carry the compression method, zstd_level, and
zstd dict ID downstream, but since we’re now using a default
compression level for zstd, it’s no longer needed.

>
> > ALTER TABLE tblname
> >   ALTER COLUMN colname
> >   SET (zstd_level = 5);
> > ```
> >
>
> Specifying that as an attribute option makes sense here, but I don't
> think that this has to be linked to the initial patch set that should
> extend the toast data for the new compression method.  It's a bit hard
> to say how relevant that is, and IMV it's kind of hard for users to
> know which level makes more sense.  Setting up the wrong level can be
> equally very costly in CPU.  For now, my suggestion would be to focus
> on the basics, and discard this part until we figure out the rest.
>

Ack. I’ve removed that option and will stick with ZSTD_CLEVEL_DEFAULT
as the compression level.
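
For reference, the no-dictionary compression path with the default level
looks roughly like the sketch below. This is illustrative only: the
function name and the varlena header handling are placeholders, not the
exact v25-0003 code, which also has to account for the extra header byte
introduced in 0002.

static struct varlena *
zstd_compress_datum_sketch(const struct varlena *value)
{
    int32       valsize = VARSIZE_ANY_EXHDR(value);
    size_t      dst_cap = ZSTD_compressBound(valsize);
    struct varlena *tmp;
    size_t      len;

    tmp = (struct varlena *) palloc(dst_cap + VARHDRSZ_COMPRESSED);
    len = ZSTD_compress((char *) tmp + VARHDRSZ_COMPRESSED, dst_cap,
                        VARDATA_ANY(value), valsize,
                        ZSTD_CLEVEL_DEFAULT);

    /* on error or no savings, let the caller store the datum as-is */
    if (ZSTD_isError(len) || len > (size_t) valsize)
    {
        pfree(tmp);
        return NULL;
    }

    SET_VARSIZE_COMPRESSED(tmp, len + VARHDRSZ_COMPRESSED);
    return tmp;
}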

> +CompressionInfo
> +setup_cmp_info(char cmethod, Form_pg_attribute att)

Removed setup_cmp_info and its references.

> This routine declares a Form_pg_attribute as argument, does not use
> it.  Due to that, it looks that attoptcache.h is pulled into
> toast_compression.c.
>
Removed it.

> Patch 0001 has the concept of metadata with various facilities, like
> VARATT_4BCE_HAS_META(), CompressionInfo, etc.  However at the current
> stage we don't need that at all.  Wouldn't it be better to delay this
> kind of abstraction layer to happen after we discuss how (and if) the
> dictionary part should be introduced rather than pay the cost of the
> facility in the first step of the implementation?  This is not
> required as a first step.  The toast zstd routines introduced in patch
> 0002 use !meta, discard meta=true as an error case.
>
Removed all metadata-related abstractions from patch 0001.

> +/* Helper: pack <flag, cmid> into a single byte:  flag (b0), cmid-2
> (b1..7) */
>
> Having a one-liner here is far from enough?  This is the kind of thing
> where we should spend time describing how things are done and why they
> are done this way.  This is not sufficient, there's just too much to
> guess.  The fact that we have VARATT_4BCE_EXTFLAG is only, but there's
> no real information about va_ecinfo and that it relates to the three
> bits sets, for example.
>
I’ve added a detailed comment explaining the one-byte layout.
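
For reference, the gist of that comment, sketched with made-up helper
names (only VARATT_4BCE_EXTFLAG and va_ecinfo are names from the patch;
ECINFO_PACK/ECINFO_UNPACK_CMID below are just for illustration):

/*
 * va_ecinfo byte layout (sketch):
 *
 *   bit 0      extension flag (cf. VARATT_4BCE_EXTFLAG): marks the datum
 *              as carrying an extended compression method id
 *   bits 1..7  the extended compression method id, stored as (cmid - 2),
 *              since methods 0 (pglz) and 1 (lz4) never need this byte
 */
#define ECINFO_PACK(cmid)        ((uint8) (0x01 | ((uint8) ((cmid) - 2) << 1)))
#define ECINFO_UNPACK_CMID(info) ((char) ((((uint8) (info)) >> 1) + 2))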

> +#define VARTAG_SIZE(PTR)   \
> [...]
> UNALIGNED_U32()
>
> This stuff feels magic.  It's hard for someone to understand what's
> going on here, and there is no explanation about why it's done this
> way.
>
To clarify: we need to read a 32-bit value, va_extinfo inside
varatt_external, from an address that is not necessarily 4-byte aligned,
in order to determine the toast_pointer size. If the top two bits of that
word are 0b11, the pointer carries an optional trailer and is the larger,
extended form. I wrote two variants of READ_U32_UNALIGNED(ptr), one per
endianness, that load the four bytes individually and reassemble the word
in host byte order:

/*
 * Read a 32-bit unsigned integer from any address, even one that is
 * not naturally aligned to 4 bytes: load one byte at a time and
 * reassemble the word in host byte order.
 * Variant for little-endian systems.
 */
#define READ_U32_UNALIGNED(ptr) \
( (uint32) (((const uint8 *)(ptr))[0]) \
| ((uint32)(((const uint8 *)(ptr))[1]) <<  8) \
| ((uint32)(((const uint8 *)(ptr))[2]) << 16) \
| ((uint32)(((const uint8 *)(ptr))[3]) << 24) )

/*
 * Variant for big-endian systems.
 */
#define READ_U32_UNALIGNED(ptr) \
( (uint32) (((const uint8 *)(ptr))[3]) \
| ((uint32)(((const uint8 *)(ptr))[2]) <<  8) \
| ((uint32)(((const uint8 *)(ptr))[1]) << 16) \
| ((uint32)(((const uint8 *)(ptr))[0]) << 24) )

Alternatively, one could use:

/* Relies on the GCC/Clang statement-expression extension. */
#define READ_U32_UNALIGNED(src)                           \
    ({                                                    \
        uint32 _tmp;                                      \
        memcpy(&_tmp, (src), sizeof(uint32));             \
        _tmp;                                             \
    })

I chose the byte-by-byte version to avoid extra instructions in a hot path.
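
To make the intended use concrete, here is roughly how the helper feeds
the size decision (a sketch; the macro name is made up, and the actual
VARTAG_SIZE change in 0002 is arranged somewhat differently):

/*
 * varatt_external sits unaligned inside the TOAST pointer, so
 * va_extinfo must be fetched byte-by-byte before we can look at its
 * top two bits; 0b11 there means the optional trailer is present and
 * the pointer has the larger, extended size.
 */
#define VARATT_EXTERNAL_HAS_TRAILER(PTR) \
    ((READ_U32_UNALIGNED((const char *) VARDATA_EXTERNAL(PTR) + \
                         offsetof(varatt_external, va_extinfo)) >> 30) == 0x3)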

> -toast_compress_datum(Datum value, char cmethod)
> +toast_compress_datum(Datum value, CompressionInfo cmp)
> [...]
> -   /* If the compression method is not valid, use the current default */
> -   if (!CompressionMethodIsValid(cmethod))
> -       cmethod = default_toast_compression;
>
> Removing the fallback to the default toast compression GUC if nothing
> is valid does not look right.  There could be extensions that depend
> on that, and it's unclear what the benefits of setup_cmp_info() are,
> because it is not documented, so it's hard for one to understand how
> to use these changes.

I removed setup_cmp_info; all related code has been deleted.

> -   result = (struct varlena *) palloc(TOAST_POINTER_SIZE);
> +   result = (struct varlena *) palloc(TOAST_CMPID_EXTENDED(cm) ? TOAST_POINTER_EXT_SIZE : TOAST_POINTER_NOEXT_SIZE);
> [...]
> -   memcpy(VARDATA_EXTERNAL(result), &toast_pointer,
> sizeof(toast_pointer));
> +   memcpy(VARDATA_EXTERNAL(result), &toast_pointer,
> TOAST_CMPID_EXTENDED(cm) ? TOAST_POINTER_EXT_SIZE - VARHDRSZ_EXTERNAL
> : TOAST_POINTER_NOEXT_SIZE - VARHDRSZ_EXTERNAL) ;
>
> That looks, err...  Hard to maintain to me.  Okay, that's a
> calculation for the extended compression part, but perhaps this is a
> sign that we need to think harder about the surroundings of the
> toast_pointer to ease such calculations.
>
I simplified it by introducing a helper macro (below); both the palloc
call and the memcpy length now use TOAST_POINTER_SIZE(cm) and
TOAST_POINTER_SIZE(cm) - VARHDRSZ_EXTERNAL, respectively.

#define TOAST_POINTER_SIZE(cm) \
(TOAST_CMPID_EXTENDED(cm) ? TOAST_POINTER_EXT_SIZE : TOAST_POINTER_NOEXT_SIZE)
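
With that in place, the quoted hunk reduces to something like
(illustrative excerpt, not the exact diff):

    result = (struct varlena *) palloc(TOAST_POINTER_SIZE(cm));
    memcpy(VARDATA_EXTERNAL(result), &toast_pointer,
           TOAST_POINTER_SIZE(cm) - VARHDRSZ_EXTERNAL);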

> +    {
> +        {
> +            "zstd_level",
> +            "Set column's ZSTD compression level",
> +            RELOPT_KIND_ATTRIBUTE,
> +            ShareUpdateExclusiveLock
> +        },
> +        DEFAULT_ZSTD_LEVEL, MIN_ZSTD_LEVEL, MAX_ZSTD_LEVEL
> +    },
>
> This could be worth a patch on its own, once we get the basics sorted
> out.  I'm not even sure that we absolutely need that, TBH.  The last
> time I've had a discussion on the matter for WAL compression we
> discarded the argument about the level because it's hard to understand
> how to tune, and the default is enough to work magics.  For WAL, we've
> been using ZSTD_CLEVEL_DEFAULT in xloginsert.c, and I've not actually
> heard much about people wanting to tune the compression level.  That
> was a few years ago, perhaps there are some more different opinions on
> the matter.
>
Removed it.

> Your patch introduces a new compression_zstd, touching very lightly
> compression.sql.  I think that we should and can do much better than
> that in the long term.  The coverage of compression.sql is quite good,
> and what the zstd code is adding does not cover all of it.  Let's
> rework the tests of HEAD and split compression.sql for the LZ4 and
> pglz parts.  If one takes a diff between compression.out and
> compression_1.out, he/she would notice that the only differences are
> caused by the existence of the lz4 table.  This is not the smartest
> move we can do if we add more compression methods, so I'd suggest the
> following:
> - Add a new SQL function called pg_toast_compression_available(text)
> or similar, able to return if a toast compression method is supported
> or not.  This would need two arguments once the initial support for
> zstd is done: lz4 and zstd.  For head, we only require one: lz4.
> - Now, the actual reason why a function returning a boolean result is
> useful is for the SQL tests.  It is possible with \if to make the
> tests conditional if LZ4 is supported or now, limiting the noise if
> LZ4 is not supported.  See for example the tricks we use for the UTF-8
> encoding or NUMA.
> - Move the tests related to lz4 into a separate file, outside
> compression.sql, in a new file called compression_lz4.sql.  With the
> addition of zstd toast support, we would add a new file:
> compression_zstd.sql.  The new zstd suite would then just need to
> copy-paste the original one, with few tweaks.  It may be better to
> parameterize that but we don't do that anymore these days with
> input/output regression files.

Agreed. I introduced pg_compression_available(text) and refactored the
SQL tests accordingly: the LZ4 tests moved into compression_lz4.sql, and
compression_zstd.sql was added with the zstd-specific differences.
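
The SQL-callable check is essentially a build-option test; a rough
sketch of its shape (the real function in v25-0001 may differ in
details such as handling of unknown method names):

Datum
pg_compression_available(PG_FUNCTION_ARGS)
{
    char       *method = text_to_cstring(PG_GETARG_TEXT_PP(0));
    bool        result = false;

#ifdef USE_LZ4
    if (strcmp(method, "lz4") == 0)
        result = true;
#endif
#ifdef USE_ZSTD
    if (strcmp(method, "zstd") == 0)
        result = true;
#endif

    PG_RETURN_BOOL(result);
}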

v25-0001-Add-pg_compression_available-and-split-sql-compr.patch -
introduces pg_compression_available() and splits the compression SQL tests
v25-0002-Design-to-extend-the-varattrib_4b-varatt_externa.patch -
design proposal for extending varattrib_4b and varatt_external
v25-0003-Implement-Zstd-compression-no-dictionary-support.patch -
implements zstd compression without dictionary support

--
Nikhil Veldanda
