Thread: ZStandard (with dictionaries) compression support for TOAST compression

ZStandard (with dictionaries) compression support for TOAST compression

From
Nikhil Kumar Veldanda
Date:
Hi all,

The ZStandard compression algorithm [1][2], though not currently used for TOAST compression in PostgreSQL, offers significantly better compression ratios than lz4/pglz in both dictionary-based and non-dictionary modes. Attached for review is my patch to add ZStandard compression to Postgres. In my tests, this patch used with a pre-trained dictionary achieved up to four times the compression ratio of LZ4, while ZStandard without a dictionary compressed data about twice as well as LZ4/pglz.

Notably, this is the first compression algorithm for Postgres that can make use of a dictionary to provide higher levels of compression, but dictionaries have to be generated and maintained, so I’ve had to break new ground in that regard. Using the dictionary support requires training and storing a dictionary for a given variable-length column. A SQL function is called on the column; it samples the column’s data and feeds it into the ZStandard training API, which returns a dictionary. In the example below, the column is of JSONB type. The SQL function takes the table name and the attribute number as inputs. If the training is successful, it returns true; otherwise, it returns false.

```
test=# select build_zstd_dict_for_attribute('"public"."zstd"', 1);
build_zstd_dict_for_attribute
-------------------------------
t
(1 row)
```

The sampling logic and the data fed to the ZStandard training API can vary by data type. The patch includes a way to write other type-specific training functions and provides a default for JSONB, TEXT and BYTEA. There is a new ‘CREATE TYPE’ option called ‘build_zstd_dict’ that takes a function name as input. In this way anyone can write their own type-specific training function by handling the sampling logic and returning the necessary information for the ZStandard training API in “ZstdTrainingData” format.

```
typedef struct ZstdTrainingData
{
    char       *sample_buffer;  /* Pointer to the raw sample buffer */
    size_t     *sample_sizes;   /* Array of sample sizes */
    int         nitems;         /* Number of sample sizes */
} ZstdTrainingData;
```
This information is fed into the ZStandard training API, which generates a dictionary that is then inserted into the dictionary catalog table. Additionally, we update the ‘pg_attribute’ attribute options to include the unique dictionary ID for that specific attribute. During compression, based on the available dictionary ID, we retrieve the dictionary and use it to compress the documents. I’ve created a standard training function (`zstd_dictionary_builder`) for JSONB, TEXT, and BYTEA.
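
For reference, here is a minimal sketch of how samples collected in the ZstdTrainingData format can be handed to the ZStandard training API. This is illustrative only, not the patch's actual code; `build_dictionary_from_samples` and its error handling are my own:

```
#include <stdlib.h>
#include <zdict.h>

/*
 * Illustrative sketch: turn a ZstdTrainingData-style sample set into a
 * serialized dictionary using ZDICT_trainFromBuffer(). Returns the
 * dictionary size, or 0 on failure. The caller frees *dict_out.
 */
static size_t
build_dictionary_from_samples(const char *sample_buffer,
                              const size_t *sample_sizes,
                              int nitems,
                              size_t dict_capacity,    /* e.g. the 4KB default */
                              void **dict_out)
{
    void       *dict = malloc(dict_capacity);
    size_t      dict_size;

    if (dict == NULL)
        return 0;

    dict_size = ZDICT_trainFromBuffer(dict, dict_capacity,
                                      sample_buffer, sample_sizes,
                                      (unsigned) nitems);
    if (ZDICT_isError(dict_size))
    {
        free(dict);
        return 0;               /* e.g. too few or too-uniform samples */
    }

    *dict_out = dict;
    return dict_size;
}
```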

We store the dictionary and its dictid in the new catalog table ‘pg_zstd_dictionaries’:

```
test=# \d pg_zstd_dictionaries
  Table "pg_catalog.pg_zstd_dictionaries"
 Column | Type  | Collation | Nullable | Default
--------+-------+-----------+----------+---------
 dictid | oid   |           | not null |
 dict   | bytea |           | not null |
Indexes:
    "pg_zstd_dictionaries_dictid_index" PRIMARY KEY, btree (dictid)
``` 

This is the entire ZStandard dictionary infrastructure. A column can have multiple dictionaries; the latest dictionary is identified via the pg_attribute attoptions. We never delete dictionaries once they are generated. If a dictionary is not available and attcompression is set to zstd, we compress with ZStandard without a dictionary. For decompression, the zstd-compressed frame contains a dictionary identifier (dictid) that indicates the dictionary used for compression. By retrieving this dictid from the zstd frame, we fetch the corresponding dictionary and perform decompression.
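
To illustrate that decompression path, here is a minimal sketch using zstd's one-shot dictionary API. None of this is the patch's actual code, and `fetch_dictionary_by_id` is a hypothetical stand-in for the pg_zstd_dictionaries catalog lookup:

```
#include <zstd.h>

/* Hypothetical catalog lookup: returns the serialized dictionary for dictid. */
extern const void *fetch_dictionary_by_id(unsigned dictid, size_t *dict_size);

/*
 * Illustrative sketch: read the dictid out of the zstd frame header, fetch
 * that dictionary, and decompress. Returns the decompressed size, 0 on error.
 */
static size_t
zstd_decompress_datum(const void *src, size_t src_size,
                      void *dst, size_t dst_capacity)
{
    unsigned    dictid = ZSTD_getDictID_fromFrame(src, src_size);
    const void *dict = NULL;
    size_t      dict_size = 0;
    ZSTD_DCtx  *dctx;
    size_t      result;

    if (dictid != 0)            /* 0 means the frame was built without a dictionary */
        dict = fetch_dictionary_by_id(dictid, &dict_size);

    dctx = ZSTD_createDCtx();
    if (dctx == NULL)
        return 0;

    /* With dict == NULL / dict_size == 0 this behaves like plain decompression. */
    result = ZSTD_decompress_usingDict(dctx, dst, dst_capacity,
                                       src, src_size, dict, dict_size);
    ZSTD_freeDCtx(dctx);

    return ZSTD_isError(result) ? 0 : result;
}
```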

#############################################################################

Now for the TOAST compression framework changes.

We identify a compressed datum's compression algorithm using the top two bits of va_tcinfo (varattrib_4b.va_compressed).
In principle this allows four compression methods. However, based on previous community discussion of TOAST compression changes [3], spending the last remaining bit pattern on a single new algorithm was rejected, and the suggestion was to use it to extend the format instead, which is what I’ve implemented in this patch. This change requires updating the ‘varattrib_4b’ and ‘varatt_external’ on-disk structures. I’ve made sure these changes are backward compatible.

```
typedef union
{
    struct                      /* Normal varlena (4-byte length) */
    {
        uint32      va_header;
        char        va_data[FLEXIBLE_ARRAY_MEMBER];
    }           va_4byte;
    struct                      /* Compressed-in-line format */
    {
        uint32      va_header;
        uint32      va_tcinfo;  /* Original data size (excludes header) and
                                 * compression method; see va_extinfo */
        char        va_data[FLEXIBLE_ARRAY_MEMBER]; /* Compressed data */
    }           va_compressed;
    struct                      /* Extended compression format */
    {
        uint32      va_header;
        uint32      va_tcinfo;
        uint32      va_cmp_alg;
        char        va_data[FLEXIBLE_ARRAY_MEMBER];
    }           va_compressed_ext;
} varattrib_4b;

typedef struct varatt_external
{
    int32       va_rawsize;     /* Original data size (includes header) */
    uint32      va_extinfo;     /* External saved size (without header) and
                                 * compression method */
    Oid         va_valueid;     /* Unique ID of value within TOAST table */
    Oid         va_toastrelid;  /* RelID of TOAST table containing it */
    uint32      va_cmp_alg;     /* Additional compression algorithm
                                 * information */
} varatt_external;
```

Since I need to update these structs, I’ve adjusted the existing macros accordingly, and added the ZStandard compression and decompression routines as needed. These are the major design changes in the patch to incorporate ZStandard with dictionary compression.
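
As a rough illustration of how the extended header can coexist with the current two-bit scheme (names and layout here are my own sketch against the varattrib_4b definition above, not the patch's actual macros):

```
/*
 * Illustrative sketch only. Today the top two bits of va_tcinfo carry the
 * compression method (pglz or lz4). The idea is to reserve the remaining
 * bit pattern '11' as an escape meaning "extended format: the real
 * algorithm id lives in va_cmp_alg".
 */
#define SKETCH_EXTENDED_METHOD_MARKER   0x03    /* '11' in the top two bits */

static inline uint32
sketch_get_compression_algorithm(const varattrib_4b *ptr)
{
    uint32      method = ptr->va_compressed.va_tcinfo >> 30;

    if (method == SKETCH_EXTENDED_METHOD_MARKER)
        return ptr->va_compressed_ext.va_cmp_alg;   /* zstd, future methods */

    return method;              /* legacy pglz/lz4 datum, unchanged on disk */
}
```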

Please let me know what you think about all this. Are there any concerns with my approach? In particular, I would appreciate your thoughts on the on-disk changes that result from this.

kind regards,

Nikhil Veldanda
Amazon Web Services: https://aws.amazon.com

[1] https://facebook.github.io/zstd/
[2] https://github.com/facebook/zstd
[3] https://www.postgresql.org/message-id/flat/YoMiNmkztrslDbNS%40paquier.xyz

Attachment
On Thu, 6 Mar 2025 at 08:43, Nikhil Kumar Veldanda
<veldanda.nikhilkumar17@gmail.com> wrote:
>
> [quoted text of the original post trimmed]
>

Hi!
I generally love this idea, however I am not convinced that in-core
support is the right direction here. Maybe we can introduce some API
infrastructure to allow delegating compression to extensions?
This is merely my opinion; perhaps dealing with a redo is not
worthwhile.

I took a brief look at patch v1. I feel like this is too much for a
single patch. Take, for example, this change:

```
-#define NO_LZ4_SUPPORT() \
+#define NO_METHOD_SUPPORT(method) \
  ereport(ERROR, \
  (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), \
- errmsg("compression method lz4 not supported"), \
- errdetail("This functionality requires the server to be built with lz4 support.")))
+ errmsg("compression method %s not supported", method), \
+ errdetail("This functionality requires the server to be built with %s support.", method)))
```

This could be a separate preliminary refactoring patch in the series.
Perhaps we need to divide the patch into smaller pieces if we follow
the suggested course of this thread (in-core support).

I will try to give another in-depth look here soon.

--
Best regards,
Kirill Reshke



06.03.2025 08:32, Nikhil Kumar Veldanda wrote:
>
> [quoted text of the original post trimmed]
>

Overall idea is great.

I just want to mention that LZ4 also has an API to use a dictionary. Its
dictionary is as simple as "virtually prepended" text (in contrast to the
complex ZStd dictionary format).

I mean, it would be great if "dictionary" were a common property across
different algorithms.

On the other hand, zstd has a "super fast" mode which is actually a bit
faster than LZ4 and compresses a bit better. So maybe support for
different algos is not essential. (But then we need a way to change the
compression level to that "super fast" mode.)

-------
regards
Yura Sokolov aka funny-falcon



Re: ZStandard (with dictionaries) compression support for TOAST compression

From
Aleksander Alekseev
Date:
Hi Nikhil,

Many thanks for working on this. I proposed a similar patch some time
ago [1] but the overall feedback was somewhat mixed so I chose to
focus on something else. Thanks for picking this up.

> test=# select build_zstd_dict_for_attribute('"public"."zstd"', 1);
> build_zstd_dict_for_attribute
> -------------------------------
> t
> (1 row)

Did you have a chance to familiarize yourself with the corresponding
discussion [1] and probably the previous threads? Particularly it was
pointed out that dictionaries should be built automatically during
VACUUM. We also discussed a special syntax for the feature, besides
other things.

[1]: https://www.postgresql.org/message-id/flat/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22%3D5xVBg7S4vr5rQ%40mail.gmail.com

-- 
Best regards,
Aleksander Alekseev



Re: ZStandard (with dictionaries) compression support for TOAST compression

From
Nikhil Kumar Veldanda
Date:
Hi,

> Overall idea is great.
>
> I just want to mention LZ4 also have API to use dictionary. Its dictionary
> will be as simple as "virtually prepended" text (in contrast to complex
> ZStd dictionary format).
>
> I mean, it would be great if "dictionary" will be common property for
> different algorithms.
>
> On the other hand, zstd have "super fast" mode which is actually a bit
> faster than LZ4 and compresses a bit better. So may be support for
> different algos is not essential. (But then we need a way to change
> compression level to that "super fast" mode.)
>

The zstd compression level and zstd dictionary size are configurable at
the attribute level using ALTER TABLE. The default zstd level is 3 and
the dict size is 4KB. For super-fast mode the level can be set to 1.

```
test=# alter table zstd alter column doc set compression zstd;
ALTER TABLE
test=# alter table zstd alter column doc set(zstd_cmp_level = 1);
ALTER TABLE
test=# select * from pg_attribute where attrelid = 'zstd'::regclass
and attname = 'doc';
 attrelid | attname | atttypid | attlen | attnum | atttypmod |
attndims | attbyval | attalign | attstorage | attcompre
ssion | attnotnull | atthasdef | atthasmissing | attidentity |
attgenerated | attisdropped | attislocal | attinhcount
| attcollation | attstattarget | attacl |            attoptions
    | attfdwoptions | attmissingval
----------+---------+----------+--------+--------+-----------+----------+----------+----------+------------+----------
------+------------+-----------+---------------+-------------+--------------+--------------+------------+-------------
+--------------+---------------+--------+----------------------------------+---------------+---------------
    16389 | doc     |     3802 |     -1 |      1 |        -1 |
0 | f        | i        | x          | z
      | f          | f         | f             |             |
     | f            | t          |           0
|            0 |               |        |
{zstd_dictid=1,zstd_cmp_level=1} |               |
(1 row)
```



Re: ZStandard (with dictionaries) compression support for TOAST compression

From
Nikhil Kumar Veldanda
Date:
Hi

On Thu, Mar 6, 2025 at 5:35 AM Aleksander Alekseev
<aleksander@timescale.com> wrote:
>
> Hi Nikhil,
>
> Many thanks for working on this. I proposed a similar patch some time
> ago [1] but the overall feedback was somewhat mixed so I choose to
> focus on something else. Thanks for peeking this up.
>
> > test=# select build_zstd_dict_for_attribute('"public"."zstd"', 1);
> > build_zstd_dict_for_attribute
> > -------------------------------
> > t
> > (1 row)
>
> Did you have a chance to familiarize yourself with the corresponding
> discussion [1] and probably the previous threads? Particularly it was
> pointed out that dictionaries should be built automatically during
> VACUUM. We also discussed a special syntax for the feature, besides
> other things.
>
> [1]: https://www.postgresql.org/message-id/flat/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22%3D5xVBg7S4vr5rQ%40mail.gmail.com

Restricting dictionary generation to the vacuum process is not ideal
because it limits user control and flexibility. Compression efficiency
is highly dependent on data distribution, which can change
dynamically. By allowing users to generate dictionaries on demand via
an API, they can optimize compression when they detect inefficiencies
rather than waiting for a vacuum process, which may not align with
their needs.

Additionally, since all dictionaries are stored in the catalog table
anyway, users can generate and manage them independently without
interfering with the system’s automatic maintenance tasks. This
approach ensures better adaptability to real-world scenarios where
compression performance needs to be monitored and adjusted in real
time.

---
Nikhil Veldanda



06.03.2025 19:29, Nikhil Kumar Veldanda wrote:
> Hi,
> 
>> Overall idea is great.
>>
>> I just want to mention LZ4 also have API to use dictionary. Its dictionary
>> will be as simple as "virtually prepended" text (in contrast to complex
>> ZStd dictionary format).
>>
>> I mean, it would be great if "dictionary" will be common property for
>> different algorithms.
>>
>> On the other hand, zstd have "super fast" mode which is actually a bit
>> faster than LZ4 and compresses a bit better. So may be support for
>> different algos is not essential. (But then we need a way to change
>> compression level to that "super fast" mode.)
>>
> 
> zstd compression level and zstd dictionary size is configurable at
> attribute level using ALTER TABLE. Default zstd level is 3 and dict
> size is 4KB. For super fast mode level can be set to 1.

No. Super-fast mode levels are negative. See the parsing of the "--fast"
parameter in `programs/zstdcli.c` in zstd's repository and the definition
of ZSTD_minCLevel().

So, to support "super-fast" mode you have to accept negative compression
levels. I didn't check; probably you already support them?
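
For what it's worth, with the advanced compression API negative levels are passed the same way as positive ones; a minimal sketch (not from the patch) of selecting such a level:

```
#include <zstd.h>

/*
 * Illustrative sketch: compress one buffer at a negative ("super fast")
 * compression level. ZSTD_minCLevel() is the most negative level accepted;
 * any value in [ZSTD_minCLevel(), -1] trades ratio for speed.
 */
static size_t
compress_super_fast(const void *src, size_t src_size,
                    void *dst, size_t dst_capacity, int level)
{
    ZSTD_CCtx  *cctx = ZSTD_createCCtx();
    size_t      result;

    if (cctx == NULL)
        return 0;

    if (level < ZSTD_minCLevel())
        level = ZSTD_minCLevel();

    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, level);
    result = ZSTD_compress2(cctx, dst, dst_capacity, src, src_size);
    ZSTD_freeCCtx(cctx);

    return ZSTD_isError(result) ? 0 : result;
}
```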


-------
regards
Yura Sokolov aka funny-falcon



Re: ZStandard (with dictionaries) compression support for TOAST compression

From
Nikhil Kumar Veldanda
Date:
Hi Yura,

> So, to support "super-fast" mode you have to accept negative compression
> levels. I didn't check, probably you're already support them?
>

The key point I want to emphasize is that both the zstd compression level
and the dictionary size should be configurable based on user preferences
at the attribute level.

---
Nikhil Veldanda



On Thu, Mar 6, 2025 at 12:43 AM Nikhil Kumar Veldanda
<veldanda.nikhilkumar17@gmail.com> wrote:
> Notably, this is the first compression algorithm for Postgres that can make use of a dictionary to provide higher
> levels of compression, but dictionaries have to be generated and maintained,

I think that solving the problems around using a dictionary is going
to be really hard. Can we see some evidence that the results will be
worth it?

--
Robert Haas
EDB: http://www.enterprisedb.com



Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Mar 6, 2025 at 12:43 AM Nikhil Kumar Veldanda
> <veldanda.nikhilkumar17@gmail.com> wrote:
>> Notably, this is the first compression algorithm for Postgres that can make use of a dictionary to provide higher
>> levels of compression, but dictionaries have to be generated and maintained,

> I think that solving the problems around using a dictionary is going
> to be really hard. Can we see some evidence that the results will be
> worth it?

BTW, this is hardly the first such attempt.  See [1] for a prior
attempt at something fairly similar, which ended up going nowhere.
It'd be wise to understand why that failed before pressing forward.

Note that the thread title for [1] is pretty misleading, as the
original discussion about JSONB-specific compression soon migrated
to discussion of compressing TOAST data using dictionaries.  At
least from a ten-thousand-foot viewpoint, that seems like exactly
what you're proposing here.  I see that you dismissed [1] as
irrelevant upthread, but I think you'd better look closer.

            regards, tom lane

[1] https://www.postgresql.org/message-id/flat/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22%3D5xVBg7S4vr5rQ%40mail.gmail.com



Re: ZStandard (with dictionaries) compression support for TOAST compression

From
Nikhil Kumar Veldanda
Date:
Hi Robert,

> I think that solving the problems around using a dictionary is going
> to be really hard. Can we see some evidence that the results will be
> worth it?

With the latest patch I've shared,

Using a Kaggle dataset of Nintendo-related tweets [1], we leveraged
PostgreSQL's acquire_sample_rows function to quickly gather just 1,000
sample rows for a specific attribute out of 104,695 rows. These raw
samples were passed into Zstd's sampling buffer, generating a custom
dictionary. This dictionary was then used directly to compress the
documents, resulting in 62% space savings after compression:

```
test=# \dt+
                                     List of tables
 Schema |      Name      | Type  |  Owner   | Persistence | Access method |  Size  | Description
--------+----------------+-------+----------+-------------+---------------+--------+-------------
 public | lz4            | table | nikhilkv | permanent   | heap          | 297 MB |
 public | pglz           | table | nikhilkv | permanent   | heap          | 259 MB |
 public | zstd_with_dict | table | nikhilkv | permanent   | heap          | 114 MB |
 public | zstd_wo_dict   | table | nikhilkv | permanent   | heap          | 210 MB |
(4 rows)
```

We've observed similarly strong results on other datasets as well when
using dictionaries.

[1] https://www.kaggle.com/code/dcalambas/nintendo-tweets-analysis/data

---
Nikhil Veldanda



Re: ZStandard (with dictionaries) compression support for TOAST compression

From
Nikhil Kumar Veldanda
Date:
Hi Tom,

On Thu, Mar 6, 2025 at 11:33 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Thu, Mar 6, 2025 at 12:43 AM Nikhil Kumar Veldanda
> > <veldanda.nikhilkumar17@gmail.com> wrote:
> >> Notably, this is the first compression algorithm for Postgres that can make use of a dictionary to provide higher
> >> levels of compression, but dictionaries have to be generated and maintained,
>
> > I think that solving the problems around using a dictionary is going
> > to be really hard. Can we see some evidence that the results will be
> > worth it?
>
> BTW, this is hardly the first such attempt.  See [1] for a prior
> attempt at something fairly similar, which ended up going nowhere.
> It'd be wise to understand why that failed before pressing forward.
>
> Note that the thread title for [1] is pretty misleading, as the
> original discussion about JSONB-specific compression soon migrated
> to discussion of compressing TOAST data using dictionaries.  At
> least from a ten-thousand-foot viewpoint, that seems like exactly
> what you're proposing here.  I see that you dismissed [1] as
> irrelevant upthread, but I think you'd better look closer.
>
>                         regards, tom lane
>
> [1] https://www.postgresql.org/message-id/flat/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22%3D5xVBg7S4vr5rQ%40mail.gmail.com

Thank you for highlighting the previous discussion—I reviewed [1]
closely. While both methods involve dictionary-based compression, the
approach I'm proposing differs significantly.

The previous method explicitly extracted string values from JSONB and
assigned unique OIDs to each entry, resulting in distinct dictionary
entries for every unique value. In contrast, this approach directly
leverages Zstandard's dictionary training API. We provide raw data
samples to Zstd, which generates a dictionary of a specified size.
This dictionary is then stored in a catalog table and used to compress
subsequent inserts for the specific attribute it was trained on.

Key differences include:

1. No new data types are required.
2. Attributes can optionally have multiple dictionaries; the latest
dictionary is used during compression, and the exact dictionary used
during compression is retrieved and applied for decompression.
3. Compression utilizes Zstandard's trained dictionaries when available.

Additionally, I have provided an option for users to define custom
sampling and training logic, as directly passing raw buffers to the
training API may not always yield optimal results, especially for
certain custom variable-length data types. This flexibility motivates
the necessary adjustments to `pg_type`.

I would greatly appreciate your feedback or any additional suggestions
you might have.

[1] https://www.postgresql.org/message-id/flat/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22%3D5xVBg7S4vr5rQ%40mail.gmail.com

Best regards,
Nikhil Veldanda



Re: ZStandard (with dictionaries) compression support for TOAST compression

From
Aleksander Alekseev
Date:
Hi Nikhil,

> Thank you for highlighting the previous discussion—I reviewed [1]
> closely. While both methods involve dictionary-based compression, the
> approach I'm proposing differs significantly.
>
> The previous method explicitly extracted string values from JSONB and
> assigned unique OIDs to each entry, resulting in distinct dictionary
> entries for every unique value. In contrast, this approach directly
> leverages Zstandard's dictionary training API. We provide raw data
> samples to Zstd, which generates a dictionary of a specified size.
> This dictionary is then stored in a catalog table and used to compress
> subsequent inserts for the specific attribute it was trained on.
>
> [...]

You didn't read closely enough, I'm afraid. As Tom pointed out, the
title of the thread is misleading. On top of that there are several
separate threads. I did my best to cross-reference them, but
apparently didn't do well enough.

Initially I proposed to add the ZSON extension [1][2] to the PostgreSQL
core. However, the idea evolved into TOAST improvements that don't
require a user to use special types. You may also find interesting the
related "Pluggable TOASTer" discussion [3]. The idea there was rather
different, but the discussion about extending TOAST pointers so that in
the future we can use something other than ZSTD is relevant.

You will find the recent summary of the reached agreements somewhere
around this message [4], take a look at the thread a bit above and
below it.

I believe this effort is important. You can't, however, simply discard
everything that was discussed in this area for the past several years.
If you want to succeed of course. No one will look at your patch if it
doesn't account for all the previous discussions. I'm sorry, I know
it's disappointing. This being said you should have done better
research before submitting the code. You could just ask if anyone was
working on something like this before and save a lot of time.

Personally I would suggest starting with one little step toward
compression dictionaries, particularly focusing on extendability of
TOAST pointers. You are going to need to store dictionary ids there
and to allow using other compression algorithms in the future. This
will require something like a varint/UTF-8-like bitmask. See the
previous discussions.
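
(For readers unfamiliar with the idea, here is a purely illustrative LEB128-style varint sketch of what such an extensible encoding could look like; this is not code from any of the referenced threads or patches.)

```
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative LEB128-style varint: 7 payload bits per byte, high bit set
 * on every byte except the last. Small ids (e.g. common compression
 * methods) stay at one byte while leaving room to grow.
 */
static size_t
varint_encode(uint64_t value, uint8_t *out)
{
    size_t      n = 0;

    do
    {
        uint8_t     byte = value & 0x7F;

        value >>= 7;
        if (value != 0)
            byte |= 0x80;       /* continuation bit */
        out[n++] = byte;
    } while (value != 0);

    return n;
}

static size_t
varint_decode(const uint8_t *in, uint64_t *value)
{
    uint64_t    result = 0;
    size_t      n = 0;
    int         shift = 0;

    for (;;)
    {
        uint8_t     byte = in[n++];

        result |= (uint64_t) (byte & 0x7F) << shift;
        if ((byte & 0x80) == 0)
            break;
        shift += 7;
    }

    *value = result;
    return n;
}
```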

[1]: https://github.com/afiskon/zson
[2]: https://postgr.es/m/CAJ7c6TP3fCC9TNKJBQAcEf4c%3DL7XQZ7QvuUayLgjhNQMD_5M_A%40mail.gmail.com
[3]: https://postgr.es/m/224711f9-83b7-a307-b17f-4457ab73aa0a%40sigaev.ru
[4]: https://postgr.es/m/CAJ7c6TPSN06C%2B5cYSkyLkQbwN1C%2BpUNGmx%2BVoGCA-SPLCszC8w%40mail.gmail.com

--
Best regards,
Aleksander Alekseev



Re: ZStandard (with dictionaries) compression support for TOAST compression

From
Aleksander Alekseev
Date:
Hi Robert,

> I think that solving the problems around using a dictionary is going
> to be really hard. Can we see some evidence that the results will be
> worth it?

Compression dictionaries give a good compression ratio (~50%) and also
increase TPS a bit (5-10%) due to better buffer cache utilization. At
least according to synthetic and not trustworthy benchmarks I did some
years ago [1]. The result may be very dependent on the actual data of
course, not to mention particular implementation of the idea.

[1]: https://github.com/afiskon/zson/blob/master/docs/benchmark.md

-- 
Best regards,
Aleksander Alekseev



Re: ZStandard (with dictionaries) compression support for TOAST compression

From
Nikhil Kumar Veldanda
Date:
Hi,

I reviewed the discussions, and while most agreements focused on
changes to the toast pointer, the design I propose requires no
modifications to it. I've carefully considered the design choices made
previously, and I recognize Zstd's clear advantages in compression
efficiency and performance over algorithms like PGLZ and LZ4; we can
integrate it without altering the existing toast pointer
(varatt_external) structure.

By simply using the top two bits of the va_extinfo field (setting them
to '11') in `varatt_external`, we can signal an alternative
compression algorithm, clearly distinguishing new methods from legacy
ones. The specific algorithm used would then be recorded in the
va_cmp_alg field.

This approach addresses the issues raised in the summarized thread [1]
and allows us to leverage dictionaries for data that can stay in-line.
While my initial patch includes modifications to the toast pointer due
to a single dependency (pg_column_compression), those changes aren't
strictly necessary; resolving that dependency separately would make
the overall design even less intrusive.

Here’s an illustrative structure:
```
typedef union
{
    struct    /* Normal varlena (4-byte length) */
    {
        uint32    va_header;
        char    va_data[FLEXIBLE_ARRAY_MEMBER];
    }    va_4byte;
    struct    /* Current Compressed format */
    {
        uint32    va_header;
        uint32    va_tcinfo;    /* Original size and compression method */
        char    va_data[FLEXIBLE_ARRAY_MEMBER]; /* Compressed data */
    }    va_compressed;
    struct    /* Extended compression format */
    {
        uint32    va_header;
        uint32    va_tcinfo;
        uint32    va_cmp_alg;
        uint32    va_cmp_dictid;
        char    va_data[FLEXIBLE_ARRAY_MEMBER];
    }    va_compressed_ext;
} varattrib_4b;

typedef struct varatt_external
{
    int32       va_rawsize;     /* Original data size (includes header) */
    uint32      va_extinfo;     /* External saved size (without header) and
                                 * compression method; '11' in the top two
                                 * bits indicates new compression methods */
    Oid         va_valueid;     /* Unique ID of value within TOAST table */
    Oid         va_toastrelid;  /* RelID of TOAST table containing it */
} varatt_external;
```

Decompression flow remains straightforward: once a datum is identified
as external, we detoast it, then we identify the compression algorithm
using the `TOAST_COMPRESS_METHOD` macro, which refers to a varattrib_4b
structure rather than a toast pointer. We retrieve the compression
algorithm from either va_tcinfo or va_cmp_alg based on the adjusted
macros, and decompress accordingly.

In summary, integrating Zstandard into the TOAST framework in this
minimally invasive way should yield substantial benefits.

[1] https://www.postgresql.org/message-id/CAJ7c6TPSN06C%2B5cYSkyLkQbwN1C%2BpUNGmx%2BVoGCA-SPLCszC8w%40mail.gmail.com

Best regards,
Nikhil Veldanda

On Fri, Mar 7, 2025 at 3:42 AM Aleksander Alekseev
<aleksander@timescale.com> wrote:
>
> [quoted text trimmed]
>



Re: ZStandard (with dictionaries) compression support for TOAST compression

From
Nikhil Kumar Veldanda
Date:
Hi all,

Attached is an updated version of the patch. Specifically, I've removed
the changes related to the TOAST pointer structure. This proposal
differs from earlier discussions on this topic [1], where extending
the TOAST pointer was considered essential for enabling
dictionary-based compression.

Key improvements introduced in this proposal:

1. No Changes to TOAST Pointer: The existing TOAST pointer structure
remains untouched, simplifying integration and minimizing potential
disruptions.

2. Extensible Design: The solution is structured to seamlessly
incorporate future compression algorithms beyond zstd [2], providing
greater flexibility and future-proofing.

3. Inline Data Compression with Dictionary Support: Crucially, this
approach supports dictionary-based compression for inline data.
Dictionaries are highly effective for compressing small-sized
documents, providing substantial storage savings. Please refer to the
attached image from the zstd README[2] for supporting evidence.
Omitting dictionary-based compression for inline data would
significantly reduce these benefits. For example, under previous
design constraints [3], if a 16KB document compressed down to 256
bytes using a dictionary, storing this inline would not have been
feasible. The current proposal addresses this limitation, thereby
fully leveraging dictionary-based compression.

I believe this solution effectively addresses the limitations
identified in our earlier discussions [1][3].

Feedback on this approach would be greatly appreciated, I welcome any
feedback or suggestions you might have.

References:
[1]
https://www.postgresql.org/message-id/flat/CAJ7c6TPSN06C%2B5cYSkyLkQbwN1C%2BpUNGmx%2BVoGCA-SPLCszC8w%40mail.gmail.com
[2] https://github.com/facebook/zstd
[3] https://www.postgresql.org/message-id/CAJ7c6TPSN06C%2B5cYSkyLkQbwN1C%2BpUNGmx%2BVoGCA-SPLCszC8w%40mail.gmail.com

```
typedef union
{
    struct                      /* Normal varlena (4-byte length) */
    {
        uint32      va_header;
        char        va_data[FLEXIBLE_ARRAY_MEMBER];
    }           va_4byte;
    struct                      /* Compressed-in-line format */
    {
        uint32      va_header;
        uint32      va_tcinfo;  /* Original data size (excludes header) and
                                 * compression method; see va_extinfo */
        char        va_data[FLEXIBLE_ARRAY_MEMBER]; /* Compressed data */
    }           va_compressed;
    struct                      /* Extended compression format */
    {
        uint32      va_header;
        uint32      va_tcinfo;
        uint32      va_cmp_alg;
        uint32      va_cmp_dictid;
        char        va_data[FLEXIBLE_ARRAY_MEMBER];
    }           va_compressed_ext;
} varattrib_4b;
```
The additional algorithm information and the dictid are stored in varattrib_4b.

Best regards,
Nikhil Veldanda

On Fri, Mar 7, 2025 at 5:35 PM Nikhil Kumar Veldanda
<veldanda.nikhilkumar17@gmail.com> wrote:
>
> [quoted text trimmed]
>

Attachment
On Fri, Mar 7, 2025 at 8:36 PM Nikhil Kumar Veldanda
<veldanda.nikhilkumar17@gmail.com> wrote:
>     struct    /* Extended compression format */
>     {
>         uint32    va_header;
>         uint32    va_tcinfo;
>         uint32    va_cmp_alg;
>         uint32    va_cmp_dictid;
>         char    va_data[FLEXIBLE_ARRAY_MEMBER];
>     }    va_compressed_ext;
> } varattrib_4b;

First, thanks for sending along the performance results. I agree that
those are promising. Second, thanks for sending these design details.

The idea of keeping dictionaries in pg_zstd_dictionaries literally
forever doesn't seem very appealing, but I'm not sure what the other
options are. I think we've established in previous work in this area
that compressed values can creep into unrelated tables and inside
records or other container types like ranges. Therefore, we have no
good way of knowing when a dictionary is unreferenced and can be
dropped. So in that sense your decision to keep them forever is
"right," but it's still unpleasant. It would even be necessary to make
pg_upgrade carry them over to new versions.

If we could make sure that compressed datums never leaked out into
other tables, then tables could depend on dictionaries and
dictionaries could be dropped when there were no longer any tables
depending on them. But like I say, previous work suggested that this
would be very difficult to achieve. However, without that, I imagine
users generating new dictionaries regularly as the data changes and
eventually getting frustrated that they can't get rid of the old ones.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: ZStandard (with dictionaries) compression support for TOAST compression

From
Nikhil Kumar Veldanda
Date:
Hi Robert,

Thank you for your response, and apologies for the delay in getting
back to you. You raised some important concerns in your reply; I've
worked hard to understand and hopefully address these two:

* Dictionary Cleanup via Dependency Tracking
* Addressing Compressed Datum Leaks problem (via CTAS, INSERT INTO ...
SELECT ...)

Dictionary Cleanup via Dependency Tracking:

To address your question of how we can safely clean up unused
dictionaries, I've implemented a mechanism based on PostgreSQL's
standard dependency system (pg_depend); permit me to explain.

When a Zstandard dictionary is created for a table, we record a
DEPENDENCY_NORMAL dependency from the table to the dictionary. This
ensures that when the table is dropped, the corresponding entry is
removed from the pg_depend catalog. Users can then call the
cleanup_unused_dictionaries() function to remove any dictionaries that
are no longer referenced by any table.

// create dependency,
{
    ObjectAddress dictObj;
    ObjectAddress relation;

    ObjectAddressSet(dictObj, ZstdDictionariesRelationId, dictid);
    ObjectAddressSet(relation, RelationRelationId, relid);

    /* NORMAL dependency: relid → Dictionary */
    recordDependencyOn(&relation, &dictObj, DEPENDENCY_NORMAL);
}

Example: Consider two tables, each using its own Zstandard dictionary:

test=# \dt+
                                  List of tables
 Schema | Name  | Type  |  Owner   | Persistence | Access method | Size  | Description
--------+-------+-------+----------+-------------+---------------+-------+-------------
 public | temp  | table | nikhilkv | permanent   | heap          | 16 kB |
 public | temp1 | table | nikhilkv | permanent   | heap          | 16 kB |
(2 rows)


// Dictionary dependencies
test=# select * from pg_depend where refclassid = 9946;
 classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
---------+-------+----------+------------+----------+-------------+---------
    1259 | 16389 |        0 |       9946 |        1 |           0 | n
    1259 | 16394 |        0 |       9946 |        2 |           0 | n
(2 rows)

// the corresponding dictionaries:
test=# select * from pg_zstd_dictionaries ;
 dictid |
        dict

--------+----------------------------------------------------------------------------------------------------------------
        ---------------------------------------------------------------------------------------------------------------
        ---------------------------------------------------------------------------------------------------------------
        --------------------------------------
      1 |
\x37a430ec71451a10091010df303333b3770a33f1783c1e8fc7e3f1783ccff3bcf7d442414141414141414141414141414141414141414

14141414141a15028140a8542a15028140a85a2288aa2284a297d74e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1f1783c1e8fc7e3f1789ee779ef01

0100000004000000080000004c6f72656d20697073756d20646f6c6f722073697420616d65742c20636f6e73656374657475722061646970697363696
        e6720656c69742e204c6f72656d2069
      2 |
\x37a430ec7d1a933a091010df303333b3770a33f1783c1e8fc7e3f1783ccff3bcf7d442414141414141414141414141414141414141414

14141414141a15028140a8542a15028140a85a2288aa2284a297d74e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1f1783c1e8fc7e3f1789ee779ef01

0100000004000000080000004e696b68696c206b756d616e722076656c64616e64612c206973206f6b61792063616e6469646174652c2068652069732
        0696e2073656174746c65204e696b68696c20
(2 rows)

If cleanup_unused_dictionaries() is called while the dependencies
still exist, nothing is removed:

test=# select cleanup_unused_dictionaries();
 cleanup_unused_dictionaries
-----------------------------
                           0
(1 row)

After dropping temp1, the associated dictionary becomes eligible for cleanup:

test=# drop table temp1;
DROP TABLE

test=# select cleanup_unused_dictionaries();
 cleanup_unused_dictionaries
-----------------------------
                           1
(1 row)

________________________________
Addressing Compressed Datum Leaks problem (via CTAS, INSERT INTO ... SELECT ...)

As compressed datums can be copied to other unrelated tables via CTAS,
INSERT INTO ... SELECT, or CREATE TABLE ... EXECUTE, I’ve introduced a
method inheritZstdDictionaryDependencies. This method is invoked at
the end of such statements and ensures that any dictionary
dependencies from source tables are copied to the destination table.
We determine the set of source tables using the relationOids field in
PlannedStmt.

This guarantees that if compressed datums reference a zstd dictionary
the destination table is marked as dependent on the dictionaries that
the source tables depend on, preventing premature cleanup by
cleanup_unused_dictionaries.
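
As a rough sketch of the mechanism (assumptions: get_rel_zstd_dictionaries()
is a hypothetical helper that returns the dictids a relation already depends
on, and duplicate-entry checks are omitted), the function could look
something like this:

```
/* Hypothetical sketch, not the patch itself. */
static void
inheritZstdDictionaryDependencies_sketch(Oid destRelid, List *relationOids)
{
	ListCell   *lc;

	foreach(lc, relationOids)
	{
		Oid			srcRelid = lfirst_oid(lc);
		List	   *dictids;
		ListCell   *dc;

		if (srcRelid == destRelid)
			continue;

		/* assumed helper: dictids recorded for srcRelid in pg_depend */
		dictids = get_rel_zstd_dictionaries(srcRelid);

		foreach(dc, dictids)
		{
			ObjectAddress relObj;
			ObjectAddress dictObj;

			ObjectAddressSet(relObj, RelationRelationId, destRelid);
			ObjectAddressSet(dictObj, ZstdDictionariesRelationId, lfirst_oid(dc));
			recordDependencyOn(&relObj, &dictObj, DEPENDENCY_NORMAL);
		}
	}
}
```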

Example: again we have two tables, each with its own dictionary:

                                    List of tables
 Schema | Name  | Type  |  Owner   | Persistence | Access method | Size  | Description
--------+-------+-------+----------+-------------+---------------+-------+-------------
 public | temp  | table | nikhilkv | permanent   | heap          | 16 kB |
 public | temp1 | table | nikhilkv | permanent   | heap          | 16 kB |
(2 rows)

Using CTAS (CREATE TABLE AS), one table is copied to another. In this
case, the compressed datums in the temp table are copied to copy_tbl.
Since the dictionary is now shared by both tables, a dependency on it is
also recorded for the destination table. Even if the original temp table
is dropped and cleanup is triggered, the dictionary is not removed
because an active dependency remains.

test=# create table copy_tbl as select * from temp;
SELECT 20

// dictid 1 is shared between two tables.
test=# select * from pg_depend where refclassid = 9946;
 classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
---------+-------+----------+------------+----------+-------------+---------
    1259 | 16389 |        0 |       9946 |        1 |           0 | n
    1259 | 16404 |        0 |       9946 |        1 |           0 | n
    1259 | 16399 |        0 |       9946 |        3 |           0 | n
(3 rows)

// After dropping the temp table, where dictid 1 was used to compress datums
test=# drop table temp;
DROP TABLE

// dependency for temp table is dropped.
test=# select * from pg_depend where refclassid = 9946;
 classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
---------+-------+----------+------------+----------+-------------+---------
    1259 | 16404 |        0 |       9946 |        1 |           0 | n
    1259 | 16399 |        0 |       9946 |        3 |           0 | n
(2 rows)

// No dictionaries are being deleted.
test=# select cleanup_unused_dictionaries();
 cleanup_unused_dictionaries
-----------------------------
                           0
(1 row)

Once the new copy_tbl is also deleted, the dictionary can be dropped
because no dependency exists on it:

test=# drop table copy_tbl;
DROP TABLE

// The dictionary is then deleted.
test=# select cleanup_unused_dictionaries();
 cleanup_unused_dictionaries
-----------------------------
                           1
(1 row)

Here is another example, this time using composite types and a more
complex scenario involving two source tables.

// Create a base composite type with two text fields
test=# create type my_composite as (f1 text, f2 text);
CREATE TYPE

// Create a nested composite type that uses my_composite twice
test=# create type my_composite1 as (f1 my_composite, f2 my_composite);
CREATE TYPE

test=# \d my_composite
      Composite type "public.my_composite"
 Column | Type | Collation | Nullable | Default
--------+------+-----------+----------+---------
 f1     | text |           |          |
 f2     | text |           |          |

test=# \d my_composite1
         Composite type "public.my_composite1"
 Column |     Type     | Collation | Nullable | Default
--------+--------------+-----------+----------+---------
 f1     | my_composite |           |          |
 f2     | my_composite |           |          |


// Sample table with ZSTD dictionary compression on text columns
test=# \d+ orders
                                               Table "public.orders"
   Column    |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
-------------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
 order_id    | integer |           |          |         | plain    |             |              |
 customer_id | integer |           |          |         | plain    |             |              |
 random1     | text    |           |          |         | extended | zstd        |              |
 random2     | text    |           |          |         | extended | zstd        |              |
Access method: heap

// Sample table with ZSTD dictionary compression on one of the text columns
test=# \d+ customers
                                              Table "public.customers"
   Column    |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
-------------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
 customer_id | integer |           |          |         | plain    |             |              |
 random3     | text    |           |          |         | extended | zstd        |              |
 random4     | text    |           |          |         | extended |             |              |
Access method: heap

// Check existing dictionaries: dictid 1 for random1, dictid 2 for random2, dictid 3 for the random3 attribute
test=# select dictid from pg_zstd_dictionaries;
 dictid
--------
      1
      2
      3
(3 rows)

// List all objects dependent on ZSTD dictionaries
test=# select objid::regclass, * from pg_depend where refclassid = 9946;
   objid   | classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
-----------+---------+-------+----------+------------+----------+-------------+---------
 orders    |    1259 | 16391 |        0 |       9946 |        1 |           0 | n
 orders    |    1259 | 16391 |        0 |       9946 |        2 |           0 | n
 customers |    1259 | 16396 |        0 |       9946 |        3 |           0 | n
(3 rows)

// Create new table using nested composite type
// This copies compressed datums into temp1.
test=# create table temp1 as
    select ROW(
            ROW(random3, random4)::my_composite,
            ROW(random1, random2)::my_composite
            )::my_composite1
    from customers full outer join orders using (customer_id);
SELECT 51

test=# select objid::regclass, * from pg_depend where refclassid = 9946;
   objid   | classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
-----------+---------+-------+----------+------------+----------+-------------+---------
 orders    |    1259 | 16391 |        0 |       9946 |        1 |           0 | n
 temp1     |    1259 | 16423 |        0 |       9946 |        1 |           0 | n
 orders    |    1259 | 16391 |        0 |       9946 |        2 |           0 | n
 temp1     |    1259 | 16423 |        0 |       9946 |        2 |           0 | n
 temp1     |    1259 | 16423 |        0 |       9946 |        3 |           0 | n
 customers |    1259 | 16396 |        0 |       9946 |        3 |           0 | n
(6 rows)

// Drop the original source tables.
test=# drop table orders;
DROP TABLE

test=# drop table customers ;
DROP TABLE

// Even after dropping the orders and customers tables, temp1 still holds references to the dictionaries.
test=# select objid::regclass, * from pg_depend where refclassid = 9946;
 objid | classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
-------+---------+-------+----------+------------+----------+-------------+---------
 temp1 |    1259 | 16423 |        0 |       9946 |        1 |           0 | n
 temp1 |    1259 | 16423 |        0 |       9946 |        2 |           0 | n
 temp1 |    1259 | 16423 |        0 |       9946 |        3 |           0 | n
(3 rows)

// Attempt cleanup: nothing is removed, because temp1 still depends on the dictionaries.
test=# select cleanup_unused_dictionaries();
 cleanup_unused_dictionaries
-----------------------------
                           0
(1 row)

test=# select dictid from pg_zstd_dictionaries ;
 dictid
--------
      1
      2
      3
(3 rows)

// Drop the destination table
test=# drop table temp1;
DROP TABLE

// Confirm no remaining dependencies
test=# select objid::regclass, * from pg_depend where refclassid = 9946;
 objid | classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
-------+---------+-------+----------+------------+----------+-------------+---------
(0 rows)

// Cleanup now succeeds
test=# select cleanup_unused_dictionaries();
 cleanup_unused_dictionaries
-----------------------------
                           3
(1 row)

test=# select dictid from pg_zstd_dictionaries ;
 dictid
--------
(0 rows)


This design ensures that:

Dictionaries are deleted only when no table depends on them.
We avoid the costly decompress/recompress cycle that would otherwise be needed to prevent compressed datum leakage.
We do not retain dictionaries forever.

These changes are the core additions in this revision of the patch and
address the concerns around long-lived dictionaries and compressed datum
leakage. This update also incorporates earlier feedback by enabling
automatic zstd dictionary generation and cleanup during VACUUM, and adds
the changes needed to copy ZSTD dictionaries during pg_upgrade.

Patch summary:

v11-0001-varattrib_4b-changes-and-macros-update-needed-to.patch
Refactors varattrib_4b structures and updates related macros to enable
ZSTD dictionary support.
v11-0002-Zstd-compression-and-decompression-routines-incl.patch
Adds ZSTD compression and decompression routines, and introduces a new
catalog to store dictionary metadata.
v11-0003-Zstd-dictionary-training-process.patch
Implements the dictionary training workflow. Includes built-in support
for text and jsonb types. Allows users to define custom sampling
functions per type by specifying a C function name in the
pg_type.typzstdsampling field.
v11-0004-Dependency-tracking-mechanism-to-track-compresse.patch
Introduces a dependency tracking mechanism using pg_depend to record
which ZSTD dictionaries a table depends on. When compressed datums
that rely on a dictionary are copied to unrelated target tables, the
corresponding dictionary dependencies from the source table are also
recorded for the target table, ensuring the dictionaries are not
prematurely cleaned up.
v11-0005-generate-and-cleanup-dictionaries-using-vacuum.patch
Adds integration with VACUUM to automatically generate and clean up
ZSTD dictionaries.
v11-0006-pg_dump-pg_upgrade-needed-changes-to-support-new.patch
Extends pg_dump and pg_upgrade to support migrating ZSTD dictionaries
and their dependencies during pg_upgrade.
v11-0007-Some-tests-related-to-zstd-dictionary-based-comp.patch
Provides test coverage for ZSTD dictionary-based compression features,
including training, usage, and cleanup.

I hope these changes address your concerns; any thoughts or suggestions
on this approach are welcome.

Best regards,
Nikhil Veldanda

On Mon, Mar 17, 2025 at 1:03 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Mar 7, 2025 at 8:36 PM Nikhil Kumar Veldanda
> <veldanda.nikhilkumar17@gmail.com> wrote:
> >     struct    /* Extended compression format */
> >     {
> >         uint32    va_header;
> >         uint32    va_tcinfo;
> >         uint32    va_cmp_alg;
> >         uint32    va_cmp_dictid;
> >         char    va_data[FLEXIBLE_ARRAY_MEMBER];
> >     }    va_compressed_ext;
> > } varattrib_4b;
>
> First, thanks for sending along the performance results. I agree that
> those are promising. Second, thanks for sending these design details.
>
> The idea of keeping dictionaries in pg_zstd_dictionaries literally
> forever doesn't seem very appealing, but I'm not sure what the other
> options are. I think we've established in previous work in this area
> that compressed values can creep into unrelated tables and inside
> records or other container types like ranges. Therefore, we have no
> good way of knowing when a dictionary is unreferenced and can be
> dropped. So in that sense your decision to keep them forever is
> "right," but it's still unpleasant. It would even be necessary to make
> pg_upgrade carry them over to new versions.
>
> If we could make sure that compressed datums never leaked out into
> other tables, then tables could depend on dictionaries and
> dictionaries could be dropped when there were no longer any tables
> depending on them. But like I say, previous work suggested that this
> would be very difficult to achieve. However, without that, I imagine
> users generating new dictionaries regularly as the data changes and
> eventually getting frustrated that they can't get rid of the old ones.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com

Attachment
On Tue, Apr 15, 2025 at 2:13 PM Nikhil Kumar Veldanda
<veldanda.nikhilkumar17@gmail.com> wrote:
> Addressing Compressed Datum Leaks problem (via CTAS, INSERT INTO ... SELECT ...)
>
> As compressed datums can be copied to other unrelated tables via CTAS,
> INSERT INTO ... SELECT, or CREATE TABLE ... EXECUTE, I’ve introduced a
> method inheritZstdDictionaryDependencies. This method is invoked at
> the end of such statements and ensures that any dictionary
> dependencies from source tables are copied to the destination table.
> We determine the set of source tables using the relationOids field in
> PlannedStmt.

With the disclaimer that I haven't opened the patch or thought
terribly deeply about this issue, at least not yet, my fairly strong
suspicion is that this design is not going to work out, for multiple
reasons. In no particular order:

1. I don't think users will like it if dependencies on a zstd
dictionary spread like kudzu across all of their tables. I don't think
they'd like it even if it were 100% accurate, but presumably this is
going to add dependencies any time there MIGHT be a real dependency
rather than only when there actually is one.

2. Inserting into a table or updating it only takes RowExclusiveLock,
which is not even self-exclusive. I doubt that it's possible to change
system catalogs in a concurrency-safe way with such a weak lock. For
instance, if two sessions tried to do the same thing in concurrent
transactions, they could both try to add the same dependency at the
same time.

3. I'm not sure that CTAS, INSERT INTO...SELECT, and CREATE
TABLE...EXECUTE are the only ways that datums can creep from one table
into another. For example, what if I create a plpgsql function that
gets a value from one table and stores it in a variable, and then use
that variable to drive an INSERT into another table? I seem to recall
there are complex cases involving records and range types and arrays,
too, where the compressed object gets wrapped inside of another
object; though maybe that wouldn't matter to your implementation if
INSERT INTO ... SELECT uses a sufficiently aggressive strategy for
adding dependencies.

When Dilip and I were working on lz4 TOAST compression, my first
instinct was to not let LZ4-compressed datums leak out of a table by
forcing them to be decompressed (and then possibly recompressed). We
spent a long time trying to make that work before giving up. I think
this is approximately where things started to unravel, and I'd suggest
you read both this message and some of the discussion before and
after:

https://www.postgresql.org/message-id/20210316185455.5gp3c5zvvvq66iyj@alap3.anarazel.de

I think we could add plain-old zstd compression without really
tackling this issue, but if we are going to add dictionaries then I
think we might need to revisit the idea of preventing things from
leaking out of tables. What I can't quite remember at the moment is
how much of the problem was that it was going to be slow to force the
recompression, and how much of it was that we weren't sure we could
even find all the places in the code that might need such handling.

I'm now also curious to know whether Andres would agree that it's bad
if zstd dictionaries are un-droppable. After all, I thought it would
be bad if there was no way to eliminate a dependency on a compression
method, and he disagreed. So maybe he would also think undroppable
dictionaries are fine. But maybe not. It seems even worse to me than
undroppable compression methods, because you'll probably not have that
many compression methods ever, but you could have a large number of
dictionaries eventually.

--
Robert Haas
EDB: http://www.enterprisedb.com



On Fri, Apr 18, 2025 at 12:22:18PM -0400, Robert Haas wrote:
> I think we could add plain-old zstd compression without really
> tackling this issue, but if we are going to add dictionaries then I
> think we might need to revisit the idea of preventing things from
> leaking out of tables. What I can't quite remember at the moment is
> how much of the problem was that it was going to be slow to force the
> recompression, and how much of it was that we weren't sure we could
> even find all the places in the code that might need such handling.

FWIW, this point resonates here.  There is one thing that we have to
do anyway: we just have one bit left in the varlena headers as lz4 is
using the one before last.  So we have to make it extensible, even if
it means that any compression method other than LZ4 and pglz would
consume one more byte in its header by default.  And I think that this
has to happen at some point if we want flexibility in this area.

+    struct
+    {
+        uint32        va_header;
+        uint32        va_tcinfo;
+        uint32        va_cmp_alg;
+        uint32        va_cmp_dictid;
+        char        va_data[FLEXIBLE_ARRAY_MEMBER];
+    }            va_compressed_ext;

Speaking of which, I am confused by this abstraction choice in
varatt.h in the first patch.  Are we sure that we are always going to
have a dictionary attached to a compressed data set or even a
va_cmp_alg?  It seems to me that this could lead to a waste of data in
some cases because these fields may not be required depending on the
compression method used, as some fields may not care about these
details.  This kind of data should be made optional, on a per-field
basis.

One thing that I've been wondering is how it would be possible to make
the area around varattrib_4b more readable while dealing with more
extensibility.  It would be a good occasion to improve that, even if
I'm hand-waving here currently and that the majority of this code is
old enough to vote, with few modifications across the years.

The second thing that I'd love to see on top of the addition of the
extensibility is adding plain compression support for zstd, with
nothing fancy, just the compression and decompression bits.  I've done
quite a few benchmarks with the two, and results kind of point in the
direction that zstd is more efficient than lz4 overall.  Don't take me
wrong: lz4 can be better in some workloads as it can consume less CPU
than zstd while compressing less.  However, a comparison of ratios
like (compression rate / cpu used) has always led me to see zstd as
superior in a large number of cases.  lz4 is still very good if you
are CPU-bound and don't care about the extra space required.  Both are
three classes better than pglz.

Once we have these three points incrementally built-in together (the
last bit extensibility, the potential varatt.h refactoring and the
zstd support), there may be a point in having support for more
advanced options with the compression methods in the shape of dicts or
more requirements linked to other compression methods, but I think the
topic is complex enough that we should make sure that these basics are
implemented in a way sane enough so as we'd be able to extend them
with all the use cases in mind.
--
Michael

Attachment

Re: ZStandard (with dictionaries) compression support for TOAST compression

From
Nikhil Kumar Veldanda
Date:
Hi Robert,

Thank you for your feedback on the patch. You’re right that my proposed
design will introduce more dictionary dependencies as dictionaries grow;
I chose this path specifically to avoid changing existing system
behavior and to prevent performance regressions in CTAS and related
commands.

After reviewing the email thread you attached in your previous response,
I identified a natural choke point for both inserts and updates: the
call to "heap_toast_insert_or_update" inside
heap_prepare_insert/heap_update. In the current master branch, that call
is made only when HeapTupleHasExternal is true; my patch extends it to
HeapTupleHasVarWidth tuples as well. By decompressing every nested
compressed datum at this point, no matter how deeply nested, we can
prevent any leaked datum from propagating into unrelated tables. This
mirrors the existing inlining logic in toast_tuple_init for externally
toasted datums, but takes it one step further and fully flattens the
datum (decompressing at every level, not just the top level).
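
To make the idea concrete, here is a heavily simplified sketch of what
the per-attribute part of such a flattening pass could look like
(assumptions: only top-level inline varlena attributes are shown,
externally stored values and recursion into container types such as
records, arrays and ranges are omitted, and the real patch code may
differ):

```
/*
 * Hypothetical sketch: before toasting/storing a tuple, replace every
 * compressed varlena attribute with its decompressed form so that no
 * foreign compressed datum (and thus no dictionary reference) can leak
 * into this table.
 */
static void
flatten_compressed_attrs_sketch(TupleDesc tupdesc, Datum *values, bool *isnull)
{
	for (int i = 0; i < tupdesc->natts; i++)
	{
		Form_pg_attribute att = TupleDescAttr(tupdesc, i);

		if (isnull[i] || att->attlen != -1)
			continue;			/* only varlena attributes can be compressed */

		if (VARATT_IS_COMPRESSED(DatumGetPointer(values[i])))
			values[i] = PointerGetDatum(detoast_attr((struct varlena *)
													 DatumGetPointer(values[i])));
	}
}
```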

On the performance side, my basic benchmarks show almost no regression
for simple INSERT … VALUES workloads. CTAS, however, does regress
noticeably: a CTAS completes in about 4 seconds before this patch, but
with this patch it takes roughly 24 seconds. (For reference, a normal
insert into the source table took about 58 seconds when using zstd
dictionary compression.) I suspect the extra cost comes from the added
zstd decompression and PGLZ compression on the destination table.

I’ve attached v13-0008-initial-draft-to-address-datum-leak-problem.patch,
which implements this “flatten_datum” method.

I’d love to know your thoughts on this. Am I on the right track for
solving the problem?

Best regards,
Nikhil Veldanda

On Fri, Apr 18, 2025 at 9:22 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Apr 15, 2025 at 2:13 PM Nikhil Kumar Veldanda
> <veldanda.nikhilkumar17@gmail.com> wrote:
> > Addressing Compressed Datum Leaks problem (via CTAS, INSERT INTO ... SELECT ...)
> >
> > As compressed datums can be copied to other unrelated tables via CTAS,
> > INSERT INTO ... SELECT, or CREATE TABLE ... EXECUTE, I’ve introduced a
> > method inheritZstdDictionaryDependencies. This method is invoked at
> > the end of such statements and ensures that any dictionary
> > dependencies from source tables are copied to the destination table.
> > We determine the set of source tables using the relationOids field in
> > PlannedStmt.
>
> With the disclaimer that I haven't opened the patch or thought
> terribly deeply about this issue, at least not yet, my fairly strong
> suspicion is that this design is not going to work out, for multiple
> reasons. In no particular order:
>
> 1. I don't think users will like it if dependencies on a zstd
> dictionary spread like kudzu across all of their tables. I don't think
> they'd like it even if it were 100% accurate, but presumably this is
> going to add dependencies any time there MIGHT be a real dependency
> rather than only when there actually is one.
>
> 2. Inserting into a table or updating it only takes RowExclusiveLock,
> which is not even self-exclusive. I doubt that it's possible to change
> system catalogs in a concurrency-safe way with such a weak lock. For
> instance, if two sessions tried to do the same thing in concurrent
> transactions, they could both try to add the same dependency at the
> same time.
>
> 3. I'm not sure that CTAS, INSERT INTO...SELECT, and CREATE
> TABLE...EXECUTE are the only ways that datums can creep from one table
> into another. For example, what if I create a plpgsql function that
> gets a value from one table and stores it in a variable, and then use
> that variable to drive an INSERT into another table? I seem to recall
> there are complex cases involving records and range types and arrays,
> too, where the compressed object gets wrapped inside of another
> object; though maybe that wouldn't matter to your implementation if
> INSERT INTO ... SELECT uses a sufficiently aggressive strategy for
> adding dependencies.
>
> When Dilip and I were working on lz4 TOAST compression, my first
> instinct was to not let LZ4-compressed datums leak out of a table by
> forcing them to be decompressed (and then possibly recompressed). We
> spent a long time trying to make that work before giving up. I think
> this is approximately where things started to unravel, and I'd suggest
> you read both this message and some of the discussion before and
> after:
>
> https://www.postgresql.org/message-id/20210316185455.5gp3c5zvvvq66iyj@alap3.anarazel.de
>
> I think we could add plain-old zstd compression without really
> tackling this issue, but if we are going to add dictionaries then I
> think we might need to revisit the idea of preventing things from
> leaking out of tables. What I can't quite remember at the moment is
> how much of the problem was that it was going to be slow to force the
> recompression, and how much of it was that we weren't sure we could
> even find all the places in the code that might need such handling.
>
> I'm now also curious to know whether Andres would agree that it's bad
> if zstd dictionaries are un-droppable. After all, I thought it would
> be bad if there was no way to eliminate a dependency on a compression
> method, and he disagreed. So maybe he would also think undroppable
> dictionaries are fine. But maybe not. It seems even worse to me than
> undroppable compression methods, because you'll probably not have that
> many compression methods ever, but you could have a large number of
> dictionaries eventually.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com

Attachment

Re: ZStandard (with dictionaries) compression support for TOAST compression

From
Nikhil Kumar Veldanda
Date:
Hi Michael,

Thanks for the feedback and the suggested patch sequence. I completely
agree—we must minimize storage overhead when dictionaries aren’t used,
while ensuring varattrib_4b remains extensible enough to handle future
compression metadata beyond dictionary ID (for other algorithms). I’ll
explore design options that satisfy both goals and share my proposal.

Best regards,
Nikhil Veldanda

On Mon, Apr 21, 2025 at 12:02 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Apr 18, 2025 at 12:22:18PM -0400, Robert Haas wrote:
> > I think we could add plain-old zstd compression without really
> > tackling this issue, but if we are going to add dictionaries then I
> > think we might need to revisit the idea of preventing things from
> > leaking out of tables. What I can't quite remember at the moment is
> > how much of the problem was that it was going to be slow to force the
> > recompression, and how much of it was that we weren't sure we could
> > even find all the places in the code that might need such handling.
>
> FWIW, this point resonates here.  There is one thing that we have to
> do anyway: we just have one bit left in the varlena headers as lz4 is
> using the one before last.  So we have to make it extensible, even if
> it means that any compression method other than LZ4 and pglz would
> consume one more byte in its header by default.  And I think that this
> has to happen at some point if we want flexibility in this area.
>
> +    struct
> +    {
> +        uint32        va_header;
> +        uint32        va_tcinfo;
> +        uint32        va_cmp_alg;
> +        uint32        va_cmp_dictid;
> +        char        va_data[FLEXIBLE_ARRAY_MEMBER];
> +    }            va_compressed_ext;
>
> Speaking of which, I am confused by this abstraction choice in
> varatt.h in the first patch.  Are we sure that we are always going to
> have a dictionary attached to a compressed data set or even a
> va_cmp_alg?  It seems to me that this could lead to a waste of data in
> some cases because these fields may not be required depending on the
> compression method used, as some fields may not care about these
> details.  This kind of data should be made optional, on a per-field
> basis.
>
> One thing that I've been wondering is how it would be possible to make
> the area around varattrib_4b more readable while dealing with more
> extensibility.  It would be a good occasion to improve that, even if
> I'm hand-waving here currently and that the majority of this code is
> old enough to vote, with few modifications across the years.
>
> The second thing that I'd love to see on top of the addition of the
> extensibility is adding plain compression support for zstd, with
> nothing fancy, just the compression and decompression bits.  I've done
> quite a few benchmarks with the two, and results kind of point in the
> direction that zstd is more efficient than lz4 overall.  Don't take me
> wrong: lz4 can be better in some workloads as it can consume less CPU
> than zstd while compressing less.  However, a comparison of ratios
> like (compression rate / cpu used) has always led me to see zstd as
> superior in a large number of cases.  lz4 is still very good if you
> are CPU-bound and don't care about the extra space required.  Both are
> three classes better than pglz.
>
> Once we have these three points incrementally built-in together (the
> last bit extensibility, the potential varatt.h refactoring and the
> zstd support), there may be a point in having support for more
> advanced options with the compression methods in the shape of dicts or
> more requirements linked to other compression methods, but I think the
> topic is complex enough that we should make sure that these basics are
> implemented in a way sane enough so as we'd be able to extend them
> with all the use cases in mind.
> --
> Michael



Hi,

On 2025-04-18 12:22:18 -0400, Robert Haas wrote:
> On Tue, Apr 15, 2025 at 2:13 PM Nikhil Kumar Veldanda
> <veldanda.nikhilkumar17@gmail.com> wrote:
> > Addressing Compressed Datum Leaks problem (via CTAS, INSERT INTO ... SELECT ...)
> >
> > As compressed datums can be copied to other unrelated tables via CTAS,
> > INSERT INTO ... SELECT, or CREATE TABLE ... EXECUTE, I’ve introduced a
> > method inheritZstdDictionaryDependencies. This method is invoked at
> > the end of such statements and ensures that any dictionary
> > dependencies from source tables are copied to the destination table.
> > We determine the set of source tables using the relationOids field in
> > PlannedStmt.
> 
> With the disclaimer that I haven't opened the patch or thought
> terribly deeply about this issue, at least not yet, my fairly strong
> suspicion is that this design is not going to work out, for multiple
> reasons. In no particular order:
> 
> 1. I don't think users will like it if dependencies on a zstd
> dictionary spread like kudzu across all of their tables. I don't think
> they'd like it even if it were 100% accurate, but presumably this is
> going to add dependencies any time there MIGHT be a real dependency
> rather than only when there actually is one.
> 
> 2. Inserting into a table or updating it only takes RowExclusiveLock,
> which is not even self-exclusive. I doubt that it's possible to change
> system catalogs in a concurrency-safe way with such a weak lock. For
> instance, if two sessions tried to do the same thing in concurrent
> transactions, they could both try to add the same dependency at the
> same time.
> 
> 3. I'm not sure that CTAS, INSERT INTO...SELECT, and CREATE
> TABLE...EXECUTE are the only ways that datums can creep from one table
> into another. For example, what if I create a plpgsql function that
> gets a value from one table and stores it in a variable, and then use
> that variable to drive an INSERT into another table? I seem to recall
> there are complex cases involving records and range types and arrays,
> too, where the compressed object gets wrapped inside of another
> object; though maybe that wouldn't matter to your implementation if
> INSERT INTO ... SELECT uses a sufficiently aggressive strategy for
> adding dependencies.

+1 to all of these.


> I think we could add plain-old zstd compression without really
> tackling this issue

+1


> I'm now also curious to know whether Andres would agree that it's bad
> if zstd dictionaries are un-droppable. After all, I thought it would
> be bad if there was no way to eliminate a dependency on a compression
> method, and he disagreed.

I still am not too worried about that aspect. However:


> So maybe he would also think undroppable dictionaries are fine.

I'm much less sanguine about this. Imagine a schema-based multi-tenancy setup,
where tenants come and go, and where a few of the tables use custom
dictionaries. Whereas not being able to get rid of lz4 at all has basically no
cost whatsoever, collecting more and more unusable dictionaries can imply a
fair amount of space usage after a while. I don't see any argument why that
would be ok, really.


> But maybe not. It seems even worse to me than undroppable compression
> methods, because you'll probably not have that many compression methods
> ever, but you could have a large number of dictionaries eventually.

Agreed on the latter.

Greetings,

Andres Freund



On Mon, Apr 21, 2025 at 8:52 PM Nikhil Kumar Veldanda
<veldanda.nikhilkumar17@gmail.com> wrote:
> After reviewing the email thread you attached on previous response, I
> identified a natural choke point for both inserts and updates: the
> call to "heap_toast_insert_or_update" inside
> heap_prepare_insert/heap_update. In the current master branch, that
> function only runs when HeapTupleHasExternal is true; my patch extends
> it to HeapTupleHasVarWidth tuples as well.

Isn't that basically all tuples, though? I think that's where this gets painful.

> On the performance side, my basic benchmarks show almost no regression
> for simple INSERT … VALUES workloads. CTAS, however, does regress
> noticeably: a CTAS completes in about 4 seconds before this patch, but
> with this patch it takes roughly 24 seconds. (For reference, a normal
> insert into the source table took about 58 seconds when using zstd
> dictionary compression), I suspect the extra cost comes from the added
> zstd decompression and PGLZ compression on the destination table.

That's nice to know, but I think the key question is not so much what
the feature costs when it is used but what it costs when it isn't
used. If we implement a system where we don't let
dictionary-compressed zstd datums leak out of tables, that's bound to
slow down a CTAS from a table where this feature is used, but that's
kind of OK: the feature has pros and cons, and if you don't like those
tradeoffs, you don't have to use it. However, it sounds like this
could also slow down inserts and updates in some cases even for users
who are not making use of the feature, and that's going to be a major
problem unless it can be shown that there is no case where the impact
is at all significant. Users hate paying for features that they aren't
using.

I wonder if there's a possible design where we only allow
dictionary-compressed datums to exist as top-level attributes in
designated tables to which those dictionaries are attached; and any
time you try to bury that Datum inside a container object (row, range,
array, whatever) detoasting is forced. If there's a clean and
inexpensive way to implement that, then you could avoid having
heap_toast_insert_or_update care about HeapTupleHasExternal(), which
seems like it might be a key point.

--
Robert Haas
EDB: http://www.enterprisedb.com



On Wed, Apr 23, 2025 at 11:59 AM Robert Haas <robertmhaas@gmail.com> wrote:
> heap_toast_insert_or_update care about HeapTupleHasExternal(), which
> seems like it might be a key point.

Care about HeapTupleHasVarWidth, rather.

--
Robert Haas
EDB: http://www.enterprisedb.com



On Wed, Apr 23, 2025 at 11:59:26AM -0400, Robert Haas wrote:
> That's nice to know, but I think the key question is not so much what
> the feature costs when it is used but what it costs when it isn't
> used. If we implement a system where we don't let
> dictionary-compressed zstd datums leak out of tables, that's bound to
> slow down a CTAS from a table where this feature is used, but that's
> kind of OK: the feature has pros and cons, and if you don't like those
> tradeoffs, you don't have to use it. However, it sounds like this
> could also slow down inserts and updates in some cases even for users
> who are not making use of the feature, and that's going to be a major
> problem unless it can be shown that there is no case where the impact
> is at all significant. Users hate paying for features that they aren't
> using.

The cost of digesting a dictionary when decompressing sets of values
is also something I think we should worry about, FWIW (see [1]), as
digesting is documented as costly, so I think that there is also an
argument for making the feature efficient when it is used.  That would
hurt if a sequential scan needs to detoast multiple blobs with the
same dict.  If we attach that on a per-value basis, wouldn't it imply
that we need to digest the dictionary every time a blob is
decompressed?  This information could be cached, but it seems a bit
weird to me to invent a new level of relation caching for what could
be attached as a relation attribute option in the relcache.  If a
dictionary gets trained with a new sample of values, we could rely on
the invalidation to pass the new information.
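
For illustration (plain zstd API usage, not patch code):
ZSTD_createDDict() performs the expensive digesting once, and the
resulting ZSTD_DDict can then be reused for every value decompressed
against that dictionary, which is why caching the digested form, for
example via the relcache as suggested above, is attractive:

```
#include <zstd.h>

/*
 * Sketch only: digest the dictionary once with ZSTD_createDDict(), then
 * reuse the ZSTD_DDict for every datum.  Calling ZSTD_decompress_usingDict()
 * per datum would redo the digesting each time.
 */
static void
decompress_many(const void *dict_buf, size_t dict_size,
				void *const *dst, const size_t *dst_cap,
				const void *const *src, const size_t *src_size, int ndatums)
{
	ZSTD_DDict *ddict = ZSTD_createDDict(dict_buf, dict_size);	/* expensive, once */
	ZSTD_DCtx  *dctx = ZSTD_createDCtx();

	for (int i = 0; i < ndatums; i++)
		ZSTD_decompress_usingDDict(dctx, dst[i], dst_cap[i],
								   src[i], src_size[i], ddict);	/* cheap per datum */

	ZSTD_freeDCtx(dctx);
	ZSTD_freeDDict(ddict);
}
```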

Based on what I'm reading (and I know very little about the topic, so I
may be wrong), does it even make sense to allow multiple dictionaries
to be used for a single attribute?  Of course that may depend on the
JSON blob patterns a single attribute is dealing with, but I'm not sure
that this is worth the extra complexity it creates.

> I wonder if there's a possible design where we only allow
> dictionary-compressed datums to exist as top-level attributes in
> designated tables to which those dictionaries are attached; and any
> time you try to bury that Datum inside a container object (row, range,
> array, whatever) detoasting is forced. If there's a clean and
> inexpensive way to implement that, then you could avoid having
> heap_toast_insert_or_update care about HeapTupleHasExternal(), which
> seems like it might be a key point.

Interesting, not sure.

FWIW, I'd still try to focus on making varatt more extensible with
plain zstd support first, before diving into all these details.  We are
going to need it anyway.

[1]: https://facebook.github.io/zstd/zstd_manual.html#Chapter10
--
Michael

Attachment

Re: ZStandard (with dictionaries) compression support for TOAST compression

From
Nikhil Kumar Veldanda
Date:
Hi Michael,

Thanks for the suggestions. I agree that we should first solve the
“last free bit” problem in the varattrib_4b compression bits before
layering on any features. Below is the approach I’ve prototyped to
keep the header compact yet fully extensible, followed by a sketch of
the plain-ZSTD (no dict) patch that sits cleanly on top of it.

1. Minimal but extensible header

/* varatt_cmp_extended follows va_tcinfo when the upper two bits of
 * va_tcinfo are 11.  Compressed data starts immediately after
 * ext_data.  ext_hdr encodes both the compression algorithm and the
 * byte-length of the algorithm-specific metadata.
 */
typedef struct varatt_cmp_extended
{
    uint32 ext_hdr;                 /* [ meta_size:24 | cmpr_id:8 ] */
    char   ext_data[FLEXIBLE_ARRAY_MEMBER];  /* optional metadata */
} varatt_cmp_extended;
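
For clarity, the packing implied by [ meta_size:24 | cmpr_id:8 ] could be
expressed with helper macros along these lines (illustrative names, not
part of the patch):

```
/* Illustrative helpers for the ext_hdr encoding described above. */
#define VARATT_EXT_MAKE_HDR(meta_size, cmpr_id) \
	(((uint32) (meta_size) << 8) | ((uint32) (cmpr_id) & 0xFF))
#define VARATT_EXT_CMPR_ID(ext_hdr)   ((ext_hdr) & 0xFF)
#define VARATT_EXT_META_SIZE(ext_hdr) ((ext_hdr) >> 8)
```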

a. 24 bits for length → per-datum compression algorithm metadata is
capped at 16 MB, which is far more than any realistic compression
header.
b. 8 bits for algorithm id → up to 256 algorithms.
c. Zero overhead when unused: if an algorithm needs no per-datum
metadata (e.g., ZSTD-nodict), nothing beyond ext_hdr is stored.

2. Algorithm registry
/*
 * TOAST compression methods enumeration.
 *
 * Each entry defines:
 *   - NAME         : identifier for the compression algorithm
 *   - VALUE        : numeric enum value
 *   - METADATA type: struct type holding extra info (void when none)
 *
 * The INVALID entry is a sentinel and must remain last.
 */
#define TOAST_COMPRESSION_LIST                                          \
    X(PGLZ,         0, void)                 /* existing */             \
    X(LZ4,          1, void)                 /* existing */             \
    X(ZSTD_NODICT,  2, void)                 /* new, no metadata */     \
    X(ZSTD_DICT,    3, zstd_dict_meta)       /* new, needs dict_id */   \
    X(INVALID,      4, void)                 /* sentinel */

typedef enum ToastCompressionId
{
#define X(name,val,meta) TOAST_##name##_COMPRESSION_ID = val,
    TOAST_COMPRESSION_LIST
#undef X
} ToastCompressionId;

/* Example of an algorithm-specific metadata block */
typedef struct
{
    uint32 dict_id;     /* dictionary Oid */
} zstd_dict_meta;
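
One nice property of the X-macro list is that other per-algorithm tables
can be generated from the same source of truth; for example
(illustrative only, not part of the patch), a name table usable in error
messages:

```
/* Illustrative: a name table generated from the same list. */
static const char *const toast_compression_names[] = {
#define X(name, val, meta) [val] = #name,
	TOAST_COMPRESSION_LIST
#undef X
};

/* e.g. elog(ERROR, "unsupported compression id %d (%s)", id,
 *           toast_compression_names[id]); */
```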

3. Resulting on-disk layouts for zstd

ZSTD no dict: datum on-disk layout:
+----------------------------------+
| va_header (uint32)               |
+----------------------------------+
| va_tcinfo (uint32)               |  (11 in top two bits marks extended)
+----------------------------------+
| ext_hdr (uint32)                 |  <-- [ meta size:24 bits | compression id:8 bits ]
+----------------------------------+
| Compressed bytes ...             |  <-- zstd (no dictionary)
+----------------------------------+

ZSTD dict: datum on-disk layout:
+----------------------------------+
| va_header (uint32)               |
+----------------------------------+
| va_tcinfo (uint32)               |
+----------------------------------+
| ext_hdr (uint32)                 |  <-- [ meta size:24 bits | compression id:8 bits ]
+----------------------------------+
| dict_id (uint32)                 |  <-- zstd_dict_meta
+----------------------------------+
| Compressed bytes ...             |  <-- zstd (dictionary)
+----------------------------------+
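
To make the dictionary layout concrete, here is a hedged sketch of
filling the extended header and payload (assumptions: ext points into
the output varlena with dstcap bytes available in ext_data, the zstd
context and CDict are managed by the caller, error handling is omitted,
and the struct/enum names come from the proposal above):

```
#include <zstd.h>

/* Sketch only: fill the ZSTD-dict layout shown above. */
static size_t
fill_zstd_dict_datum(varatt_cmp_extended *ext, size_t dstcap,
					 const char *src, size_t srcsz,
					 uint32 dict_id, ZSTD_CCtx *cctx, const ZSTD_CDict *cdict)
{
	zstd_dict_meta meta = {.dict_id = dict_id};

	ext->ext_hdr = ((uint32) sizeof(zstd_dict_meta) << 8) |
		TOAST_ZSTD_DICT_COMPRESSION_ID;
	memcpy(ext->ext_data, &meta, sizeof(meta));

	/* Compressed bytes start right after the metadata. */
	return ZSTD_compress_usingCDict(cctx,
									ext->ext_data + sizeof(meta),
									dstcap - sizeof(meta),
									src, srcsz, cdict);
}
```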

4. How does this fit?

Flexibility: each new algorithm that needs extra metadata simply
defines its own struct and fills in varatt_cmp_extended in
setup_compression_info.
Storage: everything in varatt_cmp_extended is copied into the datum,
immediately followed by the compressed payload.
Optional, pay-as-you-go metadata: only algorithms that need it pay for it.
Future-proof: a new compression algorithm that needs any kind of
metadata (a dictid or anything else) slots into the same ext_data
mechanism.

I’ve split the work into two patches for review:
v19-0001-varattrib_4b-design-proposal-to-make-it-extended.patch:
varattrib_4b extensibility – adds varatt_cmp_extended, enum plumbing,
and macros; behaviour unchanged.
v19-0002-zstd-nodict-support.patch: Plain ZSTD (non dict) support.

Please share your thoughts; I’d love to hear feedback on the design. Thanks!

On Mon, Apr 21, 2025 at 12:02 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Apr 18, 2025 at 12:22:18PM -0400, Robert Haas wrote:
> > I think we could add plain-old zstd compression without really
> > tackling this issue, but if we are going to add dictionaries then I
> > think we might need to revisit the idea of preventing things from
> > leaking out of tables. What I can't quite remember at the moment is
> > how much of the problem was that it was going to be slow to force the
> > recompression, and how much of it was that we weren't sure we could
> > even find all the places in the code that might need such handling.
>
> FWIW, this point resonates here.  There is one thing that we have to
> do anyway: we just have one bit left in the varlena headers as lz4 is
> using the one before last.  So we have to make it extensible, even if
> it means that any compression method other than LZ4 and pglz would
> consume one more byte in its header by default.  And I think that this
> has to happen at some point if we want flexibility in this area.
>
> +    struct
> +    {
> +        uint32        va_header;
> +        uint32        va_tcinfo;
> +        uint32        va_cmp_alg;
> +        uint32        va_cmp_dictid;
> +        char        va_data[FLEXIBLE_ARRAY_MEMBER];
> +    }            va_compressed_ext;
>
> Speaking of which, I am confused by this abstraction choice in
> varatt.h in the first patch.  Are we sure that we are always going to
> have a dictionary attached to a compressed data set or even a
> va_cmp_alg?  It seems to me that this could lead to a waste of data in
> some cases because these fields may not be required depending on the
> compression method used, as some fields may not care about these
> details.  This kind of data should be made optional, on a per-field
> basis.
>
> One thing that I've been wondering is how it would be possible to make
> the area around varattrib_4b more readable while dealing with more
> extensibility.  It would be a good occasion to improve that, even if
> I'm hand-waving here currently and that the majority of this code is
> old enough to vote, with few modifications across the years.
>
> The second thing that I'd love to see on top of the addition of the
> extensibility is adding plain compression support for zstd, with
> nothing fancy, just the compression and decompression bits.  I've done
> quite a few benchmarks with the two, and results kind of point in the
> direction that zstd is more efficient than lz4 overall.  Don't take me
> wrong: lz4 can be better in some workloads as it can consume less CPU
> than zstd while compressing less.  However, a comparison of ratios
> like (compression rate / cpu used) has always led me to see zstd as
> superior in a large number of cases.  lz4 is still very good if you
> are CPU-bound and don't care about the extra space required.  Both are
> three classes better than pglz.
>
> Once we have these three points incrementally built-in together (the
> last bit extensibility, the potential varatt.h refactoring and the
> zstd support), there may be a point in having support for more
> advanced options with the compression methods in the shape of dicts or
> more requirements linked to other compression methods, but I think the
> topic is complex enough that we should make sure that these basics are
> implemented in a way sane enough so as we'd be able to extend them
> with all the use cases in mind.
> --
> Michael



--
Nikhil Veldanda


Attachment
On Fri, Apr 25, 2025 at 11:15 AM Nikhil Kumar Veldanda
<veldanda.nikhilkumar17@gmail.com> wrote:
> a. 24 bits for length → per-datum compression algorithm metadata is
> capped at 16 MB, which is far more than any realistic compression
> header.
> b. 8 bits for algorithm id → up to 256 algorithms.
> c. Zero-overhead when unused if an algorithm needs no per-datum
> metadata (e.g., ZSTD-nodict),

I don't understand why we need to spend 24 bits on a length header
here. I agree with the idea of adding a 1-byte quantity for algorithm
here, but I don't see why we need anything more than that. If the
compression method is zstd-with-a-dict, then the payload data
presumably needs to start with the OID of the dictionary, but it seems
like in your schema every single datum would use these 3 bytes to
store the fact that sizeof(Oid) = 4. The code that interprets
zstd-with-dict datums should already know the header length. Even if
generic code that works with all types of compression needs to be able
to obtain the header length on a per-compression-type basis, there can
be some kind of callback or table for that, rather than storing it in
every single datum.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: ZStandard (with dictionaries) compression support for TOAST compression

From
Nikhil Kumar Veldanda
Date:
Hi Robert,

Thanks for raising that question. The idea behind including a 24-bit
length field alongside the 1-byte algorithm ID is to ensure that each
compressed datum self-describes its metadata size. This allows any
compression algorithm to embed variable-length metadata (up to 16 MB)
without the need for hard-coding header sizes. For instance, an
algorithm in the future might require different metadata lengths for
each datum, and a fixed header-size table wouldn’t work. By storing the
length in the header, we maintain a generic and future-proof design. I
would greatly appreciate any feedback on this design. Thanks!

On Mon, Apr 28, 2025 at 7:50 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Apr 25, 2025 at 11:15 AM Nikhil Kumar Veldanda
> <veldanda.nikhilkumar17@gmail.com> wrote:
> > a. 24 bits for length → per-datum compression algorithm metadata is
> > capped at 16 MB, which is far more than any realistic compression
> > header.
> > b. 8 bits for algorithm id → up to 256 algorithms.
> > c. Zero-overhead when unused if an algorithm needs no per-datum
> > metadata (e.g., ZSTD-nodict),
>
> I don't understand why we need to spend 24 bits on a length header
> here. I agree with the idea of adding a 1-byte quantity for algorithm
> here, but I don't see why we need anything more than that. If the
> compression method is zstd-with-a-dict, then the payload data
> presumably needs to start with the OID of the dictionary, but it seems
> like in your schema every single datum would use these 3 bytes to
> store the fact that sizeof(Oid) = 4. The code that interprets
> zstd-with-dict datums should already know the header length. Even if
> generic code that works with all types of compression needs to be able
> to obtain the header length on a per-compression-type basis, there can
> be some kind of callback or table for that, rather than storing it in
> every single datum.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com



--
Nikhil Veldanda



Hi,

Nikhil, please consider the existing discussions on using dictionaries
(mentioned above by Aleksander) and on extending the TOAST pointer [1];
it seems you did not check them.

The same question Robert asked above - it's unclear why the header
wastes so much space. You mentioned metadata length - what metadata
do you mean there?

Also, Robert pointed out some very questionable approaches in your solution -
new dependencies crawling across user tables, and a new catalog table
with a very unclear lifecycle (and, with a new catalog table, immediate
questions about pg_upgrade).
I'm currently looking through the patch and will share my thoughts
later.

While reading this thread I've thought about storing a dictionary within
the table it is used for - IIUC one dictionary is used for just one attribute,
so it does not make sense to make it global.

Also, I have a question regarding the Zstd implementation you propose -
does it allow partial decompression?


Thanks!


--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
On Mon, Apr 28, 2025 at 5:32 PM Nikhil Kumar Veldanda
<veldanda.nikhilkumar17@gmail.com> wrote:
> Thanks for raising that question. The idea behind including a 24-bit
> length field alongside the 1-byte algorithm ID is to ensure that each
> compressed datum self-describes its metadata size. This allows any
> compression algorithm to embed variable-length metadata (up to 16 MB)
> without the need for hard-coding header sizes. For instance, an
> algorithm in feature might require different metadata lengths for each
> datum, and a fixed header size table wouldn’t work. By storing the
> length in the header, we maintain a generic and future-proof design. I
> would greatly appreciate any feedback on this design. Thanks!

I feel like I gave you some feedback on the design already, which was
that it seems like a waste of 3 bytes to me.

Don't get me wrong: I'm quite impressed by the way you're working on
this problem and I hope you stick around and keep working on it and
figure something out. But I don't quite understand the point of this
response: it seems like you're just restating what the design does
without really justifying it. The question here isn't whether a 3-byte
header can describe a length up to 16MB; I think we all know our
powers of two well enough to agree on the answer to that question. The
question is whether it's a good use of 3 bytes, and I don't think it
is.

I did consider the fact that future compression algorithms might want
to use variable-length headers; but I couldn't see a reason why we
shouldn't let each of those compression algorithms decide for
themselves how to encode whatever information they need. If a
compression algorithm needs a variable-length header, then it just
needs to make that header self-describing. Worst case scenario, it can
make the first byte of that variable-length header a length byte, and
then go from there; but it's probably possible to be even smarter and
use less than a full byte. Say for example we store a dictionary ID
that in concept is a 32-bit quantity but we use a variable-length
integer representation for it. It's easy to see that we shouldn't ever
need more than 3 bits for that so a full length byte is overkill and,
in fact, would undermine the value of a variable-length representation
rather severely. (I suspect it's a bad idea anyway, but it's a worse
idea if you burn a full byte on a length header.)

But there's an even larger question here too, which is why we're
having some kind of discussion about generalized metadata when the
current project seemingly only requires a 4-byte dictionary OID. If
you have some other use of this space in mind, I don't think you've
told us what it is. If you don't, then I'm not sure why we're
designing around an up-to-16MB variable-length quantity when what we
have before us is a 4-byte fixed-length quantity.

Moreover, even if you do have some (undisclosed) idea about what else
might be stored in this metadata area, why would it be important or
even desirable to have the length of that area represented in some
uniform way across compression methods? There's no obvious need for
any code outside the compression method itself to be able to decompose
the Datum into a metadata portion and a payload portion. After all,
the metadata portion could be anything so there's no way for anything
but the compression method to interpret it usefully. If we do want to
have outside code be able to ask questions, we could design some kind
of callback interface - e.g. if we end up with multiple compression
methods that store dictionary OIDs and they maybe do it in different
ways, each could provide an
"extract-the-dictionary-OID-from-this-datum" callback and each
compression method can implement that however it likes.

Maybe you can argue that we will eventually end up with various
compression method callbacks each of which is capable of working on
the metadata, and so then we might want to take an initial slice of a
toasted datum that is just big enough to allow that to work. But that
is pretty hypothetical, and in practice the first chunk of the TOAST
value (~2k) seems like it'd probably work well for most cases.

So, again, if you want us to take seriously the idea of dedicating 3
bytes per Datum to something, you need to give us a really good reason
for so doing. The fact that a 24-bit metadata length can describe a
metadata header of up to 2^24 bytes isn't a reason, good or bad. It's
just math.

--
Robert Haas
EDB: http://www.enterprisedb.com