Thread: ZStandard (with dictionaries) compression support for TOAST compression

ZStandard (with dictionaries) compression support for TOAST compression

From
Nikhil Kumar Veldanda
Date:
Hi all,

The ZStandard compression algorithm [1][2], though not currently used for TOAST compression in PostgreSQL, offers significantly better compression ratios than lz4/pglz in both dictionary-based and non-dictionary modes. Attached for review is my patch to add ZStandard compression to Postgres. In my tests, this patch used with a pre-trained dictionary achieved up to four times the compression ratio of LZ4, while ZStandard without a dictionary compressed data about twice as well as LZ4/pglz.

Notably, this is the first compression algorithm for Postgres that can make use of a dictionary to provide higher levels of compression, but dictionaries have to be generated and maintained, so I’ve had to break new ground in that regard. Using the dictionary support requires training and storing a dictionary for a given variable-length column. A SQL function is called on the column; it samples the column’s data and feeds it into the ZStandard training API, which returns a dictionary. In the example below, the column is of JSONB type. The SQL function takes the table name and the attribute number as inputs. If the training is successful, it returns true; otherwise, it returns false.

```
test=# select build_zstd_dict_for_attribute('"public"."zstd"', 1);
build_zstd_dict_for_attribute
-------------------------------
t
(1 row)
```

The sampling logic and the data fed to the ZStandard training API can vary by data type. The patch includes a way to write other type-specific training functions and provides a default for JSONB, TEXT and BYTEA. There is a new ‘CREATE TYPE’ option called ‘build_zstd_dict’ that takes a function name as input. In this way anyone can write their own type-specific training function by handling the sampling logic and returning the necessary information for the ZStandard training API in “ZstdTrainingData” format.

```
typedef struct ZstdTrainingData
{
    char       *sample_buffer;  /* Pointer to the raw sample buffer */
    size_t     *sample_sizes;   /* Array of sample sizes */
    int         nitems;         /* Number of sample sizes */
} ZstdTrainingData;
```
This information is fed into the ZStandard training API, which generates a dictionary that is then inserted into the dictionary catalog table. Additionally, we update the ‘pg_attribute’ attribute options to include the unique dictionary ID for that specific attribute. During compression, based on the available dictionary ID, we retrieve the dictionary and use it to compress the documents. I’ve created a standard training function (`zstd_dictionary_builder`) for JSONB, TEXT, and BYTEA.
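
For reference, here is a minimal sketch of how samples collected in the ZstdTrainingData format can be handed to the ZStandard training API. This is illustrative only, not the patch's actual code; `build_dictionary_from_samples` and its error handling are my own:

```
#include <stdlib.h>
#include <zdict.h>

/*
 * Illustrative sketch: turn a ZstdTrainingData-style sample set into a
 * serialized dictionary using ZDICT_trainFromBuffer(). Returns the
 * dictionary size, or 0 on failure. The caller frees *dict_out.
 */
static size_t
build_dictionary_from_samples(const char *sample_buffer,
                              const size_t *sample_sizes,
                              int nitems,
                              size_t dict_capacity,    /* e.g. the 4KB default */
                              void **dict_out)
{
    void       *dict = malloc(dict_capacity);
    size_t      dict_size;

    if (dict == NULL)
        return 0;

    dict_size = ZDICT_trainFromBuffer(dict, dict_capacity,
                                      sample_buffer, sample_sizes,
                                      (unsigned) nitems);
    if (ZDICT_isError(dict_size))
    {
        free(dict);
        return 0;               /* e.g. too few or too-uniform samples */
    }

    *dict_out = dict;
    return dict_size;
}
```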

We store the dictionary and its dictid in the new catalog table ‘pg_zstd_dictionaries’:

```
test=# \d pg_zstd_dictionaries
  Table "pg_catalog.pg_zstd_dictionaries"
 Column | Type  | Collation | Nullable | Default
--------+-------+-----------+----------+---------
 dictid | oid   |           | not null |
 dict   | bytea |           | not null |
Indexes:
    "pg_zstd_dictionaries_dictid_index" PRIMARY KEY, btree (dictid)
``` 

This is the entire ZStandard dictionary infrastructure. A column can have multiple dictionaries; the latest dictionary is identified via the pg_attribute attoptions. We never delete dictionaries once they are generated. If a dictionary is not available and attcompression is set to zstd, we compress with ZStandard without a dictionary. For decompression, the zstd-compressed frame contains a dictionary identifier (dictid) that indicates the dictionary used for compression. By retrieving this dictid from the zstd frame, we fetch the corresponding dictionary and perform decompression.
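
To illustrate that decompression path, here is a minimal sketch using zstd's one-shot dictionary API. None of this is the patch's actual code, and `fetch_dictionary_by_id` is a hypothetical stand-in for the pg_zstd_dictionaries catalog lookup:

```
#include <zstd.h>

/* Hypothetical catalog lookup: returns the serialized dictionary for dictid. */
extern const void *fetch_dictionary_by_id(unsigned dictid, size_t *dict_size);

/*
 * Illustrative sketch: read the dictid out of the zstd frame header, fetch
 * that dictionary, and decompress. Returns the decompressed size, 0 on error.
 */
static size_t
zstd_decompress_datum(const void *src, size_t src_size,
                      void *dst, size_t dst_capacity)
{
    unsigned    dictid = ZSTD_getDictID_fromFrame(src, src_size);
    const void *dict = NULL;
    size_t      dict_size = 0;
    ZSTD_DCtx  *dctx;
    size_t      result;

    if (dictid != 0)            /* 0 means the frame was built without a dictionary */
        dict = fetch_dictionary_by_id(dictid, &dict_size);

    dctx = ZSTD_createDCtx();
    if (dctx == NULL)
        return 0;

    /* With dict == NULL / dict_size == 0 this behaves like plain decompression. */
    result = ZSTD_decompress_usingDict(dctx, dst, dst_capacity,
                                       src, src_size, dict, dict_size);
    ZSTD_freeDCtx(dctx);

    return ZSTD_isError(result) ? 0 : result;
}
```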

#############################################################################

Now for the TOAST compression framework changes.

We identify a compressed datum's compression algorithm using the top two bits of va_tcinfo (varattrib_4b.va_compressed).
In principle this allows four compression methods. However, based on previous community discussion of TOAST compression changes [3], spending the last remaining bit pattern on a single new algorithm was rejected, and the suggestion was to use it to extend the format instead, which is what I’ve implemented in this patch. This change requires updating the ‘varattrib_4b’ and ‘varatt_external’ on-disk structures. I’ve made sure these changes are backward compatible.

```
typedef union
{
    struct                      /* Normal varlena (4-byte length) */
    {
        uint32      va_header;
        char        va_data[FLEXIBLE_ARRAY_MEMBER];
    }           va_4byte;
    struct                      /* Compressed-in-line format */
    {
        uint32      va_header;
        uint32      va_tcinfo;  /* Original data size (excludes header) and
                                 * compression method; see va_extinfo */
        char        va_data[FLEXIBLE_ARRAY_MEMBER]; /* Compressed data */
    }           va_compressed;
    struct                      /* Extended compression format */
    {
        uint32      va_header;
        uint32      va_tcinfo;
        uint32      va_cmp_alg;
        char        va_data[FLEXIBLE_ARRAY_MEMBER];
    }           va_compressed_ext;
} varattrib_4b;

typedef struct varatt_external
{
    int32       va_rawsize;     /* Original data size (includes header) */
    uint32      va_extinfo;     /* External saved size (without header) and
                                 * compression method */
    Oid         va_valueid;     /* Unique ID of value within TOAST table */
    Oid         va_toastrelid;  /* RelID of TOAST table containing it */
    uint32      va_cmp_alg;     /* Additional compression algorithm
                                 * information */
} varatt_external;
```

Since I need to update these structs, I’ve adjusted the existing macros accordingly, and added the ZStandard compression and decompression routines as needed. These are the major design changes in the patch to incorporate ZStandard with dictionary compression.
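
As a rough illustration of how the extended header can coexist with the current two-bit scheme (names and layout here are my own sketch against the varattrib_4b definition above, not the patch's actual macros):

```
/*
 * Illustrative sketch only. Today the top two bits of va_tcinfo carry the
 * compression method (pglz or lz4). The idea is to reserve the remaining
 * bit pattern '11' as an escape meaning "extended format: the real
 * algorithm id lives in va_cmp_alg".
 */
#define SKETCH_EXTENDED_METHOD_MARKER   0x03    /* '11' in the top two bits */

static inline uint32
sketch_get_compression_algorithm(const varattrib_4b *ptr)
{
    uint32      method = ptr->va_compressed.va_tcinfo >> 30;

    if (method == SKETCH_EXTENDED_METHOD_MARKER)
        return ptr->va_compressed_ext.va_cmp_alg;   /* zstd, future methods */

    return method;              /* legacy pglz/lz4 datum, unchanged on disk */
}
```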

Please let me know what you think about all this. Are there any concerns with my approach? In particular, I would appreciate your thoughts on the on-disk changes that result from this.

kind regards,

Nikhil Veldanda
Amazon Web Services: https://aws.amazon.com

[1] https://facebook.github.io/zstd/
[2] https://github.com/facebook/zstd
[3] https://www.postgresql.org/message-id/flat/YoMiNmkztrslDbNS%40paquier.xyz

Attachment
On Thu, 6 Mar 2025 at 08:43, Nikhil Kumar Veldanda
<veldanda.nikhilkumar17@gmail.com> wrote:
>
> [quoted text of the original post trimmed]
>

Hi!
I generally love this idea, however I am not convinced that in-core
support is the right direction here. Maybe we can introduce some API
infrastructure to allow delegating compression to extensions?
This is merely my opinion; perhaps dealing with a redo is not
worthwhile.

I took a brief look at patch v1. I feel like this is too much for a
single patch. Take, for example, this change:

```
-#define NO_LZ4_SUPPORT() \
+#define NO_METHOD_SUPPORT(method) \
  ereport(ERROR, \
  (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), \
- errmsg("compression method lz4 not supported"), \
- errdetail("This functionality requires the server to be built with lz4 support.")))
+ errmsg("compression method %s not supported", method), \
+ errdetail("This functionality requires the server to be built with %s support.", method)))
```

This could be a separate preliminary refactoring patch in the series.
Perhaps we need to divide the patch into smaller pieces if we follow
the suggested course of this thread (in-core support).

I will try to give another in-depth look here soon.

--
Best regards,
Kirill Reshke



06.03.2025 08:32, Nikhil Kumar Veldanda wrote:
>
> [quoted text of the original post trimmed]
>

Overall idea is great.

I just want to mention that LZ4 also has an API to use a dictionary. Its
dictionary is as simple as "virtually prepended" text (in contrast to the
complex ZStd dictionary format).

I mean, it would be great if "dictionary" were a common property across
different algorithms.

On the other hand, zstd has a "super fast" mode which is actually a bit
faster than LZ4 and compresses a bit better. So maybe support for
different algos is not essential. (But then we need a way to change the
compression level to that "super fast" mode.)

-------
regards
Yura Sokolov aka funny-falcon



Re: ZStandard (with dictionaries) compression support for TOAST compression

From
Aleksander Alekseev
Date:
Hi Nikhil,

Many thanks for working on this. I proposed a similar patch some time
ago [1] but the overall feedback was somewhat mixed so I chose to
focus on something else. Thanks for picking this up.

> test=# select build_zstd_dict_for_attribute('"public"."zstd"', 1);
> build_zstd_dict_for_attribute
> -------------------------------
> t
> (1 row)

Did you have a chance to familiarize yourself with the corresponding
discussion [1] and probably the previous threads? Particularly it was
pointed out that dictionaries should be built automatically during
VACUUM. We also discussed a special syntax for the feature, besides
other things.

[1]: https://www.postgresql.org/message-id/flat/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22%3D5xVBg7S4vr5rQ%40mail.gmail.com

-- 
Best regards,
Aleksander Alekseev



Re: ZStandard (with dictionaries) compression support for TOAST compression

From
Nikhil Kumar Veldanda
Date:
Hi,

> Overall idea is great.
>
> I just want to mention LZ4 also have API to use dictionary. Its dictionary
> will be as simple as "virtually prepended" text (in contrast to complex
> ZStd dictionary format).
>
> I mean, it would be great if "dictionary" will be common property for
> different algorithms.
>
> On the other hand, zstd have "super fast" mode which is actually a bit
> faster than LZ4 and compresses a bit better. So may be support for
> different algos is not essential. (But then we need a way to change
> compression level to that "super fast" mode.)
>

The zstd compression level and zstd dictionary size are configurable at
the attribute level using ALTER TABLE. The default zstd level is 3 and
the dict size is 4KB. For super-fast mode the level can be set to 1.

```
test=# alter table zstd alter column doc set compression zstd;
ALTER TABLE
test=# alter table zstd alter column doc set(zstd_cmp_level = 1);
ALTER TABLE
test=# select * from pg_attribute where attrelid = 'zstd'::regclass
and attname = 'doc';
 attrelid | attname | atttypid | attlen | attnum | atttypmod |
attndims | attbyval | attalign | attstorage | attcompre
ssion | attnotnull | atthasdef | atthasmissing | attidentity |
attgenerated | attisdropped | attislocal | attinhcount
| attcollation | attstattarget | attacl |            attoptions
    | attfdwoptions | attmissingval
----------+---------+----------+--------+--------+-----------+----------+----------+----------+------------+----------
------+------------+-----------+---------------+-------------+--------------+--------------+------------+-------------
+--------------+---------------+--------+----------------------------------+---------------+---------------
    16389 | doc     |     3802 |     -1 |      1 |        -1 |
0 | f        | i        | x          | z
      | f          | f         | f             |             |
     | f            | t          |           0
|            0 |               |        |
{zstd_dictid=1,zstd_cmp_level=1} |               |
(1 row)
```



Re: ZStandard (with dictionaries) compression support for TOAST compression

From
Nikhil Kumar Veldanda
Date:
Hi

On Thu, Mar 6, 2025 at 5:35 AM Aleksander Alekseev
<aleksander@timescale.com> wrote:
>
> Hi Nikhil,
>
> Many thanks for working on this. I proposed a similar patch some time
> ago [1] but the overall feedback was somewhat mixed so I choose to
> focus on something else. Thanks for peeking this up.
>
> > test=# select build_zstd_dict_for_attribute('"public"."zstd"', 1);
> > build_zstd_dict_for_attribute
> > -------------------------------
> > t
> > (1 row)
>
> Did you have a chance to familiarize yourself with the corresponding
> discussion [1] and probably the previous threads? Particularly it was
> pointed out that dictionaries should be built automatically during
> VACUUM. We also discussed a special syntax for the feature, besides
> other things.
>
> [1]: https://www.postgresql.org/message-id/flat/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22%3D5xVBg7S4vr5rQ%40mail.gmail.com

Restricting dictionary generation to the vacuum process is not ideal
because it limits user control and flexibility. Compression efficiency
is highly dependent on data distribution, which can change
dynamically. By allowing users to generate dictionaries on demand via
an API, they can optimize compression when they detect inefficiencies
rather than waiting for a vacuum process, which may not align with
their needs.

Additionally, since all dictionaries are stored in the catalog table
anyway, users can generate and manage them independently without
interfering with the system’s automatic maintenance tasks. This
approach ensures better adaptability to real-world scenarios where
compression performance needs to be monitored and adjusted in real
time.

---
Nikhil Veldanda



06.03.2025 19:29, Nikhil Kumar Veldanda wrote:
> Hi,
> 
>> Overall idea is great.
>>
>> I just want to mention LZ4 also have API to use dictionary. Its dictionary
>> will be as simple as "virtually prepended" text (in contrast to complex
>> ZStd dictionary format).
>>
>> I mean, it would be great if "dictionary" will be common property for
>> different algorithms.
>>
>> On the other hand, zstd have "super fast" mode which is actually a bit
>> faster than LZ4 and compresses a bit better. So may be support for
>> different algos is not essential. (But then we need a way to change
>> compression level to that "super fast" mode.)
>>
> 
> zstd compression level and zstd dictionary size is configurable at
> attribute level using ALTER TABLE. Default zstd level is 3 and dict
> size is 4KB. For super fast mode level can be set to 1.

No. Super-fast mode levels are negative. See the parsing of the "--fast"
parameter in `programs/zstdcli.c` in zstd's repository and the definition
of ZSTD_minCLevel().

So, to support "super-fast" mode you have to accept negative compression
levels. I didn't check; probably you already support them?
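
For what it's worth, with the advanced compression API negative levels are passed the same way as positive ones; a minimal sketch (not from the patch) of selecting such a level:

```
#include <zstd.h>

/*
 * Illustrative sketch: compress one buffer at a negative ("super fast")
 * compression level. ZSTD_minCLevel() is the most negative level accepted;
 * any value in [ZSTD_minCLevel(), -1] trades ratio for speed.
 */
static size_t
compress_super_fast(const void *src, size_t src_size,
                    void *dst, size_t dst_capacity, int level)
{
    ZSTD_CCtx  *cctx = ZSTD_createCCtx();
    size_t      result;

    if (cctx == NULL)
        return 0;

    if (level < ZSTD_minCLevel())
        level = ZSTD_minCLevel();

    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, level);
    result = ZSTD_compress2(cctx, dst, dst_capacity, src, src_size);
    ZSTD_freeCCtx(cctx);

    return ZSTD_isError(result) ? 0 : result;
}
```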


-------
regards
Yura Sokolov aka funny-falcon



Re: ZStandard (with dictionaries) compression support for TOAST compression

From
Nikhil Kumar Veldanda
Date:
Hi Yura,

> So, to support "super-fast" mode you have to accept negative compression
> levels. I didn't check, probably you're already support them?
>

The key point I want to emphasize is that both the zstd compression level
and the dictionary size should be configurable based on user preferences
at the attribute level.

---
Nikhil Veldanda



On Thu, Mar 6, 2025 at 12:43 AM Nikhil Kumar Veldanda
<veldanda.nikhilkumar17@gmail.com> wrote:
> Notably, this is the first compression algorithm for Postgres that can make use of a dictionary to provide higher
> levels of compression, but dictionaries have to be generated and maintained,

I think that solving the problems around using a dictionary is going
to be really hard. Can we see some evidence that the results will be
worth it?

--
Robert Haas
EDB: http://www.enterprisedb.com



Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Mar 6, 2025 at 12:43 AM Nikhil Kumar Veldanda
> <veldanda.nikhilkumar17@gmail.com> wrote:
>> Notably, this is the first compression algorithm for Postgres that can make use of a dictionary to provide higher
>> levels of compression, but dictionaries have to be generated and maintained,

> I think that solving the problems around using a dictionary is going
> to be really hard. Can we see some evidence that the results will be
> worth it?

BTW, this is hardly the first such attempt.  See [1] for a prior
attempt at something fairly similar, which ended up going nowhere.
It'd be wise to understand why that failed before pressing forward.

Note that the thread title for [1] is pretty misleading, as the
original discussion about JSONB-specific compression soon migrated
to discussion of compressing TOAST data using dictionaries.  At
least from a ten-thousand-foot viewpoint, that seems like exactly
what you're proposing here.  I see that you dismissed [1] as
irrelevant upthread, but I think you'd better look closer.

            regards, tom lane

[1] https://www.postgresql.org/message-id/flat/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22%3D5xVBg7S4vr5rQ%40mail.gmail.com



Re: ZStandard (with dictionaries) compression support for TOAST compression

From
Nikhil Kumar Veldanda
Date:
Hi Robert,

> I think that solving the problems around using a dictionary is going
> to be really hard. Can we see some evidence that the results will be
> worth it?

With the latest patch I've shared,

Using a Kaggle dataset of Nintendo-related tweets [1], we leveraged
PostgreSQL's acquire_sample_rows function to quickly gather just 1,000
sample rows for a specific attribute out of 104,695 rows. These raw
samples were passed into Zstd's sampling buffer, generating a custom
dictionary. This dictionary was then used directly to compress the
documents, resulting in 62% space savings after compression:

```
test=# \dt+
                                     List of tables
 Schema |      Name      | Type  |  Owner   | Persistence | Access method |  Size  | Description
--------+----------------+-------+----------+-------------+---------------+--------+-------------
 public | lz4            | table | nikhilkv | permanent   | heap          | 297 MB |
 public | pglz           | table | nikhilkv | permanent   | heap          | 259 MB |
 public | zstd_with_dict | table | nikhilkv | permanent   | heap          | 114 MB |
 public | zstd_wo_dict   | table | nikhilkv | permanent   | heap          | 210 MB |
(4 rows)
```

We've observed similarly strong results on other datasets as well when
using dictionaries.

[1] https://www.kaggle.com/code/dcalambas/nintendo-tweets-analysis/data

---
Nikhil Veldanda



Re: ZStandard (with dictionaries) compression support for TOAST compression

From
Nikhil Kumar Veldanda
Date:
Hi Tom,

On Thu, Mar 6, 2025 at 11:33 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Thu, Mar 6, 2025 at 12:43 AM Nikhil Kumar Veldanda
> > <veldanda.nikhilkumar17@gmail.com> wrote:
> >> Notably, this is the first compression algorithm for Postgres that can make use of a dictionary to provide higher
> >> levels of compression, but dictionaries have to be generated and maintained,
>
> > I think that solving the problems around using a dictionary is going
> > to be really hard. Can we see some evidence that the results will be
> > worth it?
>
> BTW, this is hardly the first such attempt.  See [1] for a prior
> attempt at something fairly similar, which ended up going nowhere.
> It'd be wise to understand why that failed before pressing forward.
>
> Note that the thread title for [1] is pretty misleading, as the
> original discussion about JSONB-specific compression soon migrated
> to discussion of compressing TOAST data using dictionaries.  At
> least from a ten-thousand-foot viewpoint, that seems like exactly
> what you're proposing here.  I see that you dismissed [1] as
> irrelevant upthread, but I think you'd better look closer.
>
>                         regards, tom lane
>
> [1] https://www.postgresql.org/message-id/flat/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22%3D5xVBg7S4vr5rQ%40mail.gmail.com

Thank you for highlighting the previous discussion—I reviewed [1]
closely. While both methods involve dictionary-based compression, the
approach I'm proposing differs significantly.

The previous method explicitly extracted string values from JSONB and
assigned unique OIDs to each entry, resulting in distinct dictionary
entries for every unique value. In contrast, this approach directly
leverages Zstandard's dictionary training API. We provide raw data
samples to Zstd, which generates a dictionary of a specified size.
This dictionary is then stored in a catalog table and used to compress
subsequent inserts for the specific attribute it was trained on.

Key differences include:

1. No new data types are required.
2. Attributes can optionally have multiple dictionaries; the latest
dictionary is used during compression, and the exact dictionary used
during compression is retrieved and applied for decompression.
3. Compression utilizes Zstandard's trained dictionaries when available.

Additionally, I have provided an option for users to define custom
sampling and training logic, as directly passing raw buffers to the
training API may not always yield optimal results, especially for
certain custom variable-length data types. This flexibility motivates
the necessary adjustments to `pg_type`.

I would greatly appreciate your feedback or any additional suggestions
you might have.

[1] https://www.postgresql.org/message-id/flat/CAJ7c6TOtAB0z1UrksvGTStNE-herK-43bj22%3D5xVBg7S4vr5rQ%40mail.gmail.com

Best regards,
Nikhil Veldanda



Re: ZStandard (with dictionaries) compression support for TOAST compression

From
Aleksander Alekseev
Date:
Hi Nikhil,

> Thank you for highlighting the previous discussion—I reviewed [1]
> closely. While both methods involve dictionary-based compression, the
> approach I'm proposing differs significantly.
>
> The previous method explicitly extracted string values from JSONB and
> assigned unique OIDs to each entry, resulting in distinct dictionary
> entries for every unique value. In contrast, this approach directly
> leverages Zstandard's dictionary training API. We provide raw data
> samples to Zstd, which generates a dictionary of a specified size.
> This dictionary is then stored in a catalog table and used to compress
> subsequent inserts for the specific attribute it was trained on.
>
> [...]

You didn't read closely enough, I'm afraid. As Tom pointed out, the
title of the thread is misleading. On top of that there are several
separate threads. I did my best to cross-reference them, but
apparently didn't do well enough.

Initially I proposed to add the ZSON extension [1][2] to the PostgreSQL
core. However, the idea evolved into TOAST improvements that don't
require a user to use special types. You may also find interesting the
related "Pluggable TOASTer" discussion [3]. The idea there was rather
different, but the discussion about extending TOAST pointers so that in
the future we can use something other than ZSTD is relevant.

You will find the recent summary of the reached agreements somewhere
around this message [4], take a look at the thread a bit above and
below it.

I believe this effort is important. You can't, however, simply discard
everything that was discussed in this area for the past several years.
If you want to succeed of course. No one will look at your patch if it
doesn't account for all the previous discussions. I'm sorry, I know
it's disappointing. This being said you should have done better
research before submitting the code. You could just ask if anyone was
working on something like this before and save a lot of time.

Personally I would suggest starting with one little step toward
compression dictionaries, particularly focusing on extendability of
TOAST pointers. You are going to need to store dictionary ids there
and to allow using other compression algorithms in the future. This
will require something like a varint/UTF-8-like bitmask. See the
previous discussions.
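
(For readers unfamiliar with the idea, here is a purely illustrative LEB128-style varint sketch of what such an extensible encoding could look like; this is not code from any of the referenced threads or patches.)

```
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative LEB128-style varint: 7 payload bits per byte, high bit set
 * on every byte except the last. Small ids (e.g. common compression
 * methods) stay at one byte while leaving room to grow.
 */
static size_t
varint_encode(uint64_t value, uint8_t *out)
{
    size_t      n = 0;

    do
    {
        uint8_t     byte = value & 0x7F;

        value >>= 7;
        if (value != 0)
            byte |= 0x80;       /* continuation bit */
        out[n++] = byte;
    } while (value != 0);

    return n;
}

static size_t
varint_decode(const uint8_t *in, uint64_t *value)
{
    uint64_t    result = 0;
    size_t      n = 0;
    int         shift = 0;

    for (;;)
    {
        uint8_t     byte = in[n++];

        result |= (uint64_t) (byte & 0x7F) << shift;
        if ((byte & 0x80) == 0)
            break;
        shift += 7;
    }

    *value = result;
    return n;
}
```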

[1]: https://github.com/afiskon/zson
[2]: https://postgr.es/m/CAJ7c6TP3fCC9TNKJBQAcEf4c%3DL7XQZ7QvuUayLgjhNQMD_5M_A%40mail.gmail.com
[3]: https://postgr.es/m/224711f9-83b7-a307-b17f-4457ab73aa0a%40sigaev.ru
[4]: https://postgr.es/m/CAJ7c6TPSN06C%2B5cYSkyLkQbwN1C%2BpUNGmx%2BVoGCA-SPLCszC8w%40mail.gmail.com

--
Best regards,
Aleksander Alekseev



Re: ZStandard (with dictionaries) compression support for TOAST compression

From
Aleksander Alekseev
Date:
Hi Robert,

> I think that solving the problems around using a dictionary is going
> to be really hard. Can we see some evidence that the results will be
> worth it?

Compression dictionaries give a good compression ratio (~50%) and also
increase TPS a bit (5-10%) due to better buffer cache utilization. At
least according to synthetic and not trustworthy benchmarks I did some
years ago [1]. The result may be very dependent on the actual data of
course, not to mention particular implementation of the idea.

[1]: https://github.com/afiskon/zson/blob/master/docs/benchmark.md

-- 
Best regards,
Aleksander Alekseev



Re: ZStandard (with dictionaries) compression support for TOAST compression

From
Nikhil Kumar Veldanda
Date:
Hi,

I reviewed the discussions, and while most agreements focused on
changes to the toast pointer, the design I propose requires no
modifications to it. I've carefully considered the design choices made
previously, and I recognize Zstd's clear advantages in compression
efficiency and performance over algorithms like PGLZ and LZ4; we can
integrate it without altering the existing toast pointer
(varatt_external) structure.

By simply using the top two bits of the va_extinfo field (setting them
to '11') in `varatt_external`, we can signal an alternative
compression algorithm, clearly distinguishing new methods from legacy
ones. The specific algorithm used would then be recorded in the
va_cmp_alg field.

This approach addresses the issues raised in the summarized thread [1]
and allows us to leverage dictionaries for data that can stay in-line.
While my initial patch includes modifications to the toast pointer due
to a single dependency (pg_column_compression), those changes aren't
strictly necessary; resolving that dependency separately would make
the overall design even less intrusive.

Here’s an illustrative structure:
```
typedef union
{
    struct    /* Normal varlena (4-byte length) */
    {
        uint32    va_header;
        char    va_data[FLEXIBLE_ARRAY_MEMBER];
    }    va_4byte;
    struct    /* Current Compressed format */
    {
        uint32    va_header;
        uint32    va_tcinfo;    /* Original size and compression method */
        char    va_data[FLEXIBLE_ARRAY_MEMBER]; /* Compressed data */
    }    va_compressed;
    struct    /* Extended compression format */
    {
        uint32    va_header;
        uint32    va_tcinfo;
        uint32    va_cmp_alg;
        uint32    va_cmp_dictid;
        char    va_data[FLEXIBLE_ARRAY_MEMBER];
    }    va_compressed_ext;
} varattrib_4b;

typedef struct varatt_external
{
    int32       va_rawsize;     /* Original data size (includes header) */
    uint32      va_extinfo;     /* External saved size (without header) and
                                 * compression method; '11' in the top two
                                 * bits indicates new compression methods */
    Oid         va_valueid;     /* Unique ID of value within TOAST table */
    Oid         va_toastrelid;  /* RelID of TOAST table containing it */
} varatt_external;
```

Decompression flow remains straightforward: once a datum is identified
as external, we detoast it, then we identify the compression algorithm
using the `TOAST_COMPRESS_METHOD` macro, which refers to a varattrib_4b
structure rather than a toast pointer. We retrieve the compression
algorithm from either va_tcinfo or va_cmp_alg based on the adjusted
macros, and decompress accordingly.

In summary, integrating Zstandard into the TOAST framework in this
minimally invasive way should yield substantial benefits.

[1] https://www.postgresql.org/message-id/CAJ7c6TPSN06C%2B5cYSkyLkQbwN1C%2BpUNGmx%2BVoGCA-SPLCszC8w%40mail.gmail.com

Best regards,
Nikhil Veldanda

On Fri, Mar 7, 2025 at 3:42 AM Aleksander Alekseev
<aleksander@timescale.com> wrote:
>
> [quoted text trimmed]
>



Re: ZStandard (with dictionaries) compression support for TOAST compression

From
Nikhil Kumar Veldanda
Date:
Hi all,

Attached is an updated version of the patch. Specifically, I've removed
the changes related to the TOAST pointer structure. This proposal
differs from earlier discussions on this topic [1], where extending
the TOAST pointer was considered essential for enabling
dictionary-based compression.

Key improvements introduced in this proposal:

1. No Changes to TOAST Pointer: The existing TOAST pointer structure
remains untouched, simplifying integration and minimizing potential
disruptions.

2. Extensible Design: The solution is structured to seamlessly
incorporate future compression algorithms beyond zstd [2], providing
greater flexibility and future-proofing.

3. Inline Data Compression with Dictionary Support: Crucially, this
approach supports dictionary-based compression for inline data.
Dictionaries are highly effective for compressing small-sized
documents, providing substantial storage savings. Please refer to the
attached image from the zstd README[2] for supporting evidence.
Omitting dictionary-based compression for inline data would
significantly reduce these benefits. For example, under previous
design constraints [3], if a 16KB document compressed down to 256
bytes using a dictionary, storing this inline would not have been
feasible. The current proposal addresses this limitation, thereby
fully leveraging dictionary-based compression.

I believe this solution effectively addresses the limitations
identified in our earlier discussions [1][3].

Feedback on this approach would be greatly appreciated, I welcome any
feedback or suggestions you might have.

References:
[1]
https://www.postgresql.org/message-id/flat/CAJ7c6TPSN06C%2B5cYSkyLkQbwN1C%2BpUNGmx%2BVoGCA-SPLCszC8w%40mail.gmail.com
[2] https://github.com/facebook/zstd
[3] https://www.postgresql.org/message-id/CAJ7c6TPSN06C%2B5cYSkyLkQbwN1C%2BpUNGmx%2BVoGCA-SPLCszC8w%40mail.gmail.com

```
typedef union
{
    struct                      /* Normal varlena (4-byte length) */
    {
        uint32      va_header;
        char        va_data[FLEXIBLE_ARRAY_MEMBER];
    }           va_4byte;
    struct                      /* Compressed-in-line format */
    {
        uint32      va_header;
        uint32      va_tcinfo;  /* Original data size (excludes header) and
                                 * compression method; see va_extinfo */
        char        va_data[FLEXIBLE_ARRAY_MEMBER]; /* Compressed data */
    }           va_compressed;
    struct                      /* Extended compression format */
    {
        uint32      va_header;
        uint32      va_tcinfo;
        uint32      va_cmp_alg;
        uint32      va_cmp_dictid;
        char        va_data[FLEXIBLE_ARRAY_MEMBER];
    }           va_compressed_ext;
} varattrib_4b;
```
The additional algorithm information and the dictid are stored in varattrib_4b.

Best regards,
Nikhil Veldanda

On Fri, Mar 7, 2025 at 5:35 PM Nikhil Kumar Veldanda
<veldanda.nikhilkumar17@gmail.com> wrote:
>
> [quoted text trimmed]
>

Attachment
On Fri, Mar 7, 2025 at 8:36 PM Nikhil Kumar Veldanda
<veldanda.nikhilkumar17@gmail.com> wrote:
>     struct    /* Extended compression format */
>     {
>         uint32    va_header;
>         uint32    va_tcinfo;
>         uint32    va_cmp_alg;
>         uint32    va_cmp_dictid;
>         char    va_data[FLEXIBLE_ARRAY_MEMBER];
>     }    va_compressed_ext;
> } varattrib_4b;

First, thanks for sending along the performance results. I agree that
those are promising. Second, thanks for sending these design details.

The idea of keeping dictionaries in pg_zstd_dictionaries literally
forever doesn't seem very appealing, but I'm not sure what the other
options are. I think we've established in previous work in this area
that compressed values can creep into unrelated tables and inside
records or other container types like ranges. Therefore, we have no
good way of knowing when a dictionary is unreferenced and can be
dropped. So in that sense your decision to keep them forever is
"right," but it's still unpleasant. It would even be necessary to make
pg_upgrade carry them over to new versions.

If we could make sure that compressed datums never leaked out into
other tables, then tables could depend on dictionaries and
dictionaries could be dropped when there were no longer any tables
depending on them. But like I say, previous work suggested that this
would be very difficult to achieve. However, without that, I imagine
users generating new dictionaries regularly as the data changes and
eventually getting frustrated that they can't get rid of the old ones.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: ZStandard (with dictionaries) compression support for TOAST compression

From
Nikhil Kumar Veldanda
Date:
Hi Robert,

Thank you for your response, and apologies for the delay in getting
back to you. You raised some important concerns in your reply; I've
worked hard to understand and hopefully address these two:

* Dictionary Cleanup via Dependency Tracking
* Addressing Compressed Datum Leaks problem (via CTAS, INSERT INTO ...
SELECT ...)

Dictionary Cleanup via Dependency Tracking:

To address your question of how we can safely clean up unused
dictionaries, I've implemented a mechanism based on PostgreSQL's
standard dependency system (pg_depend); permit me to explain.

When a Zstandard dictionary is created for a table, we record a
DEPENDENCY_NORMAL dependency from the table to the dictionary. This
ensures that when the table is dropped, the corresponding entry is
removed from the pg_depend catalog. Users can then call the
cleanup_unused_dictionaries() function to remove any dictionaries that
are no longer referenced by any table.

// create dependency,
{
    ObjectAddress dictObj;
    ObjectAddress relation;

    ObjectAddressSet(dictObj, ZstdDictionariesRelationId, dictid);
    ObjectAddressSet(relation, RelationRelationId, relid);

    /* NORMAL dependency: relid → Dictionary */
    recordDependencyOn(&relation, &dictObj, DEPENDENCY_NORMAL);
}

Example: Consider two tables, each using its own Zstandard dictionary:

test=# \dt+
                                  List of tables
 Schema | Name  | Type  |  Owner   | Persistence | Access method | Size  | Description
--------+-------+-------+----------+-------------+---------------+-------+-------------
 public | temp  | table | nikhilkv | permanent   | heap          | 16 kB |
 public | temp1 | table | nikhilkv | permanent   | heap          | 16 kB |
(2 rows)


// Dictionary dependencies
test=# select * from pg_depend where refclassid = 9946;
 classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
---------+-------+----------+------------+----------+-------------+---------
    1259 | 16389 |        0 |       9946 |        1 |           0 | n
    1259 | 16394 |        0 |       9946 |        2 |           0 | n
(2 rows)

// the corresponding dictionaries:
test=# select * from pg_zstd_dictionaries ;
 dictid |
        dict

--------+----------------------------------------------------------------------------------------------------------------
        ---------------------------------------------------------------------------------------------------------------
        ---------------------------------------------------------------------------------------------------------------
        --------------------------------------
      1 |
\x37a430ec71451a10091010df303333b3770a33f1783c1e8fc7e3f1783ccff3bcf7d442414141414141414141414141414141414141414

14141414141a15028140a8542a15028140a85a2288aa2284a297d74e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1f1783c1e8fc7e3f1789ee779ef01

0100000004000000080000004c6f72656d20697073756d20646f6c6f722073697420616d65742c20636f6e73656374657475722061646970697363696
        e6720656c69742e204c6f72656d2069
      2 |
\x37a430ec7d1a933a091010df303333b3770a33f1783c1e8fc7e3f1783ccff3bcf7d442414141414141414141414141414141414141414

14141414141a15028140a8542a15028140a85a2288aa2284a297d74e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1e1f1783c1e8fc7e3f1789ee779ef01

0100000004000000080000004e696b68696c206b756d616e722076656c64616e64612c206973206f6b61792063616e6469646174652c2068652069732
        0696e2073656174746c65204e696b68696c20
(2 rows)

If cleanup_unused_dictionaries() is called while the dependencies
still exist, nothing is removed:

test=# select cleanup_unused_dictionaries();
 cleanup_unused_dictionaries
-----------------------------
                           0
(1 row)

After dropping temp1, the associated dictionary becomes eligible for cleanup:

test=# drop table temp1;
DROP TABLE

test=# select cleanup_unused_dictionaries();
 cleanup_unused_dictionaries
-----------------------------
                           1
(1 row)

________________________________
Addressing Compressed Datum Leaks problem (via CTAS, INSERT INTO ... SELECT ...)

As compressed datums can be copied to other unrelated tables via CTAS,
INSERT INTO ... SELECT, or CREATE TABLE ... EXECUTE, I’ve introduced a
method inheritZstdDictionaryDependencies. This method is invoked at
the end of such statements and ensures that any dictionary
dependencies from source tables are copied to the destination table.
We determine the set of source tables using the relationOids field in
PlannedStmt.

This guarantees that if compressed datums reference a zstd dictionary
the destination table is marked as dependent on the dictionaries that
the source tables depend on, preventing premature cleanup by
cleanup_unused_dictionaries.
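
As a rough sketch of the mechanism (assumptions: get_rel_zstd_dictionaries()
is a hypothetical helper that returns the dictids a relation already depends
on, and duplicate-entry checks are omitted), the function could look
something like this:

```
/* Hypothetical sketch, not the patch itself. */
static void
inheritZstdDictionaryDependencies_sketch(Oid destRelid, List *relationOids)
{
	ListCell   *lc;

	foreach(lc, relationOids)
	{
		Oid			srcRelid = lfirst_oid(lc);
		List	   *dictids;
		ListCell   *dc;

		if (srcRelid == destRelid)
			continue;

		/* assumed helper: dictids recorded for srcRelid in pg_depend */
		dictids = get_rel_zstd_dictionaries(srcRelid);

		foreach(dc, dictids)
		{
			ObjectAddress relObj;
			ObjectAddress dictObj;

			ObjectAddressSet(relObj, RelationRelationId, destRelid);
			ObjectAddressSet(dictObj, ZstdDictionariesRelationId, lfirst_oid(dc));
			recordDependencyOn(&relObj, &dictObj, DEPENDENCY_NORMAL);
		}
	}
}
```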

Example: again we have two tables, each with its own dictionary:

                                    List of tables
 Schema | Name  | Type  |  Owner   | Persistence | Access method | Size  | Description
--------+-------+-------+----------+-------------+---------------+-------+-------------
 public | temp  | table | nikhilkv | permanent   | heap          | 16 kB |
 public | temp1 | table | nikhilkv | permanent   | heap          | 16 kB |
(2 rows)

Using CTAS (CREATE TABLE AS), one table is copied to another. In this
case, the compressed datums in the temp table are copied to copy_tbl.
Since the dictionary is now shared by both tables, a dependency on it is
also recorded for the destination table. Even if the original temp table
is dropped and cleanup is triggered, the dictionary is not removed
because an active dependency remains.

test=# create table copy_tbl as select * from temp;
SELECT 20

// dictid 1 is shared between two tables.
test=# select * from pg_depend where refclassid = 9946;
 classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
---------+-------+----------+------------+----------+-------------+---------
    1259 | 16389 |        0 |       9946 |        1 |           0 | n
    1259 | 16404 |        0 |       9946 |        1 |           0 | n
    1259 | 16399 |        0 |       9946 |        3 |           0 | n
(3 rows)

// After dropping the temp table, where dictid 1 was used to compress datums
test=# drop table temp;
DROP TABLE

// dependency for temp table is dropped.
test=# select * from pg_depend where refclassid = 9946;
 classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
---------+-------+----------+------------+----------+-------------+---------
    1259 | 16404 |        0 |       9946 |        1 |           0 | n
    1259 | 16399 |        0 |       9946 |        3 |           0 | n
(2 rows)

// No dictionaries are being deleted.
test=# select cleanup_unused_dictionaries();
 cleanup_unused_dictionaries
-----------------------------
                           0
(1 row)

Once the new copy_tbl is also deleted, the dictionary can be dropped
because no dependency exists on it:

test=# drop table copy_tbl;
DROP TABLE

// The dictionary is then deleted.
test=# select cleanup_unused_dictionaries();
 cleanup_unused_dictionaries
-----------------------------
                           1
(1 row)

Here is another example, this time using composite types and a more
complex scenario involving two source tables.

// Create a base composite type with two text fields
test=# create type my_composite as (f1 text, f2 text);
CREATE TYPE

// Create a nested composite type that uses my_composite twice
test=# create type my_composite1 as (f1 my_composite, f2 my_composite);
CREATE TYPE

test=# \d my_composite
      Composite type "public.my_composite"
 Column | Type | Collation | Nullable | Default
--------+------+-----------+----------+---------
 f1     | text |           |          |
 f2     | text |           |          |

test=# \d my_composite1
         Composite type "public.my_composite1"
 Column |     Type     | Collation | Nullable | Default
--------+--------------+-----------+----------+---------
 f1     | my_composite |           |          |
 f2     | my_composite |           |          |


// Sample table with ZSTD dictionary compression on text columns
test=# \d+ orders
                                               Table "public.orders"
   Column    |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
-------------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
 order_id    | integer |           |          |         | plain    |             |              |
 customer_id | integer |           |          |         | plain    |             |              |
 random1     | text    |           |          |         | extended | zstd        |              |
 random2     | text    |           |          |         | extended | zstd        |              |
Access method: heap

// Sample table with ZSTD dictionary compression on one of the text columns
test=# \d+ customers
                                              Table "public.customers"
   Column    |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
-------------+---------+-----------+----------+---------+----------+-------------+--------------+-------------
 customer_id | integer |           |          |         | plain    |             |              |
 random3     | text    |           |          |         | extended | zstd        |              |
 random4     | text    |           |          |         | extended |             |              |
Access method: heap

// Check existing dictionaries: dictid 1 for random1, dictid 2 for random2, dictid 3 for the random3 attribute
test=# select dictid from pg_zstd_dictionaries;
 dictid
--------
      1
      2
      3
(3 rows)

// List all objects dependent on ZSTD dictionaries
test=# select objid::regclass, * from pg_depend where refclassid = 9946;
   objid   | classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
-----------+---------+-------+----------+------------+----------+-------------+---------
 orders    |    1259 | 16391 |        0 |       9946 |        1 |           0 | n
 orders    |    1259 | 16391 |        0 |       9946 |        2 |           0 | n
 customers |    1259 | 16396 |        0 |       9946 |        3 |           0 | n
(3 rows)

// Create new table using nested composite type
// This copies compressed datums into temp1.
test=# create table temp1 as
    select ROW(
            ROW(random3, random4)::my_composite,
            ROW(random1, random2)::my_composite
            )::my_composite1
    from customers full outer join orders using (customer_id);
SELECT 51

test=# select objid::regclass, * from pg_depend where refclassid = 9946;
   objid   | classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
-----------+---------+-------+----------+------------+----------+-------------+---------
 orders    |    1259 | 16391 |        0 |       9946 |        1 |           0 | n
 temp1     |    1259 | 16423 |        0 |       9946 |        1 |           0 | n
 orders    |    1259 | 16391 |        0 |       9946 |        2 |           0 | n
 temp1     |    1259 | 16423 |        0 |       9946 |        2 |           0 | n
 temp1     |    1259 | 16423 |        0 |       9946 |        3 |           0 | n
 customers |    1259 | 16396 |        0 |       9946 |        3 |           0 | n
(6 rows)

// Drop the original source tables.
test=# drop table orders;
DROP TABLE

test=# drop table customers ;
DROP TABLE

// Even after dropping the orders and customers tables, temp1 still holds references to the dictionaries.
test=# select objid::regclass, * from pg_depend where refclassid = 9946;
 objid | classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
-------+---------+-------+----------+------------+----------+-------------+---------
 temp1 |    1259 | 16423 |        0 |       9946 |        1 |           0 | n
 temp1 |    1259 | 16423 |        0 |       9946 |        2 |           0 | n
 temp1 |    1259 | 16423 |        0 |       9946 |        3 |           0 | n
(3 rows)

// Attempt cleanup: nothing is removed, because temp1 still depends on the dictionaries.
test=# select cleanup_unused_dictionaries();
 cleanup_unused_dictionaries
-----------------------------
                           0
(1 row)

test=# select dictid from pg_zstd_dictionaries ;
 dictid
--------
      1
      2
      3
(3 rows)

// Drop the destination table
test=# drop table temp1;
DROP TABLE

// Confirm no remaining dependencies
test=# select objid::regclass, * from pg_depend where refclassid = 9946;
 objid | classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype
-------+---------+-------+----------+------------+----------+-------------+---------
(0 rows)

// Cleanup now succeeds
test=# select cleanup_unused_dictionaries();
 cleanup_unused_dictionaries
-----------------------------
                           3
(1 row)

test=# select dictid from pg_zstd_dictionaries ;
 dictid
--------
(0 rows)


This design ensures that:

Dictionaries are deleted only when no table depends on them.
We avoid the costly decompress/recompress cycle that would otherwise be needed to prevent compressed datum leakage.
We do not retain dictionaries forever.

These changes are the core additions in this revision of the patch and
address the concerns around long-lived dictionaries and compressed datum
leakage. This update also incorporates earlier feedback by enabling
automatic zstd dictionary generation and cleanup during VACUUM, and adds
the changes needed to copy ZSTD dictionaries during pg_upgrade.

Patch summary:

v11-0001-varattrib_4b-changes-and-macros-update-needed-to.patch
Refactors varattrib_4b structures and updates related macros to enable
ZSTD dictionary support.
v11-0002-Zstd-compression-and-decompression-routines-incl.patch
Adds ZSTD compression and decompression routines, and introduces a new
catalog to store dictionary metadata.
v11-0003-Zstd-dictionary-training-process.patch
Implements the dictionary training workflow. Includes built-in support
for text and jsonb types. Allows users to define custom sampling
functions per type by specifying a C function name in the
pg_type.typzstdsampling field.
v11-0004-Dependency-tracking-mechanism-to-track-compresse.patch
Introduces a dependency tracking mechanism using pg_depend to record
which ZSTD dictionaries a table depends on. When compressed datums
that rely on a dictionary are copied to unrelated target tables, the
corresponding dictionary dependencies from the source table are also
recorded for the target table, ensuring the dictionaries are not
prematurely cleaned up.
v11-0005-generate-and-cleanup-dictionaries-using-vacuum.patch
Adds integration with VACUUM to automatically generate and clean up
ZSTD dictionaries.
v11-0006-pg_dump-pg_upgrade-needed-changes-to-support-new.patch
Extends pg_dump and pg_upgrade to support migrating ZSTD dictionaries
and their dependencies during pg_upgrade.
v11-0007-Some-tests-related-to-zstd-dictionary-based-comp.patch
Provides test coverage for ZSTD dictionary-based compression features,
including training, usage, and cleanup.

I hope these changes address your concerns; any thoughts or suggestions
on this approach are welcome.

Best regards,
Nikhil Veldanda

On Mon, Mar 17, 2025 at 1:03 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Mar 7, 2025 at 8:36 PM Nikhil Kumar Veldanda
> <veldanda.nikhilkumar17@gmail.com> wrote:
> >     struct    /* Extended compression format */
> >     {
> >         uint32    va_header;
> >         uint32    va_tcinfo;
> >         uint32    va_cmp_alg;
> >         uint32    va_cmp_dictid;
> >         char    va_data[FLEXIBLE_ARRAY_MEMBER];
> >     }    va_compressed_ext;
> > } varattrib_4b;
>
> First, thanks for sending along the performance results. I agree that
> those are promising. Second, thanks for sending these design details.
>
> The idea of keeping dictionaries in pg_zstd_dictionaries literally
> forever doesn't seem very appealing, but I'm not sure what the other
> options are. I think we've established in previous work in this area
> that compressed values can creep into unrelated tables and inside
> records or other container types like ranges. Therefore, we have no
> good way of knowing when a dictionary is unreferenced and can be
> dropped. So in that sense your decision to keep them forever is
> "right," but it's still unpleasant. It would even be necessary to make
> pg_upgrade carry them over to new versions.
>
> If we could make sure that compressed datums never leaked out into
> other tables, then tables could depend on dictionaries and
> dictionaries could be dropped when there were no longer any tables
> depending on them. But like I say, previous work suggested that this
> would be very difficult to achieve. However, without that, I imagine
> users generating new dictionaries regularly as the data changes and
> eventually getting frustrated that they can't get rid of the old ones.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com

Attachment
On Tue, Apr 15, 2025 at 2:13 PM Nikhil Kumar Veldanda
<veldanda.nikhilkumar17@gmail.com> wrote:
> Addressing Compressed Datum Leaks problem (via CTAS, INSERT INTO ... SELECT ...)
>
> As compressed datums can be copied to other unrelated tables via CTAS,
> INSERT INTO ... SELECT, or CREATE TABLE ... EXECUTE, I’ve introduced a
> method inheritZstdDictionaryDependencies. This method is invoked at
> the end of such statements and ensures that any dictionary
> dependencies from source tables are copied to the destination table.
> We determine the set of source tables using the relationOids field in
> PlannedStmt.

With the disclaimer that I haven't opened the patch or thought
terribly deeply about this issue, at least not yet, my fairly strong
suspicion is that this design is not going to work out, for multiple
reasons. In no particular order:

1. I don't think users will like it if dependencies on a zstd
dictionary spread like kudzu across all of their tables. I don't think
they'd like it even if it were 100% accurate, but presumably this is
going to add dependencies any time there MIGHT be a real dependency
rather than only when there actually is one.

2. Inserting into a table or updating it only takes RowExclusiveLock,
which is not even self-exclusive. I doubt that it's possible to change
system catalogs in a concurrency-safe way with such a weak lock. For
instance, if two sessions tried to do the same thing in concurrent
transactions, they could both try to add the same dependency at the
same time.

3. I'm not sure that CTAS, INSERT INTO...SELECT, and CREATE
TABLE...EXECUTE are the only ways that datums can creep from one table
into another. For example, what if I create a plpgsql function that
gets a value from one table and stores it in a variable, and then use
that variable to drive an INSERT into another table? I seem to recall
there are complex cases involving records and range types and arrays,
too, where the compressed object gets wrapped inside of another
object; though maybe that wouldn't matter to your implementation if
INSERT INTO ... SELECT uses a sufficiently aggressive strategy for
adding dependencies.

When Dilip and I were working on lz4 TOAST compression, my first
instinct was to not let LZ4-compressed datums leak out of a table by
forcing them to be decompressed (and then possibly recompressed). We
spent a long time trying to make that work before giving up. I think
this is approximately where things started to unravel, and I'd suggest
you read both this message and some of the discussion before and
after:

https://www.postgresql.org/message-id/20210316185455.5gp3c5zvvvq66iyj@alap3.anarazel.de

I think we could add plain-old zstd compression without really
tackling this issue, but if we are going to add dictionaries then I
think we might need to revisit the idea of preventing things from
leaking out of tables. What I can't quite remember at the moment is
how much of the problem was that it was going to be slow to force the
recompression, and how much of it was that we weren't sure we could
even find all the places in the code that might need such handling.

I'm now also curious to know whether Andres would agree that it's bad
if zstd dictionaries are un-droppable. After all, I thought it would
be bad if there was no way to eliminate a dependency on a compression
method, and he disagreed. So maybe he would also think undroppable
dictionaries are fine. But maybe not. It seems even worse to me than
undroppable compression methods, because you'll probably not have that
many compression methods ever, but you could have a large number of
dictionaries eventually.

--
Robert Haas
EDB: http://www.enterprisedb.com



On Fri, Apr 18, 2025 at 12:22:18PM -0400, Robert Haas wrote:
> I think we could add plain-old zstd compression without really
> tackling this issue, but if we are going to add dictionaries then I
> think we might need to revisit the idea of preventing things from
> leaking out of tables. What I can't quite remember at the moment is
> how much of the problem was that it was going to be slow to force the
> recompression, and how much of it was that we weren't sure we could
> even find all the places in the code that might need such handling.

FWIW, this point resonates here.  There is one thing that we have to
do anyway: we just have one bit left in the varlena headers as lz4 is
using the one before last.  So we have to make it extensible, even if
it means that any compression method other than LZ4 and pglz would
consume one more byte in its header by default.  And I think that this
has to happen at some point if we want flexibility in this area.

+    struct
+    {
+        uint32        va_header;
+        uint32        va_tcinfo;
+        uint32        va_cmp_alg;
+        uint32        va_cmp_dictid;
+        char        va_data[FLEXIBLE_ARRAY_MEMBER];
+    }            va_compressed_ext;

Speaking of which, I am confused by this abstraction choice in
varatt.h in the first patch.  Are we sure that we are always going to
have a dictionary attached to a compressed data set or even a
va_cmp_alg?  It seems to me that this could lead to a waste of data in
some cases because these fields may not be required depending on the
compression method used, as some fields may not care about these
details.  This kind of data should be made optional, on a per-field
basis.

One thing that I've been wondering is how it would be possible to make
the area around varattrib_4b more readable while dealing with more
extensibility.  It would be a good occasion to improve that, even if
I'm hand-waving here currently and that the majority of this code is
old enough to vote, with few modifications across the years.

The second thing that I'd love to see on top of the addition of the
extensibility is adding plain compression support for zstd, with
nothing fancy, just the compression and decompression bits.  I've done
quite a few benchmarks with the two, and results kind of point in the
direction that zstd is more efficient than lz4 overall.  Don't take me
wrong: lz4 can be better in some workloads as it can consume less CPU
than zstd while compressing less.  However, a comparison of ratios
like (compression rate / cpu used) has always led me to see zstd as
superior in a large number of cases.  lz4 is still very good if you
are CPU-bound and don't care about the extra space required.  Both are
three classes better than pglz.

Once we have these three points incrementally built-in together (the
last bit extensibility, the potential varatt.h refactoring and the
zstd support), there may be a point in having support for more
advanced options with the compression methods in the shape of dicts or
more requirements linked to other compression methods, but I think the
topic is complex enough that we should make sure that these basics are
implemented in a way sane enough so as we'd be able to extend them
with all the use cases in mind.
--
Michael

Attachment

Re: ZStandard (with dictionaries) compression support for TOAST compression

From
Nikhil Kumar Veldanda
Date:
Hi Robert,

Thank you for your feedback on the patch. You’re right that my proposed
design will introduce more dictionary dependencies as dictionaries grow;
I chose this path specifically to avoid changing existing system
behavior and to prevent performance regressions in CTAS and related
commands.

After reviewing the email thread you attached in your previous response,
I identified a natural choke point for both inserts and updates: the
call to "heap_toast_insert_or_update" inside
heap_prepare_insert/heap_update. In the current master branch, that call
is made only when HeapTupleHasExternal is true; my patch extends it to
HeapTupleHasVarWidth tuples as well. By decompressing every nested
compressed datum at this point, no matter how deeply nested, we can
prevent any leaked datum from propagating into unrelated tables. This
mirrors the existing inlining logic in toast_tuple_init for externally
toasted datums, but takes it one step further and fully flattens the
datum (decompressing at every level, not just the top level).
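
To make the idea concrete, here is a heavily simplified sketch of what
the per-attribute part of such a flattening pass could look like
(assumptions: only top-level inline varlena attributes are shown,
externally stored values and recursion into container types such as
records, arrays and ranges are omitted, and the real patch code may
differ):

```
/*
 * Hypothetical sketch: before toasting/storing a tuple, replace every
 * compressed varlena attribute with its decompressed form so that no
 * foreign compressed datum (and thus no dictionary reference) can leak
 * into this table.
 */
static void
flatten_compressed_attrs_sketch(TupleDesc tupdesc, Datum *values, bool *isnull)
{
	for (int i = 0; i < tupdesc->natts; i++)
	{
		Form_pg_attribute att = TupleDescAttr(tupdesc, i);

		if (isnull[i] || att->attlen != -1)
			continue;			/* only varlena attributes can be compressed */

		if (VARATT_IS_COMPRESSED(DatumGetPointer(values[i])))
			values[i] = PointerGetDatum(detoast_attr((struct varlena *)
													 DatumGetPointer(values[i])));
	}
}
```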

On the performance side, my basic benchmarks show almost no regression
for simple INSERT … VALUES workloads. CTAS, however, does regress
noticeably: a CTAS completes in about 4 seconds before this patch, but
with this patch it takes roughly 24 seconds. (For reference, a normal
insert into the source table took about 58 seconds when using zstd
dictionary compression.) I suspect the extra cost comes from the added
zstd decompression and PGLZ compression on the destination table.

I’ve attached v13-0008-initial-draft-to-address-datum-leak-problem.patch,
which implements this “flatten_datum” method.

I’d love to know your thoughts on this. Am I on the right track for
solving the problem?

Best regards,
Nikhil Veldanda

On Fri, Apr 18, 2025 at 9:22 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Apr 15, 2025 at 2:13 PM Nikhil Kumar Veldanda
> <veldanda.nikhilkumar17@gmail.com> wrote:
> > Addressing Compressed Datum Leaks problem (via CTAS, INSERT INTO ... SELECT ...)
> >
> > As compressed datums can be copied to other unrelated tables via CTAS,
> > INSERT INTO ... SELECT, or CREATE TABLE ... EXECUTE, I’ve introduced a
> > method inheritZstdDictionaryDependencies. This method is invoked at
> > the end of such statements and ensures that any dictionary
> > dependencies from source tables are copied to the destination table.
> > We determine the set of source tables using the relationOids field in
> > PlannedStmt.
>
> With the disclaimer that I haven't opened the patch or thought
> terribly deeply about this issue, at least not yet, my fairly strong
> suspicion is that this design is not going to work out, for multiple
> reasons. In no particular order:
>
> 1. I don't think users will like it if dependencies on a zstd
> dictionary spread like kudzu across all of their tables. I don't think
> they'd like it even if it were 100% accurate, but presumably this is
> going to add dependencies any time there MIGHT be a real dependency
> rather than only when there actually is one.
>
> 2. Inserting into a table or updating it only takes RowExclusiveLock,
> which is not even self-exclusive. I doubt that it's possible to change
> system catalogs in a concurrency-safe way with such a weak lock. For
> instance, if two sessions tried to do the same thing in concurrent
> transactions, they could both try to add the same dependency at the
> same time.
>
> 3. I'm not sure that CTAS, INSERT INTO...SELECT, and CREATE
> TABLE...EXECUTE are the only ways that datums can creep from one table
> into another. For example, what if I create a plpgsql function that
> gets a value from one table and stores it in a variable, and then use
> that variable to drive an INSERT into another table? I seem to recall
> there are complex cases involving records and range types and arrays,
> too, where the compressed object gets wrapped inside of another
> object; though maybe that wouldn't matter to your implementation if
> INSERT INTO ... SELECT uses a sufficiently aggressive strategy for
> adding dependencies.
>
> When Dilip and I were working on lz4 TOAST compression, my first
> instinct was to not let LZ4-compressed datums leak out of a table by
> forcing them to be decompressed (and then possibly recompressed). We
> spent a long time trying to make that work before giving up. I think
> this is approximately where things started to unravel, and I'd suggest
> you read both this message and some of the discussion before and
> after:
>
> https://www.postgresql.org/message-id/20210316185455.5gp3c5zvvvq66iyj@alap3.anarazel.de
>
> I think we could add plain-old zstd compression without really
> tackling this issue, but if we are going to add dictionaries then I
> think we might need to revisit the idea of preventing things from
> leaking out of tables. What I can't quite remember at the moment is
> how much of the problem was that it was going to be slow to force the
> recompression, and how much of it was that we weren't sure we could
> even find all the places in the code that might need such handling.
>
> I'm now also curious to know whether Andres would agree that it's bad
> if zstd dictionaries are un-droppable. After all, I thought it would
> be bad if there was no way to eliminate a dependency on a compression
> method, and he disagreed. So maybe he would also think undroppable
> dictionaries are fine. But maybe not. It seems even worse to me than
> undroppable compression methods, because you'll probably not have that
> many compression methods ever, but you could have a large number of
> dictionaries eventually.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com

Attachment

Re: ZStandard (with dictionaries) compression support for TOAST compression

From
Nikhil Kumar Veldanda
Date:
Hi Michael,

Thanks for the feedback and the suggested patch sequence. I completely
agree—we must minimize storage overhead when dictionaries aren’t used,
while ensuring varattrib_4b remains extensible enough to handle future
compression metadata beyond dictionary ID (for other algorithms). I’ll
explore design options that satisfy both goals and share my proposal.

Best regards,
Nikhil Veldanda

On Mon, Apr 21, 2025 at 12:02 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Apr 18, 2025 at 12:22:18PM -0400, Robert Haas wrote:
> > I think we could add plain-old zstd compression without really
> > tackling this issue, but if we are going to add dictionaries then I
> > think we might need to revisit the idea of preventing things from
> > leaking out of tables. What I can't quite remember at the moment is
> > how much of the problem was that it was going to be slow to force the
> > recompression, and how much of it was that we weren't sure we could
> > even find all the places in the code that might need such handling.
>
> FWIW, this point resonates here.  There is one thing that we have to
> do anyway: we just have one bit left in the varlena headers as lz4 is
> using the one before last.  So we have to make it extensible, even if
> it means that any compression method other than LZ4 and pglz would
> consume one more byte in its header by default.  And I think that this
> has to happen at some point if we want flexibility in this area.
>
> +    struct
> +    {
> +        uint32        va_header;
> +        uint32        va_tcinfo;
> +        uint32        va_cmp_alg;
> +        uint32        va_cmp_dictid;
> +        char        va_data[FLEXIBLE_ARRAY_MEMBER];
> +    }            va_compressed_ext;
>
> Speaking of which, I am confused by this abstraction choice in
> varatt.h in the first patch.  Are we sure that we are always going to
> have a dictionary attached to a compressed data set or even a
> va_cmp_alg?  It seems to me that this could lead to a waste of data in
> some cases because these fields may not be required depending on the
> compression method used, as some fields may not care about these
> details.  This kind of data should be made optional, on a per-field
> basis.
>
> One thing that I've been wondering is how it would be possible to make
> the area around varattrib_4b more readable while dealing with more
> extensibility.  It would be a good occasion to improve that, even if
> I'm hand-waving here currently and that the majority of this code is
> old enough to vote, with few modifications across the years.
>
> The second thing that I'd love to see on top of the addition of the
> extensibility is adding plain compression support for zstd, with
> nothing fancy, just the compression and decompression bits.  I've done
> quite a few benchmarks with the two, and results kind of point in the
> direction that zstd is more efficient than lz4 overall.  Don't take me
> wrong: lz4 can be better in some workloads as it can consume less CPU
> than zstd while compressing less.  However, a comparison of ratios
> like (compression rate / cpu used) has always led me to see zstd as
> superior in a large number of cases.  lz4 is still very good if you
> are CPU-bound and don't care about the extra space required.  Both are
> three classes better than pglz.
>
> Once we have these three points incrementally built-in together (the
> last bit extensibility, the potential varatt.h refactoring and the
> zstd support), there may be a point in having support for more
> advanced options with the compression methods in the shape of dicts or
> more requirements linked to other compression methods, but I think the
> topic is complex enough that we should make sure that these basics are
> implemented in a way sane enough so as we'd be able to extend them
> with all the use cases in mind.
> --
> Michael



Hi,

On 2025-04-18 12:22:18 -0400, Robert Haas wrote:
> On Tue, Apr 15, 2025 at 2:13 PM Nikhil Kumar Veldanda
> <veldanda.nikhilkumar17@gmail.com> wrote:
> > Addressing Compressed Datum Leaks problem (via CTAS, INSERT INTO ... SELECT ...)
> >
> > As compressed datums can be copied to other unrelated tables via CTAS,
> > INSERT INTO ... SELECT, or CREATE TABLE ... EXECUTE, I’ve introduced a
> > method inheritZstdDictionaryDependencies. This method is invoked at
> > the end of such statements and ensures that any dictionary
> > dependencies from source tables are copied to the destination table.
> > We determine the set of source tables using the relationOids field in
> > PlannedStmt.
> 
> With the disclaimer that I haven't opened the patch or thought
> terribly deeply about this issue, at least not yet, my fairly strong
> suspicion is that this design is not going to work out, for multiple
> reasons. In no particular order:
> 
> 1. I don't think users will like it if dependencies on a zstd
> dictionary spread like kudzu across all of their tables. I don't think
> they'd like it even if it were 100% accurate, but presumably this is
> going to add dependencies any time there MIGHT be a real dependency
> rather than only when there actually is one.
> 
> 2. Inserting into a table or updating it only takes RowExclusiveLock,
> which is not even self-exclusive. I doubt that it's possible to change
> system catalogs in a concurrency-safe way with such a weak lock. For
> instance, if two sessions tried to do the same thing in concurrent
> transactions, they could both try to add the same dependency at the
> same time.
> 
> 3. I'm not sure that CTAS, INSERT INTO...SELECT, and CREATE
> TABLE...EXECUTE are the only ways that datums can creep from one table
> into another. For example, what if I create a plpgsql function that
> gets a value from one table and stores it in a variable, and then use
> that variable to drive an INSERT into another table? I seem to recall
> there are complex cases involving records and range types and arrays,
> too, where the compressed object gets wrapped inside of another
> object; though maybe that wouldn't matter to your implementation if
> INSERT INTO ... SELECT uses a sufficiently aggressive strategy for
> adding dependencies.

+1 to all of these.


> I think we could add plain-old zstd compression without really
> tackling this issue

+1


> I'm now also curious to know whether Andres would agree that it's bad
> if zstd dictionaries are un-droppable. After all, I thought it would
> be bad if there was no way to eliminate a dependency on a compression
> method, and he disagreed.

I still am not too worried about that aspect. However:


> So maybe he would also think undroppable dictionaries are fine.

I'm much less sanguine about this. Imagine a schema-based multi-tenancy setup,
where tenants come and go, and where a few of the tables use custom
dictionaries. Whereas not being able to get rid of lz4 at all has basically no
cost whatsoever, collecting more and more unusable dictionaries can imply a
fair amount of space usage after a while. I don't see any argument why that
would be ok, really.


> But maybe not. It seems even worse to me than undroppable compression
> methods, because you'll probably not have that many compression methods
> ever, but you could have a large number of dictionaries eventually.

Agreed on the latter.

Greetings,

Andres Freund



On Mon, Apr 21, 2025 at 8:52 PM Nikhil Kumar Veldanda
<veldanda.nikhilkumar17@gmail.com> wrote:
> After reviewing the email thread you attached on previous response, I
> identified a natural choke point for both inserts and updates: the
> call to "heap_toast_insert_or_update" inside
> heap_prepare_insert/heap_update. In the current master branch, that
> function only runs when HeapTupleHasExternal is true; my patch extends
> it to HeapTupleHasVarWidth tuples as well.

Isn't that basically all tuples, though? I think that's where this gets painful.

> On the performance side, my basic benchmarks show almost no regression
> for simple INSERT … VALUES workloads. CTAS, however, does regress
> noticeably: a CTAS completes in about 4 seconds before this patch, but
> with this patch it takes roughly 24 seconds. (For reference, a normal
> insert into the source table took about 58 seconds when using zstd
> dictionary compression), I suspect the extra cost comes from the added
> zstd decompression and PGLZ compression on the destination table.

That's nice to know, but I think the key question is not so much what
the feature costs when it is used but what it costs when it isn't
used. If we implement a system where we don't let
dictionary-compressed zstd datums leak out of tables, that's bound to
slow down a CTAS from a table where this feature is used, but that's
kind of OK: the feature has pros and cons, and if you don't like those
tradeoffs, you don't have to use it. However, it sounds like this
could also slow down inserts and updates in some cases even for users
who are not making use of the feature, and that's going to be a major
problem unless it can be shown that there is no case where the impact
is at all significant. Users hate paying for features that they aren't
using.

I wonder if there's a possible design where we only allow
dictionary-compressed datums to exist as top-level attributes in
designated tables to which those dictionaries are attached; and any
time you try to bury that Datum inside a container object (row, range,
array, whatever) detoasting is forced. If there's a clean and
inexpensive way to implement that, then you could avoid having
heap_toast_insert_or_update care about HeapTupleHasExternal(), which
seems like it might be a key point.

--
Robert Haas
EDB: http://www.enterprisedb.com



On Wed, Apr 23, 2025 at 11:59 AM Robert Haas <robertmhaas@gmail.com> wrote:
> heap_toast_insert_or_update care about HeapTupleHasExternal(), which
> seems like it might be a key point.

Care about HeapTupleHasVarWidth, rather.

--
Robert Haas
EDB: http://www.enterprisedb.com



On Wed, Apr 23, 2025 at 11:59:26AM -0400, Robert Haas wrote:
> That's nice to know, but I think the key question is not so much what
> the feature costs when it is used but what it costs when it isn't
> used. If we implement a system where we don't let
> dictionary-compressed zstd datums leak out of tables, that's bound to
> slow down a CTAS from a table where this feature is used, but that's
> kind of OK: the feature has pros and cons, and if you don't like those
> tradeoffs, you don't have to use it. However, it sounds like this
> could also slow down inserts and updates in some cases even for users
> who are not making use of the feature, and that's going to be a major
> problem unless it can be shown that there is no case where the impact
> is at all significant. Users hate paying for features that they aren't
> using.

The cost of digesting a dictionary when decompressing sets of values
is also something I think we should worry about, FWIW (see [1]), as
digesting is documented as costly, so I think that there is also an
argument for making the feature efficient when it is used.  That would
hurt if a sequential scan needs to detoast multiple blobs with the
same dict.  If we attach that on a per-value basis, wouldn't it imply
that we need to digest the dictionary every time a blob is
decompressed?  This information could be cached, but it seems a bit
weird to me to invent a new level of relation caching for what could
be attached as a relation attribute option in the relcache.  If a
dictionary gets trained with a new sample of values, we could rely on
the invalidation to pass the new information.
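
For illustration (plain zstd API usage, not patch code):
ZSTD_createDDict() performs the expensive digesting once, and the
resulting ZSTD_DDict can then be reused for every value decompressed
against that dictionary, which is why caching the digested form, for
example via the relcache as suggested above, is attractive:

```
#include <zstd.h>

/*
 * Sketch only: digest the dictionary once with ZSTD_createDDict(), then
 * reuse the ZSTD_DDict for every datum.  Calling ZSTD_decompress_usingDict()
 * per datum would redo the digesting each time.
 */
static void
decompress_many(const void *dict_buf, size_t dict_size,
				void *const *dst, const size_t *dst_cap,
				const void *const *src, const size_t *src_size, int ndatums)
{
	ZSTD_DDict *ddict = ZSTD_createDDict(dict_buf, dict_size);	/* expensive, once */
	ZSTD_DCtx  *dctx = ZSTD_createDCtx();

	for (int i = 0; i < ndatums; i++)
		ZSTD_decompress_usingDDict(dctx, dst[i], dst_cap[i],
								   src[i], src_size[i], ddict);	/* cheap per datum */

	ZSTD_freeDCtx(dctx);
	ZSTD_freeDDict(ddict);
}
```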

Based on what I'm reading (and I know very little about the topic, so I
may be wrong), does it even make sense to allow multiple dictionaries
to be used for a single attribute?  Of course that may depend on the
JSON blob patterns a single attribute is dealing with, but I'm not sure
that this is worth the extra complexity it creates.

> I wonder if there's a possible design where we only allow
> dictionary-compressed datums to exist as top-level attributes in
> designated tables to which those dictionaries are attached; and any
> time you try to bury that Datum inside a container object (row, range,
> array, whatever) detoasting is forced. If there's a clean and
> inexpensive way to implement that, then you could avoid having
> heap_toast_insert_or_update care about HeapTupleHasExternal(), which
> seems like it might be a key point.

Interesting, not sure.

FWIW, I'd still try to focus on making varatt more extensible with
plain zstd support first, before diving into all these details.  We are
going to need it anyway.

[1]: https://facebook.github.io/zstd/zstd_manual.html#Chapter10
--
Michael

Attachment

Re: ZStandard (with dictionaries) compression support for TOAST compression

From
Nikhil Kumar Veldanda
Date:
Hi Michael,

Thanks for the suggestions. I agree that we should first solve the
“last free bit” problem in the varattrib_4b compression bits before
layering on any features. Below is the approach I’ve prototyped to
keep the header compact yet fully extensible, followed by a sketch of
the plain-ZSTD (no dict) patch that sits cleanly on top of it.

1. Minimal but extensible header

/* varatt_cmp_extended follows va_tcinfo when the upper two bits of
 * va_tcinfo are 11.  Compressed data starts immediately after
 * ext_data.  ext_hdr encodes both the compression algorithm and the
 * byte-length of the algorithm-specific metadata.
 */
typedef struct varatt_cmp_extended
{
    uint32 ext_hdr;                 /* [ meta_size:24 | cmpr_id:8 ] */
    char   ext_data[FLEXIBLE_ARRAY_MEMBER];  /* optional metadata */
} varatt_cmp_extended;
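
For clarity, the packing implied by [ meta_size:24 | cmpr_id:8 ] could be
expressed with helper macros along these lines (illustrative names, not
part of the patch):

```
/* Illustrative helpers for the ext_hdr encoding described above. */
#define VARATT_EXT_MAKE_HDR(meta_size, cmpr_id) \
	(((uint32) (meta_size) << 8) | ((uint32) (cmpr_id) & 0xFF))
#define VARATT_EXT_CMPR_ID(ext_hdr)   ((ext_hdr) & 0xFF)
#define VARATT_EXT_META_SIZE(ext_hdr) ((ext_hdr) >> 8)
```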

a. 24 bits for length → per-datum compression algorithm metadata is
capped at 16 MB, which is far more than any realistic compression
header.
b. 8 bits for algorithm id → up to 256 algorithms.
c. Zero overhead when unused: if an algorithm needs no per-datum
metadata (e.g., ZSTD-nodict), nothing beyond ext_hdr is stored.

2. Algorithm registry
/*
 * TOAST compression methods enumeration.
 *
 * Each entry defines:
 *   - NAME         : identifier for the compression algorithm
 *   - VALUE        : numeric enum value
 *   - METADATA type: struct type holding extra info (void when none)
 *
 * The INVALID entry is a sentinel and must remain last.
 */
#define TOAST_COMPRESSION_LIST                                          \
    X(PGLZ,         0, void)                 /* existing */             \
    X(LZ4,          1, void)                 /* existing */             \
    X(ZSTD_NODICT,  2, void)                 /* new, no metadata */     \
    X(ZSTD_DICT,    3, zstd_dict_meta)       /* new, needs dict_id */   \
    X(INVALID,      4, void)                 /* sentinel */

typedef enum ToastCompressionId
{
#define X(name,val,meta) TOAST_##name##_COMPRESSION_ID = val,
    TOAST_COMPRESSION_LIST
#undef X
} ToastCompressionId;

/* Example of an algorithm-specific metadata block */
typedef struct
{
    uint32 dict_id;     /* dictionary Oid */
} zstd_dict_meta;
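
One nice property of the X-macro list is that other per-algorithm tables
can be generated from the same source of truth; for example
(illustrative only, not part of the patch), a name table usable in error
messages:

```
/* Illustrative: a name table generated from the same list. */
static const char *const toast_compression_names[] = {
#define X(name, val, meta) [val] = #name,
	TOAST_COMPRESSION_LIST
#undef X
};

/* e.g. elog(ERROR, "unsupported compression id %d (%s)", id,
 *           toast_compression_names[id]); */
```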

3. Resulting on-disk layouts for zstd

ZSTD no dict: datum on-disk layout:
+----------------------------------+
| va_header (uint32)               |
+----------------------------------+
| va_tcinfo (uint32)               |  (11 in top two bits marks extended)
+----------------------------------+
| ext_hdr (uint32)                 |  <-- [ meta size:24 bits | compression id:8 bits ]
+----------------------------------+
| Compressed bytes ...             |  <-- zstd (no dictionary)
+----------------------------------+

ZSTD dict: datum on-disk layout:
+----------------------------------+
| va_header (uint32)               |
+----------------------------------+
| va_tcinfo (uint32)               |
+----------------------------------+
| ext_hdr (uint32)                 |  <-- [ meta size:24 bits | compression id:8 bits ]
+----------------------------------+
| dict_id (uint32)                 |  <-- zstd_dict_meta
+----------------------------------+
| Compressed bytes ...             |  <-- zstd (dictionary)
+----------------------------------+
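
To make the dictionary layout concrete, here is a hedged sketch of
filling the extended header and payload (assumptions: ext points into
the output varlena with dstcap bytes available in ext_data, the zstd
context and CDict are managed by the caller, error handling is omitted,
and the struct/enum names come from the proposal above):

```
#include <zstd.h>

/* Sketch only: fill the ZSTD-dict layout shown above. */
static size_t
fill_zstd_dict_datum(varatt_cmp_extended *ext, size_t dstcap,
					 const char *src, size_t srcsz,
					 uint32 dict_id, ZSTD_CCtx *cctx, const ZSTD_CDict *cdict)
{
	zstd_dict_meta meta = {.dict_id = dict_id};

	ext->ext_hdr = ((uint32) sizeof(zstd_dict_meta) << 8) |
		TOAST_ZSTD_DICT_COMPRESSION_ID;
	memcpy(ext->ext_data, &meta, sizeof(meta));

	/* Compressed bytes start right after the metadata. */
	return ZSTD_compress_usingCDict(cctx,
									ext->ext_data + sizeof(meta),
									dstcap - sizeof(meta),
									src, srcsz, cdict);
}
```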

4. How does this fit?

Flexibility: each new algorithm that needs extra metadata simply
defines its own struct and fills in varatt_cmp_extended in
setup_compression_info.
Storage: everything in varatt_cmp_extended is copied into the datum,
immediately followed by the compressed payload.
Optional, pay-as-you-go metadata: only algorithms that need it pay for it.
Future-proof: a new compression algorithm that needs any kind of
metadata (a dictid or anything else) slots into the same ext_data
mechanism.

I’ve split the work into two patches for review:
v19-0001-varattrib_4b-design-proposal-to-make-it-extended.patch:
varattrib_4b extensibility – adds varatt_cmp_extended, enum plumbing,
and macros; behaviour unchanged.
v19-0002-zstd-nodict-support.patch: Plain ZSTD (non dict) support.

Please share your thoughts; I’d love to hear feedback on the design. Thanks!

On Mon, Apr 21, 2025 at 12:02 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Apr 18, 2025 at 12:22:18PM -0400, Robert Haas wrote:
> > I think we could add plain-old zstd compression without really
> > tackling this issue, but if we are going to add dictionaries then I
> > think we might need to revisit the idea of preventing things from
> > leaking out of tables. What I can't quite remember at the moment is
> > how much of the problem was that it was going to be slow to force the
> > recompression, and how much of it was that we weren't sure we could
> > even find all the places in the code that might need such handling.
>
> FWIW, this point resonates here.  There is one thing that we have to
> do anyway: we just have one bit left in the varlena headers as lz4 is
> using the one before last.  So we have to make it extensible, even if
> it means that any compression method other than LZ4 and pglz would
> consume one more byte in its header by default.  And I think that this
> has to happen at some point if we want flexibility in this area.
>
> +    struct
> +    {
> +        uint32        va_header;
> +        uint32        va_tcinfo;
> +        uint32        va_cmp_alg;
> +        uint32        va_cmp_dictid;
> +        char        va_data[FLEXIBLE_ARRAY_MEMBER];
> +    }            va_compressed_ext;
>
> Speaking of which, I am confused by this abstraction choice in
> varatt.h in the first patch.  Are we sure that we are always going to
> have a dictionary attached to a compressed data set or even a
> va_cmp_alg?  It seems to me that this could lead to a waste of data in
> some cases because these fields may not be required depending on the
> compression method used, as some fields may not care about these
> details.  This kind of data should be made optional, on a per-field
> basis.
>
> One thing that I've been wondering is how it would be possible to make
> the area around varattrib_4b more readable while dealing with more
> extensibility.  It would be a good occasion to improve that, even if
> I'm hand-waving here currently and that the majority of this code is
> old enough to vote, with few modifications across the years.
>
> The second thing that I'd love to see on top of the addition of the
> extensibility is adding plain compression support for zstd, with
> nothing fancy, just the compression and decompression bits.  I've done
> quite a few benchmarks with the two, and results kind of point in the
> direction that zstd is more efficient than lz4 overall.  Don't take me
> wrong: lz4 can be better in some workloads as it can consume less CPU
> than zstd while compressing less.  However, a comparison of ratios
> like (compression rate / cpu used) has always led me to see zstd as
> superior in a large number of cases.  lz4 is still very good if you
> are CPU-bound and don't care about the extra space required.  Both are
> three classes better than pglz.
>
> Once we have these three points incrementally built-in together (the
> last bit extensibility, the potential varatt.h refactoring and the
> zstd support), there may be a point in having support for more
> advanced options with the compression methods in the shape of dicts or
> more requirements linked to other compression methods, but I think the
> topic is complex enough that we should make sure that these basics are
> implemented in a way sane enough so as we'd be able to extend them
> with all the use cases in mind.
> --
> Michael



--
Nikhil Veldanda


Attachment
On Fri, Apr 25, 2025 at 11:15 AM Nikhil Kumar Veldanda
<veldanda.nikhilkumar17@gmail.com> wrote:
> a. 24 bits for length → per-datum compression algorithm metadata is
> capped at 16 MB, which is far more than any realistic compression
> header.
> b. 8 bits for algorithm id → up to 256 algorithms.
> c. Zero-overhead when unused if an algorithm needs no per-datum
> metadata (e.g., ZSTD-nodict),

I don't understand why we need to spend 24 bits on a length header
here. I agree with the idea of adding a 1-byte quantity for algorithm
here, but I don't see why we need anything more than that. If the
compression method is zstd-with-a-dict, then the payload data
presumably needs to start with the OID of the dictionary, but it seems
like in your schema every single datum would use these 3 bytes to
store the fact that sizeof(Oid) = 4. The code that interprets
zstd-with-dict datums should already know the header length. Even if
generic code that works with all types of compression needs to be able
to obtain the header length on a per-compression-type basis, there can
be some kind of callback or table for that, rather than storing it in
every single datum.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: ZStandard (with dictionaries) compression support for TOAST compression

From
Nikhil Kumar Veldanda
Date:
Hi Robert,

Thanks for raising that question. The idea behind including a 24-bit
length field alongside the 1-byte algorithm ID is to ensure that each
compressed datum self-describes its metadata size. This allows any
compression algorithm to embed variable-length metadata (up to 16 MB)
without the need for hard-coding header sizes. For instance, an
algorithm in the future might require different metadata lengths for
each datum, and a fixed header-size table wouldn’t work. By storing the
length in the header, we maintain a generic and future-proof design. I
would greatly appreciate any feedback on this design. Thanks!

On Mon, Apr 28, 2025 at 7:50 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Apr 25, 2025 at 11:15 AM Nikhil Kumar Veldanda
> <veldanda.nikhilkumar17@gmail.com> wrote:
> > a. 24 bits for length → per-datum compression algorithm metadata is
> > capped at 16 MB, which is far more than any realistic compression
> > header.
> > b. 8 bits for algorithm id → up to 256 algorithms.
> > c. Zero-overhead when unused if an algorithm needs no per-datum
> > metadata (e.g., ZSTD-nodict),
>
> I don't understand why we need to spend 24 bits on a length header
> here. I agree with the idea of adding a 1-byte quantity for algorithm
> here, but I don't see why we need anything more than that. If the
> compression method is zstd-with-a-dict, then the payload data
> presumably needs to start with the OID of the dictionary, but it seems
> like in your schema every single datum would use these 3 bytes to
> store the fact that sizeof(Oid) = 4. The code that interprets
> zstd-with-dict datums should already know the header length. Even if
> generic code that works with all types of compression needs to be able
> to obtain the header length on a per-compression-type basis, there can
> be some kind of callback or table for that, rather than storing it in
> every single datum.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com



--
Nikhil Veldanda



Hi,

Nikhil, please consider the existing discussions on using dictionaries
(mentioned above by Aleksander) and on extending the TOAST pointer [1];
it seems you did not check them.

The same question Robert asked above - it's unclear why the header
wastes so much space. You mentioned metadata length - what metadata
do you mean there?

Also, Robert pointed out some very questionable approaches in your solution -
new dependencies crawling across user tables, and a new catalog table
with a very unclear lifecycle (and, with a new catalog table, immediate
questions about pg_upgrade).
I'm currently looking through the patch and will share my thoughts
later.

While reading this thread I've thought about storing a dictionary within
the table it is used for - IIUC one dictionary is used for just one attribute,
so it does not make sense to make it global.

Also, I have a question regarding the Zstd implementation you propose -
does it allow partial decompression?


Thanks!


--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
On Mon, Apr 28, 2025 at 5:32 PM Nikhil Kumar Veldanda
<veldanda.nikhilkumar17@gmail.com> wrote:
> Thanks for raising that question. The idea behind including a 24-bit
> length field alongside the 1-byte algorithm ID is to ensure that each
> compressed datum self-describes its metadata size. This allows any
> compression algorithm to embed variable-length metadata (up to 16 MB)
> without the need for hard-coding header sizes. For instance, an
> algorithm in feature might require different metadata lengths for each
> datum, and a fixed header size table wouldn’t work. By storing the
> length in the header, we maintain a generic and future-proof design. I
> would greatly appreciate any feedback on this design. Thanks!

I feel like I gave you some feedback on the design already, which was
that it seems like a waste of 3 bytes to me.

Don't get me wrong: I'm quite impressed by the way you're working on
this problem and I hope you stick around and keep working on it and
figure something out. But I don't quite understand the point of this
response: it seems like you're just restating what the design does
without really justifying it. The question here isn't whether a 3-byte
header can describe a length up to 16MB; I think we all know our
powers of two well enough to agree on the answer to that question. The
question is whether it's a good use of 3 bytes, and I don't think it
is.

I did consider the fact that future compression algorithms might want
to use variable-length headers; but I couldn't see a reason why we
shouldn't let each of those compression algorithms decide for
themselves how to encode whatever information they need. If a
compression algorithm needs a variable-length header, then it just
needs to make that header self-describing. Worst case scenario, it can
make the first byte of that variable-length header a length byte, and
then go from there; but it's probably possible to be even smarter and
use less than a full byte. Say for example we store a dictionary ID
that in concept is a 32-bit quantity but we use a variable-length
integer representation for it. It's easy to see that we shouldn't ever
need more than 3 bits for that so a full length byte is overkill and,
in fact, would undermine the value of a variable-length representation
rather severely. (I suspect it's a bad idea anyway, but it's a worse
idea if you burn a full byte on a length header.)

But there's an even larger question here too, which is why we're
having some kind of discussion about generalized metadata when the
current project seemingly only requires a 4-byte dictionary OID. If
you have some other use of this space in mind, I don't think you've
told us what it is. If you don't, then I'm not sure why we're
designing around an up-to-16MB variable-length quantity when what we
have before us is a 4-byte fixed-length quantity.

Moreover, even if you do have some (undisclosed) idea about what else
might be stored in this metadata area, why would it be important or
even desirable to have the length of that area represented in some
uniform way across compression methods? There's no obvious need for
any code outside the compression method itself to be able to decompose
the Datum into a metadata portion and a payload portion. After all,
the metadata portion could be anything so there's no way for anything
but the compression method to interpret it usefully. If we do want to
have outside code be able to ask questions, we could design some kind
of callback interface - e.g. if we end up with multiple compression
methods that store dictionary OIDs and they maybe do it in different
ways, each could provide an
"extract-the-dictionary-OID-from-this-datum" callback and each
compression method can implement that however it likes.

Maybe you can argue that we will eventually end up with various
compression method callbacks each of which is capable of working on
the metadata, and so then we might want to take an initial slice of a
toasted datum that is just big enough to allow that to work. But that
is pretty hypothetical, and in practice the first chunk of the TOAST
value (~2k) seems like it'd probably work well for most cases.

So, again, if you want us to take seriously the idea of dedicating 3
bytes per Datum to something, you need to give us a really good reason
for so doing. The fact that a 24-bit metadata length can describe a
metadata header of up to 2^24 bytes isn't a reason, good or bad. It's
just math.

--
Robert Haas
EDB: http://www.enterprisedb.com