Re: Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers
From | Dilip Kumar |
---|---|
Subject | Re: Re: [HACKERS] Custom compression methods |
Date | |
Msg-id | CAFiTN-uUpX3ck=K0mLEk-G_kUQY=SNOTeqdaNRR9FMdQrHKebw@mail.gmail.com Whole thread Raw |
In response to | Re: Re: [HACKERS] Custom compression methods (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Re: [HACKERS] Custom compression methods
Re: Re: [HACKERS] Custom compression methods Re: [HACKERS] Custom compression methods |
List | pgsql-hackers |
On Fri, Jun 19, 2020 at 10:33 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Thu, Mar 7, 2019 at 2:51 AM Alexander Korotkov > <a.korotkov@postgrespro.ru> wrote: > > Yes. I took a look at code of this patch. I think it's in pretty good shape. But high level review/discussion is required. > > I agree that the code of this patch is in pretty good shape, although > there is a lot of rebasing needed at this point. Here is an attempt at > some high level review and discussion: > > - As far as I can see, there is broad agreement that we shouldn't > consider ourselves to be locked into 'pglz' forever. I believe > numerous people have reported that there are other methods of doing > compression that either compress better, or compress faster, or > decompress faster, or all of the above. This isn't surprising and nor > is it a knock on 'pglz'; Jan committed it in 1999, and it's not > surprising that in 20 years some people have come up with better > ideas. Not only that, but the quantity and quality of open source > software that is available for this kind of thing and for many other > kinds of things have improved dramatically in that time. > > - I can see three possible ways of breaking our dependence on 'pglz' > for TOAST compression. Option #1 is to pick one new algorithm which we > think is better than 'pglz' in all relevant ways and use it as the > default for all new compressed datums. This would be dramatically > simpler than what this patch does, because there would be no user > interface. It would just be out with the old and in with the new. > Option #2 is to create a short list of new algorithms that have > different trade-offs; e.g. one that is very fast (like lz4) and one > that has an extremely high compression ratio, and provide an interface > for users to choose between them. This would be moderately simpler > than what this patch does, because we would expose to the user > anything about how a new compression method could be added, but it > would still require a UI for the user to choose between the available > (and hard-coded) options. It has the further advantage that every > PostgreSQL cluster will offer the same options (or a subset of them, > perhaps, depending on configure flags) and so you don't have to worry > that, say, a pg_am row gets lost and suddenly all of your toasted data > is inaccessible and uninterpretable. Option #3 is to do what this > patch actually does, which is to allow for the addition of any number > of compressors, including by extensions. It has the advantage that new > compressors can be added with core's permission, so, for example, if > it is unclear whether some excellent compressor is free of patent > problems, we can elect not to ship support for it in core, while at > the same time people who are willing to accept the associated legal > risk can add that functionality to their own copy as an extension > without having to patch core. The legal climate may even vary by > jurisdiction, so what might be questionable in country A might be > clearly just fine in country B. Aside from those issues, this approach > allows people to experiment and innovate outside of core relatively > quickly, instead of being bound by the somewhat cumbrous development > process which has left this patch in limbo for the last few years. My > view is that option #1 is likely to be impractical, because getting > people to agree is hard, and better things are likely to come along > later, and people like options. So I prefer either #2 or #3. > > - The next question is how a datum compressed with some non-default > method should be represented on disk. The patch handles this first of > all by making the observation that the compressed size can't be >=1GB, > because the uncompressed size can't be >=1GB, and we wouldn't have > stored it compressed if it expanded. Therefore, the upper two bits of > the compressed size should always be zero on disk, and the patch > steals one of them to indicate whether "custom" compression is in use. > If it is, the 4-byte varlena header is followed not only by a 4-byte > size (now with the new flag bit also included) but also by a 4-byte > OID, indicating the compression AM in use. I don't think this is a > terrible approach, but I don't think it's amazing, either. 4 bytes is > quite a bit to use for this; if I guess correctly what will be a > typical cluster configuration, you probably would really only need > about 2 bits. For a datum that is both stored externally and > compressed, the overhead is likely negligible, because the length is > probably measured in kB or MB. But for a datum that is compressed but > not stored externally, it seems pretty expensive; the datum is > probably short, and having an extra 4 bytes of uncompressible data > kinda sucks. One possibility would be to allow only one byte here: > require each compression AM that is installed to advertise a one-byte > value that will denote its compressed datums. If more than one AM > tries to claim the same byte value, complain. Another possibility is > to abandon this approach and go with #2 from the previous paragraph. > Or maybe we add 1 or 2 "privileged" built-in compressors that get > dedicated bit-patterns in the upper 2 bits of the size field, with the > last bit pattern being reserved for future algorithms. (e.g. 0x00 = > pglz, 0x01 = lz4, 0x10 = zstd, 0x11 = something else - see within for > details). > > - I don't really like the use of the phrase "custom compression". I > think the terminology needs to be rethought so that we just talk about > compression methods. Perhaps in certain contexts we need to specify > that we mean extensible compression methods or user-provided > compression methods or something like that, but I don't think the word > "custom" is very well-suited here. The main point of this shouldn't be > for every cluster in the universe to use a different approach to > compression, or to compress columns within a database in 47 different > ways, but rather to help us get out from under 'pglz'. Eventually we > probably want to change the default, but as the patch phrases things > now, that default would be a custom method, which is almost a > contradiction in terms. > > - Yet another possible approach to the on-disk format is to leave > varatt_external.va_extsize and varattrib_4b.rawsize untouched and > instead add new compression methods by adding new vartag_external > values. There's quite a lot of bit-space available there: we have a > whole byte, and we're currently only using 4 values. We could easily > add a half-dozen new possibilities there for new compression methods > without sweating the bit-space consumption. The main thing I don't > like about this is that it only seems like a useful way to provide for > out-of-line compression. Perhaps it could be generalized to allow for > inline compression as well, but it seems like it would take some > hacking. > > - One thing I really don't like about the patch is that it consumes a > bit from infomask2 for a new flag HEAP_HASCUSTOMCOMPRESSED. infomask > bits are at a premium, and there's been no real progress in the decade > plus that I've been hanging around here in clawing back any bit-space. > I think we really need to avoid burning our remaining bits for > anything other than a really critical need, and I don't think I > understand what the need is in this case. I might be missing > something, but I'd really strongly suggest looking for a way to get > rid of this. It also invents the concept of a TupleDesc flag, and the > flag invented is TD_ATTR_CUSTOM_COMPRESSED; I'm not sure I see why we > need that, either. > > - It seems like this kind of approach has a sort of built-in > circularity problem. It means that every place that might need to > detoast a datum needs to be able to access the pg_am catalog. I wonder > if that's actually true. For instance, consider logical decoding. I > guess that can do catalog lookups in general, but can it do them from > the places where detoasting is happening? Moreover, can it do them > with the right snapshot? Suppose we rewrite a table to change the > compression method, then drop the old compression method, then try to > decode a transaction that modified that table before those operations > were performed. As an even more extreme example, suppose we need to > open pg_am, and to do that we have to build a relcache entry for it, > and suppose the relevant pg_class entry had a relacl or reloptions > field that happened to be custom-compressed. Or equally suppose that > any of the various other tables we use when building a relcache entry > had the same kind of problem, especially those that have TOAST tables. > We could just disallow the use of non-default compressors in the > system catalogs, but the benefits mentioned in > http://postgr.es/m/5541614A.5030208@2ndquadrant.com seem too large to > ignore. > > - I think it would be awfully appealing if we could find some way of > dividing this great big patch into some somewhat smaller patches. For > example: > > Patch #1. Add syntax allowing a compression method to be specified, > but the only possible choice is pglz, and the PRESERVE stuff isn't > supported, and changing the value associated with an existing column > isn't supported, but we can add tab-completion support and stuff. > > Patch #2. Add a second built-in method, like gzip or lz4. > > Patch #3. Add support for changing the compression method associated > with a column, forcing a table rewrite. > > Patch #4. Add support for PRESERVE, so that you can change the > compression method associated with a column without forcing a table > rewrite, by including the old method in the PRESERVE list, or with a > rewrite, by not including it in the PRESERVE list. > > Patch #5. Add support for compression methods via the AM interface. > Perhaps methods added in this manner are prohibited in system > catalogs. (This could also go before #4 or even before #3, but with a > noticeable hit to usability.) > > Patch #6 (new development). Add a contrib module using the facility > added in #5, perhaps with a slightly off-beat compressor like bzip2 > that is more of a niche use case. > > I think that if the patch set were broken up this way, it would be a > lot easier to review and get committed. I think you could commit each > bit separately. I don't think you'd want to commit #1 unless you had a > sense that #2 was pretty close to done, and similarly for #5 and #6, > but that would still make things a lot easier than having one giant > monolithic patch, at least IMHO. > > There might be more to say here, but that's what I have got for now. I > hope it helps. I have rebased the patch on the latest head and currently, broken into 3 parts. v1-0001: As suggested by Robert, it provides the syntax support for setting the compression method for a column while creating a table and adding columns. However, we don't support changing the compression method for the existing column. As part of this patch, there is only one built-in compression method that can be set (pglz). In this, we have one in-build am (pglz) and the compressed attributes will directly store the oid of the AM. In this patch, I have removed the pg_attr_compresion as we don't support changing the compression for the existing column so we don't need to preserve the old compressions. v1-0002: Add another built-in compression method (zlib) v1:0003: Remaining patch set (nothing is changed except rebase on the current head, stabilizing check-world and 0001 and 0002 are pulled out of this) Next, I will be working on separating out the remaining patches as per the suggestion by Robert. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
pgsql-hackers by date: