Re: Re: [HACKERS] Custom compression methods - Mailing list pgsql-hackers
From | Dilip Kumar |
---|---|
Subject | Re: Re: [HACKERS] Custom compression methods |
Date | |
Msg-id | CAFiTN-v2_h2+nPv6A-XLrs+E-ms970Xr78wGYv8A9fmDb1P0AQ@mail.gmail.com Whole thread Raw |
In response to | Re: Re: [HACKERS] Custom compression methods (Dilip Kumar <dilipbalaut@gmail.com>) |
Responses |
Re: [HACKERS] Custom compression methods
|
List | pgsql-hackers |
On Mon, Sep 28, 2020 at 4:18 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > On Sat, Sep 19, 2020 at 1:19 PM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > On Tue, Aug 25, 2020 at 11:20 PM Robert Haas <robertmhaas@gmail.com> wrote: > > > > > > On Mon, Aug 24, 2020 at 2:12 AM Dilip Kumar <dilipbalaut@gmail.com> wrote: > > > > IIUC, the main reason for using this flag is for taking the decision > > > > whether we need any detoasting for this tuple. For example, if we are > > > > rewriting the table because the compression method is changed then if > > > > HEAP_HASCUSTOMCOMPRESSED bit is not set in the tuple header and tuple > > > > length, not tup->t_len > TOAST_TUPLE_THRESHOLD then we don't need to > > > > call heap_toast_insert_or_update function for this tuple. Whereas if > > > > this flag is set then we need to because we might need to uncompress > > > > and compress back using a different compression method. The same is > > > > the case with INSERT into SELECT * FROM. > > > > > > This doesn't really seem worth it to me. I don't see how we can > > > justify burning an on-disk bit just to save a little bit of overhead > > > during a rare maintenance operation. If there's a performance problem > > > here we need to look for another way of mitigating it. Slowing CLUSTER > > > and/or VACUUM FULL down by a large amount for this feature would be > > > unacceptable, but is that really a problem? And if so, can we solve it > > > without requiring this bit? > > > > Okay, if we want to avoid keeping the bit then there are multiple ways > > to handle this, but the only thing is none of that will be specific to > > those scenarios. > > approach1. In ExecModifyTable, we can process the source tuple and see > > if any of the varlena attributes is compressed and its stored > > compression method is not the same as the target table attribute then > > we can decompress it. > > approach2. In heap_prepare_insert, always call the > > heap_toast_insert_or_update, therein we can check if any of the source > > tuple attributes are compressed with different compression methods > > then the target table then we can decompress it. > > > > With either of the approach, we have to do this in a generic path > > because the source of the tuple is not known, I mean it can be a > > output from a function, or the join tuple or a subquery. So in the > > attached patch, I have implemented with approach1. > > > > For testing, I have implemented using approach1 as well as using > > approach2 and I have checked the performance of the pg_bench to see > > whether it impacts the performance of the generic paths or not, but I > > did not see any impact. > > > > > > > > > I have already extracted these 2 patches from the main patch set. > > > > But, in these patches, I am still storing the am_oid in the toast > > > > header. I am not sure can we get rid of that at least for these 2 > > > > patches? But, then wherever we try to uncompress the tuple we need to > > > > know the tuple descriptor to get the am_oid but I think that is not > > > > possible in all the cases. Am I missing something here? > > > > > > I think we should instead use the high bits of the toast size word for > > > patches #1-#4, as discussed upthread. > > > > > > > > > Patch #3. Add support for changing the compression method associated > > > > > > with a column, forcing a table rewrite. > > > > > > > > > > > > Patch #4. Add support for PRESERVE, so that you can change the > > > > > > compression method associated with a column without forcing a table > > > > > > rewrite, by including the old method in the PRESERVE list, or with a > > > > > > rewrite, by not including it in the PRESERVE list. > > > > > > > > Does this make sense to have Patch #3 and Patch #4, without having > > > > Patch #5? I mean why do we need to support rewrite or preserve unless > > > > we have the customer compression methods right? because the build-in > > > > compression method can not be dropped so why do we need to preserve? > > > > > > I think that patch #3 makes sense because somebody might have a table > > > that is currently compressed with pglz and they want to switch to lz4, > > > and I think patch #4 also makes sense because they might want to start > > > using lz4 for future data but not force a rewrite to get rid of all > > > the pglz data they've already got. Those options are valuable as soon > > > as there is more than one possible compression algorithm, even if > > > they're all built in. Now, as I said upthread, it's also true that you > > > could do #5 before #3 and #4. I don't think that's insane. But I > > > prefer it in the other order, because I think having #5 without #3 and > > > #4 wouldn't be too much fun for users. > > > > Details of the attached patch set > > > > 0001: This provides syntax to set the compression method from the > > built-in compression method (pglz or zlib). pg_attribute stores the > > compression method (char) and there are conversion functions to > > convert that compression method to the built-in compression array > > index. As discussed up thread the first 2 bits will be storing the > > compression method index using that we can directly get the handler > > routing using the built-in compression method array. > > > > 0002: This patch provides an option to changes the compression method > > for an existing column and it will rewrite the table. > > > > Next, I will be working on providing an option to alter the > > compression method without rewriting the whole table, basically, we > > can provide a preserve list to preserve old compression methods. > > I have rebased the patch and I have also done a couple of defect fixes > and some cleanup. Here is the next patch which allows providing a PRESERVE list using this we can avoid table rewrite while altering the compression method. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Attachment
pgsql-hackers by date: