Re: row filtering for logical replication - Mailing list pgsql-hackers
From | Peter Smith |
---|---|
Subject | Re: row filtering for logical replication |
Date | |
Msg-id | CAHut+PsgRHymwLhJ9t3By6+KNaVDzfjf6Y4Aq=JRD-y8t1mEFg@mail.gmail.com |
In response to | Re: row filtering for logical replication (Peter Smith <smithpb2250@gmail.com>) |
Responses | Re: row filtering for logical replication |
List | pgsql-hackers |
On Fri, Aug 27, 2021 at 8:01 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Thu, Aug 26, 2021 at 9:13 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Aug 26, 2021 at 3:41 PM Peter Smith <smithpb2250@gmail.com> wrote:
> > >
> > > On Thu, Aug 26, 2021 at 3:00 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > > > On Thu, Aug 26, 2021 at 9:51 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > > > >
> > > > > On Thu, Aug 26, 2021 at 1:20 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > >
> > > > > > On Thu, Aug 26, 2021 at 7:37 AM Peter Smith <smithpb2250@gmail.com> wrote:
> > > > > > >
> > > > > > > On Wed, Aug 25, 2021 at 3:28 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > > > > > >
> > > > > > > ...
> > > > > > > >
> > > > > > > > Hmm, I think the gain via caching is not visible because we are using
> > > > > > > > simple expressions. It will be visible when we use somewhat complex
> > > > > > > > expressions where expression evaluation cost is significant.
> > > > > > > > Similarly, the impact of this change will magnify and it will also be
> > > > > > > > visible when a publication has many tables. Apart from performance,
> > > > > > > > this change is logically correct as well because it would be any way
> > > > > > > > better if we don't invalidate the cached expressions unless required.
> > > > > > >
> > > > > > > Please tell me what is your idea of a "complex" row filter expression.
> > > > > > > Do you just mean a filter that has multiple AND conditions in it? I
> > > > > > > don't really know if few complex expressions would amount to any
> > > > > > > significant evaluation costs, so I would like to run some timing tests
> > > > > > > with some real examples to see the results.
> > > > > > >
> > > > > >
> > > > > > I think this means you didn't even understand or are convinced why the
> > > > > > patch has cache in the first place. As per your theory, even if we
> > > > > > didn't have cache, it won't matter but that is not true otherwise, the
> > > > > > patch wouldn't have it.
> > > > >
> > > > > I have never said there should be no caching. On the contrary, my
> > > > > performance test results [1] already confirmed that caching ExprState
> > > > > is of benefit for the millions of times it may be used in the
> > > > > pgoutput_row_filter function. My only doubts are in regard to how much
> > > > > observable impact there would be re-evaluating the filter expression
> > > > > just a few extra times by the get_rel_sync_entry function.
> > > > >
> > > >
> > > > I think it depends but why in the first place do you want to allow
> > > > re-evaluation when there is a way for not doing that?
> > >
> > > Because the current code logic of having the "delayed" ExprState
> > > evaluation does come at some cost.
> > >
> >
> > So, now you mixed it with the second point. Here, I was talking about
> > the need for correct invalidation but you started discussing when to
> > first time evaluate the expression, both are different things.
> >
> > > And the cost is -
> > > a. Needing an extra condition and more code in the function pgoutput_row_filter
> > > b. Needing to maintain the additional Node list
> > >
> >
> > I am not sure you need (b) above and I think (a) should make the
> > overall code look clean.
> >
> > > If we chose not to implement a delayed ExprState cache evaluation then
> > > there would still be a (one-time) ExprState cache evaluation but it
> > > would happen whenever get_rel_sync_entry is called (regardless of if
> > > pgoutput_row_filter is subsequently called). E.g. there can be some
> > > rebuilds of the ExprState cache if the user calls TRUNCATE.
> > >
> >
> > Apart from Truncate, it will also be a waste if any error happens
> > before actually evaluating the filter, tomorrow there could be other
> > operations like replication of sequences (I have checked that proposed
> > patch for sequences uses get_rel_sync_entry) where we don't need to
> > build ExprState (as filters might or might not be there). So, it would
> > be better to avoid cache lookups in those cases if possible. I still
> > think doing expensive things like preparing expressions should ideally
> > be done only when it is required.
>
> OK. Per your suggestion, I will try to move as much of the row-filter
> cache code as possible out of the get_rel_sync_entry function and into
> the pgoutput_row_filter function.
>

Here are the new v26* patches.

This is a refactoring of the row-filter caches to remove all the logic
from the get_rel_sync_entry function and delay it until if/when needed
in the pgoutput_row_filter function. This is now implemented per
Amit's suggestion to move all the cache code [1]. It is a replacement
for the v25* patches.

The make check and TAP subscription tests are all OK. I have repeated
the performance tests [2] and those results are good too.

v26-0001 <--- v23 (base RF patch)
v26-0002 <--- ExprState cache mods (refactored row filter caching)
v26-0002 <--- ExprState cache extra debug logging (temp)

------
[1] https://www.postgresql.org/message-id/CAA4eK1%2Btio46goUKBUfAKFsFVxtgk8nOty%3DTxKoKH-gdLzHD2g%40mail.gmail.com
[2] https://www.postgresql.org/message-id/CAHut%2BPs5j7mkO0xLmNW%3DkXh0eezGoKyzBCiQc9bfkCiM_MVDrg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia.
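
For readers following the thread, the "delayed" cache idea discussed above (do nothing expensive when the relation entry is looked up, and build the cached filter state only when a row is actually filtered) can be sketched roughly as below. This is only an illustration and is not the v26 patch code: RelEntry, CompiledFilter, rel_entry_init, rel_entry_invalidate and row_filter are hypothetical stand-ins for the real cache entry, ExprState, get_rel_sync_entry, the invalidation path and pgoutput_row_filter, and the "filter" is reduced to a simple integer comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a compiled row filter (ExprState in the patch). */
typedef struct CompiledFilter
{
    int threshold;              /* a row passes when value > threshold */
} CompiledFilter;

/* Hypothetical stand-in for a per-relation cache entry. */
typedef struct RelEntry
{
    int            filter_threshold; /* filter definition (the "catalog" side) */
    bool           filter_valid;     /* false until built, or after invalidation */
    CompiledFilter filter;           /* cached compiled form */
    int            build_count;      /* instrumentation: number of builds */
} RelEntry;

/* Entry lookup/setup: cheap, never builds the filter. */
static void
rel_entry_init(RelEntry *entry, int threshold)
{
    assert(entry != NULL);
    entry->filter_threshold = threshold;
    entry->filter_valid = false;
    entry->filter.threshold = 0;
    entry->build_count = 0;
}

/* Invalidation: only mark the cached filter stale; the rebuild is deferred. */
static void
rel_entry_invalidate(RelEntry *entry, int new_threshold)
{
    assert(entry != NULL);
    entry->filter_threshold = new_threshold;
    entry->filter_valid = false;
}

/* Row filtering: build the cached filter on first use, then reuse it. */
static bool
row_filter(RelEntry *entry, int value)
{
    assert(entry != NULL);
    if (!entry->filter_valid)
    {
        /* Stands in for the expensive expression preparation step. */
        entry->filter.threshold = entry->filter_threshold;
        entry->filter_valid = true;
        entry->build_count++;
    }
    return value > entry->filter.threshold;
}
```

In this sketch, a code path that only sets up or looks up the entry (the TRUNCATE and sequence cases mentioned in the quoted discussion) leaves build_count at 0, while repeated calls to row_filter build the filter once and rebuild it only after rel_entry_invalidate has been called.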