Thread: Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY
Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY
From
Aleksander Alekseev
Date:
Hi Shlok, > Here the generated column 'b' is set as REPLICA IDENTITY for table > 'testpub_gencol'. When we create publication 'pub_gencol' we do not > specify any column list, so column 'b' will not be published. > So, the update message generated by the last UPDATE would have NULL > for column 'b'. > > To avoid the issue, we can disallow UPDATE/DELETE on table with > unpublished generated column as REPLICA IDENTITY. I have attached a > patch for the same. I don't think this would be a correct fix. Let's say I *don't* have any publications: ``` =# CREATE TABLE testpub_gencol (a INT, b INT GENERATED ALWAYS AS (a + 1) STORED NOT NULL); CREATE TABLE =# CREATE UNIQUE INDEX testpub_gencol_idx ON testpub_gencol (b); CREATE INDEX =# INSERT INTO testpub_gencol (a) VALUES (1); INSERT 0 1 =# UPDATE testpub_gencol SET a = 100 WHERE a = 1; UPDATE 1 eax=# SELECT * FROM testpub_gencol ; a | b -----+----- 100 | 101 (1 row) ``` So far everything works fine. You are saying that when one creates a publication UPDATEs should stop working. That would be rather surprising behavior for a typical user not to mention that it will break the current behavior. I believe one would expect that both UPDATEs and the publication should continue to work. Perhaps we should forbid the creation of a publication like this instead. Or alternatively include a generated column to the publication list if it's used as a replica identity. Or maybe even keep everything as is. Thoughts? -- Best regards, Aleksander Alekseev
Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY
From
Shlok Kyal
Date:
Hi Aleksander, > > > Here the generated column 'b' is set as REPLICA IDENTITY for table > > 'testpub_gencol'. When we create publication 'pub_gencol' we do not > > specify any column list, so column 'b' will not be published. > > So, the update message generated by the last UPDATE would have NULL > > for column 'b'. > > > > To avoid the issue, we can disallow UPDATE/DELETE on table with > > unpublished generated column as REPLICA IDENTITY. I have attached a > > patch for the same. > > I don't think this would be a correct fix. Let's say I *don't* have > any publications: > > ``` > =# CREATE TABLE testpub_gencol (a INT, b INT GENERATED ALWAYS AS (a + 1) > STORED NOT NULL); > CREATE TABLE > > =# CREATE UNIQUE INDEX testpub_gencol_idx ON testpub_gencol (b); > CREATE INDEX > > =# INSERT INTO testpub_gencol (a) VALUES (1); > INSERT 0 1 > > =# UPDATE testpub_gencol SET a = 100 WHERE a = 1; > UPDATE 1 > eax=# SELECT * FROM testpub_gencol ; > a | b > -----+----- > 100 | 101 > (1 row) > ``` > > So far everything works fine. You are saying that when one creates a > publication UPDATEs should stop working. That would be rather > surprising behavior for a typical user not to mention that it will > break the current behavior. > > I believe one would expect that both UPDATEs and the publication > should continue to work. Perhaps we should forbid the creation of a > publication like this instead. Or alternatively include a generated > column to the publication list if it's used as a replica identity. Or > maybe even keep everything as is. > > Thoughts? > While testing I found that similar behaviors already exist in some cases. Where once we create a publication UPDATES might stop working. For example: Case1: postgres=# create table t1(c1 int); CREATE TABLE postgres=# insert into t1 values(1); INSERT 0 1 postgres=# update t1 set c1 = 100 where c1 = 1; UPDATE 1 postgres=# create publication pub for table t1; CREATE PUBLICATION postgres=# update t1 set c1 = 100 where c1 = 1; ERROR: cannot update table "t1" because it does not have a replica identity and publishes updates HINT: To enable updating the table, set REPLICA IDENTITY using ALTER TABLE. Case2: postgres=# create table t2(c1 int, c2 int not null); CREATE TABLE postgres=# create unique index t2_idx on t2 (c2); CREATE INDEX postgres=# alter table t2 replica identity using index t2_idx; ALTER TABLE postgres=# insert into t2 values(1,1); INSERT 0 1 postgres=# update t2 set c1 = 100 where c1 = 1; UPDATE 1 postgres=# create publication pub2 for table t2 where (c1 > 10); CREATE PUBLICATION postgres=# update t2 set c1 = 100 where c1 = 1; ERROR: cannot update table "t2" DETAIL: Column used in the publication WHERE expression is not part of the replica identity. Behaviour with the patch provided in [1] to resolve the issue: postgres=# create table t3(c1 int, c2 INT GENERATED ALWAYS AS (c1 + 1) STORED NOT NULL); CREATE TABLE postgres=# create unique index t3_idx on t3 (c2); CREATE INDEX postgres=# alter table t3 replica identity using index t3_idx; ALTER TABLE postgres=# insert into t3 values(1); INSERT 0 1 postgres=# update t3 set c1 = 100 where c1 = 1; UPDATE 1 postgres=# create publication pub3 for table t3; CREATE PUBLICATION postgres=# update t3 set c1 = 100 where c1 = 1; ERROR: cannot update table "t3" DETAIL: Column list used by the publication does not cover the replica identity. So, I think this behavior would be acceptable. Thoughts? [1]: https://www.postgresql.org/message-id/CANhcyEVw4V2Awe2AB6i0E5AJLNdASShGfdBLbUd1XtWDboymCA%40mail.gmail.com Thanks and Regards, Shlok Kyal
Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY
From
Aleksander Alekseev
Date:
Hi Shlok, > So, I think this behavior would be acceptable. Thoughts? That's a fair point, thanks for sharing. Personally I find this behavior somewhat suboptimal but since we already have it in certain cases I guess what you propose might be acceptable. I'm still not entirely happy about breaking the existing behavior in the discussed case. Not sure what the lesser evil would be - breaking it or keeping it as is. Some input from other people on the mailing list would be appreciated. -- Best regards, Aleksander Alekseev
Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY
From
Amit Kapila
Date:
On Wed, Nov 6, 2024 at 5:48 PM Aleksander Alekseev <aleksander@timescale.com> wrote: > > > So, I think this behavior would be acceptable. Thoughts? > > That's a fair point, thanks for sharing. Personally I find this > behavior somewhat suboptimal but since we already have it in certain > cases I guess what you propose might be acceptable. > This is not a suboptimal behavior but a must to have, otherwise, there is no way we can identify the row to update on the subscriber side. Also, this is not in certain cases but in all cases for UPDATE/DELETE, we need REPLICA IDENTITY to be set. See more about REPLICA IDENTITY in Alter Table docs [1]. The problem reported by Shlok is that even though we have a REPLICA IDENTITY defined on a generated column but still won't send the required column value (as generated columns are skipped by default) to the subscriber which will lead to ERROR as mentioned below. Now, one can argue that this is not expected from the user or why the user would have such a setup but I think we should fix the problem if it leads to unexpected behavior on the subscriber. > I'm still not entirely happy about breaking the existing behavior in > the discussed case. Not sure what the lesser evil would be - breaking > it or keeping it as is. > The current behavior is not acceptable because it would generate an ERROR as follows on the subscriber: 2024-11-07 10:50:31.381 IST [16260] ERROR: publisher did not send replica identity column expected by the logical replication target relation "public.testpub_gencol" 2024-11-07 10:50:31.381 IST [16260] CONTEXT: processing remote data for replication origin "pg_16389" during message type "UPDATE" for replication target relation "public.testpub_gencol" in transaction 748, finished at 0/176D5D8 2024-11-07 10:50:31.398 IST [6216] LOG: background worker "logical replication apply worker" (PID 16260) exited with exit code 1 > Some input from other people on the mailing list would be appreciated. > We should fix this in the HEAD and back branches. [1] - https://www.postgresql.org/docs/devel/sql-altertable.html -- With Regards, Amit Kapila.
Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY
From
Amit Kapila
Date:
On Thu, Nov 7, 2024 at 11:04 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Wed, Nov 6, 2024 at 5:48 PM Aleksander Alekseev > <aleksander@timescale.com> wrote: > > We should fix this in the HEAD and back branches. > BTW, I was thinking as to how to fix it on back branches and it seems we should restrict to define REPLICA IDENTITY on stored generated columns in the first place in back branches as those can't be replicated. So, the following should fail: CREATE TABLE testpub_gencol (a INT, b INT GENERATED ALWAYS AS (a + 1) STORED NOT NULL); CREATE UNIQUE INDEX testpub_gencol_idx ON testpub_gencol (b); ALTER TABLE testpub_gencol REPLICA IDENTITY USING index testpub_gencol_idx; Peter, do you have an opinion on this? [1] - https://www.postgresql.org/docs/devel/ddl-generated-columns.html -- With Regards, Amit Kapila.
Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY
From
Alvaro Herrera
Date:
On 2024-Nov-07, Amit Kapila wrote: > BTW, I was thinking as to how to fix it on back branches and it seems > we should restrict to define REPLICA IDENTITY on stored generated > columns in the first place in back branches as those can't be > replicated. So, the following should fail: > > CREATE TABLE testpub_gencol (a INT, b INT GENERATED ALWAYS AS (a + 1) > STORED NOT NULL); > CREATE UNIQUE INDEX testpub_gencol_idx ON testpub_gencol (b); > ALTER TABLE testpub_gencol REPLICA IDENTITY USING index testpub_gencol_idx; > > Peter, do you have an opinion on this? I think a blanket restriction of this sort is not a good idea (at least in back branches), because there might be people using replica identities with stacks other than pgoutput. Would it work to enforce the restriction when such a table is added to a publication? -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Nunca confiaré en un traidor. Ni siquiera si el traidor lo he creado yo" (Barón Vladimir Harkonnen)
Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY
From
Amit Kapila
Date:
On Fri, Nov 8, 2024 at 5:17 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2024-Nov-07, Amit Kapila wrote: > > > BTW, I was thinking as to how to fix it on back branches and it seems > > we should restrict to define REPLICA IDENTITY on stored generated > > columns in the first place in back branches as those can't be > > replicated. So, the following should fail: > > > > CREATE TABLE testpub_gencol (a INT, b INT GENERATED ALWAYS AS (a + 1) > > STORED NOT NULL); > > CREATE UNIQUE INDEX testpub_gencol_idx ON testpub_gencol (b); > > ALTER TABLE testpub_gencol REPLICA IDENTITY USING index testpub_gencol_idx; > > > > Peter, do you have an opinion on this? > > I think a blanket restriction of this sort is not a good idea (at least > in back branches), because there might be people using replica > identities with stacks other than pgoutput. > Do you mean to say that people using plugins other than pgoutput may already be sending generated columns, so defining replica identity should be okay for them? > > Would it work to enforce > the restriction when such a table is added to a publication? > But what if somebody defines REPLICA IDENTITY on the generated column after adding the table to the publication? -- With Regards, Amit Kapila.
Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY
From
Amit Kapila
Date:
On Sat, Nov 9, 2024 at 8:46 AM Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Fri, Nov 8, 2024 at 5:17 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > > On 2024-Nov-07, Amit Kapila wrote: > > > > > BTW, I was thinking as to how to fix it on back branches and it seems > > > we should restrict to define REPLICA IDENTITY on stored generated > > > columns in the first place in back branches as those can't be > > > replicated. So, the following should fail: > > > > > > CREATE TABLE testpub_gencol (a INT, b INT GENERATED ALWAYS AS (a + 1) > > > STORED NOT NULL); > > > CREATE UNIQUE INDEX testpub_gencol_idx ON testpub_gencol (b); > > > ALTER TABLE testpub_gencol REPLICA IDENTITY USING index testpub_gencol_idx; > > > > > > Peter, do you have an opinion on this? > > > > I think a blanket restriction of this sort is not a good idea (at least > > in back branches), because there might be people using replica > > identities with stacks other than pgoutput. > > > > Do you mean to say that people using plugins other than pgoutput may > already be sending generated columns, so defining replica identity > should be okay for them? > If we don't want to add a restriction to not create replica identity on generated columns then I think the solution similar to HEAD should be okay which is to restrict UPDATE/DELETE in such cases. Also, another point against restricting defining REPLICA IDENTITY on generated columns is that we do allow generated columns to be PRIMARY KEY which is a DEFAULT for REPLICA IDENTITY, so that also needs to be restricted. That won't be a good idea. -- With Regards, Amit Kapila.
Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY
From
Alvaro Herrera
Date:
On 2024-Nov-09, Amit Kapila wrote: > On Fri, Nov 8, 2024 at 5:17 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > > On 2024-Nov-07, Amit Kapila wrote: > > > > > BTW, I was thinking as to how to fix it on back branches and it seems > > > we should restrict to define REPLICA IDENTITY on stored generated > > > columns in the first place in back branches as those can't be > > > replicated. > > I think a blanket restriction of this sort is not a good idea (at least > > in back branches), because there might be people using replica > > identities with stacks other than pgoutput. > > Do you mean to say that people using plugins other than pgoutput may > already be sending generated columns, so defining replica identity > should be okay for them? Yes. > If we don't want to add a restriction to not create replica identity > on generated columns then I think the solution similar to HEAD should > be okay which is to restrict UPDATE/DELETE in such cases. Hmm, I don't know about this. Maybe nobody cares, but I'm uneasy about it. I'm wondering about hypothetical cases where people is already using this combination of features in stable branches, without pgoutput. I think it's not great to add restrictions that didn't exist when they upgraded to some stable branch. In branch master it's probably okay, because they'll have to test before upgrading and they'll realize the problem and have the chance to adjust (or complain) before calling the upgrade good. But if we do that for stable branches, we'd deprive them of the ability to do minor upgrades, which would be Not Good. So, another option is to do nothing for stable branches. > > Would it work to enforce the restriction when such a table is added > > to a publication? > > But what if somebody defines REPLICA IDENTITY on the generated column > after adding the table to the publication? Well, maybe we can restrict the change of REPLICA IDENTITY if the table is already in a pgoutput publication? On 2024-Nov-12, Amit Kapila wrote: > Also, another point against restricting defining REPLICA IDENTITY on > generated columns is that we do allow generated columns to be PRIMARY > KEY which is a DEFAULT for REPLICA IDENTITY, so that also needs to be > restricted. That won't be a good idea. Oh, that's a good point too. It's not clear to me why doesn't pgoutput cope with generated columns in replica identities. Maybe that can be reconsidered? -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "La persona que no quería pecar / estaba obligada a sentarse en duras y empinadas sillas / desprovistas, por cierto de blandos atenuantes" (Patricio Vogel)
Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY
From
Amit Kapila
Date:
On Tue, Nov 12, 2024 at 2:15 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2024-Nov-09, Amit Kapila wrote: > > > On Fri, Nov 8, 2024 at 5:17 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > > > > On 2024-Nov-07, Amit Kapila wrote: > > > > > > > BTW, I was thinking as to how to fix it on back branches and it seems > > > > we should restrict to define REPLICA IDENTITY on stored generated > > > > columns in the first place in back branches as those can't be > > > > replicated. > > > > I think a blanket restriction of this sort is not a good idea (at least > > > in back branches), because there might be people using replica > > > identities with stacks other than pgoutput. > > > > Do you mean to say that people using plugins other than pgoutput may > > already be sending generated columns, so defining replica identity > > should be okay for them? > > Yes. > > > If we don't want to add a restriction to not create replica identity > > on generated columns then I think the solution similar to HEAD should > > be okay which is to restrict UPDATE/DELETE in such cases. > > Hmm, I don't know about this. Maybe nobody cares, but I'm uneasy about > it. I'm wondering about hypothetical cases where people is already > using this combination of features in stable branches, without pgoutput. > I think it's not great to add restrictions that didn't exist when they > upgraded to some stable branch. In branch master it's probably okay, > because they'll have to test before upgrading and they'll realize the > problem and have the chance to adjust (or complain) before calling the > upgrade good. But if we do that for stable branches, we'd deprive them > of the ability to do minor upgrades, which would be Not Good. > > So, another option is to do nothing for stable branches. > Fair enough. The other point in favor of that option is that nobody has reported this problem yet but my guess is that they would have probably not used such a combination at least with pgoutput plugin otherwise, they would have faced the ERRORs on the subscriber. So, we can do this only for HEAD and decide on the fix if anyone ever reports this problem. > > > Would it work to enforce the restriction when such a table is added > > > to a publication? > > > > But what if somebody defines REPLICA IDENTITY on the generated column > > after adding the table to the publication? > > Well, maybe we can restrict the change of REPLICA IDENTITY if the table > is already in a pgoutput publication? > What about the PRIMARY KEY case as shared in my later email? Even apart from that the plugin is decided via slot, so we won't be able to detect from table<->publication relationship. > On 2024-Nov-12, Amit Kapila wrote: > > > Also, another point against restricting defining REPLICA IDENTITY on > > generated columns is that we do allow generated columns to be PRIMARY > > KEY which is a DEFAULT for REPLICA IDENTITY, so that also needs to be > > restricted. That won't be a good idea. > > Oh, that's a good point too. > > It's not clear to me why doesn't pgoutput cope with generated columns in > replica identities. Maybe that can be reconsidered? > In stable branches, we intentionally skip publishing generated columns as we assumed that the subscriber side also had a generated column. So, sending it would be a waste of network bandwidth. OTOH, when one tries to replicate the changes to some other database that didn't have the generated columns concept, it would create a problem. So we developed a new feature for HEAD as part of commits 745217a051 and 7054186c4e which allows the publication of generated columns when explicitly specified by the users. -- With Regards, Amit Kapila.
Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY
From
Alvaro Herrera
Date:
On 2024-Nov-12, Amit Kapila wrote: > On Tue, Nov 12, 2024 at 2:15 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > So, another option is to do nothing for stable branches. > > Fair enough. The other point in favor of that option is that nobody > has reported this problem yet but my guess is that they would have > probably not used such a combination at least with pgoutput plugin > otherwise, they would have faced the ERRORs on the subscriber. So, we > can do this only for HEAD and decide on the fix if anyone ever reports > this problem. Right. > > Well, maybe we can restrict the change of REPLICA IDENTITY if the table > > is already in a pgoutput publication? > > What about the PRIMARY KEY case as shared in my later email? Even > apart from that the plugin is decided via slot, so we won't be able to > detect from table<->publication relationship. I responded to both emails together, my response is what you quoted below: > > On 2024-Nov-12, Amit Kapila wrote: > > > > > Also, another point against restricting defining REPLICA IDENTITY on > > > generated columns is that we do allow generated columns to be PRIMARY > > > KEY which is a DEFAULT for REPLICA IDENTITY, so that also needs to be > > > restricted. That won't be a good idea. > > > > Oh, that's a good point too. (I was acknowledging this as a problem case.) > > It's not clear to me why doesn't pgoutput cope with generated columns in > > replica identities. Maybe that can be reconsidered? > > In stable branches, we intentionally skip publishing generated columns > as we assumed that the subscriber side also had a generated column. > So, sending it would be a waste of network bandwidth. OTOH, when one > tries to replicate the changes to some other database that didn't have > the generated columns concept, it would create a problem. So we > developed a new feature for HEAD as part of commits 745217a051 and > 7054186c4e which allows the publication of generated columns when > explicitly specified by the users. Ah, I think it's good then, we don't need to do anything further on this. It's just not supported on earlier branches (and it doesn't work with pgoutput, though it does with other plugins); and master has a mechanism for it to work with any output plugin. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY
From
Amit Kapila
Date:
On Tue, Nov 12, 2024 at 5:37 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2024-Nov-12, Amit Kapila wrote: > > > > > It's not clear to me why doesn't pgoutput cope with generated columns in > > > replica identities. Maybe that can be reconsidered? > > > > In stable branches, we intentionally skip publishing generated columns > > as we assumed that the subscriber side also had a generated column. > > So, sending it would be a waste of network bandwidth. OTOH, when one > > tries to replicate the changes to some other database that didn't have > > the generated columns concept, it would create a problem. So we > > developed a new feature for HEAD as part of commits 745217a051 and > > 7054186c4e which allows the publication of generated columns when > > explicitly specified by the users. > > Ah, I think it's good then, we don't need to do anything further on > this. It's just not supported on earlier branches (and it doesn't work > with pgoutput, though it does with other plugins); and master has a > mechanism for it to work with any output plugin. > I think we still need a fix for the master for the case when generated columns are not published but are part of REPLICA IDENTITY as that could lead to failures in applying UPDATE and DELETE on subscriber. Am, I missing something? -- With Regards, Amit Kapila.
Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY
From
Alvaro Herrera
Date:
On 2024-Nov-12, Amit Kapila wrote: > I think we still need a fix for the master for the case when generated > columns are not published but are part of REPLICA IDENTITY as that > could lead to failures in applying UPDATE and DELETE on subscriber. Ah, I thought that was already in place. > Am, I missing something? Nope, it's me who was missing something. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ "Use it up, wear it out, make it do, or do without"
Re: Disallow UPDATE/DELETE on table with unpublished generated column as REPLICA IDENTITY
From
Amit Kapila
Date:
On Tue, Nov 12, 2024 at 7:05 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2024-Nov-12, Amit Kapila wrote: > > > I think we still need a fix for the master for the case when generated > > columns are not published but are part of REPLICA IDENTITY as that > > could lead to failures in applying UPDATE and DELETE on subscriber. > > Ah, I thought that was already in place. > No, we left it with the thought that we needed something for it in the back branches as well. But now that we have decided not to do anything for the back branches, we should fix it in HEAD. > > Am, I missing something? > > Nope, it's me who was missing something. > No problem, thanks for all the feedback and helping us to conclude. -- With Regards, Amit Kapila.