Thread: Logical replication can be broken by domain constraint with NOT VALID option
Logical replication can be broken by domain constraint with NOT VALID option
From
Andrey Lepikhov
Date:
Hi,

During patch development I ran into a small problem (see attachment, fail_replication.sh):

1. We have a table with logical replication to another node.
2. On the master and the replica, add a "NOT VALID" domain constraint on the table such that some tuples violate the constraint.
3. UPDATE the table: set the value of a tuple that violates the constraint to a correct value.
4. That's all!

The reason for this problem is that on UPDATE the walsender sends the old tuple value (which violates the constraint) along with the new version (which satisfies the constraint). The replication worker on the replica node restores the slot from the transfer representation. During this process the domain checks the constraint and returns an ERROR. Because the WAL record of the UPDATE command cannot be applied, logical replication stops completely.

As I understand it, this problem can be reproduced in all Postgres versions with the logical replication feature.

This problem can be solved in many ways. I wrote a patch (see attachment) that solves it in the shortest way.

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company
Attachment
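The failure described in this message can be illustrated with a toy model. The Python sketch below is not the attached fail_replication.sh and does not use PostgreSQL code; `phone_domain_in`, `apply_update`, and the "+" prefix rule are hypothetical names and assumptions chosen only to show why converting the old tuple from its text form trips the constraint.

```python
class DomainViolation(Exception):
    """Raised when a value fails the (hypothetical) domain CHECK."""


def phone_domain_in(text):
    # Hypothetical stand-in for a domain's input function: every value
    # converted from its text form is checked, old tuple or new.
    if not text.startswith("+"):
        raise DomainViolation(f"value {text!r} violates the domain constraint")
    return text


def apply_update(table, old_text, new_text):
    # Both tuple versions arrive as text and both are converted,
    # so the constraint is evaluated on the old value as well.
    old_value = phone_domain_in(old_text)
    new_value = phone_domain_in(new_text)
    table[table.index(old_value)] = new_value


# "200" was stored before the constraint was added as NOT VALID.
replica = ["+100", "200"]
try:
    apply_update(replica, "200", "+200")
    outcome = "applied"
except DomainViolation as exc:
    outcome = f"apply failed: {exc}"
print(outcome)
```

In this model the UPDATE that would repair the row is never applied, because the old value is rejected before the row is even looked up.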
Re: Logical replication can be broken by domain constraint with NOT VALID option
From
Tom Lane
Date:
Andrey Lepikhov <a.lepikhov@postgrespro.ru> writes:
> During patch development I ran into a small problem (see attachment,
> fail_replication.sh):
> 1. We have a table with logical replication to another node.
> 2. On the master and the replica, add a "NOT VALID" domain constraint on
> the table such that some tuples violate the constraint.
> 3. UPDATE the table: set the value of a tuple that violates the constraint
> to a correct value.
> 4. That's all!
> The reason for this problem is that on UPDATE the walsender sends the old
> tuple value (which violates the constraint) along with the new version
> (which satisfies the constraint).
> The replication worker on the replica node restores the slot from the
> transfer representation. During this process the domain checks the
> constraint and returns an ERROR.

I'm not sure this is something we should attempt to fix. There are an infinite number of ways you can break logical replication by presenting it with inconsistent data, and that's really what you've done here.

> This problem can be solved in many ways. I wrote a patch (see attachment)
> that solves it in the shortest way.

That patch is certainly utterly unacceptable. It'd allow the recipient to accept data that violates the domain constraint.

The situation you're describing would probably best be handled by not adding the constraint on the replica side until all the bad data has been corrected (and replicated).

regards, tom lane
Re: Logical replication can be broken by domain constraint with NOT VALID option
From
Andrey Lepikhov
Date:
On 03/11/2019 20:42, Tom Lane wrote:
> Andrey Lepikhov <a.lepikhov@postgrespro.ru> writes:
>> The reason for this problem is that on UPDATE the walsender sends the old
>> tuple value (which violates the constraint) along with the new version
>> (which satisfies the constraint).
>> The replication worker on the replica node restores the slot from the
>> transfer representation. During this process the domain checks the
>> constraint and returns an ERROR.
>
> I'm not sure this is something we should attempt to fix. There are
> an infinite number of ways you can break logical replication by
> presenting it with inconsistent data, and that's really what you've
> done here.

This problem is reproduced in a standard way, following the documentation. I assume this inconsistency option is allowed by the SQL standard because it has practical uses.

>
>> This problem can be solved in many ways. I wrote a patch (see attachment)
>> that solves it in the shortest way.
>
> That patch is certainly utterly unacceptable. It'd allow the
> recipient to accept data that violates the domain constraint.

If this is the only reason, I propose a new version of the patch (see attachment). It satisfies the "Paranoid safety" rule.

>
> The situation you're describing would probably best be handled by
> not adding the constraint on the replica side until all the
> bad data has been corrected (and replicated).

On any PostgreSQL-based multimaster system, this will be a problem.

--
regards, Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company
Attachment
Re: Logical replication can be broken by domain constraint with NOT VALID option
From
Euler Taveira
Date:
On Sun, Nov 3, 2019 at 23:33, Andrey Lepikhov <a.lepikhov@postgrespro.ru> wrote:
>
> On 03/11/2019 20:42, Tom Lane wrote:
> > Andrey Lepikhov <a.lepikhov@postgrespro.ru> writes:
> >> The reason for this problem is that on UPDATE the walsender sends the old
> >> tuple value (which violates the constraint) along with the new version
> >> (which satisfies the constraint).
> >> The replication worker on the replica node restores the slot from the
> >> transfer representation. During this process the domain checks the
> >> constraint and returns an ERROR.
> >
> > I'm not sure this is something we should attempt to fix. There are
> > an infinite number of ways you can break logical replication by
> > presenting it with inconsistent data, and that's really what you've
> > done here.
>
> This problem is reproduced in a standard way, following the documentation.
> I assume this inconsistency option is allowed by the SQL standard because
> it has practical uses.
>
Could you point out the problem in the documentation?

> >
> >> This problem can be solved in many ways. I wrote a patch (see attachment)
> >> that solves it in the shortest way.
> >
> > That patch is certainly utterly unacceptable. It'd allow the
> > recipient to accept data that violates the domain constraint.
>
> If this is the only reason, I propose a new version of the patch (see
> attachment). It satisfies the "Paranoid safety" rule.
>
I don't think that is acceptable either. If you have different schemas (even for a small period of time), you should handle it by dropping and recreating the constraints. Logical replication is far from a complete feature. There are probably cases where someone wants to enforce even the FK constraints on the subscriber. I certainly wouldn't like to open that can of worms. Relaxing constraints could lead to inconsistent datasets across nodes. If you want to accept constraint violations, drop the constraints.

> > The situation you're describing would probably best be handled by
> > not adding the constraint on the replica side until all the
> > bad data has been corrected (and replicated).
>
> On any PostgreSQL-based multimaster system, this will be a problem.
>
... if you do not replicate DDLs in the same order they occur or if you have different schemas.

--
Euler Taveira
Timbira - http://www.timbira.com.br/
PostgreSQL: Consulting, Development, 24x7 Support and Training
Re: Logical replication can be broken by domain constraint with NOT VALID option
From
Andrey Lepikhov
Date:
On 05/11/2019 20:21, Euler Taveira wrote:
> On Sun, Nov 3, 2019 at 23:33, Andrey Lepikhov
> <a.lepikhov@postgrespro.ru> wrote:
>> If this is the only reason, I propose a new version of the patch (see
>> attachment). It satisfies the "Paranoid safety" rule.
> I don't think that is acceptable either. If you have different schemas
> (even for a small period of time), you should handle it by dropping and
> recreating the constraints.

Changing a schema is a big deal. But adding a constraint with the "NOT VALID" option may be done frequently, for example to change the format of phone numbers.

> Logical replication is far from a complete
> feature. There are probably cases where someone wants to enforce even the
> FK constraints on the subscriber. I certainly wouldn't like to open
> that can of worms. Relaxing constraints could lead to inconsistent
> datasets across nodes. If you want to accept constraint violations,
> drop the constraints.

Maybe logical replication is incomplete, but that is no argument against fixing the errors we find. In the v2 version of the patch, constraints are suppressed only for the old version of the tuple, which is used for the search in the heap and cannot be applied. In this sense we are not relaxing any constraints.

--
Andrey Lepikhov
Postgres Professional
https://postgrespro.com
The Russian Postgres Company
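The distinction drawn in this reply (skip the check only for the old tuple used as a search key, keep it for the new tuple) can be sketched as a toy model. This is an illustration of the idea as described above, not the v2 patch itself; `check_phone`, `apply_update_lookup_unchecked`, and the "+" prefix rule are hypothetical.

```python
class DomainViolation(Exception):
    """Raised when a value fails the (hypothetical) domain CHECK."""


def check_phone(value):
    # Hypothetical CHECK: a valid phone number starts with "+".
    if not value.startswith("+"):
        raise DomainViolation(f"value {value!r} violates the domain constraint")
    return value


def apply_update_lookup_unchecked(table, old_value, new_value):
    # The old value is only a search key, so it is not validated.
    # The new value is validated before anything is stored.
    checked_new = check_phone(new_value)
    table[table.index(old_value)] = checked_new


# "200" was stored before the constraint was added as NOT VALID.
replica = ["+100", "200"]

# The repairing UPDATE is applied even though the old value is invalid.
apply_update_lookup_unchecked(replica, "200", "+200")

# An UPDATE whose new value is invalid is still rejected.
try:
    apply_update_lookup_unchecked(replica, "+100", "100")
    rejected = False
except DomainViolation:
    rejected = True
print(replica, rejected)
```

Under these assumptions no invalid value can be written by the apply step; only the lookup tolerates a value that predates the constraint.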