Thread: Queries are failing on standby server
Hi All
The queries are failing on the standby server. Please note that Primary server loading of data is going on and under load.
What setting do we need to configure to not conflict with queries. Below is the error.
Failure happened on 'Source' side. ErrorCode=UserErrorUnclassifiedError,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Odbc Operation Failed.,Source=Microsoft.DataTransfer.ClientLibrary.Odbc.OdbcConnector,''Type=System.Data.Odbc.OdbcException,Message=ERROR [40001] [Microsoft][ODBC PostgreSQL Wire Protocol driver][PostgreSQL]ERROR: VERROR; canceling statement due to conflict with recovery(Detail User query might have needed to see row versions that must be removed.; File postgres.c; Line 3133; Routine ProcessInterrupts; ),Source=mspsql27.,'
Thanks,
Wasim
Two options here, both settings on the replica side:
- Set "max_standby_streaming_delay" to a value that will allow your statements time to complete on the standby. If they take longer than this, you may see the same error. Note that setting this pauses ALL replication if a conflicting statement is encountered until it either completes or the statement timeout is hit. So if there's long running, conflicting statements, you can see a significant delay in any new data showing up on the replicas. This does not allow the potential bloat caused by the next option.
https://www.postgresql.org/docs/16/runtime-config-replication.html#GUC-MAX-STANDBY-STREAMING-DELAY
https://www.postgresql.org/docs/16/runtime-config-replication.html#GUC-MAX-STANDBY-STREAMING-DELAY
- Set "hot_standby_feedback" to on. This allows the replica to communicate with the primary about what queries are running so that the primary does not clean up the old rows that cause the replication conflict. Note that this makes it so queries run on the replica would have the same effect on MVCC and VACUUM that running them on the primary would have. This means long running statements will cause VACUUM to skip those rows/tables and may cause additional bloat. This is the preferred method if you do not want replication to be delayed by conflicting statements.
https://www.postgresql.org/docs/16/runtime-config-replication.html#GUC-HOT-STANDBY-FEEDBACK
https://www.postgresql.org/docs/16/runtime-config-replication.html#GUC-HOT-STANDBY-FEEDBACK
On Thu, Jul 25, 2024 at 9:30 AM Wasim Devale <wasimd60@gmail.com> wrote:
Hi AllThe queries are failing on the standby server. Please note that Primary server loading of data is going on and under load.What setting do we need to configure to not conflict with queries. Below is the error.Failure happened on 'Source' side. ErrorCode=UserErrorUnclassifiedError,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Odbc Operation Failed.,Source=Microsoft.DataTransfer.ClientLibrary.Odbc.OdbcConnector,''Type=System.Data.Odbc.OdbcException,Message=ERROR [40001] [Microsoft][ODBC PostgreSQL Wire Protocol driver][PostgreSQL]ERROR: VERROR; canceling statement due to conflict with recovery(Detail User query might have needed to see row versions that must be removed.; File postgres.c; Line 3133; Routine ProcessInterrupts; ),Source=mspsql27.,'Thanks,Wasim
How do those config options interact with replication slots?
On Thu, Jul 25, 2024 at 9:54 AM Keith Fiske <keith.fiske@crunchydata.com> wrote:
Two options here, both settings on the replica side:- Set "max_standby_streaming_delay" to a value that will allow your statements time to complete on the standby. If they take longer than this, you may see the same error. Note that setting this pauses ALL replication if a conflicting statement is encountered until it either completes or the statement timeout is hit. So if there's long running, conflicting statements, you can see a significant delay in any new data showing up on the replicas. This does not allow the potential bloat caused by the next option.
https://www.postgresql.org/docs/16/runtime-config-replication.html#GUC-MAX-STANDBY-STREAMING-DELAY- Set "hot_standby_feedback" to on. This allows the replica to communicate with the primary about what queries are running so that the primary does not clean up the old rows that cause the replication conflict. Note that this makes it so queries run on the replica would have the same effect on MVCC and VACUUM that running them on the primary would have. This means long running statements will cause VACUUM to skip those rows/tables and may cause additional bloat. This is the preferred method if you do not want replication to be delayed by conflicting statements.
https://www.postgresql.org/docs/16/runtime-config-replication.html#GUC-HOT-STANDBY-FEEDBACKOn Thu, Jul 25, 2024 at 9:30 AM Wasim Devale <wasimd60@gmail.com> wrote:Hi AllThe queries are failing on the standby server. Please note that Primary server loading of data is going on and under load.What setting do we need to configure to not conflict with queries. Below is the error.Failure happened on 'Source' side. ErrorCode=UserErrorUnclassifiedError,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Odbc Operation Failed.,Source=Microsoft.DataTransfer.ClientLibrary.Odbc.OdbcConnector,''Type=System.Data.Odbc.OdbcException,Message=ERROR [40001] [Microsoft][ODBC PostgreSQL Wire Protocol driver][PostgreSQL]ERROR: VERROR; canceling statement due to conflict with recovery(Detail User query might have needed to see row versions that must be removed.; File postgres.c; Line 3133; Routine ProcessInterrupts; ),Source=mspsql27.,'
This allows the replica to communicate with the primary about what queries are running so that the primary does not clean up the old rows that cause the replication conflict.
What does the above statement mean? Primary does not clean up the old rows ? The it will be problematic if database is under high insert delete load.
On Thu, 25 Jul, 2024, 7:24 pm Keith Fiske, <keith.fiske@crunchydata.com> wrote:
Two options here, both settings on the replica side:- Set "max_standby_streaming_delay" to a value that will allow your statements time to complete on the standby. If they take longer than this, you may see the same error. Note that setting this pauses ALL replication if a conflicting statement is encountered until it either completes or the statement timeout is hit. So if there's long running, conflicting statements, you can see a significant delay in any new data showing up on the replicas. This does not allow the potential bloat caused by the next option.
https://www.postgresql.org/docs/16/runtime-config-replication.html#GUC-MAX-STANDBY-STREAMING-DELAY- Set "hot_standby_feedback" to on. This allows the replica to communicate with the primary about what queries are running so that the primary does not clean up the old rows that cause the replication conflict. Note that this makes it so queries run on the replica would have the same effect on MVCC and VACUUM that running them on the primary would have. This means long running statements will cause VACUUM to skip those rows/tables and may cause additional bloat. This is the preferred method if you do not want replication to be delayed by conflicting statements.
https://www.postgresql.org/docs/16/runtime-config-replication.html#GUC-HOT-STANDBY-FEEDBACKOn Thu, Jul 25, 2024 at 9:30 AM Wasim Devale <wasimd60@gmail.com> wrote:Hi AllThe queries are failing on the standby server. Please note that Primary server loading of data is going on and under load.What setting do we need to configure to not conflict with queries. Below is the error.Failure happened on 'Source' side. ErrorCode=UserErrorUnclassifiedError,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Odbc Operation Failed.,Source=Microsoft.DataTransfer.ClientLibrary.Odbc.OdbcConnector,''Type=System.Data.Odbc.OdbcException,Message=ERROR [40001] [Microsoft][ODBC PostgreSQL Wire Protocol driver][PostgreSQL]ERROR: VERROR; canceling statement due to conflict with recovery(Detail User query might have needed to see row versions that must be removed.; File postgres.c; Line 3133; Routine ProcessInterrupts; ),Source=mspsql27.,'Thanks,Wasim--
On Thu, Jul 25, 2024 at 12:54 PM Wasim Devale <wasimd60@gmail.com> wrote:
This allows the replica to communicate with the primary about what queries are running so that the primary does not clean up the old rows that cause the replication conflict.What does the above statement mean? Primary does not clean up the old rows ?
If you're executing long-running queries on the replica, then you you don't want the old rows cleaned up until the long-running query is complete.
The it will be problematic if database is under high insert delete load.
Which is why it's only recommended in rare circumstances.
On Thu, 25 Jul, 2024, 7:24 pm Keith Fiske, <keith.fiske@crunchydata.com> wrote:Two options here, both settings on the replica side:- Set "max_standby_streaming_delay" to a value that will allow your statements time to complete on the standby. If they take longer than this, you may see the same error. Note that setting this pauses ALL replication if a conflicting statement is encountered until it either completes or the statement timeout is hit. So if there's long running, conflicting statements, you can see a significant delay in any new data showing up on the replicas. This does not allow the potential bloat caused by the next option.
https://www.postgresql.org/docs/16/runtime-config-replication.html#GUC-MAX-STANDBY-STREAMING-DELAY- Set "hot_standby_feedback" to on. This allows the replica to communicate with the primary about what queries are running so that the primary does not clean up the old rows that cause the replication conflict. Note that this makes it so queries run on the replica would have the same effect on MVCC and VACUUM that running them on the primary would have. This means long running statements will cause VACUUM to skip those rows/tables and may cause additional bloat. This is the preferred method if you do not want replication to be delayed by conflicting statements.
https://www.postgresql.org/docs/16/runtime-config-replication.html#GUC-HOT-STANDBY-FEEDBACKOn Thu, Jul 25, 2024 at 9:30 AM Wasim Devale <wasimd60@gmail.com> wrote:Hi AllThe queries are failing on the standby server. Please note that Primary server loading of data is going on and under load.What setting do we need to configure to not conflict with queries. Below is the error.Failure happened on 'Source' side. ErrorCode=UserErrorUnclassifiedError,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Odbc Operation Failed.,Source=Microsoft.DataTransfer.ClientLibrary.Odbc.OdbcConnector,''Type=System.Data.Odbc.OdbcException,Message=ERROR [40001] [Microsoft][ODBC PostgreSQL Wire Protocol driver][PostgreSQL]ERROR: VERROR; canceling statement due to conflict with recovery(Detail User query might have needed to see row versions that must be removed.; File postgres.c; Line 3133; Routine ProcessInterrupts; ),Source=mspsql27.,'Thanks,Wasim--
ERROR: User query might have needed to see row versions that must be removed.canceling statement due to conflict with recovery ERROR: canceling statement due to conflict with recovery SQL state: 40001 Detail: User query might have needed to see row versions that must be removed.
So how to tackle the above error. PG version is 12.8 and has a replication slot created.
On Thu, Jul 25, 2024 at 10:30 PM Ron Johnson <ronljohnsonjr@gmail.com> wrote:
On Thu, Jul 25, 2024 at 12:54 PM Wasim Devale <wasimd60@gmail.com> wrote:This allows the replica to communicate with the primary about what queries are running so that the primary does not clean up the old rows that cause the replication conflict.What does the above statement mean? Primary does not clean up the old rows ?If you're executing long-running queries on the replica, then you you don't want the old rows cleaned up until the long-running query is complete.The it will be problematic if database is under high insert delete load.Which is why it's only recommended in rare circumstances.On Thu, 25 Jul, 2024, 7:24 pm Keith Fiske, <keith.fiske@crunchydata.com> wrote:Two options here, both settings on the replica side:- Set "max_standby_streaming_delay" to a value that will allow your statements time to complete on the standby. If they take longer than this, you may see the same error. Note that setting this pauses ALL replication if a conflicting statement is encountered until it either completes or the statement timeout is hit. So if there's long running, conflicting statements, you can see a significant delay in any new data showing up on the replicas. This does not allow the potential bloat caused by the next option.
https://www.postgresql.org/docs/16/runtime-config-replication.html#GUC-MAX-STANDBY-STREAMING-DELAY- Set "hot_standby_feedback" to on. This allows the replica to communicate with the primary about what queries are running so that the primary does not clean up the old rows that cause the replication conflict. Note that this makes it so queries run on the replica would have the same effect on MVCC and VACUUM that running them on the primary would have. This means long running statements will cause VACUUM to skip those rows/tables and may cause additional bloat. This is the preferred method if you do not want replication to be delayed by conflicting statements.
https://www.postgresql.org/docs/16/runtime-config-replication.html#GUC-HOT-STANDBY-FEEDBACKOn Thu, Jul 25, 2024 at 9:30 AM Wasim Devale <wasimd60@gmail.com> wrote:Hi AllThe queries are failing on the standby server. Please note that Primary server loading of data is going on and under load.What setting do we need to configure to not conflict with queries. Below is the error.Failure happened on 'Source' side. ErrorCode=UserErrorUnclassifiedError,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Odbc Operation Failed.,Source=Microsoft.DataTransfer.ClientLibrary.Odbc.OdbcConnector,''Type=System.Data.Odbc.OdbcException,Message=ERROR [40001] [Microsoft][ODBC PostgreSQL Wire Protocol driver][PostgreSQL]ERROR: VERROR; canceling statement due to conflict with recovery(Detail User query might have needed to see row versions that must be removed.; File postgres.c; Line 3133; Routine ProcessInterrupts; ),Source=mspsql27.,'Thanks,Wasim--
> On Jul 25, 2024, at 11:17 AM, Wasim Devale <wasimd60@gmail.com> wrote: > > So how to tackle the above error. ??? The two options to mitigate this, and their side effects, were explained to you. What more do you want?
Our company's production in the evening only and has heavy loading and unloading of data. So I can suggest them not to run the long running queries in bulk for analysis at that peak time and will ask them to use them off peak hours. Correct? Any suggestions from you.
On Fri, 26 Jul, 2024, 12:05 am Scott Ribe, <scott_ribe@elevated-dev.com> wrote:
> On Jul 25, 2024, at 11:17 AM, Wasim Devale <wasimd60@gmail.com> wrote:
>
> So how to tackle the above error.
???
The two options to mitigate this, and their side effects, were explained to you. What more do you want?
Hi Wasim,
I think you might have misinterpreted the explanation given to you. The cancellation of the query on the standby server isn't related to the load on the primary server. It happens that when you run queries on a hot standby, the replication is temporarily paused in order to not modify data the running queries on the standby server need. Once the queries end, replication resumes.
The problem of this behaviour is that the standby server starts to fall behind in relation to the master, a scenario which presents a risky condition: if the master happens to fail while the replica is delayed you end up with data loss.
To avoid having a standby server lagging too far behind Postgres will cancel long running queries on the replica. The parameter max_standby_streaming_delay defines the maximum replication delay the standby will tolerate. Default is 30 seconds. Increase the value to allow for longer running queries on the standby server bearing in mind that you could end up with data loss if the master fails at the wrong moment.
A working alternative is to have one standby server exclusively for replication purposes and another standby for reporting/read-only queries where you can increase the max_standby_streaming_delay to accommodate your long running queries. Of course, this will require additional computing and storage resources.
Cheers,
Fernando.
El jue, 25 jul 2024 a la(s) 3:41 p.m., Wasim Devale (wasimd60@gmail.com) escribió:
Our company's production in the evening only and has heavy loading and unloading of data. So I can suggest them not to run the long running queries in bulk for analysis at that peak time and will ask them to use them off peak hours. Correct? Any suggestions from you.
On Fri, 26 Jul, 2024, 12:05 am Scott Ribe, <scott_ribe@elevated-dev.com> wrote:> On Jul 25, 2024, at 11:17 AM, Wasim Devale <wasimd60@gmail.com> wrote:
>
> So how to tackle the above error.
???
The two options to mitigate this, and their side effects, were explained to you. What more do you want?
On Thu, Jul 25, 2024 at 7:57 PM Fernando Hevia <fhevia@gmail.com> wrote:
Hi Wasim,I think you might have misinterpreted the explanation given to you. The cancellation of the query on the standby server isn't related to the load on the primary server. It happens that when you run queries on a hot standby, the replication is temporarily paused in order to not modify data the running queries on the standby server need. Once the queries end, replication resumes.The problem of this behaviour is that the standby server starts to fall behind in relation to the master, a scenario which presents a risky condition: if the master happens to fail while the replica is delayed you end up with data loss.To avoid having a standby server lagging too far behind Postgres will cancel long running queries on the replica. The parameter max_standby_streaming_delay defines the maximum replication delay the standby will tolerate. Default is 30 seconds. Increase the value to allow for longer running queries on the standby server bearing in mind that you could end up with data loss if the master fails at the wrong moment.A working alternative is to have one standby server exclusively for replication purposes and another standby for reporting/read-only queries where you can increase the max_standby_streaming_delay to accommodate your long running queries. Of course, this will require additional computing and storage resources.Cheers,Fernando.
This is all true, but the hot_standby_feedback option is the way to get around needing to worry about replication delay all together. As far as how it affects VACUUM, it's no different to how running those same queries on the primary would affect it. The reason I mention it is that people think that moving queries to the replica takes away all the effects of running them on the primary. It takes away the load of the query, but there are side effects that still have to be managed. Either of the options mentioned are fine to do as long as you know the consequences of them.
On Thu, 2024-07-25 at 22:59 -0400, Keith Fiske wrote: > On Thu, Jul 25, 2024 at 7:57 PM Fernando Hevia <fhevia@gmail.com> wrote: > > I think you might have misinterpreted the explanation given to you. The cancellation of the > > query on the standby server isn't related to the load on the primary server. It happens that > > when you run queries on a hot standby, the replication is temporarily paused in order to not > > modify data the running queries on the standby server need. Replication (applying the WAL information) is only paused if there is a conflict. Even when replay is paused, the WAL is still replicated to the standby and piles up there. > > Once the queries end, replication resumes. > > The problem of this behaviour is that the standby server starts to fall behind in relation > > to the master, a scenario which presents a risky condition: if the master happens to fail > > while the replica is delayed you end up with data loss. No, because the WAL is replayed. What happens is that promoting the standby will take longer if it has to replay a lot of WAL. > > To avoid having a standby server lagging too far behind Postgres will cancel long running > > queries on the replica. The parameter max_standby_streaming_delay defines the maximum > > replication delay the standby will tolerate. Default is 30 seconds. Increase the value to > > allow for longer running queries on the standby server bearing in mind that you could end > > up with data loss if the master fails at the wrong moment. Yes, increasing "max_standby_streaming_delay" is the correct solution. You can set it to -1 to prevent any queries on the standby from bein cancelled. > > A working alternative is to have one standby server exclusively for replication purposes > > and another standby for reporting/read-only queries where you can increase the > > max_standby_streaming_delay to accommodate your long running queries. Of course, this will > > require additional computing and storage resources. That is good advice. > > > > This is all true, but the hot_standby_feedback option is the way to get around needing to > worry about replication delay all together. No, because there are other kinds of replication conflicts. The most frequent are: - lock conflicts They can occur whenever an ACCESS EXCLUSIVE lock on the primary conflicts with a query on the standby. The most frequent cause is VACUUM truncation (which can be disabled for individual tables). - buffer pin conflicts It depends on the workload if you get them, but you cannot get rid of them. Yours, Laurenz Albe
Thanks everyone for your inputs and solutions.
On Fri, 26 Jul, 2024, 10:38 am Laurenz Albe, <laurenz.albe@cybertec.at> wrote:
On Thu, 2024-07-25 at 22:59 -0400, Keith Fiske wrote:
> On Thu, Jul 25, 2024 at 7:57 PM Fernando Hevia <fhevia@gmail.com> wrote:
> > I think you might have misinterpreted the explanation given to you. The cancellation of the
> > query on the standby server isn't related to the load on the primary server. It happens that
> > when you run queries on a hot standby, the replication is temporarily paused in order to not
> > modify data the running queries on the standby server need.
Replication (applying the WAL information) is only paused if there is a conflict.
Even when replay is paused, the WAL is still replicated to the standby and piles up there.
> > Once the queries end, replication resumes.
> > The problem of this behaviour is that the standby server starts to fall behind in relation
> > to the master, a scenario which presents a risky condition: if the master happens to fail
> > while the replica is delayed you end up with data loss.
No, because the WAL is replayed.
What happens is that promoting the standby will take longer if it has to replay a lot of WAL.
> > To avoid having a standby server lagging too far behind Postgres will cancel long running
> > queries on the replica. The parameter max_standby_streaming_delay defines the maximum
> > replication delay the standby will tolerate. Default is 30 seconds. Increase the value to
> > allow for longer running queries on the standby server bearing in mind that you could end
> > up with data loss if the master fails at the wrong moment.
Yes, increasing "max_standby_streaming_delay" is the correct solution.
You can set it to -1 to prevent any queries on the standby from bein cancelled.
> > A working alternative is to have one standby server exclusively for replication purposes
> > and another standby for reporting/read-only queries where you can increase the
> > max_standby_streaming_delay to accommodate your long running queries. Of course, this will
> > require additional computing and storage resources.
That is good advice.
> > >
> This is all true, but the hot_standby_feedback option is the way to get around needing to
> worry about replication delay all together.
No, because there are other kinds of replication conflicts. The most frequent are:
- lock conflicts
They can occur whenever an ACCESS EXCLUSIVE lock on the primary conflicts with
a query on the standby. The most frequent cause is VACUUM truncation (which can
be disabled for individual tables).
- buffer pin conflicts
It depends on the workload if you get them, but you cannot get rid of them.
Yours,
Laurenz Albe
Hi ,
Set the below parameters on standby node.
max_standby_archive_delay = 300s
max_standby_streaming_delay = 300s
Thanks
Obireddy.G
On Fri, 26 Jul 2024, 11:44 Wasim Devale, <wasimd60@gmail.com> wrote:
Thanks everyone for your inputs and solutions.
On Fri, 26 Jul, 2024, 10:38 am Laurenz Albe, <laurenz.albe@cybertec.at> wrote:On Thu, 2024-07-25 at 22:59 -0400, Keith Fiske wrote:
> On Thu, Jul 25, 2024 at 7:57 PM Fernando Hevia <fhevia@gmail.com> wrote:
> > I think you might have misinterpreted the explanation given to you. The cancellation of the
> > query on the standby server isn't related to the load on the primary server. It happens that
> > when you run queries on a hot standby, the replication is temporarily paused in order to not
> > modify data the running queries on the standby server need.
Replication (applying the WAL information) is only paused if there is a conflict.
Even when replay is paused, the WAL is still replicated to the standby and piles up there.
> > Once the queries end, replication resumes.
> > The problem of this behaviour is that the standby server starts to fall behind in relation
> > to the master, a scenario which presents a risky condition: if the master happens to fail
> > while the replica is delayed you end up with data loss.
No, because the WAL is replayed.
What happens is that promoting the standby will take longer if it has to replay a lot of WAL.
> > To avoid having a standby server lagging too far behind Postgres will cancel long running
> > queries on the replica. The parameter max_standby_streaming_delay defines the maximum
> > replication delay the standby will tolerate. Default is 30 seconds. Increase the value to
> > allow for longer running queries on the standby server bearing in mind that you could end
> > up with data loss if the master fails at the wrong moment.
Yes, increasing "max_standby_streaming_delay" is the correct solution.
You can set it to -1 to prevent any queries on the standby from bein cancelled.
> > A working alternative is to have one standby server exclusively for replication purposes
> > and another standby for reporting/read-only queries where you can increase the
> > max_standby_streaming_delay to accommodate your long running queries. Of course, this will
> > require additional computing and storage resources.
That is good advice.
> > >
> This is all true, but the hot_standby_feedback option is the way to get around needing to
> worry about replication delay all together.
No, because there are other kinds of replication conflicts. The most frequent are:
- lock conflicts
They can occur whenever an ACCESS EXCLUSIVE lock on the primary conflicts with
a query on the standby. The most frequent cause is VACUUM truncation (which can
be disabled for individual tables).
- buffer pin conflicts
It depends on the workload if you get them, but you cannot get rid of them.
Yours,
Laurenz Albe
This is an issue at most place. Don't give out the r/o login to the Replica. The Replica is not a data research /prod sandboxbox, it's a streaming replica for HA failover. If you want a research r/o query box, create one. -----Original Message----- From: Scott Ribe <scott_ribe@elevated-dev.com> Sent: Thursday, July 25, 2024 11:36 AM To: Wasim Devale <wasimd60@gmail.com> Cc: Ron Johnson <ronljohnsonjr@gmail.com>; pgsql-admin <pgsql-admin@postgresql.org> Subject: [EXTERNAL] Re: Queries are failing on standby server > On Jul 25, 2024, at 11:17 AM, Wasim Devale <wasimd60@gmail.com> wrote: > > So how to tackle the above error. ??? The two options to mitigate this, and their side effects, were explained to you. What more do you want? ---------------------------------------------------------------------- CONFIDENTIALITY NOTICE: If you have received this email in error, please immediately notify the sender by e-mail at the addressshown. This email transmission may contain confidential information. This information is intended only for the useof the individual(s) or entity to whom it is intended even if addressed incorrectly. Please delete it from your filesif you are not the intended recipient. Thank you for your compliance. Copyright (c) 2024 Evernorth
> > The problem of this behaviour is that the standby server starts to fall behind in relation
> > to the master, a scenario which presents a risky condition: if the master happens to fail
> > while the replica is delayed you end up with data loss.
> > to the master, a scenario which presents a risky condition: if the master happens to fail
> > while the replica is delayed you end up with data loss.
No, because the WAL is replayed.
What happens is that promoting the standby will take longer if it has to replay a lot of WAL.
I definitely messed that one up. Thanks for the correction Laurenz.
Regards,
Fernando.
El vie, 26 jul 2024 a la(s) 2:08 a.m., Laurenz Albe (laurenz.albe@cybertec.at) escribió:
On Thu, 2024-07-25 at 22:59 -0400, Keith Fiske wrote:
> On Thu, Jul 25, 2024 at 7:57 PM Fernando Hevia <fhevia@gmail.com> wrote:
> > I think you might have misinterpreted the explanation given to you. The cancellation of the
> > query on the standby server isn't related to the load on the primary server. It happens that
> > when you run queries on a hot standby, the replication is temporarily paused in order to not
> > modify data the running queries on the standby server need.
Replication (applying the WAL information) is only paused if there is a conflict.
Even when replay is paused, the WAL is still replicated to the standby and piles up there.
> > Once the queries end, replication resumes.
> > The problem of this behaviour is that the standby server starts to fall behind in relation
> > to the master, a scenario which presents a risky condition: if the master happens to fail
> > while the replica is delayed you end up with data loss.
No, because the WAL is replayed.
What happens is that promoting the standby will take longer if it has to replay a lot of WAL.
> > To avoid having a standby server lagging too far behind Postgres will cancel long running
> > queries on the replica. The parameter max_standby_streaming_delay defines the maximum
> > replication delay the standby will tolerate. Default is 30 seconds. Increase the value to
> > allow for longer running queries on the standby server bearing in mind that you could end
> > up with data loss if the master fails at the wrong moment.
Yes, increasing "max_standby_streaming_delay" is the correct solution.
You can set it to -1 to prevent any queries on the standby from bein cancelled.
> > A working alternative is to have one standby server exclusively for replication purposes
> > and another standby for reporting/read-only queries where you can increase the
> > max_standby_streaming_delay to accommodate your long running queries. Of course, this will
> > require additional computing and storage resources.
That is good advice.
> > >
> This is all true, but the hot_standby_feedback option is the way to get around needing to
> worry about replication delay all together.
No, because there are other kinds of replication conflicts. The most frequent are:
- lock conflicts
They can occur whenever an ACCESS EXCLUSIVE lock on the primary conflicts with
a query on the standby. The most frequent cause is VACUUM truncation (which can
be disabled for individual tables).
- buffer pin conflicts
It depends on the workload if you get them, but you cannot get rid of them.
Yours,
Laurenz Albe