Re: CDC/ETL system on top of logical replication with pgoutput, custom client - Mailing list pgsql-hackers

From Andres Freund
Subject Re: CDC/ETL system on top of logical replication with pgoutput, custom client
Date
Msg-id 20230731203927.6nurjhw5ltisfhc2@awork3.anarazel.de
Whole thread Raw
In response to RE: CDC/ETL system on top of logical replication with pgoutput, custom client  (José Neves <rafaneves3@msn.com>)
Responses RE: CDC/ETL system on top of logical replication with pgoutput, custom client
List pgsql-hackers
Hi,

On 2023-07-31 14:16:22 +0000, José Neves wrote:
> Hi Amit, thanks for the reply.
>
> In our worker (custom pg replication client), we care only about INSERT,
> UPDATE, and DELETE operations, which - sure - may be part of the issue.

That seems likely. Postgres streams out changes in commit order, not in order
of the changes having been made (that'd not work due to rollbacks etc). If you
just disregard transactions entirely, you'll get something bogus after
retries.

You don't need to store the details for each commit in the target system, just
up to which LSN you have processed *commit records*. E.g. if you have received
and safely stored up to commit 0/1000, you need to remember that.


Are you using the 'streaming' mode / option to pgoutput?


> 1. We have no way to match LSN operations with the respective commit, as
> they have unordered offsets.

Not sure what you mean with "unordered offsets"?


> Assuming that all of them were received in order, we would commit all data with the commit message LSN4-4000 as other
eventswould match the transaction start and end LSN interval of it.
 

Logical decoding sends out changes in a deterministic order and you won't see
out of order data when using TCP (the entire connection can obviously fail
though).

Andres



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: pgsql: Fix search_path to a safe value during maintenance operations.
Next
From: Tristen Raab
Date:
Subject: Re: Correct the documentation for work_mem