Re: Moving delta data faster - Mailing list pgsql-general
From | Adrian Klaver |
---|---|
Subject | Re: Moving delta data faster |
Date | |
Msg-id | b43139d3-8f8e-4635-a7cb-cab90e5205eb@aklaver.com Whole thread Raw |
In response to | Re: Moving delta data faster (yudhi s <learnerdatabase99@gmail.com>) |
Responses |
Re: Moving delta data faster
|
List | pgsql-general |
On 4/3/24 20:54, yudhi s wrote: > On Thu, Apr 4, 2024 at 2:41 AM Adrian Klaver <adrian.klaver@aklaver.com > <mailto:adrian.klaver@aklaver.com>> wrote: > > On 4/3/24 13:38, yudhi s wrote: > > Hi All, > > It's postgresql database version 15.4. We have a requirement in > which > > we will be initially moving full table data for 3-4 tables, from > source > > database to target(i.e. postgres) . Maximum number of rows will be > > ~10million rows in those tables. Then subsequently these rows > will be > > inserted/updated based on the delta number of rows that got > > inserted/updated in the source database. In some cases these changed > > data can flow multiple times per day to the downstream i.e. postgres > > database and in other cases once daily. > > What is the source database? > > Can it be reached with a FDW?: > > https://wiki.postgresql.org/wiki/Foreign_data_wrappers > <https://wiki.postgresql.org/wiki/Foreign_data_wrappers> > > Can the delta on the source be output as CSV? > > > > Thank you Adrian. > > And one thing i forgot to mention this target postgresql database would > be on AWS RDS whereas the source Oracle databases is on premise. I think > we don't have the FDW extension currently in place but we can get that. > I am just not able to understand clearly though, but do you mean export > the data from source using CSV and do truncate and import on target. And > as these data will be moved through the network won't that cause slowness? > > The source database here is Oracle database. Correct me if wrong, it > looks like foreign data wrapper is like a DB link. Or do you mean > writing a query on the target database (which can be UPSERT or MERGE) > but will be joining the table from the source database through the > DBlink/DDW? But my question was whether we should use UPSERT or MERGE > for comparing and loading the delta records to the target postgresql > database. Want to understand which is more performant , as I see in the > past Merge having performance issues in the past, but not very sure > about that. My motivation was to get some basic information about your setup and what you are trying to achieve. If I understand correctly you have: 1) An Oracle database with tables that you want to copy the complete data from to a Postgres database. For this sort of thing COPY(https://www.postgresql.org/docs/current/sql-copy.html) on the Postgres end using CSV data generated from the source is probably the quickest bulk load method. 2) After the initial load you want to do follow up INSERT/UPDATEs based on a delta of the source tables relative to the initial load. This is still a bit of mystery to me. How are determining the delta: a) On the source end entirely or b) Target relative to source? Also what is the anticipated size of the delta per transfer? Additional information needed: 1) Network distance between source and target? 2) Network capacity? 3) Expected load on both source and target servers from other operations? -- Adrian Klaver adrian.klaver@aklaver.com
pgsql-general by date: