Re: [GENERAL] via psycopg2 or pg2pg? Move rows from one database toother - Mailing list pgsql-general
From | Thomas Güttler |
---|---|
Subject | Re: [GENERAL] via psycopg2 or pg2pg? Move rows from one database toother |
Date | |
Msg-id | 749c9060-79e5-7c39-f25a-400decf5aa43@thomas-guettler.de Whole thread Raw |
In response to | Re: [GENERAL] Move rows from one database to other (Francisco Olarte <folarte@peoplecall.com>) |
Responses |
Re: [GENERAL] via psycopg2 or pg2pg? Move rows from one database to other
|
List | pgsql-general |
Thank you for explaining the steps of your algorithm. Just one question: How to do the actual transfer of data? I see two solutions: 1, Read the data into a script (via psycopg2 (we love python)) and dump it into a second connection. 2, connect postgres to postgres and transfer the data without a database adapter like psycopg2. Regards, Thomas Am 23.02.2017 um 17:40 schrieb Francisco Olarte: > Thomas: > > On Thu, Feb 23, 2017 at 4:16 PM, Thomas Güttler > <guettliml@thomas-guettler.de> wrote: >> Am 22.02.2017 um 16:00 schrieb Adrian Klaver: >>> only written on a successful transfer. To improve the chances of >>> successful transfer more smaller transfer batches >>> rather then larger transfers. > >> I really need a solid solution. >> You said "... improve the chances of successful transfer ...". This makes me >> nervous. > > I think what Adrian say is you improve the individual transfer time, > if it fails you retry. > >> Delays are no problems, but data loss or duplication is. > > Remember you can never guarantee 'exactly once' without very complex > solutions, I think you can do "at least once" or "at most once". That > means lose or duplicate. > > That being said, IF you have some kind of global, unchanging ( at the > central site ) unique key, you could try the following, using a > holding table in each satellite and assuming you have 'on conflict do > nothing'. > > 1.- Move rows from main to holding table in the satelite, in a single > transaction. This is to let you work with an unmovable set ( as your > process is the only one touching the holding tables ). If there is > some data in holding it is no problem, they are from a previous > crashed transfer. > > 2.- Insert every thing from the holding table in main, using on > conflict do nothing. > > 3.- When everything is commited in main, truncate the satellite holding table. > > If satellite crashes in 1 it will roll back, you have not touched main. > > If you crash in 2 you will find 1 partially full in the next round, > and main will be rolled back ( it's important to not commit until > everything is done in 2, i.e., if you have read problems in the > satellite do no go to 3, just crash and rollback everything ). You can > either do a loop with the current set or append more data, your > choice, does not matter, as you have to reinsert. The on conflict do > nothing in 2 will take care of potential duplicates. > > If you crash in 3 you will transfer the lot again, but the do-nothing > in 2 will eliminate it and 3 will eventually purge it. > > You can optimize on that, but basically you just repeat this until > everything goes fine. I do these ( just with two DBs, not 100 ) and it > works. > > It does a lot of duplicate work, but only on problems, it normally runs smooth. > > If you do not have "on conflict do nothing" ( I do not remember the > versions ) you can use an extra step. Instead of inserting in main in > 2 do 2.a - Copy holding to main ( truncating before hand if copy > present ) and 2.b insert news from the copy, either by using and > anti-join with main or by deleting ( in the same transaction ) the > dupes before inserting. > > Francisco Olarte. > > -- Thomas Guettler http://www.thomas-guettler.de/
pgsql-general by date: