Re: [HACKERS] Logical replication existing data copy - Mailing list pgsql-hackers
From | Peter Eisentraut |
---|---|
Subject | Re: [HACKERS] Logical replication existing data copy |
Date | |
Msg-id | 88c743e3-2c16-1eaf-bb35-ef60afb16f52@2ndquadrant.com Whole thread Raw |
In response to | [HACKERS] Logical replication existing data copy (Petr Jelinek <petr.jelinek@2ndquadrant.com>) |
Responses |
Re: [HACKERS] Logical replication existing data copy
|
List | pgsql-hackers |
On 12/19/16 4:30 AM, Petr Jelinek wrote: > This patch implements data synchronization for the logical replication. > It works both for initial setup of subscription as well as for tables > added later to replication when the subscription is already active. First detailed read-through. General structure makes sense. Comments on some details: No catalogs.sgml documentation for pg_subscription_rel. When that is added, I would emphasize that entries are only added when relations are first encountered, not immediately when a table is added to a publication. So it is unlike pg_publication_rel in that respect. Rename max_subscription_sync_workers -> max_sync_workers_per_subscription (similar to max_parallel_workers_per_gather and others) About the changes to COPY: It took me a while to get clarity on the direction of things. Is the read_cb reading the table, or the data? You are copying data produced by a function into a table, so produce_cb or data_source_cb could be clearer. SetSubscriptionRelState(): This is doing an "upsert", so I don't think a RowExclusiveLock is enough, or it needs to be able to retry on concurrent changes. I suppose there shouldn't be more than one concurrent change per sub/rel pair, but that would need to be explained there. SetSubscriptionRelState(): The memset(values, 0, sizeof(values)) is kind of in an odd place. Possibly not actually needed. GetSubscriptionRelState(): Prefer error messages that identify the objects by name. (Also subid should be %u.) GetSubscriptionRelationsFilter(): The state and filter arguments are kind of weird. And there are only two callers, so all this complexity is not even used. Perhaps, if state == SUBREL_STATE_UNKNOWN, then return everything, else return exactly the state specified. The case of everything-but-the-state-specified does not appear to be needed. GetSubscriptionRelationsFilter(): RowExclusiveLock is probably too much This patch adds the fourth definition of oid_cmp() (also known as oidComparator()) and the 12th definition of atooid(). Let's collect those in a central place. I'm sending a separate patch for that. (No need to change here until that is resolved, of course.) AlterSubscription_refresh(): Put the if (wrconn == NULL) case first and error out, and then put the rest of the code into the main function block, saving one level of indentation. AlterSubscription_refresh(): remote_table_oids isn't really the remote OIDs of the tables but the local OIDs of the remote tables. Consider clearer variable naming in that function. interesting_relation(): very generic name, maybe should_apply_changes_for_rel(). Also the comment mentions a "parallel worker" -- maybe better a "separate worker process running in parallel", since a parallel worker is something different. LogicalRepApplyLoop() changed to non-static, but not used anywhere else. process_syncing_tables_*(): Function names and associated comments are not very clear (process what?, handle what?). In process_syncing_tables_apply(), it says that logicalrep_worker_count() counts the apply worker as well, but what happens if the apply worker has temporarily disappeared? Since logicalrep_worker_count() is only used in this one place, maybe have it count just the sync workers and rename it slightly. Changed some LOG messages to DEBUG, not clear what the purpose there is. check_max_subscription_sync_workers(): Same problem as discussed previously, checking a GUC setting against another GUC setting like this doesn't work. Just skip it and check at run time. wait_for_sync_status_change(): Comment is wrong/misleading: It doesn't wait until it matches any specific state, it just waits for any state change. LogicalRepSyncTableStart(): The code that assembles the slot name needs to be aware of NAMEDATALEN. (Maybe at least throw in a static assert that NAMEDATALEN>=64.) -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: