Re: Logical Replication WIP - Mailing list pgsql-hackers
From | Steve Singer |
---|---|
Subject | Re: Logical Replication WIP |
Date | |
Msg-id | 581687C9.2090700@ssinger.info Whole thread Raw |
In response to | Re: Logical Replication WIP (Petr Jelinek <petr@2ndquadrant.com>) |
Responses |
Re: Logical Replication WIP
|
List | pgsql-hackers |
On 10/24/2016 09:22 AM, Petr Jelinek wrote: > Hi, > > attached is updated version of the patch. > > There are quite a few improvements and restructuring, I fixed all the > bugs and basically everything that came up from the reviews and was > agreed on. There are still couple of things missing, ie column type > definition in protocol and some things related to existing data copy. Here are a few things I've noticed so far. +<programlisting> +CREATE SUBSCRIPTION mysub WITH CONNECTION <quote>dbname=foo host=bar user=repuser</quote> PUBLICATION mypub; +</programlisting> + </para> + <para> The documentation above doesn't match the syntax, CONNECTION needs to be in single quotes not double quotes I think you want +<programlisting> +CREATE SUBSCRIPTION mysub WITH CONNECTION 'dbname=foo host=bar user=repuser' PUBLICATION mypub; +</programlisting> + </para> + <para> I am not sure if this is a known issue covered by your comments about data copy but I am still having issues with error reporting on a failed subscription. I created a subscription, dropped the subscription and created a second one. The second subscription isn't active but shows no errors. P: create publication mypub for table public.a; S: create subscription mysub with connection 'dbname=test host=localhost port=5440' publication mypub; P: insert into a(b) values ('t'); S: select * FROM a; a | b ---+--- 1 | t (1 row) Everything is good Then I do S: drop subscription mysub; S: create subscription mysub2 with connection 'dbname=test host=localhost port=5440' publication mypub; P: insert into a(b) values ('f'); S: select * FROM a; a | b ---+--- 1 | t The data doesn't replicate select * FROM pg_stat_subscription; subid | subname | pid | relid | received_lsn | last_msg_send_time | last_msg_receipt_time | latest_end_lsn | latest_end_time -------+---------+-----+-------+--------------+--------------------+-----------------------+----------------+----------------- 16398| mysub2 | | | | | | | (1 row) The only thing in my log is 2016-10-30 15:27:27.038 EDT [6028] NOTICE: dropped replication slot "mysub" on publisher 2016-10-30 15:27:36.072 EDT [6028] NOTICE: created replication slot "mysub2" on publisher 2016-10-30 15:27:36.082 EDT [6028] NOTICE: synchronized table states I'd expect an error in the log or something. However, if I delete everything from the table on the subscriber then the subscription proceeds I think there are still problems with signal handling in the initial sync If I try to drop mysub2 (while the subscription is stuck instead of deleting the data) the drop hangs If I then try to kill the postmaster for the subscriber nothing happens, have to send it a -9 to go away. However once I do that and then restart the postmaster for the subscriber I start to see the duplicate key errors in the log 2016-10-30 16:00:54.635 EDT [7018] ERROR: duplicate key value violates unique constraint "a_pkey" 2016-10-30 16:00:54.635 EDT [7018] DETAIL: Key (a)=(1) already exists. 2016-10-30 16:00:54.635 EDT [7018] CONTEXT: COPY a, line 1 2016-10-30 16:00:54.637 EDT [7007] LOG: worker process: logical replication worker 16400 sync 16387 (PID 7018) exited with exit code 1 I'm not sure why I didn't get those until I restarted the postmaster but it seems to happen whenever I drop a subscription then create a new one. Creating the second subscription from the same psql session as I create/drop the first seems important in reproducing this I am also having issues dropping a second subscription from the same psql session (table a is empty on both nodes to avoid duplicate key errors) S: create subscription sub1 with connection 'host=localhost dbname=test port=5440' publication mypub; S: create subscription sub2 with connection 'host=localhost dbname=test port=5440' publication mypub; S: drop subscription sub1; S: drop subscription sub2; At this point the drop subscription hangs. > > The biggest changes are: > > I added one more prerequisite patch (the first one) which adds ephemeral > slots (or well implements UI on top of the code that was mostly already > there). The ephemeral slots are different in that they go away either on > error or when session is closed. This means the initial data sync does > not have to worry about cleaning up the slots after itself. I think this > will be useful in other places as well (for example basebackup). I > originally wanted to call them temporary slots in the UI but since the > behavior is bit different from temp tables I decided to go with what the > underlying code calls them in UI as well. > > I also split out the libpqwalreceiver rewrite to separate patch which > does just the re-architecture and does not really add new functionality. > And I did the re-architecture bit differently based on the review. > > There is now new executor module in execReplication.c, no new nodes but > several utility commands. I moved there the tuple lookup functions from > apply and also wrote new interfaces for doing inserts/updates/deletes to > a table including index updates and constraints checks and trigger > execution but without the need for the whole nodeModifyTable handling. > > What I also did when rewriting this is implementation of the tuple > lookup also using sequential scan so that we can support replica > identity full properly. This greatly simplified the dependency handling > between pkeys and publications (by removing it completely ;) ). Also > when there is replica identity full and the table has primary key, the > code will use the primary key even though it's not replica identity > index to lookup the row so that users who want to combine the logical > replication with some kind of other system that requires replica > identity full (ie auditing) they still get usable experience. > > The way copy is done was heavily reworked. For one it uses the ephemeral > slots mentioned above. But more importantly there are now new custom > commands anymore. Instead the walsender accepts some SQL, currently > allowed are BEGIN, ROLLBACK, SELECT and COPY. The way that is > implemented is probably not perfect and it could use look from somebody > who knows bison better. How it works is that if the command sent to > walsender starts with one of the above mentioned keywords the walsender > parser passes the whole query back and it's passed then to > exec_simple_query. The main reason why we need BEGIN is so that the COPY > can use the snapshot exported by the slot creation so that there is > synchronization point when there are concurrent writes. This probably > needs more discussion. > > I also tried to keep the naming more consistent so cleaned up all > mentions of "provider" and changed them to "publisher" and also > publications don't mention that they "replicate", they just "publish" > now (that has effect on DDL syntax as well). > > > Some things that were discussed in the reviews that I didn't implement > knowingly include: > > Removal of the Oid in the pg_publication_rel, that's mainly because it > would need significant changes to pg_dump which assumes everything > that's dumped has Oid and it's not something that seems worth it as part > of this patch. > > Also didn't do the outfuncs, it's unclear to me what are the rules there > as the only DDL statement there is CreateStmt atm. > > > There are still few TODOs: > > Type info for columns. My current best idea is to write typeOid and > typemod in the relation message and add another message (type message) > that describes the type which will skip the built-in types (as we can't > really remap those without breaking a lot of software so they seem safe > to skip). I plan to do this soonish barring objections. > > Removal of use of replication origin in the table sync worker. > > Parallelization of the initial copy. And ability to resync (do new copy) > of a table. These two mainly wait for agreement over how the current way > of doing copy should work. > > >
pgsql-hackers by date: