Thread: logical replication initiate via manual pg_dump

logical replication initiate via manual pg_dump

From

Vijaykumar Jain

Date:

10 May 2019, 15:29:52

Hey Guys,

tl;dr, but incase i missed something, i can follow up on this with more details.


I have a setup where i try to upgrade a cluster from pg10 to pg11 via
logical replication with minimum downtime.
its a database that is 500GB with 1 table having 350GB of data (+
bloat) and 100GB of indexes.

now when i triggered logical replication, it took more than 2 days
(and still around 10% remaining) to catch up on the data. the
publisher shows copy table to stdout still running.
i am not sure of the internals, but given a case that in a test prep
with no active connections and DMLs it took around 2 days, does seem
like i am missing something.

on both pg10 and pg11
ram 32GB
cpu 8
SSD
max_wal_size 100GB
checkpoint_timeout 30min
shared_buffers 8GB

on pg10
(copy still running on the huge table)
select (now() - query_start)::interval, query from pg_stat_activity;
2 days 19:03:12.799767 | COPY public.<table> TO STDOUT

this is how disk looks like on pg10 (publisher)
du -sh /var/lib/postgresql/1*/main/{pg_wal*,base*}
78G /var/lib/postgresql/10/main/pg_wal
467G /var/lib/postgresql/10/main/base


on pg11
du -sh /var/lib/postgresql/1*/main/{pg_wal*,base*}
65G /var/lib/postgresql/11/main/pg_wal
417G /var/lib/postgresql/11/main/base

now although this is kind of upgrade involves very little downtime,
but it required around 2x the disk on the original server while the
replication was running, but there was not way to correctly estimate
the disk required for logical replication to finish and move over to
pg11.

---------------------

now
given the above problem,
i tried to read thru with a goal of if i can do an initial sync via
pg_dump and start the restore from there.

https://www.slideshare.net/noriyoshishinoda/pgconfasia-2017-logical-replication-internals-english
https://medium.com/avitotech/recovery-use-cases-for-logical-replication-in-postgresql-10-a1e6bab03072

and setup a small test cluster as to test if i can pg_dump and
pg_restore and then start replication from the restart_lsn of the
primary/publisher.

***
demo lab
(all pg11 for now)
pg1 (primary/publisher on port 3000)
pg2  (hot_standby replica on port 3001)
pg3  (subscriber on port 3002)

***
on pg1 (create some tables and trigger to ensure trigger does not fire
on subscriber)
example=# CREATE TABLE public.company (
example(#     id integer NOT NULL,
example(#     name text NOT NULL,
example(#     age integer NOT NULL,
example(#     address character(50),
example(#     salary real
example(# );
CREATE TABLE
example=# ALTER TABLE ONLY public.company
example-#     ADD CONSTRAINT company_pkey PRIMARY KEY (id);
ALTER TABLE
example=# CREATE TABLE public.audit (
example(#     emp_id integer NOT NULL,
example(#     entry_date text NOT NULL
example(# );
CREATE TABLE
example=# ALTER TABLE ONLY public.audit
example-#     ADD CONSTRAINT audit_pkey PRIMARY KEY (emp_id);
ALTER TABLE
example=# CREATE FUNCTION public.auditlogfunc() RETURNS trigger
example-#     LANGUAGE plpgsql
example-#     AS $$
example$#    BEGIN
example$#       INSERT INTO AUDIT(EMP_ID, ENTRY_DATE) VALUES (new.ID,
current_timestamp);
example$#       RETURN NEW;
example$#    END;
example$# $$;
CREATE FUNCTION
example=# CREATE TRIGGER example_trigger AFTER INSERT ON
public.company FOR EACH ROW EXECUTE PROCEDURE public.auditlogfunc();


***
setup pg2 as hot_standby replica
and dump the schema of pg1 example database on pg3.

***
on pg1 (insert some dummy data)
insert into company select x, x::text, x, 'address-' || x::text, x
from generate_series(1, 1500) x;

***
on pg1
create publication pg1 for all tables;

***
on pg2
verify replica is up and running and data replicated (fine)
example=# select count(1) from company;
 count
-------
  1500
(1 row)


***
on pg3
*take a dump and restore the dump on pg3.

pg_dump -p 3000 -U postgres -Fc --serializable-deferrable
--no-subscriptions --no-publications -d example | pg_restore -p 3002
-U postgres -C -d example

*then create subscriptions to pg1

create subscription pg3 connection 'dbname=example port=3000
user=postgres' publication pg1 with (enabled = false, copy_data =
false);


***
on pg1
*get the last restart_lsn value from pg_replication_slots

select * from pg_replication_slots;
 slot_name | plugin | slot_type | datoid | database | temporary |
active | active_pid | xmin | catalog_xmin | restart_lsn |
confirmed_flush_lsn

-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
 pg2       |        | physical  |        |          | f         | t
  |      22724 |      |              | 13/A8133A68 |


***
and then on pg3

select * from pg_stat_subscription;
 subid | subname | pid | relid | received_lsn | last_msg_send_time |
last_msg_receipt_time | latest_end_lsn | latest_end_time

-------+---------+-----+-------+--------------+--------------------+-----------------------+----------------+-----------------
 17104 | pg3     |     |       |              |                    |
                    |                |
(1 row)

*** manually advance the lsn of remote origin
select pg_replication_origin_advance('pg_ 17104', '13/A8133A68');

select * from pg_replication_origin_status;
 local_id | external_id | remote_lsn  | local_lsn
----------+-------------+-------------+-----------
        1 | pg_17104    | 13/A8133A68 | 0/0


*enable subscription
alter subscription pg3 enable;

select * from pg_stat_subscription;
 subid | subname |  pid  | relid | received_lsn |
last_msg_send_time       |     last_msg_receipt_time     |
latest_end_lsn |        latest_end_time

-------+---------+-------+-------+--------------+-------------------------------+-------------------------------+----------------+-------------------------------
 17104 | pg3     | 31346 |       | 13/A8137700  | 2019-05-10
14:55:08.257756+00 | 2019-05-10 14:55:08.257806+00 | 13/A8137700    |
2019-05-10 14:55:08.257756+00

(1 row)

example=# select count(1) from company;
 count
-------
  1500
(1 row)



and i see i am able to carry on with the logical replication from the
publisher via the dump.

is this the right way, i have not been tinkering with lsn and generally used to
alter subscription pg3 refresh publication with (copy_data);

but for some reason this takes forever, hence the pg_dump and restore.


questions:
when a copy is triggered on the publisher during the initial sync, is
it similar to pg_dump but with -F plain  and hence the network i/o is
huge ?
if true, is there an option to compress the initial sync like the
pg_dump -Fc format and apply if on the subscriber like pg_restore to
make the network i/o fast?

finally,
am i totally doing things wrong :) ? and this should not be done.





Regards,
Vijay

Re: logical replication initiate via manual pg_dump

From

Vijaykumar Jain

Date:

13 May 2019, 09:56:30

update:
i dropped all the indexes on the dest db tables, but the primary key
and unique constraints from base tables in logical replication to
ensure replica identity.

and retriggered the logical replication from scratch.
it completed in 4 hours. (which otherwise ran for 4 days)

so i guess it was the "too many indexes" slowed down copy way too much.
anyways, i got to explore pg_replication_origin_advance which was a cool thing.


Regards,
Vijay

On Fri, May 10, 2019 at 8:59 PM Vijaykumar Jain <vjain@opentable.com> wrote:
>
> Hey Guys,
>
> tl;dr, but incase i missed something, i can follow up on this with more details.
>
>
> I have a setup where i try to upgrade a cluster from pg10 to pg11 via
> logical replication with minimum downtime.
> its a database that is 500GB with 1 table having 350GB of data (+
> bloat) and 100GB of indexes.
>
> now when i triggered logical replication, it took more than 2 days
> (and still around 10% remaining) to catch up on the data. the
> publisher shows copy table to stdout still running.
> i am not sure of the internals, but given a case that in a test prep
> with no active connections and DMLs it took around 2 days, does seem
> like i am missing something.
>
> on both pg10 and pg11
> ram 32GB
> cpu 8
> SSD
> max_wal_size 100GB
> checkpoint_timeout 30min
> shared_buffers 8GB
>
> on pg10
> (copy still running on the huge table)
> select (now() - query_start)::interval, query from pg_stat_activity;
> 2 days 19:03:12.799767 | COPY public.<table> TO STDOUT
>
> this is how disk looks like on pg10 (publisher)
> du -sh /var/lib/postgresql/1*/main/{pg_wal*,base*}
> 78G /var/lib/postgresql/10/main/pg_wal
> 467G /var/lib/postgresql/10/main/base
>
>
> on pg11
> du -sh /var/lib/postgresql/1*/main/{pg_wal*,base*}
> 65G /var/lib/postgresql/11/main/pg_wal
> 417G /var/lib/postgresql/11/main/base
>
> now although this is kind of upgrade involves very little downtime,
> but it required around 2x the disk on the original server while the
> replication was running, but there was not way to correctly estimate
> the disk required for logical replication to finish and move over to
> pg11.
>
> ---------------------
>
> now
> given the above problem,
> i tried to read thru with a goal of if i can do an initial sync via
> pg_dump and start the restore from there.
>
> https://www.slideshare.net/noriyoshishinoda/pgconfasia-2017-logical-replication-internals-english
> https://medium.com/avitotech/recovery-use-cases-for-logical-replication-in-postgresql-10-a1e6bab03072
>
> and setup a small test cluster as to test if i can pg_dump and
> pg_restore and then start replication from the restart_lsn of the
> primary/publisher.
>
> ***
> demo lab
> (all pg11 for now)
> pg1 (primary/publisher on port 3000)
> pg2  (hot_standby replica on port 3001)
> pg3  (subscriber on port 3002)
>
> ***
> on pg1 (create some tables and trigger to ensure trigger does not fire
> on subscriber)
> example=# CREATE TABLE public.company (
> example(#     id integer NOT NULL,
> example(#     name text NOT NULL,
> example(#     age integer NOT NULL,
> example(#     address character(50),
> example(#     salary real
> example(# );
> CREATE TABLE
> example=# ALTER TABLE ONLY public.company
> example-#     ADD CONSTRAINT company_pkey PRIMARY KEY (id);
> ALTER TABLE
> example=# CREATE TABLE public.audit (
> example(#     emp_id integer NOT NULL,
> example(#     entry_date text NOT NULL
> example(# );
> CREATE TABLE
> example=# ALTER TABLE ONLY public.audit
> example-#     ADD CONSTRAINT audit_pkey PRIMARY KEY (emp_id);
> ALTER TABLE
> example=# CREATE FUNCTION public.auditlogfunc() RETURNS trigger
> example-#     LANGUAGE plpgsql
> example-#     AS $$
> example$#    BEGIN
> example$#       INSERT INTO AUDIT(EMP_ID, ENTRY_DATE) VALUES (new.ID,
> current_timestamp);
> example$#       RETURN NEW;
> example$#    END;
> example$# $$;
> CREATE FUNCTION
> example=# CREATE TRIGGER example_trigger AFTER INSERT ON
> public.company FOR EACH ROW EXECUTE PROCEDURE public.auditlogfunc();
>
>
> ***
> setup pg2 as hot_standby replica
> and dump the schema of pg1 example database on pg3.
>
> ***
> on pg1 (insert some dummy data)
> insert into company select x, x::text, x, 'address-' || x::text, x
> from generate_series(1, 1500) x;
>
> ***
> on pg1
> create publication pg1 for all tables;
>
> ***
> on pg2
> verify replica is up and running and data replicated (fine)
> example=# select count(1) from company;
>  count
> -------
>   1500
> (1 row)
>
>
> ***
> on pg3
> *take a dump and restore the dump on pg3.
>
> pg_dump -p 3000 -U postgres -Fc --serializable-deferrable
> --no-subscriptions --no-publications -d example | pg_restore -p 3002
> -U postgres -C -d example
>
> *then create subscriptions to pg1
>
> create subscription pg3 connection 'dbname=example port=3000
> user=postgres' publication pg1 with (enabled = false, copy_data =
> false);
>
>
> ***
> on pg1
> *get the last restart_lsn value from pg_replication_slots
>
> select * from pg_replication_slots;
>  slot_name | plugin | slot_type | datoid | database | temporary |
> active | active_pid | xmin | catalog_xmin | restart_lsn |
> confirmed_flush_lsn
>
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
>  pg2       |        | physical  |        |          | f         | t
>   |      22724 |      |              | 13/A8133A68 |
>
>
> ***
> and then on pg3
>
> select * from pg_stat_subscription;
>  subid | subname | pid | relid | received_lsn | last_msg_send_time |
> last_msg_receipt_time | latest_end_lsn | latest_end_time
>
-------+---------+-----+-------+--------------+--------------------+-----------------------+----------------+-----------------
>  17104 | pg3     |     |       |              |                    |
>                     |                |
> (1 row)
>
> *** manually advance the lsn of remote origin
> select pg_replication_origin_advance('pg_ 17104', '13/A8133A68');
>
> select * from pg_replication_origin_status;
>  local_id | external_id | remote_lsn  | local_lsn
> ----------+-------------+-------------+-----------
>         1 | pg_17104    | 13/A8133A68 | 0/0
>
>
> *enable subscription
> alter subscription pg3 enable;
>
> select * from pg_stat_subscription;
>  subid | subname |  pid  | relid | received_lsn |
> last_msg_send_time       |     last_msg_receipt_time     |
> latest_end_lsn |        latest_end_time
>
-------+---------+-------+-------+--------------+-------------------------------+-------------------------------+----------------+-------------------------------
>  17104 | pg3     | 31346 |       | 13/A8137700  | 2019-05-10
> 14:55:08.257756+00 | 2019-05-10 14:55:08.257806+00 | 13/A8137700    |
> 2019-05-10 14:55:08.257756+00
>
> (1 row)
>
> example=# select count(1) from company;
>  count
> -------
>   1500
> (1 row)
>
>
>
> and i see i am able to carry on with the logical replication from the
> publisher via the dump.
>
> is this the right way, i have not been tinkering with lsn and generally used to
> alter subscription pg3 refresh publication with (copy_data);
>
> but for some reason this takes forever, hence the pg_dump and restore.
>
>
> questions:
> when a copy is triggered on the publisher during the initial sync, is
> it similar to pg_dump but with -F plain  and hence the network i/o is
> huge ?
> if true, is there an option to compress the initial sync like the
> pg_dump -Fc format and apply if on the subscriber like pg_restore to
> make the network i/o fast?
>
> finally,
> am i totally doing things wrong :) ? and this should not be done.
>
>
>
>
>
> Regards,
> Vijay