Re: [HACKERS] snapbuild woes - Mailing list pgsql-hackers
From: Erik Rijkers
Subject: Re: [HACKERS] snapbuild woes
Date:
Msg-id: 84a48a73304ed5363433ec4b9fbb4438@xs4all.nl
In response to: Re: [HACKERS] snapbuild woes (Petr Jelinek <petr.jelinek@2ndquadrant.com>)
Responses: Re: [HACKERS] snapbuild woes
List: pgsql-hackers
On 2017-05-09 11:50, Petr Jelinek wrote:
> I rebased the above mentioned patch to apply to the patches Andres sent,
> if you could try to add it on top of what you have and check if it still
> fails, that would be helpful.

It still fails.

With these patches:
- 0001-WIP-Fix-off-by-one-around-GetLastImportantRecPtr.patch
- 00002-WIP-Possibly-more-robust-snapbuild-approach.patch
- fix-statistics-reporting-in-logical-replication-work.patch
- Skip-unnecessary-snapshot-builds.patch

built again on top of 44c528810a1. (I had to add the 'fix-statistics-rep*'
patch because without it I immediately got that assertion failure again.)

As always, most runs succeed (especially on this large 192GB, 16-core
server). Attached is an output file of a number of runs of my
pgbench_derail2.sh test. Overall result:

-- out_20170509_1635.txt
3 -- pgbench -c 64 -j 8 -T 900 -P 180 -n   -- scale 25
2 -- All is well.
1 -- Not good, but breaking out of wait  (21 times no change)

I broke it off after iteration 4, so iteration 5 never ran, and
iteration 1 failed due to a mistake in the harness (something stupid I
did) - not interesting.
Iteration 2 succeeds (eventually has 'replica ok').
Iteration 3 succeeds (eventually has 'replica ok').
Iteration 4 fails.

Just after 'alter subscription sub1 enable' I caught (as is usual)
pg_stat_replication.state as 'catchup'. So far so good. After the
15-minute pgbench run, pg_stat_replication has only 2 'startup' lines
(and none 'catchup' or 'streaming'):

 port | pg_stat_replication |  pid   |     wal     | replay_loc | diff | ?column? |  state  |   app   | sync_state
 6972 | pg_stat_replication | 108349 | 19/8FBCC248 |            |      |          | startup | derail2 | async
 6972 | pg_stat_replication | 108351 | 19/8FBCC248 |            |      |          | startup | derail2 | async

(that's from:
  select $port1 as port, 'pg_stat_replication' as pg_stat_replication,
         pid,
         pg_current_wal_location() wal,
         replay_location replay_loc,
         pg_current_wal_location() - replay_location as diff,
         pg_current_wal_location() <= replay_location,
         state, application_name as app, sync_state
  from pg_stat_replication
)

This remains in this state for as long as my test program lets it
(i.e., 20 x 30s, or something like that, and then the loop is exited);
in the output file it says: 'Not good, but breaking out of wait'.

Below is the accompanying ps (with the 2 'deranged senders', as Jeff
Janes would surely call them):

UID        PID    PPID  C STIME TTY    STAT TIME CMD
rijkers  107147       1 0 17:11 pts/35 S+   0:00 /var/data1/pg_stuff/pg_installations/pgsql.logical_replication2/bin/postgres -D /var/data1/pg_stuff/pg_installations
rijkers  107149  107147 0 17:11 ?      Ss   0:00  \_ postgres: logger process
rijkers  107299  107147 0 17:11 ?      Ss   0:01  \_ postgres: checkpointer process
rijkers  107300  107147 0 17:11 ?      Ss   0:00  \_ postgres: writer process
rijkers  107301  107147 0 17:11 ?      Ss   0:00  \_ postgres: wal writer process
rijkers  107302  107147 0 17:11 ?      Ss   0:00  \_ postgres: autovacuum launcher process
rijkers  107303  107147 0 17:11 ?      Ss   0:00  \_ postgres: stats collector process
rijkers  107304  107147 0 17:11 ?      Ss   0:00  \_ postgres: bgworker: logical replication launcher
rijkers  108348  107147 0 17:12 ?      Ss   0:01  \_ postgres: bgworker: logical replication worker for subscription 70310 sync 70293
rijkers  108350  107147 0 17:12 ?      Ss   0:00  \_ postgres: bgworker: logical replication worker for subscription 70310 sync 70298
rijkers  107145       1 0 17:11 pts/35 S+   0:02 /var/data1/pg_stuff/pg_installations/pgsql.logical_replication/bin/postgres -D /var/data1/pg_stuff/pg_installations
rijkers  107151  107145 0 17:11 ?      Ss   0:00  \_ postgres: logger process
rijkers  107160  107145 0 17:11 ?      Ss   0:08  \_ postgres: checkpointer process
rijkers  107161  107145 0 17:11 ?      Ss   0:07  \_ postgres: writer process
rijkers  107162  107145 0 17:11 ?      Ss   0:02  \_ postgres: wal writer process
rijkers  107163  107145 0 17:11 ?      Ss   0:00  \_ postgres: autovacuum launcher process
rijkers  107164  107145 0 17:11 ?      Ss   0:02  \_ postgres: stats collector process
rijkers  107165  107145 0 17:11 ?      Ss   0:00  \_ postgres: bgworker: logical replication launcher
rijkers  108349  107145 0 17:12 ?      Ss   0:27  \_ postgres: wal sender process rijkers [local] idle
rijkers  108351  107145 0 17:12 ?      Ss   0:26  \_ postgres: wal sender process rijkers [local] idle

I have had no time to add (or view) any CPU info.

Erik Rijkers

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
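[Editor's note] The "20 x 30s" wait loop described above presumably looks something like the sketch below. This is a hypothetical reconstruction, not the actual pgbench_derail2.sh harness; the function name, parameters, and the idea of passing the state-check as a command are all assumptions. Only the retry budget, the 'replica ok' and 'Not good, but breaking out of wait' messages, and the pg_stat_replication check come from the post itself.

```shell
# Hypothetical sketch of the harness wait loop: poll until every
# walsender row in pg_stat_replication reports 'streaming', giving up
# after a fixed number of tries (the post suggests 20 tries x 30s).
#
# check_cmd: a command that prints one replication state per line.
wait_for_streaming() {
  check_cmd=$1
  tries=${2:-20}
  pause=${3:-30}
  while [ "$tries" -gt 0 ]; do
    states=$("$check_cmd")
    # Success only if rows exist and no row is still 'startup'/'catchup'.
    if [ -n "$states" ] && ! printf '%s\n' "$states" | grep -qv '^streaming$'; then
      echo "replica ok"
      return 0
    fi
    tries=$((tries - 1))
    sleep "$pause"
  done
  echo "Not good, but breaking out of wait"
  return 1
}

# Real usage would wrap psql on the publisher port (6972 in the ps/table
# output above), e.g.:
#   check_states() { psql -p 6972 -qtAX -c "select state from pg_stat_replication"; }
#   wait_for_streaming check_states 20 30
```
In the failing iteration 4, such a loop would see 'startup' on every poll, exhaust its 21 attempts without a change, and print the "breaking out of wait" line seen in the output file.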