BUG #16129: Segfault in tts_virtual_materialize in logical replication worker - Mailing list pgsql-bugs
From:      PG Bug reporting form
Subject:   BUG #16129: Segfault in tts_virtual_materialize in logical replication worker
Msg-id:    16129-a0c0f48e71741e5f@postgresql.org
Responses: Re: BUG #16129: Segfault in tts_virtual_materialize in logical replication worker
List:      pgsql-bugs
The following bug has been logged on the website:

Bug reference:      16129
Logged by:          Ondrej Jirman
Email address:      ienieghapheoghaiwida@xff.cz
PostgreSQL version: 12.1
Operating system:   Arch Linux

Description:

Hello,

I've upgraded my main PostgreSQL cluster from 11.5 to 12.1 via the pg_dumpall method, and after a while I started getting segfaults in a logical replication worker. My setup is fairly vanilla; the non-default options are:

shared_buffers = 256MB
work_mem = 512MB
temp_buffers = 64MB
maintenance_work_mem = 4GB
effective_cache_size = 16GB
max_logical_replication_workers = 30
max_replication_slots = 30
max_worker_processes = 30
wal_level = logical

I have several databases that I subscribe to from this database cluster using logical replication. Replication of one of my databases (running on an ARMv7 machine) started segfaulting on the subscriber side (x86_64) like this:

#0  0x00007fc259739917 in __memmove_sse2_unaligned_erms () from /usr/lib/libc.so.6
#1  0x000055d033e93d44 in memcpy (__len=620701425, __src=<optimized out>, __dest=0x55d0356da804) at /usr/include/bits/string_fortified.h:34
#2  tts_virtual_materialize (slot=0x55d0356da3b8) at execTuples.c:235
#3  0x000055d033e94d32 in ExecFetchSlotHeapTuple (slot=slot@entry=0x55d0356da3b8, materialize=materialize@entry=true, shouldFree=shouldFree@entry=0x7fff0e7cf387) at execTuples.c:1624
#4  0x000055d033d272a1 in heapam_tuple_update (relation=0x7fc24573a228, otid=0x55d0356dac38, slot=0x55d0356da3b8, cid=0, snapshot=<optimized out>, crosscheck=0x0, wait=true, tmfd=0x7fff0e7cf400, lockmode=0x7fff0e7cf3fc, update_indexes=0x7fff0e7cf457) at heapam_handler.c:325
#5  0x000055d033d537cd in table_tuple_update (update_indexes=0x7fff0e7cf457, lockmode=0x7fff0e7cf3fc, tmfd=0x7fff0e7cf400, wait=true, crosscheck=0x0, snapshot=0x0, cid=<optimized out>, slot=0x55d0356da3b8, otid=0x55d0356dac38, rel=0x7fc24573a228) at ../../../../src/include/access/tableam.h:1261
#6  simple_table_tuple_update (rel=rel@entry=0x7fc24573a228, otid=otid@entry=0x55d0356dac38, slot=slot@entry=0x55d0356da3b8, snapshot=0x0, update_indexes=update_indexes@entry=0x7fff0e7cf457) at tableam.c:325
#7  0x000055d033e914c4 in ExecSimpleRelationUpdate (estate=estate@entry=0x55d03572cff0, epqstate=epqstate@entry=0x7fff0e7cf4b0, searchslot=searchslot@entry=0x55d0356dac08, slot=slot@entry=0x55d0356da3b8) at execReplication.c:493
#8  0x000055d033f9158e in apply_handle_update (s=s@entry=0x7fff0e7d6a80) at worker.c:750
#9  0x000055d033f91ab0 in apply_dispatch (s=s@entry=0x7fff0e7d6a80) at worker.c:968
#10 0x000055d033f9206e in LogicalRepApplyLoop (last_received=11741462160) at worker.c:1176
#11 0x000055d033f927c4 in ApplyWorkerMain (main_arg=<optimized out>) at worker.c:1734
#12 0x000055d033f601d5 in StartBackgroundWorker () at bgworker.c:834
#13 0x000055d033f6e818 in do_start_bgworker (rw=rw@entry=0x55d035620780) at postmaster.c:5770
#14 0x000055d033f6e9a4 in maybe_start_bgworkers () at postmaster.c:5996
#15 0x000055d033f6f216 in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:5167
#16 <signal handler called>
#17 0x00007fc259787e2a in select () from /usr/lib/libc.so.6
#18 0x000055d033f6fb37 in ServerLoop () at postmaster.c:1668
#19 0x000055d033f70c5a in PostmasterMain (argc=3, argv=<optimized out>) at postmaster.c:1377
#20 0x000055d033ed9731 in main (argc=3, argv=0x55d0355f3260) at main.c:228
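The memcpy() in frame #1 is what looks suspicious to me: it is being asked to copy __len=620701425 bytes (almost 600 MB) while materializing a virtual tuple slot during apply of an UPDATE, which I read as the length word of one of the (presumably variable-length) attribute values in the slot containing garbage. Just to illustrate what I mean, here is a minimal, self-contained sketch of that failure mode; the names (toy_varlena, toy_materialize) are made up, and it only mimics the general shape of a materialize-style copy loop, it is not the actual execTuples.c code:

/*
 * NOT PostgreSQL code -- a toy model of the failure mode: a
 * variable-length value carries its own length word, and a
 * materialize-style copy trusts that length when it memcpy()s the
 * value into freshly allocated memory.
 */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Toy "varlena"-style value: 4-byte length word followed by the payload. */
typedef struct
{
	uint32_t	len;		/* total size in bytes, including this header */
	char		data[];		/* payload */
} toy_varlena;

/* Copy a value into fresh memory, trusting its length word. */
static toy_varlena *
toy_materialize(const toy_varlena *src)
{
	toy_varlena *dst = malloc(src->len);

	if (dst == NULL)
		abort();
	/* If src->len is garbage, this reads far past the source allocation. */
	memcpy(dst, src, src->len);
	return dst;
}

int
main(void)
{
	const char	payload[] = "hello";
	uint32_t	len = sizeof(uint32_t) + sizeof(payload);
	toy_varlena *good = malloc(len);
	toy_varlena *copy;

	good->len = len;
	memcpy(good->data, payload, sizeof(payload));

	copy = toy_materialize(good);	/* fine: the length word is sane */
	printf("copied %u bytes: %s\n", copy->len, copy->data);

	/*
	 * Corrupt the length word the way frame #1 hints at: re-running
	 * toy_materialize(good) now would make memcpy() read ~600 MB past a
	 * 10-byte allocation and almost certainly segfault, so the call is
	 * left commented out.
	 */
	good->len = 620701425;
	/* copy = toy_materialize(good); */

	free(copy);
	free(good);
	return 0;
}

Under that assumption, the open question is whether the bogus length comes from the data streamed by the walsender or from something on the subscriber side; I can't tell from the backtrace alone.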
Subscriber:

2019-11-20 23:22:50.315 CET [1480050] LOG: data stream from publisher has ended
2019-11-20 23:22:50.315 CET [1480050] ERROR: could not send end-of-streaming message to primary: no COPY in progress
2019-11-20 23:22:50.318 CET [178527] LOG: background worker "logical replication worker" (PID 1480050) exited with exit code 1
2019-11-20 23:22:50.320 CET [1480051] LOG: data stream from publisher has ended
2019-11-20 23:22:50.320 CET [1480051] ERROR: could not send end-of-streaming message to primary: no COPY in progress
2019-11-20 23:22:50.321 CET [178527] LOG: background worker "logical replication worker" (PID 1480051) exited with exit code 1
2019-11-20 23:22:50.321 CET [2195905] LOG: logical replication apply worker for subscription "l5_hometv" has started
2019-11-20 23:22:50.325 CET [1480052] LOG: data stream from publisher has ended
2019-11-20 23:22:50.325 CET [1480052] ERROR: could not send end-of-streaming message to primary: no COPY in progress
2019-11-20 23:22:50.326 CET [178527] LOG: background worker "logical replication worker" (PID 1480052) exited with exit code 1
2019-11-20 23:22:50.395 CET [2195905] ERROR: could not connect to the publisher: FATAL: the database system is shutting down
FATAL: the database system is shutting down
2019-11-20 23:22:50.396 CET [178527] LOG: background worker "logical replication worker" (PID 2195905) exited with exit code 1
2019-11-20 23:22:55.401 CET [2195919] LOG: logical replication apply worker for subscription "l5_hometv" has started
2019-11-20 23:22:55.411 CET [2195920] LOG: logical replication apply worker for subscription "l5_ledger" has started
2019-11-20 23:22:55.421 CET [2195921] LOG: logical replication apply worker for subscription "l5_news" has started
2019-11-20 23:28:52.488 CET [178527] LOG: background worker "logical replication worker" (PID 2195919) was terminated by signal 11: Segmentation fault
2019-11-20 23:28:52.488 CET [178527] LOG: terminating any other active server processes
2019-11-20 23:28:52.488 CET [2195470] WARNING: terminating connection because of crash of another server process
2019-11-20 23:28:52.488 CET [2195470] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another >
2019-11-20 23:28:52.488 CET [2195470] HINT: In a moment you should be able to reconnect to the database and repeat your command.

Subscriber logs are above; publisher logs follow below. I think the initial segfault on the subscriber was triggered by a restart of the publisher machine. From that point forward, the subscriber segfaults on every connection to the publisher. It may or may not be related to the upgrade, but the code in execTuples.c changed quite a bit in 12.x, so it may be.

Publisher:

Nov 20 23:29:02 l5 postgres[355]: 2019-11-20 23:29:02.983 CET [466] LOG: unexpected EOF on standby connection
Nov 20 23:29:02 l5 postgres[355]: 2019-11-20 23:29:02.983 CET [467] LOG: unexpected EOF on standby connection
Nov 20 23:29:02 l5 postgres[355]: 2019-11-20 23:29:02.982 CET [466] LOG: could not receive data from client: Connection reset by peer
Nov 20 23:29:02 l5 postgres[355]: 2019-11-20 23:29:02.982 CET [467] LOG: could not receive data from client: Connection reset by peer
Nov 20 23:29:02 l5 postgres[355]: 2019-11-20 23:29:02.980 CET [465] LOG: unexpected EOF on standby connection
Nov 20 23:29:02 l5 postgres[355]: 2019-11-20 23:29:02.980 CET [465] LOG: could not receive data from client: Connection reset by peer
Nov 20 23:28:54 l5 postgres[355]: 2019-11-20 23:28:54.588 CET [467] DETAIL: There are no running transactions.
Nov 20 23:28:54 l5 postgres[355]: 2019-11-20 23:28:54.588 CET [467] LOG: logical decoding found consistent point at 2/BBD880E8
Nov 20 23:28:54 l5 postgres[355]: 2019-11-20 23:28:54.587 CET [467] DETAIL: Streaming transactions committing after 2/BBD88120, reading WAL from 2/BBD880E8.
Nov 20 23:28:54 l5 postgres[355]: 2019-11-20 23:28:54.587 CET [467] LOG: starting logical decoding for slot "l5_news"
Nov 20 23:28:54 l5 postgres[355]: 2019-11-20 23:28:54.579 CET [466] DETAIL: There are no running transactions.
Nov 20 23:28:54 l5 postgres[355]: 2019-11-20 23:28:54.579 CET [466] LOG: logical decoding found consistent point at 2/BBD880E8
Nov 20 23:28:54 l5 postgres[355]: 2019-11-20 23:28:54.578 CET [466] DETAIL: Streaming transactions committing after 2/BBD88120, reading WAL from 2/BBD880E8.
Nov 20 23:28:54 l5 postgres[355]: 2019-11-20 23:28:54.578 CET [466] LOG: starting logical decoding for slot "l5_ledger"
Nov 20 23:28:54 l5 postgres[355]: 2019-11-20 23:28:54.572 CET [465] DETAIL: There are no running transactions.
Nov 20 23:28:54 l5 postgres[355]: 2019-11-20 23:28:54.572 CET [465] LOG: logical decoding found consistent point at 2/BBD86E68
Nov 20 23:28:54 l5 postgres[355]: 2019-11-20 23:28:54.570 CET [465] DETAIL: Streaming transactions committing after 2/BBD86EA0, reading WAL from 2/BBD86E68.
Nov 20 23:28:54 l5 postgres[355]: 2019-11-20 23:28:54.570 CET [465] LOG: starting logical decoding for slot "l5_hometv"
Nov 20 23:28:52 l5 postgres[355]: 2019-11-20 23:28:52.495 CET [385] LOG: unexpected EOF on standby connection
Nov 20 23:28:52 l5 postgres[355]: 2019-11-20 23:28:52.495 CET [385] LOG: could not receive data from client: Connection reset by peer
Nov 20 23:28:52 l5 postgres[355]: 2019-11-20 23:28:52.491 CET [384] LOG: unexpected EOF on standby connection
Nov 20 23:28:52 l5 postgres[355]: 2019-11-20 23:28:52.491 CET [384] LOG: could not receive data from client: Connection reset by peer
Nov 20 23:28:52 l5 postgres[355]: 2019-11-20 23:28:52.489 CET [386] LOG: unexpected EOF on standby connection
Nov 20 23:28:52 l5 postgres[355]: 2019-11-20 23:28:52.489 CET [386] LOG: could not receive data from client: Connection reset by peer
Nov 20 23:22:58 l5 postgres[355]: 2019-11-20 23:22:58.107 CET [386] DETAIL: There are no running transactions.
Nov 20 23:22:58 l5 postgres[355]: 2019-11-20 23:22:58.107 CET [386] LOG: logical decoding found consistent point at 2/BBD86CE0
Nov 20 23:22:58 l5 postgres[355]: 2019-11-20 23:22:58.107 CET [384] DETAIL: There are no running transactions.
Nov 20 23:22:58 l5 postgres[355]: 2019-11-20 23:22:58.107 CET [384] LOG: logical decoding found consistent point at 2/BBD86CE0
Nov 20 23:22:58 l5 postgres[355]: 2019-11-20 23:22:58.107 CET [385] DETAIL: There are no running transactions.
Nov 20 23:22:58 l5 postgres[355]: 2019-11-20 23:22:58.107 CET [385] LOG: logical decoding found consistent point at 2/BBD86CE0
Nov 20 23:22:58 l5 postgres[355]: 2019-11-20 23:22:58.106 CET [386] DETAIL: Streaming transactions committing after 2/BBD86CE0, reading WAL from 2/BBD86CE0.
Nov 20 23:22:58 l5 postgres[355]: 2019-11-20 23:22:58.106 CET [386] LOG: starting logical decoding for slot "l5_hometv"
Nov 20 23:22:58 l5 postgres[355]: 2019-11-20 23:22:58.106 CET [384] DETAIL: Streaming transactions committing after 2/BBD86CE0, reading WAL from 2/BBD86CE0.
Nov 20 23:22:58 l5 postgres[355]: 2019-11-20 23:22:58.106 CET [384] LOG: starting logical decoding for slot "l5_ledger"
Nov 20 23:22:58 l5 postgres[355]: 2019-11-20 23:22:58.106 CET [385] DETAIL: Streaming transactions committing after 2/BBD86CE0, reading WAL from 2/BBD86CE0.
Nov 20 23:22:58 l5 postgres[355]: 2019-11-20 23:22:58.106 CET [385] LOG: starting logical decoding for slot "l5_news"
Nov 20 23:22:54 l5 systemd[1]: Started PostgreSQL database server.
Nov 20 23:22:54 l5 postgres[355]: 2019-11-20 23:22:54.295 CET [355] LOG: database system is ready to accept connections
Nov 20 23:22:54 l5 postgres[355]: 2019-11-20 23:22:54.146 CET [356] LOG: database system was shut down at 2019-11-20 23:22:50 CET
Nov 20 23:22:54 l5 systemd-networkd[326]: eth0: Gained IPv6LL
Nov 20 23:22:53 l5 postgres[355]: 2019-11-20 23:22:53.544 CET [355] LOG: listening on Unix socket "/run/postgresql/.s.PGSQL.5432"
Nov 20 23:22:53 l5 postgres[355]: 2019-11-20 23:22:53.539 CET [355] LOG: listening on IPv6 address "::", port 5432
Nov 20 23:22:53 l5 postgres[355]: 2019-11-20 23:22:53.539 CET [355] LOG: listening on IPv4 address "0.0.0.0", port 5432

Thank you and regards,
Ondrej