Re: [HACKERS] logical replication launcher crash on buildfarm - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: [HACKERS] logical replication launcher crash on buildfarm |
Date | |
Msg-id | 20170316085322.crffknkgee5s6air@alap3.anarazel.de |
In response to | Re: [HACKERS] logical replication launcher crash on buildfarm (Petr Jelinek <petr.jelinek@2ndquadrant.com>) |
Responses | Re: [HACKERS] logical replication launcher crash on buildfarm |
List | pgsql-hackers |
On 2017-03-16 09:40:48 +0100, Petr Jelinek wrote:
> On 16/03/17 04:42, Andres Freund wrote:
> > On 2017-03-15 20:28:33 -0700, Andres Freund wrote:
> >> Hi,
> >>
> >> I just unstuck a bunch of my buildfarm animals. That triggered some
> >> spurious failures (on piculet, calliphoridae, mylodon), but also one
> >> that doesn't really look like that:
> >> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2017-03-16%2002%3A40%3A03
> >>
> >> with the pertinent point being:
> >>
> >> ================== stack trace: pgsql.build/src/test/regress/tmp_check/data/core ==================
> >> [New LWP 1894]
> >> [Thread debugging using libthread_db enabled]
> >> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> >> Core was generated by `postgres: bgworker: logical replication launcher '.
> >> Program terminated with signal SIGSEGV, Segmentation fault.
> >> #0 0x000055e265bff5e3 in ?? ()
> >> #0 0x000055e265bff5e3 in ?? ()
> >> #1 0x000055d3ccabed0d in StartBackgroundWorker () at /home/andres/build/buildfarm-culicidae/HEAD/pgsql.build/../pgsql/src/backend/postmaster/bgworker.c:792
> >> #2 0x000055d3ccacf4fc in SubPostmasterMain (argc=3, argv=0x55d3cdbb71c0) at /home/andres/build/buildfarm-culicidae/HEAD/pgsql.build/../pgsql/src/backend/postmaster/postmaster.c:4878
> >> #3 0x000055d3cca443ea in main (argc=3, argv=0x55d3cdbb71c0) at /home/andres/build/buildfarm-culicidae/HEAD/pgsql.build/../pgsql/src/backend/main/main.c:205
> >>
> >> it's possible that me killing things and upgrading caused this, but
> >> given this is a backend running EXEC_BACKEND, I'm a bit suspicious that
> >> it's more than that. The machine is a bit backed up at the moment, so
> >> it'll probably be a while till it's at that animal/branch again,
> >> otherwise I'd not have mentioned this.
> >
> > For some reason it ran again pretty soon. And I'm afraid it's indeed an
> > issue:
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2017-03-16%2003%3A30%3A02
> >
>
> Hmm, I tried with EXEC_BACKEND (and with --disable-spinlocks) and it
> seems to work fine on my two machines. I don't see anything else
> different on culicidae though. Sadly the backtrace is not that
> informative either. I'll try to investigate more but it will take time...

Worthwhile additional failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=culicidae&dt=2017-03-16%2002%3A55%3A01

Same animal, also EXEC_BACKEND, but 9.6.

A quick look at the relevant line:

    /*
     * If bgw_main is set, we use that value as the initial entrypoint.
     * However, if the library containing the entrypoint wasn't loaded at
     * postmaster startup time, passing it as a direct function pointer is not
     * possible.  To work around that, we allow callers for whom a function
     * pointer is not available to pass a library name (which will be loaded,
     * if necessary) and a function name (which will be looked up in the named
     * library).
     */
    if (worker->bgw_main != NULL)
        entrypt = worker->bgw_main;

makes the issue clear - we appear to be assuming that bgw_main is
meaningful across processes. Which it isn't in the EXEC_BACKEND case
when ASLR is in use...

This kinda sounds familiar, but a quick Google search doesn't find
anything relevant.

Greetings,

Andres Freund
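
To illustrate the point about bgw_main, here is a minimal sketch (not PostgreSQL code) of the name-based alternative that the quoted comment describes: the registration record carries a function name, and each process resolves it against its own address space. Every identifier below (worker_registration, register_worker, lookup_worker, known_workers, launcher_main, apply_main) is hypothetical and exists only for this example; a static table stands in for a real symbol lookup in a loaded library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical entry point signature; not PostgreSQL's bgworker API. */
typedef int (*worker_main_fn)(int arg);

/* Two stand-in worker entry points. */
static int launcher_main(int arg) { return arg + 1; }
static int apply_main(int arg) { return arg * 2; }

/*
 * Name-to-function table. The addresses in it are filled in relative to
 * wherever this process's image happens to be loaded, so every process
 * has its own valid copy even when ASLR moves the executable around.
 */
static const struct
{
    const char    *name;
    worker_main_fn fn;
} known_workers[] = {
    {"launcher_main", launcher_main},
    {"apply_main", apply_main},
};

/* What would live in shared state: a name, never a raw address. */
typedef struct
{
    char function_name[64];
} worker_registration;

/* Record which entry point a worker should run, by name. */
static void
register_worker(worker_registration *reg, const char *name)
{
    assert(name != NULL && strlen(name) < sizeof(reg->function_name));
    strcpy(reg->function_name, name);
}

/* Resolve the recorded name in the current process; NULL if unknown. */
static worker_main_fn
lookup_worker(const worker_registration *reg)
{
    size_t i;

    for (i = 0; i < sizeof(known_workers) / sizeof(known_workers[0]); i++)
    {
        if (strcmp(known_workers[i].name, reg->function_name) == 0)
            return known_workers[i].fn;
    }
    return NULL;
}
```

A worker process would call lookup_worker() on the shared record and jump to the result only if it is non-NULL. Storing the worker_main_fn itself in the record would reintroduce the problem above: the value is only guaranteed to be valid in the process that wrote it.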