Re: 9.4 beta1 crash on Debian sid/i386 - Mailing list pgsql-hackers
From | Christoph Berg |
---|---|
Subject | Re: 9.4 beta1 crash on Debian sid/i386 |
Date | |
Msg-id | 20140518090834.GA18253@msgid.df7cb.de |
In response to | Re: 9.4 beta1 crash on Debian sid/i386 (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: 9.4 beta1 crash on Debian sid/i386 |
List | pgsql-hackers |
Re: Tom Lane 2014-05-18 <9058.1400385611@sss.pgh.pa.us>
> Christoph Berg <cb@df7cb.de> writes:
> > Re: Tom Lane 2014-05-14 <1357.1400028161@sss.pgh.pa.us>
> >> It would appear that something is wrong with check_stack_depth(),
> >> and/or getrlimit(RLIMIT_STACK) is lying to us about the available stack.
>
> > ulimit -s is 8192 (kB); max_stack_depth is 2MB.
> >
> > check_stack_depth looks right, max_stack_depth_bytes there is 2097152
> > and I can see stack_base_ptr - &stack_top_loc grow over repeated
> > invocations of the function (stack_depth itself is optimized out).
> > Still, it never enters "if (stack_depth > max_stack_depth_bytes...)".
>
> Hm. Did you check that stack_base_ptr is non-NULL? If it were somehow
> not getting set, that would disable the error report. But on most
> architectures that would also result in silly values for the pointer
> difference, so I doubt this is the issue.

stack_base_ptr was non-NULL. The stack depth started at around 3 or 5kB (I don't remember exactly) and grew by a few hundred bytes with each iteration, so this looked sane.

> > Interestingly, the Debian buildd managed to run the testsuite for
> > i386, while I could reproduce the problem on the pgapt build machine
> > and on my notebook, so there must be some system difference. Possibly
> > the reason is these two machines are running a 64bit kernel and I'm
> > building in a 32bit chroot, though that hasn't been a problem before.
>
> I'm suspicious that something has changed in your build environment,
> because that stack-checking logic hasn't changed since these commits:

It's something in the combination of build and runtime environment. I can reproduce the problem with the package that the Debian i386/experimental buildd compiled, even though that package passed the regression tests there. Possibly a change in libc there. I'll try to ask some kernel/libc people if they have an idea.

My current bet is on the gcc hardening flags we are using.
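To make the stack-depth measurement discussed above concrete, here is a minimal C sketch of the general technique: remember the address of a local variable in an outer frame (stack_base_ptr), measure how far a local in the current frame (stack_top_loc) lies from it, and compare that distance against a byte limit such as max_stack_depth. The names stack_base_ptr and stack_top_loc echo the thread; set_stack_base_from(), current_stack_depth(), stack_depth_exceeded(), probe_depth() and stack_rlimit_bytes() are hypothetical helpers, and this is not PostgreSQL's actual check_stack_depth(). The sketch also reads the soft RLIMIT_STACK with getrlimit(), the same limit that ulimit -s reports in kB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/resource.h>

/* Hypothetical sketch, not PostgreSQL's implementation. */
static char *stack_base_ptr = NULL;

/* Remember the address of a local living in an outer (long-lived) frame. */
static void set_stack_base_from(char *local_in_outer_frame)
{
    stack_base_ptr = local_in_outer_frame;
}

/* Distance in bytes between the base marker and a local in the current
 * frame. Computed on integer addresses; the absolute value also covers
 * platforms whose stack grows upward. */
static long current_stack_depth(void)
{
    char stack_top_loc;
    intptr_t diff = (intptr_t) stack_base_ptr - (intptr_t) &stack_top_loc;
    return (long) (diff < 0 ? -diff : diff);
}

/* True if the stack is deeper than max_bytes. As noted in the thread, an
 * unset (NULL) base pointer silently disables the check. */
static bool stack_depth_exceeded(long max_bytes)
{
    if (stack_base_ptr == NULL)
        return false;
    return current_stack_depth() > max_bytes;
}

/* Recurse n levels. Each level passes the address of its 256-byte buffer
 * to the next call, so every frame stays live and the stack really grows.
 * Returns the largest depth observed. */
static long probe_depth(int n, volatile unsigned char *prev)
{
    volatile unsigned char frame[256];
    frame[0] = (unsigned char) (prev != NULL ? prev[0] + 1 : 0);
    long depth = current_stack_depth();
    if (n > 0) {
        long deeper = probe_depth(n - 1, frame);
        if (deeper > depth)
            depth = deeper;
    }
    return depth;
}

/* Soft RLIMIT_STACK in bytes, or -1 if unlimited or on error. */
static long stack_rlimit_bytes(void)
{
    struct rlimit rlim;
    if (getrlimit(RLIMIT_STACK, &rlim) != 0)
        return -1;
    if (rlim.rlim_cur == RLIM_INFINITY)
        return -1;
    return (long) rlim.rlim_cur;
}
```

Typical use is to call set_stack_base_from() with the address of a local in main() and then call stack_depth_exceeded(limit) from deeply recursive code. With ulimit -s at 8192 kB, stack_rlimit_bytes() would return 8388608, well above a 2MB (2097152-byte) max_stack_depth, which is the kind of headroom the thread expects to be available.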
> The lack of reports from the buildfarm or other users is also evidence
> against there being a widespread issue here.

The only animal running Debian testing/unstable that I can see is dugong, which is ia64, and ia64 was removed from Debian some months ago. I guess I should look into setting up a new animal for this.

> A different thought: I have heard of environments in which the available
> stack depth is much less than what ulimit would suggest because the ulimit
> space gets split up for multiple per-thread stacks. That should not be
> happening in a Postgres backend, since we don't do threading, but I'm
> running out of ideas to investigate ...

I've done some builds now, but there's no clear picture yet of when the problem occurs. Still trying...

Christoph
-- 
cb@df7cb.de | http://www.df7cb.de/