Thread: fairywren exiting in ecpg

fairywren exiting in ecpg

From
Andres Freund
Date:
Hi,

Looks like fairywren is possibly seeing something I saw before and spent many
days looking into:
https://postgr.es/m/20220909235836.lz3igxtkcjb5w7zb%40awork3.anarazel.de
which led me to add the following to .cirrus.yml:

    # Cirrus defaults to SetErrorMode(SEM_NOGPFAULTERRORBOX | ...). That
    # prevents crash reporting from working unless binaries do SetErrorMode()
    # themselves. Furthermore, it appears that either python or, more likely,
    # the C runtime has a bug where SEM_NOGPFAULTERRORBOX can very
    # occasionally *trigger* a crash on process exit - which is hard to debug,
    # given that it explicitly prevents crash dumps from working...
    # 0x8001 is SEM_FAILCRITICALERRORS | SEM_NOOPENFILEERRORBOX
    CIRRUS_WINDOWS_ERROR_MODE: 0x8001
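
For readers who want to check the arithmetic in that comment, here is a minimal sketch that decodes an error-mode value into flag names. The flag values are the ones documented for the Win32 SetErrorMode() API. The helper name decode_error_mode is hypothetical and is not part of .cirrus.yml or any tool mentioned in this thread.

```python
# Flag values as documented for the Win32 SetErrorMode() API.
SEM_FAILCRITICALERRORS = 0x0001
SEM_NOGPFAULTERRORBOX = 0x0002
SEM_NOALIGNMENTFAULTEXCEPT = 0x0004
SEM_NOOPENFILEERRORBOX = 0x8000

FLAGS = {
    "SEM_FAILCRITICALERRORS": SEM_FAILCRITICALERRORS,
    "SEM_NOGPFAULTERRORBOX": SEM_NOGPFAULTERRORBOX,
    "SEM_NOALIGNMENTFAULTEXCEPT": SEM_NOALIGNMENTFAULTEXCEPT,
    "SEM_NOOPENFILEERRORBOX": SEM_NOOPENFILEERRORBOX,
}


def decode_error_mode(mode: int) -> list[str]:
    """Return the names of the SEM_* flags set in mode (hypothetical helper)."""
    return [name for name, bit in FLAGS.items() if mode & bit]


# The value used for CIRRUS_WINDOWS_ERROR_MODE above.
mode = SEM_FAILCRITICALERRORS | SEM_NOOPENFILEERRORBOX
print(hex(mode), decode_error_mode(mode))
```

The point of the decoded list is that SEM_NOGPFAULTERRORBOX (0x0002) is absent from 0x8001, which is what keeps crash reporting usable.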


The mingw folks also spent a lot of time looking into this ([1]), without a
lot of success.

It sure looks like it might be a Windows C runtime issue - none of the
stacktrace handling python has gets invoked. I could not find any relevant
behavioural differences in python's code that depend on SEM_NOGPFAULTERRORBOX
being set.

It'd be interesting to see if fairywren's occasional failures go away if you
set MSYS=winjitdebug (which prevents msys from adding SEM_NOGPFAULTERRORBOX to
ErrorMode).

Greetings,

Andres Freund

[1] https://github.com/msys2/MINGW-packages/issues/11864



Re: fairywren exiting in ecpg

From
Andrew Dunstan
Date:


On 2023-04-03 Mo 21:15, Andres Freund wrote:
> It'd be interesting to see if fairywren's occasional failures go away if you
> set MSYS=winjitdebug (which prevents msys from adding SEM_NOGPFAULTERRORBOX to
> ErrorMode).


trying now. Since this happened every build or so it shouldn't take long for us to see.

(I didn't see anything in the MSYS2 docs that specified the possible values for MSYS :-( )


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: fairywren exiting in ecpg

From
Andrew Dunstan
Date:


On 2023-04-04 Tu 08:22, Andrew Dunstan wrote:


> trying now. Since this happened every build or so it shouldn't take long for us to see.
>
> (I didn't see anything in the MSYS2 docs that specified the possible values for MSYS :-( )




The error hasn't been seen since I set this about a week ago.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: fairywren exiting in ecpg

From
Andres Freund
Date:
Hi,

On 2023-04-11 07:10:20 -0400, Andrew Dunstan wrote:
> The error hasn't been seen since I set this about a week ago.

This issue really bothers me, but I am at my wits' end as to how to debug it, given
that we get a segfault only if we *disable* getting crash reports / core dumps
in some form. There's no debug printout or anything, python just exits with an
error code indicating an access violation.
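
As an illustration of what such an exit code looks like to a test harness, here is a minimal sketch. It assumes the code in question is the Windows NTSTATUS value STATUS_ACCESS_VIOLATION (0xC0000005); the thread does not quote the actual number, and the helper name is_access_violation is hypothetical.

```python
# NTSTATUS value Windows uses for an access violation.
# Assumption: this is the exit code meant above; the thread does not quote it.
STATUS_ACCESS_VIOLATION = 0xC0000005


def is_access_violation(returncode: int) -> bool:
    """Return True if a process exit code denotes an access violation.

    Some tools report the code as a signed 32-bit integer, so normalize
    to unsigned 32 bits before comparing.
    """
    return (returncode & 0xFFFFFFFF) == STATUS_ACCESS_VIOLATION


# The same status, reported unsigned and signed.
print(is_access_violation(3221225477), is_access_violation(-1073741819))
```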

Greetings,

Andres Freund