Re: REL_13_STABLE Windows 10 Regression Failures - Mailing list pgsql-bugs
From | Heath Lord |
---|---|
Subject | Re: REL_13_STABLE Windows 10 Regression Failures |
Date | |
Msg-id | CA+BEBhsbe37XMAF2ChA9aoOE6RwSR15mkYFUy9_bPXoZM3bPkw@mail.gmail.com |
In response to | Re: REL_13_STABLE Windows 10 Regression Failures (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: REL_13_STABLE Windows 10 Regression Failures
|
List | pgsql-bugs |
Tom,

We are working to set up our environment to allow us to get a stack trace as we do not have any of the Visual Studios stuff installed right now. However, I thought I would send you a little more information while we are trying to get that working.

Going through the stats_ext.sql file line by line with a freshly built REL_13_STABLE database stood up we have determined that running any of the following commands back to back will cause the database to crash:

    CREATE STATISTICS tst ON relnatts + relpages FROM pg_class;
    CREATE STATISTICS tst ON (relpages, reltuples) FROM pg_class;

If you run another command in between them like:

    SELECT version();

Then it will not crash when you run either of those commands again. However if you run any combination of those 2 commands back to back it will crash the database.

The output from the psql instance after stepping through the stats_ext.sql file is in the stats_ext_psql_output.txt file attached. The information from the postgres logfile for the above is attached in the pg_logfile_output.txt file.

Hopefully, this will at least give you some information while we are working on getting the backtrace. Thanks.

-Heath

On Fri, Oct 30, 2020 at 1:25 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Heath Lord <heath.lord@crunchydata.com> writes:
> > When building from source on a Windows 10 VM using MinGW (8.1.0), I
> > get a random number of regression failures off the REL_13_STABLE
> > branch. I debugged this a little bit and found out that the "random"
> > number of failures is fully dependent on the machine and if I disable
> > the "stats_ext.sql" regression test; all other tests pass without
> > issue. When the "stats_ext.sql" regression test runs, it causes a
> > database exception and PostgreSQL crashes.
>
> Hmph ... it's weird that we have not seen this in the buildfarm.
> Have you tried to extract any info from the crash, like a stack trace?
>
> > I did some digging and determined that on the REL_13_STABLE branch
> > this instability was introduced with this commit
> > "b380484a850b6bf7d9fc0d85c555a2366e38451f"[1]. This corresponds to
> > commit "19f5a37b9fc48a12c77edafb732543875da2f4a3"[1] on master. I
> > worked backwards from there to determine when the regressions stopped
> > failing and determined that with commit
> > "be0a6666656ec3f68eb7d8e7abab5139fcd47012"[2] the regression tests are
> > no longer failing.
>
> I'm having a hard time believing that b380484a8 would have introduced
> a portability problem, and an even harder time believing that be0a66666
> would have resolved it if so. What seems more likely is that there's
> some underlying issue such as a memory stomp, that the first commit
> accidentally exposed and the second one accidentally hid again.
> So, even if back-patching be0a66666 seemed feasible from a stability
> standpoint (which I don't think it is), I fear it'd just mask a problem
> that would eventually bite us again.
>
> So I think we need to dig down and try to identify the root cause,
> without any preconceptions about how to fix it. Again, a stack trace
> would be pretty useful. Or at least some info about which step of
> stats_ext.sql is crashing.
>
> regards, tom lane
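The reproduction described in the message (two CREATE STATISTICS statements back to back crash the server; the same pair with SELECT version() in between does not) might be scripted roughly as follows. This is only a sketch and not something posted in the thread. The two statements are copied verbatim from the message. The helper names `build_script` and `run_with_psql`, the database name `postgres`, and the choice to feed psql on stdin are assumptions made for illustration.

```python
import subprocess

# The two statements quoted in the message, copied verbatim.
STMT_A = "CREATE STATISTICS tst ON relnatts + relpages FROM pg_class;"
STMT_B = "CREATE STATISTICS tst ON (relpages, reltuples) FROM pg_class;"
# The unrelated command the message reports as avoiding the crash.
SEPARATOR = "SELECT version();"


def build_script(first, second, separate=False):
    """Return SQL text running `first` then `second` in one session.

    With separate=True, SELECT version() is placed between them.
    (Hypothetical helper, not part of PostgreSQL.)
    """
    lines = [first]
    if separate:
        lines.append(SEPARATOR)
    lines.append(second)
    return "\n".join(lines) + "\n"


def run_with_psql(script, dbname="postgres"):
    """Feed the script to a single psql session on stdin.

    Assumes psql is on PATH and `dbname` exists (both assumptions).
    -X skips psqlrc, -a echoes each input line. ON_ERROR_STOP is
    deliberately not set, so the second statement still runs if the
    first one reports an error.
    """
    return subprocess.run(
        ["psql", "-X", "-a", "-d", dbname],
        input=script, text=True, capture_output=True,
    )


if __name__ == "__main__":
    # Sequence reported to crash the server: back to back.
    print(build_script(STMT_A, STMT_B), end="")
    # Sequence reported not to crash: another command in between.
    print(build_script(STMT_A, STMT_B, separate=True), end="")
```

Running both variants against the same build, and comparing the server log after each, would be one way to confirm which step of stats_ext.sql triggers the crash before a stack trace is available.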