Re: Why is parula failing? - Mailing list pgsql-hackers

From David Rowley
Subject Re: Why is parula failing?
Msg-id CAApHDvo=2W6k3tm+qmihBbRTnr4rr2Qyz_=x+Jj1K7E0oeEKCA@mail.gmail.com
In response to Re: Why is parula failing?  (Robins Tharakan <tharakan@gmail.com>)
List pgsql-hackers
On Mon, 15 Apr 2024 at 16:10, Robins Tharakan <tharakan@gmail.com> wrote:
> - I now have 2 separate runs stuck on pg_sleep() - HEAD / REL_16_STABLE
> - I'll keep them (stuck) for this week, in case there's more we can get
> from them (and to see how long they take)
> - Attached are 'bt full' outputs for both (b.txt - HEAD / a.txt - REL_16_STABLE)

Thanks for getting those.

#4  0x000000000090b7b4 in pg_sleep (fcinfo=<optimized out>) at misc.c:406
        delay = <optimized out>
        delay_ms = <optimized out>
        endtime = 0

This endtime looks like a problem. It seems unlikely to be caused by
gettimeofday's timeval fields being zeroed, given that the requested
number of seconds should still have been added to that.
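
To illustrate the point about the added seconds, here is a rough
standalone sketch. It assumes endtime is computed as "now + secs" and
that "now" is the gettimeofday() result shifted to seconds since
2000-01-01. The helper names are made up for illustration; this is not
the code in misc.c.

```c
#include <stdint.h>

/* Seconds between the Unix epoch (1970-01-01) and 2000-01-01. */
#define SKETCH_EPOCH_OFFSET_SECS INT64_C(946684800)

/*
 * Hypothetical stand-in for GetNowFloat(): converts timeval-style
 * fields to seconds since 2000-01-01 as a double.
 */
static double
sketch_now_float(int64_t tv_sec, int64_t tv_usec)
{
    int64_t usecs;

    usecs = (tv_sec - SKETCH_EPOCH_OFFSET_SECS) * INT64_C(1000000) + tv_usec;
    return (double) usecs / 1000000.0;
}

/*
 * Hypothetical stand-in for the endtime computation: the requested
 * sleep duration is added to the current time.
 */
static double
sketch_endtime(int64_t tv_sec, int64_t tv_usec, double secs)
{
    return sketch_now_float(tv_sec, tv_usec) + secs;
}
```

Under those assumptions, a zeroed timeval gives a large negative
endtime rather than 0. Getting exactly 0 would need a clock reading of
precisely 2000-01-01 combined with a zero-second sleep.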

I can't quite make sense of how we end up sleeping at all with a zero
endtime. Assuming the subsequent GetNowFloat() call worked, "delay =
endtime - GetNowFloat();" would result in a negative sleep duration
and we'd break out of the sleep loop.

If GetNowFloat() was somehow returning a negative number then we could
end up with a large delay.  But if gettimeofday() was so badly broken
then wouldn't there be some evidence of this in the log timestamps on
failing runs?
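
The two cases above can be sketched as a small standalone function.
This is only my reading of the per-iteration decision in the sleep
loop, not a copy of it: the function name is hypothetical, and the
600-second cap on a single wait is an assumption.

```c
/*
 * Hypothetical helper mirroring one iteration of the sleep loop.
 * Returns the wait in milliseconds, or -1 when the loop would break
 * out because the end time has already passed.
 */
static long
sketch_next_delay_ms(double endtime, double now)
{
    double delay = endtime - now;
    double ms;
    long   delay_ms;

    if (delay >= 600.0)
        return 600000;      /* assumed cap: wait in 10-minute chunks */

    if (delay <= 0.0)
        return -1;          /* nothing left to wait for: break */

    /* Round up to whole milliseconds without needing libm's ceil(). */
    ms = delay * 1000.0;
    delay_ms = (long) ms;
    if ((double) delay_ms < ms)
        delay_ms++;
    return delay_ms;
}
```

With endtime = 0 and any positive "now", this returns -1, i.e. no
sleep. Only a negative "now" produces a wait, and a very negative one
keeps hitting the cap on every iteration.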

I'm not that familiar with the buildfarm config, but I do see some
Valgrind-related settings in there. Is PostgreSQL running under
Valgrind on these runs?

David