Re: Buildfarm feature request: some way to track/classify failures - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Buildfarm feature request: some way to track/classify failures |
Date | |
Msg-id | 20700.1174274533@sss.pgh.pa.us |
In response to | Re: Buildfarm feature request: some way to track/classify failures (Andrew Dunstan <andrew@dunslane.net>) |
Responses | Re: Buildfarm feature request: some way to track/classify failures (3 replies) |
List | pgsql-hackers |
Andrew Dunstan <andrew@dunslane.net> writes:
> OK, for anyone that wants to play, I have created an extract that
> contains a summary of every non-CVS-related failure we've had. It's a
> single table looking like this:

I did some analysis on this data.  Attached is a text dump of a table
declared as

CREATE TABLE mreasons (
    sysname text,
    snapshot timestamp without time zone,
    branch text,
    reason text,
    known boolean
);

where the sysname/snapshot/branch data is taken from your table,
"reason" is a brief sketch of the failure, and "known" indicates
whether the cause is known ... although as I went along it sort of
evolved into "does this seem worthy of more investigation?".

I looked at every failure back through early December.  I'd intended to
go back further, but decided I'd hit a point of diminishing returns.
However, failures back to the beginning of July that matched grep
searches for recent symptoms are classified in the table.

The gross stats are: 2231 failures classified, 71 distinct reason codes,
81 failures (with 18 reasons) that seem worthy of closer investigation:

bfarm=# select reason,branch,max(snapshot) as latest, count(*) from mreasons where not known group by 1,2 order by 1,2 ;
                              reason                              |    branch     |       latest        | count
------------------------------------------------------------------+---------------+---------------------+-------
 Input/output error - possible hardware problem                   | HEAD          | 2007-03-06 10:30:01 |     1
 No rule to make target                                           | HEAD          | 2007-02-08 15:30:01 |     6
 No rule to make target                                           | REL8_0_STABLE | 2007-02-28 03:15:02 |     9
 No rule to make target                                           | REL8_2_STABLE | 2006-12-17 20:00:01 |     1
 could not open relation with OID                                 | HEAD          | 2007-03-16 16:45:01 |     2
 could not open relation with OID                                 | REL8_1_STABLE | 2006-08-29 23:30:07 |     2
 createlang not found?                                            | REL8_1_STABLE | 2007-02-28 02:50:00 |     1
 irreproducible contrib/sslinfo build failure, likely not our bug | HEAD          | 2007-02-03 07:03:02 |     1
 irreproducible opr_sanity failure                                | HEAD          | 2006-12-18 19:15:02 |     2
 libintl.h rejected by configure                                  | HEAD          | 2007-01-11 20:35:00 |     3
 libintl.h rejected by configure                                  | REL8_0_STABLE | 2007-03-01 20:28:04 |    22
 postmaster failed to start                                       | REL7_4_STABLE | 2007-02-28 22:23:20 |     1
 postmaster failed to start                                       | REL8_0_STABLE | 2007-02-28 22:30:44 |     1
 random Solaris configure breakage                                | HEAD          | 2007-01-14 05:30:00 |     1
 random Windows breakage                                          | HEAD          | 2007-03-16 09:48:31 |     3
 random Windows breakage                                          | REL8_0_STABLE | 2007-03-15 03:15:09 |     7
 segfault during bootstrap                                        | HEAD          | 2007-03-12 23:03:03 |     1
 server does not shut down                                        | HEAD          | 2007-01-08 03:03:03 |     3
 tablespace is not empty                                          | HEAD          | 2007-02-24 15:00:10 |     6
 tablespace is not empty                                          | REL8_1_STABLE | 2007-01-25 02:30:01 |     2
 unexpected statement_timeout failure                             | HEAD          | 2007-01-25 05:05:06 |     1
 unexplained tsearch2 crash                                       | HEAD          | 2007-01-10 22:05:02 |     1
 weird DST-transition-like timestamp test failure                 | HEAD          | 2007-02-04 07:25:04 |     1
 weird assembler failure, likely not our bug                      | HEAD          | 2006-12-26 17:02:01 |     1
 weird assembler failure, likely not our bug                      | REL8_2_STABLE | 2007-02-03 23:47:01 |     1
 weird install failure                                            | HEAD          | 2007-01-25 12:35:00 |     1
(26 rows)

I think I know the cause of the recent 'could not open relation with
OID' failures in HEAD, but the rest of these maybe need a look.
Any volunteers?
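For anyone without a PostgreSQL instance handy, the summary query above can be tried out roughly as follows. This is only a sketch: it uses Python's bundled sqlite3 module rather than the real bfarm database, the host names and one of the snapshot times are invented, and the helper names `build_sample_db` and `unexplained_summary` are hypothetical. SQLite has no native timestamp or boolean types, so snapshots are stored as ISO text and "known" as 0/1.

```python
import sqlite3

# Illustrative rows only: reasons and branches are copied from the
# summary, but the host names (and the second snapshot time) are made up.
SAMPLE_ROWS = [
    ("host_a", "2007-03-16 16:45:01", "HEAD", "could not open relation with OID", 0),
    ("host_b", "2007-03-10 12:00:00", "HEAD", "could not open relation with OID", 0),
    ("host_a", "2006-08-29 23:30:07", "REL8_1_STABLE", "could not open relation with OID", 0),
    ("host_c", "2007-02-24 15:00:10", "HEAD", "tablespace is not empty", 0),
    ("host_c", "2007-02-16 22:30:01", "HEAD", "Out of disk space", 1),
]


def build_sample_db():
    """Create an in-memory stand-in for the mreasons table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mreasons ("
        " sysname TEXT,"
        " snapshot TEXT,"   # ISO text sorts chronologically, so max() still works
        " branch TEXT,"
        " reason TEXT,"
        " known INTEGER)"   # 0 = still worth investigating, 1 = written off
    )
    conn.executemany("INSERT INTO mreasons VALUES (?, ?, ?, ?, ?)", SAMPLE_ROWS)
    return conn


def unexplained_summary(conn):
    """Per (reason, branch): latest snapshot and count of not-known failures."""
    return conn.execute(
        "SELECT reason, branch, max(snapshot) AS latest, count(*)"
        " FROM mreasons WHERE NOT known"
        " GROUP BY reason, branch"   # same grouping as 'group by 1,2' above
        " ORDER BY reason, branch"
    ).fetchall()


if __name__ == "__main__":
    for row in unexplained_summary(build_sample_db()):
        print(row)
```

The grouping columns are spelled out by name here instead of by position; the effect should be the same as the ordinal form used in the psql session.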
Also, for completeness, the causes I wrote off as not interesting
(anymore, in some cases):

bfarm=# select reason,max(snapshot) as latest, count(*) from mreasons where known group by 1 order by 1 ;
                                reason                                |       latest        | count
----------------------------------------------------------------------+---------------------+-------
 DST transition test failure                                          | 2007-03-13 04:04:47 |    26
 ISO-week-patch regression test breakage                              | 2007-02-16 15:00:08 |    23
 No rule to make Makefile.port                                        | 2007-03-02 12:30:02 |    40
 Out of disk space                                                    | 2007-02-16 22:30:01 |    67
 Out of semaphores                                                    | 2007-02-20 02:03:31 |    14
 Python not installed                                                 | 2007-02-19 22:45:05 |     2
 Solaris random conn-refused bug                                      | 2007-03-06 01:20:00 |    37
 TCP socket already in use                                            | 2007-01-09 07:03:04 |    13
 Too many clients                                                     | 2007-02-26 06:06:02 |    90
 Too many open files in system                                        | 2007-02-27 20:30:59 |    17
 another icc crash                                                    | 2007-02-03 10:50:01 |     1
 apparently a malloc bug                                              | 2007-03-04 23:00:20 |    27
 bogus system clock setting                                           | 1997-12-21 15:20:11 |     6
 breakage from changing := to = in makefiles                          | 2007-02-10 02:15:01 |     4
 broken GUC patch                                                     | 2007-03-13 15:15:01 |    92
 broken float8 hacking                                                | 2007-01-06 20:00:09 |   120
 broken fsync-revoke patch                                            | 2007-01-17 16:21:01 |    77
 broken inet hacking                                                  | 2007-01-03 00:05:01 |     4
 broken log_error patch                                               | 2007-01-28 08:15:01 |    15
 broken money patch                                                   | 2007-01-03 19:05:01 |    78
 broken pg_regress change for msvc support                            | 2007-01-19 22:03:00 |    46
 broken plpython patch                                                | 2007-01-25 14:21:00 |    22
 broken sys_siglist patch                                             | 2007-01-28 06:06:02 |    18
 bug in btree page split patch                                        | 2007-02-08 11:35:03 |     7
 buildfarm pilot error                                                | 2007-01-19 03:28:07 |    69
 cache flush bug in operator-family patch                             | 2006-12-31 10:30:03 |     8
 ccache failure                                                       | 2007-01-25 23:00:34 |     2
 could not create shared memory                                       | 2007-02-13 07:00:05 |    32
 ecpg regression test teething pains                                  | 2007-02-03 13:30:02 |   516
 failure to update PL expected files for may/can/might rewording      | 2007-02-01 20:15:01 |     8
 failure to update contrib expected files for may/can/might rewording | 2007-02-01 21:15:02 |    11
 failure to update expected files for may/can/might rewording         | 2007-02-01 19:35:02 |     3
 icc "internal error"                                                 | 2007-03-16 16:30:01 |    29
 image not found (possibly related to too-many-open-files)            | 2006-10-25 08:05:02 |     1
 largeobject test bugs                                                | 2007-02-17 23:35:03 |     4
 ld segfaulted                                                        | 2007-03-16 15:30:02 |     3
 missing BYTE_ORDER definition for Solaris                            | 2007-01-10 14:18:23 |     1
 pg_regress patch breakage                                            | 2007-02-08 18:30:01 |     1
 plancache test race condition                                        | 2007-03-16 11:15:01 |     5
 pltcl regression test broken by ORDER BY semantics tightening        | 2007-01-09 03:15:01 |     9
 previous contrib test still running                                  | 2007-02-13 20:49:33 |    21
 random Solaris breakage                                              | 2007-01-05 17:20:01 |     1
 random Windows breakage                                              | 2006-12-27 03:15:07 |     1
 random Windows permission-denied failures                            | 2007-02-12 11:00:09 |     5
 random ccache breakage                                               | 2007-01-04 01:34:33 |     1
 readline misconfiguration                                            | 2007-02-12 17:19:41 |    33
 row-ordering discrepancy in rowtypes test                            | 2007-02-10 03:00:02 |     3
 stats test failed                                                    | 2007-03-14 13:00:02 |   319
 threaded Python library                                              | 2007-01-10 04:05:02 |     6
 undefined symbol pg_mic2ascii                                        | 2007-02-03 01:13:40 |   101
 unexpected signal 9                                                  | 2006-12-31 06:30:02 |    15
 unportable uuid patch                                                | 2007-01-31 17:30:01 |    16
 use of // comment                                                    | 2007-02-16 09:23:02 |     1
 xml code teething problems                                           | 2007-02-16 16:01:05 |    79
(54 rows)

Some of these might possibly be interesting to other people ...

			regards, tom lane
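The gross stats quoted in the message (failures classified, distinct reason codes, and the subset still worth a look) could be derived from the same table in one pass. The sketch below again assumes Python's sqlite3 module as a stand-in for the real database; the host names, branches for the written-off rows, and the helper names `build_sample_db` and `gross_stats` are hypothetical. The sample deliberately puts "random Windows breakage" on both sides of the known flag, which is presumably why 18 open reasons plus 54 written-off reasons give 71 distinct codes rather than 72.

```python
import sqlite3

# Illustrative rows only; host names are invented, and the known table in
# the message does not list branches, so those are assumptions too.
SAMPLE_ROWS = [
    ("host_a", "2007-03-16 09:48:31", "HEAD", "random Windows breakage", 0),
    ("host_a", "2006-12-27 03:15:07", "HEAD", "random Windows breakage", 1),
    ("host_b", "2007-02-16 22:30:01", "HEAD", "Out of disk space", 1),
    ("host_c", "2007-03-12 23:03:03", "HEAD", "segfault during bootstrap", 0),
]


def build_sample_db():
    """Create an in-memory stand-in for the mreasons table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mreasons ("
        " sysname TEXT, snapshot TEXT, branch TEXT, reason TEXT, known INTEGER)"
    )
    conn.executemany("INSERT INTO mreasons VALUES (?, ?, ?, ?, ?)", SAMPLE_ROWS)
    return conn


def gross_stats(conn):
    """Return (failures, distinct reasons, open failures, open reasons)."""
    return conn.execute(
        "SELECT count(*),"
        "       count(DISTINCT reason),"
        "       sum(CASE WHEN NOT known THEN 1 ELSE 0 END),"
        # CASE without ELSE yields NULL, which count(DISTINCT ...) ignores
        "       count(DISTINCT CASE WHEN NOT known THEN reason END)"
        " FROM mreasons"
    ).fetchone()


if __name__ == "__main__":
    total, reasons, open_failures, open_reasons = gross_stats(build_sample_db())
    print(total, reasons, open_failures, open_reasons)
```

With the sample rows, one reason is shared between the open and written-off sets, so the distinct-reason count is one less than the sum of the two per-set counts.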