Re: strange buildfarm failure on lionfish - Mailing list pgsql-hackers
| From | Tom Lane |
|---|---|
| Subject | Re: strange buildfarm failure on lionfish |
| Date | |
| Msg-id | 13477.1185299054@sss.pgh.pa.us |
| In response to | Re: strange buildfarm failure on lionfish (Gregory Stark <stark@enterprisedb.com>) |
| Responses | Re: strange buildfarm failure on lionfish |
| List | pgsql-hackers |
Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> I saw what I think was the identical failure last night on my own
>> machine, but it wasn't repeatable. Evidently the planner is changing to
>> a different plan for those queries, but why has this only started
>> recently? Maybe the recent changes to autovacuum defaults are causing
>> autovac to hit these tables when it never did before?

> Indeed the only alternate plan I can imagine for this is to do the join the
> other way around. And given the large difference in sizes between the two
> tables the only way I could get that to happen was by obliterating the
> statistics entirely for one table but having stats for the other.

Yeah, I turned off autovac and let the tests run through 'horology',
then stopped and looked at the plans for these queries. For the first
one, what you get with no ANALYZE having been done is

```sql
explain SELECT t.d1 + i.f1 AS "102"
   FROM TIMESTAMP_TBL t, INTERVAL_TBL i
   WHERE t.d1 BETWEEN '1990-01-01' AND '2001-01-01'
     AND i.f1 BETWEEN '00:00' AND '23:00';
```

```
                                 QUERY PLAN
------------------------------------------------------------------------------
 Nested Loop  (cost=36.56..80.89 rows=99 width=24)
   ->  Seq Scan on timestamp_tbl t  (cost=0.00..42.10 rows=11 width=8)
         Filter: ((d1 >= '1990-01-01 00:00:00'::timestamp without time zone)
                  AND (d1 <= '2001-01-01 00:00:00'::timestamp without time zone))
   ->  Materialize  (cost=36.56..36.65 rows=9 width=16)
         ->  Seq Scan on interval_tbl i  (cost=0.00..36.55 rows=9 width=16)
               Filter: ((f1 >= '00:00:00'::interval) AND (f1 <= '23:00:00'::interval))
```

If timestamp_tbl is then ANALYZEd, the plan flips around to put i on the
outside of the nestloop (because the estimate of the number of matching
rows rises to 49, which is pretty good because the actual is 52).
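The flip described above can be reproduced by hand against a scratch copy of the regression database — a sketch, assuming the regression tables are loaded and autovacuum is disabled so the statistics only change when you ANALYZE explicitly:

```sql
-- With no stats, timestamp_tbl is estimated at 11 rows and ends up
-- on the outside of the nestloop (plan shown above).

ANALYZE timestamp_tbl;        -- estimate for t rises to ~49 rows (actual: 52)

EXPLAIN SELECT t.d1 + i.f1 AS "102"
   FROM TIMESTAMP_TBL t, INTERVAL_TBL i
   WHERE t.d1 BETWEEN '1990-01-01' AND '2001-01-01'
     AND i.f1 BETWEEN '00:00' AND '23:00';
-- expect interval_tbl on the outside of the nestloop now

ANALYZE interval_tbl;         -- estimate for i drops to ~2 rows

EXPLAIN SELECT t.d1 + i.f1 AS "102"
   FROM TIMESTAMP_TBL t, INTERVAL_TBL i
   WHERE t.d1 BETWEEN '1990-01-01' AND '2001-01-01'
     AND i.f1 BETWEEN '00:00' AND '23:00';
-- expect interval_tbl back on the inside, regardless of whether
-- timestamp_tbl has been analyzed
```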
OTOH, if interval_tbl is ANALYZEd, the estimate for it drops to 2 rows
(again a much better match to reality) and we go back to preferring i on
the inside, with or without timestamp_tbl having been analyzed.

And, at least in the serial-schedule case, the stats at this point look like

```
 relid  | schemaname |    relname    | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup | last_vacuum | last_autovacuum | last_analyze | last_autoanalyze
--------+------------+---------------+----------+--------------+----------+---------------+-----------+-----------+-----------+------------+------------+-------------+-----------------+--------------+------------------
 132885 | public     | interval_tbl  |       22 |          210 |          |               |        10 |         0 |         0 |         10 |          0 |             |                 |              |
 132879 | public     | timestamp_tbl |       45 |         2444 |          |               |        74 |         0 |         8 |         66 |          8 |             |                 |              |
```

So yesterday's change to reduce the analyze threshold to 50 means that
timestamp_tbl is now vulnerable to being asynchronously analyzed while
the tests run.

While I don't have any very strong objection to putting an ORDER BY on
these particular queries, I'm worried about how many other regression
tests will now start showing random failures. We have an awful lot of
small tables in the tests ...

			regards, tom lane
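The ORDER BY workaround mentioned above would look something like the following — a sketch only; pinning the output ordering makes the expected regression output independent of which join order the planner picks, and sorting on the single output column is an assumption about what ordering the test would want:

```sql
SELECT t.d1 + i.f1 AS "102"
   FROM TIMESTAMP_TBL t, INTERVAL_TBL i
   WHERE t.d1 BETWEEN '1990-01-01' AND '2001-01-01'
     AND i.f1 BETWEEN '00:00' AND '23:00'
   ORDER BY "102";
```

Note the arithmetic behind the concern: per the stats above, timestamp_tbl has seen 74 inserts plus 8 deletes (82 tuple changes), well past the new threshold of 50, while interval_tbl's 10 changes stay safely under it — so only timestamp_tbl is at risk of a mid-run auto-analyze.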