Re: Instability in the postgres_fdw regression test - Mailing list pgsql-hackers
From: Etsuro Fujita
Subject: Re: Instability in the postgres_fdw regression test
Date:
Msg-id: 5AD020A5.10605@lab.ntt.co.jp
In response to: Re: Instability in the postgres_fdw regression test (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
(2018/04/13 3:49), Robert Haas wrote:
> On Thu, Apr 12, 2018 at 12:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> We've been seeing $subject since commit 1bc0100d2, with little clue
>> as to the cause.  Previous attempts to fix it by preventing autovacuum
>> from running on the relevant tables didn't seem to help.
>>
>> I have now managed to reproduce the issue reliably enough to study it.
>> (It turns out that on longfin's host, running the buildfarm script
>> *under cron* produces the failure a reasonable percentage of the time.
>> The identical test case, launched from an interactive shell, never
>> fails.  I speculate that the kernel scheduler treats cron jobs
>> differently.)  It is in fact a timing issue, and what triggers it is
>> there being an active autovacuum task in another database.  The fact
>> that the buildfarm script uses USE_MODULE_DB in the contrib tests
>> greatly increases the surface area of the problem, since that creates
>> a lot more databases that autovacuum could be active in.  Once I
>> realized that, I found that it can trivially be reproduced by hand, in
>> a "make installcheck" test, simply by having an open transaction in
>> another DB in the cluster.  For instance, run "select pg_sleep(60);"
>> there and then run make installcheck in postgres_fdw.
>>
>> So now, drilling down to the specific problem: what we're seeing is
>> that the plans for some queries change because the "remote" server
>> reports a different rowcount estimate than usual for
>>
>> EXPLAIN SELECT "C 1", c2, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1" WHERE (("C 1" > 2000)) FOR UPDATE
>>
>> Normally the estimate is one, but in the failure cases it's 12 or so.
>> This estimate is coming out of ineq_histogram_selectivity, of course.
>> Furthermore, the highest value actually present in the histogram is
>> 1000, because that was the highest value of "C 1" when the test did
>> ANALYZE up at the top.  That means we'll invoke
>> get_actual_variable_range to try to identify the actual current
>> maximum.  Most of the time, it returns 2010, which is correct, and we
>> end up estimating that only about one row will exceed 2000.
>>
>> However, in the failure cases, what's getting extracted from the index
>> is 9999.  That's because we'd inserted and deleted that value further
>> up in the test, and if there's been an open transaction holding back
>> RecentGlobalXmin, then SnapshotNonVacuumable is going to consider that
>> entry still good.  This value is enough larger than 2000 to move the
>> selectivity estimate appreciably, and then kaboom.
>>
>> This theory successfully explains the observed fact that some
>> buildfarm failures show differences in two query plans, while others
>> show a difference just in the first one.  In the latter cases, the
>> external transaction ended, so that RecentGlobalXmin could advance, in
>> between.
>>
>> What I propose to do to fix the instability is to change the test
>> stanza that uses 9999 as a key-chosen-at-random to use something less
>> far away from the normal range of "C 1" values, so that whether it's
>> still visible to get_actual_variable_range has less effect on this
>> selectivity estimate.  That's a hack, for sure, but I don't see any
>> other fix that's much less of a hack.
>
> Thanks for the detective work.

Thanks a lot!

Best regards,
Etsuro Fujita
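
For anyone who wants to try this by hand, a minimal sketch of the recipe Tom describes (the database choice and sleep duration are arbitrary; any statement that keeps a snapshot open in some other database of the same cluster will do):

    -- Session 1: connect to any *other* database in the cluster and hold
    -- a snapshot open, which keeps RecentGlobalXmin from advancing:
    SELECT pg_sleep(60);

    -- Session 2 (shell), while the sleep is still running:
    --   make -C contrib/postgres_fdw installcheck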
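
The mechanism can also be seen in isolation. The following is an illustrative standalone script, not the actual regression test: the table name and row counts are made up to mirror the shape of "S 1"."T 1", and the exact row estimates will vary with the collected statistics:

    CREATE TABLE t (c1 int);
    CREATE INDEX ON t (c1);
    INSERT INTO t SELECT g FROM generate_series(1, 1000) g;
    ANALYZE t;                          -- histogram now tops out at 1000
    INSERT INTO t VALUES (9999);        -- far-off key, inserted ...
    DELETE FROM t WHERE c1 = 9999;      -- ... and deleted, as in the test
    INSERT INTO t SELECT g FROM generate_series(2001, 2010) g;

    -- Since 2000 lies beyond the histogram's last boundary, the planner
    -- calls get_actual_variable_range() to probe the index for the real
    -- maximum.  With no old snapshots around, the dead 9999 entry is
    -- ignored, the maximum found is 2010, and few rows are estimated to
    -- pass the qual.  If another session is holding back
    -- RecentGlobalXmin, the 9999 entry is still visible to
    -- SnapshotNonVacuumable, the apparent maximum jumps to 9999, and the
    -- estimate for the same qual inflates, changing the plan.
    EXPLAIN SELECT * FROM t WHERE c1 > 2000;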
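
And the shape of the proposed fix, again only as a sketch (the replacement key 1200 is an invented value for illustration, not necessarily what was committed, and the elided column values are left as "..."):

    -- Before: the throwaway key sits far above every live "C 1" value,
    -- so its lingering dead index entry can become the apparent maximum.
    --   INSERT INTO "S 1"."T 1" VALUES (9999, ...);
    --   DELETE FROM "S 1"."T 1" WHERE "C 1" = 9999;
    -- After: a key near the normal range stays below the live maximum
    -- (2010), so its visibility no longer moves the estimate for
    -- "C 1" > 2000.
    --   INSERT INTO "S 1"."T 1" VALUES (1200, ...);
    --   DELETE FROM "S 1"."T 1" WHERE "C 1" = 1200;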