Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly |
Date | |
Msg-id | 25502.1520277552@sss.pgh.pa.us |
In response to | Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: postgres_fdw: perform UPDATE/DELETE .. RETURNING on a join directly |
List | pgsql-hackers |
I wrote:
> The thing that I find curious, now that we've shut off autovacuum
> altogether on those tables, is that we *still* aren't getting stable
> results.  How can that be?

I spent some time trying to figure out what's going on here.  I've still not been able to replicate the failure on any of my machines, but I have learned a thing or two.

On the first query that's still failing on rhinoceros (and has also been seen to fail on other machines, before the last round of changes), which is at line 1142 in postgres_fdw.sql in current HEAD:

EXPLAIN (verbose, costs off)
UPDATE ft2 SET c3 = 'baz'
  FROM ft4 INNER JOIN ft5 ON (ft4.c1 = ft5.c1)
  WHERE ft2.c1 > 2000 AND ft2.c2 === ft4.c1
  RETURNING ft2.*, ft4.*, ft5.*;  -- can't be pushed down

(call this Q1), we are expecting to get a plan of the shape of

   ->  Nested Loop
         ->  Foreign Scan on public.ft2
               Remote SQL: SELECT "C 1", c2, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1" WHERE (("C 1" > 2000)) FOR UPDATE
         ->  Foreign Scan
               Relations: (public.ft4) INNER JOIN (public.ft5)

Now, there is no possible way that the planner would pick that plan if its estimate of the number of rows out of the ft2 scan was more than 1.  Re-executing the foreign join would have high enough overhead to push the plan to some other shape.  In fact, probably the shape we see in the actual plan choice, on the failing machines:

   ->  Nested Loop
         ->  Foreign Scan
               Relations: (public.ft4) INNER JOIN (public.ft5)
         ->  Materialize
               ->  Foreign Scan on public.ft2
                     Remote SQL: SELECT "C 1", c2, c4, c5, c6, c7, c8, ctid FROM "S 1"."T 1" WHERE (("C 1" > 2000)) FOR UPDATE

I instrumented costsize.c, and determined that the estimated size of "S 1"."T 1" WHERE (("C 1" > 2000)), before the clamp_row_est() rounding, is only 0.1159 rows on my machine.  So there's no way to explain the plan change as the result of small platform-specific roundoff differences --- we need something more than a 12X change in the selectivity estimate before the plan shape would change like this.

There are a bunch of things going on under the hood, such as that the table's raw rowcount estimate gets scaled up from the original 1000 rows because it's now got more pages than before, but none come close to explaining 12X.  However, the planner is working with quite old statistics, dating back to the manual ANALYZE at postgres_fdw.sql line 93.  If I stick in another manual ANALYZE just before Q1, I get exactly the same plan change reported by rhinoceros.  (And underneath, the rowcount estimate for ft2 has gone from 1 row to 8 rows, which is much closer to the true value of 10 rows, so the plan change is not surprising.)  What's more, doing this also reproduces the one other plan change seen later in rhinoceros' output.

It is, therefore, very hard to avoid the conclusion that something is causing an ANALYZE to happen while the script runs, despite our fooling about with the table's reloptions.  I'm not sure that that something is autovacuum.  A platform-specific bug in reloptions handling doesn't seem out of the question, but poking around in the code didn't spot anything obvious.

Joe, I wonder if you could add "log_autovacuum_min_duration = 0" to rhinoceros' extra_config options, temporarily?  Correlating that log output with the log_statement output from the test proper would let us confirm or deny whether it's autovacuum.

Another thing I'd like to do is temporarily add

select relpages, reltuples from pg_class where relname = 'T 1';

to the test script, both just after the manual ANALYZE and just before Q1.
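As a concrete sketch of those temporary additions (the placement comments below are editorial, not lines from the existing test file), it would look something like:

    -- immediately after the existing manual ANALYZE (postgres_fdw.sql line 93):
    select relpages, reltuples from pg_class where relname = 'T 1';

    -- and again just before Q1 (line 1142):
    select relpages, reltuples from pg_class where relname = 'T 1';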
If we see a change between those reports on any of the affected machines, we'll know that *something* is changing the stats.  Now the problem with doing that is that the expected value of relpages is platform-dependent (I see 11 on 64-bit and 10 on 32-bit on my machines).  We can work around that, perhaps by capturing the initial value in a temp table and printing only the delta, but I'm not sure if it's worth the effort as opposed to just letting it fail on 32-bit critters for a day or two.

			regards, tom lane
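For reference, a minimal sketch of the temp-table/delta workaround floated above (the temp table name t1_stats is hypothetical, not something in the test suite):

    -- just after the manual ANALYZE: capture the baseline values
    create temp table t1_stats as
      select relpages, reltuples from pg_class where relname = 'T 1';

    -- just before Q1: print only the change, which should be zero on every platform
    select c.relpages - s.relpages as relpages_delta,
           c.reltuples - s.reltuples as reltuples_delta
      from pg_class c, t1_stats s
     where c.relname = 'T 1';

Printing only the delta would keep the expected output identical on 32-bit and 64-bit builds.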