Oddity in handling of cached plans for FDW queries - Mailing list pgsql-hackers
From | Etsuro Fujita |
---|---|
Subject | Oddity in handling of cached plans for FDW queries |
Date | |
Msg-id | d49c1e5b-f059-20f4-c132-e9752ee0113e@lab.ntt.co.jp Whole thread Raw |
Responses |
Re: Oddity in handling of cached plans for FDW queries
|
List | pgsql-hackers |
Hi, I noticed odd behavior in invalidating a cached plan with a foreign join due to user mapping changes. Consider: postgres=# create extension postgres_fdw; postgres=# create server loopback foreign data wrapper postgres_fdw options (dbname 'postgres'); postgres=# create user mapping for public server loopback; postgres=# create table t1 (a int, b int); postgres=# insert into t1 values (1, 1); postgres=# create foreign table ft1 (a int, b int) server loopback options (table_name 't1'); postgres=# analyze ft1; postgres=# create view v1 as select * from ft1; postgres=# create user v1_owner; postgres=# alter view v1 owner to v1_owner; postgres=# grant select on ft1 to v1_owner; postgres=# prepare join_stmt as select * from ft1, v1 where ft1.a = v1.a; postgres=# explain verbose execute join_stmt; QUERY PLAN -------------------------------------------------------------------------------------------------------------- Foreign Scan (cost=100.00..102.04 rows=1 width=16) Output: ft1.a, ft1.b, ft1_1.a, ft1_1.b Relations: (public.ft1) INNER JOIN (public.ft1) Remote SQL: SELECT r1.a, r1.b, r5.a, r5.b FROM (public.t1 r1 INNER JOIN public.t1 r5 ON (((r1.a = r5.a)))) (4 rows) That's great. postgres=# create user mapping for v1_owner server loopback; postgres=# explain verbose execute join_stmt; QUERY PLAN ------------------------------------------------------------------------------ Nested Loop (cost=200.00..202.07 rows=1 width=16) Output: ft1.a, ft1.b, ft1_1.a, ft1_1.b Join Filter: (ft1.a = ft1_1.a) -> Foreign Scan on public.ft1 (cost=100.00..101.03 rows=1 width=8) Output: ft1.a, ft1.b Remote SQL: SELECT a, b FROM public.t1 -> Foreign Scan on public.ft1 ft1_1 (cost=100.00..101.03 rows=1 width=8) Output: ft1_1.a, ft1_1.b Remote SQL: SELECT a, b FROM public.t1 (9 rows) That's also great. Since ft1 and v1 use different user mappings, the join should not be pushed down. postgres=# drop user mapping for v1_owner server loopback; postgres=# explain verbose execute join_stmt; QUERY PLAN ------------------------------------------------------------------------------ Nested Loop (cost=200.00..202.07 rows=1 width=16) Output: ft1.a, ft1.b, ft1_1.a, ft1_1.b Join Filter: (ft1.a = ft1_1.a) -> Foreign Scan on public.ft1 (cost=100.00..101.03 rows=1 width=8) Output: ft1.a, ft1.b Remote SQL: SELECT a, b FROM public.t1 -> Foreign Scan on public.ft1 ft1_1 (cost=100.00..101.03 rows=1 width=8) Output: ft1_1.a, ft1_1.b Remote SQL: SELECT a, b FROM public.t1 (9 rows) Seems odd to me. Both relations use the same user mapping as before, so the join should be pushed down. Another thing I noticed is that the code in plancache.c would invalidate a cached plan with a foreign join due to user mapping changes even in the case where user mappings are meaningless to the FDW. To fix the first, I'd like to propose (1) replacing the existing has_foreign_join flag in the CachedPlan data structure with a new flag, say uses_user_mapping, that indicates whether a cached plan uses any user mapping regardless of whether the cached plan has foreign joins or not (likewise, replace the hasForeignJoin flag in PlannedStmt), and (2) invalidating the cached plan if the uses_user_mapping flag is true. I thought that we invalidate the cached plan if the flag is true and there is a possibility of performing foreign joins, but it seems to me that updates on the corresponding catalog are infrequent enough that such a detailed tracking is not worth the effort. One benefit of using the proposed approach is that we could make the FDW's handling of user mappings in BeginForeignScan more efficient; currently, there is additional overhead caused by catalog re-lookups to obtain the user mapping information for the simple-foreign-table-scan case where user mappings mean something to the FDW as in postgres_fdw (probably, updates on the catalogs are infrequent, though), but we could improve the efficiency by using the validated user mapping information created at planning time for that case as well as for the foreign-join case. For that, I'd like to propose storing the validated user mapping information into ForeignScan, not fdw_private. There is a discussion about using an ExtensibleNode [1] for that, but the proposed approach would make the FDW's job much simpler. I don't think the above change is sufficient to fix the second. The root reason for that is that since currently, we allow the user mapping OID (rel->umid) to be InvalidOid in two cases: (1) user mappings mean something to the FDW but it can't get any user mapping at planning time and (2) user mappings are meaningless to the FDW, we cannot distinguish these two cases. So, I'd like to introduce a new callback routine to specify that user mappings mean something to the FDW as proposed by Tom [2], and use that to reject the former case, which allows us to set the above uses_user_mapping flag appropriately, ie, set the flag to true only if user mapping changes require forcing a replan. This would change the postgres_fdw's behavior that it allows to prepare statements involving foreign tables without any user mappings as long as those aren't required at planning time, but I'm not sure that it's a good idea to keep that behavior because ISTM that such a behavior would make people sloppy about creating user mappings, which could lead to latent security problems [2]. Attached is a proposed patch for that. Best regards, Etsuro Fujita [1] https://www.postgresql.org/message-id/CA%2BTgmoZK6BB4RTb5paz45Vme%3Dq6Z3D7FF2-VKdVyQCS1ps-cGw%40mail.gmail.com [2] https://www.postgresql.org/message-id/18299.1457923919%40sss.pgh.pa.us
Attachment
pgsql-hackers by date: