From dce6657e911afecef82cddda90e0d236c273fe18 Mon Sep 17 00:00:00 2001
From: Amit Langote <amitlan@postgresql.org>
Date: Fri, 22 Sep 2023 18:17:15 +0900
Subject: [PATCH v49 4/8] Teach the executor to lock child tables in some cases

An upcoming commit will move the locking of child tables referenced
in a cached plan tree from GetCachedPlan() to the executor
initialization of the plan tree in ExecutorStart().  This commit
teaches ExecGetRangeTableRelation() to lock child tables if
EState.es_cachedplan points to a CachedPlan.

The executor must now deal with the cases where an unlocked child
table might have been concurrently dropped, so this modifies
ExecGetRangeTableRelation() to use try_table_open().  All of its
callers (and those of ExecOpenScanRelation() that calls it) must
now account for the child table disappearing, which means to abort
initializing the table's Scan node in the middle.

ExecGetRangeTableRelation() now examines inFromCl field of an RTE
to determine that a given range table relation is a child table, so
this commit also makes the planner set inFromCl to false in the
child tables' RTEs that it manufactures.

Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.comk
---
 src/backend/executor/README          | 36 +++++++++++++++++++++++-
 src/backend/executor/execPartition.c |  2 ++
 src/backend/executor/execUtils.c     | 41 +++++++++++++++++++++-------
 src/backend/optimizer/util/inherit.c |  7 +++++
 src/backend/parser/analyze.c         |  7 ++---
 src/include/nodes/parsenodes.h       |  8 ++++--
 6 files changed, 84 insertions(+), 17 deletions(-)

diff --git a/src/backend/executor/README b/src/backend/executor/README
index 642d63be61..6cd840d3a7 100644
--- a/src/backend/executor/README
+++ b/src/backend/executor/README
@@ -280,6 +280,34 @@ are typically reset to empty once per tuple.  Per-tuple contexts are usually
 associated with ExprContexts, and commonly each PlanState node has its own
 ExprContext to evaluate its qual and targetlist expressions in.
 
+Relation Locking
+----------------
+
+Typically, when the executor initializes a plan tree for execution, it doesn't
+lock non-index relations if the plan tree is freshly generated and not derived
+from a CachedPlan. This is because such locks have already been established
+during the query's parsing, rewriting, and planning phases. However, with a
+cached plan tree, there can be relations that remain unlocked. The function
+GetCachedPlan() locks relations existing in the query's range table pre-planning
+but doesn't account for those added during the planning phase. Consequently,
+inheritance child tables, introduced to the query's range table during planning,
+won't be locked when the cached plan reaches the executor.
+
+The decision to defer locking child tables with GetCachedPlan() arises from the
+fact that not all might be accessed during plan execution. For instance, if
+child tables are partitions, some might be omitted due to pruning at
+execution-initialization-time. Thus, the responsibility of locking these child
+tables is pushed to execution-initialization-time, taking place in ExecInitNode()
+for plan nodes encompassing these tables.
+
+This approach opens a window where a cached plan tree with child tables could
+become outdated if another backend modifies these tables before ExecInitNode()
+locks them. Given this, the executor has the added duty to confirm the plan
+tree's validity whenever it locks a child table post execution-initialization-
+pruning. This validation is done by checking the CachedPlan.is_valid attribute
+of the CachedPlan provided. If the plan tree is outdated (is_valid=false), the
+executor halts any further initialization and alerts the caller that they should
+retry execution with another freshly created plan tree.
 
 Query Processing Control Flow
 -----------------------------
@@ -316,7 +344,13 @@ This is a sketch of control flow for full query processing:
 
 	FreeQueryDesc
 
-Per above comments, it's not really critical for ExecEndNode to free any
+As mentioned in the "Relation Locking" section, if the plan tree is found to
+be stale during one of the recursive calls of ExecInitNode() after taking a
+lock on a child table, the control is immmediately returned to the caller of
+ExecutorStart(), which must redo the steps from CreateQueryDesc with a new
+plan tree.
+
+Per above comments, it's not really critical for ExecEndPlan to free any
 memory; it'll all go away in FreeExecutorState anyway.  However, we do need to
 be careful to close relations, drop buffer pins, etc, so we do need to scan
 the plan state tree to find these sorts of resources.
diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index f6c34328b8..532734d758 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -1927,6 +1927,8 @@ CreatePartitionPruneState(PlanState *planstate, PartitionPruneInfo *pruneinfo)
 			 * duration of this executor run.
 			 */
 			partrel = ExecGetRangeTableRelation(estate, pinfo->rtindex);
+			if (unlikely(partrel == NULL))
+				return NULL;
 			partkey = RelationGetPartitionKey(partrel);
 			partdesc = PartitionDirectoryLookup(estate->es_partition_directory,
 												partrel);
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index f0f5740c26..117773706a 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -697,6 +697,8 @@ ExecRelationIsTargetRelation(EState *estate, Index scanrelid)
  *
  *		Open the heap relation to be scanned by a base-level scan plan node.
  *		This should be called during the node's ExecInit routine.
+ *
+ *		NULL is returned if the relation is found to have been dropped.
  * ----------------------------------------------------------------
  */
 Relation
@@ -706,6 +708,8 @@ ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags)
 
 	/* Open the relation. */
 	rel = ExecGetRangeTableRelation(estate, scanrelid);
+	if (unlikely(rel == NULL))
+		return NULL;
 
 	/*
 	 * Complain if we're attempting a scan of an unscannable relation, except
@@ -763,6 +767,9 @@ ExecInitRangeTable(EState *estate, List *rangeTable, List *permInfos)
  *		Open the Relation for a range table entry, if not already done
  *
  * The Relations will be closed again in ExecEndPlan().
+ *
+ * Returned value may be NULL if the relation is a child relation that is not
+ * already locked.
  */
 Relation
 ExecGetRangeTableRelation(EState *estate, Index rti)
@@ -779,7 +786,28 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
 
 		Assert(rte->rtekind == RTE_RELATION);
 
-		if (!IsParallelWorker())
+		if (IsParallelWorker() ||
+			(estate->es_cachedplan != NULL && !rte->inFromCl))
+		{
+			/*
+			 * Take a lock if we are a parallel worker or if this is a child
+			 * table referenced in a cached plan.
+			 *
+			 * Parallel workers need to have their own local lock on the
+			 * relation.  This ensures sane behavior in case the parent process
+			 * exits before we do.
+			 *
+			 * When executing a cached plan, child tables must be locked
+			 * here, because plancache.c (GetCachedPlan()) would only have
+			 * locked tables mentioned in the query, that is, tables whose
+			 * RTEs' inFromCl is true.
+			 *
+			 * Note that we use try_table_open() here, because without a lock
+			 * held on the relation, it may have disappeared from under us.
+			 */
+			rel = try_table_open(rte->relid, rte->rellockmode);
+		}
+		else
 		{
 			/*
 			 * In a normal query, we should already have the appropriate lock,
@@ -792,15 +820,6 @@ ExecGetRangeTableRelation(EState *estate, Index rti)
 			Assert(rte->rellockmode == AccessShareLock ||
 				   CheckRelationLockedByMe(rel, rte->rellockmode, false));
 		}
-		else
-		{
-			/*
-			 * If we are a parallel worker, we need to obtain our own local
-			 * lock on the relation.  This ensures sane behavior in case the
-			 * parent process exits before we do.
-			 */
-			rel = table_open(rte->relid, rte->rellockmode);
-		}
 
 		estate->es_relations[rti - 1] = rel;
 	}
@@ -823,6 +842,8 @@ ExecInitResultRelation(EState *estate, ResultRelInfo *resultRelInfo,
 	Relation	resultRelationDesc;
 
 	resultRelationDesc = ExecGetRangeTableRelation(estate, rti);
+	if (unlikely(resultRelationDesc == NULL))
+		return;
 	InitResultRelInfo(resultRelInfo,
 					  resultRelationDesc,
 					  rti,
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index f9d3ff1e7a..1b9d79e341 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -493,6 +493,13 @@ expand_single_inheritance_child(PlannerInfo *root, RangeTblEntry *parentrte,
 	}
 	else
 		childrte->inh = false;
+
+	/*
+	 * Flag child tables as indirectly referenced in the query. This helps
+	 * the executor's ExecGetRangeTableRelation() recognize them as
+	 * inheritance children.
+	 */
+	childrte->inFromCl = false;
 	childrte->securityQuals = NIL;
 
 	/* No permission checking for child RTEs. */
diff --git a/src/backend/parser/analyze.c b/src/backend/parser/analyze.c
index 7a1dfb6364..cf269f8c53 100644
--- a/src/backend/parser/analyze.c
+++ b/src/backend/parser/analyze.c
@@ -3305,10 +3305,9 @@ transformLockingClause(ParseState *pstate, Query *qry, LockingClause *lc,
 		/*
 		 * Lock all regular tables used in query and its subqueries.  We
 		 * examine inFromCl to exclude auto-added RTEs, particularly NEW/OLD
-		 * in rules.  This is a bit of an abuse of a mostly-obsolete flag, but
-		 * it's convenient.  We can't rely on the namespace mechanism that has
-		 * largely replaced inFromCl, since for example we need to lock
-		 * base-relation RTEs even if they are masked by upper joins.
+		 * in rules.  We can't rely on the namespace mechanism since for
+		 * example we need to lock base-relation RTEs even if they are masked
+		 * by upper joins.
 		 */
 		i = 0;
 		foreach(rt, qry->rtable)
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index e494309da8..642c7bdfea 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -987,11 +987,15 @@ typedef struct PartitionCmd
  *
  *	  inFromCl marks those range variables that are listed in the FROM clause.
  *	  It's false for RTEs that are added to a query behind the scenes, such
- *	  as the NEW and OLD variables for a rule, or the subqueries of a UNION.
+ *	  as the NEW and OLD variables for a rule, or the subqueries of a UNION,
+ *	  or the RTEs of inheritance child tables that are added by the planner.
  *	  This flag is not used during parsing (except in transformLockingClause,
  *	  q.v.); the parser now uses a separate "namespace" data structure to
  *	  control visibility.  But it is needed by ruleutils.c to determine
- *	  whether RTEs should be shown in decompiled queries.
+ *	  whether RTEs should be shown in decompiled queries.  The executor uses
+ *	  this to ascertain if an RTE_RELATION entry is for a table explicitly
+ *	  named in the query or a child table added by the planner.  This
+ *	  distinction is vital when child tables in a plan must be locked.
  *
  *	  securityQuals is a list of security barrier quals (boolean expressions),
  *	  to be tested in the listed order before returning a row from the
-- 
2.35.3