Re: executor relation handling - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: executor relation handling |
Date | |
Msg-id | 31259.1538954295@sss.pgh.pa.us Whole thread Raw |
In response to | Re: executor relation handling (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: executor relation handling
Re: executor relation handling Re: executor relation handling Re: executor relation handling Re: executor relation handling |
List | pgsql-hackers |
I wrote: > Still need to think a bit more about whether we want 0005 in > anything like its current form. So I poked at that for a bit, and soon realized that the *main* problem there is that ExecFindRowMark() eats O(N^2) time due to repeated searches of the es_rowMarks list. While the patch as stated would improve that for cases where most of the partitions can be pruned at plan time, it does nothing much for cases where they can't. However, it's pretty trivial to fix that: let's just use an array not a list. Patch 0001 attached does that. A further improvement we could consider is to avoid opening the relcache entries for pruned-away relations. I could not find a way to do that that was less invasive than removing ExecRowMark.relation and requiring callers to call a new function to get the relation if they need it. Patch 0002 attached is a delta on top of 0001 that does that. Replicating your select-lt-for-share test case as best I can (you never actually specified it carefully), I find that the TPS rate on my workstation goes from about 250 tps with HEAD to 920 tps with patch 0001 or 1130 tps with patch 0002. This compares to about 1600 tps for the non-FOR-SHARE version of the query. However, we should keep in mind that without partitioning overhead (ie "select * from lt_999 where b = 999 for share"), the TPS rate is over 25800 tps. Most of the overhead in the partitioned case seems to be from acquiring locks on rangetable entries that we won't ever use, and none of these patch variants are touching that problem. So ISTM that the *real* win for this scenario is going to come from teaching the system to prune unwanted relations from the query altogether, not just from the PlanRowMark list. Keeping that comparison in mind, I'm inclined to think that 0001 is the best thing to do for now. The incremental win from 0002 is not big enough to justify the API break it creates, while your 0005 is not really attacking the problem the right way. regards, tom lane diff --git a/src/backend/executor/execCurrent.c b/src/backend/executor/execCurrent.c index 7480cf5..aadf749 100644 --- a/src/backend/executor/execCurrent.c +++ b/src/backend/executor/execCurrent.c @@ -91,21 +91,22 @@ execCurrentOf(CurrentOfExpr *cexpr, * the other code can't, while the non-FOR-UPDATE case allows use of WHERE * CURRENT OF with an insensitive cursor. */ - if (queryDesc->estate->es_rowMarks) + if (queryDesc->estate->es_rowmarks) { ExecRowMark *erm; - ListCell *lc; + Index i; /* * Here, the query must have exactly one FOR UPDATE/SHARE reference to * the target table, and we dig the ctid info out of that. */ erm = NULL; - foreach(lc, queryDesc->estate->es_rowMarks) + for (i = 0; i < queryDesc->estate->es_range_table_size; i++) { - ExecRowMark *thiserm = (ExecRowMark *) lfirst(lc); + ExecRowMark *thiserm = queryDesc->estate->es_rowmarks[i]; - if (!RowMarkRequiresRowShareLock(thiserm->markType)) + if (thiserm == NULL || + !RowMarkRequiresRowShareLock(thiserm->markType)) continue; /* ignore non-FOR UPDATE/SHARE items */ if (thiserm->relid == table_oid) diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c index b6abad5..0a766ef 100644 --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -909,61 +909,68 @@ InitPlan(QueryDesc *queryDesc, int eflags) } /* - * Next, build the ExecRowMark list from the PlanRowMark(s), if any. + * Next, build the ExecRowMark array from the PlanRowMark(s), if any. */ - estate->es_rowMarks = NIL; - foreach(l, plannedstmt->rowMarks) + if (plannedstmt->rowMarks) { - PlanRowMark *rc = (PlanRowMark *) lfirst(l); - Oid relid; - Relation relation; - ExecRowMark *erm; + estate->es_rowmarks = (ExecRowMark **) + palloc0(estate->es_range_table_size * sizeof(ExecRowMark *)); + foreach(l, plannedstmt->rowMarks) + { + PlanRowMark *rc = (PlanRowMark *) lfirst(l); + Oid relid; + Relation relation; + ExecRowMark *erm; - /* ignore "parent" rowmarks; they are irrelevant at runtime */ - if (rc->isParent) - continue; + /* ignore "parent" rowmarks; they are irrelevant at runtime */ + if (rc->isParent) + continue; - /* get relation's OID (will produce InvalidOid if subquery) */ - relid = exec_rt_fetch(rc->rti, estate)->relid; + /* get relation's OID (will produce InvalidOid if subquery) */ + relid = exec_rt_fetch(rc->rti, estate)->relid; - /* open relation, if we need to access it for this mark type */ - switch (rc->markType) - { - case ROW_MARK_EXCLUSIVE: - case ROW_MARK_NOKEYEXCLUSIVE: - case ROW_MARK_SHARE: - case ROW_MARK_KEYSHARE: - case ROW_MARK_REFERENCE: - relation = ExecGetRangeTableRelation(estate, rc->rti); - break; - case ROW_MARK_COPY: - /* no physical table access is required */ - relation = NULL; - break; - default: - elog(ERROR, "unrecognized markType: %d", rc->markType); - relation = NULL; /* keep compiler quiet */ - break; - } + /* open relation, if we need to access it for this mark type */ + switch (rc->markType) + { + case ROW_MARK_EXCLUSIVE: + case ROW_MARK_NOKEYEXCLUSIVE: + case ROW_MARK_SHARE: + case ROW_MARK_KEYSHARE: + case ROW_MARK_REFERENCE: + relation = ExecGetRangeTableRelation(estate, rc->rti); + break; + case ROW_MARK_COPY: + /* no physical table access is required */ + relation = NULL; + break; + default: + elog(ERROR, "unrecognized markType: %d", rc->markType); + relation = NULL; /* keep compiler quiet */ + break; + } - /* Check that relation is a legal target for marking */ - if (relation) - CheckValidRowMarkRel(relation, rc->markType); - - erm = (ExecRowMark *) palloc(sizeof(ExecRowMark)); - erm->relation = relation; - erm->relid = relid; - erm->rti = rc->rti; - erm->prti = rc->prti; - erm->rowmarkId = rc->rowmarkId; - erm->markType = rc->markType; - erm->strength = rc->strength; - erm->waitPolicy = rc->waitPolicy; - erm->ermActive = false; - ItemPointerSetInvalid(&(erm->curCtid)); - erm->ermExtra = NULL; - - estate->es_rowMarks = lappend(estate->es_rowMarks, erm); + /* Check that relation is a legal target for marking */ + if (relation) + CheckValidRowMarkRel(relation, rc->markType); + + erm = (ExecRowMark *) palloc(sizeof(ExecRowMark)); + erm->relation = relation; + erm->relid = relid; + erm->rti = rc->rti; + erm->prti = rc->prti; + erm->rowmarkId = rc->rowmarkId; + erm->markType = rc->markType; + erm->strength = rc->strength; + erm->waitPolicy = rc->waitPolicy; + erm->ermActive = false; + ItemPointerSetInvalid(&(erm->curCtid)); + erm->ermExtra = NULL; + + Assert(erm->rti > 0 && erm->rti <= estate->es_range_table_size && + estate->es_rowmarks[erm->rti - 1] == NULL); + + estate->es_rowmarks[erm->rti - 1] = erm; + } } /* @@ -2394,13 +2401,12 @@ ExecUpdateLockMode(EState *estate, ResultRelInfo *relinfo) ExecRowMark * ExecFindRowMark(EState *estate, Index rti, bool missing_ok) { - ListCell *lc; - - foreach(lc, estate->es_rowMarks) + if (rti > 0 && rti <= estate->es_range_table_size && + estate->es_rowmarks != NULL) { - ExecRowMark *erm = (ExecRowMark *) lfirst(lc); + ExecRowMark *erm = estate->es_rowmarks[rti - 1]; - if (erm->rti == rti) + if (erm) return erm; } if (!missing_ok) @@ -3131,6 +3137,7 @@ EvalPlanQualStart(EPQState *epqstate, EState *parentestate, Plan *planTree) estate->es_range_table_array = parentestate->es_range_table_array; estate->es_range_table_size = parentestate->es_range_table_size; estate->es_relations = parentestate->es_relations; + estate->es_rowmarks = parentestate->es_rowmarks; estate->es_plannedstmt = parentestate->es_plannedstmt; estate->es_junkFilter = parentestate->es_junkFilter; estate->es_output_cid = parentestate->es_output_cid; @@ -3148,7 +3155,6 @@ EvalPlanQualStart(EPQState *epqstate, EState *parentestate, Plan *planTree) } /* es_result_relation_info must NOT be copied */ /* es_trig_target_relations must NOT be copied */ - estate->es_rowMarks = parentestate->es_rowMarks; estate->es_top_eflags = parentestate->es_top_eflags; estate->es_instrument = parentestate->es_instrument; /* es_auxmodifytables must NOT be copied */ diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c index d98e90e..71c6b5d 100644 --- a/src/backend/executor/execUtils.c +++ b/src/backend/executor/execUtils.c @@ -113,6 +113,7 @@ CreateExecutorState(void) estate->es_range_table_array = NULL; estate->es_range_table_size = 0; estate->es_relations = NULL; + estate->es_rowmarks = NULL; estate->es_plannedstmt = NULL; estate->es_junkFilter = NULL; @@ -142,8 +143,6 @@ CreateExecutorState(void) estate->es_tupleTable = NIL; - estate->es_rowMarks = NIL; - estate->es_processed = 0; estate->es_lastoid = InvalidOid; @@ -709,6 +708,12 @@ ExecInitRangeTable(EState *estate, List *rangeTable) */ estate->es_relations = (Relation *) palloc0(estate->es_range_table_size * sizeof(Relation)); + + /* + * es_rowmarks is also parallel to the es_range_table_array, but it's + * allocated only if needed. + */ + estate->es_rowmarks = NULL; } /* diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h index 657b593..8347089 100644 --- a/src/include/nodes/execnodes.h +++ b/src/include/nodes/execnodes.h @@ -34,6 +34,7 @@ struct PlanState; /* forward references in this file */ struct ParallelHashJoinState; +struct ExecRowMark; struct ExprState; struct ExprContext; struct RangeTblEntry; /* avoid including parsenodes.h here */ @@ -491,6 +492,8 @@ typedef struct EState Index es_range_table_size; /* size of the range table arrays */ Relation *es_relations; /* Array of per-range-table-entry Relation * pointers, or NULL if not yet opened */ + struct ExecRowMark **es_rowmarks; /* Array of per-range-table-entry + * ExecRowMarks, or NULL if none */ PlannedStmt *es_plannedstmt; /* link to top of plan tree */ const char *es_sourceText; /* Source text from QueryDesc */ @@ -537,8 +540,6 @@ typedef struct EState List *es_tupleTable; /* List of TupleTableSlots */ - List *es_rowMarks; /* List of ExecRowMarks */ - uint64 es_processed; /* # of tuples processed */ Oid es_lastoid; /* last oid processed (by INSERT) */ @@ -607,7 +608,9 @@ typedef struct EState * node that sources the relation (e.g., for a foreign table the FDW can use * ermExtra to hold information). * - * EState->es_rowMarks is a list of these structs. + * EState->es_rowmarks is an array of these structs, indexed by RT index, + * with NULLs for irrelevant RT indexes. es_rowmarks itself is NULL if + * there are no rowmarks. */ typedef struct ExecRowMark { @@ -629,7 +632,7 @@ typedef struct ExecRowMark * additional runtime representation of FOR [KEY] UPDATE/SHARE clauses * * Each LockRows and ModifyTable node keeps a list of the rowmarks it needs to - * deal with. In addition to a pointer to the related entry in es_rowMarks, + * deal with. In addition to a pointer to the related entry in es_rowmarks, * this struct carries the column number(s) of the resjunk columns associated * with the rowmark (see comments for PlanRowMark for more detail). In the * case of ModifyTable, there has to be a separate ExecAuxRowMark list for @@ -638,7 +641,7 @@ typedef struct ExecRowMark */ typedef struct ExecAuxRowMark { - ExecRowMark *rowmark; /* related entry in es_rowMarks */ + ExecRowMark *rowmark; /* related entry in es_rowmarks */ AttrNumber ctidAttNo; /* resno of ctid junk attribute, if any */ AttrNumber toidAttNo; /* resno of tableoid junk attribute, if any */ AttrNumber wholeAttNo; /* resno of whole-row junk attribute, if any */ diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c index 0a766ef..1775e77 100644 --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -919,7 +919,6 @@ InitPlan(QueryDesc *queryDesc, int eflags) { PlanRowMark *rc = (PlanRowMark *) lfirst(l); Oid relid; - Relation relation; ExecRowMark *erm; /* ignore "parent" rowmarks; they are irrelevant at runtime */ @@ -929,32 +928,7 @@ InitPlan(QueryDesc *queryDesc, int eflags) /* get relation's OID (will produce InvalidOid if subquery) */ relid = exec_rt_fetch(rc->rti, estate)->relid; - /* open relation, if we need to access it for this mark type */ - switch (rc->markType) - { - case ROW_MARK_EXCLUSIVE: - case ROW_MARK_NOKEYEXCLUSIVE: - case ROW_MARK_SHARE: - case ROW_MARK_KEYSHARE: - case ROW_MARK_REFERENCE: - relation = ExecGetRangeTableRelation(estate, rc->rti); - break; - case ROW_MARK_COPY: - /* no physical table access is required */ - relation = NULL; - break; - default: - elog(ERROR, "unrecognized markType: %d", rc->markType); - relation = NULL; /* keep compiler quiet */ - break; - } - - /* Check that relation is a legal target for marking */ - if (relation) - CheckValidRowMarkRel(relation, rc->markType); - erm = (ExecRowMark *) palloc(sizeof(ExecRowMark)); - erm->relation = relation; erm->relid = relid; erm->rti = rc->rti; erm->prti = rc->prti; @@ -963,6 +937,7 @@ InitPlan(QueryDesc *queryDesc, int eflags) erm->strength = rc->strength; erm->waitPolicy = rc->waitPolicy; erm->ermActive = false; + erm->ermChecked = false; ItemPointerSetInvalid(&(erm->curCtid)); erm->ermExtra = NULL; @@ -2462,6 +2437,20 @@ ExecBuildAuxRowMark(ExecRowMark *erm, List *targetlist) return aerm; } +Relation +ExecRowMarkGetRelation(EState *estate, ExecRowMark *erm) +{ + Relation relation = ExecGetRangeTableRelation(estate, erm->rti); + + /* Check that relation is a legal target for marking, if we haven't */ + if (!erm->ermChecked) + { + CheckValidRowMarkRel(relation, erm->markType); + erm->ermChecked = true; + } + return relation; +} + /* * EvalPlanQual logic --- recheck modified tuple(s) to see if we want to @@ -2935,10 +2924,9 @@ EvalPlanQualFetchRowMarks(EPQState *epqstate) if (erm->markType == ROW_MARK_REFERENCE) { + Relation relation = ExecRowMarkGetRelation(epqstate->estate, erm); HeapTuple copyTuple; - Assert(erm->relation != NULL); - /* fetch the tuple's ctid */ datum = ExecGetJunkAttribute(epqstate->origslot, aerm->ctidAttNo, @@ -2948,18 +2936,18 @@ EvalPlanQualFetchRowMarks(EPQState *epqstate) continue; /* fetch requests on foreign tables must be passed to their FDW */ - if (erm->relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE) + if (relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE) { FdwRoutine *fdwroutine; bool updated = false; - fdwroutine = GetFdwRoutineForRelation(erm->relation, false); + fdwroutine = GetFdwRoutineForRelation(relation, false); /* this should have been checked already, but let's be safe */ if (fdwroutine->RefetchForeignRow == NULL) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("cannot lock rows in foreign table \"%s\"", - RelationGetRelationName(erm->relation)))); + RelationGetRelationName(relation)))); copyTuple = fdwroutine->RefetchForeignRow(epqstate->estate, erm, datum, @@ -2979,7 +2967,7 @@ EvalPlanQualFetchRowMarks(EPQState *epqstate) Buffer buffer; tuple.t_self = *((ItemPointer) DatumGetPointer(datum)); - if (!heap_fetch(erm->relation, SnapshotAny, &tuple, &buffer, + if (!heap_fetch(relation, SnapshotAny, &tuple, &buffer, false, NULL)) elog(ERROR, "failed to fetch tuple for EvalPlanQual recheck"); diff --git a/src/backend/executor/nodeLockRows.c b/src/backend/executor/nodeLockRows.c index 6db345a..4e161c0 100644 --- a/src/backend/executor/nodeLockRows.c +++ b/src/backend/executor/nodeLockRows.c @@ -74,6 +74,7 @@ lnext: { ExecAuxRowMark *aerm = (ExecAuxRowMark *) lfirst(lc); ExecRowMark *erm = aerm->rowmark; + Relation relation; HeapTuple *testTuple; Datum datum; bool isNull; @@ -122,19 +123,22 @@ lnext: if (isNull) elog(ERROR, "ctid is NULL"); + /* access the tuple's relation */ + relation = ExecRowMarkGetRelation(estate, erm); + /* requests for foreign tables must be passed to their FDW */ - if (erm->relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE) + if (relation->rd_rel->relkind == RELKIND_FOREIGN_TABLE) { FdwRoutine *fdwroutine; bool updated = false; - fdwroutine = GetFdwRoutineForRelation(erm->relation, false); + fdwroutine = GetFdwRoutineForRelation(relation, false); /* this should have been checked already, but let's be safe */ if (fdwroutine->RefetchForeignRow == NULL) ereport(ERROR, (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("cannot lock rows in foreign table \"%s\"", - RelationGetRelationName(erm->relation)))); + RelationGetRelationName(relation)))); copyTuple = fdwroutine->RefetchForeignRow(estate, erm, datum, @@ -180,7 +184,7 @@ lnext: break; } - test = heap_lock_tuple(erm->relation, &tuple, + test = heap_lock_tuple(relation, &tuple, estate->es_output_cid, lockmode, erm->waitPolicy, true, &buffer, &hufd); @@ -230,7 +234,7 @@ lnext: } /* updated, so fetch and lock the updated version */ - copyTuple = EvalPlanQualFetch(estate, erm->relation, + copyTuple = EvalPlanQualFetch(estate, relation, lockmode, erm->waitPolicy, &hufd.ctid, hufd.xmax); @@ -286,6 +290,7 @@ lnext: { ExecAuxRowMark *aerm = (ExecAuxRowMark *) lfirst(lc); ExecRowMark *erm = aerm->rowmark; + Relation relation; HeapTupleData tuple; Buffer buffer; @@ -309,13 +314,16 @@ lnext: continue; } + /* access the tuple's relation */ + relation = ExecRowMarkGetRelation(estate, erm); + /* foreign tables should have been fetched above */ - Assert(erm->relation->rd_rel->relkind != RELKIND_FOREIGN_TABLE); + Assert(relation->rd_rel->relkind != RELKIND_FOREIGN_TABLE); Assert(ItemPointerIsValid(&(erm->curCtid))); /* okay, fetch the tuple */ tuple.t_self = erm->curCtid; - if (!heap_fetch(erm->relation, SnapshotAny, &tuple, &buffer, + if (!heap_fetch(relation, SnapshotAny, &tuple, &buffer, false, NULL)) elog(ERROR, "failed to fetch tuple for EvalPlanQual recheck"); diff --git a/src/include/executor/executor.h b/src/include/executor/executor.h index c9ed198..4dbbb3b 100644 --- a/src/include/executor/executor.h +++ b/src/include/executor/executor.h @@ -190,6 +190,7 @@ extern void ExecWithCheckOptions(WCOKind kind, ResultRelInfo *resultRelInfo, extern LockTupleMode ExecUpdateLockMode(EState *estate, ResultRelInfo *relinfo); extern ExecRowMark *ExecFindRowMark(EState *estate, Index rti, bool missing_ok); extern ExecAuxRowMark *ExecBuildAuxRowMark(ExecRowMark *erm, List *targetlist); +extern Relation ExecRowMarkGetRelation(EState *estate, ExecRowMark *erm); extern TupleTableSlot *EvalPlanQual(EState *estate, EPQState *epqstate, Relation relation, Index rti, int lockmode, ItemPointer tid, TransactionId priorXmax); diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h index 8347089..e2cd7ea 100644 --- a/src/include/nodes/execnodes.h +++ b/src/include/nodes/execnodes.h @@ -597,16 +597,20 @@ typedef struct EState * ExecRowMark - * runtime representation of FOR [KEY] UPDATE/SHARE clauses * - * When doing UPDATE, DELETE, or SELECT FOR [KEY] UPDATE/SHARE, we will have an - * ExecRowMark for each non-target relation in the query (except inheritance - * parent RTEs, which can be ignored at runtime). Virtual relations such as - * subqueries-in-FROM will have an ExecRowMark with relation == NULL. See - * PlanRowMark for details about most of the fields. In addition to fields - * directly derived from PlanRowMark, we store an activity flag (to denote - * inactive children of inheritance trees), curCtid, which is used by the - * WHERE CURRENT OF code, and ermExtra, which is available for use by the plan - * node that sources the relation (e.g., for a foreign table the FDW can use - * ermExtra to hold information). + * When doing UPDATE, DELETE, or SELECT FOR [KEY] UPDATE/SHARE, we will have + * an ExecRowMark for each non-target relation in the query (except + * inheritance parent RTEs, which can be ignored at runtime). See PlanRowMark + * for details about most of the fields. In addition to fields directly + * derived from PlanRowMark, we store an activity flag (to denote inactive + * children of inheritance trees); a "checked" flag (to denote whether we've + * verified the relation is of appropriate type to be marked); curCtid, which + * is used by the WHERE CURRENT OF code; and ermExtra, which is available for + * use by the plan node that sources the relation (e.g., for a foreign table + * the FDW can use ermExtra to hold information). + * + * For performance reasons, we don't open or verify the relkind of tables + * that are never accessed in a query; this is important in partitioned + * tables with many partitions. Use ExecRowMarkGetRelation to do that. * * EState->es_rowmarks is an array of these structs, indexed by RT index, * with NULLs for irrelevant RT indexes. es_rowmarks itself is NULL if @@ -614,8 +618,7 @@ typedef struct EState */ typedef struct ExecRowMark { - Relation relation; /* opened and suitably locked relation */ - Oid relid; /* its OID (or InvalidOid, if subquery) */ + Oid relid; /* relation's OID (InvalidOid if subquery) */ Index rti; /* its range table index */ Index prti; /* parent range table index, if child */ Index rowmarkId; /* unique identifier for resjunk columns */ @@ -623,6 +626,7 @@ typedef struct ExecRowMark LockClauseStrength strength; /* LockingClause's strength, or LCS_NONE */ LockWaitPolicy waitPolicy; /* NOWAIT and SKIP LOCKED */ bool ermActive; /* is this mark relevant for current tuple? */ + bool ermChecked; /* have we verified relation's relkind? */ ItemPointerData curCtid; /* ctid of currently locked tuple, if any */ void *ermExtra; /* available for use by relation source node */ } ExecRowMark;
pgsql-hackers by date: