Re: why partition pruning doesn't work? - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: why partition pruning doesn't work? |
Date | |
Msg-id | 4114.1531674142@sss.pgh.pa.us |
In response to | Re: why partition pruning doesn't work? (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>) |
Responses | Re: why partition pruning doesn't work?; Re: why partition pruning doesn't work? |
List | pgsql-hackers |
Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> writes:
> On 2018/06/19 2:05, Tom Lane wrote:
>> Or maybe what we should do is drop ExecLockNonLeafAppendTables/
>> ExecOpenAppendPartitionedTables entirely and teach InitPlan to do it.

> Hmm, for InitPlan to do what ExecOpenAppendPartitionedTables does, we'd
> have to have all the partitioned tables (contained in partitioned_rels
> fields of all Append/MergeAppend/ModifyTable nodes of a plan) also listed
> in a global list like rootResultRelations and nonleafResultRelations of
> PlannedStmt.

> Attached updated patch.

I've been looking at this patch, and while it's not unreasonable on its
own terms, I'm growing increasingly distressed at the ad-hoc and rather
duplicative nature of the data structures that have gotten stuck into
plan trees thanks to partitioning (the rootResultRelations and
nonleafResultRelations lists being prime examples).

It struck me this morning that a whole lot of the complication here is
solely due to needing to identify the right type of relation lock to take
during executor startup, and that THAT WORK IS TOTALLY USELESS.  In every
case, we must already hold a suitable lock before we ever get to the
executor; either one acquired during the parse/plan pipeline, or one
re-acquired by AcquireExecutorLocks in the case of a cached plan.
Otherwise it's entirely possible that the plan has been invalidated by
concurrent DDL --- and it's not the executor's job to detect that and
re-plan; that *must* have been done upstream.

Moreover, it's important from a deadlock-avoidance standpoint that these
locks get acquired in the same order as they were acquired during the
initial parse/plan pipeline.  I think it's reasonable to assume they were
acquired in RTE order in that stage, so AcquireExecutorLocks is OK.  But,
if the current logic in the executor gets them in that order, it's both
non-obvious that it does so and horribly fragile if it does, seeing that
the responsibility for this is split across InitPlan,
ExecOpenScanRelation, and ExecLockNonLeafAppendTables.

So I'm thinking that what we really ought to do here is simplify executor
startup to just open all rels with NoLock, and get rid of any supporting
data structures that turn out to have no other use.  (David Rowley's
nearby patch to create a properly hierarchical executor data structure
for partitioning info is likely to tie into this too, by removing some
other vestigial uses of those lists.)

I think that this idea has been discussed in the past, and we felt at the
time that having the executor take its own locks was a good safety
measure, and a relatively cheap one since the lock manager is pretty good
at short-circuiting duplicative lock requests.  But those are certainly
not free.  Moreover, I'm not sure that this is really a safety measure at
all: if the executor were really taking any lock not already held, it'd
be masking a DDL hazard.

To investigate this further, I made the attached not-meant-for-commit
hack to verify whether InitPlan and related executor startup functions
were actually taking any not-previously-held locks.  I could only find
one such case: the parser always opens tables selected FOR UPDATE with
RowShareLock, but if we end up implementing the resulting row mark with
ROW_MARK_COPY, the executor is acquiring just AccessShareLock (because
ExecOpenScanRelation thinks it needs to acquire some lock).  The patch
as presented hacks ExecOpenScanRelation to avoid that, and it passes
check-world.
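As an aside, the ordering claim above is easy to see from the source:
AcquireExecutorLocks just walks each plan's flat range table front to
back.  Here is an abridged, lightly paraphrased sketch of that loop (see
plancache.c for the real thing; the utility-statement and unlock paths
are omitted):

/*
 * Abridged sketch of AcquireExecutorLocks (plancache.c).  The point is
 * that relations are locked strictly in range-table order, and that the
 * lock-strength decision below must be kept in sync with whatever the
 * parse/plan pipeline did.
 */
static void
AcquireExecutorLocksSketch(List *stmt_list)
{
	ListCell   *lc1;

	foreach(lc1, stmt_list)
	{
		PlannedStmt *plannedstmt = lfirst_node(PlannedStmt, lc1);
		int			rt_index = 0;
		ListCell   *lc2;

		foreach(lc2, plannedstmt->rtable)	/* front to back: RTE order */
		{
			RangeTblEntry *rte = (RangeTblEntry *) lfirst(lc2);
			PlanRowMark *rc;
			LOCKMODE	lockmode;

			rt_index++;
			if (rte->rtekind != RTE_RELATION)
				continue;

			/* Re-derive the lock type the parse/plan pipeline used */
			if (list_member_int(plannedstmt->resultRelations, rt_index) ||
				list_member_int(plannedstmt->nonleafResultRelations, rt_index))
				lockmode = RowExclusiveLock;
			else if ((rc = get_plan_rowmark(plannedstmt->rowMarks, rt_index)) != NULL &&
					 RowMarkRequiresRowShareLock(rc->markType))
				lockmode = RowShareLock;
			else
				lockmode = AccessShareLock;

			LockRelationOid(rte->relid, lockmode);
		}
	}
}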
What we'd be better off doing, if we go this route, is to install an
assertion-build-only test that verifies during relation_open(NoLock)
that some kind of lock is already held on the rel.  That would protect
not only the executor, but a boatload of existing places that open rels
with NoLock on the currently-unverified assumption that a lock is
already held.

I'm also rather strongly tempted to add a field to RangeTblEntry that
records the appropriate lock strength to take, so that we don't have to
rely on keeping AcquireExecutorLocks' logic to decide on the lock type
in sync with whatever the parse/plan pipeline does.  (One could then
imagine adding assertions in the executor that this field shows a lock
strength of at least X, in place of actually opening the rel with X.)

BTW, there'd be a lot to be said for having InitPlan just open all the
rels and build an array of Relation pointers that parallels the RTE
list, rather than doing heap_opens in random places elsewhere.
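To make the first suggestion concrete, a minimal sketch of what the check
in relation_open() could look like, assuming a hypothetical lmgr helper
LockHeldByMeOnRelation() that reports whether this backend already holds
any lock on the given relation (no such helper exists today; it would
have to be written):

/*
 * Sketch only: relation_open() with an assertion-build-only check that
 * NoLock callers really do hold a lock already.  LockHeldByMeOnRelation()
 * is an assumed helper, not existing lmgr API; body abridged from
 * heapam.c.
 */
Relation
relation_open(Oid relationId, LOCKMODE lockmode)
{
	Relation	r;

	Assert(lockmode >= NoLock && lockmode < MAX_LOCKMODES);

	/* Get the lock before trying to open the relcache entry */
	if (lockmode != NoLock)
		LockRelationOid(relationId, lockmode);
#ifdef USE_ASSERT_CHECKING
	else
		Assert(LockHeldByMeOnRelation(relationId));	/* assumed helper */
#endif

	/* The relcache does all the real work... */
	r = RelationIdGetRelation(relationId);

	if (!RelationIsValid(r))
		elog(ERROR, "could not open relation with OID %u", relationId);

	return r;
}

In assertion builds this would catch any caller that passes NoLock
without actually holding something, which is exactly the hazard that
today's unverified NoLock opens paper over.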
			regards, tom lane

diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 8026fe2..58c62f3 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -817,6 +817,7 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 	TupleDesc	tupType;
 	ListCell   *l;
 	int			i;
+	bool		gotlock;
 
 	/*
 	 * Do permissions checks
@@ -852,7 +853,9 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 			Relation	resultRelation;
 
 			resultRelationOid = getrelid(resultRelationIndex, rangeTable);
-			resultRelation = heap_open(resultRelationOid, RowExclusiveLock);
+			gotlock = LockRelationOid(resultRelationOid, RowExclusiveLock);
+			Assert(!gotlock || IsParallelWorker());
+			resultRelation = heap_open(resultRelationOid, NoLock);
 
 			InitResultRelInfo(resultRelInfo,
 							  resultRelation,
@@ -892,7 +895,9 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 				Relation	resultRelDesc;
 
 				resultRelOid = getrelid(resultRelIndex, rangeTable);
-				resultRelDesc = heap_open(resultRelOid, RowExclusiveLock);
+				gotlock = LockRelationOid(resultRelOid, RowExclusiveLock);
+				Assert(!gotlock || IsParallelWorker());
+				resultRelDesc = heap_open(resultRelOid, NoLock);
 				InitResultRelInfo(resultRelInfo,
 								  resultRelDesc,
 								  lfirst_int(l),
@@ -912,8 +917,11 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 				/* We locked the roots above. */
 				if (!list_member_int(plannedstmt->rootResultRelations,
 									 resultRelIndex))
-					LockRelationOid(getrelid(resultRelIndex, rangeTable),
-									RowExclusiveLock);
+				{
+					gotlock = LockRelationOid(getrelid(resultRelIndex, rangeTable),
+											  RowExclusiveLock);
+					Assert(!gotlock || IsParallelWorker());
+				}
 			}
 		}
 	}
@@ -963,10 +971,14 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 			case ROW_MARK_NOKEYEXCLUSIVE:
 			case ROW_MARK_SHARE:
 			case ROW_MARK_KEYSHARE:
-				relation = heap_open(relid, RowShareLock);
+				gotlock = LockRelationOid(relid, RowShareLock);
+				Assert(!gotlock || IsParallelWorker());
+				relation = heap_open(relid, NoLock);
 				break;
 			case ROW_MARK_REFERENCE:
-				relation = heap_open(relid, AccessShareLock);
+				gotlock = LockRelationOid(relid, AccessShareLock);
+				Assert(!gotlock || IsParallelWorker());
+				relation = heap_open(relid, NoLock);
 				break;
 			case ROW_MARK_COPY:
 				/* no physical table access is required */
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index b963cae..cf08b50 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -42,6 +42,7 @@
 
 #include "postgres.h"
 
+#include "access/parallel.h"
 #include "access/relscan.h"
 #include "access/transam.h"
 #include "executor/executor.h"
@@ -645,6 +646,7 @@ ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags)
 	Relation	rel;
 	Oid			reloid;
 	LOCKMODE	lockmode;
+	bool		gotlock;
 
 	/*
 	 * Determine the lock type we need.  First, scan to see if target relation
@@ -659,13 +661,19 @@ ExecOpenScanRelation(EState *estate, Index scanrelid, int eflags)
 		/* Keep this check in sync with InitPlan! */
 		ExecRowMark *erm = ExecFindRowMark(estate, scanrelid, true);
 
-		if (erm != NULL && erm->relation != NULL)
+		/* HACK: assume things are OK for ROW_MARK_COPY case */
+		if (erm != NULL)
 			lockmode = NoLock;
 	}
 
 	/* Open the relation and acquire lock as needed */
 	reloid = getrelid(scanrelid, estate->es_range_table);
-	rel = heap_open(reloid, lockmode);
+	if (lockmode != NoLock)
+	{
+		gotlock = LockRelationOid(reloid, lockmode);
+		Assert(!gotlock || IsParallelWorker());
+	}
+	rel = heap_open(reloid, NoLock);
 
 	/*
 	 * Complain if we're attempting a scan of an unscannable relation, except
@@ -874,6 +882,7 @@ ExecLockNonLeafAppendTables(List *partitioned_rels, EState *estate)
 		Index		rti = lfirst_int(lc);
 		bool		is_result_rel = false;
 		Oid			relid = getrelid(rti, estate->es_range_table);
+		bool		gotlock;
 
 		/* If this is a result relation, already locked in InitPlan */
 		foreach(l, stmt->nonleafResultRelations)
@@ -903,9 +912,10 @@ ExecLockNonLeafAppendTables(List *partitioned_rels, EState *estate)
 			}
 
 			if (rc && RowMarkRequiresRowShareLock(rc->markType))
-				LockRelationOid(relid, RowShareLock);
+				gotlock = LockRelationOid(relid, RowShareLock);
 			else
-				LockRelationOid(relid, AccessShareLock);
+				gotlock = LockRelationOid(relid, AccessShareLock);
+			Assert(!gotlock || IsParallelWorker());
 		}
 	}
 }
diff --git a/src/backend/storage/lmgr/lmgr.c b/src/backend/storage/lmgr/lmgr.c
index 7b2dcb6..a1ebf64 100644
--- a/src/backend/storage/lmgr/lmgr.c
+++ b/src/backend/storage/lmgr/lmgr.c
@@ -100,8 +100,9 @@ SetLocktagRelationOid(LOCKTAG *tag, Oid relid)
  *
  *		Lock a relation given only its OID.  This should generally be used
  *		before attempting to open the relation's relcache entry.
+ *		Return TRUE if we acquired a new lock, FALSE if already held.
  */
-void
+bool
 LockRelationOid(Oid relid, LOCKMODE lockmode)
 {
 	LOCKTAG		tag;
@@ -122,7 +123,11 @@ LockRelationOid(Oid relid, LOCKMODE lockmode)
 	 * CommandCounterIncrement, not here.)
 	 */
 	if (res != LOCKACQUIRE_ALREADY_HELD)
+	{
 		AcceptInvalidationMessages();
+		return true;
+	}
+	return false;
 }
 
 /*
diff --git a/src/include/storage/lmgr.h b/src/include/storage/lmgr.h
index a217de9..69e6f7f 100644
--- a/src/include/storage/lmgr.h
+++ b/src/include/storage/lmgr.h
@@ -37,7 +37,7 @@ typedef enum XLTW_Oper
 extern void RelationInitLockInfo(Relation relation);
 
 /* Lock a relation */
-extern void LockRelationOid(Oid relid, LOCKMODE lockmode);
+extern bool LockRelationOid(Oid relid, LOCKMODE lockmode);
 extern bool ConditionalLockRelationOid(Oid relid, LOCKMODE lockmode);
 extern void UnlockRelationId(LockRelId *relid, LOCKMODE lockmode);
 extern void UnlockRelationOid(Oid relid, LOCKMODE lockmode);
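For completeness, the RangeTblEntry idea suggested above might look
roughly like the following.  This is a sketch only: the field name
rellockmode is an assumption, and no such field exists in the tree today.

/*
 * Hypothetical sketch: record at parse/plan time the lock strength each
 * relation RTE requires, so AcquireExecutorLocks (and any executor
 * assertions) need not re-derive it.  Only a few of RangeTblEntry's
 * actual fields are shown.
 */
typedef struct RangeTblEntry
{
	NodeTag		type;
	RTEKind		rtekind;		/* see above */

	/* Fields valid for a plain relation RTE (else zero): */
	Oid			relid;			/* OID of the relation */
	char		relkind;		/* relation kind (see pg_class.relkind) */
	int			rellockmode;	/* lock level the query requires (new) */

	/* ... many other fields elided ... */
} RangeTblEntry;

/* AcquireExecutorLocks could then collapse its lock-type logic to: */
if (rte->rtekind == RTE_RELATION)
	LockRelationOid(rte->relid, rte->rellockmode);

That would put the lock-strength decision in exactly one place, with the
plan cache and any executor assertions consuming it rather than trying to
recompute it in sync with the parser.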