From e788b293f3770c7d89bc2156658f4bde3aba1303 Mon Sep 17 00:00:00 2001
From: "dgrowley@gmail.com"
Date: Fri, 9 Nov 2018 10:20:14 +1300
Subject: [PATCH v16 2/2] Delay locking of partitions during INSERT and UPDATE

During INSERT, even if we were inserting a single row into a partitioned
table, we would obtain a lock on every partition which was a direct or an
indirect partition of the insert target table.  This was done in order to
provide a consistent order to the locking of the partitions, which happens
to be the same order that partitions are locked during planning.

The problem with locking all these partitions was that if a partitioned
table had many partitions and the INSERT inserted one, or just a few rows,
the overhead of the locking was significantly more than that of inserting
the actual rows.  This commit changes the locking so that we only lock a
partition the first time we route a tuple to it, so if you insert one row,
only one leaf partition is locked, plus any sub-partitioned tables that we
search through before we find the correct home for the tuple.

This does mean that the locking order of partitions during INSERT becomes
less well defined.  Previously, operations such as CREATE INDEX and
TRUNCATE, when performed on leaf partitions, could defend against
deadlocking with concurrent INSERT by performing the operation in table
oid order.  However, to deadlock, such DDL would have had to be performed
inside a transaction and not in table oid order.  With this commit it's
now possible to get deadlocks even if the DDL is performed in table oid
order.  If required, such transactions can defend against these deadlocks
by performing a LOCK TABLE on the partitioned table before performing the
DDL.

Currently, only INSERTs are affected by this change, as UPDATEs to a
partitioned table still obtain locks on all partitions either during
planning or during AcquireExecutorLocks; however, there are upcoming
patches which may change this too.
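As a sketch of the workaround mentioned above (table names are
hypothetical, and the chosen lock mode is an assumption -- any mode that
conflicts with INSERT's RowExclusiveLock would do), a transaction doing
DDL on several leaf partitions could first lock the partitioned root:

```sql
BEGIN;
-- Lock the partitioned root in a mode that conflicts with the
-- RowExclusiveLock taken by INSERT, so no concurrent INSERT can begin
-- routing tuples (and taking per-partition locks) while we hold it.
LOCK TABLE parted_tab IN SHARE ROW EXCLUSIVE MODE;
TRUNCATE parted_tab_p1;
TRUNCATE parted_tab_p2;
COMMIT;
```

Because every INSERT must lock the root before any leaf, blocking at the
root serializes the two lock acquisition orders and removes the deadlock
window that delayed partition locking otherwise opens.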
---
 src/backend/executor/execPartition.c | 14 ++------------
 1 file changed, 2 insertions(+), 12 deletions(-)

diff --git a/src/backend/executor/execPartition.c b/src/backend/executor/execPartition.c
index 962db6d7f0..f37371f561 100644
--- a/src/backend/executor/execPartition.c
+++ b/src/backend/executor/execPartition.c
@@ -167,9 +167,6 @@ static void find_matching_subplans_recurse(PartitionPruningData *prunedata,
  * tuple routing for partitioned tables, encapsulates it in
  * PartitionTupleRouting, and returns it.
  *
- * Note that all the relations in the partition tree are locked using the
- * RowExclusiveLock mode upon return from this function.
- *
  * Callers must use the returned PartitionTupleRouting during calls to
  * ExecFindPartition().  The actual ResultRelInfo for a partition is only
  * allocated when the first tuple is routed there.
@@ -180,9 +177,6 @@ ExecSetupPartitionTupleRouting(ModifyTableState *mtstate, Relation rel)
 	PartitionTupleRouting *proute;
 	ModifyTable *node = mtstate ? (ModifyTable *) mtstate->ps.plan : NULL;
 
-	/* Lock all the partitions. */
-	(void) find_all_inheritors(RelationGetRelid(rel), RowExclusiveLock, NULL);
-
 	/*
 	 * Here we attempt to expend as little effort as possible in setting up
 	 * the PartitionTupleRouting.  Each partition's ResultRelInfo is built on
@@ -535,11 +529,7 @@ ExecInitPartitionInfo(ModifyTableState *mtstate,
 	bool		found_whole_row;
 	int			part_result_rel_index;
 
-	/*
-	 * We locked all the partitions in ExecSetupPartitionTupleRouting
-	 * including the leaf partitions.
-	 */
-	partrel = heap_open(dispatch->partdesc->oids[partidx], NoLock);
+	partrel = heap_open(dispatch->partdesc->oids[partidx], RowExclusiveLock);
 
 	/*
 	 * Keep ResultRelInfo and other information for this partition in the
@@ -987,7 +977,7 @@ ExecInitPartitionDispatchInfo(PartitionTupleRouting *proute, Oid partoid,
 	int			dispatchidx;
 
 	if (partoid != RelationGetRelid(proute->partition_root))
-		rel = heap_open(partoid, NoLock);
+		rel = heap_open(partoid, RowExclusiveLock);
 	else
 		rel = proute->partition_root;
 	partdesc = RelationGetPartitionDesc(rel);
-- 
2.16.2.windows.1