Re: [HACKERS] UPDATE of partition key - Mailing list pgsql-hackers
From | Amit Khandekar |
---|---|
Subject | Re: [HACKERS] UPDATE of partition key |
Date | |
Msg-id | CAJ3gD9f4Um99sOJmVSEPj783VWw5GbXkgU3OWcYZJv=ipjTkAw@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] UPDATE of partition key (David Rowley <david.rowley@2ndquadrant.com>) |
Responses |
Re: [HACKERS] UPDATE of partition key
|
List | pgsql-hackers |
The following contains replies to David's remaining comments , i.e. from #27 onwards, followed by replies to Alvaro's review comments. Attached is the revised patch v25. ===================== On 13 November 2017 at 18:25, David Rowley <david.rowley@2ndquadrant.com> wrote: > > 27. Comment does not explain how we're skipping checking the partition > constraint check in: > > * We have already checked partition constraints above, so skip > * checking them here. > > Maybe something like: > > * We've already checked the partition constraint above, however, we > * must still ensure the tuple passes all other constraints, so we'll > * call ExecConstraints() and have it validate all remaining checks. Done. > > 28. For table WITH OIDs, the OID should probably follow the new tuple > for partition-key-UPDATEs. > I understand that as far as possible we want to simulate the UPDATE as if it's a normal table update. But for system columns, I think we should avoid that; and instead, let the system handle it the way it is handling (i.e. the new row in a table should always have a new OID.) > 29. ExecSetupChildParentMap gets called here for non-partitioned relations. > Maybe that's not the best function name? The function only seems to do > that when perleaf is True. I didn't clearly understand this, particularly, what task you were referring to when you said "the function only seems to do that" ? The function does setup child-parent map even when perleaf=false. The function name is chosen that way because the map is always a child-to-root map, but the map array elements may be arranged in the order of the per-partition array 'mtstate->mt_partitions[]', or in the order of the per-subplan result rels 'mtstate->resultRelInfo[]' > > Is a leaf a partition of a partitioned table? It's not that clear the > meaning here. Leaf partition means it is a child of a partitioned table, but it itself is not a partitioned table. I have added more comments for the function ExecSetupChildParentMap() (both, at the function header and inside). Please check and let me know if you still have questions. > > 30. The following chunk of code is giving me a headache trying to > verify which arrays are which size: > > ExecSetupPartitionTupleRouting(rel, > mtstate->resultRelInfo, > (operation == CMD_UPDATE ? nplans : 0), > node->nominalRelation, > estate, > &partition_dispatch_info, > &partitions, > &partition_tupconv_maps, > &subplan_leaf_map, > &partition_tuple_slot, > &num_parted, &num_partitions); > mtstate->mt_partition_dispatch_info = partition_dispatch_info; > mtstate->mt_num_dispatch = num_parted; > mtstate->mt_partitions = partitions; > mtstate->mt_num_partitions = num_partitions; > mtstate->mt_parentchild_tupconv_maps = partition_tupconv_maps; > mtstate->mt_subplan_partition_offsets = subplan_leaf_map; > mtstate->mt_partition_tuple_slot = partition_tuple_slot; > mtstate->mt_root_tuple_slot = MakeTupleTableSlot(); > > I know this patch is not completely responsible for it, but you're not > making things any better. > > Would it not be better to invent some PartitionTupleRouting struct and > make that struct a member of ModifyTableState and CopyState, then just > pass the pointer to that struct to ExecSetupPartitionTupleRouting() > and have it fill in the required details? I think the complexity of > this is already on the high end, I think you really need to do the > refactor before this gets any worse. > Ok. I am currently working on doing this change. So not yet included in the attached patch. Will send yet another revised patch for this change. (TODO) > > 31. The following code seems incorrect: > > /* > * If this is an UPDATE and a BEFORE UPDATE trigger is present, we may > * need to do update tuple routing. > */ > if (resultRelInfo->ri_TrigDesc && > resultRelInfo->ri_TrigDesc->trig_update_before_row && > operation == CMD_UPDATE) > update_tuple_routing_needed = true; > > Shouldn't this be setting update_tuple_routing_needed to false if > there are no before row update triggers? Otherwise, you're setting it > to true regardless of if there are any partition key columns being > UPDATEd. That would make the work you're doing in > inheritance_planner() to set part_cols_updated a waste of time. The point of setting it to true regardless of whether the partition key is updated is : even if partition key is not explicitly modified by the UPDATE, a before-row trigger can update it later. But we can never know whether it is actually going to update. So if there are BR UPDATE triggers on result rels of any of the subplans, we *always* setup the tuple routing. This approach was concluded in the earlier discussions about trigger handling. > > Also, this bit of code is a bit confused. > > /* Decide whether we need to perform update tuple routing. */ > if (rel->rd_rel->relkind != RELKIND_PARTITIONED_TABLE) > update_tuple_routing_needed = false; > > /* > * Build state for tuple routing if it's an INSERT or if it's an UPDATE of > * partition key. > */ > if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE && > (operation == CMD_INSERT || update_tuple_routing_needed)) > > > The first if test would not be required if you fixed the code where > you set update_tuple_routing_needed = true regardless if its a > partitioned table or not. The place where I set update_tuple_routing_needed to true unconditionally, we don't have the relation open, so we don't know whether it is a partitioned table. Hence, set it anyways, and then revert it to false if it's not a partitioned table after all. > > So basically, you need to take the node->part_cols_updated from the > planner, if that's true then perform your test for before row update > triggers, set a bool to false if there are none, then proceed to setup > the partition tuple routing for partition table inserts or if your > bool is still true. Right? I think if we look at "update_tuple_routing_needed" as meaning that update tuple routing *may be* required, then the logic as-is makes sense: Set the variable if we see that we may require to do update routing. And the conditions for that are : either node->partKeyUpdated is true, or there is a BR UPDATE trigger and the operation is UPDATE. So set this variable for those conditions, and revert it back to false later if it is found that it's not a partitioned table. So I have retained the existing logic in the patch, but with some additional comments to make this logic clear to the reader. > > 32. "WCO" abbreviation is not that common and might need to be expanded. > > * Below are required as reference objects for mapping partition > * attno's in expressions such as WCO and RETURNING. > > Searching for other comments which mention "WCO" they're all around > places that is easy to understand they mean "With Check Option", e.g. > next to a variable with a more descriptive name. That's not the case > here. Ok. Changed WCO to WithCheckOptions. > > 33. "are anyway newly allocated", should "anyway" be "always"? > Otherwise, it does not make sense. > OK. Changed this : * because all leaf partition result rels are anyway newly allocated. to this (also removed 'all') : * because leaf partition result rels are always newly allocated. > > 34. Comment added which mentions a member that does not exist. > > * all_part_cols contains all attribute numbers from the parent that are > * used as partitioning columns by the parent or some descendent which is > * itself partitioned. > * Oops. Left-overs from earlier patch. Removed the comment. ===================== Below are Alvaro's review comments : On 14 November 2017 at 22:22, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > David Rowley wrote: > >> 5. Triggers. Do we need a new "TG_" tag to allow trigger functions to >> do something special when the DELETE/INSERT is a partition move? I >> have audit tables in mind here it may appear as though a user >> performed a DELETE when they actually performed an UPDATE giving >> visibility of this to the trigger function will allow the application >> to work around this. > > +1 I think we do need a flag that can be inspected from the user > trigger function. What I feel is : it's too early to do such changes. I think we should first get in the core patch, and then consider this request and any further enhancements. > >> 9. Also, this should likely be reworded: >> >> * 'num_update_rri' : number of UPDATE per-subplan result rels. For INSERT, >> * this is 0. >> >> 'num_update_rri' number of elements in 'update_rri' array or zero for INSERT. > > Also: > > /pgsql/source/master/src/backend/executor/execMain.c: In function 'ExecSetupPartitionTupleRouting': > /pgsql/source/master/src/backend/executor/execMain.c:3401:18: warning: 'leaf_part_arr' may be used uninitialized in thisfunction [-Wmaybe-uninitialized] > leaf_part_rri = leaf_part_arr + i; > ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~ > Right. I have now made "leaf_part_arr = NULL" during declaration. Actually leaf_part_arr is used only for inserts; but for compiler-sake we should add this initialization. > I think using num_update_rri==0 as a flag to indicate INSERT is strange. > I suggest passing an additional boolean -- I think adding another param looks redundant. To make the condition more readable, I have introduced a new local variable : bool is_update = (num_update_rri > 0); > or maybe just split the whole > function in two, one for updates and another for inserts, say > ExecSetupPartitionTupleRoutingForInsert() and > ExecSetupPartitionTupleRoutingForUpdate(). They seem to > share almost no code, and the current flow is hard to read; maybe just > add a common subroutine for the lower bottom of the loop. So there are two common code sections. One is the initial code which initializes various arrays and output params. And the 2nd common code is the 2nd half of the for loop block that includes calls to heap_open(), InitResultRelInfo(), convert_tuples_by_name(), CheckValidResultRel() and others. So it looks like there is a lot of common code. We would need to have two functions, one to have the initialization code, and the other to run the later half of the loop. And, heap_open() and InitResultRelInfo() need to be called only if partrel (which needs to be passed as function param) is NULL. Rather than this, I think this condition is better placed in-line in ExecSetupPartitionTupleRouting() for clarity. I am feeling it's not worth doing the shuffling. We are extracting the code into two functions only to avoid the "if num_update_rri" conditions. That's why I feel having a "is_update" variable would solve the purpose. The hard-to-understand code, I presume, is the update part where it tries to re-use already-existing result resl, and this part would anyways remain, although in a separate function. -- Thanks, -Amit Khandekar EnterpriseDB Corporation The Postgres Database Company
Attachment
pgsql-hackers by date: