Re: making update/delete of inheritance trees scale better - Mailing list pgsql-hackers
From | Amit Langote |
---|---|
Subject | Re: making update/delete of inheritance trees scale better |
Date | |
Msg-id | CA+HiwqH-2sq-3Zq-CtuWjfRSyrGPXJBf1nCKKvTHuGVyfQ1OYA@mail.gmail.com Whole thread Raw |
In response to | Re: making update/delete of inheritance trees scale better (Amit Langote <amitlangote09@gmail.com>) |
Responses |
Re: making update/delete of inheritance trees scale better
|
List | pgsql-hackers |
So, I think I have a patch that seems to work, but not all the way, more on which below. Here is the commit message in the attached patch. === Subject: [PATCH] Overhaul UPDATE's targetlist processing Instead of emitting the full tuple matching the target table's tuple descriptor, make the plan emit only the attributes that are assigned values in the SET clause, plus row-identity junk attributes as before. This allows us to avoid making a separate plan for each target relation in the inheritance case, because the only reason it is so currently is to account for the fact that each target relations may have a set of attributes that is different from others. Having only one plan suffices, because the set of assigned attributes must be same in all the result relations. While the plan will now produce only the assigned attributes and row-identity junk attributes, other columns' values are filled by refetching the old tuple. To that end, there will be a targetlist for each target relation to compute the full tuple, that is, by combining the values from the plan tuple and the old tuple, but they are passed separately in the ModifyTable node. Implementation notes: * In the inheritance case, as the same plan produces tuples to be updated from multiple result relations, the tuples now need to also identity which table they come from, so an additional junk attribute "tableoid" is present in that case. * Considering that the inheritance set may contain foreign tables that require a different (set of) row-identity junk attribute(s), the plan needs to emit multiple distinct junk attributes. When transposed to a child scan node, this targetlist emits a non-NULL value for the junk attribute that's valid for the child relation and NULL for others. * Executor and FDW execution APIs can no longer assume any specific order in which the result relations will be processed. For each tuple to be updated/deleted, result relation is selected by looking it up in a hash table using the "tableoid" value as the key. * Since the plan does not emit values for all the attributes, FDW APIs may not assume that the individual column values in the TupleTableSlot containing the plan tuple are accessible by their attribute numbers. TODO: * Reconsider having only one plan! * Update FDW handler docs to reflect the API changes === Regarding the first TODO, it is to address the limitation that FDWs will no longer be able push the *whole* child UPDATE/DELETE query down to the remote server, including any joins, which is allowed at the moment via PlanDirectModify API. The API seems to have been designed with an assumption that the child scan/join node is the top-level plan, but that's no longer the case. If we consider bypassing the Append and allow ModifyTable to access the child scan/join nodes directly, maybe we can allow that. I haven't updated the expected output of postgres_fdw regression tests for now pending this. A couple of things in the patch that I feel slightly uneasy about: * Result relations are now appendrel children in the planner. Normally, any wholerow Vars in the child relation's reltarget->exprs get a ConvertRowType added on top to convert it back to the parent's reltype, because that's what the client expects in the SELECT case. In the result relation case, the executor expects to see child wholerow Vars themselves, not their parent versions. * FDW's ExecFoeignUpdate() API expects that the NEW tuple passed to it match the target foreign table reltype, so that it can access the target attributes in the tuple by attribute numbers. Considering that the plan no longer builds the full tuple itself, I made the executor satisfy that expectation by filling the missing attributes' values using the target table's wholerow attribute. That is, we now *always* fetch the wholerow attributes for UPDATE, not just when there are row-level triggers that need it. I think that's unfortunate. Maybe, the correct way is asking the FDWs to translate (setrefs.c style) the target attribute numbers appropriately to access the plan's output tuple. I will add the patch to the next CF. I haven't yet fully checked the performance considerations of the new approach, but will do so in the coming days. -- Amit Langote EnterpriseDB: http://www.enterprisedb.com
Attachment
pgsql-hackers by date: