Thread: partitioning - changing a slot's descriptor is expensive
Hi,

(sorry if I CCed the wrong folks, the code has changed a fair bit)

I noticed that several places in the partitioning code look like:

    /*
     * If the tuple has been routed, it's been converted to the
     * partition's rowtype, which might differ from the root
     * table's. We must convert it back to the root table's
     * rowtype so that val_desc shown error message matches the
     * input tuple.
     */
    if (resultRelInfo->ri_PartitionRoot)
    {
        TableTuple tuple = ExecFetchSlotTuple(slot);
        TupleConversionMap *map;

        rel = resultRelInfo->ri_PartitionRoot;
        tupdesc = RelationGetDescr(rel);
        /* a reverse map */
        map = convert_tuples_by_name(orig_tupdesc, tupdesc,
                                     gettext_noop("could not convert row type"));
        if (map != NULL)
        {
            tuple = do_convert_tuple(tuple, map);
            ExecSetSlotDescriptor(slot, tupdesc);
            ExecStoreTuple(tuple, slot, InvalidBuffer, false);
        }
    }

Unfortunately, calling ExecSetSlotDescriptor() is far from cheap: it has
to reallocate the values/isnull arrays (and potentially do desc pinning
etc). Nor is it always cheap to call ExecFetchSlotTuple(slot) - it might
be a virtual slot. Calling heap_deform_tuple(), as done in
do_convert_tuple(), might be superfluous work, because the input tuple
might already be entirely deformed in the input slot.

I think it'd be good to rewrite the code so there's an input and an
output slot that each will keep their slot descriptors set.

Besides performance as the code stands, this'll also be important for
pluggable storage (as we'd otherwise get a *lot* of back/forth
conversions between tuple formats).

Anybody interested in writing a patch that redoes this?

Greetings,

Andres Freund
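To make the proposal concrete, here is a minimal sketch of the input/output-slot arrangement, with hypothetical variable names (partrel, rootrel); the calls are the same executor APIs as in the snippet above, and the surrounding setup is assumed:

    /*
     * Sketch only, not code from the thread's patch: create one slot per
     * rowtype up front, each keeping its descriptor for its whole
     * lifetime, instead of repointing a single slot per routed tuple.
     */
    TupleTableSlot *part_slot;  /* fixed to the partition's rowtype */
    TupleTableSlot *root_slot;  /* fixed to the root table's rowtype */

    part_slot = MakeTupleTableSlot(RelationGetDescr(partrel));
    root_slot = MakeTupleTableSlot(RelationGetDescr(rootrel));

    /* per tuple: convert by moving data between the two slots */
    if (map != NULL)
    {
        HeapTuple   tuple = ExecFetchSlotTuple(part_slot);

        tuple = do_convert_tuple(tuple, map);
        /* no ExecSetSlotDescriptor() call, so no values/isnull realloc */
        ExecStoreTuple(tuple, root_slot, InvalidBuffer, false);
    }

With this arrangement, each slot's values/isnull arrays are allocated exactly once, which is the point of the complaint above.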
Hi Andres,

On 2018/06/27 14:09, Andres Freund wrote:
> Hi,
>
> (sorry if I CCed the wrong folks, the code has changed a fair bit)
>
> I noticed that several places in the partitioning code look like:
>
>     /*
>      * If the tuple has been routed, it's been converted to the
>      * partition's rowtype, which might differ from the root
>      * table's. We must convert it back to the root table's
>      * rowtype so that val_desc shown error message matches the
>      * input tuple.
>      */
>     if (resultRelInfo->ri_PartitionRoot)
>     {
>         TableTuple tuple = ExecFetchSlotTuple(slot);
>         TupleConversionMap *map;
>
>         rel = resultRelInfo->ri_PartitionRoot;
>         tupdesc = RelationGetDescr(rel);
>         /* a reverse map */
>         map = convert_tuples_by_name(orig_tupdesc, tupdesc,
>                                      gettext_noop("could not convert row type"));
>         if (map != NULL)
>         {
>             tuple = do_convert_tuple(tuple, map);
>             ExecSetSlotDescriptor(slot, tupdesc);
>             ExecStoreTuple(tuple, slot, InvalidBuffer, false);
>         }
>     }

This particular code block (and a couple of others like it) is executed
only if a tuple fails the partition constraint check, so it's maybe not
a very frequently executed piece of code.

There is however similar code that runs in non-error paths, such as in
ExecPrepareTupleRouting(), that is executed *if* the leaf partition and
the root parent have differing attribute numbers. So, one might say that
that code's there to handle the special cases which may not arise much in
practice, but it may still be a good idea to make improvements if we can
do so.

> Unfortunately, calling ExecSetSlotDescriptor() is far from cheap: it has
> to reallocate the values/isnull arrays (and potentially do desc pinning
> etc). Nor is it always cheap to call ExecFetchSlotTuple(slot) - it might
> be a virtual slot. Calling heap_deform_tuple(), as done in
> do_convert_tuple(), might be superfluous work, because the input tuple
> might already be entirely deformed in the input slot.
>
> I think it'd be good to rewrite the code so there's an input and an
> output slot that each will keep their slot descriptors set.
>
> Besides performance as the code stands, this'll also be important for
> pluggable storage (as we'd otherwise get a *lot* of back/forth
> conversions between tuple formats).
>
> Anybody interested in writing a patch that redoes this?

Thanks for the pointers above. I will see if I can produce a patch and
will also be interested in hearing what others may have to say.

Thanks,
Amit
On 2018-06-27 14:46:26 +0900, Amit Langote wrote:
> Hi Andres,
>
> On 2018/06/27 14:09, Andres Freund wrote:
> > Hi,
> >
> > (sorry if I CCed the wrong folks, the code has changed a fair bit)
> >
> > I noticed that several places in the partitioning code look like:
> >
> >     /*
> >      * If the tuple has been routed, it's been converted to the
> >      * partition's rowtype, which might differ from the root
> >      * table's. We must convert it back to the root table's
> >      * rowtype so that val_desc shown error message matches the
> >      * input tuple.
> >      */
> >     if (resultRelInfo->ri_PartitionRoot)
> >     {
> >         TableTuple tuple = ExecFetchSlotTuple(slot);
> >         TupleConversionMap *map;
> >
> >         rel = resultRelInfo->ri_PartitionRoot;
> >         tupdesc = RelationGetDescr(rel);
> >         /* a reverse map */
> >         map = convert_tuples_by_name(orig_tupdesc, tupdesc,
> >                                      gettext_noop("could not convert row type"));
> >         if (map != NULL)
> >         {
> >             tuple = do_convert_tuple(tuple, map);
> >             ExecSetSlotDescriptor(slot, tupdesc);
> >             ExecStoreTuple(tuple, slot, InvalidBuffer, false);
> >         }
> >     }
>
> This particular code block (and a couple of others like it) is executed
> only if a tuple fails the partition constraint check, so it's maybe not
> a very frequently executed piece of code.
>
> There is however similar code that runs in non-error paths, such as in
> ExecPrepareTupleRouting(), that is executed *if* the leaf partition and
> the root parent have differing attribute numbers. So, one might say that
> that code's there to handle the special cases which may not arise much in
> practice, but it may still be a good idea to make improvements if we can
> do so.

Yea, I was referring to all do_convert_tuple() callers, and some of them
are more hot-path than the specific one above. E.g. the
ConvertPartitionTupleSlot() call in CopyFrom().

> > Unfortunately, calling ExecSetSlotDescriptor() is far from cheap: it has
> > to reallocate the values/isnull arrays (and potentially do desc pinning
> > etc). Nor is it always cheap to call ExecFetchSlotTuple(slot) - it might
> > be a virtual slot. Calling heap_deform_tuple(), as done in
> > do_convert_tuple(), might be superfluous work, because the input tuple
> > might already be entirely deformed in the input slot.
> >
> > I think it'd be good to rewrite the code so there's an input and an
> > output slot that each will keep their slot descriptors set.
> >
> > Besides performance as the code stands, this'll also be important for
> > pluggable storage (as we'd otherwise get a *lot* of back/forth
> > conversions between tuple formats).
> >
> > Anybody interested in writing a patch that redoes this?
>
> Thanks for the pointers above. I will see if I can produce a patch and
> will also be interested in hearing what others may have to say.

Cool.

Greetings,

Andres Freund
On 2018/06/27 14:55, Andres Freund wrote:
> On 2018-06-27 14:46:26 +0900, Amit Langote wrote:
>> There is however similar code that runs in non-error paths, such as in
>> ExecPrepareTupleRouting(), that is executed *if* the leaf partition and
>> the root parent have differing attribute numbers. So, one might say that
>> that code's there to handle the special cases which may not arise much in
>> practice, but it may still be a good idea to make improvements if we can
>> do so.
>
> Yea, I was referring to all do_convert_tuple() callers, and some of them
> are more hot-path than the specific one above. E.g. the
> ConvertPartitionTupleSlot() call in CopyFrom().

Just in case you haven't noticed, ConvertPartitionTupleSlot(), even if
it's called unconditionally from CopyFrom etc., calls do_convert_tuple
only if necessary. Note the return statement at the top of its body.

HeapTuple
ConvertPartitionTupleSlot(TupleConversionMap *map,
                          HeapTuple tuple,
                          TupleTableSlot *new_slot,
                          TupleTableSlot **p_my_slot)
{
    if (!map)
        return tuple;

map is non-NULL only if a partition has different attribute numbers than
the root parent.

Thanks,
Amit
On Wed, Jun 27, 2018 at 10:39 AM, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> (sorry if I CCed the wrong folks, the code has changed a fair bit)
>
> I noticed that several places in the partitioning code look like:
>
>     /*
>      * If the tuple has been routed, it's been converted to the
>      * partition's rowtype, which might differ from the root
>      * table's. We must convert it back to the root table's
>      * rowtype so that val_desc shown error message matches the
>      * input tuple.
>      */
>     if (resultRelInfo->ri_PartitionRoot)
>     {
>         TableTuple tuple = ExecFetchSlotTuple(slot);
>         TupleConversionMap *map;
>
>         rel = resultRelInfo->ri_PartitionRoot;
>         tupdesc = RelationGetDescr(rel);
>         /* a reverse map */
>         map = convert_tuples_by_name(orig_tupdesc, tupdesc,
>                                      gettext_noop("could not convert row type"));
>         if (map != NULL)
>         {
>             tuple = do_convert_tuple(tuple, map);
>             ExecSetSlotDescriptor(slot, tupdesc);
>             ExecStoreTuple(tuple, slot, InvalidBuffer, false);
>         }
>     }
>
> Unfortunately, calling ExecSetSlotDescriptor() is far from cheap: it has
> to reallocate the values/isnull arrays (and potentially do desc pinning
> etc).

I bumped into this code yesterday while looking at ExecFetchSlotTuple
and ExecMaterializeSlot usages. I was wondering the same.

ExecSetSlotDescriptor() always frees the tts_values and tts_isnull
arrays even if the tuple descriptor being set has the same number of
attributes as the previous one. Most of the time that will be the case.
I think we should optimize that case. If you agree, I will submit a
patch for that.

Amit Langote has clarified that this doesn't happen often since
partitioned tables will not usually require tuple conversion. But I
agree that in the pathological case this is a performance eater and
should be fixed.

> Nor is it always cheap to call ExecFetchSlotTuple(slot) - it might
> be a virtual slot. Calling heap_deform_tuple(), as done in
> do_convert_tuple(), might be superfluous work, because the input tuple
> might already be entirely deformed in the input slot.
>
> I think it'd be good to rewrite the code so there's an input and an
> output slot that each will keep their slot descriptors set.

+1 for that.

But I am worried that the code will have to create a thousand slots if
there are a thousand partitions. I think we will need to see how much
effect that has.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
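Roughly, the reuse idea suggested above could look like the following sketch (not the actual patch; the slot field names are those of TupleTableSlot in this era's src/backend/executor/execTuples.c): only reallocate the arrays when the attribute count actually changes.

    void
    ExecSetSlotDescriptor(TupleTableSlot *slot, TupleDesc tupdesc)
    {
        /* can we keep the existing values/isnull arrays? */
        bool        same_natts = slot->tts_tupleDescriptor &&
            slot->tts_tupleDescriptor->natts == tupdesc->natts;

        ExecClearTuple(slot);       /* slot must be empty before changing it */

        if (slot->tts_tupleDescriptor)
            ReleaseTupleDesc(slot->tts_tupleDescriptor);
        slot->tts_tupleDescriptor = tupdesc;
        PinTupleDesc(tupdesc);

        if (!same_natts)
        {
            /* width changed, so the arrays really do need reallocating */
            if (slot->tts_values)
                pfree(slot->tts_values);
            if (slot->tts_isnull)
                pfree(slot->tts_isnull);
            slot->tts_values = (Datum *)
                MemoryContextAlloc(slot->tts_mcxt,
                                   tupdesc->natts * sizeof(Datum));
            slot->tts_isnull = (bool *)
                MemoryContextAlloc(slot->tts_mcxt,
                                   tupdesc->natts * sizeof(bool));
        }
    }

Reusing the arrays is safe here because ExecClearTuple() has already marked the slot empty, so any stale contents will be overwritten by the next store/deform.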
On 27 June 2018 at 18:33, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Wed, Jun 27, 2018 at 10:39 AM, Andres Freund <andres@anarazel.de> wrote:
>> Hi,
>>
>> (sorry if I CCed the wrong folks, the code has changed a fair bit)
>>
>> I noticed that several places in the partitioning code look like:
>>
>>     /*
>>      * If the tuple has been routed, it's been converted to the
>>      * partition's rowtype, which might differ from the root
>>      * table's. We must convert it back to the root table's
>>      * rowtype so that val_desc shown error message matches the
>>      * input tuple.
>>      */
>>     if (resultRelInfo->ri_PartitionRoot)
>>     {
>>         TableTuple tuple = ExecFetchSlotTuple(slot);
>>         TupleConversionMap *map;
>>
>>         rel = resultRelInfo->ri_PartitionRoot;
>>         tupdesc = RelationGetDescr(rel);
>>         /* a reverse map */
>>         map = convert_tuples_by_name(orig_tupdesc, tupdesc,
>>                                      gettext_noop("could not convert row type"));
>>         if (map != NULL)
>>         {
>>             tuple = do_convert_tuple(tuple, map);
>>             ExecSetSlotDescriptor(slot, tupdesc);
>>             ExecStoreTuple(tuple, slot, InvalidBuffer, false);
>>         }
>>     }
>>
>> Unfortunately, calling ExecSetSlotDescriptor() is far from cheap: it has
>> to reallocate the values/isnull arrays (and potentially do desc pinning
>> etc).
>
> I bumped into this code yesterday while looking at ExecFetchSlotTuple
> and ExecMaterializeSlot usages. I was wondering the same.
>
> ExecSetSlotDescriptor() always frees the tts_values and tts_isnull
> arrays even if the tuple descriptor being set has the same number of
> attributes as the previous one. Most of the time that will be the case.
> I think we should optimize that case.

+1

>> I think it'd be good to rewrite the code so there's an input and an
>> output slot that each will keep their slot descriptors set.
>
> +1 for that.
>
> But I am worried that the code will have to create a thousand slots if
> there are a thousand partitions. I think we will need to see how much
> effect that has.

I agree that it does not make sense to create as many slots, if at all
we go by this approach. Suppose the partitioned table is the only one
having a different tuple descriptor, and the rest of the partitions have
the same tuple descriptor. In that case, keep track of unique
descriptors, and allocate a slot per unique descriptor.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
On 2018-06-29 11:20:53 +0530, Amit Khandekar wrote:
> On 27 June 2018 at 18:33, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
> > On Wed, Jun 27, 2018 at 10:39 AM, Andres Freund <andres@anarazel.de> wrote:
> >> Unfortunately, calling ExecSetSlotDescriptor() is far from cheap: it has
> >> to reallocate the values/isnull arrays (and potentially do desc pinning
> >> etc).
> >
> > I bumped into this code yesterday while looking at ExecFetchSlotTuple
> > and ExecMaterializeSlot usages. I was wondering the same.
> >
> > ExecSetSlotDescriptor() always frees the tts_values and tts_isnull
> > arrays even if the tuple descriptor being set has the same number of
> > attributes as the previous one. Most of the time that will be the case.
> > I think we should optimize that case.
>
> +1

This doesn't strike me as a great optimization. Any place where we change
descriptors with any regularity, we're doing something wrong or at least
decidedly suboptimal. We shouldn't react to that by optimizing the wrong
thing; we should do the wrong thing less often.

> >> I think it'd be good to rewrite the code so there's an input and an
> >> output slot that each will keep their slot descriptors set.
> >
> > +1 for that.
> >
> > But I am worried that the code will have to create a thousand slots if
> > there are a thousand partitions. I think we will need to see how much
> > effect that has.
>
> I agree that it does not make sense to create as many slots, if at all
> we go by this approach. Suppose the partitioned table is the only one
> having a different tuple descriptor, and the rest of the partitions have
> the same tuple descriptor. In that case, keep track of unique
> descriptors, and allocate a slot per unique descriptor.

Why? Compared to the rest of the structures created, a slot is not
particularly expensive? I don't see what you're optimizing here.

Greetings,

Andres Freund
On Fri, Jun 29, 2018 at 11:29 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2018-06-29 11:20:53 +0530, Amit Khandekar wrote:
>> On 27 June 2018 at 18:33, Ashutosh Bapat
>> <ashutosh.bapat@enterprisedb.com> wrote:
>> > On Wed, Jun 27, 2018 at 10:39 AM, Andres Freund <andres@anarazel.de> wrote:
>> >> Unfortunately, calling ExecSetSlotDescriptor() is far from cheap: it has
>> >> to reallocate the values/isnull arrays (and potentially do desc pinning
>> >> etc).
>> >
>> > I bumped into this code yesterday while looking at ExecFetchSlotTuple
>> > and ExecMaterializeSlot usages. I was wondering the same.
>> >
>> > ExecSetSlotDescriptor() always frees the tts_values and tts_isnull
>> > arrays even if the tuple descriptor being set has the same number of
>> > attributes as the previous one. Most of the time that will be the case.
>> > I think we should optimize that case.
>>
>> +1
>
> This doesn't strike me as a great optimization. Any place where we change
> descriptors with any regularity, we're doing something wrong or at least
> decidedly suboptimal. We shouldn't react to that by optimizing the wrong
> thing; we should do the wrong thing less often.

I agree with all of that, but I think this tiny optimization can be done
independent of the partitioning problem as well.

>> >> I think it'd be good to rewrite the code so there's an input and an
>> >> output slot that each will keep their slot descriptors set.
>> >
>> > +1 for that.
>> >
>> > But I am worried that the code will have to create a thousand slots if
>> > there are a thousand partitions. I think we will need to see how much
>> > effect that has.
>>
>> I agree that it does not make sense to create as many slots, if at all
>> we go by this approach. Suppose the partitioned table is the only one
>> having a different tuple descriptor, and the rest of the partitions have
>> the same tuple descriptor. In that case, keep track of unique
>> descriptors, and allocate a slot per unique descriptor.
>
> Why? Compared to the rest of the structures created, a slot is not
> particularly expensive? I don't see what you're optimizing here.

The size of a slot depends upon the number of attributes of the table. A
ten-column table will take 80 bytes for the datum array and 10 bytes (+
padding) for the isnull array, which for a thousand partitions would
translate to 90KB of memory. That may be small compared to the relation
cache memory consumed, but it's some memory.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
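Spelled out, the back-of-the-envelope arithmetic behind that 90KB figure is (assuming the usual 8-byte Datum and 1-byte bool, plus allocator padding):

    per slot, for a 10-column table:
        tts_values: 10 * sizeof(Datum) = 10 * 8 = 80 bytes
        tts_isnull: 10 * sizeof(bool)  = 10 * 1 = 10 bytes (+ padding)
    for 1000 partitions: ~90 bytes * 1000 ~= 90KB
    (excluding the TupleTableSlot structs themselves)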
On 2018/06/29 14:59, Andres Freund wrote:
> On 2018-06-29 11:20:53 +0530, Amit Khandekar wrote:
>> On 27 June 2018 at 18:33, Ashutosh Bapat
>>> But I am worried that the code will have to create a thousand slots if
>>> there are a thousand partitions. I think we will need to see how much
>>> effect that has.
>>
>> I agree that it does not make sense to create as many slots, if at all
>> we go by this approach. Suppose the partitioned table is the only one
>> having a different tuple descriptor, and the rest of the partitions have
>> the same tuple descriptor. In that case, keep track of unique
>> descriptors, and allocate a slot per unique descriptor.
>
> Why? Compared to the rest of the structures created, a slot is not
> particularly expensive? I don't see what you're optimizing here.

What I'm thinking of doing is something that's inspired by one of the
things that David Rowley proposes in his patch for PG 12 to remove
inefficiencies in the tuple routing code [1].

Instead of a single TupleTableSlot attached at partition_tuple_slot, we
allocate an array of TupleTableSlot pointers of the same length as the
number of partitions, as you mentioned upthread. We then call
MakeTupleTableSlot() only if a partition needs it and pass it the
partition's TupleDesc. Allocated slots are remembered in a list.
ExecDropSingleTupleTableSlot is then called on those allocated slots when
the plan ends. Note that the array of slots is not allocated if none of
the partitions affected by a given query (or copy) needed to convert
tuples.

Other issues that you mentioned, such as needless heap_tuple_deform/form
being invoked, seem less localized (to me) than this particular issue, so
I created a patch for just this, which is attached with this email. I'm
thinking about how to fix the other issues, but will need a bit more time.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CAKJS1f_1RJyFquuCKRFHTdcXqoPX-PYqAd7nz=GVBwvGh4a6xA@mail.gmail.com
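In outline, the lazy allocation described above could look like this simplified sketch; partition_tuple_slots, partition_tuple_slots_alloced and partidx follow the posted patch's naming, while num_partitions, partrel and the surrounding routing code are assumed context:

    /* Allocate the pointer array lazily, on the first conversion. */
    if (proute->partition_tuple_slots == NULL)
        proute->partition_tuple_slots = (TupleTableSlot **)
            palloc0(proute->num_partitions * sizeof(TupleTableSlot *));

    /* Make this partition's dedicated slot on first use only. */
    if (proute->partition_tuple_slots[partidx] == NULL)
    {
        proute->partition_tuple_slots[partidx] =
            MakeTupleTableSlot(RelationGetDescr(partrel));
        /* remember it so ExecCleanupTupleRouting() can drop it */
        proute->partition_tuple_slots_alloced =
            lappend(proute->partition_tuple_slots_alloced,
                    proute->partition_tuple_slots[partidx]);
    }
    slot = proute->partition_tuple_slots[partidx];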
On 2018/06/29 15:23, Amit Langote wrote:
> Instead of a single TupleTableSlot attached at partition_tuple_slot, we
> allocate an array of TupleTableSlot pointers of the same length as the
> number of partitions, as you mentioned upthread. We then call
> MakeTupleTableSlot() only if a partition needs it and pass it the
> partition's TupleDesc. Allocated slots are remembered in a list.
> ExecDropSingleTupleTableSlot is then called on those allocated slots when
> the plan ends. Note that the array of slots is not allocated if none of
> the partitions affected by a given query (or copy) needed to convert
> tuples.

Forgot to show the effect the patch has (on my workstation, so the
numbers are a bit noisy).

create table p (a int, b int, c int) partition by range (a);
create table p1 (b int, a int, c int);
alter table p attach partition p1 for values from (1) to (maxvalue);

Note that every row will end up in p1 and will require conversion.

-- 2 million records
copy (select i, i+1, i+2 from generate_series(1, 2000000) i) to '/tmp/data.csv' csv;

Un-patched:

truncate p;
copy p from '/tmp/data.csv' csv;
COPY 2000000
Time: 8521.308 ms (00:08.521)

truncate p;
copy p from '/tmp/data.csv' csv;
COPY 2000000
Time: 8160.741 ms (00:08.161)

truncate p;
copy p from '/tmp/data.csv' csv;
COPY 2000000
Time: 8389.925 ms (00:08.390)

Patched:

truncate p;
copy p from '/tmp/data.csv' csv;
COPY 2000000
Time: 7716.568 ms (00:07.717)

truncate p;
copy p from '/tmp/data.csv' csv;
COPY 2000000
Time: 7569.224 ms (00:07.569)

truncate p;
copy p from '/tmp/data.csv' csv;
COPY 2000000
Time: 7572.085 ms (00:07.572)

So, there is at least some speedup.

Thanks,
Amit
On 29 June 2018 at 11:53, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> What I'm thinking of doing is something that's inspired by one of the
> things that David Rowley proposes in his patch for PG 12 to remove
> inefficiencies in the tuple routing code [1].
>
> Instead of a single TupleTableSlot attached at partition_tuple_slot, we
> allocate an array of TupleTableSlot pointers of the same length as the
> number of partitions, as you mentioned upthread. We then call
> MakeTupleTableSlot() only if a partition needs it and pass it the
> partition's TupleDesc. Allocated slots are remembered in a list.
> ExecDropSingleTupleTableSlot is then called on those allocated slots when
> the plan ends. Note that the array of slots is not allocated if none of
> the partitions affected by a given query (or copy) needed to convert
> tuples.

Thanks for the patch. I did some review of the patch (needs a rebase).
Below are my comments.

@@ -1936,12 +1936,11 @@ ExecPartitionCheckEmitError(ResultRelInfo *resultRelInfo,
+        /* Input slot might be of a partition, which has a fixed tupdesc. */
+        slot = MakeTupleTableSlot(tupdesc);
         if (map != NULL)
-        {
             tuple = do_convert_tuple(tuple, map);
-            ExecSetSlotDescriptor(slot, tupdesc);
-            ExecStoreTuple(tuple, slot, InvalidBuffer, false);
-        }
+        ExecStoreTuple(tuple, slot, InvalidBuffer, false);

Both MakeTupleTableSlot() and ExecStoreTuple() can be inside the (map
!= NULL) if condition. This also applies to similar changes in
ExecConstraints() and ExecWithCheckOptions().

+ * Initialize an empty slot that will be used to manipulate tuples of any
+ * this partition's rowtype.

of any this => of this

+ * Initialized the array where these slots are stored, if not already

Initialized => Initialize

+        proute->partition_tuple_slots_alloced =
+            lappend(proute->partition_tuple_slots_alloced,
+                    proute->partition_tuple_slots[partidx]);

Instead of doing the above, I think we can collect those slots in
estate->es_tupleTable using ExecInitExtraTupleSlot() so that they
don't have to be explicitly dropped in ExecCleanupTupleRouting(). And
also then we won't require the new field
proute->partition_tuple_slots_alloced.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
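For reference, the suggested alternative amounts to something like the following sketch (estate, partrel and the surrounding code are assumed); slots created with ExecInitExtraTupleSlot() are registered in estate->es_tupleTable, which the executor releases wholesale at plan shutdown:

    /*
     * Sketch: let the executor own the slot via es_tupleTable, so
     * ExecCleanupTupleRouting() no longer needs to drop it explicitly.
     */
    proute->partition_tuple_slots[partidx] =
        ExecInitExtraTupleSlot(estate, RelationGetDescr(partrel));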
On 29 June 2018 at 11:53, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> Other issues that you mentioned, such as needless heap_tuple_deform/form
> being invoked, seem less localized (to me) than this particular issue, so
> I created a patch for just this, which is attached with this email. I'm
> thinking about how to fix the other issues, but will need a bit more time.

I have started a new thread on these tuple forming/deforming issues [1],
and posted there a patch that is to be applied over your
Allocate-dedicated-slots-of-partition-tuple.patch (which I rebased and
posted in that thread):

[1]
https://www.postgresql.org/message-id/CAJ3gD9fR0wRNeAE8VqffNTyONS_UfFPRpqxhnD9Q42vZB%2BJvpg%40mail.gmail.com

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Thanks for the review.

On 2018/08/17 15:00, Amit Khandekar wrote:
> Thanks for the patch. I did some review of the patch (needs a rebase).
> Below are my comments.
>
> @@ -1936,12 +1936,11 @@ ExecPartitionCheckEmitError(ResultRelInfo *resultRelInfo,
> +        /* Input slot might be of a partition, which has a fixed tupdesc. */
> +        slot = MakeTupleTableSlot(tupdesc);
>          if (map != NULL)
> -        {
>              tuple = do_convert_tuple(tuple, map);
> -            ExecSetSlotDescriptor(slot, tupdesc);
> -            ExecStoreTuple(tuple, slot, InvalidBuffer, false);
> -        }
> +        ExecStoreTuple(tuple, slot, InvalidBuffer, false);
>
> Both MakeTupleTableSlot() and ExecStoreTuple() can be inside the (map
> != NULL) if condition. This also applies to similar changes in
> ExecConstraints() and ExecWithCheckOptions().

Ah, okay. I guess that means we'll allocate a new slot here only if we
had to switch to a partition-specific slot in the first place.

> + * Initialize an empty slot that will be used to manipulate tuples of any
> + * this partition's rowtype.
>
> of any this => of this
>
> + * Initialized the array where these slots are stored, if not already
>
> Initialized => Initialize

Fixed.

> +        proute->partition_tuple_slots_alloced =
> +            lappend(proute->partition_tuple_slots_alloced,
> +                    proute->partition_tuple_slots[partidx]);
>
> Instead of doing the above, I think we can collect those slots in
> estate->es_tupleTable using ExecInitExtraTupleSlot() so that they
> don't have to be explicitly dropped in ExecCleanupTupleRouting(). And
> also then we won't require the new field
> proute->partition_tuple_slots_alloced.

Although I was slightly uncomfortable with the idea at first, thinking
that it's not good for tuple-routing-specific resources to be released by
generic executor code, it doesn't seem too bad to do it the way you
suggest.

Attached updated patch.

By the way, I think it might be a good idea to try to merge this patch
with the effort in the following thread.

* Reduce partition tuple routing overheads *
https://commitfest.postgresql.org/19/1690/

Thanks,
Amit
On 20 August 2018 at 14:45, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> Thanks for the review.
>
> On 2018/08/17 15:00, Amit Khandekar wrote:
>> Thanks for the patch. I did some review of the patch (needs a rebase).
>> Below are my comments.
>>
>> @@ -1936,12 +1936,11 @@ ExecPartitionCheckEmitError(ResultRelInfo *resultRelInfo,
>> +        /* Input slot might be of a partition, which has a fixed tupdesc. */
>> +        slot = MakeTupleTableSlot(tupdesc);
>>          if (map != NULL)
>> -        {
>>              tuple = do_convert_tuple(tuple, map);
>> -            ExecSetSlotDescriptor(slot, tupdesc);
>> -            ExecStoreTuple(tuple, slot, InvalidBuffer, false);
>> -        }
>> +        ExecStoreTuple(tuple, slot, InvalidBuffer, false);
>>
>> Both MakeTupleTableSlot() and ExecStoreTuple() can be inside the (map
>> != NULL) if condition. This also applies to similar changes in
>> ExecConstraints() and ExecWithCheckOptions().
>
> Ah, okay. I guess that means we'll allocate a new slot here only if we
> had to switch to a partition-specific slot in the first place.
>
>> + * Initialize an empty slot that will be used to manipulate tuples of any
>> + * this partition's rowtype.
>>
>> of any this => of this
>>
>> + * Initialized the array where these slots are stored, if not already
>>
>> Initialized => Initialize
>
> Fixed.
>
>> +        proute->partition_tuple_slots_alloced =
>> +            lappend(proute->partition_tuple_slots_alloced,
>> +                    proute->partition_tuple_slots[partidx]);
>>
>> Instead of doing the above, I think we can collect those slots in
>> estate->es_tupleTable using ExecInitExtraTupleSlot() so that they
>> don't have to be explicitly dropped in ExecCleanupTupleRouting(). And
>> also then we won't require the new field
>> proute->partition_tuple_slots_alloced.
>
> Although I was slightly uncomfortable with the idea at first, thinking
> that it's not good for tuple-routing-specific resources to be released by
> generic executor code, it doesn't seem too bad to do it the way you
> suggest.
>
> Attached updated patch.

Thanks for the changes. The patch looks good to me.

> By the way, I think it might be a good idea to try to merge this patch
> with the effort in the following thread.
>
> * Reduce partition tuple routing overheads *
> https://commitfest.postgresql.org/19/1690/

Yes. I guess whichever commits later needs to be rebased on the earlier
one.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Hi,

On 2018-08-20 16:40:17 +0530, Amit Khandekar wrote:
> On 20 August 2018 at 14:45, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> > By the way, I think it might be a good idea to try to merge this patch
> > with the effort in the following thread.
> >
> > * Reduce partition tuple routing overheads *
> > https://commitfest.postgresql.org/19/1690/
>
> Yes. I guess whichever commits later needs to be rebased on the earlier
> one.

I think I'd rather keep this patch separate; it's pretty simple, so we
should be able to commit it fairly soon.

Greetings,

Andres Freund
On 20 August 2018 at 23:21, Andres Freund <andres@anarazel.de> wrote:
> On 2018-08-20 16:40:17 +0530, Amit Khandekar wrote:
>> On 20 August 2018 at 14:45, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> > By the way, I think it might be a good idea to try to merge this patch
>> > with the effort in the following thread.
>> >
>> > * Reduce partition tuple routing overheads *
>> > https://commitfest.postgresql.org/19/1690/
>>
>> Yes. I guess whichever commits later needs to be rebased on the earlier
>> one.
>
> I think I'd rather keep this patch separate; it's pretty simple, so we
> should be able to commit it fairly soon.

+1. I think that patch is already big enough. I'm perfectly fine with
rebasing that if this one gets committed first.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services