Thread: partitioning - changing a slot's descriptor is expensive
Hi,

(sorry if I CCed the wrong folks, the code has changed a fair bit)

I noticed that several places in the partitioning code look like:

    /*
     * If the tuple has been routed, it's been converted to the
     * partition's rowtype, which might differ from the root
     * table's. We must convert it back to the root table's
     * rowtype so that val_desc shown error message matches the
     * input tuple.
     */
    if (resultRelInfo->ri_PartitionRoot)
    {
        TableTuple tuple = ExecFetchSlotTuple(slot);
        TupleConversionMap *map;

        rel = resultRelInfo->ri_PartitionRoot;
        tupdesc = RelationGetDescr(rel);
        /* a reverse map */
        map = convert_tuples_by_name(orig_tupdesc, tupdesc,
                                     gettext_noop("could not convert row type"));
        if (map != NULL)
        {
            tuple = do_convert_tuple(tuple, map);
            ExecSetSlotDescriptor(slot, tupdesc);
            ExecStoreTuple(tuple, slot, InvalidBuffer, false);
        }
    }

Unfortunately, calling ExecSetSlotDescriptor() is far from cheap: it has
to reallocate the values/isnull arrays (and potentially do desc pinning
etc). Nor is it always cheap to call ExecFetchSlotTuple(slot) - it might
be a virtual slot. Calling heap_deform_tuple(), as done in
do_convert_tuple(), might be superfluous work, because the input tuple
might already be entirely deformed in the input slot.

I think it'd be good to rewrite the code so there's an input and an
output slot that each will keep their slot descriptors set.

Besides performance as the code stands, this'll also be important for
pluggable storage (as we'd otherwise get a *lot* of back/forth
conversions between tuple formats).

Anybody interested in writing a patch that redoes this?

Greetings,

Andres Freund
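To make the proposal concrete, here is a minimal sketch of the input/output-slot arrangement, with hypothetical variable names (partrel, rootrel); the calls are the same executor APIs as in the snippet above, and the surrounding setup is assumed:

    /*
     * Sketch only, not code from the thread's patch: create one slot per
     * rowtype up front, each keeping its descriptor for its whole
     * lifetime, instead of repointing a single slot per routed tuple.
     */
    TupleTableSlot *part_slot;  /* fixed to the partition's rowtype */
    TupleTableSlot *root_slot;  /* fixed to the root table's rowtype */

    part_slot = MakeTupleTableSlot(RelationGetDescr(partrel));
    root_slot = MakeTupleTableSlot(RelationGetDescr(rootrel));

    /* per tuple: convert by moving data between the two slots */
    if (map != NULL)
    {
        HeapTuple   tuple = ExecFetchSlotTuple(part_slot);

        tuple = do_convert_tuple(tuple, map);
        /* no ExecSetSlotDescriptor() call, so no values/isnull realloc */
        ExecStoreTuple(tuple, root_slot, InvalidBuffer, false);
    }

With this arrangement, each slot's values/isnull arrays are allocated exactly once, which is the point of the complaint above.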
Hi Andres,

On 2018/06/27 14:09, Andres Freund wrote:
> Hi,
>
> (sorry if I CCed the wrong folks, the code has changed a fair bit)
>
> I noticed that several places in the partitioning code look like:
>
>     /*
>      * If the tuple has been routed, it's been converted to the
>      * partition's rowtype, which might differ from the root
>      * table's. We must convert it back to the root table's
>      * rowtype so that val_desc shown error message matches the
>      * input tuple.
>      */
>     if (resultRelInfo->ri_PartitionRoot)
>     {
>         TableTuple tuple = ExecFetchSlotTuple(slot);
>         TupleConversionMap *map;
>
>         rel = resultRelInfo->ri_PartitionRoot;
>         tupdesc = RelationGetDescr(rel);
>         /* a reverse map */
>         map = convert_tuples_by_name(orig_tupdesc, tupdesc,
>                                      gettext_noop("could not convert row type"));
>         if (map != NULL)
>         {
>             tuple = do_convert_tuple(tuple, map);
>             ExecSetSlotDescriptor(slot, tupdesc);
>             ExecStoreTuple(tuple, slot, InvalidBuffer, false);
>         }
>     }

This particular code block (and a couple of others like it) is executed
only if a tuple fails the partition constraint check, so it's maybe not
a very frequently executed piece of code.

There is however similar code that runs in non-error paths, such as in
ExecPrepareTupleRouting(), that is executed *if* the leaf partition and
the root parent have differing attribute numbers. So, one might say that
that code's there to handle the special cases which may not arise much in
practice, but it may still be a good idea to make improvements if we can
do so.

> Unfortunately, calling ExecSetSlotDescriptor() is far from cheap: it has
> to reallocate the values/isnull arrays (and potentially do desc pinning
> etc). Nor is it always cheap to call ExecFetchSlotTuple(slot) - it might
> be a virtual slot. Calling heap_deform_tuple(), as done in
> do_convert_tuple(), might be superfluous work, because the input tuple
> might already be entirely deformed in the input slot.
>
> I think it'd be good to rewrite the code so there's an input and an
> output slot that each will keep their slot descriptors set.
>
> Besides performance as the code stands, this'll also be important for
> pluggable storage (as we'd otherwise get a *lot* of back/forth
> conversions between tuple formats).
>
> Anybody interested in writing a patch that redoes this?

Thanks for the pointers above. I will see if I can produce a patch and
will also be interested in hearing what others may have to say.

Thanks,
Amit
On 2018-06-27 14:46:26 +0900, Amit Langote wrote:
> Hi Andres,
>
> On 2018/06/27 14:09, Andres Freund wrote:
> > Hi,
> >
> > (sorry if I CCed the wrong folks, the code has changed a fair bit)
> >
> > I noticed that several places in the partitioning code look like:
> >
> >     /*
> >      * If the tuple has been routed, it's been converted to the
> >      * partition's rowtype, which might differ from the root
> >      * table's. We must convert it back to the root table's
> >      * rowtype so that val_desc shown error message matches the
> >      * input tuple.
> >      */
> >     if (resultRelInfo->ri_PartitionRoot)
> >     {
> >         TableTuple tuple = ExecFetchSlotTuple(slot);
> >         TupleConversionMap *map;
> >
> >         rel = resultRelInfo->ri_PartitionRoot;
> >         tupdesc = RelationGetDescr(rel);
> >         /* a reverse map */
> >         map = convert_tuples_by_name(orig_tupdesc, tupdesc,
> >                                      gettext_noop("could not convert row type"));
> >         if (map != NULL)
> >         {
> >             tuple = do_convert_tuple(tuple, map);
> >             ExecSetSlotDescriptor(slot, tupdesc);
> >             ExecStoreTuple(tuple, slot, InvalidBuffer, false);
> >         }
> >     }
>
> This particular code block (and a couple of others like it) is executed
> only if a tuple fails the partition constraint check, so it's maybe not
> a very frequently executed piece of code.
>
> There is however similar code that runs in non-error paths, such as in
> ExecPrepareTupleRouting(), that is executed *if* the leaf partition and
> the root parent have differing attribute numbers. So, one might say that
> that code's there to handle the special cases which may not arise much in
> practice, but it may still be a good idea to make improvements if we can
> do so.

Yea, I was referring to all do_convert_tuple() callers, and some of them
are more hot-path than the specific one above. E.g. the
ConvertPartitionTupleSlot() call in CopyFrom().

> > Unfortunately, calling ExecSetSlotDescriptor() is far from cheap: it has
> > to reallocate the values/isnull arrays (and potentially do desc pinning
> > etc). Nor is it always cheap to call ExecFetchSlotTuple(slot) - it might
> > be a virtual slot. Calling heap_deform_tuple(), as done in
> > do_convert_tuple(), might be superfluous work, because the input tuple
> > might already be entirely deformed in the input slot.
> >
> > I think it'd be good to rewrite the code so there's an input and an
> > output slot that each will keep their slot descriptors set.
> >
> > Besides performance as the code stands, this'll also be important for
> > pluggable storage (as we'd otherwise get a *lot* of back/forth
> > conversions between tuple formats).
> >
> > Anybody interested in writing a patch that redoes this?
>
> Thanks for the pointers above. I will see if I can produce a patch and
> will also be interested in hearing what others may have to say.

Cool.

Greetings,

Andres Freund
On 2018/06/27 14:55, Andres Freund wrote:
> On 2018-06-27 14:46:26 +0900, Amit Langote wrote:
>> There is however similar code that runs in non-error paths, such as in
>> ExecPrepareTupleRouting(), that is executed *if* the leaf partition and
>> the root parent have differing attribute numbers. So, one might say that
>> that code's there to handle the special cases which may not arise much in
>> practice, but it may still be a good idea to make improvements if we can
>> do so.
>
> Yea, I was referring to all do_convert_tuple() callers, and some of them
> are more hot-path than the specific one above. E.g. the
> ConvertPartitionTupleSlot() call in CopyFrom().

Just in case you haven't noticed, ConvertPartitionTupleSlot(), even if
it's called unconditionally from CopyFrom etc., calls do_convert_tuple
only if necessary. Note the return statement at the top of its body.

HeapTuple
ConvertPartitionTupleSlot(TupleConversionMap *map,
                          HeapTuple tuple,
                          TupleTableSlot *new_slot,
                          TupleTableSlot **p_my_slot)
{
    if (!map)
        return tuple;

map is non-NULL only if a partition has different attribute numbers than
the root parent.

Thanks,
Amit
On Wed, Jun 27, 2018 at 10:39 AM, Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> (sorry if I CCed the wrong folks, the code has changed a fair bit)
>
> I noticed that several places in the partitioning code look like:
>
>     /*
>      * If the tuple has been routed, it's been converted to the
>      * partition's rowtype, which might differ from the root
>      * table's. We must convert it back to the root table's
>      * rowtype so that val_desc shown error message matches the
>      * input tuple.
>      */
>     if (resultRelInfo->ri_PartitionRoot)
>     {
>         TableTuple tuple = ExecFetchSlotTuple(slot);
>         TupleConversionMap *map;
>
>         rel = resultRelInfo->ri_PartitionRoot;
>         tupdesc = RelationGetDescr(rel);
>         /* a reverse map */
>         map = convert_tuples_by_name(orig_tupdesc, tupdesc,
>                                      gettext_noop("could not convert row type"));
>         if (map != NULL)
>         {
>             tuple = do_convert_tuple(tuple, map);
>             ExecSetSlotDescriptor(slot, tupdesc);
>             ExecStoreTuple(tuple, slot, InvalidBuffer, false);
>         }
>     }
>
> Unfortunately, calling ExecSetSlotDescriptor() is far from cheap: it has
> to reallocate the values/isnull arrays (and potentially do desc pinning
> etc).

I bumped into this code yesterday while looking at ExecFetchSlotTuple
and ExecMaterializeSlot usages. I was wondering the same.

ExecSetSlotDescriptor() always frees the tts_values and tts_isnull
arrays even if the tuple descriptor being set has the same number of
attributes as the previous one. Most of the time that will be the case.
I think we should optimize that case. If you agree, I will submit a
patch for that.

Amit Langote has clarified that this doesn't happen often since
partitioned tables will not usually require tuple conversion. But I
agree that in the pathological case this is a performance eater and
should be fixed.

> Nor is it always cheap to call ExecFetchSlotTuple(slot) - it might
> be a virtual slot. Calling heap_deform_tuple(), as done in
> do_convert_tuple(), might be superfluous work, because the input tuple
> might already be entirely deformed in the input slot.
>
> I think it'd be good to rewrite the code so there's an input and an
> output slot that each will keep their slot descriptors set.

+1 for that.

But I am worried that the code will have to create a thousand slots if
there are a thousand partitions. I think we will need to see how much
effect that has.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
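Roughly, the reuse idea suggested above could look like the following sketch (not the actual patch; the slot field names are those of TupleTableSlot in this era's src/backend/executor/execTuples.c): only reallocate the arrays when the attribute count actually changes.

    void
    ExecSetSlotDescriptor(TupleTableSlot *slot, TupleDesc tupdesc)
    {
        /* can we keep the existing values/isnull arrays? */
        bool        same_natts = slot->tts_tupleDescriptor &&
            slot->tts_tupleDescriptor->natts == tupdesc->natts;

        ExecClearTuple(slot);       /* slot must be empty before changing it */

        if (slot->tts_tupleDescriptor)
            ReleaseTupleDesc(slot->tts_tupleDescriptor);
        slot->tts_tupleDescriptor = tupdesc;
        PinTupleDesc(tupdesc);

        if (!same_natts)
        {
            /* width changed, so the arrays really do need reallocating */
            if (slot->tts_values)
                pfree(slot->tts_values);
            if (slot->tts_isnull)
                pfree(slot->tts_isnull);
            slot->tts_values = (Datum *)
                MemoryContextAlloc(slot->tts_mcxt,
                                   tupdesc->natts * sizeof(Datum));
            slot->tts_isnull = (bool *)
                MemoryContextAlloc(slot->tts_mcxt,
                                   tupdesc->natts * sizeof(bool));
        }
    }

Reusing the arrays is safe here because ExecClearTuple() has already marked the slot empty, so any stale contents will be overwritten by the next store/deform.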
On 27 June 2018 at 18:33, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
> On Wed, Jun 27, 2018 at 10:39 AM, Andres Freund <andres@anarazel.de> wrote:
>> Hi,
>>
>> (sorry if I CCed the wrong folks, the code has changed a fair bit)
>>
>> I noticed that several places in the partitioning code look like:
>>
>>     /*
>>      * If the tuple has been routed, it's been converted to the
>>      * partition's rowtype, which might differ from the root
>>      * table's. We must convert it back to the root table's
>>      * rowtype so that val_desc shown error message matches the
>>      * input tuple.
>>      */
>>     if (resultRelInfo->ri_PartitionRoot)
>>     {
>>         TableTuple tuple = ExecFetchSlotTuple(slot);
>>         TupleConversionMap *map;
>>
>>         rel = resultRelInfo->ri_PartitionRoot;
>>         tupdesc = RelationGetDescr(rel);
>>         /* a reverse map */
>>         map = convert_tuples_by_name(orig_tupdesc, tupdesc,
>>                                      gettext_noop("could not convert row type"));
>>         if (map != NULL)
>>         {
>>             tuple = do_convert_tuple(tuple, map);
>>             ExecSetSlotDescriptor(slot, tupdesc);
>>             ExecStoreTuple(tuple, slot, InvalidBuffer, false);
>>         }
>>     }
>>
>> Unfortunately, calling ExecSetSlotDescriptor() is far from cheap: it has
>> to reallocate the values/isnull arrays (and potentially do desc pinning
>> etc).
>
> I bumped into this code yesterday while looking at ExecFetchSlotTuple
> and ExecMaterializeSlot usages. I was wondering the same.
>
> ExecSetSlotDescriptor() always frees the tts_values and tts_isnull
> arrays even if the tuple descriptor being set has the same number of
> attributes as the previous one. Most of the time that will be the case.
> I think we should optimize that case.

+1

>> I think it'd be good to rewrite the code so there's an input and an
>> output slot that each will keep their slot descriptors set.
>
> +1 for that.
>
> But I am worried that the code will have to create a thousand slots if
> there are a thousand partitions. I think we will need to see how much
> effect that has.

I agree that it does not make sense to create as many slots, if at all
we go by this approach. Suppose the partitioned table is the only one
having a different tuple descriptor, and the rest of the partitions have
the same tuple descriptor. In that case, keep track of unique
descriptors, and allocate a slot per unique descriptor.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
On 2018-06-29 11:20:53 +0530, Amit Khandekar wrote:
> On 27 June 2018 at 18:33, Ashutosh Bapat
> <ashutosh.bapat@enterprisedb.com> wrote:
> > On Wed, Jun 27, 2018 at 10:39 AM, Andres Freund <andres@anarazel.de> wrote:
> >> Unfortunately, calling ExecSetSlotDescriptor() is far from cheap: it has
> >> to reallocate the values/isnull arrays (and potentially do desc pinning
> >> etc).
> >
> > I bumped into this code yesterday while looking at ExecFetchSlotTuple
> > and ExecMaterializeSlot usages. I was wondering the same.
> >
> > ExecSetSlotDescriptor() always frees the tts_values and tts_isnull
> > arrays even if the tuple descriptor being set has the same number of
> > attributes as the previous one. Most of the time that will be the case.
> > I think we should optimize that case.
>
> +1

This doesn't strike me as a great optimization. Any place where we change
descriptors with any regularity, we're doing something wrong or at least
decidedly suboptimal. We shouldn't react to that by optimizing the wrong
thing; we should do the wrong thing less often.

> >> I think it'd be good to rewrite the code so there's an input and an
> >> output slot that each will keep their slot descriptors set.
> >
> > +1 for that.
> >
> > But I am worried that the code will have to create a thousand slots if
> > there are a thousand partitions. I think we will need to see how much
> > effect that has.
>
> I agree that it does not make sense to create as many slots, if at all
> we go by this approach. Suppose the partitioned table is the only one
> having a different tuple descriptor, and the rest of the partitions have
> the same tuple descriptor. In that case, keep track of unique
> descriptors, and allocate a slot per unique descriptor.

Why? Compared to the rest of the structures created, a slot is not
particularly expensive? I don't see what you're optimizing here.

Greetings,

Andres Freund
On Fri, Jun 29, 2018 at 11:29 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2018-06-29 11:20:53 +0530, Amit Khandekar wrote:
>> On 27 June 2018 at 18:33, Ashutosh Bapat
>> <ashutosh.bapat@enterprisedb.com> wrote:
>> > On Wed, Jun 27, 2018 at 10:39 AM, Andres Freund <andres@anarazel.de> wrote:
>> >> Unfortunately, calling ExecSetSlotDescriptor() is far from cheap: it has
>> >> to reallocate the values/isnull arrays (and potentially do desc pinning
>> >> etc).
>> >
>> > I bumped into this code yesterday while looking at ExecFetchSlotTuple
>> > and ExecMaterializeSlot usages. I was wondering the same.
>> >
>> > ExecSetSlotDescriptor() always frees the tts_values and tts_isnull
>> > arrays even if the tuple descriptor being set has the same number of
>> > attributes as the previous one. Most of the time that will be the case.
>> > I think we should optimize that case.
>>
>> +1
>
> This doesn't strike me as a great optimization. Any place where we change
> descriptors with any regularity, we're doing something wrong or at least
> decidedly suboptimal. We shouldn't react to that by optimizing the wrong
> thing; we should do the wrong thing less often.

I agree with all of that, but I think this tiny optimization can be done
independent of the partitioning problem as well.

>> >> I think it'd be good to rewrite the code so there's an input and an
>> >> output slot that each will keep their slot descriptors set.
>> >
>> > +1 for that.
>> >
>> > But I am worried that the code will have to create a thousand slots if
>> > there are a thousand partitions. I think we will need to see how much
>> > effect that has.
>>
>> I agree that it does not make sense to create as many slots, if at all
>> we go by this approach. Suppose the partitioned table is the only one
>> having a different tuple descriptor, and the rest of the partitions have
>> the same tuple descriptor. In that case, keep track of unique
>> descriptors, and allocate a slot per unique descriptor.
>
> Why? Compared to the rest of the structures created, a slot is not
> particularly expensive? I don't see what you're optimizing here.

The size of a slot depends upon the number of attributes of the table. A
ten-column table will take 80 bytes for the datum array and 10 bytes (+
padding) for the isnull array, which for a thousand partitions would
translate to 90KB of memory. That may be small compared to the relation
cache memory consumed, but it's some memory.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
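Spelled out, the back-of-the-envelope arithmetic behind that 90KB figure is (assuming the usual 8-byte Datum and 1-byte bool, plus allocator padding):

    per slot, for a 10-column table:
        tts_values: 10 * sizeof(Datum) = 10 * 8 = 80 bytes
        tts_isnull: 10 * sizeof(bool)  = 10 * 1 = 10 bytes (+ padding)
    for 1000 partitions: ~90 bytes * 1000 ~= 90KB
    (excluding the TupleTableSlot structs themselves)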
On 2018/06/29 14:59, Andres Freund wrote:
> On 2018-06-29 11:20:53 +0530, Amit Khandekar wrote:
>> On 27 June 2018 at 18:33, Ashutosh Bapat
>>> But I am worried that the code will have to create a thousand slots if
>>> there are a thousand partitions. I think we will need to see how much
>>> effect that has.
>>
>> I agree that it does not make sense to create as many slots, if at all
>> we go by this approach. Suppose the partitioned table is the only one
>> having a different tuple descriptor, and the rest of the partitions have
>> the same tuple descriptor. In that case, keep track of unique
>> descriptors, and allocate a slot per unique descriptor.
>
> Why? Compared to the rest of the structures created, a slot is not
> particularly expensive? I don't see what you're optimizing here.

What I'm thinking of doing is something that's inspired by one of the
things that David Rowley proposes in his patch for PG 12 to remove
inefficiencies in the tuple routing code [1].

Instead of a single TupleTableSlot attached at partition_tuple_slot, we
allocate an array of TupleTableSlot pointers of the same length as the
number of partitions, as you mentioned upthread. We then call
MakeTupleTableSlot() only if a partition needs it and pass it the
partition's TupleDesc. Allocated slots are remembered in a list.
ExecDropSingleTupleTableSlot is then called on those allocated slots when
the plan ends. Note that the array of slots is not allocated if none of
the partitions affected by a given query (or copy) needed to convert
tuples.

Other issues that you mentioned, such as needless heap_tuple_deform/form
being invoked, seem less localized (to me) than this particular issue, so
I created a patch for just this, which is attached with this email. I'm
thinking about how to fix the other issues, but will need a bit more time.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/CAKJS1f_1RJyFquuCKRFHTdcXqoPX-PYqAd7nz=GVBwvGh4a6xA@mail.gmail.com
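In outline, the lazy allocation described above could look like this simplified sketch; partition_tuple_slots, partition_tuple_slots_alloced and partidx follow the posted patch's naming, while num_partitions, partrel and the surrounding routing code are assumed context:

    /* Allocate the pointer array lazily, on the first conversion. */
    if (proute->partition_tuple_slots == NULL)
        proute->partition_tuple_slots = (TupleTableSlot **)
            palloc0(proute->num_partitions * sizeof(TupleTableSlot *));

    /* Make this partition's dedicated slot on first use only. */
    if (proute->partition_tuple_slots[partidx] == NULL)
    {
        proute->partition_tuple_slots[partidx] =
            MakeTupleTableSlot(RelationGetDescr(partrel));
        /* remember it so ExecCleanupTupleRouting() can drop it */
        proute->partition_tuple_slots_alloced =
            lappend(proute->partition_tuple_slots_alloced,
                    proute->partition_tuple_slots[partidx]);
    }
    slot = proute->partition_tuple_slots[partidx];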
On 2018/06/29 15:23, Amit Langote wrote:
> Instead of a single TupleTableSlot attached at partition_tuple_slot, we
> allocate an array of TupleTableSlot pointers of the same length as the
> number of partitions, as you mentioned upthread. We then call
> MakeTupleTableSlot() only if a partition needs it and pass it the
> partition's TupleDesc. Allocated slots are remembered in a list.
> ExecDropSingleTupleTableSlot is then called on those allocated slots when
> the plan ends. Note that the array of slots is not allocated if none of
> the partitions affected by a given query (or copy) needed to convert
> tuples.

Forgot to show the effect the patch has (on my workstation, so the
numbers are a bit noisy).

create table p (a int, b int, c int) partition by range (a);
create table p1 (b int, a int, c int);
alter table p attach partition p1 for values from (1) to (maxvalue);

Note that every row will end up in p1 and will require conversion.

-- 2 million records
copy (select i, i+1, i+2 from generate_series(1, 2000000) i) to '/tmp/data.csv' csv;

Un-patched:

truncate p;
copy p from '/tmp/data.csv' csv;
COPY 2000000
Time: 8521.308 ms (00:08.521)

truncate p;
copy p from '/tmp/data.csv' csv;
COPY 2000000
Time: 8160.741 ms (00:08.161)

truncate p;
copy p from '/tmp/data.csv' csv;
COPY 2000000
Time: 8389.925 ms (00:08.390)

Patched:

truncate p;
copy p from '/tmp/data.csv' csv;
COPY 2000000
Time: 7716.568 ms (00:07.717)

truncate p;
copy p from '/tmp/data.csv' csv;
COPY 2000000
Time: 7569.224 ms (00:07.569)

truncate p;
copy p from '/tmp/data.csv' csv;
COPY 2000000
Time: 7572.085 ms (00:07.572)

So, there is at least some speedup.

Thanks,
Amit
On 29 June 2018 at 11:53, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> What I'm thinking of doing is something that's inspired by one of the
> things that David Rowley proposes in his patch for PG 12 to remove
> inefficiencies in the tuple routing code [1].
>
> Instead of a single TupleTableSlot attached at partition_tuple_slot, we
> allocate an array of TupleTableSlot pointers of the same length as the
> number of partitions, as you mentioned upthread. We then call
> MakeTupleTableSlot() only if a partition needs it and pass it the
> partition's TupleDesc. Allocated slots are remembered in a list.
> ExecDropSingleTupleTableSlot is then called on those allocated slots when
> the plan ends. Note that the array of slots is not allocated if none of
> the partitions affected by a given query (or copy) needed to convert
> tuples.

Thanks for the patch. I did some review of the patch (needs a rebase).
Below are my comments.

@@ -1936,12 +1936,11 @@ ExecPartitionCheckEmitError(ResultRelInfo *resultRelInfo,
+        /* Input slot might be of a partition, which has a fixed tupdesc. */
+        slot = MakeTupleTableSlot(tupdesc);
         if (map != NULL)
-        {
             tuple = do_convert_tuple(tuple, map);
-            ExecSetSlotDescriptor(slot, tupdesc);
-            ExecStoreTuple(tuple, slot, InvalidBuffer, false);
-        }
+        ExecStoreTuple(tuple, slot, InvalidBuffer, false);

Both MakeTupleTableSlot() and ExecStoreTuple() can be inside the (map
!= NULL) if condition. This also applies to similar changes in
ExecConstraints() and ExecWithCheckOptions().

+ * Initialize an empty slot that will be used to manipulate tuples of any
+ * this partition's rowtype.

of any this => of this

+ * Initialized the array where these slots are stored, if not already

Initialized => Initialize

+        proute->partition_tuple_slots_alloced =
+            lappend(proute->partition_tuple_slots_alloced,
+                    proute->partition_tuple_slots[partidx]);

Instead of doing the above, I think we can collect those slots in
estate->es_tupleTable using ExecInitExtraTupleSlot() so that they
don't have to be explicitly dropped in ExecCleanupTupleRouting(). And
also then we won't require the new field
proute->partition_tuple_slots_alloced.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
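For reference, the suggested alternative amounts to something like the following sketch (estate, partrel and the surrounding code are assumed); slots created with ExecInitExtraTupleSlot() are registered in estate->es_tupleTable, which the executor releases wholesale at plan shutdown:

    /*
     * Sketch: let the executor own the slot via es_tupleTable, so
     * ExecCleanupTupleRouting() no longer needs to drop it explicitly.
     */
    proute->partition_tuple_slots[partidx] =
        ExecInitExtraTupleSlot(estate, RelationGetDescr(partrel));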
On 29 June 2018 at 11:53, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> Other issues that you mentioned, such as needless heap_tuple_deform/form
> being invoked, seem less localized (to me) than this particular issue, so
> I created a patch for just this, which is attached with this email. I'm
> thinking about how to fix the other issues, but will need a bit more time.

I have started a new thread on these tuple forming/deforming issues [1],
and posted there a patch that is to be applied over your
Allocate-dedicated-slots-of-partition-tuple.patch (which I rebased and
posted in that thread):

[1]
https://www.postgresql.org/message-id/CAJ3gD9fR0wRNeAE8VqffNTyONS_UfFPRpqxhnD9Q42vZB%2BJvpg%40mail.gmail.com

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Thanks for the review.

On 2018/08/17 15:00, Amit Khandekar wrote:
> Thanks for the patch. I did some review of the patch (needs a rebase).
> Below are my comments.
>
> @@ -1936,12 +1936,11 @@ ExecPartitionCheckEmitError(ResultRelInfo *resultRelInfo,
> +        /* Input slot might be of a partition, which has a fixed tupdesc. */
> +        slot = MakeTupleTableSlot(tupdesc);
>          if (map != NULL)
> -        {
>              tuple = do_convert_tuple(tuple, map);
> -            ExecSetSlotDescriptor(slot, tupdesc);
> -            ExecStoreTuple(tuple, slot, InvalidBuffer, false);
> -        }
> +        ExecStoreTuple(tuple, slot, InvalidBuffer, false);
>
> Both MakeTupleTableSlot() and ExecStoreTuple() can be inside the (map
> != NULL) if condition. This also applies to similar changes in
> ExecConstraints() and ExecWithCheckOptions().

Ah, okay. I guess that means we'll allocate a new slot here only if we
had to switch to a partition-specific slot in the first place.

> + * Initialize an empty slot that will be used to manipulate tuples of any
> + * this partition's rowtype.
>
> of any this => of this
>
> + * Initialized the array where these slots are stored, if not already
>
> Initialized => Initialize

Fixed.

> +        proute->partition_tuple_slots_alloced =
> +            lappend(proute->partition_tuple_slots_alloced,
> +                    proute->partition_tuple_slots[partidx]);
>
> Instead of doing the above, I think we can collect those slots in
> estate->es_tupleTable using ExecInitExtraTupleSlot() so that they
> don't have to be explicitly dropped in ExecCleanupTupleRouting(). And
> also then we won't require the new field
> proute->partition_tuple_slots_alloced.

Although I was slightly uncomfortable with the idea at first, thinking
that it's not good for tuple-routing-specific resources to be released by
generic executor code, it doesn't seem too bad to do it the way you
suggest.

Attached updated patch.

By the way, I think it might be a good idea to try to merge this patch
with the effort in the following thread.

* Reduce partition tuple routing overheads *
https://commitfest.postgresql.org/19/1690/

Thanks,
Amit
On 20 August 2018 at 14:45, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> Thanks for the review.
>
> On 2018/08/17 15:00, Amit Khandekar wrote:
>> Thanks for the patch. I did some review of the patch (needs a rebase).
>> Below are my comments.
>>
>> @@ -1936,12 +1936,11 @@ ExecPartitionCheckEmitError(ResultRelInfo *resultRelInfo,
>> +        /* Input slot might be of a partition, which has a fixed tupdesc. */
>> +        slot = MakeTupleTableSlot(tupdesc);
>>          if (map != NULL)
>> -        {
>>              tuple = do_convert_tuple(tuple, map);
>> -            ExecSetSlotDescriptor(slot, tupdesc);
>> -            ExecStoreTuple(tuple, slot, InvalidBuffer, false);
>> -        }
>> +        ExecStoreTuple(tuple, slot, InvalidBuffer, false);
>>
>> Both MakeTupleTableSlot() and ExecStoreTuple() can be inside the (map
>> != NULL) if condition. This also applies to similar changes in
>> ExecConstraints() and ExecWithCheckOptions().
>
> Ah, okay. I guess that means we'll allocate a new slot here only if we
> had to switch to a partition-specific slot in the first place.
>
>> + * Initialize an empty slot that will be used to manipulate tuples of any
>> + * this partition's rowtype.
>>
>> of any this => of this
>>
>> + * Initialized the array where these slots are stored, if not already
>>
>> Initialized => Initialize
>
> Fixed.
>
>> +        proute->partition_tuple_slots_alloced =
>> +            lappend(proute->partition_tuple_slots_alloced,
>> +                    proute->partition_tuple_slots[partidx]);
>>
>> Instead of doing the above, I think we can collect those slots in
>> estate->es_tupleTable using ExecInitExtraTupleSlot() so that they
>> don't have to be explicitly dropped in ExecCleanupTupleRouting(). And
>> also then we won't require the new field
>> proute->partition_tuple_slots_alloced.
>
> Although I was slightly uncomfortable with the idea at first, thinking
> that it's not good for tuple-routing-specific resources to be released by
> generic executor code, it doesn't seem too bad to do it the way you
> suggest.
>
> Attached updated patch.

Thanks for the changes. The patch looks good to me.

> By the way, I think it might be a good idea to try to merge this patch
> with the effort in the following thread.
>
> * Reduce partition tuple routing overheads *
> https://commitfest.postgresql.org/19/1690/

Yes. I guess whichever commits later needs to be rebased on the earlier
one.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company
Hi,

On 2018-08-20 16:40:17 +0530, Amit Khandekar wrote:
> On 20 August 2018 at 14:45, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> > By the way, I think it might be a good idea to try to merge this patch
> > with the effort in the following thread.
> >
> > * Reduce partition tuple routing overheads *
> > https://commitfest.postgresql.org/19/1690/
>
> Yes. I guess whichever commits later needs to be rebased on the earlier
> one.

I think I'd rather keep this patch separate; it's pretty simple, so we
should be able to commit it fairly soon.

Greetings,

Andres Freund
On 20 August 2018 at 23:21, Andres Freund <andres@anarazel.de> wrote:
> On 2018-08-20 16:40:17 +0530, Amit Khandekar wrote:
>> On 20 August 2018 at 14:45, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
>> > By the way, I think it might be a good idea to try to merge this patch
>> > with the effort in the following thread.
>> >
>> > * Reduce partition tuple routing overheads *
>> > https://commitfest.postgresql.org/19/1690/
>>
>> Yes. I guess whichever commits later needs to be rebased on the earlier
>> one.
>
> I think I'd rather keep this patch separate; it's pretty simple, so we
> should be able to commit it fairly soon.

+1. I think that patch is already big enough. I'm perfectly fine with
rebasing that if this one gets committed first.

--
David Rowley http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services