Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers

From Sutou Kouhei
Subject Re: Make COPY format extendable: Extract COPY TO format implementations
Date
Msg-id 20250909.115014.143895394711134404.kou@clear-code.com
Whole thread Raw
In response to Re: Make COPY format extendable: Extract COPY TO format implementations  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Make COPY format extendable: Extract COPY TO format implementations
List pgsql-hackers
Hi,

In <CAD21AoCCjKA77xkUxx59qJ8an_G_58Mry_EtCEcFgd=g9N2xew@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Mon, 8 Sep 2025 14:08:16 -0700,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:

>> > The fields in 1 are mostly static fields, and the fields in 2 and 3
>> > are likely to be accessed in hot functions during COPY FROM. Would it
>> > be a good idea to restructure these fields so that we can hide the
>> > fields in 1 from callback functions and having the fields in 3 in a
>> > separate format-specific struct that can be accessed via an opaque
>> > pointer? But could the latter change potentially cause performance
>> > overheads?
>>
>> Yes. It changes memory layout (1 continuous memory chunk ->
>> 2 separated memory chunks) and introduces indirect member
>> accesses (x->y -> x->z->y).
> 
> I think fields accessed by the hot functions are limited. If we
> assemble fields required by hot functions into one struct and pass it
> to them can we deal with the latter? For example, we assemble the
> fields in 3 I mentioned above (i.e., built-in format specific fields)
> into say CopyFromStateBuiltin and pass it to CopyReadLine() function
> and the following functions, instead of CopyFromState. Since there are
> some places where we need to access to CopyFromState (e.g.,
> CopyGetData()), CopyFromStateBuiltin needs to have a pointer to
> CopyFromState as well.

It can change indirect member accesses (built-in format
specific members can be direct access but other members in
CopyFromState are indirect access) but it doesn't change 2
separated memory chunks.

If this approach has performance impact and it's caused by
indirect member accesses for built-in format specific
members, your suggestion will work. If performance impact is
caused by another reason, your suggestion may not work.

Anyway, we need to measure performance to proceed with this
approach. If we can confirm that this approach doesn't have
any performance impact, we can use the original your idea.

Do you have any idea how to measure performance of this
approach?

We did it when we introduce Copy{To,From}Routine. But it was
difficult to evaluate the results:

* I don't have machines for stable benchmark results
  * We may not be able to use them for the final decision
* Most results showed performance improvement but
  there was a result showed mysterious result[1]

[1] https://www.postgresql.org/message-id/flat/CAEG8a3LUBcvjwqgt6AijJmg67YN_b_NZ4Kzoxc_dH4rpAq0pKg%40mail.gmail.com

>> Drawback: This approach always allocates
>> CopyToStateInternalData not CopyToStateData. So we need to
>> allocate needless memories for extensions. But this will
>> prevent performance regression of built-in formats. Is it
>> acceptable?
> 
> While this approach could make sense to avoid potential performance
> overheads for built-in format, I find it's somewhat odd that
> extensions cannot allocate memory for its working state having
> CopyToStateData as the base type.

Is it important? We'll provide a opaque member for
extensions. Extensions should use the opaque member instead
of extending Copy{From,To}StateData.


I don't object your approach but we need a good way to
measure performance. If we use this approach, we can omit it
for now and we can revisit your approach later without
breaking compatibility. How about using this approach if we
can't find a good way to measure performance?


Thanks,
-- 
kou



pgsql-hackers by date:

Previous
From: Sami Imseih
Date:
Subject: Re: PgStat_HashKey padding issue when passed by reference
Next
From: Ashutosh Bapat
Date:
Subject: Re: Query Performance Degradation Due to Partition Scan Order – PostgreSQL v17.6