Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Make COPY format extendable: Extract COPY TO format implementations
Date
Msg-id CAD21AoBb3t7EcsjYT4w68p9OfMNwWTYsbSVaSRY6DRhi7sNRFg@mail.gmail.com
In response to Re: Make COPY format extendable: Extract COPY TO format implementations  (Sutou Kouhei <kou@clear-code.com>)
List pgsql-hackers
On Tue, Sep 9, 2025 at 7:41 PM Sutou Kouhei <kou@clear-code.com> wrote:
>
> Hi,
>
> In <CAD21AoCidyfKcpf9-f2Np8kWgkM09c4TjnS1h1hcO_-CCbjeqw@mail.gmail.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" on Tue, 9 Sep 2025 13:15:43 -0700,
>   Masahiko Sawada <sawada.mshk@gmail.com> wrote:
>
> >> I don't object to your approach but we need a good way to
> >> measure performance. If we use this approach, we can omit it
> >> for now and we can revisit your approach later without
> >> breaking compatibility. How about using this approach if we
> >> can't find a good way to measure performance?
> >
> > I think it would be better to hear more opinions about this idea and
> > then make a decision, rather than basing our decision on whether or
> > not we can measure its performance, so we can be more confident in the
> > idea we have chosen. While this idea has the above downside, it could
> > make sense because we always allocate the entire CopyFrom/ToStateData
> > even today in spite of some fields being not used at all in binary
> > format and it requires less implementation costs to hide the
> > for-core-only fields. On the other hand, another possible idea is that
> > we have three different structs for categories 1 (core-only), 2 (core
> > and extensions), and 3 (extension-only), and expose only 2 that has a
> > void pointer to 3. The core can allocate the memory for 1 that embeds
> > 2 at the beginning of the fields. While this design looks cleaner and
> > we can minimize overheads due to indirect references, it would require
> > more implementation costs. Whichever method we choose, I think we need
> > performance measurements in several scenarios to check if performance
> > regressions don't happen unexpectedly.
>
> OK. So the next step is collecting more opinions, right?
>
> Could you add key people in this area to Cc to hear their
> opinions? I'm not familiar with key people in the PostgreSQL
> community...

How about another idea: we move the format-specific data into a separate
struct that embeds CopyFrom/ToStateData as its first field, and have the
CopyFrom/ToStart callback return memory with the size of that struct?
This resolves the concern about adding an extra indirection layer, and
extensions don't need to allocate memory for unnecessary fields (those
used only by the built-in formats). While extensions can still access
the internal fields, I think we can live with that given that there are
similar precedents, such as the table AM's scan descriptors.
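
A minimal sketch of that layout, to make the proposal concrete. All names
here (the Demo* types and demo_* functions) are hypothetical stand-ins and
not PostgreSQL APIs; the real CopyToStateData and callback signatures
differ, and core would use palloc rather than calloc. The only point shown
is that a struct whose first member is the common state can be allocated
by the start callback and passed around as a pointer to the common state:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the common state shared by every format. */
typedef struct DemoCopyToStateData
{
    int  num_attrs;
    long rows_processed;
} DemoCopyToStateData;

/*
 * Hypothetical format-specific state. The common struct must be the first
 * member so that a pointer to the whole struct and a pointer to "base"
 * refer to the same address.
 */
typedef struct DemoCsvCopyToState
{
    DemoCopyToStateData base;
    char delimiter;
    char quote;
} DemoCsvCopyToState;

/* Fail at compile time if someone moves "base" away from offset 0. */
static_assert(offsetof(DemoCsvCopyToState, base) == 0,
              "common state must be the first member");

/*
 * Hypothetical start callback: allocates the larger, format-specific
 * struct and returns it as a pointer to the common part. Returns NULL
 * if the allocation fails.
 */
static DemoCopyToStateData *
demo_csv_copy_to_start(int num_attrs)
{
    DemoCsvCopyToState *state = calloc(1, sizeof(DemoCsvCopyToState));

    if (state == NULL)
        return NULL;
    state->base.num_attrs = num_attrs;
    state->delimiter = ',';
    state->quote = '"';
    return &state->base;
}

/*
 * Hypothetical per-row callback: casts the common pointer back to the
 * format-specific struct, with no extra pointer indirection. It bumps
 * the shared row counter and returns the delimiter it would use.
 */
static char
demo_csv_copy_to_one_row(DemoCopyToStateData *cstate)
{
    DemoCsvCopyToState *state = (DemoCsvCopyToState *) cstate;

    cstate->rows_processed++;
    return state->delimiter;
}
```

In this sketch the format-specific fields live in the same allocation as
the common ones, so a format that doesn't need them simply doesn't declare
them. The cast in the per-row callback is valid only because "base" is the
first member.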

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


