Re: Make COPY format extendable: Extract COPY TO format implementations - Mailing list pgsql-hackers

From Sutou Kouhei
Subject Re: Make COPY format extendable: Extract COPY TO format implementations
Date
Msg-id 20250911.144627.1367548062212866518.kou@clear-code.com
Whole thread Raw
In response to Re: Make COPY format extendable: Extract COPY TO format implementations  (Masahiko Sawada <sawada.mshk@gmail.com>)
Responses Re: Make COPY format extendable: Extract COPY TO format implementations
List pgsql-hackers
Hi,

In <CAD21AoBb3t7EcsjYT4w68p9OfMNwWTYsbSVaSRY6DRhi7sNRFg@mail.gmail.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on Wed, 10 Sep 2025 00:36:38 -0700,
  Masahiko Sawada <sawada.mshk@gmail.com> wrote:

> How about another idea like we move format-specific data to another
> struct that embeds CopyFrom/ToStateData at the first field and have
> CopyFrom/ToStart callback return memory with the size of that
> struct?It resolves the concerns about adding an extra indirection
> layer and extensions doesn't need to allocate memory for unnecessary
> fields (used only for built-in formats). While extensions can access
> the internal fields I think we can live with that given that there are
> some similar precedents such as table AM's scan descriptions.

The another idea looks like the following, right?

struct CopyToStateBuiltInData
{
  struct CopyToStateData parent;

  /* Members for built-in formats */
  ...;
}

typedef CopyToState *(*CopyToStart) (void);

CopyToState
BeginCopyTo(..., CopyToStart copy_to_start)
{
  ...;

  /* Allocate workspace and zero all fields */
  cstate = copy_to_start();
  ...;
}

This idea will almost work. But we can't know which
CopyToStart should be used before we parse "FORMAT" option
of COPY.

If we can iterate options twice in BeginCopy{To,From}(), we
can know it. For example:

BeginCopyTo(...)
{
  ...;

  CopyToStart copy_to_start = NULL;
  foreach(option, options)
  {
    DefElem  *defel = lfirst_node(DefElem, option);

    if (strcmp(defel->defname, "format") == 0)
    {
       char *fmt = defGetString(defel);
       if (strcmp(fmt, "text") == 0 ||
           strcmp(fmt, "csv") == 0 ||
           strcmp(fmt, "binary") == 0) {
         /* Use the builtin cstate */
       } else {
         copy_to_start = /* Detect CopyToStart for custom format */;
       }
    }
  }
  if (copy_to_start)
    cstate = copy_to_start();
  else 
    cstate = (CopyToStateData *) palloc0(sizeof(CopyToStateBuiltInData));
  ...;
}

(It may be better that we add
Copy{To,From}Routine::Copy{To,From}Allocate() instead of
CopyToStart callback.)

I think that this is acceptable because this must be a light
process. This must not have negative performance impact.


Thanks,
-- 
kou



pgsql-hackers by date:

Previous
From: Corey Huinker
Date:
Subject: Re: someone else to do the list of acknowledgments
Next
From: Tom Lane
Date:
Subject: Re: someone else to do the list of acknowledgments