Re: Let file_fdw access COPY FROM PROGRAM - Mailing list pgsql-hackers
From | Amit Langote |
---|---|
Subject | Re: Let file_fdw access COPY FROM PROGRAM |
Date | |
Msg-id | 6fc03b16-291a-61e2-cf1d-13aa360b4555@lab.ntt.co.jp Whole thread Raw |
In response to | Re: Let file_fdw access COPY FROM PROGRAM (Corey Huinker <corey.huinker@gmail.com>) |
Responses |
Re: Let file_fdw access COPY FROM PROGRAM
|
List | pgsql-hackers |
On 2016/09/07 3:12, Corey Huinker wrote: > On Fri, Sep 2, 2016 at 5:07 AM, Amit Langote wrote: >> I am not familiar with win32 stuff too, so I don't have much to say about >> that. Maybe someone else can chime in to help with that. > > The regressions basically *can't* test this because we'd need a shell > command we know works on any architecture. OK. >> @@ -97,8 +99,9 @@ typedef struct FileFdwPlanState >> typedef struct FileFdwExecutionState >> { >> char *filename; /* file to read */ >> - List *options; /* merged COPY options, excluding >> filename */ >> - CopyState cstate; /* state of reading file */ >> + char *program; /* program to read output from */ >> + List *options; /* merged COPY options, excluding >> filename and program */ >> + CopyState cstate; /* state of reading file or program */ >> >> Have you considered a Boolean flag is_program instead of char **program >> similar to how copy.c does it? (See a related comment further below) > > Considered it yes, but decided against it when I started to write my > version. When Adam delivered his version of the patch, I noticed he had > made the same design decision. Only one of them will be initialized, and > the boolean will byte align to 4 bytes, so it's the same storage allocated > either way. > > Either we're fine with two variables, or we think file_name is poorly > named. I have only a slight preference for the two variables, and defer to > the group for a preference. OK, let's defer it to the committer. >> - * This is because the filename is one of those options, and we don't >> want >> - * non-superusers to be able to determine which file gets read. >> + * This is because the filename or program string are two of those >> + * options, and we don't want non-superusers to be able to determine >> which >> + * file gets read or what command is run. >> >> I'm no native English speaker, but I wonder if this should read: filename >> or program string *is* one of those options ... OR filename *and* program >> are two of those options ... ? Also, "are two of those options" sounds a >> bit odd to me because I don't see that used as often as "one of those >> whatever". I guess that's quite a bit of nitpicking about a C comment, ;) > > Given that C comments constitute a large portion of our documentation, I > fully support making them as clear as possible. > > I don't remember if this is Adam's comment or mine. Adam - if you're out > there, chime in. > > The original paragraph was: > > Changing table-level options requires superuser privileges, for security > reasons: only a superuser should be able to determine which file is read. > In principle non-superusers could be allowed to change the other options, > but that's not supported at present. > > > How does this paragraph sound?: > > Changing table-level options requires superuser privileges, for security > reasons: only a superuser should be able to determine which file is read, > or which program is run. In principle non-superusers could be allowed to > change the other options, but that's not supported at present. Hmm, just a little modification would make it better for me: ... for security reasons. For example, only superuser should be able to determine which file read or which program is run. Although that could be just me. >> @@ -257,9 +264,26 @@ file_fdw_validator(PG_FUNCTION_ARGS) >> ereport(ERROR, >> (errcode(ERRCODE_SYNTAX_ERROR), >> errmsg("conflicting or redundant options"))); >> + if (program) >> + ereport(ERROR, >> + (errcode(ERRCODE_SYNTAX_ERROR), >> + errmsg("conflicting or redundant options"))); >> filename = defGetString(def); >> } >> >> + else if (strcmp(def->defname, "program") == 0) >> + { >> + if (filename) >> + ereport(ERROR, >> + (errcode(ERRCODE_SYNTAX_ERROR), >> + errmsg("conflicting or redundant options"))); >> + if (program) >> + ereport(ERROR, >> + (errcode(ERRCODE_SYNTAX_ERROR), >> + errmsg("conflicting or redundant options"))); >> + program = defGetString(def); >> + } >> >> Why does it check for filename here? Also the other way around above. > > We don't want them to specify both program and filename, nor do we allow 2 > programs, or 2 filenames. Ah, I forgot about the mutually exclusive options part. >> @@ -632,11 +671,18 @@ fileBeginForeignScan(ForeignScanState *node, int >> eflags) >> * Create CopyState from FDW options. We always acquire all columns, >> so >> * as to match the expected ScanTupleSlot signature. >> */ >> - cstate = BeginCopyFrom(node->ss.ss_currentRelation, >> - filename, >> - false, >> - NIL, >> - options); >> + if(filename) >> + cstate = BeginCopyFrom(node->ss.ss_currentRelation, >> + filename, >> + false, >> + NIL, >> + options); >> + else >> + cstate = BeginCopyFrom(node->ss.ss_currentRelation, >> + program, >> + true, >> + NIL, >> + options) >> >> Like I mentioned above, if there was a is_program Boolean flag instead of >> separate filename and program, this would be just: >> >> + cstate = BeginCopyFrom(node->ss.ss_currentRelation, >> + filename, >> + is_program, >> + NIL, >> + options); >> >> That would get rid of if-else blocks here and a couple of other places. > > It would, pushing the complexity out to the user. Which could be fine, but > if we do that then "filename" is a misnomer. This is an internal state so I'm not sure how this would be pushing complexity out to the user. Am I perhaps misreading what you said? What a user sees is that there are two separate foreign table options - filename and program. That we handle them internally using a string to identify either file or program and a Boolean flag to show which of the two is just an internal implementation detail. COPY does it that way internally and I just saw that psql's \copy does it the same way too. In the latter's case, following is the options struct (src/bin/psql/copy.c): struct copy_options { char *before_tofrom; /* COPY string before TO/FROM */ char *after_tofrom; /* COPY string after TO/FROMfilename */ char *file; /* NULL = stdin/stdout */ bool program; /* is 'file' a programto popen? */ bool psql_inout; /* true = use psql stdin/stdout */ bool from; /* true= FROM, false = TO */ }; But as you said above, this could be deferred to the committer. >> diff --git a/contrib/file_fdw/input/file_fdw.source >> b/contrib/file_fdw/input/file_fdw.source >> index 685561f..eae34a4 100644 >> --- a/contrib/file_fdw/input/file_fdw.source >> +++ b/contrib/file_fdw/input/file_fdw.source >> >> You forgot to include expected output diffs. > > Having regression tests for this is extremely problematic, because the > program invoked would need to be an invokable command on any architecture > supported by postgres. I'm pretty sure no such command exists. Craig and Michael elsewhere suggested something about adding TAP tests. If that helps the situation, maybe you could. I will mark the CF entry status to "Waiting on author" till you post a new patch. Thanks, Amit
pgsql-hackers by date: