Re: Review: Patch FORCE_NULL option for copy COPY in CSV mode - Mailing list pgsql-hackers
From | Ian Lawrence Barwick |
---|---|
Subject | Re: Review: Patch FORCE_NULL option for copy COPY in CSV mode |
Date | |
Msg-id | CAB8KJ=jVfVi3U8DTn_K05spir6jJoSGFnWVVvvrzL7tK9dOUeQ@mail.gmail.com Whole thread Raw |
In response to | Review: Patch FORCE_NULL option for copy COPY in CSV mode (Payal Singh <payal@omniti.com>) |
Responses |
Re: Review: Patch FORCE_NULL option for copy COPY in CSV
mode
|
List | pgsql-hackers |
2013-11-01 Payal Singh <payal@omniti.com>: > The post was made before I subscribed to the mailing list, so posting my > review separately. The link to the original patch mail is > http://www.postgresql.org/message-id/CAB8KJ=jS-Um4TGwenS5wLUfJK6K4rNOm_V6GRUj+tcKekL2=GQ@mail.gmail.com > >> >> Hi, >> >> This patch implements the following TODO item: >> >> Allow COPY in CSV mode to control whether a quoted zero-length >> string is treated as NULL >> >> Currently this is always treated as a zero-length string, >> which generates an error when loading into an integer column >> >> Re: [PATCHES] allow CSV quote in NULL >> http://archives.postgresql.org/pgsql-hackers/2007-07/msg00905.php >> >> >> http://wiki.postgresql.org/wiki/Todo#COPY >> >> >> I had a very definite use-case for this functionality recently while >> importing >> CSV files generated by Oracle, and was somewhat frustrated by the >> existence >> of a FORCE_NOT_NULL option for specific columns, but not one for >> FORCE_NULL. >> >> I'll add this to the November commitfest. >> >> >> Regards >> >> Ian Barwick > > > ================== > Contents & Purpose > ================== > > This patch introduces a new 'FORCE_NULL' option which has the opposite > functionality of the already present 'FORCE_NOT_NULL' option for the COPY > command. Prior to this option there was no way to convert a zeroed-length > quoted value to a NULL value when COPY FROM is used to import data from CSV > formatted files. > > ================== > Submission Review > ================== > > The patch is in the correct context diff format. It includes changes to the > documentation as well as additional regression tests. The description has > been discussed and defined in the previous mails leading to this patch. > > ===================== > Functionality Review > ===================== > > CORRECTION NEEDED - Due to a minor error in code (details in 'Code Review' > section below), force_null option is not limited to COPY FROM, and works > even when COPY TO is used. This should instead give an error message. > > The updated documentation describes the added functionality clearly. > > All regression tests passed successfully. > > Code compilation after including patch was successful. No warnings either. > > Manually tested COPY FROM with FORCE_NULL for a number of scenarios, all > with expected outputs. No issues. > > Been testing the patch for a few days, no crashes or weird behavior > observed. > > ============================================= > Code Formatting Review (Needs Improvement) > ============================================= > > Looks good, the tabs between variable declaration and accompanying comment > can be improved. > > ================================= > Code Review (Needs Improvement) > ================================= > > 1. There is a " NOTE: force_not_null option are not applied to the returned > fields." before COPY FROM block. Similar note should be added for force_null > option too. > > 2. One of the conditions to check and give an error if force_null is true > and copy from is false is wrong, cstate->force_null should be checked > instead of cstate->force_notnull: > > /* Check force_notnull */ > if (!cstate->csv_mode && cstate->force_notnull != NIL) > ereport(ERROR, > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > errmsg("COPY force not null available only > in CSV mode"))); > if (cstate->force_notnull != NIL && !is_from) > ereport(ERROR, > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > errmsg("COPY force not null only available using > COPY FROM"))); > > /* Check force_null */ > if (!cstate->csv_mode && cstate->force_null != NIL) > ereport(ERROR, > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > errmsg("COPY force null available only in > CSV mode"))); > > ==> if (cstate->force_notnull != NIL && !is_from) > ereport(ERROR, > (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > errmsg("COPY force null only available using COPY > FROM"))); > > =============================== > Suggested Changes & Conclusion > =============================== > > The Above mentioned error condition should be corrected. Minor comments and > tab changes are upto the author. > > In all, suggested modifications aside, the patch works well and in my > opinion, would be a useful addition to the COPY command. Hi Payal Many thanks for the review, and my apologies for not getting back to you earlier. Updated version of the patch attached with suggested corrections. I'm not sure about the tabs in the variable declarations - the whole section seems to be all over the place (regardless of whether tabs are set to 4 or 8 spaces) and could do with tidying up, however I didn't want to expand the scope of the patch too much. Quick recap of the reasons behind this patch - we had a bunch of CSV files (and by "bunch" I mean totalling hundreds of millions of lines) exported from a data source which was unable to distinguish between an empty string and a null value. Too late we realized we had a ridiculous number of comma separated rows with some empty strings which should be kept as such, and some empty strings which should be imported as NULLs. "Wait", I said, for I remembered COPY FROM had some kind of option involving specifying the NULL status of certain columns. Alas it turned out to be the opposite of what we required, and we found another workaround, but I thought as it's likely we would face a similar situation in the future, it would be handy to have the option available. Example: Given a table like this (a very stripped-down version of the original use case): CREATE TABLE nulltest ( prefix TEXT NOT NULL DEFAULT '', altname TEXT DEFAULT NULL ); I want to be able to do this: postgres=# COPY nulltest FROM STDIN WITH (FORMAT CSV, FORCE_NULL (altname)); Enter data to be copied followed by a newline. End with a backslash and a period on a line by itself. >> "","" >> \. postgres=# \pset null NULL Null display (null) is "NULL". postgres=# SELECT * FROM nulltest ; prefix | altname --------+--------- | NULL (1 row) I don't have any strong feelings about the syntax - as is it's just the logical opposite of what's already available. Regards Ian Barwick
Attachment
pgsql-hackers by date: