Re: COPY FROM performance improvements - Mailing list pgsql-patches
From | Tom Lane |
---|---|
Subject | Re: COPY FROM performance improvements |
Date | |
Msg-id | 26467.1123362292@sss.pgh.pa.us Whole thread Raw |
In response to | Re: COPY FROM performance improvements ("Alon Goldshuv" <agoldshuv@greenplum.com>) |
Responses |
Re: COPY FROM performance improvements
Re: COPY FROM performance improvements Re: COPY FROM performance improvements |
List | pgsql-patches |
"Alon Goldshuv" <agoldshuv@greenplum.com> writes: > New patch attached. It includes very minor changes. These are changes that > were committed to CVS 3 weeks ago (copy.c 1.247) which I missed in the > previous patch. I've applied this with (rather extensive) revisions. I didn't like what you had done with the control structure --- loading the input buffer only at the outermost loop level was a bad design choice IMHO. You had sprinkled the code with an unreasonable number of special cases in order to try to cope with the effects of that mistake, but there were lots of problems still left. Some of the bugs I noticed: * Broke old-protocol COPY, since that has no provision for stopping at the EOF marker except by parsing the data carefully to start with. The backend would just hang up unless the total data size chanced to be a multiple of 64K. * Subtle change in interpretation of \. EOF marker (the existing code will recognize it even when not at start of line). * Seems to have thrown away detection of newline format discrepancies. * Fails for zero-column tables. * Broke display of column values during error context callback (would always show the first column contents no matter which one is being complained of). * DetectLineEnd mistakenly assumes CR mode if very last character of first bufferload is CR; need to reserve judgment until next char is available. * DetectLineEnd fails to account for backslashed control characters, so it will e.g. accept \ followed by \n as determining the newline style. * Fails to apply encoding conversion if first line exceeds copy buf size, because when DetectLineEnd fails the quick-exit path doesn't do it. * There seem to be several bugs associated with the fact that input_buf[] always has 64K of data in it even when EOF has been reached on the input. One example: echo -n 123 >zzz1 psql> create temp table t1(f1 text); psql> copy t1 from '/home/tgl/zzz1'; psql> select * from t1; hmm ... where'd that 64K of whitespace come from? I rewrote the patch in a way that retained most of the speedups without changing the basic control structure (except for replacing multiple CopyReadAttribute calls with one CopyReadAttributes call per line). I had some difficulty in generating test cases that weren't largely I/O-bound, but AFAICT the patch as applied is about the same speed as what you submitted. regards, tom lane
pgsql-patches by date: