Thread: COPY and file_fdw with fixed column widths

COPY and file_fdw with fixed column widths

From

Bruce Momjian

Date:

28 April 2015, 19:35:43

I know COPY doesn't support importing files with fixed column widths,
i.e. we can't say field1 is the first 30 characters, and field2 is the
rest of the line.  We need a unique delimiter at column 31.  (Commercial
Ingres does support this ability.)

I know we tell most people to use sed, Perl, or an ETL tool to convert
files into a format COPY understands, and I think that is a reasonable
answer.  However, the file_fdw also reads our COPY format, and in that
case, the data file might be updated regularly and running an ETL
process on it every time it is read is inconvenient.

I asking because I am playing around with file_fdw and I have some log
files with a Unix 'date' prefix, a colon, then some text that might also
contain a colon.  It would be ideal if I could read in first 28
characters into field1, then the rest of the line into field2.  The only
idea I have is to create a single-column foreign table that reads the
entire line as one field, then a view on top of that the parses the line
into fields, but that seems kind of complex.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +

Re: COPY and file_fdw with fixed column widths

From

Tom Lane

Date:

28 April 2015, 19:46:34

Bruce Momjian <bruce@momjian.us> writes:
> I know COPY doesn't support importing files with fixed column widths,
> i.e. we can't say field1 is the first 30 characters, and field2 is the
> rest of the line.  We need a unique delimiter at column 31.  (Commercial
> Ingres does support this ability.)

> I know we tell most people to use sed, Perl, or an ETL tool to convert
> files into a format COPY understands, and I think that is a reasonable
> answer.  However, the file_fdw also reads our COPY format, and in that
> case, the data file might be updated regularly and running an ETL
> process on it every time it is read is inconvenient.

COPY is, and has always been intended to be, as fast as possible; loading
format transformation abilities onto it seems like a fundamental mistake.
Therefore, if you wish file_fdw were more flexible, I think the answer is
to create a variant of file_fdw that doesn't use COPY but some other
mechanism.
        regards, tom lane

Re: COPY and file_fdw with fixed column widths

From

Bruce Momjian

Date:

28 April 2015, 19:50:07

On Tue, Apr 28, 2015 at 12:46:22PM -0700, Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > I know COPY doesn't support importing files with fixed column widths,
> > i.e. we can't say field1 is the first 30 characters, and field2 is the
> > rest of the line.  We need a unique delimiter at column 31.  (Commercial
> > Ingres does support this ability.)
> 
> > I know we tell most people to use sed, Perl, or an ETL tool to convert
> > files into a format COPY understands, and I think that is a reasonable
> > answer.  However, the file_fdw also reads our COPY format, and in that
> > case, the data file might be updated regularly and running an ETL
> > process on it every time it is read is inconvenient.
> 
> COPY is, and has always been intended to be, as fast as possible; loading
> format transformation abilities onto it seems like a fundamental mistake.
> Therefore, if you wish file_fdw were more flexible, I think the answer is
> to create a variant of file_fdw that doesn't use COPY but some other
> mechanism.

Yes, I think this is a missing feature.  While we can tell people to do
ETL for loading, we are really not doing that for file_fdw.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +

Re: COPY and file_fdw with fixed column widths

From

Andrew Dunstan

Date:

28 April 2015, 20:39:09

On 04/28/2015 03:50 PM, Bruce Momjian wrote:
> On Tue, Apr 28, 2015 at 12:46:22PM -0700, Tom Lane wrote:
>> Bruce Momjian <bruce@momjian.us> writes:
>>> I know COPY doesn't support importing files with fixed column widths,
>>> i.e. we can't say field1 is the first 30 characters, and field2 is the
>>> rest of the line.  We need a unique delimiter at column 31.  (Commercial
>>> Ingres does support this ability.)
>>> I know we tell most people to use sed, Perl, or an ETL tool to convert
>>> files into a format COPY understands, and I think that is a reasonable
>>> answer.  However, the file_fdw also reads our COPY format, and in that
>>> case, the data file might be updated regularly and running an ETL
>>> process on it every time it is read is inconvenient.
>> COPY is, and has always been intended to be, as fast as possible; loading
>> format transformation abilities onto it seems like a fundamental mistake.
>> Therefore, if you wish file_fdw were more flexible, I think the answer is
>> to create a variant of file_fdw that doesn't use COPY but some other
>> mechanism.
> Yes, I think this is a missing feature.  While we can tell people to do
> ETL for loading, we are really not doing that for file_fdw.
>

This needs some love, but it's probably along the lines you need.

<https://github.com/adunstan/file_fixed_length_record_fdw>

cheers

andrew