Thread: COPY and file_fdw with fixed column widths
I know COPY doesn't support importing files with fixed column widths, i.e. we can't say field1 is the first 30 characters, and field2 is the rest of the line. We need a unique delimiter at column 31. (Commercial Ingres does support this ability.) I know we tell most people to use sed, Perl, or an ETL tool to convert files into a format COPY understands, and I think that is a reasonable answer. However, the file_fdw also reads our COPY format, and in that case, the data file might be updated regularly and running an ETL process on it every time it is read is inconvenient. I asking because I am playing around with file_fdw and I have some log files with a Unix 'date' prefix, a colon, then some text that might also contain a colon. It would be ideal if I could read in first 28 characters into field1, then the rest of the line into field2. The only idea I have is to create a single-column foreign table that reads the entire line as one field, then a view on top of that the parses the line into fields, but that seems kind of complex. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Bruce Momjian <bruce@momjian.us> writes: > I know COPY doesn't support importing files with fixed column widths, > i.e. we can't say field1 is the first 30 characters, and field2 is the > rest of the line. We need a unique delimiter at column 31. (Commercial > Ingres does support this ability.) > I know we tell most people to use sed, Perl, or an ETL tool to convert > files into a format COPY understands, and I think that is a reasonable > answer. However, the file_fdw also reads our COPY format, and in that > case, the data file might be updated regularly and running an ETL > process on it every time it is read is inconvenient. COPY is, and has always been intended to be, as fast as possible; loading format transformation abilities onto it seems like a fundamental mistake. Therefore, if you wish file_fdw were more flexible, I think the answer is to create a variant of file_fdw that doesn't use COPY but some other mechanism. regards, tom lane
On Tue, Apr 28, 2015 at 12:46:22PM -0700, Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > I know COPY doesn't support importing files with fixed column widths, > > i.e. we can't say field1 is the first 30 characters, and field2 is the > > rest of the line. We need a unique delimiter at column 31. (Commercial > > Ingres does support this ability.) > > > I know we tell most people to use sed, Perl, or an ETL tool to convert > > files into a format COPY understands, and I think that is a reasonable > > answer. However, the file_fdw also reads our COPY format, and in that > > case, the data file might be updated regularly and running an ETL > > process on it every time it is read is inconvenient. > > COPY is, and has always been intended to be, as fast as possible; loading > format transformation abilities onto it seems like a fundamental mistake. > Therefore, if you wish file_fdw were more flexible, I think the answer is > to create a variant of file_fdw that doesn't use COPY but some other > mechanism. Yes, I think this is a missing feature. While we can tell people to do ETL for loading, we are really not doing that for file_fdw. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
On 04/28/2015 03:50 PM, Bruce Momjian wrote: > On Tue, Apr 28, 2015 at 12:46:22PM -0700, Tom Lane wrote: >> Bruce Momjian <bruce@momjian.us> writes: >>> I know COPY doesn't support importing files with fixed column widths, >>> i.e. we can't say field1 is the first 30 characters, and field2 is the >>> rest of the line. We need a unique delimiter at column 31. (Commercial >>> Ingres does support this ability.) >>> I know we tell most people to use sed, Perl, or an ETL tool to convert >>> files into a format COPY understands, and I think that is a reasonable >>> answer. However, the file_fdw also reads our COPY format, and in that >>> case, the data file might be updated regularly and running an ETL >>> process on it every time it is read is inconvenient. >> COPY is, and has always been intended to be, as fast as possible; loading >> format transformation abilities onto it seems like a fundamental mistake. >> Therefore, if you wish file_fdw were more flexible, I think the answer is >> to create a variant of file_fdw that doesn't use COPY but some other >> mechanism. > Yes, I think this is a missing feature. While we can tell people to do > ETL for loading, we are really not doing that for file_fdw. > This needs some love, but it's probably along the lines you need. <https://github.com/adunstan/file_fixed_length_record_fdw> cheers andrew