Thread: Desperately Seeking Regular Expression

From

Thomas Good

Date:

27 April 1999, 09:37:33

Hi all -

I am porting a PROGRESS database to PostgreSQL.

I've had success previously doing a port - but from FoxPro which
allows one to dump data delimited by tabs.  Unfortunately, PROGRESS
dumps fields delimited by whitespace rather than tabs and I can find no
documentation on how to alter this behaviour.

I read the recent post wherein someone used awk to change whitespace
to tabs:

cat $input | awk '{ print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t" \
$6"\t"$7"\t" }' > $input.out

I am using this with good effect.  However, I run into trouble as
inside my dump file(s) there are doublequoted character strings.
awk is changing the whitespace delimited words inside the char strs
into tab delimited words inside strings.  Ouch.

What follows is my inept effort to get sed on my side as I try to sort
this out:
sed -e 's/"    *"/ /g' $input.out > $input.sql

This is a miserable failure as it simply converts all the tabs back
to whitespace.  I've tried escaping the double quotes in the regex
but then sed changes nothing.

Can someone put me out of my misery?  Anyone have suggestions on a
regular expression that will:

Convert tabs to whitespace *inside* of double quoted strings *only* ???
I don't care if the regex is for sed/awk/perl, whatever, I need to get
the job done!
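
One possible way to do exactly what is asked here, sketched in Python rather than sed/awk/perl: match each double-quoted run and replace tabs inside that match only. The name `untab_quoted` is made up for illustration, and the sketch assumes quotes are balanced on each line and never escaped inside a string.

```python
import re

# Matches one double-quoted string with no embedded quote characters.
QUOTED = re.compile(r'"[^"]*"')

def untab_quoted(line):
    """Replace tabs with spaces inside double-quoted strings only."""
    # The callback sees one quoted string at a time, so tabs between
    # fields (outside the quotes) are never touched.
    return QUOTED.sub(lambda m: m.group(0).replace("\t", " "), line)
```

Applied line by line to the awk output, this should leave the field-separating tabs alone and only repair the strings.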

TIA!

Stuck in Staten Island,
Tom
----
         North Richmond Community Mental Health Center
                              ---
         Thomas Good   tomg@ { admin | q8 } .nrnet.org
         Phone:        718-354-5528
         Fax:          718-354-5056
         Powered By:   Slackware 3.6  PostgreSQL 6.3.2
                              ---
        /* Die Wahrheit Ist Irgendwo Da Draussen... */

Re: [GENERAL] Desperately Seeking Regular Expression

From

Adriaan Joubert

Date:

27 April 1999, 10:04:17

I solved something like this recently in perl. It's not terribly
efficient, but it is simple. I'm doing this from memory, so it may need
some debugging. Use something along the lines of

#!/usr/local/bin/perl

while (<>) {
  chomp;
  # Splitting on a captured quote keeps the quotes as list items, so
  # the list alternates: unquoted text, '"', quoted text, '"', ...
  # (a line that starts with a quote gets an empty first item)
  my @a = split /(")/;
  my @fields;
  while (@a) {
    # Unquoted text: every whitespace-separated word is a field
    push @fields, split(' ', shift @a);
    # if there is more then we have another double quoted string, so
    # join the next 3 items back together as a single field
    push @fields, join('', splice(@a, 0, 3)) if @a;
  }
  # Tabs between the fields, newline at the end
  print join("\t", @fields), "\n";
}
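
The same idea can also be written with one pattern that matches either a whole double-quoted string or a run of non-whitespace, which avoids tracking positions in the split list. A sketch in Python, assuming quotes are balanced, never escaped inside a string, and that quoted strings are separated from their neighbours by whitespace; `to_tabs` is a hypothetical name.

```python
import re

# Either a complete double-quoted string or a run of non-whitespace.
FIELD = re.compile(r'"[^"]*"|\S+')

def to_tabs(line):
    """Return the fields of one dump line joined by tabs."""
    # findall scans left to right, so a quoted string is taken whole
    # before its inner words can be seen as separate fields.
    return "\t".join(FIELD.findall(line))
```

The quotes are kept in the output, so they still have to be dealt with before loading.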

Adriaan

Re: [GENERAL] Desperately Seeking Regular Expression

From

Herouth Maoz

Date:

29 April 1999, 07:34:56

At 16:35 +0300 on 27/04/1999, Thomas Good wrote:

>
> I've had success previously doing a port - but from FoxPro which
> allows one to dump data delimited by tabs.  Unfortunately, PROGRESS
> dumps fields delimited by whitespace rather than tabs and I can find no
> documentation on how to alter this behaviour.
>
> I read the recent post wherein someone used awk to change whitespace
> to tabs:
>
> cat $input | awk '{ print $1"\t"$2"\t"$3"\t"$4"\t"$5"\t" \
> $6"\t"$7"\t" }' > $input.out
>
> I am using this with good effect.  However, I run into trouble as
> inside my dump file(s) there are doublequoted character strings.
> awk is changing the whitespace delimited words inside the char strs
> into tab delimited words inside strings.  Ouch.

I have a feeling that you are overlooking a few other points. For example, if
you want to use the resulting text as input for COPY, strings should not be
enclosed in quotes. And any tabs, newlines and backslashes within the data
should each be preceded by a "\".
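
To make the COPY point concrete, here is a sketch in Python of preparing fields that have already been split: strip the enclosing double quotes, then put a backslash in front of every backslash, tab and newline, as described above. The names `copy_escape` and `copy_line` are made up for illustration, and NULL handling is left out because the thread does not say how PROGRESS dumps empty values.

```python
import re

# A backslash, tab or newline inside the data.
SPECIAL = re.compile(r'([\\\t\n])')

def copy_escape(field):
    """Prepare one dumped field for tab-delimited COPY input."""
    # Drop enclosing double quotes; COPY would load them as data.
    if len(field) >= 2 and field[0] == '"' and field[-1] == '"':
        field = field[1:-1]
    # Precede each special character with a backslash, in one pass.
    return SPECIAL.sub(r'\\\1', field)

def copy_line(fields):
    """Join already-split fields into one COPY input line."""
    return "\t".join(copy_escape(f) for f in fields) + "\n"
```

Newer PostgreSQL documentation recommends writing the sequences `\t` and `\n` rather than a backslash followed by the literal character, so check the COPY page for the version being loaded into.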

And what do you mean when you say the output is delimited by whitespace?
That there is a single space between the fields? Because I think the awk
above would collapse null fields in such a case, shifting the later fields
left. Or did you mean that it outputs a fixed-width file? That is, that the
first field runs from column 1 to column 20, and if it's shorter, it is
padded with spaces up to column 20?

That would require a different treatment.
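
If the dump does turn out to be fixed width, the fields have to be cut by column position instead of being split on whitespace. A sketch in Python; the column layout is invented for illustration (only the 20-column first field echoes the example above), since the real PROGRESS layout is not given in this thread.

```python
# Hypothetical layout: (start, end) offsets, 0-based, end exclusive.
COLUMNS = [(0, 20), (20, 30), (30, 38)]

def split_fixed(line, columns=COLUMNS):
    """Cut one fixed-width line into fields, trimming the padding."""
    line = line.rstrip("\n")
    # Slicing by position keeps embedded spaces inside a field and
    # returns '' for a field that is entirely padding.
    return [line[start:end].rstrip() for start, end in columns]
```

Unlike whitespace splitting, an empty field survives here as an empty string, so the remaining fields stay in their own columns.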

If you give a more detailed description, you may get a better solution.

Herouth

--
Herouth Maoz, Internet developer.
Open University of Israel - Telem project
http://telem.openu.ac.il/~herutma