Re: Another regexp performance improvement: skip useless paren-captures - Mailing list pgsql-hackers

From Mark Dilger
Subject Re: Another regexp performance improvement: skip useless paren-captures
Date
Msg-id 80944B12-6B9A-443F-B4F8-95B04F85E28A@enterprisedb.com
In response to Re: Another regexp performance improvement: skip useless paren-captures  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Another regexp performance improvement: skip useless paren-captures
Re: Another regexp performance improvement: skip useless paren-captures
List pgsql-hackers

> On Aug 9, 2021, at 4:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> There is a potentially interesting definitional question:
> what exactly ought this regexp do?
>
>         ((.)){0}\2
>
> Because the capturing paren sets are zero-quantified, they will
> never be matched to any characters, so the backref can never
> have any defined referent.

Perl regular expressions are not POSIX, but if there is a principled reason POSIX should differ from Perl on this, we should be clear about what that reason is:

    #!/usr/bin/perl

    use strict;
    use warnings;

    our $match;
    if ('foo' =~ m/((.)(??{ die; })){0}(..)/)
    {
        print "captured 1 $1\n" if defined $1;
        print "captured 2 $2\n" if defined $2;
        print "captured 3 $3\n" if defined $3;
        print "captured 4 $4\n" if defined $4;
        print "match = $match\n" if defined $match;
    }

This will print "captured 3 fo", proving that although the regular expression is parsed with (..) bound to the third capture group, the first two capture groups never run. If you don't believe that, change the {0} to {1} and observe that the script dies.
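For anyone without Perl at hand, the same two points can be checked with Python's `re` module. It is used here only as a stand-in from the same backtracking tradition; it is neither Perl nor PostgreSQL's Spencer-derived engine, so this sketch says nothing about what our code does. It shows that zero-quantified groups still occupy their group numbers but never capture, and that a backreference to such a group simply fails to match rather than raising an error.

```python
import re

# Groups 1 and 2 sit under {0}: they keep their numbers but never run,
# so (..) is group 3 and is the only group that captures anything.
m = re.search(r"((.)){0}(..)", "foo")
print(m.groups())  # (None, None, 'fo')

# Backreferences to groups that never participated compile fine in
# Python's engine and just fail to match; no error is raised.
print(re.search(r"((.)){0}\2", "foo"))  # None
print(re.search(r"((.)){0}\1", "foo"))  # None
```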

> So I think throwing an
> error is an appropriate response.  The existing code will
> throw such an error for
>
>         ((.)){0}\1
>
> so I guess Spencer did think about this to some extent -- he
> just forgot about the possibility of nested parens.


Ugg. That means our code throws an error where Perl does not, which pretty well negates my point above. If we're already throwing an error for this type of thing, I agree we should be consistent about it. My personal preference would have been to do the same thing as Perl, but it seems that ship has already sailed.


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
