Thread: BUG #15046: non-greedy ignored

BUG #15046: non-greedy ignored

From
PG Bug reporting form
The following bug has been logged on the website:

Bug reference:      15046
Logged by:          Bob Gailer
Email address:      bgailer@gmail.com
PostgreSQL version: 10.1
Operating system:   Windows 10
Description:

I start psql and enter:

postgres=# select regexp_replace('a(d)s(e)f', '\(.*?\)', '', 'g');
 regexp_replace
----------------
 asf
(1 row)

Works as expected. Then I add |q to the pattern, and the .*? becomes
greedy!

postgres=# select regexp_replace('a(d)s(e)f', '\(.*?\)|q', '', 'g');
 regexp_replace
----------------
 af
(1 row)


Re: BUG #15046: non-greedy ignored

From
"David G. Johnston"
On Friday, February 2, 2018, PG Bug reporting form <noreply@postgresql.org> wrote:
> Works as expected. Then I add |q to the pattern, and the .*? becomes
> greedy!

This seems to be explained by the final greediness rule
(https://www.postgresql.org/docs/10/static/functions-matching.html#POSIX-MATCHING-RULES):

  • An RE consisting of two or more branches connected by the | operator is always greedy.
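As an illustration of what "always greedy" means for this report, here is a hypothetical Python emulation. The helper `leftmost_longest_replace` is made up for this sketch and is not PostgreSQL's engine. It assumes that a greedy RE picks the longest substring the pattern can match at the leftmost position where a match can start. Under that assumption the lazy `.*?` no longer limits the overall match, so `(d)s(e)` is removed as a single match:

```python
import re


def leftmost_longest_replace(pattern: str, s: str, repl: str = "") -> str:
    # Hypothetical helper for illustration; NOT PostgreSQL's regex engine.
    # Emulates a globally applied, fully greedy RE: at the leftmost position
    # where the pattern can match, take the LONGEST matching substring.
    # fullmatch() must consume the whole candidate span we pick, so a lazy
    # quantifier such as .*? cannot shorten the overall match here.
    rx = re.compile(pattern)
    out = []
    i = 0
    while i < len(s):
        # Try candidate ends from longest to shortest (empty matches skipped).
        end = next(
            (e for e in range(len(s), i, -1) if rx.fullmatch(s, i, e)),
            None,
        )
        if end is None:
            out.append(s[i])  # no match starts here; keep the character
            i += 1
        else:
            out.append(repl)  # replace the longest match
            i = end           # resume after it, like the 'g' flag
    return "".join(out)


print(leftmost_longest_replace(r"\(.*?\)|q", "a(d)s(e)f"))  # af
```

The sketch models only the greedy case. It does not apply the branch-greediness rules that make the pattern without `|q` non-greedy.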


    David J.

Re: BUG #15046: non-greedy ignored

From
Tom Lane
"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Friday, February 2, 2018, PG Bug reporting form <noreply@postgresql.org>
> wrote:
>> Works as expected. Then I add |q to the pattern, and the .*? becomes
>> greedy!

> This seems to be explained by the final greediness rule:
> https://www.postgresql.org/docs/10/static/functions-matching.html#POSIX-MATCHING-RULES
>    An RE consisting of two or more branches connected by the | operator is
>    always greedy.

Yeah.  That subsection also contains some useful advice about how to
control greediness decisions --- in this case, wrapping the whole
thing with (...){1,1}? might do what you want.
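Applied to the report, the wrapped pattern would be `(\(.*?\)|q){1,1}?` (not run here). The assumption is that this makes the RE as a whole non-greedy, because it is then a single branch whose first quantified atom is non-greedy. If so, each match should be the shortest one. Below is a hypothetical Python emulation of that shortest-match rule. The helper `leftmost_shortest_replace` is made up and is not PostgreSQL's engine:

```python
import re


def leftmost_shortest_replace(pattern: str, s: str, repl: str = "") -> str:
    # Hypothetical helper for illustration; NOT PostgreSQL's regex engine.
    # Emulates a globally applied RE whose overall match is non-greedy:
    # at the leftmost position where the pattern can match, take the
    # SHORTEST non-empty matching substring.
    rx = re.compile(pattern)
    out = []
    i = 0
    while i < len(s):
        # Try candidate ends from shortest to longest.
        end = next(
            (e for e in range(i + 1, len(s) + 1) if rx.fullmatch(s, i, e)),
            None,
        )
        if end is None:
            out.append(s[i])  # no match starts here; keep the character
            i += 1
        else:
            out.append(repl)  # replace the shortest match
            i = end           # resume after it, like the 'g' flag
    return "".join(out)


print(leftmost_shortest_replace(r"\(.*?\)|q", "a(d)s(e)f"))  # asf
```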

The short answer, perhaps, is that non-greedy patterns are not
standardized by POSIX and you shouldn't expect that all regex
engines do them the same way.  Ours is definitely different
from Perl's, for example.
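For contrast, Python's `re` module is a Perl-style backtracking engine. It tries alternatives left to right and keeps `.*?` lazy even inside an alternation, so adding `|q` does not change the result there. A small sketch:

```python
import re

s = "a(d)s(e)f"

# Perl-style backtracking semantics: the first alternative that matches wins,
# and the lazy .*? stops at the first ")" it can reach.
without_alt = re.sub(r"\(.*?\)", "", s)
with_alt = re.sub(r"\(.*?\)|q", "", s)

print(without_alt)  # asf
print(with_alt)     # asf
```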

            regards, tom lane


Re: BUG #15046: non-greedy ignored

From
Bob Gailer

Thanks! Rtfp, eh?


On Feb 2, 2018 8:48 PM, "Tom Lane" <tgl@sss.pgh.pa.us> wrote: