Thread: Using regexp_replace to remove small words
Hi,
I'm trying to find a regular expression that removes all small (length < N) words from a string. But, until now I've not been successful.
For example:
If I pass 'George W Bush' as parameter, I want regexp_replace to return 'George Bush'.
Other examples are:
select regexp_replace( 'Tommy Lee Jones', 'REGEX', ' ', 'g' ) => 'Tommy Jones'
select regexp_replace( 'Dr Tommy Lee Jones', 'REGEX', ' ', 'g' ) => 'Tommy Jones'
select regexp_replace( 'Tommy Lee Jones Jr', 'REGEX', ' ', 'g' ) => 'Tommy Jones'
Thanks in advance,
HENRIQUE TRINDADE
ANALISTA DE SISTEMA II
( 55 (31) 3025-3550
Attachment
On fre, 2010-12-10 at 10:47 -0200, Henrique de Lima Trindade wrote: > I'm trying to find a regular expression that removes all small (length < N) > words from a string. But, until now I've not been successful. Here is a start: select regexp_replace('Tommy Lee Jones', $$\y\w{2,3}\y$$, ' ', 'g' ); If you want to normalize the spaces after the removal and handle beginning and end of the word, you will need to expand this to cover those cases, but the example should contain the key ingredients.
Hi Peter,
your REGEX is exactly what we were trying to find.
Thank you very much!
Rodrigo.
--
your REGEX is exactly what we were trying to find.
Thank you very much!
Rodrigo.
--
![]() | Rodrigo Mendonça Diretor de Tecnologia Tel: 31 3025 3550 / Cel: 31 8489 3119 rodrigo@vivver.com.br / diretoria@vivver.com.br |
On Fri, Dec 10, 2010 at 5:59 PM, Peter Eisentraut <peter_e@gmx.net> wrote: > select regexp_replace('Tommy Lee Jones', $$\y\w{2,3}\y$$, ' ', 'g' ); > speaking of regular expressions... the other day I was trying to find where the 'flags' option to regexp_replace() is described, but I cannot find it in the 9.0 manual in any obvious place. it is not described in the string functions section, nor in the regexp section.
Hi Peter, Your example works perfectly. But, I need Your help with on another situation. We're trying to create a plpgsql function with the expression. But, I'm getting a syntax error: ----------------- create or replace function sp_remove_small_words( ptext text ) returns text immutable as $$ begin return regexp_replace( ptext, $$\y\w{1,3}\y$$, '', 'g' ); end; $$ language plpgsql ; ----------------- ERRO: erro de sintaxe em ou próximo a "\" LINE 6: return regexp_replace( ptext, $$\y\w{1,3}\y$$, '', 'g' ); ^ ********** Error ********** ERRO: erro de sintaxe em ou próximo a "\" SQL state: 42601 Character: 138 Thanks again! -----Mensagem original----- De: Peter Eisentraut [mailto:peter_e@gmx.net] Enviada em: sexta-feira, 10 de dezembro de 2010 20:59 Para: Henrique de Lima Trindade Cc: pgsql-general@postgresql.org Assunto: Re: [GENERAL] Using regexp_replace to remove small words On fre, 2010-12-10 at 10:47 -0200, Henrique de Lima Trindade wrote: > I'm trying to find a regular expression that removes all small (length < N) > words from a string. But, until now I've not been successful. Here is a start: select regexp_replace('Tommy Lee Jones', $$\y\w{2,3}\y$$, ' ', 'g' ); If you want to normalize the spaces after the removal and handle beginning and end of the word, you will need to expand this to cover those cases, but the example should contain the key ingredients.
On Mon, Dec 13, 2010 at 7:24 AM, Henrique de Lima Trindade <henrique@vivver.com.br> wrote: > Your example works perfectly. But, I need Your help with on another situation. > We're trying to create a plpgsql function with the expression. But, I'm getting a syntax error: You're using $$ quoting for the function, and $$ quoting for the string in the regexp. Think of it like you're trying to embed a ' in a string delimited by ' also.
Vick Khera <vivek@khera.org> writes: > speaking of regular expressions... the other day I was trying to find > where the 'flags' option to regexp_replace() is described, but I > cannot find it in the 9.0 manual in any obvious place. it is not > described in the string functions section, nor in the regexp section. It's described in the same paragraph that describes regexp_replace, in http://www.postgresql.org/docs/9.0/static/functions-matching.html The regexp_replace function provides substitution of new text for substrings that match POSIX regular expression patterns. It has the syntax regexp_replace(source, pattern, replacement [, flags ]). The source string is returned unchanged if there is no match to the pattern. If there is a match, the source string is returned with the replacement string substituted for the matching substring. The replacement string can contain \n, where n is 1 through 9, to indicate that the source substring matching the n'th parenthesized subexpression of the pattern should be inserted, and it can contain \& to indicate that the substring matching the entire pattern should be inserted. Write \\ if you need to put a literal backslash in the replacement text. (As always, remember to double backslashes written in literal constant strings, assuming escape string syntax is used.) The flags parameter is an optional text string containing zero or more single-letter flags that change the function's behavior. Flag i specifies case-insensitive matching, while flag g specifies replacement of each matching substring rather than only the first one. Other supported flags are described in Table 9-19. regards, tom lane
On 2010-12-13 13.24, Henrique de Lima Trindade wrote: > Hi Peter, > > Your example works perfectly. But, I need Your help with on another situation. > We're trying to create a plpgsql function with the expression. But, I'm getting a syntax error: > > ----------------- > create or replace function sp_remove_small_words( ptext text ) returns text immutable as > $$ > begin > > return regexp_replace( ptext, $$\y\w{1,3}\y$$, '', 'g' ); > > end; > $$ language plpgsql > ; > ----------------- > ERRO: erro de sintaxe em ou próximo a "\" > LINE 6: return regexp_replace( ptext, $$\y\w{1,3}\y$$, '', 'g' ); You're ending the function declaration with the $$. Using $fun$ as For example, the following works: create or replace function sp_remove_small_words( ptext text ) returns text immutable as $fun$ begin return regexp_replace( ptext, $$\y\w{1,3}\y$$, '', 'g' ); end; $fun$ language plpgsql; For details on dollar quoting see: http://www.postgresql.org/docs/current/interactive/sql-syntax-lexical.html#SQL-SYNTAX-DOLLAR-QUOTING Regards, roppert > ^ > > ********** Error ********** > > ERRO: erro de sintaxe em ou próximo a "\" > SQL state: 42601 > Character: 138 > > Thanks again! > > > -----Mensagem original----- > De: Peter Eisentraut [mailto:peter_e@gmx.net] > Enviada em: sexta-feira, 10 de dezembro de 2010 20:59 > Para: Henrique de Lima Trindade > Cc: pgsql-general@postgresql.org > Assunto: Re: [GENERAL] Using regexp_replace to remove small words > > On fre, 2010-12-10 at 10:47 -0200, Henrique de Lima Trindade wrote: >> I'm trying to find a regular expression that removes all small (length< N) >> words from a string. But, until now I've not been successful. > > Here is a start: > > select regexp_replace('Tommy Lee Jones', $$\y\w{2,3}\y$$, ' ', 'g' ); > > If you want to normalize the spaces after the removal and handle > beginning and end of the word, you will need to expand this to cover > those cases, but the example should contain the key ingredients. > > > -- Regards, Robert "roppert" Gravsjö
On Mon, Dec 13, 2010 at 10:00 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Flag i > specifies case-insensitive matching, while flag g specifies > replacement of each matching substring rather than only the first > one. Other supported flags are described in Table 9-19. Thanks. Quite well hidden in plain sight. :( As a user, I was looking for a table similar to 9-19, since I discounted it because it described the embedded flags. Based on the above text though it seems they are also valid for the flags parameter.