Re: Regular Expression INTRO [long] [OT] - Mailing list pgsql-general
From | will trillich |
---|---|
Subject | Re: Regular Expression INTRO [long] [OT] |
Date | |
Msg-id | 20020410163854.A5944@serensoft.com Whole thread Raw |
In response to | Regular Expression INTRO [long] [OT] (will trillich <will@serensoft.com>) |
List | pgsql-general |
On Tue, Apr 09, 2002 at 06:05:43PM -0600, will trillich wrote: [snip] > With LIKE, your search values can use the percent (%) > to tell PostgreSQL that anything can occupy that spot > -- one character, twelve symbols or zero digits -- and > still satisfy the search. > > On Unix or Linux, this is basically the same as the > ASTERISK (*) at the command line, when dealing with > file names: > > # list all files whose names begin with '.bash' > ls .bash* > > # list all files containing 'out' anywhere in the name > ls *out* > > # list all file names ending with '.pl' > ls *.pl > > # list file starting with 'proj', ending with '.c' > ls proj*.c i'm torn between leaving in a full set of shell-glob samples to pair up with postgres-like samples, and using only one (as recommended by Holger Klawitter -- thanks!)... > With PostgreSQL using the LIKE operator, use the > percent, instead: > > -- list all customers within the 47610 postal code > SELECT * FROM cust WHERE zip LIKE '47610%'; > > -- display customers who have 'Corp' in their names > SELECT * FROM cust WHERE name LIKE '%Corp%'; > > -- show customers whose names end in 'LLC' > SELECT * FROM cust WHERE name LIKE '%LLC'; > > -- documents beginning with 'We', ending with 'rica' > SELECT * FROM doc WHERE contents LIKE 'We%rica'; > > Wherever the '%' appears (using the LIKE operator) > Postgres allows anything at all to match -- from a > lengthy string of text, to one single character, to > a zero-length string -- i.e. nothing at all. > > ...ILIKE 'A%Z' > -- matches 'AZ' and 'A to Z' and 'ABC2468XYZ' and 'A@$*Z' whoops -- this shoulda been LIKE, not ILIKE (at least not yet) [snip] > So what's with the BACKSLASH in front of the dot? Well, > just as LIKE has the UNDERSCORE (_) to denote "any single > character", REGULAR EXPRESSIONS use the DOT (.) for > that very same purpose. So we have to "escape" the > dot to alter its normal interpretation, using the. using the what? um, "...using the backslash." oops! [snip] > And for very common cases, there are handy abbreviations: > > 'x{1,1}' 'x' one (the default) > 'r{0,1}' 'r?' zero or one, i.e. 'x' is optional should be "'r' is optional". (thanks, Holger) > 'B{0,}' 'B*' zero or more > 'z{1,}' 'z+' one or more may as well flesh these out, to be parallel with the others: 'B{0,}' 'B*' zero or more B's 'z{1,}' 'z+' one or more z's [snip] > Note that we've prefixed the previous pattern with > > '[^0-9]' > > because, within the square brackets of a character class, > the CARAT (^) means "anything EXCEPT..." > > So now, 00000-0000 will NOT match; this is what we're > after. maybe i could be more clear, by saying: So now, our pattern will NOT match something like 00000-0000, and since we don't want it to, we're making progress. [snip] > And, just as in the LIKE operator, unless regular expressions > ARE anchored at the beginning of a field, you'll defeat any > index you have for that field. Indexes help alphabetize by > comparing the beginning of your fields to each other; unless > you're looking at the beginning of the field, your index can't > be used. some of you asked if regexes are actually able to use the index. well, here's the answer: if they're anchored at the front, YES! EXPLAIN repo=# explain repo-# select * from _table_1015197075 where str ~ '^f'; NOTICE: QUERY PLAN: Index Scan using _table_ix on _table_1015197075 (cost=0.00..2.01 rows=1 width=24) EXPLAIN repo=# explain repo-# select * from _table_1015197075 where str ~ 'f'; NOTICE: QUERY PLAN: Seq Scan on _table_1015197075 (cost=100000000.00..100000001.03 rows=1 width=24) to be honest, i "SET ENABLE_SEQSCAN = false" and created a quickie table with about fifty rows, with an index on the "str" column. with '^f' it DID use the index; without the carat, 'f', it absolutely COULD NOT use the index... imagine looking for words in the dictionary that /contain/ the letter 'f', right? (no mystery here -- but whether the '^f' COULD use the index was the question, of course -- and it did). cool. (i suspected, but wasn't certain.) [snip] > There are ways to anchor your searches to word boundaries, > as well -- not just beginning-of-field and end-of-field. See > your documentation for details. rewording is in order here -- how about: There are ways to anchor your searches to word boundaries, as well -- you're not restricted to testing only for beginning-of-field or end-of-field. (In fact, that's a more likely solution to the problem than what I devised here.) See your documentation for information on [[:<:]] and [[:>:]]. [snip] > As for [1] email stuff, it can be (zero or more of): > > '[\\-\\.]' > > hyphen or dot, followed by > > '[a-z0-9_]' > > alphanumerics (or underscore) > > '+' > > one or more times. how about a little added explanation for clarity? This means an address (before the @domain.com) can be alphanumerics only, or if it contains a dot or hyphen that more alphanumerics must follow the dot or hyphen. So 'me' and 'my.self' and 'albert.einstein-newton-john.jr' are all acceptable to this part of the pattern. > And as for [2] site stuff, it can be (one or more of): > > '[\\-\\.]' > > hyphen or dot, followed by > > '[a-z0-9_]' > > alphanumerics (or underscore) > > '+' > > one or more times. more clarity -- something like: So the site stuff means that the domain (@some.where) MUST contain at least one dot or hyphen, each of which must be followed by alphanumerics. These would NOT match: 'me..you' -- no alphanumerics after first dot 'oops-' -- no alphanumerics after hyphen '-eesh' -- need alphanumerics before hyphen To be strict, that last one WILL match the 'site stuff' fragment of the pattern, but it'll fail in the whole pattern, because after the '@' we require SOME alphanumerics. feel free to post comments/feedback to pgsql-user... -- "We will fight them on the beaches, we will fight them on the sons of beaches" -- Miguel Churchill, Winston's bastard Mexican brother. --lifted from http://www.astray.com/acmemail/stable/documentation.xml will@serensoft.com http://sourceforge.net/projects/newbiedoc -- we need your brain! http://www.dontUthink.com/ -- your brain needs us!
pgsql-general by date: