In Perl:
$foo =~ m/^(F|f)oo\s*(B|b)ar$/
$foo =~ s/foo bar/bas bat/
$foo =~ tr/aeiou/AEIOU/
(tr takes plain character lists, not bracketed classes: brackets and commas would be translated literally.)
Does not match:
$bar !~ m/foobar/i
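A minimal Perl sketch of the match, substitution, transliteration, and negated-match operators above. The sample strings and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample input
my $foo = "Foo Bar";

# m//: true if the whole string is "foo bar", with either case on the F and the B
my $matched = ($foo =~ m/^(F|f)oo\s*(B|b)ar$/) ? 1 : 0;

# s///: replace the first "foo bar" with "bas bat", modifying $text in place
my $text = "foo bar baz";
$text =~ s/foo bar/bas bat/;

# tr///: translate each lowercase vowel to the uppercase vowel at the same position
(my $upper = "education") =~ tr/aeiou/AEIOU/;

# !~ : true when the pattern does NOT match
my $bar = "FooBaz";
my $no_match = ($bar !~ m/foobar/i) ? 1 : 0;

print "$matched | $text | $upper | $no_match\n";
```

This prints `1 | bas bat baz | EdUcAtIOn | 1`.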
*      Zero or more
+      One or more
?      Zero or one
{7}    Exactly seven
{3,}   Three or more
{2,5}  Two, three, four, or five
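A small Perl sketch exercising each quantifier against a made-up sample, plus two counterexamples. Every pattern is anchored with ^ and $ so the quantifier has to account for the whole string.

```perl
use strict;
use warnings;

# Each pair is [pattern, sample that should match]; samples are hypothetical
my @cases = (
    [ qr/^ab*c$/,    'ac'      ],   # *     zero b's is fine
    [ qr/^ab+c$/,    'abbbc'   ],   # +     at least one b
    [ qr/^colou?r$/, 'color'   ],   # ?     the u is optional
    [ qr/^\d{7}$/,   '5551234' ],   # {7}   exactly seven digits
    [ qr/^x{3,}$/,   'xxxxx'   ],   # {3,}  three or more
    [ qr/^y{2,5}$/,  'yyy'     ],   # {2,5} two to five
);
# grep in scalar context counts how many samples matched
my $all_match = grep { $_->[1] =~ $_->[0] } @cases;

# Counterexamples: too few and too many repetitions
my $plus_empty = ('ac'     =~ /^ab+c$/)   ? 1 : 0;
my $too_many   = ('yyyyyy' =~ /^y{2,5}$/) ? 1 : 0;

print "$all_match of ", scalar(@cases), " matched; $plus_empty $too_many\n";
```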
Putting a question mark after a quantifier (like x*? or x+?) makes it non-greedy: it matches as little as possible instead of as much as possible.
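A minimal Perl comparison of greedy and non-greedy matching on a made-up HTML-like string (used only as sample text, not as a recommendation to parse HTML with regexes).

```perl
use strict;
use warnings;

my $html = '<b>bold</b> and <i>italic</i>';

# Greedy: .* grabs as much as possible, so the match runs to the LAST ">"
my ($greedy) = $html =~ /(<.*>)/;

# Non-greedy: .*? stops at the FIRST ">" that lets the match succeed
my ($lazy) = $html =~ /(<.*?>)/;

print "greedy: $greedy\nlazy:   $lazy\n";
```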
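^    Match beginning of line or string
\A   Start of string
$    Match end of line or string
\Z   End of string (or just before a final newline; \z is the absolute end)
\b   Match word boundary
\B   Non-word boundary
\<   Start of word (GNU grep and Vim; Perl uses \b instead)
\>   End of word (GNU grep and Vim; Perl uses \b instead)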
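A short Perl sketch of the anchors that Perl supports (\< and \> are left out because Perl does not treat them as anchors). The sample strings are made up; the /m modifier used here is described with the other modifiers further down.

```perl
use strict;
use warnings;

my $s = "first line\nsecond line\n";

# Without /m, ^ matches only at the start of the string
my $caret_plain = () = $s =~ /^\w+/g;
# With /m, ^ also matches after each embedded newline
my $caret_multi = () = $s =~ /^\w+/mg;

# \Z matches at the very end OR just before a final newline; \z is the absolute end
my $big_z   = ($s =~ /line\Z/) ? 1 : 0;
my $small_z = ($s =~ /line\z/) ? 1 : 0;

# \b marks a word boundary: "cat" as a whole word does not occur inside "concatenate"
my $whole = ("concatenate" =~ /\bcat\b/) ? 1 : 0;

print "$caret_plain $caret_multi $big_z $small_z $whole\n";
```

This prints `1 2 1 0 0`.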
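\w      Word character (alphanumeric plus "_")
\W      Non-word character
\s      Whitespace
\S      Non-whitespace
\d      Digit
\D      Non-digit
\cX     Control character (e.g. \cC is Ctrl-C)
\xhh    Character with hex code hh (e.g. \x41 is "A")
\o{nn}  Character with octal code nn (also written \0nn)
\n      Newline
\r      Carriage return
\t      Tab
\v      Vertical whitespace in Perl (vertical tab in many other flavors)
\f      Form feed
\a      Alarm (bell, beep)
\e      Escape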
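A Perl sketch using several of these escapes on one made-up line of text; the variable names are hypothetical.

```perl
use strict;
use warnings;

my $line = "Order 66:\tship_it now";

my @digits = $line =~ /(\d)/g;        # each single digit
my @words  = $line =~ /(\w+)/g;       # runs of letters, digits, and "_"
my $tabs   = () = $line =~ /\t/g;     # count the tab characters
my $hex_a  = ("A" =~ /^\x41$/) ? 1 : 0;   # \x41 is the character with hex code 41
(my $squeezed = $line) =~ s/\s+/ /g;  # collapse each whitespace run to one space

print "@digits | @words | $tabs | $hex_a | $squeezed\n";
```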
\    Escape the next character (e.g. \^ for a literal caret rather than start of line)
\Q   Begin a sequence of literals (metacharacters lose their special meaning)
\E   End the literal sequence
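A minimal Perl sketch of backslash escaping and \Q...\E, using a made-up price string. It shows why an unescaped "." can over-match and how \Q also protects an interpolated variable.

```perl
use strict;
use warnings;

my $text = 'Sale: $4.99 today, was $4x99';

# \^ is a literal caret, not the start-of-string anchor
my $caret = ('2^10' =~ /2\^10/) ? 1 : 0;

# An unescaped "." matches any character, so "4.99" also matches "4x99"
my @loose = $text =~ /(4.99)/g;

# \Q...\E turns metacharacters into literals, including in an interpolated variable
my $price  = '$4.99';
my @strict = $text =~ /(\Q$price\E)/g;

print "$caret | @loose | @strict\n";
```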
Modifiers go after the closing delimiter. Example: an "i" at the end of the expression makes it case-insensitive: $bar =~ m/foobar/i
g   Global (match all occurrences, not just the first)
m   Multi-line (^ and $ match at the start and end of every line, not just at the very beginning and end of the string)
s   Single line (. matches anything, including newlines)
x   Extended: ignore whitespace in the pattern unless it's backslashed or inside brackets, and allow # comments (lets you write the regex itself in a more readable format, with line breaks)
a   ASCII-safe matching against Unicode (\d, \s, \w, and POSIX classes match only ASCII characters)
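A Perl sketch showing the effect of each modifier on made-up samples. The /a modifier assumes Perl 5.14 or later; U+0661 (ARABIC-INDIC DIGIT ONE) is used as an example of a non-ASCII digit.

```perl
use strict;
use warnings;

my $s = "cat\nbat\nrat";

# /g: return every match, not just the first
my @all = $s =~ /(\w)at/g;                        # ('c', 'b', 'r')

# /m: $ matches before each embedded newline, so all three line endings count
my $ends = () = $s =~ /at$/mg;                    # 3 (only 1 without /m)

# /s: "." also matches "\n", so one match can span lines
my $span = ($s =~ /cat.bat/s) ? 1 : 0;            # 1 (0 without /s)

# /x: whitespace and comments inside the pattern are ignored
my ($year, $month) = "2024-05" =~ /
    (\d{4})   # year
    -
    (\d{2})   # month
/x;

# /a: \d is restricted to ASCII 0-9
my $unicode    = ("\x{0661}" =~ /\d/)  ? 1 : 0;   # 1: a Unicode decimal digit
my $ascii_only = ("\x{0661}" =~ /\d/a) ? 1 : 0;   # 0 under /a

print "@all $ends $span $year-$month $unicode $ascii_only\n";
```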
If you wanted to ignore case for only part of a regular expression:
/(?i)foobar(?-i)BaT/
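A quick Perl check of the inline flags above against three made-up strings: only the part after (?-i) stays case-sensitive.

```perl
use strict;
use warnings;

# (?i) turns case-insensitivity on; (?-i) turns it back off for the rest of the pattern
my $re = qr/(?i)foobar(?-i)BaT/;

my @results = map { $_ =~ $re ? 1 : 0 } ('FOOBARBaT', 'foobarBaT', 'foobarbat');

print "@results\n";
```

This prints `1 1 0`: "bat" fails because "BaT" must match exactly.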
if($string =~ m/John (Smith|Smyth|Psmith)/) {print "I found John!\n"}
.          Any character except \n
(foo|bar)  foo or bar
(?:foo)    Non-capturing group
[xyz]      x or y or z (single character)
[^xyz]     NOT x or y or z
[a-f]      Single character in range a through f
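A Perl sketch of alternation, grouping, and bracket classes, reusing the John Smith example with a made-up input string.

```perl
use strict;
use warnings;

my $string = "Dear John Smyth,";

# (a|b|c) alternation; the alternative that matched is captured in $1
my $surname = ($string =~ m/John (Smith|Smyth|Psmith)/) ? $1 : '';

# (?:...) groups without capturing, so the first capture is the number
my ($num) = "Mr 42" =~ /(?:Mr|Ms) (\d+)/;

my @vowels = "regex" =~ /([aeiou])/g;    # one character from the set
my @others = "regex" =~ /([^aeiou])/g;   # one character NOT in the set
my @range  = "face off" =~ /([a-f])/g;   # one character in the range a-f
my $dot_nl = ("\n" =~ /./) ? 1 : 0;      # "." does not match a newline (without /s)

print "$surname $num | @vowels | @others | @range | $dot_nl\n";
```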
Example: If we want to match "All the king's horses" but not the escaped "All the king''s horses" (doubled single quote), we combine negated character classes with a negative lookahead to match one single quote but not two, and anchor the pattern so it must cover the whole string:
^[^']*'(?!')[^']*$
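A Perl check of the quote pattern on both sample strings, plus what happens without the anchors.

```perl
use strict;
use warnings;

# Anchored at both ends: the whole string must contain exactly one lone quote
my $re = qr/^[^']*'(?!')[^']*$/;

my $single  = ("All the king's horses"  =~ $re) ? 1 : 0;
my $doubled = ("All the king''s horses" =~ $re) ? 1 : 0;

# Without anchors the engine can start at the second quote and match "'s horses"
my ($unanchored) = "All the king''s horses" =~ /([^']*'(?!')[^']*)/;

print "$single $doubled [$unanchored]\n";
```

This prints `1 0 ['s horses]`, which is why the anchors matter when the goal is to reject the whole string.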
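Grouping with parens is also the way to capture matches (groups $1, $2, etc.). Captured text can be reused in the replacement part of a substitution, like: s/(November) 3rd/$1 4th/g. Inside the pattern itself, \1, \2, etc. are backreferences to earlier groups: /(\w+) \1/ matches a repeated word.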
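A Perl sketch of both uses of captured groups, with made-up sample sentences.

```perl
use strict;
use warnings;

my $date = "Meet November 3rd, not November 3rd again";

# $1 in the replacement inserts the text captured by group 1; /g replaces every occurrence
(my $moved = $date) =~ s/(November) 3rd/$1 4th/g;

# \1 inside the pattern must match the same text that group 1 matched
my ($dup) = "this is is a test" =~ /\b(\w+) \1\b/;

print "$moved\n$dup\n";
```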
$1, $2, $3  First, second, third captured groups
$+          Highest-numbered capture group that matched
$&          The entire match
$`          Text before the match
$'          Text after the match
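A Perl sketch reading the match variables after a successful match; the sample strings are made up, and the $+ case mirrors the alternation idiom where only one group can match.

```perl
use strict;
use warnings;

my ($first, $second, $whole, $pre, $post);
if ("name: Ada Lovelace!" =~ /(\w+) (\w+)/) {
    # Copy right away: the next successful match replaces these values
    ($first, $second)     = ($1, $2);
    ($whole, $pre, $post) = ($&, $`, $');
}

# $+ holds whichever group actually matched (here group 2)
my $rev;
if ("Revision: 7" =~ /Version: (.*)|Revision: (.*)/) {
    $rev = $+;
}

print "$first|$second|$whole|$pre|$post|$rev\n";
```

This prints `Ada|Lovelace|Ada Lovelace|name: |!|7`.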
(?=...)          Positive lookahead
(?!...)          Negative lookahead
(?<=...)         Positive lookbehind
(?<!...)         Negative lookbehind
(?>...)          Once-only (atomic) sub-expression
(?(cond)yes)     Conditional if-then
(?(cond)yes|no)  Conditional if-then-else
(?#...)          Comment
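A Perl sketch of the once-only, conditional, and comment constructs (lookarounds are covered by the examples that follow). The optional-parentheses number pattern is a made-up illustration of a conditional that tests whether group 1 matched.

```perl
use strict;
use warnings;

# (?>...) is atomic: once a* has taken every "a" it will not give one back,
# so the final "a" in the pattern can never match
my $normal = ("aaa" =~ /a*a/)     ? 1 : 0;
my $atomic = ("aaa" =~ /(?>a*)a/) ? 1 : 0;

# (?(1)...): if group 1 matched an opening "(", require a closing ")"
my $cond = qr/^(\()?\d+(?(1)\))$/;
my @ok = map { $_ =~ $cond ? 1 : 0 } ('(42)', '42', '(42');

# (?#...) is an inline comment and matches nothing
my $commented = ("abc" =~ /a(?#this is ignored)bc/) ? 1 : 0;

print "$normal $atomic | @ok | $commented\n";
```

This prints `1 0 | 1 1 0 | 1`.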
A regex with positive lookahead matches something followed by something else: foo(?=t).* matches "football" but not "foobar".
A regex with negative lookahead matches something not followed by something else: foo(?!t).* matches "foobar" but not "football".
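A Perl check of the two lookahead examples, plus a reminder that the lookahead itself consumes no characters.

```perl
use strict;
use warnings;

my @words = qw(football foobar);

# foo(?=t): "foo" only when the next character is "t"
my @pos = grep { /foo(?=t).*/ } @words;
# foo(?!t): "foo" only when the next character is NOT "t"
my @neg = grep { /foo(?!t).*/ } @words;

# The lookahead checks "t" without consuming it, so the capture is just "foo"
my ($just_foo) = "football" =~ /(foo(?=t))/;

print "@pos | @neg | $just_foo\n";
```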
Lookbehind works the same way, with (?<=foot)ball ("ball" preceded by "foot") and (?<!wrecking)ball ("ball" not preceded by "wrecking").
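A Perl check of the two lookbehind examples on made-up words. The last line shows that lookbehind only inspects the characters directly before "ball".

```perl
use strict;
use warnings;

my @words = qw(football wreckingball baseball);

# (?<=foot)ball: "ball" immediately preceded by "foot"
my @after_foot = grep { /(?<=foot)ball/ } @words;
# (?<!wrecking)ball: "ball" NOT immediately preceded by "wrecking"
my @not_wreck  = grep { /(?<!wrecking)ball/ } @words;

# The character right before "ball" is a space, so the negative lookbehind succeeds
my $spaced = ("wrecking ball" =~ /(?<!wrecking)ball/) ? 1 : 0;

print "@after_foot | @not_wreck | $spaced\n";
```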
[:upper:]   Like [A-Z]
[:lower:]   Like [a-z]
[:alpha:]   Like [a-zA-Z]
[:digit:]   Like [0-9]
[:alnum:]   Like [a-zA-Z0-9]
[:word:]    Like [a-zA-Z0-9_] (Perl extension)
[:xdigit:]  Like [0-9A-Fa-f]
[:punct:]   Any punctuation
[:space:]   Like [ \t\r\n\f\v]
[:blank:]   Space or tab
These are only meaningful inside a bracket expression, e.g. [[:digit:]] or [[:alpha:]_].
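A Perl sketch using POSIX classes inside bracket expressions on a made-up string.

```perl
use strict;
use warnings;

my $s = "Item 7B: 0x1F!";

# Each POSIX class sits inside an outer [...] bracket expression
my @digits = $s =~ /([[:digit:]])/g;
my @upper  = $s =~ /([[:upper:]])/g;
my @punct  = $s =~ /([[:punct:]])/g;
my ($hex)  = $s =~ /0x([[:xdigit:]]+)/;

# A POSIX class can share a bracket expression with other characters (here ":")
my @runs = "a:b c" =~ /([[:alnum:]:]+)/g;

print "@digits | @upper | @punct | $hex | @runs\n";
```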
POSIX regular expressions come in two types: Basic and Extended. Extended POSIX regular expressions are more Perl-like (+, ?, |, and unescaped ( ) and { } are metacharacters), although the POSIX standard gives them no back-references. Basic POSIX regular expressions include back-references, like \1 and \2 for the first and second captured groups, with groups written as \( \). However, basic regular expressions lack support for alternate either/or groups, like `(foo|bar)`. See re_format(7).
© Paul Gorman