paulgorman.org/technical

Regular Expressions

In Perl:

$foo =~ m/^(F|f)oo\s*(B|b)ar$/
$foo =~ s/foo bar/bas bat/
$foo =~ tr/aeiou/AEIOU/

Does not match:

$bar !~ m/foobar/i
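
As a rough, runnable sketch of the four operators above (the sample strings and variable names are made up for illustration), something like this should behave as the comments describe:

```perl
use strict;
use warnings;

# m// tests for a match: Foo/foo, optional whitespace, Bar/bar, nothing else.
my $foo = "Foo   bar";
my $matched = ($foo =~ m/^(F|f)oo\s*(B|b)ar$/) ? 1 : 0;

# s/// replaces the first occurrence in place.
my $text = "foo bar";
$text =~ s/foo bar/bas bat/;

# tr/// maps characters one-to-one; it takes plain character lists,
# not a regex, so brackets and commas would be treated literally.
my $word = "education";
(my $upper = $word) =~ tr/aeiou/AEIOU/;

# !~ is true when the pattern does NOT match.
my $no_match = ("bazqux" !~ m/foobar/i) ? 1 : 0;

print "$matched $text $upper $no_match\n";   # prints: 1 bas bat EdUcAtIOn 1
```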

Quantifiers

Putting a question mark after the repetition (like x*?) makes it non-greedy.
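
A small sketch of the difference, using a made-up HTML-ish string (not from the original notes). The greedy form runs to the last closing tag, while the non-greedy form stops at the first:

```perl
use strict;
use warnings;

my $html = "<b>one</b> and <b>two</b>";

# Greedy: .* grabs as much as possible, so it runs to the last </b>.
my ($greedy) = $html =~ m/<b>(.*)<\/b>/;

# Non-greedy: .*? stops at the first </b> it can reach.
my ($lazy) = $html =~ m/<b>(.*?)<\/b>/;

print "greedy: $greedy\n";   # prints: greedy: one</b> and <b>two
print "lazy:   $lazy\n";     # prints: lazy:   one
```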

Anchors

Character classes

Escapes

Pattern modifiers

Example: An i at the end of the expression makes it case insensitive:

$bar =~ m/foobar/i

If you wanted to ignore case for only part of a regular expression:

/(?i)foobar(?-i)BaT/
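
To see the partial case-insensitivity in action, here is a hedged sketch with invented test strings. The scoped form (?i:...) is an alternative spelling that Perl also accepts; it is shown only for comparison:

```perl
use strict;
use warnings;

# (?i) switches case-insensitivity on, (?-i) switches it off again.
my $inline = qr/(?i)foobar(?-i)BaT/;

# Equivalent scoped form: the modifier applies only inside the group.
my $scoped = qr/(?i:foobar)BaT/;

my @results;
for my $s ("FOOBARBaT", "foobarBaT", "foobarbat") {
    # Record one digit per pattern: 1 for a match, 0 for no match.
    push @results, ($s =~ $inline ? 1 : 0) . ($s =~ $scoped ? 1 : 0);
}

# "BaT" stays case sensitive, so only the last string fails.
print join(" ", @results), "\n";   # prints: 11 11 00
```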

Grouping and ranges and backreferences

if ($string =~ m/John (Smith|Smyth|Psmith)/) {
	print "I found John!\n"
}

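Example: If we want to match “All the king's horses” but not the escaped “All the king''s horses” (doubled single quote), we combine negated character classes with a negative lookahead to match one single quote but not two. The anchors make the pattern cover the whole string; without them the match could simply start at the second quote:

^[^']*'(?!')[^']*$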
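
A quick way to check this pattern is the sketch below. It anchors the pattern with ^ and $ so the whole string is tested (the anchoring and the variable names are choices made for this sketch), and it also shows what happens with no anchors at all:

```perl
use strict;
use warnings;

# Whole string: text, one quote not followed by another quote, text.
my $one_quote = qr/^[^']*'(?!')[^']*$/;

my $plain   = "All the king's horses";
my $doubled = "All the king''s horses";

my $plain_ok   = ($plain   =~ $one_quote) ? 1 : 0;
my $doubled_ok = ($doubled =~ $one_quote) ? 1 : 0;

# Without anchors the engine is free to start at the second quote,
# which is followed by "s" rather than another quote, so it matches.
my $unanchored_ok = ($doubled =~ /[^']*'(?!')[^']*/) ? 1 : 0;

print "plain: $plain_ok, doubled: $doubled_ok, unanchored: $unanchored_ok\n";
# prints: plain: 1, doubled: 0, unanchored: 1
```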

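Grouping with parens is also the way to capture matches (groups $1, $2, etc.). A captured group can be reused in the replacement of a substitution (Perl prefers $1 there; \1 is the sed spelling and draws a warning under "use warnings"). Inside the pattern itself, \1 refers back to the text the first group matched. For example:

s/(November) 3rd/$1 4th/g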
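
The sketch below pulls the three uses of a capture group together: reuse in a replacement, reading $1 and $2 after a match, and a true backreference inside the pattern. The sample strings are invented for illustration:

```perl
use strict;
use warnings;

my $date = "November 3rd";

# The captured group is reused in the replacement as $1.
(my $fixed = $date) =~ s/(November) 3rd/$1 4th/g;

# After a successful match, $1 and $2 hold the captured groups.
my ($month, $day) = ("", "");
if ($date =~ m/(\w+) (\d+)/) {
    ($month, $day) = ($1, $2);
}

# Inside the pattern, \1 refers back to what group 1 matched:
# here, a word immediately repeated ("the the").
my $has_repeat = ("it was the the best" =~ m/\b(\w+) \1\b/) ? 1 : 0;

print "$fixed | $month $day | $has_repeat\n";
# prints: November 4th | November 3 | 1
```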

Assertions, lookahead and lookbehind

A regex with positive lookahead matches something followed by something else. foo(?=t).* matches “football” but not “foobar”.

A regex with negative lookahead matches something not followed by something else. foo(?!t).* matches “foobar” but not “football”.

Lookbehind works the same way, with (?<=foot)ball (“ball” preceded by “foot”) and (?<!wrecking)ball (“ball” not preceded by “wrecking”).
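
The four lookaround forms above can be exercised with a short sketch like this one (the hash keys are arbitrary labels chosen here). The last check illustrates that a lookahead is zero-width: the "t" is tested but not consumed.

```perl
use strict;
use warnings;

my %result;

# Positive lookahead: "foo" must be followed by "t".
$result{ahead_football} = ("football" =~ /foo(?=t).*/) ? 1 : 0;
$result{ahead_foobar}   = ("foobar"   =~ /foo(?=t).*/) ? 1 : 0;

# Negative lookahead: "foo" must NOT be followed by "t".
$result{not_ahead_foobar}   = ("foobar"   =~ /foo(?!t).*/) ? 1 : 0;
$result{not_ahead_football} = ("football" =~ /foo(?!t).*/) ? 1 : 0;

# Positive lookbehind: "ball" must be preceded by "foot".
$result{behind_football} = ("football" =~ /(?<=foot)ball/) ? 1 : 0;

# Negative lookbehind: "ball" must NOT be preceded by "wrecking".
$result{not_behind_wrecking} = ("wreckingball" =~ /(?<!wrecking)ball/) ? 1 : 0;

# Lookarounds consume nothing: the captured text is just "foo".
my ($consumed) = "football" =~ /(foo(?=t))/;

print "$_ => $result{$_}\n" for sort keys %result;
print "consumed: $consumed\n";   # prints: consumed: foo
```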

POSIX classes

POSIX regular expressions come in two types: Basic (BRE) and Extended (ERE). Extended POSIX regular expressions are more Perl-like and generally more powerful, although the standard does not define back-references for them. Basic POSIX regular expressions include back-references, like \1 and \2 for the first and second captured groups. However, basic regular expressions lack support for alternation (either/or groups), like (foo|bar). See re_format(7).