POSIX regular expressions cheatsheet, where "POSIX" refers to the syntax supported by traditional Unix utilities (grep, sed, awk), without the additions of Perl-style "extended" syntax
Regular expression syntax
Pattern | Backslash needed (basic syntax) | Meaning |
---|---|---|
. | No | ANY character |
[[:class:]] | No | Character inside predefined set named "class" [1] |
[abc] | No | Character inside given set |
[a-m] | No | Character inside given range |
[^abc] | No | Character NOT inside given set |
(abc) | Yes | Cluster into pattern group |
a|b | Yes | Alternative patterns/groups |
a? | Yes | Repeat count: occurs zero or once |
a* | No | Repeat count: occurs ANY number of times (including 0) |
a+ | Yes | Repeat count: occurs one or more times |
a{2,5} | Yes | Repeat count: occurs some number within range |
^abc | No | Anchor: beginning of line |
abc$ | No | Anchor: end of line |
[1] | There is no character class named "class"; it is a placeholder for the real classes, of which the most important are alpha, digit, alnum, lower, upper, blank, and space. A class name is only recognized inside a bracket expression, e.g. [[:digit:]] or [[:alpha:]_] |
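A few rows of the table can be tried directly in a shell. The following is a minimal sketch, assuming a grep that supports the -E (extended syntax) and -o (print only the matched text) options, as GNU and BSD grep do; the sample string and variable names are made up for illustration.

```shell
# Sample text is made up for illustration.
line="order 42: 7 apples, 120 pears"

# [[:digit:]]+ : one or more characters from the predefined class "digit";
# head keeps only the first match
first_number=$(printf '%s\n' "$line" | grep -Eo '[[:digit:]]+' | head -n 1)
echo "$first_number"    # prints 42

# ^[[:alpha:]]+ : letters anchored to the beginning of the line
first_word=$(printf '%s\n' "$line" | grep -Eo '^[[:alpha:]]+')
echo "$first_word"      # prints order

# [0-9]{2,3} : repeat count within a range (two or three digits);
# paste joins the matches onto one line
wide_numbers=$(printf '%s\n' "$line" | grep -Eo '[0-9]{2,3}' | paste -sd ' ' -)
echo "$wide_numbers"    # prints 42 120
```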
Basic syntax vs extended syntax
The only difference between "basic syntax" and "extended syntax" [2] is how they treat backslashes.
In basic syntax, certain characters must be backslashed to obtain their special pattern-matching meaning ("Yes" in the middle column of the table above), while other characters have the special meaning unless they are backslashed to be taken literally ("No" in the middle column). Note that POSIX itself defines only \( \) and \{ \} for basic syntax; \?, \+ and \| are GNU extensions that other implementations may not support.
Extended syntax is much more consistent: omit the backslash to use the pattern-matching meaning for all special characters, or include a backslash to take the literal meaning of a character.
[2] | "extended" in this case refers to Posix-style extended syntax, not to perl-style extended syntax, which adds entirely new syntax elements not included in this cheatsheet |
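The backslash policy described above can be checked side by side. This is a small sketch, assuming a grep with the -E and -o options (GNU or BSD); the sample strings and variable names are hypothetical.

```shell
# Sample text is made up for illustration.
text="id=abc123"

# Extended syntax: parentheses and braces are special without a backslash.
ere_match=$(printf '%s\n' "$text" | grep -Eo '([a-z]{3})[0-9]{3}')

# Basic syntax: the same pattern needs backslashes for grouping and repeat counts.
bre_match=$(printf '%s\n' "$text" | grep -o '\([a-z]\{3\}\)[0-9]\{3\}')

# Extended syntax with the parentheses backslashed: they become literal characters.
literal_match=$(printf '%s\n' "f(x) = 1" | grep -Eo 'f\(x\)')

echo "$ere_match"       # prints abc123
echo "$bre_match"       # prints abc123
echo "$literal_match"   # prints f(x)
```

Both patterns use only \( \) and \{ \} in the basic form, so the comparison should not depend on GNU-specific extensions.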
Expression grouping
Reasons for grouping with parentheses:
- Apply count to whole group (abc)?
- Use groups as alternative choices (abc)|(xyz)
- When using regexes to do substitutions, groups can be recalled as part of replacement text
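The three uses of grouping listed above can be sketched as follows, assuming a sed that accepts -E for extended syntax and a grep with -E and -o (GNU and BSD versions do); the sample strings and variable names are made up for illustration.

```shell
# Sample text is made up for illustration.
date_iso="2014-02-15"

# Recall groups in a substitution: three groups capture year, month and day;
# \3, \2 and \1 recall them in the replacement text.
date_dmy=$(printf '%s\n' "$date_iso" | sed -E 's/^([0-9]{4})-([0-9]{2})-([0-9]{2})$/\3.\2.\1/')
echo "$date_dmy"    # prints 15.02.2014

# Apply a count to a whole group: "ab" repeated two or more times.
repeated=$(printf '%s\n' "xxababab" | grep -Eo '(ab){2,}')
echo "$repeated"    # prints ababab

# Use groups as alternative choices.
choice=$(printf '%s\n' "colour: grey" | grep -Eo '(gray)|(grey)')
echo "$choice"      # prints grey
```

The same substitution in basic syntax would backslash every parenthesis and brace in the pattern, while \1, \2 and \3 in the replacement stay unchanged.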
Tips
Convenient testing from the command-line
Use the --colour option of egrep or grep to test a candidate pattern
    # matches "2014-02-15"
    echo "2014-02-15: did stuff" | egrep --colour "^[^:]+"
Develop complicated patterns in extended syntax with egrep, then backslash-ify to basic syntax and test with grep if needed
    # matches "product ID 4234A"
    echo "quantity 5: product ID 4234A-Z99" | egrep --colour "(product ID [0-9]+)+A"
    # matches "product ID 4234A"
    echo "quantity 5: product ID 4234A-Z99" | grep --colour "\(product ID [0-9]\+\)\+A"
Convenient testing from a browser
A very friendly web app for interactive experiments with regular expressions: small text box for the regex, large text box for the text to be matched, and highlighting that shows all matches in real-time
Watch out for "*" vs "+"
Choose carefully between "any number" and "one or more"; it is a common mistake to use "any" when the match should specify "at least one"
    # matches "A", but probably not intended
    echo "HUMMA" | egrep --colour "(product ID [0-9]*)*A"
    # no match
    echo "HUMMA" | egrep --colour "(product ID [0-9]+)+A"
Relevant man pages
"man 7 regex", "man perlre"