Regular expressions are a simple pattern matching language used for locating text in a file. All regular expressions are constructed from a series of one or more single character expressions. Single character expressions can take several forms:
Typing chars
  Characters:  A-Z a-z 0-9 ! @ # % < > ( ) { } , ~ | : ; ? + = - _ <tab> <blk> <ctrl chars>
  Description: Any alphanumeric or symbol char that can be typed except chars used in substitution. These chars match only identical chars in text. We must precede ? with \ if ? is used as first char in a backward search.

Substitution or search control chars
  Characters:  . ^ $ / [ ] \ * -
  Description: These chars represent another char or beg or end of line or serve as delimiters, range identifiers, or escape chars in regular expressions. However, under certain conditions, - and ] are interpreted directly as explained below.

Sets or ranges of chars
  Forms:       [set_of_chars] or [range_of_chars] or [combination_of_both]
  Description: A group of single chars or range of chars enclosed within a pair of square brackets [ ] (such as [actz58&] or [3-7]) where a match is accepted if any of the chars between the [] or in the specified range appears in the position defined by the position of the single char expression in a larger expression. The second form example accepts a match if 3, 4, 5, 6 or 7 appears in the position indicated.

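
The set and range forms can be tried out with a short sketch. It uses Python's re module as a stand-in, which is an assumption: Python's dialect is not the one described here and treats several of the typing chars (such as ? + ( ) { } |) as special, so only the bracket forms are exercised. The helper name matching_chars and the sample strings are hypothetical.

```python
import re

# Set form: accepts any one of a, c, t, z, 5, 8, or &.
SET_EXPR = re.compile(r"[actz58&]")
# Range form: accepts any one of 3, 4, 5, 6, or 7.
RANGE_EXPR = re.compile(r"[3-7]")
# A bracket expression fills exactly one position in a larger expression;
# here that position is between a literal x and a literal y.
LARGER_EXPR = re.compile(r"x[3-7]y")


def matching_chars(pattern, text):
    """Return the chars of text accepted by a single char expression."""
    return [ch for ch in text if pattern.fullmatch(ch)]


print(matching_chars(SET_EXPR, "cat&dog58"))
print(matching_chars(RANGE_EXPR, "0123456789"))
print(bool(LARGER_EXPR.search("ax5yb")))  # 5 is inside the range
print(bool(LARGER_EXPR.search("ax9yb")))  # 9 is outside the range
```

Checking one char at a time with fullmatch keeps the focus on what a single position accepts, independent of the rest of the expression.
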
The - is interpreted as a range specifier when defining sets of characters,
as in the single character expression [a-z], unless it is the first character in
the set, as in the expression [-abdfgh12], which matches any one of
the characters -, a, b, d, f, g, h, 1, or 2. Likewise, the ]
terminates the expression unless it is the first character in the set, as in the
expression []=+rt12], which matches any one of the characters
], =, +, r, t, 1, or 2.
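
The placement rules for - and ] can be checked with a minimal sketch, again assuming Python's re module as a stand-in for the dialect described here (Python happens to treat a leading - and a leading ] inside brackets the same way). The helper name accepted and the candidate strings are hypothetical.

```python
import re

# Leading hyphen: taken literally rather than as a range specifier.
HYPHEN_FIRST = re.compile(r"[-abdfgh12]")
# Leading right bracket: taken literally rather than closing the set.
BRACKET_FIRST = re.compile(r"[]=+rt12]")
# Hyphen between two chars: a range specifier, not a literal hyphen.
RANGE = re.compile(r"[a-z]")


def accepted(pattern, candidates):
    """Return the candidate chars that the bracket expression accepts."""
    return "".join(ch for ch in candidates if pattern.fullmatch(ch))


print(accepted(HYPHEN_FIRST, "-abcdefgh123"))
print(accepted(BRACKET_FIRST, "]=+rt12[x"))
print(accepted(RANGE, "a-z"))  # the hyphen itself is not accepted
```
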