Regular expressions allow more specific queries than a simple query.

*Examples*

<TABLE>
<TR> <TD> compan(y|ies) </TD><TD> Search for _company_, _companies_ </TD> </TR>
<TR> <TD> (peter|paul) </TD><TD> Search for _peter_, _paul_ </TD> </TR>
<TR> <TD> bug* </TD><TD> Search for _bug_, _bugs_, _bugfix_ (the asterisk repeats only the _g_, so text containing _bu_, such as _but_, matches as well) </TD> </TR>
<TR> <TD> [Bb]ag </TD><TD> Search for _Bag_, _bag_ </TD> </TR>
<TR> <TD> b[aiueo]g </TD><TD> Second letter is a vowel. Matches _bag_, _bug_, _big_ </TD> </TR>
<TR> <TD> b.g </TD><TD> Second letter is any character. Also matches _b&g_ </TD> </TR>
<TR> <TD> [a-zA-Z] </TD><TD> Matches any one letter (not a number or a symbol) </TD> </TR>
<TR> <TD> [^0-9a-zA-Z] </TD><TD> Matches any symbol (not a number or a letter) </TD> </TR>
<TR> <TD> [A-Z][A-Z]* </TD><TD> Matches one or more uppercase letters </TD> </TR>
<TR> <TD> [0-9][0-9][0-9]-[0-9][0-9]- <br> [0-9][0-9][0-9][0-9] </TD><TD VALIGN="top"> US social security number, e.g. 123-45-6789 </TD> </TR>
</TABLE>

Here is stuff for our UNIX freaks: <BR> (copied from 'man grep')
<pre>
\c     A backslash (\) followed by any special character is a
       one-character regular expression that matches the special
       character itself. The special characters are:

       +  `.', `*', `[', and `\' (period, asterisk, left square
          bracket, and backslash, respectively), which are always
          special, except when they appear within square brackets
          ([]).

       +  `^' (caret or circumflex), which is special at the
          beginning of an entire regular expression, or when it
          immediately follows the left of a pair of square
          brackets ([]).

       +  $ (currency symbol), which is special at the end of an
          entire regular expression.

.      A `.' (period) is a one-character regular expression that
       matches any character except NEWLINE.

[string]
       A non-empty string of characters enclosed in square brackets
       is a one-character regular expression that matches any one
       character in that string.
       If, however, the first character of the string is a `^'
       (a circumflex or caret), the one-character regular
       expression matches any character except NEWLINE and the
       remaining characters in the string. The `^' has this
       special meaning only if it occurs first in the string.
       The `-' (minus) may be used to indicate a range of
       consecutive ASCII characters; for example, [0-9] is
       equivalent to [0123456789]. The `-' loses this special
       meaning if it occurs first (after an initial `^', if any)
       or last in the string. The `]' (right square bracket) does
       not terminate such a string when it is the first character
       within it (after an initial `^', if any); that is, []a-f]
       matches either `]' (a right square bracket) or one of the
       letters a through f inclusive. The four characters `.',
       `*', `[', and `\' stand for themselves within such a string
       of characters.

The following rules may be used to construct regular expressions:

*      A one-character regular expression followed by `*' (an
       asterisk) is a regular expression that matches zero or more
       occurrences of the one-character regular expression. If
       there is any choice, the longest leftmost string that
       permits a match is chosen.

^      A circumflex or caret (^) at the beginning of an entire
       regular expression constrains that regular expression to
       match an initial segment of a line.

$      A currency symbol ($) at the end of an entire regular
       expression constrains that regular expression to match a
       final segment of a line.

*      A regular expression (not just a one-character regular
       expression) followed by `*' (an asterisk) is a regular
       expression that matches zero or more occurrences of that
       regular expression. If there is any choice, the longest
       leftmost string that permits a match is chosen.

+      A regular expression followed by `+' (a plus sign) is a
       regular expression that matches one or more occurrences of
       that regular expression. If there is any choice, the
       longest leftmost string that permits a match is chosen.

?
       A regular expression followed by `?' (a question mark) is a
       regular expression that matches zero or one occurrence of
       that regular expression. If there is any choice, the
       longest leftmost string that permits a match is chosen.

|      Alternation: two regular expressions separated by `|' or
       NEWLINE match either a match for the first or a match for
       the second.

()     A regular expression enclosed in parentheses matches a match
       for the regular expression.

The order of precedence of operators at the same parenthesis level
is `[ ]' (character classes), then `*' `+' `?' (closures), then
concatenation, then `|' (alternation) and NEWLINE.
</pre>
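To try the patterns from the examples table outside TWiki, here is a small sketch using Python's re module. It assumes that Python's regular expression dialect is a fair stand-in for the grep-style search described on this page; for the simple patterns in the table the two agree. re.search looks for a match anywhere in a string, much like grep reports any line that contains a match. The names EXAMPLES and check are made up for this illustration, as are the sample words.

```python
import re

# (pattern, texts it should be found in, texts it should not be found in)
EXAMPLES = [
    ("compan(y|ies)", ["company", "companies"], ["compact"]),
    ("(peter|paul)", ["peter", "paul"], ["mary"]),
    ("bug*", ["bug", "bugs", "bugfix"], ["bag"]),
    ("[Bb]ag", ["Bag", "bag"], ["rag"]),
    ("b[aiueo]g", ["bag", "bug", "big"], ["b&g"]),
    ("b.g", ["bag", "b&g"], ["bg"]),
    ("[a-zA-Z]", ["x", "Q"], ["123", "&"]),
    ("[^0-9a-zA-Z]", ["&", "-"], ["abc123"]),
    ("[A-Z][A-Z]*", ["TWIKI", "A"], ["twiki"]),
    ("[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]",
     ["123-45-6789"], ["12-345-6789"]),
]


def check(pattern, should_find, should_not_find):
    """True if the pattern occurs in every positive sample and in no negative one."""
    regex = re.compile(pattern)
    return (all(regex.search(text) for text in should_find)
            and not any(regex.search(text) for text in should_not_find))


for pattern, yes, no in EXAMPLES:
    print(f"{pattern:50} {'ok' if check(pattern, yes, no) else 'FAILED'}")

# The asterisk applies only to the g, so "bug*" is also found in "but".
print(bool(re.search("bug*", "but")))  # True
```

Every row prints ok. The last line shows why bug* is looser than it looks: zero repetitions of g are allowed, so any text containing bu qualifies.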
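The bracket rules from the man page (negation with a leading ^, ranges with -, a literal - at either end, a literal ] right after the opening bracket, and . and * losing their special meaning) can be checked one character at a time. This is a sketch using Python's re.fullmatch, not grep itself; the helper matches_char is hypothetical. Python agrees with the man page on these cases but not everywhere: for example, a backslash inside brackets is still an escape character in Python, whereas the man page says it stands for itself.

```python
import re


def matches_char(bracket_expr, ch):
    """True if the bracket expression matches exactly the single character ch."""
    return re.fullmatch(bracket_expr, ch) is not None


# A leading ^ negates the set: anything except a digit.
print(matches_char("[^0-9]", "7"), matches_char("[^0-9]", "x"))  # False True

# A - between two characters is a range: [0-9] accepts the same as [0123456789].
print(all(matches_char("[0-9]", c) == matches_char("[0123456789]", c)
          for c in "0123456789ax"))  # True

# A - first or last in the set is an ordinary minus sign, not a range.
print(matches_char("[-az]", "-"), matches_char("[az-]", "-"),
      matches_char("[az-]", "m"))  # True True False

# A ] right after the opening [ is literal, so []a-f] also accepts ].
print([c for c in "]af g" if matches_char("[]a-f]", c)])  # [']', 'a', 'f']

# . and * are ordinary characters inside brackets.
print(matches_char("[.*]", "."), matches_char("[.*]", "*"),
      matches_char("[.*]", "x"))  # True True False
```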
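The closure operators, the ^ and $ anchors, and the precedence order from the man page can be illustrated the same way. Again this is a sketch in Python's re, with a hypothetical helper whole and made-up sample strings. One known difference: Python tries alternatives from left to right and takes the first that works instead of the longest, so with overlapping alternatives such as a|ab it can pick a shorter match than the "longest leftmost" rule describes. The closures below are greedy, so these examples agree with the man page.

```python
import re


def whole(pattern, text):
    """True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None


# Closures: * is zero or more, + is one or more, ? is zero or one.
print(whole("ab*", "a"), whole("ab+", "a"), whole("ab+", "abbb"))  # True False True
print(whole("colou?r", "color"), whole("colou?r", "colour"))        # True True

# Closures bind tighter than concatenation: ab* means a(b*), not (ab)*.
print(whole("ab*", "abab"), whole("(ab)*", "abab"))  # False True

# Concatenation binds tighter than |: ab|cd means (ab)|(cd), not a(b|c)d.
print(whole("ab|cd", "cd"), whole("ab|cd", "abd"))  # True False

# ^ and $ tie the pattern to the start or the end of a line.
lines = ["TWiki rocks", "I like TWiki"]
print([line for line in lines if re.search("^TWiki", line)])  # ['TWiki rocks']
print([line for line in lines if re.search("TWiki$", line)])  # ['I like TWiki']

# A closure takes as much as it can from the leftmost starting point.
print(re.search("ab*", "xabbbc").group())  # abbb
```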
Topic revision: r1 - 2000-08-18 - PeterThoeny
Copyright © 1999-2025 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Note: Please contribute updates to this topic on TWiki.org at TWiki:TWiki.RegularExpression.