Regular Expression


About Regular Expressions


Regular expressions date back to the early days of computer science, where they arose from automata theory, and they spread widely with the popularity of UNIX.

Here is an example of a very simple state machine that accepts every string made of zero or more repetitions of "ab". The set of accepted strings is {"", "ab", "abab", "ababab", …}. In shorthand, this set can be written as the regular expression (ab)*.
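As a rough sketch, the same machine can be written as a small transition table in Python and checked against Python's `re` module. The state numbering and the names `TRANSITIONS`, `dfa_accepts`, and `regex_accepts` are made up for this illustration.

```python
import re

# Hypothetical transition table for a state machine accepting (ab)*.
# State 0: start state, also accepting (zero or more complete "ab" pairs read).
# State 1: an "a" has been read and a "b" is required next.
# A missing entry means the input is rejected (an implicit dead state).
TRANSITIONS = {
    (0, "a"): 1,
    (1, "b"): 0,
}
ACCEPTING = {0}


def dfa_accepts(text: str) -> bool:
    """Run the (ab)* state machine over text and report acceptance."""
    state = 0
    for ch in text:
        key = (state, ch)
        if key not in TRANSITIONS:
            return False  # no transition: fell into the dead state
        state = TRANSITIONS[key]
    return state in ACCEPTING


def regex_accepts(text: str) -> bool:
    """Same language via Python's re module; fullmatch anchors both ends."""
    return re.fullmatch(r"(ab)*", text) is not None


for s in ["", "ab", "abab", "ababab", "a", "ba", "aba", "abb"]:
    print(repr(s), dfa_accepts(s), regex_accepts(s))
```

Both functions agree on every input: the first four strings are accepted and the last four are rejected.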

Regular expressions describe sets of strings, often complicated ones, with a short pattern. In (ab)*, the parentheses '(' and ')' group "a" and "b" so they are treated as a unit, and the * means "zero or more" of that unit. Here is a quick reference of common regular expression syntax [Wiki]:

. Matches any single character except a newline. Within square brackets, the dot character matches a literal dot. For example, a.c matches "abc", "a1c", and so on, but [a.c] matches only "a", ".", or "c".
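A quick check of the dot in Python's `re` module, which is only one regex flavor; other engines may differ in details such as newline handling:

```python
import re

# "." matches any single character except a newline (Python's default).
print(re.fullmatch(r"a.c", "abc") is not None)    # True
print(re.fullmatch(r"a.c", "a-c") is not None)    # True
print(re.fullmatch(r"a.c", "a\nc") is not None)   # False: "." skips "\n"
# The re.DOTALL flag lets "." match a newline as well.
print(re.fullmatch(r"a.c", "a\nc", re.DOTALL) is not None)  # True

# Inside brackets the dot is literal: [a.c] matches one of "a", ".", "c".
print(re.findall(r"[a.c]", "abc.d"))  # ['a', 'c', '.']
```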

[ ] Matches a single character that is contained within the brackets. For example, [abc] matches "a", "b", or "c". [a-z] specifies a range which matches any lowercase letter from "a" to "z". These forms can be mixed: [abcx-z] matches "a", "b", "c", "x", "y", and "z", as does [a-cx-z].

The - character is treated as a literal character if it is the last or the first character within the brackets, or if it is escaped with a backslash: [abc-], [-abc], or [a\-bc]. The [ character can be included anywhere within the brackets. To match the ] character, the easiest way is to escape it with a backslash, e.g., [\]]. Some tools allow you to avoid the backslash if the closing bracket is first in the enclosing square brackets, e.g., [][ab] matches "]", "[", "a", or "b".
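The bracket rules above can be tried in Python's `re` module. This is a sketch against one flavor; the sample string is arbitrary.

```python
import re

text = "a-b]c[x"

# [abcx-z]: the listed characters plus the range x..z.
print(re.findall(r"[abcx-z]", text))   # ['a', 'b', 'c', 'x']
# [a-cx-z] describes the same set using two ranges.
print(re.findall(r"[a-cx-z]", text))   # ['a', 'b', 'c', 'x']
# A hyphen placed last (or first) is a literal "-".
print(re.findall(r"[abc-]", text))     # ['a', '-', 'b', 'c']
# An escaped closing bracket is a literal "]".
print(re.findall(r"[\]]", text))       # [']']
# An unescaped "[" inside a set is literal. It is placed after "a" because
# recent Python versions warn (FutureWarning) when a set starts with "[".
print(re.findall(r"[a[]", text))       # ['a', '[']
```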

[^ ] Matches a single character that is not contained within the brackets. For example, [^abc] matches any character other than "a", "b", or "c". [^a-z] matches any single character that is not a lowercase letter from "a" to "z". As above, literal characters and ranges can be mixed.
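A short Python illustration of negated sets (the sample strings are arbitrary):

```python
import re

# [^abc]: any single character other than "a", "b", or "c".
print(re.findall(r"[^abc]", "abcde"))   # ['d', 'e']
# [^a-z]: any single character that is not a lowercase letter a..z.
print(re.findall(r"[^a-z]", "aB3 z!"))  # ['B', '3', ' ', '!']
```

Note that, unlike ".", a negated set such as [^a-z] does match a newline in Python.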

^ Matches the starting position within the string. In multiline mode, it matches the starting position of any line.

$ Matches the ending position of the string or the position just before a string-terminating newline. In multiline mode, it matches the ending position of any line.
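The difference between the default and multiline behavior of ^ and $ can be seen in Python, where multiline mode is turned on with the `re.MULTILINE` flag:

```python
import re

text = "first line\nsecond line\n"

# Default: ^ and $ anchor to the whole string. $ also matches just before
# the string-terminating newline, so the last "line" is found.
print(re.findall(r"^\w+", text))                # ['first']
print(re.findall(r"\w+$", text))                # ['line']
# re.MULTILINE: ^ and $ match at the start and end of every line.
print(re.findall(r"^\w+", text, re.MULTILINE))  # ['first', 'second']
print(re.findall(r"\w+$", text, re.MULTILINE))  # ['line', 'line']
```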

( ) Defines a marked subexpression. The string matched within the parentheses can be recalled later (see the next entry, \n). A marked subexpression is also called a block or capturing group. Not all regular expression implementations support this feature, and in many Unix utilities, including sed and vi, the opening and closing parentheses must each be preceded by a backslash to have special meaning.
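In Python's `re` module, plain parentheses capture (no backslashes needed, unlike sed and vi in their basic syntax). A minimal sketch with an arbitrary date string:

```python
import re

# Groups are numbered from 1 by the position of their opening parenthesis.
m = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Retrieved 2007-09-18.")
print(m.group(0))   # '2007-09-18' (the whole match)
print(m.groups())   # ('2007', '09', '18')

# A repeated group keeps only the text from its last repetition.
print(re.fullmatch(r"(ab)*", "ababab").group(1))  # 'ab'
```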

\n Matches what the nth marked subexpression matched, where n is a digit from 1 to 9. This construct is theoretically irregular, because it lets a pattern describe languages that are not regular, and it was not adopted in the POSIX extended regular expression (ERE) syntax. Some tools allow referencing more than nine capturing groups.
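A small Python sketch of backreferences. The second pattern shows why the construct is called irregular: (a*)b\1 accepts strings of the form a^n b a^n, which no finite state machine can recognize.

```python
import re

# \1 must repeat exactly the text captured by group 1: a doubled word.
doubled = re.compile(r"\b(\w+) \1\b")
print(doubled.search("this is is a test").group(0))  # 'is is'
print(doubled.search("this is a test"))              # None

# Same number of "a"s on both sides of the "b".
balanced = re.compile(r"(a*)b\1")
print(balanced.fullmatch("aabaa") is not None)  # True
print(balanced.fullmatch("aaba") is not None)   # False
```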

* Matches the preceding element zero or more times. For example, ab*c matches "ac", "abc", "abbbc", etc. [xyz]* matches "", "x", "y", "z", "zx", "zyx", "xyzzy", and so on. (ab)* matches "", "ab", "abab", "ababab", and so on.
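The examples above can be checked in Python. The last two lines show that * applies only to the element immediately before it, so a group is needed to repeat "ab" as a unit.

```python
import re

# ab*c: "a", then zero or more "b", then "c".
for s in ["ac", "abc", "abbbc", "abxc"]:
    print(s, re.fullmatch(r"ab*c", s) is not None)
# ac True, abc True, abbbc True, abxc False

# [xyz]* accepts any mix of x, y, z, including the empty string.
print(re.fullmatch(r"[xyz]*", "xyzzy") is not None)  # True

# ab* repeats only "b"; (ab)* repeats the pair "ab".
print(re.fullmatch(r"ab*", "abab") is not None)      # False
print(re.fullmatch(r"(ab)*", "abab") is not None)    # True
```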

{m,n} Matches the preceding element at least m and not more than n times. For example, a{3,5} matches only "aaa", "aaaa", and "aaaaa". A few older regular expression implementations do not support this syntax.
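A Python check of a{3,5}. With `fullmatch` the whole string must fit the bound; with `search`, a run of 3 to 5 can be found inside longer text, and the quantifier is greedy, so it takes as many characters as it is allowed.

```python
import re

# a{3,5}: between three and five "a" characters.
for n in range(2, 7):
    s = "a" * n
    print(s, re.fullmatch(r"a{3,5}", s) is not None)
# aa False, aaa True, aaaa True, aaaaa True, aaaaaa False

# Unanchored and greedy: takes five of the six "a"s.
print(re.search(r"a{3,5}", "baaaaaab").group(0))  # 'aaaaa'
```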

Download


Requirements/Recommended:

  • Microsoft Windows 2000/XP

  • Microsoft .NET Framework 2.0

File:

Click Here to Download

Instructions:

Installation:

Unzip the file and place the folder in the desired location. Run [RegEx Explorer.exe] to start the program.

Running:

Load a file into the program, enter your regular expression in the text box, and click the "Parse" button to view the results.
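How RegEx Explorer works internally is not described here. Purely to illustrate the same workflow (load text, apply a pattern, list the matches) outside the program, here is a small Python sketch; `parse_text` and `parse_file` are hypothetical names, and this is not the program's implementation.

```python
import re
from pathlib import Path
from typing import List


def parse_text(pattern: str, text: str) -> List[str]:
    """Return the text of every match of pattern in text, in order."""
    regex = re.compile(pattern)
    # finditer + group(0) always yields whole matches; re.findall would
    # return the captured groups instead if the pattern contains any.
    return [m.group(0) for m in regex.finditer(text)]


def parse_file(pattern: str, path: str) -> List[str]:
    """Hypothetical helper: read a UTF-8 text file and list all matches."""
    text = Path(path).read_text(encoding="utf-8")
    return parse_text(pattern, text)


print(parse_text(r"\d+", "Windows 2000/XP, .NET Framework 2.0"))
# ['2000', '2', '0']
```

Using `finditer` keeps the output consistent whether or not the pattern contains capturing groups.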

Sources


[Wiki] Wikimedia Foundation, Inc., “Regular expression” http://en.wikipedia.org/wiki/Regular_expression (12:57, 18 September 2007)

[RegEx] Jan Goyvaerts, “Regular-Expressions.info” http://www.regular-expressions.info/ (10 September 2007)