Working With Regular Expressions in JavaScript

Sunday, October 22nd, 2006

Defining Regular Expression Patterns:

One common way to define a regexp pattern is to create a RegExp instance from the pattern you want to match, along with any modifiers you wish to use. This can be achieved with the following:

var pattern = new RegExp("yourpattern", "gi"); // second argument: a string of modifiers such as "g", "i", "gi", or "m"

Understand that the second parameter is just an example, and not to be applied to all your patterns.
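As a rough sketch of how the constructor compares with literal notation (slashes instead of quotes), the snippet below builds the same pattern both ways. The variable names are made up for illustration. Note that a pattern passed as a string needs its backslashes doubled:

```javascript
// Two equivalent ways to build a global, case-insensitive pattern for runs of digits.
var fromConstructor = new RegExp("\\d+", "gi"); // backslash doubled inside a string
var fromLiteral = /\d+/gi;                      // literal notation, no doubling needed

// Both objects end up with the same pattern source and the same modifiers.
var sameSource = fromConstructor.source === fromLiteral.source; // true
var sameGlobal = fromConstructor.global && fromLiteral.global;  // true
```

The constructor form is mainly useful when the pattern has to be assembled from a string at run time.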

Understanding Modifiers:

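Modifiers are flags that a newly instantiated RegExp uses to control how its pattern is applied to a string. The modifiers JavaScript supports here are g, i, and m, which can be combined (for example gi).

g means Global Match, i means Case Insensitive, and m means Multiple Lines (^ and $ also match at the start and end of each line). The x modifier (Allow Comments and White Space in pattern) and the U modifier (Ungreedy pattern) belong to PCRE-based engines such as PHP's; JavaScript does not support them, and passing either one throws a SyntaxError.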

For example, if you wish to match every foo in a large document that contains words like Foo, foo, and foobar, regardless of case, you would apply the following pattern:

var pattern = new RegExp("foo", "gi"); // g: find every occurrence, i: ignore case

Modifiers can also be appended to a regular expression literal, as in /foo/g or /\s/g.
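To make the effect of each modifier concrete, here is a small sketch that runs the same word through different modifier combinations. The sample text and variable names are invented for this example:

```javascript
var text = "Foo foo\nfoobar";

// No modifiers: only the first lowercase "foo" is found.
var plain = text.match(/foo/);               // plain[0] is "foo", plain.index is 4

// g: every lowercase "foo".
var globalOnly = text.match(/foo/g);         // ["foo", "foo"]

// gi: every "foo" regardless of case.
var globalInsensitive = text.match(/foo/gi); // ["Foo", "foo", "foo"]

// m: ^ also matches right after a line break, so "foobar" on line 2 counts.
var lineStarts = text.match(/^foo/gim);      // ["Foo", "foo"]
```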

Handling Regular Expression Methods:

There are seven methods most commonly used when handling the RegExp and String classes. Writing the pattern is only half the battle; knowing how to use it is the desired end game.

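RegExp.exec(string) applies the RegExp to a string and returns an array holding the first match followed by any captured groups, or null if nothing matches. With the g modifier, each further call resumes from where the previous match ended.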

var exec = new RegExp(/foo/i).exec("foo foofighter Foo foobar") // ["foo"], the first match only
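To collect every match rather than just the first, exec can be called in a loop on a pattern that has the g modifier. The following is a minimal sketch; the capturing group and the shape of the collected objects are choices made for this example:

```javascript
var finder = /foo(\w*)/gi; // group 1 captures whatever word characters follow "foo"
var input = "foo foofighter Foo foobar";
var found = [];
var m;

// Each call resumes at finder.lastIndex; exec returns null when no match is left.
while ((m = finder.exec(input)) !== null) {
  found.push({ text: m[0], rest: m[1], index: m.index });
}

// found now describes "foo" (index 0), "foofighter" (4), "Foo" (15), and "foobar" (19).
```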

RegExp.compile(pattern, modifiers) recompiles an existing RegExp object with a new pattern. It is deprecated and rarely used, so no example is given here.

RegExp.test(string) returns true if the given string matches the RegExp pattern, and false if it does not.

var test = new RegExp(/foobar/).test("Foo bar foo Bar") // false: the string never contains a contiguous, lowercase "foobar"
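One thing to watch for when using test: a pattern with the g modifier remembers where its last match ended, so repeated calls on the same object can alternate between true and false. A short sketch, with a variable name chosen for illustration:

```javascript
var remembered = /foo/g;

var first = remembered.test("foo");  // true, and lastIndex moves to 3
var second = remembered.test("foo"); // false, the search started at index 3; lastIndex resets to 0
var third = remembered.test("foo");  // true again

// Without the g modifier the result is the same on every call.
var stable = /foo/.test("foo") && /foo/.test("foo"); // true
```

For a simple yes/no check, leaving off the g modifier avoids the surprise.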

String.match(pattern) matches the string against the given pattern. If the g modifier is applied, it returns an array of all matches. If not, it returns an array holding the first match and its captured groups. In both cases it returns null when there is no match.

var match = "Foobar is a foofighter, foo".match(/foo/gi) // ["Foo", "foo", "foo"]
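The difference the g modifier makes to match is easiest to see side by side. This sketch uses a pattern with two capturing groups picked for the example:

```javascript
var sentence = "Foobar is a foofighter, foo";

// Without g: the first match plus its captured groups.
var firstOnly = sentence.match(/(foo)(\w*)/i);
// firstOnly[0] is "Foobar", firstOnly[1] is "Foo", firstOnly[2] is "bar"

// With g: every full match, but no captured groups.
var all = sentence.match(/(foo)(\w*)/gi); // ["Foobar", "foofighter", "foo"]

// No match at all: null in both modes, so check before indexing.
var none = sentence.match(/xyz/);         // null
```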

String.search(pattern) returns the index at which the first match begins, or -1 if no match is found.

var search = "Apples and oranges are not foos and bars".search(/foos/) // 27

String.replace(pattern, string) finds the matched pattern, replaces it with the supplied string, and returns the newly formed string. Without the g modifier, only the first match is replaced.

var str = "Foobar is the big foobar.".replace(/foobar/gi, "apples") // "apples is the big apples."
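Beyond plain text, the replacement can refer to captured groups with $1, $2, and so on, or be a function that computes each replacement. A brief sketch with sample strings invented for this example:

```javascript
// $1 and $2 stand for the first and second captured groups.
var swapped = "Doe, John".replace(/(\w+), (\w+)/, "$2 $1"); // "John Doe"

// A function receives each matched substring and returns its replacement.
var shouted = "foo and bar".replace(/foo|bar/g, function (word) {
  return word.toUpperCase();
}); // "FOO and BAR"

// Without the g modifier only the first occurrence changes.
var onlyFirst = "foo foo".replace(/foo/, "bar"); // "bar foo"
```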

String.split(pattern) splits the string into an array, splitting at each match of the pattern.

var split = "Foo bar is a great word".split(/\s/) // ["Foo", "bar", "is", "a", "great", "word"]
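Splitting on a pattern rather than a fixed string helps when the separators vary. The following sketch shows a few variations; the input strings are made up for illustration:

```javascript
// \s+ treats any run of spaces or tabs as a single separator.
var words = "Foo  bar\tis a   great word".split(/\s+/);
// ["Foo", "bar", "is", "a", "great", "word"]

// An optional second argument limits how many pieces are returned.
var firstTwo = "Foo bar is a great word".split(/\s/, 2); // ["Foo", "bar"]

// Separators can absorb surrounding whitespace.
var fields = "a, b ,c".split(/\s*,\s*/);                 // ["a", "b", "c"]
```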

Backreferences:

Simply put, a backreference refers to the text captured by an earlier group in the same pattern. It is written \n, where n is a positive nonzero integer telling the engine which captured group to reference. The following code will match a simple paired HTML tag, such as <b>text</b>, by requiring the closing tag name to repeat the opening one:

var tags = new RegExp(/<(\S+).*>(.*)<\/\1>/)
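As an illustration of what the backreference buys you, the sketch below applies the tag pattern to a matching and a mismatched pair of tags, then uses the same idea to spot a doubled word. The sample strings and the doubled-word pattern are additions for this example:

```javascript
var tags = /<(\S+).*>(.*)<\/\1>/;

// Group 1 is the tag name, group 2 is the content between the tags.
var hit = tags.exec("<b>bold text</b>");
// hit[1] is "b", hit[2] is "bold text"

// The closing tag must repeat whatever group 1 captured, so this fails.
var miss = tags.test("<b>bold text</i>"); // false

// The same technique finds a word that is immediately repeated.
var doubled = /\b(\w+) \1\b/.exec("this is is a test");
// doubled[0] is "is is", doubled[1] is "is"
```

A pattern this loose is fine for simple snippets, but it is not a substitute for a real HTML parser.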

Character Sets:

Character sets match any one of the characters contained in square brackets, and can be written as ranges. A ^ placed directly after the opening bracket negates the set. Examples of acceptable patterns are:

var halfchars = new RegExp(/[a-l]/)       // any one letter from a through l
var allnums = new RegExp(/[0-9]/)         // any one digit
var allcharsnonums = new RegExp(/[^0-9]/) // any one character that is not a digit
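A quick sketch of how bracketed sets behave in practice, and what happens when the brackets are left off. The test strings are invented for this example:

```javascript
var halfchars = /[a-l]/;
var allnums = /[0-9]/;
var nonNums = /[^0-9]/;

var kInRange = halfchars.test("k");    // true, k falls between a and l
var mInRange = halfchars.test("m");    // false, m is outside the range
var hasDigit = allnums.test("room 7"); // true
var hasOther = nonNums.test("12345");  // false, every character is a digit

// Without brackets the hyphen is literal: this only matches the text "a-l".
var literalOnly = /a-l/.test("apple"); // false
var literalHit = /a-l/.test("a-l");    // true
```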

Quantifiers:

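Quantifiers match the preceding subpattern a given number of times: {n} means exactly n times, {n,} at least n times, and {n,m} between n and m times, while *, +, and ? are shorthand for {0,}, {1,}, and {0,1}. Acceptable subpatterns include single characters, escaped sequences, patterns enclosed in parentheses, and character sets.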

var matches = "Foobar foofighting toooo many times with the Fockers".match(/o{1,2}/g) // ["oo", "oo", "oo", "oo", "o"]
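Quantifiers are greedy by default, meaning they take as many repetitions as they can. JavaScript has no ungreedy modifier, but adding ? after a quantifier makes that one quantifier lazy. A minimal sketch, with sample strings chosen for illustration:

```javascript
var text = "Foobar foofighting toooo many times";

// {1,2} takes two o's where it can, so "oooo" yields two separate matches.
var runs = text.match(/o{1,2}/g);          // ["oo", "oo", "oo", "oo"]

// Greedy: + grabs the whole run of o's.
var greedy = "toooo".match(/o+/)[0];       // "oooo"

// Lazy: +? stops as soon as the pattern is satisfied.
var lazy = "toooo".match(/o+?/)[0];        // "o"

// An exact count, anchored at both ends, checks a fixed-length value.
var fourDigits = /^\d{4}$/.test("2006");   // true
var fiveDigits = /^\d{4}$/.test("20060");  // false
```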

Example Regular Expression Patterns:

Trimming whitespace from the beginning and end of a string would look like the following:

var trim = new RegExp(/^\s+|\s+$/g) // g is needed so both the leading and trailing runs are matched
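To put a trimming pattern to work, it can be wrapped in a small helper that hands it to replace. This is a sketch only; the function name trimString is hypothetical, and it assumes the pattern carries the g modifier so that both ends are handled in one call:

```javascript
// Removes whitespace from both ends of str and leaves the interior untouched.
function trimString(str) {
  return str.replace(/^\s+|\s+$/g, "");
}

var cleaned = trimString("  \t foo bar \n"); // "foo bar"
var untouched = trimString("foo");           // "foo"
```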

Validating the general shape of an IP address would use the following pattern, which matches four groups of one to three digits separated by dots. Note that it does not check that each group is at most 255.

var vip = new RegExp(/\b(?:\d{1,3}\.){3}\d{1,3}\b/)
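Because a digit-count pattern alone accepts values such as 999.1.1.1, a stricter check can capture each group and compare it against 255. The helper below is one possible sketch rather than a definitive validator; the name isIPv4 and the anchored, capturing variant of the pattern are assumptions made for this example:

```javascript
// Returns true only for four dot-separated numbers, each between 0 and 255.
function isIPv4(str) {
  // Anchored variant with capturing groups so each number can be inspected.
  var m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(str);
  if (m === null) {
    return false;
  }
  for (var i = 1; i <= 4; i++) {
    if (Number(m[i]) > 255) {
      return false;
    }
  }
  return true;
}

var ok = isIPv4("192.168.0.1");    // true
var tooBig = isIPv4("999.1.1.1");  // false, 999 exceeds 255
var tooShort = isIPv4("1.2.3");    // false, only three groups
```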

Matching a date written with slashes, with a one- or two-digit month and day and a four-digit year, would look like the following:

var date = new RegExp(/(\d{1,2}\/\d{1,2}\/\d{4})/)
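To pull the individual parts out of a matched date, each number can get its own capturing group. The sketch below assumes a month/day/year order, which the pattern itself cannot tell apart from day/month/year; the function name parseDate is hypothetical:

```javascript
// Splits "M/D/YYYY" into numbers, or returns null if the shape does not fit.
// Assumption: the first number is the month and the second is the day.
function parseDate(str) {
  var m = /^(\d{1,2})\/(\d{1,2})\/(\d{4})$/.exec(str);
  if (m === null) {
    return null;
  }
  return { month: Number(m[1]), day: Number(m[2]), year: Number(m[3]) };
}

var parsed = parseDate("10/22/2006");   // { month: 10, day: 22, year: 2006 }
var rejected = parseDate("2006-10-22"); // null, wrong separator and order
```

The pattern checks shape only, so a value like 99/99/2006 would still pass and needs a separate range check.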

The following pattern loosely validates email addresses:

var email = new RegExp(/\w+@[a-zA-Z_]+?\.[a-zA-Z]{2,6}/)
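As a rough illustration of what "loosely" means here, the sketch below anchors a repaired form of the pattern with ^ and $ and tries it against a few addresses. The repaired pattern and the sample addresses are assumptions made for this example:

```javascript
// Anchored so the whole string must look like name@domain.tld.
var looseEmail = /^\w+@[a-zA-Z_]+?\.[a-zA-Z]{2,6}$/;

var simple = looseEmail.test("user@example.com");       // true
var noDomainDot = looseEmail.test("user@example");      // false, no top-level domain
var noAtSign = looseEmail.test("user.example.com");     // false, no @

// Valid addresses the loose pattern still rejects:
var dotted = looseEmail.test("first.last@example.com"); // false, dot in the local part
var subdomain = looseEmail.test("user@mail.example.com"); // false, more than one dot after @
```

A pattern like this is best treated as a quick sanity check before a stricter validation step.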