Mining information from text using regular expressions

Here are some regular expressions for extracting common pieces of information such as URLs, phone numbers, email addresses, prices, dates and times.

To see them in action, press the 'Find Data' button below: the matching text will be turned into links or highlighted by my open-source JavaScript library, which applies the regular expressions to the whole document.

You can also enter your own text to search in the box at the bottom. For more information, see http://petewarden.typepad.com


Phone numbers

([0-9]{3})[^0-9]*([0-9]{3})[^0-9]*([0-9]{4})
805 277 3606, 805-277-3606, (805) 277 3606, 8052773606
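
As a rough sketch of how this pattern could be applied outside the browser, the snippet below uses Python's standard re module. The function name find_phone_numbers and the dash-joined output format are assumptions made for illustration and are not part of the original library.

```python
import re

# Pattern copied from above: three digit groups (3, 3 and 4 digits)
# separated by any run of non-digit characters.
PHONE_RE = re.compile(r"([0-9]{3})[^0-9]*([0-9]{3})[^0-9]*([0-9]{4})")


def find_phone_numbers(text):
    """Return every match, normalised to the form 805-277-3606 (hypothetical helper)."""
    return ["-".join(m.groups()) for m in PHONE_RE.finditer(text)]


sample = "805 277 3606, 805-277-3606, (805) 277 3606, 8052773606"
print(find_phone_numbers(sample))
# → ['805-277-3606', '805-277-3606', '805-277-3606', '805-277-3606']
```

The pattern is unanchored, so it will also fire on any ten digits inside a longer number; a stricter variant would need boundaries around it.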


Email addresses

[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,4}
pete@petewarden.com
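
A minimal Python sketch of using this pattern follows. The character classes list only lowercase letters, so the example assumes case-insensitive matching is intended; that flag, the function name find_emails and the sample sentence are assumptions for illustration.

```python
import re

# Pattern copied from above. re.IGNORECASE is an assumption: the pattern
# itself only lists lowercase letters.
EMAIL_RE = re.compile(r"[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,4}", re.IGNORECASE)


def find_emails(text):
    """Return every substring that looks like an email address (hypothetical helper)."""
    return EMAIL_RE.findall(text)


print(find_emails("Contact pete@petewarden.com or Pete@PeteWarden.com for details."))
# → ['pete@petewarden.com', 'Pete@PeteWarden.com']
```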


URLs

https?://[a-z0-9\./%+_\-\?=&#]+
[a-z0-9\./%+_-]+\.[a-z]{2,4}[a-z0-9\./%+_\-\?=&#]*
http://foo.com, https://foo.com, http://foo.com/index.html, http://sub.foo.com/directory/index.html, foo.com/index.html, foo.com/search?query=foo&lang=en#17
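
One possible way to use the two URL patterns together is to join them with an alternation, trying the form with a scheme first. The sketch below does that in Python, with each backslash written once, as a regex engine expects. The combined pattern, the case-insensitive flag and the name find_urls are assumptions for illustration.

```python
import re

# First pattern: URLs that start with http:// or https://.
URL_WITH_SCHEME = r"https?://[a-z0-9\./%+_\-\?=&#]+"
# Second pattern: bare host names such as foo.com/index.html.
BARE_URL = r"[a-z0-9\./%+_-]+\.[a-z]{2,4}[a-z0-9\./%+_\-\?=&#]*"

# Assumption: try the scheme form first, then fall back to the bare form.
URL_RE = re.compile(URL_WITH_SCHEME + "|" + BARE_URL, re.IGNORECASE)


def find_urls(text):
    """Return every URL-like substring in text (hypothetical helper)."""
    return URL_RE.findall(text)


sample = ("http://foo.com, https://foo.com, http://foo.com/index.html, "
          "http://sub.foo.com/directory/index.html, foo.com/index.html, "
          "foo.com/search?query=foo&lang=en#17")
for url in find_urls(sample):
    print(url)
```

Because "." is one of the allowed characters, a URL at the end of a sentence will pick up the trailing full stop, so callers may want to strip it.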


Dollar amounts

\$\s?[0-9,]+(\.[0-9]{1,2})?(\s(thousand|m[^a-z]|mm[^a-z]|million|b[^a-z]|billion))?
$10, $10.99, $10,000, $10 thousand, $10 mm, $10 million, $99 billion, $99 b, $ 10 thousand!
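
The pattern finds the amount but does not turn it into a number. The Python sketch below shows one way that step might look. The MULTIPLIERS table (which reads both "m" and "mm" as millions), the case-insensitive flag and the function name find_dollar_amounts are assumptions for illustration.

```python
import re

# Pattern copied from above, split over two lines for readability.
DOLLAR_RE = re.compile(
    r"\$\s?[0-9,]+(\.[0-9]{1,2})?"
    r"(\s(thousand|m[^a-z]|mm[^a-z]|million|b[^a-z]|billion))?",
    re.IGNORECASE,
)

# Hypothetical lookup table: "m" and "mm" are both read as millions.
MULTIPLIERS = {"thousand": 10**3, "m": 10**6, "mm": 10**6,
               "million": 10**6, "b": 10**9, "billion": 10**9}


def find_dollar_amounts(text):
    """Return (matched text, numeric value) pairs (hypothetical helper)."""
    results = []
    for m in DOLLAR_RE.finditer(text):
        # Group 2 is the optional " <unit>" suffix; the number sits before it.
        number_end = m.start(2) if m.group(2) else m.end()
        number = text[m.start() + 1:number_end].replace(",", "").strip()
        if not number:
            continue  # e.g. "$," matches the pattern but carries no digits
        # The short units capture one trailing non-letter, so strip it.
        unit = re.sub(r"[^a-z]", "", (m.group(3) or "").lower())
        results.append((m.group(0), float(number) * MULTIPLIERS.get(unit, 1)))
    return results


text = "Revenue grew from $10,000 to $1.5 million, and the fund holds $99 billion."
print(find_dollar_amounts(text))
# → [('$10,000', 10000.0), ('$1.5 million', 1500000.0), ('$99 billion', 99000000000.0)]
```

Note that the short units (m, mm, b) are written as a letter plus one non-letter character, so a match such as "$10 mm," includes the character after the unit.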


Times

[012]?[0-9]:[0-5][0-9]((\.|:)[0-5][0-9])?(\s?(a|p)m)?
10:30, 10:30.48, 6:30, 11:30pm, 10:40:36 am
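
A short Python sketch of running this pattern over the example line is shown below. The pattern contains capturing groups, so the code reads the whole match rather than the groups. The case-insensitive flag and the name find_times are assumptions for illustration.

```python
import re

# Pattern copied from above: hour, minutes, optional seconds, optional am/pm.
TIME_RE = re.compile(
    r"[012]?[0-9]:[0-5][0-9]((\.|:)[0-5][0-9])?(\s?(a|p)m)?",
    re.IGNORECASE,  # assumption, so that "PM" is accepted as well as "pm"
)


def find_times(text):
    """Return each full time match as a string (hypothetical helper)."""
    return [m.group(0) for m in TIME_RE.finditer(text)]


sample = "10:30, 10:30.48, 6:30, 11:30pm, 10:40:36 am"
print(find_times(sample))
# → ['10:30', '10:30.48', '6:30', '11:30pm', '10:40:36 am']
```

The hour part, [012]?[0-9], accepts values up to 29, so results may need a range check before they are treated as valid times.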


Dates

(January|Jan|February|Feb|March|Mar|April|Apr|May|June|Jun|July|Jul|August|Aug|September|Sept|October|Oct|November|Nov|December|Dec)[^0-9a-z]+([0-9]{1,2})(st|nd|rd|th)?[^0-9a-z]+((19|20)?[0-9]{2})
([0-9]{1,2})[/-]([0-9]{1,2})[/-]((19|20)?[0-9]{1,2})
June 1st, 2008, Jun 1 2008, 6/1/08, 6/1/2008
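
The two date patterns capture the month, day and year as separate groups, which makes it possible to normalise every match to one form. The Python sketch below is one way to do that, under three loudly stated assumptions: numeric dates are month/day/year, years written with one or two digits belong to the 2000s, and month names are matched case-insensitively. The helper names to_date and find_dates are hypothetical.

```python
import re
from datetime import date

MONTHS = ("January|Jan|February|Feb|March|Mar|April|Apr|May|June|Jun|July|Jul|"
          "August|Aug|September|Sept|October|Oct|November|Nov|December|Dec")
# Both patterns are copied from above.
NAMED_DATE_RE = re.compile(
    r"(" + MONTHS + r")[^0-9a-z]+([0-9]{1,2})(st|nd|rd|th)?[^0-9a-z]+((19|20)?[0-9]{2})",
    re.IGNORECASE,
)
NUMERIC_DATE_RE = re.compile(r"([0-9]{1,2})[/-]([0-9]{1,2})[/-]((19|20)?[0-9]{1,2})")

# First three letters of each month name -> month number.
MONTH_NUMBERS = {name: number for number, name in enumerate(
    "jan feb mar apr may jun jul aug sep oct nov dec".split(), start=1)}


def to_date(year_text, month, day):
    """Build a date, or return None if the numbers do not form a real date."""
    year = int(year_text)
    if len(year_text) <= 2:
        year += 2000  # assumption: short years belong to the 2000s
    try:
        return date(year, month, day)
    except ValueError:
        return None


def find_dates(text):
    """Return all recognised dates: named-month matches first, then numeric ones."""
    found = []
    for m in NAMED_DATE_RE.finditer(text):
        month = MONTH_NUMBERS[m.group(1)[:3].lower()]
        found.append(to_date(m.group(4), month, int(m.group(2))))
    for m in NUMERIC_DATE_RE.finditer(text):
        # Assumption: numeric dates are written month/day/year.
        found.append(to_date(m.group(3), int(m.group(1)), int(m.group(2))))
    return [d for d in found if d is not None]


print(find_dates("June 1st, 2008, Jun 1 2008, 6/1/08, 6/1/2008"))
```

All four example strings should come out as the same date, 1 June 2008, under the assumptions above.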


Enter your own text