Greedy and Lazy Regular Expressions
Posted: Mon Sep 14, 2009 2:59 am
Consider the following snippet of text:
Code:
<March>, <12>, <2009>
Assume the user is trying to extract just the <March> component of this text, so they might write the following regular expression:
Code:
<.*>
But to their surprise, when the search is run it selects the entire line of text: <March>, <12>, <2009>
One way to fix this is to modify the expression so that it also specifies the text that must not be matched, as shown below:
Code:
<[^>]*>
But the reason the regular expression behaves this way is that quantifiers are greedy by default, meaning they try to match as much text as possible. Since . also matches the > character, .* runs on to the last > on the line instead of stopping at the first one.
A much simpler fix to this problem is to make the quantifier lazy by adding a question mark after it, as shown below:
Code:
<.*?>
NOTE: Lazy regular expressions are also referred to as being minimal, non-greedy, reluctant or un-greedy.
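For anyone who wants to see the three expressions side by side, here is a small sketch. It uses Python's re module, which is only an assumption on my part since nothing above depends on a particular regex engine, and the variable names are made up for illustration.

```python
import re

# The sample line from the post.
text = "<March>, <12>, <2009>"

# Greedy: .* takes as much as it can, so the match runs to the last '>'.
greedy = re.search(r"<.*>", text).group(0)

# Negated character class: [^>]* cannot cross a '>', so it stops at the first one.
negated = re.search(r"<[^>]*>", text).group(0)

# Lazy: .*? takes as little as it can, so it also stops at the first '>'.
lazy = re.search(r"<.*?>", text).group(0)

print(greedy)   # <March>, <12>, <2009>
print(negated)  # <March>
print(lazy)     # <March>

# With findall, the lazy form picks out each bracketed component in turn.
components = re.findall(r"<.*?>", text)
print(components)  # ['<March>', '<12>', '<2009>']
```

The negated character class and the lazy quantifier give the same result on this line. They can differ on other input, because a lazy .*? is still allowed to cross a > if that is the only way for the rest of the expression to match.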
Cheers Jussi