
Greedy and Lazy Regular Expressions

Posted: Mon Sep 14, 2009 2:59 am
by jussij
Consider the following snippet of text:

<March>, <12>, <2009>
Assume the user is trying to extract the <March> component of this text, so they might write the following regular expression:

<.*>
But, to their surprise, when the search is run it matches the entire line of text: <March>, <12>, <2009>
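A minimal sketch that reproduces the surprise, assuming Python's re module (the post does not name a particular tool or language):

```python
import re

text = "<March>, <12>, <2009>"

# Greedy: .* first consumes the rest of the line, then backtracks only
# far enough to find a ">", which turns out to be the LAST one on the line.
greedy = re.search(r"<.*>", text)
print(greedy.group(0))  # <March>, <12>, <2009>
```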

One way to fix this is to modify the expression so that it explicitly excludes the closing > character from the text being matched:

<[^>]*>
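The original <.*> expression behaves the way it does because quantifiers such as * are greedy by default: they match as much text as possible while still letting the overall match succeed. Here, .* runs past the first > and backtracks only as far as the last > on the line.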
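A minimal sketch of the negated character class fix, again assuming Python's re module (the post names no particular tool):

```python
import re

text = "<March>, <12>, <2009>"

# [^>]* can never step over a ">", so each match is forced to stop at
# the first closing bracket after its opening "<".
first = re.search(r"<[^>]*>", text)
print(first.group(0))  # <March>

# findall returns every non-overlapping match, one per bracketed field.
fields = re.findall(r"<[^>]*>", text)
print(fields)  # ['<March>', '<12>', '<2009>']
```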

A simpler fix is to make the * quantifier lazy by adding a question mark after it, as shown below:

<.*?>
NOTE: Lazy regular expressions are also referred to as being minimal, non-greedy, reluctant or un-greedy.
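A minimal sketch of the lazy version, assuming Python's re module, which supports the Perl-style *? syntax (the post does not say which tool it has in mind, and not every regex flavour supports lazy quantifiers):

```python
import re

text = "<March>, <12>, <2009>"

# .*? is the lazy form of .*: it starts by matching nothing and expands
# one character at a time until the following ">" can match.
lazy = re.search(r"<.*?>", text)
print(lazy.group(0))  # <March>

# Adding a capture group returns just the text between the brackets.
values = re.findall(r"<(.*?)>", text)
print(values)  # ['March', '12', '2009']
```

The two fixes are not identical in every case: in Python, . does not match a newline unless re.DOTALL is set, whereas [^>] does, so they can differ when a bracketed field spans lines.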

Cheers Jussi

Greedy

Posted: Thu Sep 24, 2009 9:51 pm
by AlanStewart
That's a very handy thing to know! Thank you! I've been struggling with this problem ever since I had to give up using Brief!