Sunday, January 9, 2011

Greedy and Ungreedy Matching

Perl regular expression quantifiers are greedy by default: they match as much text as they can while still letting the overall pattern succeed. For instance:
my $text = "mississippi";
# Greedy: .* grabs as much as it can, so the match runs to the last "s".
$text =~ m/(i.*s)/;
print "$1\n";
 
Run the preceding code, and here's what you get:
ississ
 
It matches the first i, the last s, and everything in between them. But what if you want to match the first i to the s most closely following it? Use this code:
my $text = "mississippi";
# Ungreedy: .*? grabs as little as it can, so the match stops at the next "s".
$text =~ m/(i.*?s)/;
print "$1\n";
 
Now look what the code produces:
is
 
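Clearly, the question mark after the quantifier makes the match ungreedy. But there's another problem: regular expressions always try to match as early in the string as possible, so even an ungreedy match starts at the leftmost position where a match can begin.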
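
Here is a small sketch of that leftmost-match behavior, using the same string. The pattern and the first_capture helper are my own illustration (hypothetical names, not from any library), and the negated character class is just one possible workaround:

```perl
use strict;
use warnings;

# Hypothetical helper: return the first capture group, or undef if no match.
sub first_capture {
    my ($text, $pattern) = @_;
    return $text =~ $pattern ? $1 : undef;
}

my $text = "mississippi";

# Ungreedy, but the match still starts at the leftmost "s" that can work.
my $lazy = first_capture($text, qr/(s.*?p)/);

# Keeping "s" out of the middle means no earlier "s" can start the match.
my $tight = first_capture($text, qr/(s[^s]*?p)/);

print "$lazy\n";   # ssissip
print "$tight\n";  # sip
```

The ungreedy quantifier only controls where the match ends. To control where it starts, you have to restrict what the middle of the pattern is allowed to match.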
