Friday, December 31, 2010

Regex lookahead, lookbehind and atomic groups

The regex lookahead, lookbehind and atomic group constructs are:

(?!) - negative lookahead
(?=) - positive lookahead
(?<=) - positive lookbehind
(?<!) - negative lookbehind
(?>) - atomic group
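
The list names the atomic group, but none of the examples below use one. Here is a minimal sketch of how it differs from an ordinary non-capturing group. The test string and the variable names ($plain, $atomic) are made up for illustration and are not from the original post.

```perl
use strict;
use warnings;

# Ordinary group: after "bar" fails to reach the end of the string,
# the engine backtracks into the group and retries with "barbar".
my $plain  = "foobarbar" =~ /^foo(?:bar|barbar)$/ ? "match" : "no match";

# Atomic group: once the group has matched "bar", its remaining
# alternatives are thrown away, so "barbar" is never tried.
my $atomic = "foobarbar" =~ /^foo(?>bar|barbar)$/ ? "match" : "no match";

print "plain:  $plain\n";    # plain:  match
print "atomic: $atomic\n";   # atomic: no match
```

In other words, (?>...) matches like a normal group but refuses to give anything back once it has matched.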

 
 
EX 1:

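Given the string foobarbarfoo:

bar(?!bar) finds the second bar in the string (the bar that is not followed by another bar).
bar(?=bar) finds the first bar in the string (the bar that is followed by another bar).
(?<=foo)bar finds the first bar in the string (the bar that is preceded by foo).
(?<!foo)bar finds the second bar in the string (the bar that is not preceded by foo).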
 
You can also combine them. This finds the first bar, because it is both preceded by foo and followed by bar:
 
(?<=foo)bar(?=bar)
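
To check the claims in EX 1, here is a small sketch that prints where each pattern matches. In foobarbarfoo the first bar starts at offset 3 and the second at offset 6. The hash name %start is just an illustrative choice.

```perl
use strict;
use warnings;

my $s = "foobarbarfoo";   # first "bar" at offset 3, second at offset 6

my %start;
for my $pattern ('bar(?!bar)', 'bar(?=bar)', '(?<=foo)bar',
                 '(?<!foo)bar', '(?<=foo)bar(?=bar)') {
    # $-[0] holds the start offset of the last successful match
    $start{$pattern} = $s =~ /$pattern/ ? $-[0] : -1;
    printf "%-20s starts at offset %d\n", $pattern, $start{$pattern};
}
# bar(?!bar)          -> 6 (second bar)
# bar(?=bar)          -> 3 (first bar)
# (?<=foo)bar         -> 3 (first bar)
# (?<!foo)bar         -> 6 (second bar)
# (?<=foo)bar(?=bar)  -> 3 (first bar)
```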
 
 
EX 2:
Check for 5 characters, then a whitespace character, then a non-whitespace character, without consuming any of them:
(?=.{5}\s\S)
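
A quick sketch of this check in use. One assumption: the pattern is anchored with ^ so the test applies to the start of the string. The pattern in the post is unanchored and would be tried at every position. The sample strings are made up.

```perl
use strict;
use warnings;

# Assumption: ^ is added here; the original pattern has no anchor.
my $check = qr/^(?=.{5}\s\S)/;

my %result;
for my $s ("hello world", "hello  world", "hi there") {
    $result{$s} = $s =~ $check ? 1 : 0;
    print "'$s': ", ($result{$s} ? "passes" : "fails"), "\n";
}
# 'hello world'  passes (5 chars, one space, then "w")
# 'hello  world' fails  (the 7th char is a second space)
# 'hi there'     fails  (the 6th char is "e", not whitespace)

# The lookahead is zero-width: the match itself is empty.
"hello world" =~ $check;
my $matched_length = $+[0] - $-[0];
print "matched length: $matched_length\n";   # matched length: 0
```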
 
EX 3:
^(?=.{3}$).*
^        # The caret is an anchor which denotes "STARTS WITH"
(?=      # lookahead
   .     # wildcard match; the . matches any non-new-line character
    {3}  # quantifier; exactly 3 times
   $     # dollar sign; it still acts as an anchor inside the lookahead and means "THE END" (or just before a trailing newline)
)        # end of lookahead
.        # wildcard match; the . matches any non-new-line character
 *       # quantifier; any number of times, including 0 times
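
Put together, the lookahead only succeeds when exactly 3 characters sit between the start and the end, and the .* then consumes them. A small sketch with made-up sample strings:

```perl
use strict;
use warnings;

my $exactly_three = qr/^(?=.{3}$).*/;

my %matched;
for my $s ("abc", "ab", "abcd") {
    # $& is the text consumed by .* once the lookahead has succeeded
    $matched{$s} = $s =~ $exactly_three ? $& : undef;
    print "'$s' => ",
          (defined $matched{$s} ? "matched '$matched{$s}'" : "no match"),
          "\n";
}
# 'abc'  => matched 'abc'
# 'ab'   => no match
# 'abcd' => no match
```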

 
EX 4:
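# Replace whatever sits between <no> and </no>, keeping the tags themselves.
$a = "<no> 3232 </no> ";
$a =~ s#(?<=<no>).*?(?=</no>)# 000 #gi;
print "$a\n";   # prints "<no> 000 </no> "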

 
EX 5:
perl -pe 's/(.)(?=.*?\1)//g' FILE_NAME
The regex used is: (.)(?=.*?\1)
  • . : to match any char.
  • first () : remember the matched single char.
  • (?=...) : positive lookahead
  • .*? : to match anything in between
  • \1 : the remembered match.
  • (.)(?=.*?\1) : match and remember any char only if it appears again later in the string.
  • s/// : Perl way of doing the substitution.
  • g : do the substitution globally, that is, don't stop after the first substitution.
  • s/(.)(?=.*?\1)//g : this will delete a char from the input string only if that char appears again later in the string, so only the last occurrence of each char survives.
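
The same substitution as a small script, for trying it on a single string instead of a file. The sample word and the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $line = "abracadabra";

# Copy first, then substitute on the copy so $line stays untouched.
(my $deduped = $line) =~ s/(.)(?=.*?\1)//g;

print "$deduped\n";   # cdbra
```

Every a, b and r except the last of each is removed, while c and d occur only once and are kept.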
 
 

 
 
