Perl/Shell Knowledge Sharing: Regex lookahead, lookbehind and atomic groups

Friday, December 31, 2010

Regex lookahead, lookbehind and atomic groups

The regex lookahead, lookbehind and atomic group constructs are:

(?!...)  - negative lookahead
(?=...)  - positive lookahead
(?<=...) - positive lookbehind
(?<!...) - negative lookbehind
(?>...)  - atomic group
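Unlike the four lookarounds, an atomic group does consume text; what it refuses to do is backtrack into what it has already matched. A minimal sketch of that difference follows. The patterns, the test strings and the helper name matches are chosen here for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $str matches $re anywhere, 0 otherwise.
sub matches {
    my ($str, $re) = @_;
    return $str =~ /$re/ ? 1 : 0;
}

my $atomic     = 'a(?>bc|b)c';   # atomic: once "bc" is taken, "b" is never retried
my $non_atomic = 'a(?:bc|b)c';   # ordinary group: free to backtrack and try "b"

for my $str ('abcc', 'abc') {
    printf "%-5s atomic=%d non-atomic=%d\n",
        $str, matches($str, $atomic), matches($str, $non_atomic);
}
```

Both patterns match abcc. Only the non-atomic one matches abc, because the atomic group has already committed to bc and leaves nothing for the final c.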

EX 1:

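Given the string foobarbarfoo:

bar(?!bar) finds the second bar in the string (the one not followed by another bar).
bar(?=bar) finds the first bar in the string (the one followed by another bar).
(?<=foo)bar finds the first bar in the string (the one preceded by foo).
(?<!foo)bar finds the second bar in the string (the one not preceded by foo).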

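You can also combine them. The following matches a bar that is preceded by foo and followed by another bar, which is the first bar in the string: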

(?<=foo)bar(?=bar)
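One way to check the claims in EX 1 is to print the offset where each pattern first matches. This is only a sketch: the helper name match_offset is made up for illustration, and it relies on Perl's @- array, which holds the start offset of the last successful match.

```perl
use strict;
use warnings;

# Hypothetical helper: offset of the first match of $re in $str, or -1 if none.
sub match_offset {
    my ($str, $re) = @_;
    return $str =~ /$re/ ? $-[0] : -1;
}

my $str = 'foobarbarfoo';   # the first "bar" starts at offset 3, the second at offset 6

for my $re ('bar(?!bar)', 'bar(?=bar)', '(?<=foo)bar', '(?<!foo)bar', '(?<=foo)bar(?=bar)') {
    printf "%-20s => offset %d\n", $re, match_offset($str, $re);
}
```

The patterns said to find the first bar should report offset 3, and those said to find the second bar should report offset 6.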

EX 2:

Check, without consuming anything, that the next 5 characters are followed by a whitespace character and then a non-whitespace character:

(?=.{5}\s\S)
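A small sketch of EX 2 in use. It assumes the check should apply from the start of the string, so a ^ anchor is added that the pattern above does not have; the helper name five_then_gap is also invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if, counting from the start of $str, five
# characters are followed by a whitespace character and then a
# non-whitespace character. The leading ^ is an assumption added here.
sub five_then_gap {
    my ($str) = @_;
    return $str =~ /^(?=.{5}\s\S)/ ? 1 : 0;
}

for my $str ('hello world', 'hi there') {
    printf "%-12s => %d\n", $str, five_then_gap($str);
}

# A lookahead is zero-width: the match below has the same start and end offset.
if ('hello world' =~ /(?=.{5}\s\S)/) {
    printf "matched %d characters at offset %d\n", $+[0] - $-[0], $-[0];
}
```

Because the lookahead only inspects the text, anything placed after it in a larger pattern starts matching from the same position.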

EX 3:

^(?=.{3}$).*
^        # The caret is an anchor which matches at the start of the string
(?=      # start of positive lookahead
   .     # wildcard match; the . matches any non-new-line character
    {3}  # quantifier; exactly 3 times
   $     # dollar sign; it still acts as an anchor inside a lookahead and matches at the end of the string (or just before a final newline)
)        # end of lookahead
.        # wildcard match; the . matches any non-new-line character
 *       # quantifier; any number of times, including 0 times

Taken together, the pattern matches only a string that is exactly 3 characters long (a trailing newline is allowed).
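The effect of EX 3 can be sketched as a length check. The helper name is_three_chars and the sample strings are assumptions made for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $str is exactly 3 non-newline characters
# long (optionally followed by one trailing newline), 0 otherwise.
sub is_three_chars {
    my ($str) = @_;
    return $str =~ /^(?=.{3}$).*/ ? 1 : 0;
}

for my $str ('ab', 'abc', 'abcd') {
    printf "%-5s => %d\n", $str, is_three_chars($str);
}
```

Only abc passes: ab is too short for .{3}, and in abcd the $ inside the lookahead fails because a fourth character follows.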

EX 4:

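# Replace whatever sits between <no> and </no>; the lookarounds keep the tags themselves out of the match.
my $text = "<no> 3232 </no> ";
$text =~ s#(?<=<no>).*?(?=</no>)# 000 #gi;
print "$text\n";    # prints "<no> 000 </no> "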

EX 5:

perl -pe 's/(.)(?=.*?\1)//g' FILE_NAME

The regex used is: (.)(?=.*?\1)

. : matches any single character except a newline.
first () : remembers the matched single character.
(?=...) : positive lookahead.
.*? : matches anything in between.
\1 : the remembered match.
(.)(?=.*?\1) : matches and remembers any character only if it appears again later in the string.
s/// : the Perl substitution operator.
g : do the substitution globally, that is, don't stop after the first substitution.
s/(.)(?=.*?\1)//g : deletes a character from the input string only if that character appears again later in the string, so only the last occurrence of each character on a line survives.
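The same substitution can be tried on a few strings without going through a file. This is a sketch only; the helper name drop_repeated_chars and the sample words are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a copy of $line in which every character that
# appears again later has been deleted, so only the last occurrence of each
# character remains.
sub drop_repeated_chars {
    my ($line) = @_;
    (my $out = $line) =~ s/(.)(?=.*?\1)//g;   # work on a copy, leave $line intact
    return $out;
}

for my $word ('hello', 'banana', 'abc') {
    printf "%-7s => %s\n", $word, drop_repeated_chars($word);
}
```

For example, hello becomes helo: the first l is removed because another l follows it, while the second l has no later duplicate and stays.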
