An Article from Aaron's Article Archive
Ruby Regexp Class Oddness
Wednesday, 19 December 2007 9:52 AM MST
This morning, while working on an email log processing Ruby script, I discovered this bit of strangeness with Ruby's Regexp regular expression class.
It appears that the in-group match modifier (?i), which should enable case insensitivity within the group, correctly modifies the A option but does something strange to any subsequent options: it apparently inverts the case of the regular expression, so that the string "B" doesn't match the regular expression character "B".
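The exact pattern from my session isn't shown here, so the following is only a minimal sketch of the kind of probe that exposes the behavior. The pattern `/\A((?i)A|B)\z/`, the anchors, and the helper name `probe` are all assumptions made for illustration. It simply reports, for each input, whether the pattern matches on whatever interpreter runs it.

```ruby
# Hypothetical pattern: (?i) sits at the start of a group with two options.
# The \A and \z anchors are an assumption, added so the whole string must match.
PATTERN = /\A((?i)A|B)\z/

# Returns [input, matched?] pairs for each candidate string.
# Written without each_with_object so it also runs on Ruby 1.8.
def probe(pattern, inputs)
  inputs.map { |s| [s, !(pattern =~ s).nil?] }
end

probe(PATTERN, %w[A a B b]).each do |input, matched|
  puts "#{input.inspect}: #{matched ? 'match' : 'no match'}"
end
```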
Here's the same thing with a lowercase regex:
There is no odd inversion in this case. String character "B" doesn't match regex character "b" -- leading me to conclude that the (?i) modifier only applies within the group's first option. That would be perfectly fine, expected behavior if, when I use uppercase in the regex, the (?i) modifier didn't also act on the subsequent options and effectively invert them.
It gets worse:
In the above, the first match of string "bb" with regex group option "Bb" makes it seem that the (?i) modifier is acting correctly on both the first group option AND the second group option. However, the failure of all other capitalizations of "bb" means there's something screwy.
My conclusion? Don't use (?ix-ix) modifiers (or at least (?i)) inside groups with multiple options separated by the bar/pipe "|" character, unless you stick the modifier in each and every option, or unless you move it outside the group by enclosing the whole group in a non-capturing group that carries the modifier, thus:
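As a hedged sketch of both workarounds (the patterns, the anchors, and the names `PER_ALTERNATIVE`, `WRAPPED`, and `all_match?` are illustrative assumptions, not the original listing):

```ruby
# Workaround 1: repeat the modifier at the start of every option.
PER_ALTERNATIVE = /\A((?i)A|(?i)Bb)\z/

# Workaround 2: wrap the capturing group in a non-capturing group
# that carries the modifier, so it is scoped over every option.
WRAPPED = /\A(?i:(A|Bb))\z/

# Every capitalization of the two options.
INPUTS = %w[A a BB Bb bB bb]

# True only if the pattern matches every input string.
def all_match?(pattern, inputs)
  inputs.all? { |s| pattern =~ s }
end

puts "per-option modifier: #{all_match?(PER_ALTERNATIVE, INPUTS)}"
puts "wrapped group:       #{all_match?(WRAPPED, INPUTS)}"
```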
Now THAT works as expected.
NOTE: I'm running Ruby version 1.8.6 on Mac OS X Leopard. However, I get the same results with 1.8.6 under FreeBSD too.
Update (29 April 2009):
I'm now running Ruby 1.9.1 on Mac OS X and it looks like things are fixed:
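One way to check a given interpreter is to compare the in-group modifier against the explicitly scoped form and list any inputs where the two disagree. This is only a sketch under assumptions: the patterns and the names `INLINE_FORM`, `WRAPPED_FORM`, and `disagreements` are hypothetical, and what it reports depends on the Ruby version that runs it.

```ruby
# Hypothetical patterns for comparison.
INLINE_FORM  = /\A((?i)A|Bb)\z/    # modifier only at the start of the group
WRAPPED_FORM = /\A(?i:(A|Bb))\z/   # modifier scoped over the whole group

# Returns the inputs on which the two patterns give different answers.
def disagreements(first, second, inputs)
  inputs.reject { |s| (first =~ s).nil? == (second =~ s).nil? }
end

candidates = %w[A a BB Bb bB bb]
diff = disagreements(INLINE_FORM, WRAPPED_FORM, candidates)

if diff.empty?
  puts "Ruby #{RUBY_VERSION}: both forms agree on every input"
else
  puts "Ruby #{RUBY_VERSION}: forms disagree on #{diff.inspect}"
end
```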