The Regexp class
A regular expression instance can be created in three ways:
/R..y/
Regexp.new("R..y")
%r(R..y)
irb(main):001:0> /R..y/
=> /R..y/
irb(main):002:0> /R..y/ =~ "Ruby"
=> 0
irb(main):003:0> Regexp.new("R..y") =~ "Ruby"
=> 0
irb(main):004:0> %r(R..y) =~ "Ruby"
=> 0
irb(main):005:0> %r(R..y) == Regexp.new("R..y")
=> true
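Patterns and matching
Use =~ to match. On success it returns the index at which the match starts; on failure it returns nil. Only nil and false are falsy in Ruby, so the result can be used directly as a condition (an index of 0 still counts as true). !~ tests that a pattern does not match; I expect it is used less often.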
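Because a successful match returns an index and a failed one returns nil, the result of =~ can drive a conditional directly. The following is a minimal sketch; the helper name describe_match is made up for illustration.

```ruby
# Hypothetical helper: reports where (or whether) a pattern matches.
def describe_match(pattern, text)
  pos = pattern =~ text          # index of the match start, or nil
  if pos
    "matched at index #{pos}"
  else
    "no match"
  end
end

puts describe_match(/R..y/, "Ruby")         # the match starts at 0, and 0 is truthy
puts describe_match(/R..y/, "I like Ruby")
puts describe_match(/R..y/, "Perl")
puts(/R..y/ !~ "Perl")                      # !~ is true when nothing matches
```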
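Matching ordinary characters
When a pattern contains only letters and digits, the regular expression simply checks whether the target string contains that sequence of characters.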
irb(main):007:0> /ABC/ =~ "123Abc"
=> nil
irb(main):008:0> /ABC/ =~ "123ABC"
=> 3
irb(main):009:0> /ABC/i =~ "123Abc"
=> 3
Characters such as ^ and $, which carry a special meaning instead of matching themselves, are called metacharacters.
irb(main):010:0> /^123$/ =~ "123bbb\n123ccc"
=> nil
irb(main):011:0> /^123/ =~ "123bbb\n123ccc"
=> 0
irb(main):012:0> /^123/ =~ "12bbb\n123ccc"
=> 6
irb(main):013:0> /^123$/ =~ "123"
=> 0
irb(main):014:0> /bb$/ =~ "12bbb\n123ccc"
=> 3
irb(main):015:0> /$/ =~ "12bbb\n123ccc"
=> 5
irb(main):016:0> /c$/ =~ "12bbb\n123ccc"
=> 11
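^ and $ match the beginning and end of a line, not of the whole string. To match the beginning and end of the string, use \A and \z.
irb(main):017:0> p "a\n".gsub(/\z/, "!")
"a\n!"
=> "a\n!"
irb(main):018:0> p "a\n".gsub(/\Z/, "!")
"a!\n!"
=> "a!\n!"
irb(main):019:0> p "a".gsub(/\Z/, "!")
"a!"
=> "a!"
irb(main):020:0> p "a".gsub(/\z/, "!")
"a!"
=> "a!"
The uppercase \Z matches in two places when the string ends with \n: once just before the \n, and once at the very end, after the \n.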
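The difference between the line anchors and the string anchors is easiest to see on a string with more than one line. A small sketch, with a hypothetical helper name and made-up input:

```ruby
# Hypothetical helper: shows where each kind of anchor matches in a string.
def anchor_positions(text)
  {
    line_start:           text =~ /^b/,    # ^ matches after any newline
    string_start:         text =~ /\Ab/,   # \A matches only at index 0
    line_end:             text =~ /a$/,    # $ matches before any newline
    string_end:           text =~ /a\z/,   # \z matches only at the very end
    before_final_newline: text =~ /b\Z/    # \Z tolerates one trailing newline
  }
end

p anchor_positions("a\nb\n")
```

For the input "a\nb\n", the two line anchors find their match, \A and \z do not, and \Z matches at the b because only a final newline follows it.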
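Specifying a range of characters to match
Use [] for this. [abc] matches any one of a, b or c; [a-z] matches any lowercase letter and [0-9] matches any digit. To match a literal -, write it first or last inside the brackets, as in [-a-z] or [a-z-].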
irb(main):024:0> /[a-z-]/ =~ "-"
=> 0
irb(main):025:0> /[-a-z]/ =~ "-"
=> 0
irb(main):026:0> /[-a-z]/ =~ "ha"
=> 0
irb(main):027:0> /[-a-z]/ =~ "123ha"
=> 3
[^a-z] matches any single character other than a through z.
irb(main):028:0> /[^A-Z][A-Z]/ =~ "1A2b3c"
=> 0
irb(main):029:0> /[^0-9][^A-z]/ =~ "1A2b3c4D"
=> 1
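One detail of the last example is worth checking: a range such as A-z follows character codes, so in ASCII it also covers the six punctuation characters that sit between Z and a. The sketch below lists them; the helper name is hypothetical.

```ruby
# Hypothetical helper: visible ASCII characters that a class accepts
# even though they are not letters.
def non_letters_matched_by(char_class)
  printable = (33..126).map(&:chr)   # visible ASCII characters
  printable.select { |c| c =~ char_class && c !~ /[A-Za-z]/ }
end

p non_letters_matched_by(/[A-z]/)      # the characters between "Z" and "a"
p non_letters_matched_by(/[A-Za-z]/)   # an empty array
```

Writing [A-Za-z] is therefore the safer choice when only letters are meant.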
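Matching any character
A single . matches any one character.
irb(main):032:0> /^...$/ =~ "123"
=> 0
irb(main):033:0> /^...$/ =~ "1234"
=> nil
irb(main):034:0> /^...$/ =~ "12"
=> nil
In other words, this pattern matches a line of exactly the given length (three characters here).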
Patterns that use a backslash
\s stands for whitespace: it matches characters such as a space, a tab or a form feed.
irb(main):035:0> /abc\s123/ =~ "abc 1"
=> nil
irb(main):036:0> /abc\s123/ =~ "abc 1234"
=> 0
irb(main):037:0> /abc\s123/ =~ "abc 1234"
=> 0
irb(main):038:0> /abc\s123/ =~ "abc1234"
=> nil
\d matches a digit from 0 to 9, the same as [0-9].
irb(main):039:0> /\d\d-\d\d/ =~ "12-456"
=> 0
irb(main):040:0> /\d\d-\d\d/ =~ "aa12-456"
=> 2
irb(main):041:0> /\d\d-\d\d/ =~ "aa1a2-456"
=> nil
\w matches letters, digits and the underscore, the same as [a-zA-Z0-9_].
irb(main):042:0> /\w\w\w/ =~ "23dd"
=> 0
irb(main):043:0> /\w\w\w/ =~ " 23dd"
=> 1
irb(main):044:0> /\w\w\w/ =~ "   23dd"
=> 3
irb(main):045:0> /\w\w\w/ =~ " 23 dd"
=> nil
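To see the three shorthand classes side by side, here is a small sketch that sorts each character of a string into one of them. The helper name classify and the sample string are made up for illustration.

```ruby
# Hypothetical helper: names the shorthand class a single character falls into.
def classify(char)
  case char
  when /\d/ then :digit   # checked first, because digits also match \w
  when /\w/ then :word    # letters and the underscore
  when /\s/ then :space   # space, tab, newline, ...
  else           :other
  end
end

p "a_1 -\t".chars.map { |c| classify(c) }
```

The underscore lands in the :word group, while the hyphen matches none of the three classes.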
\A and \z: the first matches the start of the string, the second matches the end of the string.
irb(main):046:0> /\AABC/ =~ "ABC"
=> 0
irb(main):047:0> /\AABC/ =~ "ABCdf"
=> 0
irb(main):048:0> /\AABC/ =~ "123\nABCdf"
=> nil
\z
irb(main):049:0> /ABC\z/ =~ "ABC"
=> 0
irb(main):050:0> /ABC\z/ =~ "123ABC"
=> 3
irb(main):051:0> /ABC\z/ =~ "123ABC\n"
=> nil
irb(main):052:0> /ABC/ =~ "123ABC\n"
=> 3
irb(main):053:0> /ABC\z/ =~ "123\nABC"
=> 4
irb(main):054:0> /ABC\z/ =~ "123ABC\nAB"
=> nil
irb(main):055:0> /ABC\z/i =~ "123ABC\nAB"
=> nil
To match special characters such as ^, $, [ and ] literally, escape them with \.
irb(main):056:0> /\]/ =~ "[]"
=> 1
irb(main):057:0> /[\]\^]/ =~ "[]^"
=> 1
irb(main):058:0> /[\]\^]/ =~ "[12^"
=> 3
Repetition
*      zero or more times
+      one or more times
?      zero or one time
{n}    exactly n times
{n,m}  between n and m times
{n,}   at least n times
{,n}   at most n times
irb(main):059:0> /a{2}/ =~ "aaa"
=> 0
irb(main):060:0> /a{2}/ =~ "312aaa"
=> 3
irb(main):061:0> /a{2}/ =~ "312a"
=> nil
irb(main):062:0> /a{2}/ =~ "312abaa"
=> 5
Shortest match: repetition is greedy by default. Appending ?, as in *? or +?, turns it into a minimal (lazy) match.
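The effect of greedy versus lazy repetition shows up as soon as the closing character occurs more than once. A minimal sketch, with hypothetical method names and a made-up input string:

```ruby
# Hypothetical helpers: extract text between angle brackets.
def greedy_tag(text)
  text[/<.+>/]    # .+ takes as much as it can, up to the LAST ">"
end

def lazy_tag(text)
  text[/<.+?>/]   # .+? takes as little as it can, up to the FIRST ">"
end

html = "<b>bold</b> and <i>italic</i>"
p greedy_tag(html)
p lazy_tag(html)
```

With this input the greedy form returns the entire string, and the lazy form returns only "<b>".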
Parentheses () group several characters so that the group as a whole can be repeated:
/(abc){2,}/
/(abc)?/
Alternation
Write | between alternatives inside parentheses, as in (a|b).
irb(main):068:0> /(123|321|23)/ =~ "312abaa"
=> nil
irb(main):069:0> /(123|321|23|ba)/ =~ "312abaa"
=> 4
irb(main):070:0> /(123|321|23|ba)?/ =~ "312abaa"
=> 0
irb(main):071:0> /(123|321|23|ba)+/ =~ "312abaa"
=> 4
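Grouping and alternation are often combined with an anchor. The sketch below uses both; the method names and file names are invented for illustration.

```ruby
# Hypothetical helper: true when the name ends in one of a few image extensions.
def image_file?(name)
  # (png|jpe?g|gif) picks one alternative; \z pins it to the end of the string.
  !!(name =~ /\.(png|jpe?g|gif)\z/i)
end

# Hypothetical helper: index of the first place where "ab" appears twice in a row.
def repeated_pair_index(text)
  text =~ /(ab){2}/   # the group as a whole is repeated
end

p image_file?("photo.JPG")
p image_file?("gif.txt")
p repeated_pair_index("xababx")
p repeated_pair_index("xabxab")
```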
Regular expressions built with quote
To escape every metacharacter in a pattern, use Regexp.quote.
irb(main):075:0> re1 = Regexp.new("abc*def")
=> /abc*def/
irb(main):076:0> re2 = Regexp.new(Regexp.quote("abc*def"))
=> /abc\*def/
irb(main):077:0> re1 =~ "abc*def"
=> nil
irb(main):078:0> re2 =~ "abc*def"
=> 0
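Quoting matters most when the text to search for comes from somewhere else and may contain metacharacters. A small sketch under that assumption; the method names and the sample text are hypothetical.

```ruby
# Hypothetical helper: counts literal occurrences of needle in text.
def literal_count(text, needle)
  text.scan(Regexp.new(Regexp.quote(needle))).size
end

# For contrast: the same search with the needle used as a raw pattern.
def unquoted_count(text, needle)
  text.scan(Regexp.new(needle)).size
end

text = "1+1=2, 1+1+1=3"
p literal_count(text, "1+1")    # "+" is treated as a plain character
p unquoted_count(text, "1+1")   # "+" is treated as a repetition operator
```

Here the quoted search finds two occurrences, while the unquoted pattern /1+1/ looks for two or more consecutive 1s and finds none.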
Regular expression options: /.../i ignores case, and /.../m lets . match a newline as well.
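A short sketch of both options on a two-line string follows. The method name and the sample text are made up; Regexp::IGNORECASE and Regexp::MULTILINE are the constants that correspond to i and m when a pattern is built with Regexp.new.

```ruby
# Hypothetical helper: match positions with and without the i and m options.
def option_positions(text)
  {
    plain:       text =~ /y.R/,     # . does not match "\n" by default
    multiline:   text =~ /y.R/m,    # with m, . also matches "\n"
    ignore_case: text =~ /rules/i,  # with i, case is ignored
    both:        text =~ Regexp.new("y.r", Regexp::IGNORECASE | Regexp::MULTILINE)
  }
end

p option_positions("Ruby\nRULES")
```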
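Captures
Capturing means extracting a part of the text that the regular expression matched. The captured substrings are available through variables of the form $1, $2 and so on.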
irb(main):079:0> /(.)(.)(.)/ =~ "abcd"
=> 0
irb(main):080:0> p $1
"a"
=> "a"
irb(main):081:0> p $2
"b"
=> "b"
irb(main):082:0> p $3
"c"
=> "c"
irb(main):083:0> p $4
nil
=> nil
Use (?: ) to group part of a pattern without capturing it.
>> /(.)(\d\d)+(.)/ =~ "123456"
=> 0
>> $1
=> "1"
>> $2
=> "45"
>> $3
=> "6"
>> /(.)(?:\d\d)+(.)/ =~ "123456"
=> 0
>> $1
=> "1"
>> $2
=> "6"
>> $3
=> nil
Besides the numbered variables, $`, $& and $' hold the text before the match, the matched text itself, and the text after the match.
>> /C./ =~ "ABCDEF"
=> 2
>> $`
=> "AB"
>> $&
=> "CD"
>> $'
=> "EF"
$~ holds the MatchData object of the last match, which gives access to all of the match results.
>> /(C.)/ =~ "ABCDEF"
=> 2
>> $~
=> #<MatchData "CD" 1:"CD">
>> $~[1]
=> "CD"
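Methods that use regular expressions
sub replaces the first match only; gsub replaces every match. Both return a new string and leave the original unchanged.
>> str = "abc def g hi"
=> "abc def g hi"
>> str.sub(/\s+/, ' ')
=> "abc def g hi"
>> str
=> "abc def g hi"
>> str.gsub(/\s+/, ' ')
=> "abc def g hi"
>> str
=> "abc def g hi"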
sub and gsub can also take a block; the value of the block replaces each match.
irb(main):001:0> str = "abracatabra"
=> "abracatabra"
irb(main):002:0> nstr = str.sub(/.a/) do |matched|
irb(main):003:1*   '<'+matched.upcase+'>'
irb(main):004:1> end
=> "ab<RA>catabra"
irb(main):007:0> nstr
=> "ab<RA>catabra"
irb(main):008:0> nstr = str.gsub(/.a/) do |matched|
irb(main):009:1*   '<'+matched.upcase+'>'
irb(main):010:1> end
=> "ab<RA><CA><TA>b<RA>"
The block uses single-quoted strings, which do not interpolate, so the matched text is joined to the angle brackets with +.
sub! and gsub! do the same thing but modify the receiver itself.
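The sketch below puts the four methods next to each other on a string with runs of whitespace. The method names and the input are hypothetical. One point to keep in mind is that sub! and gsub! return nil when nothing was replaced, so their return value should not be chained blindly.

```ruby
# Hypothetical helper: non-destructive replacement.
def replace_demo(text)
  first_only = text.sub(/\s+/, " ")    # only the first run of whitespace
  everywhere = text.gsub(/\s+/, " ")   # every run of whitespace
  [first_only, everywhere, text]       # text itself is untouched
end

# Hypothetical helper: destructive replacement on a copy.
def in_place_demo(text)
  copy = text.dup
  changed   = copy.gsub!(/\s+/, " ")   # returns copy itself after replacing
  unchanged = copy.gsub!(/\d/, "")     # returns nil: there were no digits
  [copy, changed.equal?(copy), unchanged]
end

text = "abc\t\tdef\n\ng hi"
p replace_demo(text)
p in_place_demo(text)
```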
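The scan method
scan collects every match. Without a block it returns the matches as an array; with a block it passes each match to the block in turn.
"ra"
"ca"
"ta"
"ra"
shijianzhongdeMacBook-Pro:chapter_16 shijianzhong$ cat scan1.rb
"abracatabra".scan(/.a/) do |matched|
  p matched
end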
Using () in the pattern
["r", "a"]
["c", "a"]
["t", "a"]
["r", "a"]
shijianzhongdeMacBook-Pro:chapter_16 shijianzhong$ cat scan2.rb
"abracatabra".scan(/(.)(a)/) do |matched|
  p matched
end
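When scan is called without a block, the matches come back as the return value instead: a flat array of strings for a pattern without groups, and an array of capture arrays when the pattern contains (). A minimal sketch; the method names and the key=value input are made up for illustration.

```ruby
# Hypothetical helper: every two-character match ending in "a".
def a_pairs(text)
  text.scan(/.a/)                  # no groups: array of matched strings
end

# Hypothetical helper: turns "key=value" pairs into a hash.
def parse_pairs(text)
  text.scan(/(\w+)=(\w+)/).to_h    # two groups: array of [key, value] arrays
end

p a_pairs("abracatabra")
p parse_pairs("host=example port=8080")
```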
The groups in the pattern can be received through several block variables.
"r-a"
"c-a"
"t-a"
"r-a"
[["r", "a"], ["c", "a"], ["t", "a"], ["r", "a"]]
shijianzhongdeMacBook-Pro:chapter_16 shijianzhong$ cat scan3.rb
"abracatabra".scan(/(.)(a)/) do |a, b|
  p a+"-"+b
end
p "abracatabra".scan(/(.)(a)/)
Regular expression examples
Matching a URL
server address: www.ruby-lang.org
shijianzhongdeMacBook-Pro:chapter_16 shijianzhong$ cat url_match.rb
str = "http://www.ruby-lang.org/ja/"
%r|http://([^/]*)| =~ str
print "server address: ", $1, "\n"
Exercises
1. Store the account name of an email address in $1 and the domain name in $2.
email = "china@163.com"
re = %r|(\w+)@([a-zA-Z0-9.]*)|
re =~ email
p $1  # "china"
p $2  # "163.com"
2. Use gsub to turn the string "正则表达式真难啊,怎么这么难懂" ("Regular expressions are really hard, why are they so difficult to understand") into "正则表达式真简单啊,怎么这么易懂" ("Regular expressions are really simple, why are they so easy to understand").
"正则表达式真简单啊,怎么这么易懂!"
shijianzhongdeMacBook-Pro:exercises shijianzhong$ cat e2.rb
str = "正则表达式真难啊,怎么这么难懂!"
str = str.gsub(/真难/, "真简单")
str = str.gsub(/难懂/, "易懂")
p str
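3. Define a method word_capitalize that, given an English string joined by hyphens, capitalizes each hyphen-separated part (first letter uppercase, the rest lowercase).
def word_capitalize(string)
  # Capitalize each run of word characters; the hyphens are left untouched.
  string.gsub(/\w+/) do |matched|
    matched.capitalize
  end
end
p word_capitalize("in-reply-to")  # "In-Reply-To"
p word_capitalize("X-MAILER")     # "X-Mailer"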