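A regular expression is a pattern for working with strings. Its main uses are matching, splitting, replacing, and extracting text. Regular expressions are also used frequently in Scala: calling .r on a string (for example "[0-9]+".r) compiles it into a scala.util.matching.Regex.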
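Splitting is listed above but not demonstrated in the rest of this post. The following is a minimal sketch, assuming Scala 2.11 or later (where Regex.split is available). The object name SplitDemo, the method splitFields, and the separator pattern are illustrative only:

```scala
import scala.util.matching.Regex

object SplitDemo {
  // .r on a String compiles it into a scala.util.matching.Regex.
  // Hypothetical separator: a comma or semicolon with optional surrounding whitespace.
  val separators: Regex = """\s*[,;]\s*""".r

  // Regex#split delegates to java.util.regex.Pattern#split and
  // returns the pieces of text between the matches
  def splitFields(line: String): Array[String] = separators.split(line)

  def main(args: Array[String]): Unit = {
    println(splitFields("a, b ;c").toList) // List(a, b, c)
  }
}
```

For simple cases, "a, b ;c".split("""\s*[,;]\s*""") on a plain String gives the same result, since java.lang.String#split also takes a regex.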
1. Matching
Scala supports several ways of applying regular expressions. The three main ones are:
- String.matches()方法
- Pattern matching with a regex
- scala.util.matching.Regex API
//String.matches
val a = "studying83"
println(a.matches("[a-z0-9]+"))   // true
println(a.matches("[a-z0-9]{4}")) // false: matches() must cover the whole string, and "studying83" has 10 characters
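Because String.matches only succeeds when the entire string fits the pattern, it behaves differently from searching for the pattern inside the string. A small sketch contrasting the two; the names FullVsPartial, wholeMatch, and containsDigits are illustrative:

```scala
object FullVsPartial {
  val digits = "[0-9]+"

  // String.matches succeeds only if the WHOLE string matches the pattern
  def wholeMatch(s: String): Boolean = s.matches(digits)

  // Regex#findFirstIn searches for the pattern anywhere inside the string
  def containsDigits(s: String): Boolean = digits.r.findFirstIn(s).isDefined

  def main(args: Array[String]): Unit = {
    println(wholeMatch("studying83"))     // false: the letters are not digits
    println(containsDigits("studying83")) // true: "83" is found inside
  }
}
```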
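//Pattern matching with a regex
val b = """([a-z0-9]+)""".r
"studying83" match {
  // b(_) uses the regex as an extractor; a bare "case b =>" would bind a new variable and match anything
  case b(_) => println("Match succeeded")
  case _    => println("Match failed")
}
// Match succeeded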
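When a regex is used as an extractor in a match, it must match the whole string, just like String.matches. Regex#unanchored relaxes this so that the extractor succeeds on a match anywhere in the input. A sketch, with illustrative names (RegexPatternMatch, anchored, partial):

```scala
import scala.util.matching.Regex

object RegexPatternMatch {
  val word: Regex = """([a-z0-9]+)""".r
  // UnanchoredRegex: its extractor looks for a match anywhere in the string
  val wordAnywhere = word.unanchored

  // Note: writing "case word =>" (no parentheses) would bind a NEW variable
  // named word that matches any value, which is why the group is listed here
  def anchored(s: String): String = s match {
    case word(w) => s"full match: $w"
    case _       => "no match"
  }

  def partial(s: String): String = s match {
    case wordAnywhere(w) => s"found: $w"
    case _               => "no match"
  }

  def main(args: Array[String]): Unit = {
    println(anchored("studying83"))     // full match: studying83
    println(anchored("ID: studying83!")) // no match
    println(partial("ID: studying83!"))  // found: studying83
  }
}
```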
//scala.util.matching.Regex API
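It provides three lookup methods:
- findFirstMatchIn() returns the first match, as an Option[Regex.Match]
- findAllMatchIn() returns all matches, as an Iterator[Regex.Match]
- findAllIn() returns all matched substrings, as an iterator of String (Regex.MatchIterator)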
//findFirstMatchIn()
val reg = "[0-9]".r
reg.findFirstMatchIn("abc3d2gf") match {
  case Some(x) => println(x) // a Match prints as its matched text
  case None    => println("no")
}
// 3
//findAllMatchIn()
println(reg.findAllMatchIn("abc3d2gf").toList)
// List(3, 2)
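The third method, findAllIn, is not shown above. The sketch below puts it next to findAllMatchIn and findFirstIn to show the difference in what they return: plain strings versus Match objects that also carry positions. The object and method names are illustrative:

```scala
object FindDemo {
  val digit = "[0-9]".r

  // findAllIn yields the matched substrings themselves
  def allDigits(s: String): List[String] = digit.findAllIn(s).toList

  // findAllMatchIn yields Regex.Match objects, which also expose start/end offsets
  def digitPositions(s: String): List[(String, Int)] =
    digit.findAllMatchIn(s).map(m => (m.matched, m.start)).toList

  // findFirstIn returns only the first matched substring, as Option[String]
  def firstDigit(s: String): Option[String] = digit.findFirstIn(s)

  def main(args: Array[String]): Unit = {
    println(allDigits("abc3d2gf"))      // List(3, 2)
    println(digitPositions("abc3d2gf")) // List((3,3), (2,5))
    println(firstDigit("abcdef"))       // None
  }
}
```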
2. Capture groups
// Triple quotes let the JSON text and the pattern contain double quotes without escaping
val str = """{"id":"123456","friends":{"name":"zs","age":"40"}}"""
val jsonReg = """\{"id":"([0-9]+)","friends":\{"name":"([a-z]+)","age":"([0-9]+)"\}\}""".r
jsonReg.findAllMatchIn(str).foreach(x => println((x.group(1), x.group(2), x.group(3))))
// (123456,zs,40)

val input = "name:Jason,age:19,weight:100"
// The hyphen is placed last in the character class so it is read as a literal "-"
val studentPattern = "([0-9a-zA-Z#() -]+):([0-9a-zA-Z#() -]+)".r
studentPattern.findAllMatchIn(input).foreach(x => println((x.group(1), x.group(2))))
// (name,Jason)
// (age,19)
// (weight,100)

//Practical use
Suppose a log file at path contains lines such as INFO 2000-01-07 requestURI:/c?app=0&p=1. It can be parsed like this:
import scala.io.Source
val source = Source.fromFile("path", "UTF-8")
val lines = try source.getLines().toArray finally source.close()
val logReg = """([A-Z]+) ([0-9]{4}-[0-9]{2}-[0-9]{1,2}) requestURI:(.*)""".r
// Approach 1: run findAllMatchIn on each line
lines.map(line => logReg.findAllMatchIn(line).toList.map(x => (x.group(1), x.group(2), x.group(3)))).foreach(println)
// List((INFO,2000-01-07,/c?app=0&p=1))
// Approach 2: use the regex as an extractor; collect skips non-matching lines instead of throwing a MatchError
lines.collect { case logReg(level, date, uri) => (level, date, uri) }
// Array[(String, String, String)] = Array((INFO,2000-01-07,/c?app=0&p=1))
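The key:value example above prints each pair. It is often more useful to gather the groups into a collection. This sketch builds a Map from group(1) and group(2), and uses subgroups to get every capture group of one match as a List. The simplified pattern and the names KeyValueDemo, toMap, and groupsOfFirst are assumptions for illustration:

```scala
object KeyValueDemo {
  // Hypothetical simplified pattern: key = letters, value = everything up to the next comma
  val pair = """([a-zA-Z]+):([^,]*)""".r

  // Each match contributes group(1) -> group(2) to the resulting Map
  def toMap(input: String): Map[String, String] =
    pair.findAllMatchIn(input).map(m => m.group(1) -> m.group(2)).toMap

  // subgroups returns all capture groups of a single match as a List[String]
  def groupsOfFirst(input: String): Option[List[String]] =
    pair.findFirstMatchIn(input).map(_.subgroups)

  def main(args: Array[String]): Unit = {
    val fields = toMap("name:Jason,age:19,weight:100")
    println(fields("age"))                      // 19
    println(groupsOfFirst("name:Jason,age:19")) // Some(List(name, Jason))
  }
}
```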
3. Replacement
//replaceFirstIn
val num = """([0-9]+)""".r
num.replaceFirstIn("123,go! 666", "run")
// run,go! 666
//replaceAllIn
num.replaceAllIn("123 you are the best!", "come on!")
// come on! you are the best!
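The replacement does not have to be a fixed string. In the String form, "$1" refers to capture group 1, and replaceAllIn also has an overload that takes a function from Regex.Match to String. A sketch of both; the names ReplaceDemo, bracketNumbers, and doubleNumbers are illustrative:

```scala
import scala.util.matching.Regex

object ReplaceDemo {
  val number: Regex = """([0-9]+)""".r

  // "$1" in the replacement string is substituted with capture group 1
  def bracketNumbers(s: String): String = number.replaceAllIn(s, "[$1]")

  // The function overload computes each replacement from the Match itself
  def doubleNumbers(s: String): String =
    number.replaceAllIn(s, m => (m.group(1).toInt * 2).toString)

  def main(args: Array[String]): Unit = {
    println(bracketNumbers("123,go! 666")) // [123],go! [666]
    println(doubleNumbers("123,go! 666"))  // 246,go! 1332
  }
}
```

Because "$" and "\" are special in replacement strings, text that must be inserted literally can be escaped with Regex.quoteReplacement first.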
4. Lookup
val date = """([0-9]{4})-([0-9]{1,2})-([0-9]{1,2})""".r
"2020-5-18" match { case date(year, _*) => println(year) }
// 2020
"2020-5-18" match { case date(_, mon, _*) => println(mon) }
// 5
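The matches above have no fallback case, so an input that is not a date would throw a MatchError. A safer variant wraps the extractor in a function that returns an Option and converts the groups to numbers. This is a sketch; DateExtractDemo and parseDate are hypothetical names:

```scala
object DateExtractDemo {
  val date = """([0-9]{4})-([0-9]{1,2})-([0-9]{1,2})""".r

  // Destructure with the extractor and convert each group to Int;
  // anything that is not in yyyy-m-d form yields None instead of a MatchError
  def parseDate(s: String): Option[(Int, Int, Int)] = s match {
    case date(y, m, d) => Some((y.toInt, m.toInt, d.toInt))
    case _             => None
  }

  def main(args: Array[String]): Unit = {
    println(parseDate("2020-5-18")) // Some((2020,5,18))
    println(parseDate("18/5/2020")) // None
  }
}
```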