• Regex matching in Scala


    I. String.matches(): filtering the log lines that need processing (e.g. skipping blank lines or lines with invalid characters)

    Statements:
    "!123".matches("[a-zA-Z0-9]{4}")  //false
    "34Az".matches("[a-zA-Z0-9]{4}")  //true
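    String.matches (inherited from java.lang.String) returns true only when the whole string matches. That is why "!123" is rejected. To look for the pattern somewhere inside a longer string, a Regex method such as findFirstIn is the usual tool. Below is a minimal sketch; the object and helper names are hypothetical:

```scala
import scala.util.matching.Regex

// Hypothetical helper object for illustration
object MatchesVsFind {
  // Whole-string check via java.lang.String.matches
  def isFourAlnum(s: String): Boolean = s.matches("[a-zA-Z0-9]{4}")

  val fourAlnum: Regex = "[a-zA-Z0-9]{4}".r

  // Substring search: the first run of 4 alphanumeric characters, if any
  def firstFourAlnum(s: String): Option[String] = fourAlnum.findFirstIn(s)
}

println(MatchesVsFind.isFourAlnum("34Az"))       // true
println(MatchesVsFind.isFourAlnum("x!34Azy"))    // false: the whole string must match
println(MatchesVsFind.firstFourAlnum("x!34Azy")) // Some(34Az)
```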
    // Application:
    
    // 1. Read the log file in Scala
      def readFromTxt(filePath:String): Array[String] ={
          import scala.io.Source
          val source = Source.fromFile(filePath,"UTF-8")
          val lines = source.getLines().toArray
          source.close()
          lines
      }
    // 2. Use it to filter out the information needed from the log
    // With triple quotes """...""", backslashes need no escaping
    val reg = """([A-Z]+) ([0-9]{4}-[0-9]{1,2}-[0-9]{1,2}) requestURI:(.*)""".r
    // Keep only the lines that match (this drops blank lines), then map;
    // a non-matching line reaching the case below would throw a MatchError
    lines.filter(_.matches(reg.regex))
      .map { case reg(level, logdate, addr) => (level, logdate, addr) }
      .foreach(println(_))
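    readFromTxt closes the Source on the normal path, but it leaks the file handle if reading throws. On Scala 2.13 or later (an assumption about your Scala version), scala.util.Using can take care of closing. Below is a sketch with a hypothetical LogReader object:

```scala
import scala.io.Source
import scala.util.Using

// Hypothetical object; requires Scala 2.13+ for scala.util.Using
object LogReader {
  // Using.resource closes the Source even if getLines/toArray throws
  def readFromTxt(filePath: String): Array[String] =
    Using.resource(Source.fromFile(filePath, "UTF-8"))(_.getLines().toArray)
}
```

It is called exactly like the original: val lines = LogReader.readFromTxt(path).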

    ---- Sample log ----
    INFO 2000-10-01 requestURI:/c?app=0&p=1&did=180042334&industry=45Z

    INFO 2012-11-11 requestURI:/c?app=2&p=3&did=140042334&industry=42Z
    WARN 2012-11-11 requestURI:/c?app=2&p=3&did=140042334&industry=42Z
    ERROR 2012-11-11 requestURI:/c?app=2&p=3&did=140042334&industry=42Z
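    The filter-then-map step can also be written with collect. It applies the case pattern and silently drops non-matching lines in a single pass. The sketch below uses an in-memory Seq built from the sample log instead of a file; LogFilter and sample are hypothetical names:

```scala
import scala.util.matching.Regex

// Hypothetical object for illustration
object LogFilter {
  // Same regex as above: level, date, request URI
  val reg: Regex = """([A-Z]+) ([0-9]{4}-[0-9]{1,2}-[0-9]{1,2}) requestURI:(.*)""".r

  // collect = filter + map in one pass: lines the pattern does not
  // fully match (such as blank lines) are dropped, so no MatchError
  def parse(lines: Seq[String]): Seq[(String, String, String)] =
    lines.collect { case reg(level, logdate, addr) => (level, logdate, addr) }
}

val sample = Seq(
  "INFO 2000-10-01 requestURI:/c?app=0&p=1&did=180042334&industry=45Z",
  "",
  "WARN 2012-11-11 requestURI:/c?app=2&p=3&did=140042334&industry=42Z"
)
LogFilter.parse(sample).foreach(println)
```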

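    II. Pattern matching with case (recommended, and the most convenient)

    Pattern matching / pattern guards / type patterns: https://blog.csdn.net/lyq7269/article/details/107759026

    Example 1

    // Statement 1: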
    val pattern = "([a-zA-Z][0-9][a-zA-Z] [0-9][a-zA-Z][0-9])".r
    "L3R 6M2" match {
        case pattern(x) => println("Valid zip-code: " + x )  // x is the value of the first group; a pattern can bind several groups
        case x => println("Invalid zip-code: " + x )
    } 
    // Statement 2:
    val date = """(\d\d\d\d)-(\d\d)-(\d\d)""".r
    "2014-05-23" match {
        case date(year, month, day) => println((year, month, day))
    }
    "2014-05-23" match {
        case date(year, _*) => println("The year of the date is " + year) 
    } 
    "2014-05-23" match {
        case date(_*) => println("It is a date")
    }
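    A Regex used as a case pattern must match the entire string. A match block without a fallback case throws a MatchError on any other input. The sketch below adds a fallback and uses Regex.unanchored for dates embedded in longer text. DateMatch and its helpers are hypothetical names:

```scala
import scala.util.matching.Regex

// Hypothetical object for illustration
object DateMatch {
  val date: Regex = """(\d{4})-(\d{2})-(\d{2})""".r
  // unanchored returns a variant that may match anywhere in the input;
  // it must be a val to be usable as an extractor in a case pattern
  val dateAnywhere = date.unanchored

  // A Regex in a case must match the WHOLE string;
  // the fallback case avoids a MatchError on anything else
  def yearOf(s: String): Option[String] = s match {
    case date(year, _, _) => Some(year)
    case _                => None
  }

  def yearAnywhere(s: String): Option[String] = s match {
    case dateAnywhere(year, _, _) => Some(year)
    case _                        => None
  }
}

println(DateMatch.yearOf("2014-05-23"))             // Some(2014)
println(DateMatch.yearOf("Date: 2014-05-23"))       // None
println(DateMatch.yearAnywhere("Date: 2014-05-23")) // Some(2014)
```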

    Example 2

    val reg = """.* set se[0-9]_([0-9]+)_([0-9]+)_([0-9]+)r (.*),.*""".r
    rdd.foreach {
          case reg(zs, stu, ques, sa) => println((zs, stu, ques, sa))
          case _ => // skip lines that do not match instead of throwing a MatchError
        }

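    Extract the fields captured by the four groups (highlighted in red in the original post) from a log line such as: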

    2019-06-16 14:24:34 INFO com.noriental.praxissvr.answer.util.PraxisSsdbUtil:45 [SimpleAsyncTaskExecutor-1] [020765925160] req: set se0_34434412_8195023659593_8080r 1,resp: ok 14

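    Note: pattern matching is convenient, but every pair of parentheses in reg is a capturing group, nested ones included. For example, when matching an integer or a decimal, ([0-9](.[0-9])?) adds a fifth group. A case reg(zs, stu, ques, sa) with four binders then no longer matches, and without a fallback case it throws a MatchError. You can bind the extra group, make the inner group non-capturing with (?:...), or simply use (.*). Also escape the dot as \. when a literal decimal point is meant.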
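    Here is the nested-group pitfall in concrete form. Nested parentheses are capturing groups too, so they change how many binders a case pattern needs. This sketch runs three variants against one of the sample log lines. The object and method names are hypothetical:

```scala
import scala.util.matching.Regex

// Hypothetical object for illustration
object NestedGroups {
  // Inner (\.[0-9]+) is also a capturing group: 5 groups in total
  val withNested: Regex =
    """.* set se[0-9]_([0-9]+)_([0-9]+)_([0-9]+)r ([0-9]+(\.[0-9]+)?),.*""".r
  // (?:...) groups without capturing: exactly 4 groups
  val nonCapturing: Regex =
    """.* set se[0-9]_([0-9]+)_([0-9]+)_([0-9]+)r ([0-9]+(?:\.[0-9]+)?),.*""".r

  // Four binders against five groups: never matches, falls through to None
  def fourOnNested(line: String): Option[String] = line match {
    case withNested(_, _, _, sa) => Some(sa)
    case _                       => None
  }

  // Binding (and ignoring) the fifth group makes it match
  def fiveOnNested(line: String): Option[String] = line match {
    case withNested(_, _, _, sa, _) => Some(sa)
    case _                          => None
  }

  def fourOnNonCapturing(line: String): Option[String] = line match {
    case nonCapturing(_, _, _, sa) => Some(sa)
    case _                         => None
  }
}

val logLine = "2019-06-16 14:24:34 INFO com.noriental.praxissvr.answer.util.PraxisSsdbUtil:45 [SimpleAsyncTaskExecutor-1] [020765925160] req: set se0_34434412_8195023659593_8080r 1,resp: ok 14"
println(NestedGroups.fourOnNested(logLine))       // None
println(NestedGroups.fiveOnNested(logLine))       // Some(1)
println(NestedGroups.fourOnNonCapturing(logLine)) // Some(1)
```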

    III. The scala.util.matching.Regex API (import scala.util.matching.Regex)

    1) findFirstMatchIn() returns the first match (Option[Match])

    Statement:
    import scala.util.matching.Regex
    val numberPattern: Regex = "[0-9]".r
    numberPattern.findFirstMatchIn("awesomepassword") match {
      case Some(_) => println("Password OK")  // a match was found
      case None => println("Password must contain a number")   // no match
    }
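    The Option[Match] returned by findFirstMatchIn tells you more than success or failure: the Match exposes the matched text and its position. When only the text is needed, findFirstIn returns an Option[String]. Below is a sketch (FirstMatch is a hypothetical name):

```scala
import scala.util.matching.Regex

// Hypothetical object for illustration
object FirstMatch {
  val numberPattern: Regex = "[0-9]".r

  // The Match inside the Option also carries the match position
  def firstDigitAt(s: String): Option[(String, Int)] =
    numberPattern.findFirstMatchIn(s).map(m => (m.matched, m.start))

  // findFirstIn is the lighter variant returning only the matched text
  def firstDigit(s: String): Option[String] = numberPattern.findFirstIn(s)
}

println(FirstMatch.firstDigitAt("awesomepassword")) // None
println(FirstMatch.firstDigitAt("pass7word"))       // Some((7,4))
```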

    2) Working with groups
    findAllMatchIn() returns an Iterator[Regex.Match]; findAllMatchIn().toList => List[Regex.Match]

    Example 1

    Statement:
    import scala.util.matching.Regex
    
    val studentPattern:Regex="([0-9a-zA-Z-#() ]+):([0-9a-zA-Z-#() ]+)".r
    val input="name:Jason,age:19,weight:100"
    
    for(patternMatch<-studentPattern.findAllMatchIn(input)){
        println(s"key: ${patternMatch.group(1)} value: ${patternMatch.group(2)}")
    }
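    Instead of printing each pair, you can collect the matches into a Map, since each Regex.Match yields one key and one value. Below is a sketch reusing the same pattern (KeyValues is a hypothetical name):

```scala
import scala.util.matching.Regex

// Hypothetical object for illustration
object KeyValues {
  val studentPattern: Regex = "([0-9a-zA-Z-#() ]+):([0-9a-zA-Z-#() ]+)".r

  // Each Match contributes one key -> value pair
  def parse(input: String): Map[String, String] =
    studentPattern.findAllMatchIn(input).map(m => m.group(1) -> m.group(2)).toMap
}

println(KeyValues.parse("name:Jason,age:19,weight:100"))
```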

    Example 2

    val reg = """.* set se[0-9]_([0-9]+)_([0-9]+)_([0-9]+)r ([0-9]+(\.[0-9]+)?),.*""".r
    rdd.map(line => {
          // Nested groups are fine here: groups are read by index and group(5) is simply ignored
          reg.findAllMatchIn(line)
            .map(x => (x.group(1), x.group(2), x.group(3), x.group(4)).productIterator.mkString("\t"))
            .mkString("")
        }).foreach(println(_))

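    Extract the fields captured by the four groups (highlighted in red in the original post) from a log line such as: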

    2019-06-16 14:24:34 INFO com.noriental.praxissvr.answer.util.PraxisSsdbUtil:45 [SimpleAsyncTaskExecutor-1] [020765925160] req: set se0_34434412_8195023659593_8080r 1,resp: ok 14
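    You don't need Spark to try Example 2. The per-line logic can be checked on a plain Seq[String] standing in for rdd (an assumption for local testing). This sketch does three things:
    - it compiles the regex once;
    - it escapes the decimal point as \. so it matches only a literal dot;
    - it uses findFirstMatchIn, because the leading .* lets the pattern match at most once per line.
    PraxisLog and sampleLines are hypothetical names:

```scala
import scala.util.matching.Regex

// Hypothetical object for illustration
object PraxisLog {
  // Compiled once; \. matches only a literal decimal point
  val reg: Regex =
    """.* set se[0-9]_([0-9]+)_([0-9]+)_([0-9]+)r ([0-9]+(\.[0-9]+)?),.*""".r

  // Groups 1..4 joined by tabs for a matching line, None otherwise
  def toTsv(line: String): Option[String] =
    reg.findFirstMatchIn(line).map(m => (1 to 4).map(i => m.group(i)).mkString("\t"))
}

// A plain Seq stands in for the rdd (assumption for local testing)
val sampleLines = Seq(
  "2019-06-16 14:24:34 INFO com.noriental.praxissvr.answer.util.PraxisSsdbUtil:45 [SimpleAsyncTaskExecutor-1] [020765925160] req: set se0_34434412_8195023659593_8080r 1,resp: ok 14",
  "an unrelated line"
)
sampleLines.flatMap(l => PraxisLog.toTsv(l)).foreach(println)
```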

    3) String processing

    1. Replacing within a string
    replaceFirstIn(target, replacement): replaces the first match only
    replaceAllIn(target, replacement): replaces every match

    Statement 1:
    "[0-9]+".r.replaceFirstIn("234 Main Street Suite 2034", "567") // "567 Main Street Suite 2034": only 234 -> 567
    "[0-9]+".r.replaceAllIn("234 Main Street Suite 2034", "567")   // "567 Main Street Suite 567": 234 and 2034 -> 567
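    replaceAllIn also accepts a function from Regex.Match to String, so the replacement can depend on what was matched. A replacement string can also refer to groups as $1, $2, and so on. Because $ and \ are special in replacement strings, literal text containing them should go through Regex.quoteReplacement. Below is a sketch with hypothetical names:

```scala
import scala.util.matching.Regex

// Hypothetical object for illustration
object Replacing {
  val number: Regex = "[0-9]+".r
  val date: Regex = """(\d{4})-(\d{2})-(\d{2})""".r

  // Replacement computed per match; here each number is doubled
  def doubleNumbers(s: String): String =
    number.replaceAllIn(s, (m: Regex.Match) => (m.matched.toInt * 2).toString)

  // $1, $2, $3 in the replacement string refer to capture groups
  def toDayFirst(s: String): String = date.replaceAllIn(s, "$3/$2/$1")
}

println(Replacing.doubleNumbers("234 Main Street Suite 2034")) // 468 Main Street Suite 4068
println(Replacing.toDayFirst("2014-05-23"))                    // 23/05/2014
```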

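    2. Searching within a string

    findAllIn().toList => List[String]

    In extractor patterns, _ discards a group you do not need, and _* (allowed only as the last pattern) ignores all remaining groups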

    Statement 1:
    val nums = "[0-9]+".r.findAllIn("123 Main Street Suite 2012").toList  // List(123, 2012)
    nums.foreach(println(_))
    Statement 2:
    val date = """(\d\d\d\d)-(\d\d)-(\d\d)""".r
    "2014-05-23" match {
        case date(year, month, day) => println((year, month, day))
    }
    "2014-05-23" match {
        case date(year, _*) => println("The year of the date is " + year) 
    } 
    "2014-05-23" match {
        case date(_*) => println("It is a date")
    }
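    Both ideas fit in one small sketch: findAllIn collects every number, and _ skips the groups that are not needed (FindAll is a hypothetical name):

```scala
import scala.util.matching.Regex

// Hypothetical object for illustration
object FindAll {
  val number: Regex = "[0-9]+".r
  val date: Regex = """(\d{4})-(\d{2})-(\d{2})""".r

  // findAllIn iterates over every matched substring
  def numbers(s: String): List[String] = number.findAllIn(s).toList

  // _ skips a single group: keep only the month
  def monthOf(s: String): Option[String] = s match {
    case date(_, month, _) => Some(month)
    case _                 => None
  }
}

println(FindAll.numbers("123 Main Street Suite 2012")) // List(123, 2012)
println(FindAll.monthOf("2014-05-23"))                 // Some(05)
```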
  • Original post: https://www.cnblogs.com/sabertobih/p/13683587.html