• Deleting data from an HBase database with Scala code


    This post just records one simple way to delete data from HBase; other approaches are left for you to explore. The code is as follows:

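    // Delete bad rows by date.
    // Note: log, SQLLogger, Config, workID, rmDay, finalSucc and getBeforeOneDay
    // are defined elsewhere in the surrounding project and are not shown here.
    import org.apache.hadoop.hbase.client.{Delete, Result, Scan}
    import org.apache.hadoop.hbase.filter.{CompareFilter, RegexStringComparator, RowFilter}
    import org.apache.hadoop.hbase.util.Bytes
    import scala.collection.JavaConverters._

    private def rmRazorError(table: String)(implicit args: Array[String]): Unit = {
      var isSucc = false
      var msg = ""
      val JOB_NAME = s"$table-$rmDay"
      val jobID = s"$JOB_NAME-$workID"
      // Skip the job if it has already completed successfully.
      if (SQLLogger.isJobSucc(jobID)) {
        log.info(s"$jobID has already been executed successfully.")
        return
      }
      SQLLogger.insJobStart(workID, jobID, JOB_NAME)
      log.info(s"$JOB_NAME start ...")
      val hTable = Config.getHBaseConn.getTable(table)
      hTable.setAutoFlushTo(false)
      try {
        // The job arguments are the first and last day to process.
        val Array(startTime, endTime) = args
        // Delete the row behind a single scan result.
        val delRow = (r: Result) => {
          val row = r.getRow
          log.info("Deleting row: " + Bytes.toString(row))
          hTable.delete(new Delete(row))
        }
        var tmpTime = startTime
        // Process one day per iteration.
        // NOTE: the loop ends only once tmpTime sorts after endTime, so
        // getBeforeOneDay has to move tmpTime forward by a day. If it really
        // returns the previous day, as its name suggests, the loop never ends.
        while (tmpTime.compare(endTime) <= 0) {
          log.info(s"Deleting rows in table: $table using $tmpTime")

          // Match every row key that contains "-<day>".
          val scan = new Scan()
          val rowFilter = new RowFilter(CompareFilter.CompareOp.EQUAL,
            new RegexStringComparator(".*-" + tmpTime + ".*"))
          scan.setFilter(rowFilter)

          // Close the scanner even if a delete fails.
          val scanner = hTable.getScanner(scan)
          try {
            scanner.asScala.foreach(delRow)
          } finally {
            scanner.close()
          }
          tmpTime = getBeforeOneDay(tmpTime)
        }
        isSucc = true
      } catch {
        case ex: Exception =>
          // Record the failure instead of swallowing the exception silently.
          isSucc = false
          finalSucc = false
          msg = s"$table's job failed"
          log.error(msg, ex)
      } finally {
        hTable.flushCommits()
        hTable.close()
      }
      SQLLogger.insJobEnd(jobID, isSucc, msg)
      log.info(s"$JOB_NAME end.")
    }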

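    In the code, table is the name of the table, and the implicit parameter args is an array that is destructured into two values, startTime and endTime. The example deletes every row in the table whose date falls between startTime and endTime. Rows are selected by the yyyyMMdd date embedded in the rowKey, so if your rowKey contains such a field, you can delete by the same criterion.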
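
    To make the selection rule concrete, here is a minimal sketch of the two pieces the loop relies on: stepping through the days from startTime to endTime, and testing whether a rowKey contains "-yyyyMMdd". It does not touch HBase. The object RowKeyDays, its methods, and the sample row keys are hypothetical names made up for illustration, and String.matches stands in for the RowFilter, using the same pattern string as the post.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Hypothetical helper, not part of the original project.
object RowKeyDays {
  // BASIC_ISO_DATE parses and prints dates as yyyyMMdd.
  private val fmt = DateTimeFormatter.BASIC_ISO_DATE

  // Every day from startTime to endTime (inclusive), as yyyyMMdd strings.
  def daysBetween(startTime: String, endTime: String): List[String] = {
    val start = LocalDate.parse(startTime, fmt)
    val end = LocalDate.parse(endTime, fmt)
    Iterator.iterate(start)(_.plusDays(1))
      .takeWhile(d => !d.isAfter(end))
      .map(_.format(fmt))
      .toList
  }

  // The same pattern string the post passes to RegexStringComparator.
  def rowKeyPattern(day: String): String = ".*-" + day + ".*"

  // True when the row key contains "-<day>" (assumed row key layout).
  def matchesDay(rowKey: String, day: String): Boolean =
    rowKey.matches(rowKeyPattern(day))
}
```

    For example, RowKeyDays.daysBetween("20190629", "20190702") yields the four days 20190629, 20190630, 20190701 and 20190702, and a made-up key such as "user42-20190630-0001" matches the day 20190630 but not 20190701. Generating the days with java.time also handles month boundaries without string arithmetic.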

  • Original article: https://www.cnblogs.com/yarcl/p/11046769.html