This post just records one simple way to delete data from HBase; other deletion approaches are left for you to explore. The code is as follows:
import org.apache.hadoop.hbase.client.{Delete, Result, Scan}
import org.apache.hadoop.hbase.filter.{CompareFilter, RegexStringComparator, RowFilter}
import org.apache.hadoop.hbase.util.Bytes
import scala.collection.JavaConverters._

// Delete erroneous data by date.
// Config, SQLLogger, log, rmDay, workID, finalSucc and getBeforeOneDay are
// members of the enclosing class and are not shown in this post.
private def rmRazorError(table: String)(implicit args: Array[String]): Unit = {
  var isSucc = false
  var msg = ""
  val JOB_NAME = s"$table-$rmDay"
  val jobID = s"$JOB_NAME-$workID"
  // Skip the job if it has already completed successfully
  if (SQLLogger.isJobSucc(jobID)) {
    log.info(s"$jobID has already been executed successfully.")
    return
  }
  SQLLogger.insJobStart(workID, jobID, JOB_NAME)
  log.info(s"$JOB_NAME start ...")
  val hTable = Config.getHBaseConn.getTable(table)
  hTable.setAutoFlushTo(false)
  try {
    // Job parameters: start day and end day, both in yyyyMMdd
    val Array(startTime, endTime) = args
    // Delete the row that a scan result points to
    val delRow = (r: Result) => {
      val row = r.getRow
      log.info("Deleting row: " + Bytes.toString(row))
      hTable.delete(new Delete(row))
    }
    // getBeforeOneDay steps one day back, so walk from endTime down to startTime.
    // Starting at startTime with "<= endTime" as the condition would never terminate.
    var tmpTime = endTime
    while (tmpTime.compare(startTime) >= 0) {
      log.info(s"Deleting rows in table: $table using $tmpTime")
      val scan = new Scan()
      // Select row keys that contain "-<yyyyMMdd>"
      val rowFilter = new RowFilter(CompareFilter.CompareOp.EQUAL,
        new RegexStringComparator(".*-" + tmpTime + ".*"))
      scan.setFilter(rowFilter)
      val scanner = hTable.getScanner(scan)
      // ResultScanner is a Java Iterable; convert it and always close the scanner
      try scanner.iterator().asScala.foreach(delRow)
      finally scanner.close()
      tmpTime = getBeforeOneDay(tmpTime)
    }
    isSucc = true
  } catch {
    case ex: Exception =>
      msg = s"$table's job failed"
      log.error(msg, ex)
      isSucc = false
      finalSucc = false
  } finally {
    hTable.flushCommits()
    hTable.close()
  }
  SQLLogger.insJobEnd(jobID, isSucc, msg)
  log.info(s"$JOB_NAME end.")
}
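The method calls a helper getBeforeOneDay that the post does not show. Below is a minimal sketch of what such a helper might look like, assuming day strings use the yyyyMMdd format; the object name DayUtil and the method daysBackwards are hypothetical. It also shows why a previous-day helper has to walk from endTime back to startTime: starting at startTime and looping while the day is `<= endTime` would move further into the past forever.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Hypothetical stand-in for getBeforeOneDay, which the post does not show.
// Assumes day strings are yyyyMMdd, e.g. "20240301".
object DayUtil {
  private val Fmt = DateTimeFormatter.BASIC_ISO_DATE // parses/formats yyyyMMdd

  // The calendar day before `day`, in yyyyMMdd.
  def getBeforeOneDay(day: String): String =
    LocalDate.parse(day, Fmt).minusDays(1).format(Fmt)

  // Days from endTime back to startTime, both inclusive. Each step moves one day
  // earlier, so the walk starts at endTime and stops once it passes startTime.
  // Plain string comparison works because yyyyMMdd sorts chronologically.
  def daysBackwards(startTime: String, endTime: String): List[String] =
    Iterator.iterate(endTime)(getBeforeOneDay)
      .takeWhile(_.compare(startTime) >= 0)
      .toList
}
```

For example, `DayUtil.daysBackwards("20231230", "20240102")` visits 20240102, 20240101, 20231231 and 20231230, crossing the year boundary correctly because the arithmetic is done on LocalDate rather than on the string.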
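In the code, `table` is the table name, and the implicit parameter `args` carries two values, startTime and endTime. The example deletes every row in the table whose date falls between startTime and endTime. The deletion criterion is the yyyyMMdd date embedded in the rowKey; if your rowKey contains such a field, you can delete by the same condition. Note that a regex RowFilter does not narrow the scan range, so each day in the range costs one full-table scan on the server side.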
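Because each day triggers its own scan, one possible variation (not the author's method) is to fold all days into a single alternation regex of the same shape as the per-day pattern `".*-" + day + ".*"`, so one Scan with one RowFilter could cover the whole range. The sketch below only builds and checks the pattern locally with java.util.regex; it assumes row keys contain `-yyyyMMdd`, and the object name RangeRegex and sample keys are hypothetical. The `require` guard matters for deletion: an empty day list would produce `.*-().*`, which matches any key containing a hyphen.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Hypothetical helper: one regex for a whole date range instead of one per day.
// Assumes row keys contain "-yyyyMMdd", e.g. "user42-20240101-0001" (made-up key).
object RangeRegex {
  private val Fmt = DateTimeFormatter.BASIC_ISO_DATE // yyyyMMdd

  // Every day from startTime to endTime inclusive, in yyyyMMdd.
  def days(startTime: String, endTime: String): List[String] = {
    val start = LocalDate.parse(startTime, Fmt)
    val end = LocalDate.parse(endTime, Fmt)
    Iterator.iterate(start)(_.plusDays(1))
      .takeWhile(d => !d.isAfter(end))
      .map(_.format(Fmt))
      .toList
  }

  // Same shape as the per-day pattern, but with an alternation over all days.
  // Refuses an empty range, since ".*-().*" would match far too many rows.
  def rangeRegex(startTime: String, endTime: String): String = {
    val ds = days(startTime, endTime)
    require(ds.nonEmpty, "startTime must not be after endTime")
    ds.mkString(".*-(", "|", ").*")
  }

  // Local check of whether a row key would be selected by the pattern.
  def selects(regex: String, rowKey: String): Boolean =
    rowKey.matches(regex)
}
```

As with the original per-day filter, this is a substring match: a key such as `x-202401011` would also be selected, so the rowKey layout should make the date position unambiguous before using either pattern for deletes.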