Business scenario
We have the following data:
id intime outtime
1190771865,2019-11-26 13:27:26,2019-11-26 13:27:26 1190771865,2019-11-26 13:27:26,2019-11-26 13:27:26 1190771865,2019-11-26 16:42:46,2019-11-26 16:42:46 1190771865,2019-11-26 13:27:26,2019-11-26 13:27:26 1190771865,2019-11-26 16:42:46,2019-11-26 16:42:46 1190771865,2019-11-26 17:23:11,2019-11-26 17:23:11 1190771865,2019-11-26 13:27:26,2019-11-26 13:27:26 1190771865,2019-11-26 16:42:46,2019-11-26 16:42:46 1190771865,2019-11-26 17:23:11,2019-11-26 17:23:11 1190771865,2019-11-26 13:27:26,2019-11-26 13:27:26 1190771865,2019-11-26 16:42:46,2019-11-26 16:42:46 1190771865,2019-11-26 17:23:11,2019-11-26 17:23:11 1190771865,2019-11-26 13:27:26,2019-11-26 13:27:26 1190771865,2019-11-26 16:42:46,2019-11-26 16:42:46 1190771865,2019-11-26 17:23:11,2019-11-26 17:23:11 1190771865,2019-11-26 13:27:26,2019-11-26 13:27:26 1190771865,2019-11-26 16:42:46,2019-11-26 16:42:46 1190771865,2019-11-26 17:23:11,2019-11-26 17:23:11 1190771865,2019-11-26 13:27:26,2019-11-26 13:27:26 1190771865,2019-11-26 16:42:46,2019-11-26 16:42:46 1190771865,2019-11-26 17:23:11,2019-11-26 17:23:11 1190771865,2019-11-26 13:27:26,2019-11-26 13:27:26 1190771865,2019-11-26 16:42:46,2019-11-26 16:42:46 1190771865,2019-11-26 17:23:11,2019-11-26 17:23:11 1190771865,2019-11-26 13:27:26,2019-11-26 13:27:26 1190771865,2019-11-26 16:42:46,2019-11-26 16:42:46 1190771865,2019-11-26 17:23:11,2019-11-26 17:23:11 1190771865,2019-11-26 13:27:26,2019-11-26 13:27:26 1190771865,2019-11-26 16:42:46,2019-11-26 16:42:46 1190771865,2019-11-26 17:23:11,2019-11-26 17:23:11 1190771865,2019-11-26 13:27:26,2019-11-26 13:27:26 1190771865,2019-11-26 16:42:46,2019-11-26 16:42:46 1190771865,2019-11-26 17:23:11,2019-11-26 17:23:11 1190771865,2019-11-26 13:27:26,2019-11-26 13:27:26 1190771865,2019-11-26 16:42:46,2019-11-26 16:42:46 1190771865,2019-11-26 17:23:11,2019-11-26 17:23:11 1190771865,2019-11-26 13:27:26,2019-11-26 13:27:26 1190771865,2019-11-26 16:42:46,2019-11-26 16:42:46 1190771865,2019-11-26 17:23:11,2019-11-26 17:23:11 
1190771865,2019-11-26 13:27:26,2019-11-26 13:27:26 1190771865,2019-11-26 16:42:46,2019-11-26 16:42:46 1190771865,2019-11-26 17:23:11,2019-11-26 17:23:11 1190771865,2019-11-26 13:27:26,2019-11-26 13:27:26 1190771865,2019-11-26 16:42:46,2019-11-26 16:42:46 1190771865,2019-11-26 17:23:11,2019-11-26 17:23:11 1190771865,2019-11-26 13:27:26,2019-11-26 13:27:26 1190771865,2019-11-26 16:42:46,2019-11-26 16:42:46 1190771865,2019-11-26 17:23:11,2019-11-26 17:23:11 1190771865,2019-11-26 13:27:26,2019-11-26 13:27:26
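Requirement:
Regroup the data above according to the following rules.
Sort the records by intime in ascending order, then compare each record's intime with that of the preceding record:
1. If the gap between the second record and the first is greater than 120 min, discard the first record.
2. If the gap between a record and the preceding one is less than 120 min, keep the preceding record's intime and use this record's intime as its outtime, then continue traversing until the last record is reached.
3. If the gap between a record and the preceding one is greater than 120 min, treat this record as the start of a new record and apply the rules above again.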
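The rules can be sketched as a pure Scala function over one id's records, which is easier to test than a Spark job. This is a minimal sketch under some assumptions: times are already numeric (epoch seconds here), and, like the loop in the implementation, the gap is measured from the intime of the record that opened the current merged record. A gap of exactly 120 min is treated as a break, matching the `>= 120 * 60` check in the original code. The names `Regroup`, `Rec`, `GapSeconds`, `split` and `regroup` are hypothetical.

```scala
object Regroup {
  // (id, intime, outtime); times in epoch seconds (an assumption for this sketch)
  type Rec = (String, Long, Long)

  // 120-minute threshold; a gap of exactly 120 min starts a new record
  val GapSeconds: Long = 120L * 60

  // Split intime-sorted records into groups. Each group is built in reverse,
  // so g.last is the record that opened it.
  private def split(sorted: List[Rec]): List[List[Rec]] = {
    val reversed = sorted.foldLeft(List.empty[List[Rec]]) { (groups, r) =>
      groups match {
        // gap from the group's first intime is under 120 min: merge (rule 2)
        case g :: rest if r._2 - g.last._2 < GapSeconds => (r :: g) :: rest
        // first record, or a gap of 120 min or more: open a new group (rule 3)
        case _ => List(r) :: groups
      }
    }
    reversed.reverse.map(_.reverse)
  }

  // Deduplicate, sort by intime, group, then turn each group into one output record.
  def regroup(records: Seq[Rec]): List[Rec] = {
    val groups = split(records.distinct.sortBy(_._2).toList)
    groups.zipWithIndex.flatMap { case (g, i) =>
      if (g.size > 1) List((g.head._1, g.head._2, g.last._2)) // merged: first intime to last intime
      else if (i == groups.size - 1) g // trailing lone record: nothing follows it, so keep it
      else Nil // lone record followed by a gap of 120 min or more: drop it (rule 1)
    }
  }
}
```

On the distinct sample values (13:27:26, 16:42:46, 17:23:11), the 13:27:26 record is dropped under rule 1 and the other two merge into one record running from 16:42:46 to 17:23:11. Following rule 2, the sketch always uses the last merged record's intime as the outtime; the original code uses the last record's outtime for the final group, which gives the same result here because intime equals outtime in the sample.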
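Implementation:
1. Turn the data above into an array per key, i.e. (aaa, Array(id, intime, outtime)).
Note: before this step, the in and out times of each record had already been converted to timestamps.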
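Step 1 can be sketched without Spark as follows. It assumes the grouping key `aaa` is the `id` column and that times are `yyyy-MM-dd HH:mm:ss` strings interpreted in a caller-supplied time zone. `Prepare`, `toEpochMillis` and `byId` are hypothetical names; `toEpochMillis` only stands in for the author's `DateUtil.dateToTimeStamp`, whose implementation is not shown.

```scala
import java.time.{LocalDateTime, ZoneId}
import java.time.format.DateTimeFormatter

object Prepare {
  // Input lines look like "1190771865,2019-11-26 13:27:26,2019-11-26 13:27:26"
  private val Fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  // Hypothetical stand-in for DateUtil.dateToTimeStamp: epoch milliseconds,
  // reading the text as local time in the given zone (an assumption)
  def toEpochMillis(s: String, zone: ZoneId): Long =
    LocalDateTime.parse(s, Fmt).atZone(zone).toInstant.toEpochMilli

  // Parse each line and group by id into (id, Array((id, intime, outtime)))
  def byId(lines: Seq[String], zone: ZoneId): Map[String, Array[(String, Long, Long)]] =
    lines
      .map(_.split(",").map(_.trim))
      .map(f => (f(0), toEpochMillis(f(1), zone), toEpochMillis(f(2), zone)))
      .groupBy(_._1)
      .map { case (k, recs) => k -> recs.toArray }
}
```

In a Spark job the same shape would typically come from `groupByKey` on `(id, record)` pairs; the plain `groupBy` here only illustrates the resulting structure.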
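```scala
import scala.collection.mutable.ListBuffer

// mergedDataTmp: RDD of (key, Array((id, intime, outtime)))
mergedDataTmp
  // dedupe, and keep records whose intime is not after outtime (assumed intent of this filter)
  .map(x => (x._1, x._2.distinct.filter(r => r._2 <= r._3)))
  .mapPartitions(iter => {
    iter.map(x => {
      var count = 0   // number of records in the current group
      var iterNum = 0 // number of records visited so far
      val tList = new ListBuffer[(String, (String, String, String))]()
      val vsList = x._2.sortWith((a, b) => a._2 < b._2).toList // ascending by intime
      val vsLength = vsList.length
      var tmpV = "" // intime of the latest record merged into the current group
      for (t <- vsList) {
        iterNum += 1
        if (count == 0) {
          // the first record opens the first group
          tList += ((x._1, t))
          count += 1
        } else {
          // true when this record's intime is at least 120 min after the intime
          // of the current group's first record (held in tList.last)
          val compareTime = if (tList.nonEmpty) {
            (DateUtil.dateToTimeStamp(t._2) - DateUtil.dateToTimeStamp(tList.last._2._2)) / 1000 >= 120 * 60
          } else {
            false
          }
          if (compareTime && count == 1) {
            // gap >= 120 min and the group holds a single record: discard it, start a new group
            tList.remove(tList.length - 1)
            tList += ((x._1, t))
          } else if (compareTime && count > 1) {
            // gap >= 120 min: close the group (outtime = intime of its last merged record)
            val lastRecord = tList.last
            tList(tList.length - 1) = (x._1, (t._1, lastRecord._2._2, tmpV))
            // this record opens a new group
            tList += ((x._1, t))
            count = 1
          } else {
            // gap < 120 min: merge this record into the current group
            count += 1
            if (iterNum == vsLength) {
              // last record overall: close the group with this record's outtime
              val lastRecord = tList.last
              tList(tList.length - 1) = (x._1, (t._1, lastRecord._2._2, t._3))
            }
            tmpV = t._2
          }
        }
      }
      tList
    })
  })
  .flatMap(x => x)
```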