• Using the mapPartitionsWithIndex function in Scala


    // Build a pair RDD with 2 partitions
    val rdd1 = sc.makeRDD(Array((1, "A"), (2, "B"), (3, "C"), (4, "D")), 2)

    rdd1.partitions.size
    // res20: Int = 2

    // Collect the elements of each partition under the key "part_<index>"
    rdd1.mapPartitionsWithIndex {
      (partIdx, iter) => {
        val part_map = scala.collection.mutable.Map[String, List[(Int, String)]]()
        val part_name = "part_" + partIdx
        while (iter.hasNext) {
          val elem = iter.next()
          if (part_map.contains(part_name)) {
            var elems = part_map(part_name)
            elems ::= elem   // prepend, so each list ends up in reverse order
            part_map(part_name) = elems
          } else {
            part_map(part_name) = List[(Int, String)](elem)
          }
        }
        part_map.iterator
      }
    }.collect
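The partition grouping above can also be written without the mutable map. The following is a minimal sketch, not the original author's code: it assumes a local two-core master (the app name and `local[2]` are illustrative choices) and uses `SparkContext.getOrCreate` so that it reuses the existing context when pasted into spark-shell.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: local master with two cores; in spark-shell this returns the existing context.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("partition-demo").setMaster("local[2]"))

val rdd = sc.makeRDD(Seq((1, "A"), (2, "B"), (3, "C"), (4, "D")), 2)

// Emit exactly one (partitionName, elements) pair per partition.
val byPartition = rdd.mapPartitionsWithIndex { (partIdx, iter) =>
  Iterator(("part_" + partIdx, iter.toList))
}.collect()

byPartition.foreach { case (name, elems) => println(name + " -> " + elems) }
```

Two differences from the version above are worth noting: `iter.toList` keeps the elements in their original order instead of reversing them, and an empty partition still produces a pair with an empty list rather than no output at all.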

     -----------------------------------------------------------

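    import org.apache.spark.HashPartitioner

    val three = sc.textFile("/tmp/spark/three", 3)
    var idx = 0

    // Sort the numbers and print "<rank> <number>".
    // The counter only produces consecutive ranks because HashPartitioner(1)
    // places every row in a single partition.
    val res = three.filter(_.trim().length > 0).map(num => (num.trim.toInt, "")).partitionBy(new HashPartitioner(1)).sortByKey().map(t => {
      idx += 1
      (idx, t._1)
    }).collect.foreach(x => println(x._1 + " " + x._2))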
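The ranking above depends on a mutable counter and on all rows sharing one partition. As an alternative technique (swapped in here, not taken from the original post), `zipWithIndex` assigns positions across any number of partitions. This is a minimal sketch: the in-memory input is a hypothetical stand-in for the lines of `/tmp/spark/three`, and the local master setting is an assumption.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: local master with two cores; in spark-shell this returns the existing context.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("rank-demo").setMaster("local[2]"))

// Hypothetical sample lines standing in for the input file, spread over 3 partitions.
val lines = sc.parallelize(Seq("26", " 5 ", "", "4", "15"), 3)

val ranked = lines
  .map(_.trim)
  .filter(_.nonEmpty)                      // drop blank lines
  .map(_.toInt)
  .sortBy(n => n)                          // global ascending sort
  .zipWithIndex()                          // (value, 0-based position in the sorted RDD)
  .map { case (num, i) => (i + 1, num) }   // 1-based rank first, as in the original output
  .collect()

ranked.foreach { case (rank, num) => println(rank + " " + num) }
```

Because the index comes from the RDD itself rather than from a captured variable, the result does not depend on how many partitions the sorted data occupies.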

    ------------------------------------------------------------------

    Spark operators: partitioning data with partitionBy
    https://www.cnblogs.com/yy3b2007com/p/7800793.html

    Classic Hadoop examples implemented in Spark (3): data sorting

    https://blog.csdn.net/kwu_ganymede/article/details/50475788

  • Original article: https://www.cnblogs.com/chengjun/p/8954515.html