• Illustrated guide to the reduceByKey operator on PairRDD


    reduceByKey

    Function prototypes:

    def reduceByKey(func: (V, V) => V): RDD[(K, V)]

    def reduceByKey(func: (V, V) => V, numPartitions: Int): RDD[(K, V)]

    def reduceByKey(partitioner: Partitioner, func: (V, V) => V): RDD[(K, V)]

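    Purpose:

    For each key, merges all of that key's values of type V into a single value of the same type V by repeatedly applying func to pairs of values. func should be associative and commutative, because values are first combined locally within each partition and the partial results are merged again after the shuffle.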
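    The behaviour described above can be modelled in plain Scala without Spark. This is only a sketch of the semantics, not how Spark implements it: reduceByKeyLocal and reduceByKeyTwoStage are hypothetical helper names, and the "partitions" are ordinary Seq values chosen for illustration.

```scala
// Plain-Scala model of reduceByKey semantics; no Spark required.
// The helper names below are hypothetical and not part of the Spark API.
object ReduceByKeySketch {
  // Group the pairs by key, then fold each key's values with func.
  def reduceByKeyLocal[K, V](pairs: Seq[(K, V)])(func: (V, V) => V): Map[K, V] =
    pairs
      .groupBy { case (k, _) => k }
      .map { case (k, kvs) => k -> kvs.map(_._2).reduce(func) }

  // Two-stage variant: combine inside each simulated partition first,
  // then merge the partial results, as happens around a shuffle.
  def reduceByKeyTwoStage[K, V](partitions: Seq[Seq[(K, V)]])(func: (V, V) => V): Map[K, V] = {
    val partials = partitions.map(p => reduceByKeyLocal(p)(func))
    reduceByKeyLocal(partials.flatMap(_.toSeq))(func)
  }

  val data = Seq(("A", 0), ("A", 2), ("B", 1), ("B", 2), ("C", 1))

  // Single pass over all pairs.
  val direct = reduceByKeyLocal(data)(_ + _)

  // Same data split into two simulated partitions; key B appears in both.
  val (left, right) = data.splitAt(3)
  val staged = reduceByKeyTwoStage(Seq(left, right))(_ + _)
}
```

    Both direct and staged hold the same per-key sums. That equality is what an associative and commutative func guarantees, regardless of how the pairs happen to be split across partitions.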

    Example:

    scala> var rdd1 = sc.makeRDD(Array(("A",0),("A",2),("B",1),("B",2),("C",1)))
    rdd1: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[0] at makeRDD at <console>:27

    scala> rdd1.partitions.size
    res0: Int = 48

    scala> var rdd2 = rdd1.reduceByKey((x,y) => x + y)
    rdd2: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[1] at reduceByKey at <console>:29

    scala> rdd2.collect
    res1: Array[(String, Int)] = Array((A,2), (B,3), (C,1))

    scala> rdd2.partitions.size
    res2: Int = 48

    scala> var rdd2 = rdd1.reduceByKey(new org.apache.spark.HashPartitioner(2),(x,y) => x + y)
    rdd2: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[2] at reduceByKey at <console>:29

    scala> rdd2.collect
    res3: Array[(String, Int)] = Array((B,3), (A,2), (C,1))

    scala> rdd2.partitions.size
    res4: Int = 2
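    The two collect results above list the keys in different orders. One likely explanation is the way a hash partitioner assigns keys to partitions, since collect returns partitions in index order. The sketch below reproduces that assignment in plain Scala, assuming the rule HashPartitioner uses for non-null keys (key.hashCode modulo the partition count, kept non-negative) and assuming the first reduceByKey call fell back to a hash partitioner with 48 partitions. partitionOf is a hypothetical helper, not a Spark function.

```scala
// Plain-Scala sketch of hash partitioning; no Spark required.
// partitionOf is a hypothetical helper that mirrors the non-negative
// modulo rule assumed for HashPartitioner on non-null keys.
object HashPartitionSketch {
  def partitionOf(key: Any, numPartitions: Int): Int = {
    val raw = key.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  val keys = Seq("A", "B", "C")

  // Partition index of each key with 2 partitions, as in HashPartitioner(2).
  val withTwo = keys.map(k => k -> partitionOf(k, 2))

  // Partition index of each key with 48 partitions.
  val with48 = keys.map(k => k -> partitionOf(k, 48))
}
```

    With 2 partitions, B falls into partition 0 while A and C fall into partition 1, which is consistent with res3 listing B first. With 48 partitions the keys fall into partitions 17, 18 and 19, consistent with the A, B, C order in res1. The order of keys inside a single partition is not something to rely on.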

  • Original article: https://www.cnblogs.com/seaspring/p/5722036.html