Spark Operators Explained (Part 2)

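1: glom

def glom(): RDD[Array[T]]

Collects the elements within each partition into an array, creating an RDD of arrays with one array per partition (not a single array for the whole RDD).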
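A minimal standalone sketch of glom. It assumes Spark is on the classpath and uses a local master; `local[2]` and the app name are arbitrary choices, and `SparkContext.getOrCreate` simply reuses the context if one is already active. With two requested partitions, the five elements are sliced as 1, 2 and 3, 4, 5, so glom yields one array per partition:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuse an active SparkContext (e.g. in spark-shell) or start a local one with 2 cores.
val sc = SparkContext.getOrCreate(new SparkConf().setMaster("local[2]").setAppName("glom-demo"))

// Explicitly ask for 2 partitions so the per-partition grouping is visible.
val rdd = sc.parallelize(1 to 5, numSlices = 2)

// glom turns each partition into a single Array element: RDD[Int] => RDD[Array[Int]].
val perPartition: Array[Array[Int]] = rdd.glom().collect()
perPartition.foreach(p => println(p.mkString("[", ", ", "]")))
// [1, 2]
// [3, 4, 5]

// A typical use: compute one value per partition, here each partition's maximum
// (safe because neither partition is empty).
val partitionMax: Array[Int] = rdd.glom().map(_.max).collect()

sc.stop() // omit this line inside spark-shell, where sc is shared
```

Because each array holds a whole partition in memory, glom is only appropriate when every partition comfortably fits on an executor.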

2: getNumPartitions

final def getNumPartitions: Int

Returns the number of partitions of the RDD.
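A small sketch of checking the partition count before and after repartitioning, assuming a local master; the sizes 4 and 8 are arbitrary. In the RDD source, getNumPartitions is defined as partitions.length:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(new SparkConf().setMaster("local[2]").setAppName("partitions-demo"))

val rdd = sc.parallelize(1 to 100, numSlices = 4)
val before: Int = rdd.getNumPartitions                 // 4: the value passed as numSlices

// repartition shuffles the data into the requested number of partitions.
val after: Int = rdd.repartition(8).getNumPartitions   // 8

// getNumPartitions and partitions.length report the same number.
val same: Boolean = rdd.getNumPartitions == rdd.partitions.length

println(s"before=$before after=$after same=$same")

sc.stop() // omit this line inside spark-shell
```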

3: groupBy

def groupBy[K](f: (T) ⇒ K)(implicit kt: ClassTag[K]): RDD[(K, Iterable[T])]

Groups the elements by the key that the given function returns for each of them, for example:

scala> rdd1.collect
res61: Array[Int] = Array(1, 2, 3, 4, 5)

scala> rdd1.groupBy(x=>if(x%2==0) 0 else 1).collect
res62: Array[(Int, Iterable[Int])] = Array((0,CompactBuffer(4, 2)), (1,CompactBuffer(1, 3, 5)))
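The Spark API documentation notes that groupBy can be very expensive and recommends reduceByKey or aggregateByKey when the grouping only serves an aggregation. The sketch below reuses the even/odd key from the example above, sorts each group (the order inside a CompactBuffer is not guaranteed), and then computes per-key sums with reduceByKey. The local master and app name are assumptions:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(new SparkConf().setMaster("local[2]").setAppName("groupBy-demo"))

val rdd1 = sc.parallelize(Seq(1, 2, 3, 4, 5))

// groupBy: even numbers get key 0, odd numbers get key 1.
// Sort each group so the result no longer depends on partition order.
val groups: Map[Int, List[Int]] =
  rdd1.groupBy(x => if (x % 2 == 0) 0 else 1)
    .mapValues(_.toList.sorted)
    .collect()
    .toMap
// groups == Map(0 -> List(2, 4), 1 -> List(1, 3, 5))

// For an aggregate per key (here a sum), reduceByKey combines values inside
// each partition before the shuffle instead of shipping every element.
val sums: Map[Int, Int] =
  rdd1.map(x => (if (x % 2 == 0) 0 else 1, x))
    .reduceByKey(_ + _)
    .collect()
    .toMap
// sums == Map(0 -> 6, 1 -> 9)

sc.stop() // omit this line inside spark-shell
```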

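4: randomSplit

def randomSplit(weights: Array[Double], seed: Long = Utils.random.nextLong): Array[RDD[T]]

Randomly splits an RDD into several RDDs according to the weights array and returns them as an array. The weights are normalized if they do not sum to 1, and each split's size is only approximately proportional to its weight.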
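A sketch of the common train/holdout use case, assuming a local master. The 7:3 weights, the seed 42, and the names training and holdout are illustrative only. Passing a fixed seed makes the split reproducible for the same data and partitioning:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(new SparkConf().setMaster("local[2]").setAppName("randomSplit-demo"))

val data = sc.parallelize(1 to 1000, numSlices = 4)

// Weights 7.0 and 3.0 do not sum to 1, so they are normalized to 0.7 and 0.3.
val Array(training, holdout) = data.randomSplit(Array(7.0, 3.0), seed = 42L)

val trainingCount = training.count()
val holdoutCount = holdout.count()
println(s"training=$trainingCount holdout=$holdoutCount") // roughly 700 / 300, not exact

// The splits do not overlap and together contain every element exactly once.
val coversAll = trainingCount + holdoutCount == data.count()
val disjoint = training.intersection(holdout).isEmpty()

sc.stop() // omit this line inside spark-shell
```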

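5: countByValue

def countByValue()(implicit ord: Ordering[T] = null): Map[T, Long]

Returns the number of times each distinct value occurs, as a map on the driver, which makes word count even easier to implement. Because the whole map is loaded into driver memory, use it only when the number of distinct values is small.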

scala> sc.parallelize(Array(1,2,1,2,1,2,3,4,5)).countByValue
res73: scala.collection.Map[Int,Long] = Map(5 -> 1, 1 -> 3, 2 -> 3, 3 -> 1, 4 -> 1)
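A word-count sketch built on countByValue; the input sentences are made up for illustration and a local master is assumed. It produces the same counts as the classic map-to-(word, 1) plus reduceByKey version, except that the result is a map on the driver rather than an RDD:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(new SparkConf().setMaster("local[2]").setAppName("countByValue-demo"))

val lines = sc.parallelize(Seq("to be or not to be", "to see or not"))

// Split lines into words, then let countByValue do the counting (this is an action).
val wordCounts: scala.collection.Map[String, Long] =
  lines.flatMap(_.split(" ")).countByValue()

// Print the words, most frequent first.
wordCounts.toSeq.sortBy(-_._2).foreach { case (word, n) => println(s"$word: $n") }

sc.stop() // omit this line inside spark-shell
```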

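6: countByValueApprox

def countByValueApprox(timeout: Long, confidence: Double = 0.95)(implicit ord: Ordering[T] = null): PartialResult[Map[T, BoundedDouble]]

An approximate version of countByValue: it waits at most timeout milliseconds and returns a partial result in which each count carries error bounds at the given confidence level.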
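A sketch of reading a PartialResult, assuming a local master; the 1000 ms timeout and the 0.90 confidence are arbitrary. Each value maps to a BoundedDouble with mean, low, high and confidence fields. A job this small usually finishes before the timeout, in which case the bounds already collapse to the exact counts:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(new SparkConf().setMaster("local[2]").setAppName("approx-demo"))

// 1..10000 mod 3 gives 3333 zeros, 3334 ones and 3333 twos.
val rdd = sc.parallelize((1 to 10000).map(_ % 3), numSlices = 8)

// Wait at most 1000 ms and ask for 90% confidence instead of the default 0.95.
val partial = rdd.countByValueApprox(timeout = 1000L, confidence = 0.90)

// initialValue is what was available when the timeout hit (or the full result if the job finished).
partial.initialValue.foreach { case (value, bd) =>
  println(s"$value -> mean=${bd.mean}, range=[${bd.low}, ${bd.high}], confidence=${bd.confidence}")
}

// getFinalValue() blocks until the job completes; the means are then the exact counts.
val exact: Map[Int, Long] =
  partial.getFinalValue().map { case (k, bd) => k -> bd.mean.toLong }.toMap

sc.stop() // omit this line inside spark-shell
```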

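7: ++

def ++(other: RDD[T]): RDD[T]

Returns the union of two RDDs (an alias for union). Duplicate elements are kept; call distinct() to remove them.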
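A sketch, assuming a local master, showing that ++ keeps duplicates and, when neither RDD has a partitioner, simply concatenates the partitions of both inputs without a shuffle:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(new SparkConf().setMaster("local[2]").setAppName("union-demo"))

val a = sc.parallelize(Seq(1, 2, 3), numSlices = 2)
val b = sc.parallelize(Seq(3, 4, 5), numSlices = 3)

// ++ delegates to union: no deduplication.
val both = a ++ b
val all: Array[Int] = both.collect()     // Array(1, 2, 3, 3, 4, 5): 3 appears twice
val parts: Int = both.getNumPartitions   // 2 + 3 = 5 partitions

// distinct() (which does shuffle) gives a set-style union.
val unique: Array[Int] = both.distinct().collect().sorted  // Array(1, 2, 3, 4, 5)

sc.stop() // omit this line inside spark-shell
```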

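8: fold

def fold(zeroValue: T)(op: (T, T) ⇒ T): T

Aggregates the elements of each partition, and then the results of all partitions, using the associative function op and a neutral "zero value". For example: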

scala> rdd1.collect
res90: Array[Int] = Array(1, 2, 3, 4, 5)

scala> rdd1.fold(0)(_+_)
res91: Int = 15
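Because zeroValue is used once per partition and once more when the partition results are merged on the driver, it must be a true identity for op. The sketch below, with an explicit 2-partition RDD and an assumed local master, shows what happens with a non-neutral zero:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = SparkContext.getOrCreate(new SparkConf().setMaster("local[2]").setAppName("fold-demo"))

val rdd1 = sc.parallelize(Seq(1, 2, 3, 4, 5), numSlices = 2)

// With a true identity (0 for +), the partition count does not matter.
val sum = rdd1.fold(0)(_ + _) // 15

// A non-neutral zero is counted (numPartitions + 1) times:
// partition results 10+1+2 = 13 and 10+3+4+5 = 22, then 10+13+22 = 45.
val skewed = rdd1.fold(10)(_ + _)

// Other associative ops work with their own identity, e.g. max with Int.MinValue.
val maxValue = rdd1.fold(Int.MinValue)((x, y) => math.max(x, y))

sc.stop() // omit this line inside spark-shell
```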


Original article: https://www.cnblogs.com/leodaxin/p/7499552.html