• Spark vector and matrix types


    First, an ordinary array:

    scala> var arr=Array(1.0,2,3,4)
    arr: Array[Double] = Array(1.0, 2.0, 3.0, 4.0)

    It can be converted into an MLlib Vector:

    scala> import org.apache.spark.mllib.linalg._
    scala> var vec=Vectors.dense(arr)
    vec: org.apache.spark.mllib.linalg.Vector = [1.0,2.0,3.0,4.0]
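Dense vectors store every element; MLlib also offers sparse vectors, which only store the non-zero entries. A minimal sketch (standalone, no SparkContext needed, since these are local types):

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

object SparseVectorDemo {
  def main(args: Array[String]): Unit = {
    // A sparse vector of logical size 100 with a single non-zero:
    // value 4.0 at index 3. Only the index/value pair is stored.
    val sparse: Vector = Vectors.sparse(100, Array(3), Array(4.0))

    // Indexing works the same as for a dense vector.
    println(sparse(3)) // 4.0
    println(sparse(0)) // 0.0

    println(sparse.size) // 100
  }
}
```

For mostly-zero data (e.g. bag-of-words features) the sparse form saves a large amount of memory; for small fully-populated vectors like the one above, dense is simpler.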

    Next, build an RDD[Vector]:

    scala> val rdd=sc.makeRDD(Seq(Vectors.dense(arr),Vectors.dense(arr.map(_*10)),Vectors.dense(arr.map(_*100))))
    rdd: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector] = ParallelCollectionRDD[6] at makeRDD at <console>:26

    From this RDD we can build a distributed matrix:

    scala> import org.apache.spark.mllib.linalg.distributed._
    scala> val mat: RowMatrix = new RowMatrix(rdd)
    mat: org.apache.spark.mllib.linalg.distributed.RowMatrix = org.apache.spark.mllib.linalg.distributed.RowMatrix@3133b850
    scala> val m = mat.numRows()
    m: Long = 3
    scala> val n = mat.numCols()
    n: Long = 4
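A RowMatrix can be multiplied by a local matrix, producing a new distributed RowMatrix. A sketch, assuming `mat` is the 3x4 RowMatrix built above (note that `Matrices.dense` takes its values in column-major order):

```scala
import org.apache.spark.mllib.linalg.Matrices
import org.apache.spark.mllib.linalg.distributed.RowMatrix

// A local 4x2 matrix that selects the first two columns of `mat`.
// Values are listed column by column (column-major).
val local = Matrices.dense(4, 2, Array(
  1.0, 0.0, 0.0, 0.0, // column 0
  0.0, 1.0, 0.0, 0.0  // column 1
))

// (3x4 distributed) * (4x2 local) = 3x2 distributed
val product: RowMatrix = mat.multiply(local)
println(product.numRows()) // 3
println(product.numCols()) // 2
```

The local matrix is broadcast to the executors, so this pattern only works when the right-hand operand is small enough to fit in memory on each node.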

    Try the statistics utilities and compute the column means (note the extra import for Statistics):

    scala> import org.apache.spark.mllib.stat.Statistics
    scala> var sum=Statistics.colStats(rdd)
    scala> sum.mean
    res7: org.apache.spark.mllib.linalg.Vector = [37.0,74.0,111.0,148.0]
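`colStats` returns a `MultivariateStatisticalSummary`, which carries more than just the mean; all of the column summaries are computed in a single pass over the data. A sketch, assuming `rdd` is the RDD[Vector] built above:

```scala
import org.apache.spark.mllib.stat.{MultivariateStatisticalSummary, Statistics}

val summary: MultivariateStatisticalSummary = Statistics.colStats(rdd)

println(summary.count)       // 3 (number of rows)
println(summary.mean)        // [37.0,74.0,111.0,148.0]
println(summary.max)         // [100.0,200.0,300.0,400.0]
println(summary.min)         // [1.0,2.0,3.0,4.0]
println(summary.variance)    // per-column sample variance
println(summary.numNonzeros) // [3.0,3.0,3.0,3.0]
```

For example, column 0 holds 1.0, 10.0, and 100.0, so its mean is (1 + 10 + 100) / 3 = 37.0, matching the output above.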
    建立索引该如何选取字段
  • Original post: https://www.cnblogs.com/bluejoe/p/5115849.html