• Holiday Study 13


    Today I worked on the first part of the final lab, "Spark Machine Learning Library (MLlib) Programming Practice".

    Here is part of the code:

    import org.apache.spark.ml.feature.PCA
    import org.apache.spark.sql.Row
    import org.apache.spark.ml.linalg.{Vector, Vectors}
    import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
    import org.apache.spark.ml.{Pipeline, PipelineModel}
    import org.apache.spark.ml.feature.{IndexToString, StringIndexer, VectorIndexer, HashingTF, Tokenizer}
    import org.apache.spark.ml.classification.LogisticRegressionModel
    import org.apache.spark.ml.classification.{BinaryLogisticRegressionSummary, LogisticRegression}
    import org.apache.spark.sql.functions

    scala> import spark.implicits._
    import spark.implicits._

    scala> case class Adult(features: org.apache.spark.ml.linalg.Vector, label: String)
    defined class Adult

    scala> val df = sc.textFile("adult.data.txt").map(_.split(",")).map(p => Adult(Vectors.dense(p(0).toDouble, p(2).toDouble, p(4).toDouble, p(10).toDouble, p(11).toDouble, p(12).toDouble), p(14).toString())).toDF()
    df: org.apache.spark.sql.DataFrame = [features: vector, label: string]

    scala> val test = sc.textFile("adult.test.txt").map(_.split(",")).map(p => Adult(Vectors.dense(p(0).toDouble, p(2).toDouble, p(4).toDouble, p(10).toDouble, p(11).toDouble, p(12).toDouble), p(14).toString())).toDF()
    test: org.apache.spark.sql.DataFrame = [features: vector, label: string]
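    The `map` above pulls six numeric columns out of each comma-separated record (age, fnlwgt, education-num, capital-gain, capital-loss, hours-per-week) and keeps column 14 as the label. As a minimal sketch, the same parsing can be shown in plain Scala without Spark; the `AdultParse` object and sample line below are my own illustration (the sample is the well-known first record of the UCI Adult dataset), and a `.trim` is added because fields in the file carry a leading space:

    ```scala
    object AdultParse {
      // Column indices selected in the lab's map(): age, fnlwgt, education-num,
      // capital-gain, capital-loss, hours-per-week
      val numericCols = Seq(0, 2, 4, 10, 11, 12)

      // Split one comma-separated record and return (numeric features, label).
      def parseLine(line: String): (Array[Double], String) = {
        val p = line.split(",").map(_.trim)
        (numericCols.map(i => p(i).toDouble).toArray, p(14))
      }

      def main(args: Array[String]): Unit = {
        // First record of the UCI Adult dataset, used here only for illustration
        val sample = "39, State-gov, 77516, Bachelors, 13, Never-married, " +
          "Adm-clerical, Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K"
        val (features, label) = parseLine(sample)
        println(features.mkString(", ")) // 39.0, 77516.0, 13.0, 2174.0, 0.0, 40.0
        println(label)                   // <=50K
      }
    }
    ```

    In the actual lab code these six doubles go into `Vectors.dense(...)` so that MLlib stages such as `PCA` and `LogisticRegression` can consume them as a `features` vector column.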
  • Original post: https://www.cnblogs.com/Excusezuo/p/12315306.html