• PySpark data preparation


    Iris dataset (columns: sepal length, sepal width, petal length, petal width in cm, class)

    5.1,3.5,1.4,0.2,Iris-setosa
    4.9,3.0,1.4,0.2,Iris-setosa
    4.7,3.2,1.3,0.2,Iris-setosa
    4.6,3.1,1.5,0.2,Iris-setosa
    5.0,3.6,1.4,0.2,Iris-setosa
    5.4,3.9,1.7,0.4,Iris-setosa
    4.6,3.4,1.4,0.3,Iris-setosa
    5.0,3.4,1.5,0.2,Iris-setosa

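    Code to convert it to libsvm format (the script takes the CSV path as its first argument and prints the result to stdout)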

    import sys

    # Map class names to numeric labels (libsvm labels must be numbers)
    LABELS = {"Iris-setosa": 0, "Iris-versicolor": 1, "Iris-virginica": 2}


    def main(path):
        with open(path, "r") as f:
            for line in f:
                line = line.strip()
                if not line:  # skip blank lines, e.g. a trailing empty line
                    continue
                ss = line.split(",")
                label = LABELS[ss[4]]  # unknown class names raise KeyError
                # libsvm format: <label> <index>:<value> ..., indices start at 1
                print("%d 1:%.1f 2:%.1f 3:%.1f 4:%.1f"
                      % (label, float(ss[0]), float(ss[1]), float(ss[2]), float(ss[3])))


    if __name__ == "__main__":
        main(sys.argv[1])
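
    The script above hard-codes four features and rounds every value with `%.1f`. As a hedged sketch, the same conversion can be written as a reusable function that handles any number of numeric columns and prints each value with Python's default float formatting instead of a fixed one decimal. The name `csv_to_libsvm` and its `labels` argument are illustrative only, not part of any library.

```python
# Hypothetical helper: convert one CSV row "f1,...,fn,class" into a libsvm line.
LABELS = {"Iris-setosa": 0, "Iris-versicolor": 1, "Iris-virginica": 2}


def csv_to_libsvm(row, labels=LABELS):
    """Return '<label> 1:<f1> 2:<f2> ...' for one CSV row (last column = class)."""
    *features, name = row.strip().split(",")
    # libsvm feature indices are one-based, so enumerate from 1
    pairs = " ".join("%d:%s" % (i, float(v)) for i, v in enumerate(features, start=1))
    return "%d %s" % (labels[name], pairs)


print(csv_to_libsvm("5.1,3.5,1.4,0.2,Iris-setosa"))
# 0 1:5.1 2:3.5 3:1.4 4:0.2
```

    Converting through `float` also rejects non-numeric feature columns early instead of writing them into the output file.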

    The iris dataset in libsvm format

    0 1:5.1 2:3.5 3:1.4 4:0.2
    0 1:4.9 2:3.0 3:1.4 4:0.2
    0 1:4.7 2:3.2 3:1.3 4:0.2
    0 1:4.6 2:3.1 3:1.5 4:0.2
    0 1:5.0 2:3.6 3:1.4 4:0.2
    0 1:5.4 2:3.9 3:1.7 4:0.4
    0 1:4.6 2:3.4 3:1.4 4:0.3
    0 1:5.0 2:3.4 3:1.5 4:0.2
    0 1:4.4 2:2.9 3:1.4 4:0.2
    0 1:4.9 2:3.1 3:1.5 4:0.1
    0 1:5.4 2:3.7 3:1.5 4:0.2
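
    Each line above is a label followed by `index:value` pairs; the indices are one-based and ascending, and any index that is left out means the feature value is 0 (the format is sparse). A small sketch that expands one such line into a dense list, assuming the caller already knows the number of features; `libsvm_to_dense` is a hypothetical name used only for illustration.

```python
def libsvm_to_dense(line, num_features):
    """Parse '<label> i:v i:v ...' (one-based indices) into (label, dense list)."""
    tokens = line.split()
    label = float(tokens[0])
    dense = [0.0] * num_features  # omitted indices stay 0.0
    for token in tokens[1:]:
        index, value = token.split(":")
        dense[int(index) - 1] = float(value)  # one-based file index -> list position
    return label, dense


print(libsvm_to_dense("0 1:5.1 2:3.5 3:1.4 4:0.2", 4))
# (0.0, [5.1, 3.5, 1.4, 0.2])
print(libsvm_to_dense("1 2:3.0", 4))
# (1.0, [0.0, 3.0, 0.0, 0.0])
```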

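    Reading the libsvm data with PySpark and converting each line to a LabeledPoint (in the pyspark shell, where `sc` is the SparkContext)

    >>> from pyspark.mllib.util import MLUtils
    >>> examples = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")
    >>> examples.take(2)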
    [LabeledPoint(0.0, (4,[0,1,2,3],[5.1,3.5,1.4,0.2])), LabeledPoint(0.0, (4,[0,1,2,3],[4.9,3.0,1.4,0.2]))]
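
    The vectors print as `(4,[0,1,2,3],[...])`: a sparse vector of size 4 with zero-based indices, even though the file uses indices 1 to 4. According to the PySpark documentation, `loadLibSVMFile` converts the one-based indices to zero-based, and with its default `numFeatures=-1` it determines the vector size from the input data. The plain-Python sketch below mimics those two steps to show where the `4` and the `[0,1,2,3]` come from; it is an illustration under the assumption of ascending indices, not Spark's implementation, and `parse_libsvm` is a hypothetical name.

```python
def parse_libsvm(lines):
    """Return [(label, size, zero_based_indices, values), ...] for libsvm lines."""
    parsed = []
    for line in lines:
        tokens = line.split()
        if not tokens:  # skip blank lines
            continue
        pairs = [t.split(":") for t in tokens[1:]]
        indices = [int(i) - 1 for i, _ in pairs]  # one-based -> zero-based
        values = [float(v) for _, v in pairs]
        parsed.append((float(tokens[0]), indices, values))
    # Indices are assumed ascending, so the last one in each line is its largest;
    # vector size = largest zero-based index over the whole input, plus one.
    size = max((idx[-1] + 1 for _, idx, _ in parsed if idx), default=0)
    return [(label, size, idx, vals) for label, idx, vals in parsed]


rows = parse_libsvm(["0 1:5.1 2:3.5 3:1.4 4:0.2", "0 1:4.9 2:3.0 3:1.4 4:0.2"])
print(rows[0])
# (0.0, 4, [0, 1, 2, 3], [5.1, 3.5, 1.4, 0.2])
```

    Because the size is inferred from whatever lines are read, loading separate files this way can yield different sizes if some features never appear in one of them; passing an explicit `numFeatures` to `loadLibSVMFile` avoids that.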

     
  • Original article: https://www.cnblogs.com/luozeng/p/9227669.html