• Spark-windows安装


     

     

     

    Spark

     

    目的:达到能在pycharm中测试

    1.安装必要的文件:

    JDK

    AnaConda

    spark

    hadoop

    jdk测试:java -version

    Anaconda测试: 打开Anaconda Prompt输入conda list

    spark测试(注意spark的安装路径不能有空格):spark-shell

    2.配置环境变量

     

    3.打开pycharm测试

    import os
    from pyspark import SparkConf, SparkContext
    os.environ['JAVA_HOME']='G:Program FilesJavajdk1.8.0_181'
    conf = SparkConf().setMaster('local[*]').setAppName('word_count')
    sc = SparkContext(conf=conf)
    d = ['a b c d', 'b c d e', 'c d e f']
    d_rdd = sc.parallelize(d)
    rdd_res = d_rdd.flatMap(lambda x: x.split(' ')).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a+b)
    print(rdd_res)
    print(rdd_res.collect())

    运行结果:

    G:ProgramDataAnaconda3python.exe "H:/1.study/资料(1)/机器学习2/Maching Learning_2/chapter13/spark_test.py"
    19/07/18 17:12:13 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    PythonRDD[5] at RDD at PythonRDD.scala:53
    [('a', 1), ('e', 2), ('b', 2), ('c', 3), ('d', 3), ('f', 1)]
    ​
    Process finished with exit code 0

    利用spark求圆周率代码

    
    
    import random
    import os
    from pyspark import SparkConf, SparkContext
    os.environ['JAVA_HOME']='G:Program FilesJavajdk1.8.0_181'
    conf = SparkConf().setMaster('local[*]').setAppName('word_count')
    sc = SparkContext(conf=conf)
    NUM_SAMPLES = 100000def inside(p):
        x, y = random.random(), random.random()
        return x*x + y*y < 1
    ​
    count = sc.parallelize(range(0, NUM_SAMPLES)).filter(inside).count()
    print("π粗糙的值: %f" % (4.0 * count / NUM_SAMPLES))

    得到结果:

    [Stage 0:============================================>              (6 + 2) / 8]
     π粗糙的值: 3.129680
  • 相关阅读:
    dpdk优化相关 转
    常用的TCP Option
    c10k C10M
    Linux惊群效应详解
    bloomfilter 以及count min sketch
    Squid 搭建正向代理服务器
    Openflow的架构+源码剖析 转载
    Hyperscan与Snort的集成方案
    动态图
    psutil 模块
  • 原文地址:https://www.cnblogs.com/TimVerion/p/11211046.html
Copyright © 2020-2023  润新知