• Spark single-machine setup and quick start


    1 Preparation

    System environment

    cat /etc/centos-release
    CentOS Linux release 7.3.1611 (Core)
    

    Configure JDK 8

    wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "http://download.oracle.com/otn-pub/java/jdk/8u181-b13/96a7b8442fe848ef90c96a2fad6ed6d1/jdk-8u181-linux-x64.tar.gz"
    tar -xf jdk-8u181-linux-x64.tar.gz
    
    echo 'export JAVA_HOME=/home/work/fsj/jdk1.8.0_181
    export PATH=$JAVA_HOME/bin:$PATH' >> ~/.bashrc
    
    source ~/.bashrc
    java -version
    

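    Configure Spark

    Download the latest pre-built Spark package from http://spark.apache.org/downloads.html and unpack it. The commands below use spark-2.3.0-bin-hadoop2.7.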

    echo 'export SPARK_HOME=/home/work/fsj/spark-2.3.0-bin-hadoop2.7
    export PATH=$SPARK_HOME/bin:$PATH' >> ~/.bashrc
    source ~/.bashrc
    run-example SparkPi 10  # run a bundled example
    

    2 spark-shell

    $ spark-shell --master local[2]
    2018-09-02 16:12:37 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Setting default log level to "WARN".
    To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
    Spark context Web UI available at http://localhost:4040
    Spark context available as 'sc' (master = local[2], app id = local-1535875965532).
    Spark session available as 'spark'.
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 2.3.0
          /_/
    
    Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_181)
    Type in expressions to have them evaluated.
    Type :help for more information.
    
    scala> sc
    res1: org.apache.spark.SparkContext = org.apache.spark.SparkContext@674aa626
    
    scala> val textFile = spark.read.textFile("README.md")
    2018-09-02 16:16:44 WARN  ObjectStore:6666 - Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
    2018-09-02 16:16:45 WARN  ObjectStore:568 - Failed to get database default, returning NoSuchObjectException
    2018-09-02 16:16:45 WARN  ObjectStore:568 - Failed to get database global_temp, returning NoSuchObjectException
    textFile: org.apache.spark.sql.Dataset[String] = [value: string]
    
    scala> textFile.first()
    res2: String = # Apache Spark
    

    As the startup output shows, the shell also reports the address of the web UI: http://localhost:4040
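
    Continuing in the same spark-shell session, a few more Dataset operations can be tried on the README. The lines below are a sketch modeled on the official Spark quick start, not part of the original session. The names totalLines, linesWithSpark and sparkLines are only illustrative, and README.md is assumed to be in the directory where spark-shell was started.

    ```scala
    // Paste into spark-shell, where `spark` (a SparkSession) is predefined.
    val textFile = spark.read.textFile("README.md")

    // Number of lines in the file
    val totalLines = textFile.count()

    // Keep only the lines that mention "Spark" (a lazy transformation; nothing runs yet)
    val linesWithSpark = textFile.filter(line => line.contains("Spark"))

    // count() is an action and triggers the computation
    val sparkLines = linesWithSpark.count()

    println(s"total lines: $totalLines, lines containing Spark: $sparkLines")
    ```

    While these jobs run, they should appear in the web UI at the address printed by the shell.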

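    3 Standalone project

    spark-submit can run a job either in local mode or on a cluster. Only local mode is covered here.

    Maven is used to manage the Scala dependencies. Create a new project named SimpleApp.
    The pom.xml needs the scala-maven-plugin.
    The plugin fetches the Scala compiler through Maven, so building the project does not require a local Scala installation.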
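
    The pom.xml requirements described above might look roughly like the fragment below. This is a sketch, not the file from the author's repository: the plugin version 3.2.2 and the provided scope are assumptions, the Spark and Scala versions (2.3.0 and 2.11) follow the spark-shell banner shown earlier, and the artifactId and version are chosen to match the jar name simple-project-1.0.jar.

    ```xml
    <!-- Fragment of pom.xml; these elements go inside <project> -->
    <artifactId>simple-project</artifactId>
    <version>1.0</version>

    <dependencies>
      <!-- Spark SQL built for Scala 2.11; "provided" because spark-submit supplies it at run time -->
      <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>2.3.0</version>
        <scope>provided</scope>
      </dependency>
    </dependencies>

    <build>
      <plugins>
        <!-- Compiles the sources under src/main/scala during mvn package -->
        <plugin>
          <groupId>net.alchim31.maven</groupId>
          <artifactId>scala-maven-plugin</artifactId>
          <version>3.2.2</version>
          <executions>
            <execution>
              <goals>
                <goal>compile</goal>
                <goal>testCompile</goal>
              </goals>
            </execution>
          </executions>
        </plugin>
      </plugins>
    </build>
    ```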

    The code and the complete pom.xml are at: https://github.com/shenjiefeng/spark-examples/tree/master/SimpleApp
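
    For readers who do not want to open the repository, a minimal SimpleApp.scala along the lines of the official Spark quick start could look like this. It is a sketch and may differ from the author's file. Reading README.md from SPARK_HOME is an assumption, with the current directory as a fallback.

    ```scala
    import org.apache.spark.sql.SparkSession

    object SimpleApp {
      def main(args: Array[String]): Unit = {
        // Assumption: use the README shipped with the Spark distribution
        val logFile = sys.env.getOrElse("SPARK_HOME", ".") + "/README.md"

        // The master URL is not set here; it comes from spark-submit --master
        val spark = SparkSession.builder.appName("Simple Application").getOrCreate()

        // cache() keeps the Dataset in memory because it is scanned twice
        val logData = spark.read.textFile(logFile).cache()
        val numAs = logData.filter(line => line.contains("a")).count()
        val numBs = logData.filter(line => line.contains("b")).count()

        println(s"Lines with a: $numAs, Lines with b: $numBs")
        spark.stop()
      }
    }
    ```

    The file goes under src/main/scala so that the Maven build picks it up.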

    $ cd /path/to/SimpleApp && mvn clean package  # a Maven mirror inside China is recommended for users there
    $ tree
    .
    ├── pom.xml
    ├── src
    │   └── main
    │       └── scala
    │           └── SimpleApp.scala
    └── target
        ├── classes
        │   ├── SimpleApp$$anonfun$1.class
        │   ├── SimpleApp$$anonfun$2.class
        │   ├── SimpleApp.class
        │   └── SimpleApp$.class
        ├── classes.timestamp
        ├── maven-archiver
        │   └── pom.properties
        ├── simple-project-1.0.jar
        ├── surefire
        └── test-classes
    
    $ spark-submit   --class "SimpleApp"   --master local[*]   target/simple-project-1.0.jar
    ...
    

    To change the log level of spark-submit output:

    $ cd $SPARK_HOME
    $ cp conf/log4j.properties.template conf/log4j.properties
    # edit conf/log4j.properties and change INFO to WARN on this line:
    log4j.rootCategory=INFO, console
    
    $ cd /path/to/SimpleApp
    $ spark-submit   --class "SimpleApp"   --master local[*]   target/simple-project-1.0.jar
    18/09/02 17:27:52 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Lines with a: 61, Lines with b: 30
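
    The log level can also be set from inside an application with SparkContext.setLogLevel, the call that the spark-shell startup message mentions. The sketch below is an illustration only: the object name QuietApp and the small range job are hypothetical and not part of the original project. Messages logged before the call, such as those emitted while the session starts, are not affected.

    ```scala
    import org.apache.spark.sql.SparkSession

    object QuietApp {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("Quiet Application").getOrCreate()

        // From here on, only WARN and more severe messages are logged
        spark.sparkContext.setLogLevel("WARN")

        // A tiny job to show that the application output is still printed
        val n = spark.range(5).count()
        println(s"Rows counted: $n")

        spark.stop()
      }
    }
    ```

    Editing conf/log4j.properties remains the better choice when the startup messages should be silenced as well.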
    


  • Original article: https://www.cnblogs.com/lawlietfans/p/9574689.html