• Installing the Spark runtime environment


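    1. Download the software

    scala-2.9.3: the Scala programming language. Download: http://www.scala-lang.org/download/
    spark-1.4.0: this must be a prebuilt Spark package. If you download the source release instead, you have to build it yourself for your environment with SBT or Maven before it can be used.

    Prebuilt Spark download: http://spark.apache.org/downloads.html

    Note: the spark-shell banner in step 5 reports Scala 2.10.4. That is the Scala version bundled with the prebuilt Spark package, and it is what spark-shell runs regardless of the Scala installed separately here.

    2. Install scala-2.9.3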

     
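    # Unpack scala-2.9.3.tgz
    tar -zxvf scala-2.9.3.tgz
    # Configure SCALA_HOME
    vi /etc/profile
    # Add the following environment variables
    export SCALA_HOME=/home/apps/scala-2.9.3
    export PATH=.:$SCALA_HOME/bin:$PATH
    # Reload the profile so the change takes effect in the current shell
    source /etc/profile
    # Test that Scala is installed: just run
    scala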
    

      


    3. Install spark-1.4.0

     
    # Unpack spark-1.4.0.tgz
    tar -zxvf spark-1.4.0.tgz
    # Configure SPARK_HOME
    vi /etc/profile
    # Add the following environment variables
    export SPARK_HOME=/home/apps/spark-1.4.0
    export PATH=.:$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH
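
After editing /etc/profile and reloading it, it can be worth confirming that the new directories really ended up on PATH. The following is a minimal sketch, not part of Spark or Scala: `path_contains` is a hypothetical helper name, and the loop assumes the `SCALA_HOME` and `SPARK_HOME` variables from steps 2 and 3 are set in the current shell.

```shell
#!/bin/sh
# Report whether a directory is one of the colon-separated entries of a
# PATH-style string.
# Usage: path_contains DIR [PATH_STRING]   (PATH_STRING defaults to $PATH)
path_contains() {
    dir=$1
    path_string=${2:-$PATH}
    # Wrapping both sides in colons lets one pattern match the first,
    # middle, and last entries alike.
    case ":$path_string:" in
        *":$dir:"*) return 0 ;;
        *)          return 1 ;;
    esac
}

# Check the entries that the /etc/profile edits are expected to add.
for d in "$SCALA_HOME/bin" "$SPARK_HOME/bin" "$SPARK_HOME/sbin"; do
    if path_contains "$d"; then
        echo "on PATH:  $d"
    else
        echo "MISSING:  $d"
    fi
done
```

A MISSING line for `$SPARK_HOME/bin` usually means the variable was exported under the wrong name or the profile was not reloaded.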
    

      

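    4. Edit the Spark configuration files

    # In $SPARK_HOME/conf, make a copy of slaves.template and of spark-env.sh.template
    cp  spark-env.sh.template  spark-env.sh
    cp  slaves.template slaves
    # slaves: this file lists the worker hosts; add one worker hostname per line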
    

      

        Append the following lines to the end of spark-env.sh:

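    # JDK installation path
    export JAVA_HOME=/root/app/jdk
    # Scala installation path (use the directory where Scala was actually unpacked; step 2 used /home/apps/scala-2.9.3)
    export SCALA_HOME=/root/app/scala-2.9.3
    # IP address of the master node
    export SPARK_MASTER_IP=192.168.1.200
    # Memory available to each worker
    export SPARK_WORKER_MEMORY=200m
    # Hadoop configuration directory
    export HADOOP_CONF_DIR=/root/app/hadoop/etc/hadoop
    # Number of CPU cores each worker may use
    export SPARK_WORKER_CORES=1
    # Number of worker instances per machine; one is usually enough
    export SPARK_WORKER_INSTANCES=1
    # JVM options. Since Spark 1.0 there is a spark-defaults.conf default configuration file, and these options are normally set there instead
    export SPARK_JAVA_OPTS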
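
A typo in spark-env.sh tends to surface only when a worker fails to start, so a quick check of what the file actually exports can save time. The sketch below sources the file in a subshell and prints the variables set in this step. `show_spark_env` is a hypothetical helper name, not something Spark provides, and a variable already exported in the calling shell will be shown even if the file does not set it.

```shell
#!/bin/sh
# Print selected variables as they stand after sourcing a spark-env.sh-style
# file. The subshell keeps the sourced exports out of the calling shell.
# show_spark_env is a hypothetical helper, not part of Spark.
show_spark_env() {
    env_file=$1
    (
        . "$env_file" || exit 1
        for name in JAVA_HOME SCALA_HOME SPARK_MASTER_IP SPARK_WORKER_MEMORY \
                    SPARK_WORKER_CORES SPARK_WORKER_INSTANCES HADOOP_CONF_DIR; do
            # Indirect lookup: read the variable whose name is stored in $name.
            eval "value=\${$name:-}"
            echo "$name=${value:-<unset>}"
        done
    )
}
```

One possible use is `show_spark_env "$SPARK_HOME/conf/spark-env.sh"`, run on the master and on each worker host. Pass a path that contains a slash, since `.` searches PATH for bare file names.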
    

      

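        spark-defaults.conf also accepts settings such as the following (property names are lowercase):

    spark.master             # master URL, e.g. spark://hostname:7077 (7077 is the default master port; 8080 is the master web UI)
    spark.local.dir          # Spark scratch directory (used for shuffle data)
    spark.executor.memory    # replaces the SPARK_MEM variable, which was dropped in Spark 1.0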
    

      

     

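    5. Test the Spark installation

    Start the services on the master node in this order:
    1. Start HDFS first (./sbin/start-dfs.sh, run from the Hadoop installation directory)
    2. Start the Spark master (./sbin/start-master.sh, run from $SPARK_HOME)
    3. Start the Spark workers (./sbin/start-slaves.sh)
    4. Run jps and check that these processes are present:
    Master node: NameNode, SecondaryNameNode, Master
    Worker nodes: DataNode, Worker
    5. Start spark-shell; it should print output like the following: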
    15/06/21 21:23:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    15/06/21 21:23:47 INFO spark.SecurityManager: Changing view acls to: root
    15/06/21 21:23:47 INFO spark.SecurityManager: Changing modify acls to: root
    15/06/21 21:23:47 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
    15/06/21 21:23:47 INFO spark.HttpServer: Starting HTTP Server
    15/06/21 21:23:47 INFO server.Server: jetty-8.y.z-SNAPSHOT
    15/06/21 21:23:47 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:38651
    15/06/21 21:23:47 INFO util.Utils: Successfully started service 'HTTP class server' on port 38651.
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 1.4.0
          /_/
    Using Scala version 2.10.4 (Java HotSpot(TM) Client VM, Java 1.7.0_65)
    Type in expressions to have them evaluated.
    Type :help for more information.
    15/06/21 21:23:54 INFO spark.SparkContext: Running Spark version 1.4.0
    15/06/21 21:23:54 INFO spark.SecurityManager: Changing view acls to: root
    15/06/21 21:23:54 INFO spark.SecurityManager: Changing modify acls to: root
    15/06/21 21:23:54 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); users with modify permissions: Set(root)
    15/06/21 21:23:56 INFO slf4j.Slf4jLogger: Slf4jLogger started
    15/06/21 21:23:56 INFO Remoting: Starting remoting
    15/06/21 21:23:57 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.1.200:57658]
    15/06/21 21:23:57 INFO util.Utils: Successfully started service 'sparkDriver' on port 57658.
    15/06/21 21:23:58 INFO spark.SparkEnv: Registering MapOutputTracker
    15/06/21 21:23:58 INFO spark.SparkEnv: Registering BlockManagerMaster
    15/06/21 21:23:58 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-4f1badf6-1e92-47ca-98a2-6d82f4882f15/blockmgr-530e4335-9e59-45d4-b9fb-6014089f5a00
    15/06/21 21:23:58 INFO storage.MemoryStore: MemoryStore started with capacity 267.3 MB
    15/06/21 21:23:59 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-4f1badf6-1e92-47ca-98a2-6d82f4882f15/httpd-4b2cca3c-e8d4-4ab3-9c3d-38ec579ec873
    15/06/21 21:23:59 INFO spark.HttpServer: Starting HTTP Server
    15/06/21 21:23:59 INFO server.Server: jetty-8.y.z-SNAPSHOT
    15/06/21 21:23:59 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:51899
    15/06/21 21:23:59 INFO util.Utils: Successfully started service 'HTTP file server' on port 51899.
    15/06/21 21:23:59 INFO spark.SparkEnv: Registering OutputCommitCoordinator
    15/06/21 21:23:59 INFO server.Server: jetty-8.y.z-SNAPSHOT
    15/06/21 21:23:59 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
    15/06/21 21:23:59 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
    15/06/21 21:23:59 INFO ui.SparkUI: Started SparkUI at http://192.168.1.200:4040
    15/06/21 21:24:00 INFO executor.Executor: Starting executor ID driver on host localhost
    15/06/21 21:24:00 INFO executor.Executor: Using REPL class URI: http://192.168.1.200:38651
    15/06/21 21:24:01 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 59385.
    15/06/21 21:24:01 INFO netty.NettyBlockTransferService: Server created on 59385
    15/06/21 21:24:01 INFO storage.BlockManagerMaster: Trying to register BlockManager
    15/06/21 21:24:01 INFO storage.BlockManagerMasterEndpoint: Registering block manager localhost:59385 with 267.3 MB RAM, BlockManagerId(driver, localhost, 59385)
    15/06/21 21:24:01 INFO storage.BlockManagerMaster: Registered BlockManager
    15/06/21 21:24:02 INFO repl.SparkILoop: Created spark context..
    Spark context available as sc.
    15/06/21 21:24:03 INFO hive.HiveContext: Initializing execution hive, version 0.13.1
    15/06/21 21:24:04 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
    15/06/21 21:24:04 INFO metastore.ObjectStore: ObjectStore, initialize called
    15/06/21 21:24:04 INFO DataNucleus.Persistence: Property datanucleus.cache.level2 unknown - will be ignored
    15/06/21 21:24:04 INFO DataNucleus.Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
    15/06/21 21:24:05 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
    15/06/21 21:24:07 WARN DataNucleus.Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
    15/06/21 21:24:14 INFO metastore.ObjectStore: Setting MetaStore object pin classes with hive.metastore.cache.pinobjtypes="Table,StorageDescriptor,SerDeInfo,Partition,Database,Type,FieldSchema,Order"
    15/06/21 21:24:14 INFO metastore.MetaStoreDirectSql: MySQL check failed, assuming we are not on mysql: Lexical error at line 1, column 5.  Encountered: "@" (64), after : "".
    15/06/21 21:24:15 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
    15/06/21 21:24:15 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
    15/06/21 21:24:18 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
    15/06/21 21:24:18 INFO DataNucleus.Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
    15/06/21 21:24:19 INFO metastore.ObjectStore: Initialized ObjectStore
    15/06/21 21:24:20 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 0.13.1aa
    15/06/21 21:24:24 INFO metastore.HiveMetaStore: Added admin role in metastore
    15/06/21 21:24:24 INFO metastore.HiveMetaStore: Added public role in metastore
    15/06/21 21:24:24 INFO metastore.HiveMetaStore: No user is added in admin role, since config is empty
    15/06/21 21:24:25 INFO session.SessionState: No Tez session required at this point. hive.execution.engine=mr.
    15/06/21 21:24:25 INFO repl.SparkILoop: Created sql context (with Hive support)..
    SQL context available as sqlContext.
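    6. Test with the word count example. Upload a file to HDFS before starting spark-shell.
    7. Code:
    val file = sc.textFile("hdfs://hadoop.master:9000/data/intput/wordcount.data")
    val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
    count.collect()
    count.saveAsTextFile("hdfs://hadoop.master:9000/data/output")
    Here flatMap splits each line into words, map pairs every word with a 1, and reduceByKey sums those 1s per word. To understand the code above in more depth you need to learn Scala.
    Print the result directly: hadoop dfs -cat /data/output/p*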
    (im,1)
    (are,1)
    (yes,1)
    (hi,2)
    (do,1)
    (no,3)
    (to,1)
    (lll,1)
    (,3)
    (hello,3)
    (xiaoming,1)
    (ga,1)
    (world,1)
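
To cross-check a result like the one above on a small input, the same counting can be approximated locally without Spark. This is a plain shell pipeline, not the Spark job itself: `word_count` is a hypothetical helper name, and it assumes words are separated by single spaces, as in the `split(" ")` call. Consecutive spaces and empty lines produce empty tokens, which is where an entry such as `(,3)` comes from. Unlike Java's `split`, this pipeline also counts a trailing space at the end of a line as an empty token, so the count for the empty key may differ.

```shell
#!/bin/sh
# Local approximation of the Spark word count, reading text on stdin.
# word_count is a hypothetical helper, not part of Spark or Hadoop.
word_count() {
    # 1. Put every space-separated token on its own line.
    # 2. Count identical lines in an awk associative array.
    # 3. Print pairs in the (word,count) form used by the Spark output,
    #    sorted bytewise so the order is stable across runs.
    tr ' ' '\n' |
        awk '{ counts[$0]++ }
             END { for (w in counts) printf "(%s,%d)\n", w, counts[w] }' |
        LC_ALL=C sort
}
```

With a local copy of the input, `word_count < wordcount.data` prints pairs that can be compared with the HDFS output. Spark does not guarantee any particular order in its output, so compare after sorting both sides.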
    

      

  • Original article: https://www.cnblogs.com/fclbky/p/5209146.html