• (Part 4) Spark Cluster Setup - Spark for Java & Python


    Spark Cluster Setup

    Video tutorials

    1. Youku

    2. YouTube

    Install Scala

    Download: http://www.scala-lang.org/download/

    Upload scala-2.10.5.tgz to the installer directory of the hadoop user on the master and slave machines.

    Do this on both machines.

    [hadoop@master installer]$ ls

    hadoop2  hadoop-2.6.0.tar.gz  scala-2.10.5.tgz

    Extract:

    [hadoop@master installer]$ tar -zxvf scala-2.10.5.tgz

    [hadoop@master installer]$ mv scala-2.10.5 scala

    [hadoop@master installer]$ cd scala

    [hadoop@master scala]$ pwd

    /home/hadoop/installer/scala

    Configure environment variables:

    [hadoop@master ~]$ vim .bashrc

    # .bashrc

    # Source global definitions

    if [ -f /etc/bashrc ]; then

            . /etc/bashrc

    fi

    # User specific aliases and functions

    export JAVA_HOME=/usr/java/jdk1.7.0_79

    export HADOOP_HOME=/home/hadoop/installer/hadoop2

    export SCALA_HOME=/home/hadoop/installer/scala

    export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native

    export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

    export CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib:$JAVA_HOME/lib:$SCALA_HOME/lib

    export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SCALA_HOME/bin

    [hadoop@master ~]$ . .bashrc
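
    To confirm the Scala installation took effect, run a quick version check (the banner below is illustrative of what Scala 2.10.5 prints):

    [hadoop@master ~]$ scala -version

    Scala code runner version 2.10.5 -- Copyright 2002-2013, LAMP/EPFL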

    Install Python

    Install gcc

    [root@master ~]# mkdir /RHEL5U4

    [root@master ~]# mount /dev/cdrom /media/

    [root@master media]# cp -r * /RHEL5U4/

    [root@master ~]# vim /etc/yum.repos.d/iso.repo

    [rhel-Server]

    name=5u4_Server

    baseurl=file:///RHEL5U4/Server

    enabled=1

    gpgcheck=0

    gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release

    [root@master ~]# yum clean all

    [root@master ~]# yum install gcc
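
    Before compiling Python from source, confirm the compiler is usable (the version shown should match your install media; GCC 4.1.2 is what the Python build banner reports later):

    [root@master ~]# gcc --version

    gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46)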

    Build and install Python

    [root@master installer]# tar -zxvf Python-2.7.12.tgz

    Upload zlib-1.2.8.tar.gz.

    Use it to replace the zlib under /root/installer/Python-2.7.12/Modules.

    [root@master Python-2.7.12]# ./configure --prefix=/usr/local/python27

    [root@master Python-2.7.12]# make

    [root@master Python-2.7.12]# make install

    [root@master Python-2.7.12]# mv /usr/bin/python /usr/bin/python_old

    [root@master Python-2.7.12]# ln -s /usr/local/python27/bin/python /usr/bin/

    [root@master Python-2.7.12]# python

    Python 2.7.12 (default, Nov  7 2016, 21:42:16)

    [GCC 4.1.2 20080704 (Red Hat 4.1.2-46)] on linux2

    Type "help", "copyright", "credits" or "license" for more information.

    >>>
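
    Since the bundled zlib was replaced above, it is worth confirming the new interpreter can load it (a quick sketch; the version string assumes the 1.2.8 sources were picked up by the build):

    >>> import zlib

    >>> zlib.ZLIB_VERSION

    '1.2.8'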

    Install Spark

    Download: http://spark.apache.org/downloads.html

    Upload spark-2.0.0-bin-hadoop2.6.tgz to the installer directory of the hadoop user on master.

    Extract:

    [hadoop@master installer]$ tar -zxvf spark-2.0.0-bin-hadoop2.6.tgz

    [hadoop@master installer]$ mv spark-2.0.0-bin-hadoop2.6 spark2

    [hadoop@master installer]$ cd spark2/

    [hadoop@master spark2]$ ls

    bin  conf  data  examples  jars  LICENSE  licenses  NOTICE  python  R  README.md  RELEASE  sbin  yarn

    [hadoop@master spark2]$ pwd

    /home/hadoop/installer/spark2

    [hadoop@master ~]$ vim .bashrc

    # .bashrc

    # Source global definitions

    if [ -f /etc/bashrc ]; then

            . /etc/bashrc

    fi

    # User specific aliases and functions

    export JAVA_HOME=/usr/java/jdk1.7.0_79

    export HADOOP_HOME=/home/hadoop/installer/hadoop2

    export SCALA_HOME=/home/hadoop/installer/scala

    export SPARK_HOME=/home/hadoop/installer/spark2

    export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native

    export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

    export CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib:$JAVA_HOME/lib:$SCALA_HOME/lib

    export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$SCALA_HOME/bin:$SPARK_HOME/bin:$SPARK_HOME/sbin

    [hadoop@master ~]$ . .bashrc

    [hadoop@master ~]$ scp .bashrc slave:~

    .bashrc                                                                                            100%  621     0.6KB/s   00:00

    On the slave machine, run:

    [hadoop@slave ~]$ . .bashrc

    Configure Spark

    [hadoop@master conf]$ cp spark-env.sh.template spark-env.sh

    [hadoop@master conf]$ vim spark-env.sh

    #!/usr/bin/env bash

    #

    # Licensed to the Apache Software Foundation (ASF) under one or more

    # contributor license agreements.  See the NOTICE file distributed with

    # this work for additional information regarding copyright ownership.

    # The ASF licenses this file to You under the Apache License, Version 2.0

    # (the "License"); you may not use this file except in compliance with

    # the License.  You may obtain a copy of the License at

    #

    #    http://www.apache.org/licenses/LICENSE-2.0

    #

    # Unless required by applicable law or agreed to in writing, software

    # distributed under the License is distributed on an "AS IS" BASIS,

    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

    # See the License for the specific language governing permissions and

    # limitations under the License.

    #

    export JAVA_HOME=/usr/java/jdk1.7.0_79

    export SCALA_HOME=/home/hadoop/installer/scala

    export SPARK_MASTER_HOST=master

    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

    export SPARK_EXECUTOR_MEMORY=600M

    export SPARK_DRIVER_MEMORY=600M
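
    SPARK_MASTER_HOST tells the standalone scripts where the Master runs, and the two memory settings cap executor and driver heap at 600 MB each, which suits small test VMs like these.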

    [hadoop@master conf]$ vim slaves

    master

    slave
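
    The slaves file lists, one host per line, every machine on which start-slaves.sh will launch a Worker; here both master and slave run Workers. Then copy the configured Spark directory to the slave: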

    [hadoop@master installer]$ scp -r spark2 slave:~/installer/

    Start the Spark cluster

    [hadoop@master ~]$ start-master.sh

    [hadoop@master ~]$ start-slaves.sh

    [hadoop@master ~]$ jps

    17769 ResourceManager

    20192 Master

    20275 Worker

    17443 NameNode

    20521 Jps

    17631 SecondaryNameNode

    [hadoop@slave ~]$ jps

    13297 DataNode

    15367 Worker

    13408 NodeManager

    16245 Jps
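
    Both Workers can also be confirmed in the standalone Master's web UI, which listens on port 8080 by default:

    http://master:8080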

    Spark wordcount

    [hadoop@master ~]$ spark-shell

    Setting default log level to "WARN".

    To adjust logging level use sc.setLogLevel(newLevel).

    16/11/04 11:05:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

    16/11/04 11:05:09 WARN spark.SparkContext: Use an existing SparkContext, some configuration may not take effect.

    Spark context Web UI available at http://192.168.3.100:4040

    Spark context available as 'sc' (master = local[*], app id = local-1478228709028).

    Spark session available as 'spark'.

    Welcome to

          ____              __

         / __/__  ___ _____/ /__

        _\ \/ _ \/ _ `/ __/  '_/

       /___/ .__/\_,_/_/ /_/\_\   version 2.0.0

          /_/

    Using Scala version 2.11.8 (Java HotSpot(TM) Client VM, Java 1.7.0_79)

    Type in expressions to have them evaluated.

    Type :help for more information.

    scala> val file = sc.textFile("hdfs://master:9000/data/wordcount")

    16/11/04 11:05:14 WARN util.SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes

    file: org.apache.spark.rdd.RDD[String] = hdfs://master:9000/data/wordcount MapPartitionsRDD[1] at textFile at <console>:24

    scala> val count=file.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_+_)

    count: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[4] at reduceByKey at <console>:26

    scala> count.collect()

    res0: Array[(String, Int)] = Array((package,1), (this,1), (Version"](http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version),1), (Because,1), (Python,2), (cluster.,1), (its,1), ([run,1), (general,2), (have,1), (pre-built,1), (YARN,,1), (locally,2), (changed,1), (locally.,1), (sc.parallelize(1,1), (only,1), (Configuration,1), (This,2), (basic,1), (first,1), (learning,,1), ([Eclipse](https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-Eclipse),1), (documentation,3), (graph,1), (Hive,2), (several,1), (["Specifying,1), ("yarn",1), (page](http://spark.apache.org/documentation.html),1), ([params]`.,1), ([project,2), (prefer,1), (SparkPi,2), (<http://spark.apache.org/>,1), (engine,1), (version,1), (file,1), (documentation...

    scala>
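
    Note that spark-shell above ran in local mode (master = local[*]) because no master URL was given; add --master spark://master:7077 to run on the standalone cluster. Since this series also covers Python, here is the equivalent word count in PySpark, a sketch assuming the same HDFS path as the Scala session:

    [hadoop@master ~]$ pyspark

    >>> file = sc.textFile("hdfs://master:9000/data/wordcount")

    >>> count = file.flatMap(lambda line: line.split(" ")).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)

    >>> count.collect()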
