• Installing and Deploying SparkR on CentOS: hadoop2.7.3 + spark2.0.0 + scala2.11.8 + hive2.1.0


    Note: I previously wrote an article on installing and deploying SparkR (SparkR安装部署及数据分析实例). At that time the SparkR project had not yet been merged into Spark, so you had to download a separate SparkR package. Spark now ships with an R API, hence this updated article.

    1. Hadoop Installation

    References:

    http://www.linuxidc.com/Linux/2015-11/124800.htm

    http://blog.csdn.net/sa14023053/article/details/51952534

    yarn-site.xml

    <property>
      <name>yarn.scheduler.maximum-allocation-mb</name>
      <value>57344</value>
    </property>

    <property>
      <name>yarn.scheduler.minimum-allocation-mb</name>
      <value>2048</value>
    </property>

    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>57344</value>
    </property>

    <!-- yarn.app.mapreduce.am.resource.mb normally belongs in mapred-site.xml -->
    <property>
      <name>yarn.app.mapreduce.am.resource.mb</name>
      <value>2048</value>
    </property>
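
    After editing yarn-site.xml, copy it to every node and restart YARN, or the new memory limits will not take effect. A minimal sketch, assuming slaver1 is the worker host and $HADOOP_HOME is set as in section 2:

    # push the updated config to each worker node (hostname assumed)
    scp $HADOOP_HOME/etc/hadoop/yarn-site.xml slaver1:$HADOOP_HOME/etc/hadoop/
    # restart YARN so the ResourceManager and NodeManagers pick up the new limits
    $HADOOP_HOME/sbin/stop-yarn.sh
    $HADOOP_HOME/sbin/start-yarn.sh
    # each NodeManager should now report 57344 MB of usable memory
    yarn node -list -all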

    2. Spark Installation

    Reference:

    http://blog.csdn.net/sa14023053/article/details/51953836

    vim spark-defaults.conf

    spark.master                     spark://hadoopmaster:7077
    spark.eventLog.enabled           true
    spark.eventLog.dir               hdfs://hadoopmaster:9000/directory
    spark.yarn.historyServer.address hadoopmaster:18080
    spark.serializer                 org.apache.spark.serializer.KryoSerializer
    spark.driver.memory              30g
    spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
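
    Note that the spark.eventLog.dir directory must already exist on HDFS, otherwise applications fail at startup with a FileNotFoundException. Create it once, matching the URI above:

    hadoop fs -mkdir -p hdfs://hadoopmaster:9000/directory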

    vim spark-env.sh

    export SCALA_HOME=/xxx/scala-2.11.8
    export JAVA_HOME=/usr/java/jdk1.8.0_101
    export HADOOP_HOME=/xxx/hadoop-2.7.3
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export SPARK_MASTER_IP=XXXX
    export SPARK_HOME=/xxx/spark-2.0.0-bin-hadoop2.7
    export SPARK_WORKER_MEMORY=110g
    export SPARK_LOCAL_DIRS=/data/sparkdata/local
    export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true"
    export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=3 -Dspark.history.fs.logDirectory=hdfs://hadoopmaster:9000/directory"
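
    With both files in place, start the standalone cluster and the history server from the Spark directory (a sketch; it assumes conf/slaves lists the worker hosts):

    cd /xxx/spark-2.0.0-bin-hadoop2.7
    sbin/start-all.sh                # master on this host, workers from conf/slaves
    sbin/start-history-server.sh    # web UI on port 18080, per SPARK_HISTORY_OPTS
    jps                             # should list Master here and Worker on each slave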

    3. MySQL Installation

    Reference:

    http://blog.csdn.net/wendi_0506/article/details/39478369

    grant all privileges on *.* to hive@localhost identified by 'hive' with grant option;
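
    The GRANT statement is issued from the mysql client as root; under MySQL 5.x it also creates the hive account that the metastore will log in with:

    mysql -u root -p
    mysql> grant all privileges on *.* to hive@localhost identified by 'hive' with grant option;
    mysql> flush privileges;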

    4. Hive Installation

    References:

    http://blog.csdn.net/lnho2015/article/details/51355511

    http://blog.csdn.net/blue_jjw/article/details/50479263

    hadoop fs -mkdir /tmp
    hadoop fs -mkdir -p /user/hive/warehouse
    hadoop fs -mkdir -p /user/hive/log
    hadoop fs -chmod g+w /user/hive/warehouse
    hadoop fs -chmod g+w /user/hive/log
    hadoop fs -chmod g+w /tmp

    Configure hive-site.xml as follows:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
        <description>location of default database for the warehouse</description>
    </property>
    <property>
        <name>hive.querylog.location</name>
        <value>/user/hive/log</value>
    </property>
    <property>
       <name>javax.jdo.option.ConnectionURL</name>
       <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
       <description>JDBC connect string for a JDBC metastore</description>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>com.mysql.jdbc.Driver</value>
      <description>Driver class name for a JDBC metastore</description>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>hive</value>
      <description>username to use against metastore database</description>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>hive</value>
      <description>password to use against metastore database</description>
    </property>
    </configuration>
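
    Hive 2.1.0 does not create its metastore schema on first start. Before launching Hive, put the MySQL JDBC driver on its classpath and initialize the schema with schematool (a sketch; the connector jar version is an assumption, and $HIVE_HOME points at the Hive 2.1.0 install):

    cp mysql-connector-java-5.1.39-bin.jar $HIVE_HOME/lib/    # provides the com.mysql.jdbc.Driver class named in hive-site.xml
    $HIVE_HOME/bin/schematool -dbType mysql -initSchema       # creates the metastore tables in the hive database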

    Hive client configuration (scp the Hive install directory from hadoopmaster to slaver1, then edit hive-site.xml):

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
        <property>
           <name>hive.metastore.uris</name>
           <value>thrift://hadoopmaster:9083</value>
        </property>
    </configuration>
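
    The thrift://hadoopmaster:9083 URI points at the metastore service, so start it on hadoopmaster before any client connects:

    $HIVE_HOME/bin/hive --service metastore &    # listens on port 9083 by default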

    5. R Installation

    5.1 Install required software

    yum -y groupinstall 'Development Tools'
    yum -y install gfortran
    yum -y install cmake
    yum -y groupinstall "X Window System"
    yum install gcc
    yum install gcc-c++
    yum install gcc-gfortran
    yum install bzip2-libs
    yum install bzip2-devel
    yum -y install xz-devel.x86_64
    yum install libcurl-devel.x86_64
    yum -y install readline-devel tcl tk libX11-devel libXtst-devel xorg-x11-xtrans-devel libpng-devel libXt-devel

    5.2 Compile and install
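
    The configure step below runs inside an unpacked R source tree. A sketch of fetching one from CRAN (R 3.3.1 is an assumed version; any R 3.x source works the same way):

    wget https://cran.r-project.org/src/base/R-3/R-3.3.1.tar.gz
    tar -zxvf R-3.3.1.tar.gz
    cd R-3.3.1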

    ./configure --prefix /usr/R --enable-R-shlib --with-readline=yes --with-x=yes --with-tcltk=yes --with-cairo=yes --with-libpng=yes --with-jpeglib=yes --with-libtiff=yes --with-aqua=yes --with-ICU=yes --with-libcurl=yes --enable-utf8

    make

    make install

    vim /etc/profile:

    export R_HOME=/usr/R/lib64/R

    Then:

    source /etc/profile

    R CMD javareconf

    source /usr/R/lib64/R/etc/ldpaths

    vim ~/.bash_profile

    Add PATH=/usr/R/bin:$PATH

    Then source ~/.bash_profile
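
    Before declaring victory, a quick smoke test from the Spark directory confirms that SparkR reaches the cluster (a sketch; it assumes hive-site.xml has also been copied into Spark's conf/ directory so the sql() call can reach the metastore; note that executor-side R functions such as dapply additionally require R on every worker node):

    cd /xxx/spark-2.0.0-bin-hadoop2.7
    bin/sparkR --master spark://hadoopmaster:7077

    # then, at the R prompt:
    df <- as.DataFrame(faithful)    # distribute a built-in R data.frame
    head(df)                        # first rows come back to the driver
    head(sql("show databases"))     # round-trip through the Hive metastore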

    OK, the SparkR cluster installation is complete!
