• Installing Hadoop + Spark on CentOS 7 without Cloudera Manager (CM)


    Configure the kernel parameter (takes effect after a reboot)
    # echo 'vm.swappiness=10'>> /etc/sysctl.conf
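    Optionally, the setting can be applied immediately without a reboot and then verified:
    # sysctl -p
    # cat /proc/sys/vm/swappiness
    The second command should print 10.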

    Install JDK 8
    # rpm -ivh jdk-8u211-linux-x64.rpm
    # vi /etc/profile
    export JAVA_HOME=/usr/java/jdk1.8.0_211-amd64
    export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
    # source /etc/profile
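    Optionally verify that the JDK is installed and the variables are exported:
    # java -version
    # echo $JAVA_HOME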

    Summary
    1. Installing CDH 6.2.0 from rpm packages without CM is essentially the same as the earlier CDH 5.10.0 installation.
    2. This method requires downloading all the relevant rpm packages to the server and building a local yum repository from them; the total download is about 4.3 GB.
    3. As before, Zookeeper must be installed first.

    ----------------------------------------------------------------------

    3.1 Zookeeper
    1. Install Zookeeper on all nodes
    # yum -y install zookeeper

    2. Create the data directory and change its owner
    # mkdir -p /var/lib/zookeeper
    # chown -R zookeeper /var/lib/zookeeper

    3. Edit the configuration file /etc/zookeeper/conf/zoo.cfg (the same file must be present on all three nodes; a sync sketch follows the file contents)
    maxClientCnxns=60
    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=/var/lib/zookeeper
    clientPort=2181
    dataLogDir=/var/lib/zookeeper
    minSessionTimeout=4000
    maxSessionTimeout=40000
    server.1=gp-mdw:3181:4181
    server.2=gp-data-1:3181:4181
    server.3=gp-data-2:3181:4181
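    One way to push zoo.cfg out to the other nodes, assuming root ssh access between the hosts as used elsewhere in this guide:
    # scp /etc/zookeeper/conf/zoo.cfg gp-data-1:/etc/zookeeper/conf/
    # scp /etc/zookeeper/conf/zoo.cfg gp-data-2:/etc/zookeeper/conf/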

    4. Create the myid file on all nodes and change its owner
    【gp-mdw】# echo 1 > /var/lib/zookeeper/myid
    【gp-mdw】# chown zookeeper:zookeeper /var/lib/zookeeper/myid
    ssh gp-data-1 "echo 2 > /var/lib/zookeeper/myid; chown zookeeper:zookeeper /var/lib/zookeeper/myid"
    ssh gp-data-2 "echo 3 > /var/lib/zookeeper/myid; chown zookeeper:zookeeper /var/lib/zookeeper/myid"
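    An optional loop to confirm that each node got the right id and owner (assuming passwordless ssh):
    # for h in gp-mdw gp-data-1 gp-data-2; do ssh $h 'hostname; ls -l /var/lib/zookeeper/myid; cat /var/lib/zookeeper/myid'; done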

    5. Start Zookeeper on all nodes
    【gp-data-2】# /usr/lib/zookeeper/bin/zkServer.sh start
    【gp-data-1】# /usr/lib/zookeeper/bin/zkServer.sh start
    【gp-mdw】 # /usr/lib/zookeeper/bin/zkServer.sh start

    Check the status on all nodes; all three started successfully
    # /usr/lib/zookeeper/bin/zkServer.sh status
    JMX enabled by default
    Using config: /usr/lib/zookeeper/bin/../conf/zoo.cfg
    Mode: follower
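    Optionally, the ensemble can also be queried directly with the bundled CLI:
    # /usr/lib/zookeeper/bin/zkCli.sh -server gp-mdw:2181 ls /
    A fresh ensemble should list only the built-in /zookeeper znode.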

    -------------------------------------------------------------------------------

    3.2 HDFS
    1. Install the packages required by HDFS on all nodes; since there are only three nodes, the DataNode is installed on all three
    yum -y install hadoop hadoop-hdfs hadoop-client hadoop-doc hadoop-debuginfo hadoop-hdfs-datanode
    2. Install the NameNode and SecondaryNameNode on one node
    yum -y install hadoop-hdfs-namenode hadoop-hdfs-secondarynamenode

    3. Create the data directories and change their owner and permissions
    Create the DataNode directory on all nodes (a loop that does this on every node at once is sketched after this step)
    mkdir -p /data0/dfs/dn
    chown -R hdfs:hadoop /data0/dfs/dn
    chmod 700 /data0/dfs/dn

    On the NameNode/SecondaryNameNode node, create the data directories
    mkdir -p /data0/dfs/nn
    chown -R hdfs:hadoop /data0/dfs/nn
    chmod 700 /data0/dfs/nn
    mkdir -p /data0/dfs/snn
    chown -R hdfs:hadoop /data0/dfs/snn
    chmod 700 /data0/dfs/snn
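    Since the DataNode directory has to exist on every node, a loop like the following (a sketch, assuming root ssh to all hosts) avoids repeating the three commands by hand:
    # for h in gp-mdw gp-data-1 gp-data-2; do ssh $h 'mkdir -p /data0/dfs/dn && chown -R hdfs:hadoop /data0/dfs/dn && chmod 700 /data0/dfs/dn'; done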

    4. Edit the configuration files
    # vi /etc/hadoop/conf/core-site.xml
    <configuration>
    <property>
    <name>fs.defaultFS</name>
    <value>hdfs://gp-mdw:8020</value>
    </property>
    <property>
    <name>fs.trash.interval</name>
    <value>1</value>
    </property>
    <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.DeflateCodec,org.apache.hadoop.io.compress.SnappyCodec,org.apache.hadoop.io.compress.Lz4Codec</value>
    </property>
    </configuration>

    # vi /etc/hadoop/conf/hdfs-site.xml
    <configuration>
    <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///data0/dfs/nn</value>
    </property>
    <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///data0/dfs/dn</value>
    </property>
    <property>
    <name>dfs.namenode.servicerpc-address</name>
    <value>gp-mdw:8022</value>
    </property>
    <property>
    <name>dfs.https.address</name>
    <value>gp-mdw:9871</value>
    </property>
    <property>
    <name>dfs.secondary.http.address</name>
    <value>gp-mdw:50090</value>
    </property>
    <property>
    <name>dfs.https.port</name>
    <value>9871</value>
    </property>
    <property>
    <name>dfs.namenode.http-address</name>
    <value>gp-mdw:9870</value>
    </property>
    <property>
    <name>dfs.replication</name>
    <value>3</value>
    </property>
    <property>
    <name>dfs.blocksize</name>
    <value>134217728</value>
    </property>
    <property>
    <name>dfs.namenode.checkpoint.dir</name>
    <value>file:///data0/dfs/snn</value>
    </property>
    </configuration>

    5. Save the modified configuration files and sync them to all nodes
    scp /etc/hadoop/conf/core-site.xml gp-data-1:/etc/hadoop/conf
    scp /etc/hadoop/conf/core-site.xml gp-data-2:/etc/hadoop/conf
    scp /etc/hadoop/conf/hdfs-site.xml gp-data-1:/etc/hadoop/conf
    scp /etc/hadoop/conf/hdfs-site.xml gp-data-2:/etc/hadoop/conf

    6. Format the NameNode
    sudo -u hdfs hdfs namenode -format

    7. Start HDFS by running the following on the relevant nodes
    【gp-mdw】systemctl start hadoop-hdfs-namenode
    【gp-mdw】systemctl start hadoop-hdfs-secondarynamenode
    All nodes: systemctl start hadoop-hdfs-datanode
    【gp-mdw】systemctl status hadoop-hdfs-namenode
    【gp-mdw】systemctl status hadoop-hdfs-secondarynamenode
    All nodes: systemctl status hadoop-hdfs-datanode
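    Once the daemons are up, an optional overall health check from the NameNode:
    # sudo -u hdfs hdfs dfsadmin -report
    It should report three live DataNodes.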

    8. Create the /tmp temporary directory on HDFS and set its permissions, then verify with the hadoop command that it was created successfully (see the check below)
    sudo -u hdfs hadoop fs -mkdir /tmp
    sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
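    The check referred to above is simply a listing of the HDFS root:
    # sudo -u hdfs hadoop fs -ls /
    It should show a single /tmp entry with mode drwxrwxrwt.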

    9. Access the NameNode Web UI (dfs.namenode.http-address configured above; port 8020 is the RPC endpoint, not the Web UI)
    http://gp-mdw:9870

    --------------------------------------------------------------------

    3.3 Yarn
    1. Install the YARN packages: the ResourceManager and JobHistory Server on one node, and the NodeManager on all nodes
    【gp-mdw】 # yum -y install hadoop-yarn hadoop-yarn-resourcemanager hadoop-mapreduce-historyserver hadoop-yarn-proxyserver hadoop-mapreduce
    All nodes: # yum -y install hadoop-yarn hadoop-yarn-nodemanager hadoop-mapreduce

    2. Create directories and change their owner and permissions
    Create the local directories on all nodes
    mkdir -p /data0/yarn/nm
    chown yarn:hadoop /data0/yarn/nm
    mkdir -p /data0/yarn/container-logs
    chown yarn:hadoop /data0/yarn/container-logs

    Create the logs directory on HDFS
    sudo -u hdfs hdfs dfs -mkdir /tmp/logs
    sudo -u hdfs hdfs dfs -chown mapred:hadoop /tmp/logs
    sudo -u hdfs hdfs dfs -chmod 1777 /tmp/logs

    Create the /user/history directory on HDFS
    sudo -u hdfs hdfs dfs -mkdir -p /user
    sudo -u hdfs hdfs dfs -chmod 777 /user
    sudo -u hdfs hdfs dfs -mkdir -p /user/history
    sudo -u hdfs hdfs dfs -chown mapred:hadoop /user/history
    sudo -u hdfs hdfs dfs -chmod 1777 /user/history
    sudo -u hdfs hdfs dfs -mkdir -p /user/history/done
    sudo -u hdfs hdfs dfs -mkdir -p /user/history/done_intermediate
    sudo -u hdfs hdfs dfs -chown -R mapred:hadoop /user/history
    sudo -u hdfs hdfs dfs -chmod 771 /user/history/done
    sudo -u hdfs hdfs dfs -chmod 1777 /user/history/done_intermediate
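    An optional listing to confirm the owners and permission bits just set:
    # sudo -u hdfs hdfs dfs -ls /tmp /user
    # sudo -u hdfs hdfs dfs -ls -R /user/history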

    3. Edit the configuration files
    # vi /etc/hadoop/conf/yarn-site.xml
    <configuration>
    <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    </property>
    <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
    </property>
    <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>file:///data0/yarn/nm</value>
    </property>
    <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>file:///data0/yarn/container-logs</value>
    </property>
    <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/tmp/logs</value>
    </property>
    <property>
    <name>yarn.application.classpath</name>
    <value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*</value>
    </property>
    <property>
    <name>yarn.resourcemanager.address</name>
    <value>gp-mdw:8032</value>
    </property>
    <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>gp-mdw:8033</value>
    </property>
    <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>gp-mdw:8030</value>
    </property>
    <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>gp-mdw:8031</value>
    </property>
    <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>gp-mdw:8088</value>
    </property>
    <property>
    <name>yarn.resourcemanager.webapp.https.address</name>
    <value>gp-mdw:8090</value>
    </property>
    </configuration>

    # vi /etc/hadoop/conf/mapred-site.xml
    <configuration>
    <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    </property>
    <property>
    <name>mapreduce.jobhistory.address</name>
    <value>gp-mdw:10020</value>
    </property>
    <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>gp-mdw:19888</value>
    </property>
    <property>
    <name>mapreduce.jobhistory.webapp.https.address</name>
    <value>gp-mdw:19890</value>
    </property>
    <property>
    <name>mapreduce.jobhistory.admin.address</name>
    <value>gp-mdw:10033</value>
    </property>
    <property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/user</value>
    </property>
    </configuration>

    # vi /etc/hadoop/conf/core-site.xml (only the properties added for YARN are shown below)
    <property>
    <name>hadoop.proxyuser.mapred.groups</name>
    <value>*</value>
    </property>
    <property>
    <name>hadoop.proxyuser.mapred.hosts</name>
    <value>*</value>
    </property>

    4. Save the modified configuration files and sync them to all nodes
    scp /etc/hadoop/conf/core-site.xml gp-data-1:/etc/hadoop/conf
    scp /etc/hadoop/conf/core-site.xml gp-data-2:/etc/hadoop/conf
    scp /etc/hadoop/conf/yarn-site.xml gp-data-1:/etc/hadoop/conf
    scp /etc/hadoop/conf/yarn-site.xml gp-data-2:/etc/hadoop/conf
    scp /etc/hadoop/conf/mapred-site.xml gp-data-1:/etc/hadoop/conf
    scp /etc/hadoop/conf/mapred-site.xml gp-data-2:/etc/hadoop/conf

    5. Start the YARN services
    Start mapred-historyserver on the JobHistory Server node
    【gp-mdw】/etc/init.d/hadoop-mapreduce-historyserver start

    Start the ResourceManager on the RM node
    【gp-mdw】systemctl start hadoop-yarn-resourcemanager
    【gp-mdw】systemctl status hadoop-yarn-resourcemanager

    Start the NodeManager on the NM nodes
    【All nodes】systemctl start hadoop-yarn-nodemanager
    【All nodes】systemctl status hadoop-yarn-nodemanager
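    With the NodeManagers up, an optional check that all three have registered with the ResourceManager:
    # yarn node -list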

    6. Access the YARN Web UIs
    ResourceManager UI: http://gp-mdw:8088 (yarn.resourcemanager.webapp.address configured above)
    JobHistory UI: http://gp-mdw:19888 (mapreduce.jobhistory.webapp.address configured above)
    The Nodes page of the ResourceManager UI shows the online NodeManagers

    7. Run the MR example program
    The example is run as the root user, so first create root's home directory on HDFS
    sudo -u hdfs hdfs dfs -mkdir /user/root
    sudo -u hdfs hdfs dfs -chown root:root /user/root
    Run the MR example program; it completes successfully
    hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 5 5

    -----------------------------------------------------------------------------

    3.4 Spark
    1. Install the packages required by Spark
    【gp-mdw】yum -y install spark-core spark-master spark-worker spark-history-server spark-python

    2. Create the Spark directories on HDFS and change their owner and permissions
    sudo -u hdfs hadoop fs -mkdir /user/spark
    sudo -u hdfs hadoop fs -mkdir /user/spark/applicationHistory
    sudo -u hdfs hadoop fs -chown -R spark:spark /user/spark
    sudo -u hdfs hadoop fs -chmod 1777 /user/spark/applicationHistory
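    An optional listing to confirm the ownership and the sticky bit on the history directory:
    # sudo -u hdfs hadoop fs -ls /user/spark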

    3. Edit the configuration file /etc/spark/conf/spark-defaults.conf
    spark.eventLog.enabled=true
    spark.eventLog.dir=hdfs://gp-mdw:8020/user/spark/applicationHistory
    spark.yarn.historyServer.address=http://gp-mdw:18088

    4. Start the spark-history-server
    【gp-mdw】systemctl start spark-history-server
    【gp-mdw】systemctl status spark-history-server
    Access the Web UI at http://gp-mdw:18088 (spark.yarn.historyServer.address configured above)

    //5. Sync the modified configuration file to all nodes (left commented out here, since Spark is only installed on gp-mdw in this setup)
    //scp /etc/spark/conf/spark-defaults.conf gp-data-1:/etc/spark/conf
    //scp /etc/spark/conf/spark-defaults.conf gp-data-2:/etc/spark/conf

    6. Test Spark
    spark-submit --class org.apache.spark.examples.SparkPi --master local /usr/lib/spark/examples/jars/spark-examples_2.11-2.4.0-cdh6.3.2.jar
    2020-03-15 10:01:56 INFO DAGScheduler:57 - Job 0 finished: reduce at SparkPi.scala:38, took 1.052675 s
    Pi is roughly 3.143435717178586
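    The same example can also be submitted to YARN instead of local mode to exercise the whole stack (a sketch, assuming the YARN services from section 3.3 are running and the /user/root directory created earlier exists):
    spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client /usr/lib/spark/examples/jars/spark-examples_2.11-2.4.0-cdh6.3.2.jar 10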

    -----------------------------------------------------------------------------------

    CentOS 7 ships with Python 2, which cannot be removed because many system tools (yum, for example) depend on it, so Python 3.7 is installed alongside it.
    Before installing Python 3.7, a few build dependencies need to be installed, and the source downloaded:
    yum -y install zlib-devel bzip2-devel openssl-devel openssl-static ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel libffi-devel lzma gcc
    wget https://www.python.org/ftp/python/3.7.7/Python-3.7.7.tar.xz

    mkdir /usr/local/python3

    Then extract the tarball, enter the directory, and build and install Python 3
    tar -xvJf Python-3.7.7.tar.xz
    cd Python-3.7.7
    ./configure --prefix=/usr/local/python3 --enable-shared
    make && make install

    Finally, create the symlinks
    ln -s /usr/local/python3/bin/python3 /usr/bin/python3
    ln -s /usr/local/python3/bin/pip3 /usr/bin/pip3

    Test by typing python3 on the command line. Because Python was built with --enable-shared, register the shared library path first so the interpreter can find its libpython library:
    # echo "/usr/local/python3/lib" > /etc/ld.so.conf.d/python3-x86_64.conf
    # ldconfig -v
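    Optional quick checks that the new interpreter and pip work and that the ssl and sqlite3 modules were built against the devel packages installed above:
    # python3 -V
    # pip3 -V
    # python3 -c 'import ssl, sqlite3; print("modules ok")'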

    # vi /etc/profile
    export PYSPARK_PYTHON=/usr/bin/python3
    export PYSPARK_DRIVER_PYTHON=/usr/bin/python3
    # source /etc/profile
    # pyspark will now use Python 3.7

    ------------------------------------------------------------------------------------------

  • Original article: https://www.cnblogs.com/zsfishman/p/12500104.html