• Hadoop 3.1.1 cluster setup


    Hadoop fully distributed cluster setup:

    1) Configuration files
      1.hadoop-env.sh
        export JAVA_HOME=/opt/module/jdk1.8.0_171
        export HDFS_NAMENODE_USER=root
        export HDFS_DATANODE_USER=root
        export HDFS_SECONDARYNAMENODE_USER=root
      2.core-site.xml
        <configuration>
          <property>
            <name>fs.defaultFS</name>
            <value>hdfs://master:9820</value>
          </property>
          <property>
            <name>hadoop.tmp.dir</name>
            <value>/opt/module/hadoop-3.1.1/tmp</value>
          </property>
        </configuration>
      3.hdfs-site.xml
        <configuration>
          <property>
            <name>dfs.replication</name>
            <value>2</value>
          </property>
          <property>
            <name>dfs.namenode.secondary.http-address</name>
            <value>slave1:9868</value>
          </property>
        </configuration>
      4.workers
        slave1
        slave2
        slave3
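
Every host needs the same configuration files. As a minimal sketch, assuming the files live in the same directory on every host and that passwordless SSH from master is already set up (neither is stated in these notes), the hypothetical helper `print_sync_cmds` below only prints one rsync command per host in `workers`, so the commands can be reviewed before anything is copied.

```shell
# print_sync_cmds WORKERS_FILE CONF_DIR
# Print (do not run) one rsync command per host listed in WORKERS_FILE.
# Blank lines and lines starting with '#' are skipped.
print_sync_cmds() {
  workers_file=$1
  conf_dir=$2
  while IFS= read -r host || [ -n "$host" ]; do
    case $host in
      ''|'#'*) continue ;;
    esac
    printf 'rsync -a %s/ %s:%s/\n' "$conf_dir" "$host" "$conf_dir"
  done < "$workers_file"
}
```

For example, `print_sync_cmds etc/hadoop/workers /opt/module/hadoop-3.1.1/etc/hadoop` lists the three commands for slave1, slave2, and slave3; piping that output to `sh` would run them.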
    2) Format the file system (on master)
      $ bin/hdfs namenode -format
    3) Start the cluster
      $ sbin/start-dfs.sh
    4) View the Hadoop cluster in the NameNode web UI:
      master:9870
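
Besides the web UI, `jps` on each host shows which daemons are running. With the configuration above one would expect NameNode on master, SecondaryNameNode on slave1, and DataNode on slave1 to slave3. The sketch below is one way to check this; the function name `missing_daemons` is hypothetical, and it assumes the usual `<pid> <ClassName>` layout of `jps` output.

```shell
# missing_daemons NAME...
# Read `jps` output ("<pid> <ClassName>" per line) on stdin and print each
# expected daemon name that does not appear. Prints nothing when all are up.
missing_daemons() {
  running=$(awk '{ print $2 }')
  for name in "$@"; do
    # -x matches whole lines, so "NameNode" does not match "SecondaryNameNode"
    printf '%s\n' "$running" | grep -qx "$name" || printf '%s\n' "$name"
  done
}
```

A possible use, assuming `jps` is on the PATH of the remote shell: `ssh slave1 jps | missing_daemons DataNode SecondaryNameNode`.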

    Hadoop HA setup
      1) Configuration files
        1.hadoop-env.sh
          export JAVA_HOME=/opt/module/jdk1.8.0_171
          export HDFS_NAMENODE_USER=root
          export HDFS_DATANODE_USER=root
          export HDFS_ZKFC_USER=root
          export HDFS_JOURNALNODE_USER=root
        2.core-site.xml
          <configuration>
            <property>
              <name>fs.defaultFS</name>
              <value>hdfs://mycluster</value>
            </property>
            <property>
              <name>hadoop.tmp.dir</name>
              <value>/opt/module/hadoop-3.1.1/tmp</value>
            </property>
            <property>
              <name>hadoop.http.staticuser.user</name>
              <value>root</value>
            </property>
            <property>
              <name>ha.zookeeper.quorum</name>
              <value>slave1:2181,slave2:2181,slave3:2181</value>
            </property>
          </configuration>
        3.hdfs-site.xml
          <configuration>
            <property>
              <name>dfs.replication</name>
              <value>2</value>
            </property>
            <property>
              <name>dfs.nameservices</name>
              <value>mycluster</value>
            </property>
            <property>
              <name>dfs.ha.namenodes.mycluster</name>
              <value>nn1,nn2</value>
            </property>
            <property>
              <name>dfs.namenode.rpc-address.mycluster.nn1</name>
              <value>master:8020</value>
            </property>
            <property>
              <name>dfs.namenode.rpc-address.mycluster.nn2</name>
              <value>slave1:8020</value>
            </property>
            <property>
              <name>dfs.namenode.http-address.mycluster.nn1</name>
              <value>master:9870</value>
            </property>
            <property>
              <name>dfs.namenode.http-address.mycluster.nn2</name>
              <value>slave1:9870</value>
            </property>
            <property>
              <name>dfs.namenode.shared.edits.dir</name>
              <value>qjournal://master:8485;slave1:8485;slave2:8485/mycluster</value>
            </property>
            <property>
              <name>dfs.client.failover.proxy.provider.mycluster</name>
              <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
            </property>
            <property>
              <name>dfs.ha.fencing.methods</name>
              <value>sshfence</value>
            </property>
            <property>
              <name>dfs.ha.fencing.ssh.private-key-files</name>
              <value>/root/.ssh/id_rsa</value>
            </property>
            <property>
              <name>dfs.journalnode.edits.dir</name>
              <value>/opt/journal</value>
            </property>
            <property>
              <name>dfs.ha.automatic-failover.enabled</name>
              <value>true</value>
            </property>
          </configuration>
        4.workers
          slave1
          slave2
          slave3
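
In the HA configuration above, `fs.defaultFS` in core-site.xml points at the nameservice (`mycluster`) rather than at a host, so it has to agree with `dfs.nameservices` in hdfs-site.xml. The sketch below checks that. Both function names are hypothetical, and `get_prop` is not a real XML parser: it assumes `<name>` and `<value>` sit on separate lines, as in the listings above, and a single nameservice.

```shell
# get_prop FILE NAME
# Print the content of the first <value> line that follows <name>NAME</name>.
get_prop() {
  awk -v key="$2" '
    index($0, "<name>" key "</name>") { found = 1; next }
    found && /<value>/ {
      sub(/.*<value>/, "")
      sub(/<\/value>.*/, "")
      print
      exit
    }' "$1"
}

# nameservice_matches CORE_SITE HDFS_SITE
# Succeed when fs.defaultFS equals hdfs://<dfs.nameservices>.
nameservice_matches() {
  [ "$(get_prop "$1" fs.defaultFS)" = "hdfs://$(get_prop "$2" dfs.nameservices)" ]
}
```

On a host where Hadoop is installed, `hdfs getconf -confKey fs.defaultFS` reports the value Hadoop actually resolves and is the more reliable check.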
      2) ZooKeeper cluster setup
        zoo.cfg
        tickTime=2000
        dataDir=/opt/module/zookeeper-3.4.12/data
        clientPort=2181
        initLimit=5
        syncLimit=2
        server.1=slave1:2888:3888
        server.2=slave2:2888:3888
        server.3=slave3:2888:3888
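        The myid file under dataDir (/opt/module/zookeeper-3.4.12/data/myid) contains 1, 2, and 3 on slave1, slave2, and slave3 respectively, matching the server.N entries.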
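
The myid value on each host has to match the N of that host's `server.N` line. As a small sketch (the function name `myid_for` is hypothetical), the id can be read back from zoo.cfg instead of being typed by hand:

```shell
# myid_for HOST ZOO_CFG
# Print N from the "server.N=HOST:..." line of ZOO_CFG (nothing if absent).
myid_for() {
  sed -n "s/^server\.\([0-9][0-9]*\)=$1:.*/\1/p" "$2"
}
```

On each ZooKeeper node this could be used as `myid_for "$(hostname)" conf/zoo.cfg > /opt/module/zookeeper-3.4.12/data/myid`, assuming `hostname` prints the same name that appears in zoo.cfg.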
      3) Run on every ZooKeeper node: zkServer.sh start
        Check that it started successfully: zkServer.sh status
      4) Start the JournalNodes (on every JournalNode host)
        hdfs --daemon start journalnode
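      5) Synchronize the edit log
        For an existing cluster with a single NameNode:
          hdfs namenode -initializeSharedEdits (run on the NameNode that has already been formatted)
          hdfs --daemon start namenode
          hdfs namenode -bootstrapStandby (run on the NameNode that has not been formatted)
        For a new cluster:
          hdfs namenode -format
          hdfs --daemon start namenode
          hdfs namenode -bootstrapStandby (run on the NameNode that has not been formatted)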
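
The two cases differ only in the first command. The sketch below prints the sequence for either case without running anything. The function name `ha_bootstrap_plan` is hypothetical, and labelling the first NameNode `nn1` and the second `nn2` is an assumption based on the hdfs-site.xml above.

```shell
# ha_bootstrap_plan MODE
# MODE is "existing" (a single-NameNode cluster already formatted) or "new".
# Prints "<where>: <command>" lines in the order given above; runs nothing.
ha_bootstrap_plan() {
  case $1 in
    existing) first='hdfs namenode -initializeSharedEdits' ;;
    new)      first='hdfs namenode -format' ;;
    *) echo "usage: ha_bootstrap_plan existing|new" >&2; return 1 ;;
  esac
  printf 'nn1: %s\n' "$first"
  printf 'nn1: %s\n' 'hdfs --daemon start namenode'
  printf 'nn2: %s\n' 'hdfs namenode -bootstrapStandby'
}
```

In both cases the JournalNodes from step 4 need to be running before the first command, because the shared edits directory lives on them.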
      6) Initialize the HA state in ZooKeeper and start ZKFC
          $HADOOP_HOME/bin/hdfs zkfc -formatZK (run on one NameNode host only)
          $HADOOP_HOME/bin/hdfs --daemon start zkfc (run on both ZKFC hosts, i.e. both NameNode hosts)
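
Once both ZKFC daemons are up, automatic failover should leave exactly one NameNode active. A minimal sketch of that check, with the hypothetical helper `exactly_one_active` reading one state per line:

```shell
# exactly_one_active
# Read one NameNode state per line on stdin ("active" or "standby") and
# succeed only when exactly one line is "active".
exactly_one_active() {
  [ "$(grep -cx active)" = "1" ]
}
```

The states can come from `hdfs haadmin -getServiceState`, for example: `{ hdfs haadmin -getServiceState nn1; hdfs haadmin -getServiceState nn2; } | exactly_one_active && echo "HA looks healthy"`.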
      7) YARN setup
        yarn-env.sh
          export YARN_RESOURCEMANAGER_USER=root
          export YARN_NODEMANAGER_USER=root

        mapred-site.xml
          <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
          </property>
          <property>
            <name>yarn.app.mapreduce.am.env</name>
            <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
          </property>
          <property>
            <name>mapreduce.map.env</name>
            <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
          </property>
          <property>
            <name>mapreduce.reduce.env</name>
            <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
          </property>
        yarn-site.xml
          <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
          </property>
          <property>
            <name>yarn.resourcemanager.ha.enabled</name>
            <value>true</value>
          </property>
          <property>
            <name>yarn.resourcemanager.cluster-id</name>
            <value>cluster1</value>
          </property>
          <property>
            <name>yarn.resourcemanager.ha.rm-ids</name>
            <value>rm1,rm2</value>
          </property>
          <property>
            <name>yarn.resourcemanager.hostname.rm1</name>
            <value>slave2</value>
          </property>
          <property>
            <name>yarn.resourcemanager.hostname.rm2</name>
            <value>slave3</value>
          </property>
          <property>
            <name>yarn.resourcemanager.webapp.address.rm1</name>
            <value>slave2:8088</value>
          </property>
          <property>
            <name>yarn.resourcemanager.webapp.address.rm2</name>
            <value>slave3:8088</value>
          </property>
          <property>
            <name>yarn.resourcemanager.zk-address</name>
            <value>slave1:2181,slave2:2181,slave3:2181</value>
          </property>
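
The `HADOOP_MAPRED_HOME` entries in mapred-site.xml are there so that MapReduce tasks can find their classes, and a small example job is a quick way to confirm this. The sketch below only prints the command. It assumes the ResourceManagers and NodeManagers are already running (for example via sbin/start-yarn.sh, which these notes do not show) and that the examples jar sits at its usual place in the binary distribution. Both function names are hypothetical.

```shell
# examples_jar VERSION
# Print the path of the bundled MapReduce examples jar under $HADOOP_HOME.
examples_jar() {
  printf '%s/share/hadoop/mapreduce/hadoop-mapreduce-examples-%s.jar\n' \
    "${HADOOP_HOME:?HADOOP_HOME is not set}" "$1"
}

# pi_smoke_cmd VERSION
# Print (do not run) a small "pi" job: 2 map tasks, 10 samples per map.
pi_smoke_cmd() {
  printf 'hadoop jar %s pi 2 10\n' "$(examples_jar "$1")"
}
```

Running the output of `pi_smoke_cmd 3.1.1` on a cluster node submits the job; `yarn rmadmin -getServiceState rm1` (or `rm2`) shows which ResourceManager is currently active.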

  • Original article: https://www.cnblogs.com/jqbai/p/10989967.html