

    Hadoop 2.4.0 + ZooKeeper 3.4.6 + HBase 0.98.3 distributed cluster setup

    IP             | Hostname | Installed software            | Running processes
    ---------------|----------|-------------------------------|-------------------------------------------------------------------
    192.168.137.11 | h1       | JDK, Hadoop, HBase            | NameNode, DFSZKFailoverController, HMaster
    192.168.137.12 | h2       | JDK, Hadoop, HBase            | NameNode, DFSZKFailoverController, HMaster
    192.168.137.13 | h3       | JDK, Hadoop                   | ResourceManager
    192.168.137.14 | h4       | JDK, Hadoop, ZooKeeper, HBase | DataNode, NodeManager, JournalNode, QuorumPeerMain, HRegionServer
    192.168.137.15 | h5       | JDK, Hadoop, ZooKeeper, HBase | DataNode, NodeManager, JournalNode, QuorumPeerMain, HRegionServer
    192.168.137.16 | h6       | JDK, Hadoop, ZooKeeper, HBase | DataNode, NodeManager, JournalNode, QuorumPeerMain, HRegionServer

    Preparation

    1. Change the Linux hostname

    vim /etc/sysconfig/network

    Add: HOSTNAME=h1

    2. Change the IP address

    vim /etc/sysconfig/network-scripts/ifcfg-eth0

    Set: IPADDR=192.168.137.11

    3. Map hostnames to IP addresses

    vim /etc/hosts

    Add: 192.168.137.11  h1

    4. Turn off the firewall

    service iptables stop

    5. Set up passwordless SSH

    ssh-keygen -t rsa    # generate the public and private key

    Copy the public key to the other machines (h2 is a hostname):

    ssh-copy-id -i h2

    6. Install the JDK and configure the environment variables (see the sketch after this list)

    This can be configured on one machine and then copied to the others:

    scp -r /home/jdk/ h2:/home/

    Once all of the above is done, it is a good idea to reboot the machines.
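
    Step 6 does not show the environment variables themselves, so here is a minimal sketch, assuming the JDK was copied to /home/jdk as in the scp command above (adjust the path if yours differs):

     # Append to /etc/profile (or ~/.bashrc) on every node
     export JAVA_HOME=/home/jdk
     export PATH=$PATH:$JAVA_HOME/bin

     # Apply and verify
     source /etc/profile
     java -version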

     
     //-----------------------------------------------------------------------------------
     //----------------------------ZooKeeper cluster installation------------------------------------
     //-----------------------------------------------------------------------------------

    Installing ZooKeeper

    Unpack it: tar -zxvf zookeeper-3.4.6.tar.gz

    1. Rename the configuration file conf/zoo_sample.cfg to zoo.cfg:

    mv zoo_sample.cfg zoo.cfg

    Open it and set the data directory (any path will do, but the directory must exist):

    dataDir=/home/gj/zookeeper-3.4.6/data

    Then append at the end:

    server.1=h4:2888:3888

    server.2=h5:2888:3888

    server.3=h6:2888:3888

    Format: server.X=A:B:C

    X is a number identifying which server this is,

    A is the IP address (or hostname) the server runs on,

    B is the port the server uses to exchange messages with the cluster leader, and

    C is the port used for leader election.

    Note that the data directory configured above has to be created by hand.

    Inside it, create a file named myid whose content is 1.

    The 1 says which server this is and corresponds to the X in server.X=A:B:C.

    2. Copy the configured ZooKeeper to the other machines (h5 and h6)

    using scp -r.

    Change the myid file contents to 2 and 3 respectively.

    3. On each of the three nodes, run ./zkServer.sh start from the bin directory.

    You can also add ZooKeeper to the environment variables (a sketch of both steps follows).
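
    A minimal sketch of the data directory, myid, and optional environment setup, assuming the /home/gj/zookeeper-3.4.6 install path used in zoo.cfg above:

     # On h4 (use "echo 2" on h5 and "echo 3" on h6)
     mkdir -p /home/gj/zookeeper-3.4.6/data
     echo 1 > /home/gj/zookeeper-3.4.6/data/myid

     # Optional: add ZooKeeper to /etc/profile so zkServer.sh is on the PATH
     export ZOOKEEPER_HOME=/home/gj/zookeeper-3.4.6
     export PATH=$PATH:$ZOOKEEPER_HOME/bin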

       //-----------------------------------------------------------------------------------
     //----------------------------ZooKeeper cluster installation------------------------------------
     //-----------------------------------------------------------------------------------


     //-----------------------------------------------------------------------------------
     //----------------------------Hadoop cluster installation---------------------------------------
     //-----------------------------------------------------------------------------------

    Installing Hadoop

    Edit the following files:

    1. hadoop-env.sh

    export JAVA_HOME=/usr/hadoop/jdk    # point Hadoop at the JDK

    2. core-site.xml

    <configuration>

     <!-- set the HDFS nameservice to ns1 -->

     <property>

       <name>fs.defaultFS</name>

       <value>hdfs://ns1</value>

     </property>

    <!-- directory where Hadoop stores its data -->

     <property>

       <name>hadoop.tmp.dir</name>

       <value>/root/hadoop/hadoop-2.4.0/tmp</value>

     </property>

    <!-- ZooKeeper quorum addresses -->

     <property>

       <name>ha.zookeeper.quorum</name>

       <value>h4:2181,h5:2181,h6:2181</value>

     </property>

    </configuration>

    3. hdfs-site.xml

    <configuration>

    <!-- the HDFS nameservice ns1; must match core-site.xml -->

    <property>

       <name>dfs.nameservices</name>

       <value>ns1</value>

    </property>

    <!-- ns1 has two NameNodes: nn1 and nn2 -->

    <property>

       <name>dfs.ha.namenodes.ns1</name>

       <value>nn1,nn2</value>

    </property>

    <!-- RPC address of nn1 -->

    <property>

       <name>dfs.namenode.rpc-address.ns1.nn1</name>

       <value>h1:9000</value>

    </property>

    <!-- HTTP address of nn1 -->

    <property>

            <name>dfs.namenode.http-address.ns1.nn1</name>

            <value>h1:50070</value>

    </property>

    <!-- RPC address of nn2 -->

    <property>

            <name>dfs.namenode.rpc-address.ns1.nn2</name>

            <value>h2:9000</value>

    </property>

    <!-- HTTP address of nn2 -->

    <property>

            <name>dfs.namenode.http-address.ns1.nn2</name>

            <value>h2:50070</value>

    </property>

    <!-- where the NameNodes' shared edit log is stored on the JournalNodes -->

    <property>

            <name>dfs.namenode.shared.edits.dir</name>

            <value>qjournal://h4:8485;h5:8485;h6:8485/ns1</value>

    </property>

    <!-- where each JournalNode stores its data on local disk -->

    <property>

            <name>dfs.journalnode.edits.dir</name>

            <value>/root/hadoop/hadoop-2.4.0/journal</value>

    </property>

    <!-- enable automatic NameNode failover -->

    <property>

            <name>dfs.ha.automatic-failover.enabled</name>

            <value>true</value>

    </property>

    <!-- failover proxy provider used by HDFS clients -->

    <property>

            <name>dfs.client.failover.proxy.provider.ns1</name>

           <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>

    </property>

    <!-- fencing method -->

    <property>

            <name>dfs.ha.fencing.methods</name>

            <value>sshfence</value>

    </property>

    <!-- sshfence requires passwordless SSH -->

    <property>

            <name>dfs.ha.fencing.ssh.private-key-files</name>

            <value>/root/.ssh/id_rsa</value>

    </property>

    </configuration>

    4. Rename mapred-site.xml.template to mapred-site.xml

    <configuration>

    <!-- run MapReduce on the YARN framework -->

    <property>

            <name>mapreduce.framework.name</name>

            <value>yarn</value>

    </property>

    </configuration>

    5. yarn-site.xml

    <configuration>

    <!-- ResourceManager host -->

            <property>

                    <name>yarn.resourcemanager.hostname</name>

                    <value>h3</value>

            </property>

    <!-- auxiliary service the NodeManager loads: the MapReduce shuffle -->

            <property>

                    <name>yarn.nodemanager.aux-services</name>

                    <value>mapreduce_shuffle</value>

            </property>

    </configuration>

    6. slaves

      h4

      h5

      h6

    Copy the Hadoop directory configured on one machine to the other machines (see the sketch below).
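
    A sketch of the copy, assuming the /root/hadoop/hadoop-2.4.0 path used in the configuration above:

     # Run on the machine that was configured (e.g. h1)
     for host in h2 h3 h4 h5 h6; do
         ssh $host "mkdir -p /root/hadoop"
         scp -r /root/hadoop/hadoop-2.4.0 $host:/root/hadoop/
     done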

    Starting Hadoop and ZooKeeper (both assumed to be on the PATH via the environment variables)

    1. First start ZooKeeper, on h4, h5 and h6:

      zkServer.sh start

     Check the status with zkServer.sh status (you should see one leader and two followers).

    2. Start the JournalNodes (run from h1):

       hadoop-daemons.sh start journalnode

    3. Format HDFS (run on h1):

    hadoop namenode -format

    This creates a tmp directory under the Hadoop directory; copy it to h2 so the second NameNode has the same metadata (see the sketch below).
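
    A sketch of that copy, using the paths from the configuration above:

     # Run on h1; nn2 on h2 needs the same NameNode metadata
     scp -r /root/hadoop/hadoop-2.4.0/tmp h2:/root/hadoop/hadoop-2.4.0/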

    4. Format ZooKeeper for the failover controller (run on h1):

    hdfs zkfc -formatZK

    5. Start Hadoop (run on h1):

    start-all.sh

    The ResourceManager on h3 may not start this way; if so, log in to h3 and run start-yarn.sh.

    You can now check the state of the Hadoop cluster through the web UIs, or list the running processes on each node with jps (see the sketch below).
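
    For example (50070 comes from the hdfs-site.xml above; 8088 is the default ResourceManager web port, which the guide does not set explicitly):

     # Web UIs
     #   Active/standby NameNodes: http://h1:50070  and  http://h2:50070
     #   ResourceManager:          http://h3:8088

     # List the Java processes on every node and compare with the table at the top
     for host in h1 h2 h3 h4 h5 h6; do
         echo "== $host =="
         ssh $host jps
     done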

      
     //-----------------------------------------------------------------------------------
     //----------------------------Hadoop cluster installation---------------------------------------
     //-----------------------------------------------------------------------------------

     
     //-----------------------------------------------------------------------------------
     //----------------------------HBase cluster installation----------------------------------------
     //-----------------------------------------------------------------------------------

    HBase cluster configuration

    1. conf/hbase-env.sh

    export JAVA_HOME=<path to the JDK>    # e.g. /usr/hadoop/jdk, as in hadoop-env.sh

    export HBASE_MANAGES_ZK=false

    When using an external ZooKeeper ensemble, HBASE_MANAGES_ZK must be set to false so that HBase does not start its own ZooKeeper instance.

    2. conf/hbase-site.xml

    <property>
     <name>hbase.rootdir</name>
     <value>hdfs://h1:9000/hbase</value>
    </property>
    <property>
     <name>hbase.cluster.distributed</name>
     <value>true</value>
    </property>
    <property>
    <name>hbase.master</name>
    <value>h1:60000</value>
    </property>
     <property>
     <name>hbase.master.port</name>
     <value>60000</value>
     <description>The port master should bind to.</description>
     </property>
     
     <property>
       <name>hbase.zookeeper.quorum</name>
       <value>h4,h5,h6</value>
     </property>

    3. conf/regionservers

    h4

    h5

    h6
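
    The guide does not show it explicitly, but the table at the top has HBase on h1, h2, h4, h5 and h6, so the configured HBase directory also has to be present on those nodes. A sketch, with the install path assumed (the guide does not state it):

     # Run on h1; adjust /root/hbase-0.98.3 to wherever HBase was unpacked
     for host in h2 h4 h5 h6; do
         scp -r /root/hbase-0.98.3 $host:/root/
     done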

    Starting HBase

    On h1:

    start-hbase.sh

    On h2:

    start-hbase.sh

    You can now check the HBase status through the web UI; just like the NameNodes, there will be one HMaster in Active state and one in Standby state (see the sketch below).
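
    A quick way to verify this, assuming the HBase 0.98 default master web port of 60010:

     # HMaster web UIs
     #   http://h1:60010
     #   http://h2:60010

     # From h1: 'status' lists the live region servers
     echo "status" | hbase shell

     # jps on h1/h2 should now show HMaster; h4-h6 should show HRegionServer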

    This completes the cluster setup.

     //-----------------------------------------------------------------------------------
     //----------------------------HBase cluster installation----------------------------------------
     //-----------------------------------------------------------------------------------

     //-----------------------------------------------------------------------------------
     //----------------------------Storm cluster installation----------------------------------------
     //-----------------------------------------------------------------------------------

    Storm cluster installation and configuration

    Machines:

    192.168.180.101

    192.168.187.16

    Software needed:

    ZooKeeper (zookeeper-3.4.4.tar.gz), Storm (storm-0.8.1.zip) and the JDK

    1. Configure ZooKeeper

    Unpack ZooKeeper and rename zoo_sample.cfg in the conf directory to zoo.cfg.

    After editing, its contents are:

    # The number of milliseconds of each tick
    tickTime=2000
    # The number of ticks that the initial 
    # synchronization phase can take
    initLimit=10
    # The number of ticks that can pass between 
    # sending a request and getting an acknowledgement
    syncLimit=5
    # the directory where the snapshot is stored.
    # do not use /tmp for storage, /tmp here is just 
    # example sakes.
    dataDir=/data/zookeeper/data
    dataLogDir=/data/zookeeper/log
    # the port at which the clients will connect
    clientPort=2181
    #
    # Be sure to read the maintenance section of the 
    # administrator guide before turning on autopurge.
    #
    # http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
    #
    # The number of snapshots to retain in dataDir
    #autopurge.snapRetainCount=3
    # Purge task interval in hours
    # Set to "0" to disable auto purge feature
    #autopurge.purgeInterval=1
    server.1=192.168.187.16:2888:3888
    server.2=192.168.180.101:2888:3888

    For the details of the configuration, see:
     http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_configuration

    Note the last two lines of the configuration.

    The format is server.id=host:port:port.

    The id must be a number from 1 to 255, and you also need to create a file named myid in the dataDir directory whose only line is the id.

    Every machine that is part of the ZooKeeper ensemble should know about every other machine in the ensemble. You accomplish this with the series of lines of the form server.id=host:port:port. The parameters host and port are straightforward. You attribute the server id to each machine by creating a file named myid, one for each server, which resides in that server's data directory, as specified by the configuration file parameter dataDir.

    You also need to add an environment variable:

    export ZOOKEEPER_HOME=/home/zhxia/apps/db/zookeeper

    The configuration on the two machines is identical except for the id in the myid file (see the sketch below).
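
    A minimal sketch, using the dataDir and dataLogDir values from the zoo.cfg above:

     # On 192.168.187.16 (server.1)
     mkdir -p /data/zookeeper/data /data/zookeeper/log
     echo 1 > /data/zookeeper/data/myid

     # On 192.168.180.101 (server.2)
     mkdir -p /data/zookeeper/data /data/zookeeper/log
     echo 2 > /data/zookeeper/data/myid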

    2. Configure Storm

    Unpack Storm.

    Go into the conf directory and edit storm.yaml:

    ########## These MUST be filled in for a storm configuration
     storm.zookeeper.servers:
         - "192.168.187.16"
         - "192.168.180.101"
    
     nimbus.host: "192.168.187.16"
    
     storm.local.dir: "/data/storm/data"
     ##### These may optionally be filled in:
    
    # List of custom serializations
    # topology.kryo.register:
    #     - org.mycompany.MyType
    #     - org.mycompany.MyType2: org.mycompany.MyType2Serializer
    #
    ## List of custom kryo decorators
    # topology.kryo.decorators:
    #     - org.mycompany.MyDecorator
    
    # Locations of the drpc servers
    # drpc.servers:
        # - "127.0.0.1"
         #- "server2"
    ## to nimbus 
    #nimbus.childopts: "-Xmx1024m" 
    #
    ## to supervisor 
    #supervisor.childopts: "-Xmx1024m" 
    #
    ## to worker 
    #worker.childopts: "-Xmx768m" 

    Once the configuration is done, start ZooKeeper and Storm.

    Start ZooKeeper:

    bin/zkServer.sh start

    Start Storm:

    bin/storm nimbus

    bin/storm supervisor

    bin/storm ui

    Open http://localhost:8080 in a browser to view the cluster status.
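
    These commands run in the foreground. As a sketch (not part of the original guide), they can be pushed to the background with nohup; nimbus and the UI run on the nimbus host (192.168.187.16 in the storm.yaml above), supervisor on every worker machine:

     # On the nimbus host
     nohup bin/storm nimbus     > nimbus.log     2>&1 &
     nohup bin/storm ui         > ui.log         2>&1 &

     # On each supervisor host
     nohup bin/storm supervisor > supervisor.log 2>&1 &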
