• [Big Data Study Notes] 4. ZooKeeper: A Handy Coordinator for Distributed Services


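    I originally meant to cover setting up a highly available distributed Hadoop environment in this part, but halfway through I realized that ZooKeeper needs an introduction first.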

    ZooKeeper concepts

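    ZooKeeper provides coordination services for distributed applications, and it is itself distributed: it runs on at least three machines, which together form a quorum, much like a theater troupe. The troupe has a director, the leader; everyone else is a follower. Every member keeps the same data in their head (ZooKeeper serves its data from memory) and stays in sync with the leader, so a client can connect to any one of the servers. Each member has an id number, myid. If the leader goes down, the remaining members elect a new leader so that the service keeps running.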

    1. Installing ZooKeeper

    Installing ZooKeeper is simple: unpack the archive and set the environment variables.

    [root@hadoop100 bin]# tar -zxvf /opt/software/zookeeper-3.4.10.tar.gz -C /opt/modules/

    Open /etc/profile and add the following environment variables:

    #JAVA_HOME
    export JAVA_HOME=/opt/modules/jdk1.8.0_121
    export PATH=$PATH:$JAVA_HOME/bin
    
    #HADOOP_HOME
    export HADOOP_HOME=/opt/modules/hadoop-2.7.3
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    
    #ZOOKEEPER
    export ZOOKEEPER_HOME=/opt/modules/zookeeper-3.4.10
    export PATH=$PATH:$ZOOKEEPER_HOME/bin

    Then run source /etc/profile to apply the changes. Remember to use the xsync and xcall helper scripts to push the change to the whole cluster.

    [root@hadoop100 bin]# xsync /etc/profile
    [root@hadoop100 bin]# xcall source /etc/profile
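A quick way to confirm that the export lines took effect is to check whether the ZooKeeper bin directory really ended up on PATH. The sketch below is only an illustration; the helper name path_contains is made up for this example and is not part of ZooKeeper.

```shell
# path_contains DIR: succeed if DIR is one of the colon-separated entries in $PATH.
# (Hypothetical helper for this article, not a ZooKeeper command.)
path_contains() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example, using the install path from this article:
#   path_contains /opt/modules/zookeeper-3.4.10/bin || echo "ZooKeeper bin is not on PATH"
```

A typo in the export line, such as a missing $ in front of a variable name, leaves a literal string on PATH, and this check then fails for the real directory.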

    2. Configuring ZooKeeper

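    1. ZooKeeper needs a data directory to store the snapshots of its in-memory database and its logs. After creating it, edit zoo.cfg. The unpacked distribution ships with /opt/modules/zookeeper-3.4.10/conf/zoo_sample.cfg; copy or rename it to zoo.cfg and point dataDir at the data directory.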

    # The number of milliseconds of each tick
    tickTime=2000
    # The number of ticks that the initial
    # synchronization phase can take
    initLimit=10
    # The number of ticks that can pass between
    # sending a request and getting an acknowledgement
    syncLimit=5
    # the directory where the snapshot is stored.
    # do not use /tmp for storage, /tmp here is just
    # example sakes.
    dataDir=/opt/modules/zookeeper-3.4.10/zkData
    # the port at which the clients will connect
    clientPort=2181
    # the maximum number of client connections.
    # increase this if you need to handle more clients
    #maxClientCnxns=60
    #
    # Be sure to read the maintenance section of the
    # administrator guide before turning on autopurge.
    #
    # http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
    #
    # The number of snapshots to retain in dataDir
    #autopurge.snapRetainCount=3
    # Purge task interval in hours
    # Set to "0" to disable auto purge feature
    #autopurge.purgeInterval=1
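The copy-and-edit step described above can be scripted. This is a minimal sketch, assuming the directory layout used in this article (a conf directory under the ZooKeeper home and a data directory named zkData); the function name setup_zoo_cfg is hypothetical.

```shell
# setup_zoo_cfg ZK_HOME: create the data directory and generate conf/zoo.cfg
# from conf/zoo_sample.cfg with dataDir pointing at ZK_HOME/zkData.
# Assumption: ZK_HOME contains no "|" or "&" characters (they would confuse sed).
setup_zoo_cfg() {
  local zk_home="$1"
  mkdir -p "$zk_home/zkData"
  # Copy the sample file and rewrite its dataDir line in one pass.
  sed "s|^dataDir=.*|dataDir=$zk_home/zkData|" \
    "$zk_home/conf/zoo_sample.cfg" > "$zk_home/conf/zoo.cfg"
}

# Example, using the install path from this article:
#   setup_zoo_cfg /opt/modules/zookeeper-3.4.10
```

Writing to a new file instead of editing in place keeps zoo_sample.cfg untouched, so the step can be repeated safely.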

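    A ZooKeeper ensemble should have an odd number of servers, with a minimum of 3. ZooKeeper stays available only while a majority of the servers is up, so an even-sized ensemble tolerates no more failures than the odd size just below it.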
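The reason for preferring an odd server count is plain majority arithmetic, which the following sketch works out. The function names are made up for this example; the only assumption is the default majority quorum.

```shell
# quorum_size N: smallest number of servers that forms a strict majority of N.
quorum_size() { echo $(( $1 / 2 + 1 )); }

# tolerated_failures N: how many of N servers may fail while a majority survives.
tolerated_failures() { echo $(( ($1 - 1) / 2 )); }

for n in 3 4 5 6; do
  echo "servers=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

For 3, 4, 5 and 6 servers the tolerated failures come out as 1, 1, 2 and 2, so a fourth or sixth server adds load without adding fault tolerance.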

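    Set the following in zoo.cfg and list every server in the ensemble:

    # Number of ticks a follower may take to connect to the leader;
    # a server that has not connected within this limit is considered dead
    initLimit = 5
    # Maximum number of ticks a follower may lag behind while syncing with the leader;
    # beyond this limit the follower is considered to have fallen behind
    syncLimit = 2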
    server.100=hadoop100:2888:3888
    server.101=hadoop101:2888:3888
    server.102=hadoop102:2888:3888
    server.103=hadoop103:2888:3888
    server.104=hadoop104:2888:3888

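    Each server entry has the following format:

    server.A=B:C:D
    A is a number that identifies the server;
    B is the server's IP address or hostname;
    C is the port the server uses to exchange data with the leader;
    D is the port the servers use to talk to each other during leader election, for example when the current leader goes down and a new one has to be chosen.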
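To make the four fields concrete, here is a small sketch that splits one server entry using shell parameter expansion. The function name and the output labels are made up for illustration, and the sketch does not handle optional suffixes such as :observer.

```shell
# parse_server_line LINE: split "server.A=B:C:D" into its four fields.
# (Hypothetical helper; ZooKeeper parses zoo.cfg itself.)
parse_server_line() {
  local line="$1" id rest host peer_port election_port
  case "$line" in
    server.*=*:*:*) ;;
    *) echo "not a server entry: $line" >&2; return 1 ;;
  esac
  id=${line#server.}      # drop the "server." prefix
  id=${id%%=*}            # keep what precedes "="       -> A
  rest=${line#*=}         # everything after "="         -> B:C:D
  host=${rest%%:*}        # up to the first ":"          -> B
  rest=${rest#*:}         # remaining                    -> C:D
  peer_port=${rest%%:*}   # port for talking to the leader -> C
  election_port=${rest#*:} # leader-election port         -> D
  echo "id=$id host=$host peer_port=$peer_port election_port=$election_port"
}

# Example:
#   parse_server_line "server.101=hadoop101:2888:3888"
```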

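    After making the changes, remember to distribute them to the cluster. ZooKeeper itself is distributed, so first copy the files to the other machines in the cluster.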

    [root@hadoop100 modules]# xsync zookeeper-3.4.10/

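    Then give each machine its own identity: create a myid file in the data directory whose content is that server's number from the configuration above.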

    [root@hadoop100 zookeeper-3.4.10]# cd zkData/
    [root@hadoop100 zkData]# echo 100 > myid

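    On the other machines, set the content of myid to match that machine's number in the configuration file. This part is tedious: log in to each machine in turn and create the myid file in its data directory.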

    [root@hadoop100 zkData]# ssh hadoop101
    Last login: Thu Sep 19 14:10:35 2019 from gateway
    [root@hadoop101 ~]# echo 101 > /opt/modules/zookeeper-3.4.10/zkData/myid
    [root@hadoop101 ~]# exit
    logout
    Connection to hadoop101 closed.
    [root@hadoop100 zkData]# ssh hadoop102
    Last login: Tue Sep 17 13:26:48 2019 from hadoop100
    [root@hadoop102 ~]# echo 102 > /opt/modules/zookeeper-3.4.10/zkData/myid
    [root@hadoop102 ~]# exit
    logout
    Connection to hadoop102 closed.
    [root@hadoop100 zkData]# ssh hadoop103
    Last login: Tue Sep 17 13:17:00 2019 from hadoop100
    [root@hadoop103 ~]# echo 103 > /opt/modules/zookeeper-3.4.10/zkData/myid
    [root@hadoop103 ~]# exit
    logout
    Connection to hadoop103 closed.
    [root@hadoop100 zkData]# ssh hadoop104
    Last login: Tue Sep 17 11:04:38 2019 from hadoop100
    [root@hadoop104 ~]# echo 104 > /opt/modules/zookeeper-3.4.10/zkData/myid
    [root@hadoop104 ~]# exit
    logout
    Connection to hadoop104 closed.
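The manual logins above can be replaced by a loop. This is only a sketch under two assumptions: passwordless ssh between the nodes (which the xsync and xcall scripts already rely on), and the naming scheme of this cluster, where the id is the trailing number of the hostname. Both function names are hypothetical.

```shell
# myid_for_host HOST: derive the id from the trailing digits of the hostname,
# for example hadoop101 -> 101. This naming rule is specific to this cluster.
myid_for_host() {
  local host="$1" id
  id=${host##*[!0-9]}   # strip everything up to the last non-digit
  [ -n "$id" ] || { echo "no trailing digits in: $host" >&2; return 1; }
  echo "$id"
}

# write_myids HOST...: write the matching myid file on every host over ssh.
write_myids() {
  local host id
  for host in "$@"; do
    id=$(myid_for_host "$host") || return 1
    ssh "$host" "echo $id > /opt/modules/zookeeper-3.4.10/zkData/myid"
  done
}

# Example:
#   write_myids hadoop100 hadoop101 hadoop102 hadoop103 hadoop104
```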

    Check that everything is correct:

    [root@hadoop100 bin]# xcall cat /opt/modules/zookeeper-3.4.10/zkData/myid
    ---------running at localhost--------
    100
    ---------running at hadoop101-------
    101
    ---------running at hadoop102-------
    102
    ---------running at hadoop103-------
    103
    ---------running at hadoop104-------
    104
    [root@hadoop100 bin]#
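Besides eyeballing the output, each machine can check that its myid actually appears as a server entry in zoo.cfg. The helper below is a hypothetical sketch, not a ZooKeeper tool.

```shell
# myid_in_config CFG MYID_FILE: succeed if the number stored in MYID_FILE
# appears as a "server.<id>=" entry in CFG.
myid_in_config() {
  local cfg="$1" id
  id=$(tr -d '[:space:]' < "$2")   # myid holds just the number; drop whitespace
  grep -q "^server\.$id=" "$cfg"
}

# Example, using the paths from this article:
#   myid_in_config /opt/modules/zookeeper-3.4.10/conf/zoo.cfg \
#                  /opt/modules/zookeeper-3.4.10/zkData/myid || echo "myid mismatch"
```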

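    With the basic configuration done, it is time to start up; the ZooKeeper service has to be started on every machine in the ensemble. I used the xcall helper script introduced earlier. (I later found this approach unreliable: it reported that the servers had started when they had not.)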

    [root@hadoop100 zkData]# xcall /opt/modules/zookeeper-3.4.10/bin/zkServer.sh start
    ---------running at localhost--------
    ZooKeeper JMX enabled by default
    Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
    Starting zookeeper ... STARTED
    ---------running at hadoop101-------
    ZooKeeper JMX enabled by default
    Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
    Starting zookeeper ... STARTED
    ---------running at hadoop102-------
    ZooKeeper JMX enabled by default
    Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
    Starting zookeeper ... STARTED
    ---------running at hadoop103-------
    ZooKeeper JMX enabled by default
    Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
    Starting zookeeper ... STARTED
    ---------running at hadoop104-------
    ZooKeeper JMX enabled by default
    Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
    Starting zookeeper ... STARTED
    [root@hadoop100 zkData]#

    Troubleshooting: Error contacting service. It is probably not running.

    Checking the status shows that, oops, nothing is actually running:

    [root@hadoop100 bin]# xcall /opt/modules/zookeeper-3.4.10/bin/zkServer.sh status
    ---------running at localhost--------
    ZooKeeper JMX enabled by default
    Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
    Error contacting service. It is probably not running.
    ---------running at hadoop101-------
    ZooKeeper JMX enabled by default
    Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
    Error contacting service. It is probably not running.
    ---------running at hadoop102-------
    ZooKeeper JMX enabled by default
    Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
    Error contacting service. It is probably not running.
    ---------running at hadoop103-------
    ZooKeeper JMX enabled by default
    Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
    Error contacting service. It is probably not running.
    ---------running at hadoop104-------
    ZooKeeper JMX enabled by default
    Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
    Error contacting service. It is probably not running.
    [root@hadoop100 bin]#

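    It turned out that logging in to each machine over ssh and starting the server there works; perhaps the xcall script is not always reliable. One tip: zkServer.sh start-foreground prints the full startup log to the console, which makes errors much easier to track down.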

    [root@hadoop101 ~]# /opt/modules/zookeeper-3.4.10/bin/zkServer.sh start-foreground
    ZooKeeper JMX enabled by default
    Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
    2019-09-19 14:52:29,093 [myid:] - INFO  [main:QuorumPeerConfig@134] - Reading configuration from: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
    2019-09-19 14:52:29,122 [myid:] - INFO  [main:QuorumPeer$QuorumServer@167] - Resolved hostname: hadoop104 to address: hadoop104/192.168.56.104
    2019-09-19 14:52:29,123 [myid:] - INFO  [main:QuorumPeer$QuorumServer@167] - Resolved hostname: hadoop103 to address: hadoop103/192.168.56.103
    2019-09-19 14:52:29,123 [myid:] - INFO  [main:QuorumPeer$QuorumServer@167] - Resolved hostname: hadoop102 to address: hadoop102/192.168.56.102
    2019-09-19 14:52:29,124 [myid:] - INFO  [main:QuorumPeer$QuorumServer@167] - Resolved hostname: hadoop101 to address: hadoop101/192.168.56.101
    2019-09-19 14:52:29,124 [myid:] - INFO  [main:QuorumPeer$QuorumServer@167] - Resolved hostname: hadoop100 to address: hadoop100/192.168.56.100
    2019-09-19 14:52:29,124 [myid:] - INFO  [main:QuorumPeerConfig@396] - Defaulting to majority quorums
    2019-09-19 14:52:29,134 [myid:101] - INFO  [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 3
    2019-09-19 14:52:29,135 [myid:101] - INFO  [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 0
    2019-09-19 14:52:29,135 [myid:101] - INFO  [main:DatadirCleanupManager@101] - Purge task is not scheduled.
    2019-09-19 14:52:29,150 [myid:101] - INFO  [main:QuorumPeerMain@127] - Starting quorum peer
    2019-09-19 14:52:29,171 [myid:101] - INFO  [main:NIOServerCnxnFactory@89] - binding to port 0.0.0.0/0.0.0.0:2181
    2019-09-19 14:52:29,172 [myid:101] - ERROR [main:QuorumPeerMain@89] - Unexpected exception, exiting abnormally
    java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:433)
        at sun.nio.ch.Net.bind(Net.java:425)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
        at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:90)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:130)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
    [root@hadoop101 ~]#
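The BindException in the log above says that the client port 2181 was already bound when this instance tried to start. One way to check for that before starting is to look at the listening sockets. The sketch below filters the output of ss -ltn; the function name is made up, and it assumes the usual ss column layout in which the fourth column is the local address and port.

```shell
# port_is_listening PORT: read `ss -ltn` output on stdin and succeed if some
# socket is listening on PORT. The port is the part after the last ":" of the
# fourth column (the local address).
port_is_listening() {
  awk -v port="$1" '
    $1 == "LISTEN" { n = split($4, parts, ":"); if (parts[n] == port) found = 1 }
    END { exit !found }
  '
}

# Example:
#   ss -ltn | port_is_listening 2181 && echo "port 2181 is already in use"
```

If the port is taken, jps shows whether an earlier QuorumPeerMain process is still running on that machine.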

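    In the run above, BindException: Address already in use means that port 2181 was already taken on hadoop101, typically by a ZooKeeper process that is already running. If jps lists QuorumPeerMain, the ZooKeeper server process is up.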

    [root@hadoop100 bin]# jps
    1885 QuorumPeerMain
    2029 Jps

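    After logging in to each server over SSH, starting it, and checking its status, I can see that in my cluster hadoop102 is currently the leader and the others are followers: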

    [root@hadoop100 bin]# /opt/modules/zookeeper-3.4.10/bin/zkServer.sh status
    ZooKeeper JMX enabled by default
    Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
    Mode: follower
    [root@hadoop100 bin]# ssh hadoop101
    Last login: Thu Sep 19 15:04:12 2019 from hadoop100
    [root@hadoop101 ~]# /opt/modules/zookeeper-3.4.10/bin/zkServer.sh status
    ZooKeeper JMX enabled by default
    Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
    Mode: follower
    [root@hadoop101 ~]# exit
    logout
    Connection to hadoop101 closed.
    [root@hadoop100 bin]# ssh hadoop102
    Last login: Thu Sep 19 15:04:48 2019 from hadoop100
    [root@hadoop102 ~]# /opt/modules/zookeeper-3.4.10/bin/zkServer.sh status
    ZooKeeper JMX enabled by default
    Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
    Mode: leader
    [root@hadoop102 ~]# exit
    logout
    Connection to hadoop102 closed.
    [root@hadoop100 bin]# ssh hadoop103
    Last login: Thu Sep 19 15:05:07 2019 from hadoop100
    [root@hadoop103 ~]# /opt/modules/zookeeper-3.4.10/bin/zkServer.sh status
    ZooKeeper JMX enabled by default
    Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
    Mode: follower
    [root@hadoop103 ~]# exit
    logout
    Connection to hadoop103 closed.
    [root@hadoop100 bin]# ssh hadoop104
    Last login: Thu Sep 19 15:05:51 2019 from hadoop100
    [root@hadoop104 ~]# /opt/modules/zookeeper-3.4.10/bin/zkServer.sh status
    ZooKeeper JMX enabled by default
    Using config: /opt/modules/zookeeper-3.4.10/bin/../conf/zoo.cfg
    Mode: follower
    [root@hadoop104 ~]# exit
    logout
    Connection to hadoop104 closed.
    [root@hadoop100 bin]#
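The same round of status checks can be collected into one summary. This is a sketch that assumes passwordless ssh and the install path used in this article; zk_mode and cluster_modes are hypothetical names.

```shell
# zk_mode: read `zkServer.sh status` output on stdin and print the value of
# its "Mode:" line (for example leader or follower); fail if there is none.
zk_mode() {
  awk '/^[ \t]*Mode: / { sub(/^[ \t]*Mode: /, ""); print; found = 1 }
       END { exit !found }'
}

# cluster_modes HOST...: print "host mode" for every host, or
# "host not running" when no Mode line comes back.
cluster_modes() {
  local host mode
  for host in "$@"; do
    mode=$(ssh "$host" /opt/modules/zookeeper-3.4.10/bin/zkServer.sh status 2>/dev/null | zk_mode) \
      || mode="not running"
    echo "$host $mode"
  done
}

# Example:
#   cluster_modes hadoop100 hadoop101 hadoop102 hadoop103 hadoop104
```

A healthy ensemble should report exactly one leader, with every other server listed as a follower.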

    That's it: my ZooKeeper cluster is now up and running.

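    Aside

    Virtual machines are fine for study and experiments, but for serious work you will want to move to the cloud, for example Alibaba Cloud. If you need it, you can use my link below, which offers discounts and cashback.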

    https://promotion.aliyun.com/ntms/yunparter/invite.html?userCode=vltv9frd

  • Original article: https://www.cnblogs.com/junqilian/p/11550793.html