• ZooKeeper leader election: a code walkthrough


    This article analyzes the source code of ZooKeeper 3.4.6. The main classes involved are QuorumPeerMain, QuorumPeer, FastLeaderElection, and QuorumMaj.

    Assume three ZooKeeper servers a, b, and c, with server ids 1, 2, and 3:

    1. Server a starts first. It votes for itself as leader and sends that vote to itself;

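           QuorumPeerMain starts the QuorumPeer thread, which calls its startLeaderElection method; while the peer is in the LOOKING state, the thread ends up calling FastLeaderElection's lookForLeader method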

    2. Server b starts next. It votes for itself as leader and sends that vote to itself and to server a;

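    3. Server a receives b's vote and runs the comparison algorithm. On a fresh cluster the epochs and zxids are equal, so b's larger server id wins: a selects b as leader and notifies itself and server b;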

        protected boolean totalOrderPredicate(long newId, long newZxid, long newEpoch, long curId, long curZxid, long curEpoch) {
            LOG.debug("id: " + newId + ", proposed id: " + curId + ", zxid: 0x" +
                    Long.toHexString(newZxid) + ", proposed zxid: 0x" + Long.toHexString(curZxid));
            if(self.getQuorumVerifier().getWeight(newId) == 0){
                return false;
            }
            
            /*
             * We return true if one of the following three cases hold:
             * 1- New epoch is higher
             * 2- New epoch is the same as current epoch, but new zxid is higher
             * 3- New epoch is the same as current epoch, new zxid is the same
             *  as current zxid, but server id is higher.
             */
            
            return ((newEpoch > curEpoch) || 
                    ((newEpoch == curEpoch) &&
                    ((newZxid > curZxid) || ((newZxid == curZxid) && (newId > curId)))));
        }
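
    The ordering can be tried out in isolation. The following is a minimal standalone sketch, not ZooKeeper code: VoteOrderDemo and beats are hypothetical names, the weight check is left out, and the inputs assume a fresh cluster where every server has epoch 0 and zxid 0.

```java
// Standalone sketch; VoteOrderDemo and beats(...) are hypothetical names, not ZooKeeper classes.
final class VoteOrderDemo {

    /** Same ordering as totalOrderPredicate without the weight check: epoch, then zxid, then server id. */
    static boolean beats(long newId, long newZxid, long newEpoch,
                         long curId, long curZxid, long curEpoch) {
        if (newEpoch != curEpoch) {
            return newEpoch > curEpoch;
        }
        if (newZxid != curZxid) {
            return newZxid > curZxid;
        }
        return newId > curId;
    }

    public static void main(String[] args) {
        // Assumed fresh cluster: a (id 1) and b (id 2) both have epoch 0 and zxid 0.
        System.out.println(beats(2, 0, 0, 1, 0, 0)); // b's vote compared with a's own proposal
        System.out.println(beats(1, 0, 0, 2, 0, 0)); // a's vote compared with b's own proposal
        // A higher zxid outranks a higher server id.
        System.out.println(beats(1, 5, 0, 2, 3, 0));
    }
}
```

    In the first call b's vote replaces a's proposal, which is step 3 above; in the second call b keeps its own proposal.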

    4. Server a runs the quorum check and confirms b as leader;

    5. Server b receives a's notification and, through the same quorum check, confirms itself as leader;

        protected boolean termPredicate(
                HashMap<Long, Vote> votes,
                Vote vote) {
    
            HashSet<Long> set = new HashSet<Long>();
    
            /*
             * First make the views consistent. Sometimes peers will have
             * different zxids for a server depending on timing.
             */
            for (Map.Entry<Long,Vote> entry : votes.entrySet()) {
                if (vote.equals(entry.getValue())){
                    set.add(entry.getKey());
                }
            }
    
            return self.getQuorumVerifier().containsQuorum(set);
        }
    
    
        // QuorumMaj: a set is a quorum only if it holds more than half of the voting members
        public boolean containsQuorum(HashSet<Long> set){
            return (set.size() > half);
        }
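
    A small standalone sketch of the majority check follows. QuorumDemo and hasQuorum are hypothetical names, and a vote is reduced to the id of the proposed leader, whereas the real Vote comparison also involves the zxid and the epochs. The sketch assumes half is the ensemble size divided by two.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Standalone sketch; QuorumDemo and hasQuorum(...) are hypothetical names, not ZooKeeper classes.
final class QuorumDemo {

    /**
     * votes maps a voter's server id to the leader id it voted for (a simplification of Vote).
     * Returns true when strictly more than half of the ensemble supports proposedLeader.
     */
    static boolean hasQuorum(Map<Long, Long> votes, long proposedLeader, int ensembleSize) {
        Set<Long> supporters = new HashSet<>();
        for (Map.Entry<Long, Long> entry : votes.entrySet()) {
            if (entry.getValue() == proposedLeader) {
                supporters.add(entry.getKey());
            }
        }
        int half = ensembleSize / 2; // assumption: half is computed as n / 2
        return supporters.size() > half;
    }

    public static void main(String[] args) {
        Map<Long, Long> votes = new HashMap<>();
        votes.put(2L, 2L);                               // b votes for itself
        System.out.println(hasQuorum(votes, 2L, 3));     // one vote out of three
        votes.put(1L, 2L);                               // a switches to b
        System.out.println(hasQuorum(votes, 2L, 3));     // two votes out of three
    }
}
```

    With three servers, half is 1, so two matching votes are enough. That is why a and b can finish the election in steps 4 and 5 before c has started.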
    

    6. When server c starts, it finds that the cluster has already elected server b as leader;
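
    A server that starts after the election has finished receives replies from peers that are already FOLLOWING or LEADING. As I read the FOLLOWING/LEADING branch of lookForLeader, the late joiner accepts a leader roughly when a majority of replies name the same leader and that leader has itself replied in the LEADING state. The sketch below is a simplified illustration of that idea only: JoinEnsembleDemo, Reply, and canJoin are hypothetical names, and epochs and zxids are ignored.

```java
import java.util.HashMap;
import java.util.Map;

// Standalone sketch; all names here are hypothetical and not part of ZooKeeper.
final class JoinEnsembleDemo {

    enum State { LOOKING, FOLLOWING, LEADING }

    /** One reply from an established server: the leader it follows and its own state. */
    static final class Reply {
        final long leader;
        final State state;

        Reply(long leader, State state) {
            this.leader = leader;
            this.state = state;
        }
    }

    /** True when a strict majority names the leader and the leader itself reports LEADING. */
    static boolean canJoin(Map<Long, Reply> replies, long leader, int ensembleSize) {
        int supporters = 0;
        for (Reply reply : replies.values()) {
            if (reply.leader == leader) {
                supporters++;
            }
        }
        Reply fromLeader = replies.get(leader);
        boolean leaderConfirmed = fromLeader != null && fromLeader.state == State.LEADING;
        return supporters > ensembleSize / 2 && leaderConfirmed;
    }

    public static void main(String[] args) {
        Map<Long, Reply> replies = new HashMap<>();
        replies.put(1L, new Reply(2L, State.FOLLOWING));
        System.out.println(canJoin(replies, 2L, 3)); // false: only a has replied
        replies.put(2L, new Reply(2L, State.LEADING));
        System.out.println(canJoin(replies, 2L, 3)); // true: a and b agree, and b is LEADING
    }
}
```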

    7. The following method is the core of the election process:

        public Vote lookForLeader() throws InterruptedException {
            try {
                self.jmxLeaderElectionBean = new LeaderElectionBean();
                MBeanRegistry.getInstance().register(
                        self.jmxLeaderElectionBean, self.jmxLocalPeerBean);
            } catch (Exception e) {
                LOG.warn("Failed to register with JMX", e);
                self.jmxLeaderElectionBean = null;
            }
            if (self.start_fle == 0) {
               self.start_fle = System.currentTimeMillis();
            }
            try {
                HashMap<Long, Vote> recvset = new HashMap<Long, Vote>();
    
                HashMap<Long, Vote> outofelection = new HashMap<Long, Vote>();
    
                int notTimeout = finalizeWait;
    
                synchronized(this){
                    logicalclock++;
                    updateProposal(getInitId(), getInitLastLoggedZxid(), getPeerEpoch());
                }
    
                LOG.info("New election. My id =  " + self.getId() +
                        ", proposed zxid=0x" + Long.toHexString(proposedZxid));
                sendNotifications();
    
                /*
                 * Loop in which we exchange notifications until we find a leader
                 */
    
                while ((self.getPeerState() == ServerState.LOOKING) &&
                        (!stop)){
                    /*
                     * Remove next notification from queue, times out after 2 times
                     * the termination time
                     */
                    Notification n = recvqueue.poll(notTimeout,
                            TimeUnit.MILLISECONDS);
    
                    /*
                     * Sends more notifications if haven't received enough.
                     * Otherwise processes new notification.
                     */
                    if(n == null){
                        if(manager.haveDelivered()){
                            sendNotifications();
                        } else {
                            manager.connectAll();
                        }
    
                        /*
                         * Exponential backoff
                         */
                        int tmpTimeOut = notTimeout*2;
                        notTimeout = (tmpTimeOut < maxNotificationInterval?
                                tmpTimeOut : maxNotificationInterval);
                        LOG.info("Notification time out: " + notTimeout);
                    }
                    else if(self.getVotingView().containsKey(n.sid)) {
                        /*
                         * Only proceed if the vote comes from a replica in the
                         * voting view.
                         */
                        switch (n.state) {
                        case LOOKING:
                            LOG.debug("receive message: electionEpoch=" + n.electionEpoch + ", logicalclock=" + logicalclock);
                            // If notification > current, replace and send messages out
                            if (n.electionEpoch > logicalclock) {
                                logicalclock = n.electionEpoch;
                                recvset.clear();
                                if(totalOrderPredicate(n.leader, n.zxid, n.peerEpoch,
                                        getInitId(), getInitLastLoggedZxid(), getPeerEpoch())) {
                                    updateProposal(n.leader, n.zxid, n.peerEpoch);
                                } else {
                                    updateProposal(getInitId(),
                                            getInitLastLoggedZxid(),
                                            getPeerEpoch());
                                }
                                sendNotifications();
                            } else if (n.electionEpoch < logicalclock) {
                                if(LOG.isDebugEnabled()){
                                    LOG.debug("Notification election epoch is smaller than logicalclock. n.electionEpoch = 0x"
                                            + Long.toHexString(n.electionEpoch)
                                            + ", logicalclock=0x" + Long.toHexString(logicalclock));
                                }
                                break;
                            } else if (totalOrderPredicate(n.leader, n.zxid, n.peerEpoch,
                                    proposedLeader, proposedZxid, proposedEpoch)) {
                                updateProposal(n.leader, n.zxid, n.peerEpoch);
                                sendNotifications();
                            }
    
                            if(LOG.isDebugEnabled()){
                                LOG.debug("Adding vote: from=" + n.sid +
                                        ", proposed leader=" + n.leader +
                                        ", proposed zxid=0x" + Long.toHexString(n.zxid) +
                                        ", proposed election epoch=0x" + Long.toHexString(n.electionEpoch));
                            }
    
                            recvset.put(n.sid, new Vote(n.leader, n.zxid, n.electionEpoch, n.peerEpoch));
    
                            if (termPredicate(recvset,
                                    new Vote(proposedLeader, proposedZxid,
                                            logicalclock, proposedEpoch))) {
    
                                // Verify if there is any change in the proposed leader
                                while((n = recvqueue.poll(finalizeWait,
                                        TimeUnit.MILLISECONDS)) != null){
                                    if(totalOrderPredicate(n.leader, n.zxid, n.peerEpoch,
                                            proposedLeader, proposedZxid, proposedEpoch)){
                                        recvqueue.put(n);
                                        break;
                                    }
                                }
    
                                /*
                                 * This predicate is true once we don't read any new
                                 * relevant message from the reception queue
                                 */
                                if (n == null) {
                                    self.setPeerState((proposedLeader == self.getId()) ?
                                            ServerState.LEADING: learningState());
    
                                    Vote endVote = new Vote(proposedLeader,
                                                            proposedZxid,
                                                            logicalclock,
                                                            proposedEpoch);
                                    leaveInstance(endVote);
                                    return endVote;
                                }
                            }
                            break;
                        case OBSERVING:
                            LOG.debug("Notification from observer: " + n.sid);
                            break;
                        case FOLLOWING:
                        case LEADING:
                            /*
                             * Consider all notifications from the same epoch
                             * together.
                             */
                            if(n.electionEpoch == logicalclock){
                                recvset.put(n.sid, new Vote(n.leader,
                                                              n.zxid,
                                                              n.electionEpoch,
                                                              n.peerEpoch));
                               
                                if(ooePredicate(recvset, outofelection, n)) {
                                    self.setPeerState((n.leader == self.getId()) ?
                                            ServerState.LEADING: learningState());
    
                                    Vote endVote = new Vote(n.leader, 
                                            n.zxid, 
                                            n.electionEpoch, 
                                            n.peerEpoch);
                                    leaveInstance(endVote);
                                    return endVote;
                                }
                            }
    
                            /*
                             * Before joining an established ensemble, verify
                             * a majority is following the same leader.
                             */
                            outofelection.put(n.sid, new Vote(n.version,
                                                                n.leader,
                                                                n.zxid,
                                                                n.electionEpoch,
                                                                n.peerEpoch,
                                                                n.state));
               
                            if(ooePredicate(outofelection, outofelection, n)) {
                                synchronized(this){
                                    logicalclock = n.electionEpoch;
                                    self.setPeerState((n.leader == self.getId()) ?
                                            ServerState.LEADING: learningState());
                                }
                                Vote endVote = new Vote(n.leader,
                                                        n.zxid,
                                                        n.electionEpoch,
                                                        n.peerEpoch);
                                leaveInstance(endVote);
                                return endVote;
                            }
                            break;
                        default:
                            LOG.warn("Notification state unrecognized: {} (n.state), {} (n.sid)",
                                    n.state, n.sid);
                            break;
                        }
                    } else {
                        LOG.warn("Ignoring notification from non-cluster member " + n.sid);
                    }
                }
                return null;
            } finally {
                try {
                    if(self.jmxLeaderElectionBean != null){
                        MBeanRegistry.getInstance().unregister(
                                self.jmxLeaderElectionBean);
                    }
                } catch (Exception e) {
                    LOG.warn("Failed to unregister with JMX", e);
                }
                self.jmxLeaderElectionBean = null;
            }
        }
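
    One detail in the loop above is easy to miss: when no notification arrives within the timeout, the timeout is doubled, up to a cap. The sketch below isolates that backoff. BackoffDemo and nextTimeout are hypothetical names, and the starting value of 200 ms and the cap of 60000 ms are assumptions meant to mirror finalizeWait and maxNotificationInterval.

```java
// Standalone sketch; BackoffDemo and nextTimeout(...) are hypothetical names, not ZooKeeper classes.
final class BackoffDemo {

    /** Doubles the timeout, but never beyond max (same shape as the "Exponential backoff" step). */
    static int nextTimeout(int current, int max) {
        int doubled = current * 2;
        return doubled < max ? doubled : max;
    }

    public static void main(String[] args) {
        int timeout = 200;          // assumed initial value (finalizeWait)
        int max = 60000;            // assumed cap (maxNotificationInterval)
        for (int i = 0; i < 10; i++) {
            timeout = nextTimeout(timeout, max);
            System.out.println("Notification time out: " + timeout);
        }
    }
}
```

    Under these assumptions the wait grows from 400 ms to 51200 ms in eight steps and then stays at the 60000 ms cap, so an isolated server keeps retrying without flooding its peers.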
    

      

  • Original article: https://www.cnblogs.com/mantu/p/6166096.html