• Leader election in the k8s Controller-Manager and Scheduler


    The k8s control plane consists of the apiserver, controller-manager, scheduler and etcd, and when you build a highly available cluster the question of leader election comes up for some of these components. etcd stores all of the cluster's state information; it handles reads, writes and replication between etcd members and must keep that data strictly consistent, so it uses the relatively complex Raft algorithm to elect the node through which writes are committed. The apiserver, as the entry point of the cluster, is a stateless web server, so multiple apiserver instances can simply be load balanced and need no election at all. Controller-Manager and Scheduler, on the other hand, are task-style components: the controllers built into controller-manager watch the apiserver for the latest change events on each resource object and reconcile actual state toward desired state, while the scheduler watches pods that have not yet been bound to a node and picks a node for them. Having several replicas run these tasks at the same time is clearly unnecessary, so controller-manager and scheduler also need leader election. Their election logic, however, is different from etcd's: it only has to ensure that a single active instance is chosen from among the controller-manager and scheduler replicas, with no need for data consistency or synchronization between them.

    The kube-scheduler flags that describe leader election:

    / # kube-scheduler -h 2>&1 | grep -i leader
          --leader-elect                                                      Start a leader election client and gain leadership before executing the main loop. Enable this when running replicated components for high availability. (default true)
          --leader-elect-lease-duration duration                              The duration that non-leader candidates will wait after observing a leadership renewal until attempting to acquire leadership of a led but unrenewed leader slot. This is effectively the maximum duration that a leader can be stopped before it is replaced by another candidate. This is only applicable if leader election is enabled. (default 15s)
          --leader-elect-renew-deadline duration                              The interval between attempts by the acting master to renew a leadership slot before it stops leading. This must be less than or equal to the lease duration. This is only applicable if leader election is enabled. (default 10s)
          --leader-elect-resource-lock endpoints                              The type of resource object that is used for locking during leader election. Supported options are endpoints (default) and `configmaps`. (default "endpoints")
          --leader-elect-retry-period duration                                The duration the clients should wait between attempting acquisition and renewal of a leadership. This is only applicable if leader election is enabled. (default 2s)
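
    These flags ultimately feed the timing fields of client-go's leaderelection.LeaderElectionConfig. As a minimal sketch (the helper below is hypothetical, not the actual kube-scheduler wiring), the correspondence looks roughly like this:

    package sketch

    import (
        "time"

        "k8s.io/client-go/tools/leaderelection"
        rl "k8s.io/client-go/tools/leaderelection/resourcelock"
    )

    // newLeaderElectionConfig shows how the flags above map onto LeaderElectionConfig
    // fields; the durations are the defaults shown in the help output.
    func newLeaderElectionConfig(lock rl.Interface, callbacks leaderelection.LeaderCallbacks) leaderelection.LeaderElectionConfig {
        return leaderelection.LeaderElectionConfig{
            Lock:          lock,             // --leader-elect-resource-lock (endpoints or configmaps)
            LeaseDuration: 15 * time.Second, // --leader-elect-lease-duration
            RenewDeadline: 10 * time.Second, // --leader-elect-renew-deadline
            RetryPeriod:   2 * time.Second,  // --leader-elect-retry-period
            Callbacks:     callbacks,
        }
    }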

    The analysis below is based on the k8s 1.11 source code, where the lock resource is an Endpoints object.
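
    As a concrete sketch (assumed wiring, not copied from cmd/kube-scheduler), such an Endpoints lock can be built with client-go as follows. By default the scheduler locks the Endpoints object kube-scheduler in the kube-system namespace, and its identity is the hostname plus a uuid, as the source excerpts below will show:

    package sketch

    import (
        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/apimachinery/pkg/util/uuid"
        "k8s.io/client-go/kubernetes"
        rl "k8s.io/client-go/tools/leaderelection/resourcelock"
    )

    // newSchedulerLock builds an Endpoints-based resource lock the way the 1.11
    // kube-scheduler does by default: the lock object is kube-system/kube-scheduler
    // and the identity is hostname + "_" + uuid. The EventRecorder in LockConfig is
    // omitted here for brevity.
    func newSchedulerLock(client kubernetes.Interface, hostname string) rl.Interface {
        return &rl.EndpointsLock{
            EndpointsMeta: metav1.ObjectMeta{
                Namespace: "kube-system",
                Name:      "kube-scheduler",
            },
            Client: client.CoreV1(),
            LockConfig: rl.ResourceLockConfig{
                Identity: hostname + "_" + string(uuid.NewUUID()),
            },
        }
    }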

    1. When the scheduler starts it first takes part in leader election, and only then calls back into the scheduler's run function to enter the scheduling logic.

    // https://sourcegraph.com/github.com/kubernetes/kubernetes@release-1.11/-/blob/cmd/kube-scheduler/app/server.go
    
    func Run(c schedulerserverconfig.CompletedConfig, stopCh <-chan struct{}) error {
        ......
        // Prepare a reusable run function.
        run := func(stopCh <-chan struct{}) {
            sched.Run()
            <-stopCh
        }
    
        // If leader election is enabled, run via LeaderElector until done and exit.
        if c.LeaderElection != nil {
            c.LeaderElection.Callbacks = leaderelection.LeaderCallbacks{
                OnStartedLeading: run,
                OnStoppedLeading: func() {
                    utilruntime.HandleError(fmt.Errorf("lost master"))
                },
            }
            leaderElector, err := leaderelection.NewLeaderElector(*c.LeaderElection)
            if err != nil {
                return fmt.Errorf("couldn't create leader elector: %v", err)
            }
            leaderElector.Run()
            return fmt.Errorf("lost lease")
        }
    ......
    }

    2. Run directly calls the acquire method to try to win leadership, then hands control to the callback and keeps renewing the lease.

    // Run starts the leader election loop
    func (le *LeaderElector) Run() {
        defer func() {
            runtime.HandleCrash()
            le.config.Callbacks.OnStoppedLeading()
        }()
        le.acquire()                                  // block until this instance becomes the leader
        stop := make(chan struct{})
        go le.config.Callbacks.OnStartedLeading(stop) // start the real work, e.g. sched.Run
        le.renew()                                    // block for as long as lease renewal keeps succeeding
        close(stop)                                   // leadership lost: stop the work started above
    }

    3. The acquire method loops at the interval given by leader-elect-retry-period, calling tryAcquireOrRenew on each iteration. Here le.config.Lock is of type EndpointsLock: EndpointsLock.Identity() returns this instance's own identity (its hostname, suffixed with a uuid), and EndpointsLock.Get asks the apiserver for the election record that is ultimately stored in etcd. tryAcquireOrRenew behaves as follows:
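
    For reference, acquire and renew are both thin polling loops built on the wait helpers from apimachinery. The sketch below paraphrases their shape rather than copying the 1.11 source verbatim; logging and event recording are left out:

    package sketch

    import (
        "time"

        "k8s.io/apimachinery/pkg/util/wait"
    )

    // acquireLoop calls try every retryPeriod (with some jitter) until it returns
    // true, i.e. until this instance has become the leader.
    func acquireLoop(try func() bool, retryPeriod time.Duration) {
        stop := make(chan struct{})
        wait.JitterUntil(func() {
            if try() {
                close(stop) // lease acquired, leave the loop
            }
        }, retryPeriod, 1.2, true, stop) // 1.2 is the jitter factor used by the leaderelection package
    }

    // renewLoop keeps renewing the lease; if no renewal succeeds within
    // renewDeadline, the instance stops leading and the loop exits.
    func renewLoop(try func() bool, retryPeriod, renewDeadline time.Duration) {
        stop := make(chan struct{})
        wait.Until(func() {
            if err := wait.Poll(retryPeriod, renewDeadline, func() (bool, error) {
                return try(), nil
            }); err != nil {
                close(stop) // could not renew within the deadline, leave the loop
            }
        }, retryPeriod, stop)
    }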

    If fetching the ep election record from the apiserver fails, try to become the leader ourselves.

    Judged by our own observed time: if the lease (15s by default) has not yet expired and we are not the leader, we may not preempt the current leader, so there is nothing more to do in this round.

    If we ourselves are the current leader, then regardless of whether the lease has expired we renew it with the current time; the acquire time acquireTime and the leader transition count stay as they are. Otherwise the transition count is incremented by one.

    Finally an update of the ep election record is sent to the apiserver, which guarantees that updates from multiple clients are atomic: by comparing the resourceVersion (backed by etcd's modified index), only one client can succeed, and the rest get a 409 Conflict.

    The Lock is initialized as an EndpointsLock:
    type EndpointsLock struct {
        // EndpointsMeta should contain a Name and a Namespace of an
        // Endpoints object that the LeaderElector will attempt to lead.
        EndpointsMeta metav1.ObjectMeta
        Client        corev1client.EndpointsGetter
        LockConfig    ResourceLockConfig
        e             *v1.Endpoints
    }
    
    // Get returns the election record from a Endpoints Annotation
    func (el *EndpointsLock) Get() (*LeaderElectionRecord, error) {
        var record LeaderElectionRecord
        var err error
        el.e, err = el.Client.Endpoints(el.EndpointsMeta.Namespace).Get(el.EndpointsMeta.Name, metav1.GetOptions{})
        if err != nil {
            return nil, err
        }
        if recordBytes, found := el.e.Annotations[LeaderElectionRecordAnnotationKey]; found {
            if err := json.Unmarshal([]byte(recordBytes), &record); err != nil {
                return nil, err
            }
        }
        return &record, nil
    }
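
    Concretely, the election record lives as JSON in the control-plane.alpha.kubernetes.io/leader annotation (LeaderElectionRecordAnnotationKey) of the Endpoints object. A small sketch of decoding such an annotation value; the sample values below are made up:

    package main

    import (
        "encoding/json"
        "fmt"

        rl "k8s.io/client-go/tools/leaderelection/resourcelock"
    )

    func main() {
        // Made-up example of the annotation value written by the current leader.
        annotation := `{"holderIdentity":"master1_8a2e6d3a-1111-2222-3333-444444444444",` +
            `"leaseDurationSeconds":15,"acquireTime":"2020-09-11T08:00:00Z",` +
            `"renewTime":"2020-09-11T08:10:02Z","leaderTransitions":3}`

        var record rl.LeaderElectionRecord
        if err := json.Unmarshal([]byte(annotation), &record); err != nil {
            panic(err)
        }
        fmt.Printf("leader: %s, transitions: %d\n", record.HolderIdentity, record.LeaderTransitions)
    }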
    
    // If we are not the leader yet, try to acquire leadership; if we already are the leader, try to renew the lease.
    // tryAcquireOrRenew tries to acquire a leader lease if it is not already acquired,
    // else it tries to renew the lease if it has already been acquired. Returns true
    // on success else returns false.
    func (le *LeaderElector) tryAcquireOrRenew() bool {
        now := metav1.Now()
        // Identity() returns this instance's hostname + "_" + string(uuid.NewUUID()).
        // Prepare a LeaderElectionRecord that names ourselves as the holder, ready for
        // the case where we acquire successfully.
        leaderElectionRecord := rl.LeaderElectionRecord{
            HolderIdentity:       le.config.Lock.Identity(),
            LeaseDurationSeconds: int(le.config.LeaseDuration / time.Second),
            RenewTime:            now,
            AcquireTime:          now,
        }

        // 1. obtain or create the ElectionRecord
        oldLeaderElectionRecord, err := le.config.Lock.Get()
        // If fetching the ep from the apiserver fails, try to become the leader ourselves.
        if err != nil {
            le.observedRecord = leaderElectionRecord
            le.observedTime = le.clock.Now()
            return true
        }

        // 2. Record obtained, check the Identity & Time
        // The leader record in the apiserver differs from the one we last observed: update our copy.
        if !reflect.DeepEqual(le.observedRecord, *oldLeaderElectionRecord) {
            le.observedRecord = *oldLeaderElectionRecord
            le.observedTime = le.clock.Now()
        }

        // Judged by our own observed time: if the lease (15s) has not expired and we are
        // not the leader, there is nothing more we can do this round.
        if le.observedTime.Add(le.config.LeaseDuration).After(now.Time) &&
            oldLeaderElectionRecord.HolderIdentity != le.config.Lock.Identity() {
            return false
        }

        // 3. We're going to try to update. The leaderElectionRecord is set to it's default
        // here. Let's correct it before updating.
        // Reaching this point means one of: 1) we are not the leader but the lease has expired,
        // 2) we are the leader and the lease has not expired, 3) we are the leader and the lease
        // has expired. For cases 2 and 3 (we are the leader) renew with the current time and keep
        // acquireTime and the leader transition count; otherwise bump the transition count by one.
        if oldLeaderElectionRecord.HolderIdentity == le.config.Lock.Identity() {
            leaderElectionRecord.AcquireTime = oldLeaderElectionRecord.AcquireTime
            leaderElectionRecord.LeaderTransitions = oldLeaderElectionRecord.LeaderTransitions
        } else {
            leaderElectionRecord.LeaderTransitions = oldLeaderElectionRecord.LeaderTransitions + 1
        }

        // update the lock itself
        // Send the update to the apiserver, which guarantees atomicity of concurrent updates
        // via the resourceVersion mechanism: only one client can succeed.
        if err = le.config.Lock.Update(leaderElectionRecord); err != nil {
            glog.Errorf("Failed to update lock: %v", err)
            return false
        }
        le.observedRecord = leaderElectionRecord
        le.observedTime = le.clock.Now()
        return true
    }
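
    For completeness, here is a rough sketch of the Update side of EndpointsLock (paraphrased from the same 1.11 resourcelock package rather than quoted exactly). It writes the record back into the annotation of the very Endpoints object that Get fetched, so the object's resourceVersion travels with the request and a concurrent writer is rejected by the apiserver with 409 Conflict:

    // Paraphrased sketch of EndpointsLock.Update; error messages and details may differ
    // from the real source.
    func (el *EndpointsLock) Update(ler LeaderElectionRecord) error {
        if el.e == nil {
            return errors.New("endpoint not initialized, call get or create first")
        }
        recordBytes, err := json.Marshal(ler)
        if err != nil {
            return err
        }
        // Overwrite the annotation on the object returned by Get; its resourceVersion is
        // what lets the apiserver reject a stale concurrent update with 409 Conflict.
        el.e.Annotations[LeaderElectionRecordAnnotationKey] = string(recordBytes)
        el.e, err = el.Client.Endpoints(el.EndpointsMeta.Namespace).Update(el.e)
        return err
    }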
  • Original post: https://www.cnblogs.com/orchidzjl/p/13651608.html