Kubernetes Advanced Scheduling


    1 Node taints, pod tolerations, and node affinity compared

      

      1.1 First, an introduction to node taints and pod taint tolerations

        Taints are a property of a node, and tolerations are a property of a pod; a pod can be scheduled onto a node only if the pod's tolerations cover that node's taints.

      1.2 Use cases of taints/tolerations versus node affinity

        Node taints repel pods: you add taints to existing nodes so that certain pods cannot be scheduled there. Node affinity, by contrast, is declared in the pod definition and states explicitly which nodes the pod may or may not be scheduled onto.

    2 Getting to know node taints and pod tolerations

      2.1 Viewing a cluster node's taints
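
        The node description below was presumably obtained with kubectl's describe subcommand (this post aliases kubectl as k); the exact command is an assumption, since the original omits it:

    k describe node master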

    Name:               master
    Roles:              master
    Labels:             beta.kubernetes.io/arch=amd64
                        beta.kubernetes.io/os=linux
                        kubernetes.io/arch=amd64
                        kubernetes.io/hostname=master
                        kubernetes.io/os=linux
                        node-role.kubernetes.io/master=
    Annotations:        flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"b6:6a:dc:5d:74:7e"}
                        flannel.alpha.coreos.com/backend-type: vxlan
                        flannel.alpha.coreos.com/kube-subnet-manager: true
                        flannel.alpha.coreos.com/public-ip: 172.16.70.6
                        kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                        node.alpha.kubernetes.io/ttl: 0
                        volumes.kubernetes.io/controller-managed-attach-detach: true
    CreationTimestamp:  Mon, 21 Dec 2020 11:40:57 +0800
    Taints:             node-role.kubernetes.io/master:NoSchedule
    • The master node carries a single node-role.kubernetes.io/master:NoSchedule taint
    • Pods without this toleration cannot be scheduled onto the node; normally only system-level pods land on this system node

      2.2 Displaying a pod's tolerations
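
        Likewise, the listing below is presumably from describing the kube-proxy pod (the pod name is taken from the output itself):

    k describe po kube-proxy-z6nwk -n kube-system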

    Name:                 kube-proxy-z6nwk
    Namespace:            kube-system
    
    ......
    
    QoS Class:       BestEffort
    Node-Selectors:  <none>
    Tolerations:     op=Exists
                     CriticalAddonsOnly op=Exists
                     node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                     node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                     node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                     node.kubernetes.io/not-ready:NoExecute op=Exists
                     node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                     node.kubernetes.io/unreachable:NoExecute op=Exists
                     node.kubernetes.io/unschedulable:NoSchedule op=Exists
    Events:          <none>
    • The Tolerations field above lists this system pod's taint tolerations

     

      2.3 Understanding taint effects

        Every taint on a node is associated with an effect:

      • NoSchedule: a pod that does not tolerate the taint cannot be scheduled onto a node that carries it
      • PreferNoSchedule: the scheduler tries to avoid placing the pod on the node, but if the pod has nowhere else to go it may still be scheduled there
      • NoExecute: also affects pods already running on the node; pods without a matching toleration are evicted even if they were already there
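
        As a hedged illustration of NoExecute (this command is not in the original post): tainting a node with that effect immediately evicts any running pods that do not tolerate it.

    k taint node node01 node-type=product:NoExecute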

      2.4 Adding and removing node taints

    ## Add a taint

    k taint node node01 node-type=product:NoSchedule

    ## Remove a taint

    k taint node node01 node-type:NoSchedule-
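
    ## Verify the result (a hedged example, not shown in the original)

    k describe node node01 | grep Taints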

      2.5 Creating a few pods to observe the effect
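
        The original does not show how test1 through test5 were created; presumably something along these lines (the busybox image is an assumption):

    k run test1 --image=busybox -- sleep 999999
    ## ...and likewise for test2 through test5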

    [root@node01 wxm]# k get po test{1,2,3,4,5} -o wide
    NAME    READY   STATUS    RESTARTS   AGE     IP             NODE     NOMINATED NODE   READINESS GATES
    test1   1/1     Running   0          2m41s   10.244.1.126   node02   <none>           <none>
    test2   1/1     Running   0          2m26s   10.244.1.127   node02   <none>           <none>
    test3   1/1     Running   0          2m21s   10.244.1.128   node02   <none>           <none>
    test4   1/1     Running   0          2m15s   10.244.1.129   node02   <none>           <none>
    test5   1/1     Running   0          2m10s   10.244.1.130   node02   <none>           <none>
    • All the pods were scheduled onto node02, because node01 carries the taint

      2.6 Adding a taint toleration to a pod

    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: prod
    spec:
      replicas: 5
      template:
        metadata:
          name: prod
          labels:
            app: prod
        spec:
          containers:
          - image: busybox
            command: ["sleep","999999"]
            name: busybox
          tolerations:
          - key: node-type
            operator: Equal
            value: product
            effect: NoSchedule
    • Adding this toleration allows the pods to be scheduled onto nodes carrying the taint, but it does not restrict them to those tainted nodes only
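
        If the intent were that these pods run only on the tainted nodes, the usual pattern is to also label those nodes and add a nodeSelector (or node affinity) to the pod spec; a minimal sketch, assuming node01 is additionally given a node-type=product label:

    nodeSelector:
      node-type: product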

      2.7 Observing the tolerations on the newly created pods

    Name:           prod-7c8c7f9b47-xcbkt
    Namespace:      default
         ........
    Tolerations:     node-type=product:NoSchedule
                     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                     node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
         ........
    • This pod has three tolerations. The first tolerates the node-type=product:NoSchedule taint, which lets the pod be scheduled onto nodes carrying it
    • The second and third cover the cases where the node becomes not-ready or unreachable. The 300s is the grace period before eviction: once the node is marked not-ready or unreachable, Kubernetes waits 300 seconds, and if the condition still holds, the pods are evicted and rescheduled onto other nodes
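
        The "for 300s" in the output corresponds to the tolerationSeconds field; a sketch of how such a toleration is written in a pod spec:

    tolerations:
    - key: node.kubernetes.io/unreachable
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300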

    3 Getting to know node affinity

      3.1 Labeling the two cluster nodes with an availability-zone label and a dedicated/shared share-type label

    k label node node01 availability-zone=zone1
    k label node node02 availability-zone=zone2
    k label node node01 share-type=dedicated
    k label node node02 share-type=shared
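
    ## Verify the labels (a hedged example; -L prints label columns)

    k get nodes -L availability-zone,share-type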

      

      3.2 Deploying a Deployment whose node-affinity preferences for zone1 and dedicated carry weights of 80 and 20

    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: prod
    spec:
      replicas: 5
      template:
        metadata:
          name: prod
          labels:
            app: prod
        spec:
          containers:
          - image: busybox
            command: ["sleep","999999"]
            name: busybox
          affinity:
            nodeAffinity:
              preferredDuringSchedulingIgnoredDuringExecution:
              - weight: 80
                preference:
                  matchExpressions:
                  - key: availability-zone
                    operator: In
                    values:
                    - zone1
              - weight: 20
                preference:
                  matchExpressions:
                  - key: share-type
                    operator: In
                    values:
                    - dedicated

      3.3 Observing where the pods were scheduled

    [root@node01 Chapter16]# k get po -o wide
    NAME                  READY   STATUS    RESTARTS   AGE   IP             NODE     NOMINATED NODE   READINESS GATES
    prod-5ffd8886-4646n   1/1     Running   0          52m   10.244.2.130   node01   <none>           <none>
    prod-5ffd8886-c8p4v   1/1     Running   0          52m   10.244.2.129   node01   <none>           <none>
    prod-5ffd8886-crhtp   1/1     Running   0          52m   10.244.1.156   node02   <none>           <none>
    prod-5ffd8886-hrn2b   1/1     Running   0          52m   10.244.2.131   node01   <none>           <none>
    prod-5ffd8886-x4qv6   1/1     Running   0          52m   10.244.2.132   node01   <none>           <none>
    • The deployment's pods mostly landed on node01, as expected: node01 carries the availability-zone=zone1 label, matching the higher-weighted preference
    • You might wonder why one pod still ended up on node02. The scheduler also applies a spreading rule that tries not to put all pods of a set on one node, so that if that node dies, other pods can still serve traffic

    4 Getting to know inter-pod affinity

      4.1 What is inter-pod affinity, and when is it useful?

        Inter-pod affinity lets you schedule pods onto a node where pods carrying a certain label are already running. For example, with a backend pod deployed on node01, we may want frontend pods deployed later to land on the same machine, which can greatly improve performance between them.

        

      4.2 An example: first, deploy a backend pod

    k run backend -l app=backend --image=busybox -- sleep 999999
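
    ## Confirm the backend pod's node and labels (a hedged example)

    k get po backend -o wide --show-labels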

      4.3 Then deploy five frontend pods

    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: prod
    spec:
      replicas: 5
      template:
        metadata:
          name: prod
          labels:
            app: prod
        spec:
          containers:
          - image: busybox
            command: ["sleep","999999"]
            name: busybox
          affinity:
            podAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchLabels:
                    app: backend

      4.4 Verifying they landed on the same machine

    NAME                   READY   STATUS    RESTARTS   AGE    IP             NODE     NOMINATED NODE   READINESS GATES
    backend                1/1     Running   0          12m    10.244.1.157   node02   <none>           <none>
    prod-bdd66cb75-7bcmj   1/1     Running   0          118s   10.244.1.162   node02   <none>           <none>
    prod-bdd66cb75-r69q7   1/1     Running   0          118s   10.244.1.161   node02   <none>           <none>
    prod-bdd66cb75-r8fjn   1/1     Running   0          118s   10.244.1.159   node02   <none>           <none>
    prod-bdd66cb75-twphc   1/1     Running   0          118s   10.244.1.158   node02   <none>           <none>
    prod-bdd66cb75-vkrm8   1/1     Running   0          118s   10.244.1.160   node02   <none>           <none>

      As expected, the frontend pods were all scheduled onto the same node as the backend pod

    5 Expressing pod affinity as a preference, and pod anti-affinity

      

      5.1 Relaxing the earlier hard requirement to a preference: pods are preferentially scheduled onto matching nodes but may go elsewhere if the preference cannot be satisfied, as the example below shows

    [root@node01 Chapter16]# cat frontend-podaffinity-host-2.yaml
    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: prod
    spec:
      replicas: 5
      template:
        metadata:
          name: prod
          labels:
            app: prod
        spec:
          containers:
          - image: busybox
            command: ["sleep","999999"]
            name: busybox
          affinity:
            podAffinity:
              preferredDuringSchedulingIgnoredDuringExecution:
              - weight: 80
                podAffinityTerm:
                  topologyKey: kubernetes.io/hostname
                  labelSelector:
                    matchLabels:
                      app: backend

      5.2 Checking how the frontend pods were scheduled

    [root@node01 Chapter16]# k get po -o wide
    NAME                    READY   STATUS    RESTARTS   AGE   IP             NODE     NOMINATED NODE   READINESS GATES
    backend                 1/1     Running   0          37m   10.244.1.157   node02   <none>           <none>
    prod-7cd8bf84c4-862qw   1/1     Running   0          11s   10.244.1.165   node02   <none>           <none>
    prod-7cd8bf84c4-db6j5   1/1     Running   0          11s   10.244.1.164   node02   <none>           <none>
    prod-7cd8bf84c4-dzsq7   1/1     Running   0          11s   10.244.2.133   node01   <none>           <none>
    prod-7cd8bf84c4-fhds5   1/1     Running   0          11s   10.244.1.163   node02   <none>           <none>
    prod-7cd8bf84c4-mkps2   1/1     Running   0          11s   10.244.1.166   node02   <none>           <none>
    • Most pods were scheduled onto the same machine as backend
    • One pod still landed on node01; that flexibility is the benefit of not making the rule mandatory
    • In general, prefer the soft (preferred) form over the hard (required) form unless strict co-location is essential

      

      5.3 Using pod anti-affinity to spread pods across different nodes

    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: prod
    spec:
      replicas: 5
      template:
        metadata:
          name: prod
          labels:
            app: prod
        spec:
          containers:
          - image: busybox
            command: ["sleep","999999"]
            name: busybox
          affinity:
            podAntiAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
              - topologyKey: kubernetes.io/hostname
                labelSelector:
                  matchLabels:
                    app: prod
    • With anti-affinity, no two pods carrying the matching label (here the deployment's own app=prod label) can be scheduled onto the same node
    • The key field is podAntiAffinity

      5.4 Checking the scheduling result

    NAME                    READY   STATUS    RESTARTS   AGE     IP             NODE     NOMINATED NODE   READINESS GATES
    prod-76d8477ff8-5jvv5   1/1     Running   0          3m10s   10.244.2.141   node01   <none>           <none>
    prod-76d8477ff8-bb2t9   0/1     Pending   0          3m10s   <none>         <none>   <none>           <none>
    prod-76d8477ff8-gtj2t   0/1     Pending   0          3m10s   <none>         <none>   <none>           <none>
    prod-76d8477ff8-hwskt   1/1     Running   0          3m10s   10.244.1.175   node02   <none>           <none>
    prod-76d8477ff8-hzz4j   0/1     Pending   0          3m10s   <none>         <none>   <none>           <none>
    • The cluster has only two schedulable nodes, so only two pods could run; the remaining replicas stay Pending, as expected

      5.5 Inspecting one of the Pending pods

      

    QoS Class:       Burstable
    Node-Selectors:  <none>
    Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                     node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
    Events:
      Type     Reason            Age                  From               Message
      ----     ------            ----                 ----               -------
      Warning  FailedScheduling  29s (x5 over 4m54s)  default-scheduler  0/3 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 2 node(s) didn't match pod affinity/anti-affinity, 2 node(s) didn't satisfy existing pods anti-affinity rules
    • The scheduling-failure reason matches expectations: the master is tainted, and the two worker nodes already violate the anti-affinity rule
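
        If leaving replicas Pending is undesirable, the anti-affinity can be softened into a preference, mirroring section 5.1; a hedged sketch of the affinity stanza only:

    affinity:
      podAntiAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            topologyKey: kubernetes.io/hostname
            labelSelector:
              matchLabels:
                app: prod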
    • Original article: https://www.cnblogs.com/wxm-pythoncoder/p/14325014.html