• 【Kubernetes】kube-dns keeps restarting


    Kubernetes deploys and starts up normally, but kube-dns keeps restarting.

    Run the following command:

    kubectl get pods --all-namespaces

    The output (a screenshot in the original post) shows that kube-dns-c7d85897f-jmntw keeps restarting.
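Since the screenshot is not reproduced here, spotting restarting pods can be sketched with a little awk over the tabular output. The sample below is hypothetical; the real pod names and counts come from your cluster's `kubectl get pods --all-namespaces` output:

```shell
# Hypothetical sample of `kubectl get pods --all-namespaces` output.
sample='NAMESPACE     NAME                       READY   STATUS    RESTARTS   AGE
kube-system   kube-dns-c7d85897f-jmntw   3/3     Running   9          1h
kube-system   kube-proxy-x2x6q           1/1     Running   0          1h'

# Print pods whose RESTARTS column (5th field) is greater than zero,
# skipping the header line.
echo "$sample" | awk 'NR > 1 && $5 > 0 { print $2, "restarts:", $5 }'
```

On a live cluster you would pipe `kubectl get pods --all-namespaces` straight into the same awk filter.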

    Run:

     kubectl describe pod kube-dns-c7d85897f-jmntw -n kube-system

    The output:

    Name:           kube-dns-c7d85897f-jmntw
    Namespace:      kube-system
    Node:           172.18.196.2/172.18.196.2
    Start Time:     Tue, 05 Jun 2018 15:28:18 +0800
    Labels:         k8s-app=kube-dns
                    pod-template-hash=738414539
    Annotations:    scheduler.alpha.kubernetes.io/critical-pod=
    Status:         Running
    IP:             172.20.1.9
    Controlled By:  ReplicaSet/kube-dns-c7d85897f
    Containers:
      kubedns:
        Container ID:  docker://516c137ece876a83fc16d26a4fb2c526d8daa75423d1f2371b0b2142bfd2e00a
        Image:         mirrorgooglecontainers/k8s-dns-kube-dns-amd64:1.14.9
        Image ID:      docker-pullable://mirrorgooglecontainers/k8s-dns-kube-dns-amd64@sha256:956ac5f14a388ab9887ae07f36e770852f3f51dcac9e0d193ce8f62cbf066b13
        Ports:         10053/UDP, 10053/TCP, 10055/TCP
        Args:
          --domain=cluster.local.
          --dns-port=10053
          --config-dir=/kube-dns-config
          --v=2
        State:          Running
          Started:      Tue, 05 Jun 2018 15:28:27 +0800
        Ready:          True
        Restart Count:  0
        Limits:
          memory:  170Mi
        Requests:
          cpu:      100m
          memory:   70Mi
        Liveness:   http-get http://:10054/healthcheck/kubedns delay=60s timeout=5s period=10s #success=1 #failure=5
        Readiness:  http-get http://:8081/readiness delay=3s timeout=5s period=10s #success=1 #failure=3
        Environment:
          PROMETHEUS_PORT:  10055
        Mounts:
          /kube-dns-config from kube-dns-config (rw)
          /var/run/secrets/kubernetes.io/serviceaccount from kube-dns-token-2ndrd (ro)
      dnsmasq:
        Container ID:  docker://5871fe23f088d23dd342fa7a891be0b5b9f3f879a0902e6633baaa418b2a920f
        Image:         mirrorgooglecontainers/k8s-dns-dnsmasq-nanny-amd64:1.14.9
        Image ID:      docker-pullable://mirrorgooglecontainers/k8s-dns-dnsmasq-nanny-amd64@sha256:38f69fab59a32a490c8c62b035f6aa8dbf9a320686537225adaee16a07856d17
        Ports:         53/UDP, 53/TCP
        Args:
          -v=2
          -logtostderr
          -configDir=/etc/k8s/dns/dnsmasq-nanny
          -restartDnsmasq=true
          --
          -k
          --cache-size=1000
          --log-facility=-
          --server=/cluster.local./127.0.0.1#10053
          --server=/in-addr.arpa/127.0.0.1#10053
          --server=/ip6.arpa/127.0.0.1#10053
        State:          Running
          Started:      Tue, 05 Jun 2018 16:53:08 +0800
        Last State:     Terminated
          Reason:       Error
          Exit Code:    137
          Started:      Tue, 05 Jun 2018 16:43:08 +0800
          Finished:     Tue, 05 Jun 2018 16:53:08 +0800
        Ready:          True
        Restart Count:  9
        Requests:
          cpu:        150m
          memory:     20Mi
        Liveness:     http-get http://:10054/healthcheck/dnsmasq delay=60s timeout=5s period=10s #success=1 #failure=5
        Environment:  <none>
        Mounts:
          /etc/k8s/dns/dnsmasq-nanny from kube-dns-config (rw)
          /var/run/secrets/kubernetes.io/serviceaccount from kube-dns-token-2ndrd (ro)
      sidecar:
        Container ID:  docker://bffdb2ace942a0608c2a35e34098d0b43519cce8778371fd96ac549300bf9897
        Image:         mirrorgooglecontainers/k8s-dns-sidecar-amd64:1.14.9
        Image ID:      docker-pullable://mirrorgooglecontainers/k8s-dns-sidecar-amd64@sha256:7caad6678b148c0c74f8b84efa93ddde84e742fa37b25d20ecfdbd43fba74360
        Port:          10054/TCP
        Args:
          --v=2
          --logtostderr
          --probe=kubedns,127.0.0.1:10053,kubernetes.default.svc.cluster.local.,5,A
          --probe=dnsmasq,127.0.0.1:53,kubernetes.default.svc.cluster.local.,5,A
        State:          Running
          Started:      Tue, 05 Jun 2018 16:53:30 +0800
        Last State:     Terminated
          Reason:       Error
          Exit Code:    2
          Started:      Tue, 05 Jun 2018 16:43:28 +0800
          Finished:     Tue, 05 Jun 2018 16:53:09 +0800
        Ready:          True
        Restart Count:  9
        Requests:
          cpu:        10m
          memory:     20Mi
        Liveness:     http-get http://:10054/metrics delay=60s timeout=5s period=10s #success=1 #failure=5
        Environment:  <none>
        Mounts:
          /var/run/secrets/kubernetes.io/serviceaccount from kube-dns-token-2ndrd (ro)
    Conditions:
      Type           Status
      Initialized    True 
      Ready          True 
      PodScheduled   True 
    Volumes:
      kube-dns-config:
        Type:      ConfigMap (a volume populated by a ConfigMap)
        Name:      kube-dns
        Optional:  true
      kube-dns-token-2ndrd:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  kube-dns-token-2ndrd
        Optional:    false
    QoS Class:       Burstable
    Node-Selectors:  <none>
    Tolerations:     CriticalAddonsOnly
    Events:
      Type     Reason     Age               From                   Message
      ----     ------     ----              ----                   -------
      Warning  Unhealthy  8m (x41 over 1h)  kubelet, 172.18.196.2  Liveness probe failed: HTTP probe failed with statuscode: 503
      Warning  Unhealthy  7m (x15 over 1h)  kubelet, 172.18.196.2  Liveness probe failed: Get http://172.20.1.9:10054/healthcheck/kubedns: dial tcp 172.20.1.9:10054: getsockopt: connection refused

    There are two warnings here, and at this point the cause is not obvious.
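One clue is already in the `describe` output: the dnsmasq container's last state shows exit code 137, which means the process was killed by SIGKILL (exit codes above 128 encode 128 plus the signal number) — consistent with the kubelet killing the container after repeated liveness probe failures. A quick way to decode such exit codes:

```shell
# Exit codes above 128 mean "terminated by signal (code - 128)".
code=137
kill -l $((code - 128))   # resolves signal 9 to its name
```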

    Run:

    kubectl logs -n kube-system kube-dns-c7d85897f-jmntw -c dnsmasq

    The output:

    I0605 09:13:08.863881       1 main.go:74] opts: {{/usr/sbin/dnsmasq [-k --cache-size=1000 --log-facility=- --server=/cluster.local./127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053] true} /etc/k8s/dns/dnsmasq-nanny 10000000000}
    I0605 09:13:08.863997       1 nanny.go:94] Starting dnsmasq [-k --cache-size=1000 --log-facility=- --server=/cluster.local./127.0.0.1#10053 --server=/in-addr.arpa/127.0.0.1#10053 --server=/ip6.arpa/127.0.0.1#10053]
    I0605 09:13:09.049758       1 nanny.go:119] 
    W0605 09:13:09.049779       1 nanny.go:120] Got EOF from stdout
    I0605 09:13:09.049789       1 nanny.go:116] dnsmasq[17]: started, version 2.78 cachesize 1000
    I0605 09:13:09.049795       1 nanny.go:116] dnsmasq[17]: compile time options: IPv6 GNU-getopt no-DBus no-i18n no-IDN DHCP DHCPv6 no-Lua TFTP no-conntrack ipset auth no-DNSSEC loop-detect inotify
    I0605 09:13:09.049800       1 nanny.go:116] dnsmasq[17]: using nameserver 127.0.0.1#10053 for domain ip6.arpa 
    I0605 09:13:09.049803       1 nanny.go:116] dnsmasq[17]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa 
    I0605 09:13:09.049807       1 nanny.go:116] dnsmasq[17]: using nameserver 127.0.0.1#10053 for domain cluster.local 
    I0605 09:13:09.049811       1 nanny.go:116] dnsmasq[17]: reading /etc/resolv.conf
    I0605 09:13:09.049815       1 nanny.go:116] dnsmasq[17]: using nameserver 127.0.0.1#10053 for domain ip6.arpa 
    I0605 09:13:09.049819       1 nanny.go:116] dnsmasq[17]: using nameserver 127.0.0.1#10053 for domain in-addr.arpa 
    I0605 09:13:09.049823       1 nanny.go:116] dnsmasq[17]: using nameserver 127.0.0.1#10053 for domain cluster.local 
    I0605 09:13:09.049827       1 nanny.go:116] dnsmasq[17]: using nameserver 127.0.1.1#53
    I0605 09:13:09.049836       1 nanny.go:116] dnsmasq[17]: read /etc/hosts - 7 addresses
    I0605 09:21:50.451300       1 nanny.go:116] dnsmasq[17]: Maximum number of concurrent DNS queries reached (max: 150)
    I0605 09:22:00.464414       1 nanny.go:116] dnsmasq[17]: Maximum number of concurrent DNS queries reached (max: 150)

    The log shows that, besides the 127.0.0.1#10053 entries for the cluster domains, dnsmasq picked up 127.0.1.1#53 from /etc/resolv.conf as its upstream nameserver — the node's real DNS server is missing from the list.

    The root cause is simply that the nameserver on the node was never updated.
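On Ubuntu hosts, 127.0.1.1 in /etc/resolv.conf points at a local resolver stub on the node; inside the dnsmasq container that loopback address is useless, so queries pile up until dnsmasq's concurrency limit (the "max: 150" message above) is hit and the liveness probe starts failing. A check along these lines can catch the misconfiguration (demonstrated on a hypothetical sample file; on a node you would check /etc/resolv.conf itself):

```shell
# Warn about loopback upstream nameservers in a resolv.conf.
cat > /tmp/resolv.conf.sample <<'EOF'
nameserver 127.0.1.1
EOF

if grep -Eq '^nameserver[[:space:]]+127\.' /tmp/resolv.conf.sample; then
    echo "loopback nameserver found"
fi
```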

    Edit the nameserver entry in /etc/resolv.conf,

    changing it to the campus DNS server. Note that this must be done on every node, because there is no telling which node the DNS pod will be scheduled onto.
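The per-node edit can be scripted. The sketch below runs against a temporary copy so it is safe to try anywhere; 10.8.8.8 is the upstream server used later in this post — substitute your own, and on a real node target /etc/resolv.conf itself:

```shell
# Swap the loopback nameserver for a real upstream in a resolv.conf copy.
demo=/tmp/resolv.conf.demo
printf 'nameserver 127.0.1.1\n' > "$demo"

# Replace the loopback entry in place.
sed -i 's/^nameserver 127\.0\.1\.1$/nameserver 10.8.8.8/' "$demo"
cat "$demo"
```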

    Then restart the kube-dns pod — deleting it lets the ReplicaSet recreate it with the corrected resolv.conf:

    kubectl delete pod -n kube-system kube-dns-69bf9d5cc9-c68mw

    Once the new pod's dnsmasq log shows it using nameserver 10.8.8.8, the problem is fixed.

    A cluster usually has many nodes, though, and editing them one by one is slow. Here is how to change the nameserver on all nodes at once with Ansible:

    root@ht-1:/etc/ansible# ansible all -m lineinfile -a "dest=/etc/resolv.conf regexp='nameserver 127.0.1.1' line='nameserver 10.8.8.8'"
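The same ad-hoc change can also be captured as a small playbook so it stays repeatable. A sketch, assuming the nodes are in the default `all` inventory group; the file name is illustrative, and the regexp/line values are from the command above:

```yaml
# nameserver.yml — sketch; run with: ansible-playbook nameserver.yml
- hosts: all
  become: yes
  tasks:
    - name: point resolv.conf at the upstream DNS server
      lineinfile:
        dest: /etc/resolv.conf
        regexp: 'nameserver 127.0.1.1'
        line: 'nameserver 10.8.8.8'
```

Note that on systems where /etc/resolv.conf is generated (e.g. by resolvconf or systemd-resolved), the file may be regenerated on reboot, so the change may need to be made in the generator's configuration instead.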
  • Original post: https://www.cnblogs.com/yuxiaoba/p/9141152.html