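• Installation notes: a summary of errors encountered with K8S 1.15.0 (partly drawn from experience shared online; these fixes resolve most of the problems)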


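    1.1.1  Kubelet startup error: node "kube-master1" not found

    While installing a kubernetes v1.13.1 cluster with kubeadm today, I found one machine on which the installation never succeeded: it always failed when starting kubelet, with the following error.

    Symptom: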

    [root@master taoweizhong]# systemctl status kubelet -l

    ● kubelet.service - kubelet: The Kubernetes Node Agent

       Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)

      Drop-In: /usr/lib/systemd/system/kubelet.service.d

               └─10-kubeadm.conf

       Active: active (running) since Wed 2019-07-31 05:37:08 PDT; 2min 8s ago

         Docs: https://kubernetes.io/docs/

     Main PID: 6161 (kubelet)

        Tasks: 16

       Memory: 82.0M

       CGroup: /system.slice/kubelet.service

               └─6161 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=cgroupfs --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.1 --fail-swap-on=false

    Jul 31 05:39:16 master kubelet[6161]: E0731 05:39:16.008683    6161 kubelet.go:2248] node "master" not found

    Jul 31 05:39:16 master kubelet[6161]: E0731 05:39:16.109488    6161 kubelet.go:2248] node "master" not found

    Jul 31 05:39:16 master kubelet[6161]: E0731 05:39:16.210368    6161 kubelet.go:2248] node "master" not found

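    The master IP set when kubelet was initialized was wrong, so kubelet could not connect to the master's API Server. Checking the kubelet.conf configuration file showed that the entry server: https://192.168.135.139:6443 did not point to the current machine's IP (the cause was that I was using a dynamic IP):

    [root@master taoweizhong]# cat /etc/kubernetes/kubelet.conf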

    apiVersion: v1

    clusters:

    - cluster:

        certificate-authority-data:

        server: https://192.168.135.143:6443

      name: kubernetes

    contexts:

    - context:

        cluster: kubernetes

        user: system:node:master

      name: system:node:master@kubernetes

    current-context: system:node:master@kubernetes

    Set the correct IP address in the following files:

    [root@master kubernetes]# vim admin.conf

    [root@master kubernetes]# vim controller-manager.conf

    [root@master kubernetes]# vim kubelet.conf

    [root@master kubernetes]# vim scheduler.conf
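
    Editing four files by hand is easy to get wrong, so the same change can be scripted. The following is a minimal sketch, not the author's method: the function name replace_server_ip is hypothetical, and it assumes the stale address only appears in the form https://OLD_IP: inside each file. A .bak copy of every file is kept.

```shell
# Sketch: replace a stale API server IP in kubeconfig files.
# replace_server_ip is a hypothetical helper name, not a kubeadm tool.
replace_server_ip() (
    old_ip=$1
    new_ip=$2
    shift 2
    # Escape the dots so sed matches them literally.
    old_re=$(printf '%s' "$old_ip" | sed 's/\./\\./g')
    for f in "$@"; do
        cp -p "$f" "$f.bak"    # keep a backup of the original file
        sed "s#https://${old_re}:#https://${new_ip}:#g" "$f.bak" > "$f"
    done
)

# Example call (IPs taken from the text above; which one is stale is an assumption):
#   cd /etc/kubernetes
#   replace_server_ip 192.168.135.139 192.168.135.143 \
#       admin.conf controller-manager.conf kubelet.conf scheduler.conf
#   systemctl restart kubelet
```

    Restarting kubelet afterwards is presumably needed so that it reads the updated kubelet.conf.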

    1.1.2  Locating ETCD startup errors

    While configuring the etcd key during setup of the flannel network for docker, the following error appeared:

    Error:  client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 127.0.0.1:4001: getsockopt: connection refused
    ; error #1: dial tcp 127.0.0.1:2379: getsockopt: connection refused


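    Solution:

    Edit the etcd configuration file:

    vim /etc/etcd/etcd.conf

    On line 6, append http://127.0.0.1:2379 so that etcd also accepts connections from the local machine itself:

    ETCD_LISTEN_CLIENT_URLS="http://192.168.135.143:2379,http://127.0.0.1:2379"

    Then restart the etcd service.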
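
    The edit above can also be applied without opening an editor. This is a rough sketch under the assumption that etcd.conf is a plain KEY="value" environment file; set_etcd_listen_urls is a hypothetical helper name. It rewrites an existing (possibly commented-out) ETCD_LISTEN_CLIENT_URLS line, or appends one if none is present.

```shell
# Sketch: set ETCD_LISTEN_CLIENT_URLS in an etcd environment file.
# set_etcd_listen_urls is a hypothetical helper name.
set_etcd_listen_urls() (
    conf=$1
    urls=$2
    cp -p "$conf" "$conf.bak"    # keep a backup of the original file
    if grep -q '^#\{0,1\}ETCD_LISTEN_CLIENT_URLS=' "$conf.bak"; then
        # Replace the existing line, uncommenting it if necessary.
        sed "s|^#\{0,1\}ETCD_LISTEN_CLIENT_URLS=.*|ETCD_LISTEN_CLIENT_URLS=\"${urls}\"|" \
            "$conf.bak" > "$conf"
    else
        # No such line yet: append one.
        printf 'ETCD_LISTEN_CLIENT_URLS="%s"\n' "$urls" >> "$conf"
    fi
)

# Example call, using the addresses from the text:
#   set_etcd_listen_urls /etc/etcd/etcd.conf \
#       "http://192.168.135.143:2379,http://127.0.0.1:2379"
#   systemctl restart etcd
```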

    1.1.3  NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

    Jul 31 05:59:03 master kubelet[22561]: W0731 05:59:03.364199   22561 cni.go:213] Unable to update cni config: No networks found in /etc/cni/net.d

    Jul 31 05:59:04 master kubelet[22561]: E0731 05:59:04.542692   22561 kubelet.go:2169] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized

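    A detail to watch for in the flannel.yaml file with newer K8S versions

    When flannel is deployed as the network plugin for k8s, the yaml files are all much the same.

    The following detail, however, needs attention.

    Previously, only the first toleration, the one for master, was needed.

    Now a toleration for the not-ready state is needed as well.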

    tolerations:
    - key: node-role.kubernetes.io/master
      operator: Exists
      effect: NoSchedule
    - key: node.kubernetes.io/not-ready
      operator: Exists
      effect: NoSchedule

    1.1.4  Master node: The connection to the server localhost:8080 was refused - did you specify the right host or port?

    [root@master taoweizhong]# kubectl get cs

    The connection to the server localhost:8080 was refused - did you specify the right host or port?

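    Cause: kubectl on this machine has not been pointed at the kubernetes master (this was not set up after the cluster was initialized), so it falls back to localhost:8080.

    Solution: run the following command:

    export KUBECONFIG=/etc/kubernetes/admin.conf

    The /etc/kubernetes/admin.conf file is generated when the cluster is initialized; it carries the API server address and the admin credentials that kubectl needs.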
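
    An export only lasts for the current shell session. A persistent alternative is to copy admin.conf to $HOME/.kube/config, which is what kubeadm init suggests in its success message. The sketch below wraps that step in a function; the name install_kubeconfig and its optional arguments are hypothetical and exist only so the paths can be overridden.

```shell
# Sketch: install admin.conf as the current user's default kubeconfig.
# install_kubeconfig is a hypothetical helper name.
install_kubeconfig() (
    src=${1:-/etc/kubernetes/admin.conf}
    dest_dir=${2:-$HOME/.kube}
    mkdir -p "$dest_dir"
    cp "$src" "$dest_dir/config"
    chmod 600 "$dest_dir/config"    # the file holds admin credentials
)

# Example call (as root on the master node):
#   install_kubeconfig
#   kubectl get cs
```

    With the file in the default location, kubectl no longer depends on the KUBECONFIG variable being set in every new shell.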

    1.1.5   Worker node: The connection to the server localhost:8080 was refused - did you specify the right host or port?

    Running kubectl on a Kubernetes worker node produced the following error:

    # kubectl get pod

    The connection to the server localhost:8080 was refused - did you specify the right host or port?

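    The cause is that kubectl has to run with the kubernetes-admin credentials. To fix this, copy /etc/kubernetes/admin.conf from the master node to the same directory on the worker node, then set the environment variable:

    echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile

    Make it take effect: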

    source ~/.bash_profile

    [root@master kubernetes]# scp admin.conf root@192.168.135.130:/home/taoweizhong

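    kubectl get node can now be used to see how many nodes there are. To use kubectl on a worker node, copy /etc/kubernetes/admin.conf from k8s-master to the node machine and run export KUBECONFIG=/etc/kubernetes/admin.conf, as already mentioned for initialization. The scp command can be used for the copy.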

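    1.1.6  yum install reports: Another app is currently holding the yum lock; waiting for it to exit...

    When installing software with yum on Linux, it reports that yum is locked:

    Another app is currently holding the yum lock; waiting for it to exit...

    This can be resolved by forcibly releasing the lock, that is, removing the yum PID file:

    # rm -f /var/run/yum.pid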
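
    Before deleting the PID file it is worth checking whether the process that wrote it is still alive, since removing the lock under a running yum can leave two package transactions competing. The following is a minimal sketch; yum_lock_holder is a hypothetical helper name, and it assumes the script is run as root (as yum itself is) so that kill -0 can probe the process.

```shell
# Sketch: report whether the yum lock belongs to a live process.
# yum_lock_holder is a hypothetical helper name.
yum_lock_holder() (
    pidfile=${1:-/var/run/yum.pid}
    if [ ! -f "$pidfile" ]; then
        echo "no lock file"
        exit 0
    fi
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only tests whether the process exists.
    if kill -0 "$pid" 2>/dev/null; then
        echo "held by running process $pid"
    else
        echo "stale lock (process $pid is gone)"
    fi
)

# Example call:
#   yum_lock_holder
```

    If the lock turns out to be stale, removing /var/run/yum.pid as shown above is safe; if a process still holds it, waiting for that process or stopping it first is the more careful route.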

    1.1.7  Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused

    [root@slave2 taoweizhong]# kubectl apply -f kube-flannel.yml

    unable to recognize "kube-flannel.yml": Get http://localhost:8080/api?timeout=32s: dial tcp [::1]:8080: connect: connection refused
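
    The request goes to localhost:8080, the same fallback address as in 1.1.4 and 1.1.5, which suggests that kubectl on slave2 has no kubeconfig to work with. A small sketch for checking this follows; kubeconfig_source is a hypothetical helper name, and it only looks at the KUBECONFIG variable and the default $HOME/.kube/config path (a --kubeconfig flag passed on the command line would take precedence over both).

```shell
# Sketch: show which kubeconfig kubectl would pick up by default.
# kubeconfig_source is a hypothetical helper name.
kubeconfig_source() {
    if [ -n "${KUBECONFIG:-}" ]; then
        printf 'KUBECONFIG=%s\n' "$KUBECONFIG"
    elif [ -f "${HOME}/.kube/config" ]; then
        printf '%s\n' "${HOME}/.kube/config"
    else
        printf 'none\n'
    fi
}

# Example call:
#   kubeconfig_source
```

    If this prints none, applying the fix from 1.1.5 (copying admin.conf from the master and exporting KUBECONFIG) should let kubectl apply reach the API server.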

    1.1.8  Worker node: Unit kubelet.service entered failed state.

    kubelet.service - kubelet: The Kubernetes Node Agent

       Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)

      Drop-In: /usr/lib/systemd/system/kubelet.service.d

               └─10-kubeadm.conf

       Active: activating (auto-restart) (Result: exit-code) since Thu 2019-08-01 07:53:40 PDT; 6s ago

         Docs: https://kubernetes.io/docs/

      Process: 23003 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=255)

     Main PID: 23003 (code=exited, status=255)

    Aug 01 07:53:40 slave2 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a

    Aug 01 07:53:40 slave2 systemd[1]: Unit kubelet.service entered failed state.

    Aug 01 07:53:40 slave2 systemd[1]: kubelet.service failed.

    [root@slave2 taoweizhong]#
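
    The systemctl status output only shows that kubelet exited with status 255, not why. The reason is usually in the kubelet journal. The sketch below filters journal text down to error and fatal lines; kubelet_errors is a hypothetical helper name, and the pattern assumes the log prefix seen elsewhere in these notes (for example "kubelet[6161]: E0731 ...").

```shell
# Sketch: keep only error (E) and fatal (F) kubelet log lines from journal output.
# kubelet_errors is a hypothetical helper name.
kubelet_errors() {
    grep -E 'kubelet\[[0-9]+\]: [EF][0-9]{4} '
}

# Example call:
#   journalctl -u kubelet --no-pager -n 200 | kubelet_errors
```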

    1.1.9  Worker node: Error response from daemon: error creating overlay mount to

    Aug 02 06:13:59 slave2 kubelet[15376]: E0802 06:13:59.567319   15376 pod_workers.go:190] Error syncing pod 6286750b-83ea-4c93-a895-f03a7d3ac8f6 ("kube-proxy-6m2hd_kube-system(6286750b-83ea-4c93-a895-f03a7d3ac8f6)"), skipping: failed to "CreatePodSandbox" for "kube-proxy-6m2hd_kube-system(6286750b-83ea-4c93-a895-f03a7d3ac8f6)" with CreatePodSandboxError: "CreatePodSandbox for pod "kube-proxy-6m2hd_kube-system(6286750b-83ea-4c93-a895-f03a7d3ac8f6)" failed: rpc error: code = Unknown desc = failed to create a sandbox for pod "kube-proxy-6m2hd": Error response from daemon: error creating overlay mount to /var/lib/docker/overlay2/458f364c092810a4ce67b80279af2f9de926d5caf0d639c46130e6876b2aca59-init/merged: no such file or directory"

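    After some searching online, one workable fix is as follows (change the storage driver type and disable selinux):

    Stop the docker service:

    systemctl stop docker

    Clear out the images (this deletes everything under /var/lib/docker):

    rm -rf /var/lib/docker

    Change the storage type: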

    vi /etc/sysconfig/docker-storage

    DOCKER_STORAGE_OPTIONS="--storage-driver overlay"

    1.1.10  running with swap on is not supported. Please disable swap

    [ERROR Swap]: running with swap on is not supported. Please disable swap

    [ERROR SystemVerification]: missing cgroups: memory

    [ERROR ImagePull]: failed to pull image [k8s.gcr.io/kube-apiserver-amd64:v1.12.2]

    Swap should not be used; turn it off with: swapoff -a

    After commenting out the swap mount in /etc/fstab, the installation succeeded.
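
    swapoff -a only disables swap until the next reboot; the /etc/fstab change is what makes it permanent. A minimal sketch of that edit follows; comment_out_swap is a hypothetical helper name, and it assumes standard fstab lines whose third field is the filesystem type. A .bak copy is kept.

```shell
# Sketch: comment out every active swap entry in an fstab file.
# comment_out_swap is a hypothetical helper name.
comment_out_swap() (
    fstab=${1:-/etc/fstab}
    cp -p "$fstab" "$fstab.bak"    # keep a backup of the original file
    # Prefix '#' to each uncommented line whose type field (3rd column) is "swap".
    awk '!/^[[:space:]]*#/ && $3 == "swap" { print "#" $0; next } { print }' \
        "$fstab.bak" > "$fstab"
)

# Example call (as root):
#   swapoff -a
#   comment_out_swap
```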

    1.1.11  Container "02599322c0f1ad7113d60d19836b1e91a83a4bdb5f5fe7c6adabee409f88ddd4" not found in pod's containers

    Reset the kubernetes services and the network, deleting the network configuration and the links:

    kubeadm reset

    systemctl stop kubelet

    systemctl stop docker

    rm -rf /var/lib/cni/

    rm -rf /var/lib/kubelet/*

    rm -rf /etc/cni/

    ifconfig cni0 down

    ifconfig flannel.1 down

    ifconfig docker0 down

    ip link delete cni0

    ip link delete flannel.1

    systemctl start docker

  • Original article: https://www.cnblogs.com/taoweizhong/p/11545953.html