• Detailed steps for a highly available binary installation of Kubernetes 1.14.2


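    00. Component versions and configuration strategy

    Component versions

    • Kubernetes 1.14.2
    • Docker 18.09.6-ce
    • Etcd 3.3.13
    • Flanneld 0.11.0
    • Add-ons:
      • Coredns
      • Dashboard
      • Metrics-server
      • EFK (elasticsearch, fluentd, kibana)
    • Image registries:
      • docker registry
      • harbor

    Main configuration strategy

    kube-apiserver:

    • High availability through a local nginx layer-4 transparent proxy on each node;
    • The insecure port 8080 and anonymous access are disabled;
    • https requests are accepted on the secure port 6443;
    • Strict authentication and authorization policies (x509, token, RBAC);
    • Bootstrap token authentication is enabled to support kubelet TLS bootstrapping;
    • kubelet and etcd are accessed over https so that traffic is encrypted;

    kube-controller-manager:

    • 3-node high availability;
    • The insecure port is disabled; https requests are accepted on the secure port 10252;
    • Uses a kubeconfig to access the secure port of the apiserver;
    • Automatically approves kubelet certificate signing requests (CSR) and rotates certificates automatically when they expire;
    • Each controller uses its own ServiceAccount to access the apiserver;

    kube-scheduler:

    • 3-node high availability;
    • Uses a kubeconfig to access the secure port of the apiserver;

    kubelet:

    • Uses kubeadm to create bootstrap tokens dynamically instead of configuring them statically in the apiserver;
    • Uses the TLS bootstrap mechanism to generate client and server certificates automatically and rotates them automatically when they expire;
    • Main parameters are set in a JSON file of type KubeletConfiguration;
    • The read-only port is disabled; https requests are accepted on the secure port 10250, requests are authenticated and authorized, and anonymous or unauthorized access is rejected;
    • Uses a kubeconfig to access the secure port of the apiserver;

    kube-proxy:

    • Uses a kubeconfig to access the secure port of the apiserver;
    • Main parameters are set in a JSON file of type KubeProxyConfiguration;
    • Uses the ipvs proxy mode;

    Cluster add-ons:

    • DNS: coredns, which offers better functionality and performance;
    • Dashboard: supports login authentication;
    • Metric: metrics-server, which accesses the kubelet secure port over https;
    • Log: Elasticsearch, Fluentd, Kibana;
    • Registry: docker-registry, harbor;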

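    01. System initialization and global variables

    Cluster machines

    • kube-node1: 192.168.75.110
    • kube-node2: 192.168.75.111
    • kube-node3: 192.168.75.112

    Notes:

    1. In this document the etcd cluster, the master nodes, and the worker nodes all run on these three machines;
    2. The initialization commands in this document must be run on every machine;
    3. These commands must be run as root;
    4. Unless stated otherwise, every operation in this document is performed on kube-node1, which then distributes files and runs commands remotely;

    Hostname

    Set a permanent hostname, then log in again:

    $ sudo hostnamectl set-hostname kube-node1 # replace kube-node1 with the hostname of the current machine

    • The hostname is saved in the /etc/hostname file;

    If DNS cannot resolve the hostnames, edit /etc/hosts on every machine and add the hostname-to-IP mappings: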

    cat >> /etc/hosts <<EOF
    192.168.75.110 kube-node1
    192.168.75.111 kube-node2
    192.168.75.112 kube-node3
    EOF
    

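    Add the docker account

    Add a docker account on every machine:

    useradd -m docker


    Passwordless ssh login to the other nodes

    Unless stated otherwise, every operation in this document is performed on kube-node1, which then distributes files and runs commands remotely.

    Allow the root account on kube-node1 to log in to all nodes without a password: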

    ssh-keygen -t rsa
    ssh-copy-id root@kube-node1
    ssh-copy-id root@kube-node2
    ssh-copy-id root@kube-node3
    

    Update the PATH variable

    Add the directory of executables to the PATH environment variable:

    mkdir -p /opt/k8s/bin
    echo 'PATH=/opt/k8s/bin:$PATH' >>/root/.bashrc
    source /root/.bashrc
    

    Install dependencies

    Install the dependency packages on every machine:

    CentOS:

    yum install -y epel-release
    yum install -y conntrack ntpdate ntp ipvsadm ipset jq iptables curl sysstat libseccomp wget
    

    Ubuntu:

    apt-get install -y conntrack ipvsadm ntp ipset jq iptables curl sysstat libseccomp2

    • ipvs depends on ipset;
    • ntp keeps the system time of all machines in sync;

    Disable the firewall

    On every machine, stop the firewall, flush its rules, and set the default forwarding policy:

    systemctl stop firewalld
    systemctl disable firewalld
    iptables -F && iptables -X && iptables -F -t nat && iptables -X -t nat
    iptables -P FORWARD ACCEPT
    

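    Disable the swap partition

    If swap is enabled, kubelet fails to start (this can be overridden by setting --fail-swap-on to false), so swap must be turned off on every machine. Also comment out the swap entries in /etc/fstab so that the swap partition is not mounted again at boot:

    swapoff -a
    sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab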
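
    The sed expression relies on a capture group and a back-reference to prefix each matching line with #. A minimal sketch that tries the same expression on a throwaway file instead of the real /etc/fstab; the sample entries and the temporary path are assumptions for illustration, and GNU sed is assumed for -i:

```shell
# Build a sample fstab in a temporary file (contents are illustrative only).
fstab_sample=$(mktemp)
cat > "$fstab_sample" <<'EOF'
/dev/mapper/centos-root /     xfs  defaults 0 0
/dev/mapper/centos-swap swap  swap defaults 0 0
EOF

# Comment out every line that contains " swap " by prefixing it with '#'.
sed -i '/ swap / s/^\(.*\)$/#\1/g' "$fstab_sample"

# Count the lines that are now commented out and show the result.
commented=$(grep -c '^#' "$fstab_sample" || true)
cat "$fstab_sample"
```

    Only the swap entry should end up commented; the root file system entry stays untouched.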
    

    Disable SELinux

    Disable SELinux; otherwise K8S may later report Permission denied when mounting directories:

    setenforce 0
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
    

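    Disable dnsmasq (optional)

    When dnsmasq is enabled on a Linux system (for example in a GUI environment), it sets the system DNS server to 127.0.0.1, which prevents docker containers from resolving domain names, so it needs to be turned off: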

    systemctl stop dnsmasq
    systemctl disable dnsmasq
    

    Load kernel modules

    modprobe ip_vs_rr
    modprobe br_netfilter
    

    Tune kernel parameters

    cat > kubernetes.conf <<EOF
    net.bridge.bridge-nf-call-iptables=1
    net.bridge.bridge-nf-call-ip6tables=1
    net.ipv4.ip_forward=1
    net.ipv4.tcp_tw_recycle=0
    # Avoid using swap; it is only used when the system is about to run out of memory (OOM)
    vm.swappiness=0
    # Do not check whether enough physical memory is available
    vm.overcommit_memory=1
    # Do not panic on OOM; let the OOM killer run
    vm.panic_on_oom=0
    fs.inotify.max_user_instances=8192
    fs.inotify.max_user_watches=1048576
    fs.file-max=52706963
    fs.nr_open=52706963
    net.ipv6.conf.all.disable_ipv6=1
    net.netfilter.nf_conntrack_max=2310720
    vm.max_map_count=655360
    EOF
    cp kubernetes.conf  /etc/sysctl.d/kubernetes.conf
    sysctl -p /etc/sysctl.d/kubernetes.conf

    • tcp_tw_recycle must be disabled; it conflicts with NAT and makes services unreachable;
    • IPv6 is disabled to avoid triggering a docker bug;

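    Set the system time zone

    # Set the system time zone
    timedatectl set-timezone Asia/Shanghai

    # Write the current UTC time to the hardware clock
    timedatectl set-local-rtc 0

    # Restart services that depend on the system time
    systemctl restart rsyslog
    systemctl restart crond


    Stop unneeded services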

    systemctl stop postfix && systemctl disable postfix
    

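    Configure rsyslogd and systemd journald

    systemd's journald is the default logging tool on CentOS 7. It records all system, kernel, and service unit logs.

    Compared with rsyslog, journald has the following advantages:

    1. Logs can be kept in memory or on the file system (by default they are kept in memory, under /run/log/journal);
    2. It can limit the disk space used and guarantee a minimum of free disk space;
    3. It can limit the size of log files and how long they are retained;

    By default journald forwards logs to rsyslog, so every entry is written more than once, /var/log/messages fills up with unrelated entries that make later inspection harder, and system performance suffers.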

    mkdir /var/log/journal # directory for persistent log storage
    mkdir /etc/systemd/journald.conf.d
    cat > /etc/systemd/journald.conf.d/99-prophet.conf <<EOF
    [Journal]
    # Persist logs to disk
    Storage=persistent

    # Compress archived logs
    Compress=yes

    SyncIntervalSec=5m
    RateLimitInterval=30s
    RateLimitBurst=1000

    # Use at most 10G of disk space
    SystemMaxUse=10G

    # Limit a single log file to 200M
    SystemMaxFileSize=200M

    # Keep logs for 2 weeks
    MaxRetentionSec=2week

    # Do not forward logs to syslog
    ForwardToSyslog=no
    EOF
    systemctl restart systemd-journald
    

    Create directories

    Create the directories used in the later steps:

    mkdir -p  /opt/k8s/{bin,work} /etc/{kubernetes,etcd}/cert
    

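    Upgrade the kernel

    The 3.10.x kernel shipped with CentOS 7.x has bugs that make Docker and Kubernetes unstable, for example:

    1. Newer docker versions (after 1.13) enable the kernel memory accounting feature, which is only experimental in the 3.10 kernel and cannot be turned off; under pressure, such as frequent container starts and stops, this leaks cgroup memory;
    2. A reference count leak on network devices produces errors such as: "kernel:unregister_netdevice: waiting for eth0 to become free. Usage count = 1";

    Possible solutions:

    1. Upgrade the kernel to 4.4.X or later;
    2. Or build the kernel manually with the CONFIG_MEMCG_KMEM feature disabled;
    3. Or install Docker 18.09.1 or later, which fixes the problem. Because kubelet also sets kmem (it vendors runc), kubelet has to be rebuilt with GOFLAGS="-tags=nokmem":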
    git clone --branch v1.14.1 --single-branch --depth 1 https://github.com/kubernetes/kubernetes
    cd kubernetes
    KUBE_GIT_VERSION=v1.14.1 ./build/run.sh make kubelet GOFLAGS="-tags=nokmem"
    

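    This document takes the kernel upgrade approach:

    rpm -Uvh http://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
    # After installation, check that the menuentry of the new kernel in /boot/grub2/grub.cfg contains an initrd16 line; if it does not, install again!
    yum --enablerepo=elrepo-kernel install -y kernel-lt
    # Boot from the new kernel by default
    grub2-set-default 0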
    

    Install the kernel source files (optional; run after the kernel has been upgraded and the machine rebooted):

    # yum erase kernel-headers
    yum --enablerepo=elrepo-kernel install kernel-lt-devel-$(uname -r) kernel-lt-headers-$(uname -r)
    

    Disable NUMA

    cp /etc/default/grub{,.bak}
    vim /etc/default/grub # add the `numa=off` parameter to the GRUB_CMDLINE_LINUX line, as shown below:
    diff /etc/default/grub.bak /etc/default/grub
    6c6
    < GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=centos/root rhgb quiet"
    ---
    > GRUB_CMDLINE_LINUX="crashkernel=auto rd.lvm.lv=centos/root rhgb quiet numa=off"
    

    Regenerate the grub2 configuration file:

    cp /boot/grub2/grub.cfg{,.bak}
    grub2-mkconfig -o /boot/grub2/grub.cfg
    

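    Distribute the cluster environment variable script

    All environment variables used later are defined in the file environment.sh. Adjust it to your own machines and network, then copy it to the /opt/k8s/bin directory on all nodes: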

    #!/usr/bin/bash

    # Encryption key required by EncryptionConfig
    export ENCRYPTION_KEY=$(head -c 32 /dev/urandom | base64)

    # Array of the IPs of all cluster machines
    export NODE_IPS=(192.168.75.110 192.168.75.111 192.168.75.112)

    # Array of the hostnames matching the IPs above
    export NODE_NAMES=(kube-node1 kube-node2 kube-node3)

    # List of etcd cluster client endpoints
    export ETCD_ENDPOINTS="https://192.168.75.110:2379,https://192.168.75.111:2379,https://192.168.75.112:2379"

    # IPs and ports used for communication between etcd cluster members
    export ETCD_NODES="kube-node1=https://192.168.75.110:2380,kube-node2=https://192.168.75.111:2380,kube-node3=https://192.168.75.112:2380"

    # Address and port of the kube-apiserver reverse proxy (kube-nginx)
    export KUBE_APISERVER="https://127.0.0.1:8443"

    # Name of the network interface used for traffic between nodes
    export VIP_IF="ens33"

    # etcd data directory
    export ETCD_DATA_DIR="/data/k8s/etcd/data"

    # etcd WAL directory; an SSD partition, or a partition different from ETCD_DATA_DIR, is recommended
    export ETCD_WAL_DIR="/data/k8s/etcd/wal"

    # Data directory for the k8s components
    export K8S_DIR="/data/k8s/k8s"

    # docker data directory
    export DOCKER_DIR="/data/k8s/docker"

    ## The parameters below usually do not need to be changed

    # Token used for TLS Bootstrapping; it can be generated with: head -c 16 /dev/urandom | od -An -t x | tr -d ' '
    BOOTSTRAP_TOKEN="41f7e4ba8b7be874fcff18bf5cf41a7c"

    # Prefer network ranges that are currently unused for the service and Pod CIDRs

    # Service CIDR; not routable before deployment, routable inside the cluster afterwards (ensured by kube-proxy)
    SERVICE_CIDR="10.254.0.0/16"

    # Pod CIDR, a /16 is recommended; not routable before deployment, routable inside the cluster afterwards (ensured by flanneld)
    CLUSTER_CIDR="172.30.0.0/16"

    # Service port range (NodePort Range)
    export NODE_PORT_RANGE="30000-32767"

    # flanneld network configuration prefix
    export FLANNEL_ETCD_PREFIX="/kubernetes/network"

    # kubernetes service IP (usually the first IP in SERVICE_CIDR)
    export CLUSTER_KUBERNETES_SVC_IP="10.254.0.1"

    # Cluster DNS service IP (pre-allocated from SERVICE_CIDR)
    export CLUSTER_DNS_SVC_IP="10.254.0.2"

    # Cluster DNS domain (without a trailing dot)
    export CLUSTER_DNS_DOMAIN="cluster.local"

    # Add the binary directory /opt/k8s/bin to PATH
    export PATH=/opt/k8s/bin:$PATH
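
    ETCD_ENDPOINTS and ETCD_NODES repeat the addresses already listed in NODE_IPS and NODE_NAMES, so a typo in one place can silently diverge from the others. As a hedged sketch (not part of the original environment.sh), the two strings could be derived from the node lists. The lowercase variable names are hypothetical, and space-separated lists are used instead of bash arrays only to keep the snippet portable:

```shell
# Node IPs and names as space-separated lists (same values as NODE_IPS / NODE_NAMES).
node_ips="192.168.75.110 192.168.75.111 192.168.75.112"
node_names="kube-node1 kube-node2 kube-node3"

# Join "https://<ip>:2379" entries with commas.
etcd_endpoints=""
for ip in $node_ips; do
  etcd_endpoints="${etcd_endpoints:+${etcd_endpoints},}https://${ip}:2379"
done

# Pair each name with its IP: "<name>=https://<ip>:2380".
etcd_nodes=""
set -- $node_ips            # positional parameters now hold the IPs
for name in $node_names; do
  etcd_nodes="${etcd_nodes:+${etcd_nodes},}${name}=https://$1:2380"
  shift
done

echo "ETCD_ENDPOINTS=${etcd_endpoints}"
echo "ETCD_NODES=${etcd_nodes}"
```

    The printed values can be compared with the hand-written ones before distributing the script.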
    
    source environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        scp environment.sh root@${node_ip}:/opt/k8s/bin/
        ssh root@${node_ip} "chmod +x /opt/k8s/bin/*"
      done
    
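    # The loop above uses the user@ip form and may still prompt for the user's password; since passwordless ssh was set up earlier using hostnames, the hostname form below can be used instead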
    source environment.sh
    for node_name in ${NODE_NAMES[@]}
      do
        echo ">>> ${node_name}"
        scp environment.sh root@${node_name}:/opt/k8s/bin/
        ssh root@${node_name} "chmod +x /opt/k8s/bin/*"
      done
    
    References

      1. Kernel parameter reference: https://docs.openshift.com/enterprise/3.2/admin_guide/overcommit.html
      2. Discussion of the 3.10.x kernel kmem bugs and their workarounds:
        1. https://github.com/kubernetes/kubernetes/issues/61937
        2. https://support.mesosphere.com/s/article/Critical-Issue-KMEM-MSPH-2018-0006
        3. https://pingcap.com/blog/try-to-fix-two-linux-kernel-bugs-while-testing-tidb-operator-in-k8s/

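    02. Create the CA certificate and key

    To keep the cluster secure, the kubernetes components use x509 certificates to encrypt and authenticate their communication.

    The CA (Certificate Authority) certificate is a self-signed root certificate used to sign the certificates created later.

    This document uses cfssl, CloudFlare's PKI toolkit, to create all certificates.

    Note: Unless stated otherwise, every operation in this document is performed on kube-node1, which then distributes files and runs commands remotely.

    Install the cfssl toolkit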

    mkdir -p /opt/k8s/cert && cd /opt/k8s
    wget https://pkg.cfssl.org/R1.2/cfssl_linux-amd64
    mv cfssl_linux-amd64 /opt/k8s/bin/cfssl
    
    wget https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64
    mv cfssljson_linux-amd64 /opt/k8s/bin/cfssljson
    
    wget https://pkg.cfssl.org/R1.2/cfssl-certinfo_linux-amd64
    mv cfssl-certinfo_linux-amd64 /opt/k8s/bin/cfssl-certinfo
    
    chmod +x /opt/k8s/bin/*
    export PATH=/opt/k8s/bin:$PATH
    

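    Create the root certificate (CA)

    The CA certificate is shared by all nodes in the cluster. Only one CA certificate is needed; every certificate created later is signed by it.

    Create the configuration file

    The CA configuration file defines the usage scenarios (profiles) of the root certificate and their parameters (usages, expiry, server authentication, client authentication, encryption, and so on). A specific profile is selected later when signing other certificates.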

    cd /opt/k8s/work
    cat > ca-config.json <<EOF
    {
      "signing": {
        "default": {
          "expiry": "87600h"
        },
        "profiles": {
          "kubernetes": {
            "usages": [
                "signing",
                "key encipherment",
                "server auth",
                "client auth"
            ],
            "expiry": "87600h"
          }
        }
      }
    }
    EOF
    
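    • signing: the certificate can be used to sign other certificates; the generated ca.pem has CA=TRUE;
    • server auth: a client can use this certificate to verify the certificate presented by a server;
    • client auth: a server can use this certificate to verify the certificate presented by a client;

    Create the certificate signing request file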

    cd /opt/k8s/work
    cat > ca-csr.json <<EOF
    {
      "CN": "kubernetes",
      "key": {
        "algo": "rsa",
        "size": 2048
      },
      "names": [
        {
          "C": "CN",
          "ST": "BeiJing",
          "L": "BeiJing",
          "O": "k8s",
          "OU": "4Paradigm"
        }
      ],
      "ca": {
        "expiry": "876000h"
     }
    }
    EOF
    
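    • CN: Common Name. kube-apiserver extracts this field from the certificate as the user name of the request; browsers use it to check whether a website is legitimate;
    • O: Organization. kube-apiserver extracts this field as the group the requesting user belongs to;
    • kube-apiserver uses the extracted User and Group as the identity for RBAC authorization;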
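
    To make the CN/O mapping concrete, the sketch below pulls both fields out of a certificate subject string. It only illustrates the idea and is not how kube-apiserver is implemented; subject_field is a hypothetical helper, and the sample subject reuses the values of the admin certificate request created later in this document:

```shell
# subject_field SUBJECT FIELD: print the value of FIELD (for example CN or O)
# from a comma-separated subject string. Hypothetical helper for illustration.
subject_field() {
  printf '%s\n' "$1" | tr ',' '\n' | sed 's/^ *//' | sed -n "s/^$2=//p"
}

# Sample subject (values taken from the admin CSR used in this document).
subject="C=CN, ST=BeiJing, L=BeiJing, O=system:masters, OU=4Paradigm, CN=admin"

user=$(subject_field "$subject" CN)    # CN becomes the user name
group=$(subject_field "$subject" O)    # O becomes the group
echo "User=${user} Group=${group}"
```

    On a real certificate the subject can be printed with openssl x509 -noout -subject -in <file>; the exact formatting of that output differs between openssl versions.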

    Generate the CA certificate and private key

    cd /opt/k8s/work
    cfssl gencert -initca ca-csr.json | cfssljson -bare ca
    ls ca*
    

    Distribute the certificate files

    Copy the generated CA certificate, key file, and configuration file to the /etc/kubernetes/cert directory on all nodes:

    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        ssh root@${node_ip} "mkdir -p /etc/kubernetes/cert"
        scp ca*.pem ca-config.json root@${node_ip}:/etc/kubernetes/cert
      done
    
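    # The loop above uses the user@ip form and may still prompt for the user's password; since passwordless ssh was set up earlier using hostnames, the hostname form below can be used instead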
    
    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh
    for node_name in ${NODE_NAMES[@]}
      do
        echo ">>> ${node_name}"
        ssh root@${node_name} "mkdir -p /etc/kubernetes/cert"
        scp ca*.pem ca-config.json root@${node_name}:/etc/kubernetes/cert
      done
    

    References

    1.  Types of CA certificates: https://github.com/kubernetes-incubator/apiserver-builder/blob/master/docs/concepts/auth.md
    

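    03. Deploy the kubectl command-line tool

    This section describes how to install and configure kubectl, the command-line management tool of the kubernetes cluster.

    kubectl reads the kube-apiserver address and credentials from ~/.kube/config by default. If the file is not configured, kubectl commands may fail: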

    $ kubectl get pods
    The connection to the server localhost:8080 was refused - did you specify the right host or port?
    

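    Notes:

    1. Unless stated otherwise, every operation in this document is performed on kube-node1, which then distributes files and runs commands remotely;
    2. This section only needs to be carried out once; the generated kubeconfig file is generic and can be copied to any machine that needs to run kubectl, renamed to ~/.kube/config;

    Download and distribute the kubectl binary

    Download and unpack:

    cd /opt/k8s/work
    # Alternatively, download the file with Xunlei and then upload it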
    wget https://dl.k8s.io/v1.14.2/kubernetes-client-linux-amd64.tar.gz
    tar -xzvf kubernetes-client-linux-amd64.tar.gz
    

    Distribute it to every node that uses kubectl:

    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        scp kubernetes/client/bin/kubectl root@${node_ip}:/opt/k8s/bin/
        ssh root@${node_ip} "chmod +x /opt/k8s/bin/*"
      done
    
    
    # The same loop using hostnames
    source /opt/k8s/bin/environment.sh
    for node_name in ${NODE_NAMES[@]}
      do
        echo ">>> ${node_name}"
        scp kubernetes/client/bin/kubectl root@${node_name}:/opt/k8s/bin/
        ssh root@${node_name} "chmod +x /opt/k8s/bin/*"
      done
    

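    Create the admin certificate and private key

    kubectl talks to the apiserver over the https secure port, and the apiserver authenticates and authorizes the certificate it is presented with.

    As the cluster management tool, kubectl needs the highest privileges, so an admin certificate with full privileges is created here.

    Create the certificate signing request: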

    cd /opt/k8s/work
    cat > admin-csr.json <<EOF
    {
      "CN": "admin",
      "hosts": [],
      "key": {
        "algo": "rsa",
        "size": 2048
      },
      "names": [
        {
          "C": "CN",
          "ST": "BeiJing",
          "L": "BeiJing",
          "O": "system:masters",
          "OU": "4Paradigm"
        }
      ]
    }
    EOF
    
    • O is system:masters; when kube-apiserver receives this certificate it sets the Group of the request to system:masters;
    • The predefined ClusterRoleBinding cluster-admin binds the Group system:masters to the Role cluster-admin, which grants access to all APIs;
    • This certificate is only used by kubectl as a client certificate, so the hosts field is empty;

    Generate the certificate and private key:

    cd /opt/k8s/work
    cfssl gencert -ca=/opt/k8s/work/ca.pem \
      -ca-key=/opt/k8s/work/ca-key.pem \
      -config=/opt/k8s/work/ca-config.json \
      -profile=kubernetes admin-csr.json | cfssljson -bare admin
    ls admin*
    

    Create the kubeconfig file

    kubeconfig is the configuration file of kubectl. It contains everything needed to reach the apiserver, such as the apiserver address, the CA certificate, and the client's own certificate:

    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh

    # Set the cluster parameters
    kubectl config set-cluster kubernetes \
      --certificate-authority=/opt/k8s/work/ca.pem \
      --embed-certs=true \
      --server=${KUBE_APISERVER} \
      --kubeconfig=kubectl.kubeconfig

    # Set the client credentials
    kubectl config set-credentials admin \
      --client-certificate=/opt/k8s/work/admin.pem \
      --client-key=/opt/k8s/work/admin-key.pem \
      --embed-certs=true \
      --kubeconfig=kubectl.kubeconfig

    # Set the context parameters
    kubectl config set-context kubernetes \
      --cluster=kubernetes \
      --user=admin \
      --kubeconfig=kubectl.kubeconfig

    # Set the default context
    kubectl config use-context kubernetes --kubeconfig=kubectl.kubeconfig

    • --certificate-authority: the root certificate used to verify the kube-apiserver certificate;
    • --client-certificate, --client-key: the admin certificate and private key generated above, used when connecting to kube-apiserver;
    • --embed-certs=true: embeds the contents of ca.pem and admin.pem into the generated kubectl.kubeconfig file (without it, only the certificate file paths are written, and the certificate files have to be copied separately whenever the kubeconfig is copied to another machine);

    Distribute the kubeconfig file

    Distribute it to every node that runs kubectl commands:

    cd /opt/k8s/work
    
    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        ssh root@${node_ip} "mkdir -p ~/.kube"
        scp kubectl.kubeconfig root@${node_ip}:~/.kube/config
      done
    
    # The same loop using hostnames
    source /opt/k8s/bin/environment.sh
    for node_name in ${NODE_NAMES[@]}
      do
        echo ">>> ${node_name}"
        ssh root@${node_name} "mkdir -p ~/.kube"
        scp kubectl.kubeconfig root@${node_name}:~/.kube/config
      done
    
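    • The file is saved as ~/.kube/config

    04. Deploy the etcd cluster

    etcd is a Raft-based distributed key-value store developed by CoreOS. It is commonly used for service discovery, shared configuration, and concurrency control (such as leader election and distributed locks). kubernetes stores all of its runtime data in etcd.

    This section describes how to deploy a three-node highly available etcd cluster:

    • Download and distribute the etcd binaries;
    • Create x509 certificates for the etcd nodes, used to encrypt traffic between clients (such as etcdctl) and the etcd cluster and between etcd cluster members;
    • Create the etcd systemd unit file and configure the service parameters;
    • Check that the cluster is working;

    The IPs and names of the etcd cluster nodes are:

    • 192.168.75.110 kube-node1
    • 192.168.75.111 kube-node2
    • 192.168.75.112 kube-node3

    Note: Unless stated otherwise, every operation in this document is performed on kube-node1, which then distributes files and runs commands remotely.

    Download and distribute the etcd binaries

    Download the release package from the etcd release page (v3.3.13 here):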

    cd /opt/k8s/work
    wget https://github.com/coreos/etcd/releases/download/v3.3.13/etcd-v3.3.13-linux-amd64.tar.gz
    tar -xvf etcd-v3.3.13-linux-amd64.tar.gz
    

    Distribute the binaries to all nodes in the cluster:

    cd /opt/k8s/work
    
    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        scp etcd-v3.3.13-linux-amd64/etcd* root@${node_ip}:/opt/k8s/bin
        ssh root@${node_ip} "chmod +x /opt/k8s/bin/*"
      done
    
    # The same loop using hostnames
    source /opt/k8s/bin/environment.sh
    for node_name in ${NODE_NAMES[@]}
      do
        echo ">>> ${node_name}"
        scp etcd-v3.3.13-linux-amd64/etcd* root@${node_name}:/opt/k8s/bin
        ssh root@${node_name} "chmod +x /opt/k8s/bin/*"
      done
    

    Create the etcd certificate and private key

    Create the certificate signing request:

    cd /opt/k8s/work
    
    cat > etcd-csr.json <<EOF
    {
      "CN": "etcd",
      "hosts": [
        "127.0.0.1",
        "192.168.75.110",
        "192.168.75.111",
        "192.168.75.112"
      ],
      "key": {
        "algo": "rsa",
        "size": 2048
      },
      "names": [
        {
          "C": "CN",
          "ST": "BeiJing",
          "L": "BeiJing",
          "O": "k8s",
          "OU": "4Paradigm"
        }
      ]
    }
    EOF
    
    • The hosts field lists the etcd node IPs or domain names that are allowed to use this certificate; the IPs of all three etcd cluster nodes must be included;

    Generate the certificate and private key:

    cd /opt/k8s/work
    cfssl gencert -ca=/opt/k8s/work/ca.pem \
        -ca-key=/opt/k8s/work/ca-key.pem \
        -config=/opt/k8s/work/ca-config.json \
        -profile=kubernetes etcd-csr.json | cfssljson -bare etcd
    ls etcd*pem
    

    Distribute the generated certificate and private key to each etcd node:

    cd /opt/k8s/work
    
    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        ssh root@${node_ip} "mkdir -p /etc/etcd/cert"
        scp etcd*.pem root@${node_ip}:/etc/etcd/cert/
      done
    
    # The same loop using hostnames
    source /opt/k8s/bin/environment.sh
    for node_name in ${NODE_NAMES[@]}
      do
        echo ">>> ${node_name}"
        ssh root@${node_name} "mkdir -p /etc/etcd/cert"
        scp etcd*.pem root@${node_name}:/etc/etcd/cert/
      done
    

    Create the etcd systemd unit template file

    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh
    cat > etcd.service.template <<EOF
    [Unit]
    Description=Etcd Server
    After=network.target
    After=network-online.target
    Wants=network-online.target
    Documentation=https://github.com/coreos
    
    [Service]
    Type=notify
    WorkingDirectory=${ETCD_DATA_DIR}
    ExecStart=/opt/k8s/bin/etcd \
      --data-dir=${ETCD_DATA_DIR} \
      --wal-dir=${ETCD_WAL_DIR} \
      --name=##NODE_NAME## \
      --cert-file=/etc/etcd/cert/etcd.pem \
      --key-file=/etc/etcd/cert/etcd-key.pem \
      --trusted-ca-file=/etc/kubernetes/cert/ca.pem \
      --peer-cert-file=/etc/etcd/cert/etcd.pem \
      --peer-key-file=/etc/etcd/cert/etcd-key.pem \
      --peer-trusted-ca-file=/etc/kubernetes/cert/ca.pem \
      --peer-client-cert-auth \
      --client-cert-auth \
      --listen-peer-urls=https://##NODE_IP##:2380 \
      --initial-advertise-peer-urls=https://##NODE_IP##:2380 \
      --listen-client-urls=https://##NODE_IP##:2379,http://127.0.0.1:2379 \
      --advertise-client-urls=https://##NODE_IP##:2379 \
      --initial-cluster-token=etcd-cluster-0 \
      --initial-cluster=${ETCD_NODES} \
      --initial-cluster-state=new \
      --auto-compaction-mode=periodic \
      --auto-compaction-retention=1 \
      --max-request-bytes=33554432 \
      --quota-backend-bytes=6442450944 \
      --heartbeat-interval=250 \
      --election-timeout=2000
    Restart=on-failure
    RestartSec=5
    LimitNOFILE=65536
    
    [Install]
    WantedBy=multi-user.target
    EOF
    
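    • WorkingDirectory, --data-dir: set the working directory and data directory to ${ETCD_DATA_DIR}; this directory must be created before the service starts;
    • --wal-dir: the WAL directory; for better performance, an SSD or a disk different from --data-dir is normally used;
    • --name: the node name; when --initial-cluster-state is new, the value of --name must appear in the --initial-cluster list;
    • --cert-file, --key-file: certificate and private key used for communication between the etcd server and clients;
    • --trusted-ca-file: the CA certificate that signed the client certificates, used to verify them;
    • --peer-cert-file, --peer-key-file: certificate and private key used for communication between etcd peers;
    • --peer-trusted-ca-file: the CA certificate that signed the peer certificates, used to verify them;

    Create and distribute the etcd systemd unit file for each node

    Replace the variables in the template file to create a systemd unit file for each node: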

    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh
    for (( i=0; i < 3; i++ ))
      do
        sed -e "s/##NODE_NAME##/${NODE_NAMES[i]}/" -e "s/##NODE_IP##/${NODE_IPS[i]}/" etcd.service.template > etcd-${NODE_IPS[i]}.service
      done
    ls *.service
    
    • NODE_NAMES and NODE_IPS are bash arrays of the same length, holding the node names and the matching IPs;
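
    The placeholder substitution can be tried safely on a reduced template before touching the real unit files. The following is a sketch under assumptions: the three-line template, the temporary directory, and the leftover check are illustrative only, and positional parameters stand in for the NODE_NAMES array to keep the snippet portable:

```shell
# Work in a temporary directory with a reduced, illustrative template.
workdir=$(mktemp -d)
cat > "${workdir}/etcd.service.template" <<'EOF'
--name=##NODE_NAME##
--listen-peer-urls=https://##NODE_IP##:2380
--listen-client-urls=https://##NODE_IP##:2379,http://127.0.0.1:2379
EOF

node_ips="192.168.75.110 192.168.75.111 192.168.75.112"
set -- kube-node1 kube-node2 kube-node3   # names, in the same order as the IPs
for ip in $node_ips; do
  # The g flag replaces every placeholder occurrence on a line.
  sed -e "s/##NODE_NAME##/$1/g" -e "s/##NODE_IP##/${ip}/g" \
    "${workdir}/etcd.service.template" > "${workdir}/etcd-${ip}.service"
  shift
done

# Count lines that still contain a placeholder after rendering.
leftover=$(cat "${workdir}"/etcd-*.service | grep -c '##' || true)
echo "unreplaced placeholders: ${leftover}"
```

    A non-zero count would indicate a placeholder name in the template that the sed expressions do not cover.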

    Distribute the generated systemd unit files:

    cd /opt/k8s/work
    # The generated etcd service files are named by IP, so the hostname form cannot be used here
    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        scp etcd-${node_ip}.service root@${node_ip}:/etc/systemd/system/etcd.service
      done
    
    • The file is renamed to etcd.service;

    Start the etcd service

    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        ssh root@${node_ip} "mkdir -p ${ETCD_DATA_DIR} ${ETCD_WAL_DIR}"
        ssh root@${node_ip} "systemctl daemon-reload && systemctl enable etcd && systemctl restart etcd " &
      done
    
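    • The etcd data directory and working directory must be created first;
    • When an etcd process starts for the first time it waits for the etcd processes on the other nodes to join the cluster, so the systemctl start etcd command hangs for a while; this is normal;

    Check the startup result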

    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        ssh root@${node_ip} "systemctl status etcd|grep Active"
      done
    

    Make sure the status is active (running); otherwise check the logs to find the cause:

    journalctl -u etcd
    

    Verify the service status

    After the etcd cluster is deployed, run the following commands on any etcd node:

    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        ETCDCTL_API=3 /opt/k8s/bin/etcdctl \
        --endpoints=https://${node_ip}:2379 \
        --cacert=/etc/kubernetes/cert/ca.pem \
        --cert=/etc/etcd/cert/etcd.pem \
        --key=/etc/etcd/cert/etcd-key.pem endpoint health
      done
    

    Expected output:

    >>> 192.168.75.110
    https://192.168.75.110:2379 is healthy: successfully committed proposal: took = 69.349466ms
    >>> 192.168.75.111
    https://192.168.75.111:2379 is healthy: successfully committed proposal: took = 2.989018ms
    >>> 192.168.75.112
    https://192.168.75.112:2379 is healthy: successfully committed proposal: took = 1.926582ms
    

    The cluster is working when every endpoint reports healthy.
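
    When this check runs from a script, the output can be evaluated instead of read by eye. A minimal sketch, assuming the output format shown above; the sample text is copied from the expected output, and the variable names are hypothetical:

```shell
# Sample output in the format shown above.
health_output='https://192.168.75.110:2379 is healthy: successfully committed proposal: took = 69.349466ms
https://192.168.75.111:2379 is healthy: successfully committed proposal: took = 2.989018ms
https://192.168.75.112:2379 is healthy: successfully committed proposal: took = 1.926582ms'

expected=3   # number of etcd nodes in this cluster

# Count endpoints that reported " is healthy" (does not match "is unhealthy").
healthy=$(printf '%s\n' "$health_output" | grep -c ' is healthy' || true)

if [ "$healthy" -eq "$expected" ]; then
  status="ok"
else
  status="degraded"
fi
echo "${healthy}/${expected} endpoints healthy: ${status}"
```

    In a real run, health_output would hold the captured output of the etcdctl loop rather than a fixed string.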

    Show the current leader

    source /opt/k8s/bin/environment.sh
    ETCDCTL_API=3 /opt/k8s/bin/etcdctl \
      -w table --cacert=/etc/kubernetes/cert/ca.pem \
      --cert=/etc/etcd/cert/etcd.pem \
      --key=/etc/etcd/cert/etcd-key.pem \
      --endpoints=${ETCD_ENDPOINTS} endpoint status
    

    Output:

    +-----------------------------+------------------+---------+---------+-----------+-----------+------------+
    |          ENDPOINT           |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
    +-----------------------------+------------------+---------+---------+-----------+-----------+------------+
    | https://192.168.75.110:2379 | f3373394e2909c16 |  3.3.13 |   20 kB |      true |         2 |          8 |
    | https://192.168.75.111:2379 | bd1095e88a91da45 |  3.3.13 |   20 kB |     false |         2 |          8 |
    | https://192.168.75.112:2379 | 110570bfaa8447c2 |  3.3.13 |   20 kB |     false |         2 |          8 |
    +-----------------------------+------------------+---------+---------+-----------+-----------+------------+
    
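    • As shown, the current leader is 192.168.75.110.

    05. Deploy the flannel network

    kubernetes requires every node in the cluster (including the master nodes) to reach the others through the Pod network. flannel uses vxlan to create such a Pod network across the nodes, over UDP port 8472 (this port must be opened, for example on public clouds such as AWS).

    When flanneld starts for the first time, it reads the configured Pod CIDR from etcd, allocates an unused subnet to the local node, and creates the flannel.1 network interface (the name may differ, for example flannel1).

    flannel writes the Pod subnet it was assigned to the /run/flannel/docker file. docker later uses the environment variables in this file to configure the docker0 bridge and then assigns IPs from this subnet to all Pod containers on the node.

    Note: Unless stated otherwise, every operation in this document is performed on kube-node1, which then distributes files and runs commands remotely.

    Download and distribute the flanneld binaries

    Download the installation package from the flannel release page (v0.11.0 here):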

    cd /opt/k8s/work
    mkdir flannel
    wget https://github.com/coreos/flannel/releases/download/v0.11.0/flannel-v0.11.0-linux-amd64.tar.gz
    tar -xzvf flannel-v0.11.0-linux-amd64.tar.gz -C flannel
    

    Distribute the binaries to all nodes in the cluster:

    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        scp flannel/{flanneld,mk-docker-opts.sh} root@${node_ip}:/opt/k8s/bin/
        ssh root@${node_ip} "chmod +x /opt/k8s/bin/*"
      done
    

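    Create the flannel certificate and private key

    flanneld reads and writes subnet allocation data in the etcd cluster, and the etcd cluster has mutual x509 certificate authentication enabled, so a certificate and private key must be generated for flanneld.

    Create the certificate signing request: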

    cd /opt/k8s/work
    cat > flanneld-csr.json <<EOF
    {
      "CN": "flanneld",
      "hosts": [],
      "key": {
        "algo": "rsa",
        "size": 2048
      },
      "names": [
        {
          "C": "CN",
          "ST": "BeiJing",
          "L": "BeiJing",
          "O": "k8s",
          "OU": "4Paradigm"
        }
      ]
    }
    EOF
    
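    • This certificate is only used by flanneld as a client certificate, so the hosts field is empty;

    Generate the certificate and private key:

    cfssl gencert -ca=/opt/k8s/work/ca.pem \
      -ca-key=/opt/k8s/work/ca-key.pem \
      -config=/opt/k8s/work/ca-config.json \
      -profile=kubernetes flanneld-csr.json | cfssljson -bare flanneld
    ls flanneld*pem


    Distribute the generated certificate and private key to all nodes (master and worker):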

    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        ssh root@${node_ip} "mkdir -p /etc/flanneld/cert"
        scp flanneld*.pem root@${node_ip}:/etc/flanneld/cert
      done
    

    Write the cluster Pod CIDR to etcd

    Note: this step only needs to be run once.

    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh
    etcdctl \
      --endpoints=${ETCD_ENDPOINTS} \
      --ca-file=/opt/k8s/work/ca.pem \
      --cert-file=/opt/k8s/work/flanneld.pem \
      --key-file=/opt/k8s/work/flanneld-key.pem \
      mk ${FLANNEL_ETCD_PREFIX}/config '{"Network":"'${CLUSTER_CIDR}'", "SubnetLen": 21, "Backend": {"Type": "vxlan"}}'
    
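    • The current flanneld version (v0.11.0) does not support etcd v3, so the etcd v2 API is used to write the configuration key and network data;
    • The prefix length of the Pod CIDR ${CLUSTER_CIDR} (for example /16) must be smaller than SubnetLen, and the CIDR must match the --cluster-cidr parameter of kube-controller-manager;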
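
    The relationship between the Pod CIDR prefix and SubnetLen determines how many per-node subnets exist and how large each one is. A short worked calculation with the values used in this document (variable names are hypothetical); the subnet count is an upper bound, since flanneld may not hand out every block:

```shell
cluster_cidr="172.30.0.0/16"   # CLUSTER_CIDR from environment.sh
subnet_len=21                  # SubnetLen written to etcd

# Prefix length is the part after the slash.
prefix_len=${cluster_cidr#*/}

# Upper bound on per-node subnets: 2^(SubnetLen - prefix length).
node_subnets=$(( 1 << (subnet_len - prefix_len) ))
# Addresses in each per-node subnet: 2^(32 - SubnetLen).
addrs_per_subnet=$(( 1 << (32 - subnet_len) ))

echo "node subnets: ${node_subnets}, addresses per subnet: ${addrs_per_subnet}"
```

    With a /16 network and SubnetLen 21 this gives at most 32 node subnets of 2048 addresses each, which is why SubnetLen has to be larger than the network prefix length.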

    Create the flanneld systemd unit file

    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh
    cat > flanneld.service << EOF
    [Unit]
    Description=Flanneld overlay address etcd agent
    After=network.target
    After=network-online.target
    Wants=network-online.target
    After=etcd.service
    Before=docker.service
    
    [Service]
    Type=notify
    ExecStart=/opt/k8s/bin/flanneld \
      -etcd-cafile=/etc/kubernetes/cert/ca.pem \
      -etcd-certfile=/etc/flanneld/cert/flanneld.pem \
      -etcd-keyfile=/etc/flanneld/cert/flanneld-key.pem \
      -etcd-endpoints=${ETCD_ENDPOINTS} \
      -etcd-prefix=${FLANNEL_ETCD_PREFIX} \
      -iface=${VIP_IF} \
      -ip-masq
    ExecStartPost=/opt/k8s/bin/mk-docker-opts.sh -k DOCKER_NETWORK_OPTIONS -d /run/flannel/docker
    Restart=always
    RestartSec=5
    StartLimitInterval=0
    
    [Install]
    WantedBy=multi-user.target
    RequiredBy=docker.service
    EOF
    
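    • The mk-docker-opts.sh script writes the Pod subnet assigned to flanneld into the /run/flannel/docker file; docker later uses the environment variables in this file to configure the docker0 bridge when it starts;
    • flanneld uses the interface of the system default route to communicate with other nodes; on nodes with several network interfaces (for example internal and public), the -iface parameter selects the interface to use;
    • flanneld needs root privileges to run;
    • -ip-masq: flanneld sets up SNAT rules for traffic leaving the Pod network and sets the --ip-masq variable passed to Docker (in the /run/flannel/docker file) to false, so that Docker no longer creates SNAT rules. When Docker's --ip-masq is true, the SNAT rule it creates is rather blunt: every request that originates from a Pod on this node and does not go to the docker0 interface is SNATed, so requests to Pods on other nodes get the IP of the flannel.1 interface as their source, and the destination Pod cannot see the real source Pod IP. The SNAT rules created by flanneld are gentler and only apply to requests that leave the Pod CIDR.

    Distribute the flanneld systemd unit file to all nodes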

    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        scp flanneld.service root@${node_ip}:/etc/systemd/system/
      done
    

    Start the flanneld service

    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        ssh root@${node_ip} "systemctl daemon-reload && systemctl enable flanneld && systemctl restart flanneld"
      done
    

    Check the startup result

    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        ssh root@${node_ip} "systemctl status flanneld|grep Active"
      done
    

    Make sure the status is active (running); otherwise check the logs to find the cause:

    journalctl -u flanneld
    

    Check the Pod subnets assigned to each flanneld

    Show the cluster Pod CIDR (/16):

    source /opt/k8s/bin/environment.sh
    etcdctl \
      --endpoints=${ETCD_ENDPOINTS} \
      --ca-file=/etc/kubernetes/cert/ca.pem \
      --cert-file=/etc/flanneld/cert/flanneld.pem \
      --key-file=/etc/flanneld/cert/flanneld-key.pem \
      get ${FLANNEL_ETCD_PREFIX}/config
    

    Output:

    {"Network":"172.30.0.0/16", "SubnetLen": 21, "Backend": {"Type": "vxlan"}}
    

    List the Pod subnets that have been allocated (/21):

    source /opt/k8s/bin/environment.sh
    etcdctl \
      --endpoints=${ETCD_ENDPOINTS} \
      --ca-file=/etc/kubernetes/cert/ca.pem \
      --cert-file=/etc/flanneld/cert/flanneld.pem \
      --key-file=/etc/flanneld/cert/flanneld-key.pem \
      ls ${FLANNEL_ETCD_PREFIX}/subnets
    

    Output (the result depends on the deployment):

    /kubernetes/network/subnets/172.30.24.0-21
    /kubernetes/network/subnets/172.30.40.0-21
    /kubernetes/network/subnets/172.30.200.0-21
    

    Show the node IP and flannel interface address for one Pod subnet:

    source /opt/k8s/bin/environment.sh
    etcdctl \
      --endpoints=${ETCD_ENDPOINTS} \
      --ca-file=/etc/kubernetes/cert/ca.pem \
      --cert-file=/etc/flanneld/cert/flanneld.pem \
      --key-file=/etc/flanneld/cert/flanneld-key.pem \
      get ${FLANNEL_ETCD_PREFIX}/subnets/172.30.24.0-21
    

    Output (the result depends on the deployment):

    {"PublicIP":"192.168.75.110","BackendType":"vxlan","BackendData":{"VtepMAC":"62:08:2f:f4:b8:a9"}}
    
    • 172.30.24.0/21 is assigned to node kube-node1 (192.168.75.110);
    • VtepMAC is the MAC address of the flannel.1 interface on kube-node1;
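
    The etcd keys encode each subnet as "address-prefixlen" rather than in CIDR notation. A small sketch of the conversion, using only shell parameter expansion; the function name subnet_key_to_cidr is illustrative:

```shell
# Convert a flannel subnet key from etcd (".../subnets/172.30.24.0-21")
# into ordinary CIDR notation ("172.30.24.0/21").
subnet_key_to_cidr() {
  key=$1
  name=${key##*/}          # strip everything up to the last "/"
  # Address is the part before the last "-", prefix length the part after it.
  printf '%s/%s\n' "${name%-*}" "${name##*-}"
}

subnet_key_to_cidr /kubernetes/network/subnets/172.30.24.0-21
```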

    Check the flannel network information on a node

    [root@kube-node1 work]# ip addr show
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
    2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
        link/ether 00:0c:29:4f:53:fa brd ff:ff:ff:ff:ff:ff
        inet 192.168.75.110/24 brd 192.168.75.255 scope global ens33
           valid_lft forever preferred_lft forever
    3: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
        link/ether 62:08:2f:f4:b8:a9 brd ff:ff:ff:ff:ff:ff
        inet 172.30.24.0/32 scope global flannel.1
           valid_lft forever preferred_lft forever
    
    • The flannel.1 interface takes the first IP (.0) of the assigned Pod subnet, as a /32 address;
    [root@kube-node1 work]# ip route show |grep flannel.1
    172.30.40.0/21 via 172.30.40.0 dev flannel.1 onlink
    172.30.200.0/21 via 172.30.200.0 dev flannel.1 onlink
    
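    • Requests to Pod subnets on other nodes are all forwarded to the flannel.1 interface;
    • flanneld uses the subnet records in etcd, such as ${FLANNEL_ETCD_PREFIX}/subnets/172.30.24.0-21, to decide which node's interconnect IP the request is sent to;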
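
    To see which subnet (and therefore which node) a destination Pod IP maps to, the membership test can be reproduced with integer arithmetic. This is only a sketch of the lookup idea, not flanneld's implementation; the function names and the Pod IP 172.30.45.7 are illustrative:

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1            # split the address into its four octets
  IFS=$old_ifs
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Return 0 if IP $1 lies inside CIDR $2 (for example 172.30.40.0/21).
ip_in_cidr() {
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# Find which of the sample subnets a destination Pod IP belongs to.
for cidr in 172.30.24.0/21 172.30.40.0/21 172.30.200.0/21; do
  if ip_in_cidr 172.30.45.7 "$cidr"; then
    echo "172.30.45.7 -> ${cidr}"
  fi
done
```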

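    Verify that the nodes can reach each other over the Pod network

    After flannel is deployed on every node, check that the flannel interface was created (it may be named flannel0, flannel.0, flannel.1, and so on):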

    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        ssh ${node_ip} "/usr/sbin/ip addr show flannel.1|grep -w inet"
      done
    

    Output:

    >>> 192.168.75.110
        inet 172.30.24.0/32 scope global flannel.1
    >>> 192.168.75.111
        inet 172.30.40.0/32 scope global flannel.1
    >>> 192.168.75.112
        inet 172.30.200.0/32 scope global flannel.1
    

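    On each node, ping every flannel interface IP and make sure all of them are reachable (replace the addresses below with the ones printed in the previous step):

    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        ssh ${node_ip} "ping -c 1 172.30.24.0"
        ssh ${node_ip} "ping -c 1 172.30.40.0"
        ssh ${node_ip} "ping -c 1 172.30.200.0"
      done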
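
    Because the flannel addresses differ per deployment, it can be safer to generate the ping commands from lists than to hardcode them. The sketch below only prints the commands (a dry run); the function name print_ping_matrix is illustrative, and the addresses are the sample values from the output above:

```shell
# Print (without running) one ping command per node/target pair.
# $1: space-separated node IPs, $2: space-separated flannel.1 addresses.
print_ping_matrix() {
  for node in $1; do
    for target in $2; do
      echo "ssh root@${node} \"ping -c 1 ${target}\""
    done
  done
}

nodes="192.168.75.110 192.168.75.111 192.168.75.112"
targets="172.30.24.0 172.30.40.0 172.30.200.0"
print_ping_matrix "$nodes" "$targets"
```

    Piping the printed lines to a shell would run them; reviewing them first makes address typos easy to spot.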
    

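    06-0. kube-apiserver high availability with an nginx proxy

    This document explains how to use nginx layer-4 transparent proxying so that K8S nodes (master and worker nodes) get highly available access to kube-apiserver.

    Note: unless otherwise specified, all operations in this document are performed on kube-node1, which then distributes files and runs commands remotely.

    The nginx-proxy-based kube-apiserver high-availability scheme

    • kube-controller-manager and kube-scheduler on the control-plane nodes run as multiple instances, so they stay available as long as one instance is healthy;
    • Pods in the cluster reach kube-apiserver through the K8S service domain name kubernetes; kube-dns resolves the name to the kubernetes service, which is backed by the IPs of all kube-apiserver nodes, so this path is highly available too;
    • Each node runs an nginx process backed by multiple apiserver instances; nginx performs health checks and load balancing across them;
    • kubelet, kube-proxy, controller-manager and scheduler access kube-apiserver through the local nginx (listening on 127.0.0.1), which makes access to kube-apiserver highly available;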

    Download and build nginx

    Download the source:

    cd /opt/k8s/work
    wget http://nginx.org/download/nginx-1.15.3.tar.gz
    tar -xzvf nginx-1.15.3.tar.gz
    

    Configure the build options:

    cd /opt/k8s/work/nginx-1.15.3
    mkdir nginx-prefix
    ./configure --with-stream --without-http --prefix=$(pwd)/nginx-prefix --without-http_uwsgi_module --without-http_scgi_module --without-http_fastcgi_module
    
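    • --with-stream: enables layer-4 transparent forwarding (TCP proxy);
    • --without-xxx: disables all other features, so the resulting dynamically linked binary has minimal dependencies;

    Output (this sample was captured from a build under /root/tmp; with the commands above, the paths start with /opt/k8s/work instead):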

    Configuration summary
      + PCRE library is not used
      + OpenSSL library is not used
      + zlib library is not used
    
      nginx path prefix: "/root/tmp/nginx-1.15.3/nginx-prefix"
      nginx binary file: "/root/tmp/nginx-1.15.3/nginx-prefix/sbin/nginx"
      nginx modules path: "/root/tmp/nginx-1.15.3/nginx-prefix/modules"
      nginx configuration prefix: "/root/tmp/nginx-1.15.3/nginx-prefix/conf"
      nginx configuration file: "/root/tmp/nginx-1.15.3/nginx-prefix/conf/nginx.conf"
      nginx pid file: "/root/tmp/nginx-1.15.3/nginx-prefix/logs/nginx.pid"
      nginx error log file: "/root/tmp/nginx-1.15.3/nginx-prefix/logs/error.log"
      nginx http access log file: "/root/tmp/nginx-1.15.3/nginx-prefix/logs/access.log"
      nginx http client request body temporary files: "client_body_temp"
      nginx http proxy temporary files: "proxy_temp"
    

    Build and install:

    cd /opt/k8s/work/nginx-1.15.3
    make && make install
    

    Verify the compiled nginx

    cd /opt/k8s/work/nginx-1.15.3
    ./nginx-prefix/sbin/nginx -v
    

    Output:

    nginx version: nginx/1.15.3
    

    List the libraries that nginx links dynamically:

    $ ldd ./nginx-prefix/sbin/nginx
    

    Output:

            linux-vdso.so.1 =>  (0x00007ffc945e7000)
            libdl.so.2 => /lib64/libdl.so.2 (0x00007f4385072000)
            libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f4384e56000)
            libc.so.6 => /lib64/libc.so.6 (0x00007f4384a89000)
            /lib64/ld-linux-x86-64.so.2 (0x00007f4385276000)
    
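    • Because only layer-4 transparent forwarding is enabled, the binary depends only on core operating-system libraries such as libc and has no other dependencies (such as libz or libssl), which makes it easy to deploy across operating-system versions;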
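
    The dependency check can be automated by scanning the ldd output for the optional libraries. A minimal sketch, assuming GNU-style ldd output on stdin; the function name check_minimal_deps and the library pattern are illustrative choices:

```shell
# Read `ldd` output on stdin and fail if an optional nginx dependency
# (zlib, OpenSSL, PCRE) is still linked in.
check_minimal_deps() {
  # grep prints any offending lines and succeeds only when it finds one.
  if grep -E 'lib(z|ssl|crypto|pcre)[^ ]*\.so'; then
    echo "unexpected optional library linked" >&2
    return 1
  fi
  echo "only core system libraries are linked"
}

# Sample lines taken from the ldd output shown above.
printf '%s\n' \
  'libdl.so.2 => /lib64/libdl.so.2 (0x00007f4385072000)' \
  'libc.so.6 => /lib64/libc.so.6 (0x00007f4384a89000)' | check_minimal_deps
```

    Against the real binary this would be run as: ldd ./nginx-prefix/sbin/nginx | check_minimal_deps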

    Install and deploy nginx

    Create the directory structure:

    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        ssh root@${node_ip} "mkdir -p /opt/k8s/kube-nginx/{conf,logs,sbin}"
      done
    

    Copy the binary:

    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        ssh root@${node_ip} "mkdir -p /opt/k8s/kube-nginx/{conf,logs,sbin}"
        scp /opt/k8s/work/nginx-1.15.3/nginx-prefix/sbin/nginx  root@${node_ip}:/opt/k8s/kube-nginx/sbin/kube-nginx
        ssh root@${node_ip} "chmod a+x /opt/k8s/kube-nginx/sbin/*"
      done
    
    • The binary is renamed to kube-nginx;

    Configure nginx with layer-4 transparent forwarding enabled:

    cd /opt/k8s/work
    # The heredoc delimiter is escaped so that the shell does not expand $remote_addr
    cat > kube-nginx.conf << \EOF
    worker_processes 1;
    
    events {
        worker_connections  1024;
    }
    
    stream {
        upstream backend {
            hash $remote_addr consistent;
            server 192.168.75.110:6443  max_fails=3 fail_timeout=30s;
            server 192.168.75.111:6443  max_fails=3 fail_timeout=30s;
            server 192.168.75.112:6443  max_fails=3 fail_timeout=30s;
        }
    
        server {
            listen 127.0.0.1:8443;
            proxy_connect_timeout 1s;
            proxy_pass backend;
        }
    }
    EOF
    
    • Replace the server list in backend to match the actual kube-apiserver instances of your cluster;
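
    Instead of editing the server lines by hand, they can be generated from the node list. The following is a sketch only; the function name render_upstream_servers is illustrative, and the max_fails and fail_timeout values are copied from the configuration above:

```shell
# Print one nginx "server" line per kube-apiserver address.
# $1: space-separated list of node IPs; $2: apiserver secure port.
render_upstream_servers() {
  for ip in $1; do
    printf '        server %s:%s  max_fails=3 fail_timeout=30s;\n' "$ip" "$2"
  done
}

render_upstream_servers "192.168.75.110 192.168.75.111 192.168.75.112" 6443
```

    The printed lines can be pasted into the upstream backend block of kube-nginx.conf.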

    Distribute the configuration file:

    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        scp kube-nginx.conf  root@${node_ip}:/opt/k8s/kube-nginx/conf/kube-nginx.conf
      done
    

    Configure the systemd unit file and start the service

    Create the kube-nginx systemd unit file:

    cd /opt/k8s/work
    cat > kube-nginx.service <<EOF
    [Unit]
    Description=kube-apiserver nginx proxy
    After=network.target
    After=network-online.target
    Wants=network-online.target
    
    [Service]
    Type=forking
    ExecStartPre=/opt/k8s/kube-nginx/sbin/kube-nginx -c /opt/k8s/kube-nginx/conf/kube-nginx.conf -p /opt/k8s/kube-nginx -t
    ExecStart=/opt/k8s/kube-nginx/sbin/kube-nginx -c /opt/k8s/kube-nginx/conf/kube-nginx.conf -p /opt/k8s/kube-nginx
    ExecReload=/opt/k8s/kube-nginx/sbin/kube-nginx -c /opt/k8s/kube-nginx/conf/kube-nginx.conf -p /opt/k8s/kube-nginx -s reload
    PrivateTmp=true
    Restart=always
    RestartSec=5
    StartLimitInterval=0
    LimitNOFILE=65536
    
    [Install]
    WantedBy=multi-user.target
    EOF
    

    Distribute the systemd unit file:

    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        scp kube-nginx.service  root@${node_ip}:/etc/systemd/system/
      done
    

    Start the kube-nginx service:

    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kube-nginx && systemctl restart kube-nginx"
      done
    

    Check the status of the kube-nginx service

    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        ssh root@${node_ip} "systemctl status kube-nginx |grep 'Active:'"
      done
    

    Make sure the status is active (running); otherwise, check the logs to find the cause:

    journalctl -u kube-nginx
    

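    06-1. Deploy the master nodes

    Kubernetes master nodes run the following components:

    • kube-apiserver
    • kube-scheduler
    • kube-controller-manager
    • kube-nginx

    kube-apiserver, kube-scheduler and kube-controller-manager all run in multi-instance mode:

    1. kube-scheduler and kube-controller-manager automatically elect one leader instance while the other instances block; when the leader fails, a new leader is elected, which keeps the service available;
    2. kube-apiserver is stateless and is accessed through the kube-nginx proxy, which keeps the service available;

    Note: unless otherwise specified, all operations in this document are performed on kube-node1, which then distributes files and runs commands remotely.

    Install and configure kube-nginx

    See 06-0.apiserver高可用之nginx代理.md

    Download the binaries

    Download the binary tar file from the CHANGELOG page and extract it: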

    cd /opt/k8s/work
    # If you download the file with Xunlei and upload it instead, note that Xunlei names it kubernetes-server-linux-amd64.tar.tar; the suffix is not gz, so rename the file before use
    wget https://dl.k8s.io/v1.14.2/kubernetes-server-linux-amd64.tar.gz
    tar -xzvf kubernetes-server-linux-amd64.tar.gz
    cd kubernetes
    tar -xzvf  kubernetes-src.tar.gz
    

    Copy the binaries to all master nodes:

    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        scp kubernetes/server/bin/{apiextensions-apiserver,cloud-controller-manager,kube-apiserver,kube-controller-manager,kube-proxy,kube-scheduler,kubeadm,kubectl,kubelet,mounter} root@${node_ip}:/opt/k8s/bin/
        ssh root@${node_ip} "chmod +x /opt/k8s/bin/*"
      done
    

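    06-2. Deploy a highly available kube-apiserver cluster

    This document explains how to deploy a three-instance kube-apiserver cluster; the instances are accessed through the kube-nginx proxy, which keeps the service available.

    Note: unless otherwise specified, all operations in this document are performed on kube-node1, which then distributes files and runs commands remotely.

    Preparation

    For downloading the binaries and for installing and configuring flanneld, see 06-1.部署master节点.md

    Create the kubernetes certificate and private key

    Create the certificate signing request: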

    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh
    cat > kubernetes-csr.json <<EOF
    {
      "CN": "kubernetes",
      "hosts": [
        "127.0.0.1",
        "192.168.75.110",
        "192.168.75.111",
        "192.168.75.112",
        "${CLUSTER_KUBERNETES_SVC_IP}",
        "kubernetes",
        "kubernetes.default",
        "kubernetes.default.svc",
        "kubernetes.default.svc.cluster",
        "kubernetes.default.svc.cluster.local."
      ],
      "key": {
        "algo": "rsa",
        "size": 2048
      },
      "names": [
        {
          "C": "CN",
          "ST": "BeiJing",
          "L": "BeiJing",
          "O": "k8s",
          "OU": "4Paradigm"
        }
      ]
    }
    EOF
    
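    • The hosts field lists the IPs and domain names authorized to use this certificate; here it contains the master node IPs plus the IP and domain names of the kubernetes service;

    • The kubernetes service IP is created automatically by apiserver and is normally the first IP of the range given by --service-cluster-ip-range; it can be retrieved later with the following command: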

      $ kubectl get svc kubernetes
      NAME         CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
      kubernetes   10.254.0.1   <none>        443/TCP   1d
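
    Since the certificate has to be created before the cluster exists, the service IP can also be derived from the CIDR in advance. A sketch of the "first IP of the range" rule, assuming a service range of 10.254.0.0/16 (inferred from the 10.254.0.1 output above, not confirmed by the text); the function name first_ip_of_cidr is illustrative:

```shell
# Print the first usable address of an IPv4 CIDR (network address + 1).
first_ip_of_cidr() {
  bits=${1#*/}                     # prefix length, e.g. 16
  old_ifs=$IFS
  IFS=.
  set -- ${1%/*}                   # split the address into four octets
  IFS=$old_ifs
  addr=$(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  first=$(( (addr & mask) + 1 ))
  echo "$(( (first >> 24) & 255 )).$(( (first >> 16) & 255 )).$(( (first >> 8) & 255 )).$(( first & 255 ))"
}

first_ip_of_cidr 10.254.0.0/16
```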
      

    Generate the certificate and private key:

    cfssl gencert -ca=/opt/k8s/work/ca.pem \
      -ca-key=/opt/k8s/work/ca-key.pem \
      -config=/opt/k8s/work/ca-config.json \
      -profile=kubernetes kubernetes-csr.json | cfssljson -bare kubernetes
    ls kubernetes*pem
    

    Copy the generated certificate and private key to all master nodes:

    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        ssh root@${node_ip} "mkdir -p /etc/kubernetes/cert"
        scp kubernetes*.pem root@${node_ip}:/etc/kubernetes/cert/
      done
    

    Create the encryption configuration file

    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh
    cat > encryption-config.yaml <<EOF
    kind: EncryptionConfig
    apiVersion: v1
    resources:
      - resources:
          - secrets
        providers:
          - aescbc:
              keys:
                - name: key1
                  secret: ${ENCRYPTION_KEY}
          - identity: {}
    EOF
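
    The aescbc provider expects a base64-encoded 32-byte key in the secret field. The environment.sh file is not shown in this section, so how ENCRYPTION_KEY is defined there is an assumption; one common way to generate such a key is:

```shell
# Generate a random 32-byte key and base64-encode it, which is the
# form the aescbc provider expects in the "secret" field.
ENCRYPTION_KEY=$(head -c 32 /dev/urandom | base64)

# 32 raw bytes always encode to 44 base64 characters.
echo "encoded key length: ${#ENCRYPTION_KEY}"
```

    All apiserver instances must use the same key, so generate it once and distribute the resulting file rather than regenerating it per node.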
    

    Copy the encryption configuration file to the /etc/kubernetes directory on the master nodes:

    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        scp encryption-config.yaml root@${node_ip}:/etc/kubernetes/
      done
    

    Create the audit policy file

    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh
    cat > audit-policy.yaml <<EOF
    apiVersion: audit.k8s.io/v1beta1
    kind: Policy
    rules:
      # The following requests were manually identified as high-volume and low-risk, so drop them.
      - level: None
        resources:
          - group: ""
            resources:
              - endpoints
              - services
              - services/status
        users:
          - 'system:kube-proxy'
        verbs:
          - watch
    
      - level: None
        resources:
          - group: ""
            resources:
              - nodes
              - nodes/status
        userGroups:
          - 'system:nodes'
        verbs:
          - get
    
      - level: None
        namespaces:
          - kube-system
        resources:
          - group: ""
            resources:
              - endpoints
        users:
          - 'system:kube-controller-manager'
          - 'system:kube-scheduler'
          - 'system:serviceaccount:kube-system:endpoint-controller'
        verbs:
          - get
          - update
    
      - level: None
        resources:
          - group: ""
            resources:
              - namespaces
              - namespaces/status
              - namespaces/finalize
        users:
          - 'system:apiserver'
        verbs:
          - get
    
      # Don't log HPA fetching metrics.
      - level: None
        resources:
          - group: metrics.k8s.io
        users:
          - 'system:kube-controller-manager'
        verbs:
          - get
          - list
    
      # Don't log these read-only URLs.
      - level: None
        nonResourceURLs:
          - '/healthz*'
          - /version
          - '/swagger*'
    
      # Don't log events requests.
      - level: None
        resources:
          - group: ""
            resources:
              - events
    
      # node and pod status calls from nodes are high-volume and can be large, don't log responses for expected updates from nodes
      - level: Request
        omitStages:
          - RequestReceived
        resources:
          - group: ""
            resources:
              - nodes/status
              - pods/status
        users:
          - kubelet
          - 'system:node-problem-detector'
          - 'system:serviceaccount:kube-system:node-problem-detector'
        verbs:
          - update
          - patch
    
      - level: Request
        omitStages:
          - RequestReceived
        resources:
          - group: ""
            resources:
              - nodes/status
              - pods/status
        userGroups:
          - 'system:nodes'
        verbs:
          - update
          - patch
    
      # deletecollection calls can be large, don't log responses for expected namespace deletions
      - level: Request
        omitStages:
          - RequestReceived
        users:
          - 'system:serviceaccount:kube-system:namespace-controller'
        verbs:
          - deletecollection
    
      # Secrets, ConfigMaps, and TokenReviews can contain sensitive & binary data,
      # so only log at the Metadata level.
      - level: Metadata
        omitStages:
          - RequestReceived
        resources:
          - group: ""
            resources:
              - secrets
              - configmaps
          - group: authentication.k8s.io
            resources:
              - tokenreviews
      # Get responses can be large; skip them.
      - level: Request
        omitStages:
          - RequestReceived
        resources:
          - group: ""
          - group: admissionregistration.k8s.io
          - group: apiextensions.k8s.io
          - group: apiregistration.k8s.io
          - group: apps
          - group: authentication.k8s.io
          - group: authorization.k8s.io
          - group: autoscaling
          - group: batch
          - group: certificates.k8s.io
          - group: extensions
          - group: metrics.k8s.io
          - group: networking.k8s.io
          - group: policy
          - group: rbac.authorization.k8s.io
          - group: scheduling.k8s.io
          - group: settings.k8s.io
          - group: storage.k8s.io
        verbs:
          - get
          - list
          - watch
    
      # Default level for known APIs
      - level: RequestResponse
        omitStages:
          - RequestReceived
        resources:
          - group: ""
          - group: admissionregistration.k8s.io
          - group: apiextensions.k8s.io
          - group: apiregistration.k8s.io
          - group: apps
          - group: authentication.k8s.io
          - group: authorization.k8s.io
          - group: autoscaling
          - group: batch
          - group: certificates.k8s.io
          - group: extensions
          - group: metrics.k8s.io
          - group: networking.k8s.io
          - group: policy
          - group: rbac.authorization.k8s.io
          - group: scheduling.k8s.io
          - group: settings.k8s.io
          - group: storage.k8s.io
    
      # Default level for all other requests.
      - level: Metadata
        omitStages:
          - RequestReceived
    EOF
    

    Distribute the audit policy file:

    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        scp audit-policy.yaml root@${node_ip}:/etc/kubernetes/audit-policy.yaml
      done
    

    Create the certificate used later to access metrics-server

    Create the certificate signing request:

    cat > proxy-client-csr.json <<EOF
    {
      "CN": "aggregator",
      "hosts": [],
      "key": {
        "algo": "rsa",
        "size": 2048
      },
      "names": [
        {
          "C": "CN",
          "ST": "BeiJing",
          "L": "BeiJing",
          "O": "k8s",
          "OU": "4Paradigm"
        }
      ]
    }
    EOF
    
    • The CN must be listed in kube-apiserver's --requestheader-allowed-names parameter; otherwise later requests for metrics are rejected with a permission error.

    Generate the certificate and private key:

    cfssl gencert -ca=/etc/kubernetes/cert/ca.pem \
      -ca-key=/etc/kubernetes/cert/ca-key.pem \
      -config=/etc/kubernetes/cert/ca-config.json \
      -profile=kubernetes proxy-client-csr.json | cfssljson -bare proxy-client
    ls proxy-client*.pem
    

    Copy the generated certificate and private key to all master nodes:

    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        scp proxy-client*.pem root@${node_ip}:/etc/kubernetes/cert/
      done
    

    Create the kube-apiserver systemd unit template file

    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh
    cat > kube-apiserver.service.template <<EOF
    [Unit]
    Description=Kubernetes API Server
    Documentation=https://github.com/GoogleCloudPlatform/kubernetes
    After=network.target
    
    [Service]
    WorkingDirectory=${K8S_DIR}/kube-apiserver
    ExecStart=/opt/k8s/bin/kube-apiserver \
      --advertise-address=##NODE_IP## \
      --default-not-ready-toleration-seconds=360 \
      --default-unreachable-toleration-seconds=360 \
      --feature-gates=DynamicAuditing=true \
      --max-mutating-requests-inflight=2000 \
      --max-requests-inflight=4000 \
      --default-watch-cache-size=200 \
      --delete-collection-workers=2 \
      --encryption-provider-config=/etc/kubernetes/encryption-config.yaml \
      --etcd-cafile=/etc/kubernetes/cert/ca.pem \
      --etcd-certfile=/etc/kubernetes/cert/kubernetes.pem \
      --etcd-keyfile=/etc/kubernetes/cert/kubernetes-key.pem \
      --etcd-servers=${ETCD_ENDPOINTS} \
      --bind-address=##NODE_IP## \
      --secure-port=6443 \
      --tls-cert-file=/etc/kubernetes/cert/kubernetes.pem \
      --tls-private-key-file=/etc/kubernetes/cert/kubernetes-key.pem \
      --insecure-port=0 \
      --audit-dynamic-configuration \
      --audit-log-maxage=15 \
      --audit-log-maxbackup=3 \
      --audit-log-maxsize=100 \
      --audit-log-truncate-enabled \
      --audit-log-path=${K8S_DIR}/kube-apiserver/audit.log \
      --audit-policy-file=/etc/kubernetes/audit-policy.yaml \
      --profiling \
      --anonymous-auth=false \
      --client-ca-file=/etc/kubernetes/cert/ca.pem \
      --enable-bootstrap-token-auth \
      --requestheader-allowed-names="aggregator" \
      --requestheader-client-ca-file=/etc/kubernetes/cert/ca.pem \
      --requestheader-extra-headers-prefix="X-Remote-Extra-" \
      --requestheader-group-headers=X-Remote-Group \
      --requestheader-username-headers=X-Remote-User \
      --service-account-key-file=/etc/kubernetes/cert/ca.pem \
      --authorization-mode=Node,RBAC \
      --runtime-config=api/all=true \
      --enable-admission-plugins=NodeRestriction \
      --allow-privileged=true \
      --apiserver-count=3 \
      --event-ttl=168h \
      --kubelet-certificate-authority=/etc/kubernetes/cert/ca.pem \
      --kubelet-client-certificate=/etc/kubernetes/cert/kubernetes.pem \
      --kubelet-client-key=/etc/kubernetes/cert/kubernetes-key.pem \
      --kubelet-https=true \
      --kubelet-timeout=10s \
      --proxy-client-cert-file=/etc/kubernetes/cert/proxy-client.pem \
      --proxy-client-key-file=/etc/kubernetes/cert/proxy-client-key.pem \
      --service-cluster-ip-range=${SERVICE_CIDR} \
      --service-node-port-range=${NODE_PORT_RANGE} \
      --logtostderr=true \
      --v=2
    Restart=on-failure
    RestartSec=10
    Type=notify
    LimitNOFILE=65536
    
    [Install]
    WantedBy=multi-user.target
    EOF
    
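    • --advertise-address: the IP that apiserver advertises (the backend node IP of the kubernetes service);
    • --default-*-toleration-seconds: thresholds related to node failures;
    • --max-*-requests-inflight: maximum limits on in-flight requests;
    • --etcd-*: the certificates for accessing etcd and the etcd server addresses;
    • --encryption-provider-config: the configuration used to encrypt secrets stored in etcd;
    • --bind-address: the IP that https listens on; it must not be 127.0.0.1, otherwise the secure port 6443 cannot be reached from outside;
    • --secure-port: the https listening port;
    • --insecure-port=0: disables the insecure http port (8080);
    • --tls-*-file: the certificate, private key and CA file used by apiserver;
    • --audit-*: parameters for the audit policy and the audit log files;
    • --client-ca-file: verifies the certificates presented by clients (kube-controller-manager, kube-scheduler, kubelet, kube-proxy, etc.);
    • --enable-bootstrap-token-auth: enables token authentication for kubelet bootstrap;
    • --requestheader-*: configuration for kube-apiserver's aggregator layer, needed by proxy-client and HPA;
    • --requestheader-client-ca-file: the CA that signs the certificate specified by --proxy-client-cert-file and --proxy-client-key-file; used when the metric aggregator is enabled;
    • --requestheader-allowed-names: must not be empty; a comma-separated list of CN names of the --proxy-client-cert-file certificate, set to "aggregator" here;
    • --service-account-key-file: the public key file used to verify ServiceAccount tokens; kube-controller-manager's --service-account-private-key-file specifies the matching private key, and the two are used as a pair;
    • --runtime-config=api/all=true: enables all API versions, such as autoscaling/v2alpha1;
    • --authorization-mode=Node,RBAC and --anonymous-auth=false: enable the Node and RBAC authorization modes and reject anonymous and unauthorized requests;
    • --enable-admission-plugins: enables some plugins that are off by default;
    • --allow-privileged: allows containers to run with privileged permissions;
    • --apiserver-count=3: the number of apiserver instances;
    • --event-ttl: how long events are retained;
    • --kubelet-*: if specified, kubelet APIs are accessed over https; RBAC rules must be defined for the user of the certificate (the user of the kubernetes*.pem certificate above is kubernetes), otherwise calls to the kubelet API are rejected as unauthorized;
    • --proxy-client-*: the certificate that apiserver uses to access metrics-server;
    • --service-cluster-ip-range: the Service cluster IP range;
    • --service-node-port-range: the NodePort port range;

    If the kube-apiserver machines do not run kube-proxy, also add the --enable-aggregator-routing=true parameter;

    For the --requestheader-XXX parameters, see:

    Note:

    1. The CA certificate specified by requestheader-client-ca-file must have both client auth and server auth;
    2. If --requestheader-allowed-names is not empty and the CN of the --proxy-client-cert-file certificate is not in allowed-names, later requests for node or pod metrics fail with: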
    [root@zhangjun-k8s01 1.8+]# kubectl top nodes
    Error from server (Forbidden): nodes.metrics.k8s.io is forbidden: User "aggregator" cannot list resource "nodes" in API group "metrics.k8s.io" at the cluster scope
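
    The allowed-names rule can be checked before deployment by comparing the certificate CN against the flag value. This is a sketch of the comparison only, not apiserver's own code; the function name cn_is_allowed and the second list entry front-proxy-client are illustrative:

```shell
# Return 0 if certificate CN $1 appears in the comma-separated list $2.
cn_is_allowed() {
  # Wrap both sides in commas so that only whole names match.
  case ",$2," in
    *",$1,"*) return 0 ;;
    *)        return 1 ;;
  esac
}

if cn_is_allowed aggregator "aggregator,front-proxy-client"; then
  echo "CN aggregator is allowed"
else
  echo "CN aggregator is NOT allowed"
fi
```

    The CN of an existing certificate can be read with: openssl x509 -noout -subject -in proxy-client.pem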
    

    Create and distribute the kube-apiserver systemd unit file for each node

    Substitute the variables in the template to generate a systemd unit file for each node:

    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh
    for (( i=0; i < 3; i++ ))
      do
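        sed -e "s/##NODE_NAME##/${NODE_NAMES[i]}/g" -e "s/##NODE_IP##/${NODE_IPS[i]}/g" kube-apiserver.service.template > kube-apiserver-${NODE_IPS[i]}.service
      done
    ls kube-apiserver*.service
    
    • NODE_NAMES and NODE_IPS are bash arrays of the same length, holding the node names and their IPs respectively;
    • The g flag makes sed replace every placeholder on a line; this matters because the unquoted heredoc above joins the backslash-continued ExecStart lines into a single line that contains ##NODE_IP## twice;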
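
    The placeholder substitution can be tried on a small sample before running it against the real template. A minimal sketch; the function name render_unit and the one-line sample template are illustrative, and the g flag is used so that repeated placeholders on one line are all replaced:

```shell
# Render a template by replacing the ##NODE_NAME## and ##NODE_IP## placeholders.
# $1: template file, $2: node name, $3: node IP; the result goes to stdout.
render_unit() {
  sed -e "s/##NODE_NAME##/$2/g" -e "s/##NODE_IP##/$3/g" "$1"
}

# Demonstrate on a one-line sample template in a temporary directory.
workdir=$(mktemp -d)
printf '%s\n' '--advertise-address=##NODE_IP## --bind-address=##NODE_IP##' \
  > "$workdir/sample.template"
render_unit "$workdir/sample.template" kube-node1 192.168.75.110
```

    A quick grep for "##" in the rendered files is a cheap way to confirm that no placeholder was left behind.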

    Distribute the generated systemd unit files:

    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        scp kube-apiserver-${node_ip}.service root@${node_ip}:/etc/systemd/system/kube-apiserver.service
      done
    
    • The file is renamed to kube-apiserver.service;

    Start the kube-apiserver service

    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        ssh root@${node_ip} "mkdir -p ${K8S_DIR}/kube-apiserver"
        ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kube-apiserver && systemctl restart kube-apiserver"
      done
    
    • The working directory must be created before the service is started;

    Check the kube-apiserver running status

    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        ssh root@${node_ip} "systemctl status kube-apiserver |grep 'Active:'"
      done
    

    Make sure the status is active (running); otherwise, check the logs to find the cause:

    journalctl -u kube-apiserver
    

    Print the data that kube-apiserver has written to etcd

    source /opt/k8s/bin/environment.sh
    ETCDCTL_API=3 etcdctl \
        --endpoints=${ETCD_ENDPOINTS} \
        --cacert=/opt/k8s/work/ca.pem \
        --cert=/opt/k8s/work/etcd.pem \
        --key=/opt/k8s/work/etcd-key.pem \
        get /registry/ --prefix --keys-only
    

    Check cluster information

    $ kubectl cluster-info
    Kubernetes master is running at https://127.0.0.1:8443
    
    To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
    
    $ kubectl get all --all-namespaces
    NAMESPACE   NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
    default     service/kubernetes   ClusterIP   10.254.0.1   <none>        443/TCP   12m
    
    $ kubectl get componentstatuses
    NAME                 STATUS      MESSAGE                                                                                     ERROR
    controller-manager   Unhealthy   Get http://127.0.0.1:10252/healthz: dial tcp 127.0.0.1:10252: connect: connection refused
    scheduler            Unhealthy   Get http://127.0.0.1:10251/healthz: dial tcp 127.0.0.1:10251: connect: connection refused
    etcd-0               Healthy     {"health":"true"}
    etcd-2               Healthy     {"health":"true"}
    etcd-1               Healthy     {"health":"true"}
    
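    1. If a kubectl command prints the following error, the ~/.kube/config file in use is wrong: first check that the file exists, then check that none of its parameters is missing a value, and run the command again: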

      The connection to the server localhost:8080 was refused - did you specify the right host or port?

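    2. When kubectl get componentstatuses runs, apiserver sends its requests to 127.0.0.1 by default. When controller-manager and scheduler run in cluster mode, they may not be on the same machine as kube-apiserver; in that case they are reported as Unhealthy even though they are working normally.

    Check the port kube-apiserver listens on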

    $ sudo netstat -lnpt|grep kube
    tcp        0      0 192.168.75.110:6443     0.0.0.0:*               LISTEN      101442/kube-apiserv
    
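    • 6443: the secure port that receives https requests; all requests are authenticated and authorized;
    • Because the insecure port is disabled, nothing listens on 8080;

    Grant kube-apiserver access to the kubelet API

    When commands such as kubectl exec, run and logs are executed, apiserver forwards the request to the kubelet's https port. The RBAC rule below authorizes the user name of the certificate used by apiserver (kubernetes.pem, CN: kubernetes) to access the kubelet API: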

    kubectl create clusterrolebinding kube-apiserver:kubelet-apis --clusterrole=system:kubelet-api-admin --user kubernetes
    

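    06-3. Deploy a highly available kube-controller-manager cluster

    This document describes how to deploy a highly available kube-controller-manager cluster.

    The cluster has 3 nodes. After startup, a leader is chosen through a competitive election and the other nodes block. When the leader becomes unavailable, the blocked nodes hold another election to choose a new leader, which keeps the service available.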
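
    Once the cluster is running, the current leader can be read from the leader-election annotation, which in this Kubernetes version is stored on the kube-controller-manager endpoints object in kube-system. The sketch below only shows how to extract the holder from that annotation value; the function name leader_identity and the sample value are hypothetical:

```shell
# Extract the holderIdentity value from a leader-election annotation
# read on stdin.
leader_identity() {
  sed -n 's/.*"holderIdentity":"\([^"]*\)".*/\1/p'
}

# Hypothetical sample of the annotation value; real values differ.
sample='{"holderIdentity":"kube-node1_example-id","leaseDurationSeconds":15}'
printf '%s\n' "$sample" | leader_identity
```

    On a live cluster the input would come from: kubectl get endpoints kube-controller-manager --namespace=kube-system -o yaml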

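    To secure communication, this document first generates an x509 certificate and private key, which kube-controller-manager uses in two situations:

    1. communicating with the secure port of kube-apiserver;
    2. serving prometheus-format metrics on its secure port (https, 10252);

    Note: unless otherwise specified, all operations in this document are performed on kube-node1, which then distributes files and runs commands remotely.

    Preparation

    For downloading the binaries and for installing and configuring flanneld, see 06-1.部署master节点.md

    Create the kube-controller-manager certificate and private key

    Create the certificate signing request: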

    cd /opt/k8s/work
    cat > kube-controller-manager-csr.json <<EOF
    {
        "CN": "system:kube-controller-manager",
        "key": {
            "algo": "rsa",
            "size": 2048
        },
        "hosts": [
          "127.0.0.1",
          "192.168.75.110",
          "192.168.75.111",
          "192.168.75.112"
        ],
        "names": [
          {
            "C": "CN",
            "ST": "BeiJing",
            "L": "BeiJing",
            "O": "system:kube-controller-manager",
            "OU": "4Paradigm"
          }
        ]
    }
    EOF
    
    • The hosts list contains the IPs of all kube-controller-manager nodes;
    • CN and O are both system:kube-controller-manager; the built-in ClusterRoleBinding system:kube-controller-manager grants kube-controller-manager the permissions it needs.

    Generate the certificate and private key:

    cd /opt/k8s/work
    cfssl gencert -ca=/opt/k8s/work/ca.pem \
      -ca-key=/opt/k8s/work/ca-key.pem \
      -config=/opt/k8s/work/ca-config.json \
      -profile=kubernetes kube-controller-manager-csr.json | cfssljson -bare kube-controller-manager
    ls kube-controller-manager*pem
    

    Distribute the generated certificate and private key to all master nodes:

    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        scp kube-controller-manager*.pem root@${node_ip}:/etc/kubernetes/cert/
      done
    

    Create and distribute the kubeconfig file

    kube-controller-manager uses a kubeconfig file to access apiserver; the file provides the apiserver address, the embedded CA certificate and the kube-controller-manager certificate:

    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh
    kubectl config set-cluster kubernetes \
      --certificate-authority=/opt/k8s/work/ca.pem \
      --embed-certs=true \
      --server=${KUBE_APISERVER} \
      --kubeconfig=kube-controller-manager.kubeconfig
    
    kubectl config set-credentials system:kube-controller-manager \
      --client-certificate=kube-controller-manager.pem \
      --client-key=kube-controller-manager-key.pem \
      --embed-certs=true \
      --kubeconfig=kube-controller-manager.kubeconfig
    
    kubectl config set-context system:kube-controller-manager \
      --cluster=kubernetes \
      --user=system:kube-controller-manager \
      --kubeconfig=kube-controller-manager.kubeconfig
    
    kubectl config use-context system:kube-controller-manager --kubeconfig=kube-controller-manager.kubeconfig
    

    Distribute the kubeconfig to all master nodes:

    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        scp kube-controller-manager.kubeconfig root@${node_ip}:/etc/kubernetes/
      done
    

    Create the kube-controller-manager systemd unit template file

    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh
    cat > kube-controller-manager.service.template <<EOF
    [Unit]
    Description=Kubernetes Controller Manager
    Documentation=https://github.com/GoogleCloudPlatform/kubernetes
    
    [Service]
    WorkingDirectory=${K8S_DIR}/kube-controller-manager
    ExecStart=/opt/k8s/bin/kube-controller-manager \
      --profiling \
      --cluster-name=kubernetes \
      --controllers=*,bootstrapsigner,tokencleaner \
      --kube-api-qps=1000 \
      --kube-api-burst=2000 \
      --leader-elect \
      --use-service-account-credentials \
      --concurrent-service-syncs=2 \
      --bind-address=##NODE_IP## \
      --secure-port=10252 \
      --tls-cert-file=/etc/kubernetes/cert/kube-controller-manager.pem \
      --tls-private-key-file=/etc/kubernetes/cert/kube-controller-manager-key.pem \
      --port=0 \
      --authentication-kubeconfig=/etc/kubernetes/kube-controller-manager.kubeconfig \
      --client-ca-file=/etc/kubernetes/cert/ca.pem \
      --requestheader-allowed-names="" \
      --requestheader-client-ca-file=/etc/kubernetes/cert/ca.pem \
      --requestheader-extra-headers-prefix="X-Remote-Extra-" \
      --requestheader-group-headers=X-Remote-Group \
      --requestheader-username-headers=X-Remote-User \
      --authorization-kubeconfig=/etc/kubernetes/kube-controller-manager.kubeconfig \
      --cluster-signing-cert-file=/etc/kubernetes/cert/ca.pem \
      --cluster-signing-key-file=/etc/kubernetes/cert/ca-key.pem \
      --experimental-cluster-signing-duration=876000h \
      --horizontal-pod-autoscaler-sync-period=10s \
      --concurrent-deployment-syncs=10 \
      --concurrent-gc-syncs=30 \
      --node-cidr-mask-size=24 \
      --service-cluster-ip-range=${SERVICE_CIDR} \
      --pod-eviction-timeout=6m \
      --terminated-pod-gc-threshold=10000 \
      --root-ca-file=/etc/kubernetes/cert/ca.pem \
      --service-account-private-key-file=/etc/kubernetes/cert/ca-key.pem \
      --kubeconfig=/etc/kubernetes/kube-controller-manager.kubeconfig \
      --logtostderr=true \
      --v=2
    Restart=on-failure
    RestartSec=5
    
    [Install]
    WantedBy=multi-user.target
    EOF
    
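    • --port=0: disables the insecure (http) port; the --address flag then has no effect, while --bind-address still applies;
    • --secure-port=10252, --bind-address=##NODE_IP##: serve https /metrics requests on port 10252 of the node IP (with --bind-address=0.0.0.0 the port would listen on all network interfaces);
    • --kubeconfig: path of the kubeconfig file that kube-controller-manager uses to connect to and authenticate against kube-apiserver;
    • --authentication-kubeconfig, --authorization-kubeconfig: kube-controller-manager uses these to connect to the apiserver and to authenticate and authorize client requests. kube-controller-manager no longer uses --tls-ca-file to verify the client certificate of https metrics requests. If these two kubeconfig flags are not set, client requests to the kube-controller-manager https port are rejected (insufficient permissions);
    • --cluster-signing-*-file: the CA certificate and key that sign the certificates created by TLS Bootstrap;
    • --experimental-cluster-signing-duration: validity period of the TLS Bootstrap certificates;
    • --root-ca-file: the CA certificate placed into the ServiceAccount of each container, used to verify the kube-apiserver certificate;
    • --service-account-private-key-file: private key that signs ServiceAccount tokens; it must pair with the public key passed to kube-apiserver via --service-account-key-file;
    • --service-cluster-ip-range: the Service Cluster IP range; must match the flag of the same name on kube-apiserver;
    • --leader-elect=true: cluster mode with leader election enabled; the node elected leader does the work while the other nodes stay blocked;
    • --controllers=*,bootstrapsigner,tokencleaner: list of enabled controllers; tokencleaner automatically removes expired Bootstrap tokens;
    • --horizontal-pod-autoscaler-*: custom metrics related flags, supporting autoscaling/v2alpha1;
    • --tls-cert-file, --tls-private-key-file: server certificate and private key used when serving metrics over https;
    • --use-service-account-credentials=true: each controller inside kube-controller-manager accesses kube-apiserver with its own ServiceAccount;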

    Create and distribute the kube-controller-manager systemd unit file for each node

    Substitute the variables in the template file to create a systemd unit file for each node:

    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh
    for (( i=0; i < 3; i++ ))
      do
        sed -e "s/##NODE_NAME##/${NODE_NAMES[i]}/" -e "s/##NODE_IP##/${NODE_IPS[i]}/" kube-controller-manager.service.template > kube-controller-manager-${NODE_IPS[i]}.service
      done
    ls kube-controller-manager*.service
    
    • NODE_NAMES and NODE_IPS are bash arrays of the same length, holding the node names and the corresponding IPs;
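
    The substitution loop above can also be wrapped in a small function that fails when a placeholder is left unreplaced. The following is only a sketch: the function name render_template and its argument order are assumptions made for illustration and are not part of the original procedure.

```shell
# Hypothetical helper: render one template for one node.
# Usage: render_template <template> <node_name> <node_ip> <output>
render_template() {
  sed -e "s/##NODE_NAME##/$2/g" \
      -e "s/##NODE_IP##/$3/g" \
      "$1" > "$4" || return 1
  # Fail if any ##...## placeholder survived the substitution.
  if grep -q '##[A-Z_]*##' "$4"; then
    echo "unreplaced placeholder in $4" >&2
    return 1
  fi
}
```

    With the arrays from environment.sh loaded, a call could look like: render_template kube-controller-manager.service.template "${NODE_NAMES[0]}" "${NODE_IPS[0]}" "kube-controller-manager-${NODE_IPS[0]}.service".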

    Distribute to all master nodes:

    cd /opt/k8s/work
    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        scp kube-controller-manager-${node_ip}.service root@${node_ip}:/etc/systemd/system/kube-controller-manager.service
      done
    
    • The file is renamed to kube-controller-manager.service on the target node;

    Start the kube-controller-manager service

    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        ssh root@${node_ip} "mkdir -p ${K8S_DIR}/kube-controller-manager"
        ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kube-controller-manager && systemctl restart kube-controller-manager"
      done
    
    • The working directory must be created before the service is started;

    Check the service status

    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        ssh root@${node_ip} "systemctl status kube-controller-manager|grep Active"
      done
    

    Make sure the status is active (running); otherwise inspect the logs to find the cause:

    journalctl -u kube-controller-manager
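
    The Active lines collected by the status loop can be checked mechanically instead of by eye. A minimal sketch, assuming each input line has the form "<node_ip> <Active line>"; both function names are hypothetical.

```shell
# Hypothetical helper: succeed only if a "systemctl status" Active line
# reports "active (running)".
is_active_running() {
  case "$1" in
    *"Active: active (running)"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper: read "<node_ip> <Active line>" pairs on stdin and
# print the IP of every node whose service is not running.
report_not_running() {
  local node_ip status
  while read -r node_ip status; do
    is_active_running "${status}" || echo "${node_ip}"
  done
}
```

    The input can be produced by prefixing each grep result of the status loop with its ${node_ip}; an empty result from report_not_running then means that every node reported active (running).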
    

    kube-controller-manager listens on port 10252 and accepts https requests:

    [root@kube-node1 work]# netstat -lnpt | grep kube-cont
    tcp        0      0 192.168.75.110:10252    0.0.0.0:*               LISTEN      11439/kube-controll
    

    View the exported metrics

    Note: run the following command on a kube-controller-manager node.

    [root@kube-node1 work]# curl -s --cacert /opt/k8s/work/ca.pem --cert /opt/k8s/work/admin.pem --key /opt/k8s/work/admin-key.pem https://192.168.75.110:10252/metrics |head
    # HELP ClusterRoleAggregator_adds (Deprecated) Total number of adds handled by workqueue: ClusterRoleAggregator
    # TYPE ClusterRoleAggregator_adds counter
    ClusterRoleAggregator_adds 13
    # HELP ClusterRoleAggregator_depth (Deprecated) Current depth of workqueue: ClusterRoleAggregator
    # TYPE ClusterRoleAggregator_depth gauge
    ClusterRoleAggregator_depth 0
    # HELP ClusterRoleAggregator_longest_running_processor_microseconds (Deprecated) How many microseconds has the longest running processor for ClusterRoleAggregator been running.
    # TYPE ClusterRoleAggregator_longest_running_processor_microseconds gauge
    ClusterRoleAggregator_longest_running_processor_microseconds 0
    # HELP ClusterRoleAggregator_queue_latency (Deprecated) How long an item stays in workqueueClusterRoleAggregator before being requested.
    

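    kube-controller-manager permissions

    The ClusterRole system:kube-controller-manager has very limited permissions: it can only create resource objects such as secrets and serviceaccounts. The permissions of the individual controllers are spread across the ClusterRoles system:controller:XXX: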

    [root@kube-node1 work]# kubectl describe clusterrole system:kube-controller-manager
    Name:         system:kube-controller-manager
    Labels:       kubernetes.io/bootstrapping=rbac-defaults
    Annotations:  rbac.authorization.kubernetes.io/autoupdate: true
    PolicyRule:
      Resources                                  Non-Resource URLs  Resource Names  Verbs
      ---------                                  -----------------  --------------  -----
      secrets                                    []                 []              [create delete get update]
      endpoints                                  []                 []              [create get update]
      serviceaccounts                            []                 []              [create get update]
      events                                     []                 []              [create patch update]
      tokenreviews.authentication.k8s.io         []                 []              [create]
      subjectaccessreviews.authorization.k8s.io  []                 []              [create]
      configmaps                                 []                 []              [get]
      namespaces                                 []                 []              [get]
      *.*                                        []                 []              [list watch]
    

    The --use-service-account-credentials=true flag must be added to the kube-controller-manager startup parameters so that the main controller creates a ServiceAccount XXX-controller for each controller. The built-in ClusterRoleBinding system:controller:XXX then grants each XXX-controller ServiceAccount the permissions of the matching ClusterRole system:controller:XXX.

    $ kubectl get clusterrole|grep controller
    system:controller:attachdetach-controller                              51m
    system:controller:certificate-controller                               51m
    system:controller:clusterrole-aggregation-controller                   51m
    system:controller:cronjob-controller                                   51m
    system:controller:daemon-set-controller                                51m
    system:controller:deployment-controller                                51m
    system:controller:disruption-controller                                51m
    system:controller:endpoint-controller                                  51m
    system:controller:expand-controller                                    51m
    system:controller:generic-garbage-collector                            51m
    system:controller:horizontal-pod-autoscaler                            51m
    system:controller:job-controller                                       51m
    system:controller:namespace-controller                                 51m
    system:controller:node-controller                                      51m
    system:controller:persistent-volume-binder                             51m
    system:controller:pod-garbage-collector                                51m
    system:controller:pv-protection-controller                             51m
    system:controller:pvc-protection-controller                            51m
    system:controller:replicaset-controller                                51m
    system:controller:replication-controller                               51m
    system:controller:resourcequota-controller                             51m
    system:controller:route-controller                                     51m
    system:controller:service-account-controller                           51m
    system:controller:service-controller                                   51m
    system:controller:statefulset-controller                               51m
    system:controller:ttl-controller                                       51m
    system:kube-controller-manager                                         51m
    

    Taking the deployment controller as an example:

    $ kubectl describe clusterrole system:controller:deployment-controller
    Name:         system:controller:deployment-controller
    Labels:       kubernetes.io/bootstrapping=rbac-defaults
    Annotations:  rbac.authorization.kubernetes.io/autoupdate: true
    PolicyRule:
      Resources                          Non-Resource URLs  Resource Names  Verbs
      ---------                          -----------------  --------------  -----
      replicasets.apps                   []                 []              [create delete get list patch update watch]
      replicasets.extensions             []                 []              [create delete get list patch update watch]
      events                             []                 []              [create patch update]
      pods                               []                 []              [get list update watch]
      deployments.apps                   []                 []              [get list update watch]
      deployments.extensions             []                 []              [get list update watch]
      deployments.apps/finalizers        []                 []              [update]
      deployments.apps/status            []                 []              [update]
      deployments.extensions/finalizers  []                 []              [update]
      deployments.extensions/status      []                 []              [update]
    

    View the current leader

    [root@kube-node1 work]# kubectl get endpoints kube-controller-manager --namespace=kube-system  -o yaml
    apiVersion: v1
    kind: Endpoints
    metadata:
      annotations:
        control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"kube-node3_ef7efd0f-0149-11ea-8f8a-000c291d1820","leaseDurationSeconds":15,"acquireTime":"2019-11-07T10:39:33Z","renewTime":"2019-11-07T10:43:10Z","leaderTransitions":2}'
      creationTimestamp: "2019-11-07T10:32:42Z"
      name: kube-controller-manager
      namespace: kube-system
      resourceVersion: "3766"
      selfLink: /api/v1/namespaces/kube-system/endpoints/kube-controller-manager
      uid: ee2f71e3-0149-11ea-98c9-000c291d1820
    

    As the holderIdentity field shows, the current leader is the kube-node3 node.
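
    Reading the leader out of the annotation by hand is error-prone, so the lookup can be scripted. A sketch, assuming the annotation value is a single-line JSON string as in the output above and that holderIdentity has the form <hostname>_<uuid>; leader_node is a hypothetical name.

```shell
# Hypothetical helper: print the node name of the current leader, given the
# value of the control-plane.alpha.kubernetes.io/leader annotation.
leader_node() {
  local holder
  holder=$(printf '%s\n' "$1" \
    | sed -n 's/.*"holderIdentity":"\([^"]*\)".*/\1/p')
  [ -n "${holder}" ] || return 1
  # Strip the "_<uuid>" suffix and keep only the hostname.
  printf '%s\n' "${holder%_*}"
}
```

    The annotation value itself can be copied from the kubectl get endpoints output shown above. Running the function before and after stopping a service gives a quick way to see whether leadership moved.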

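    Test the high availability of the kube-controller-manager cluster

    Stop the kube-controller-manager service on one or two nodes and watch the logs of the other nodes to see whether one of them acquires the leader lease.

    References

    1. On controller permissions and the use-service-account-credentials flag: https://github.com/kubernetes/kubernetes/issues/48208
    2. kubelet authentication and authorization: https://kubernetes.io/docs/admin/kubelet-authentication-authorization/#kubelet-authorization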

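    06-4. Deploy a highly available kube-scheduler cluster

    This document describes the steps for deploying a highly available kube-scheduler cluster.

    The cluster consists of 3 nodes. After startup, a leader is chosen through competitive election and the other nodes stay blocked. When the leader becomes unavailable, the remaining nodes hold a new election to produce a new leader, which keeps the service available.

    To secure communication, this document first generates an x509 certificate and private key. kube-scheduler uses the certificate in the following two cases:

    1. Communicating with the secure port of kube-apiserver;
    2. Serving Prometheus-format metrics on its secure port (https, 10259);

    Note: unless stated otherwise, all operations in this document are performed on the kube-node1 node, which then distributes files and runs commands remotely.

    Preparation

    For downloading the latest binaries and installing and configuring flanneld, see 06-1.部署master节点.md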

    Create the kube-scheduler certificate and private key

    Create the certificate signing request:

    cd /opt/k8s/work
    
    cat > kube-scheduler-csr.json <<EOF
    {
        "CN": "system:kube-scheduler",
        "hosts": [
          "127.0.0.1",
          "192.168.75.110",
          "192.168.75.111",
          "192.168.75.112"
        ],
        "key": {
            "algo": "rsa",
            "size": 2048
        },
        "names": [
          {
            "C": "CN",
            "ST": "BeiJing",
            "L": "BeiJing",
            "O": "system:kube-scheduler",
            "OU": "4Paradigm"
          }
        ]
    }
    EOF
    
    • The hosts list contains the IPs of all kube-scheduler nodes;
    • CN and O are both system:kube-scheduler; the built-in ClusterRoleBinding system:kube-scheduler grants kube-scheduler the permissions it needs to work;
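
    Because the hosts list has to name every kube-scheduler node, it can be generated from the node IP list rather than typed by hand. A sketch only; csr_hosts is a hypothetical helper and is not used by the original procedure.

```shell
# Hypothetical helper: print a JSON array containing 127.0.0.1 plus every
# IP passed as an argument, suitable for the "hosts" field of a CSR file.
csr_hosts() {
  local json ip
  json='["127.0.0.1"'
  for ip in "$@"; do
    json="${json},\"${ip}\""
  done
  printf '%s]\n' "${json}"
}
```

    After sourcing environment.sh, csr_hosts "${NODE_IPS[@]}" would print the array for the three nodes of this cluster.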

    Generate the certificate and private key:

    cd /opt/k8s/work
    
    cfssl gencert -ca=/opt/k8s/work/ca.pem \
      -ca-key=/opt/k8s/work/ca-key.pem \
      -config=/opt/k8s/work/ca-config.json \
      -profile=kubernetes kube-scheduler-csr.json | cfssljson -bare kube-scheduler
    ls kube-scheduler*pem
    

    Distribute the generated certificate and private key to all master nodes:

    cd /opt/k8s/work
    
    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        scp kube-scheduler*.pem root@${node_ip}:/etc/kubernetes/cert/
      done
    

    Create and distribute the kubeconfig file

    kube-scheduler accesses the apiserver through a kubeconfig file, which provides the apiserver address, the embedded CA certificate and the kube-scheduler certificate:

    cd /opt/k8s/work
    
    source /opt/k8s/bin/environment.sh
    kubectl config set-cluster kubernetes \
      --certificate-authority=/opt/k8s/work/ca.pem \
      --embed-certs=true \
      --server=${KUBE_APISERVER} \
      --kubeconfig=kube-scheduler.kubeconfig
    
    kubectl config set-credentials system:kube-scheduler \
      --client-certificate=kube-scheduler.pem \
      --client-key=kube-scheduler-key.pem \
      --embed-certs=true \
      --kubeconfig=kube-scheduler.kubeconfig
    
    kubectl config set-context system:kube-scheduler \
      --cluster=kubernetes \
      --user=system:kube-scheduler \
      --kubeconfig=kube-scheduler.kubeconfig
    
    kubectl config use-context system:kube-scheduler --kubeconfig=kube-scheduler.kubeconfig
    

    Distribute the kubeconfig to all master nodes:

    cd /opt/k8s/work
    
    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        scp kube-scheduler.kubeconfig root@${node_ip}:/etc/kubernetes/
      done
    

    Create the kube-scheduler configuration file

    cd /opt/k8s/work
    
    cat >kube-scheduler.yaml.template <<EOF
    apiVersion: kubescheduler.config.k8s.io/v1alpha1
    kind: KubeSchedulerConfiguration
    bindTimeoutSeconds: 600
    clientConnection:
      burst: 200
      kubeconfig: "/etc/kubernetes/kube-scheduler.kubeconfig"
      qps: 100
    enableContentionProfiling: false
    enableProfiling: true
    hardPodAffinitySymmetricWeight: 1
    healthzBindAddress: ##NODE_IP##:10251
    leaderElection:
      leaderElect: true
    metricsBindAddress: ##NODE_IP##:10251
    EOF
    
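    • clientConnection.kubeconfig (the configuration file counterpart of --kubeconfig): path of the kubeconfig file that kube-scheduler uses to connect to and authenticate against kube-apiserver;
    • leaderElection.leaderElect: true (the counterpart of --leader-elect=true): cluster mode with leader election enabled; the node elected leader does the work while the other nodes stay blocked;

    Substitute the variables in the template file: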

    cd /opt/k8s/work
    
    source /opt/k8s/bin/environment.sh
    for (( i=0; i < 3; i++ ))
      do
        sed -e "s/##NODE_NAME##/${NODE_NAMES[i]}/" -e "s/##NODE_IP##/${NODE_IPS[i]}/" kube-scheduler.yaml.template > kube-scheduler-${NODE_IPS[i]}.yaml
      done
    ls kube-scheduler*.yaml
    
    • NODE_NAMES and NODE_IPS are bash arrays of the same length, holding the node names and the corresponding IPs;

    Distribute the kube-scheduler configuration file to all master nodes:

    cd /opt/k8s/work
    
    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        scp kube-scheduler-${node_ip}.yaml root@${node_ip}:/etc/kubernetes/kube-scheduler.yaml
      done
    
    • The file is renamed to kube-scheduler.yaml on the target node;

    Create the kube-scheduler systemd unit template file

    cd /opt/k8s/work
    
    source /opt/k8s/bin/environment.sh
    cat > kube-scheduler.service.template <<EOF
    [Unit]
    Description=Kubernetes Scheduler
    Documentation=https://github.com/GoogleCloudPlatform/kubernetes
    
    [Service]
    WorkingDirectory=${K8S_DIR}/kube-scheduler
    ExecStart=/opt/k8s/bin/kube-scheduler \
      --config=/etc/kubernetes/kube-scheduler.yaml \
      --bind-address=##NODE_IP## \
      --secure-port=10259 \
      --port=0 \
      --tls-cert-file=/etc/kubernetes/cert/kube-scheduler.pem \
      --tls-private-key-file=/etc/kubernetes/cert/kube-scheduler-key.pem \
      --authentication-kubeconfig=/etc/kubernetes/kube-scheduler.kubeconfig \
      --client-ca-file=/etc/kubernetes/cert/ca.pem \
      --requestheader-allowed-names="" \
      --requestheader-client-ca-file=/etc/kubernetes/cert/ca.pem \
      --requestheader-extra-headers-prefix="X-Remote-Extra-" \
      --requestheader-group-headers=X-Remote-Group \
      --requestheader-username-headers=X-Remote-User \
      --authorization-kubeconfig=/etc/kubernetes/kube-scheduler.kubeconfig \
      --logtostderr=true \
      --v=2
    Restart=always
    RestartSec=5
    StartLimitInterval=0
    
    [Install]
    WantedBy=multi-user.target
    EOF
    

    Create and distribute the kube-scheduler systemd unit file for each node

    Substitute the variables in the template file to create a systemd unit file for each node:

    cd /opt/k8s/work
    
    source /opt/k8s/bin/environment.sh
    for (( i=0; i < 3; i++ ))
      do
        sed -e "s/##NODE_NAME##/${NODE_NAMES[i]}/" -e "s/##NODE_IP##/${NODE_IPS[i]}/" kube-scheduler.service.template > kube-scheduler-${NODE_IPS[i]}.service
      done
    
    ls kube-scheduler*.service
    
    • NODE_NAMES and NODE_IPS are bash arrays of the same length, holding the node names and the corresponding IPs;

    Distribute the systemd unit file to all master nodes:

    cd /opt/k8s/work
    
    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        scp kube-scheduler-${node_ip}.service root@${node_ip}:/etc/systemd/system/kube-scheduler.service
      done
    
    • The file is renamed to kube-scheduler.service on the target node;

    Start the kube-scheduler service

    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        ssh root@${node_ip} "mkdir -p ${K8S_DIR}/kube-scheduler"
        ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kube-scheduler && systemctl restart kube-scheduler"
      done
    
    • The working directory must be created before the service is started;

    Check the service status

    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        ssh root@${node_ip} "systemctl status kube-scheduler|grep Active"
      done
    

    Make sure the status is active (running); otherwise inspect the logs to find the cause:

    journalctl -u kube-scheduler
    

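    View the exported metrics

    Note: run the following commands on a kube-scheduler node.

    kube-scheduler listens on ports 10251 and 10259:

    • 10251: accepts http requests; insecure port, no authentication or authorization required;
    • 10259: accepts https requests; secure port, authentication and authorization required;

    Both ports serve /metrics and /healthz.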

    [root@kube-node1 work]# netstat -lnpt |grep kube-sch
    tcp        0      0 192.168.75.110:10259    0.0.0.0:*               LISTEN      17034/kube-schedule
    tcp        0      0 192.168.75.110:10251    0.0.0.0:*               LISTEN      17034/kube-schedule
    
    [root@kube-node1 work]# curl -s http://192.168.75.110:10251/metrics |head
    # HELP apiserver_audit_event_total Counter of audit events generated and sent to the audit backend.
    # TYPE apiserver_audit_event_total counter
    apiserver_audit_event_total 0
    # HELP apiserver_audit_requests_rejected_total Counter of apiserver requests rejected due to an error in audit logging backend.
    # TYPE apiserver_audit_requests_rejected_total counter
    apiserver_audit_requests_rejected_total 0
    # HELP apiserver_client_certificate_expiration_seconds Distribution of the remaining lifetime on the certificate used to authenticate a request.
    # TYPE apiserver_client_certificate_expiration_seconds histogram
    apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
    apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0
    
    [root@kube-node1 work]# curl -s --cacert /opt/k8s/work/ca.pem --cert /opt/k8s/work/admin.pem --key /opt/k8s/work/admin-key.pem https://192.168.75.110:10259/metrics |head
    
    # HELP apiserver_audit_event_total Counter of audit events generated and sent to the audit backend.
    # TYPE apiserver_audit_event_total counter
    apiserver_audit_event_total 0
    # HELP apiserver_audit_requests_rejected_total Counter of apiserver requests rejected due to an error in audit logging backend.
    # TYPE apiserver_audit_requests_rejected_total counter
    apiserver_audit_requests_rejected_total 0
    # HELP apiserver_client_certificate_expiration_seconds Distribution of the remaining lifetime on the certificate used to authenticate a request.
    # TYPE apiserver_client_certificate_expiration_seconds histogram
    apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
    apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0
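
    When only one sample is of interest, the metrics text can be filtered instead of being read through head. A sketch, assuming the Prometheus text format shown in the output above; metric_value is a hypothetical name.

```shell
# Hypothetical helper: read Prometheus text-format metrics on stdin and print
# the value of the first sample whose name (including labels) equals $1.
# Returns non-zero if no such sample exists.
metric_value() {
  awk -v name="$1" '
    /^#/ { next }                                  # skip HELP and TYPE lines
    $1 == name { print $2; found = 1; exit }
    END { exit found ? 0 : 1 }
  '
}
```

    The function reads from a pipe, so the output of either curl command above can be piped into it, for example with apiserver_audit_event_total as the argument.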
    

    View the current leader

    [root@kube-node1 work]# kubectl get endpoints kube-scheduler --namespace=kube-system  -o yaml
    
    apiVersion: v1
    kind: Endpoints
    metadata:
      annotations:
        control-plane.alpha.kubernetes.io/leader: '{"holderIdentity":"kube-node1_a0c24012-0152-11ea-9e7b-000c294f53fa","leaseDurationSeconds":15,"acquireTime":"2019-11-07T11:34:59Z","renewTime":"2019-11-07T11:39:36Z","leaderTransitions":0}'
      creationTimestamp: "2019-11-07T11:34:57Z"
      name: kube-scheduler
      namespace: kube-system
      resourceVersion: "6598"
      selfLink: /api/v1/namespaces/kube-system/endpoints/kube-scheduler
      uid: a00f12ce-0152-11ea-98c9-000c291d1820
    

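    As shown, the current leader is the kube-node1 node.

    Test the high availability of the kube-scheduler cluster

    Pick one or two master nodes, stop the kube-scheduler service on them, and check whether another node acquires the leader lease.

    07-0. Deploy the worker nodes

    A kubernetes worker node runs the following components: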

    • docker
    • kubelet
    • kube-proxy
    • flanneld
    • kube-nginx

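    Note: unless stated otherwise, all operations in this document are performed on the kube-node1 node, which then distributes files and runs commands remotely.

    Install and configure flanneld

    See 05-部署flannel网络.md

    Install and configure kube-nginx

    See 06-0.apiserver高可用之nginx代理.md

    Install the dependency packages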

    CentOS:

    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        ssh root@${node_ip} "yum install -y epel-release"
        ssh root@${node_ip} "yum install -y conntrack ipvsadm ntp ntpdate ipset jq iptables curl sysstat libseccomp && modprobe ip_vs "
      done
    

    Ubuntu:

    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        ssh root@${node_ip} "apt-get install -y conntrack ipvsadm ntp ntpdate ipset jq iptables curl sysstat libseccomp2 && modprobe ip_vs "
      done
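
    After the packages are installed, it can be useful to confirm that the expected commands are actually on the PATH of each node. A minimal sketch; missing_commands is a hypothetical helper, and it only covers packages whose command name matches the package name.

```shell
# Hypothetical helper: print every command from the argument list that is
# not found in PATH; return non-zero if at least one is missing.
missing_commands() {
  local cmd missing
  missing=0
  for cmd in "$@"; do
    if ! command -v "${cmd}" >/dev/null 2>&1; then
      echo "${cmd}"
      missing=1
    fi
  done
  return "${missing}"
}
```

    On a node, missing_commands conntrack ipvsadm ipset jq iptables curl ntpdate prints nothing when all of these commands are available.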
    

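    07-1. Deploy the docker component

    docker runs and manages containers; kubelet interacts with it through the Container Runtime Interface (CRI).

    Note: unless stated otherwise, all operations in this document are performed on the kube-node1 node, which then distributes files and runs commands remotely.

    Install the dependency packages

    See 07-0.部署worker节点.md

    Download and distribute the docker binaries

    Download the release package from the docker download page: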

    cd /opt/k8s/work
    wget https://download.docker.com/linux/static/stable/x86_64/docker-18.09.6.tgz
    tar -xvf docker-18.09.6.tgz
    

    Distribute the binaries to all worker nodes:

    cd /opt/k8s/work
    
    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        scp docker/*  root@${node_ip}:/opt/k8s/bin/
        ssh root@${node_ip} "chmod +x /opt/k8s/bin/*"
      done
    

    Create and distribute the systemd unit file

    cd /opt/k8s/work
    
    cat > docker.service <<"EOF"
    [Unit]
    Description=Docker Application Container Engine
    Documentation=http://docs.docker.io
    
    [Service]
    WorkingDirectory=##DOCKER_DIR##
    Environment="PATH=/opt/k8s/bin:/bin:/sbin:/usr/bin:/usr/sbin"
    EnvironmentFile=-/run/flannel/docker
    ExecStart=/opt/k8s/bin/dockerd $DOCKER_NETWORK_OPTIONS
    ExecReload=/bin/kill -s HUP $MAINPID
    Restart=on-failure
    RestartSec=5
    LimitNOFILE=infinity
    LimitNPROC=infinity
    LimitCORE=infinity
    Delegate=yes
    KillMode=process
    
    [Install]
    WantedBy=multi-user.target
    EOF
    
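    • The EOF delimiter is wrapped in double quotes so that bash does not expand variables in the document, such as $DOCKER_NETWORK_OPTIONS (systemd substitutes these environment variables);

    • dockerd invokes other docker commands at runtime, such as docker-proxy, so the directory holding the docker commands must be added to the PATH environment variable;

    • flanneld writes its network configuration to /run/flannel/docker at startup; before starting, dockerd reads the DOCKER_NETWORK_OPTIONS environment variable from that file and uses it to set the docker0 bridge subnet;

    • If several EnvironmentFile options are specified, /run/flannel/docker must come last (so that docker0 uses the bip parameter generated by flanneld);

    • docker must run as the root user;

    • Starting with version 1.13, docker may set the default policy of the iptables FORWARD chain to DROP, which makes pinging Pod IPs on other Nodes fail. If this happens, set the policy to ACCEPT manually:

      $ sudo iptables -P FORWARD ACCEPT
      

      Also add the following command to /etc/rc.local so that the default policy of the iptables FORWARD chain does not revert to DROP after a node reboot: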

      /sbin/iptables -P FORWARD ACCEPT
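
      Whether the policy needs changing at all can be read from the rule listing. A sketch, assuming the usual iptables -S output in which the first line has the form "-P FORWARD <policy>"; forward_policy is a hypothetical name.

```shell
# Hypothetical helper: read the output of "iptables -S FORWARD" on stdin and
# print the default policy of the FORWARD chain (for example ACCEPT or DROP).
forward_policy() {
  awk '$1 == "-P" && $2 == "FORWARD" { print $3; found = 1; exit }
       END { exit found ? 0 : 1 }'
}
```

      For example, running iptables -S FORWARD | forward_policy as root on a node prints DROP when the manual fix above is still needed.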
      

    Distribute the systemd unit file to all worker machines:

    cd /opt/k8s/work
    
    source /opt/k8s/bin/environment.sh
    sed -i -e "s|##DOCKER_DIR##|${DOCKER_DIR}|" docker.service
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        scp docker.service root@${node_ip}:/etc/systemd/system/
      done
    

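    Configure and distribute the docker configuration file

    Use registry mirrors hosted in China to speed up image pulls, and raise the number of concurrent downloads (dockerd must be restarted for this to take effect):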

    cd /opt/k8s/work
    
    source /opt/k8s/bin/environment.sh
    cat > docker-daemon.json <<EOF
    {
        "registry-mirrors": ["https://docker.mirrors.ustc.edu.cn","https://hub-mirror.c.163.com"],
        "insecure-registries": ["docker02:35000"],
        "max-concurrent-downloads": 20,
        "live-restore": true,
        "max-concurrent-uploads": 10,
        "debug": true,
        "data-root": "${DOCKER_DIR}/data",
        "exec-root": "${DOCKER_DIR}/exec",
        "log-opts": {
          "max-size": "100m",
          "max-file": "5"
        }
    }
    EOF
    

    Distribute the docker configuration file to all worker nodes:

    cd /opt/k8s/work
    
    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        ssh root@${node_ip} "mkdir -p  /etc/docker/ ${DOCKER_DIR}/{data,exec}"
        scp docker-daemon.json root@${node_ip}:/etc/docker/daemon.json
      done
    

    Start the docker service

    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        ssh root@${node_ip} "systemctl daemon-reload && systemctl enable docker && systemctl restart docker"
      done
    

    Check the service status

    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        ssh root@${node_ip} "systemctl status docker|grep Active"
      done
    

    Make sure the status is active (running); otherwise inspect the logs to find the cause:

    journalctl -u docker
    

    Check the docker0 bridge

    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        ssh root@${node_ip} "/usr/sbin/ip addr show flannel.1 && /usr/sbin/ip addr show docker0"
      done
    

    Confirm that on each worker node the IPs of the docker0 bridge and the flannel.1 interface are in the same subnet (below, 172.30.24.0/32 lies within 172.30.24.1/21):

    >>> 192.168.75.110
    3: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
        link/ether ea:90:d9:9a:7c:a7 brd ff:ff:ff:ff:ff:ff
        inet 172.30.24.0/32 scope global flannel.1
           valid_lft forever preferred_lft forever
    4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
        link/ether 02:42:a8:55:ff:36 brd ff:ff:ff:ff:ff:ff
        inet 172.30.24.1/21 brd 172.30.31.255 scope global docker0
           valid_lft forever preferred_lft forever
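
    The subnet comparison can be done with integer arithmetic instead of by eye. A sketch for IPv4 only, assuming well-formed dotted-quad input without leading zeros; ip_to_int and ip_in_cidr are hypothetical helper names.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
# The body runs in a subshell so the IFS change does not leak out.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# ip_in_cidr <ip> <address/prefix>: succeed if <ip> lies inside the block.
ip_in_cidr() (
  prefix=${2#*/}
  mask=$(( (0xffffffff << (32 - ${prefix})) & 0xffffffff ))
  a=$(ip_to_int "$1")
  b=$(ip_to_int "${2%/*}")
  [ "$(( a & mask ))" -eq "$(( b & mask ))" ]
)
```

    With the addresses from the output above, ip_in_cidr 172.30.24.0 172.30.24.1/21 succeeds, which matches the statement that flannel.1 lies within the docker0 subnet.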
    

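    Note: if the services were installed in the wrong order or the machine environment is complicated, so that docker was installed before flanneld, the docker0 bridge and the flannel.1 interface on a worker node may end up in different subnets. In that case, stop the docker service, delete the docker0 interface manually, and start docker again: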

    systemctl stop docker
    ip link delete docker0
    systemctl start docker
    

    View docker status information

    [root@kube-node1 work]# ps -elfH|grep docker
    4 S root      22497      1  0  80   0 - 108496 ep_pol 20:44 ?       00:00:00   /opt/k8s/bin/dockerd --bip=172.30.24.1/21 --ip-masq=false --mtu=1450
    4 S root      22515  22497  0  80   0 - 136798 futex_ 20:44 ?       00:00:00     containerd --config /data/k8s/docker/exec/containerd/containerd.toml --log-level debug
    
    [root@kube-node1 work]# docker info
    Containers: 0
     Running: 0
     Paused: 0
     Stopped: 0
    Images: 0
    Server Version: 18.09.6
    Storage Driver: overlay2
     Backing Filesystem: xfs
     Supports d_type: true
     Native Overlay Diff: true
    Logging Driver: json-file
    Cgroup Driver: cgroupfs
    Plugins:
     Volume: local
     Network: bridge host macvlan null overlay
     Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
    Swarm: inactive
    Runtimes: runc
    Default Runtime: runc
    Init Binary: docker-init
    containerd version: bb71b10fd8f58240ca47fbb579b9d1028eea7c84
    runc version: 2b18fe1d885ee5083ef9f0838fee39b62d653e30
    init version: fec3683
    Security Options:
     seccomp
      Profile: default
    Kernel Version: 4.4.199-1.el7.elrepo.x86_64
    Operating System: CentOS Linux 7 (Core)
    OSType: linux
    Architecture: x86_64
    CPUs: 4
    Total Memory: 1.936GiB
    Name: kube-node1
    ID: MQYP:O7RJ:F22K:TYEC:C5UW:XOLP:XRMF:VF6J:6JVH:AMGN:YLAI:U2FJ
    Docker Root Dir: /data/k8s/docker/data
    Debug Mode (client): false
    Debug Mode (server): true
     File Descriptors: 22
     Goroutines: 43
     System Time: 2019-11-07T20:48:23.252463652+08:00
     EventsListeners: 0
    Registry: https://index.docker.io/v1/
    Labels:
    Experimental: false
    Insecure Registries:
     docker02:35000
     127.0.0.0/8
    Registry Mirrors:
     https://docker.mirrors.ustc.edu.cn/
     https://hub-mirror.c.163.com/
    Live Restore Enabled: true
    Product License: Community Engine
    

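    07-2. Deploy the kubelet component

    kubelet runs on every worker node. It receives requests from kube-apiserver, manages Pod containers, and executes interactive commands such as exec, run and logs.

    At startup, kubelet automatically registers the node with kube-apiserver; its built-in cadvisor collects and monitors the resource usage of the node.

    For security, this deployment disables the insecure http port of kubelet and authenticates and authorizes every request, rejecting unauthorized access (such as unauthorized requests from apiserver or heapster).

    Note: unless stated otherwise, all operations in this document are performed on the kube-node1 node, which then distributes files and runs commands remotely.

    Download and distribute the kubelet binaries

    See 06-1.部署master节点.md

    Install the dependency packages

    See 07-0.部署worker节点.md

    Create the kubelet bootstrap kubeconfig files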

    cd /opt/k8s/work
    
    source /opt/k8s/bin/environment.sh
    for node_name in ${NODE_NAMES[@]}
      do
        echo ">>> ${node_name}"
    
        # Create the token
        export BOOTSTRAP_TOKEN=$(kubeadm token create \
          --description kubelet-bootstrap-token \
          --groups system:bootstrappers:${node_name} \
          --kubeconfig ~/.kube/config)
    
        # Set the cluster parameters
        kubectl config set-cluster kubernetes \
          --certificate-authority=/etc/kubernetes/cert/ca.pem \
          --embed-certs=true \
          --server=${KUBE_APISERVER} \
          --kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig
    
        # Set the client authentication parameters
        kubectl config set-credentials kubelet-bootstrap \
          --token=${BOOTSTRAP_TOKEN} \
          --kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig
    
        # Set the context parameters
        kubectl config set-context default \
          --cluster=kubernetes \
          --user=kubelet-bootstrap \
          --kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig
    
        # Set the default context
        kubectl config use-context default --kubeconfig=kubelet-bootstrap-${node_name}.kubeconfig
      done
    
    • The kubeconfig contains a token, not a certificate; after bootstrap completes, kube-controller-manager creates the client and server certificates for kubelet;

    View the tokens kubeadm created for each node:

    [root@kube-node1 work]# kubeadm token list --kubeconfig ~/.kube/config
    TOKEN                     TTL       EXPIRES                     USAGES                   DESCRIPTION               EXTRA GROUPS
    83n69a.70n786zxgkhl1agc   23h       2019-11-08T20:52:48+08:00   authentication,signing   kubelet-bootstrap-token   system:bootstrappers:kube-node1
    99ljss.x7u9m04h01js5juo   23h       2019-11-08T20:52:48+08:00   authentication,signing   kubelet-bootstrap-token   system:bootstrappers:kube-node2
    9pfh4d.2on6eizmkzy3pgr1   23h       2019-11-08T20:52:48+08:00   authentication,signing   kubelet-bootstrap-token   system:bootstrappers:kube-node3
    
    • A token is valid for 1 day. Once expired it can no longer be used to bootstrap a kubelet, and the tokencleaner of kube-controller-manager removes it;
    • After kube-apiserver accepts a kubelet's bootstrap token, it sets the request's user to system:bootstrap:<Token ID> and its group to system:bootstrappers. A ClusterRoleBinding is created for this group later;
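
    As a rough illustration of this naming rule, the sketch below derives the user name from a token. bootstrap_user_for_token is a hypothetical helper, not part of kubeadm or kubectl; it assumes the usual bootstrap token layout of a 6-character ID and a 16-character secret joined by a dot.

```shell
# Hypothetical helper (not part of kubeadm): map a bootstrap token to the
# user name that kube-apiserver assigns to requests carrying it.
bootstrap_user_for_token() {
  token=$1
  # Assumed layout: <6-char ID>.<16-char secret>, characters from [a-z0-9].
  if ! printf '%s\n' "$token" | grep -Eq '^[a-z0-9]{6}\.[a-z0-9]{16}$'; then
    echo "invalid bootstrap token format" >&2
    return 1
  fi
  # Only the token ID (the part before the dot) appears in the user name.
  printf 'system:bootstrap:%s\n' "${token%%.*}"
}
```

    For the first token in the list, bootstrap_user_for_token 83n69a.70n786zxgkhl1agc prints system:bootstrap:83n69a. The secret half of the token never appears in the user name.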

    List the Secret associated with each token:

    [root@kube-node1 work]# kubectl get secrets  -n kube-system|grep bootstrap-token
    bootstrap-token-83n69a                           bootstrap.kubernetes.io/token         7      63s
    bootstrap-token-99ljss                           bootstrap.kubernetes.io/token         7      62s
    bootstrap-token-9pfh4d                           bootstrap.kubernetes.io/token         7      62s
    

    Distribute the bootstrap kubeconfig files to all worker nodes

    cd /opt/k8s/work
    
    source /opt/k8s/bin/environment.sh
    for node_name in ${NODE_NAMES[@]}
      do
        echo ">>> ${node_name}"
        scp kubelet-bootstrap-${node_name}.kubeconfig root@${node_name}:/etc/kubernetes/kubelet-bootstrap.kubeconfig
      done
    

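    Create and distribute the kubelet configuration file

    Starting with v1.10, some kubelet parameters must be set in a configuration file. kubelet --help prints the following hint for them:

    DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag
    

    Create the kubelet configuration file template (for the available options, see the comments in the source code):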

    cd /opt/k8s/work
    
    source /opt/k8s/bin/environment.sh
    cat > kubelet-config.yaml.template <<EOF
    kind: KubeletConfiguration
    apiVersion: kubelet.config.k8s.io/v1beta1
    address: "##NODE_IP##"
    staticPodPath: ""
    syncFrequency: 1m
    fileCheckFrequency: 20s
    httpCheckFrequency: 20s
    staticPodURL: ""
    port: 10250
    readOnlyPort: 0
    rotateCertificates: true
    serverTLSBootstrap: true
    authentication:
      anonymous:
        enabled: false
      webhook:
        enabled: true
      x509:
        clientCAFile: "/etc/kubernetes/cert/ca.pem"
    authorization:
      mode: Webhook
    registryPullQPS: 0
    registryBurst: 20
    eventRecordQPS: 0
    eventBurst: 20
    enableDebuggingHandlers: true
    enableContentionProfiling: true
    healthzPort: 10248
    healthzBindAddress: "##NODE_IP##"
    clusterDomain: "${CLUSTER_DNS_DOMAIN}"
    clusterDNS:
      - "${CLUSTER_DNS_SVC_IP}"
    nodeStatusUpdateFrequency: 10s
    nodeStatusReportFrequency: 1m
    imageMinimumGCAge: 2m
    imageGCHighThresholdPercent: 85
    imageGCLowThresholdPercent: 80
    volumeStatsAggPeriod: 1m
    kubeletCgroups: ""
    systemCgroups: ""
    cgroupRoot: ""
    cgroupsPerQOS: true
    cgroupDriver: cgroupfs
    runtimeRequestTimeout: 10m
    hairpinMode: promiscuous-bridge
    maxPods: 220
    podCIDR: "${CLUSTER_CIDR}"
    podPidsLimit: -1
    resolvConf: /etc/resolv.conf
    maxOpenFiles: 1000000
    kubeAPIQPS: 1000
    kubeAPIBurst: 2000
    serializeImagePulls: false
    evictionHard:
      memory.available:  "100Mi"
      nodefs.available:  "10%"
      nodefs.inodesFree: "5%"
      imagefs.available: "15%"
    evictionSoft: {}
    enableControllerAttachDetach: true
    failSwapOn: true
    containerLogMaxSize: 20Mi
    containerLogMaxFiles: 10
    systemReserved: {}
    kubeReserved: {}
    systemReservedCgroup: ""
    kubeReservedCgroup: ""
    enforceNodeAllocatable: ["pods"]
    EOF
    
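    • address: the address on which kubelet's secure port (https, 10250) listens. It must not be 127.0.0.1, otherwise kube-apiserver, heapster, and others cannot call the kubelet API;
    • readOnlyPort=0: closes the read-only port (default 10255); equivalent to leaving it unspecified;
    • authentication.anonymous.enabled: set to false, so anonymous access to port 10250 is not allowed;
    • authentication.x509.clientCAFile: the CA certificate that signs client certificates; enables HTTPS client certificate authentication;
    • authentication.webhook.enabled=true: enables HTTPS bearer token authentication;
    • Requests (from kube-apiserver or any other client) that pass neither x509 certificate nor webhook authentication are rejected with Unauthorized;
    • authorization.mode=Webhook: kubelet uses the SubjectAccessReview API to ask kube-apiserver whether a given user or group is permitted to operate on a resource (RBAC);
    • rotateCertificates, serverTLSBootstrap (feature gates RotateKubeletClientCertificate and RotateKubeletServerCertificate): rotate certificates automatically. The certificate lifetime is determined by the --experimental-cluster-signing-duration flag of kube-controller-manager;
    • kubelet must run as root;

    Create and distribute the kubelet configuration file for each node: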

    cd /opt/k8s/work
    
    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        sed -e "s/##NODE_IP##/${node_ip}/" kubelet-config.yaml.template > kubelet-config-${node_ip}.yaml.template
        scp kubelet-config-${node_ip}.yaml.template root@${node_ip}:/etc/kubernetes/kubelet-config.yaml
      done
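
    The substitution step can also be wrapped in a function that catches placeholders left unreplaced. This is a sketch, not part of the original scripts: render_node_config is a hypothetical helper, and unlike the loop above it replaces every occurrence on a line (sed's g flag) rather than only the first.

```shell
# render_node_config TEMPLATE NODE_IP (hypothetical helper): print TEMPLATE
# with each ##NODE_IP## placeholder replaced, and fail if any ##...##
# placeholder is still present afterwards.
render_node_config() {
  template=$1
  node_ip=$2
  rendered=$(sed -e "s/##NODE_IP##/${node_ip}/g" "$template") || return 1
  # A leftover placeholder usually means the template expects another variable.
  if printf '%s\n' "$rendered" | grep -q '##[A-Z_]*##'; then
    echo "unreplaced placeholder in ${template}" >&2
    return 1
  fi
  printf '%s\n' "$rendered"
}
```

    Inside the loop it could be called as render_node_config kubelet-config.yaml.template ${node_ip} > kubelet-config-${node_ip}.yaml.template, so that a bad template stops the run before anything is copied.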
    

    Create and distribute the kubelet systemd unit file

    Create the kubelet systemd unit file template:

    cd /opt/k8s/work
    
    source /opt/k8s/bin/environment.sh
    cat > kubelet.service.template <<EOF
    [Unit]
    Description=Kubernetes Kubelet
    Documentation=https://github.com/GoogleCloudPlatform/kubernetes
    After=docker.service
    Requires=docker.service
    
    [Service]
    WorkingDirectory=${K8S_DIR}/kubelet
    ExecStart=/opt/k8s/bin/kubelet \
      --allow-privileged=true \
      --bootstrap-kubeconfig=/etc/kubernetes/kubelet-bootstrap.kubeconfig \
      --cert-dir=/etc/kubernetes/cert \
      --cni-conf-dir=/etc/cni/net.d \
      --container-runtime=docker \
      --container-runtime-endpoint=unix:///var/run/dockershim.sock \
      --root-dir=${K8S_DIR}/kubelet \
      --kubeconfig=/etc/kubernetes/kubelet.kubeconfig \
      --config=/etc/kubernetes/kubelet-config.yaml \
      --hostname-override=##NODE_NAME## \
      --pod-infra-container-image=registry.cn-beijing.aliyuncs.com/images_k8s/pause-amd64:3.1 \
      --image-pull-progress-deadline=15m \
      --volume-plugin-dir=${K8S_DIR}/kubelet/kubelet-plugins/volume/exec/ \
      --logtostderr=true \
      --v=2
    Restart=always
    RestartSec=5
    StartLimitInterval=0
    
    [Install]
    WantedBy=multi-user.target
    EOF
    
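    • If --hostname-override is set, kube-proxy must set the same option, otherwise the Node cannot be found;
    • --bootstrap-kubeconfig: points to the bootstrap kubeconfig file. kubelet uses the user name and token in this file to send the TLS Bootstrapping request to kube-apiserver;
    • After K8S approves the kubelet's CSR, the certificate and private key files are created in the --cert-dir directory and the --kubeconfig file is then written;
    • --pod-infra-container-image: do not use Red Hat's pod-infrastructure:latest image, because it cannot reap zombie processes in containers;

    Create and distribute the kubelet systemd unit file for each node: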

    cd /opt/k8s/work
    
    source /opt/k8s/bin/environment.sh
    for node_name in ${NODE_NAMES[@]}
      do
        echo ">>> ${node_name}"
        sed -e "s/##NODE_NAME##/${node_name}/" kubelet.service.template > kubelet-${node_name}.service
        scp kubelet-${node_name}.service root@${node_name}:/etc/systemd/system/kubelet.service
      done
    

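    Bootstrap Token Auth and granting permissions

    On startup, kubelet checks whether the file named by --kubeconfig exists. If it does not, kubelet uses the kubeconfig given by --bootstrap-kubeconfig to send a certificate signing request (CSR) to kube-apiserver.

    When kube-apiserver receives the CSR, it authenticates the token in the request. On success it sets the request's user to system:bootstrap:<Token ID> and its group to system:bootstrappers. This process is called Bootstrap Token Auth.

    By default this user and group have no permission to create CSRs, so kubelet fails to start with errors like the following: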

    $ sudo journalctl -u kubelet -a |grep -A 2 'certificatesigningrequests'
    May 26 12:13:41 zhangjun-k8s01 kubelet[128468]: I0526 12:13:41.798230  128468 certificate_manager.go:366] Rotating certificates
    May 26 12:13:41 zhangjun-k8s01 kubelet[128468]: E0526 12:13:41.801997  128468 certificate_manager.go:385] Failed while requesting a signed certificate from the master: cannot cre
    ate certificate signing request: certificatesigningrequests.certificates.k8s.io is forbidden: User "system:bootstrap:82jfrm" cannot create resource "certificatesigningrequests" i
    n API group "certificates.k8s.io" at the cluster scope
    May 26 12:13:42 zhangjun-k8s01 kubelet[128468]: E0526 12:13:42.044828  128468 kubelet.go:2244] node "zhangjun-k8s01" not found
    May 26 12:13:42 zhangjun-k8s01 kubelet[128468]: E0526 12:13:42.078658  128468 reflector.go:126] k8s.io/kubernetes/pkg/kubelet/kubelet.go:442: Failed to list *v1.Service: Unauthor
    ized
    May 26 12:13:42 zhangjun-k8s01 kubelet[128468]: E0526 12:13:42.079873  128468 reflector.go:126] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Node: Unauthorize
    d
    May 26 12:13:42 zhangjun-k8s01 kubelet[128468]: E0526 12:13:42.082683  128468 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1beta1.CSIDriver: Unau
    thorized
    May 26 12:13:42 zhangjun-k8s01 kubelet[128468]: E0526 12:13:42.084473  128468 reflector.go:126] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Unau
    thorized
    May 26 12:13:42 zhangjun-k8s01 kubelet[128468]: E0526 12:13:42.088466  128468 reflector.go:126] k8s.io/client-go/informers/factory.go:133: Failed to list *v1beta1.RuntimeClass: U
    nauthorized
    

    The fix is to create a clusterrolebinding that binds the group system:bootstrappers to the clusterrole system:node-bootstrapper:

    $ kubectl create clusterrolebinding kubelet-bootstrap --clusterrole=system:node-bootstrapper --group=system:bootstrappers
    

    Start the kubelet service

    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        ssh root@${node_ip} "mkdir -p ${K8S_DIR}/kubelet/kubelet-plugins/volume/exec/"
        ssh root@${node_ip} "/usr/sbin/swapoff -a"
        ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kubelet && systemctl restart kubelet"
      done
    
    • The working directory must be created before the service is started;
    • Swap must be turned off, otherwise kubelet fails to start;

    $ journalctl -u kubelet |tail
    8月 15 12:16:49 zhangjun-k8s01 kubelet[7807]: I0815 12:16:49.578598    7807 feature_gate.go:230] feature gates: &{map[RotateKubeletClientCertificate:true RotateKubeletServerCertificate:true]}
    8月 15 12:16:49 zhangjun-k8s01 kubelet[7807]: I0815 12:16:49.578698    7807 feature_gate.go:230] feature gates: &{map[RotateKubeletClientCertificate:true RotateKubeletServerCertificate:true]}
    8月 15 12:16:50 zhangjun-k8s01 kubelet[7807]: I0815 12:16:50.205871    7807 mount_linux.go:214] Detected OS with systemd
    8月 15 12:16:50 zhangjun-k8s01 kubelet[7807]: I0815 12:16:50.205939    7807 server.go:408] Version: v1.11.2
    8月 15 12:16:50 zhangjun-k8s01 kubelet[7807]: I0815 12:16:50.206013    7807 feature_gate.go:230] feature gates: &{map[RotateKubeletClientCertificate:true RotateKubeletServerCertificate:true]}
    8月 15 12:16:50 zhangjun-k8s01 kubelet[7807]: I0815 12:16:50.206101    7807 feature_gate.go:230] feature gates: &{map[RotateKubeletServerCertificate:true RotateKubeletClientCertificate:true]}
    8月 15 12:16:50 zhangjun-k8s01 kubelet[7807]: I0815 12:16:50.206217    7807 plugins.go:97] No cloud provider specified.
    8月 15 12:16:50 zhangjun-k8s01 kubelet[7807]: I0815 12:16:50.206237    7807 server.go:524] No cloud provider specified: "" from the config file: ""
    8月 15 12:16:50 zhangjun-k8s01 kubelet[7807]: I0815 12:16:50.206264    7807 bootstrap.go:56] Using bootstrap kubeconfig to generate TLS client cert, key and kubeconfig file
    8月 15 12:16:50 zhangjun-k8s01 kubelet[7807]: I0815 12:16:50.208628    7807 bootstrap.go:86] No valid private key and/or certificate found, reusing existing private key or creating a new one
    

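    After startup, kubelet uses --bootstrap-kubeconfig to send a CSR to kube-apiserver. Once the CSR is approved, kube-controller-manager signs the TLS client certificate for kubelet, which then stores the certificate and private key and writes the --kubeconfig file.

    Note: kube-controller-manager only issues certificates for TLS Bootstrap if it is configured with the --cluster-signing-cert-file and --cluster-signing-key-file flags.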

    [root@kube-node1 work]# kubectl get csr  
    NAME        AGE     REQUESTOR                 CONDITION
    csr-4stvn   67m     system:bootstrap:9pfh4d   Pending
    csr-5dc4g   18m     system:bootstrap:99ljss   Pending
    csr-5xbbr   18m     system:bootstrap:9pfh4d   Pending
    csr-6599v   64m     system:bootstrap:83n69a   Pending
    csr-7z2mv   3m34s   system:bootstrap:9pfh4d   Pending
    csr-89fmf   3m35s   system:bootstrap:99ljss   Pending
    csr-9kqzb   34m     system:bootstrap:83n69a   Pending
    csr-c6chv   3m38s   system:bootstrap:83n69a   Pending
    csr-cxk4d   49m     system:bootstrap:83n69a   Pending
    csr-h7prh   49m     system:bootstrap:9pfh4d   Pending
    csr-jh6hp   34m     system:bootstrap:9pfh4d   Pending
    csr-jwv9x   64m     system:bootstrap:99ljss   Pending
    csr-k8ss7   18m     system:bootstrap:83n69a   Pending
    csr-nnwwm   49m     system:bootstrap:99ljss   Pending
    csr-q87ps   67m     system:bootstrap:99ljss   Pending
    csr-t4bb5   64m     system:bootstrap:9pfh4d   Pending
    csr-wpjh5   34m     system:bootstrap:99ljss   Pending
    csr-zmrbh   67m     system:bootstrap:83n69a   Pending
    
    [root@kube-node1 work]# kubectl get nodes
    No resources found.
    
    • The CSRs of all three worker nodes are in the Pending state;
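
    With this many entries it can help to count CSRs per requestor. The following is a minimal sketch: csr_summary is a hypothetical helper that assumes the four-column NAME/AGE/REQUESTOR/CONDITION layout shown above (other kubectl versions may print additional columns).

```shell
# csr_summary (hypothetical helper): read `kubectl get csr` table output on
# stdin and print "<count> <requestor> <condition>" lines, sorted by requestor.
csr_summary() {
  awk 'NR > 1 { count[$3 " " $4]++ }
       END { for (k in count) print count[k], k }' | sort -k2
}
```

    Run as kubectl get csr | csr_summary. For the listing above it would report 6 Pending CSRs for each of the three bootstrap users, which shows that every kubelet has retried its request several times.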

    Automatically approve CSRs

    Create three ClusterRoleBindings, used to automatically approve client, renew client, and renew server certificates respectively:

    cd /opt/k8s/work
    
    cat > csr-crb.yaml <<EOF
     # Approve all CSRs for the group "system:bootstrappers"
     kind: ClusterRoleBinding
     apiVersion: rbac.authorization.k8s.io/v1
     metadata:
       name: auto-approve-csrs-for-group
     subjects:
     - kind: Group
       name: system:bootstrappers
       apiGroup: rbac.authorization.k8s.io
     roleRef:
       kind: ClusterRole
       name: system:certificates.k8s.io:certificatesigningrequests:nodeclient
       apiGroup: rbac.authorization.k8s.io
    ---
     # To let a node of the group "system:nodes" renew its own credentials
     kind: ClusterRoleBinding
     apiVersion: rbac.authorization.k8s.io/v1
     metadata:
       name: node-client-cert-renewal
     subjects:
     - kind: Group
       name: system:nodes
       apiGroup: rbac.authorization.k8s.io
     roleRef:
       kind: ClusterRole
       name: system:certificates.k8s.io:certificatesigningrequests:selfnodeclient
       apiGroup: rbac.authorization.k8s.io
    ---
    # A ClusterRole which instructs the CSR approver to approve a node requesting a
    # serving cert matching its client cert.
    kind: ClusterRole
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: approve-node-server-renewal-csr
    rules:
    - apiGroups: ["certificates.k8s.io"]
      resources: ["certificatesigningrequests/selfnodeserver"]
      verbs: ["create"]
    ---
     # To let a node of the group "system:nodes" renew its own server credentials
     kind: ClusterRoleBinding
     apiVersion: rbac.authorization.k8s.io/v1
     metadata:
       name: node-server-cert-renewal
     subjects:
     - kind: Group
       name: system:nodes
       apiGroup: rbac.authorization.k8s.io
     roleRef:
       kind: ClusterRole
       name: approve-node-server-renewal-csr
       apiGroup: rbac.authorization.k8s.io
    EOF
    kubectl apply -f csr-crb.yaml
    
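    • auto-approve-csrs-for-group: automatically approves a node's first CSR. Note that for the first CSR the requesting group is system:bootstrappers;
    • node-client-cert-renewal: automatically approves renewal of a node's expiring client certificates. The automatically issued certificates carry the group system:nodes;
    • node-server-cert-renewal: automatically approves renewal of a node's expiring server certificates. The automatically issued certificates carry the group system:nodes;

    Check the kubelet status

    After a while (1-10 minutes), the CSRs of all three nodes are automatically approved: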

    [root@kube-node1 work]# kubectl get csr  
    NAME        AGE     REQUESTOR                 CONDITION
    csr-4stvn   70m     system:bootstrap:9pfh4d   Pending
    csr-5dc4g   22m     system:bootstrap:99ljss   Pending
    csr-5xbbr   22m     system:bootstrap:9pfh4d   Pending
    csr-6599v   67m     system:bootstrap:83n69a   Pending
    csr-7z2mv   7m22s   system:bootstrap:9pfh4d   Approved,Issued
    csr-89fmf   7m23s   system:bootstrap:99ljss   Approved,Issued
    csr-9kqzb   37m     system:bootstrap:83n69a   Pending
    csr-c6chv   7m26s   system:bootstrap:83n69a   Approved,Issued
    csr-cxk4d   52m     system:bootstrap:83n69a   Pending
    csr-h7prh   52m     system:bootstrap:9pfh4d   Pending
    csr-jfvv4   30s     system:node:kube-node1    Pending
    csr-jh6hp   37m     system:bootstrap:9pfh4d   Pending
    csr-jwv9x   67m     system:bootstrap:99ljss   Pending
    csr-k8ss7   22m     system:bootstrap:83n69a   Pending
    csr-nnwwm   52m     system:bootstrap:99ljss   Pending
    csr-q87ps   70m     system:bootstrap:99ljss   Pending
    csr-t4bb5   67m     system:bootstrap:9pfh4d   Pending
    csr-w2w2k   16s     system:node:kube-node3    Pending
    csr-wpjh5   37m     system:bootstrap:99ljss   Pending
    csr-z5nww   23s     system:node:kube-node2    Pending
    csr-zmrbh   70m     system:bootstrap:83n69a   Pending
    
    • The Pending CSRs requested by system:node:<name> are for the kubelet server certificates and must be approved manually; see below.

    All nodes are Ready:

    [root@kube-node1 work]# kubectl get nodes
    NAME         STATUS   ROLES    AGE   VERSION
    kube-node1   Ready    <none>   76s   v1.14.2
    kube-node2   Ready    <none>   69s   v1.14.2
    kube-node3   Ready    <none>   61s   v1.14.2
    

    Each node now has a kubeconfig file and a certificate and private key, with the certificate issued by kube-controller-manager:

    [root@kube-node1 work]# ls -l /etc/kubernetes/kubelet.kubeconfig
    -rw------- 1 root root 2310 Nov  7 21:04 /etc/kubernetes/kubelet.kubeconfig
    
    [root@kube-node1 work]# ls -l /etc/kubernetes/cert/|grep kubelet
    -rw------- 1 root root 1277 Nov  7 22:11 kubelet-client-2019-11-07-22-11-52.pem
    lrwxrwxrwx 1 root root   59 Nov  7 22:11 kubelet-client-current.pem -> /etc/kubernetes/cert/kubelet-client-2019-11-07-22-11-52.pem
    
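    • No kubelet server certificate has been generated automatically;

    Manually approve the server cert CSRs

    For security reasons, the CSR approving controllers do not automatically approve kubelet server certificate signing requests; they must be approved manually:

    # The output below depends on your environment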
    # kubectl get csr
    NAME        AGE     REQUESTOR                    CONDITION
    csr-5f4vh   9m25s   system:bootstrap:82jfrm      Approved,Issued
    csr-5r7j7   6m11s   system:node:zhangjun-k8s03   Pending
    csr-5rw7s   9m23s   system:bootstrap:b1f7np      Approved,Issued
    csr-9snww   8m3s    system:bootstrap:82jfrm      Approved,Issued
    csr-c7z56   6m12s   system:node:zhangjun-k8s02   Pending
    csr-j55lh   6m12s   system:node:zhangjun-k8s01   Pending
    csr-m29fm   9m25s   system:bootstrap:3gzd53      Approved,Issued
    csr-rc8w7   8m3s    system:bootstrap:3gzd53      Approved,Issued
    csr-vd52r   8m2s    system:bootstrap:b1f7np      Approved,Issued
    
    # kubectl certificate approve csr-5r7j7
    certificatesigningrequest.certificates.k8s.io/csr-5r7j7 approved
    
    # kubectl certificate approve csr-c7z56
    certificatesigningrequest.certificates.k8s.io/csr-c7z56 approved
    
    # kubectl certificate approve csr-j55lh
    certificatesigningrequest.certificates.k8s.io/csr-j55lh approved
    
    [root@kube-node1 work]# ls -l /etc/kubernetes/cert/kubelet-*
    -rw------- 1 root root 1277 Nov  7 22:11 /etc/kubernetes/cert/kubelet-client-2019-11-07-22-11-52.pem
    lrwxrwxrwx 1 root root   59 Nov  7 22:11 /etc/kubernetes/cert/kubelet-client-current.pem -> /etc/kubernetes/cert/kubelet-client-2019-11-07-22-11-52.pem
    -rw------- 1 root root 1317 Nov  7 22:23 /etc/kubernetes/cert/kubelet-server-2019-11-07-22-23-05.pem
    lrwxrwxrwx 1 root root   59 Nov  7 22:23 /etc/kubernetes/cert/kubelet-server-current.pem -> /etc/kubernetes/cert/kubelet-server-2019-11-07-22-23-05.pem
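
    When many nodes are involved, picking the server CSRs out by hand is tedious. A minimal sketch, assuming the four-column kubectl get csr layout shown above: pending_node_csrs is a hypothetical filter that prints only the Pending CSRs requested by system:node:<name> identities.

```shell
# pending_node_csrs (hypothetical helper): read `kubectl get csr` table output
# on stdin and print the names of CSRs that are still Pending and were
# requested by a node identity, i.e. the kubelet server certificate requests.
pending_node_csrs() {
  awk 'NR > 1 && $3 ~ /^system:node:/ && $4 == "Pending" { print $1 }'
}
```

    Its output could be passed on, for example kubectl get csr | pending_node_csrs | xargs kubectl certificate approve. Review the list first: approving everything blindly defeats the reason server CSRs are left for manual approval.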
    

    API endpoints provided by kubelet

    After startup, kubelet listens on several ports to receive requests from kube-apiserver or other clients:

    [root@kube-node1 work]# netstat -lnpt|grep kubelet
    tcp        0      0 127.0.0.1:38735         0.0.0.0:*               LISTEN      24609/kubelet       
    tcp        0      0 192.168.75.110:10248    0.0.0.0:*               LISTEN      24609/kubelet       
    tcp        0      0 192.168.75.110:10250    0.0.0.0:*               LISTEN      24609/kubelet  
    
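    • 10248: healthz http service;
    • 10250: https service; access to this port requires authentication and authorization (even for /healthz);
    • The read-only port 10255 is not enabled;
    • Since K8S v1.10 the --cadvisor-port flag (default port 4194) has been removed, so the cAdvisor UI & API are no longer accessible.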
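
    The port listing can also be checked mechanically. The sketch below is illustrative only: kubelet_ports and read_only_port_closed are hypothetical helpers that parse netstat -lnpt output in the format shown above.

```shell
# kubelet_ports (hypothetical helper): read `netstat -lnpt` output on stdin and
# print the TCP ports the kubelet process listens on, sorted numerically.
kubelet_ports() {
  # The last column is "<pid>/<program>"; the port follows the last ":" of column 4.
  awk '$NF ~ /\/kubelet$/ { n = split($4, a, ":"); print a[n] }' | sort -n
}

# read_only_port_closed (hypothetical helper): succeed only if 10255 is absent
# from the list of ports read on stdin.
read_only_port_closed() {
  ! grep -qx '10255'
}
```

    For example, netstat -lnpt | kubelet_ports | read_only_port_closed succeeds on a node configured as above, because only 10248, 10250, and a random localhost port are open.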

    For example, when kubectl exec -it nginx-ds-5rmws -- sh is executed, kube-apiserver sends kubelet a request like this:

    POST /exec/default/nginx-ds-5rmws/my-nginx?command=sh&input=1&output=1&tty=1
    

    kubelet accepts https requests on port 10250 and serves the following resources:

    • /pods, /runningpods
    • /metrics, /metrics/cadvisor, /metrics/probes
    • /spec
    • /stats, /stats/container
    • /logs
    • /run/, /exec/, /attach/, /portForward/, /containerLogs/

    For details see: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/server/server.go#L434:3

    Because anonymous authentication is disabled and webhook authorization is enabled, every request to the https API on port 10250 must be authenticated and authorized.

    The predefined ClusterRole system:kubelet-api-admin grants access to all kubelet APIs (the User of the kubernetes certificate used by kube-apiserver is granted this permission):

    [root@kube-node1 work]# kubectl describe clusterrole system:kubelet-api-admin
    Name:         system:kubelet-api-admin
    Labels:       kubernetes.io/bootstrapping=rbac-defaults
    Annotations:  rbac.authorization.kubernetes.io/autoupdate: true
    PolicyRule:
      Resources      Non-Resource URLs  Resource Names  Verbs
      ---------      -----------------  --------------  -----
      nodes/log      []                 []              [*]
      nodes/metrics  []                 []              [*]
      nodes/proxy    []                 []              [*]
      nodes/spec     []                 []              [*]
      nodes/stats    []                 []              [*]
      nodes          []                 []              [get list watch proxy]
    

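    kubelet API authentication and authorization

    kubelet is configured with the following authentication parameters:

    • authentication.anonymous.enabled: set to false, so anonymous access to port 10250 is not allowed;
    • authentication.x509.clientCAFile: the CA certificate that signs client certificates; enables HTTPS client certificate authentication;
    • authentication.webhook.enabled=true: enables HTTPS bearer token authentication;

    and with the following authorization parameter:

    • authorization.mode=Webhook: enables RBAC authorization;

    When kubelet receives a request, it verifies the certificate signature against clientCAFile, or checks whether the bearer token is valid. If neither check passes, the request is rejected with Unauthorized: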

    [root@kube-node1 ~]# curl -s --cacert /etc/kubernetes/cert/ca.pem https://192.168.75.110:10250/metrics
    Unauthorized
    
    [root@kube-node1 ~]# curl -s --cacert /etc/kubernetes/cert/ca.pem -H "Authorization: Bearer 123456" https://192.168.75.110:10250/metrics
    Unauthorized
    

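    After authentication succeeds, kubelet sends a SubjectAccessReview request to kube-apiserver to check whether the user and group behind the certificate or token are permitted to operate on the resource (RBAC).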
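
    The two stages map onto HTTP status codes: a failed authentication yields 401 (Unauthorized) and a failed authorization yields 403 (Forbidden). A small sketch of that mapping; explain_kubelet_status is a hypothetical helper, not a kubelet or kubectl feature.

```shell
# explain_kubelet_status CODE (hypothetical helper): interpret the HTTP status
# code returned by a request to the kubelet https API on port 10250.
explain_kubelet_status() {
  case $1 in
    200) echo "authenticated and authorized" ;;
    401) echo "authentication failed: no valid client certificate or bearer token" ;;
    403) echo "authenticated, but not authorized for this resource" ;;
    *)   echo "unexpected status: $1" ;;
  esac
}
```

    The status code itself can be obtained by adding -o /dev/null -w '%{http_code}' to a curl command and passing the result to the function.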

    Certificate authentication and authorization

    $ # A certificate with insufficient permissions;
    [root@kube-node1 ~]# curl -s --cacert /etc/kubernetes/cert/ca.pem --cert /etc/kubernetes/cert/kube-controller-manager.pem --key /etc/kubernetes/cert/kube-controller-manager-key.pem https://192.168.75.110:10250/metrics
    Forbidden (user=system:kube-controller-manager, verb=get, resource=nodes, subresource=metrics)
    
    # Use the admin certificate with the highest privileges, created when deploying the kubectl command-line tool
    
    [root@kube-node1 work]# curl -s --cacert /etc/kubernetes/cert/ca.pem --cert /opt/k8s/work/admin.pem --key /opt/k8s/work/admin-key.pem https://192.168.75.110:10250/metrics|head
    # HELP apiserver_audit_event_total Counter of audit events generated and sent to the audit backend.
    # TYPE apiserver_audit_event_total counter
    apiserver_audit_event_total 0
    # HELP apiserver_audit_requests_rejected_total Counter of apiserver requests rejected due to an error in audit logging backend.
    # TYPE apiserver_audit_requests_rejected_total counter
    apiserver_audit_requests_rejected_total 0
    # HELP apiserver_client_certificate_expiration_seconds Distribution of the remaining lifetime on the certificate used to authenticate a request.
    # TYPE apiserver_client_certificate_expiration_seconds histogram
    apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
    apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0
    
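    • The values of --cacert, --cert, and --key must be file paths. A relative path such as ./admin.pem must keep the leading ./, otherwise 401 Unauthorized is returned;

    Bearer token authentication and authorization

    Create a ServiceAccount and bind it to the ClusterRole system:kubelet-api-admin so that it is permitted to call the kubelet API: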

    kubectl create sa kubelet-api-test
    kubectl create clusterrolebinding kubelet-api-test --clusterrole=system:kubelet-api-admin --serviceaccount=default:kubelet-api-test
    SECRET=$(kubectl get secrets | grep kubelet-api-test | awk '{print $1}')
    TOKEN=$(kubectl describe secret ${SECRET} | grep -E '^token' | awk '{print $2}')
    echo ${TOKEN}
    [root@kube-node1 work]# curl -s --cacert /etc/kubernetes/cert/ca.pem -H "Authorization: Bearer ${TOKEN}" https://192.168.75.110:10250/metrics|head
    # HELP apiserver_audit_event_total Counter of audit events generated and sent to the audit backend.
    # TYPE apiserver_audit_event_total counter
    apiserver_audit_event_total 0
    # HELP apiserver_audit_requests_rejected_total Counter of apiserver requests rejected due to an error in audit logging backend.
    # TYPE apiserver_audit_requests_rejected_total counter
    apiserver_audit_requests_rejected_total 0
    # HELP apiserver_client_certificate_expiration_seconds Distribution of the remaining lifetime on the certificate used to authenticate a request.
    # TYPE apiserver_client_certificate_expiration_seconds histogram
    apiserver_client_certificate_expiration_seconds_bucket{le="0"} 0
    apiserver_client_certificate_expiration_seconds_bucket{le="1800"} 0
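
    ServiceAccount tokens of this kind are JSON Web Tokens, so the identity they carry can be inspected locally. A hedged sketch: jwt_payload is a hypothetical helper that assumes a base64 command supporting -d (as in GNU coreutils). It only decodes the payload and does not verify the signature.

```shell
# jwt_payload TOKEN (hypothetical helper): print the decoded JSON payload,
# i.e. the second dot-separated segment, of a JWT.
jwt_payload() {
  # Take the middle segment and convert base64url to standard base64.
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the '=' padding that JWT encoding strips.
  case $(( ${#seg} % 4 )) in
    2) seg="${seg}==" ;;
    3) seg="${seg}=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}
```

    Running jwt_payload "${TOKEN}" prints a JSON document whose sub claim should read system:serviceaccount:default:kubelet-api-test for the ServiceAccount created above.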
    

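    cadvisor and metrics

    cadvisor is a service embedded in the kubelet binary that collects resource usage (CPU, memory, disk, network) for the containers on its node.

    In a browser, https://172.27.137.240:10250/metrics and https://172.27.137.240:10250/metrics/cadvisor return the kubelet and cadvisor metrics respectively.

    [Screenshots: kubelet-metrics, cadvisor-metrics]

    Note:

    • kubelet-config.yaml sets authentication.anonymous.enabled to false, so anonymous access to the https service on port 10250 is not allowed;
    • See A.浏览器访问kube-apiserver安全端口.md to create and import the required certificates, then access port 10250 as shown above;

    Get the kubelet configuration

    Fetch each node's kubelet configuration from kube-apiserver:

    $ # Use the admin certificate with the highest privileges, created when deploying the kubectl command-line tool;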
    [root@kube-node1 work]# source /opt/k8s/bin/environment.sh
    [root@kube-node1 work]# curl -sSL --cacert /etc/kubernetes/cert/ca.pem --cert /opt/k8s/work/admin.pem --key /opt/k8s/work/admin-key.pem ${KUBE_APISERVER}/api/v1/nodes/kube-node1/proxy/configz | jq \
    >   '.kubeletconfig|.kind="KubeletConfiguration"|.apiVersion="kubelet.config.k8s.io/v1beta1"'
    
    {
      "syncFrequency": "1m0s",
      "fileCheckFrequency": "20s",
      "httpCheckFrequency": "20s",
      "address": "192.168.75.110",
      "port": 10250,
      "rotateCertificates": true,
      "serverTLSBootstrap": true,
      "authentication": {
        "x509": {
          "clientCAFile": "/etc/kubernetes/cert/ca.pem"
        },
        "webhook": {
          "enabled": true,
          "cacheTTL": "2m0s"
        },
        "anonymous": {
          "enabled": false
        }
      },
      "authorization": {
        "mode": "Webhook",
        "webhook": {
          "cacheAuthorizedTTL": "5m0s",
          "cacheUnauthorizedTTL": "30s"
        }
      },
      "registryPullQPS": 0,
      "registryBurst": 20,
      "eventRecordQPS": 0,
      "eventBurst": 20,
      "enableDebuggingHandlers": true,
      "enableContentionProfiling": true,
      "healthzPort": 10248,
      "healthzBindAddress": "192.168.75.110",
      "oomScoreAdj": -999,
      "clusterDomain": "cluster.local",
      "clusterDNS": [
        "10.254.0.2"
      ],
      "streamingConnectionIdleTimeout": "4h0m0s",
      "nodeStatusUpdateFrequency": "10s",
      "nodeStatusReportFrequency": "1m0s",
      "nodeLeaseDurationSeconds": 40,
      "imageMinimumGCAge": "2m0s",
      "imageGCHighThresholdPercent": 85,
      "imageGCLowThresholdPercent": 80,
      "volumeStatsAggPeriod": "1m0s",
      "cgroupsPerQOS": true,
      "cgroupDriver": "cgroupfs",
      "cpuManagerPolicy": "none",
      "cpuManagerReconcilePeriod": "10s",
      "runtimeRequestTimeout": "10m0s",
      "hairpinMode": "promiscuous-bridge",
      "maxPods": 220,
      "podCIDR": "172.30.0.0/16",
      "podPidsLimit": -1,
      "resolvConf": "/etc/resolv.conf",
      "cpuCFSQuota": true,
      "cpuCFSQuotaPeriod": "100ms",
      "maxOpenFiles": 1000000,
      "contentType": "application/vnd.kubernetes.protobuf",
      "kubeAPIQPS": 1000,
      "kubeAPIBurst": 2000,
      "serializeImagePulls": false,
      "evictionHard": {
        "memory.available": "100Mi"
      },
      "evictionPressureTransitionPeriod": "5m0s",
      "enableControllerAttachDetach": true,
      "makeIPTablesUtilChains": true,
      "iptablesMasqueradeBit": 14,
      "iptablesDropBit": 15,
      "failSwapOn": true,
      "containerLogMaxSize": "20Mi",
      "containerLogMaxFiles": 10,
      "configMapAndSecretChangeDetectionStrategy": "Watch",
      "enforceNodeAllocatable": [
        "pods"
      ],
      "kind": "KubeletConfiguration",
      "apiVersion": "kubelet.config.k8s.io/v1beta1"
    }
    

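    Alternatively, see the comments in the source code.

    References

    1. kubelet authentication and authorization: https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet-authentication-authorization/

    07-3. Deploy the kube-proxy component

    kube-proxy runs on all worker nodes. It watches apiserver for changes to services and endpoints and creates routing rules to provide service IPs and load balancing.

    This document describes how to deploy kube-proxy in ipvs mode.

    Note: unless stated otherwise, all operations in this document are executed on the kube-node1 node, which then distributes files and runs commands remotely.

    Download and distribute the kube-proxy binaries

    See 06-1.部署master节点.md

    Install dependencies

    See 07-0.部署worker节点.md

    Each node needs the ipvsadm and ipset commands installed and the ip_vs kernel module loaded.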
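
    Whether the required modules are loaded can be checked from lsmod output. A minimal sketch: missing_modules is a hypothetical helper, and the module names passed to it are up to you. ip_vs_rr appears in the usage note only as an assumed example for the rr scheduler.

```shell
# missing_modules MODULE... (hypothetical helper): read `lsmod` output on stdin
# and print each requested module that is not currently loaded.
missing_modules() {
  # Skip the header line and keep only the module name column.
  loaded=$(awk 'NR > 1 { print $1 }')
  for mod in "$@"; do
    printf '%s\n' "$loaded" | grep -qx "$mod" || echo "$mod"
  done
}
```

    For example, lsmod | missing_modules ip_vs ip_vs_rr prints nothing when both modules are loaded and otherwise lists the ones that still need modprobe.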

    Create the kube-proxy certificate

    Create the certificate signing request:

    cd /opt/k8s/work
    
    cat > kube-proxy-csr.json <<EOF
    {
      "CN": "system:kube-proxy",
      "key": {
        "algo": "rsa",
        "size": 2048
      },
      "names": [
        {
          "C": "CN",
          "ST": "BeiJing",
          "L": "BeiJing",
          "O": "k8s",
          "OU": "4Paradigm"
        }
      ]
    }
    EOF
    
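    • CN: sets the User of this certificate to system:kube-proxy;
    • The predefined ClusterRoleBinding system:node-proxier binds the User system:kube-proxy to the ClusterRole system:node-proxier, which grants permission to call the proxy-related APIs of kube-apiserver;
    • The certificate is only used by kube-proxy as a client certificate, so the hosts field is empty;

    Generate the certificate and private key: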

    cd /opt/k8s/work
    
    cfssl gencert -ca=/opt/k8s/work/ca.pem \
      -ca-key=/opt/k8s/work/ca-key.pem \
      -config=/opt/k8s/work/ca-config.json \
      -profile=kubernetes  kube-proxy-csr.json | cfssljson -bare kube-proxy
    
    ls kube-proxy*
    

    Create and distribute the kubeconfig file

    cd /opt/k8s/work
    
    source /opt/k8s/bin/environment.sh
    kubectl config set-cluster kubernetes \
      --certificate-authority=/opt/k8s/work/ca.pem \
      --embed-certs=true \
      --server=${KUBE_APISERVER} \
      --kubeconfig=kube-proxy.kubeconfig
    
    kubectl config set-credentials kube-proxy \
      --client-certificate=kube-proxy.pem \
      --client-key=kube-proxy-key.pem \
      --embed-certs=true \
      --kubeconfig=kube-proxy.kubeconfig
    
    kubectl config set-context default \
      --cluster=kubernetes \
      --user=kube-proxy \
      --kubeconfig=kube-proxy.kubeconfig
    
    kubectl config use-context default --kubeconfig=kube-proxy.kubeconfig
    
    • --embed-certs=true: embeds the contents of ca.pem and kube-proxy.pem into the generated kube-proxy.kubeconfig file (without it, only the certificate file paths are written);
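
    One way to confirm that the certificates were embedded is to look for the *-data keys in the generated file. A minimal sketch: kubeconfig_embeds_certs is a hypothetical helper that relies on the key names kubectl writes (certificate-authority-data, client-certificate-data, client-key-data).

```shell
# kubeconfig_embeds_certs FILE (hypothetical helper): succeed if FILE carries
# inline certificate data (the *-data keys written with --embed-certs=true)
# instead of paths to certificate files.
kubeconfig_embeds_certs() {
  grep -q 'certificate-authority-data:' "$1" &&
    grep -q 'client-certificate-data:' "$1" &&
    grep -q 'client-key-data:' "$1"
}
```

    For example, kubeconfig_embeds_certs kube-proxy.kubeconfig && echo embedded. A file that only references paths would fail the check and would also break once copied to a node where those paths do not exist.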

    Distribute the kubeconfig file:

    cd /opt/k8s/work
    
    source /opt/k8s/bin/environment.sh
    for node_name in ${NODE_NAMES[@]}
      do
        echo ">>> ${node_name}"
        scp kube-proxy.kubeconfig root@${node_name}:/etc/kubernetes/
      done
    

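    Create the kube-proxy configuration file

    Starting with v1.10, some kube-proxy parameters can be set in a configuration file. You can generate this file with the --write-config-to option, or refer to the comments in the source code.

    Create the kube-proxy config file template: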

    cd /opt/k8s/work
    
    source /opt/k8s/bin/environment.sh
    cat > kube-proxy-config.yaml.template <<EOF
    kind: KubeProxyConfiguration
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    clientConnection:
      burst: 200
      kubeconfig: "/etc/kubernetes/kube-proxy.kubeconfig"
      qps: 100
    bindAddress: ##NODE_IP##
    healthzBindAddress: ##NODE_IP##:10256
    metricsBindAddress: ##NODE_IP##:10249
    enableProfiling: true
    clusterCIDR: ${CLUSTER_CIDR}
    hostnameOverride: ##NODE_NAME##
    mode: "ipvs"
    portRange: ""
    iptables:
      masqueradeAll: false
    ipvs:
      scheduler: rr
      excludeCIDRs: []
    EOF
    
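    • bindAddress: the listening address;
    • clientConnection.kubeconfig: the kubeconfig file used to connect to the apiserver;
    • clusterCIDR: kube-proxy uses --cluster-cidr to tell cluster-internal traffic from external traffic; it applies SNAT to requests for Service IPs only when --cluster-cidr or --masquerade-all is specified;
    • hostnameOverride: must match the value used by kubelet; otherwise kube-proxy cannot find the Node after it starts and creates no ipvs rules;
    • mode: use the ipvs proxy mode;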

    Create and distribute the kube-proxy configuration file for each node:

    cd /opt/k8s/work
    
    source /opt/k8s/bin/environment.sh
    for (( i=0; i < 3; i++ ))
      do
        echo ">>> ${NODE_NAMES[i]}"
        sed -e "s/##NODE_NAME##/${NODE_NAMES[i]}/" -e "s/##NODE_IP##/${NODE_IPS[i]}/" kube-proxy-config.yaml.template > kube-proxy-config-${NODE_NAMES[i]}.yaml.template
        scp kube-proxy-config-${NODE_NAMES[i]}.yaml.template root@${NODE_NAMES[i]}:/etc/kubernetes/kube-proxy-config.yaml
      done
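
    Before copying the files to the nodes, the placeholder substitution can be previewed locally. This is a minimal sketch, assuming the template uses only the ##NODE_NAME## and ##NODE_IP## placeholders shown above; the function name render_kube_proxy_config is hypothetical.

    ```shell
    # Hypothetical helper: read a template on stdin, fill in the
    # ##NODE_NAME## and ##NODE_IP## placeholders, print the result.
    render_kube_proxy_config() {
      local node_name="$1" node_ip="$2"
      sed -e "s/##NODE_NAME##/${node_name}/g" -e "s/##NODE_IP##/${node_ip}/g"
    }
    ```

    For example, render_kube_proxy_config kube-node1 192.168.75.110 < kube-proxy-config.yaml.template prints the rendered file; piping that output through grep '##' is one way to spot placeholders that were not filled.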
    

    Create and distribute the kube-proxy systemd unit file

    cd /opt/k8s/work
    
    source /opt/k8s/bin/environment.sh
    cat > kube-proxy.service <<EOF
    [Unit]
    Description=Kubernetes Kube-Proxy Server
    Documentation=https://github.com/GoogleCloudPlatform/kubernetes
    After=network.target
    
    [Service]
    WorkingDirectory=${K8S_DIR}/kube-proxy
    ExecStart=/opt/k8s/bin/kube-proxy \
      --config=/etc/kubernetes/kube-proxy-config.yaml \
      --logtostderr=true \
      --v=2
    Restart=on-failure
    RestartSec=5
    LimitNOFILE=65536
    
    [Install]
    WantedBy=multi-user.target
    EOF
    

    Distribute the kube-proxy systemd unit file:

    cd /opt/k8s/work
    
    source /opt/k8s/bin/environment.sh
    for node_name in ${NODE_NAMES[@]}
      do
        echo ">>> ${node_name}"
        scp kube-proxy.service root@${node_name}:/etc/systemd/system/
      done
    

    Start the kube-proxy service

    cd /opt/k8s/work
    
    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        ssh root@${node_ip} "mkdir -p ${K8S_DIR}/kube-proxy"
        ssh root@${node_ip} "modprobe ip_vs_rr"
        ssh root@${node_ip} "systemctl daemon-reload && systemctl enable kube-proxy && systemctl restart kube-proxy"
      done
    
    • The working directory must be created before the service starts;

    Check the startup result

    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        ssh root@${node_ip} "systemctl status kube-proxy|grep Active"
      done
    

    Make sure the state is active (running); otherwise check the logs to find the cause:

    journalctl -u kube-proxy
    

    Check the listening ports

    [root@kube-node1 work]# netstat -lnpt|grep kube-proxy
    tcp        0      0 192.168.75.110:10249    0.0.0.0:*               LISTEN      6648/kube-proxy     
    tcp        0      0 192.168.75.110:10256    0.0.0.0:*               LISTEN      6648/kube-proxy
    
    • 10249: HTTP Prometheus metrics port;
    • 10256: HTTP healthz port;

    Check the ipvs routing rules

    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        ssh root@${node_ip} "/usr/sbin/ipvsadm -ln"
      done
    

    Expected output:

    >>> 192.168.75.110
    IP Virtual Server version 1.2.1 (size=4096)
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
    TCP  10.254.0.1:443 rr
      -> 192.168.75.110:6443          Masq    1      0          0         
      -> 192.168.75.111:6443          Masq    1      0          0         
      -> 192.168.75.112:6443          Masq    1      0          0         
    >>> 192.168.75.111
    IP Virtual Server version 1.2.1 (size=4096)
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
    TCP  10.254.0.1:443 rr
      -> 192.168.75.110:6443          Masq    1      0          0         
      -> 192.168.75.111:6443          Masq    1      0          0         
      -> 192.168.75.112:6443          Masq    1      0          0         
    >>> 192.168.75.112
    IP Virtual Server version 1.2.1 (size=4096)
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
    TCP  10.254.0.1:443 rr
      -> 192.168.75.110:6443          Masq    1      0          0         
      -> 192.168.75.111:6443          Masq    1      0          0         
      -> 192.168.75.112:6443          Masq    1      0          0
    

    As shown, all HTTPS requests to the kubernetes Service are forwarded to port 6443 on the kube-apiserver nodes;
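
    The same check can be made less visual by counting the real servers behind a virtual service. The sketch below parses ipvsadm -ln output in the format shown above; the function name count_ipvs_backends is hypothetical, and the parsing assumes that exact column layout.

    ```shell
    # Hypothetical helper: count the real servers listed under one virtual
    # service (for example 10.254.0.1:443) in `ipvsadm -ln` output on stdin.
    count_ipvs_backends() {
      awk -v vip="$1" '
        # A TCP/UDP line starts a new virtual service section.
        $1 == "TCP" || $1 == "UDP" { in_svc = ($2 == vip); next }
        # Lines starting with "->" inside the section are real servers.
        in_svc && $1 == "->" && $2 != "RemoteAddress:Port" { n++ }
        END { print n + 0 }
      '
    }
    ```

    On a node this could be run as /usr/sbin/ipvsadm -ln | count_ipvs_backends 10.254.0.1:443, and the result compared with the number of kube-apiserver nodes.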

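    08. Verify cluster functionality

    This document uses a DaemonSet to verify that the master and worker nodes work properly.

    Note: unless otherwise stated, all operations in this document are performed on kube-node1, which then distributes files and runs commands remotely.

    Check node status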

    [root@kube-node1 work]# kubectl get nodes
    NAME         STATUS   ROLES    AGE   VERSION
    kube-node1   Ready    <none>   16h   v1.14.2
    kube-node2   Ready    <none>   16h   v1.14.2
    kube-node3   Ready    <none>   16h   v1.14.2
    

    The cluster is healthy when every node is Ready.

    Create the test file

    cd /opt/k8s/work
    
    cat > nginx-ds.yml <<EOF
    apiVersion: v1
    kind: Service
    metadata:
      name: nginx-ds
      labels:
        app: nginx-ds
    spec:
      type: NodePort
      selector:
        app: nginx-ds
      ports:
      - name: http
        port: 80
        targetPort: 80
    ---
    apiVersion: extensions/v1beta1
    kind: DaemonSet
    metadata:
      name: nginx-ds
      labels:
        addonmanager.kubernetes.io/mode: Reconcile
    spec:
      template:
        metadata:
          labels:
            app: nginx-ds
        spec:
          containers:
          - name: my-nginx
            image: nginx:1.7.9
            ports:
            - containerPort: 80
    EOF
    

    Run the test

    [root@kube-node1 work]# kubectl create -f nginx-ds.yml
    service/nginx-ds created
    daemonset.extensions/nginx-ds created
    

    Check Pod IP connectivity across nodes

    The Pods are created and started gradually:

    [root@kube-node1 work]# kubectl get pods  -o wide|grep nginx-ds
    nginx-ds-7z464   0/1     ContainerCreating   0          22s   <none>   kube-node2   <none>           <none>
    nginx-ds-hz5fd   0/1     ContainerCreating   0          22s   <none>   kube-node1   <none>           <none>
    nginx-ds-skcrt   0/1     ContainerCreating   0          22s   <none>   kube-node3   <none>           <none>
    
    [root@kube-node1 work]# kubectl get pods  -o wide|grep nginx-ds
    nginx-ds-7z464   0/1     ContainerCreating   0          34s   <none>         kube-node2   <none>           <none>
    nginx-ds-hz5fd   0/1     ContainerCreating   0          34s   <none>         kube-node1   <none>           <none>
    nginx-ds-skcrt   1/1     Running          0          34s   172.30.200.2   kube-node3   <none>           <none>
    
    [root@kube-node1 work]# kubectl get pods  -o wide|grep nginx-ds
    nginx-ds-7z464   1/1     Running   0          70s   172.30.40.2    kube-node2   <none>           <none>
    nginx-ds-hz5fd   1/1     Running   0          70s   172.30.24.2    kube-node1   <none>           <none>
    nginx-ds-skcrt   1/1     Running   0          70s   172.30.200.2   kube-node3   <none>           <none>
    
    

    As shown, the nginx-ds Pod IPs are 172.30.40.2, 172.30.24.2, and 172.30.200.2. Ping these three IPs from every Node to check connectivity:

    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        ssh ${node_ip} "ping -c 1 172.30.24.2"
        ssh ${node_ip} "ping -c 1 172.30.40.2"
        ssh ${node_ip} "ping -c 1 172.30.200.2"
      done
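
    The Pod IPs above are copied by hand from the kubectl output. As a sketch, they could also be extracted from that output; the helper name running_pod_ips is hypothetical, and it assumes the kubectl get pods -o wide column layout shown above (IP in the sixth column).

    ```shell
    # Hypothetical helper: from `kubectl get pods -o wide` output on stdin,
    # print the IP of every Running Pod whose name starts with the prefix.
    running_pod_ips() {
      awk -v prefix="$1" 'index($1, prefix) == 1 && $3 == "Running" { print $6 }'
    }
    ```

    A possible use: for ip in $(kubectl get pods -o wide | running_pod_ips nginx-ds); do ping -c 1 "$ip"; done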
    

    Check Service IP and port reachability

    [root@kube-node1 work]# kubectl get svc |grep nginx-ds
    nginx-ds     NodePort    10.254.94.213   <none>        80:32039/TCP   3m24s
    

    As shown:

    • Service Cluster IP: 10.254.94.213
    • Service port: 80
    • NodePort: 32039
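
    The NodePort is assigned by the cluster, so it differs between installations. A minimal sketch of reading it from the PORT(S) value rather than copying it by hand (the helper name node_port_of is hypothetical):

    ```shell
    # Hypothetical helper: given a PORT(S) value such as "80:32039/TCP",
    # print the NodePort, i.e. the number between ":" and "/".
    node_port_of() {
      local ports="$1"
      ports="${ports#*:}"            # drop the service port and the colon
      printf '%s\n' "${ports%%/*}"   # drop the protocol suffix
    }
    ```

    For example, node_port_of "80:32039/TCP" prints 32039. Querying the field directly with kubectl get svc nginx-ds -o jsonpath='{.spec.ports[0].nodePort}' is an alternative that avoids text parsing.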

    curl the Service IP on all Nodes:

    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        ssh ${node_ip} "curl -s 10.254.94.213"
      done
    

    The expected output is the nginx welcome page.

    Check NodePort reachability of the service

    Run on all Nodes:

    source /opt/k8s/bin/environment.sh
    for node_ip in ${NODE_IPS[@]}
      do
        echo ">>> ${node_ip}"
        ssh ${node_ip} "curl -s ${node_ip}:32039"
      done
    

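    The expected output is the nginx welcome page.

    09-0. Deploy cluster add-ons

    Add-ons are auxiliary cluster components that extend and round out the cluster's functionality.

    Note:

    1. The manifest YAML files of the add-ons bundled with Kubernetes use the gcr.io Docker registry, which is blocked in mainland China, so the registry address must be replaced manually (this document does not replace it);

    09-1. Deploy the coredns add-on

    Note:

    1. Unless otherwise stated, all operations in this document are performed on kube-node1;
    2. The manifest YAML files of the add-ons bundled with Kubernetes use the gcr.io Docker registry, which is blocked in mainland China, so the registry address must be replaced manually (this document does not replace it);
    3. Blocked images can be downloaded through the free gcr.io proxy provided by Microsoft China;

    Modify the configuration file

    After extracting the downloaded kubernetes-server-linux-amd64.tar.gz, extract the kubernetes-src.tar.gz file inside it.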

    cd /opt/k8s/work/kubernetes/
    tar -xzvf kubernetes-src.tar.gz
    

    The coredns manifests are under cluster/addons/dns/coredns

    cd /opt/k8s/work/kubernetes/cluster/addons/dns/coredns
    cp coredns.yaml.base coredns.yaml
    source /opt/k8s/bin/environment.sh
    sed -i -e "s/__PILLAR__DNS__DOMAIN__/${CLUSTER_DNS_DOMAIN}/" -e "s/__PILLAR__DNS__SERVER__/${CLUSTER_DNS_SVC_IP}/" coredns.yaml
    
    ### Note ###
    coredns.yaml pulls the image k8s.gcr.io/coredns:1.3.1, but k8s.gcr.io is blocked in mainland China and cannot be reached, so replace the image address as described at:
    http://mirror.azure.cn/help/gcr-proxy-cache.html
    
    What to change in the file:
    replace image: k8s.gcr.io/coredns:1.3.1 with image: gcr.azk8s.cn/google_containers/coredns:1.3.1 so that the image can be pulled and the container does not later fail to start
    

    Create coredns

    kubectl create -f coredns.yaml
    
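    # Note
    If the image address was not changed in the previous step and coredns fails to run, delete the created resources first (for example with kubectl delete -f coredns.yaml), fix the image address as described above, and then create them again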
    

    Check that coredns works

    [root@kube-node1 coredns]# kubectl get all -n kube-system
    NAME                           READY   STATUS    RESTARTS   AGE
    pod/coredns-58c479c699-blpdq   1/1     Running   0          4m
    
    NAME               TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
    service/kube-dns   ClusterIP   10.254.0.2   <none>        53/UDP,53/TCP,9153/TCP   4m
    
    NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/coredns   1/1     1            1           4m
    
    NAME                                 DESIRED   CURRENT   READY   AGE
    replicaset.apps/coredns-58c479c699   1         1         1       4m
    
    # Note: pod/coredns must be in the Running state; otherwise none of the following steps can be verified
    

    Create a Deployment

    cd /opt/k8s/work
    
    cat > my-nginx.yaml <<EOF
    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: my-nginx
    spec:
      replicas: 2
      template:
        metadata:
          labels:
            run: my-nginx
        spec:
          containers:
          - name: my-nginx
            image: nginx:1.7.9
            ports:
            - containerPort: 80
    EOF
    
    kubectl create -f my-nginx.yaml
    

    Expose the Deployment to create the my-nginx service:

    [root@kube-node1 work]# kubectl expose deploy my-nginx
    service/my-nginx exposed
    
    [root@kube-node1 work]# kubectl get services --all-namespaces |grep my-nginx
    default       my-nginx     ClusterIP   10.254.63.243   <none>        80/TCP                   11s
    

    Create another Pod and check that its /etc/resolv.conf contains the --cluster-dns and --cluster-domain values configured for kubelet, and that the my-nginx service resolves to the Cluster IP shown above, 10.254.63.243:

    cd /opt/k8s/work
    
    cat > dnsutils-ds.yml <<EOF
    apiVersion: v1
    kind: Service
    metadata:
      name: dnsutils-ds
      labels:
        app: dnsutils-ds
    spec:
      type: NodePort
      selector:
        app: dnsutils-ds
      ports:
      - name: http
        port: 80
        targetPort: 80
    ---
    apiVersion: extensions/v1beta1
    kind: DaemonSet
    metadata:
      name: dnsutils-ds
      labels:
        addonmanager.kubernetes.io/mode: Reconcile
    spec:
      template:
        metadata:
          labels:
            app: dnsutils-ds
        spec:
          containers:
          - name: my-dnsutils
            image: tutum/dnsutils:latest
            command:
              - sleep
              - "3600"
            ports:
            - containerPort: 80
    EOF
    
    kubectl create -f dnsutils-ds.yml
    
    [root@kube-node1 work]# kubectl get pods -lapp=dnsutils-ds
    NAME                READY   STATUS    RESTARTS   AGE
    dnsutils-ds-5krtg   1/1     Running   0          64s
    dnsutils-ds-cxzlg   1/1     Running   0          64s
    dnsutils-ds-tln64   1/1     Running   0          64s
    
    [root@kube-node1 work]# kubectl -it exec dnsutils-ds-5krtg bash
    root@dnsutils-ds-5krtg:/# cat /etc/resolv.conf
    nameserver 10.254.0.2
    search default.svc.cluster.local svc.cluster.local cluster.local mshome.net
    options ndots:5
    
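    Note: if none of the following checks pass, the coredns image most likely could not be pulled; the commands below show the cause:
    kubectl get pod -n kube-system # list the coredns Pod
    kubectl describe pods -n kube-system  <full coredns pod name> # show the detailed description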
    
    [root@kube-node1 coredns]# kubectl exec dnsutils-ds-5krtg nslookup kubernetes
    Server:         10.254.0.2
    Address:        10.254.0.2#53
    
    Name:   kubernetes.default.svc.cluster.local
    Address: 10.254.0.1
    
    [root@kube-node1 coredns]# kubectl exec dnsutils-ds-5krtg nslookup www.baidu.com
    Server:         10.254.0.2
    Address:        10.254.0.2#53
    
    Non-authoritative answer:
    Name:   www.baidu.com.mshome.net
    Address: 218.28.144.36
    
    [root@kube-node1 coredns]# kubectl exec dnsutils-ds-5krtg nslookup my-nginx
    Server:         10.254.0.2
    Address:        10.254.0.2#53
    
    Name:   my-nginx.default.svc.cluster.local
    Address: 10.254.63.243
    
    [root@kube-node1 coredns]# kubectl exec dnsutils-ds-5krtg nslookup kube-dns.kube-system.svc.cluster
    Server:         10.254.0.2
    Address:        10.254.0.2#53
    
    Non-authoritative answer:
    Name:   kube-dns.kube-system.svc.cluster.mshome.net
    Address: 218.28.144.37
    
    [root@kube-node1 coredns]# kubectl exec dnsutils-ds-5krtg nslookup kube-dns.kube-system.svc
    Server:         10.254.0.2
    Address:        10.254.0.2#53
    
    Name:   kube-dns.kube-system.svc.cluster.local
    Address: 10.254.0.2
    
    [root@kube-node1 coredns]# kubectl exec dnsutils-ds-5krtg nslookup kube-dns.kube-system.svc.cluster.local
    Server:         10.254.0.2
    Address:        10.254.0.2#53
    
    Name:   kube-dns.kube-system.svc.cluster.local
    Address: 10.254.0.2
    
    [root@kube-node1 coredns]# kubectl exec dnsutils-ds-5krtg nslookup kube-dns.kube-system.svc.cluster.local.
    Server:         10.254.0.2
    Address:        10.254.0.2#53
    
    Name:   kube-dns.kube-system.svc.cluster.local
    Address: 10.254.0.2
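
    The resolv.conf inspection earlier in this section can also be scripted. The sketch below is illustrative only: the helper name resolv_conf_matches is hypothetical, and it assumes that the expected values are the DNS Service IP and the cluster domain configured for kubelet.

    ```shell
    # Hypothetical helper: read a resolv.conf on stdin and succeed only if it
    # names the expected DNS server and searches the expected cluster domain.
    resolv_conf_matches() {
      awk -v dns="$1" -v domain="$2" '
        $1 == "nameserver" && $2 == dns { has_dns = 1 }
        $1 == "search" { for (i = 2; i <= NF; i++) if ($i == domain) has_domain = 1 }
        END { exit !(has_dns && has_domain) }
      '
    }
    ```

    A possible use: kubectl exec dnsutils-ds-5krtg -- cat /etc/resolv.conf | resolv_conf_matches 10.254.0.2 cluster.local && echo "resolv.conf OK"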
    

    References

    1. https://community.infoblox.com/t5/Community-Blog/CoreDNS-for-Kubernetes-Service-Discovery/ba-p/8187
    2. https://coredns.io/2017/03/01/coredns-for-kubernetes-service-discovery-take-2/
    3. https://www.cnblogs.com/boshen-hzb/p/7511432.html
    4. https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/dns

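    09-2. Deploy the dashboard add-on

    Note:

    1. Unless otherwise stated, all operations in this document are performed on kube-node1;
    2. The manifest YAML files of the add-ons bundled with Kubernetes use the gcr.io Docker registry, which is blocked in mainland China, so the registry address must be replaced manually (this document does not replace it);
    3. Blocked images can be downloaded through the free gcr.io proxy provided by Microsoft China;

    Modify the configuration file

    After extracting the downloaded kubernetes-server-linux-amd64.tar.gz, extract the kubernetes-src.tar.gz file inside it.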

    cd /opt/k8s/work/kubernetes/
    tar -xzvf kubernetes-src.tar.gz
    

    The dashboard manifests are under cluster/addons/dashboard

    cd /opt/k8s/work/kubernetes/cluster/addons/dashboard
    

    Modify the Service definition and set its type to NodePort so that the dashboard can be reached from outside the cluster at NodeIP:NodePort;

    # cat dashboard-service.yaml
    apiVersion: v1
    kind: Service
    metadata:
      name: kubernetes-dashboard
      namespace: kube-system
      labels:
        k8s-app: kubernetes-dashboard
        kubernetes.io/cluster-service: "true"
        addonmanager.kubernetes.io/mode: Reconcile
    spec:
      type: NodePort # add this line
      selector:
        k8s-app: kubernetes-dashboard
      ports:
      - port: 443
        targetPort: 8443
    

    Apply all definition files

    # ls *.yaml
    dashboard-configmap.yaml  dashboard-controller.yaml  dashboard-rbac.yaml  dashboard-secret.yaml  dashboard-service.yaml
    
    #  Note: the image address must be changed in this file:
    dashboard-controller.yaml
    
    change image: k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.1 to image: gcr.azk8s.cn/google_containers/kubernetes-dashboard-amd64:v1.10.1
    
    # kubectl apply -f  .
    

    Check the assigned NodePort

    [root@kube-node1 dashboard]# kubectl get deployment kubernetes-dashboard  -n kube-system
    NAME                   READY   UP-TO-DATE   AVAILABLE   AGE
    kubernetes-dashboard   1/1     1            1           14s
    
    [root@kube-node1 dashboard]# kubectl --namespace kube-system get pods -o wide
    NAME                                    READY   STATUS    RESTARTS   AGE   IP             NODE         NOMINATED NODE   READINESS GATES
    coredns-58c479c699-blpdq                1/1     Running   0          30m   172.30.200.4   kube-node3   <none>           <none>
    kubernetes-dashboard-64ffdff795-5rgd2   1/1     Running   0          33s   172.30.24.3    kube-node1   <none>           <none>
    
    [root@kube-node1 dashboard]# kubectl get services kubernetes-dashboard -n kube-system
    NAME                   TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)         AGE
    kubernetes-dashboard   NodePort   10.254.110.235   <none>        443:31673/TCP   47s
    
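    • NodePort 31673 maps to port 443 of the dashboard Service, which forwards to port 8443 of the dashboard Pod;

    Check the command-line flags supported by the dashboard

    # kubernetes-dashboard-64ffdff795-5rgd2 is the Pod name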
    [root@kube-node1 dashboard]# kubectl exec --namespace kube-system -it kubernetes-dashboard-64ffdff795-5rgd2  -- /dashboard --help
    
    2019/11/08 07:55:04 Starting overwatch
    Usage of /dashboard:
          --alsologtostderr                  log to standard error as well as files
          --api-log-level string             Level of API request logging. Should be one of 'INFO|NONE|DEBUG'. Default: 'INFO'. (default "INFO")
          --apiserver-host string            The address of the Kubernetes Apiserver to connect to in the format of protocol://address:port, e.g., http://localhost:8080. If not specified, the assumption is that the binary runs inside a Kubernetes cluster and local discovery is attempted.
          --authentication-mode strings      Enables authentication options that will be reflected on login screen. Supported values: token, basic. Default: token.Note that basic option should only be used if apiserver has '--authorization-mode=ABAC' and '--basic-auth-file' flags set. (default [token])
          --auto-generate-certificates       When set to true, Dashboard will automatically generate certificates used to serve HTTPS. Default: false.
          --bind-address ip                  The IP address on which to serve the --secure-port (set to 0.0.0.0 for all interfaces). (default 0.0.0.0)
          --default-cert-dir string          Directory path containing '--tls-cert-file' and '--tls-key-file' files. Used also when auto-generating certificates flag is set. (default "/certs")
          --disable-settings-authorizer      When enabled, Dashboard settings page will not require user to be logged in and authorized to access settings page.
          --enable-insecure-login            When enabled, Dashboard login view will also be shown when Dashboard is not served over HTTPS. Default: false.
          --enable-skip-login                When enabled, the skip button on the login page will be shown. Default: false.
          --heapster-host string             The address of the Heapster Apiserver to connect to in the format of protocol://address:port, e.g., http://localhost:8082. If not specified, the assumption is that the binary runs inside a Kubernetes cluster and service proxy will be used.
          --insecure-bind-address ip         The IP address on which to serve the --port (set to 0.0.0.0 for all interfaces). (default 127.0.0.1)
          --insecure-port int                The port to listen to for incoming HTTP requests. (default 9090)
          --kubeconfig string                Path to kubeconfig file with authorization and master location information.
          --log_backtrace_at traceLocation   when logging hits line file:N, emit a stack trace (default :0)
          --log_dir string                   If non-empty, write log files in this directory
          --logtostderr                      log to standard error instead of files
          --metric-client-check-period int   Time in seconds that defines how often configured metric client health check should be run. Default: 30 seconds. (default 30)
          --port int                         The secure port to listen to for incoming HTTPS requests. (default 8443)
          --stderrthreshold severity         logs at or above this threshold go to stderr (default 2)
          --system-banner string             When non-empty displays message to Dashboard users. Accepts simple HTML tags. Default: ''.
          --system-banner-severity string    Severity of system banner. Should be one of 'INFO|WARNING|ERROR'. Default: 'INFO'. (default "INFO")
          --tls-cert-file string             File containing the default x509 Certificate for HTTPS.
          --tls-key-file string              File containing the default x509 private key matching --tls-cert-file.
          --token-ttl int                    Expiration time (in seconds) of JWE tokens generated by dashboard. Default: 15 min. 0 - never expires (default 900)
      -v, --v Level                          log level for V logs
          --vmodule moduleSpec               comma-separated list of pattern=N settings for file-filtered logging
    pflag: help requested
    command terminated with exit code 2
    

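    The dashboard's --authentication-mode supports token and basic, and defaults to token. To use basic, kube-apiserver must be configured with the --authorization-mode=ABAC and --basic-auth-file flags.

    Access the dashboard

    Since v1.7, the dashboard can only be accessed over HTTPS; when kubectl proxy is used, it must listen on localhost or 127.0.0.1. NodePort access has no such restriction, but it is recommended only for development environments.

    A login that does not meet these conditions leaves the browser on the login page with no redirect, even after the login succeeds.

    The dashboard can be reached in three ways:

    1. The kubernetes-dashboard service exposes a NodePort, so the dashboard is reachable at https://NodeIP:NodePort;
    2. Through kube-apiserver;
    3. Through kubectl proxy.

    Access the dashboard through kubectl proxy

    This method is not used in this document.

    Start the proxy: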

    $ kubectl proxy --address='localhost' --port=8086 --accept-hosts='^*$'
    Starting to serve on 127.0.0.1:8086
    
    • --address must be localhost or 127.0.0.1;
    • The --accept-hosts option is required; otherwise the browser shows "Unauthorized" when opening the dashboard page;

    Open this URL in a browser: http://127.0.0.1:8086/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy

    Access the dashboard through kube-apiserver

    This is the method used in this document.

    Get the list of cluster service addresses:

    [root@kube-node1 work]# kubectl cluster-info
    Kubernetes master is running at https://127.0.0.1:8443
    CoreDNS is running at https://127.0.0.1:8443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
    kubernetes-dashboard is running at https://127.0.0.1:8443/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy
    
    To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
    
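    • Because the apiserver is proxied by the local kube-nginx, the 127.0.0.1:8443 shown above is the IP and port of the local kube-nginx; in the browser, replace it with the IP and port that kube-apiserver actually listens on, such as 192.168.75.110:6443;
    • The dashboard must be accessed through the kube-apiserver secure port (HTTPS), and the browser must present a custom certificate; otherwise kube-apiserver rejects the request.
    • For the steps to create and import the custom certificate, see: A. Accessing the kube-apiserver secure port from a browser

    Open this URL in a browser: https://192.168.75.110:6443/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy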
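
    The address substitution described above can be sketched as a small text rewrite. The helper name dashboard_url_via_apiserver is hypothetical, and the sketch assumes the URL printed by kubectl cluster-info starts with https://127.0.0.1:8443 as in the output shown.

    ```shell
    # Hypothetical helper: swap the local kube-nginx endpoint for a real
    # kube-apiserver address in a URL printed by `kubectl cluster-info`.
    dashboard_url_via_apiserver() {
      local url="$1" apiserver="$2"
      printf '%s\n' "$url" | sed -e "s#^https://127\.0\.0\.1:8443#https://${apiserver}#"
    }
    ```

    Called with the kubernetes-dashboard URL from the cluster-info output and 192.168.75.110:6443, it prints the browser URL given above.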

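    Create the token and kubeconfig file for logging in to the Dashboard

    By default the dashboard supports only token authentication (not client certificate authentication), so a kubeconfig file used for login must contain the token.

    Create the login token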

    kubectl create sa dashboard-admin -n kube-system
    kubectl create clusterrolebinding dashboard-admin --clusterrole=cluster-admin --serviceaccount=kube-system:dashboard-admin
    ADMIN_SECRET=$(kubectl get secrets -n kube-system | grep dashboard-admin | awk '{print $1}')
    DASHBOARD_LOGIN_TOKEN=$(kubectl describe secret -n kube-system ${ADMIN_SECRET} | grep -E '^token' | awk '{print $2}')
    echo ${DASHBOARD_LOGIN_TOKEN}
    
    eyJhbGciOiJSUzI1NiIsImtpZCI6IiJ9.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJkYXNoYm9hcmQtYWRtaW4tdG9rZW4tZnpjbWwiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoiZGFzaGJvYXJkLWFkbWluIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiY2FmYzk3MDctMDFmZi0xMWVhLThlOTctMDAwYzI5MWQxODIwIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Omt1YmUtc3lzdGVtOmRhc2hib2FyZC1hZG1pbiJ9.YdK7a1YSUa-Y4boHDM2qLrI5PrimxIUd3EfuCX7GiiDVZ3EvJZQFA4_InGWcbHdZoA8AYyh2pQn-hGhiVz0lU2jLIIIFEF2zHc5su1CSISRciONv6NMrFBlTr6tNFsf6SEeEep9tvGILAFTHXPqSVsIb_lCmHeBdH_CDo4sAyLFATDYqI5Q2jBxnCU7DsD73j3LvLY9WlgpuLwAhOrNHc6USxPvB91-z-4GGbcpGIQPpDQ6OlT3cAP47zFRBIpIc2JwBZ63EmcZJqLxixgPMROqzFvV9mtx68o_GEAccsIELMEMqq9USIXibuFtQT6mV0U3p_wntIhr4OPxe5b7jvQ
    

    Use the printed token to log in to the Dashboard.

    On the browser login page, choose the token option.

    Create a kubeconfig file that uses the token

    source /opt/k8s/bin/environment.sh
    # Set cluster parameters
    kubectl config set-cluster kubernetes \
      --certificate-authority=/etc/kubernetes/cert/ca.pem \
      --embed-certs=true \
      --server=${KUBE_APISERVER} \
      --kubeconfig=dashboard.kubeconfig
    
    # Set client credentials using the token created above.
    # Note: DASHBOARD_LOGIN_TOKEN may be empty when this runs from a separate shell script; set it manually in that case.
    kubectl config set-credentials dashboard_user \
      --token=${DASHBOARD_LOGIN_TOKEN} \
      --kubeconfig=dashboard.kubeconfig
    
    # Set context parameters
    kubectl config set-context default \
      --cluster=kubernetes \
      --user=dashboard_user \
      --kubeconfig=dashboard.kubeconfig
    
    # Set the default context
    kubectl config use-context default --kubeconfig=dashboard.kubeconfig
    
    
    

    Because the Heapster add-on is missing, the dashboard currently cannot display CPU, memory, and other statistics and charts for Pods and Nodes.

    References

    1. https://github.com/kubernetes/dashboard/wiki/Access-control
    2. https://github.com/kubernetes/dashboard/issues/2558
    3. https://kubernetes.io/docs/concepts/configuration/organize-cluster-access-kubeconfig/
    4. https://github.com/kubernetes/dashboard/wiki/Accessing-Dashboard---1.7.X-and-above
    5. https://github.com/kubernetes/dashboard/issues/2540

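    09-3. Deploy the metrics-server add-on

    Note:

    1. Unless otherwise stated, all operations in this document are performed on kube-node1;
    2. The manifest YAML files of the add-ons bundled with Kubernetes use the gcr.io Docker registry, which is blocked in mainland China, so the registry address must be replaced manually (this document does not replace it);
    3. Blocked images can be downloaded through the free gcr.io proxy provided by Microsoft China;

    metrics-server discovers all nodes through kube-apiserver, then calls the kubelet APIs (over HTTPS) to obtain CPU, memory, and other resource usage for each Node and Pod.

    Starting with Kubernetes 1.12, the Kubernetes installation scripts no longer include Heapster, and from 1.13 support for Heapster was removed entirely; Heapster is no longer maintained.

    The replacements are:

    1. CPU/memory HPA metrics for autoscaling: metrics-server;
    2. General monitoring: a third-party monitoring system that can collect Prometheus-format metrics, such as Prometheus Operator;
    3. Event transfer: third-party tools that transfer and archive Kubernetes events;

    Kubernetes Dashboard does not yet support metrics-server (PR: #3504). If metrics-server replaces Heapster, the dashboard cannot chart Pod memory and CPU usage, and a monitoring stack such as Prometheus and Grafana is needed to fill the gap.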

    Monitoring architecture

    [Figure: monitoring architecture (monitoring_architecture.png)]

    Install metrics-server

    Clone the source from GitHub:

    $ cd /opt/k8s/work/
    $ git clone https://github.com/kubernetes-incubator/metrics-server.git
    $ cd metrics-server/deploy/1.8+/
    $ ls
    aggregated-metrics-reader.yaml  auth-delegator.yaml  auth-reader.yaml  metrics-apiservice.yaml  metrics-server-deployment.yaml  metrics-server-service.yaml  resource-reader.yaml
    

    Edit metrics-server-deployment.yaml and add two command-line arguments to metrics-server:

    # cat metrics-server-deployment.yaml
            args:
              - --cert-dir=/tmp
              - --secure-port=4443
              - --metric-resolution=30s # added
              - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP  # added
     
     The image address also needs to be changed:
     replace image: k8s.gcr.io/metrics-server-amd64:v0.3.6 with image: gcr.azk8s.cn/google_containers/metrics-server-amd64:v0.3.6
    
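    • --metric-resolution=30s: the interval at which metrics are collected from kubelet;
    • --kubelet-preferred-address-types: prefer InternalIP when contacting kubelet; this avoids the failure that occurs with the default setting when a node name has no DNS record and the kubelet API is called by node name;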
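
    Before deploying, it may help to confirm that the edited manifest really contains both flags. This is a rough sketch only: the helper name manifest_has_added_flags is hypothetical, and it does a plain text search rather than parsing the YAML.

    ```shell
    # Hypothetical helper: succeed only if the manifest mentions both
    # flags added in the step above.
    manifest_has_added_flags() {
      local file="$1"
      grep -q -- '--metric-resolution=' "$file" &&
        grep -q -- '--kubelet-preferred-address-types=' "$file"
    }
    ```

    A possible use: manifest_has_added_flags metrics-server-deployment.yaml || echo "flags missing"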

    Deploy metrics-server:

    # cd /opt/k8s/work/metrics-server/deploy/1.8+/
    # kubectl create -f .
    

    Check the running status

    [root@kube-node1 1.8+]# kubectl -n kube-system get pods -l k8s-app=metrics-server
    NAME                              READY   STATUS    RESTARTS   AGE
    metrics-server-65879bf98c-ghqbk   1/1     Running   0          38s
    
    [root@kube-node1 1.8+]# kubectl get svc -n kube-system  metrics-server
    NAME             TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
    metrics-server   ClusterIP   10.254.244.235   <none>        443/TCP   55s
    

    metrics-server command-line flags

    #  docker run -it --rm gcr.azk8s.cn/google_containers/metrics-server-amd64:v0.3.6 --help
    Launch metrics-server
    
    Usage:
       [flags]
    
    Flags:
          --alsologtostderr                                         log to standard error as well as files
          --authentication-kubeconfig string                        kubeconfig file pointing at the 'core' kubernetes server with enough rights to create tokenaccessreviews.authentication.k8s.io.
          --authentication-skip-lookup                              If false, the authentication-kubeconfig will be used to lookup missing authentication configuration from the cluster.
          --authentication-token-webhook-cache-ttl duration         The duration to cache responses from the webhook token authenticator. (default 10s)
          --authentication-tolerate-lookup-failure                  If true, failures to look up missing authentication configuration from the cluster are not considered fatal. Note that this can result in authentication that treats all requests as anonymous.
          --authorization-always-allow-paths strings                A list of HTTP paths to skip during authorization, i.e. these are authorized without contacting the 'core' kubernetes server.
          --authorization-kubeconfig string                         kubeconfig file pointing at the 'core' kubernetes server with enough rights to create subjectaccessreviews.authorization.k8s.io.
          --authorization-webhook-cache-authorized-ttl duration     The duration to cache 'authorized' responses from the webhook authorizer. (default 10s)
          --authorization-webhook-cache-unauthorized-ttl duration   The duration to cache 'unauthorized' responses from the webhook authorizer. (default 10s)
          --bind-address ip                                         The IP address on which to listen for the --secure-port port. The associated interface(s) must be reachable by the rest of the cluster, and by CLI/web clients. If blank, all interfaces will be used (0.0.0.0 for all IPv4 interfaces and :: for all IPv6 interfaces). (default 0.0.0.0)
          --cert-dir string                                         The directory where the TLS certs are located. If --tls-cert-file and --tls-private-key-file are provided, this flag will be ignored. (default "apiserver.local.config/certificates")
          --client-ca-file string                                   If set, any request presenting a client certificate signed by one of the authorities in the client-ca-file is authenticated with an identity corresponding to the CommonName of the client certificate.
          --contention-profiling                                    Enable lock contention profiling, if profiling is enabled
      -h, --help                                                    help for this command
          --http2-max-streams-per-connection int                    The limit that the server gives to clients for the maximum number of streams in an HTTP/2 connection. Zero means to use golang's default.
          --kubeconfig string                                       The path to the kubeconfig used to connect to the Kubernetes API server and the Kubelets (defaults to in-cluster config)
          --kubelet-certificate-authority string                    Path to the CA to use to validate the Kubelet's serving certificates.
          --kubelet-insecure-tls                                    Do not verify CA of serving certificates presented by Kubelets.  For testing purposes only.
          --kubelet-port int                                        The port to use to connect to Kubelets. (default 10250)
          --kubelet-preferred-address-types strings                 The priority of node address types to use when determining which address to use to connect to a particular node (default [Hostname,InternalDNS,InternalIP,ExternalDNS,ExternalIP])
          --log-flush-frequency duration                            Maximum number of seconds between log flushes (default 5s)
          --log_backtrace_at traceLocation                          when logging hits line file:N, emit a stack trace (default :0)
          --log_dir string                                          If non-empty, write log files in this directory
          --log_file string                                         If non-empty, use this log file
          --logtostderr                                             log to standard error instead of files (default true)
          --metric-resolution duration                              The resolution at which metrics-server will retain metrics. (default 1m0s)
          --profiling                                               Enable profiling via web interface host:port/debug/pprof/ (default true)
          --requestheader-allowed-names strings                     List of client certificate common names to allow to provide usernames in headers specified by --requestheader-username-headers. If empty, any client certificate validated by the authorities in --requestheader-client-ca-file is allowed.
          --requestheader-client-ca-file string                     Root certificate bundle to use to verify client certificates on incoming requests before trusting usernames in headers specified by --requestheader-username-headers. WARNING: generally do not depend on authorization being already done for incoming requests.
          --requestheader-extra-headers-prefix strings              List of request header prefixes to inspect. X-Remote-Extra- is suggested. (default [x-remote-extra-])
          --requestheader-group-headers strings                     List of request headers to inspect for groups. X-Remote-Group is suggested. (default [x-remote-group])
          --requestheader-username-headers strings                  List of request headers to inspect for usernames. X-Remote-User is common. (default [x-remote-user])
          --secure-port int                                         The port on which to serve HTTPS with authentication and authorization.If 0, don't serve HTTPS at all. (default 443)
          --skip_headers                                            If true, avoid header prefixes in the log messages
          --stderrthreshold severity                                logs at or above this threshold go to stderr
          --tls-cert-file string                                    File containing the default x509 Certificate for HTTPS. (CA cert, if any, concatenated after server cert). If HTTPS serving is enabled, and --tls-cert-file and --tls-private-key-file are not provided, a self-signed certificate and key are generated for the public address and saved to the directory specified by --cert-dir.
          --tls-cipher-suites strings                               Comma-separated list of cipher suites for the server. If omitted, the default Go cipher suites will be use.  Possible values: TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_ECDSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_RC4_128_SHA,TLS_ECDHE_RSA_WITH_3DES_EDE_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_RSA_WITH_RC4_128_SHA,TLS_RSA_WITH_3DES_EDE_CBC_SHA,TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_128_CBC_SHA256,TLS_RSA_WITH_AES_128_GCM_SHA256,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_RSA_WITH_AES_256_GCM_SHA384,TLS_RSA_WITH_RC4_128_SHA
          --tls-min-version string                                  Minimum TLS version supported. Possible values: VersionTLS10, VersionTLS11, VersionTLS12
          --tls-private-key-file string                             File containing the default x509 private key matching --tls-cert-file.
          --tls-sni-cert-key namedCertKey                           A pair of x509 certificate and private key file paths, optionally suffixed with a list of domain patterns which are fully qualified domain names, possibly with prefixed wildcard segments. If no domain patterns are provided, the names of the certificate are extracted. Non-wildcard matches trump over wildcard matches, explicit domain patterns trump over extracted names. For multiple key/certificate pairs, use the --tls-sni-cert-key multiple times. Examples: "example.crt,example.key" or "foo.crt,foo.key:*.foo.com,foo.com". (default [])
      -v, --v Level                                                 number for the log level verbosity
          --vmodule moduleSpec                                      comma-separated list of pattern=N settings for file-filtered logging
    

    View the metrics exposed by metrics-server

    1. Access through kube-apiserver or kubectl proxy:

      Open the following URLs in a browser; the result is returned directly:

      https://192.168.75.110:6443/apis/metrics.k8s.io/v1beta1/nodes

      https://192.168.75.110:6443/apis/metrics.k8s.io/v1beta1/pods

    2. Access directly with kubectl:

      kubectl get --raw apis/metrics.k8s.io/v1beta1/nodes

      kubectl get --raw apis/metrics.k8s.io/v1beta1/pods

    #  kubectl get --raw "/apis/metrics.k8s.io/v1beta1" | jq .
    {
      "kind": "APIResourceList",
      "apiVersion": "v1",
      "groupVersion": "metrics.k8s.io/v1beta1",
      "resources": [
        {
          "name": "nodes",
          "singularName": "",
          "namespaced": false,
          "kind": "NodeMetrics",
          "verbs": [
            "get",
            "list"
          ]
        },
        {
          "name": "pods",
          "singularName": "",
          "namespaced": true,
          "kind": "PodMetrics",
          "verbs": [
            "get",
            "list"
          ]
        }
      ]
    }
    
    # kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | jq .
    {
      "kind": "NodeMetricsList",
      "apiVersion": "metrics.k8s.io/v1beta1",
      "metadata": {
        "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes"
      },
      "items": [
        {
          "metadata": {
            "name": "zhangjun-k8s01",
            "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/zhangjun-k8s01",
            "creationTimestamp": "2019-05-26T10:55:10Z"
          },
          "timestamp": "2019-05-26T10:54:52Z",
          "window": "30s",
          "usage": {
            "cpu": "311155148n",
            "memory": "2881016Ki"
          }
        },
        {
          "metadata": {
            "name": "zhangjun-k8s02",
            "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/zhangjun-k8s02",
            "creationTimestamp": "2019-05-26T10:55:10Z"
          },
          "timestamp": "2019-05-26T10:54:54Z",
          "window": "30s",
          "usage": {
            "cpu": "253796835n",
            "memory": "1028836Ki"
          }
        },
        {
          "metadata": {
            "name": "zhangjun-k8s03",
            "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/zhangjun-k8s03",
            "creationTimestamp": "2019-05-26T10:55:10Z"
          },
          "timestamp": "2019-05-26T10:54:54Z",
          "window": "30s",
          "usage": {
            "cpu": "280441339n",
            "memory": "1072772Ki"
          }
        }
      ]
    }
    
    • The usage field returned by /apis/metrics.k8s.io/v1beta1/nodes and /apis/metrics.k8s.io/v1beta1/pods contains both CPU and memory;
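
    As a minimal sketch of how to read these values: the CPU figures carry the n suffix (nanocores) and the memory figures the Ki suffix (KiB). The helper names nanocores_to_millicores and kib_to_mib below are hypothetical, not part of kubectl or metrics-server, and the integer division truncates rather than rounds:

    ```shell
    # Hypothetical helpers (not part of kubectl): convert metrics.k8s.io usage quantities.

    # "311155148n" (nanocores) -> "311m" (millicores); integer division truncates.
    nanocores_to_millicores() {
      nano="${1%n}"
      echo "$(( nano / 1000000 ))m"
    }

    # "2881016Ki" (KiB) -> "2813Mi" (MiB); integer division truncates.
    kib_to_mib() {
      kib="${1%Ki}"
      echo "$(( kib / 1024 ))Mi"
    }

    nanocores_to_millicores 311155148n   # CPU of zhangjun-k8s01 in the output above
    kib_to_mib 2881016Ki                 # memory of zhangjun-k8s01 in the output above
    ```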

    Use kubectl top to view node resource usage

    The kubectl top command fetches basic node metrics from metrics-server:

    [root@kube-node1 1.8+]# kubectl top node
    NAME         CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
    kube-node1   125m         3%     833Mi           44%       
    kube-node2   166m         4%     891Mi           47%       
    kube-node3   126m         3%     770Mi           40% 
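
    The table can also be post-processed in a script. The following is a minimal sketch, assuming the five-column layout shown above; the function name nodes_over_memory and the threshold of 44 are illustrative choices, and the embedded sample stands in for live kubectl top node output:

    ```shell
    # Hypothetical helper: read `kubectl top node` output on stdin and print the
    # names of nodes whose MEMORY% column is at or above the given threshold.
    nodes_over_memory() {
      awk -v limit="$1" 'NR > 1 { pct = $5; sub(/%/, "", pct); if (pct + 0 >= limit + 0) print $1 }'
    }

    # Sample copied from the output above; on a live cluster, pipe
    # `kubectl top node` into the function instead.
    sample='NAME         CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
    kube-node1   125m         3%     833Mi           44%
    kube-node2   166m         4%     891Mi           47%
    kube-node3   126m         3%     770Mi           40%'

    printf '%s\n' "$sample" | nodes_over_memory 44
    ```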
    

    References

    1. https://kubernetes.feisky.xyz/zh/addons/metrics.html
    2. metrics-server RBAC: https://github.com/kubernetes-incubator/metrics-server/issues/40
    3. metrics-server parameters: https://github.com/kubernetes-incubator/metrics-server/issues/25
    4. https://kubernetes.io/docs/tasks/debug-application-cluster/core-metrics-pipeline/
    5. metrics-server API documentation

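    09-4. Deploy the EFK add-on

    Notes:

    1. Unless otherwise stated, all operations in this document are performed on the kube-node1 node;
    2. The manifest YAML files of the add-ons bundled with Kubernetes use the gcr.io docker registry, which is blocked in mainland China, so the image addresses must be manually replaced with another registry;
    3. The blocked images can be downloaded from the free gcr.io proxy provided by Microsoft China;

    Modify the configuration files

    After extracting the downloaded kubernetes-server-linux-amd64.tar.gz, extract the kubernetes-src.tar.gz file inside it.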

    cd /opt/k8s/work/kubernetes/
    tar -xzvf kubernetes-src.tar.gz
    

    The EFK manifests are in the kubernetes/cluster/addons/fluentd-elasticsearch directory:

    # cd /opt/k8s/work/kubernetes/cluster/addons/fluentd-elasticsearch
    # vim fluentd-es-ds.yaml 
    Change path: /var/lib/docker/containers to path: /data/k8s/docker/data/containers/
    Change image: k8s.gcr.io/fluentd-elasticsearch:v2.4.0 to image: gcr.azk8s.cn/google_containers/fluentd-elasticsearch:v2.4.0
    
    # vim es-statefulset.yaml
    The container name and image in the upstream manifest are problematic and need to be changed as follows:
      serviceAccountName: elasticsearch-logging
      containers:
      - name: elasticsearch-logging
        #image: gcr.io/fluentd-elasticsearch/elasticsearch:v6.6.1
        image: docker.elastic.co/elasticsearch/elasticsearch:6.6.1
        #gcr.azk8s.cn/fluentd-elasticsearch/elasticsearch:v6.6.1
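
    The image edit can also be scripted instead of made by hand in vim. The following is a minimal sketch, assuming the manifest references images as image: k8s.gcr.io/...; the function name use_mirror_images is hypothetical, the demo runs on a scratch file rather than the real fluentd-es-ds.yaml, and the hostPath change still has to be made separately:

    ```shell
    # Hypothetical helper: rewrite k8s.gcr.io image references in a manifest to the
    # gcr.azk8s.cn/google_containers mirror. A temp file is used instead of
    # `sed -i` so the same command works with both GNU and BSD sed.
    use_mirror_images() {
      manifest="$1"
      tmp="$(mktemp)"
      sed 's#image: k8s\.gcr\.io/#image: gcr.azk8s.cn/google_containers/#' "$manifest" > "$tmp" &&
        cat "$tmp" > "$manifest"
      rm -f "$tmp"
    }

    # Demo on a scratch file containing one image line.
    demo="$(mktemp)"
    printf 'image: k8s.gcr.io/fluentd-elasticsearch:v2.4.0\n' > "$demo"
    use_mirror_images "$demo"
    cat "$demo"
    ```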
    
    

    Apply the manifests

    [root@kube-node1 fluentd-elasticsearch]# pwd
    /opt/k8s/work/kubernetes/cluster/addons/fluentd-elasticsearch
    [root@kube-node1 fluentd-elasticsearch]# ls *.yaml
    es-service.yaml  es-statefulset.yaml  fluentd-es-configmap.yaml  fluentd-es-ds.yaml  kibana-deployment.yaml  kibana-service.yaml
    
    # kubectl apply -f .
    

    Check the result

    # Expected result when everything is healthy
    [root@kube-node1 fluentd-elasticsearch]# kubectl get pods -n kube-system -o wide|grep -E 'elasticsearch|fluentd|kibana'
    elasticsearch-logging-0                 1/1     Running   1          92s     172.30.24.6    kube-node1   <none>           <none>
    elasticsearch-logging-1                 1/1     Running   1          85s     172.30.40.6    kube-node2   <none>           <none>
    fluentd-es-v2.4.0-k72m9                 1/1     Running   0          91s     172.30.200.7   kube-node3   <none>           <none>
    fluentd-es-v2.4.0-klvbr                 1/1     Running   0          91s     172.30.24.7    kube-node1   <none>           <none>
    fluentd-es-v2.4.0-pcq8p                 1/1     Running   0          91s     172.30.40.5    kube-node2   <none>           <none>
    kibana-logging-f4d99b69f-779gm          1/1     Running   0          91s     172.30.200.6   kube-node3   <none>           <none>
    
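    # Result when things are not healthy
    # Only one of the two elasticsearch-logging pods is healthy, and two of the three fluentd-es pods are healthy
    # After a while the failing pods may return to Running while previously healthy ones start failing
    # Preliminary diagnosis: the load average on the hosts running the failing pods is too high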
    
    [root@kube-node1 fluentd-elasticsearch]# kubectl get pods -n kube-system -o wide|grep -E 'elasticsearch|fluentd|kibana'
    elasticsearch-logging-0                 1/1     Running            0          16m   172.30.48.3   kube-node2   <none>           <none>
    elasticsearch-logging-1                 0/1     CrashLoopBackOff   7          15m   172.30.24.6   kube-node1   <none>           <none>
    fluentd-es-v2.4.0-lzcl7                 1/1     Running            0          16m   172.30.96.3   kube-node3   <none>           <none>
    fluentd-es-v2.4.0-mm6gs                 0/1     CrashLoopBackOff   5          16m   172.30.48.4   kube-node2   <none>           <none>
    fluentd-es-v2.4.0-vx5vj                 1/1     Running            0          16m   172.30.24.3   kube-node1   <none>           <none>
    kibana-logging-f4d99b69f-6kjlr          1/1     Running            0          16m   172.30.96.5   kube-node3   <none>           <none>
    
    [root@kube-node1 fluentd-elasticsearch]# kubectl get service  -n kube-system|grep -E 'elasticsearch|kibana'
    elasticsearch-logging   ClusterIP   10.254.202.87    <none>        9200/TCP                 116s
    kibana-logging          ClusterIP   10.254.185.3     <none>        5601/TCP                 114s
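
    To spot unhealthy pods quickly, the pod listing can be filtered by its STATUS column. A minimal sketch, assuming the column layout of kubectl get pods -n kube-system shown above (no NAMESPACE column); pods_not_running is a hypothetical helper and the sample rows are abbreviated from the unhealthy listing:

    ```shell
    # Hypothetical helper: read `kubectl get pods` output on stdin and print
    # "name status" for every pod whose STATUS column (field 3) is not Running.
    pods_not_running() {
      awk 'NF >= 3 && $1 != "NAME" && $3 != "Running" { print $1, $3 }'
    }

    # Abbreviated rows from the unhealthy listing above.
    sample='elasticsearch-logging-0          1/1   Running            0   16m
    elasticsearch-logging-1          0/1   CrashLoopBackOff   7   15m
    fluentd-es-v2.4.0-mm6gs          0/1   CrashLoopBackOff   5   16m
    kibana-logging-f4d99b69f-6kjlr   1/1   Running            0   16m'

    printf '%s\n' "$sample" | pods_not_running
    ```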
    

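    The first time the kibana Pod starts, it takes a fairly long time (up to 20 minutes) to optimize and cache the status page bundles; you can follow the Pod's log to watch the progress: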

    $ kubectl logs kibana-logging-7445dc9757-pvpcv -n kube-system -f
    {"type":"log","@timestamp":"2019-05-26T11:36:18Z","tags":["info","optimize"],"pid":1,"message":"Optimizing and caching bundles for graph, ml, kibana, stateSessionStorageRedirect, timelion and status_page. This may take a few minutes"}
    {"type":"log","@timestamp":"2019-05-26T11:40:03Z","tags":["info","optimize"],"pid":1,"message":"Optimization of bundles for graph, ml, kibana, stateSessionStorageRedirect, timelion and status_page complete in 224.57 seconds"}
    

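    Note: the kibana dashboard can only be viewed in a browser after the Kibana pod has finished starting; before that, the connection is refused.

    Access kibana

    1. Through kube-apiserver:

    This is the method used in this deployment.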

    ```bash
    [root@kube-node1 fluentd-elasticsearch]# kubectl cluster-info|grep -E 'Elasticsearch|Kibana'
    Elasticsearch is running at https://127.0.0.1:8443/api/v1/namespaces/kube-system/services/elasticsearch-logging/proxy
    Kibana is running at https://127.0.0.1:8443/api/v1/namespaces/kube-system/services/kibana-logging/proxy
    ```
    

    Browser URL: https://192.168.75.111:6443/api/v1/namespaces/kube-system/services/kibana-logging/proxy
    With VirtualBox port forwarding configured: http://127.0.0.1:8080/api/v1/namespaces/kube-system/services/kibana-logging/proxy

    2. Through kubectl proxy:

      This method is not used in this deployment.

      Start the proxy:

      $ kubectl proxy --address='172.27.137.240' --port=8086 --accept-hosts='^*$'
      Starting to serve on 172.27.129.150:8086
      

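      Browser URL: http://172.27.137.240:8086/api/v1/namespaces/kube-system/services/kibana-logging/proxy

      With VirtualBox port forwarding configured: http://127.0.0.1:8086/api/v1/namespaces/kube-system/services/kibana-logging/proxy

    On the Management -> Indices page, create an index (roughly equivalent to a database in MySQL): select Index contains time-based events, keep the default logstash-* pattern, and click Create (this step puts a heavy load on the node where it runs). A few minutes after the index is created, the logs aggregated in ElasticSearch logging appear under the Discover menu.

    The high load average shows up as the kswapd0 process consuming excessive CPU; the underlying cause is that the host is short of physical memory.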

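    10. Deploy a private docker registry

    Note: this step is skipped in this deployment; the private registry is deployed with Harbor instead.

    Note: this document describes how to deploy a private registry with the official docker registry v2 image; you can also deploy a Harbor private registry (see "Deploy the harbor private registry").

    This document walks through deploying a private docker registry with TLS encryption, HTTP Basic authentication, and ceph rgw as the storage backend. If you use a different storage backend, start from the "Create the docker registry" section.

    The two example machines have the following IPs:

    • ceph rgw: 172.27.132.66
    • docker registry: 172.27.132.67

    Deploy the ceph RGW node

    $ ceph-deploy rgw create 172.27.132.66 # rgw listens on port 7480 by default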
    $
    

    Create the test account demo

    $ radosgw-admin user create --uid=demo --display-name="ceph rgw demo user"
    $
    

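    Create the swift subuser of the demo account

    The registry currently supports accessing ceph rgw storage only through the swift protocol; the s3 protocol is not supported yet;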

    $ radosgw-admin subuser create --uid demo --subuser=demo:swift --access=full --secret=secretkey --key-type=swift
    $
    

    Create the secret key for the demo:swift subuser

    $ radosgw-admin key create --subuser=demo:swift --key-type=swift --gen-secret
    {
        "user_id": "demo",
        "display_name": "ceph rgw demo user",
        "email": "",
        "suspended": 0,
        "max_buckets": 1000,
        "auid": 0,
        "subusers": [
            {
                "id": "demo:swift",
                "permissions": "full-control"
            }
        ],
        "keys": [
            {
                "user": "demo",
                "access_key": "5Y1B1SIJ2YHKEHO5U36B",
                "secret_key": "nrIvtPqUj7pUlccLYPuR3ntVzIa50DToIpe7xFjT"
            }
        ],
        "swift_keys": [
            {
                "user": "demo:swift",
                "secret_key": "ttQcU1O17DFQ4I9xzKqwgUe7WIYYX99zhcIfU9vb"
            }
        ],
        "caps": [],
        "op_mask": "read, write, delete",
        "default_placement": "",
        "placement_tags": [],
        "bucket_quota": {
            "enabled": false,
            "max_size_kb": -1,
            "max_objects": -1
        },
        "user_quota": {
            "enabled": false,
            "max_size_kb": -1,
            "max_objects": -1
        },
            "temp_url_keys": []
    }
    
    • ttQcU1O17DFQ4I9xzKqwgUe7WIYYX99zhcIfU9vb is the secret key of the demo:swift subuser;

    Create the docker registry

    Create the x509 certificate used by the registry

    $ mkdir -p registry/{auth,certs}
    $ cat > registry-csr.json <<EOF
    {
      "CN": "registry",
      "hosts": [
          "127.0.0.1",
          "172.27.132.67"
      ],
      "key": {
        "algo": "rsa",
        "size": 2048
      },
      "names": [
        {
          "C": "CN",
          "ST": "BeiJing",
          "L": "BeiJing",
          "O": "k8s",
          "OU": "4Paradigm"
        }
      ]
    }
    EOF
    $ cfssl gencert -ca=/etc/kubernetes/cert/ca.pem \
        -ca-key=/etc/kubernetes/cert/ca-key.pem \
        -config=/etc/kubernetes/cert/ca-config.json \
        -profile=kubernetes registry-csr.json | cfssljson -bare registry
    $ cp registry.pem registry-key.pem registry/certs
    $
    
    • The CA certificate and key files created earlier are reused here;
    • The hosts field specifies the NodeIP of the registry;

    Create the HTTP Basic authentication file

    $ docker run --entrypoint htpasswd registry:2 -Bbn foo foo123  > registry/auth/htpasswd
    $ cat  registry/auth/htpasswd
    foo:$2y$05$iZaM45Jxlcg0DJKXZMggLOibAsHLGybyU.CgU9AHqWcVDyBjiScN.
    

    Configure the registry parameters

    export RGW_AUTH_URL="http://172.27.132.66:7480/auth/v1"
    export RGW_USER="demo:swift"
    export RGW_SECRET_KEY="ttQcU1O17DFQ4I9xzKqwgUe7WIYYX99zhcIfU9vb"
    cat > config.yml << EOF
    # https://docs.docker.com/registry/configuration/#list-of-configuration-options
    version: 0.1
    log:
      level: info
      formatter: text
      fields:
        service: registry
    
    storage:
      cache:
        blobdescriptor: inmemory
      delete:
        enabled: true
      swift:
        authurl: ${RGW_AUTH_URL}
        username: ${RGW_USER}
        password: ${RGW_SECRET_KEY}
        container: registry
    
    auth:
      htpasswd:
        realm: basic-realm
        path: /auth/htpasswd
    
    http:
      addr: 0.0.0.0:8000
      headers:
        X-Content-Type-Options: [nosniff]
      tls:
        certificate: /certs/registry.pem
        key: /certs/registry-key.pem
    
    health:
      storagedriver:
        enabled: true
        interval: 10s
        threshold: 3
    EOF
    [k8s@zhangjun-k8s01 cert]$ cp config.yml registry
    [k8s@zhangjun-k8s01 cert]$ scp -r registry 172.27.132.67:/opt/k8s
    
    • storage.swift configures a storage backend that speaks the swift protocol; the values here are the ceph rgw storage parameters;
    • auth.htpasswd specifies the path of the credentials file used for HTTP Basic authentication;
    • http.tls specifies the certificate and key file paths of the registry's HTTPS server;

    Start the docker registry container:

    ssh k8s@172.27.132.67
    $ docker run -d -p 8000:8000 --privileged \
        -v /opt/k8s/registry/auth/:/auth \
        -v /opt/k8s/registry/certs:/certs \
        -v /opt/k8s/registry/config.yml:/etc/docker/registry/config.yml \
        --name registry registry:2
    
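    • The machine on which this docker run command is executed has the IP 172.27.132.67;

    Push an image to the registry

    Copy the CA certificate that signed the registry certificate into the /etc/docker/certs.d/172.27.132.67:8000 directory: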

    [k8s@zhangjun-k8s01 cert]$ sudo mkdir -p /etc/docker/certs.d/172.27.132.67:8000
    [k8s@zhangjun-k8s01 cert]$ sudo cp /etc/kubernetes/cert/ca.pem /etc/docker/certs.d/172.27.132.67:8000/ca.crt
    

    Log in to the private registry:

    $ docker login 172.27.132.67:8000
    Username: foo
    Password:
    Login Succeeded
    

    The login credentials are written to the ~/.docker/config.json file:

    $ cat ~/.docker/config.json
    {
            "auths": {
                    "172.27.132.67:8000": {
                            "auth": "Zm9vOmZvbzEyMw=="
                    }
            }
    }
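
    The auth value is not encrypted: when no credential helper is configured, docker stores the base64 encoding of username:password, so this file should be protected accordingly. A minimal sketch that reproduces the value above (docker_auth_value is a hypothetical helper name):

    ```shell
    # Hypothetical helper: compute the "auth" field docker writes to
    # ~/.docker/config.json, i.e. base64("<user>:<password>").
    docker_auth_value() {
      printf '%s:%s' "$1" "$2" | base64
    }

    docker_auth_value foo foo123   # prints Zm9vOmZvbzEyMw==
    ```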
    

    Tag the local image for the private registry:

    $ docker tag prom/node-exporter:v0.16.0 172.27.132.67:8000/prom/node-exporter:v0.16.0
    $ docker images |grep node-exporter
    prom/node-exporter:v0.16.0                            latest              f9d5de079539        2 years ago         239.8 kB
    172.27.132.67:8000/prom/node-exporter:v0.16.0                        latest              f9d5de079539        2 years ago         239.8 kB
    

    Push the image to the private registry:

    $ docker push 172.27.132.67:8000/prom/node-exporter:v0.16.0
    The push refers to a repository [172.27.132.67:8000/prom/node-exporter:v0.16.0]
    5f70bf18a086: Pushed
    e16a89738269: Pushed
    latest: digest: sha256:9a6b437e896acad3f5a2a8084625fdd4177b2e7124ee943af642259f2f283359 size: 916
    

    Check whether the files of the pushed node-exporter image are now stored in ceph:

    $ rados lspools
    rbd
    cephfs_data
    cephfs_metadata
    .rgw.root
    k8s
    default.rgw.control
    default.rgw.meta
    default.rgw.log
    default.rgw.buckets.index
    default.rgw.buckets.data
    
    $  rados --pool default.rgw.buckets.data ls|grep node-exporter
    1f3f02c4-fe58-4626-992b-c6c0fe4c8acf.34107.1_files/docker/registry/v2/repositories/prom/node-exporter/_layers/sha256/cdb7590af5f064887f3d6008d46be65e929c74250d747813d85199e04fc70463/link
    1f3f02c4-fe58-4626-992b-c6c0fe4c8acf.34107.1_files/docker/registry/v2/repositories/prom/node-exporter/_manifests/revisions/sha256/55302581333c43d540db0e144cf9e7735423117a733cdec27716d87254221086/link
    1f3f02c4-fe58-4626-992b-c6c0fe4c8acf.34107.1_files/docker/registry/v2/repositories/prom/node-exporter/_manifests/tags/v0.16.0/current/link
    1f3f02c4-fe58-4626-992b-c6c0fe4c8acf.34107.1_files/docker/registry/v2/repositories/prom/node-exporter/_manifests/tags/v0.16.0/index/sha256/55302581333c43d540db0e144cf9e7735423117a733cdec27716d87254221086/link
    1f3f02c4-fe58-4626-992b-c6c0fe4c8acf.34107.1_files/docker/registry/v2/repositories/prom/node-exporter/_layers/sha256/224a21997e8ca8514d42eb2ed98b19a7ee2537bce0b3a26b8dff510ab637f15c/link
    1f3f02c4-fe58-4626-992b-c6c0fe4c8acf.34107.1_files/docker/registry/v2/repositories/prom/node-exporter/_layers/sha256/528dda9cf23d0fad80347749d6d06229b9a19903e49b7177d5f4f58736538d4e/link
    1f3f02c4-fe58-4626-992b-c6c0fe4c8acf.34107.1_files/docker/registry/v2/repositories/prom/node-exporter/_layers/sha256/188af75e2de0203eac7c6e982feff45f9c340eaac4c7a0f59129712524fa2984/link
    

    Operating the private registry

    List the images in the private registry

    $ curl  --user foo:foo123 --cacert /etc/docker/certs.d/172.27.132.67:8000/ca.crt https://172.27.132.67:8000/v2/_catalog
    {"repositories":["prom/node-exporter"]}
    

    List the tags of an image

    $  curl  --user foo:foo123 --cacert /etc/docker/certs.d/172.27.132.67:8000/ca.crt https://172.27.132.67:8000/v2/prom/node-exporter/tags/list
    {"name":"prom/node-exporter","tags":["v0.16.0"]}
    

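    Get the digest of an image or layer

    Send a GET request to v2/<repoName>/manifests/<tagName>; the image digest is in the Docker-Content-Digest response header, and the layer digests are in layers[].digest of the response body (fsLayers.blobSum in schema v1 manifests);

    Note: the request must include the header Accept: application/vnd.docker.distribution.manifest.v2+json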

    $ curl -v -H "Accept: application/vnd.docker.distribution.manifest.v2+json" --user foo:foo123 --cacert /etc/docker/certs.d/172.27.132.67:8000/ca.crt https://172.27.132.67:8000/v2/prom/node-exporter/manifests/v0.16.0
    * About to connect() to 172.27.132.67 port 8000 (#0)
    *   Trying 172.27.132.67...
    * Connected to 172.27.132.67 (172.27.132.67) port 8000 (#0)
    * Initializing NSS with certpath: sql:/etc/pki/nssdb
    *   CAfile: /etc/docker/certs.d/172.27.132.67:8000/ca.crt
      CApath: none
    * SSL connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
    * Server certificate:
    *       subject: CN=registry,OU=4Paradigm,O=k8s,L=BeiJing,ST=BeiJing,C=CN
    *       start date: Jul 05 12:52:00 2018 GMT
    *       expire date: Jul 02 12:52:00 2028 GMT
    *       common name: registry
    *       issuer: CN=kubernetes,OU=4Paradigm,O=k8s,L=BeiJing,ST=BeiJing,C=CN
    * Server auth using Basic with user 'foo'
    > GET /v2/prom/node-exporter/manifests/v0.16.0 HTTP/1.1
    > Authorization: Basic Zm9vOmZvbzEyMw==
    > User-Agent: curl/7.29.0
    > Host: 172.27.132.67:8000
    > Accept: application/vnd.docker.distribution.manifest.v2+json
    >
    < HTTP/1.1 200 OK
    < Content-Length: 949
    < Content-Type: application/vnd.docker.distribution.manifest.v2+json
    < Docker-Content-Digest: sha256:55302581333c43d540db0e144cf9e7735423117a733cdec27716d87254221086
    < Docker-Distribution-Api-Version: registry/2.0
    < Etag: "sha256:55302581333c43d540db0e144cf9e7735423117a733cdec27716d87254221086"
    < X-Content-Type-Options: nosniff
    < Date: Fri, 06 Jul 2018 06:18:41 GMT
    <
    {
       "schemaVersion": 2,
       "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
       "config": {
          "mediaType": "application/vnd.docker.container.image.v1+json",
          "size": 3511,
          "digest": "sha256:188af75e2de0203eac7c6e982feff45f9c340eaac4c7a0f59129712524fa2984"
       },
       "layers": [
          {
             "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
             "size": 2392417,
             "digest": "sha256:224a21997e8ca8514d42eb2ed98b19a7ee2537bce0b3a26b8dff510ab637f15c"
          },
          {
             "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
             "size": 560703,
             "digest": "sha256:cdb7590af5f064887f3d6008d46be65e929c74250d747813d85199e04fc70463"
          },
          {
             "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
             "size": 5332460,
             "digest": "sha256:528dda9cf23d0fad80347749d6d06229b9a19903e49b7177d5f4f58736538d4e"
          }
       ]
    }
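
    When scripting the later delete step, the digest can be pulled out of the response headers rather than copied by hand. A minimal sketch, assuming the headers were saved to a file or pipe (for example with curl's -D option); extract_digest is a hypothetical helper, and the sample input reuses the header shown above:

    ```shell
    # Hypothetical helper: read HTTP response headers on stdin and print the value
    # of the Docker-Content-Digest header (header names are case-insensitive).
    extract_digest() {
      tr -d '\r' | awk 'tolower($1) == "docker-content-digest:" { print $2 }'
    }

    # Sample headers taken from the response above.
    printf 'HTTP/1.1 200 OK\r\nDocker-Content-Digest: sha256:55302581333c43d540db0e144cf9e7735423117a733cdec27716d87254221086\r\n\r\n' | extract_digest
    ```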
    

    Delete an image

    Send a DELETE request to /v2/<name>/manifests/<reference>, where reference is the Docker-Content-Digest value returned in the previous step:

    $ curl -X DELETE  --user foo:foo123 --cacert /etc/docker/certs.d/172.27.132.67:8000/ca.crt https://172.27.132.67:8000/v2/prom/node-exporter/manifests/sha256:68effe31a4ae8312e47f54bec52d1fc925908009ce7e6f734e1b54a4169081c5
    $
    

    Delete a layer

    Send a DELETE request to /v2/<name>/blobs/<digest>, where digest is a layer digest returned in the previous step (layers[].digest, or fsLayers.blobSum in schema v1 manifests):

    $ curl -X DELETE  --user foo:foo123 --cacert /etc/docker/certs.d/172.27.132.67:8000/ca.crt https://172.27.132.67:8000/v2/prom/node-exporter/blobs/sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4
    $ curl -X DELETE  --user foo:foo123 --cacert /etc/docker/certs.d/172.27.132.67:8000/ca.crt https://172.27.132.67:8000/v2/prom/node-exporter/blobs/sha256:04176c8b224aa0eb9942af765f66dae866f436e75acef028fe44b8a98e045515
    $
    

    Common problems

    Login fails with 416

    Running the s3test.py program from http://docs.ceph.com/docs/master/install/install-ceph-gateway/ fails:

    [k8s@zhangjun-k8s01 cert]$ python s3test.py
    Traceback (most recent call last):
      File "s3test.py", line 12, in <module>
        bucket = conn.create_bucket('my-new-bucket')
      File "/usr/lib/python2.7/site-packages/boto/s3/connection.py", line 625, in create_bucket
        response.status, response.reason, body)
    boto.exception.S3ResponseError: S3ResponseError: 416 Requested Range Not Satisfiable

    Solution:

    1. Edit ceph.conf on the admin node;
    2. Push the configuration to all nodes: ceph-deploy config push zhangjun-k8s01 zhangjun-k8s02 zhangjun-k8s03
    3. Restart the ceph daemons:

      systemctl restart 'ceph-mds@zhangjun-k8s03.service'
      systemctl restart ceph-osd@0
      systemctl restart 'ceph-mon@zhangjun-k8s01.service'
      systemctl restart 'ceph-mgr@zhangjun-k8s01.service'

    The relevant ceph.conf change, quoted from the ceph issue tracker:

    For anyone who is hitting this issue set default pg_num and pgp_num to lower value(8 for example), or set mon_max_pg_per_osd to a high value in ceph.conf. radosgw-admin doesn't throw proper error when internal pool creation fails, hence the upper level error which is very confusing.

    https://tracker.ceph.com/issues/21497

    Login fails with 503

    [root@zhangjun-k8s01 ~]# docker login 172.27.132.67:8000
    Username: foo
    Password:
    Error response from daemon: login attempt to https://172.27.132.67:8000/v2/ failed with status: 503 Service Unavailable

    Cause: the docker run command was missing the --privileged flag.

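    11. Deploy the harbor private registry

    This document describes how to deploy a harbor private registry with docker-compose; you can also deploy a private registry with the official docker registry image (see "Deploy a private docker registry").

    Variables used

    The variables used in this document are defined as follows:

    # This variable is used later. 10.64.3.7 is only an example value; set it to the IP of the node where harbor is being deployed.
    
    export NODE_IP=10.64.3.7 # IP of the node where harbor is being deployed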
    
    

    Download the files

    Download the latest docker-compose binary from the docker compose releases page:

    cd /opt/k8s/work
    wget https://github.com/docker/compose/releases/download/1.21.2/docker-compose-Linux-x86_64
    mv docker-compose-Linux-x86_64 /opt/k8s/bin/docker-compose
    chmod a+x  /opt/k8s/bin/docker-compose
    export PATH=/opt/k8s/bin:$PATH
    
    

    Download the latest harbor offline installer from the harbor releases page:

    cd /opt/k8s/work
    wget --continue https://storage.googleapis.com/harbor-releases/release-1.5.0/harbor-offline-installer-v1.5.1.tgz
    tar -xzvf harbor-offline-installer-v1.5.1.tgz
    
    

    Import the docker images

    Import the harbor docker images bundled in the offline installer:

    cd harbor
    docker load -i harbor.v1.5.1.tar.gz
    
    

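    Create the x509 certificate used by the harbor nginx server

    Create the harbor certificate signing request:

    cd /opt/k8s/work
    # If NODE_IP was not exported earlier, write the node IP literally in the hosts list below.
    cat > harbor-csr.json <<EOF
    {
      "CN": "harbor",
      "hosts": [
        "127.0.0.1",
        "${NODE_IP}"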
      ],
      "key": {
        "algo": "rsa",
        "size": 2048
      },
      "names": [
        {
          "C": "CN",
          "ST": "BeiJing",
          "L": "BeiJing",
          "O": "k8s",
          "OU": "4Paradigm"
        }
      ]
    }
    EOF
    
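    • The hosts field lists the IP of the deployment node authorized to use this certificate; if harbor will later be accessed by domain name, add the domain name as well;

    Generate the harbor certificate and private key:

    cd /opt/k8s/work
    cfssl gencert -ca=/etc/kubernetes/cert/ca.pem \
      -ca-key=/etc/kubernetes/cert/ca-key.pem \
      -config=/etc/kubernetes/cert/ca-config.json \
      -profile=kubernetes harbor-csr.json | cfssljson -bare harbor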
    
    ls harbor*
    harbor.csr  harbor-csr.json  harbor-key.pem harbor.pem
    
    mkdir -p /etc/harbor/ssl
    cp harbor*.pem /etc/harbor/ssl
    
    

    Modify the harbor.cfg file

    cd /opt/k8s/work/harbor
    cp harbor.cfg{,.bak} # back up the configuration file
    vim harbor.cfg
        hostname = 172.27.129.81
        ui_url_protocol = https
        ssl_cert =  /etc/harbor/ssl/harbor.pem
        ssl_cert_key = /etc/harbor/ssl/harbor-key.pem
    
    
    cp prepare{,.bak}
    vim prepare
    
    Change empty_subj = "/C=/ST=/L=/O=/CN=/" to empty_subj = "/"
    
    • The empty_subj parameter in the prepare script must be changed; otherwise the later install step exits with the error:

      Fail to generate key file: ./common/config/ui/private_key.pem, cert file: ./common/config/registry/root.crt

    Reference: https://github.com/vmware/harbor/issues/2920
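
    The same edit can be made non-interactively. A minimal sketch, assuming the prepare script contains the empty_subj line exactly as quoted above; fix_empty_subj is a hypothetical helper and the demo runs on a scratch file rather than the real prepare script:

    ```shell
    # Hypothetical helper: replace the empty_subj value in the given file.
    # A temp file is used instead of `sed -i` so it works with GNU and BSD sed,
    # and `cat >` keeps the original file's permissions (prepare is executable).
    fix_empty_subj() {
      script="$1"
      tmp="$(mktemp)"
      sed 's#empty_subj = "/C=/ST=/L=/O=/CN=/"#empty_subj = "/"#' "$script" > "$tmp" &&
        cat "$tmp" > "$script"
      rm -f "$tmp"
    }

    # Demo on a scratch file containing only the affected line.
    demo="$(mktemp)"
    printf 'empty_subj = "/C=/ST=/L=/O=/CN=/"\n' > "$demo"
    fix_empty_subj "$demo"
    cat "$demo"
    ```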

    Load and start the harbor images

    cd /opt/k8s/work/harbor
    mkdir -p /data # holds log-related data; consider moving it to another path later
    ./install.sh
    
    [Step 0]: checking installation environment ...
    
    Note: docker version: 18.03.0
    
    Note: docker-compose version: 1.21.2
    
    [Step 1]: loading Harbor images ...
    Loaded image: vmware/clair-photon:v2.0.1-v1.5.1
    Loaded image: vmware/postgresql-photon:v1.5.1
    Loaded image: vmware/harbor-adminserver:v1.5.1
    Loaded image: vmware/registry-photon:v2.6.2-v1.5.1
    Loaded image: vmware/photon:1.0
    Loaded image: vmware/harbor-migrator:v1.5.1
    Loaded image: vmware/harbor-ui:v1.5.1
    Loaded image: vmware/redis-photon:v1.5.1
    Loaded image: vmware/nginx-photon:v1.5.1
    Loaded image: vmware/mariadb-photon:v1.5.1
    Loaded image: vmware/notary-signer-photon:v0.5.1-v1.5.1
    Loaded image: vmware/harbor-log:v1.5.1
    Loaded image: vmware/harbor-db:v1.5.1
    Loaded image: vmware/harbor-jobservice:v1.5.1
    Loaded image: vmware/notary-server-photon:v0.5.1-v1.5.1
    
    
    [Step 2]: preparing environment ...
    loaded secret from file: /data/secretkey
    Generated configuration file: ./common/config/nginx/nginx.conf
    Generated configuration file: ./common/config/adminserver/env
    Generated configuration file: ./common/config/ui/env
    Generated configuration file: ./common/config/registry/config.yml
    Generated configuration file: ./common/config/db/env
    Generated configuration file: ./common/config/jobservice/env
    Generated configuration file: ./common/config/jobservice/config.yml
    Generated configuration file: ./common/config/log/logrotate.conf
    Generated configuration file: ./common/config/jobservice/config.yml
    Generated configuration file: ./common/config/ui/app.conf
    Generated certificate, key file: ./common/config/ui/private_key.pem, cert file: ./common/config/registry/root.crt
    The configuration files are ready, please use docker-compose to start the service.
    
    
    [Step 3]: checking existing instance of Harbor ...
    
    
    [Step 4]: starting Harbor ...
    Creating network "harbor_harbor" with the default driver
    Creating harbor-log ... done
    Creating redis              ... done
    Creating harbor-adminserver ... done
    Creating harbor-db          ... done
    Creating registry           ... done
    Creating harbor-ui          ... done
    Creating harbor-jobservice  ... done
    Creating nginx              ... done
    
    ✔ ----Harbor has been installed and started successfully.----
    
    Now you should be able to visit the admin portal at https://192.168.75.110. 
    For more details, please visit https://github.com/vmware/harbor .
    

    Access the management UI

    Confirm that all components are working properly:

    [root@kube-node1 harbor]# docker-compose  ps
           Name                     Command                  State                                    Ports                              
    -------------------------------------------------------------------------------------------------------------------------------------
    harbor-adminserver   /harbor/start.sh                 Up (healthy)                                                                   
    harbor-db            /usr/local/bin/docker-entr ...   Up (healthy)   3306/tcp                                                        
    harbor-jobservice    /harbor/start.sh                 Up                                                                             
    harbor-log           /bin/sh -c /usr/local/bin/ ...   Up (healthy)   127.0.0.1:1514->10514/tcp                                       
    harbor-ui            /harbor/start.sh                 Up (healthy)                                                                   
    nginx                nginx -g daemon off;             Up (healthy)   0.0.0.0:443->443/tcp, 0.0.0.0:4443->4443/tcp, 0.0.0.0:80->80/tcp
    redis                docker-entrypoint.sh redis ...   Up             6379/tcp                                                        
    registry             /entrypoint.sh serve /etc/ ...   Up (healthy)   5000/tcp
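
Reading the State column by eye gets tedious when the check is repeated. The sketch below filters the table for rows that are not Up. It assumes the layout shown above (two header lines, then one row per service); not_up_services is a hypothetical name, and the check does not distinguish "Up (healthy)" from other Up states.

```shell
# Hypothetical helper: read `docker-compose ps` output on stdin and print the
# name of every service whose row does not contain the word "Up".
# Exit status is 0 when all rows are Up, 1 otherwise.
not_up_services() {
  awk 'NR > 2 && NF > 0 && $0 !~ / Up( |$)/ { print $1; bad = 1 }
       END { exit bad + 0 }'
}

# Usage (from the harbor directory):
#   docker-compose ps | not_up_services && echo "all Harbor services are Up"
```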
    

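    Open https://192.168.75.110 in a browser.

    Log in with the account admin and the default password Harbor12345 from the harbor.cfg configuration file.

    Files and directories created by Harbor at runtime

    Harbor writes its logs to directories under /var/log/harbor, so docker logs XXX and docker-compose logs XXX do not show the container logs.

    # log directory
    ls /var/log/harbor
    adminserver.log  jobservice.log  mysql.log  proxy.log  registry.log  ui.log
    # data directory, including the database and the image registry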
    ls /data/
    ca_download  config  database  job_logs registry  secretkey
    

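    Change the default data directory and related paths

    
    # change the path where the secret key is stored
    vim harbor.cfg
    #The path of secretkey storage
    secretkey_path = /data/harbor-data # default: /data
    
    # change the host path of every volume that defaults to "/data"
    vim docker-compose.yml
    
    # after making these changes, redeploy the containers:
    ./prepare
    docker-compose up -d
    
    # Note: during deployment, do not manually modify anything under these mounted paths. If a change is needed, make it only after the containers have been completely removed (docker-compose down).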
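
Editing every volume entry by hand is error-prone. The following sketch rewrites the host side of each volume that starts with /data/. It assumes GNU sed and that the entries look like "- /data/<subdir>:/<container-path>:z"; move_data_volumes is a hypothetical name, and the result should be reviewed against the .bak copy before redeploying.

```shell
# Hypothetical helper: prefix every host-side "/data/" volume path in a
# compose file with a new base directory. Run it once only; a second run
# would prepend the base directory again.
move_data_volumes() {
  local compose=$1 new_base=$2
  cp "$compose" "$compose.bak"
  # Anchored at the list marker, so container-side paths are not changed.
  sed -i "s|^\([[:space:]]*- \)/data/|\1${new_base%/}/|" "$compose"
}

# Usage (with Harbor stopped):
#   move_data_volumes docker-compose.yml /data/harbor-data
#   diff docker-compose.yml.bak docker-compose.yml
```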
    

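    Docker client login

    Copy the CA certificate that signed the Harbor certificate into the designated directory on the client. Assume Harbor runs on host 192.168.75.110 and host 192.168.75.111 wants to log in to it remotely.

    # On 192.168.75.111, create the directory that holds the registry's CA certificate. The directory name must match the registry address: use the IP if the registry is addressed by IP, or the domain name if it is addressed by domain name.
    mkdir -p /etc/docker/certs.d/192.168.75.110
    
    # On 192.168.75.110, copy the CA certificate into the directory created in the previous step on the client, renaming it to ca.crt
    scp /etc/kubernetes/cert/ca.pem root@192.168.75.111:/etc/docker/certs.d/192.168.75.110/ca.crt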
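
When more than one client needs the certificate, the two steps can be wrapped in a loop. This is a sketch that assumes root SSH access to each client; distribute_registry_ca and the DRY_RUN variable are hypothetical. With DRY_RUN=1 (the default here) it only prints the commands.

```shell
# Hypothetical helper: install one registry's CA certificate on a list of
# Docker clients. Arguments: registry address, CA file, then client hosts.
distribute_registry_ca() {
  local registry=$1 ca_file=$2 client run
  shift 2
  run=""
  # In dry-run mode every command is echoed instead of executed.
  if [ "${DRY_RUN:-1}" = "1" ]; then run="echo"; fi
  for client in "$@"; do
    $run ssh "root@${client}" "mkdir -p /etc/docker/certs.d/${registry}"
    $run scp "$ca_file" "root@${client}:/etc/docker/certs.d/${registry}/ca.crt"
  done
}

# Print the commands for the client used in this section.
DRY_RUN=1 distribute_registry_ca 192.168.75.110 /etc/kubernetes/cert/ca.pem 192.168.75.111
```

Setting DRY_RUN=0 runs the same commands for real.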
    

    Log in to Harbor

    # docker login https://192.168.75.110
    Username: admin
    Password: Harbor12345 # default password
    

    Credentials are saved automatically to ~/.docker/config.json.
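
The saved credentials are encoded, not encrypted: when no credential helper is configured, Docker stores an "auth" value that is the base64 encoding of "user:password". The sketch below shows the round trip using the default account from this section.

```shell
# Encode the credentials the way they appear in the "auth" field.
auth=$(printf '%s' 'admin:Harbor12345' | base64)

# Anyone who can read the file can reverse the encoding.
decoded=$(printf '%s' "$auth" | base64 -d)
echo "$decoded"

# To remove the saved entry again:
#   docker logout https://192.168.75.110
```

For that reason it is worth changing the default admin password after the first login.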

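    Other operations

    The working directory for the following operations is the harbor directory created by unpacking the offline installer.

    # These steps are used when changing where registry images and log files are stored; see the earlier steps for changing the default data directory
    
    # stop harbor
    docker-compose down -v
    
    # edit the configuration
    vim harbor.cfg
    
    # apply the modified configuration to the docker-compose.yml file
    ./prepare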
    Clearing the configuration file: ./common/config/ui/app.conf
    Clearing the configuration file: ./common/config/ui/env
    Clearing the configuration file: ./common/config/ui/private_key.pem
    Clearing the configuration file: ./common/config/db/env
    Clearing the configuration file: ./common/config/registry/root.crt
    Clearing the configuration file: ./common/config/registry/config.yml
    Clearing the configuration file: ./common/config/jobservice/app.conf
    Clearing the configuration file: ./common/config/jobservice/env
    Clearing the configuration file: ./common/config/nginx/cert/admin.pem
    Clearing the configuration file: ./common/config/nginx/cert/admin-key.pem
    Clearing the configuration file: ./common/config/nginx/nginx.conf
    Clearing the configuration file: ./common/config/adminserver/env
    loaded secret from file: /data/secretkey
    Generated configuration file: ./common/config/nginx/nginx.conf
    Generated configuration file: ./common/config/adminserver/env
    Generated configuration file: ./common/config/ui/env
    Generated configuration file: ./common/config/registry/config.yml
    Generated configuration file: ./common/config/db/env
    Generated configuration file: ./common/config/jobservice/env
    Generated configuration file: ./common/config/jobservice/app.conf
    Generated configuration file: ./common/config/ui/app.conf
    Generated certificate, key file: ./common/config/ui/private_key.pem, cert file: ./common/config/registry/root.crt
    The configuration files are ready, please use docker-compose to start the service.
    
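    chmod -R a+rX common ## make sure container processes can read the generated configuration (X keeps directories traversable, which mode 666 would not)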
    # start harbor
    docker-compose up -d
    

    12. Clean up the cluster

    Clean up worker nodes

    Stop the related processes:

    systemctl stop kubelet kube-proxy flanneld docker kube-nginx
    

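    Clean up files:

    source /opt/k8s/bin/environment.sh
    # unmount the directories mounted by kubelet and docker (-r skips umount when nothing is mounted)
    mount | grep "${K8S_DIR}" | awk '{print $3}' | xargs -r umount
    # remove the kubelet working directory
    rm -rf ${K8S_DIR}/kubelet
    # remove the docker working directory
    rm -rf ${DOCKER_DIR}
    # remove the network configuration file written by flanneld
    rm -rf /var/run/flannel/
    # remove docker runtime files
    rm -rf /var/run/docker/
    # remove the systemd unit files
    rm -rf /etc/systemd/system/{kubelet,kube-proxy,docker,flanneld,kube-nginx}.service
    # remove the program files (this also deletes /opt/k8s/bin/environment.sh, which the etcd cleanup below sources; on machines that also host etcd, run that cleanup first)
    rm -rf /opt/k8s/bin/*
    # remove the certificate files
    rm -rf /etc/flanneld/cert /etc/kubernetes/cert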
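
Several of these commands build the path passed to rm -rf from variables defined in environment.sh. If that file failed to load, an empty K8S_DIR would turn the kubelet path into /kubelet. The sketch below is one possible guard; require_vars is a hypothetical helper, not part of the original scripts.

```shell
# Hypothetical guard: succeed only if every named variable is set and non-empty.
require_vars() {
  local name value
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "refusing to continue: $name is empty or unset" >&2
      return 1
    fi
  done
}

# Usage:
#   source /opt/k8s/bin/environment.sh
#   require_vars K8S_DIR DOCKER_DIR && rm -rf "${K8S_DIR}/kubelet" "${DOCKER_DIR}"
```

The same check applies to ETCD_DATA_DIR and ETCD_WAL_DIR before the etcd directories are removed.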
    
    

    Clean up the iptables rules created by kube-proxy and docker:

    iptables -F && iptables -X && iptables -F -t nat && iptables -X -t nat
    

    Delete the network devices created by flanneld and docker:

    ip link del flannel.1
    ip link del docker0
    

    Clean up master nodes

    Stop the related processes:

    systemctl stop kube-apiserver kube-controller-manager kube-scheduler kube-nginx
    

    Clean up files:

    # remove the systemd unit files
    rm -rf /etc/systemd/system/{kube-apiserver,kube-controller-manager,kube-scheduler,kube-nginx}.service
    # remove the program files
    rm -rf /opt/k8s/bin/{kube-apiserver,kube-controller-manager,kube-scheduler}
    # remove the certificate files
    rm -rf /etc/flanneld/cert /etc/kubernetes/cert
    

    Clean up the etcd cluster

    Stop the related processes:

    systemctl stop etcd
    

    Clean up files:

    source /opt/k8s/bin/environment.sh
    # remove the etcd data directory and WAL directory
    rm -rf ${ETCD_DATA_DIR} ${ETCD_WAL_DIR}
    # remove the systemd unit file
    rm -rf /etc/systemd/system/etcd.service
    # remove the program file
    rm -rf /opt/k8s/bin/etcd
    # remove the x509 certificate files
    rm -rf /etc/etcd/cert/*
    

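    A. Accessing the kube-apiserver secure port from a browser

    When a browser accesses the kube-apiserver secure port 6443, it reports that the certificate is not trusted:

    [Screenshot: ssl-failed]

    This is because the kube-apiserver server certificate is signed by the root certificate ca.pem that we created. Import ca.pem into the operating system and mark it as permanently trusted.

    On Mac, do the following:

    [Screenshot: keychain]

    On Windows, import ca.pem with the following command (keytool ships with the JDK and writes to the keystore file named by -keystore):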

    keytool -import -v -trustcacerts -alias appmanagement -file "PATH...\ca.pem" -storepass password -keystore cacerts
    

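    Access the apiserver address again. The certificate is now trusted, but the server returns 401 (unauthorized):

    [Screenshot: ssl-success]

    Note: start from this step.

    We need to generate a client certificate for the browser to use when accessing the apiserver's https port 6443.

    Use the admin certificate and private key created when deploying the kubectl command-line tool, together with the CA certificate above, to create a PKCS#12/PFX certificate that the browser can use: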

    $ openssl pkcs12 -export -out admin.pfx -inkey admin-key.pem -in admin.pem -certfile ca.pem
    
    # press Enter at each password prompt without typing a password
    
    # on Windows, simply import the generated certificate through the browser's settings
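
The export can also be run without prompts and then checked by reading the certificate back out of the bundle. The sketch below demonstrates this on a throwaway self-signed certificate so that it does not touch the cluster's real files; the CN demo-admin and the function name demo_pfx are placeholders. With the real files, the same -passout pass: option would be added to the command above.

```shell
# Throwaway demo: create a self-signed certificate, export it to PFX with an
# empty password, then print the subject of the certificate inside the bundle.
demo_pfx() {
  local dir subject
  dir=$(mktemp -d) || return 1
  openssl req -x509 -newkey rsa:2048 -nodes -days 1 -subj "/CN=demo-admin" \
    -keyout "$dir/admin-key.pem" -out "$dir/admin.pem" 2>/dev/null || return 1
  # "-passout pass:" supplies the empty export password non-interactively.
  openssl pkcs12 -export -out "$dir/admin.pfx" -passout pass: \
    -inkey "$dir/admin-key.pem" -in "$dir/admin.pem" || return 1
  # "-nokeys" prints only certificates; x509 then shows the subject.
  subject=$(openssl pkcs12 -in "$dir/admin.pfx" -passin pass: -nokeys 2>/dev/null \
    | openssl x509 -noout -subject) || { rm -rf "$dir"; return 1; }
  rm -rf "$dir"
  printf '%s\n' "$subject"
}

# Run the demo only where openssl is installed.
if command -v openssl >/dev/null 2>&1; then demo_pfx; fi
```

An empty export password is convenient for a test cluster; for anything longer-lived, a real password protects the private key stored in the bundle.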
    

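    Import the generated admin.pfx into the system certificate store. On Mac, do the following:

    [Screenshot: admin-cert]

    Restart the browser and access the apiserver address again. The browser prompts for a certificate; select the admin.pfx imported above:

    [Screenshot: select-cert]

    This time, access to the kube-apiserver secure port is authorized:

    [Screenshot: chrome-authored]

    How the client selects a certificate

    1. Certificate selection is negotiated during the SSL/TLS handshake between client and server;
    2. If the server requires a client certificate, it sends the client a list of the CAs it accepts during the handshake;
    3. The client searches its certificate store (usually the operating system's; on Mac, the keychain) for certificates signed by one of those CAs and, if any exist, offers them to the user to choose from (together with their private keys);
    4. The user selects a certificate and its private key, and the client uses them to communicate with the server;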


    B. Verifying certificates

    Taking the kubernetes certificate (generated later when deploying the master nodes) as an example:

    Using the openssl command

    $ openssl x509  -noout -text -in  kubernetes.pem
    ...
        Signature Algorithm: sha256WithRSAEncryption
            Issuer: C=CN, ST=BeiJing, L=BeiJing, O=k8s, OU=System, CN=Kubernetes
            Validity
                Not Before: Apr  5 05:36:00 2017 GMT
                Not After : Apr  5 05:36:00 2018 GMT
            Subject: C=CN, ST=BeiJing, L=BeiJing, O=k8s, OU=System, CN=kubernetes
    ...
            X509v3 extensions:
                X509v3 Key Usage: critical
                    Digital Signature, Key Encipherment
                X509v3 Extended Key Usage:
                    TLS Web Server Authentication, TLS Web Client Authentication
                X509v3 Basic Constraints: critical
                    CA:FALSE
                X509v3 Subject Key Identifier:
                    DD:52:04:43:10:13:A9:29:24:17:3A:0E:D7:14:DB:36:F8:6C:E0:E0
                X509v3 Authority Key Identifier:
                    keyid:44:04:3B:60:BD:69:78:14:68:AF:A0:41:13:F6:17:07:13:63:58:CD
    
                X509v3 Subject Alternative Name:
                    DNS:kubernetes, DNS:kubernetes.default, DNS:kubernetes.default.svc, DNS:kubernetes.default.svc.cluster, DNS:kubernetes.default.svc.cluster.local, IP Address:127.0.0.1, IP Address:10.64.3.7, IP Address:10.254.0.1
    ...
    
    • Confirm that the Issuer field matches ca-csr.json;
    • Confirm that the Subject field matches kubernetes-csr.json;
    • Confirm that the X509v3 Subject Alternative Name field matches kubernetes-csr.json;
    • Confirm that the X509v3 Key Usage and Extended Key Usage fields match the kubernetes profile in ca-config.json;
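
Besides comparing fields by eye, openssl can check the signature chain and the remaining validity directly. The following is a minimal sketch; check_cert is a hypothetical helper, and the 30-day default threshold is an arbitrary choice.

```shell
# Hypothetical helper: verify that a certificate chains to the given CA and
# will not expire within the given number of days (default 30).
check_cert() {
  local ca=$1 cert=$2 days=${3:-30}
  # Fails if the certificate was not signed by this CA or is already expired.
  openssl verify -CAfile "$ca" "$cert" || return 1
  # -checkend N exits non-zero if the certificate expires within N seconds.
  if openssl x509 -noout -checkend "$((days * 86400))" -in "$cert" >/dev/null; then
    echo "$cert: valid for at least $days more days"
  else
    echo "$cert: expires within $days days" >&2
    return 1
  fi
  # Print the fields discussed above for a final manual comparison.
  openssl x509 -noout -subject -issuer -dates -in "$cert"
}

# Usage:
#   check_cert /etc/kubernetes/cert/ca.pem kubernetes.pem
```

The sample output above shows a one-year validity window, so a check like this is worth repeating periodically.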

    Using the cfssl-certinfo command

    $ cfssl-certinfo -cert kubernetes.pem
    ...
    {
      "subject": {
        "common_name": "kubernetes",
        "country": "CN",
        "organization": "k8s",
        "organizational_unit": "System",
        "locality": "BeiJing",
        "province": "BeiJing",
        "names": [
          "CN",
          "BeiJing",
          "BeiJing",
          "k8s",
          "System",
          "kubernetes"
        ]
      },
      "issuer": {
        "common_name": "Kubernetes",
        "country": "CN",
        "organization": "k8s",
        "organizational_unit": "System",
        "locality": "BeiJing",
        "province": "BeiJing",
        "names": [
          "CN",
          "BeiJing",
          "BeiJing",
          "k8s",
          "System",
          "Kubernetes"
        ]
      },
      "serial_number": "174360492872423263473151971632292895707129022309",
      "sans": [
        "kubernetes",
        "kubernetes.default",
        "kubernetes.default.svc",
        "kubernetes.default.svc.cluster",
        "kubernetes.default.svc.cluster.local",
        "127.0.0.1",
        "10.64.3.7",
        "10.64.3.8",
        "10.66.3.86",
        "10.254.0.1"
      ],
      "not_before": "2017-04-05T05:36:00Z",
      "not_after": "2018-04-05T05:36:00Z",
      "sigalg": "SHA256WithRSA",
    ...
    


  • Original article: https://www.cnblogs.com/sanduzxcvbnm/p/11835010.html