• Introduction to Kubernetes network components


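    Source: https://blog.csdn.net/kjh2007abc/article/details/86751730

    The k8s network model assumes that all Pods live in one flat network space in which they can reach each other directly. The assumption comes from the project's origins at Google: GCE provides this network model as part of its infrastructure, so k8s takes the network for granted. When you build a k8s cluster on your own private platform, that assumption no longer holds. You have to implement the network yourself: first make Docker containers on different Nodes reachable from one another, and then run k8s on top.

    Several open-source components already support this container network model. This section introduces a few common ones and how to install and configure them: Flannel, Open vSwitch, direct routing, and Calico.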

    1. Flannel
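    1.1 How Flannel communicates
    Flannel can provide the underlying network that k8s depends on because it does two things:
    (1) It helps k8s assign non-conflicting IP addresses to the Docker containers on every Node.
    (2) It builds an overlay network between these IP addresses and uses it to deliver packets, unmodified, to the target container.

    The following figure shows how Flannel achieves both.

    As the figure shows, Flannel first creates a virtual network interface named flannel0 (with the UDP backend this is a point-to-point TUN device rather than a Linux bridge). One end of it connects to the docker0 bridge, and the other end connects to a service process called flanneld.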

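    The flanneld process plays the central role:
    First, flanneld connects to etcd. It uses etcd to manage the pool of assignable IP address ranges, watches etcd for the actual address of each Pod, and builds an in-memory routing table that maps Pods to Nodes.
    Second, flanneld sits between docker0 and the physical network. Using the in-memory routing table, it wraps the packets that docker0 hands to it and sends them over the physical network to the target flanneld, which completes direct Pod-to-Pod communication by address.
    Flannel supports several underlying transports between flanneld instances, including UDP, VxLAN, and AWS VPC; any of them works as long as the peer flanneld is reachable. The source flanneld encapsulates the packet and the target flanneld decapsulates it, so docker0 only ever sees the original data and the presence of Flannel in between is transparent. UDP is the commonly used option here.

    How does Flannel assign IP addresses to Pods on different Nodes without conflicts? It keeps the address information in a central etcd service, so every address range it hands out comes from the same shared pool, which makes coordination straightforward and avoids conflicts. Once Flannel has assigned a range, Docker takes over: Flannel passes the assigned range to Docker by modifying the Docker startup parameters.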
    --bip=172.17.18.1/24

    With this in place, Flannel controls the address range of docker0 on every Node, which guarantees that all Pod IP addresses sit in the same flat network without conflicts.
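
    The allocation idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Flannel's actual algorithm: the helper names allocate_subnets and bip_option are made up, subnets are handed out sequentially from 172.16.0.0/16, and the etcd-backed pool is replaced by an in-process generator.

```python
import ipaddress


def allocate_subnets(network, nodes, prefixlen=24):
    """Give each node the next unused /prefixlen subnet of `network`."""
    pool = ipaddress.ip_network(network).subnets(new_prefix=prefixlen)
    leases = {}
    for node in nodes:
        # Every subnet is taken from the same shared pool exactly once,
        # so two nodes can never receive overlapping ranges.
        leases[node] = next(pool)
    return leases


def bip_option(subnet):
    """Build a Docker --bip value: first host address plus prefix length."""
    gateway = next(iter(subnet.hosts()))
    return f"--bip={gateway}/{subnet.prefixlen}"


leases = allocate_subnets("172.16.0.0/16", ["10.0.2.4", "10.0.2.5"])
for node, subnet in leases.items():
    print(node, subnet, bip_option(subnet))
```

    The central point is that one shared pool decides every lease; the value passed to Docker is then derived mechanically from the leased subnet.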

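    Flannel meets the networking needs of k8s well, but it introduces several extra network components. Traffic has to pass through the flannel0 interface and then through the user-space flanneld program, and the reverse path is taken on the peer side, so some additional network latency is introduced.

    In addition, the Flannel model uses UDP as its underlying transport by default. Because UDP itself is unreliable, high-traffic, high-concurrency workloads should be tested repeatedly to make sure no problems appear.

    1.2 Installing and configuring Flannel
    1) Install etcd
    Flannel uses etcd as its database, so etcd must be installed beforehand; that is not covered here.

    2) Install Flannel
    Flannel has to be installed on every Node. It can be downloaded from https://github.com/coreos/flannel/releases . Unpack the downloaded flannel-<version>-linux-amd64.tar.gz and copy the binaries flanneld and mk-docker-opts.sh to /usr/bin to complete the installation.

    3) Configure Flannel
    The flanneld service is configured here for a system that uses systemd.
    Edit the service unit file /usr/lib/systemd/system/flanneld.service: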
    [root@k8s-node1 sysconfig]# more /usr/lib/systemd/system/flanneld.service
    [Unit]
    Description=flanneld overlay address etcd agent
    After=network.target
    Before=docker.service

    [Service]
    Type=notify
    EnvironmentFile=/etc/sysconfig/flannel
    ExecStart=/usr/bin/flanneld -etcd-endpoints=http://10.0.2.15:2379 $FLANNEL_OPTIONS

    [Install]
    RequiredBy=docker.service
    WantedBy=multi-user.target

    Edit the configuration file /etc/sysconfig/flannel and set the etcd URL:
    [root@k8s-node2 sysconfig]# more flannel
    # flanneld configuration options
    # etcd url location. Point this to the server where etcd runs
    FLANNEL_ETCD="http://10.0.2.15:2379"

    # etcd config key. This is the configuration key that flannel queries
    # For address range assignment
    FLANNEL_ETCD_KEY="/coreos.com/network"

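    Before starting the flanneld service, add a network configuration record to etcd. flanneld uses this record to assign a virtual IP address range to the Docker daemon on each Node.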
    [root@k8s-master ~]# etcdctl set /coreos.com/network/config '{ "Network": "172.16.0.0/16" }'
    { "Network": "172.16.0.0/16" }
    Because Flannel takes over the docker0 bridge, stop the Docker service first if it is already running.

    4) Start the Flannel service
    systemctl daemon-reload
    systemctl restart flanneld

    5) Restart the Docker service
    systemctl daemon-reload
    systemctl restart docker

    6) Set the IP address of the docker0 bridge
    mk-docker-opts.sh -i
    source /run/flannel/subnet.env
    ifconfig docker0 ${FLANNEL_SUBNET}

    When this is done, confirm that the IP address of the docker0 interface belongs to the flannel0 subnet:
    [root@k8s-node1 system]# ip a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
        link/ether 08:00:27:9f:89:14 brd ff:ff:ff:ff:ff:ff
        inet 10.0.2.4/24 brd 10.0.2.255 scope global noprefixroute dynamic enp0s3
           valid_lft 993sec preferred_lft 993sec
        inet6 fe80::a00:27ff:fe9f:8914/64 scope link
           valid_lft forever preferred_lft forever
    3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
        link/ether 02:42:c9:52:3d:15 brd ff:ff:ff:ff:ff:ff
        inet 172.16.70.1/24 brd 172.16.70.255 scope global docker0
           valid_lft forever preferred_lft forever
        inet6 fe80::42:c9ff:fe52:3d15/64 scope link
           valid_lft forever preferred_lft forever
    6: flannel0: <POINTOPOINT,MULTICAST,NOARP,UP,LOWER_UP> mtu 1472 qdisc pfifo_fast state UNKNOWN group default qlen 500
        link/none
        inet 172.16.70.0/16 scope global flannel0
           valid_lft forever preferred_lft forever
        inet6 fe80::4b31:c92f:8cc9:3a22/64 scope link flags 800
           valid_lft forever preferred_lft forever
    [root@k8s-node1 system]#

    This completes the setup of the Flannel overlay network.
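
    In the output above, flannel0 reports an MTU of 1472 while the physical interface enp0s3 reports 1500. A minimal sketch of the arithmetic behind that difference, assuming the UDP backend wraps each packet in one outer IPv4 header without options (20 bytes) and one UDP header (8 bytes); the function name overlay_mtu is made up for illustration:

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
UDP_HEADER = 8    # bytes


def overlay_mtu(physical_mtu, *outer_headers):
    """MTU left for the inner packet once the outer headers are added."""
    return physical_mtu - sum(outer_headers)


# Physical MTU of enp0s3 minus the outer IPv4 and UDP headers.
print(overlay_mtu(1500, IPV4_HEADER, UDP_HEADER))  # 1472
```

    The same subtraction applies to other encapsulations; only the list of outer headers changes.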

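    Use ping to verify that the docker0 bridges on the Nodes can reach each other. For example, from Node1 (docker0 IP=172.16.70.1), ping docker0 on Node2 (docker0 IP=172.16.13.1); through Flannel, the Docker network on the other physical machine is reachable: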
    [root@k8s-node1 system]# ifconfig flannel0
    flannel0: flags=4305<UP,POINTOPOINT,RUNNING,NOARP,MULTICAST>  mtu 1472
            inet 172.16.70.0  netmask 255.255.0.0  destination 172.16.70.0
            inet6 fe80::524a:4b9c:3391:7514  prefixlen 64  scopeid 0x20<link>
            unspec 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00  txqueuelen 500  (UNSPEC)
            RX packets 5  bytes 420 (420.0 B)
            RX errors 0  dropped 0  overruns 0  frame 0
            TX packets 8  bytes 564 (564.0 B)
            TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

    [root@k8s-node1 system]# ifconfig docker0
    docker0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
            inet 172.16.70.1  netmask 255.255.255.0  broadcast 172.16.70.255
            inet6 fe80::42:c9ff:fe52:3d15  prefixlen 64  scopeid 0x20<link>
            ether 02:42:c9:52:3d:15  txqueuelen 0  (Ethernet)
            RX packets 0  bytes 0 (0.0 B)
            RX errors 0  dropped 0  overruns 0  frame 0
            TX packets 8  bytes 648 (648.0 B)
            TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

    [root@k8s-node1 system]# ping 172.16.13.1
    PING 172.16.13.1 (172.16.13.1) 56(84) bytes of data.
    64 bytes from 172.16.13.1: icmp_seq=1 ttl=62 time=1.63 ms
    64 bytes from 172.16.13.1: icmp_seq=2 ttl=62 time=1.55 ms
    ^C
    --- 172.16.13.1 ping statistics ---
    2 packets transmitted, 2 received, 0% packet loss, time 1002ms
    rtt min/avg/max/mdev = 1.554/1.595/1.637/0.057 ms

    In etcd you can see the mapping that Flannel maintains between each flannel0 subnet and the physical machine's IP address:
    [root@k8s-master etcd]# etcdctl ls /coreos.com/network/subnets
    /coreos.com/network/subnets/172.16.70.0-24
    /coreos.com/network/subnets/172.16.13.0-24

    [root@k8s-master etcd]# etcdctl get /coreos.com/network/subnets/172.16.70.0-24
    {"PublicIP":"10.0.2.4"}
    [root@k8s-master etcd]# etcdctl get /coreos.com/network/subnets/172.16.13.0-24
    {"PublicIP":"10.0.2.5"}
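
    These records are what lets flanneld decide which physical machine should receive a packet. The following is a rough sketch of that lookup, not flanneld's real code: the keys and values are copied from the etcdctl output above, while the helper names build_table and node_for are hypothetical.

```python
import ipaddress
import json

# Keys and values copied from the etcdctl output above.
ETCD_RECORDS = {
    "/coreos.com/network/subnets/172.16.70.0-24": '{"PublicIP":"10.0.2.4"}',
    "/coreos.com/network/subnets/172.16.13.0-24": '{"PublicIP":"10.0.2.5"}',
}


def build_table(records):
    """Turn the etcd records into a {subnet: node public IP} table."""
    table = {}
    for key, value in records.items():
        # The last path element has the form "<network>-<prefixlen>".
        cidr = key.rsplit("/", 1)[1].replace("-", "/")
        table[ipaddress.ip_network(cidr)] = json.loads(value)["PublicIP"]
    return table


def node_for(table, container_ip):
    """Return the public IP of the node whose subnet holds container_ip."""
    addr = ipaddress.ip_address(container_ip)
    for subnet, public_ip in table.items():
        if addr in subnet:
            return public_ip
    return None


table = build_table(ETCD_RECORDS)
print(node_for(table, "172.16.13.1"))  # 10.0.2.5
```

    A destination outside every registered subnet yields None, which corresponds to traffic the overlay does not handle.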

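    2. Open vSwitch
    2.1 Basic principle
    Open vSwitch is an open-source virtual switch. It resembles the Linux bridge but offers far more functionality. An Open vSwitch bridge can directly establish several kinds of communication channels (tunnels), for example Open vSwitch with GRE/VxLAN, and these are easy to set up with OVS configuration commands. In a k8s and Docker setting, the main task is to build L3-to-L3 tunnels, as in the network architecture shown below.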

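    First, to keep the docker0 addresses created by Docker from conflicting, manually plan and assign the address range of the docker0 bridge on each Node.
    Next, create an Open vSwitch bridge (ovs) and use the ovs-vsctl command to add a gre port to it, setting the remote IP of the port to the IP address of the peer Node. This has to be repeated for every peer IP address (in a large network, an automation script is needed).
    Finally, attach the ovs bridge to the Docker bridge as a network interface. Restart the ovs bridge and the Docker bridge, add a route for the Docker address range via the Docker bridge, and the containers on the two Nodes are connected.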

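    2.2 Network communication process
    When an application in a container accesses the address of another container, the packet follows the container's default route to the docker0 bridge. The ovs bridge is attached as a port of docker0, so docker0 forwards the data to the ovs bridge. The ovs bridge has already been configured with GRE/VxLAN tunnels to the other ovs bridges, so it delivers the data to the peer Node, where it is passed on to docker0 and the Pod.
    Thanks to the added route, traffic from applications running on the Node itself is also routed to the docker0 bridge and follows the same path, so those applications can reach Pods on other Nodes as well.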

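    2.3 Characteristics of OVS with GRE/VxLAN networking
    The strength of OVS is that, as open-source virtual switch software, it is relatively mature and stable, supports a wide range of tunnel protocols, and has been proven in projects such as OpenStack.
    On the other hand, Flannel not only builds an overlay network for Pod-to-Pod communication but is also tightly integrated with the k8s and Docker architecture: it is aware of k8s Services, maintains its own routing table dynamically, and uses etcd to help Docker allocate docker0 subnets across the whole k8s cluster. With OVS, much of this has to be done by hand.
    Moreover, both OVS and Flannel add some communication overhead, because each implements Pod-to-Pod communication by building an overlay network. For applications that depend heavily on the network, the impact on the business should be evaluated.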

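    2.4 Installing and configuring Open vSwitch
    The example uses two Nodes; the target network topology is shown in the figure below.

    1) Install ovs on both Nodes
    Make sure SELinux is disabled on the Nodes.
    Run the following on both Nodes:
    yum -y install openvswitch

    Check the status of the Open vSwitch service; the two processes ovsdb-server and ovs-vswitchd should be running.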
    [root@k8s-node2 system]# systemctl start openvswitch
    [root@k8s-node2 system]# systemctl status openvswitch
    ● openvswitch.service - Open vSwitch
       Loaded: loaded (/usr/lib/systemd/system/openvswitch.service; disabled; vendor preset: disabled)
       Active: active (exited) since Sun 2018-06-10 17:06:40 CST; 6s ago
      Process: 8368 ExecStart=/bin/true (code=exited, status=0/SUCCESS)
    Main PID: 8368 (code=exited, status=0/SUCCESS)

    Jun 10 17:06:40 k8s-node2.test.com systemd[1]: Starting Open vSwitch...
    Jun 10 17:06:40 k8s-node2.test.com systemd[1]: Started Open vSwitch.
    [root@k8s-node2 system]# ps -ef|grep ovs
    root      8352     1  0 17:06 ?        00:00:00 ovsdb-server: monitoring pid 8353 (healthy)
    root      8353  8352  0 17:06 ?        00:00:00 ovsdb-server /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock --private-key=db:Open_vSwitch,SSL,private_key --certificate=db:Open_vSwitch,SSL,certificate --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --no-chdir --log-file=/var/log/openvswitch/ovsdb-server.log --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach --monitor
    root      8364     1  0 17:06 ?        00:00:00 ovs-vswitchd: monitoring pid 8365 (healthy)
    root      8365  8364  0 17:06 ?        00:00:00 ovs-vswitchd unix:/var/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --mlockall --no-chdir --log-file=/var/log/openvswitch/ovs-vswitchd.log --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --detach --monitor

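    2) Create the bridge and the GRE tunnel
    On each Node, create an ovs bridge br0, create a GRE tunnel on that bridge to the peer bridge, and finally attach br0 as a port to the Linux bridge docker0.
    After that, the docker0 subnets on the two Nodes can reach each other.
    Taking Node1 as an example, the steps are as follows:
    (1) Create the ovs bridge
    [root@k8s-node1 system]# ovs-vsctl add-br br0
    (2) Create a GRE tunnel to the peer; remote_ip is the address of the peer's physical NIC (enp0s3 in this environment)
    [root@k8s-node1 system]# ovs-vsctl add-port br0 gre1 -- set interface gre1 type=gre option:remote_ip=10.0.2.5
    (3) Add br0 to the local docker0 so that container traffic flows through OVS into the tunnel
    [root@k8s-node1 system]# brctl addif docker0 br0
    (4) Bring up the br0 and docker0 bridges
    [root@k8s-node1 system]# ip link set dev br0 up
    [root@k8s-node1 system]# ip link set dev docker0 up
    (5) Add a route
    The docker0 subnets of 10.0.2.5 and 10.0.2.4 are 172.16.20.0/24 and 172.16.10.0/24 respectively. Traffic for both subnets has to be routed through the local docker0 bridge; one of the two /24 subnets is local, and the other is reached through the OVS GRE tunnel. Each Node therefore needs a route for 172.16.0.0/16 that forwards via docker0:
    [root@k8s-node1 system]# ip route add 172.16.0.0/16 dev docker0
    (6) Flush the iptables rules added by Docker and the default Linux rules; the latter include a rule that rejects ICMP packets at the firewall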
    [root@k8s-node1 system]# iptables -t nat -F
    [root@k8s-node1 system]# iptables -F

    After completing these steps on Node1, apply the same configuration on Node2, with remote_ip set to the address of Node1 (10.0.2.4).
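
    Because the per-Node commands differ only in the peer address, they lend themselves to generation by a script. The sketch below only builds the command strings shown in the steps above and does not execute anything; the function name ovs_commands and the port naming scheme gre1, gre2, ... are assumptions made for illustration.

```python
def ovs_commands(local_ip, node_ips, bridge="br0"):
    """Return the setup commands for one node, one gre port per peer."""
    peers = [ip for ip in node_ips if ip != local_ip]
    cmds = [f"ovs-vsctl add-br {bridge}"]
    for index, peer in enumerate(peers, start=1):
        port = f"gre{index}"
        cmds.append(
            f"ovs-vsctl add-port {bridge} {port} -- "
            f"set interface {port} type=gre option:remote_ip={peer}"
        )
    cmds += [
        f"brctl addif docker0 {bridge}",
        f"ip link set dev {bridge} up",
        "ip link set dev docker0 up",
        "ip route add 172.16.0.0/16 dev docker0",
    ]
    return cmds


nodes = ["10.0.2.4", "10.0.2.5"]
for node in nodes:
    print(f"# on {node}")
    print("\n".join(ovs_commands(node, nodes)))
```

    Printing rather than running the commands keeps the sketch safe to try on any machine; the output can be reviewed before it is pasted into a root shell on each Node.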

    After configuration, the key information on Node1 (its IP address, the docker0 IP address, and the routes) looks like this:
    [root@k8s-node1 system]# ip addr
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
        link/ether 08:00:27:9f:89:14 brd ff:ff:ff:ff:ff:ff
        inet 10.0.2.4/24 brd 10.0.2.255 scope global noprefixroute dynamic enp0s3
           valid_lft 842sec preferred_lft 842sec
        inet6 fe80::a00:27ff:fe9f:8914/64 scope link
           valid_lft forever preferred_lft forever
    3: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
        link/ether 02:42:c9:52:3d:15 brd ff:ff:ff:ff:ff:ff
        inet 172.16.10.1/24 brd 172.16.10.255 scope global docker0
           valid_lft forever preferred_lft forever
        inet6 fe80::42:c9ff:fe52:3d15/64 scope link
           valid_lft forever preferred_lft forever
    10: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
        link/ether 5e:a9:02:75:aa:98 brd ff:ff:ff:ff:ff:ff
    11: br0: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker0 state UNKNOWN group default qlen 1000
        link/ether 82:e3:9a:29:3c:46 brd ff:ff:ff:ff:ff:ff
        inet6 fe80::a8de:24ff:fef4:f8ec/64 scope link
           valid_lft forever preferred_lft forever
    12: gre0@NONE: <NOARP> mtu 1476 qdisc noop state DOWN group default qlen 1000
        link/gre 0.0.0.0 brd 0.0.0.0
    13: gretap0@NONE: <BROADCAST,MULTICAST> mtu 1462 qdisc noop state DOWN group default qlen 1000
        link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
    14: gre_system@NONE: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 65490 qdisc pfifo_fast master ovs-system state UNKNOWN group default qlen 1000
        link/ether 76:53:6f:11:e0:f8 brd ff:ff:ff:ff:ff:ff
        inet6 fe80::7453:6fff:fe11:e0f8/64 scope link
           valid_lft forever preferred_lft forever
    [root@k8s-node1 system]#

    [root@k8s-node1 system]# ip route
    default via 10.0.2.1 dev enp0s3 proto dhcp metric 100
    10.0.2.0/24 dev enp0s3 proto kernel scope link src 10.0.2.4 metric 100
    172.16.0.0/16 dev docker0 scope link
    172.16.10.0/24 dev docker0 proto kernel scope link src 172.16.10.1
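
    The routing table above contains two entries that cover the docker0 address space, and the kernel picks the most specific one. A small sketch of that longest-prefix match, with the routes transcribed from the ip route output; the ROUTES list and the lookup function are illustrative only and ignore metrics and other route attributes:

```python
import ipaddress

# (prefix, output device) pairs transcribed from the `ip route` output.
ROUTES = [
    ("0.0.0.0/0", "enp0s3"),       # default via 10.0.2.1
    ("10.0.2.0/24", "enp0s3"),
    ("172.16.0.0/16", "docker0"),
    ("172.16.10.0/24", "docker0"),
]


def lookup(routes, destination):
    """Return (network, device) of the longest prefix that matches."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(prefix), dev)
        for prefix, dev in routes
        if addr in ipaddress.ip_network(prefix)
    ]
    # The most specific (longest) prefix wins.
    return max(matches, key=lambda item: item[0].prefixlen)


net, dev = lookup(ROUTES, "172.16.20.2")
print(net, dev)  # 172.16.0.0/16 docker0
```

    A remote container address such as 172.16.20.2 therefore leaves through docker0 via the added /16 route, while local containers match the kernel-created /24 route.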

    3) Test connectivity between containers on the two Nodes
    [root@k8s-node1 system]# ping 172.16.20.1
    PING 172.16.20.1 (172.16.20.1) 56(84) bytes of data.
    64 bytes from 172.16.20.1: icmp_seq=1 ttl=64 time=2.39 ms
    64 bytes from 172.16.20.1: icmp_seq=2 ttl=64 time=3.36 ms
    ^C
    --- 172.16.20.1 ping statistics ---
    2 packets transmitted, 2 received, 0% packet loss, time 1004ms
    rtt min/avg/max/mdev = 2.398/2.882/3.366/0.484 ms
    [root@k8s-node1 system]#

    Capture packets on Node2 to observe the traffic:
    [root@k8s-node2 system]# tcpdump -i docker0 -nnn
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on docker0, link-type EN10MB (Ethernet), capture size 262144 bytes
    23:43:59.020039 IP 172.16.10.1 > 172.16.20.1: ICMP echo request, id 20831, seq 26, length 64
    23:43:59.020096 IP 172.16.20.1 > 172.16.10.1: ICMP echo reply, id 20831, seq 26, length 64
    23:44:00.020899 IP 172.16.10.1 > 172.16.20.1: ICMP echo request, id 20831, seq 27, length 64
    23:44:00.020939 IP 172.16.20.1 > 172.16.10.1: ICMP echo reply, id 20831, seq 27, length 64
    23:44:01.021706 IP 172.16.10.1 > 172.16.20.1: ICMP echo request, id 20831, seq 28, length 64
    23:44:01.021750 IP 172.16.20.1 > 172.16.10.1: ICMP echo reply, id 20831, seq 28, length 64

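    Next, take an RC resource file with 2 replicas from an earlier experiment and actually create two containers to test network communication between the two Pods: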
    [root@k8s-master ~]# more frontend-rc.yaml
    apiVersion: v1
    kind: ReplicationController
    metadata:
      name: frontend
      labels:
        name: frontend
    spec:
       replicas: 2
       selector:
         name: frontend
       template:
         metadata:
           labels:
             name: frontend
         spec:
           containers:
           - name: php-redis
             image: kubeguide/guestbook-php-frontend
             ports:
             - containerPort: 80
               hostPort: 80
             env:
             - name: GET_HOSTS_FROM
               value: env
    [root@k8s-master ~]#

    Create it and check the result:
    [root@k8s-master ~]# kubectl get rc
    NAME       DESIRED   CURRENT   READY     AGE
    frontend   2         2         2         33m
    [root@k8s-master ~]# kubectl get pods -o wide
    NAME             READY     STATUS    RESTARTS   AGE       IP            NODE
    frontend-b6krg   1/1       Running   1          33m       172.16.20.2   10.0.2.5
    frontend-qk6zc   1/1       Running   0          33m       172.16.10.2   10.0.2.4

    Next, log in to the container running on Node1:
    [root@k8s-master ~]# kubectl exec -it frontend-qk6zc -c php-redis /bin/bash
    root@frontend-qk6zc:/var/www/html# ip a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
    2: gre0@NONE: <NOARP> mtu 1476 qdisc noop state DOWN group default qlen 1000
        link/gre 0.0.0.0 brd 0.0.0.0
    3: gretap0@NONE: <BROADCAST,MULTICAST> mtu 1462 qdisc noop state DOWN group default qlen 1000
        link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
    22: eth0@if23: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
        link/ether 02:42:ac:10:0a:02 brd ff:ff:ff:ff:ff:ff
        inet 172.16.10.2/24 brd 172.16.10.255 scope global eth0
           valid_lft forever preferred_lft forever
    root@frontend-qk6zc:/var/www/html#
    From the Pod running on Node1, ping the address of a Pod running on Node2:
    root@frontend-qk6zc:/var/www/html# ping 172.16.20.2
    PING 172.16.20.2 (172.16.20.2): 56 data bytes
    64 bytes from 172.16.20.2: icmp_seq=0 ttl=63 time=2017.587 ms
    64 bytes from 172.16.20.2: icmp_seq=1 ttl=63 time=1014.193 ms
    64 bytes from 172.16.20.2: icmp_seq=2 ttl=63 time=13.232 ms
    64 bytes from 172.16.20.2: icmp_seq=3 ttl=63 time=1.122 ms
    64 bytes from 172.16.20.2: icmp_seq=4 ttl=63 time=1.379 ms
    64 bytes from 172.16.20.2: icmp_seq=5 ttl=63 time=1.474 ms
    64 bytes from 172.16.20.2: icmp_seq=6 ttl=63 time=1.371 ms
    64 bytes from 172.16.20.2: icmp_seq=7 ttl=63 time=1.583 ms
    ^C--- 172.16.20.2 ping statistics ---
    8 packets transmitted, 8 packets received, 0% packet loss
    round-trip min/avg/max/stddev = 1.122/381.493/2017.587/701.350 ms
    root@frontend-qk6zc:/var/www/html#
    The packet capture on Node2 shows the exchange:
    [root@k8s-node2 system]# tcpdump -i docker0 -nnn
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on docker0, link-type EN10MB (Ethernet), capture size 262144 bytes
    00:13:18.601908 IP 172.16.10.2 > 172.16.20.2: ICMP echo request, id 38, seq 4, length 64
    00:13:18.601947 IP 172.16.20.2 > 172.16.10.2: ICMP echo reply, id 38, seq 4, length 64
    00:13:18.601956 IP 172.16.20.2 > 172.16.10.2: ICMP echo reply, id 38, seq 4, length 64
    00:13:28.609109 IP 172.16.10.2 > 172.16.20.2: ICMP echo request, id 38, seq 5, length 64
    00:13:28.609165 IP 172.16.20.2 > 172.16.10.2: ICMP echo reply, id 38, seq 5, length 64
    00:13:28.609179 IP 172.16.20.2 > 172.16.10.2: ICMP echo reply, id 38, seq 5, length 64
    00:13:29.612564 IP 172.16.10.2 > 172.16.20.2: ICMP echo request, id 38, seq 6, length 64
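    Note: if the network tests above do not fully succeed, check the firewalld configuration on the Nodes.
    The OVS-based network is now in place. Because GRE is a point-to-point tunnel, a cluster with N Nodes needs N*(N-1) GRE tunnel endpoints in total (N-1 gre ports on each Node, which corresponds to N*(N-1)/2 bidirectional tunnels), so that all Nodes form a full mesh and every Node can reach every other.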
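
    The growth of the mesh is easy to check numerically. In the sketch below, N*(N-1) counts one gre port per (local, remote) pair, while each pair of Nodes shares a single tunnel, so the number of distinct tunnels is half of that. The function name gre_mesh is made up for illustration.

```python
from itertools import combinations, permutations


def gre_mesh(nodes):
    """Return (tunnels, ports) needed for a full GRE mesh."""
    tunnels = list(combinations(nodes, 2))  # one per unordered node pair
    ports = list(permutations(nodes, 2))    # one per (local, remote) pair
    return tunnels, ports


for n in (2, 10, 50):
    tunnels, ports = gre_mesh(range(n))
    print(n, len(tunnels), len(ports))
```

    For the two-Node example this gives one tunnel with a gre port on each side; at 50 Nodes it is already 1225 tunnels and 2450 ports, which is why the configuration needs to be automated.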
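
    3. Direct routing
    Earlier experiments already tested writing routes by hand to connect the Nodes, so the configuration is not repeated here. The drawback of static direct routing is that whenever the set of cluster nodes changes, the routing table on every Node has to be maintained by hand, which is inefficient. To manage this changing routing information effectively and let every other Node learn about it, a dynamic routing protocol is needed to propagate the changes.
    Commonly used open-source software that implements such dynamic routing protocols includes Quagga and Zebra.
    The configuration steps and points to note are outlined below.
    (1) The address range of the Docker bridge on each Node still has to be assigned manually
    Whether you change the range used by the default docker0 or create another bridge and select it with --bridge=XX, make sure the address ranges used by the Docker bridges on different Nodes do not overlap.
    (2) Then run Quagga on each Node
    You can install and start the Quagga software on each server, or run it as a container. Pull the Docker image on each Node:
    # docker pull georce/router
    Start the Quagga container on each Node. Note that Quagga has to run in --privileged mode with --net=host, which means it uses the physical machine's network directly.
    # docker run -itd --name=router --privileged --net=host georce/router

    Once started, the Quagga instances on the Nodes learn from each other and add the routes to the docker0 subnets of the other machines.
    At this point, docker0 on all Nodes can reach one another.
    Note: if the cluster has more than a few thousand Nodes, the efficiency of the routing table needs to be tested and evaluated.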
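
    Since the bridge address ranges are assigned by hand in this approach, it may help to verify the plan before applying it. A minimal sketch, assuming the plan is kept as a mapping from Node name to bridge CIDR; the Node names and the function find_overlaps are hypothetical:

```python
import ipaddress
from itertools import combinations


def find_overlaps(bridge_subnets):
    """Return the pairs of nodes whose Docker bridge ranges overlap."""
    nets = {
        node: ipaddress.ip_network(cidr)
        for node, cidr in bridge_subnets.items()
    }
    return [
        (a, b)
        for a, b in combinations(nets, 2)
        if nets[a].overlaps(nets[b])
    ]


plan = {
    "node1": "172.16.10.0/24",
    "node2": "172.16.20.0/24",
    "node3": "172.16.10.128/25",  # deliberately overlaps node1
}
print(find_overlaps(plan))  # [('node1', 'node3')]
```

    An empty result means every bridge range is disjoint, which is the precondition for the learned routes to be unambiguous.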

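    4. Calico container networking and network policy
    4.1 Introduction to Calico
    Calico is another container networking solution. Its biggest difference from other virtual networks is that it does not forward packets over an overlay network; it provides a pure layer-3 network model instead. In a layer-3 model, every container communicates directly by IP, and routing forwards the traffic to its destination. The node that hosts the containers behaves like a traditional router and performs the route lookup. For routing to work, each virtual router (the host node of the containers) must have some way of learning the routes of the whole cluster; Calico uses BGP, the Border Gateway Protocol, for this. Calico can be used with the container cluster platform Kubernetes and with public clouds such as AWS and GCE, and it also integrates easily with IaaS platforms such as OpenStack.

    On each compute node, Calico uses the Linux kernel to implement an efficient vRouter that handles data forwarding. Each vRouter advertises the routes of the containers running on its node to the whole Calico network over BGP and automatically installs the forwarding rules for reaching other nodes. Calico ensures that all traffic between containers is interconnected by IP routing. Calico nodes can use the data center's network fabric (L2 or L3) directly, with no additional NAT, tunnels, or overlay network; there is no extra encapsulation and decapsulation, which saves CPU cycles and improves network efficiency. The packet structure used by Calico is illustrated in the figure below.

    In small clusters the Calico nodes can peer with each other directly; in large clusters this is done through additional BGP route reflectors.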
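
    A rough count of BGP sessions illustrates why route reflectors matter at scale. The sketch assumes that in the direct case every node peers with every other node, and that in the reflector case every node peers with each reflector while the reflectors are dedicated machines that also peer among themselves. Both function names are made up, and real deployments may arrange peering differently.

```python
def full_mesh_sessions(nodes):
    """BGP sessions when every node peers with every other node."""
    return nodes * (nodes - 1) // 2


def reflector_sessions(nodes, reflectors):
    """BGP sessions when nodes peer only with dedicated reflectors."""
    # Each node keeps one session per reflector; the reflectors
    # are assumed to form a small full mesh among themselves.
    return nodes * reflectors + reflectors * (reflectors - 1) // 2


for n in (10, 100, 1000):
    print(n, full_mesh_sessions(n), reflector_sessions(n, 2))
```

    The first count grows quadratically with the number of nodes and the second only linearly, which matches the advice to peer directly in small clusters and to add reflectors in large ones.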

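    In addition, Calico provides rich network policy based on iptables. It implements the k8s Network Policy and can restrict network reachability between containers.

    The main Calico components are:
    Felix: the Calico agent. It runs on every Node and sets up network resources for containers (IP addresses, routing rules, iptables rules, and so on) to keep the host and container networks connected.
    etcd: the storage backend used by Calico.
    BGP Client (BIRD): advertises the routes that Felix sets up on each Node to the Calico network over BGP.
    BGP Route Reflector (BIRD): one or more BGP route reflectors provide hierarchical route distribution for large clusters.
    calicoctl: the Calico command-line management tool.
    4.2 Deploying the Calico service
    Deploying Calico in k8s consists of two main steps.
    4.2.1 Modify the startup parameters of the Kubernetes services and restart them
    Set the startup parameter of the kube-apiserver service on the Master: --allow-privileged=true (calico-node has to run in privileged mode on each Node).
    Set the startup parameters of the kubelet service on each Node: --network-plugin=cni (use the CNI network plugin), --allow-privileged=true
    The k8s cluster in this example has two Nodes: Node1 (10.0.2.4) and Node2 (10.0.2.5)

    4.2.2 Create the Calico services, mainly calico-node and the Calico policy controller
    The following resource objects need to be created:
    A ConfigMap named calico-config that holds the configuration parameters Calico needs.
    A Secret named calico-etcd-secrets, used to connect to etcd over TLS.
    The calico/node container, running on every Node and deployed as a DaemonSet.
    The Calico CNI binaries and network configuration, installed on every Node (done by the install-cni container).
    A Deployment running calico/kube-policy-controller, which handles the Network Policy set for Pods in the k8s cluster.
    4.2.3 Detailed description of installing and configuring the Calico services
    Download the Calico yaml configuration file from the Calico website: https://docs.projectcalico.org/v2.1/getting-started/kubernetes/installation/hosted/calico.yaml .
    This file contains the definitions of all resource objects needed to start Calico. They are described one by one below.
    (1) The configuration Calico needs is created as a ConfigMap object, as follows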
    # Calico Version v2.1.5
    # https://docs.projectcalico.org/v2.1/releases#v2.1.5
    # This manifest includes the following component versions:
    #   calico/node:v1.1.3
    #   calico/cni:v1.8.0
    #   calico/kube-policy-controller:v0.5.4

    # This ConfigMap is used to configure a self-hosted Calico installation.
    kind: ConfigMap
    apiVersion: v1
    metadata:
      name: calico-config
      namespace: kube-system
    data:
      # Configure this with the location of your etcd cluster.
      etcd_endpoints: "http://10.0.2.15:2379"

      # Configure the Calico backend to use.
      calico_backend: "bird"

      # The CNI network configuration to install on each node.
      cni_network_config: |-
        {
            "name": "k8s-pod-network",
            "type": "calico",
            "etcd_endpoints": "__ETCD_ENDPOINTS__",
            "etcd_key_file": "__ETCD_KEY_FILE__",
            "etcd_cert_file": "__ETCD_CERT_FILE__",
            "etcd_ca_cert_file": "__ETCD_CA_CERT_FILE__",
            "log_level": "info",
            "ipam": {
                "type": "calico-ipam"
            },
            "policy": {
                "type": "k8s",
                "k8s_api_root": "https://__KUBERNETES_SERVICE_HOST__:__KUBERNETES_SERVICE_PORT__",
                "k8s_auth_token": "__SERVICEACCOUNT_TOKEN__"
            },
            "kubernetes": {
                "kubeconfig": "__KUBECONFIG_FILEPATH__"
            }
        }

      # If you're using TLS enabled etcd uncomment the following.
      # You must also populate the Secret below with these files.
      etcd_ca: ""   # "/calico-secrets/etcd-ca"
      etcd_cert: "" # "/calico-secrets/etcd-cert"
      etcd_key: ""  # "/calico-secrets/etcd-key"

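    The main parameters are:
    etcd_endpoints: Calico uses etcd to store network topology and state. This parameter gives the etcd address; you can use the etcd that the k8s Master uses or set up a separate one.
    calico_backend: the Calico backend; the default is bird.
    cni_network_config: a network configuration that conforms to the CNI specification. type=calico means that kubelet looks in /opt/cni/bin for an executable named "calico" and calls it to set up the container network. In ipam, type=calico-ipam means that kubelet looks in /opt/cni/bin for an executable named "calico-ipam", which allocates the container's IP address.
    If etcd is configured with TLS authentication, the corresponding ca, cert, and key files must also be specified.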
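
    The cni_network_config value is a template: tokens such as __ETCD_ENDPOINTS__ have to be replaced with real values before the file is usable as CNI configuration. The sketch below shows that kind of substitution on a shortened copy of the template. It is an illustration under assumptions, not the install-cni script itself; the render function is hypothetical and the etcd address is taken from the ConfigMap above.

```python
import json
import re

# Shortened copy of the cni_network_config template shown above.
TEMPLATE = """{
    "name": "k8s-pod-network",
    "type": "calico",
    "etcd_endpoints": "__ETCD_ENDPOINTS__",
    "log_level": "info",
    "ipam": {"type": "calico-ipam"}
}"""


def render(template, values):
    """Replace each __NAME__ token; unknown tokens are left untouched."""
    return re.sub(
        r"__([A-Z_]+?)__",
        lambda match: values.get(match.group(1), match.group(0)),
        template,
    )


config = json.loads(render(TEMPLATE, {"ETCD_ENDPOINTS": "http://10.0.2.15:2379"}))
print(config["type"], config["ipam"]["type"], config["etcd_endpoints"])
```

    Parsing the rendered text as JSON is a cheap way to confirm that the substitution produced a well-formed configuration.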

    (2) The Secret needed to access etcd; for an etcd service without TLS, leave data empty
    # The following contains k8s Secrets for use with a TLS enabled etcd cluster.
    # For information on populating Secrets, see http://kubernetes.io/docs/user-guide/secrets/
    apiVersion: v1
    kind: Secret
    type: Opaque
    metadata:
      name: calico-etcd-secrets
      namespace: kube-system
    data:
      # Populate the following files with etcd TLS configuration if desired, but leave blank if
      # not using TLS for etcd.
      # This self-hosted install expects three files with the following names.  The values
      # should be base64 encoded strings of the entire contents of each file.
      # etcd-key: null
      # etcd-cert: null
      # etcd-ca: null
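
    As the comments in the manifest say, each value under data has to be the base64 encoding of the entire file. A minimal sketch of producing those values; the function name secret_data is made up, and the byte strings stand in for real key and certificate files, which would normally be read from disk.

```python
import base64


def secret_data(files):
    """Map each Secret key to the base64 text of its file contents."""
    return {
        key: base64.b64encode(content).decode("ascii")
        for key, content in files.items()
    }


# Placeholder contents; real deployments read the actual etcd TLS files.
encoded = secret_data({
    "etcd-key": b"<contents of the client key file>",
    "etcd-cert": b"<contents of the client certificate file>",
    "etcd-ca": b"<contents of the CA certificate file>",
})
for key, value in encoded.items():
    print(f"{key}: {value}")
```

    The printed lines have the shape expected under data; decoding any value with base64 should give back the original file bytes.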

    (3) calico-node, a DaemonSet that runs one calico-node service and one install-cni service on every Node
    # This manifest installs the calico/node container, as well
    # as the Calico CNI plugins and network config on
    # each master and worker node in a Kubernetes cluster.
    kind: DaemonSet
    apiVersion: extensions/v1beta1
    metadata:
      name: calico-node
      namespace: kube-system
      labels:
        k8s-app: calico-node
    spec:
      selector:
        matchLabels:
          k8s-app: calico-node
      template:
        metadata:
          labels:
            k8s-app: calico-node
          annotations:
            scheduler.alpha.kubernetes.io/critical-pod: ''
            scheduler.alpha.kubernetes.io/tolerations: |
              [{"key": "dedicated", "value": "master", "effect": "NoSchedule" },
               {"key":"CriticalAddonsOnly", "operator":"Exists"}]
        spec:
          hostNetwork: true
          containers:
            # Runs calico/node container on each Kubernetes node.  This
            # container programs network policy and routes on each
            # host.
            - name: calico-node
              image: quay.io/calico/node:v1.1.3
              env:
                # The location of the Calico etcd cluster.
                - name: ETCD_ENDPOINTS
                  valueFrom:
                    configMapKeyRef:
                      name: calico-config
                      key: etcd_endpoints
                # Choose the backend to use.
                - name: CALICO_NETWORKING_BACKEND
                  valueFrom:
                    configMapKeyRef:
                      name: calico-config
                      key: calico_backend
                # Disable file logging so `kubectl logs` works.
                - name: CALICO_DISABLE_FILE_LOGGING
                  value: "true"
                # Set Felix endpoint to host default action to ACCEPT.
                - name: FELIX_DEFAULTENDPOINTTOHOSTACTION
                  value: "ACCEPT"
                # Configure the IP Pool from which Pod IPs will be chosen.
                - name: CALICO_IPV4POOL_CIDR
                  value: "192.168.0.0/16"
                - name: CALICO_IPV4POOL_IPIP
                  value: "always"
                # Disable IPv6 on Kubernetes.
                - name: FELIX_IPV6SUPPORT
                  value: "false"
                # Set Felix logging to "info"
                - name: FELIX_LOGSEVERITYSCREEN
                  value: "info"
                # Location of the CA certificate for etcd.
                - name: ETCD_CA_CERT_FILE
                  valueFrom:
                    configMapKeyRef:
                      name: calico-config
                      key: etcd_ca
                # Location of the client key for etcd.
                - name: ETCD_KEY_FILE
                  valueFrom:
                    configMapKeyRef:
                      name: calico-config
                      key: etcd_key
                # Location of the client certificate for etcd.
                - name: ETCD_CERT_FILE
                  valueFrom:
                    configMapKeyRef:
                      name: calico-config
                      key: etcd_cert
                # Auto-detect the BGP IP address.
                - name: IP
                  value: ""
              securityContext:
                privileged: true
              resources:
                requests:
                  cpu: 250m
              volumeMounts:
                - mountPath: /lib/modules
                  name: lib-modules
                  readOnly: true
                - mountPath: /var/run/calico
                  name: var-run-calico
                  readOnly: false
                - mountPath: /calico-secrets
                  name: etcd-certs
            # This container installs the Calico CNI binaries
            # and CNI network config file on each node.
            - name: install-cni
              image: quay.io/calico/cni:v1.8.0
              command: ["/install-cni.sh"]
              env:
                # The location of the Calico etcd cluster.
                - name: ETCD_ENDPOINTS
                  valueFrom:
                    configMapKeyRef:
                      name: calico-config
                      key: etcd_endpoints
                # The CNI network config to install on each node.
                - name: CNI_NETWORK_CONFIG
                  valueFrom:
                    configMapKeyRef:
                      name: calico-config
                      key: cni_network_config
              volumeMounts:
                - mountPath: /host/opt/cni/bin
                  name: cni-bin-dir
                - mountPath: /host/etc/cni/net.d
                  name: cni-net-dir
                - mountPath: /calico-secrets
                  name: etcd-certs
          volumes:
            # Used by calico/node.
            - name: lib-modules
              hostPath:
                path: /lib/modules
            - name: var-run-calico
              hostPath:
                path: /var/run/calico
            # Used to install CNI.
            - name: cni-bin-dir
              hostPath:
                path: /opt/cni/bin
            - name: cni-net-dir
              hostPath:
                path: /etc/cni/net.d
            # Mount in the etcd TLS secrets.
            - name: etcd-certs
              secret:
                secretName: calico-etcd-secrets
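    This Pod contains the following two containers:
    calico-node: the Calico service program. It sets up network resources for Pods and keeps the Pod network connected with every Node. It also has to run in hostNetwork mode and use the host's network directly.
    install-cni: installs the CNI binaries into /opt/cni/bin on each Node and installs the corresponding network configuration file into /etc/cni/net.d.
    The main parameters of the calico-node service are:
    CALICO_IPV4POOL_CIDR: the IP address pool of Calico IPAM; Pod IP addresses are allocated from this pool.
    CALICO_IPV4POOL_IPIP: whether to enable IPIP mode. When IPIP mode is enabled, Calico creates a virtual tunnel interface named "tunl0" on each Node.
    FELIX_IPV6SUPPORT: whether to enable IPv6.
    FELIX_LOGSEVERITYSCREEN: the log level.
    An IP pool can work in one of two modes: BGP or IPIP.
    To use IPIP mode, set CALICO_IPV4POOL_IPIP="always"; to turn it off, set CALICO_IPV4POOL_IPIP="off", in which case BGP mode is used.

    IPIP mode builds a tunnel between the routes of the Nodes and connects the two networks through it. When IPIP mode is enabled, Calico creates a virtual network interface named "tunl0" on each Node, as shown in the figure below.

    BGP mode uses the physical machine itself as the virtual router (vRouter) and creates no additional tunnel.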

    (4) The calico-policy-controller container
    It handles the Network Policy set for Pods in the k8s cluster.
    # This manifest deploys the Calico policy controller on Kubernetes.
    # See https://github.com/projectcalico/k8s-policy
    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      name: calico-policy-controller
      namespace: kube-system
      labels:
        k8s-app: calico-policy
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
        scheduler.alpha.kubernetes.io/tolerations: |
          [{"key": "dedicated", "value": "master", "effect": "NoSchedule" },
           {"key":"CriticalAddonsOnly", "operator":"Exists"}]
    spec:
      # The policy controller can only have a single active instance.
      replicas: 1
      strategy:
        type: Recreate
      template:
        metadata:
          name: calico-policy-controller
          namespace: kube-system
          labels:
            k8s-app: calico-policy
        spec:
          # The policy controller must run in the host network namespace so that
          # it isn't governed by policy that would prevent it from working.
          hostNetwork: true
          containers:
            - name: calico-policy-controller
              image: quay.io/calico/kube-policy-controller:v0.5.4
              env:
                # The location of the Calico etcd cluster.
                - name: ETCD_ENDPOINTS
                  valueFrom:
                    configMapKeyRef:
                      name: calico-config
                      key: etcd_endpoints
                # Location of the CA certificate for etcd.
                - name: ETCD_CA_CERT_FILE
                  valueFrom:
                    configMapKeyRef:
                      name: calico-config
                      key: etcd_ca
                # Location of the client key for etcd.
                - name: ETCD_KEY_FILE
                  valueFrom:
                    configMapKeyRef:
                      name: calico-config
                      key: etcd_key
                # Location of the client certificate for etcd.
                - name: ETCD_CERT_FILE
                  valueFrom:
                    configMapKeyRef:
                      name: calico-config
                      key: etcd_cert
                # The location of the Kubernetes API.  Use the default Kubernetes
                # service for API access.
                - name: K8S_API
                  value: "https://kubernetes.default:443"
                # Since we're running in the host namespace and might not have KubeDNS
                # access, configure the container's /etc/hosts to resolve
                # kubernetes.default to the correct service clusterIP.
                - name: CONFIGURE_ETC_HOSTS
                  value: "true"
              volumeMounts:
                # Mount in the etcd TLS secrets.
                - mountPath: /calico-secrets
                  name: etcd-certs
          volumes:
            # Mount in the etcd TLS secrets.
            - name: etcd-certs
              secret:
                secretName: calico-etcd-secrets
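    After a user sets a Network Policy for Pods in the k8s cluster, calico-policy-controller automatically notifies the calico-node service on each Node, which sets the corresponding iptables rules on the host to enforce the network access policy between Pods.
    With the configuration files above prepared, you can create the Calico resource objects.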
    [root@k8s-master ~]# kubectl create -f calico.yaml
    configmap "calico-config" created
    secret "calico-etcd-secrets" created
    daemonset "calico-node" created
    deployment "calico-policy-controller" created
    [root@k8s-master ~]#
    Make sure that every service is running correctly:
    [root@k8s-master ~]# kubectl get pods --namespace=kube-system -o wide
    NAME                                        READY     STATUS    RESTARTS   AGE       IP         NODE
    calico-node-59n9j                           2/2       Running   1          9h        10.0.2.5   10.0.2.5
    calico-node-cksq5                           2/2       Running   1          9h        10.0.2.4   10.0.2.4
    calico-policy-controller-54dbfcd7c7-ctxzz   1/1       Running   0          9h        10.0.2.5   10.0.2.5
    [root@k8s-master ~]#

    [root@k8s-master ~]# kubectl get rs --namespace=kube-system
    NAME                                  DESIRED   CURRENT   READY     AGE
    calico-policy-controller-54dbfcd7c7   1         1         1         9h
    [root@k8s-master ~]#
    [root@k8s-master ~]# kubectl get deployment --namespace=kube-system
    NAME                       DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
    calico-policy-controller   1         1         1            1           9h
    [root@k8s-master ~]# kubectl get secret --namespace=kube-system
    NAME                  TYPE      DATA      AGE
    calico-etcd-secrets   Opaque    0         9h
    [root@k8s-master ~]# kubectl get configmap --namespace=kube-system
    NAME            DATA      AGE
    calico-config   6         9h
    [root@k8s-master ~]#

    Take a look at Node1:
    [root@k8s-node1 ~]# docker ps
    CONTAINER ID        IMAGE                                      COMMAND             CREATED             STATUS              PORTS               NAMES
    dd431155ed2d        quay.io/calico/cni                         "/install-cni.sh"   8 hours ago         Up 8 hours                              k8s_install-cni_calico-node-cksq5_kube-system_e3ed0d80-6fe9-11e8-8a4a-080027800835_0
    e7f20b684fc2        quay.io/calico/node                        "start_runit"       8 hours ago         Up 8 hours                              k8s_calico-node_calico-node-cksq5_kube-system_e3ed0d80-6fe9-11e8-8a4a-080027800835_1
    1c9010e4b661        gcr.io/google_containers/pause-amd64:3.0   "/pause"            8 hours ago         Up 8 hours                              k8s_POD_calico-node-cksq5_kube-system_e3ed0d80-6fe9-11e8-8a4a-080027800835_1
    [root@k8s-node1 ~]#
    [root@k8s-node1 ~]# docker images
    REPOSITORY                             TAG                 IMAGE ID            CREATED             SIZE
    cloudnil/pause-amd64                   3.0                 66c684b679d2        11 months ago       747kB
    gcr.io/google_containers/pause-amd64   3.0                 66c684b679d2        11 months ago       747kB
    quay.io/calico/cni                     v1.8.0              8de7b24bd7ec        13 months ago       67MB
    quay.io/calico/node                    v1.1.3              573ddcad1ff5        13 months ago       217MB
    kubeguide/guestbook-php-frontend       latest              47ee16830e89        23 months ago       510MB
    Node2 runs one additional Pod, calico-policy-controller:
    [root@k8s-node2 ~]# docker ps
    CONTAINER ID        IMAGE                                      COMMAND              CREATED             STATUS              PORTS               NAMES
    ff4dbcd77892        quay.io/calico/kube-policy-controller      "/dist/controller"   8 hours ago         Up 8 hours                              k8s_calico-policy-controller_calico-policy-controller-54dbfcd7c7-ctxzz_kube-system_e3f067be-6fe9-11e8-8a4a-080027800835_0
    60439cfbde00        quay.io/calico/cni                         "/install-cni.sh"    8 hours ago         Up 8 hours                              k8s_install-cni_calico-node-59n9j_kube-system_e3efa53c-6fe9-11e8-8a4a-080027800835_1
    c55f279ef3c1        quay.io/calico/node                        "start_runit"        8 hours ago         Up 8 hours                              k8s_calico-node_calico-node-59n9j_kube-system_e3efa53c-6fe9-11e8-8a4a-080027800835_0
    17d08ed5fd86        gcr.io/google_containers/pause-amd64:3.0   "/pause"             8 hours ago         Up 8 hours                              k8s_POD_calico-node-59n9j_kube-system_e3efa53c-6fe9-11e8-8a4a-080027800835_1
    aa85ee06190f        gcr.io/google_containers/pause-amd64:3.0   "/pause"             8 hours ago         Up 8 hours                              k8s_POD_calico-policy-controller-54dbfcd7c7-ctxzz_kube-system_e3f067be-6fe9-11e8-8a4a-080027800835_0
    [root@k8s-node2 ~]#
    [root@k8s-node2 ~]#
    [root@k8s-node2 ~]# docker images
    REPOSITORY                              TAG                 IMAGE ID            CREATED             SIZE
    cloudnil/pause-amd64                    3.0                 66c684b679d2        11 months ago       747kB
    gcr.io/google_containers/pause-amd64    3.0                 66c684b679d2        11 months ago       747kB
    quay.io/calico/cni                      v1.8.0              8de7b24bd7ec        13 months ago       67MB
    quay.io/calico/node                     v1.1.3              573ddcad1ff5        13 months ago       217MB
    quay.io/calico/kube-policy-controller   v0.5.4              ac66b6e8f19e        14 months ago       22.6MB
    kubeguide/guestbook-php-frontend        latest              47ee16830e89        23 months ago       510MB
    georce/router                           latest              f3074d9a8369        3 years ago         190MB
    [root@k8s-node2 ~]#

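    Once calico-node is running normally, it follows the CNI specification and generates the files and directory listed below under /etc/cni/net.d/. It also installs the binaries calico and calico-ipam under /opt/cni/bin for kubelet to call.
    10-calico.conf: a network configuration that conforms to the CNI specification; type=calico indicates that the plugin's binary file is named calico.
    calico-kubeconfig: the kubeconfig file that Calico requires.
    calico-tls directory: the files needed to connect to etcd over TLS.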
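
    As a rough illustration of how the type field maps to a binary, the sketch below parses a CNI configuration and derives the plugin paths under /opt/cni/bin. The sample configuration is hypothetical and heavily trimmed (the actual content of 10-calico.conf is not shown in this article), and the helper name plugin_binaries is invented for this example.

```python
import json
from pathlib import PurePosixPath
from typing import List

# Hypothetical, trimmed-down CNI network configuration. The real
# 10-calico.conf generated by calico-node contains more fields.
SAMPLE_CONF = """
{
  "name": "k8s-pod-network",
  "type": "calico",
  "ipam": {"type": "calico-ipam"}
}
"""

# Directory where the article says the plugin binaries are installed.
CNI_BIN_DIR = PurePosixPath("/opt/cni/bin")


def plugin_binaries(conf_text: str) -> List[str]:
    """Return the binary paths implied by the "type" fields of a CNI config."""
    conf = json.loads(conf_text)
    names = [conf["type"]]          # main plugin, e.g. "calico"
    ipam = conf.get("ipam", {})
    if "type" in ipam:
        names.append(ipam["type"])  # IPAM plugin, e.g. "calico-ipam"
    return [str(CNI_BIN_DIR / name) for name in names]


print(plugin_binaries(SAMPLE_CONF))
```

    The two paths it prints correspond to the calico and calico-ipam binaries that appear in the /opt/cni/bin listing.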

    [root@k8s-node1 ~]# cd /etc/cni/net.d/
    [root@k8s-node1 net.d]# ls
    10-calico.conf  calico-kubeconfig  calico-tls
    [root@k8s-node1 net.d]#
    [root@k8s-node1 net.d]# ls /opt/cni/bin
    calico  calico-ipam  flannel  host-local  loopback
    [root@k8s-node1 net.d]#

    Check the network interfaces on the k8s node1 server. A new interface named "tunl0" appears, with the network address 192.168.196.128:
    [root@k8s-node1 net.d]# ifconfig tunl0
    tunl0: flags=193<UP,RUNNING,NOARP>  mtu 1440
            inet 192.168.196.128  netmask 255.255.255.255
            tunnel   txqueuelen 1000  (IPIP Tunnel)
            RX packets 0  bytes 0 (0.0 B)
            RX errors 0  dropped 0  overruns 0  frame 0
            TX packets 0  bytes 0 (0.0 B)
            TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

    Check the network interfaces on the k8s node2 server. A new interface named "tunl0" appears, with the network address 192.168.19.192:
    [root@k8s-node2 ~]# ifconfig tunl0
    tunl0: flags=193<UP,RUNNING,NOARP>  mtu 1440
            inet 192.168.19.192  netmask 255.255.255.255
            tunnel   txqueuelen 1000  (IPIP Tunnel)
            RX packets 0  bytes 0 (0.0 B)
            RX errors 0  dropped 0  overruns 0  frame 0
            TX packets 0  bytes 0 (0.0 B)
            TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
    Both subnets are allocated from the calico-node IP address pool 192.168.0.0/16. In addition, docker0 no longer plays any role when k8s assigns Pod IP addresses.
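
    The relationship between the pool and the per-node subnets can be checked with Python's standard ipaddress module. This is only a sketch: the /26 block size is an assumption read off the routes in the ip route output of this article, not a value taken from the Calico configuration, and the helper name block_of is invented for this example.

```python
import ipaddress

# CALICO_IPV4POOL_CIDR as stated in the text.
POOL = ipaddress.ip_network("192.168.0.0/16")

# Assumption: block size inferred from the /26 routes in the `ip route` output.
BLOCK_PREFIX = 26


def block_of(addr: str) -> ipaddress.IPv4Network:
    """Return the /26 block that contains the given address."""
    return ipaddress.ip_network(f"{addr}/{BLOCK_PREFIX}", strict=False)


# tunl0 addresses taken from the ifconfig output on each node.
for node, tunl0_addr in [("node1", "192.168.196.128"), ("node2", "192.168.19.192")]:
    block = block_of(tunl0_addr)
    print(node, block, "inside pool:", block.subnet_of(POOL))
```

    Each /26 block holds 64 addresses, and both blocks fall inside the /16 pool without overlapping each other.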

    Check the routing tables on the two hosts. The node1 server has a route that forwards traffic for node2's private network 192.168.19.192/26:
    [root@k8s-node1 net.d]# ip route
    default via 10.0.2.1 dev enp0s3 proto dhcp metric 100
    10.0.2.0/24 dev enp0s3 proto kernel scope link src 10.0.2.4 metric 100
    172.16.10.0/24 dev docker0 proto kernel scope link src 172.16.10.1
    192.168.19.192/26 via 10.0.2.5 dev tunl0 proto bird onlink
    blackhole 192.168.196.128/26 proto bird
    [root@k8s-node1 net.d]#

    The routing table on the node2 server likewise has a route that forwards traffic for node1's private network 192.168.196.128/26:
    [root@k8s-node2 ~]# ip route
    default via 10.0.2.1 dev enp0s3 proto dhcp metric 100
    10.0.2.0/24 dev enp0s3 proto kernel scope link src 10.0.2.5 metric 100
    172.16.20.0/24 dev docker0 proto kernel scope link src 172.16.20.1
    blackhole 192.168.19.192/26 proto bird
    192.168.196.128/26 via 10.0.2.4 dev tunl0 proto bird onlink
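
    To make the effect of these routes concrete, here is a minimal longest-prefix-match lookup over node1's routing table. It is a simplified model rather than the kernel's actual algorithm: metrics and scopes are ignored, and the more specific per-Pod routes that would normally sit in front of the blackhole entry are left out. The Route type, the helper names, and the address 192.168.19.200 (a made-up Pod address on node2) are assumptions for illustration.

```python
import ipaddress
from typing import List, NamedTuple, Optional


class Route(NamedTuple):
    dest: ipaddress.IPv4Network
    action: str


def route(cidr: str, action: str) -> Route:
    return Route(ipaddress.ip_network(cidr), action)


# Transcribed from the `ip route` output on node1.
NODE1_ROUTES: List[Route] = [
    route("0.0.0.0/0", "via 10.0.2.1 dev enp0s3"),
    route("10.0.2.0/24", "dev enp0s3"),
    route("172.16.10.0/24", "dev docker0"),
    route("192.168.19.192/26", "via 10.0.2.5 dev tunl0"),
    route("192.168.196.128/26", "blackhole"),
]


def lookup(routes: List[Route], dst: str) -> Optional[Route]:
    """Pick the matching route with the longest prefix, or None if nothing matches."""
    addr = ipaddress.ip_address(dst)
    candidates = [r for r in routes if addr in r.dest]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.dest.prefixlen)


for dst in ("192.168.19.200", "192.168.196.130", "10.0.2.5"):
    print(dst, "->", lookup(NODE1_ROUTES, dst).action)
```

    In this model, traffic from node1 to an address in node2's block is sent into tunl0 towards 10.0.2.5. An address in node1's own block that has no more specific route hits the blackhole entry and is dropped.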

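    This completes the container network setup between Nodes with Calico. During subsequent Pod creation, kubelet calls Calico through the CNI interface to set up the Pod network, including IP addresses, routing rules, and iptables rules.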

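    If CALICO_IPV4POOL_IPIP="off" is set, that is, IPIP mode is not used, Calico does not create the tunl0 network interface, and the routing rules forward traffic directly through the physical NIC of each host, which acts as the router.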

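    4.3 Using network policies to control access between Pods
    Calico supports access policies between Pods. The basic principle is shown in the figure below.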

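    The following example uses an Nginx Pod that provides a service and gives two client Pods different network access rights: Pods with the Label "role=nginxclient" may access the Nginx container, while other containers without this Label are denied.
    Step 1:
    First, annotate the Namespace that needs network isolation. All Pods in this example are in the default Namespace, so set default-deny network isolation on it:
    # kubectl annotate ns default 'net.beta.kubernetes.io/network-policy={"ingress": {"isolation": "DefaultDeny"}}'
    After this is set, the Pods in default can no longer reach each other over the network.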

    Step 2: create the Nginx Pod and add the Label "app=nginx"
    apiVersion: v1
    kind: Pod
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx

    Step 3: set an ingress access policy for Nginx
    networkpolicy-allow-nginxclient.yaml
    kind: NetworkPolicy
    apiVersion: extensions/v1beta1
    metadata:
      name: allow-nginxclient
    spec:
      podSelector:
        matchLabels:
          app: nginx
      ingress:
        - from:
          - podSelector:
              matchLabels:
                role: nginxclient
          ports:
          - protocol: TCP
            port: 80

    The policy targets Pods that carry the Label "app=nginx". Client Pods that carry the Label "role=nginxclient" are allowed to access port 80 of the Nginx container.
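
    The matching logic of this policy can be modelled in a few lines of Python. This is only a sketch of the semantics, assuming a single policy, matchLabels selectors only, and a namespace that is already under DefaultDeny isolation. It is not how Calico implements the policy (the article states that enforcement happens through iptables rules on each host), and the names POLICY, matches, and ingress_allowed are invented for this example.

```python
from typing import Dict

# The allow-nginxclient policy from Step 3, reduced to plain data.
POLICY = {
    "pod_selector": {"app": "nginx"},          # which Pods the policy protects
    "from_selector": {"role": "nginxclient"},  # which client Pods are allowed in
    "ports": [("TCP", 80)],
}


def matches(selector: Dict[str, str], labels: Dict[str, str]) -> bool:
    """A matchLabels selector matches when every key/value pair is present."""
    return all(labels.get(key) == value for key, value in selector.items())


def ingress_allowed(policy: dict, src_labels: Dict[str, str],
                    dst_labels: Dict[str, str], protocol: str, port: int) -> bool:
    """Under DefaultDeny, traffic passes only if the policy explicitly allows it."""
    if not matches(policy["pod_selector"], dst_labels):
        return False  # the policy does not select this destination Pod
    return (matches(policy["from_selector"], src_labels)
            and (protocol, port) in policy["ports"])


nginx = {"app": "nginx"}
print("client1:", ingress_allowed(POLICY, {"role": "nginxclient"}, nginx, "TCP", 80))
print("client2:", ingress_allowed(POLICY, {}, nginx, "TCP", 80))
```

    Under these assumptions, a client labelled role=nginxclient is allowed through on TCP port 80, and a client without that Label is denied.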

    Create the NetworkPolicy resource object:
    # kubectl create -f networkpolicy-allow-nginxclient.yaml

    Step 4: create two client Pods, one with the Label "role=nginxclient" and one without it. Enter each Pod and access the Nginx container to verify the effect of the network policy.
    client1.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: client1
      labels:
        role: nginxclient
    spec:
      containers:
      - name: client1
        image: busybox
        command: [ "sleep", "3600" ]

    client2.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: client2
    spec:
      containers:
      - name: client2
        image: busybox
        command: [ "sleep", "3600" ]

    Create the two Pods above, then enter each container and verify access to the service.

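    The network policy in the example above is implemented by calico-policy-controller. It continuously watches the NetworkPolicy definitions in k8s, associates them with Pods through Labels, and notifies each calico-node service of the resulting allow or deny rules.
    calico-node then applies the network access settings between Pods, which isolates the applications at the network level.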

    References:
    https://blog.csdn.net/watermelonbig/article/details/80720378
    http://cizixs.com/2017/10/19/docker-calico-network

  • Original article: https://www.cnblogs.com/heboxiang/p/12183173.html