Analyzing Kubernetes Service load balancing with the iptables LOG target


    Environment

    kubernetes 1.12.1

    Kernel 3.10.0-1062.18.1.el7.x86_64

    Docker 1.13.1-162

    kube-proxy in iptables mode

    Pod network: flannel (VXLAN)

    The cluster runs two Services, prometheus and kube-dns. prometheus has a single endpoint on node01; kube-dns has two endpoints, one on node01 and one on the master:

    Service      ClusterIP:Port        Pod                           Pod IP:Port        Node
    prometheus   10.110.238.91:9090    prometheus-546bd96fc5-h4gtg   10.244.1.6:9090    k8s-node01.com (172.21.0.13)
    kube-dns     10.96.0.10:53         coredns-576cbf47c7-57xd7      10.244.1.8:53      k8s-node01.com (172.21.0.13)
    kube-dns     10.96.0.10:53         coredns-576cbf47c7-q6p2h      10.244.0.4:53      k8s-master.com (172.16.0.2)

    Procedure

    The script below inserts, directly above every existing iptables rule, a rule with identical match conditions but a LOG action. The kernel log entries produced by those LOG rules then show exactly which rules a request matched.
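    The transformation can be sketched in miniature before reading the full script below. This is a hedged sketch; log_twin is a name invented here purely for illustration:

```shell
#!/bin/bash
# For one rule line taken from iptables-save output, build the LOG
# "twin" that would be inserted directly above it: identical match
# conditions, but a rate-limited -j LOG action instead of the target.
log_twin() {
    local table=$1 pos=$2 rule=$3
    local chain cond
    chain=$(awk '{print $2}' <<< "$rule")
    # keep only the match conditions: drop "-A CHAIN" at the front
    # and everything from "-j <target>" at the end
    cond=$(awk '{$1=$2=""; print}' <<< "$rule" | awk -F'-j' '{print $1}')
    echo "iptables -t $table -I $chain $pos $cond -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix \"$table|$pos|$chain\" --log-level debug"
}

log_twin nat 1 '-A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES'
```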

    The experiment runs on the master node. We first stop kube-proxy there, so that the node's iptables rules stay frozen for the duration of the experiment instead of being rewritten whenever cluster state changes.

    1. Stop kube-proxy on the master node

    # kc: shell alias for kubectl
    kc get ds -n kube-system kube-proxy -o yaml > /appdata/kube-proxy.yaml
    kc patch ds kube-proxy -n kube-system --type=json -p='[{"op":"replace", "path": "/spec/template/spec/tolerations", "value":null}]'
    kc edit ds kube-proxy -n kube-system   # add nodeSelector kubernetes.io/hostname: k8s-node01.com

    Checking again afterwards, the kube-proxy pod on the master is gone, yet the node's iptables rules remain in place.

    2. Back up the current iptables rules

    iptables-save > /appdata/iptables.bak

    3. Configure rsyslog on the node to write kernel debug messages to /var/log/iptables

    echo "kern.debug    /var/log/iptables" >> /etc/rsyslog.conf
    systemctl restart rsyslog

    4. The LOG script. For every existing rule it generates an insert command for a rule with the same match conditions but a LOG action. Each LOG rule also uses the limit module to cap the logging rate; without it the log volume would be too large to read.

    (This manipulates live iptables rules. Only run it on a test node in a test environment; getting it wrong has serious consequences.)

    #!/bin/bash
    # Author: JianlongZ
    # Date: 10-08-2020
    set -e
    rulefile="./iptables-rule.log"
    bakfile="./iptables-save-$(date +%Y%m%d%H%M%S)"
    resultfile="./run-iptables-$(date +%Y%m%d%H%M%S)"

    # keep only table headers (*nat, *filter, ...) and rule lines (-A ...)
    iptables-save | grep -E '^(\*|-)' > ${rulefile}
    # position at which the next LOG rule will be inserted
    position=1
    # table the current rule belongs to
    table=""
    # previous rule, used to detect when we move to a new chain
    prerule=""

    # full backup in case anything goes wrong
    iptables-save > $bakfile

    while read -r rule
    do
        # table header such as "*nat": remember the table name and skip
        if [[ $rule =~ ^\* ]]; then
            table=${rule#\*}
            continue
        fi

        chain=$(awk '{print $2}' <<< "$rule")
        prechain=$(awk '{print $2}' <<< "$prerule")
        # match conditions only: drop "-A CHAIN" and everything from "-j"
        condition=$(echo "$rule" | awk '{$1=$2=""; print}' | awk -F'-j' '{print $1}')
        # each inserted LOG rule pushes the rules below it down by one,
        # so LOG rules land at positions 1, 3, 5, ...
        position=$(expr $position + 2)

        if [[ $chain != $prechain ]]; then
            position=1
        fi

        res="iptables -t $table -I $chain $position $condition -m limit --limit-burst 1 --limit 1/second -j LOG --log-prefix \"$table|$position|$chain\" --log-level debug"
        echo "$res" >> $resultfile

        prerule=$rule

    done < $rulefile

    echo "Finish."
    echo "Please run $resultfile to insert the log rules."

    Results

    Filter /var/log/iptables by the svc IP and the pod IP, then use the log prefixes to locate, in the full ruleset below, the exact rules that the request traversed.

    # Generated by iptables-save v1.4.21 on Fri Oct  9 15:39:58 2020
    *mangle
    :PREROUTING ACCEPT [512762:115269776]
    :INPUT ACCEPT [512762:115269776]
    :FORWARD ACCEPT [0:0]
    :OUTPUT ACCEPT [518304:132706339]
    :POSTROUTING ACCEPT [518304:132706339]
    COMMIT
    # Completed on Fri Oct  9 15:39:58 2020
    # Generated by iptables-save v1.4.21 on Fri Oct  9 15:39:58 2020
    *nat
    :PREROUTING ACCEPT [2:112]
    :INPUT ACCEPT [2:112]
    :OUTPUT ACCEPT [1:60]
    :POSTROUTING ACCEPT [1:60]
    :DOCKER - [0:0]
    :KUBE-MARK-DROP - [0:0]
    :KUBE-MARK-MASQ - [0:0]
    :KUBE-NODEPORTS - [0:0]
    :KUBE-POSTROUTING - [0:0]
    :KUBE-SEP-6FRGWTS5YGV54XWV - [0:0]
    :KUBE-SEP-HPQF756YQTNK43WA - [0:0]
    :KUBE-SEP-KZMEYJZBDY4HFAEO - [0:0]
    :KUBE-SEP-MXQMVNGFUQPLZSHS - [0:0]
    :KUBE-SEP-NWYX6ZRA4HKJWFJ6 - [0:0]
    :KUBE-SEP-YC5G23GHTZAZPNO5 - [0:0]
    :KUBE-SERVICES - [0:0]
    :KUBE-SVC-ERIFXISQEP7F7OF4 - [0:0]
    :KUBE-SVC-FNI7RW7PEKOXZDFO - [0:0]
    :KUBE-SVC-NPX46M4PTMTKRN6Y - [0:0]
    :KUBE-SVC-TCOU7JCQXEZGVUNU - [0:0]
    -A PREROUTING -m comment --comment "kubernetes service portals" -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|1|PREROUTING" --log-level 7
    -A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
    -A OUTPUT -m comment --comment "kubernetes service portals" -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|1|OUTPUT" --log-level 7
    -A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
    -A POSTROUTING -m comment --comment "kubernetes postrouting rules" -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|1|POSTROUTING" --log-level 7
    -A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
    -A POSTROUTING -s 10.244.0.0/16 -d 10.244.0.0/16 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|3|POSTROUTING" --log-level 7
    -A POSTROUTING -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN
    -A POSTROUTING -s 10.244.0.0/16 ! -d 224.0.0.0/4 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|5|POSTROUTING" --log-level 7
    -A POSTROUTING -s 10.244.0.0/16 ! -d 224.0.0.0/4 -j MASQUERADE
    -A POSTROUTING ! -s 10.244.0.0/16 -d 10.244.0.0/24 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|7|POSTROUTING" --log-level 7
    -A POSTROUTING ! -s 10.244.0.0/16 -d 10.244.0.0/24 -j RETURN
    -A POSTROUTING ! -s 10.244.0.0/16 -d 10.244.0.0/16 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|9|POSTROUTING" --log-level 7
    -A POSTROUTING ! -s 10.244.0.0/16 -d 10.244.0.0/16 -j MASQUERADE
    -A KUBE-MARK-DROP -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|1|KUBE-MARK-DROP" --log-level 7
    -A KUBE-MARK-DROP -j MARK --set-xmark 0x8000/0x8000
    -A KUBE-MARK-MASQ -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|1|KUBE-MARK-MASQ" --log-level 7
    -A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
    -A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|1|KUBE-POSTROUTING" --log-level 7
    -A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -j MASQUERADE
    -A KUBE-SEP-6FRGWTS5YGV54XWV -s 10.244.1.6/32 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|1|KUBE-SEP-6FRGWTS5YGV54X" --log-level 7
    -A KUBE-SEP-6FRGWTS5YGV54XWV -s 10.244.1.6/32 -j KUBE-MARK-MASQ
    -A KUBE-SEP-6FRGWTS5YGV54XWV -p tcp -m tcp -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|3|KUBE-SEP-6FRGWTS5YGV54X" --log-level 7
    -A KUBE-SEP-6FRGWTS5YGV54XWV -p tcp -m tcp -j DNAT --to-destination 10.244.1.6:9090
    -A KUBE-SEP-HPQF756YQTNK43WA -s 10.244.1.9/32 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|1|KUBE-SEP-HPQF756YQTNK43" --log-level 7
    -A KUBE-SEP-HPQF756YQTNK43WA -s 10.244.1.9/32 -j KUBE-MARK-MASQ
    -A KUBE-SEP-HPQF756YQTNK43WA -p tcp -m tcp -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|3|KUBE-SEP-HPQF756YQTNK43" --log-level 7
    -A KUBE-SEP-HPQF756YQTNK43WA -p tcp -m tcp -j DNAT --to-destination 10.244.1.9:53
    -A KUBE-SEP-KZMEYJZBDY4HFAEO -s 10.244.1.8/32 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|1|KUBE-SEP-KZMEYJZBDY4HFA" --log-level 7
    -A KUBE-SEP-KZMEYJZBDY4HFAEO -s 10.244.1.8/32 -j KUBE-MARK-MASQ
    -A KUBE-SEP-KZMEYJZBDY4HFAEO -p tcp -m tcp -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|3|KUBE-SEP-KZMEYJZBDY4HFA" --log-level 7
    -A KUBE-SEP-KZMEYJZBDY4HFAEO -p tcp -m tcp -j DNAT --to-destination 10.244.1.8:53
    -A KUBE-SEP-MXQMVNGFUQPLZSHS -s 10.244.1.8/32 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|1|KUBE-SEP-MXQMVNGFUQPLZS" --log-level 7
    -A KUBE-SEP-MXQMVNGFUQPLZSHS -s 10.244.1.8/32 -j KUBE-MARK-MASQ
    -A KUBE-SEP-MXQMVNGFUQPLZSHS -p udp -m udp -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|3|KUBE-SEP-MXQMVNGFUQPLZS" --log-level 7
    -A KUBE-SEP-MXQMVNGFUQPLZSHS -p udp -m udp -j DNAT --to-destination 10.244.1.8:53
    -A KUBE-SEP-NWYX6ZRA4HKJWFJ6 -s 10.244.1.9/32 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|1|KUBE-SEP-NWYX6ZRA4HKJWF" --log-level 7
    -A KUBE-SEP-NWYX6ZRA4HKJWFJ6 -s 10.244.1.9/32 -j KUBE-MARK-MASQ
    -A KUBE-SEP-NWYX6ZRA4HKJWFJ6 -p udp -m udp -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|3|KUBE-SEP-NWYX6ZRA4HKJWF" --log-level 7
    -A KUBE-SEP-NWYX6ZRA4HKJWFJ6 -p udp -m udp -j DNAT --to-destination 10.244.1.9:53
    -A KUBE-SEP-YC5G23GHTZAZPNO5 -s 172.16.0.2/32 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|1|KUBE-SEP-YC5G23GHTZAZPN" --log-level 7
    -A KUBE-SEP-YC5G23GHTZAZPNO5 -s 172.16.0.2/32 -j KUBE-MARK-MASQ
    -A KUBE-SEP-YC5G23GHTZAZPNO5 -p tcp -m tcp -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|3|KUBE-SEP-YC5G23GHTZAZPN" --log-level 7
    -A KUBE-SEP-YC5G23GHTZAZPNO5 -p tcp -m tcp -j DNAT --to-destination 172.16.0.2:6443
    -A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|1|KUBE-SERVICES" --log-level 7
    -A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ
    -A KUBE-SERVICES -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|3|KUBE-SERVICES" --log-level 7
    -A KUBE-SERVICES -d 10.96.0.1/32 -p tcp -m comment --comment "default/kubernetes:https cluster IP" -m tcp --dport 443 -j KUBE-SVC-NPX46M4PTMTKRN6Y
    -A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|5|KUBE-SERVICES" --log-level 7
    -A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-MARK-MASQ
    -A KUBE-SERVICES -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|7|KUBE-SERVICES" --log-level 7
    -A KUBE-SERVICES -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-SVC-ERIFXISQEP7F7OF4
    -A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|9|KUBE-SERVICES" --log-level 7
    -A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-MARK-MASQ
    -A KUBE-SERVICES -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|11|KUBE-SERVICES" --log-level 7
    -A KUBE-SERVICES -d 10.96.0.10/32 -p udp -m comment --comment "kube-system/kube-dns:dns cluster IP" -m udp --dport 53 -j KUBE-SVC-TCOU7JCQXEZGVUNU
    -A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.110.238.91/32 -p tcp -m comment --comment "default/prometheus:tcp cluster IP" -m tcp --dport 9090 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|13|KUBE-SERVICES" --log-level 7
    -A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.110.238.91/32 -p tcp -m comment --comment "default/prometheus:tcp cluster IP" -m tcp --dport 9090 -j KUBE-MARK-MASQ
    -A KUBE-SERVICES -d 10.110.238.91/32 -p tcp -m comment --comment "default/prometheus:tcp cluster IP" -m tcp --dport 9090 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|15|KUBE-SERVICES" --log-level 7
    -A KUBE-SERVICES -d 10.110.238.91/32 -p tcp -m comment --comment "default/prometheus:tcp cluster IP" -m tcp --dport 9090 -j KUBE-SVC-FNI7RW7PEKOXZDFO
    -A KUBE-SERVICES -m comment --comment "kubernetes service nodeports; NOTE: this must be the last rule in this chain" -m addrtype --dst-type LOCAL -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|17|KUBE-SERVICES" --log-level 7
    -A KUBE-SERVICES -m comment --comment "kubernetes service nodeports; NOTE: this must be the last rule in this chain" -m addrtype --dst-type LOCAL -j KUBE-NODEPORTS
    -A KUBE-SVC-ERIFXISQEP7F7OF4 -m statistic --mode random --probability 0.50000000000 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|1|KUBE-SVC-ERIFXISQEP7F7O" --log-level 7
    -A KUBE-SVC-ERIFXISQEP7F7OF4 -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-KZMEYJZBDY4HFAEO
    -A KUBE-SVC-ERIFXISQEP7F7OF4 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|3|KUBE-SVC-ERIFXISQEP7F7O" --log-level 7
    -A KUBE-SVC-ERIFXISQEP7F7OF4 -j KUBE-SEP-HPQF756YQTNK43WA
    -A KUBE-SVC-FNI7RW7PEKOXZDFO -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|1|KUBE-SVC-FNI7RW7PEKOXZD" --log-level 7
    -A KUBE-SVC-FNI7RW7PEKOXZDFO -j KUBE-SEP-6FRGWTS5YGV54XWV
    -A KUBE-SVC-NPX46M4PTMTKRN6Y -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|1|KUBE-SVC-NPX46M4PTMTKRN" --log-level 7
    -A KUBE-SVC-NPX46M4PTMTKRN6Y -j KUBE-SEP-YC5G23GHTZAZPNO5
    -A KUBE-SVC-TCOU7JCQXEZGVUNU -m statistic --mode random --probability 0.50000000000 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|1|KUBE-SVC-TCOU7JCQXEZGVU" --log-level 7
    -A KUBE-SVC-TCOU7JCQXEZGVUNU -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-MXQMVNGFUQPLZSHS
    -A KUBE-SVC-TCOU7JCQXEZGVUNU -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "nat|3|KUBE-SVC-TCOU7JCQXEZGVU" --log-level 7
    -A KUBE-SVC-TCOU7JCQXEZGVUNU -j KUBE-SEP-NWYX6ZRA4HKJWFJ6
    COMMIT
    # Completed on Fri Oct  9 15:39:58 2020
    # Generated by iptables-save v1.4.21 on Fri Oct  9 15:39:58 2020
    *filter
    :INPUT ACCEPT [505840:113853465]
    :FORWARD DROP [0:0]
    :OUTPUT ACCEPT [511458:131041314]
    :DOCKER - [0:0]
    :DOCKER-ISOLATION - [0:0]
    :KUBE-EXTERNAL-SERVICES - [0:0]
    :KUBE-FIREWALL - [0:0]
    :KUBE-FORWARD - [0:0]
    :KUBE-SERVICES - [0:0]
    -A INPUT -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "filter|1|INPUT" --log-level 7
    -A INPUT -j KUBE-FIREWALL
    -A INPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes externally-visible service portals" -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "filter|3|INPUT" --log-level 7
    -A INPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes externally-visible service portals" -j KUBE-EXTERNAL-SERVICES
    -A FORWARD -m comment --comment "kubernetes forwarding rules" -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "filter|1|FORWARD" --log-level 7
    -A FORWARD -m comment --comment "kubernetes forwarding rules" -j KUBE-FORWARD
    -A FORWARD -s 10.244.0.0/16 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "filter|3|FORWARD" --log-level 7
    -A FORWARD -s 10.244.0.0/16 -j ACCEPT
    -A FORWARD -d 10.244.0.0/16 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "filter|5|FORWARD" --log-level 7
    -A FORWARD -d 10.244.0.0/16 -j ACCEPT
    -A OUTPUT -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "filter|1|OUTPUT" --log-level 7
    -A OUTPUT -j KUBE-FIREWALL
    -A OUTPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "filter|3|OUTPUT" --log-level 7
    -A OUTPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
    -A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "filter|1|KUBE-FIREWALL" --log-level 7
    -A KUBE-FIREWALL -m comment --comment "kubernetes firewall for dropping marked packets" -m mark --mark 0x8000/0x8000 -j DROP
    -A KUBE-FORWARD -m comment --comment "kubernetes forwarding rules" -m mark --mark 0x4000/0x4000 -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "filter|1|KUBE-FORWARD" --log-level 7
    -A KUBE-FORWARD -m comment --comment "kubernetes forwarding rules" -m mark --mark 0x4000/0x4000 -j ACCEPT
    -A KUBE-FORWARD -s 10.244.0.0/16 -m comment --comment "kubernetes forwarding conntrack pod source rule" -m conntrack --ctstate RELATED,ESTABLISHED -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "filter|3|KUBE-FORWARD" --log-level 7
    -A KUBE-FORWARD -s 10.244.0.0/16 -m comment --comment "kubernetes forwarding conntrack pod source rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
    -A KUBE-FORWARD -d 10.244.0.0/16 -m comment --comment "kubernetes forwarding conntrack pod destination rule" -m conntrack --ctstate RELATED,ESTABLISHED -m limit --limit 1/sec --limit-burst 1 -j LOG --log-prefix "filter|5|KUBE-FORWARD" --log-level 7
    -A KUBE-FORWARD -d 10.244.0.0/16 -m comment --comment "kubernetes forwarding conntrack pod destination rule" -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
    COMMIT
    # Completed on Fri Oct  9 15:39:58 2020

    For example, the log line Oct  9 15:34:59 k8s-master kernel: nat|1|OUTPUT IN= OUT=eth0 SRC=172.16.0.2 DST=10.110.238.91 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=6879 DF PROTO=TCP SPT=49620 DPT=9090 WINDOW=29200 RES=0x00 SYN URGP=0 says the packet hit rule 1 of the OUTPUT chain in the nat table. The rule we actually care about is the one directly below that LOG rule, namely -A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
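    That lookup can be scripted. A small helper (a sketch; the function name is invented here) splits the prefix and computes which line of iptables -S holds the matched rule: the rule after the LOG rule sits at position + 1, which is line position + 2 of the -S output, since line 1 is the chain's -P/-N header.

```shell
#!/bin/bash
# Turn a LOG prefix "table|position|chain" into the command that prints
# the rule the packet actually matched (the rule right after the LOG
# rule, i.e. line position+2 of `iptables -S <chain>`).
prefix_to_rule_cmd() {
    local prefix=$1
    local table=${prefix%%|*}
    local rest=${prefix#*|}
    local pos=${rest%%|*}
    local chain=${rest#*|}
    echo "iptables -t $table -S $chain | sed -n $((pos + 2))p"
}

prefix_to_rule_cmd "nat|1|OUTPUT"
# prints: iptables -t nat -S OUTPUT | sed -n 3p
```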

    1. Accessing the svc directly from the master node: 172.16.0.2 -> 10.110.238.91

    [root@VM-0-2-centos appdata]# cat /var/log/iptables | grep -E  "10.244.1.6|10.110.238.91"
    Oct  9 15:34:59 k8s-master kernel: nat|1|OUTPUTIN= OUT=eth0 SRC=172.16.0.2 DST=10.110.238.91 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=6879 DF PROTO=TCP SPT=49620 DPT=9090 WINDOW=29200 RES=0x00 SYN URGP=0 
    Oct  9 15:34:59 k8s-master kernel: nat|13|KUBE-SERVICESIN= OUT=eth0 SRC=172.16.0.2 DST=10.110.238.91 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=6879 DF PROTO=TCP SPT=49620 DPT=9090 WINDOW=29200 RES=0x00 SYN URGP=0 
    Oct  9 15:34:59 k8s-master kernel: nat|1|KUBE-MARK-MASQIN= OUT=eth0 SRC=172.16.0.2 DST=10.110.238.91 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=6879 DF PROTO=TCP SPT=49620 DPT=9090 WINDOW=29200 RES=0x00 SYN URGP=0 
    Oct  9 15:34:59 k8s-master kernel: nat|15|KUBE-SERVICESIN= OUT=eth0 SRC=172.16.0.2 DST=10.110.238.91 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=6879 DF PROTO=TCP SPT=49620 DPT=9090 WINDOW=29200 RES=0x00 SYN URGP=0 MARK=0x4000 
    Oct  9 15:34:59 k8s-master kernel: nat|1|KUBE-SVC-FNI7RW7PEKOXZDIN= OUT=eth0 SRC=172.16.0.2 DST=10.110.238.91 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=6879 DF PROTO=TCP SPT=49620 DPT=9090 WINDOW=29200 RES=0x00 SYN URGP=0 MARK=0x4000 
    Oct  9 15:34:59 k8s-master kernel: nat|3|KUBE-SEP-6FRGWTS5YGV54XIN= OUT=eth0 SRC=172.16.0.2 DST=10.110.238.91 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=6879 DF PROTO=TCP SPT=49620 DPT=9090 WINDOW=29200 RES=0x00 SYN URGP=0 MARK=0x4000 
    Oct  9 15:34:59 k8s-master kernel: filter|3|OUTPUTIN= OUT=eth0 SRC=172.16.0.2 DST=10.244.1.6 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=6879 DF PROTO=TCP SPT=49620 DPT=9090 WINDOW=29200 RES=0x00 SYN URGP=0 MARK=0x4000 
    Oct  9 15:34:59 k8s-master kernel: nat|1|POSTROUTINGIN= OUT=flannel.1 SRC=172.16.0.2 DST=10.244.1.6 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=6879 DF PROTO=TCP SPT=49620 DPT=9090 WINDOW=29200 RES=0x00 SYN URGP=0 MARK=0x4000 
    Oct  9 15:34:59 k8s-master kernel: nat|1|KUBE-POSTROUTINGIN= OUT=flannel.1 SRC=172.16.0.2 DST=10.244.1.6 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=6879 DF PROTO=TCP SPT=49620 DPT=9090 WINDOW=29200 RES=0x00 SYN URGP=0 MARK=0x4000 
    The matched rules, in order:
    -A OUTPUT -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
    -A KUBE-SERVICES ! -s 10.244.0.0/16 -d 10.110.238.91/32 -p tcp -m comment --comment "default/prometheus:tcp cluster IP" -m tcp --dport 9090 -j KUBE-MARK-MASQ
    -A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
    -A KUBE-SERVICES -d 10.110.238.91/32 -p tcp -m comment --comment "default/prometheus:tcp cluster IP" -m tcp --dport 9090 -j KUBE-SVC-FNI7RW7PEKOXZDFO
    -A KUBE-SVC-FNI7RW7PEKOXZDFO -j KUBE-SEP-6FRGWTS5YGV54XWV
    -A KUBE-SEP-6FRGWTS5YGV54XWV -p tcp -m tcp -j DNAT --to-destination 10.244.1.6:9090
    -A OUTPUT -m conntrack --ctstate NEW -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
    -A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
    -A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -j MASQUERADE

    Since the request is made with curl on the node itself, i.e. by a local process, the first matched rule is in the OUTPUT chain and the packet never traverses FORWARD.

    Interestingly, on an OpenShift cluster node-to-svc traffic is not masqueraded. A locally generated packet is routed before it hits the OUTPUT chain, and that routing decision picks the outgoing interface and source address. OpenShift installs a route such as 172.30.0.0/16 dev tun0 on every node, where tun0 effectively acts as the gateway for all pods on the node, so by the time the packet matches iptables rules its source address already belongs to the pod network and the "! -s <pod-cidr>" MASQ rule never matches. Whether node-to-svc traffic gets masqueraded therefore depends on the routes the network plugin installs on the node.

    2. A pod on the master node accessing the svc: 10.244.0.4 -> 10.110.238.91

    # find the coredns container and enter its network namespace
    cid=$(docker ps | grep -i coredns | awk '{print $1}')   # e.g. 9752d0a23a80
    pid=$(docker inspect "$cid" --format '{{.State.Pid}}')
    nsenter -t "$pid" -n
    Oct 10 14:43:44 k8s-master kernel: nat|1|PREROUTINGIN=cni0 OUT= PHYSIN=veth5082b103 MAC=0a:58:0a:f4:00:01:0a:58:0a:f4:00:04:08:00 SRC=10.244.0.4 DST=10.110.238.91 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=55653 DF PROTO=TCP SPT=51398 DPT=9090 WINDOW=28200 RES=0x00 SYN URGP=0 
    Oct 10 14:43:44 k8s-master kernel: nat|15|KUBE-SERVICESIN=cni0 OUT= PHYSIN=veth5082b103 MAC=0a:58:0a:f4:00:01:0a:58:0a:f4:00:04:08:00 SRC=10.244.0.4 DST=10.110.238.91 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=55653 DF PROTO=TCP SPT=51398 DPT=9090 WINDOW=28200 RES=0x00 SYN URGP=0 
    Oct 10 14:43:44 k8s-master kernel: nat|1|KUBE-SVC-FNI7RW7PEKOXZDIN=cni0 OUT= PHYSIN=veth5082b103 MAC=0a:58:0a:f4:00:01:0a:58:0a:f4:00:04:08:00 SRC=10.244.0.4 DST=10.110.238.91 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=55653 DF PROTO=TCP SPT=51398 DPT=9090 WINDOW=28200 RES=0x00 SYN URGP=0 
    Oct 10 14:43:44 k8s-master kernel: nat|3|KUBE-SEP-6FRGWTS5YGV54XIN=cni0 OUT= PHYSIN=veth5082b103 MAC=0a:58:0a:f4:00:01:0a:58:0a:f4:00:04:08:00 SRC=10.244.0.4 DST=10.110.238.91 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=55653 DF PROTO=TCP SPT=51398 DPT=9090 WINDOW=28200 RES=0x00 SYN URGP=0 
    Oct 10 14:43:44 k8s-master kernel: filter|1|FORWARDIN=cni0 OUT=flannel.1 PHYSIN=veth5082b103 MAC=0a:58:0a:f4:00:01:0a:58:0a:f4:00:04:08:00 SRC=10.244.0.4 DST=10.244.1.6 LEN=60 TOS=0x00 PREC=0x00 TTL=63 ID=55653 DF PROTO=TCP SPT=51398 DPT=9090 WINDOW=28200 RES=0x00 SYN URGP=0 
    Oct 10 14:43:44 k8s-master kernel: filter|3|FORWARDIN=cni0 OUT=flannel.1 PHYSIN=veth5082b103 MAC=0a:58:0a:f4:00:01:0a:58:0a:f4:00:04:08:00 SRC=10.244.0.4 DST=10.244.1.6 LEN=60 TOS=0x00 PREC=0x00 TTL=63 ID=55653 DF PROTO=TCP SPT=51398 DPT=9090 WINDOW=28200 RES=0x00 SYN URGP=0 
    Oct 10 14:43:44 k8s-master kernel: nat|3|POSTROUTINGIN= OUT=flannel.1 PHYSIN=veth5082b103 SRC=10.244.0.4 DST=10.244.1.6 LEN=60 TOS=0x00 PREC=0x00 TTL=63 ID=55653 DF PROTO=TCP SPT=51398 DPT=9090 WINDOW=28200 RES=0x00 SYN URGP=0 
    The matched rules, in order:
    -A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
    -A KUBE-SERVICES -d 10.110.238.91/32 -p tcp -m comment --comment "default/prometheus:tcp cluster IP" -m tcp --dport 9090 -j KUBE-SVC-FNI7RW7PEKOXZDFO
    -A KUBE-SVC-FNI7RW7PEKOXZDFO -j KUBE-SEP-6FRGWTS5YGV54XWV
    -A KUBE-SEP-6FRGWTS5YGV54XWV -p tcp -m tcp -j DNAT --to-destination 10.244.1.6:9090
    -A FORWARD -m comment --comment "kubernetes forwarding rules" -j KUBE-FORWARD
    -A FORWARD -s 10.244.0.0/16 -j ACCEPT
    -A POSTROUTING -s 10.244.0.0/16 -d 10.244.0.0/16 -j RETURN

    When a pod on the node accesses the svc, the packet enters the host over the veth pair, so matching starts in PREROUTING, which is indeed the first entry in the trail above.

    With flannel + VXLAN, if both the source and the destination IP fall inside the pod CIDR, the request must have originated from a pod, so no SNAT is needed here (hence the plain RETURN in POSTROUTING for 10.244.0.0/16 -> 10.244.0.0/16).

    3. A pod accessing its own Service: 10.244.0.4 -> 10.96.0.10

    Oct 10 15:19:02 k8s-master kernel: nat|1|PREROUTINGIN=cni0 OUT= PHYSIN=veth5082b103 MAC=0a:58:0a:f4:00:01:0a:58:0a:f4:00:04:08:00 SRC=10.244.0.4 DST=10.96.0.10 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=27522 DF PROTO=TCP SPT=57672 DPT=53 WINDOW=28200 RES=0x00 SYN URGP=0 
    Oct 10 15:19:02 k8s-master kernel: nat|11|KUBE-SERVICESIN=cni0 OUT= PHYSIN=veth5082b103 MAC=0a:58:0a:f4:00:01:0a:58:0a:f4:00:04:08:00 SRC=10.244.0.4 DST=10.96.0.10 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=27522 DF PROTO=TCP SPT=57672 DPT=53 WINDOW=28200 RES=0x00 SYN URGP=0 
    Oct 10 15:19:02 k8s-master kernel: nat|1|KUBE-SEP-SF3LG62VAE5ALYIN=cni0 OUT= PHYSIN=veth5082b103 MAC=0a:58:0a:f4:00:01:0a:58:0a:f4:00:04:08:00 SRC=10.244.0.4 DST=10.96.0.10 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=27522 DF PROTO=TCP SPT=57672 DPT=53 WINDOW=28200 RES=0x00 SYN URGP=0 
    Oct 10 15:19:02 k8s-master kernel: nat|1|KUBE-MARK-MASQIN=cni0 OUT= PHYSIN=veth5082b103 MAC=0a:58:0a:f4:00:01:0a:58:0a:f4:00:04:08:00 SRC=10.244.0.4 DST=10.96.0.10 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=27522 DF PROTO=TCP SPT=57672 DPT=53 WINDOW=28200 RES=0x00 SYN URGP=0 
    Oct 10 15:19:02 k8s-master kernel: nat|3|KUBE-SEP-SF3LG62VAE5ALYIN=cni0 OUT= PHYSIN=veth5082b103 MAC=0a:58:0a:f4:00:01:0a:58:0a:f4:00:04:08:00 SRC=10.244.0.4 DST=10.96.0.10 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=27522 DF PROTO=TCP SPT=57672 DPT=53 WINDOW=28200 RES=0x00 SYN URGP=0 MARK=0x4000 
    Oct 10 15:19:02 k8s-master kernel: filter|1|FORWARDIN=cni0 OUT=cni0 PHYSIN=veth5082b103 PHYSOUT=veth5082b103 MAC=0a:58:0a:f4:00:04:0a:58:0a:f4:00:04:08:00 SRC=10.244.0.4 DST=10.244.0.4 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=27522 DF PROTO=TCP SPT=57672 DPT=53 WINDOW=28200 RES=0x00 SYN URGP=0 MARK=0x4000 
    Oct 10 15:19:02 k8s-master kernel: filter|1|KUBE-FORWARDIN=cni0 OUT=cni0 PHYSIN=veth5082b103 PHYSOUT=veth5082b103 MAC=0a:58:0a:f4:00:04:0a:58:0a:f4:00:04:08:00 SRC=10.244.0.4 DST=10.244.0.4 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=27522 DF PROTO=TCP SPT=57672 DPT=53 WINDOW=28200 RES=0x00 SYN URGP=0 MARK=0x4000 
    Oct 10 15:19:02 k8s-master kernel: nat|1|POSTROUTINGIN= OUT=cni0 PHYSIN=veth5082b103 PHYSOUT=veth5082b103 SRC=10.244.0.4 DST=10.244.0.4 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=27522 DF PROTO=TCP SPT=57672 DPT=53 WINDOW=28200 RES=0x00 SYN URGP=0 MARK=0x4000 
    Oct 10 15:19:02 k8s-master kernel: nat|1|KUBE-POSTROUTINGIN= OUT=cni0 PHYSIN=veth5082b103 PHYSOUT=veth5082b103 SRC=10.244.0.4 DST=10.244.0.4 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=27522 DF PROTO=TCP SPT=57672 DPT=53 WINDOW=28200 RES=0x00 SYN URGP=0 MARK=0x4000 
    The matched rules, in order:
    -A PREROUTING -m comment --comment "kubernetes service portals" -j KUBE-SERVICES
    -A KUBE-SERVICES -d 10.96.0.10/32 -p tcp -m comment --comment "kube-system/kube-dns:dns-tcp cluster IP" -m tcp --dport 53 -j KUBE-SVC-ERIFXISQEP7F7OF4
    -A KUBE-SVC-ERIFXISQEP7F7OF4 -m statistic --mode random --probability 0.50000000000 -j KUBE-SEP-SF3LG62VAE5ALYDV
    -A KUBE-SEP-SF3LG62VAE5ALYDV -s 10.244.0.4/32 -j KUBE-MARK-MASQ
    -A KUBE-MARK-MASQ -j MARK --set-xmark 0x4000/0x4000
    -A KUBE-SEP-SF3LG62VAE5ALYDV -p tcp -m tcp -j DNAT --to-destination 10.244.0.4:53
    -A FORWARD -m comment --comment "kubernetes forwarding rules" -j KUBE-FORWARD
    -A KUBE-FORWARD -m comment --comment "kubernetes forwarding rules" -m mark --mark 0x4000/0x4000 -j ACCEPT
    -A POSTROUTING -m comment --comment "kubernetes postrouting rules" -j KUBE-POSTROUTING
    -A KUBE-POSTROUTING -m comment --comment "kubernetes service traffic requiring SNAT" -m mark --mark 0x4000/0x4000 -j MASQUERADE

    The packet again starts in PREROUTING. Because the Service has two backends, the pod selects itself with probability 0.5. The SEP chain then sees that the packet's source IP equals the endpoint's own IP and jumps to KUBE-MARK-MASQ. The MASQ prevents the pod from receiving a packet it apparently sent to itself (10.244.0.4 -> 10.244.0.4, where the client actually expects 10.96.0.10 -> 10.244.0.4), which would break the handshake: with MASQ the reply first returns to the host, where conntrack rewrites the source back to the svc IP.

    Similarly, when a client outside the cluster reaches a Service through a NodePort, SNAT is also required; otherwise the backend pod would send its reply straight to the client and the handshake would fail.

    Open questions

    When filtering /var/log/iptables by IP, the rule trail recorded for a request is quite often incomplete.
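    A likely cause is the limit module attached to every LOG rule: at most one packet per second per rule is logged, so entries for busy rules are silently dropped. Filtering by IP alone also loses half the trail, because DNAT rewrites the destination IP midway; the ephemeral source port survives DNAT, so grepping on SPT= follows one connection end to end. A sketch over two sample lines from the first trace above:

```shell
#!/bin/bash
# DST changes after DNAT (svc IP -> pod IP), so a grep on either IP
# alone misses entries; the source port SPT=49620 is preserved across
# DNAT and tags the whole connection.
log='nat|1|OUTPUT IN= OUT=eth0 SRC=172.16.0.2 DST=10.110.238.91 SPT=49620 DPT=9090
filter|3|OUTPUT IN= OUT=eth0 SRC=172.16.0.2 DST=10.244.1.6 SPT=49620 DPT=9090'
grep -c "SPT=49620" <<< "$log"
# prints: 2
```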

Original post: https://www.cnblogs.com/orchidzjl/p/13784264.html