Background
- Some services deployed in the dev environment, such as the minio service, need to serve clients outside the k8s cluster.
- Pods running business workloads in the company dev environment also access the minio service through its domain name, and resolution of that domain occasionally fails.
Analysis
- Start a pod that ships with the nslookup command for testing
$ cat demo.yaml
#deploy
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tomcat-demo
spec:
  selector:
    matchLabels:
      app: tomcat-demo
  replicas: 1
  template:
    metadata:
      labels:
        app: tomcat-demo
    spec:
      containers:
      - name: tomcat-demo
        image: tomcat:8.0.51-alpine
        ports:
        - containerPort: 8080
---
#service
apiVersion: v1
kind: Service
metadata:
  name: tomcat-demo
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: tomcat-demo

$ kubectl apply -f demo.yaml
- Enter the container and resolve the domain, optionally specifying the coredns IP address
$ kubectl exec -it tomcat-demo-xxxx -- bash
# test with the ping command
bash-4.4# ping dev-minio.evescn.com
ping: bad address 'dev-minio.evescn.com'
bash-4.4# ping dev-minio.evescn.com
ping: bad address 'dev-minio.evescn.com'
bash-4.4#
bash-4.4#
bash-4.4#
bash-4.4# ping dev-minio.evescn.com
PING dev-minio.evescn.com (172.16.0.223): 56 data bytes
64 bytes from 172.16.0.223: seq=0 ttl=63 time=0.368 ms
64 bytes from 172.16.0.223: seq=1 ttl=63 time=0.391 ms
64 bytes from 172.16.0.223: seq=2 ttl=63 time=0.354 ms
# test with nslookup
bash-4.4# nslookup dev-minio.evescn.com 10.0.0.2
Server: 10.0.0.2
Address 1: 10.0.0.2 kube-dns.kube-system.svc.cluster.local
nslookup: can't resolve 'dev-minio.evescn.com': Name does not resolve
bash-4.4# nslookup dev-minio.evescn.com 10.0.0.2
Server: 10.0.0.2
Address 1: 10.0.0.2 kube-dns.kube-system.svc.cluster.local
nslookup: can't resolve 'dev-minio.evescn.com': Name does not resolve
bash-4.4# nslookup dev-minio.evescn.com 10.0.0.2
Server: 10.0.0.2
Address 1: 10.0.0.2 kube-dns.kube-system.svc.cluster.local
Name: dev-minio.evescn.com
Address 1: 172.16.0.232 172-16-0-232.node-exporter.kubesphere-monitoring-system.svc.cluster.local
Address 2: 172.16.0.231 172-16-0-231.kubelet.kube-system.svc.cluster.local
- Problem analysis
The ping and nslookup tests above show that when a pod inside the k8s cluster resolves an external domain, the query goes through coredns first, which then forwards it to the LAN DNS. When resolution fails, the error comes back from coredns itself, so the problem can be narrowed down to coredns handling of external domains.
Searching online suggested the intermittent failures are likely caused by coredns forwarding behavior.
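The analysis hinges on how pod DNS is wired: the kubelet writes an /etc/resolv.conf into each pod that lists only the cluster DNS Service IP, so every lookup, internal or external, goes through coredns first. A minimal sketch that parses such a file (the sample content is an assumed typical kubelet-generated form, not captured from this cluster; 10.0.0.2 matches the kube-dns address seen in the nslookup output above):

```python
# Parse a pod's /etc/resolv.conf to see where its DNS queries go.
# The sample below is an assumed typical kubelet-generated file; the
# nameserver 10.0.0.2 matches the kube-dns Service IP from the tests.
sample = """\
nameserver 10.0.0.2
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
"""

def parse_resolv_conf(text):
    conf = {"nameservers": [], "search": [], "options": []}
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue
        if parts[0] == "nameserver":
            conf["nameservers"].append(parts[1])
        elif parts[0] == "search":
            conf["search"] = parts[1:]
        elif parts[0] == "options":
            conf["options"] = parts[1:]
    return conf

conf = parse_resolv_conf(sample)
# Only the cluster DNS is listed, so external names also go via coredns.
print(conf["nameservers"])
```

Because only the kube-dns address is present, any failure inside coredns forwarding is enough to make an otherwise-resolvable external name fail, which matches the intermittent `bad address` errors above.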
- The coredns configuration is as follows
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
The forward . /etc/resolv.conf directive tells coredns to forward any query it cannot answer itself to the nameservers configured in the host's resolv.conf. When the host lists multiple nameservers, the default policy is random: one nameserver is picked at random per query, and if that nameserver fails to resolve the name, the error is returned to the client instead of trying the other nameserver.
- The host's /etc/resolv.conf
$ cat /etc/resolv.conf
# Generated by NetworkManager
nameserver 172.16.0.50
nameserver 114.114.114.114
- Set the forward policy to sequential
forward . /etc/resolv.conf {
    max_concurrent 1000   # added
    policy sequential     # added
}
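Why random is intermittent while sequential is stable can be illustrated with a small simulation. This is an illustrative sketch, not CoreDNS code; it assumes, per this incident, that only the LAN DNS (172.16.0.50) can resolve the internal domain, while the public resolver (114.114.114.114) returns a negative answer that is passed straight back to the client:

```python
import random

UPSTREAMS = ["172.16.0.50", "114.114.114.114"]
# Assumed per this incident: only the LAN DNS knows dev-minio.evescn.com;
# the public resolver returns a negative answer for it.
CAN_RESOLVE = {"172.16.0.50": True, "114.114.114.114": False}

def query(policy, rng):
    """Model one lookup of the internal domain under a forward policy.

    random:     pick one upstream per query; a negative answer is final,
                so the lookup fails whenever the public resolver is chosen.
    sequential: always try upstreams in listed order, so the LAN DNS
                (listed first) answers every time.
    """
    if policy == "random":
        return CAN_RESOLVE[rng.choice(UPSTREAMS)]
    if policy == "sequential":
        return CAN_RESOLVE[UPSTREAMS[0]]
    raise ValueError(policy)

rng = random.Random(42)
trials = 10_000
random_ok = sum(query("random", rng) for _ in range(trials))
seq_ok = sum(query("sequential", rng) for _ in range(trials))
# With two upstreams, the random policy fails roughly half the time,
# matching the "sometimes resolves, sometimes not" symptom; sequential
# always succeeds because the working LAN DNS is consulted first.
print(f"random:     {random_ok}/{trials}")
print(f"sequential: {seq_ok}/{trials}")
```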
Solution
- Edit the coredns ConfigMap, apply the change, and restart the coredns pods
$ kubectl -n kube-system edit cm coredns
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . /etc/resolv.conf {
            max_concurrent 1000   # added
            policy sequential     # added
        }
        cache 30
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
$ kubectl -n kube-system delete pods coredns-xxxxxx
Reference blog
https://blog.csdn.net/u013812710/article/details/119897020