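Problem description:
1. One worker node in the K8S cluster frequently runs out of disk space, which then causes service failures.
2. /var/log/syslog contains a large number of messages like the following: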
[1568405.455565] docker0: port 2(vethfd09262) entered forwarding state
[1568490.807194] aufs au_opts_verify:1612:docker[22618]: dirperm1 breaks the protection by the permission bits on the lower branch
[1568490.839695] aufs au_opts_verify:1612:docker[25041]: dirperm1 breaks the protection by the permission bits on the lower branch
3. /var/log/kern.log contains the following errors:
SLUB: Unable to allocate memory on node -1 (gfp=0x2080020)
Mar 31 18:52:08 AQA-Worker-CLD kernel: [292333.759874] cache: nf_conntrack_12(1847:58cc5f8478f68d01290885da9a59e974cf0d4575d5b92047bea0c7fd5f82130f), object size: 312, buffer size: 320, default order: 1, min order: 0
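The symptoms above can be checked with a couple of small helpers. This is only a sketch: disk_usage_pct and count_log_matches are hypothetical names introduced here, and the paths in the usage comments assume Docker's default data directory and the log files mentioned above.

```shell
#!/bin/sh
# disk_usage_pct PATH
# Print the used percentage (as an integer) of the filesystem that holds PATH.
# df -P forces the POSIX single-line format, so column 5 is the capacity.
disk_usage_pct() {
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# count_log_matches PATTERN FILE
# Print how many lines of FILE contain the fixed string PATTERN.
count_log_matches() {
    grep -cF -- "$1" "$2"
}

# Example usage on the affected worker (not executed here):
#   disk_usage_pct /var/lib/docker
#   count_log_matches 'dirperm1 breaks the protection' /var/log/syslog
#   count_log_matches 'SLUB: Unable to allocate memory' /var/log/kern.log
```

Running these periodically would show whether disk usage keeps growing while the same kernel messages accumulate.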
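Cause:
AUFS is unstable, so when Docker removes a container it is not always cleaned up properly. docker ps shows the container as gone, but its system resources are never released, so disk usage keeps climbing.
Reference: https://codeday.me/bug/20181115/395036.html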
docker info
Containers: 0
Images: 0
Storage Driver: aufs
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
<output truncated>
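Instead of reading the full docker info output, the storage driver can be queried directly with a Go template. A minimal sketch; current_driver and is_aufs are hypothetical helper names, not part of Docker.

```shell
#!/bin/sh
# current_driver
# Print the storage driver reported by the local Docker daemon.
current_driver() {
    docker info --format '{{.Driver}}'
}

# is_aufs DRIVER
# Succeed (exit status 0) when DRIVER is exactly "aufs".
is_aufs() {
    [ "$1" = "aufs" ]
}

# Example usage on a worker (not executed here):
#   is_aufs "$(current_driver)" && echo "this node still uses aufs"
```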
Solution:
1. sudo systemctl stop docker
2. mv /var/lib/docker /var/lib/docker.bk
3. vim /etc/docker/daemon.json
{ "storage-driver": "overlay2" }
4. systemctl restart docker
5. docker info:
Containers: 0
Images: 0
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
<output truncated>
Reference: https://docs.docker.com/storage/storagedriver/overlayfs-driver/
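The five steps above can be sketched as a script. The function names write_overlay2_config and migrate_to_overlay2 are hypothetical. The sketch assumes it runs as root, that /etc/docker already exists, and that there is no existing daemon.json worth keeping, since the file is overwritten. Images and containers stored under the old /var/lib/docker are not carried over to the new driver.

```shell
#!/bin/sh
# write_overlay2_config FILE
# Write a daemon.json that selects the overlay2 storage driver.
# Overwrites FILE; merge by hand if FILE already holds other settings.
write_overlay2_config() {
    printf '{ "storage-driver": "overlay2" }\n' > "$1"
}

# migrate_to_overlay2
# Steps 1-5 in order; stops at the first command that fails.
migrate_to_overlay2() {
    systemctl stop docker &&
    mv /var/lib/docker /var/lib/docker.bk &&
    write_overlay2_config /etc/docker/daemon.json &&
    systemctl restart docker &&
    docker info --format '{{.Driver}}'   # should print the new driver name
}

# Example usage on the affected worker (not executed here):
#   migrate_to_overlay2
```

Keeping /var/lib/docker.bk around until the node has been stable for a while leaves a way back; after that it can be deleted to reclaim the disk space.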