Inside a container, deleting a directory fails:
bash-4.2# pwd
/home/zxcdn/ottcache/tomcat
bash-4.2# uname -a
Linux 3516b6c97679 3.10.0-327.22.2.el7.x86_64 #1 SMP Fri Sep 29 15:13:08 CST 2017 x86_64 x86_64 x86_64 GNU/Linux
bash-4.2# whoami
root
bash-4.2# ls -alrt bin
total 8
drwxr-xr-x. 1 root root 4096 Dec 3 02:49 .
drwxr-xr-x. 1 root root 4096 Dec 4 02:28 ..
bash-4.2# rm -rf bin
bash-4.2# ls -i
33012 bin
bash-4.2# rm -rf bin
bash-4.2# ls -i
33012 bin
Relevant Docker version information:
[root@host-80-80-34-255 caq]# docker info
Containers: 2
 Running: 1
 Paused: 0
 Stopped: 1
Images: 1
Server Version: 1.13.1
Storage Driver: overlay2          <-- storage driver
 Backing Filesystem: extfs        <-- backing filesystem
 Supports d_type: true
 Native Overlay Diff: false
Logging Driver: journald
Cgroup Driver: systemd
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
Swarm: inactive
Runtimes: docker-runc runc
Default Runtime: docker-runc
Init Binary: /usr/libexec/docker/docker-init-current
containerd version: (expected: aa8187dbd3b7ad67d8e5e3a15115d3eef43a7ed1)
runc version: 5eda6f6fd0c2884c2c8e78a6e7119e8d0ecedb77 (expected: 9df8b306d01f59d3a8029be411de015b7304dd8f)
init version: fec3683b971d9c3ef73f284f176672c44b448662 (expected: 949e6facb77383876aeff8a6944dde66b3089574)
Security Options:
 seccomp
  WARNING: You're not using the default seccomp profile
  Profile: /etc/docker/seccomp.json
Kernel Version: 3.10.0-327.22.2.el7.x86_64
Operating System: Carrier Grade Server Linux 5
OSType: linux
Architecture: x86_64
Number of Docker Hooks: 3
CPUs: 2
Total Memory: 3.703 GiB
Name: host-80-80-34-255
ID: 4CV6:Y3Q4:NYGV:PABH:VG42:3CN7:CKET:SEIV:4SYF:63PI:HYAB:AZR2
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
WARNING: bridge-nf-call-iptables is disabled
WARNING: bridge-nf-call-ip6tables is disabled
Experimental: false
Insecure Registries:
 0.0.0.0/0
 127.0.0.0/8
Live Restore Enabled: false
Registries: docker.io (secure)
The empty directory cannot be removed. Tracing the command with strace shows the following error:
fcntl(3, F_GETFL) = 0x38800 (flags O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_NOFOLLOW)
fcntl(3, F_SETFD, FD_CLOEXEC) = 0
getdents(3, /* 2 entries */, 32768) = 48
getdents(3, /* 0 entries */, 32768) = 0
close(3) = 0
unlinkat(AT_FDCWD, "bin", AT_REMOVEDIR) = -1 EINVAL (Invalid argument)
lseek(0, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
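The failing call can be reproduced without rm using a few lines of C. This is only a minimal sketch, not the code rm runs: the helper name rmdir_errno is made up for illustration, and it simply issues the same unlinkat(AT_FDCWD, path, AT_REMOVEDIR) call that appears in the trace and hands back the resulting errno.

```c
#define _POSIX_C_SOURCE 200809L  /* for unlinkat() and AT_REMOVEDIR */
#include <errno.h>
#include <fcntl.h>    /* AT_FDCWD, AT_REMOVEDIR */
#include <unistd.h>   /* unlinkat() */

/*
 * Hypothetical helper: remove an empty directory with the same system call
 * that shows up in the strace output. Returns 0 on success, otherwise the
 * errno value set by unlinkat().
 */
int rmdir_errno(const char *path)
{
    if (unlinkat(AT_FDCWD, path, AT_REMOVEDIR) == 0)
        return 0;
    return errno;
}
```

In the affected container, rmdir_errno("bin") would be expected to return EINVAL, matching the trace. On a healthy filesystem it returns 0 for an empty directory.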
So it is unlinkat that fails. Probing the kernel next gives the following stack:
Returning from: 0xffffffff811ed500 : vfs_rename+0x0/0x790 [kernel]
Returning to : 0xffffffffa039860b : ovl_do_rename+0x3b/0xa0 [overlay]
 0xffffffffa0398e4e : ovl_clear_empty+0x27e/0x2e0 [overlay]
 0xffffffffa0398f28 : ovl_check_empty_and_clear+0x78/0x90 [overlay]
 0xffffffffa039999c : ovl_do_remove+0x1ec/0x470 [overlay]
 0xffffffffa0399c36 : ovl_rmdir+0x16/0x20 [overlay]
 0xffffffff811ec738 : vfs_rmdir+0xa8/0x100 [kernel]
 0xffffffff811f16d5 : do_rmdir+0x1a5/0x200 [kernel]
 0xffffffff811f28b5 : SyS_unlinkat+0x25/0x40 [kernel]
 0xffffffff81649909 : system_call_fastpath+0x16/0x1b [kernel]
This confirms that the error comes from vfs_rename. To narrow it down, probe by line number:
probe kernel.statement("vfs_rename@namei.c:4122") {
    p_my = @cast($old_dir, "struct inode")->i_op;
    iflags = @cast($old_dir, "struct inode")->i_flags;
    printf("line 4122 flags=%u,rename2=%x,iflags=%u ",
           $flags,
           @cast(p_my, "struct inode_operations_wrapper")->rename2,
           iflags);
    print_backtrace();
}
The corresponding kernel source:
int vfs_rename(struct inode *old_dir, struct dentry *old_dentry,
               struct inode *new_dir, struct dentry *new_dentry,
               struct inode **delegated_inode, unsigned int flags)
{
    ...
    rename2 = get_rename2_iop(old_dir);        /* line 4118 */
    if (!old_dir->i_op->rename && !rename2)
        return -EPERM;
    if (flags && !rename2)                     /* line 4122 */
        return -EINVAL;
    ...
}
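The two guards quoted above can be reduced to a small truth table. The sketch below is a userspace model, not kernel code: rename_precheck and its parameters are invented for illustration and stand in for old_dir->i_op->rename, the result of get_rename2_iop(), and the flags argument.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Userspace model of the two checks quoted from vfs_rename().
 *   has_rename  : old_dir->i_op->rename is non-NULL
 *   has_rename2 : get_rename2_iop(old_dir) returned non-NULL
 *   flags       : the rename flags passed by the caller
 * Returns 0 if both checks pass, otherwise a negative errno value.
 */
int rename_precheck(bool has_rename, bool has_rename2, unsigned int flags)
{
    if (!has_rename && !has_rename2)
        return -EPERM;   /* no rename operation available at all */
    if (flags && !has_rename2)
        return -EINVAL;  /* flags were given but rename2 is missing */
    return 0;
}
```

In this model, a directory with rename but without rename2 still passes when flags is 0, and any non-zero flags value yields -EINVAL. That is consistent with the EINVAL seen on the ovl_clear_empty path in the stack above, assuming overlay passes non-zero flags there.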
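At first I read rename2 directly by casting i_op, and it was not NULL, so in theory the return at line 4122 should never have been reached. Only after a careful code walkthrough by Tan Hu (谈虎) did it become clear that the following condition was being taken: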
static inline const struct inode_operations_wrapper *get_iop_wrapper(struct inode *inode,
                                                                     unsigned version)
{
    const struct inode_operations_wrapper *wrapper;

    if (!IS_IOPS_WRAPPER(inode))    /* this is the condition that ends up taking effect */
        return NULL;
    wrapper = container_of(inode->i_op, const struct inode_operations_wrapper, ops);
    if (wrapper->version < version)
        return NULL;
    return wrapper;
}

static inline iop_rename2_t get_rename2_iop(struct inode *inode)
{
    const struct inode_operations_wrapper *wrapper = get_iop_wrapper(inode, 0);
    return wrapper ? wrapper->rename2 : NULL;
}
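Why a directly read rename2 can be non-NULL while get_rename2_iop() still returns NULL is easier to see in a userspace model. The sketch below is a simplification under stated assumptions, not the kernel's definitions: the struct layouts are cut down to the fields used above, MODEL_S_IOPS_WRAPPER is a placeholder bit rather than the kernel's value, and read_rename2_directly, dummy_rename2, wrapped_ops, ext4_like_dir and ext3_like_dir are made-up names.

```c
#include <stddef.h>

#define MODEL_S_IOPS_WRAPPER 0x1u  /* placeholder bit, not the kernel's value */

typedef int (*iop_rename2_t)(void);  /* simplified signature */

struct inode_operations {
    int (*rename)(void);
};

struct inode_operations_wrapper {
    struct inode_operations ops;  /* inode->i_op points at this member */
    iop_rename2_t rename2;
    unsigned version;
};

struct inode {
    const struct inode_operations *i_op;
    unsigned int i_flags;
};

#define IS_IOPS_WRAPPER(inode) ((inode)->i_flags & MODEL_S_IOPS_WRAPPER)
#define container_of(ptr, type, member) \
    ((type *)((const char *)(ptr) - offsetof(type, member)))

const struct inode_operations_wrapper *get_iop_wrapper(const struct inode *inode,
                                                       unsigned version)
{
    const struct inode_operations_wrapper *wrapper;

    if (!IS_IOPS_WRAPPER(inode))  /* flag missing: the wrapper is never consulted */
        return NULL;
    wrapper = container_of(inode->i_op, const struct inode_operations_wrapper, ops);
    if (wrapper->version < version)
        return NULL;
    return wrapper;
}

iop_rename2_t get_rename2_iop(const struct inode *inode)
{
    const struct inode_operations_wrapper *wrapper = get_iop_wrapper(inode, 0);
    return wrapper ? wrapper->rename2 : NULL;
}

/* What the probe effectively did: cast i_op and read rename2, ignoring i_flags. */
iop_rename2_t read_rename2_directly(const struct inode *inode)
{
    return ((const struct inode_operations_wrapper *)inode->i_op)->rename2;
}

int dummy_rename2(void) { return 0; }

/*
 * One wrapper-shaped table shared by both model inodes. On a real inode
 * without the flag, the direct read would presumably see whatever memory
 * follows i_op; the shared table keeps the read well defined in this model.
 */
const struct inode_operations_wrapper wrapped_ops = {
    .ops = { .rename = NULL },
    .rename2 = dummy_rename2,
    .version = 0,
};

/* Same operations table; only the flag differs. */
const struct inode ext4_like_dir = { &wrapped_ops.ops, MODEL_S_IOPS_WRAPPER };
const struct inode ext3_like_dir = { &wrapped_ops.ops, 0 };
```

In this model read_rename2_directly(&ext3_like_dir) is non-NULL, yet get_rename2_iop(&ext3_like_dir) is NULL because the flag check fails first. This is the situation the probe ran into.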
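It appears that the overlay storage driver in this kernel version has compatibility problems with an ext3 backing filesystem. The problem was later resolved by switching to device-mapper.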
In ext4, ext4_iget sets the S_IOPS_WRAPPER bit in the inode's i_flags when it handles a directory:
} else if (S_ISDIR(inode->i_mode)) {
    inode->i_op = &ext4_dir_inode_operations.ops;
    inode->i_fop = &ext4_dir_operations;
    inode->i_flags |= S_IOPS_WRAPPER;
ext3, however, does not set this flag.
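Because docker info only reports extfs for the backing filesystem, it may help to confirm whether the Docker root directory really sits on ext3 or ext4. The following is a minimal sketch under assumptions: it is Linux-specific, reads /proc/self/mounts through the glibc getmntent() interface, expects an absolute path without symlinks, and the function name mount_fs_type is invented for illustration.

```c
#include <mntent.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy the filesystem type of the mount that contains
 * `path` into `out`. The longest mount point that is a prefix of `path` on a
 * path-component boundary wins. Returns 0 on success, -1 if nothing matched
 * or the mount table could not be opened.
 */
int mount_fs_type(const char *path, char *out, size_t outlen)
{
    FILE *fp = setmntent("/proc/self/mounts", "r");
    struct mntent *m;
    size_t best = 0;
    int found = -1;

    if (fp == NULL)
        return -1;
    while ((m = getmntent(fp)) != NULL) {
        size_t n = strlen(m->mnt_dir);

        if (strncmp(path, m->mnt_dir, n) != 0)
            continue;  /* mount point is not a prefix of path */
        if (n > 1 && path[n] != '/' && path[n] != '\0')
            continue;  /* prefix ends in the middle of a component */
        if (n >= best) {  /* later entries for the same mount point win */
            best = n;
            snprintf(out, outlen, "%s", m->mnt_type);
            found = 0;
        }
    }
    endmntent(fp);
    return found;
}
```

Calling mount_fs_type("/var/lib/docker", buf, sizeof buf) on the host should then yield a string such as ext3 or ext4, which tells whether the S_IOPS_WRAPPER difference described above is likely to apply.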