【Symptom】
Mesos fails to start. Checking the service status shows the failure:
[root@hps102 ~]# systemctl status mesos-master
● mesos-master.service - Mesos Master
Loaded: loaded (/usr/lib/systemd/system/mesos-master.service; enabled; vendor preset: disabled)
Active: activating (auto-restart) (Result: exit-code) since Mon 2016-05-30 15:26:11 CST; 5s ago
Process: 7783 ExecStart=/usr/bin/mesos-init-wrapper master (code=exited, status=1/FAILURE)
Main PID: 7783 (code=exited, status=1/FAILURE)
May 30 15:26:10 hps102 systemd[1]: mesos-master.service: main process exited, code=exited, status=1/FAILURE
May 30 15:26:11 hps102 systemd[1]: Unit mesos-master.service entered failed state.
May 30 15:26:11 hps102 systemd[1]: mesos-master.service failed.
Check the mesos-master log with journalctl:
[root@hps102 ~]# journalctl -f -u mesos-master
-- Logs begin at Mon 2016-05-30 04:03:01 CST. --
May 30 15:27:52 hps102 systemd[1]: mesos-master.service holdoff time over, scheduling restart.
May 30 15:27:52 hps102 systemd[1]: Started Mesos Master.
May 30 15:27:52 hps102 systemd[1]: Starting Mesos Master...
May 30 15:27:52 hps102 mesos-master[8200]: Failed to load unknown flag 'quorum.rpmsave'
May 30 15:27:52 hps102 mesos-master[8200]:
May 30 15:27:52 hps102 mesos-master[8200]: Usage: mesos-master [options]
May 30 15:27:52 hps102 mesos-master[8200]:
May 30 15:27:52 hps102 mesos-master[8200]: --acls=VALUE The value could be a JSON-formatted string of ACLs
May 30 15:27:52 hps102 mesos-master[8200]: or a file path containing the JSON-formatted ACLs used
May 30 15:27:52 hps102 systemd[1]: mesos-master.service: main process exited, code=exited, status=1/FAILURE
May 30 15:27:52 hps102 systemd[1]: Unit mesos-master.service entered failed state.
May 30 15:27:52 hps102 systemd[1]: mesos-master.service failed.
May 30 15:28:12 hps102 systemd[1]: mesos-master.service holdoff time over, scheduling restart.
May 30 15:28:12 hps102 systemd[1]: Started Mesos Master.
May 30 15:28:12 hps102 systemd[1]: Starting Mesos Master...
May 30 15:28:12 hps102 systemd[1]: mesos-master.service: main process exited, code=exited, status=1/FAILURE
May 30 15:28:12 hps102 systemd[1]: Unit mesos-master.service entered failed state.
May 30 15:28:12 hps102 systemd[1]: mesos-master.service failed.
【Cause】
List the Mesos configuration directories:
[root@hps102 ~]# ll /etc/mesos*
/etc/mesos:
total 8
-rw-r--r-- 1 root root 65 May 30 15:21 zk
-rw-r--r-- 1 root root 65 May 30 15:04 zk.rpmsave
/etc/mesos-master:
total 24
-rw-r--r-- 1 root root 5 May 30 15:21 logging_level
-rw-r--r-- 1 root root 2 May 30 15:21 quorum
-rw-r--r-- 1 root root 2 May 30 15:03 quorum.rpmsave
-rw-r--r-- 1 root root 17 May 30 15:21 work_dir
-rw-r--r-- 1 root root 17 May 30 15:03 work_dir.rpmsave
-rw-r--r-- 1 root root 7 May 30 15:21 zk_session_timeout
/etc/mesos-slave:
total 16
-rw-r--r-- 1 root root 20 May 30 15:21 attributes
-rw-r--r-- 1 root root 13 May 30 15:21 containerizers
-rw-r--r-- 1 root root 8 May 30 15:21 logging_level
-rw-r--r-- 1 root root 17 May 30 15:21 work_dir
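Several extra files ending in .rpmsave have appeared. mesos-init-wrapper turns each file in /etc/mesos-master into a command-line flag named after the file, so quorum.rpmsave is passed to mesos-master as a flag it does not recognize, which matches the "Failed to load unknown flag 'quorum.rpmsave'" line in the log.
The files were left behind because `yum remove mesos` was run while mesos-master was still running, and rpm kept the modified configuration files as .rpmsave backups during removal.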
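The file-name-to-flag mapping described above can be sketched as follows. This is only an illustration of the idea, not the actual mesos-init-wrapper source; `build_flags` is a hypothetical helper name, and the real wrapper may handle additional cases.

```shell
#!/bin/sh
# Illustrative sketch (assumption): print one "--<file name>=<file contents>"
# flag for every regular file in a config directory.
# build_flags is a hypothetical helper, not part of Mesos.
build_flags() {
    dir=$1
    for f in "$dir"/*; do
        # Skip anything that is not a regular file (including an unmatched glob).
        [ -f "$f" ] || continue
        name=$(basename "$f")
        printf '%s\n' "--$name=$(cat "$f")"
    done
}
```

Running `build_flags /etc/mesos-master` on the directory listed above would emit a `--quorum.rpmsave=...` entry next to the valid `--quorum=...` one, which is the kind of flag mesos-master rejects at startup.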
【Fix】
Delete the leftover .rpmsave files:
rm -f /etc/mesos-master/quorum.rpmsave /etc/mesos-master/work_dir.rpmsave /etc/mesos/zk.rpmsave
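To avoid typing each path by hand, the leftover files can be listed first and reviewed before deletion. The sketch below assumes GNU find (for `-maxdepth`) and uses a hypothetical helper name, `find_rpmsave`; the directory list comes from the listing above.

```shell
#!/bin/sh
# find_rpmsave (hypothetical helper): list *.rpmsave files that sit
# directly inside each given directory; missing directories are skipped.
find_rpmsave() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        find "$dir" -maxdepth 1 -type f -name '*.rpmsave'
    done
}

# Example usage (left commented out so nothing is deleted by accident):
#   find_rpmsave /etc/mesos /etc/mesos-master /etc/mesos-slave
#   find_rpmsave /etc/mesos /etc/mesos-master /etc/mesos-slave | xargs -r rm -f
#   systemctl restart mesos-master
#   systemctl status mesos-master
```

Listing before deleting makes it easy to confirm that only backup copies are removed and that the live files (zk, quorum, work_dir) stay in place.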