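Here we use open-falcon, Xiaomi's open-source monitoring system. The server side runs on host 192.168.5.200. The agent submits data to the server and has to run on every host in the cluster. The dashboard shows the collected data as charts on a web page.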
References: https://book.open-falcon.org/zh_0_2/quick_install/backend.html
http://www.cnblogs.com/benjamin77/p/8472632.html#auto_id_2
1. Environment preparation
Set the time zone to Asia/Shanghai
[root@mage-monitor-01 ~]# ansible all -m shell -a "timedatectl set-timezone Asia/Shanghai"
[root@mage-monitor-01 ~]# ansible all -m shell -a "timedatectl"
Check in the output whether the clocks are in sync
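Reading the timedatectl output host by host gets tedious on a larger cluster. A minimal sketch for comparing clocks in one pass, assuming each host can print its epoch time with date +%s. The helper name max_skew is made up for this example, and because ansible does not query all hosts at the same instant, a skew of a few seconds is expected.

```shell
# max_skew: read one epoch timestamp per line on stdin and
# print the difference between the largest and smallest, in seconds.
max_skew() {
  awk '{ v = $1 + 0 }
       NR == 1 { min = v; max = v }
       { if (v < min) min = v; if (v > max) max = v }
       END { if (NR > 0) print max - min }'
}

# Possible use (keeps only the all-digit lines of the ansible output):
#   ansible all -m shell -a "date +%s" | grep -E '^[0-9]+$' | max_skew
```

A result far above the time ansible needs to walk the hosts suggests that at least one clock is off.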
Install redis
yum install redis -y
Install mysql-server
rpm -ivh http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm
yum install -y mysql-server
Start MySQL and enable it at boot
systemctl start mysql;systemctl enable mysql;systemctl status mysql
Set the initial root password
[root@mage-monitor-01 ~]# mysql_secure_installation
Grant network access to the database. This is a test environment, so access is simply opened to every host
mysql -uroot -p123
mysql> grant all privileges on *.* to 'root'@'%' identified by '123';
mysql> flush privileges;
Install git
yum install git -y
Download the open-falcon table schemas
cd /tmp/ && git clone https://github.com/open-falcon/falcon-plus.git
Import the table schemas
cd /tmp/falcon-plus/scripts/mysql/db_schema/
mysql -h 127.0.0.1 -u root -p < 1_uic-db-schema.sql
mysql -h 127.0.0.1 -u root -p < 2_portal-db-schema.sql
mysql -h 127.0.0.1 -u root -p < 3_dashboard-db-schema.sql
mysql -h 127.0.0.1 -u root -p < 4_graph-db-schema.sql
mysql -h 127.0.0.1 -u root -p < 5_alarms-db-schema.sql
rm -rf /tmp/falcon-plus/
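The five imports differ only in the file name, so they can be written as a loop. A rough sketch, assuming the schema files keep their numeric prefixes so that glob order equals import order. import_schemas and the MYSQL_CMD override are names invented here (the override only exists so the loop can be dry-run), and with -p mysql still prompts for the password once per file.

```shell
# import_schemas DIR: feed every N_*-db-schema.sql file in DIR to mysql,
# in glob (numeric prefix) order. Stops at the first failing import.
# MYSQL_CMD is a hypothetical override, e.g. MYSQL_CMD=cat for a dry run.
import_schemas() {
  dir=$1
  for f in "$dir"/[0-9]_*-db-schema.sql; do
    # An unmatched glob stays literal, so test that the file exists.
    [ -e "$f" ] || { echo "no schema files in $dir" >&2; return 1; }
    echo "importing $(basename "$f")" >&2
    ${MYSQL_CMD:-mysql -h 127.0.0.1 -u root -p} < "$f" || return 1
  done
}

# Possible use:
#   import_schemas /tmp/falcon-plus/scripts/mysql/db_schema
```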
Install the Go toolchain
yum install golang -y
Set the Go environment variables
export GOROOT=/usr/lib/golang
export GOPATH=/home
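Typed at the prompt, these exports only last for the current shell. The later source /etc/profile suggests they were meant to live in a profile file. A small sketch for adding them without duplicating lines on a second run. The helper add_line_once and the file /etc/profile.d/golang.sh are assumptions for illustration, not something the original setup is known to use.

```shell
# add_line_once FILE LINE: append LINE to FILE unless an identical
# line is already there (grep -x matches whole lines, -F literally).
add_line_once() {
  file=$1
  line=$2
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Possible use (hypothetical target file):
#   add_line_once /etc/profile.d/golang.sh 'export GOROOT=/usr/lib/golang'
#   add_line_once /etc/profile.d/golang.sh 'export GOPATH=/home'
```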
2. Install the open-falcon server and agent on a single host
Download and unpack
[root@mage-monitor-01 db_schema]# source /etc/profile
[root@mage-monitor-01 db_schema]# cd
[root@mage-monitor-01 ~]# export FALCON_HOME=/home/work
[root@mage-monitor-01 ~]# export WORKSPACE=$FALCON_HOME/open-falcon
[root@mage-monitor-01 ~]# mkdir -p $WORKSPACE
[root@mage-monitor-01 ~]# cd /home/work/open-falcon/
[root@mage-monitor-01 open-falcon]# wget https://github.com/open-falcon/falcon-plus/releases/download/v0.2.1/open-falcon-v0.2.1.tar.gz
[root@mage-monitor-01 open-falcon]# tar -xzvf open-falcon-v0.2.1.tar.gz -C $WORKSPACE
Set the MySQL user's password in the configuration files
[root@mage-monitor-01 open-falcon]# sed -i 's/root:/root:123/g' aggregator/config/cfg.json
[root@mage-monitor-01 open-falcon]# sed -i 's/root:/root:123/g' graph/config/cfg.json
[root@mage-monitor-01 open-falcon]# sed -i 's/root:/root:123/g' hbs/config/cfg.json
[root@mage-monitor-01 open-falcon]# sed -i 's/root:/root:123/g' nodata/config/cfg.json
[root@mage-monitor-01 open-falcon]# sed -i 's/root:/root:123/g' api/config/cfg.json
[root@mage-monitor-01 open-falcon]# sed -i 's/root:/root:123/g' alarm/config/cfg.json
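One catch with the root: pattern is that running the same sed a second time turns root:123 into root:123123. A sketch of a loop that is safe to re-run, assuming the shipped configs contain the empty-password form root:@tcp(...) in their database address. That form, and the function name set_db_password, are assumptions, and a password containing / or & would need escaping before it goes into sed.

```shell
# set_db_password BASE PASS: insert PASS into every "root:@tcp" DSN found
# in the module configs under BASE. A second run finds nothing to change,
# because the empty-password pattern no longer matches.
set_db_password() {
  base=$1
  pass=$2
  for m in aggregator graph hbs nodata api alarm; do
    cfg="$base/$m/config/cfg.json"
    [ -f "$cfg" ] || { echo "missing: $cfg" >&2; continue; }
    sed -i "s/root:@tcp/root:${pass}@tcp/g" "$cfg"   # GNU sed in-place edit
  done
}

# Possible use:
#   set_db_password /home/work/open-falcon 123
```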
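Reload the agent configuration (port 1988 is the agent's HTTP port, so this only works while the agent is running)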
curl 127.0.0.1:1988/config/reload
Modify the agent configuration
[root@mage-monitor-01 config]# pwd
/home/work/open-falcon/agent/config
[root@mage-monitor-01 config]# sed -i 's/0.0.0.0/192.168.5.200/g' cfg.json
Start the server and the agent, then check their status
[root@mage-monitor-01 open-falcon]# ./open-falcon start
[falcon-graph] 8793
[falcon-hbs] 8799
[falcon-judge] 8802
[falcon-transfer] 8808
[falcon-nodata] 8814
[falcon-aggregator] 8822
[falcon-agent] 8835
[falcon-gateway] 8843
[falcon-api] 8847
[falcon-alarm] 8856
[root@mage-monitor-01 open-falcon]# ./open-falcon start agent
[falcon-agent] 8835
[root@mage-monitor-01 open-falcon]# ./open-falcon check
falcon-graph UP 8793
falcon-hbs UP 8799
falcon-judge UP 8802
falcon-transfer UP 8808
falcon-nodata UP 8814
falcon-aggregator UP 8822
falcon-agent UP 8835
falcon-gateway UP 8843
falcon-api UP 8847
falcon-alarm UP 8856
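With ten modules it is easy to miss one that did not come up. A tiny filter over the check output, assuming every line has the shape "name STATUS pid" as in the listing above and that a stopped module shows something other than UP in the second column. The name down_modules is made up here.

```shell
# down_modules: read "./open-falcon check" output on stdin and print
# the name of every module whose status column is not "UP".
down_modules() {
  awk 'NF >= 2 && $2 != "UP" { print $1 }'
}

# Possible use:
#   cd /home/work/open-falcon && ./open-falcon check | down_modules
```

Empty output means every listed module reported UP.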
3. Start the agent on the other hosts
Use ansible to create the open-falcon working directory, then copy the agent directory and the open-falcon launcher script to the remote hosts
[root@mage-monitor-01 ~]# cd /home/work/open-falcon/
[root@mage-monitor-01 open-falcon]# ansible all -m file -a "path=/home/work/open-falcon state=directory"
[root@mage-monitor-01 open-falcon]# ansible all -m copy -a "src=/home/work/open-falcon/agent dest=/home/work/open-falcon group=501 owner=501 mode=0755"
[root@mage-monitor-01 open-falcon]# ansible all -m copy -a "src=/home/work/open-falcon/open-falcon dest=/home/work/open-falcon group=501 owner=501 mode=0755"
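Copying the files does not start anything yet. A sketch of starting the agents remotely, reusing the ./open-falcon start agent command from the server. The wrapper start_remote_agents and its ANSIBLE override are invented for this example (the override only makes a dry run possible), and whether the agent survives the end of the ansible SSH session depends on how the launcher detaches it, so run check afterwards.

```shell
# start_remote_agents [PATTERN]: run the agent start command on every host
# matching the ansible host pattern (default: all).
# ANSIBLE is a hypothetical override, e.g. ANSIBLE=echo to print the call.
start_remote_agents() {
  pattern=${1:-all}
  ${ANSIBLE:-ansible} "$pattern" -m shell \
    -a "cd /home/work/open-falcon && ./open-falcon start agent"
}

# Possible use:
#   start_remote_agents all
#   ansible all -m shell -a "cd /home/work/open-falcon && ./open-falcon check agent"
```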
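Once everything is running, open 192.168.5.200:8081 in a browser and register a user. The first registered user is the administrator and can manage other users.
For now we use the templates that ship with open-falcon. Monitoring for databases, caches and so on will be added later.
5. Add scheduled tasks to start the services
The server host starts three things: server, dashboard and agent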
ansible 192.168.5.200 -m cron -a "name='start open-falcon agent' special_time=reboot job='cd /home/work/open-falcon/;./open-falcon start agent'"
[root@mage-monitor-01 ~]# ansible 192.168.5.200 -m cron -a "name='start open-falcon server' special_time=reboot job='cd /home/work/open-falcon;./open-falcon start;./open-falcon check'"
# The cron job for this one is unreliable. After boot it is best to run check again, and start the server manually if that fails.
ansible 192.168.5.200 -m cron -a "name='start open-falcon dashboard' special_time=reboot job='cd /home/work/open-falcon/dashboard;bash control start'"
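One way to make the @reboot job for the server less fragile is to retry until check is clean. This is only a guess at a workaround: a likely but unconfirmed cause is that MySQL or Redis is not ready yet when cron fires. The retry helper below is generic, and the commented usage assumes check prints DOWN for a stopped module.

```shell
# retry ATTEMPTS DELAY CMD...: run CMD until it succeeds, at most ATTEMPTS
# times, sleeping DELAY seconds between tries. Returns 1 if it never succeeds.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  until "$@"; do
    [ "$n" -ge "$attempts" ] && return 1
    n=$((n + 1))
    sleep "$delay"
  done
}

# Possible use inside the reboot job (start is harmless on running modules,
# it just prints their pids again):
#   cd /home/work/open-falcon && retry 5 10 sh -c \
#     './open-falcon start >/dev/null; ! ./open-falcon check | grep -q DOWN'
```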
Reboot to verify that this trick actually works
It does, reliably.
Now add the same job on all the other nodes in one go. They only need to start an agent
[root@mage-monitor-01 ~]# ansible all -m cron -a "name='start open-falcon agent' special_time=reboot job='cd /home/work/open-falcon/;./open-falcon start agent'"
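In theory, as long as a machine is alive and healthy it is now monitored, unless the agent process dies unexpectedly.
A full disk, exhausted CPU or excessive load are exactly the accidents monitoring should catch first, so the statement above holds.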