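Simple Monitoring and Alerting for Redis and Sentinel
A while ago I deployed a Redis cluster (master-replica replication, read/write splitting, and Sentinel) for a third-party customer and needed to set up some simple alerting. It is their server, after all, so I did not feel comfortable installing Zabbix on it.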
I. Configure the Linux server to send mail through a third-party SMTP server
1. Make sure the postfix service is running
# systemctl status postfix
2. Install mailx
# yum install -y mailx
3. Configure the SMTP server
Edit /etc/mail.rc and add the following lines to the file
# vim /etc/mail.rc
set from=user_sunli@sina.com
set smtp=smtp.sina.com
set smtp-port=465
set smtp-auth-user=user_sunli@sina.com
set smtp-auth-password=xxxxxxxxxxxx
set smtp-auth=login
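One caveat on the port setting: as far as I know, Heirloom mailx (the mailx package on CentOS/RHEL 7) does not document a smtp-port variable, and the port is normally written into the smtp URL instead. The sketch below prints an alternative set of mail.rc lines for SMTP over implicit TLS on port 465. The helper name print_mailrc_tls is hypothetical, the nss-config-dir path assumes the stock CentOS NSS database, and ssl-verify=ignore turns off certificate verification, so treat it as a starting point rather than a verified configuration.

```shell
#!/bin/bash
# Sketch: print mail.rc settings for SMTP over implicit TLS (port 465).
# print_mailrc_tls is a hypothetical helper; the password stays a placeholder.

# Usage: print_mailrc_tls <account> <smtp_host>
print_mailrc_tls() {
    cat <<EOF
set from=$1
set smtp=smtps://$2:465
set smtp-auth=login
set smtp-auth-user=$1
set smtp-auth-password=xxxxxxxxxxxx
set ssl-verify=ignore
set nss-config-dir=/etc/pki/nssdb
EOF
}
```

Calling print_mailrc_tls user_sunli@sina.com smtp.sina.com prints the seven lines, which can be reviewed and then pasted into /etc/mail.rc in place of the block above.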
4. Test
# echo "mail body" | mail -s "mail subject" public-mailbox-address
# echo "hello" |mail -s "hehehe" sunli@bdszh.vip
II. Monitoring script and scheduled job
Install nc
# yum -y install nc
Write the script
# vim /data/scripts/redis_mail.sh
#!/bin/bash
# Primary IP address of this host, used in the alert text
local_ip=$(hostname -I | awk '{print $1}')
# Local node: if nothing is listening on the port, restart the service and send an alert
if ! netstat -tnlp | grep -q 56379; then
    systemctl restart redis.service && echo "Please check $local_ip redis" | mail -s "redis is down" sunli@bdszh.vip
fi
if ! netstat -tnlp | grep -q 46379; then
    systemctl restart sentinel.service && echo "Please check $local_ip sentinel" | mail -s "sentinel is down" sunli@bdszh.vip
fi
# Remote nodes: probe each port with nc (3-second timeout), restart through ansible, then alert
if ! nc -zvw3 10.0.36.132 56379; then
    ansible 10.0.36.132 -m systemd -a "name=redis state=restarted" && echo "Please check 10.0.36.132 redis" | mail -s "redis is down" sunli@bdszh.vip
fi
if ! nc -zvw3 10.0.36.132 46379; then
    ansible 10.0.36.132 -m systemd -a "name=sentinel state=restarted" && echo "Please check 10.0.36.132 sentinel" | mail -s "sentinel is down" sunli@bdszh.vip
fi
if ! nc -zvw3 10.0.36.134 56379; then
    ansible 10.0.36.134 -m systemd -a "name=redis state=restarted" && echo "Please check 10.0.36.134 redis" | mail -s "redis is down" sunli@bdszh.vip
fi
if ! nc -zvw3 10.0.36.134 46379; then
    ansible 10.0.36.134 -m systemd -a "name=sentinel state=restarted" && echo "Please check 10.0.36.134 sentinel" | mail -s "sentinel is down" sunli@bdszh.vip
fi
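The remote checks in the script repeat the same three steps for every host and port, so they can also be driven from a list. The following is only a sketch of that idea, not the script I actually deployed: the variable names (ALERT_TO, REMOTE_HOSTS) and function names (port_open, alert, check_remote, check_all) are made up for illustration, and it covers the remote nodes only. One deliberate difference: it sends the alert even when the ansible restart fails, whereas the script above skips the mail in that case.

```shell
#!/bin/bash
# Sketch: list-driven version of the remote checks.
# All variable and function names here are illustrative assumptions.

ALERT_TO="sunli@bdszh.vip"
REMOTE_HOSTS="10.0.36.132 10.0.36.134"
REDIS_PORT=56379
SENTINEL_PORT=46379

# port_open <host> <port>: succeed if the port accepts a TCP connection within 3 seconds.
port_open() {
    nc -zw3 "$1" "$2" >/dev/null 2>&1
}

# alert <service> <host>: send the same mail text as the original script.
alert() {
    echo "Please check $2 $1" | mail -s "$1 is down" "$ALERT_TO"
}

# check_remote <host> <port> <unit>: when the port is closed, restart the
# unit through ansible and send an alert whether or not the restart worked.
check_remote() {
    port_open "$1" "$2" && return 0
    ansible "$1" -m systemd -a "name=$3 state=restarted"
    alert "$3" "$1"
}

# check_all: run the Redis and Sentinel checks for every remote host.
check_all() {
    for host in $REMOTE_HOSTS; do
        check_remote "$host" "$REDIS_PORT" redis
        check_remote "$host" "$SENTINEL_PORT" sentinel
    done
}
```

To use it, add a final line that calls check_all. Adding a node then only means appending its address to REMOTE_HOSTS.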
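Scheduled job
On Linux, the smallest scheduling unit in crontab is one minute, so a job cannot be scheduled at second granularity directly and a workaround is needed.
Approach: to run the script every 5 seconds, add one entry per 5-second slot in the minute. The first entry runs immediately, the second sleeps 5 seconds before running, the third sleeps 10 seconds, and so on up to 55 seconds.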
# crontab -e
*/1 * * * * /bin/bash -x /data/scripts/redis_mail.sh > /dev/null 2>&1
*/1 * * * * sleep 5; /bin/bash -x /data/scripts/redis_mail.sh > /dev/null 2>&1
*/1 * * * * sleep 10; /bin/bash -x /data/scripts/redis_mail.sh > /dev/null 2>&1
*/1 * * * * sleep 15; /bin/bash -x /data/scripts/redis_mail.sh > /dev/null 2>&1
*/1 * * * * sleep 20; /bin/bash -x /data/scripts/redis_mail.sh > /dev/null 2>&1
*/1 * * * * sleep 25; /bin/bash -x /data/scripts/redis_mail.sh > /dev/null 2>&1
*/1 * * * * sleep 30; /bin/bash -x /data/scripts/redis_mail.sh > /dev/null 2>&1
*/1 * * * * sleep 35; /bin/bash -x /data/scripts/redis_mail.sh > /dev/null 2>&1
*/1 * * * * sleep 40; /bin/bash -x /data/scripts/redis_mail.sh > /dev/null 2>&1
*/1 * * * * sleep 45; /bin/bash -x /data/scripts/redis_mail.sh > /dev/null 2>&1
*/1 * * * * sleep 50; /bin/bash -x /data/scripts/redis_mail.sh > /dev/null 2>&1
*/1 * * * * sleep 55; /bin/bash -x /data/scripts/redis_mail.sh > /dev/null 2>&1
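Typing twelve nearly identical entries by hand is easy to get wrong, so the staggered lines can be generated instead. This is a small sketch under the assumption that the interval divides 60 evenly; gen_cron_entries is a hypothetical helper name, and it only prints the entries without installing them.

```shell
#!/bin/bash
# Sketch: print the staggered crontab entries for a given interval.
# gen_cron_entries is a hypothetical helper; the interval should divide 60 evenly.

# Usage: gen_cron_entries <interval_seconds> <script_path>
gen_cron_entries() {
    offset=0
    while [ "$offset" -lt 60 ]; do
        if [ "$offset" -eq 0 ]; then
            prefix=""                  # the first entry runs at the top of the minute
        else
            prefix="sleep $offset; "   # later entries wait for their slot
        fi
        echo "*/1 * * * * ${prefix}/bin/bash -x $2 > /dev/null 2>&1"
        offset=$((offset + $1))
    done
}
```

Running gen_cron_entries 5 /data/scripts/redis_mail.sh prints the same twelve lines as above, ready to paste into crontab -e. Changing the first argument to 10 would give six entries, one every 10 seconds.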
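III. Simulate a failure
Stop Redis or Sentinel by hand, then check that the service is restarted and that the alert mail arrives.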
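One way to watch the recovery after stopping a service is to poll the port until it answers again. The helper below is a rough sketch that assumes nc is installed; wait_for_port is a hypothetical name, and the attempt count is only an approximate measure of time, because each nc probe can itself take up to 3 seconds.

```shell
#!/bin/bash
# Sketch: poll a TCP port until it is reachable again or the attempts run out.
# wait_for_port is a hypothetical helper, not part of the original setup.

# Usage: wait_for_port <host> <port> <max_attempts>
wait_for_port() {
    attempt=0
    while [ "$attempt" -lt "$3" ]; do
        if nc -zw3 "$1" "$2" >/dev/null 2>&1; then
            echo "port $2 on $1 is reachable after $attempt failed attempts"
            return 0
        fi
        sleep 1
        attempt=$((attempt + 1))
    done
    echo "port $2 on $1 is still closed after $3 attempts"
    return 1
}
```

For example, run systemctl stop redis.service on one node and then wait_for_port 10.0.36.132 56379 15 from another. With the 5-second cron schedule, the port should normally come back within a few attempts.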