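I. Introduction to Nagios
Nagios is an open-source system and network monitoring tool. It monitors the status of Windows, Linux, and Unix hosts, network devices such as switches and routers, and peripherals such as printers. When a system or service enters an abnormal state, Nagios immediately alerts the operations staff by e-mail or SMS, and once the state returns to normal it sends a recovery notice through the same channels.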
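Main features
- Monitoring of network services (SMTP, POP3, HTTP, NNTP, ICMP, SNMP, FTP, SSH)
- Monitoring of host resources (CPU load, disk usage, system logs), including Windows hosts (through the NSClient++ plugin)
- Custom plugins can collect data over the network to monitor almost anything (temperature, alarms, ...)
- Scripts can be run on remote hosts by configuring the Nagios remote plugin executor (NRPE)
- Remote monitoring over SSH or SSL-encrypted tunnels
- A simple plugin design lets users write their own checks easily, in many languages (shell scripts, C++, Perl, Ruby, Python, PHP, C#, etc.)
- Many plugins for graphing data (Nagiosgraph, Nagiosgrapher, PNP4Nagios, etc.)
- Parallel service checks
- A hierarchy of network hosts can be defined, so that checks proceed level by level, starting from the parent host and working downward
- Notifications when a service or host has a problem, sent by email, pager, SMS, or any user-defined plugin
- Custom event handlers that can restart a failed service or host
- Automatic log rotation
- Support for redundant monitoring
- A web interface for viewing current network status, notifications, problem history, log files, and more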
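How it works
Nagios exists to monitor services and hosts, but it contains no checking logic of its own. All monitoring and checking is done by plugins.
Once started, Nagios periodically invokes plugins to check server status. It also maintains a queue: every status result returned by a plugin enters the queue, and Nagios reads results from the head of the queue, processes them, and displays the resulting status in the web interface.
Many plugins are available for Nagios, and they make it easy to monitor a wide range of services. After installation, the libexec directory under the Nagios home directory holds all the bundled plugins. For example, check_disk checks disk space and check_load checks CPU load. Run ./check_xxx -h on any plugin to see its usage and purpose.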
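Monitoring states
Nagios recognizes four return states: 0 (OK) means the state is normal and is shown in green; 1 (WARNING) means a warning and is shown in yellow; 2 (CRITICAL) means a very serious error and is shown in red; 3 (UNKNOWN) means an unknown error and is shown in dark yellow.
Nagios determines the state of a monitored object from the value the plugin returns and displays it in the web interface, so that administrators can spot failures promptly.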
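The return-value convention above is the whole contract between Nagios and a plugin. As a minimal sketch (the function name check_threshold and the sample thresholds are illustrative and not part of Nagios), a shell plugin could map a measured value to a state like this:

```shell
#!/bin/bash
# Hypothetical helper: compare a value against warning and critical
# thresholds and report the result using the Nagios return codes
# 0 (OK), 1 (WARNING), 2 (CRITICAL), 3 (UNKNOWN).
check_threshold() {
    local value=$1 warn=$2 crit=$3
    # Anything that is not a non-negative integer cannot be evaluated.
    case $value in
        ''|*[!0-9]*) echo "UNKNOWN - invalid value '$value'"; return 3 ;;
    esac
    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL - value is $value"; return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING - value is $value"; return 1
    fi
    echo "OK - value is $value"; return 0
}
```

A real plugin script would call the function with a measured value (for example a disk usage percentage) and hand the function's return code to `exit`, so that Nagios sees it as the plugin's exit status.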
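How Nagios monitors remote services through NRPE
1. Nagios runs the check_nrpe plugin installed on the monitoring server and tells check_nrpe which service to check.
2. check_nrpe connects over SSL to the NRPE daemon on the remote machine.
3. NRPE runs the plugins installed locally on that machine (check_disk, etc.) to check its services and state.
4. NRPE returns the results to check_nrpe on the monitoring server, and check_nrpe passes them into the Nagios status queue.
5. Nagios reads the queue in order and displays the results.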
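Steps 1 to 4 can be exercised by hand from the monitoring server before any Nagios object is defined. The sketch below is an assumption-laden illustration: the wrapper nrpe_probe is hypothetical, and the check_nrpe path assumes the --prefix=/usr/local/nagios layout used in this article. Only the -H (host) and -c (command name) options of check_nrpe are relied on.

```shell
#!/bin/bash
# Assumed location of check_nrpe; override CHECK_NRPE if yours differs.
CHECK_NRPE=${CHECK_NRPE:-/usr/local/nagios/libexec/check_nrpe}

# Hypothetical wrapper: run one NRPE command on a remote host and print
# which Nagios state the exit code corresponds to.
nrpe_probe() {
    local host=$1 command=$2 rc=0
    if [ ! -x "$CHECK_NRPE" ]; then
        echo "UNKNOWN - $CHECK_NRPE not found or not executable"
        return 3
    fi
    # -H selects the remote host, -c the command name defined in its nrpe.cfg.
    "$CHECK_NRPE" -H "$host" -c "$command" || rc=$?
    case $rc in
        0) echo "state: OK" ;;
        1) echo "state: WARNING" ;;
        2) echo "state: CRITICAL" ;;
        *) echo "state: UNKNOWN" ;;
    esac
    return "$rc"
}
```

For example, `nrpe_probe 192.168.1.2 check_load` would query the monitored host from the lab environment, assuming a command named check_load is defined in that host's nrpe.cfg.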
Lab environment
CentOS 6.5: 192.168.1.1/24 [LAMP, Nagios, nagios-plugins, nrpe]
CentOS 6.5: 192.168.1.2/24 [nagios-plugins, nrpe]
Windows 10 client
II. Nagios Server Configuration
1. Install Nagios
[root@bogon ~]# rpm -q openssl-devel
openssl-devel-1.0.1e-15.el6.x86_64
[root@bogon ~]# useradd -s /sbin/nologin nagios
[root@bogon ~]# mkdir /usr/local/nagios
[root@bogon ~]# chown -R nagios:nagios /usr/local/nagios/
[root@bogon ~]# tar -zxvf nagios-4.0.1.tar.gz -C /usr/src/
[root@bogon ~]# cd /usr/src/nagios-4.0.1/
[root@bogon nagios-4.0.1]# ./configure --prefix=/usr/local/nagios/
[root@bogon nagios-4.0.1]# make all && make install && make install-init && make install-commandmode && make install-config
[root@bogon nagios-4.0.1]# chkconfig --add nagios
[root@bogon nagios-4.0.1]# chkconfig nagios on
2. Install nagios-plugins
[root@bogon ~]# tar -zxvf nagios-plugins-1.5.tar.gz -C /usr/src/
[root@bogon ~]# cd /usr/src/nagios-plugins-1.5/
[root@bogon nagios-plugins-1.5]# ./configure --prefix=/usr/local/nagios/ && make && make install
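Note:
NRPE lets the Nagios monitoring server obtain information from monitored hosts remotely. Data such as the number of processes, disk usage, and running services on a remote system can normally be learned only by logging in to that system, so collecting it depends on NRPE, a core add-on. NRPE acts as an intermediary agent: on one side it receives requests from the Nagios monitoring server, and on the other it gathers the requested information on the remote host.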
Configure the httpd service
[root@bogon ~]# /usr/bin/htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin   //htpasswd location when httpd is installed by yum; compiling httpd from source or installing it with a shell script is preferable
New password: 123
Re-type new password: 123   //the password is not echoed
Adding password for user nagiosadmin
[root@bogon ~]# vim /etc/httpd/conf/httpd.conf   //the path depends on your own installation
Append at the end:
ScriptAlias /nagios/cgi-bin/ "/usr/local/nagios/sbin/"
<Directory "/usr/local/nagios/sbin">
    Options Indexes FollowSymLinks
    AllowOverride None
    Order allow,deny
    Allow from all
    AuthName "nagios"
    AuthType Basic
    AuthUserFile /usr/local/nagios/etc/htpasswd.users
    require valid-user
</Directory>
Alias /nagios "/usr/local/nagios/share"
<Directory "/usr/local/nagios/share">
    Options Indexes FollowSymLinks
    AllowOverride None
    Order allow,deny
    Allow from all
    AuthName "nagios access"
    AuthType Basic
    AuthUserFile /usr/local/nagios/etc/htpasswd.users
    require valid-user
</Directory>
[root@bogon ~]# service httpd stop
[root@bogon ~]# service httpd start   //restart the service
[root@bogon ~]# vim /usr/local/nagios/etc/cgi.cfg
Change:
use_authentication=0   (0: CGI authentication is off, so every page is accessible; 1: authentication is enforced)
[root@bogon ~]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg   //syntax check
Nagios Core 4.0.1
Copyright (c) 2009-present Nagios Core Development Team and Community Contributors
Copyright (c) 1999-2009 Ethan Galstad
Last Modified: 10-15-2013
License: GPL
Website: http://www.nagios.org
Reading configuration data...
   Read main config file okay...
   Read object config files okay...
Running pre-flight check on configuration data...
Checking objects...
   Checked 8 services.
   Checked 1 hosts.
   Checked 1 host groups.
   Checked 0 service groups.
   Checked 1 contacts.
   Checked 1 contact groups.
   Checked 24 commands.
   Checked 5 time periods.
   Checked 0 host escalations.
   Checked 0 service escalations.
Checking for circular paths...
   Checked 1 hosts
   Checked 0 service dependencies
   Checked 0 host dependencies
   Checked 5 timeperiods
Checking global event handlers...
Checking obsessive compulsive processor commands...
Checking misc settings...
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
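The two summary lines near the end of the pre-flight output are the part worth acting on. As a hedged sketch (the helper name preflight_ok is made up for illustration), a script could read that output and allow a restart only when the error count is zero:

```shell
#!/bin/bash
# Hypothetical helper: read `nagios -v` output on stdin and succeed only
# when the summary reports "Total Errors: 0".
preflight_ok() {
    local errors
    # Extract the number that follows "Total Errors:" in the summary.
    errors=$(sed -n 's/.*Total Errors:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1)
    # Fail if the summary line is missing or the count is not zero.
    [ -n "$errors" ] && [ "$errors" -eq 0 ]
}
```

Typical use would be `/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg | preflight_ok && service nagios restart`, which leaves the running service untouched when the check reports errors.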
[root@bogon ~]# service nagios start   //start the service
When the web interface asks for credentials, use the account and password set earlier: nagiosadmin and 123.
What each configuration file does
/usr/local/nagios/etc/nagios.cfg   //main Nagios configuration file
/usr/local/nagios/etc/cgi.cfg   //controls access to the CGIs
/usr/local/nagios/etc/objects/commands.cfg   //command definitions, which other configuration files can reference
/usr/local/nagios/etc/objects/contacts.cfg   //contact and contact group definitions
/usr/local/nagios/etc/objects/localhost.cfg   //monitoring of the local host
/usr/local/nagios/etc/objects/printer.cfg   //monitoring of printers (template)
/usr/local/nagios/etc/objects/switch.cfg   //monitoring of routers and switches (template)
/usr/local/nagios/etc/objects/templates.cfg   //host and service templates
/usr/local/nagios/etc/objects/timeperiods.cfg   //time periods during which Nagios monitors
/usr/local/nagios/etc/objects/windows.cfg   //monitoring of Windows hosts (template)
III. Monitoring the Web Server
[root@bogon ~]# cp /usr/local/nagios/etc/objects/localhost.cfg /usr/local/nagios/etc/objects/webserver.cfg   //copy the local-host configuration as a starting point
[root@bogon ~]# vim /usr/local/nagios/etc/objects/webserver.cfg
Change it to:
define host{
    use                     linux-server    ; Name of host template to use
    host_name               webserver
    alias                   ftp-server
    address                 192.168.1.2
}
define service{
    use                     local-service   ; Name of service template to use
    host_name               webserver
    service_description     PING
    check_command           check_ping!100.0,20%!500.0,60%
}
[root@bogon ~]# vim /usr/local/nagios/etc/nagios.cfg
# You can specify individual object config files as shown below:
cfg_file=/usr/local/nagios/etc/objects/commands.cfg
cfg_file=/usr/local/nagios/etc/objects/contacts.cfg
cfg_file=/usr/local/nagios/etc/objects/timeperiods.cfg
cfg_file=/usr/local/nagios/etc/objects/templates.cfg
# The next line is the one added for the new host:
cfg_file=/usr/local/nagios/etc/objects/webserver.cfg
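The object file above defines only a PING service, and further services follow the same four-directive pattern. The generator below is a sketch for producing such definitions without retyping them. The function name make_service is hypothetical, and the usage example assumes that a command named check_http is defined in commands.cfg.

```shell
#!/bin/bash
# Hypothetical generator: print a Nagios service definition for one host.
# Arguments: host_name, service_description, check_command.
make_service() {
    local host=$1 desc=$2 command=$3
    cat <<EOF
define service{
    use                     local-service
    host_name               $host
    service_description     $desc
    check_command           $command
}
EOF
}
```

For instance, `make_service webserver HTTP check_http >> /usr/local/nagios/etc/objects/webserver.cfg` would append an HTTP check for the host defined above. Run the pre-flight check again before restarting Nagios.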
IV. Configuring the Monitored Host
1. Install nagios-plugins
[root@bogon ~]# rpm -q openssl openssl-devel   //dependencies; install them with yum if they are missing
openssl-1.0.1e-15.el6.x86_64
openssl-devel-1.0.1e-15.el6.x86_64
[root@bogon ~]# useradd -s /sbin/nologin nagios
[root@bogon ~]# tar -zxvf nagios-plugins-1.5.tar.gz -C /usr/src/
[root@bogon ~]# cd /usr/src/nagios-plugins-1.5/
[root@bogon nagios-plugins-1.5]# ./configure --prefix=/usr/local/nagios
[root@bogon nagios-plugins-1.5]# make && make install
[root@bogon nagios-plugins-1.5]# chown -R nagios:nagios /usr/local/nagios/
2. Install NRPE
[root@bogon ~]# tar -zxvf nrpe-2.15.tar.gz -C /usr/src/
[root@bogon ~]# cd /usr/src/nrpe-2.15/
[root@bogon nrpe-2.15]# ./configure --prefix=/usr/local/nagios/
[root@bogon nrpe-2.15]# make all
[root@bogon nrpe-2.15]# make install-plugin
[root@bogon nrpe-2.15]# make install-daemon
[root@bogon nrpe-2.15]# make install-daemon-config
[root@bogon nrpe-2.15]# ps -elf | wc -l   //count the running processes
[root@bogon nrpe-2.15]# cat /proc/cpuinfo   //check the number of CPUs
[root@bogon nrpe-2.15]# vim /usr/local/nagios/etc/nrpe.cfg
Change:
allowed_hosts=127.0.0.1,192.168.1.1   //add the monitoring server's IP address
[root@bogon nrpe-2.15]# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d   //start NRPE
[root@bogon nrpe-2.15]# netstat -anpt | grep nrpe   //NRPE listens on TCP port 5666
tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN 112128/nrpe
tcp 0 0 :::5666 :::* LISTEN 112128/nrpe
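The netstat check above can be turned into a yes/no test for use in scripts. This is only a sketch: port_listening is a hypothetical helper, and it assumes the `netstat -anpt` column layout shown above, with the local address in column 4 and the state in column 6.

```shell
#!/bin/bash
# Hypothetical helper: read `netstat -anpt` style lines on stdin and succeed
# if any socket is in LISTEN state on the given TCP port (default 5666).
port_listening() {
    local port=${1:-5666}
    # Column 4 is the local address, column 6 is the socket state.
    awk -v port="$port" '
        $6 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}
```

On the monitored host, `netstat -anpt | port_listening 5666 && echo "NRPE is listening"` would confirm that the daemon started. The pattern requires the port to follow a colon at the end of the address, so 15666 does not match 5666.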
V. Installing the Fetion Notification Plugin
Install library_linux
[root@bogon ~]# tar -zxvf library_linux.tar.gz -C /usr/src/
libACE-5.6.8.so
libACE_SSL-5.6.8.so
libcrypto.so.0.9.8
libssl.so.0.9.8
[root@bogon ~]# cd /usr/src/
[root@bogon src]# mv libACE* libcrypto.so.0.9.8 libssl.so.0.9.8 /usr/lib
Install fetion
[root@bogon ~]# tar -zxvf fetion20091117-linux.tar.gz -C /usr/src/
fx/
fx/plugins/
fx/logs/
fx/libcrypto.so.4
fx/fetion
fx/libssl.so.4
fx/commands/
fx/libeay32.dll
fx/cache/
fx/done/
fx/libACE_SSL-5.7.2.so
fx/libACE-5.7.2.so
[root@bogon ~]# mkdir /usr/local/fetion
[root@bogon ~]# cp -r /usr/src/fx/* /usr/local/fetion/
Configuration
[root@bogon ~]# vim /etc/ld.so.conf
[root@bogon ~]# ldconfig
[root@bogon ~]# cp /usr/src/fx/fetion /usr/local/fetion/
cp: overwrite '/usr/local/fetion/fetion'? y
[root@bogon ~]# cd /usr/local/fetion/
[root@bogon fetion]# /usr/local/fetion/fetion --mobile=123456789 --pwd=123.com --to=123456789 --msg-utf8="test." --debug
/usr/local/fetion/fetion: error while loading shared libraries: libACE-5.7.2.so: cannot open shared object file: No such file or directory
The first run fails because the shared libraries bundled with fetion are not on the library search path. Copy them to /usr/lib and install the remaining dependencies:
[root@bogon fetion]# cp libACE* /usr/lib
[root@bogon fetion]# cp libssl.so.4 /usr/lib
[root@bogon fetion]# cp libcrypto.so.4 /usr/lib
[root@bogon fetion]# yum -y install libstdc++
Loaded plugins: fastestmirror, refresh-packagekit, security
Loading mirror speeds from cached hostfile
Setting up Install Process
Package libstdc++-4.4.7-4.el6.x86_64 already installed and latest version
Nothing to do
[root@bogon fetion]# yum -y install libstdc++.so.6
[root@bogon fetion]# yum -y install libgssapi_krb5.so.2
[root@bogon fetion]# yum -y install libz.so.1
[root@bogon fetion]# /usr/local/fetion/fetion --mobile=13161595288 --pwd=123.com --to=13161595288 --msg-utf8="test." --debug
Sending Fetion messages automatically
[root@bogon ~]# vim /usr/local/nagios/etc/objects/templates.cfg
Add:
service_notification_commands   notify-service-by-email,notify-service-by-fetion
host_notification_commands      notify-host-by-email,notify-host-by-fetion
[root@bogon ~]# vim /usr/local/nagios/etc/objects/commands.cfg
Add:
define command{
    command_name    notify-host-by-fetion
    command_line    /usr/local/fetion/fetion --mobile=13161595288 --pwd=123.com --to=$CONTACTPAGER$ --msg-gb="Host $HOSTSTATE$ Alert for $HOSTNAME$! on '$LONGDATETIME$'" $CONTACTPAGER$
}
define command{
    command_name    notify-service-by-fetion
    command_line    /usr/local/fetion/fetion --mobile=13161595288 --pwd=123.com --to=$CONTACTPAGER$ --msg-gb="$HOSTADDRESS$ $HOSTALIAS$ $SERVICEDESC$ is $SERVICESTATE$! on $LONGDATETIME$" $CONTACTPAGER$
}
[root@bogon ~]# vim /usr/local/nagios/etc/objects/contacts.cfg
Add:
define contact{
    contact_name    nagiosadmin
    use             generic-contact
    alias           Nagios Admin
    email           hehe@benet.com    ; the user's e-mail address
    pager           13161595288       ; mobile number that receives the Fetion messages
}
[root@bogon ~]# /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
[root@bogon ~]# service nagios restart
[root@bogon ~]# firefox http://192.168.1.1/nagios &