An Overview of Nagios Components and a Detailed Look at NRPE

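I. Overview of the Nagios components

Nagios is tedious to configure and is made up of many components, so here I list the key ones, with a short description of each and of how it relates to the others. This is my own understanding; corrections are welcome.

In my deployment I mainly used the following packages: nagios-3.2.3.tar.gz, nagios-plugins-1.4.15.tar.gz, ndoutils-1.4b7.tar.gz, and nrpe-2.12.tar.gz.

What does each of them do?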

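1. nagios-3.2.3.tar.gz is the Nagios core; it includes the various configuration files.

2. nagios-plugins-1.4.15.tar.gz is the Nagios plugin package. It provides the check commands, such as check_tcp, that cover many commonly monitored objects. You can also write your own scripts, and Nagios is very flexible in this respect.

3. ndoutils-1.4b7.tar.gz stores the Nagios monitoring data in a MySQL database.

4. nrpe-2.12.tar.gz is the agent used to monitor host resources on remote machines. Without it (or a substitute such as check_by_ssh), Nagios cannot check resources that are local to a monitored server, such as load or disk usage.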
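Writing your own check is easy because a Nagios plugin is just a program that prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). A minimal sketch of that convention follows; status_name is a made-up helper for illustration, not part of Nagios, and it treats any other exit code as UNKNOWN for simplicity.

```shell
#!/bin/sh
# status_name CODE
# Map a plugin exit code to the state name used by the plugin convention.
# Hypothetical helper for illustration only.
status_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;   # 3 is UNKNOWN; other codes are lumped in here
    esac
}
```

For example, `status_name 2` prints CRITICAL.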

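Those are the main components. Other useful ones include NSClient-0.3.8-Win32.zip (installed on monitored hosts that run Windows) and npc (used when integrating Cacti with Nagios; it can feed Nagios monitoring data to Cacti).

With the relationships roughly sorted out, and Nagios deployment itself covered in an earlier post, I won't repeat it here. Below I go through the NRPE deployment in detail.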

II. NRPE in detail

1. First, a table of what I monitor and the thresholds:

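Category	Monitored item: command	Options and meaning
Host resources	Host alive: check_ping	-w 3000.0,80% -c 5000.0,100% -p 5 (warning when the round-trip time exceeds 3000 ms or packet loss reaches 80%; critical when it exceeds 5000 ms or loss reaches 100%; 5 packets are sent)
	Logged-in users: check_users	-w 5 -c 10 (-w is the warning threshold, -c the critical threshold)
	System load: check_load	-w 15,10,5 -c 30,25,20 (warning or critical when the 1-, 5-, or 15-minute load average exceeds the corresponding value)
	Disk usage: check_disk	-w 20% -c 10% -p / (warning when free space on the root partition falls below 20% of its size, critical below 10%; -p names the partition)
	Disk I/O, via script: check_iostat	-w 5 -c 10 (warning when iowait exceeds 5%, critical above 10%)
	Zombie processes: check_zombie_procs	-w 5 -c 10 -s Z (warning at 5 zombie processes, critical at 10)
	Total processes: check_total_procs	-w 150 -c 200 (warning at 150 processes, critical at 200)
	Memory, via script: check_mem	-w 90% -c 95% (warning at 90%, critical at 95%; presumably of memory used, depending on the script)
	Swap usage: check_swap	-w 20% -c 10% (warning when free swap falls below 20% of its size, critical below 10%)
Application services	Service port: check_tcp	-H localhost2 -p 80 (host and port number)
	Page response time: check_http	-H localhost2 -u http://localhost2/test.jsp -w 5 -c 10 (checks the page; warning above 5 s, critical above 10 s)
	IP connection count, via script: check_ips	-w 200 -c 250 (warning above 200 connections, critical above 250)
Traffic	Server traffic: check_traffic	-V 2c -C public -H localhost2 -I 2 -w 12,30 -c 15,35 -M -b (SNMP version, community string, host, interface index, warning thresholds, critical thresholds)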
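The comma-separated thresholds for check_load are compared position by position against the 1-, 5-, and 15-minute load averages. The sketch below illustrates that comparison only; load_state is a hypothetical function, not the plugin's code, and it assumes a threshold is breached when the load is strictly greater than it.

```shell
#!/bin/sh
# load_state L1 L5 L15 WARN CRIT
# L1, L5, L15 are the load averages; WARN and CRIT are triples such as 15,10,5.
# Prints OK, WARNING, or CRITICAL. Hypothetical helper for illustration.
load_state() {
    echo "$1 $2 $3 $4 $5" | awk '{
        split($4, w, ",")   # warning triple
        split($5, c, ",")   # critical triple
        state = "OK"
        # Any one average above its warning value is enough for WARNING.
        for (i = 1; i <= 3; i++) if ($i + 0 > w[i] + 0) state = "WARNING"
        # Any one average above its critical value overrides that.
        for (i = 1; i <= 3; i++) if ($i + 0 > c[i] + 0) state = "CRITICAL"
        print state
    }'
}
```

With the thresholds from the table, `load_state 16 1 1 15,10,5 30,25,20` prints WARNING because only the 1-minute value is over its warning limit.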

Database monitoring will be added later.

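2. Installation

1) Monitoring server

NRPE must also be installed on the monitoring server, because its check_nrpe plugin is needed to query the remote hosts: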

tar zxf nrpe-2.12.tar.gz

cd nrpe-2.12

./configure

make all

make install-plugin

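make install-plugin is the only install step needed here, because only the check_nrpe plugin is required.

On the monitoring server, add the following to /usr/local/nagios/etc/objects/commands.cfg: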

#################################################################

# 'check_nrpe ' command definition

define command{

command_name check_nrpe

command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$

}

##################################################################
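To see how this definition is used, consider a service whose check_command is check_nrpe!check_ips: the text after the "!" becomes $ARG1$, $HOSTADDRESS$ comes from the host definition, and $USER1$ is the plugin directory. The sketch below mimics that substitution. expand_check is a made-up helper, and the address 192.168.175.201 in the usage note is only an example value.

```shell
#!/bin/sh
# expand_check USER1 HOSTADDRESS CHECK_COMMAND
# Print the command line that the check_nrpe definition would produce.
# Hypothetical helper; Nagios performs this macro substitution itself.
expand_check() {
    user1=$1
    hostaddress=$2
    arg1=${3#*!}    # text after the first "!" becomes $ARG1$
    echo "$user1/check_nrpe -H $hostaddress -c $arg1"
}
```

Calling `expand_check /usr/local/nagios/libexec 192.168.175.201 'check_nrpe!check_ips'` prints `/usr/local/nagios/libexec/check_nrpe -H 192.168.175.201 -c check_ips`.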

2) Monitored host

On the monitored host, install nagios-plugins-1.4.15.tar.gz first, then nrpe-2.12.tar.gz.

Add the user:

useradd nagios

Install the Nagios plugins:

tar fvxz nagios-plugins-1.4.15.tar.gz

cd nagios-plugins-1.4.15

./configure --with-nagios-user=nagios --with-nagios-group=nagios --enable-redhat-pthread-workaround --prefix=/usr/local/nagios

make

make install

chown -R nagios:nagios /usr/local/nagios

Install NRPE:

tar fvxz nrpe-2.12.tar.gz

cd nrpe-2.12

./configure

make all

make install-plugin

make install-daemon

make install-daemon-config

Open the configuration file: vim /usr/local/nagios/etc/nrpe.cfg

It already contains some default examples:

# The following examples use hardcoded command arguments...

command[check_users]=/opt/nagios/libexec/check_users -w 5 -c 10

command[check_load]=/opt/nagios/libexec/check_load -w 15,10,5 -c 30,25,20

command[check_hda1]=/opt/nagios/libexec/check_disk -w 20 -c 10 -p /dev/hda1

command[check_zombie_procs]=/opt/nagios/libexec/check_procs -w 5 -c 10 -s Z

command[check_total_procs]=/opt/nagios/libexec/check_procs -w 150 -c 200

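These commands are invoked by check_nrpe on the monitoring server to check the host's resources remotely. You can change their options and add checks of your own, such as scripts you wrote yourself.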
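Each command[name]=... line maps a name that check_nrpe sends with -c to the command line run on the monitored host. A rough sketch of that lookup is below; lookup_command is a hypothetical helper, and it assumes simple command names without regular-expression characters.

```shell
#!/bin/sh
# lookup_command CONFIG NAME
# Print the command line configured for command[NAME] in an nrpe.cfg-style file.
# Hypothetical helper for illustration; the nrpe daemon does this internally.
lookup_command() {
    # Drop comment lines, then strip the exact "command[NAME]=" prefix.
    grep -v '^[[:space:]]*#' "$1" | sed -n "s/^command\[$2]=//p"
}
```

For instance, `lookup_command /usr/local/nagios/etc/nrpe.cfg check_load` shows what a check_load request would execute.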

Below is my modified configuration, abbreviated, for reference:

# The following examples use hardcoded command arguments...

command[check_users]=/usr/local/nagios/libexec/check_users -w 5 -c 10

command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20

command[check_hda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/hda1

command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z

command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 150 -c 200

command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%

command[check_disk]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /

command[check_ips]=/usr/local/nagios/libexec/ip_conn.sh 200 250

command[check_mem]=/usr/local/nagios/libexec/check_mem.sh -w 90% -c 95%

command[check_iostat]=/usr/local/nagios/libexec/check_iostat -w 5 -c 10

command[check_traffic]=/usr/local/nagios/libexec/check_traffic.sh -V 2c -C public -H localhost2 -I 2 -w 12,30 -c 15,35 -M -b

Also make sure the monitoring server is allowed to connect, using the allowed_hosts setting earlier in the file:

allowed_hosts=127.0.0.1,192.168.175.200
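Requests from any address not in this list are refused, so a typo here is a common reason for check_nrpe failing. The sketch below checks an address against such a list by exact string match. host_allowed is a made-up helper, and the real daemon may accept forms (such as hostnames) that this simple match does not handle.

```shell
#!/bin/sh
# host_allowed LIST IP
# Succeed if IP appears in the comma-separated LIST (exact match only).
# Hypothetical helper for illustration.
host_allowed() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Usage: `host_allowed "127.0.0.1,192.168.175.200" 192.168.175.200 && echo allowed`.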

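When done, start nrpe: /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d

Finally, add the services to monitor on the monitoring server (the file must be referenced by a cfg_file or cfg_dir line in nagios.cfg), for example:

vim /usr/local/nagios/etc/objects/services.cfg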

define service{

host_name localhost2

service_description check-tcp-8080

check_command check_tcp!8080

max_check_attempts 5

normal_check_interval 3

retry_check_interval 2

check_period 24x7

notification_interval 10

notification_period 24x7

notification_options w,u,c,r

contact_groups sagroup

}

define service{

host_name localhost2

service_description check-http

check_command check_http!http://localhost2/test.jsp!5!10

max_check_attempts 5

normal_check_interval 3

retry_check_interval 2

check_period 24x7

notification_interval 10

notification_period 24x7

notification_options w,u,c,r

contact_groups sagroup

}

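If the command already exists in /usr/local/nagios/libexec, just add the command to nrpe.cfg on the monitored host and the service to services.cfg on the monitoring server.

However, some of the checks above do not come with nagios-plugins; they are scripts I found online. How are those configured? Take the check on a server's IP connection count as an example.

1. Put the script in /usr/local/nagios/libexec;

For example: vim ip_conn.sh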

#!/bin/sh
# Usage: ip_conn.sh WARN CRIT
if [ $# -ne 2 ]
then
echo "Usage: $0 num1 num2"
exit 3
fi
# Count established TCP connections
ip_conns=$(netstat -an | grep tcp | grep -c EST)
if [ "$ip_conns" -lt "$1" ]
then
echo "OK -connect counts is $ip_conns"
exit 0
fi
# At or above the warning threshold but below the critical one
if [ "$ip_conns" -lt "$2" ]
then
echo "Warning -connect counts is $ip_conns"
exit 1
fi
# At or above the critical threshold
echo "Critical -connect counts is $ip_conns"
exit 2
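The threshold logic is easier to verify when it is separated from netstat, so the boundary values can be tried directly. A sketch under one assumption: a count equal to a threshold is reported as the more severe state. classify_conns is a hypothetical name.

```shell
#!/bin/sh
# classify_conns COUNT WARN CRIT
# Print a status line and return the matching plugin exit code.
# Assumption: COUNT equal to a threshold counts as reaching it.
classify_conns() {
    if [ "$1" -lt "$2" ]; then
        echo "OK -connect counts is $1"
        return 0
    elif [ "$1" -lt "$3" ]; then
        echo "Warning -connect counts is $1"
        return 1
    else
        echo "Critical -connect counts is $1"
        return 2
    fi
}
```

Trying 199, 200, and 250 against thresholds 200 and 250 should give return codes 0, 1, and 2 respectively.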

2. Set the owner and permissions;
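One plausible way to carry out this step, assuming the nagios user and group created earlier and a mode of 755, is sketched here. prepare_plugin is a made-up wrapper; the owner argument is optional so the function can also be tried on a scratch file without root.

```shell
#!/bin/sh
# prepare_plugin FILE [OWNER]
# Make FILE executable (mode 755) and, if OWNER is given, chown it.
# Hypothetical helper; the mode and owner are assumptions, not from the post.
prepare_plugin() {
    chmod 755 "$1" || return 1
    if [ -n "$2" ]; then
        chown "$2" "$1" || return 1
    fi
}

# As root on the monitored host one might run:
# prepare_plugin /usr/local/nagios/libexec/ip_conn.sh nagios:nagios
```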

3. Run the script by hand first to check that it works;
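When running a plugin by hand, the exit status matters as much as the printed line, because that status is what Nagios uses to decide the state. A small sketch that shows both; run_check is a hypothetical wrapper.

```shell
#!/bin/sh
# run_check COMMAND [ARGS...]
# Run a plugin, let its output through, then report its exit status.
# Hypothetical helper for illustration.
run_check() {
    status=0
    "$@" || status=$?
    echo "exit status: $status"
}
```

For example, `run_check /usr/local/nagios/libexec/ip_conn.sh 200 250` prints the script's status line followed by its exit status.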

4. Add the command to nrpe.cfg on the monitored host;

For example: command[check_ips]=/usr/local/nagios/libexec/ip_conn.sh 200 250

5. Add the service to services.cfg on the monitoring server;

For example: define service{

host_name localhost2

service_description check-connect-count

check_command check_nrpe!check_ips

max_check_attempts 5

normal_check_interval 3

retry_check_interval 2

check_period 24x7

notification_interval 10

notification_period 24x7

notification_options w,u,c,r

contact_groups sagroup

}
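Since every NRPE-based service differs only in its description and command name, the definitions can be generated rather than typed out. The sketch below prints one block per call, reusing the intervals and contact group from the example above. nrpe_service is a made-up helper.

```shell
#!/bin/sh
# nrpe_service HOST DESCRIPTION NRPE_COMMAND
# Print a service definition that runs NRPE_COMMAND through check_nrpe.
# Hypothetical helper; the fixed values mirror the example in this post.
nrpe_service() {
    cat <<EOF
define service{
host_name $1
service_description $2
check_command check_nrpe!$3
max_check_attempts 5
normal_check_interval 3
retry_check_interval 2
check_period 24x7
notification_interval 10
notification_period 24x7
notification_options w,u,c,r
contact_groups sagroup
}
EOF
}
```

For example, `nrpe_service localhost2 check-swap check_swap >> services.cfg` would append a definition for the check_swap command.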

6. Restart nrpe;

killall nrpe

/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d

7. Restart nagios;

For example: service nagios restart

8. Run the check from the command line on the monitoring server to confirm that it works;

[root@localhost libexec]# /usr/local/nagios/libexec/check_nrpe -H localhost2 -c check_ips

OK -connect counts is 13

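9. Check that the service shows up correctly on the Nagios web page.

The screenshot I took for this step came from a combined Cacti & Nagios interface.

I won't go through the other scripts listed above one by one. That's all for now; next time I'll write a post about Cacti.

Source: http://www.cnblogs.com/JemBai/archive/2012/04/10/2440075.html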

Original URL: https://www.cnblogs.com/taowang2016/p/3265906.html