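Scenario: my manager called to say that our main site was down. Once I got home, I tried to ssh in from another machine, but the connection just hung. At first I suspected the machine was under heavy load,
but when I checked the monitoring host, the NIC, CPU, and nginx connection graphs all had no data at all. So this was clearly not a load problem; the machine had most likely hung. I rebooted it immediately via IPMI.
After the reboot, the machine was back to normal!
This machine had in fact been running fine for over half a year without any trouble!
Checking /var/log/messages, I found a large number of entries like the following: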
- Mar 12 11:15:04 hy1 kernel: php-fpm: page allocation failure. order:1, mode:0x20
- Mar 12 11:15:04 hy1 kernel: php-fpm: page allocation failure. order:1, mode:0x20
- Mar 12 11:15:04 hy1 kernel: mysqld: page allocation failure. order:1, mode:0x20
- Mar 12 11:15:05 hy1 kernel: nginx: page allocation failure. order:1, mode:0x20
- Mar 12 11:15:05 hy1 kernel: nginx: page allocation failure. order:1, mode:0x20
- Mar 12 11:15:05 hy1 kernel: mysqld: page allocation failure. order:1, mode:0x20
- Mar 12 11:15:05 hy1 kernel: mysqld: page allocation failure. order:1, mode:0x20
- Mar 12 11:15:05 hy1 kernel: nginx: page allocation failure. order:1, mode:0x20
- Mar 12 11:15:05 hy1 kernel: nginx: page allocation failure. order:1, mode:0x20
- Mar 12 11:15:06 hy1 kernel: nginx: page allocation failure. order:1, mode:0x20
- Mar 12 11:15:09 hy1 kernel: nginx: page allocation failure. order:1, mode:0x20
- Mar 12 11:15:09 hy1 kernel: nginx: page allocation failure. order:1, mode:0x20
- Mar 12 11:15:09 hy1 kernel: mysqld: page allocation failure. order:1, mode:0x20
- Mar 12 11:15:10 hy1 kernel: mysqld: page allocation failure. order:1, mode:0x20
- Mar 12 11:15:11 hy1 kernel: mysqld: page allocation failure. order:1, mode:0x20
- Mar 12 11:15:11 hy1 kernel: mysqld: page allocation failure. order:1, mode:0x20
- Mar 12 11:15:11 hy1 kernel: mysqld: page allocation failure. order:1, mode:0x20
- Mar 12 11:15:11 hy1 kernel: mysqld: page allocation failure. order:1, mode:0x20
- Mar 12 11:15:11 hy1 kernel: mysqld: page allocation failure. order:1, mode:0x20
- Mar 12 11:17:33 hy1 kernel: swapper: page allocation failure. order:1, mode:0x20
- Mar 12 11:17:53 hy1 kernel: swapper: page allocation failure. order:1, mode:0x20
- Mar 12 11:17:53 hy1 kernel: swapper: page allocation failure. order:1, mode:0x20
- Mar 12 11:17:53 hy1 kernel: swapper: page allocation failure. order:1, mode:0x20
- Mar 12 11:17:53 hy1 kernel: swapper: page allocation failure. order:1, mode:0x20
- Mar 12 11:17:54 hy1 kernel: swapper: page allocation failure. order:1, mode:0x20
- Mar 12 11:17:54 hy1 kernel: swapper: page allocation failure. order:1, mode:0x20
- Mar 12 11:17:54 hy1 kernel: swapper: page allocation failure. order:1, mode:0x20
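To see which processes hit the failure and how often, the log can be summarized with a short shell function. This is only a minimal sketch: the function name count_alloc_failures is hypothetical, and it assumes the syslog line format shown above, where the process name sits between "kernel: " and ": page allocation failure".

```shell
# Count "page allocation failure" lines per process name.
# Assumes the syslog format shown above:
#   "... kernel: <proc>: page allocation failure. ..."
count_alloc_failures() {
    # $1: path to the log file (defaults to /var/log/messages)
    grep 'page allocation failure' "${1:-/var/log/messages}" |
        sed -E 's/.*kernel: ([^:]+): page allocation failure.*/\1/' |
        sort | uniq -c | sort -rn
}
```

Calling count_alloc_failures with no argument reads /var/log/messages; each output line is a count followed by a process name, most frequent first.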
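At first I suspected the system had run out of memory, but the monitoring showed that there was still plenty of memory available when the problem occurred! See the attachment for the memory usage at that time!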
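Free memory and a failed allocation are not necessarily contradictory: order:1 in the log means the kernel asked for 2^1 = 2 physically contiguous pages, and such a request can fail even when many single pages are free. One way to look at this, sketched here as an assumption rather than something done in the original investigation, is to read /proc/buddyinfo, which lists per zone the number of free blocks at each order. The function name free_blocks_at_order is hypothetical.

```shell
# Print, for each zone, the number of free blocks of at least the given order.
# Assumes the /proc/buddyinfo layout "Node N, zone NAME c0 c1 ... c10",
# where cK is the count of free blocks of order K.
free_blocks_at_order() {
    # $1: minimum order (e.g. 1); $2: buddyinfo path (defaults to /proc/buddyinfo)
    awk -v min="$1" '{
        total = 0
        # Field 5 holds the order-0 count, so order K is field 5 + K.
        for (i = 5 + min; i <= NF; i++) total += $i
        print $1, $2, $3, $4, "free_blocks(order>=" min ")=" total
    }' "${2:-/proc/buddyinfo}"
}
```

For example, free_blocks_at_order 1 shows how many blocks of two or more contiguous pages each zone still has; a very low number next to a large order-0 count would point at fragmentation rather than exhaustion.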
Later I found out that it was a kernel bug.
The fix is as follows:
vi /etc/sysctl.conf
Add the following line:
vm.zone_reclaim_mode = 1
Then run sysctl -p to make it take effect immediately
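The manual edit above can also be scripted so that running it twice does not leave duplicate entries. The following is a minimal sketch under stated assumptions: the function name set_zone_reclaim_mode is hypothetical, it assumes bash and GNU sed (for sed -i), and it must run as root when pointed at /etc/sysctl.conf.

```shell
# Idempotently set vm.zone_reclaim_mode = 1 in a sysctl config file.
set_zone_reclaim_mode() {
    # $1: config path (defaults to /etc/sysctl.conf)
    local conf="${1:-/etc/sysctl.conf}"
    if grep -Eq '^[[:space:]]*vm\.zone_reclaim_mode[[:space:]]*=' "$conf"; then
        # An entry already exists: rewrite it in place.
        sed -i -E 's/^[[:space:]]*vm\.zone_reclaim_mode[[:space:]]*=.*/vm.zone_reclaim_mode = 1/' "$conf"
    else
        # No entry yet: append one.
        echo 'vm.zone_reclaim_mode = 1' >> "$conf"
    fi
}

# Usage (as root), then reload and read the value back to verify:
#   set_zone_reclaim_mode && sysctl -p && sysctl vm.zone_reclaim_mode
```

Reading the value back with sysctl vm.zone_reclaim_mode after sysctl -p confirms that the running kernel picked up the setting.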
Original: http://blog.chinaunix.net/uid-20776139-id-4155388.html