This morning a 502 alert fired. The alert rule triggers when more than 500 502 errors occur within two minutes.
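As a rough illustration of the alert rule (more than 500 errors in two minutes), here is a minimal sketch. It assumes a sliding window; the real alerting system may well use fixed buckets instead. The class name `SlidingWindowAlert` and its parameters are hypothetical and not taken from any monitoring product.

```python
from collections import deque


class SlidingWindowAlert:
    """Fire when more than `threshold` events fall inside the last `window_seconds`."""

    def __init__(self, threshold=500, window_seconds=120):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self.events = deque()  # timestamps (in seconds) of recent 502 responses

    def record(self, timestamp):
        """Record one 502 at `timestamp`; return True if the alert should fire."""
        self.events.append(timestamp)
        # Drop events that have fallen out of the window.
        while self.events and self.events[0] <= timestamp - self.window_seconds:
            self.events.popleft()
        return len(self.events) > self.threshold
```

Feeding it the timestamp of each 502 response returns True as soon as the count inside the window exceeds the threshold.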
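Troubleshooting flow:
Alert from the log analysis system --> check the logs in the log system --> nginx error log --> PHP error log --> php-fpm.log
In the log analysis system, only one machine, xxx.xxx.xxx.170, was producing 502s, and they all came from a single client IP, so this was not a large-scale failure.
I logged in to the 170 server and checked the nginx error log, which showed "Connection reset by peer": the connection was reset by the other end, meaning PHP (the FastCGI upstream) had closed it.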
2016/12/29 08:55:04 [error] 1328#0: *766316197 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 123.206.95.156, server: jifen.51.com, request: "GET /center/index HTTP/1.0", upstream: "fastcgi://127.0.0.1:9000", host: "jifen.51.com", referrer: "http://jifen.51.com/"
2016/12/29 08:55:04 [error] 1328#0: *766316178 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 123.206.95.156, server: jifen.51.com, request: "GET /center/index HTTP/1.0", upstream: "fastcgi://127.0.0.1:9000", host: "jifen.51.com", referrer: "http://jifen.51.com/"
2016/12/29 08:55:04 [error] 1328#0: *766316153 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 123.206.95.156, server: jifen.51.com, request: "GET /center/index HTTP/1.0", upstream: "fastcgi://127.0.0.1:9000", host: "jifen.51.com", referrer: "http://jifen.51.com/"
2016/12/29 08:55:04 [error] 1328#0: *766316055 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 123.206.95.156, server: jifen.51.com, request: "GET /center/index HTTP/1.0", upstream: "fastcgi://127.0.0.1:9000", host: "jifen.51.com", referrer: "http://jifen.51.com/"
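To confirm that the resets really come from one client hitting one upstream, the error-log lines can be tallied. The following is only a sketch: the function name `count_resets` is hypothetical, and the regular expressions assume the field layout seen in the excerpt above.

```python
import re
from collections import Counter

# Field patterns based on the nginx error-log lines shown above.
CLIENT_RE = re.compile(r'client: ([^,]+),')
UPSTREAM_RE = re.compile(r'upstream: "([^"]+)"')


def count_resets(lines):
    """Count 'Connection reset by peer' errors per (client, upstream) pair."""
    counts = Counter()
    for line in lines:
        if "(104: Connection reset by peer)" not in line:
            continue
        client = CLIENT_RE.search(line)
        upstream = UPSTREAM_RE.search(line)
        if client and upstream:
            counts[(client.group(1), upstream.group(1))] += 1
    return counts
```

Passing an open log file (for example `count_resets(open(path))`, with `path` pointing at the nginx error log) yields a counter keyed by client IP and upstream address.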
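Next I checked the PHP error log; it contained nothing.
Then I checked php-fpm.log and saw that the index.php script was timing out, each time after a little over 30 seconds.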
[29-Dec-2016 08:55:09] WARNING: [pool www] child 22751 exited on signal 15 (SIGTERM) after 10765.182746 seconds from start
[29-Dec-2016 08:55:09] NOTICE: [pool www] child 10703 started
[29-Dec-2016 08:55:09] WARNING: [pool www] child 10257 exited on signal 15 (SIGTERM) after 10287.255584 seconds from start
[29-Dec-2016 08:55:09] NOTICE: [pool www] child 10705 started
[29-Dec-2016 08:55:11] WARNING: [pool www] child 11311, script '/opt/wwwroot/jifen.51.com/www/index.php' (request: "GET /center/index") execution timed out (31.152266 sec), terminating
[29-Dec-2016 08:55:11] WARNING: [pool www] child 1432, script '/opt/wwwroot/jifen.51.com/www/index.php' (request: "GET /center/index") execution timed out (31.370696 sec), terminating
[29-Dec-2016 08:55:11] WARNING: [pool www] child 15998, script '/opt/wwwroot/jifen.51.com/www/index.php' (request: "GET /center/index") execution timed out (30.236601 sec), terminating
[29-Dec-2016 08:55:11] WARNING: [pool www] child 17886, script '/opt/wwwroot/jifen.51.com/www/index.php' (request: "GET /center/index") execution timed out (31.273963 sec), terminating
[29-Dec-2016 08:55:11] WARNING: [pool www] child 11019, script '/opt/wwwroot/jifen.51.com/www/index.php' (request: "GET /center/index") execution timed out (30.752849 sec), terminating
[29-Dec-2016 08:55:11] WARNING: [pool www] child 19115, script '/opt/wwwroot/jifen.51.com/www/index.php' (request: "GET /center/index") execution timed out (31.055392 sec), terminating
[29-Dec-2016 08:55:11] WARNING: [pool www] child 9978, script '/opt/wwwroot/jifen.51.com/www/index.php' (request: "GET /center/index") execution timed out (30.037908 sec), terminating
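The timeout entries can also be pulled out of php-fpm.log programmatically to see which scripts and requests are affected. This is a sketch under the assumption that the log lines follow the format shown above; `parse_timeouts` is a hypothetical helper name.

```python
import re

# Pattern for the "execution timed out" warnings seen in php-fpm.log above.
TIMEOUT_RE = re.compile(
    r"\[(?P<time>[^\]]+)\] WARNING: \[pool (?P<pool>[^\]]+)\] child (?P<pid>\d+), "
    r"script '(?P<script>[^']+)' \(request: \"(?P<request>[^\"]+)\"\) "
    r"execution timed out \((?P<seconds>[\d.]+) sec\), terminating"
)


def parse_timeouts(text):
    """Return one dict per timed-out request found anywhere in `text`."""
    return [
        {
            "time": m.group("time"),
            "pool": m.group("pool"),
            "pid": int(m.group("pid")),
            "script": m.group("script"),
            "request": m.group("request"),
            "seconds": float(m.group("seconds")),
        }
        for m in TIMEOUT_RE.finditer(text)
    ]
```

Because it scans the whole text with `finditer`, it works whether the entries are one per line or run together on a single line.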
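Opening php-fpm.conf showed request_terminate_timeout = 30, which means any single request that runs longer than 30 s is terminated.
In addition, php.ini has max_execution_time = 60, which allows a single script to run for up to 60 s. Since the 30 s limit is reached first, php-fpm terminates the worker before PHP's own limit can apply.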
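The interaction between the two settings can be sketched as "whichever non-zero limit is smaller wins". This is a simplification: it assumes both clocks advance at the same rate, whereas on Linux max_execution_time does not count time spent in system calls or waiting on I/O, so the php-fpm limit can fire even earlier in relative terms. The function name `effective_limit` is hypothetical; a value of 0 is treated as "disabled", which is how both settings behave.

```python
def effective_limit(request_terminate_timeout, max_execution_time):
    """Return (setting_name, seconds) for the limit reached first, or None if both are off."""
    limits = {
        "request_terminate_timeout": request_terminate_timeout,
        "max_execution_time": max_execution_time,
    }
    # A value of 0 disables the corresponding limit.
    active = {name: secs for name, secs in limits.items() if secs > 0}
    if not active:
        return None
    name = min(active, key=active.get)
    return name, active[name]
```

With the values from this incident, `effective_limit(30, 60)` picks request_terminate_timeout at 30 s, which matches the roughly 30-second timeouts in php-fpm.log.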
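The cause was found: the 502 Bad Gateway errors were produced by this script timing out. Because this was an isolated case (only one user was getting 502s, not every connection), and for the sake of security and keeping server load down, the request_terminate_timeout value was not increased.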