• [原]工欲善其事,必先利其器,记一次处理Oracle Listener挂掉的处理过程


          国庆节凌晨0:36收到邮件报网通Tomcat服务器挂掉,Tomcat总是那么脆弱,重启是家常便饭了,但这次和往常不一样,WEB服务始终没有起来,马上重启整台服务器再启动Tomcat,故障依旧。

          马上查看Tomcat日志:

    2010-10-1 1:10:42 com.fsm.util.SearchDispatchFilter init
    信息: SearchDispatchFilter.init() done
    ***********************创建数据源********************
    createDataSource:DriverClassName=oracle.jdbc.driver.OracleDriver
    createDataSource:Url=jdbc:oracle:thin:@db_server:1521/ora8i
    ***************************************************

          过了3分钟,才出现久违“Server startup”:

    [FSM-ERROR]:2010-10-01 01:13:55,394-org.hibernate.util.JDBCExceptionReporter.logExceptions(LINE:78)
    Cannot create PoolableConnectionFactory (Io 异常: The Network Adapter could not establish the connection)
    org.tuckey.web.filters.urlrewrite.UrlRewriteFilter INFO: loaded (conf ok)
    2010-10-1 1:13:58 org.apache.coyote.http11.Http11BaseProtocol start
    信息: Starting Coyote HTTP/1.1 on http-80
    2010-10-1 1:13:58 org.apache.jk.common.ChannelSocket init
    信息: JK: ajp13 listening on /0.0.0.0:8009
    2010-10-1 1:13:58 org.apache.jk.server.JkMain start
    信息: Jk running ID=0 time=0/84  config=null
    2010-10-1 1:13:58 org.apache.catalina.storeconfig.StoreLoader load
    信息: Find registry server-registry.xml at classpath resource
    2010-10-1 1:13:58 org.apache.catalina.startup.Catalina start
    信息: Server startup in 205603 ms

          通常Tomcat也就是启动20来秒,这次却启动了200多秒,不太正常,再往上看出现了“JDBC”和“The Network Adapter could not establish the connection”,99%就是与数据库建立连接时候出现了问题,马上telnet数据库的1521端口,可连接是通的:

    [root@primary_node bin]# telnet db_server 1521
    Trying 192.168.4.20...
    Connected to db_server (192.168.4.20).
    Escape character is '^]'.
    aaaaa
    aaa
    aaaaa
    aaaaaa
    aaaaa

          上面那堆'”a”是我输入的,检查连接是否真的通了。

          于是将矛头指向配置文件和运行所需的jar包上,也没有发现最近被修改过的痕迹,将另一台服务器的配置文件和jar包copy过来,重启Tomcat,故障依旧。此时已是2:11分,此时该台网通Tomcat服务器的VIP早就被电信服务器抢过去了,服务没有受到影响,此时精神上有点吃不消了,赶紧睡觉去,获取周公能给我灵感。

          早上7:25分醒来,第一件事睁开眼睛,第二件事登录到服务器上继续调试问题,首先将数据库IP指向测试数据库,启动Tomcat,WEB服务跑起来了,这个现象再次让我把矛头指向Oracle服务器,由于服务器上没有安装Oracle的客户端只能通过telnet这种原始的方式进行测试,telnet测试的Oracle服务器:

    [root@primary_node bin]# telnet test_db_server 1521
    Trying 192.168.4.74...
    Connected to db_server (192.168.4.74).
    Escape character is '^]'.
    aaaaa
    aaa
    Connection closed by foreign host.

          奇怪了,测试用的Oracle服务器会主动close连接,而生产用的Oracle服务器不会主动close连接,终于找到突破口了。 登录到Oracle服务器,查看以下listener的状态:

    [oracle@wz_oracle1 adump]$ lsnrctl status 
    
    LSNRCTL for Linux: Version 10.2.0.2.0 - Production on 01-OCT-2010 09:08:09
    
    Copyright (c) 1991, 2005, Oracle.  All rights reserved.
    
    Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.4.20)(PORT=1521)))

           Hang住,取消,直接  lsnrctl stop ,还是Hang住,出必杀 killall –9 tnslsnr,再启动:

    [oracle@wz_oracle1 adump]$ lsnrctl start 
    
    LSNRCTL for Linux: Version 10.2.0.2.0 - Production on 01-OCT-2010 09:10:17
    
    Copyright (c) 1991, 2005, Oracle.  All rights reserved.
    
    Starting /u01/app/oracle/bin/tnslsnr: please wait...
    
    TNSLSNR for Linux: Version 10.2.0.2.0 - Production
    System parameter file is /u01/app/oracle/network/admin/listener.ora
    Log messages written to /u01/app/oracle/network/log/listener.log
    Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.4.20)(PORT=1521)))
    Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.4.30)(PORT=1521)))
    
    Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=192.168.4.20)(PORT=1521)))
    STATUS of the LISTENER
    ------------------------
    Alias                     LISTENER
    Version                   TNSLSNR for Linux: Version 10.2.0.2.0 - Production
    Start Date                01-OCT-2010 09:10:17
    Uptime                    0 days 0 hr. 0 min. 0 sec
    Trace Level               off
    Security                  ON: Local OS Authentication
    SNMP                      OFF
    Listener Parameter File   /u01/app/oracle/network/admin/listener.ora
    Listener Log File         /u01/app/oracle/network/log/listener.log
    Listening Endpoints Summary...
      (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.4.20)(PORT=1521)))
      (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.4.30)(PORT=1521)))
    Services Summary...
    Service "PLSExtProc" has 1 instance(s).
      Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...
    Service "ora8i" has 1 instance(s).
      Instance "ora8i", status UNKNOWN, has 1 handler(s) for this service...
    Service "shdb" has 1 instance(s).
      Instance "ora8i", status UNKNOWN, has 1 handler(s) for this service...
    The command completed successfully

          搞定,重启Tomcat,一切恢复正常。

          这次故障处理时间长达3小时,最开始以为是一个普通的故障,重启一下Tomcat就能完事,但没那么简单,然后被 telnet 的现象所迷惑,走入了一条长长的弯路,“工欲善其事,必先利其器”如果当时找一台服务器用sqlplus 远程连一下数据库就不用折腾那么久了。

  • 相关阅读:
    ThinkPHP教程_PHP框架之ThinkPHP(三)【系统常量与模板替换】
    ThinkPHP教程_PHP框架之ThinkPHP(二)【URL路径访问与模块控制器、URL四种模式、PATHINFO的两种模式、模板与控制器之间的关系】
    ThinkPHP教程_PHP框架之ThinkPHP(一)【入门和介绍、ThinkPHP版本和文件夹规范、项目入口文件】
    Markdown,技术简历编辑的一缕清风...
    Zend Studio 13.5的安装、破解和汉化
    原生JS配合AJAX完成省市县三级联动(带数据库、PHP+Smarty)
    DAL与数据库类型的对应关系
    mysql 学习
    关于MySQL的分区(partion)
    Response.Write,Page.RegisterClientScriptBlock和Page.RegisterStartupScript的区别
  • 原文地址:https://www.cnblogs.com/killkill/p/1839905.html
Copyright © 2020-2023  润新知