• ORA-609: Error Analysis and Resolution


    During a routine health check of a customer database, we found ORA-609 errors appearing intermittently in the alert log, roughly as follows:

    ***********************************************************************

    Fatal NI connect error 12537, connecting to:

     (LOCAL=NO)

    VERSION INFORMATION:

    TNS for HPUX: Version 11.2.0.3.0 - Production

    Oracle Bequeath NT Protocol Adapter for HPUX: Version 11.2.0.3.0 - Production

    TCP/IP NT Protocol Adapter for HPUX: Version 11.2.0.3.0 - Production

      Time: 19-OCT-2014 20:24:16

      Tracing not turned on.

      Tns error struct:

        ns main err code: 12537

        

    TNS-12537: TNS:connection closed

        ns secondary err code: 12560

        nt main err code: 0

        nt secondary err code: 0

        nt OS err code: 0

    opiodr aborting process unknown ospid (2734) as a result of ORA-609

    Sun Oct 19 21:27:24 2014

    ***********************************************************************

    Because of ORA-609, the process ospid(xxxx) was aborted. The accompanying TNS-12537 error indicates that the connection was closed.
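When ORA-609 shows up only occasionally, it can help to pull the affected process IDs out of the alert log automatically. The following is a minimal sketch that assumes the message has exactly the wording shown in the excerpt above; the function name `find_ora609_ospids` and the sample text are illustrative, not part of any Oracle tool.

```python
import re

# Pattern modeled on the alert log line quoted in this article.
ORA609_LINE = re.compile(
    r"opiodr aborting process (?P<name>.+?) ospid \((?P<ospid>\d+)\) "
    r"as a result of ORA-609"
)


def find_ora609_ospids(alert_text):
    """Return the OS process IDs of processes aborted because of ORA-609."""
    return [int(m.group("ospid")) for m in ORA609_LINE.finditer(alert_text)]


# Sample text built from the excerpt above.
sample = (
    "TNS-12537: TNS:connection closed\n"
    "opiodr aborting process unknown ospid (2734) as a result of ORA-609\n"
)
print(find_ora609_ospids(sample))
```

The extracted ospid is the value that later has to be matched against the sqlnet server trace file name.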

    A search on MOS turned up a document that addresses exactly this error. Its description follows:

    Applies to:

    Oracle Net Services - Version 11.2.0.1 to 11.2.0.3 [Release 11.2]
    Information in this document applies to any platform.

    Symptoms:

    The alert log contains entries similar to the above (omitted here).

    Changes:

    Changes in database server load, client connect descriptor, changes in network infrastructure (firewall configuration).

    Cause:

    First, the message "opiodr aborting process unknown ospid (2734) as a result of ORA-609" only indicates that an Oracle dedicated server process was closed because of ORA-609.

    The document explains:

    ORA-609 means "could not attach to incoming connection" so the database process was 'aborted' (closed) because it couldn't attach to the incoming connection passed to it by the listener.

    The reason for this is found in the sqlnet error stack, in our case is:
       TNS-12537: TNS:connection closed.
    Basically the dedicated process didn't have a client connection anymore to work with.

    A client connection goes through six steps:

    1. Client initiates a connection to the database so it connects to the listener
    2. Listener starts (fork) a dedicated database process that will receive this connection (session)
    3. After this dedicated process is started, the listener passes the connection from the client to this process
    4. The server process takes the connection from the listener to continue the handshake with the client
    5. Server process and client exchange information required for establishing a session (ASO, Two Task Common, User logon)
    6. Session is opened

    In the case of the above error, the connection from the client was closed somewhere between steps 3 and 4, so when the dedicated process tries to communicate with the client it finds that connection closed.

    Note that the dedicated process the listener starts in step 2 is the server process, that is, what we usually see as a LOCAL=NO process.
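The failure mode described here, a peer that disappears before the handshake continues, can be reproduced with plain TCP sockets. This is only an illustrative sketch of the general mechanism and not Oracle code: a hypothetical "server" accepts a connection that the client has already closed, and its first read returns end-of-file, which is the condition a dedicated process would run into.

```python
import socket


def read_after_client_disconnect():
    """Accept a connection the client already closed and try to read from it."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    # The client connects and then gives up before any data is exchanged.
    client = socket.create_connection(("127.0.0.1", port))
    client.close()

    # The server side picks up the connection only afterwards.
    conn, _ = server.accept()
    conn.settimeout(5)
    data = conn.recv(1024)  # an empty result means the peer closed (EOF)
    conn.close()
    server.close()
    return data


if __name__ == "__main__":
    print(read_after_client_disconnect())
```

An empty read at this point is the plain-socket counterpart of a server process finding no client left to talk to.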

    To determine the client which hit this problem we can try to match the timestamp of the error from alert log with an entry in listener.log, but this might be difficult in case of a loaded listener with many incoming connections per second. Server sqlnet trace will not provide any information about the client.
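The timestamp matching described above can be sketched as a small filter over listener.log lines. This sketch assumes that each listener.log entry starts with a timestamp of the form `19-OCT-2014 20:24:16` followed by fields separated by ` * `, and that month names parse under an English locale. The function name `entries_near`, the window size, and all hosts and service names in the sample are made up for illustration.

```python
from datetime import datetime, timedelta

# Assumed timestamp layout at the start of each listener.log entry.
TS_FORMAT = "%d-%b-%Y %H:%M:%S"


def entries_near(listener_lines, alert_time, window_seconds=2):
    """Return listener.log lines stamped within window_seconds of alert_time."""
    window = timedelta(seconds=window_seconds)
    hits = []
    for line in listener_lines:
        stamp = line.split(" * ", 1)[0].strip()
        try:
            when = datetime.strptime(stamp, TS_FORMAT)
        except ValueError:
            continue  # not an entry line (banner, continuation, and so on)
        if abs(when - alert_time) <= window:
            hits.append(line)
    return hits


# Made-up sample entries; only the timestamp layout matters here.
lines = [
    "19-OCT-2014 20:24:10 * (CONNECT_DATA=(SERVICE_NAME=orcl)) * "
    "(ADDRESS=(PROTOCOL=tcp)(HOST=192.0.2.10)(PORT=51234)) * establish * orcl * 0",
    "19-OCT-2014 20:24:15 * (CONNECT_DATA=(SERVICE_NAME=orcl)) * "
    "(ADDRESS=(PROTOCOL=tcp)(HOST=192.0.2.11)(PORT=51300)) * establish * orcl * 0",
    "not a log entry",
]
alert_time = datetime(2014, 10, 19, 20, 24, 16)
candidates = entries_near(lines, alert_time)
```

On a busy listener the window will still return many candidates, so the result narrows the search rather than identifying the client.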

    We can enable sqlnet server trace to catch the error (the match is done based on the ospid found in sqlnet server trace file name and the line with ORA-609 error):

    nscon: doing connect handshake...
        nscon: recving a packet
        nsprecv: entry
        nsprecv: reading from transport...
        nttrd: entry
        nttrd: exit
        ntt2err: entry
        ntt2err: Read unexpected EOF ERROR on 15    <<<<<<< error
        ntt2err: exit
        nsprecv: error exit
        nserror: entry
        nserror: nsres: id=0, op=68, ns=12537, ns2=12560; nt[0]=507, nt[1]=0, nt[2]=0; ora[0]=0, ora[1]=0, ora[2]=0
        nscon: error exit
        nsdo: nsctxrnk=0
        nsdo: error exit
        nsinh_hoff: error recving request
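When going through many trace files, the `nserror: nsres:` line is a convenient place to check the error codes. The sketch below parses that line into a dictionary, assuming the `key=value` layout shown in the trace excerpt above; `parse_nserror` is a hypothetical helper name.

```python
import re

# Matches pairs such as "ns=12537" or "nt[0]=507".
FIELD = re.compile(r"(\w+(?:\[\d+\])?)=(\d+)")


def parse_nserror(line):
    """Extract the numeric fields from an 'nserror: nsres:' trace line."""
    return {key: int(value) for key, value in FIELD.findall(line)}


trace_line = (
    "nserror: nsres: id=0, op=68, ns=12537, ns2=12560; "
    "nt[0]=507, nt[1]=0, nt[2]=0; ora[0]=0, ora[1]=0, ora[2]=0"
)
fields = parse_nserror(trace_line)
# ns=12537 corresponds to TNS-12537 (connection closed) in the alert log.
connection_closed = fields["ns"] == 12537
```

The `ns` and `ns2` values are the same main and secondary error codes that the alert log reports in its Tns error struct.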

    Possible causes:

    Several possible situations can cause this to happen:

    •     client changed its mind and closed the connection immediately after initiating it
    •     client crashed
    •     firewall kills the connection
    •     some oracle timeout set on client

    Solution:

    Because the entry from listener.log contains only CONNECT_DATA and CID related information we need to check the client configuration for any sqlnet timeouts:

    • possible timeouts in sqlnet.ora in client oracle home:

        sqlnet.outbound_connect_timeout
        sqlnet.recv_timeout
        sqlnet.send_timeout
        tcp.connect_timeout

    Check the timeout settings in sqlnet.ora under the client Oracle home; these are usually the cause.

    • possible timeout in client connect descriptor (hardcoded in client application or in client tnsnames.ora):

        connect_timeout

    Check the timeout parameter in the client application or in the client tnsnames.ora.
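Checking many client machines by hand is tedious, so a small script can report which timeout parameters a client sqlnet.ora sets. This is a rough sketch: it assumes one `name = value` setting per line and `#` comments, uses the documented spellings of the sqlnet.ora timeout parameters, and the function name `find_timeouts` and the sample file content are invented for illustration.

```python
# sqlnet.ora timeout parameters to look for, in lower case.
TIMEOUT_PARAMS = (
    "sqlnet.outbound_connect_timeout",
    "sqlnet.recv_timeout",
    "sqlnet.send_timeout",
    "tcp.connect_timeout",
)


def find_timeouts(sqlnet_ora_text):
    """Return the timeout parameters set in the given sqlnet.ora text."""
    found = {}
    for raw in sqlnet_ora_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if "=" not in line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        if name.lower() in TIMEOUT_PARAMS:
            found[name.lower()] = value
    return found


# Made-up client sqlnet.ora content.
sample_sqlnet_ora = """\
# client sqlnet.ora
NAMES.DIRECTORY_PATH = (TNSNAMES, EZCONNECT)
SQLNET.OUTBOUND_CONNECT_TIMEOUT = 3
SQLNET.RECV_TIMEOUT=10
"""
timeouts = find_timeouts(sample_sqlnet_ora)
```

A very small value here, compared with how long the server takes to start a dedicated process under load, would make the client a likely candidate for the closed connections.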

  • Original article: https://www.cnblogs.com/shujuyr/p/14630654.html