• ORA-04031错误导致宕机案例分析


    今天遇到一起ORACLE数据库宕机案例,下面是对这起数据库宕机案例的原因进行分析、解读。分析过程中顺便记录一下这个案例的前因后果,攒点经验值,培养一下分析、解决问题的能力。

     

    案例环境:

     

       操作系统 :Oracle Linux Server release 5.7 64 bit

       数据库版本:Oracle Database 10g Release 10.2.0.4.0 - 64bit Production

     

    案例分析:

    收到告警去检查数据库时,发现实例已经宕机。检查告警日志,发现下面错误信息:

    ORA-00604: error occurred at recursive SQL level 1
    ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select count(*) from sys.job...","sql area","tmp")
    Mon Nov  2 11:43:00 2015
    Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_cjq0_6571.trc:
    ORA-00604: error occurred at recursive SQL level 1
    ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select job, nvl2(last_date, ...","sql area","tmp")
    Mon Nov  2 11:43:00 2015
    Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_cjq0_6571.trc:
    ORA-00604: error occurred at recursive SQL level 1
    ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select count(*) from sys.job...","sql area","tmp")
    Mon Nov  2 11:43:05 2015
    Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_cjq0_6571.trc:
    ORA-00604: error occurred at recursive SQL level 1
    ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select job, nvl2(last_date, ...","sql area","tmp")
    Mon Nov  2 11:43:05 2015
    Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_cjq0_6571.trc:
    ORA-00604: error occurred at recursive SQL level 1
    ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select count(*) from sys.job...","sql area","tmp")
    Mon Nov  2 11:43:08 2015
    Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_reco_6569.trc:
    ORA-04031: unable to allocate 32 bytes of shared memory ("shared pool","select host,userid,password,...","sql area","tmp")
    Mon Nov  2 11:43:08 2015
    RECO: terminating instance due to error 4031
    Mon Nov  2 11:43:08 2015
    Errors in file /u01/app/oracle/admin/SCM2/bdump/scm2_pmon_6555.trc:
    ORA-04031: unable to allocate  bytes of shared memory ("","","","")
    Instance terminated by RECO, pid = 6569

    clip_image001

    从告警日志我们可以看到ORA-00604与ORA-04031错误导致了这次宕机事故(RECO: terminating instance due to error 4031):

    $ oerr ora 4031

    04031, 00000, "unable to allocate %s bytes of shared memory ("%s","%s","%s","%s")"

    // *Cause: More shared memory is needed than was allocated in the shared

    // pool.

    // *Action: If the shared pool is out of memory, either use the

    // dbms_shared_pool package to pin large packages,

    // reduce your use of shared memory, or increase the amount of

    // available shared memory by increasing the value of the

    // INIT.ORA parameters "shared_pool_reserved_size" and

    // "shared_pool_size".

    // If the large pool is out of memory, increase the INIT.ORA

    // parameter "large_pool_size".

     

    一般出现ORA-04031错误可能由两个原因引起:

    1:内存中存在大量碎片,导致在分配内存的时候,没有连续的内存可存放,此问题一般是需要在开发的角度上入手,比如增加绑定变量,减少硬解析来改善和避免;

    2.内存容量不足,需要扩大内存。

    这台机器分配的物理内存为8G,结果检查发现SGA只分配了1168M,不到2G,瞬时碉堡了。此时真是很无语。ASH Report分析宕机前后的Buffer Cache和Shared Pool大小如下所示。

     clip_image003

      clip_image004

    查看跟踪文件,可以看到SGA: allocation forcing component growth等待事件,可以确认的是由于SGA无法增长导致,也就是SGA被撑爆了,结合ASH Report我们可以看到当时Shared Pool的大小已经接近SGA的69.6%大小。

    SO: 0xa617d9c0, type: 4, owner: 0xa8a26c68, flag: INIT/-/-/0x00
      (session) sid: 932 trans: (nil), creator: 0xa8a26c68, flag: (51) USR/- BSY/-/-/-/-/-
                DID: 0001-000A-00000003, short-term DID: 0000-0000-00000000
                txn branch: (nil)
                oct: 0, prv: 0, sql: (nil), psql: (nil), user: 0/SYS
      last wait for 'SGA: allocation forcing component growth' blocking sess=0x(nil) seq=51324 wait_time=10714 seconds since wait started=0
              =0, =0, =0
      Dumping Session Wait History
       for 'SGA: allocation forcing component growth' count=1 wait_time=10714
              =0, =0, =0
       for 'SGA: allocation forcing component growth' count=1 wait_time=10512
              =0, =0, =0
       for 'latch: shared pool' count=1 wait_time=892
              address=600e7320, number=d6, tries=0
       for 'latch: shared pool' count=1 wait_time=28
              address=600e7320, number=d6, tries=0
       for 'latch: shared pool' count=1 wait_time=51
              address=600e7320, number=d6, tries=0
       for 'latch: shared pool' count=1 wait_time=114
              address=600e7320, number=d6, tries=0
       for 'latch: shared pool' count=1 wait_time=120
              address=600e7320, number=d6, tries=0
       for 'latch: library cache' count=1 wait_time=33
              address=a3fa46e8, number=d7, tries=1

    clip_image005

    结合上面的一些分析,可以断定SGA的不合理设置导致shared pool的内存被全部耗尽,SGA被撑爆了。于是调整SGA的参数才是解决问题的正确对策。另外考虑到这个数据库也正常运行了较长一段时间,也分析了一下awr、addm报告,发现系统的硬解析相当严重。另外通过下面脚本观察了一段时间shared pool的变化,发现其收缩、增长较频繁。

     
    SELECT start_time, 
           component, 
           oper_type, 
           oper_mode, 
           initial_size / 1024 / 1024 "INITIAL", 
           final_size / 1024 / 1024   "FINAL", 
           end_time 
    FROM   v$sga_resize_ops 
    WHERE  component IN ( 'DEFAULT buffer cache', 'shared pool' ) 
           AND status = 'COMPLETE' 
    ORDER  BY start_time, 
              component; 

    这个可以通过设置数据库参数SHARED_POOL_SIZE,保证SHARED_POOL_SIZE大小不会由于内存紧张而低于这个大小,另外可以设置SGA resize的时间间隔

    ALTER SYSTEM SET “_memory_broker_stat_interval”=n SCOPE=SPFILE;

    问题虽然解决了,但是真正需要反思的是为什么这个SGA_MAX_SIZE设置为1168M大小的事情!而且没有在巡检当中被发现。

     

    参考资料:

    http://blog.csdn.net/wenzhongyan/article/details/29866845

    http://blog.chinaunix.net/uid-20802110-id-4188357.html

    http://www.oraclefreebase.com/blog/2015/10/%E6%95%B0%E6%8D%AE%E5%BA%93ora-4031%E6%95%B0%E6%8D%AE%E5%BA%93crash/

     

  • 相关阅读:
    查询表中列转换为json
    查看死锁
    利用vba将excel中的图片链接直接转换为图片
    npoi与memcached中的ICSharpCode.SharpZipLib版本冲突的解决方案
    网页爬虫的一些笔记
    从远程服务器数据库中同步数据到本地数据库 sql server 2008 开启分布
    配置ST3在浏览器中打开
    在 sublime text 3 中添加 Emmet (ZenCoding)
    win2008远程桌面会话数增加
    20150728月度会议
  • 原文地址:https://www.cnblogs.com/kerrycode/p/4933357.html
Copyright © 2020-2023  润新知