• ksvcreate: Process creation failed故障一例


    今天上午接到电话,用户反映一套10.2.0.4的数据库出现用户无法登录的症状,随即在家里连上VPN远程支持,登上主机以后尝试测试操作系统认证登录和远程登录,发现已经能够正常登录了;为了了解起因,去查看了告警日志alert.log,发现在11:00左右出现大量的ksvcreate:process creation failed错误,具体错误日志如下:
    Wed Mar 23 10:00:17 2011
    Process m000 died, see its trace file
    Wed Mar 23 10:00:17 2011
    ksvcreate: Process(m000) creation failed
    Wed Mar 23 10:01:18 2011
    Process m000 died, see its trace file
    Wed Mar 23 10:01:18 2011
    ksvcreate: Process(m000) creation failed
    Wed Mar 23 10:02:19 2011
    Process m000 died, see its trace file
    Wed Mar 23 10:02:19 2011
    ksvcreate: Process(m000) creation failed
    Wed Mar 23 10:54:41 2011
    Process P007 died, see its trace file
    Wed Mar 23 10:56:40 2011
    Process P007 died, see its trace file
    Wed Mar 23 10:57:40 2011
    Process P007 died, see its trace file
    Wed Mar 23 10:58:40 2011
    Process P007 died, see its trace file
    Wed Mar 23 10:59:18 2011
    Process J000 died, see its trace file
    Wed Mar 23 10:59:18 2011
    kkjcre1p: unable to spawn jobq slave process
    Wed Mar 23 10:59:18 2011
    Errors in file /oracle/app/oracle/admin/sdh/bdump/sdh_cjq0_717010.trc:
    
    Wed Mar 23 11:01:00 2011
    Process m000 died, see its trace file
    Wed Mar 23 11:01:00 2011
    ksvcreate: Process(m000) creation failed
    Wed Mar 23 11:02:01 2011
    Process m000 died, see its trace file
    Wed Mar 23 11:02:01 2011
    ksvcreate: Process(m000) creation failed
    Wed Mar 23 11:02:40 2011
    Process P007 died, see its trace file
    Wed Mar 23 11:03:02 2011
    Process m000 died, see its trace file
    Wed Mar 23 11:03:02 2011
    ksvcreate: Process(m000) creation failed
    Wed Mar 23 11:04:03 2011
    Process m000 died, see its trace file
    Wed Mar 23 11:04:03 2011
    ksvcreate: Process(m000) creation failed
    Wed Mar 23 11:04:40 2011
    Process P007 died, see its trace file
    ksvcreate: Process creation failed错误信息一般在Oracle实例在创建一些辅助后台进程(如mmon的子进程m00x或者并行子进程p00x等)时出现进程启动失败时出现,而造成该错误的可能性有多种,包括Oracle实例资源不足、操作系统资源不足等等。其中较为常见的是实例instance的process使用达到上限,可以通过查询v$resource_limit视图来了解实例生命周期内是否发生过process总数暴满的情况:
    SQL> select * from v$resource_limit;
    
    RESOURCE_NAME                  CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_ALLOCATION   LIMIT_VALUE
    ------------------------------ ------------------- --------------- -------------------- --------------------
    processes                                       79             800        800                  800
    sessions                                        80             813        885                  885
    enqueue_locks                                   18             303      10910                10910
    enqueue_resources                               18             166       4112            UNLIMITED
    ges_procs                                        0               0          0                    0
    ges_ress                                         0               0          0            UNLIMITED
    ges_locks                                        0               0          0            UNLIMITED
    ges_cache_ress                                   0               0          0            UNLIMITED
    ges_reg_msgs                                     0               0          0            UNLIMITED
    ges_big_msgs                                     0               0          0            UNLIMITED
    ges_rsv_msgs                                     0               0          0                    0
    gcs_resources                                    0               0          0                    0
    gcs_shadows                                      0               0          0                    0
    dml_locks                                        0             153       3892            UNLIMITED
    temporary_table_locks                            0               3  UNLIMITED            UNLIMITED
    transactions                                  3125            3156        973            UNLIMITED
    branches                                         0               3        973            UNLIMITED
    cmtcallbk                                        0               2        973            UNLIMITED
    sort_segment_locks                               0               8  UNLIMITED            UNLIMITED
    max_rollback_segments                           19              43        973                65535
    max_shared_servers                               0               0  UNLIMITED            UNLIMITED
    parallel_max_servers                             6             122        120                 3600
    
    /* 可以看到processes的MAX_UTILIZATION最大使用数目曾到过LIMIT_VALUE限定的800,
       sessions也有类似的情况 */
    从以上V$resource_limit视图的输出来看,极有可能是processes总数达到上限导致了新的后台辅助进程创建失败,其实我们可以很方便地验证这一点:
    SQL> select * from v$version;
    
    BANNER
    ----------------------------------------------------------------
    Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Prod
    PL/SQL Release 10.2.0.1.0 - Production
    CORE	10.2.0.1.0	Production
    TNS for Linux: Version 10.2.0.1.0 - Production
    NLSRTL Version 10.2.0.1.0 - Production
    
    SQL> alter system set processes=20 scope=spfile;
    System altered.
    
    /* 将processes参数修改到一个较小值 */
    
    SQL> startup force;
    ORACLE instance started.
    
    SQL> select resource_name,MAX_UTILIZATION,LIMIT_VALUE from v$resource_limit where resource_name in ('processes','sessions');
    
    RESOURCE_NAME               MAX_UTILIZATION LIMIT_VALUE
    ------------------------------ --------------- ---------------
    processes                    20           20
    sessions                     19           27
    
     /* 从以上输出可以看到process总数曾经到达20的上限 */
    
    /* 观察告警日志可以看到相关ksvcreate: Process(m000) creation failed的记录 */
    
    [maclean@rh8 bdump]$ tail -20 alert_PROD1.log
    ksvcreate: Process(m000) creation failed
    Process m000 died, see its trace file
    Wed Mar 23 22:01:52 2011
    ksvcreate: Process(m000) creation failed
    Wed Mar 23 22:01:55 2011
    Process q000 died, see its trace file
    Wed Mar 23 22:01:55 2011
    ksvcreate: Process(q000) creation failed
    Wed Mar 23 22:02:07 2011
    Process q000 died, see its trace file
    Wed Mar 23 22:02:07 2011
    ksvcreate: Process(q000) creation failed
    Wed Mar 23 22:02:19 2011
    Process q000 died, see its trace file
    Wed Mar 23 22:02:19 2011
    ksvcreate: Process(q000) creation failed
    Wed Mar 23 22:02:31 2011
    Process q000 died, see its trace file
    Wed Mar 23 22:02:31 2011
    ksvcreate: Process(q000) creation failed
    不同于10g,在11g中类似的错误出现后会在告警日志中说明process creation即进程创建失败的具体原因,如进程总数达到上限,那么就会出现ORA-00020错误(maximum number of processes (%s) exceeded. All process state objects are in use.Increase the value of the PROCESSES initialization parameter),类似以下日志:
    ORA-00020: No more process state objects available
    ORA-20 errors will not be written to the alert log for
     the next minute. Please look at trace files to see all
     the ORA-20 errors.
    Process m001 submission failed with error = 20
    Process m002 submission failed with error = 20
    Process m003 submission failed with error = 20
    Process m003 submission failed with error = 20
    Process m002 submission failed with error = 20
    2011-03-23 22:10:07.037000 +08:00
    Process q001 submission failed with error = 20
    实际上ksvcreate: Process creation failed错误在能够了解其发生的root cause的情况下,并不难解决。在上例中我们可以清楚地了解到是因为数据库在实际运行中出现了processes进程总数达到参数设定上限从而导致问题出现,那么可以合理增加初始化参数processes或者通过修正异常频繁的程序客户端登录来解决该问题。
  • 相关阅读:
    面试范围
    Spark集群基础概念 与 spark架构原理
    RDD 算子补充
    常用Actoin算子 与 内存管理 、共享变量、内存机制
    常用Transformation算子
    spark core (二)
    Spark 介绍
    JSON对象和string的相互转换
    javascript中substring和substr方法
    mongodb学习相关网址
  • 原文地址:https://www.cnblogs.com/macleanoracle/p/2967714.html
Copyright © 2020-2023  润新知