• 11G RAC 11.2.0.1.0实例evict故障处理


    Aix 7.1

    参考文档:

    https://blogs.oracle.com/database4cn/rac

    Resolving ORA-481 and "terminating the instance due to error 481" (Doc ID 1950963.1)

    ORA-00481 After "The instance eviction reason is 0x2" due to Lack of Ticket (Doc ID 1644015.1)

    归根结底是RAC节点之间消息传输流量控制的问题,除网络、硬件原因外,需要打补丁解决。

    以下为第一次处理前具体的alertlog。

    ◆node1
    Wed May 24 01:59:24 2017
    Remote instance 2 kill is issued with system inc 228
    LMON received an instance eviction notification from instance 1
    The instance eviction reason is 0x2
    The instance eviction map is 2
    Reconfiguration started (old inc 228, new inc 230)
    List of instances:
    1 (myinst: 1)
    Wed May 24 01:59:27 2017
    Trace dumping is performing id=[cdmp_20170524015904]
    Global Resource Directory frozen
    * dead instance detected - domain 0 invalid = TRUE
    Communication channels reestablished
    Master broadcasted resource hash value bitmaps
    Non-local Process blocks cleaned out
    Wed May 24 01:59:28 2017
    LMS 1: 14 GCS shadows cancelled, 1 closed, 0 Xw survived
    Wed May 24 01:59:28 2017
    LMS 0: 23 GCS shadows cancelled, 0 closed, 0 Xw survived
    Wed May 24 01:59:28 2017
    LMS 2: 21 GCS shadows cancelled, 0 closed, 0 Xw survived
    Set master node info
    Submitted all remote-enqueue requests
    Dwn-cvts replayed, VALBLKs dubious
    All grantable enqueues granted
    Post SMON to start 1st pass IR
    Wed May 24 01:59:32 2017
    Instance recovery: looking for dead threads
    Beginning instance recovery of 1 threads
    Wed May 24 01:59:48 2017
    parallel recovery started with 32 processes
    Started redo scan
    Wed May 24 01:59:49 2017
    Submitted all GCS remote-cache requests
    Post SMON to start 1st pass IR
    Fix write in gcs resources
    Reconfiguration complete
    Wed May 24 02:00:00 2017
    Completed redo scan
    read 2094379 KB redo, 237345 data blocks need recovery
    Wed May 24 02:00:03 2017
    Reconfiguration started (old inc 230, new inc 232)
    List of instances:
    1 2 (myinst: 1)
    Global Resource Directory frozen
    Communication channels reestablished
    Master broadcasted resource hash value bitmaps
    Non-local Process blocks cleaned out
    Wed May 24 02:00:04 2017
    LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    Wed May 24 02:00:04 2017
    Wed May 24 02:00:04 2017
    LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    Set master node info
    Submitted all remote-enqueue requests
    Dwn-cvts replayed, VALBLKs dubious
    All grantable enqueues granted
    Post SMON to start 1st pass IR
    Wed May 24 02:02:27 2017
    ,
    ,此处大量12170
    ,
    Wed May 24 02:05:34 2017
    LMON (ospid: 35979900): terminating the instance due to error 481
    Wed May 24 02:05:34 2017
    System state dump is made for local instance
    System State dumped to trace file /u01/app/11.2.0/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_diag_40894656.trc
    Instance terminated by LMON, pid = 35979900
    Wed May 24 08:12:27 2017
    Starting ORACLE instance (normal)
    sskgpgetexecname failed to get name
    LICENSE_MAX_SESSION = 0
    LICENSE_SESSIONS_WARNING = 0
    Interface type 1 en1 192.168.0.0 configured from GPnP Profile for use as a cluster interconnect
    Interface type 1 en0 10.209.199.0 configured from GPnP Profile for use as a public interface
    Picked latch-free SCN scheme 3
    Using LOG_ARCHIVE_DEST_1 parameter default value as USE_DB_RECOVERY_FILE_DEST
    Autotune of undo retention is turned on.
    LICENSE_MAX_USERS = 0
    SYS auditing is disabled
    Starting up:
    Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
    With the Partitioning, Real Application Clusters, OLAP, Data Mining
    and Real Application Testing options.
    Using parameter settings in server-side pfile /u01/app/11.2.0/oracle/product/11.2.0/db_1/dbs/initorcl1.ora
    System parameters with non-default values:
    processes = 2000
    timed_statistics = TRUE
    sga_max_size = 70G
    spfile = "+DATA1/orcl/spfileorcl.ora"
    sga_target = 50G
    control_files = "+DATA1/orcl/controlfile/current.261.832429113"
    control_files = "+ARCDG/orcl/controlfile/current.259.832429113"
    db_block_size = 8192
    compatible = "11.2.0.0.0"
    log_archive_format = "%t_%s_%r.dbf"
    cluster_database = TRUE
    db_create_file_dest = "+DATA1"
    db_recovery_file_dest = "+ARCDG"
    db_recovery_file_dest_size= 2048G
    thread = 1
    undo_tablespace = "UNDOTBS1"
    undo_retention = 10800
    instance_number = 1
    remote_login_passwordfile= "NONE"
    db_domain = ""
    service_names = "orcl"
    dispatchers = "(PROTOCOL=TCP) (SERVICE=orclXDB)"
    local_listener = "(ADDRESS = (PROTOCOL = TCP)(HOST = 10.209.199.4)(PORT = 1521))"
    remote_listener = "rac-scan:1521"
    result_cache_max_size = 268736K
    audit_file_dest = "/u01/app/11.2.0/oracle/admin/orcl/adump"
    audit_trail = "DB"
    db_name = "orcl"
    open_cursors = 1000
    sql_trace = FALSE
    optimizer_index_caching = 90
    pga_aggregate_target = 20G
    deferred_segment_creation= FALSE
    aq_tm_processes = 5
    diagnostic_dest = "/u01/app/11.2.0/oracle"
    Deprecated system parameters with specified values:
    sql_trace
    End of deprecated system parameter listing
    Cluster communication is configured to use the following interface(s) for this instance
    192.168.0.1
    cluster interconnect IPC version:Oracle UDP/IP (generic)
    IPC Vendor 1 proto 2
    Wed May 24 08:12:32 2017
    PMON started with pid=2, OS id=35652038
    Wed May 24 08:12:32 2017
    VKTM started with pid=3, OS id=33489120 at elevated priority
    VKTM running at (10)millisec precision with DBRM quantum (100)ms
    Wed May 24 08:12:32 2017
    GEN0 started with pid=4, OS id=34209898
    Wed May 24 08:12:32 2017
    DIAG started with pid=5, OS id=36438538
    Wed May 24 08:12:33 2017
    DBRM started with pid=6, OS id=24576446
    Wed May 24 08:12:33 2017
    PING started with pid=7, OS id=29687920
    Wed May 24 08:12:33 2017
    PSP0 started with pid=8, OS id=36635296
    Wed May 24 08:12:33 2017
    ACMS started with pid=9, OS id=34341462
    Wed May 24 08:12:33 2017
    DIA0 started with pid=10, OS id=32047870
    Wed May 24 08:12:33 2017
    LMON started with pid=11, OS id=36045304
    Wed May 24 08:12:35 2017
    LMD0 started with pid=12, OS id=31391860
    Wed May 24 08:12:35 2017
    LMS0 started with pid=13, OS id=35979918 at elevated priority
    Wed May 24 08:12:35 2017
    LMS1 started with pid=14, OS id=22741294 at elevated priority
    Wed May 24 08:12:36 2017
    LMS2 started with pid=15, OS id=29229354 at elevated priority
    Wed May 24 08:12:36 2017
    RMS0 started with pid=16, OS id=20185780
    Wed May 24 08:12:36 2017
    LMHB started with pid=17, OS id=29425998
    Wed May 24 08:12:36 2017
    MMAN started with pid=18, OS id=9699954
    Wed May 24 08:12:36 2017
    DBW0 started with pid=19, OS id=31850686
    Wed May 24 08:12:36 2017
    DBW1 started with pid=20, OS id=36045442
    Wed May 24 08:12:36 2017
    DBW2 started with pid=21, OS id=40894536
    Wed May 24 08:12:36 2017
    DBW3 started with pid=22, OS id=29819564
    Wed May 24 08:12:36 2017
    DBW4 started with pid=23, OS id=36634890
    Wed May 24 08:12:36 2017
    LGWR started with pid=24, OS id=35783370
    Wed May 24 08:12:37 2017
    CKPT started with pid=25, OS id=35914258
    Wed May 24 08:12:37 2017
    SMON started with pid=26, OS id=32374796
    Wed May 24 08:12:37 2017
    RECO started with pid=27, OS id=35652280
    Wed May 24 08:12:37 2017
    RBAL started with pid=28, OS id=36438322
    Wed May 24 08:12:37 2017
    ASMB started with pid=29, OS id=39387144
    Wed May 24 08:12:37 2017
    MMON started with pid=30, OS id=2621900
    NOTE: initiating MARK startup
    Wed May 24 08:12:37 2017
    Starting background process MARKMMNL started with pid=31, OS id=35979732
    
    starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
    Wed May 24 08:12:37 2017
    MARK started with pid=32, OS id=35717688
    NOTE: MARK has subscribed
    starting up 1 shared server(s) ...
    lmon registered with NM - instance number 1 (internal mem no 0)
    Reconfiguration started (old inc 0, new inc 236)
    List of instances:
    1 2 (myinst: 1)
    Global Resource Directory frozen
    * allocate domain 0, invalid = TRUE
    Communication channels reestablished
    * domain 0 valid according to instance 2
    * domain 0 valid = 1 according to instance 2
    Master broadcasted resource hash value bitmaps
    Non-local Process blocks cleaned out
    LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    Set master node info
    Submitted all remote-enqueue requests
    Dwn-cvts replayed, VALBLKs dubious
    All grantable enqueues granted
    Wed May 24 08:12:45 2017
    Submitted all GCS remote-cache requests
    Fix write in gcs resources
    Reconfiguration complete
    Wed May 24 08:12:52 2017
    LCK0 started with pid=34, OS id=24379778
    Wed May 24 08:12:52 2017
    Starting background process RSMN
    Wed May 24 08:12:52 2017
    RSMN started with pid=36, OS id=30802026
    ORACLE_BASE from environment = /u01/app/11.2.0/oracle
    Wed May 24 08:12:53 2017
    ALTER DATABASE MOUNT
    Wed May 24 08:12:53 2017
    NOTE: Loaded library: System
    Wed May 24 08:12:53 2017
    SUCCESS: diskgroup DATA1 was mounted
    SUCCESS: diskgroup ARCDG was mounted
    Wed May 24 08:12:53 2017
    NOTE: dependency between database orcl and diskgroup resource ora.DATA1.dg is established
    NOTE: dependency between database orcl and diskgroup resource ora.ARCDG.dg is established
    Wed May 24 08:12:57 2017
    Successful mount of redo thread 1, with mount id 1472569957
    Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)
    Lost write protection disabled
    Completed: ALTER DATABASE MOUNT
    Wed May 24 08:12:58 2017
    ALTER DATABASE OPEN
    Block change tracking file is current.
    Picked broadcast on commit scheme to generate SCNs
    Wed May 24 08:12:58 2017
    SUCCESS: diskgroup DATA2 was mounted
    NOTE: dependency between database orcl and diskgroup resource ora.DATA2.dg is established
    SUCCESS: diskgroup DATA3 was mounted
    NOTE: dependency between database orcl and diskgroup resource ora.DATA3.dg is established
    Thread 1 advanced to log sequence 255944 (thread open)
    Thread 1 opened at log sequence 255944
    Current log# 2 seq# 255944 mem# 0: +ARCDG/orcl/onlinelog/group_2.261.840376035
    Current log# 2 seq# 255944 mem# 1: +ARCDG/orcl/onlinelog/group_2.2113.840376039
    Current log# 2 seq# 255944 mem# 2: +ARCDG/orcl/onlinelog/group_2.2114.840376041
    Current log# 2 seq# 255944 mem# 3: +ARCDG/orcl/onlinelog/group_2.2116.840376043
    Successful open of redo thread 1
    MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
    Starting background process CTWR
    Wed May 24 08:12:59 2017
    CTWR started with pid=39, OS id=32768642
    Block change tracking service is active.
    Wed May 24 08:13:00 2017
    SMON: enabling cache recovery
    Successfully onlined Undo Tablespace 2.
    Verifying file header compatibility for 11g tablespace encryption..
    Verifying 11g file header compatibility for tablespace encryption completed
    SMON: enabling tx recovery
    Database Characterset is AL32UTF8
    No Resource Manager plan active
    Starting background process GTX0
    Wed May 24 08:13:07 2017
    GTX0 started with pid=42, OS id=35062526
    Starting background process RCBG
    Wed May 24 08:13:07 2017
    RCBG started with pid=43, OS id=27721944
    replication_dependency_tracking turned off (no async multimaster replication found)
    Wed May 24 08:13:08 2017
    Starting background process QMNC
    Wed May 24 08:13:08 2017
    QMNC started with pid=44, OS id=34013894
    Completed: ALTER DATABASE OPEN
    Wed May 24 08:13:17 2017
    Starting background process CJQ0
    Wed May 24 08:13:17 2017
    CJQ0 started with pid=41, OS id=11469004
    Wed May 24 08:13:43 2017
    Starting background process SMCO
    Wed May 24 08:13:43 2017
    SMCO started with pid=120, OS id=42336480
    ,
    ,
    ,
    Wed May 24 09:08:16 2017
    Errors in file /u01/app/11.2.0/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_j001_33161436.trc:
    ORA-12012: 鑎仟錨狮鎊骚琛^啼絕茕竈239 錨呛閊証
    ORA-00001: 杩^蒎^蟐湾^辕涓^犁害鎊較哝^荨浠(CM.UK_LTE_ZY_CHECK_HISTORY)
    ORA-06512: 錨塄 "CM.JOB_LTE_ZY_CHECK", line 27
    ORA-06512: 錨塄 line 1
    
    ◆node2
    Wed May 24 01:59:01 2017
    LMS1 (ospid: 26411086) received an instance eviction notification from instance 1 [2]
    Wed May 24 01:59:01 2017
    LMON received an instance eviction notification from instance 1
    The instance eviction reason is 0x2
    The instance eviction map is 2
    Wed May 24 01:59:04 2017
    PMON (ospid: 50725354): terminating the instance due to error 481
    Wed May 24 01:59:04 2017
    System state dump is made for local instance
    System State dumped to trace file /u01/app/11.2.0/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_diag_42336686.trc
    Wed May 24 01:59:04 2017
    ORA-1092 : opitsk aborting process
    Wed May 24 01:59:04 2017
    License high water mark = 516
    Instance terminated by PMON, pid = 50725354
    USER (ospid: 50987430): terminating the instance
    Instance terminated by USER, pid = 50987430
    Wed May 24 01:59:29 2017
    Starting ORACLE instance (normal)
    sskgpgetexecname failed to get name
    LICENSE_MAX_SESSION = 0
    LICENSE_SESSIONS_WARNING = 0
    Interface type 1 en1 192.168.0.0 configured from GPnP Profile for use as a cluster interconnect
    Interface type 1 en0 10.209.199.0 configured from GPnP Profile for use as a public interface
    Picked latch-free SCN scheme 3
    Using LOG_ARCHIVE_DEST_1 parameter default value as USE_DB_RECOVERY_FILE_DEST
    Autotune of undo retention is turned on.
    LICENSE_MAX_USERS = 0
    SYS auditing is disabled
    Starting up:
    Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
    With the Partitioning, Real Application Clusters, OLAP, Data Mining
    and Real Application Testing options.
    Using parameter settings in server-side pfile /u01/app/11.2.0/oracle/product/11.2.0/db_1/dbs/initorcl2.ora
    System parameters with non-default values:
    processes = 2000
    timed_statistics = TRUE
    sga_max_size = 70G
    spfile = "+DATA1/orcl/spfileorcl.ora"
    sga_target = 50G
    control_files = "+DATA1/orcl/controlfile/current.261.832429113"
    control_files = "+ARCDG/orcl/controlfile/current.259.832429113"
    db_block_size = 8192
    compatible = "11.2.0.0.0"
    log_archive_format = "%t_%s_%r.dbf"
    cluster_database = TRUE
    db_create_file_dest = "+DATA1"
    db_recovery_file_dest = "+ARCDG"
    db_recovery_file_dest_size= 2048G
    thread = 2
    undo_tablespace = "UNDOTBS2"
    undo_retention = 10800
    instance_number = 2
    remote_login_passwordfile= "NONE"
    db_domain = ""
    dispatchers = "(PROTOCOL=TCP) (SERVICE=orclXDB)"
    local_listener = "(ADDRESS = (PROTOCOL = TCP)(HOST = 10.209.199.5)(PORT = 1521))"
    remote_listener = "rac-scan:1521"
    result_cache_max_size = 268736K
    audit_file_dest = "/u01/app/11.2.0/oracle/admin/orcl/adump"
    audit_trail = "DB"
    db_name = "orcl"
    open_cursors = 1000
    sql_trace = FALSE
    optimizer_index_caching = 90
    pga_aggregate_target = 20G
    deferred_segment_creation= FALSE
    aq_tm_processes = 5
    diagnostic_dest = "/u01/app/11.2.0/oracle"
    Deprecated system parameters with specified values:
    sql_trace
    End of deprecated system parameter listing
    Cluster communication is configured to use the following interface(s) for this instance
    192.168.0.2
    cluster interconnect IPC version:Oracle UDP/IP (generic)
    IPC Vendor 1 proto 2
    Wed May 24 01:59:34 2017
    PMON started with pid=2, OS id=66584656
    Wed May 24 01:59:34 2017
    VKTM started with pid=3, OS id=66846744 at elevated priority
    VKTM running at (10)millisec precision with DBRM quantum (100)ms
    Wed May 24 01:59:34 2017
    GEN0 started with pid=4, OS id=26608090
    Wed May 24 01:59:34 2017
    DIAG started with pid=5, OS id=26083716
    Wed May 24 01:59:34 2017
    DBRM started with pid=6, OS id=24510972
    Wed May 24 01:59:34 2017
    PING started with pid=7, OS id=65077306
    Wed May 24 01:59:34 2017
    PSP0 started with pid=8, OS id=66781402
    Wed May 24 01:59:34 2017
    ACMS started with pid=9, OS id=66978040
    Wed May 24 01:59:34 2017
    DIA0 started with pid=10, OS id=66519050
    Wed May 24 01:59:34 2017
    LMON started with pid=11, OS id=66453694
    Wed May 24 01:59:37 2017
    LMD0 started with pid=12, OS id=23658758
    Wed May 24 01:59:37 2017
    LMS0 started with pid=13, OS id=66322614 at elevated priority
    Wed May 24 01:59:37 2017
    LMS1 started with pid=14, OS id=65798210 at elevated priority
    Wed May 24 01:59:37 2017
    LMS2 started with pid=15, OS id=15204812 at elevated priority
    Wed May 24 01:59:38 2017
    RMS0 started with pid=16, OS id=65732820
    Wed May 24 01:59:38 2017
    LMHB started with pid=17, OS id=65339438
    Wed May 24 01:59:38 2017
    MMAN started with pid=18, OS id=52298036
    Wed May 24 01:59:38 2017
    DBW0 started with pid=19, OS id=52232458
    Wed May 24 01:59:38 2017
    DBW1 started with pid=20, OS id=65273872
    Wed May 24 01:59:38 2017
    DBW2 started with pid=21, OS id=65208354
    Wed May 24 01:59:38 2017
    DBW3 started with pid=22, OS id=65994980
    Wed May 24 01:59:38 2017
    DBW4 started with pid=23, OS id=61210752
    Wed May 24 01:59:38 2017
    LGWR started with pid=24, OS id=6095298
    Wed May 24 01:59:38 2017
    CKPT started with pid=25, OS id=65470648
    Wed May 24 01:59:38 2017
    SMON started with pid=26, OS id=64946374
    Wed May 24 01:59:38 2017
    RECO started with pid=27, OS id=51642854
    Wed May 24 01:59:39 2017
    RBAL started with pid=28, OS id=64618728
    Wed May 24 01:59:39 2017
    ASMB started with pid=29, OS id=64553208
    Wed May 24 01:59:39 2017
    MMON started with pid=30, OS id=64422054
    NOTE: initiating MARK startup
    Wed May 24 01:59:39 2017
    MMNL started with pid=31, OS id=42336716
    Starting background process MARK
    starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
    Wed May 24 01:59:39 2017
    MARK started with pid=32, OS id=51577168
    NOTE: MARK has subscribed
    Wed May 24 01:59:39 2017
    starting up 1 shared server(s) ...
    lmon registered with NM - instance number 2 (internal mem no 1)
    Reconfiguration started (old inc 0, new inc 232)
    List of instances:
    1 2 (myinst: 2)
    Global Resource Directory frozen
    * allocate domain 0, invalid = TRUE
    Communication channels reestablished
    * domain 0 valid = 0 according to instance 1
    Master broadcasted resource hash value bitmaps
    Non-local Process blocks cleaned out
    LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    Set master node info
    Submitted all remote-enqueue requests
    Dwn-cvts replayed, VALBLKs dubious
    All grantable enqueues granted
    Wed May 24 01:59:48 2017
    Submitted all GCS remote-cache requests
    Wed May 24 02:05:10 2017
    Trace dumping is performing id=[cdmp_20170524020534]
    Wed May 24 02:05:18 2017
    Reconfiguration started (old inc 232, new inc 234)
    List of instances:
    2 (myinst: 2)
    Nested reconfiguration detected.
    Global Resource Directory frozen
    * dead instance detected - domain 0 invalid = TRUE
    Communication channels reestablished
    Master broadcasted resource hash value bitmaps
    Non-local Process blocks cleaned out
    Wed May 24 02:05:19 2017
    LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    Wed May 24 02:05:19 2017
    LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    Wed May 24 02:05:19 2017
    LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survived
    Set master node info
    Submitted all remote-enqueue requests
    Dwn-cvts replayed, VALBLKs dubious
    All grantable enqueues granted
    Post SMON to start 1st pass IR
    Submitted all GCS remote-cache requests
    Post SMON to start 1st pass IR
    Fix write in gcs resources
    Reconfiguration complete
    Wed May 24 02:05:25 2017
    LCK0 started with pid=34, OS id=51118396
    Wed May 24 02:05:25 2017
    Starting background process RSMN
    Wed May 24 02:05:25 2017
    RSMN started with pid=36, OS id=51773786
    ORACLE_BASE not set in environment. It is recommended
    that ORACLE_BASE be set in the environment
    Reusing ORACLE_BASE from an earlier startup = /u01/app/11.2.0/oracle
    Wed May 24 02:05:25 2017
    ALTER DATABASE MOUNT
    This instance was first to mount
    Wed May 24 02:05:26 2017
    NOTE: Loaded library: System
    Wed May 24 02:05:26 2017
    SUCCESS: diskgroup DATA1 was mounted
    Wed May 24 02:05:26 2017
    NOTE: dependency between database orcl and diskgroup resource ora.DATA1.dg is established
    SUCCESS: diskgroup ARCDG was mounted
    NOTE: dependency between database orcl and diskgroup resource ora.ARCDG.dg is established
    Wed May 24 02:05:30 2017
    Successful mount of redo thread 2, with mount id 1472569957
    Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)
    Lost write protection disabled
    Completed: ALTER DATABASE MOUNT
    ALTER DATABASE OPEN
    This instance was first to open
    Wed May 24 02:05:30 2017
    SUCCESS: diskgroup DATA2 was mounted
    NOTE: dependency between database orcl and diskgroup resource ora.DATA2.dg is established
    SUCCESS: diskgroup DATA3 was mounted
    NOTE: dependency between database orcl and diskgroup resource ora.DATA3.dg is established
    Block change tracking file is current.
    Beginning crash recovery of 2 threads
    parallel recovery started with 32 processes
    Started redo scan
    Wed May 24 02:05:44 2017
    Completed redo scan
    read 2158203 KB redo, 230150 data blocks need recovery
    Started redo application at
    Thread 1: logseq 255942, block 163388
    Thread 2: logseq 236286, block 2067920
    Recovery of Online Redo Log: Thread 1 Group 3 Seq 255942 Reading mem 0
    Mem# 0: +ARCDG/orcl/onlinelog/group_3.2085.840375427
    Mem# 1: +ARCDG/orcl/onlinelog/group_3.2084.840375445
    Mem# 2: +ARCDG/orcl/onlinelog/group_3.263.840375447
    Mem# 3: +ARCDG/orcl/onlinelog/group_3.2089.840375449
    Recovery of Online Redo Log: Thread 2 Group 7 Seq 236286 Reading mem 0
    Mem# 0: +ARCDG/orcl/onlinelog/group_7.2049.840374587
    Mem# 1: +ARCDG/orcl/onlinelog/group_7.2052.840374591
    Mem# 2: +ARCDG/orcl/onlinelog/group_7.2057.840374593
    Mem# 3: +ARCDG/orcl/onlinelog/group_7.2058.840374597
    Recovery of Online Redo Log: Thread 2 Group 5 Seq 236287 Reading mem 0
    Mem# 0: +ARCDG/orcl/onlinelog/group_5.2025.840374445
    Mem# 1: +ARCDG/orcl/onlinelog/group_5.2026.840374483
    Mem# 2: +ARCDG/orcl/onlinelog/group_5.2029.840374489
    Mem# 3: +ARCDG/orcl/onlinelog/group_5.2034.840374493
    Wed May 24 02:06:05 2017
    Completed redo application of 1702.07MB
    Completed crash recovery at
    Thread 1: logseq 255942, block 291036, scn 12495715727030
    Thread 2: logseq 236287, block 2742038, scn 12495715600520
    230150 data blocks read, 229989 data blocks written, 2158203 redo k-bytes read
    Thread 1 advanced to log sequence 255943 (thread recovery)
    Picked broadcast on commit scheme to generate SCNs
    Wed May 24 02:06:06 2017
    Thread 2 advanced to log sequence 236288 (thread open)
    Thread 2 opened at log sequence 236288
    Current log# 6 seq# 236288 mem# 0: +ARCDG/orcl/onlinelog/group_6.2040.840374575
    Current log# 6 seq# 236288 mem# 1: +ARCDG/orcl/onlinelog/group_6.2041.840374579
    Current log# 6 seq# 236288 mem# 2: +ARCDG/orcl/onlinelog/group_6.2042.840374581
    Current log# 6 seq# 236288 mem# 3: +ARCDG/orcl/onlinelog/group_6.2048.840374585
    Successful open of redo thread 2
    MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set
    Starting background process CTWR
    Wed May 24 02:06:07 2017
    CTWR started with pid=89, OS id=49349114
    Block change tracking service is active.
    Wed May 24 02:06:07 2017
    SMON: enabling cache recovery
    Successfully onlined Undo Tablespace 4.
    Verifying file header compatibility for 11g tablespace encryption..
    Verifying 11g file header compatibility for tablespace encryption completed
    SMON: enabling tx recovery
    Database Characterset is AL32UTF8
    No Resource Manager plan active
    Starting background process GTX0
    Wed May 24 02:06:11 2017
    GTX0 started with pid=98, OS id=42992042
    Starting background process RCBG
    Wed May 24 02:06:11 2017
    RCBG started with pid=52, OS id=40370648
    replication_dependency_tracking turned off (no async multimaster replication found)
    Starting background process QMNC
    Wed May 24 02:06:13 2017
    QMNC started with pid=119, OS id=47514086
    Completed: ALTER DATABASE OPEN
    SMON: Parallel transaction recovery tried
    Starting background process SMCO
    Wed May 24 02:06:17 2017
    SMCO started with pid=153, OS id=50724962
    Wed May 24 02:06:17 2017
    db_recovery_file_dest_size of 2097152 MB is 4.35% used. This is a
    user-specified limit on the amount of space that will be used by this
    database for recovery-related files, and does not reflect the amount of
    space available in the underlying filesystem or ASM diskgroup.
    Wed May 24 02:06:19 2017
    Starting background process CJQ0
    Wed May 24 02:06:19 2017
    CJQ0 started with pid=46, OS id=43581760
    Wed May 24 02:07:35 2017
    Thread 2 advanced to log sequence 236289 (LGWR switch)
    Current log# 7 seq# 236289 mem# 0: +ARCDG/orcl/onlinelog/group_7.2049.840374587
    Current log# 7 seq# 236289 mem# 1: +ARCDG/orcl/onlinelog/group_7.2052.840374591
    Current log# 7 seq# 236289 mem# 2: +ARCDG/orcl/onlinelog/group_7.2057.840374593
    Current log# 7 seq# 236289 mem# 3: +ARCDG/orcl/onlinelog/group_7.2058.840374597
    Wed May 24 02:08:09 2017
    Thread 2 advanced to log sequence 236290 (LGWR switch)
    Current log# 5 seq# 236290 mem# 0: +ARCDG/orcl/onlinelog/group_5.2025.840374445
    Current log# 5 seq# 236290 mem# 1: +ARCDG/orcl/onlinelog/group_5.2026.840374483
    Current log# 5 seq# 236290 mem# 2: +ARCDG/orcl/onlinelog/group_5.2029.840374489
    Current log# 5 seq# 236290 mem# 3: +ARCDG/orcl/onlinelog/group_5.2034.840374493
    Wed May 24 02:08:51 2017
    Thread 2 advanced to log sequence 236291 (LGWR switch)
    Current log# 6 seq# 236291 mem# 0: +ARCDG/orcl/onlinelog/group_6.2040.840374575
    Current log# 6 seq# 236291 mem# 1: +ARCDG/orcl/onlinelog/group_6.2041.840374579
    Current log# 6 seq# 236291 mem# 2: +ARCDG/orcl/onlinelog/group_6.2042.840374581
    Current log# 6 seq# 236291 mem# 3: +ARCDG/orcl/onlinelog/group_6.2048.840374585
    Wed May 24 02:09:30 2017
    Thread 2 advanced to log sequence 236292 (LGWR switch)
    Current log# 7 seq# 236292 mem# 0: +ARCDG/orcl/onlinelog/group_7.2049.840374587
    Current log# 7 seq# 236292 mem# 1: +ARCDG/orcl/onlinelog/group_7.2052.840374591
    Current log# 7 seq# 236292 mem# 2: +ARCDG/orcl/onlinelog/group_7.2057.840374593
    Current log# 7 seq# 236292 mem# 3: +ARCDG/orcl/onlinelog/group_7.2058.840374597

    针对这个RAC集群,做了加多的配置修正、优化,主要是如下所述。

    ◆数据库配置优化

    0、前期大量的日志切换无法完成,导致数据库挂起,为此新增了loggroup,并删掉无用多余的logmember。

    1、增大了ASM实例的内存,由350M--->2G

    2、RAC并行进程强制在本地实例分配

    3、为减少对性能的影响,关闭audit审计

    4、AWR 资料保存时间由7天增加到30天

    5、SGA增大10G至60G

    ◆修改网络相关的系统核心参数

    /usr/sbin/no -p -o tcp_sendspace=4194304

    /usr/sbin/no -p -o tcp_recvspace=4194304

    /usr/sbin/no -p -o rfc1323=1

    /usr/sbin/no -p -o sb_max=8388608

    /usr/sbin/no -p -o udp_ephemeral_low=9000

    /usr/sbin/no -p -o tcp_ephemeral_low=9000

    ◆修改内联网卡的参数

    chdev -l en1 -a tcp_sendspace=1048576 -a rfc1323=1 -a tcp_recvspace=1048576

    ◆SWAP

    物理内存大于 16GB,但 SWAP 空间低于 16 GB,不符合 Oracle 安装的标准建议。增大PS。chps -s 192 hd6

    ◆时区及时间确认,安全起见,暂不修改。

    ◆ntp确认,使用GI集群自身的时间同步,确保各个节点的时间同步,目前是几十秒的偏差,不是问题。

    ◆HA集群配置修改,当初安装集群时部分内容没有按照官方文档实施,本次做了补救,但不全面,可能仍存在未知的风险。

    5月25日重启后,经多一段时间的观察,有所改善,但于6月7日9点又发生一次节点2当。

    但此次已经不是节点的驱逐,说明上次的修正与配置优化起到了作用,初步断定与网络参数配置修改相关。日志如下,无evict日志。

    Wed Jun 07 09:02:57 2017
    Errors in file /u01/app/11.2.0/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_lms0_7930030.trc  (incident=1023882):
    ORA-00600: internal error code, arguments: [kjbrref:pkey], [3881577], [6], [7493495], [0], [], [], [], [], [], [], []
    Incident details in: /u01/app/11.2.0/oracle/diag/rdbms/orcl/orcl2/incident/incdir_1023882/orcl2_lms0_7930030_i1023882.trc
    Wed Jun 07 09:02:59 2017
    Trace dumping is performing id=[cdmp_20170607090259]
    Errors in file /u01/app/11.2.0/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_lms0_7930030.trc:
    ORA-00600: internal error code, arguments: [kjbrref:pkey], [3881577], [6], [7493495], [0], [], [], [], [], [], [], []
    LMS0 (ospid: 7930030): terminating the instance due to error 484
    Instance terminated by LMS0, pid = 7930030
    Wed Jun 07 09:14:01 2017
    Starting ORACLE instance (normal)

    针对此次的600内部错误,初步结论:

    5月25日优化重启后,6月7日又发生过1次节点down的情况,经排查,基本确认是oracle BUG,在某些情况下会发生节点down。目前数据库PSU版本11.2.0.1.0,需要打补丁到最新版本11.2.0.1.6。后续需择机打补丁。

  • 相关阅读:
    NoSQL学习1
    inno setup 软件打包
    cmapx 保存绘制好的图层
    qt之菜单栏的创建
    qt 软件打包
    可恶的 0xc0000005异常
    成长
    msChart组件安装与编程
    qt 工具下的dump工具导出文档出现异常解决方案
    qt 环境下mapx组件的鼠标跟踪
  • 原文地址:https://www.cnblogs.com/likingzi/p/7010949.html
Copyright © 2020-2023  润新知