• 有关Oracle Data Guard Failover 的说明


     

           在之前的两篇文章里都对oracle Data GuardFailover 进行了说明,但是没有个系统的说明,所以在这篇把DGFailover 做个系统的说明。

     

           物理Data Guard Failover Redo 的处理问题

           http://blog.csdn.net/tianlesoftware/archive/2010/11/05/5989638.aspx

     

           Oracle Data Guard Linux 平台 Physical Standby 搭建实例

           http://blog.csdn.net/tianlesoftware/archive/2010/04/30/5547565.aspx

     

           Failover 是失败切换。 这种情况下切换对redo 的处理,就显的很重要。如果处理好,就不会有数据丢失。 否则就会有数据丢失。

     

           Oracle 11g里,Data Guard 切换多了一个新的功能:flush redo

           Flush 能把没有发送的redo 从主库传送到standby库。 只要主库能启动到mount 状态,那么Flush 就可以把没有发送的归档和current online redo 发送到备库。

    Flush语法:

           SQL> ALTER SYSTEM FLUSH REDO TO target_db_name;

     

           这里的target_db_name 是我们在主库的db_unique_name 名称。 也就是在tnsnames.ora 文件配置的。 Flush 会将未发送的redo 从主库传到备库,并且等待redo standby 库上apply 之后返回成功。 所以只要Flush成功,那么Failover 就没有主句丢失。

     

           如果说我们的Primary 已经不能启动到mount 状态,那么就只能按照之前的方法来。 Oracle 10g 下就是这么操作的。

     

     

    . 正常的Failover

    1.1 检查Gap

           sql> select thread#, low_sequence#, high_sequence# from v$archive_gap;

     

    如果有,将对应的归档文件copy到备库,在注册它

           sql>alter database register physical logfile 'filespec1';

     

           注意: 如果有Gap存在,并且没有解决。 那么是不能正常的进行一个Failover 只能进行一个强制的Failover 这种情况下会有数据丢失。

           sql> alter database activate physical standby database;

     

     

    1.2 解决gap问题后,进行切换

     

    1.2.1 取消Apply

    SQL> recover managed standby database cancel;

     

    1.2.2 结束Apply

    1)在oracle 10gR2 或之后的版本:如果在备用库上有备用库日志文件

    SQL> alter database recover managed standby database finish; -- [force|wait|nowait]

          

           在执行这个命令的时候,如果主库和备库之间的网络中断了。 那么备库的RFS进程就会等待网络的连接,直到TCP超时。 因此在这种情况下,我们就需要加上Foce 关键字。

     

    2)在oracle 10gR2之前的版本:没有备库日志文件

    SQL> alter database recover managed standby database finish skip standby logfile;

           注意:如果执行了这条命令,就不能在进行recover standby database;

     

    1.2.3 将备库切换成主库

    SQL> alter database commit to switchover to primary;

    SQL> shutdown immediate;

    SQL> startup

     

    .  强行切换(激活)

    2.1 使用条件

           当我们正常切换的时候,提示我们需要介质恢复的时候,就需要使用强行激活standby 库。 如:

     

    SQL> ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY;

    ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY

    *

    ERROR at line 1:

    ORA-16139: media recovery required

     

    2.2 强行激活下的 redo 问题

           在这里需要说明一点,就是我们在主库commit 之后,然后shutdown abort 这时候,主库的online redo 会自动的写入备库的最后一个归档文件里(大小会发生变化)。 我们在恢复的时候需要对备库的最后一个归档文件进行重新的注册。

           sql>alter database register physical logfile 'filespec1';

     

           如果说,主库OS是整个宕机了。 这个时候,online redo 是不会发送到备库。所以我们需要手工的将主库的所有online redo copy到备库。 然后进行recover

     

    步骤如下:

    SQL> alter database recover managed standby database cancel;

    Database altered.

     

    SQL> recover standby database until cancel;

    ORA-00279: change 509016 generated at 11/05/2010 11:40:27 needed for thread 1

    ORA-00289: suggestion : /u01/archive/1_17_734225750.dbf

    ORA-00280: change 509016 for thread 1 is in sequence #17

    -- 默认情况下会提示需要归档17 实际上这个序列为17的归档还没有生成,我们忽略它,使用我们刚才copy过来的redo 日志来恢复。

    Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

    /u01/app/oracle/oradata/orcl/redo01.log   -- 注意, 这个位置是我手动写的

    Log applied.

    Media recovery complete.

     

           这里一次就搞定了。 实际上有三个redo,如果不确定使用哪个redo的,只能一个一个试。

     

           当我们使用了recover standby database until cancel之后,只能使用强制激活备库,如果使用正常模式,会提示我们需要:

           ORA-16139: media recovery required

     

    2.3 强制激活备库:

    sql> alter database recover managed standby database cancel;
    sql> recover standby database until cancel;

    sql>alter database activate standby database;
    sql>shutdown immediate;

    sql>startup

     

     

     

    .  Switchover

    3.1 主库操作:

    1)查看状态:

    sql>select switchover_status from v$database;

     

    2)切换

    sql> alter database commit to switchover to physical standby with session shutdown;

    SQL> shutdown immediate;

    SQL> startup;

    SQL> alter database mount standby database;

    SQL> recover managed standby database disconnect;

     

     

    3.2 备库操作:

    SQL> alter database commit to switchover to primary with session shutdown;

    SQL> shutdown immediate

    SQL> startup

     

     

     

    . Failover 过程的研究

     

    4.1 Failover 日志

     

    Thu Mar 17 15:01:47 2011

    alter database activate standby database

    Thu Mar 17 15:01:47 2011

    ALTER DATABASE ACTIVATE [PHYSICAL] STANDBY DATABASE (dave)

    --我们切换的时候,命令写全命令,db 自动补全了

    RESETLOGS after complete recovery through change 1255060

    Resetting resetlogs activation ID 808909668 (0x3036fb64)

    -- resetlogs. 这就以为着产生一个新的incarnation online redo 会被清空

    Online log /u01/app/oracle/oradata/dave/redo01.log: Thread 1 Group 1 was previously cleared

    Online log /u01/app/oracle/oradata/dave/redo02.log: Thread 1 Group 2 was previously cleared

    Online log /u01/app/oracle/oradata/dave/redo03.log: Thread 1 Group 3 was previously cleared

    Standby became primary SCN: 1255058

    Thu Mar 17 15:01:48 2011

    Setting recovery target incarnation to 3

    --修改incarnation版本

    Thu Mar 17 15:01:48 2011

    Converting standby mount to primary mount.

    --standby 转成 primary

    Thu Mar 17 15:01:48 2011

    ACTIVATE STANDBY: Complete - Database mounted as primary (dave)

    Completed: alter database activate standby database

    --完成active

    Thu Mar 17 15:01:59 2011

    Shutting down instance: further logons disabled

    --关闭实例

    Thu Mar 17 15:01:59 2011

    Stopping background process CJQ0

    Thu Mar 17 15:01:59 2011

    Stopping background process MMNL

    Thu Mar 17 15:01:59 2011

    Stopping background process MMON

    Thu Mar 17 15:01:59 2011

    Shutting down instance (immediate)

    License high water mark = 7

    Thu Mar 17 15:01:59 2011

    Stopping Job queue slave processes, flags = 7

    Thu Mar 17 15:01:59 2011

    Job queue slave processes stopped

    All dispatchers and shared servers shutdown

    Thu Mar 17 15:02:35 2011

    ARC1: Archival disabled due to shutdown: 1089

    Shutting down archive processes

    Archiving is disabled

    Thu Mar 17 15:02:45 2011

    ARCH shutting down

    ARC0: Archival stopped

    Thu Mar 17 15:02:50 2011

    ARCH shutting down

    ARC1: Archival stopped

    Thu Mar 17 15:07:04 2011

    SHUTDOWN: Active processes prevent shutdown operation

    Thu Mar 17 15:07:50 2011

    ALTER DATABASE CLOSE NORMAL

    Thu Mar 17 15:07:50 2011

    ORA-1109 signalled during: ALTER DATABASE CLOSE NORMAL...

    Thu Mar 17 15:07:50 2011

    ALTER DATABASE DISMOUNT

    Completed: ALTER DATABASE DISMOUNT

    ARCH: Archival disabled due to shutdown: 1089

    Shutting down archive processes

    Archiving is disabled

    Archive process shutdown avoided: 0 active

    ARCH: Archival disabled due to shutdown: 1089

    Shutting down archive processes

    Archiving is disabled

    Archive process shutdown avoided: 0 active

    Thu Mar 17 15:08:13 2011

    Starting ORACLE instance (normal)

    --开始重新启动实例

    LICENSE_MAX_SESSION = 0

    LICENSE_SESSIONS_WARNING = 0

    Picked latch-free SCN scheme 2

    Autotune of undo retention is turned on.

    IMODE=BR

    ILAT =18

    LICENSE_MAX_USERS = 0

    SYS auditing is disabled

    ksdpec: called for event 13740 prior to event group initialization

    Starting up ORACLE RDBMS Version: 10.2.0.4.0.

    System parameters with non-default values:

      processes                = 150

      __shared_pool_size       = 113246208

      __large_pool_size        = 4194304

      __java_pool_size         = 25165824

      __streams_pool_size      = 0

      nls_territory            = AMERICA

      sga_target               = 247463936

      control_files            = /u01/app/oracle/oradata/dave/control01.ctl, /u01/app/oracle/oradata/dave/control02.ctl, /u01/app/oracle/oradata/dave/control03.ctl

      db_block_size            = 8192

      __db_cache_size          = 100663296

      compatible               = 10.2.0.1.0

      log_archive_config       = dg_config=(dave_pd,dave_st)

      log_archive_dest_1       = location=/u01/archivelog valid_for=(all_logfiles,all_roles) db_unique_name=dave_st

      log_archive_dest_2       = service=dave_pd reopen=120 lgwr async valid_for=(online_logfiles,primary_role) db_unique_name=dave_pd

      log_archive_dest_state_1 = ENABLE

      log_archive_dest_state_2 = ENABLE

      standby_archive_dest     = /u01/archivelog

      fal_client               = dave_st

      fal_server               = dave_pd

      db_file_multiblock_read_count= 16

      standby_file_management  = AUTO

      undo_management          = AUTO

      undo_tablespace          = UNDOTBS1

      remote_login_passwordfile= EXCLUSIVE

      db_domain                =

      dispatchers              = (PROTOCOL=TCP) (SERVICE=daveXDB)

      job_queue_processes      = 10

      background_dump_dest     = /u01/app/oracle/admin/dave/bdump

      user_dump_dest           = /u01/app/oracle/admin/dave/udump

      core_dump_dest           = /u01/app/oracle/admin/dave/cdump

      audit_file_dest          = /u01/app/oracle/admin/dave/adump

      db_name                  = dave

      db_unique_name           = dave_st

      open_cursors             = 300

      pga_aggregate_target     = 81788928

    PMON started with pid=2, OS id=5909

    PSP0 started with pid=3, OS id=5911

    MMAN started with pid=4, OS id=5913

    DBW0 started with pid=5, OS id=5915

    LGWR started with pid=6, OS id=5917

    CKPT started with pid=7, OS id=5919

    SMON started with pid=8, OS id=5921

    RECO started with pid=9, OS id=5923

    CJQ0 started with pid=10, OS id=5925

    MMON started with pid=11, OS id=5927

    Thu Mar 17 15:08:14 2011

    starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...

    MMNL started with pid=12, OS id=5929

    Thu Mar 17 15:08:14 2011

    starting up 1 shared server(s) ...

    Thu Mar 17 15:08:15 2011

    ALTER DATABASE   MOUNT

    Thu Mar 17 15:08:19 2011

    Setting recovery target incarnation to 3

    Thu Mar 17 15:08:19 2011

    Successful mount of redo thread 1, with mount id 808884895

    Thu Mar 17 15:08:19 2011

    Database mounted in Exclusive Mode

    Completed: ALTER DATABASE   MOUNT

    Thu Mar 17 15:08:19 2011

    ALTER DATABASE OPEN

    Thu Mar 17 15:08:19 2011

    Assigning activation ID 808884895 (0x30369a9f)

    LGWR: STARTING ARCH PROCESSES

    ARC0 started with pid=16, OS id=5937

    Thu Mar 17 15:08:19 2011

    ARC0: Archival started

    ARC1: Archival started

    LGWR: STARTING ARCH PROCESSES COMPLETE

    ARC1 started with pid=17, OS id=5939

    LNS1 started with pid=18, OS id=5941

    Thu Mar 17 15:08:22 2011

    Thread 1 advanced to log sequence 2 (thread open)

    Thu Mar 17 15:08:23 2011

    ARC0: STARTING ARCH PROCESSES

    Thu Mar 17 15:08:23 2011

    ARC1: Becoming the 'no FAL' ARCH

    ARC1: Becoming the 'no SRL' ARCH

    Thu Mar 17 15:08:23 2011

    Thread 1 opened at log sequence 2

      Current log# 2 seq# 2 mem# 0: /u01/app/oracle/oradata/dave/redo02.log

    Successful open of redo thread 1

    Thu Mar 17 15:08:23 2011

    ******************************************************************

    LGWR: Setting 'active' archival for destination LOG_ARCHIVE_DEST_2

    ******************************************************************

    Thu Mar 17 15:08:23 2011

    ARC1: LGWR is actively archiving destination LOG_ARCHIVE_DEST_2

    Thu Mar 17 15:08:23 2011

    ARC2: Archival started

    ARC0: STARTING ARCH PROCESSES COMPLETE

    ARC0: Becoming the heartbeat ARCH

    ARC2 started with pid=19, OS id=5943

    Thu Mar 17 15:08:23 2011

    MTTR advisory is disabled because FAST_START_MTTR_TARGET is not set

    Thu Mar 17 15:08:23 2011

    SMON: enabling cache recovery

    Thu Mar 17 15:08:24 2011

    Successfully onlined Undo Tablespace 1.

    Dictionary check beginning

    Dictionary check complete

    Thu Mar 17 15:08:24 2011

    SMON: enabling tx recovery

    Thu Mar 17 15:08:24 2011

    Database Characterset is ZHS16GBK

    Opening with internal Resource Manager plan

    where NUMA PG = 1, CPUs = 1

    replication_dependency_tracking turned off (no async multimaster replication found)

    Starting background process QMNC

    QMNC started with pid=20, OS id=5945

    Thu Mar 17 15:08:26 2011

    LOGSTDBY: Validating controlfile with logical metadata

    Thu Mar 17 15:08:26 2011

    LOGSTDBY: Validation complete

    Completed: ALTER DATABASE OPEN

     

    4.2 Failover 的补充说明

           4.1 中看了Failover 的整个过程,DB 会进行一次resetlogs 这个是个很有意思的过程。

    1    resetlogs 会产生一个新的incarnation 这个会影响我们的RMAN 恢复。 我们查看一下:

           RMAN> list incarnation;

    List of Database Incarnations

    DB Key  Inc Key DB Name  DB ID            STATUS  Reset SCN  Reset Time

    ------- ------- -------- ---------------- --- ---------- ----------

    1       1       DAVE     808637274        PARENT  1          30-JUN-05

    2       2       DAVE     808637274        PARENT  446075     14-MAR-11

    3       3       DAVE     808637274        CURRENT 1255061    17-MAR-11

     

           这个时候,我们只能恢复incarnation 3之内的信息,如果要恢复到其他版本的信息,要保证对应备份集存在的同时,在使用reset database incarnation to 3或者其他的版本。 之后在恢复。

     

    2)看下归档日志

    先看备库:

    SQL> select max(sequence#) from v$archived_log;

     

    MAX(SEQUENCE#)

    --------------

                 6

     

    SQL> select sequence#,applied from v$archived_log;

     

     SEQUENCE# APP

    ---------- ---

             3 YES

             2 YES

             5 YES

             4 YES

             6 YES

     

    这个是重新开始的,没有什么问题。

     

    我们看下主库:

    SQL> select max(sequence#) from v$archived_log;

     

    MAX(SEQUENCE#)

    --------------

                88

     

    SQL> select sequence#,applied from v$archived_log;

     

     SEQUENCE# APP

    ---------- ---

             4 YES

             3 YES

             5 YES

             6 YES

             7 YES

             8 YES

                  ......

            82 YES

            83 YES

            84 YES

            85 YES

            86 YES

            87 YES

            88 YES

             2 NO

             2 YES  --注意这部分,有重新开始了。

             3 NO

             3 YES

             4 NO

             5 NO

             5 YES

             4 YES

             6 NO

             6 YES

     

           因为resetlogs 会重置sequence# 将其设置为1. 所以这里又重新开始了。 但是SCN 不会重置。 我们查看一下:

     

    SQL> select sequence#,first_change#,next_change# from v$log_history;

     

     SEQUENCE# FIRST_CHANGE# NEXT_CHANGE#

    ---------- ------------- ------------

             1        446075       451208

             2        451208       483347

             3        483347       485272

             4        485272       485277

             5        485277       486119

    .....

            83       1227667      1229252

            84       1229252      1252272

            85       1252272      1252277

            86       1252277      1252293

            87       1252293      1252294

            88       1252294      1253301

             1       1255061      1255062

    --sequence# 重新开始了,但是SCN 还是继续增加的。

             2       1255062      1257639

             3       1257639      1257644

             4       1257644      1265602

             5       1265602      1265607

             6       1265607      1265913

     

    94 rows selected.

     

    SQL>

     

           所以,这种情况下,查看同步情况还是有点不直观。 但是V$LOG_HISTORY V$ARCHIVED_LOG显示的log 历史信息是从控制文件中取得的,所以说,如果要删除以前的记录,只有重建控制文件了。

     

    3)重建控制文件

    SQL> shutdown immediate

    SQL> startup nomount;

    SQL>create controlfile reuse database dave noresetlogs archivelog

    LOGFILE

    GROUP 1 '/u01/app/oracle/oradata/dave/redo01.log',

    GROUP 2 '/u01/app/oracle/oradata/dave/redo02.log',

    GROUP 3 '/u01/app/oracle/oradata/dave/redo03.log'

    DATAFILE

    '/u01/app/oracle/oradata/dave/sysaux01.dbf',

    '/u01/app/oracle/oradata/dave/system01.dbf',

    '/u01/app/oracle/oradata/dave/undotbs01.dbf',

    '/u01/app/oracle/oradata/dave/users01.dbf'

    CHARACTER SET ZHS16GBK;

    --注意,使用的是noresetlogs,如果使用resetlogsDG 就需要重新搭建了。

    SQL> ALTER TABLESPACE TEMP ADD TEMPFILE '/u01/app/oracle/oradata/dave/temp01.dbf' size 100M;

    Tablespace altered.

    --添加临时表空间,在重建控制文件的时候,不能添加TEMP表空间,只能在控制文件重建好之后,在添加temp 表空间。

     

    更多信息参考:

           Oracle 控制文件

           http://blog.csdn.net/tianlesoftware/archive/2009/12/09/4974440.aspx

     

    4)在次验证归档信息

    主库:

    SQL> select max(sequence#) from v$archived_log;

    MAX(SEQUENCE#)

    --------------

                 9

    SQL> alter system switch logfile;

    System altered.

    SQL> select max(sequence#) from v$archived_log;

    MAX(SEQUENCE#)

    --------------

    10

    SQL> select sequence#,applied from v$archived_log;

     SEQUENCE# APP

    ---------- ---

             8 NO

             7 NO

             9 NO

             9 YES

            10 YES

            10 NO

    6 rows selected.

     

    SQL> select sequence#,first_change#,next_change# from v$log_history;

     

     SEQUENCE# FIRST_CHANGE# NEXT_CHANGE#

    ---------- ------------- ------------

             9       1267828      1268212

            10       1268212      1274690

     

           从这个结果来看,重建控制文件之后,之前的所有的有关归档的信息都会被删除。

     

    备库:

    SQL> select max(sequence#) from v$archived_log;

     

    MAX(SEQUENCE#)

    --------------

                10

     

    SQL> select sequence#,applied from v$archived_log;

     

     SEQUENCE# APP

    ---------- ---

             3 YES

             2 YES

             5 YES

             4 YES

             6 YES

             7 YES

             8 YES

             9 YES

            10 YES

     

    9 rows selected.

     

     

    说明:

           我这里是测试环境,所以重建控制文件测试一下,如果是生产环境,小心操作。

     

     

     

    ---------------------------------------------------------------------------------------------------

    Blog http://blog.csdn.net/tianlesoftware

    网上资源: http://tianlesoftware.download.csdn.net

    相关视频:http://blog.csdn.net/tianlesoftware/archive/2009/11/27/4886500.aspx

    DBA1 群:62697716(); DBA2 群:62697977() DBA3 群:62697850()  

    DBA 超级群:63306533(); DBA4 群: 83829929 DBA5群: 142216823

    聊天 群:40132017

    --加群需要在备注说明Oracle表空间和数据文件的关系,否则拒绝申请

    道森Oracle,国内最早、最大的网络语音培训机构,我们提供专业、优质的Oracle技术培训和服务! 我们的官方网站:http://www.daosenoracle.com 官方淘宝店:http://daosenpx.taobao.com/
  • 相关阅读:
    java实现第八届蓝桥杯生命游戏
    java实现第八届蓝桥杯生命游戏
    进程&线程(&java.lang.Thread)详解
    IDEA入门(1)--lombok和Junit generator2插件的运用
    Ecplise中Junit4单元测试的基本用法
    Java 并发工具箱之concurrent包
    JDK 8 中包列表及介绍
    java中URLEncode和URLDecode
    Mybatis中输出映射resultType与resultMap的区别
    ANSI编码方式转化为UTF-8方式
  • 原文地址:https://www.cnblogs.com/tianlesoftware/p/3609759.html
Copyright © 2020-2023  润新知