• Handling ASM failgroup problems on a test appliance


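    Environment: 3 virtual machines, RHEL 7.3 + Oracle RAC 11.2.0.4
    Symptoms: RAC is running normally and the ASM disk groups use NORMAL redundancy, but one whole failgroup has failed and some failgroups are configured incorrectly.
    Note: this is not any commercial appliance on the market. It is a lab environment I built myself to learn this kind of distributed storage architecture; to tell it apart, I'll call it xData for now.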

    1. Confirming the symptoms

    SQL> select group_number, name, total_mb, free_mb, USABLE_FILE_MB, offline_disks, state, type from v$asm_diskgroup;
    
    GROUP_NUMBER NAME                             TOTAL_MB    FREE_MB USABLE_FILE_MB OFFLINE_DISKS STATE                  TYPE
    ------------ ------------------------------ ---------- ---------- -------------- ------------- ---------------------- ----------
               1 CRS                                  2000       1170            585             0 MOUNTED                NORMAL
               2 DATA                                40960      35652           7586             0 MOUNTED                NORMAL
    
    SQL>  select group_number, disk_number, name, path, failgroup, mode_status, voting_file  from v$asm_disk order by 1, 2;
    
    GROUP_NUMBER DISK_NUMBER NAME                           PATH                    FAILGROUP            MODE_STATUS    VO
    ------------ ----------- ------------------------------ ----------------------- -------------------- -------------- --
               0           0                                /dev/CELL01-data2                            ONLINE         N
               0           1                                /dev/CELL01-data1                            ONLINE         N
               0           2                                /dev/CELL01-crs1                             ONLINE         Y
               1           1 CRS_0001                       /dev/CELL02-crs2        CRS_0001             ONLINE         Y
               1           2 CRS_0002                       /dev/CELL03-crs3        CRS_0002             ONLINE         Y
               2           0 DATA_0000                      /dev/CELL03-data1       DATA_0000            ONLINE         N
               2           1 DATA_0001                      /dev/CELL03-data2       DATA_0001            ONLINE         N
               2           2 DATA_0002                      /dev/CELL02-data1       CELL02               ONLINE         N
               2           3 DATA_0003                      /dev/CELL02-data2       CELL02               ONLINE         N
    
    9 rows selected.
    

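    As you can see, not only have all of CELL01's disks been dropped (they now show up in group 0), but CELL03's data disks are also misconfigured: each sits in its own default failgroup (DATA_0000, DATA_0001) instead of a shared CELL03 failgroup!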
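    The check above was done by eye. As a minimal sketch, assuming the device naming convention of this lab (/dev/<CELL>-<crs|data><n>) and that every data disk of a cell should share one failgroup named after the cell, the same conclusion can be drawn programmatically from rows copied out of v$asm_disk. ROWS and check_failgroups are hypothetical names used only for illustration.

```python
import re

# Rows copied from the v$asm_disk output above:
# (group_number, name, path, failgroup). Group 0 = not in any mounted disk group.
ROWS = [
    (0, "", "/dev/CELL01-data2", ""),
    (0, "", "/dev/CELL01-data1", ""),
    (0, "", "/dev/CELL01-crs1", ""),
    (1, "CRS_0001", "/dev/CELL02-crs2", "CRS_0001"),
    (1, "CRS_0002", "/dev/CELL03-crs3", "CRS_0002"),
    (2, "DATA_0000", "/dev/CELL03-data1", "DATA_0000"),
    (2, "DATA_0001", "/dev/CELL03-data2", "DATA_0001"),
    (2, "DATA_0002", "/dev/CELL02-data1", "CELL02"),
    (2, "DATA_0003", "/dev/CELL02-data2", "CELL02"),
]

# Assumed naming convention of this lab: /dev/<CELL>-<purpose><n>
CELL_RE = re.compile(r"/dev/(CELL\d+)-(crs|data)\d+$")


def check_failgroups(rows):
    """Return (paths not in any disk group, [(path, failgroup, expected_failgroup)])."""
    unassigned, misplaced = [], []
    for group, _name, path, failgroup in rows:
        m = CELL_RE.match(path)
        if m is None:
            continue  # path does not follow the assumed convention; skip it
        cell, purpose = m.groups()
        if group == 0:
            unassigned.append(path)
        elif purpose == "data" and failgroup != cell:
            # data disks of one cell are expected to share a failgroup named after the cell
            misplaced.append((path, failgroup, cell))
    return unassigned, misplaced


if __name__ == "__main__":
    unassigned, misplaced = check_failgroups(ROWS)
    print("not in a disk group:", unassigned)
    print("wrong failgroup:", misplaced)
```

    The CRS disks are deliberately left out of the failgroup check: with one CRS disk per cell, the default one-disk-per-failgroup layout is already what we want there.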

    2. Re-adding the CELL01 disks

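    Because the disks stayed offline longer than the default disk_repair_time of 3.6h, ASM has already dropped them, so the only way back is to add CELL01's disks again: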
    alter diskgroup CRS add disk '/dev/CELL01-crs1';
    alter diskgroup DATA ADD FAILGROUP CELL01 disk '/dev/CELL01-data1', '/dev/CELL01-data2' rebalance power 5;
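    The rule that got us here is simple: once a disk has been offline longer than disk_repair_time, ASM drops it. A small sketch of that rule, assuming (as the Oracle documentation describes) that the attribute accepts minutes ("m") or hours ("h") and that a bare number means hours. The helper names are hypothetical; on a live system the remaining time per disk can be read from the REPAIR_TIMER column of v$asm_disk instead.

```python
import re

# Accepts values such as "3.6h", "90m", "4.5H" or "2" (no unit).
_ATTR_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([mMhH]?)\s*$")


def repair_time_seconds(value):
    """Convert a disk_repair_time string to whole seconds."""
    m = _ATTR_RE.match(value)
    if m is None:
        raise ValueError(f"unrecognised disk_repair_time: {value!r}")
    number, unit = float(m.group(1)), m.group(2).lower()
    # Assumption: no unit means hours, matching the documented default.
    factor = 60 if unit == "m" else 3600
    return int(round(number * factor))


def will_be_dropped(offline_seconds, disk_repair_time="3.6h"):
    """True once a disk has been offline longer than disk_repair_time."""
    return offline_seconds > repair_time_seconds(disk_repair_time)


if __name__ == "__main__":
    print(repair_time_seconds("3.6h"))   # default: 3.6 hours in seconds
    print(will_be_dropped(4 * 3600))     # offline for 4 hours under the default
```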
    

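    Adding the disks directly like this will very likely fail with the following error, because the disks were used before and their headers still record the old disk group: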

    SQL> alter diskgroup CRS add disk '/dev/CELL01-crs1';
    alter diskgroup CRS add disk '/dev/CELL01-crs1'
    *
    ERROR at line 1:
    ORA-15032: not all alterations performed
    ORA-15033: disk '/dev/CELL01-crs1' belongs to diskgroup "CRS"
    

    This can be fixed either by zeroing the disk header with dd or by retrying the add with the FORCE option. I chose to zero the header with dd:

    [root@db01 ~]# dd if=/dev/zero of=/dev/CELL01-crs1 bs=8k count=1000
    1000+0 records in
    1000+0 records out
    8192000 bytes (8.2 MB) copied, 0.0691801 s, 118 MB/s
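    The dd command writes count × bs = 1000 × 8192 = 8,192,000 bytes of zeros at the start of the device, which covers the ASM disk header. A minimal Python equivalent is sketched below, for illustration on a scratch file only; zero_header is a hypothetical helper. It opens the target in r+b mode so the rest of the file is kept, which mirrors what happens on a block device (dd on a regular file would truncate it unless conv=notrunc is given).

```python
import os
import tempfile


def zero_header(path, bs=8192, count=1000):
    """Overwrite the first bs*count bytes of `path` with zeros, similar to
    `dd if=/dev/zero of=path bs=8k count=1000`, without truncating the file."""
    block = bytes(bs)  # bs zero bytes
    written = 0
    # r+b: open an existing file for update, keeping data past the header
    with open(path, "r+b") as f:
        for _ in range(count):
            written += f.write(block)
        f.flush()
        os.fsync(f.fileno())  # make sure the zeros reach the disk
    return written


if __name__ == "__main__":
    # Demo on a 10 MiB scratch file filled with 0xFF, never on a real ASM device.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\xff" * 10 * 1024 * 1024)
    print(zero_header(tmp.name))  # 8192000
    os.remove(tmp.name)
```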
    

    After the header has been zeroed, the add completes without errors:

    SQL> alter diskgroup CRS add disk '/dev/CELL01-crs1';
    
    Diskgroup altered.
    

    In the same way, add CELL01's data disks back to the DATA disk group, with the failgroup name CELL01:

    SQL> alter diskgroup DATA ADD FAILGROUP CELL01 disk '/dev/CELL01-data1', '/dev/CELL01-data2' rebalance power 5;
    
    Diskgroup altered.
    

    The rebalance progress can be followed in the v$asm_operation view; once the query below no longer returns any rows, the rebalance is complete:

    SQL> select * from v$asm_operation;
    
    GROUP_NUMBER OPERATION  STATE         POWER     ACTUAL      SOFAR   EST_WORK   EST_RATE EST_MINUTES ERROR_CODE
    ------------ ---------- -------- ---------- ---------- ---------- ---------- ---------- ----------- --------------------
               2 REBAL      RUN               5          5        366        529        348           0
    SQL> select * from v$asm_operation;
    
    no rows selected
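    Re-running the query by hand works fine; the sketch below just automates it. It assumes EST_MINUTES is roughly the outstanding work (EST_WORK − SOFAR) divided by EST_RATE; with the row above, (529 − 366) / 348 ≈ 0.47, which is consistent with the EST_MINUTES of 0 shown. fetch_operations is a hypothetical callable (for example a wrapper around a database driver running select * from v$asm_operation) that returns a list of rows.

```python
import time


def remaining_minutes(sofar, est_work, est_rate):
    """Rough remaining time: outstanding work units / units moved per minute.
    Returns None when no rate estimate is available yet."""
    if est_rate <= 0:
        return None
    return max(est_work - sofar, 0) / est_rate


def wait_for_rebalance(fetch_operations, poll_seconds=30, sleep=time.sleep):
    """Poll until fetch_operations() returns no rows, like repeating
    `select * from v$asm_operation` until "no rows selected".
    Returns the number of polls performed."""
    polls = 0
    while True:
        polls += 1
        rows = fetch_operations()
        if not rows:
            return polls  # rebalance finished
        sleep(poll_seconds)


if __name__ == "__main__":
    print(remaining_minutes(366, 529, 348))  # values from the output above
```

    Passing sleep in as a parameter is only there so the loop can be exercised without actually waiting.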
    

    3. Fixing the failgroup configuration

    CELL03's data disks still have the wrong failgroups, so drop them first:
    SQL> alter diskgroup DATA drop disk DATA_0000, DATA_0001;
    
    Diskgroup altered.
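    Before dropping disks it is worth a quick sanity check. The sketch below is a rough heuristic of my own, not Oracle's actual admission logic: it assumes a NORMAL redundancy group needs at least two failgroups left, and that the remaining raw capacity must exceed the raw space currently used. The numbers are stand-ins: 10240 MB per disk is inferred from TOTAL_MB / number of disks, and 56012 MB free is taken from the final v$asm_diskgroup output, since the free space at the moment of the drop was not captured.

```python
def drop_precheck(disks, to_drop, free_mb, min_failgroups=2):
    """Rough pre-check before ALTER DISKGROUP ... DROP DISK.

    disks:   list of (disk_name, failgroup, total_mb)
    to_drop: set of disk names to be dropped
    Returns a list of human-readable problems (empty if none found)."""
    problems = []
    remaining = [d for d in disks if d[0] not in to_drop]

    # NORMAL redundancy keeps two copies in different failgroups.
    failgroups = {fg for _, fg, _ in remaining}
    if len(failgroups) < min_failgroups:
        problems.append(f"only {len(failgroups)} failgroup(s) would remain")

    # Raw space in use (mirror copies included) must fit on the remaining disks.
    total_mb = sum(mb for _, _, mb in disks)
    used_mb = total_mb - free_mb
    remaining_mb = sum(mb for _, _, mb in remaining)
    if used_mb >= remaining_mb:
        problems.append(f"{used_mb} MB used but only {remaining_mb} MB would remain")
    return problems


# DATA disk group just before the drop (failgroups as in the outputs above).
DATA_DISKS = [
    ("DATA_0000", "DATA_0000", 10240),
    ("DATA_0001", "DATA_0001", 10240),
    ("DATA_0002", "CELL02", 10240),
    ("DATA_0003", "CELL02", 10240),
    ("DATA_0004", "CELL01", 10240),
    ("DATA_0005", "CELL01", 10240),
]

if __name__ == "__main__":
    print(drop_precheck(DATA_DISKS, {"DATA_0000", "DATA_0001"}, free_mb=56012))
```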
    

    Check v$asm_operation again to follow the rebalance. Once it has finished, add the disks back to the disk group, this time specifying the correct failgroup (CELL03):

    SQL> alter diskgroup DATA ADD FAILGROUP CELL03 disk '/dev/CELL03-data1', '/dev/CELL03-data2' rebalance power 5;
    
    Diskgroup altered.
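    Since the same ADD FAILGROUP statement was typed three times with different disks, here is a small hypothetical helper that builds it. It reproduces the statement above exactly and can optionally append FORCE after each disk path, which is the alternative to zeroing the header mentioned in step 2 (per the qualified disk clause, FORCE follows the individual disk). Single quotes in a path are doubled so the generated SQL literal stays valid.

```python
def add_failgroup_sql(diskgroup, failgroup, paths, power=5, force=False):
    """Build an 'alter diskgroup ... ADD FAILGROUP ... disk ...' statement.
    If force is True, FORCE is appended after each disk path."""
    if not paths:
        raise ValueError("at least one disk path is required")
    suffix = " FORCE" if force else ""
    # Double any single quote so the path stays a valid SQL string literal.
    disks = ", ".join("'" + p.replace("'", "''") + "'" + suffix for p in paths)
    return (f"alter diskgroup {diskgroup} ADD FAILGROUP {failgroup} "
            f"disk {disks} rebalance power {power};")


if __name__ == "__main__":
    print(add_failgroup_sql("DATA", "CELL03", ["/dev/CELL03-data1", "/dev/CELL03-data2"]))
    print(add_failgroup_sql("DATA", "CELL01", ["/dev/CELL01-data1"], force=True))
```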
    

    Watch the rebalance progress once more. The final queries show that everything is back to normal:

    SQL> col path for a50
    SQL> select group_number, disk_number, name, path, failgroup, mode_status, voting_file  from v$asm_disk order by 1, 2;
    
    GROUP_NUMBER DISK_NUMBER NAME                           PATH                    FAILGROUP            MODE_STATUS    VO
    ------------ ----------- ------------------------------ ----------------------- -------------------- -------------- --
               1           0 CRS_0000                       /dev/CELL01-crs1        CRS_0000             ONLINE         Y
               1           1 CRS_0001                       /dev/CELL02-crs2        CRS_0001             ONLINE         Y
               1           2 CRS_0002                       /dev/CELL03-crs3        CRS_0002             ONLINE         Y
               2           0 DATA_0000                      /dev/CELL03-data1       CELL03               ONLINE         N
               2           1 DATA_0001                      /dev/CELL03-data2       CELL03               ONLINE         N
               2           2 DATA_0002                      /dev/CELL02-data1       CELL02               ONLINE         N
               2           3 DATA_0003                      /dev/CELL02-data2       CELL02               ONLINE         N
               2           4 DATA_0004                      /dev/CELL01-data1       CELL01               ONLINE         N
               2           5 DATA_0005                      /dev/CELL01-data2       CELL01               ONLINE         N
    
    9 rows selected.
    
    SQL> select group_number, name, total_mb, free_mb, USABLE_FILE_MB, offline_disks, state, type from v$asm_diskgroup;
    
    GROUP_NUMBER NAME                             TOTAL_MB    FREE_MB USABLE_FILE_MB OFFLINE_DISKS STATE                  TYPE
    ------------ ------------------------------ ---------- ---------- -------------- ------------- ---------------------- ----------
               1 CRS                                  3000       2033            516             0 MOUNTED                NORMAL
               2 DATA                                61440      56012          17766             0 MOUNTED                NORMAL
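    USABLE_FILE_MB is not simply FREE_MB / 2. For NORMAL redundancy it is (FREE_MB − REQUIRED_MIRROR_FREE_MB) / 2, where REQUIRED_MIRROR_FREE_MB (also a column of v$asm_diskgroup) is the space reserved to restore redundancy after a failure. The figures in this article are consistent with that reserve being the size of the largest failgroup whenever there are three failgroups: DATA now has (56012 − 20480) / 2 = 17766, CRS (2033 − 1000) / 2 = 516 after rounding down, and the initial DATA figure (35652 − 20480) / 2 = 7586 matches too. The initial CRS group with only two failgroups shows 1170 / 2 = 585, i.e. no reserve, so the "largest failgroup" rule is an observation from this lab, not a general formula. Per-disk sizes (10240 MB and 1000 MB) are inferred from TOTAL_MB / number of disks.

```python
from collections import defaultdict


def largest_failgroup_mb(disks):
    """Total size of the largest failgroup; disks is a list of (failgroup, total_mb)."""
    sizes = defaultdict(int)
    for failgroup, mb in disks:
        sizes[failgroup] += mb
    return max(sizes.values(), default=0)


def usable_file_mb(free_mb, required_mirror_free_mb, copies=2):
    """(FREE_MB - REQUIRED_MIRROR_FREE_MB) / copies, rounded down.
    copies=2 for NORMAL redundancy. Intended for non-negative results only."""
    return (free_mb - required_mirror_free_mb) // copies


# Final layout from the output above: three failgroups of two 10240 MB disks each.
DATA_FINAL = [("CELL03", 10240)] * 2 + [("CELL02", 10240)] * 2 + [("CELL01", 10240)] * 2
# CRS: one 1000 MB disk per failgroup.
CRS_FINAL = [("CRS_0000", 1000), ("CRS_0001", 1000), ("CRS_0002", 1000)]

if __name__ == "__main__":
    print(usable_file_mb(56012, largest_failgroup_mb(DATA_FINAL)))  # 17766
    print(usable_file_mb(2033, largest_failgroup_mb(CRS_FINAL)))    # 516
```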
    

    Note: I usually set the disk group compatibility attributes to 11.2. If needed, you can also change disk_repair_time (default 3.6h):

    SQL> col COMPATIBILITY for a30
    SQL> col DATABASE_COMPATIBILITY for a30
    SQL> select NAME, COMPATIBILITY, DATABASE_COMPATIBILITY from v$asm_diskgroup;
    
    NAME                           COMPATIBILITY                  DATABASE_COMPATIBILITY
    ------------------------------ ------------------------------ ------------------------------
    CRS                            11.2.0.0.0                     11.2.0.0.0
    DATA                           11.2.0.0.0                     11.2.0.0.0
    
    --Set the disk_repair_time attribute of the DATA disk group (roughly: how long an offline disk is kept before ASM drops it) to 4.5h
    SQL> ALTER DISKGROUP data SET ATTRIBUTE 'disk_repair_time' = '4.5h';
    Diskgroup altered.
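    The compatibility settings matter here because disk_repair_time belongs to ASM fast mirror resync, which, as far as I know from the Oracle documentation, requires both COMPATIBLE.ASM and COMPATIBLE.RDBMS (the COMPATIBILITY and DATABASE_COMPATIBILITY columns above) to be 11.1 or higher. A small sketch of that comparison; parse_version and supports_fast_resync are hypothetical helpers, and the 11.1 threshold is the assumption to verify against your release's documentation.

```python
def parse_version(v):
    """'11.2.0.0.0' -> (11, 2, 0, 0, 0); shorter strings are zero-padded to 5 parts."""
    parts = [int(p) for p in v.strip().split(".")]
    return tuple(parts + [0] * (5 - len(parts)))


def supports_fast_resync(compatible_asm, compatible_rdbms, minimum="11.1"):
    """True if both compatibility attributes are at least `minimum`."""
    need = parse_version(minimum)
    return (parse_version(compatible_asm) >= need
            and parse_version(compatible_rdbms) >= need)


if __name__ == "__main__":
    # Values from the v$asm_diskgroup output above (CRS and DATA are identical).
    print(supports_fast_resync("11.2.0.0.0", "11.2.0.0.0"))
```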
    
  • Original article: https://www.cnblogs.com/jyzhao/p/9978998.html