• Step by Step: Building 11gR2 RAC + DG - Solving Problems Encountered While Installing RAC (Part 6) [Repost]


Step by step on RHEL 6.5 + VMware Workstation 10: building Oracle 11gR2 RAC + DG - problems encountered while installing RAC (Part 6)

This article is reposted from:

Step by Step: Building 11gR2 RAC + DG - Solving RAC Installation Problems (Part 6) - lhrbest - ITPUB blog
    http://blog.itpub.net/26736162/viewspace-1297128/

This chapter collects problems that can come up while installing RAC, together with their solutions. If your installation went smoothly, you can skip this chapter.

Contents:

1. Problems encountered during the CRS installation

After a fresh Oracle 11g RAC installation, crsd on the second node refused to start for no obvious reason. The error message was CRS-4535: Cannot communicate with Cluster Ready Services; the underlying cause only becomes clear from the crsd.log file.

1. Environment

    [root@linux2 ~]# cat /etc/issue

    Enterprise Linux Enterprise Linux Server release 5.5 (Carthage)

Kernel \r on an \m

    [root@linux2 bin]# ./crsctl query crs activeversion

    Oracle Clusterware active version on the cluster is [11.2.0.1.0]

# Note: in what follows, the grid and root users operate on different objects.

2. Symptoms

    [root@linux2 bin]# ./crsctl check crs

    CRS-4638: Oracle High Availability Services is online

    CRS-4535: Cannot communicate with Cluster Ready Services #CRS-4535

    CRS-4529: Cluster Synchronization Services is online

    CRS-4533: Event Manager is online

[root@linux2 bin]# ps -ef | grep d.bin # note: crsd.bin is missing from the output below

    root 3886 1 1 09:50 ? 00:00:11 /u01/app/11.2.0/grid/bin/ohasd.bin reboot

    grid 3938 1 0 09:51 ? 00:00:04 /u01/app/11.2.0/grid/bin/oraagent.bin

    grid 4009 1 0 09:51 ? 00:00:00 /u01/app/11.2.0/grid/bin/gipcd.bin

    grid 4014 1 0 09:51 ? 00:00:00 /u01/app/11.2.0/grid/bin/mdnsd.bin

    grid 4028 1 0 09:51 ? 00:00:02 /u01/app/11.2.0/grid/bin/gpnpd.bin

    root 4040 1 0 09:51 ? 00:00:03 /u01/app/11.2.0/grid/bin/cssdmonitor

    root 4058 1 0 09:51 ? 00:00:04 /u01/app/11.2.0/grid/bin/cssdagent

    root 4060 1 0 09:51 ? 00:00:00 /u01/app/11.2.0/grid/bin/orarootagent.bin

    grid 4090 1 2 09:51 ? 00:00:15 /u01/app/11.2.0/grid/bin/ocssd.bin

    grid 4094 1 0 09:51 ? 00:00:02 /u01/app/11.2.0/grid/bin/diskmon.bin -d -f

    root 4928 1 0 09:51 ? 00:00:00 /u01/app/11.2.0/grid/bin/octssd.bin reboot

    grid 4945 1 0 09:51 ? 00:00:02 /u01/app/11.2.0/grid/bin/evmd.bin

    root 6514 5886 0 10:00 pts/1 00:00:00 grep d.bin

    [root@linux2 bin]# ./crsctl stat res -t -init

    --------------------------------------------------------------------------------

    NAME TARGET STATE SERVER STATE_DETAILS

    --------------------------------------------------------------------------------

    Cluster Resources

    --------------------------------------------------------------------------------

    ora.asm

1 ONLINE ONLINE linux2 Cluster Reconfiguration

    ora.crsd

1 ONLINE OFFLINE # crsd is OFFLINE

    ora.cssd

    1 ONLINE ONLINE linux2

    ora.cssdmonitor

    1 ONLINE ONLINE linux2

    ora.ctssd

    1 ONLINE ONLINE linux2 OBSERVER

    ora.diskmon

    1 ONLINE ONLINE linux2

    ora.drivers.acfs

1 ONLINE OFFLINE # acfs is OFFLINE

    ora.evmd

    1 ONLINE ONLINE linux2

    ora.gipcd

    1 ONLINE ONLINE linux2

    ora.gpnpd

    1 ONLINE ONLINE linux2

    ora.mdnsd

    1 ONLINE ONLINE linux2

# Examine the crsd log file

    [grid@linux2 ~]$ view $ORACLE_HOME/log/linux2/crsd/crsd.log

    2013-01-05 10:28:27.107: [GIPCXCPT][1768145488] gipcShutdownF: skipping shutdown, count 1, from [ clsgpnp0.c : 1021],

    ret gipcretSuccess (0)

2013-01-05 10:28:27.107: [ OCRASM][1768145488]proprasmo: Error in open/create file in dg [OCR_VOTE] # error opening the disk group

    [ OCRASM][1768145488]SLOS : SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge

ORA-15077: could not locate ASM instance serving a required diskgroup # an ORA- error appears

    2013-01-05 10:28:27.107: [ OCRASM][1768145488]proprasmo: kgfoCheckMount returned [7]

2013-01-05 10:28:27.107: [ OCRASM][1768145488]proprasmo: The ASM instance is down # the instance is down

    2013-01-05 10:28:27.107: [ OCRRAW][1768145488]proprioo: Failed to open [+OCR_VOTE]. Returned proprasmo() with [26].

    Marking location as UNAVAILABLE.

2013-01-05 10:28:27.107: [ OCRRAW][1768145488]proprioo: No OCR/OLR devices are usable # no usable OCR/OLR devices

    2013-01-05 10:28:27.107: [ OCRASM][1768145488]proprasmcl: asmhandle is NULL

    2013-01-05 10:28:27.107: [ OCRRAW][1768145488]proprinit: Could not open raw device

    2013-01-05 10:28:27.107: [ OCRASM][1768145488]proprasmcl: asmhandle is NULL

    2013-01-05 10:28:27.107: [ OCRAPI][1768145488]a_init:16!: Backend init unsuccessful : [26]

    2013-01-05 10:28:27.107: [ CRSOCR][1768145488] OCR context init failure. Error: PROC-26: Error while accessing the

    physical storage ASM error [SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge

    ORA-15077: could not locate ASM instance serving a required diskgroup

    ] [7]

    2013-01-05 10:28:27.107: [ CRSD][1768145488][PANIC] CRSD exiting: Could not init OCR, code: 26

    2013-01-05 10:28:27.107: [ CRSD][1768145488] Done.

[root@linux2 bin]# ps -ef | grep pmon # check for pmon processes; this also shows the ASM instance is not running

    root 7447 7184 0 10:48 pts/2 00:00:00 grep pmon

# From the analysis above: crsd fails to start because the ASM instance was never started.

3. Solution

    [grid@linux2 ~]$ asmcmd

    Connected to an idle instance.

ASMCMD> startup # start the ASM instance

    ASM instance started

    Total System Global Area 283930624 bytes

    Fixed Size 2212656 bytes

    Variable Size 256552144 bytes

    ASM Cache 25165824 bytes

    ASM diskgroups mounted

    ASMCMD> exit

    #Author : Robinson

    #Blog : http://blog.csdn.net/robinson_0612

# Check the cluster resource status again

    [root@linux2 bin]# ./crsctl stat res -t -init

    --------------------------------------------------------------------------------

    NAME TARGET STATE SERVER STATE_DETAILS

    --------------------------------------------------------------------------------

    Cluster Resources

    --------------------------------------------------------------------------------

    ora.asm

    1 ONLINE ONLINE linux2 Started

    ora.crsd

    1 ONLINE INTERMEDIATE linux2

    ora.cssd

    1 ONLINE ONLINE linux2

    ora.cssdmonitor

    1 ONLINE ONLINE linux2

    ora.ctssd

    1 ONLINE ONLINE linux2 OBSERVER

    ora.diskmon

    1 ONLINE ONLINE linux2

    ora.drivers.acfs

    1 ONLINE OFFLINE

    ora.evmd

    1 ONLINE ONLINE linux2

    ora.gipcd

    1 ONLINE ONLINE linux2

    ora.gpnpd

    1 ONLINE ONLINE linux2

    ora.mdnsd

    1 ONLINE ONLINE linux2

# Start acfs

    [root@linux2 bin]# ./crsctl start res ora.drivers.acfs -init

    CRS-2672: Attempting to start 'ora.drivers.acfs' on 'linux2'

    CRS-2676: Start of 'ora.drivers.acfs' on 'linux2' succeeded

# After this, every resource is ONLINE

    [root@linux2 bin]# ./crsctl stat res -t -init

    --------------------------------------------------------------------------------

    NAME TARGET STATE SERVER STATE_DETAILS

    --------------------------------------------------------------------------------

    Cluster Resources

    --------------------------------------------------------------------------------

    ora.asm

    1 ONLINE ONLINE linux2 Started

    ora.crsd

    1 ONLINE ONLINE linux2

    ora.cssd

    1 ONLINE ONLINE linux2

    ora.cssdmonitor

    1 ONLINE ONLINE linux2

    ora.ctssd

    1 ONLINE ONLINE linux2 OBSERVER

    ora.diskmon

    1 ONLINE ONLINE linux2

    ora.drivers.acfs

    1 ONLINE ONLINE linux2

    ora.evmd

    1 ONLINE ONLINE linux2

    ora.gipcd

    1 ONLINE ONLINE linux2

    ora.gpnpd

    1 ONLINE ONLINE linux2

    ora.mdnsd

    1 ONLINE ONLINE linux2

    CRS-4124: Oracle High Availability Services startup failed.

    CRS-4000: Command Start failed, or completed with errors.

    ohasd failed to start: Inappropriate ioctl for device

ohasd failed to start at /u01/app/11.2.0/grid/crs/install/rootcrs.pl line 443.

The first time I installed 11gR2 RAC I ran into this classic 11.2.0.1 problem. A web search showed it is a known bug, and the workaround is simple:

run the following command (from a second session) while root.sh is executing

    /bin/dd if=/var/tmp/.oracle/npohasd of=/dev/null bs=1024 count=1

If you get

/bin/dd: opening `/var/tmp/.oracle/npohasd': No such file or directory

the file has not been created yet; keep re-running the command until it succeeds. In practice, the dd command can be run successfully around the time root.sh prints "Adding daemon to inittab".

An alternative fix is to change the ownership of the file:

    chown root:oinstall /var/tmp/.oracle/npohasd

Before re-running root.sh, don't forget to deconfigure first: /u01/app/11.2.0/grid/crs/install/roothas.pl -deconfig -force -verbose
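The retry-until-it-works procedure above can be sketched as a small helper to run as root in a second terminal. The function name and retry bound are illustrative; the pipe path and the dd invocation are the ones from the bug note:

```shell
# wait_for_npohasd: keep reading from the ohasd named pipe until the read
# succeeds, i.e. until root.sh has created /var/tmp/.oracle/npohasd.
# Helper name and retry bound are illustrative, not part of the original note.
wait_for_npohasd() {
    pipe="${1:-/var/tmp/.oracle/npohasd}"
    tries="${2:-120}"
    while [ "$tries" -gt 0 ]; do
        if /bin/dd if="$pipe" of=/dev/null bs=1024 count=1 2>/dev/null; then
            return 0          # read succeeded; ohasd can now start
        fi
        tries=$((tries - 1))
        sleep 1               # pipe not created yet; try again shortly
    done
    return 1                  # gave up; pipe never appeared
}

# Usage (as root, while root.sh is running on the node):
# wait_for_npohasd
```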

One necessary condition is that the network interfaces be consistent across nodes, in the following respects:

1. The interface names must be identical, e.g. eth0 and eth1 on both nodes; you cannot have eth0/eth1 on one node and eth2/eth3 on the other.

I ran into exactly this error during my installation: the first node installed normally, but root.sh on the second node kept failing with "Failed to Start CSSD".

2. Not only the names: each interface's public/private role must match as well. That is, you cannot have one node with

eth0: 192.168.1.2

eth1: 10.10.1.2

and the other with

eth0: 10.10.1.3

eth1: 192.168.1.3

3. Beyond matching addresses, the subnet masks must also be identical: the same public or private network cannot use a netmask of

255.255.0.0

on one node and 255.255.255.0 on the other.

When nodes are cloned from a virtual machine, mismatched interface names are the most common problem.

How to rename a network interface on Red Hat Enterprise Linux 6 (say eth5 needs to become eth0):

1. Edit /etc/udev/rules.d/70-persistent-net.rules and change the interface name to the new one.

2. Rename the config file /etc/sysconfig/network-scripts/ifcfg-eth5 to ifcfg-eth0 (and update it to match).

3. Restart networking: /etc/rc.d/init.d/network restart
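As a sketch, the first two steps can be wrapped in one helper. The function name and the directory parameters are illustrative (they default to the real RHEL 6 paths); the network restart is left as a separate manual step:

```shell
# rename_nic: rename a NIC in the persistent-net udev rule and the ifcfg
# file, mirroring steps 1 and 2 above. Illustrative helper, not a tested tool.
rename_nic() {
    old="$1"; new="$2"
    rules="${3:-/etc/udev/rules.d}/70-persistent-net.rules"
    net_dir="${4:-/etc/sysconfig/network-scripts}"
    # step 1: point the persistent-net rule at the new name
    sed -i "s/NAME=\"$old\"/NAME=\"$new\"/" "$rules"
    # step 2: rename the ifcfg file and fix its DEVICE= line
    mv "$net_dir/ifcfg-$old" "$net_dir/ifcfg-$new"
    sed -i "s/^DEVICE=$old/DEVICE=$new/" "$net_dir/ifcfg-$new"
    # step 3 (run separately, as root): /etc/rc.d/init.d/network restart
}

# Usage: rename_nic eth5 eth0
```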

Installing Oracle RAC really is fiddly: one wrong setting up front leads to all sorts of errors later.

Addendum: use identical models of network card if you can. I have not tested whether mixed models work, but according to the documentation at least the MTU (maximum transmission unit) of the NICs must match, or errors will also result.
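A quick way to compare MTUs across nodes is to print each interface with its MTU and diff the output node by node. This helper reads the values from /sys; the function name and its parameters are illustrative:

```shell
# list_nic_mtus: print "NAME mtu=VALUE" for each matching interface.
# First argument is a glob over interface names (default eth*); the second,
# for illustration/testing only, overrides the /sys/class/net base directory.
list_nic_mtus() {
    pat="${1:-eth*}"
    base="${2:-/sys/class/net}"
    for dev in "$base"/$pat; do
        [ -e "$dev" ] || continue
        echo "$(basename "$dev") mtu=$(cat "$dev/mtu")"
    done
}

# Usage: run "list_nic_mtus" on every node and compare the lines.
```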

Symptom:

During an Oracle RAC installation on Red Hat Linux 6, the installer hangs at 65% with no response.

Cause:

The firewall is enabled.

Fix:

    chkconfig iptables off

    service iptables stop

[root@his2 bin]# ./crsctl check crs # check service status
CRS-4047: No Oracle Clusterware components configured.
CRS-4000: Command Check failed, or completed with errors.
[root@his2 bin]# ./crsctl stat res -t
CRS-4047: No Oracle Clusterware components configured.
CRS-4000: Command Status failed, or completed with errors.
[root@his2 bin]# ./crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.

/app/grid/product/11.2.0/grid/crs/install/rootcrs.pl -deconfig -force # deconfigure the CRS stack on this node
Using configuration parameter file: /app/grid/product/11.2.0/grid/crs/install/crsconfig_params
Network exists: 1/192.168.20.0/255.255.255.0/eth0, type static
VIP exists: /his1-vip/192.168.20.6/192.168.20.0/255.255.255.0/eth0, hosting node his1
VIP exists: /his2-vip/192.168.20.7/192.168.20.0/255.255.255.0/eth0, hosting node his2
GSD exists
ONS exists: Local port 6100, remote port 6200, EM port 2016
ACFS-9200: Supported
CRS-2673: Attempting to stop 'ora.registry.acfs' on 'his2'
CRS-2677: Stop of 'ora.registry.acfs' on 'his2' succeeded
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'his2'
CRS-2673: Attempting to stop 'ora.crsd' on 'his2'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'his2'
CRS-2673: Attempting to stop 'ora.ORACRS.dg' on 'his2'
CRS-2673: Attempting to stop 'ora.crds3db.db' on 'his2'
CRS-2677: Stop of 'ora.ORACRS.dg' on 'his2' succeeded
CRS-2677: Stop of 'ora.crds3db.db' on 'his2' succeeded
CRS-2673: Attempting to stop 'ora.ORAARCH.dg' on 'his2'
CRS-2673: Attempting to stop 'ora.ORADATA.dg' on 'his2'
CRS-2677: Stop of 'ora.ORAARCH.dg' on 'his2' succeeded
CRS-2677: Stop of 'ora.ORADATA.dg' on 'his2' succeeded
CRS-2673: Attempting to stop 'ora.asm' on 'his2'
CRS-2677: Stop of 'ora.asm' on 'his2' succeeded
CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'his2' has completed
CRS-2677: Stop of 'ora.crsd' on 'his2' succeeded
CRS-2673: Attempting to stop 'ora.ctssd' on 'his2'
CRS-2673: Attempting to stop 'ora.evmd' on 'his2'
CRS-2673: Attempting to stop 'ora.asm' on 'his2'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'his2'
CRS-2673: Attempting to stop 'ora.mdnsd' on 'his2'
CRS-2677: Stop of 'ora.asm' on 'his2' succeeded
CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'his2'
CRS-2677: Stop of 'ora.evmd' on 'his2' succeeded
CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'his2' succeeded
CRS-2677: Stop of 'ora.mdnsd' on 'his2' succeeded
CRS-2677: Stop of 'ora.ctssd' on 'his2' succeeded
CRS-2673: Attempting to stop 'ora.cssd' on 'his2'
CRS-2677: Stop of 'ora.cssd' on 'his2' succeeded
CRS-2673: Attempting to stop 'ora.diskmon' on 'his2'
CRS-2673: Attempting to stop 'ora.crf' on 'his2'
CRS-2677: Stop of 'ora.diskmon' on 'his2' succeeded
CRS-2677: Stop of 'ora.crf' on 'his2' succeeded
CRS-2673: Attempting to stop 'ora.gipcd' on 'his2'
CRS-2677: Stop of 'ora.drivers.acfs' on 'his2' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'his2' succeeded
CRS-2673: Attempting to stop 'ora.gpnpd' on 'his2'
CRS-2677: Stop of 'ora.gpnpd' on 'his2' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'his2' has completed
CRS-4133: Oracle High Availability Services has been stopped.
Successfully deconfigured Oracle clusterware stack on this node

[root@his1 ~]# su - grid
[grid@his1 ~]$ crs_stat -t # on node 1 only node 1's resources are visible, which shows node 2 has a problem
Name Type Target State Host
------------------------------------------------------------
ora....ER.lsnr ora....er.type ONLINE ONLINE his1
ora....N1.lsnr ora....er.type ONLINE ONLINE his1
ora.ORAARCH.dg ora....up.type ONLINE ONLINE his1
ora.ORACRS.dg ora....up.type ONLINE ONLINE his1
ora.ORADATA.dg ora....up.type ONLINE ONLINE his1
ora.asm ora.asm.type ONLINE ONLINE his1
ora.crds3db.db ora....se.type ONLINE ONLINE his1
ora.cvu ora.cvu.type ONLINE ONLINE his1
ora.gsd ora.gsd.type OFFLINE OFFLINE
ora....SM1.asm application ONLINE ONLINE his1
ora....S1.lsnr application ONLINE ONLINE his1
ora.his1.gsd application OFFLINE OFFLINE
ora.his1.ons application ONLINE ONLINE his1
ora.his1.vip ora....t1.type ONLINE ONLINE his1
ora....network ora....rk.type ONLINE ONLINE his1
ora.oc4j ora.oc4j.type ONLINE ONLINE his1
ora.ons ora.ons.type ONLINE ONLINE his1
ora....ry.acfs ora....fs.type ONLINE ONLINE his1
ora.scan1.vip ora....ip.type ONLINE ONLINE his1

[grid@his1 ~]$ crsctl stat res -t # from node 1 only node 1's resources are visible, so node 2 has a problem
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
ONLINE ONLINE his1
ora.ORAARCH.dg
ONLINE ONLINE his1
ora.ORACRS.dg
ONLINE ONLINE his1
ora.ORADATA.dg
ONLINE ONLINE his1
ora.asm
ONLINE ONLINE his1 Started
ora.gsd
OFFLINE OFFLINE his1
ora.net1.network
ONLINE ONLINE his1
ora.ons
ONLINE ONLINE his1
ora.registry.acfs
ONLINE ONLINE his1
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE his1
ora.crds3db.db
1 ONLINE ONLINE his1 Open
2 ONLINE OFFLINE Instance Shutdown
ora.cvu
1 ONLINE ONLINE his1
ora.his1.vip
1 ONLINE ONLINE his1
ora.oc4j
1 ONLINE ONLINE his1
ora.scan1.vip
1 ONLINE ONLINE his1

[root@his2 bin]# /app/grid/product/11.2.0/grid/root.sh # run root.sh to reconfigure all cluster services on this node
Running Oracle 11g root script...

The following environment variables are set as:
ORACLE_OWNER= grid
ORACLE_HOME= /app/grid/product/11.2.0/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.

Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /app/grid/product/11.2.0/grid/crs/install/crsconfig_params
LOCAL ADD MODE
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
OLR initialization - successful
Adding daemon to inittab
ACFS-9200: Supported
ACFS-9300: ADVM/ACFS distribution files found.
ACFS-9307: Installing requested ADVM/ACFS software.
ACFS-9308: Loading installed ADVM/ACFS drivers.
ACFS-9321: Creating udev for ADVM/ACFS.
ACFS-9323: Creating module dependencies - this may take some time.
ACFS-9327: Verifying ADVM/ACFS devices.
ACFS-9309: ADVM/ACFS installation correctness verified.
CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node his1, number 1, and is terminating
An active cluster was found during exclusive startup, restarting to join the cluster
Preparing packages for installation...
cvuqdisk-1.0.9-1
Configure Oracle Grid Infrastructure for a Cluster ... succeeded

[root@his2 bin]# ./crsctl stat res -t # the output shows node 2 is back to normal
--------------------------------------------------------------------------------
NAME TARGET STATE SERVER STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
ONLINE ONLINE his1
ONLINE ONLINE his2
ora.ORAARCH.dg
ONLINE ONLINE his1
ONLINE ONLINE his2
ora.ORACRS.dg
ONLINE ONLINE his1
ONLINE ONLINE his2
ora.ORADATA.dg
ONLINE ONLINE his1
ONLINE ONLINE his2
ora.asm
ONLINE ONLINE his1 Started
ONLINE ONLINE his2
ora.gsd
OFFLINE OFFLINE his1
OFFLINE OFFLINE his2
ora.net1.network
ONLINE ONLINE his1
ONLINE ONLINE his2
ora.ons
ONLINE ONLINE his1
ONLINE ONLINE his2
ora.registry.acfs
ONLINE ONLINE his1
ONLINE ONLINE his2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE his1
ora.crds3db.db
1 ONLINE ONLINE his1 Open
2 ONLINE ONLINE his2 Open
ora.cvu
1 ONLINE ONLINE his1
ora.his1.vip
1 ONLINE ONLINE his1
ora.his2.vip
1 ONLINE ONLINE his2
ora.oc4j
1 ONLINE ONLINE his1
ora.scan1.vip
1 ONLINE ONLINE his1
--------------------------------------------

1. The root script

1. When I ran the script on the second node, it produced the error output below:

    [root@rac2 ~]# /oracle/app/grid/product/11.2.0/root.sh

    Running Oracle 11g root script...

    The following environment variables are set as:

    ORACLE_OWNER= grid

    ORACLE_HOME= /oracle/app/grid/product/11.2.0

    Enter the full pathname of the local bin directory: [/usr/local/bin]:

    Copying dbhome to /usr/local/bin ...

    Copying oraenv to /usr/local/bin ...

    Copying coraenv to /usr/local/bin ...

    Creating /etc/oratab file...

    Entries will be added to the /etc/oratab file as needed by

    Database Configuration Assistant when a database is created

    Finished running generic part of root script.

    Now product-specific root actions will be performed.

    Using configuration parameter file: /oracle/app/grid/product/11.2.0/crs/install/crsconfig_params

    Creating trace directory

    LOCAL ADD MODE

    Creating OCR keys for user 'root', privgrp 'root'..

    Operation successful.

    OLR initialization - successful

    Adding daemon to inittab

    ACFS-9200: Supported

    ACFS-9300: ADVM/ACFS distribution files found.

    ACFS-9307: Installing requested ADVM/ACFS software.

    ACFS-9308: Loading installed ADVM/ACFS drivers.

    ACFS-9321: Creating udev for ADVM/ACFS.

    ACFS-9323: Creating module dependencies - this may take some time.

    ACFS-9327: Verifying ADVM/ACFS devices.

    ACFS-9309: ADVM/ACFS installation correctness verified.

    CRS-2672: Attempting to start 'ora.mdnsd' on 'rac2'

    CRS-2676: Start of 'ora.mdnsd' on 'rac2' succeeded

    CRS-2672: Attempting to start 'ora.gpnpd' on 'rac2'

    CRS-2676: Start of 'ora.gpnpd' on 'rac2' succeeded

    CRS-2672: Attempting to start 'ora.cssdmonitor' on 'rac2'

    CRS-2672: Attempting to start 'ora.gipcd' on 'rac2'

    CRS-2676: Start of 'ora.gipcd' on 'rac2' succeeded

    CRS-2676: Start of 'ora.cssdmonitor' on 'rac2' succeeded

    CRS-2672: Attempting to start 'ora.cssd' on 'rac2'

    CRS-2672: Attempting to start 'ora.diskmon' on 'rac2'

    CRS-2676: Start of 'ora.diskmon' on 'rac2' succeeded

    CRS-2676: Start of 'ora.cssd' on 'rac2' succeeded

    Disk Group CRS creation failed with the following message:

    ORA-15018: diskgroup cannot be created

    ORA-15031: disk specification '/dev/oracleasm/disks/CRS3' matches no disks

    ORA-15025: could not open disk "/dev/oracleasm/disks/CRS3"

    ORA-15056: additional error message

    Configuration of ASM ... failed

    see asmca logs at /oracle/app/oracle/cfgtoollogs/asmca for details

Did not succssfully configure and start ASM at /oracle/app/grid/product/11.2.0/crs/install/crsconfig_lib.pm line 6464.

    /oracle/app/grid/product/11.2.0/perl/bin/perl -I/oracle/app/grid/product/11.2.0/perl/lib -I/oracle/app/grid/product/11.2.0/crs/install /oracle/app/grid/product/11.2.0/crs/install/rootcrs.pl execution failed

[grid@rac2 ~]$ vi /oracle/app/oracle/cfgtoollogs/asmca/asmca-110428PM061902.log

...

    [main] [ 2011-04-28 18:19:38.135 CST ] [UsmcaLogger.logInfo:142] Diskstring in createDG to be updated: '/dev/oracleasm/disks/*'

    [main] [ 2011-04-28 18:19:38.136 CST ] [UsmcaLogger.logInfo:142] update param sql ALTER SYSTEM SET asm_diskstring='/dev/oracleasm/disks/*' SID='*'

    [main] [ 2011-04-28 18:19:38.262 CST ] [InitParamAttributes.loadDBParams:4450] Checking if SPFILE is used

    [main] [ 2011-04-28 18:19:38.276 CST ] [InitParamAttributes.loadDBParams:4461] spParams = [Ljava.lang.String;@1a001ff

    [main] [ 2011-04-28 18:19:38.277 CST ] [ASMParameters.loadASMParameters:459] useSPFile=false

    [main] [ 2011-04-28 18:19:38.277 CST ] [SQLEngine.doSQLSubstitution:2392] The substituted sql statement:=select count(*) from v$ASM_DISKGROUP where name=upper('CRS')

    [main] [ 2011-04-28 18:19:38.423 CST ] [UsmcaLogger.logInfo:142] CREATE DISKGROUP SQL: CREATE DISKGROUP CRS EXTERNAL REDUNDANCY DISK '/dev/oracleasm/disks/CRS1',

    '/dev/oracleasm/disks/CRS2',

    '/dev/oracleasm/disks/CRS3' ATTRIBUTE 'compatible.asm'='11.2.0.0.0'

    [main] [ 2011-04-28 18:19:38.724 CST ] [SQLEngine.done:2167] Done called

    [main] [ 2011-04-28 18:19:38.731 CST ] [UsmcaLogger.logException:172] SEVERE:method oracle.sysman.assistants.usmca.backend.USMDiskGroupManager:createDiskGroups

    [main] [ 2011-04-28 18:19:38.731 CST ] [UsmcaLogger.logException:173] ORA-15018: diskgroup cannot be created

    ORA-15031: disk specification '/dev/oracleasm/disks/CRS3' matches no disks

    ORA-15025: could not open disk "/dev/oracleasm/disks/CRS3"

    ORA-15056: additional error message

    [main] [ 2011-04-28 18:19:38.731 CST ] [UsmcaLogger.logException:174] oracle.sysman.assistants.util.sqlEngine.SQLFatalErrorException: ORA-15018: diskgroup cannot be created

    ORA-15031: disk specification '/dev/oracleasm/disks/CRS3' matches no disks

    ORA-15025: could not open disk "/dev/oracleasm/disks/CRS3"

    ORA-15056: additional error message

    oracle.sysman.assistants.util.sqlEngine.SQLEngine.executeImpl(SQLEngine.java:1655)

    oracle.sysman.assistants.util.sqlEngine.SQLEngine.executeSql(SQLEngine.java:1903)

    oracle.sysman.assistants.usmca.backend.USMDiskGroupManager.createDiskGroups(USMDiskGroupManager.java:236)

    oracle.sysman.assistants.usmca.backend.USMDiskGroupManager.createDiskGroups(USMDiskGroupManager.java:121)

    oracle.sysman.assistants.usmca.backend.USMDiskGroupManager.createDiskGroupsLocal(USMDiskGroupManager.java:2209)

    oracle.sysman.assistants.usmca.backend.USMInstance.configureLocalASM(USMInstance.java:3093)

    oracle.sysman.assistants.usmca.service.UsmcaService.configureLocalASM(UsmcaService.java:1047)

    oracle.sysman.assistants.usmca.model.UsmcaModel.performConfigureLocalASM(UsmcaModel.java:903)

    oracle.sysman.assistants.usmca.model.UsmcaModel.performOperation(UsmcaModel.java:779)

    oracle.sysman.assistants.usmca.Usmca.execute(Usmca.java:171)

    oracle.sysman.assistants.usmca.Usmca.main(Usmca.java:366)

    [main] [ 2011-04-28 18:19:38.732 CST ] [UsmcaLogger.logExit:123] Exiting oracle.sysman.assistants.usmca.backend.USMDiskGroupManager Method : createDiskGroups

    [main] [ 2011-04-28 18:19:38.732 CST ] [UsmcaLogger.logInfo:142] Diskgroups created

    [main] [ 2011-04-28 18:19:38.733 CST ] [UsmcaLogger.logInfo:142] Diskgroup creation is not successful.

    [main] [ 2011-04-28 18:19:38.733 CST ] [UsmcaLogger.logExit:123] Exiting oracle.sysman.assistants.usmca.model.UsmcaModel Method : performConfigureLocalASM

    [main] [ 2011-04-28 18:19:38.733 CST ] [UsmcaLogger.logExit:123] Exiting oracle.sysman.assistants.usmca.model.UsmcaModel Method : performOperation

Solution:

Grant the grid user permissions on /dev/oracleasm:

    [root@rac2 ~]# chown -R grid.oinstall /dev/oracleasm

    [root@rac2 ~]# chmod -R 775 /dev/oracleasm

2. ora.asm cannot run on rac2

    [grid@rac2 ~]$ crsctl status resource -t

    --------------------------------------------------------------------------------

    NAME TARGET STATE SERVER STATE_DETAILS

    --------------------------------------------------------------------------------

    Local Resources

    --------------------------------------------------------------------------------

    ora.CRS.dg

    ONLINE ONLINE rac1

    ONLINE ONLINE rac2

    ora.asm

    ONLINE ONLINE rac1 Started

    ONLINE ONLINE rac2

    ora.gsd

    OFFLINE OFFLINE rac1

    OFFLINE OFFLINE rac2

    ora.net1.network

    ONLINE ONLINE rac1

    ONLINE ONLINE rac2

    ora.ons

    ONLINE ONLINE rac1

    ONLINE ONLINE rac2

    ora.registry.acfs

    ONLINE ONLINE rac1

    ONLINE ONLINE rac2

    --------------------------------------------------------------------------------

    Cluster Resources

    --------------------------------------------------------------------------------

    ora.LISTENER_SCAN1.lsnr

    1 ONLINE ONLINE rac1

    ora.cvu

    1 ONLINE ONLINE rac1

    ora.oc4j

    1 ONLINE ONLINE rac1

    ora.rac1.vip

    1 ONLINE ONLINE rac1

    ora.rac2.vip

    1 ONLINE ONLINE rac2

    ora.scan1.vip

    1 ONLINE ONLINE rac1

    [root@rac2 ~]# /oracle/app/grid/product/11.2.0/crs/install/rootcrs.pl -verbose -deconfig -force -lastnode

    Using configuration parameter file: /oracle/app/grid/product/11.2.0/crs/install/crsconfig_params

    Network exists: 1/10.157.45.0/255.255.255.0/eth0, type static

    VIP exists: /rac1vip/10.157.45.174/10.157.45.0/255.255.255.0/eth0, hosting node rac1

    VIP exists: /rac2vip/10.157.45.157/10.157.45.0/255.255.255.0/eth0, hosting node rac2

    GSD exists

    ONS exists: Local port 6100, remote port 6200, EM port 2016

    ACFS-9200: Supported

    CRS-2673: Attempting to stop 'ora.registry.acfs' on 'rac2'

    CRS-2677: Stop of 'ora.registry.acfs' on 'rac2' succeeded

    CRS-2673: Attempting to stop 'ora.crsd' on 'rac2'

    CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on 'rac2'

    CRS-2673: Attempting to stop 'ora.CRS.dg' on 'rac2'

    CRS-2677: Stop of 'ora.CRS.dg' on 'rac2' succeeded

    CRS-2673: Attempting to stop 'ora.asm' on 'rac2'

    CRS-2677: Stop of 'ora.asm' on 'rac2' succeeded

    CRS-2792: Shutdown of Cluster Ready Services-managed resources on 'rac2' has completed

    CRS-2677: Stop of 'ora.crsd' on 'rac2' succeeded

    CRS-2673: Attempting to stop 'ora.ctssd' on 'rac2'

    CRS-2673: Attempting to stop 'ora.evmd' on 'rac2'

    CRS-2673: Attempting to stop 'ora.asm' on 'rac2'

    CRS-2677: Stop of 'ora.asm' on 'rac2' succeeded

    CRS-2673: Attempting to stop 'ora.cluster_interconnect.haip' on 'rac2'

    CRS-2677: Stop of 'ora.evmd' on 'rac2' succeeded

    CRS-2677: Stop of 'ora.cluster_interconnect.haip' on 'rac2' succeeded

    CRS-2677: Stop of 'ora.ctssd' on 'rac2' succeeded

    CRS-2673: Attempting to stop 'ora.cssd' on 'rac2'

    CRS-2677: Stop of 'ora.cssd' on 'rac2' succeeded

    CRS-2673: Attempting to stop 'ora.diskmon' on 'rac2'

    CRS-2677: Stop of 'ora.diskmon' on 'rac2' succeeded

    CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node rac1, number 1, and is terminating

    Unable to communicate with the Cluster Synchronization Services daemon.

    CRS-4000: Command Delete failed, or completed with errors.

    crsctl delete for vds in CRS ... failed

    CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'rac2'

    CRS-2673: Attempting to stop 'ora.mdnsd' on 'rac2'

    CRS-2673: Attempting to stop 'ora.crf' on 'rac2'

    CRS-2677: Stop of 'ora.mdnsd' on 'rac2' succeeded

    CRS-2677: Stop of 'ora.crf' on 'rac2' succeeded

    CRS-2673: Attempting to stop 'ora.gipcd' on 'rac2'

    CRS-2677: Stop of 'ora.gipcd' on 'rac2' succeeded

    CRS-2673: Attempting to stop 'ora.gpnpd' on 'rac2'

    CRS-2677: Stop of 'ora.gpnpd' on 'rac2' succeeded

    CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'rac2' has completed

    CRS-4133: Oracle High Availability Services has been stopped.

    Successfully deconfigured Oracle clusterware stack on this node

    [root@rac2 ~]# /oracle/app/grid/product/11.2.0/root.sh

    Running Oracle 11g root script...

    The following environment variables are set as:

    ORACLE_OWNER= grid

    ORACLE_HOME= /oracle/app/grid/product/11.2.0

    Enter the full pathname of the local bin directory: [/usr/local/bin]:

    The contents of "dbhome" have not changed. No need to overwrite.

    The contents of "oraenv" have not changed. No need to overwrite.

    The contents of "coraenv" have not changed. No need to overwrite.

    Entries will be added to the /etc/oratab file as needed by

    Database Configuration Assistant when a database is created

    Finished running generic part of root script.

    Now product-specific root actions will be performed.

    Using configuration parameter file: /oracle/app/grid/product/11.2.0/crs/install/crsconfig_params

    LOCAL ADD MODE

    Creating OCR keys for user 'root', privgrp 'root'..

    Operation successful.

    OLR initialization - successful

    Adding daemon to inittab

    ACFS-9200: Supported

    ACFS-9300: ADVM/ACFS distribution files found.

    ACFS-9307: Installing requested ADVM/ACFS software.

    ACFS-9308: Loading installed ADVM/ACFS drivers.

    ACFS-9321: Creating udev for ADVM/ACFS.

    ACFS-9323: Creating module dependencies - this may take some time.

    ACFS-9327: Verifying ADVM/ACFS devices.

    ACFS-9309: ADVM/ACFS installation correctness verified.

    CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node rac1, number 1, and is terminating

    An active cluster was found during exclusive startup, restarting to join the cluster

    Preparing packages for installation...

    cvuqdisk-1.0.9-1

    Configure Oracle Grid Infrastructure for a Cluster ... succeeded

3. The resource ora.rac1.LISTENER_RAC1.lsnr cannot come online

    NAME=ora.rac1.LISTENER_RAC1.lsnr

    TYPE=application

    TARGET=ONLINE

    STATE=OFFLINE

    [grid@rac1 ~]$ crs_start ora.rac1.LISTENER_RAC1.lsnr

    CRS-2805: Unable to start 'ora.LISTENER.lsnr' because it has a 'hard' dependency on resource type 'ora.cluster_vip_net1.type' and no resource of that type can satisfy the dependency

    CRS-2525: All instances of the resource 'ora.rac2.vip' are already running; relocate is not allowed because the force option was not specified

    CRS-0222: Resource 'ora.rac1.LISTENER_RAC1.lsnr' has dependency error.

4. [INS-20802] Oracle Cluster Verification Utility failed.

Cause: The plug-in failed in its perform method. Action: Refer to the logs or contact Oracle Support Services. Log file location:

/oracle/app/oraInventory/logs/installActions2011-04-28_05-32-48PM.log

    tail -n 6000 /oracle/app/oraInventory/logs/installActions2011-04-28_05-32-48PM.log

    INFO: Checking VIP reachability

    INFO: Check for VIP reachability passed.

    INFO: Post-check for cluster services setup was unsuccessful.

    INFO: Checks did not pass for the following node(s):

    INFO: rac-cluster,rac1

    INFO:

    WARNING:

    INFO: Completed Plugin named: Oracle Cluster Verification Utility

1. Using the UDEV service to provide persistent device names for RAC ASM storage

Earlier we covered the pros and cons of ASMLIB, a kernel support library designed specifically for Oracle's Automatic Storage Management feature, and recommended replacing it with the mature UDEV approach.

Here are the concrete steps for configuring UDEV; they are fairly simple:

1. Confirm that the required udev package is installed on all RAC nodes

[root@rh2 ~]# rpm -qa | grep udev
udev-095-14.21.el5

2. Use scsi_id to obtain the unique identifier of each block device; assume the system already has LUNs sdc through sdp

for i in c d e f g h i j k l m n o p;

do

echo "sd$i `scsi_id -g -u -s /block/sd$i`";

done

    sdc 1IET_00010001

    sdd 1IET_00010002

    sde 1IET_00010003

    sdf 1IET_00010004

    sdg 1IET_00010005

    sdh 1IET_00010006

    sdi 1IET_00010007

    sdj 1IET_00010008

    sdk 1IET_00010009

    sdl 1IET_0001000a

    sdm 1IET_0001000b

    sdn 1IET_0001000c

    sdo 1IET_0001000d

    sdp 1IET_0001000e

    The output above lists the unique identifier corresponding to each block device name. Note that on RHEL 6 (the release used elsewhere in this series) scsi_id no longer accepts the -s /block/... form; the rough equivalent there is /sbin/scsi_id --whitelisted --device=/dev/sdX.

    3. Create the necessary UDEV rules file

    First, switch to the rules directory

    [root@rh2 ~]# cd /etc/udev/rules.d

    Define the required rules file

    [root@rh2 rules.d]# touch 99-oracle-asmdevices.rules

    [root@rh2 rules.d]# cat 99-oracle-asmdevices.rules

    KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_00010001", NAME="ocr1", OWNER="grid", GROUP="asmadmin", MODE="0660"

    KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_00010002", NAME="ocr2", OWNER="grid", GROUP="asmadmin", MODE="0660"

    KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_00010003", NAME="asm-disk1", OWNER="grid", GROUP="asmadmin", MODE="0660"

    KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_00010004", NAME="asm-disk2", OWNER="grid", GROUP="asmadmin", MODE="0660"

    KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_00010005", NAME="asm-disk3", OWNER="grid", GROUP="asmadmin", MODE="0660"

    KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_00010006", NAME="asm-disk4", OWNER="grid", GROUP="asmadmin", MODE="0660"

    KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_00010007", NAME="asm-disk5", OWNER="grid", GROUP="asmadmin", MODE="0660"

    KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_00010008", NAME="asm-disk6", OWNER="grid", GROUP="asmadmin", MODE="0660"

    KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_00010009", NAME="asm-disk7", OWNER="grid", GROUP="asmadmin", MODE="0660"

    KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_0001000a", NAME="asm-disk8", OWNER="grid", GROUP="asmadmin", MODE="0660"

    KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_0001000b", NAME="asm-disk9", OWNER="grid", GROUP="asmadmin", MODE="0660"

    KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_0001000c", NAME="asm-disk10", OWNER="grid", GROUP="asmadmin", MODE="0660"

    KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_0001000d", NAME="asm-disk11", OWNER="grid", GROUP="asmadmin", MODE="0660"

    KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %p", RESULT=="1IET_0001000e", NAME="asm-disk12", OWNER="grid", GROUP="asmadmin", MODE="0660"

    RESULT matches the output of /sbin/scsi_id -g -u -s %p -- "Match the returned string of the last PROGRAM call. This key may be used in any following rule after a PROGRAM call."

    Fill in the unique identifiers obtained earlier, in order.

    OWNER is the user that installs Grid Infrastructure, normally grid in 11gR2; GROUP is asmadmin; MODE 0660 is sufficient.

    NAME is the device name after UDEV mapping.

    It is recommended to create a dedicated diskgroup for the OCR and voting disks; to make them easy to tell apart, name that diskgroup's devices in the form ocr1..ocrn.

    The remaining disks can be named after their actual purpose or after their diskgroup name.
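    The mapping from the scsi_id output to rule lines is mechanical, so it can be scripted. Below is a minimal sketch (my addition, not from the original article) that turns "device wwid" pairs, as produced by the scsi_id loop above, into rule lines of the form used in this section; the asm-disk naming and the grid/asmadmin ownership are assumptions to adjust for your environment.

```shell
#!/bin/sh
# generate_asm_rules: read "device wwid" pairs on stdin and print one udev
# rule line per pair, numbering the NAMEs asm-disk1, asm-disk2, ...
# (hypothetical helper; adjust NAME/OWNER/GROUP to your environment)
generate_asm_rules() {
  n=0
  while read dev wwid; do
    [ -z "$wwid" ] && continue   # skip malformed lines
    n=$((n + 1))
    printf 'KERNEL=="sd*", BUS=="scsi", PROGRAM=="/sbin/scsi_id -g -u -s %%p", RESULT=="%s", NAME="asm-disk%d", OWNER="grid", GROUP="asmadmin", MODE="0660"\n' "$wwid" "$n"
  done
}
# Example: generate_asm_rules < pairs.txt >> /etc/udev/rules.d/99-oracle-asmdevices.rules
```

    Feeding it the "sdc 1IET_00010001 ..." listing from step 2 reproduces the rules file shown above, minus the two ocr entries, which are easiest to write by hand.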

    4. Copy the rules file to the other nodes

    [root@rh2 rules.d]# scp 99-oracle-asmdevices.rules Other_node:/etc/udev/rules.d

    5. Reload the udev rules on all nodes, or simply reboot the servers. (On RHEL 6, use udevadm control --reload-rules instead of udevcontrol reload_rules.)

    [root@rh2 rules.d]# /sbin/udevcontrol reload_rules

    [root@rh2 rules.d]# /sbin/start_udev Starting udev: [ OK ]

    6. Verify that the devices are in place

    [root@rh2 rules.d]# cd /dev
    [root@rh2 dev]# ls -l ocr*
    brw-rw---- 1 grid asmadmin 8, 32 Jul 10 17:31 ocr1
    brw-rw---- 1 grid asmadmin 8, 48 Jul 10 17:31 ocr2

    [root@rh2 dev]# ls -l asm-disk*

    brw-rw---- 1 grid asmadmin 8, 64 Jul 10 17:31 asm-disk1

    brw-rw---- 1 grid asmadmin 8, 208 Jul 10 17:31 asm-disk10

    brw-rw---- 1 grid asmadmin 8, 224 Jul 10 17:31 asm-disk11

    brw-rw---- 1 grid asmadmin 8, 240 Jul 10 17:31 asm-disk12

    brw-rw---- 1 grid asmadmin 8, 80 Jul 10 17:31 asm-disk2

    brw-rw---- 1 grid asmadmin 8, 96 Jul 10 17:31 asm-disk3

    brw-rw---- 1 grid asmadmin 8, 112 Jul 10 17:31 asm-disk4

    brw-rw---- 1 grid asmadmin 8, 128 Jul 10 17:31 asm-disk5

    brw-rw---- 1 grid asmadmin 8, 144 Jul 10 17:31 asm-disk6

    brw-rw---- 1 grid asmadmin 8, 160 Jul 10 17:31 asm-disk7

    brw-rw---- 1 grid asmadmin 8, 176 Jul 10 17:31 asm-disk8

    brw-rw---- 1 grid asmadmin 8, 192 Jul 10 17:31 asm-disk9

    1. Sanshi's notes on shared storage

    Since publishing "Step by step: installing Oracle 10g RAC on Linux with VMware", my vanity has once again been thoroughly satisfied, because readers keep writing to thank me (though hardly any of them offer to buy me dinner -- what a bunch of all-talk-no-action folks), saying the document is clear and well organized, with screenshots to follow along.

    That said, quite a few readers contacted me about problems during installation, and most of them boiled down to the same thing: node 2 cannot detect the shared disks when running oracleasm listdisks.

    For a RAC database configured under VMware, this problem usually has one of two causes, described below.

    1. Add the disk-sharing parameters in VMware

    When configuring a RAC database with VMware, you must use the Server edition (VMware comes in Server and Workstation flavors) -- Sanshi stressed this point in the document, but some readers may still have missed it. You must also add the disk-sharing parameters to each node's *.vmx configuration file, otherwise the shared disks may not be recognized correctly.

    Here are the disk-sharing parameters from the vmx file in Sanshi's environment:

    disk.locking = "false"

    diskLib.dataCacheMaxSize = "0"

    diskLib.dataCacheMaxReadAheadSize = "0"

    diskLib.DataCacheMinReadAheadSize = "0"

    diskLib.dataCachePageSize = "4096"

    diskLib.maxUnsyncedWrites = "0"

    scsi1.sharedBus = "virtual"

    2. The "shared" disks are not actually shared

    The second cause is more perverse, yet examples of disks failing to be shared for this reason are not rare; it mainly stems from an incomplete understanding of the Oracle RAC architecture.

    Before getting to the point, let me clarify a related concept: what exactly is shared storage? As the name implies, the disk space must be shared and accessible by all the relevant nodes -- put plainly, the nodes access the same disk (or disks); in a virtual-machine environment, that means they access the same disk files.

    An Oracle database consists of an instance plus a database. An instance is a set of operating-system processes plus a region of memory; the database is a collection of files of various types (data files, temp files, redo log files, control files, and so on). In a RAC configuration, multiple instances -- each normally running on a different node, although running them on the same node should also be feasible -- read and write a single database. And where does that database live? On the shared storage. In other words, the files accessed by the RAC instances must reside on the same disks.

    OK, back to the point. When creating the second node, some readers, instead of selecting the disk files created on the first node (Use an existing virtual disk) for the voting disk, OCR, and ASM disks, create brand-new disk files (Create a new virtual disk). With that setup there is no sharing at all, so shared storage is naturally impossible.

     

    When installing GI, root.sh rarely fails on the first node, but on the second node the following errors appeared:

    Start of resource "ora.asm -init" failed
    Failed to start ASM
    Failed to start Oracle Clusterware stack

    Check the error log:

    [grid@znode2 crsconfig]$ pwd
    /u01/app/11.2.0/grid/cfgtoollogs/crsconfig
    [grid@znode2 crsconfig]$ vi + rootcrs_znode2.log

    ...
    2012-06-28 17:53:50: Starting CSS in clustered mode
    2012-06-28 17:54:42: CRS-2672: Attempting to start 'ora.cssdmonitor' on 'znode2'
    2012-06-28 17:54:42: CRS-2676: Start of 'ora.cssdmonitor' on 'znode2' succeeded
    2012-06-28 17:54:42: CRS-2672: Attempting to start 'ora.cssd' on 'znode2'
    2012-06-28 17:54:42: CRS-2672: Attempting to start 'ora.diskmon' on 'znode2'
    2012-06-28 17:54:42: CRS-2676: Start of 'ora.diskmon' on 'znode2' succeeded
    2012-06-28 17:54:42: CRS-2676: Start of 'ora.cssd' on 'znode2' succeeded
    -- note: here it waited about 10 minutes
    2012-06-28 18:04:47: Start of resource "ora.ctssd -init -env USR_ORA_ENV=CTSS_REBOOT=TRUE" Succeeded
    2012-06-28 18:04:47: Command return code of 1 (256) from command: /u01/app/11.2.0/grid/bin/crsctl start resource ora.asm -init
    2012-06-28 18:04:47: Start of resource "ora.asm -init" failed
    2012-06-28 18:04:47: Failed to start ASM
    2012-06-28 18:04:47: Failed to start Oracle Clusterware stack
    ...

    /log/znode2/alert/znodename.log

    CRS-5818: Aborted command 'start for resource: ora.ctssd 1 1' for resource 'ora.ctssd'. Details at..

    This problem can have several causes:

    1. the public/private NIC configuration is incorrect
    2. the firewall has not been disabled
    3. the hostname appears on the 127.0.0.1 line of /etc/hosts

    I checked /etc/hosts and found:

    127.0.0.1 znode1 localhost.localdomain localhost

    Then, on all nodes, I changed the 127.0.0.1 line to:

    127.0.0.1 localhost.localdomain localhost
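    A quick way to catch this misconfiguration before running root.sh is to check whether the local hostname shows up on the 127.0.0.1 line. The sketch below is my addition, not from the original post; the hosts file and hostname are taken as arguments so they default to the live system but can be overridden.

```shell
#!/bin/sh
# check_loopback_hostname: warn if the given hostname appears on the
# 127.0.0.1 line of the given hosts file (the misconfiguration found above).
check_loopback_hostname() {
  hosts=${1:-/etc/hosts}
  hn=${2:-$(hostname)}
  if grep '^127\.0\.0\.1' "$hosts" | grep -qw "$hn"; then
    echo "BAD: $hn is listed on the 127.0.0.1 line of $hosts"
  else
    echo "OK: $hn is not on the 127.0.0.1 line of $hosts"
  fi
}
# Example: check_loopback_hostname /etc/hosts znode1
```

    Run it on every node; any "BAD" line should be fixed as shown above before root.sh is executed.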

    I closed OUI, ran deinstall, and reinstalled; this time the installation went through without problems.

    Without closing OUI, it should also be possible to run, on the failed node only:

    /app/product/grid/11.2.0/crs/install/roothas.pl -delete -force -verbose

    and then re-run root.sh:

    /app/product/grid/11.2.0/root.sh

    1. Attach the second disk to the scsi1 controller and set the disk to independent-persistent; make the same addition on both nodes.

    2. Add or modify the following parameters:

    scsi1:0.mode = "independent-persistent"
    scsi1.present = "TRUE"
    #scsi1.sharedBus = "none"
    scsi1.virtualDev = "lsilogic"
    scsi1:0.present = "TRUE"
    scsi1:0.fileName = "D:\VM RAC\vm_rac1\vm_shared_disk.vmdk"
    scsi1:0.writeThrough = "TRUE"
    disk.locking = "FALSE"
    diskLib.dataCacheMaxSize = "0"
    diskLib.dataCacheMaxReadAheadSize = "0"
    diskLib.DataCacheMinReadAheadSize = "0"
    diskLib.dataCachePageSize = "4096"
    diskLib.maxUnsyncedWrites = "0"
    scsi1.sharedBus = "virtual"
    scsi1.shared = "TRUE"

    3. Then start both VMs. On the first node, partition the shared disk with fdisk and run partprobe against the device; check on node 1 and again on node 2 to see whether both see the new partition:

    [root@rac1 ~]# fdisk -l
    Disk /dev/sda: 10.7 GB, 10737418240 bytes
    255 heads, 63 sectors/track, 1305 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes
    Device Boot Start End Blocks Id System
    /dev/sda1 * 1 13 104391 83 Linux
    /dev/sda2 14 1177 9349830 83 Linux
    /dev/sda3 1178 1305 1028160 82 Linux swap / Solaris
    Disk /dev/sdb: 5368 MB, 5368709120 bytes
    255 heads, 63 sectors/track, 652 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes
    Disk /dev/sdb doesn't contain a valid partition table

    [root@rac1 ~]# fdisk /dev/sdb

    ...

    Command (m for help): p
    Disk /dev/sdb: 5368 MB, 5368709120 bytes
    255 heads, 63 sectors/track, 652 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes
    Device Boot Start End Blocks Id System
    /dev/sdb1 1 13 104391 83 Linux
    Command (m for help): w
    The partition table has been altered!
    Calling ioctl() to re-read partition table.
    WARNING: Re-reading the partition table failed with error 16: Device or resource busy.
    The kernel still uses the old table.
    The new table will be used at the next reboot.
    Syncing disks.

    [root@rac1 ~]# partprobe /dev/sdb
    [root@rac1 ~]# fdisk -l
    Disk /dev/sda: 10.7 GB, 10737418240 bytes
    255 heads, 63 sectors/track, 1305 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes
    Device Boot Start End Blocks Id System
    /dev/sda1 * 1 13 104391 83 Linux
    /dev/sda2 14 1177 9349830 83 Linux
    /dev/sda3 1178 1305 1028160 82 Linux swap / Solaris
    Disk /dev/sdb: 5368 MB, 5368709120 bytes
    255 heads, 63 sectors/track, 652 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes
    Device Boot Start End Blocks Id System
    /dev/sdb1 1 13 104391 83 Linux

    Node 2:

    [root@rac2 ~]# fdisk -l
    Disk /dev/sda: 10.7 GB, 10737418240 bytes
    255 heads, 63 sectors/track, 1305 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes
    Device Boot Start End Blocks Id System
    /dev/sda1 * 1 13 104391 83 Linux
    /dev/sda2 14 1177 9349830 83 Linux
    /dev/sda3 1178 1305 1028160 82 Linux swap / Solaris
    Disk /dev/sdb: 5368 MB, 5368709120 bytes
    255 heads, 63 sectors/track, 652 cylinders
    Units = cylinders of 16065 * 512 = 8225280 bytes
    Device Boot Start End Blocks Id System
    /dev/sdb1 1 13 104391 83 Linux

    While setting up an 11g RAC test environment on OEL 6.3, the final run of the root.sh script failed with "libcap.so.1: cannot open shared object file: No such file or directory", as shown below:

    [root@rac1 11.2.0]# /g01/oraInventory/orainstRoot.sh
    Changing permissions of /g01/oraInventory.
    Adding read,write permissions for group.
    Removing read,write,execute permissions for world.
    Changing groupname of /g01/oraInventory to oinstall.
    The execution of the script is complete.
    [root@rac1 11.2.0]# /g01/app/11.2.0/grid/root.sh
    Running Oracle 11g root.sh script...
    The following environment variables are set as:
        ORACLE_OWNER= grid
        ORACLE_HOME= /g01/app/11.2.0/grid
    Enter the full pathname of the local bin directory: [/usr/local/bin]:
    Copying dbhome to /usr/local/bin ...
    Copying oraenv to /usr/local/bin ...
    Copying coraenv to /usr/local/bin ...
    Creating /etc/oratab file...
    Entries will be added to the /etc/oratab file as needed by
    Database Configuration Assistant when a database is created
    Finished running generic part of root.sh script.
    Now product-specific root actions will be performed.
    2013-10-10 03:41:35: Parsing the host name
    2013-10-10 03:41:35: Checking for super user privileges
    2013-10-10 03:41:35: User has super user privileges
    Using configuration parameter file: /g01/app/11.2.0/grid/crs/install/crsconfig_params
    Creating trace directory
    /g01/app/11.2.0/grid/bin/clscfg.bin: error while loading shared libraries: libcap.so.1: cannot open shared object file: No such file or directory
    Failed to create keys in the OLR, rc = 127, 32512
    OLR configuration failed

    A quick search showed the error is caused by a missing package. Install it on both nodes:

    [root@rac1 Packages]# rpm -ivh compat-libcap1-1.10-1.x86_64.rpm
    warning: compat-libcap1-1.10-1.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID ec551f03: NOKEY
    Preparing...                ########################################### [100%]
       1:compat-libcap1         ########################################### [100%]

    Then remove the earlier CRS configuration:

    [root@rac1 ~]# perl $GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force
    2013-10-10 04:01:41: Parsing the host name
    2013-10-10 04:01:41: Checking for super user privileges
    2013-10-10 04:01:41: User has super user privileges
    Using configuration parameter file: /g01/app/11.2.0/grid/crs/install/crsconfig_params
    PRCR-1035 : Failed to look up CRS resource ora.cluster_vip.type for 1
    PRCR-1068 : Failed to query resources
    Cannot communicate with crsd
    PRCR-1070 : Failed to check if resource ora.gsd is registered
    Cannot communicate with crsd
    PRCR-1070 : Failed to check if resource ora.ons is registered
    Cannot communicate with crsd
    PRCR-1070 : Failed to check if resource ora.eons is registered
    Cannot communicate with crsd
    ADVM/ACFS is not supported on oraclelinux-release-6Server-3.0.2.x86_64
    ACFS-9201: Not Supported
    Failure at scls_scr_setval with code 8
    Internal Error Information:
      Category: -2
      Operation: failed
      Location: scrsearch3
      Other: id doesnt exist scls_scr_setval
      System Dependent Information: 2
    CRS-4544: Unable to connect to OHAS
    CRS-4000: Command Stop failed, or completed with errors.
    error: package cvuqdisk is not installed
    Successfully deconfigured Oracle clusterware stack on this node

    Running root.sh again then hit the error "ohasd failed to start at /g01/app/11.2.0/grid/crs/install/rootcrs.pl line 443":

    [root@rac1 ~]# /g01/app/11.2.0/grid/root.sh
    Running Oracle 11g root.sh script...
    The following environment variables are set as:
        ORACLE_OWNER= grid
        ORACLE_HOME= /g01/app/11.2.0/grid
    Enter the full pathname of the local bin directory: [/usr/local/bin]:
    Copying dbhome to /usr/local/bin ...
    Copying oraenv to /usr/local/bin ...
    Copying coraenv to /usr/local/bin ...
    Entries will be added to the /etc/oratab file as needed by
    Database Configuration Assistant when a database is created
    Finished running generic part of root.sh script.
    Now product-specific root actions will be performed.
    2013-10-10 04:02:57: Parsing the host name
    2013-10-10 04:02:57: Checking for super user privileges
    2013-10-10 04:02:57: User has super user privileges
    Using configuration parameter file: /g01/app/11.2.0/grid/crs/install/crsconfig_params
    LOCAL ADD MODE
    Creating OCR keys for user 'root', privgrp 'root'..
    Operation successful.
    root wallet
    root wallet cert
    root cert export
    peer wallet
    profile reader wallet
    pa wallet
    peer wallet keys
    pa wallet keys
    peer cert request
    pa cert request
    peer cert
    pa cert
    peer root cert TP
    profile reader root cert TP
    pa root cert TP
    peer pa cert TP
    pa peer cert TP
    profile reader pa cert TP
    profile reader peer cert TP
    peer user cert
    pa user cert
    Adding daemon to inittab
    CRS-4124: Oracle High Availability Services startup failed.
    CRS-4000: Command Start failed, or completed with errors.
    ohasd failed to start: Inappropriate ioctl for device
    ohasd failed to start at /g01/app/11.2.0/grid/crs/install/rootcrs.pl line 443.

    A web search showed that this is actually an Oracle bug. The workaround, oddly enough, is to keep running the following command in another window as soon as "pa user cert" appears, until it succeeds -- bizarre indeed:

    /bin/dd if=/var/tmp/.oracle/npohasd of=/dev/null bs=1024 count=1

    For details see: https://forums.oracle.com/thread/2352285
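    The "keep running dd until it succeeds" workaround can be wrapped in a small retry loop so it does not have to be re-typed by hand. This is only a sketch of the command from the forum thread above; the default pipe path is the one from the log, and wrapping it in a function is my addition.

```shell
#!/bin/sh
# wait_pipe: read one block from the given file/named pipe, retrying every
# second until the read succeeds -- the ohasd workaround loop described above.
wait_pipe() {
  pipe=${1:-/var/tmp/.oracle/npohasd}
  until dd if="$pipe" of=/dev/null bs=1024 count=1 2>/dev/null; do
    sleep 1
  done
  echo "read from $pipe succeeded"
}
# During root.sh, run in a second terminal as soon as "pa user cert" appears:
#   wait_pipe /var/tmp/.oracle/npohasd
```

    Once the read succeeds, ohasd has opened its end of the pipe and root.sh should continue past the failure point.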

    1. Oracle Cluster Verification Utility failed

    While installing the Oracle 11gR2 Grid Infrastructure on Linux, after root.sh had completed, the final verification step failed with the error "Oracle Cluster Verification Utility failed".

    Everything up to that point had gone fine.

    Checking with crs_stat -t showed that the gsd resource had both its target and its state set to OFFLINE.

    Oracle's official documentation explains that GSD is only used by 9.2 databases; if there is no 9.2 database, the service can remain OFFLINE.

    5.3.4 Enabling The Global Services Daemon (GSD) for Oracle Database Release 9.2
    By default, the Global Services daemon (GSD) is disabled. If you install Oracle Database 9i release 2 (9.2) on Oracle Grid Infrastructure for a Cluster 11g release 2 (11.2), then you must enable the GSD. Use the following commands to enable the GSD before you install Oracle Database release 9.2:

    srvctl enable nodeapps -g
    srvctl start nodeapps

    So this issue can be disregarded.

    The installation log, however, contained NTPD error messages; I recalled that I had not started the ntpd service and had clicked past the resulting error.

    So I started the NTPD service manually:

    [bash]# chkconfig --level 2345 ntpd on
    [bash]# /etc/init.d/ntpd restart
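    One detail worth adding (my addition, not in the original article): the 11gR2 cluster verification also expects ntpd to run with the -x slewing flag, which on RHEL/OEL is normally set in the OPTIONS line of /etc/sysconfig/ntpd. A hedged sketch, with the config path taken as an argument:

```shell
#!/bin/sh
# ensure_ntp_slew: add -x to the ntpd OPTIONS line if it is not already
# there, then print the resulting line. Default path is the usual RHEL
# location; pass another file for testing.
ensure_ntp_slew() {
  conf=${1:-/etc/sysconfig/ntpd}
  grep -q '^OPTIONS=".*-x' "$conf" || sed -i 's/^OPTIONS="/OPTIONS="-x /' "$conf"
  grep '^OPTIONS' "$conf"
}
# Example: ensure_ntp_slew && /etc/init.d/ntpd restart
```

    The function is idempotent, so running it again after the flag is present leaves the file unchanged.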

    After reinstalling, everything completed normally.

    During an 11g installation, it is best for every verification check to pass.

    Postscript:

    In fact, the error could simply have been ignored. NTPD is not required -- it is just time synchronization, and when NTP is absent Oracle's own CTSS takes over that role.

    The gsd resource being OFFLINE also does not matter; its initial target is OFFLINE, and it is no longer used in 11g anyway.

     

    The cause was said to be that the SCAN IP had been defined in the hosts file.

    The log also reported the error:

    INFO: Checking Single Client Access Name (SCAN)...
    INFO: Checking name resolution setup for "rac-scan"...
    INFO: ERROR:
    INFO: PRVF-4664 : Found inconsistent name resolution entries for SCAN name "rac-scan"
    INFO: ERROR:
    INFO: PRVF-4657 : Name resolution setup check for "rac-scan" (IP address: 192.168.0.20) failed
    INFO: ERROR:
    INFO: PRVF-4664 : Found inconsistent name resolution entries for SCAN name "rac-scan"
    INFO: Verification of SCAN VIP and Listener setup failed

    A search turned up an article by Yang Tingkun (yangtingkun) that mentions the same error:

    yangtingkun, "Installing Oracle 11.2 RAC for Solaris 10 sparc64 (Part 2)" (local copy: F:RHEL5.532oracle_patchyangtingkun安装Oracle11_2 RAC for Solaris10 sparc64(二).mht)

    At the end of that article, Yang notes:

    The error is caused by configuring the SCAN address in /etc/hosts. Try pinging that address; if the ping succeeds, the error can be ignored.

    I could ping the SCAN IP, so I ignored the error for the time being as well.

    1. Problems during database installation
      1. An INS-35354 case during an 11gR2 RAC installation

    Today, while installing an 11.2.0.2 RAC database, I ran into the INS-35354 problem:

    Since 11.2.0.2 GI had already been installed successfully and every cluster status check was normal, this error came as a bit of a surprise:

    [grid@vrh1 ~]$ crsctl check crs

    CRS-4638: Oracle High Availability Services is online

    CRS-4537: Cluster Ready Services is online

    CRS-4529: Cluster Synchronization Services is online

    CRS-4533: Event Manager is online

    A search on MOS suggested that it may be caused by an incorrectly updated inventory.xml in oraInventory:

    Applies to:

    Oracle Server - Enterprise Edition - Version: 11.2.0.1 to 11.2.0.2 - Release: 11.2 to 11.2

    Information in this document applies to any platform.

    Symptoms

    Installing 11gR2 database software in a Grid Infrastrsucture environment fails with the error INS-35354:

    The system on which you are attempting to install Oracle RAC is not part of a valid cluster.

    Grid Infrastructure (Oracle Clusterware) is running on all nodes in the cluster which can be verified with:

    crsctl check crs

    Changes

    This is a new install.

    Cause

    As per 11gR2 documentation the error description is:

    INS-35354: The system on which you are attempting to install Oracle RAC is not part of a valid cluster.

    Cause: Prior to installing Oracle RAC, you must create a valid cluster.

    This is done by deploying Grid Infrastructure software,

    which will allow configuration of Oracle Clusterware and Automatic Storage Management.

    However, the problem at hand may be that the central inventory is missing the "CRS=true" flag

    (for the Grid Infrastructure Home).

    -------------

    <home_list>

    <node_list>

    -------------

    From the inventory.xml, we see that the HOME NAME line is missing the CRS="true" flag.

    The error INS-35354 will occur when the central inventory entry for the Grid Infrastructure

    home is missing the flag that identifies it as CRS-type home.

    Solution

    Use the -updateNodeList option of the installer command to fix the inventory.

    The full syntax is:

    ./runInstaller -updateNodeList "CLUSTER_NODES={node1,node2}"

    ORACLE_HOME="" ORACLE_HOME_NAME="" LOCAL_NODE="Node_Name" CRS=[true|false]

    Execute the command on any node in the cluster.

    Examples:

    For a two-node RAC cluster on UNIX:

    Node1:

    cd /u01/grid/oui/bin

    ./runInstaller -updateNodeList "CLUSTER_NODES={node1,node2}" ORACLE_HOME="/u01/crs"

    ORACLE_HOME_NAME="GI_11201" LOCAL_NODE="node1" CRS=true

    For a 2-node RAC cluster on Windows:

    Node 1:

    cd e:\app\11.2.0\grid\oui\bin

    e:\app\11.2.0\grid\oui\bin\setup -updateNodeList "CLUSTER_NODES={RACNODE1,RACNODE2}"

    ORACLE_HOME="e:\app\11.2.0\grid" ORACLE_HOME_NAME="OraCrs11g_home1" LOCAL_NODE="RACNODE1" CRS=true

    The inventory.xml in my environment looked like this:

    [grid@vrh1 ContentsXML]$ cat inventory.xml

    <version_info>

    11.2.0.2.0

    2.1.0.6.0

    <home_list>

    <node_list>

    Clearly the CRS="true" flag was missing here, which is why the OUI installer concluded during its checks that GI was not installed on this node.

    The fix is actually simple: just add CRS="true" and restart runInstaller; there is no need for the complicated runInstaller -updateNodeList command combination described in the document.
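    Whether the flag is present can be confirmed with a simple grep before and after editing. This check is my own sketch; the default inventory path is the one used in this article.

```shell
#!/bin/sh
# check_crs_flag: report whether any home entry in the given inventory.xml
# carries the CRS="true" attribute marking it as the Grid Infrastructure home.
check_crs_flag() {
  inv=${1:-/g01/oraInventory/ContentsXML/inventory.xml}
  if grep -q 'CRS="true"' "$inv"; then
    echo "CRS flag present"
  else
    echo "CRS flag missing"
  fi
}
# Example: check_crs_flag /g01/oraInventory/ContentsXML/inventory.xml
```

    If it reports "missing", add CRS="true" to the GI home's HOME entry and restart runInstaller.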

    [grid@vrh1 ContentsXML]$ cat /g01/oraInventory/ContentsXML/inventory.xml

    <version_info>

    11.2.0.2.0

    2.1.0.6.0

    <home_list>

    <node_list>

    After this change the problem was solved and the installer screen proceeded normally.

  • Original article: https://www.cnblogs.com/paul8339/p/6894346.html