Archive for the ‘clusterware’ Category

CRS-4000: Command Replace failed, or completed with errors.

September 18, 2014

Today, I tried to restore the OCR and voting disks from backup. First, I re-created the disk group that holds them with the following command:

CREATE DISKGROUP VOTEDG NORMAL REDUNDANCY
DISK '/dev/asmdisk/ycfmsvot'
DISK '/dev/asmdisk/jcfmsvot'
QUORUM DISK '/nfsvoting/nfsvotedk'
ATTRIBUTE 'compatible.asm' = '11.2.0.0';
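(For reference, the OCR restore itself is done from one of the automatic OCR backups; a minimal sketch as root, where the backup file to use comes from the -showbackup listing.)

# ocrconfig -showbackup
# ocrconfig -restore <backup_file_from_showbackup>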

After that, I could restore the OCR from backup. But when I tried to replace the voting disks onto the VOTEDG disk group, it failed with the following errors:

root@cfmsvm2:/dev/asmdisk# crsctl replace votedisk +VOTEDG
CRS-4602: Failed 27 to add voting file b594600fe26d4f2dbff4f31c84969822.
CRS-4602: Failed 27 to add voting file 7cb1bfa1d53e4f67bf5e656d38ffb025.
CRS-4602: Failed 27 to add voting file 407128acec804ff5bfdc32047f07c7ba.
Failed to replace voting disk group with +VOTEDG.
CRS-4000: Command Replace failed, or completed with errors.

The ASM alert log showed the messages below:

NOTE: [crsctl.bin@cfmsvm2 (TNS V1-V3) 13029] opening OCR file
NOTE: updated gpnp profile ASM diskstring:
2014-09-18 11:35:24.225000 +08:00
NOTE: Creating voting files in diskgroup VOTEDG
NOTE: Voting File refresh pending for group 1/0x84c09605 (VOTEDG)
NOTE: Attempting voting file creation in diskgroup VOTEDG
NOTE: voting file allocation on grp 1 disk VOTEDG_0002
NOTE: voting file allocation on grp 1 disk VOTEDG_0000
2014-09-18 11:35:25.375000 +08:00
NOTE: voting file allocation on grp 1 disk VOTEDG_0001
NOTE: Attempting voting file refresh on diskgroup VOTEDG
NOTE: Voting file relocation is required in diskgroup VOTEDG
NOTE: Attempting voting file relocation on diskgroup VOTEDG
NOTE: voting file deletion on grp 1 disk VOTEDG_0000
NOTE: voting file deletion on grp 1 disk VOTEDG_0001
NOTE: voting file deletion on grp 1 disk VOTEDG_0002
NOTE: Failed voting file relocation on diskgroup VOTEDG

Finally, I found that the QUORUM DISK '/nfsvoting/nfsvotedk' was not the same size as the disk '/dev/asmdisk/ycfmsvot'.

root@cfmsvm2:/nfsvoting# ls -al
total 409604
drwxr-xr-x   2 grid     oinstall      64 Sep  3 14:34 .
drwxr-xr-x  25 root     root          29 Sep 17 16:13 ..
-rwxrwxr-x   1 grid     oinstall 209715200 Sep 18 11:38 nfsvotedk <– the size was about 200M
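A quick way to compare the sizes from the ASM side (assuming the disks are visible to the ASM instance) is to check the OS_MB and TOTAL_MB columns:

SQL> select name, path, os_mb, total_mb from v$asm_disk;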

 

Fix Procedure

1. Drop the quorum disk

SQL> alter system set asm_diskstring='/dev/asmdisk/*','/nfsvoting/*';

SQL> select NAME, FAILGROUP, path from v$asm_disk;

NAME                           FAILGROUP                      PATH
------------------------------ ------------------------------ ----------------------------------------
VOTEDG_0001                    VOTEDG_0001                    /dev/asmdisk/jcfmsvot
VOTEDG_0000                    VOTEDG_0000                    /dev/asmdisk/ycfmsvot
VOTEDG_0002                    VOTEDG_0002                    /nfsvoting/nfsvotedk

SQL> ALTER DISKGROUP VOTEDG DROP QUORUM DISK 'VOTEDG_0002';

Diskgroup altered.

SQL> select NAME, FAILGROUP, path from v$asm_disk;

NAME                           FAILGROUP                      PATH
------------------------------ ------------------------------ ----------------------------------------
VOTEDG_0001                    VOTEDG_0001                    /dev/asmdisk/jcfmsvot
VOTEDG_0000                    VOTEDG_0000                    /dev/asmdisk/ycfmsvot

2. Re-create the quorum disk file with a size that matches the other disks in the disk group

dd if=/dev/zero of=nfsvotedk bs=1024k count=10240
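The new file must also be readable and writable by the ASM owner; assuming the grid infrastructure owner grid:oinstall, as in the listing above, something like:

# chown grid:oinstall /nfsvoting/nfsvotedk
# chmod 660 /nfsvoting/nfsvotedk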

3. Re-add the quorum disk

SQL> ALTER DISKGROUP VOTEDG ADD QUORUM DISK '/nfsvoting/nfsvotedk';

Diskgroup altered.

SQL> select NAME, FAILGROUP, path from v$asm_disk;

NAME                           FAILGROUP                      PATH
------------------------------ ------------------------------ ----------------------------------------
/dev/asmdisk/jasmdk1
/dev/asmdisk/yasmdk1
VOTEDG_0001                    VOTEDG_0001                    /dev/asmdisk/jcfmsvot
VOTEDG_0000                    VOTEDG_0000                    /dev/asmdisk/ycfmsvot
VOTEDG_0002                    VOTEDG_0002                    /nfsvoting/nfsvotedk

4. Replace the voting disks, this time successfully

bash-4.1$ crsctl replace votedisk +VOTEDG
Successful addition of voting disk 49612e9331d64ff5bf59c81c665e201c.
Successful addition of voting disk 3f45109f98254f62bff4947648761a16.
Successful addition of voting disk 4dff0f244aeb4feabf2df797dbdab091.
Successfully replaced voting disk group with +VOTEDG.
CRS-4266: Voting file(s) successfully replaced
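To double-check the result, the voting files can be listed afterwards:

$ crsctl query css votedisk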

Categories: clusterware

Unable -2 to apply correct permissions to new voting disk

October 10, 2012

Adding a third NFS voting disk on an extended RAC

1. Set up the Solaris NFS export server:/votedisk

2. Mount the export on the Solaris RAC Nodes

3. Add the third NFS voting disk with the command below; it fails with the error shown

# crsctl add css votedisk /votedisk/third_votedisk
Now formatting voting disk: /votedisk/third_votedisk.
Unable -2 to apply correct permissions to new voting disk /votedisk/third_votedisk.
Failure at scrsctl_vdiskperms with code -2
Segmentation Fault – core dumped
#

Finally, I found that I had used the wrong anon=<number> option when exporting /votedisk on the NFS server. The anon=<number> value must be equal to the UID of the oracle account on the RAC nodes.
Fix:

1. Get the UID of the oracle user on the RAC node

$ id
uid=100(oracle) gid=100(dba) <– get the uid

2. Go to the NFS Server and export the /votedisk

# cat dfstab

#       Place share(1M) commands here for automatic execution
#       on entering init state 3.
#
#       Issue the command 'svcadm enable network/nfs/server' to
#       run the NFS daemon processes and the share commands, after adding
#       the very first entry to this file.
#
#       share [-F fstype] [ -o options] [-d "<text>"] <pathname> [resource]
#       .e.g,
#       share  -F nfs  -o rw=engineering  -d "home dirs"  /export/home2
share -F nfs -o  anon=100 /votedisk <– Edit the anon=100

# shareall
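You can confirm the active export and its anon setting with the share command (the exact output format varies by Solaris release):

# share <– the /votedisk entry should now show anon=100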

3. Add the third NFS voting disk on the RAC node

# crsctl add css votedisk /votedisk/third_votedisk
Now formatting voting disk: /votedisk/third_votedisk.
Successful addition of voting disk /votedisk/third_votedisk.

Categories: clusterware

Oracle Cluster Registry initialization failed accessing Oracle Cluster Registry device: PROC-26: Error while accessing the physical storage CRS daemons not set to start.

August 30, 2012

Problem:

1. Instantiating the Oracle 11gR1 clusterware with $CRS_HOME/root.sh on Solaris 10 on a SPARC server returned the following:

# ./root.sh
Checking to see if Oracle CRS stack is already configured
Setting the permissions on OCR backup directory
Setting up Network socket directories
Oracle Cluster Registry configuration upgraded successfully
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node <nodenumber>: <nodename> <private interconnect name> <hostname>
node 1: jracdb1 jracdb1-priv jracdb1
node 2: jracdb2 jracdb2-priv jracdb2
Creating OCR keys for user ‘root’, privgrp ‘root’..
Operation successful.
Now formatting voting device: /dev/asmdisk/vot1
Now formatting voting device: /dev/asmdisk/vot2
Now formatting voting device: /dev/asmdisk/vot3
Format of 3 voting devices complete.
Oracle Cluster Registry initialization failed accessing Oracle Cluster Registry device: PROC-26: Error while accessing the physical storage
CRS daemons not set to start.

2. The log $CRS_HOME/log/jracdb1/client/ocrconfig_xxxx.log showed the following messages:

Oracle Database 11g CRS Release 11.1.0.7.0 – Production Copyright 1996, 2007 Oracle. All rights reserved.
2012-08-30 09:17:16.625: [ OCRCONF][1]ocrconfig starts…
2012-08-30 09:17:16.627: [ OCRCONF][1]Upgrading OCR data
2012-08-30 09:17:16.941: [  OCRRAW][1]propriogid:1: INVALID FORMAT
2012-08-30 09:17:16.946: [  OCRRAW][1]propriogid:1: INVALID FORMAT
2012-08-30 09:17:16.946: [  OCRRAW][1]proprioini: both disks are not OCR formatted
2012-08-30 09:17:16.946: [  OCRRAW][1]proprinit: Could not open raw device
2012-08-30 09:17:16.947: [ default][1]a_init:7!: Backend init unsuccessful : [26]
2012-08-30 09:17:16.948: [ OCRCONF][1]Exporting OCR data to [OCRUPGRADEFILE]
2012-08-30 09:17:16.949: [  OCRAPI][1]a_init:7!: Backend init unsuccessful : [33]
2012-08-30 09:17:16.949: [ OCRCONF][1]There was no previous version of OCR. error:[PROC-33: Oracle Cluster Registry is not configured]
2012-08-30 09:17:16.979: [  OCRRAW][1]propriogid:1: INVALID FORMAT
2012-08-30 09:17:16.979: [  OCRRAW][1]propriogid:1: INVALID FORMAT
2012-08-30 09:17:16.979: [  OCRRAW][1]proprioini: both disks are not OCR formatted
2012-08-30 09:17:16.979: [  OCRRAW][1]proprinit: Could not open raw device
2012-08-30 09:17:16.980: [ default][1]a_init:7!: Backend init unsuccessful : [26]
2012-08-30 09:17:17.009: [  OCRRAW][1]propriogid:1: INVALID FORMAT
2012-08-30 09:17:17.010: [  OCRRAW][1]propriogid:1: INVALID FORMAT
2012-08-30 09:17:17.012: [  OCRRAW][1]ibctx:1:ERROR: INVALID FORMAT
2012-08-30 09:17:17.012: [  OCRRAW][1]proprinit:problem reading the bootblock or superbloc 22
2012-08-30 09:17:17.041: [  OCRRAW][1]propriogid:1: INVALID FORMAT
2012-08-30 09:17:17.041: [  OCRRAW][1]propriogid:1: INVALID FORMAT
2012-08-30 09:17:17.048: [  OCRRAW][1]propriowv_bootbuf: Vote information on disk 0 [/dev/asmdisk/ocr1] is adjusted from [0/0] to [1/2]
2012-08-30 09:17:17.048: [  OCRRAW][1]propriowv_bootbuf: Vote information on disk 1 [/dev/asmdisk/ocr2] is adjusted from [0/0] to [1/2]
2012-08-30 09:17:17.071: [  OCRRAW][1]iniconfig:No 92 configuration
2012-08-30 09:17:17.095: [  OCRAPI][1]a_init:6a: Backend init successful
2012-08-30 09:17:17.189: [ OCRCONF][1]Initialized DATABASE keys in OCR
2012-08-30 09:17:17.330: [ OCRCONF][1]csetskgfrblock0: output from clsmft: [WARNING:DKIOCGAPART ioctl failed with errno=48

clsfmt: successfully initialized file /dev/asmdisk/ocr1
]
2012-08-30 09:17:17.467: [ OCRCONF][1]csetskgfrblock0: output from clsmft: [WARNING:DKIOCGAPART ioctl failed with errno=48

clsfmt: successfully initialized file /dev/asmdisk/ocr2
]
2012-08-30 09:17:17.477: [ OCRCONF][1]Successfully set skgfr block 0
2012-08-30 09:17:17.478: [ OCRCONF][1]Exiting [status=success]…

Fix

Finally, I found that the problem was caused by the disk label format: by default, the OCR raw disks had an EFI label. After I changed them to an SMI label, the problem was fixed. The procedure to change the label is shown below.

# format -e
Searching for disks…done

AVAILABLE DISK SELECTIONS:
0. c0d0 <SUN-DiskImage-60GB cyl 1704 alt 2 hd 96 sec 768>
/virtual-devices@100/channel-devices@200/disk@0
1. c0d1 <SUN-DiskSlice-504MB cyl 64 alt 2 hd 64 sec 256>
/virtual-devices@100/channel-devices@200/disk@1
2. c0d2 <SUN-DiskSlice-504MB cyl 64 alt 2 hd 64 sec 256>
/virtual-devices@100/channel-devices@200/disk@2
3. c0d3 <SUN-DiskSlice-504MB cyl 64 alt 2 hd 64 sec 256>
/virtual-devices@100/channel-devices@200/disk@3
4. c0d4 <SUN-DiskSlice-504MB cyl 64 alt 2 hd 64 sec 256>
/virtual-devices@100/channel-devices@200/disk@4
5. c0d5 <SUN-DiskSlice-504MB cyl 64 alt 2 hd 64 sec 256>
/virtual-devices@100/channel-devices@200/disk@5
Specify disk (enter its number): 1
selecting c0d1
[disk formatted, no defect list found]

FORMAT MENU:
disk       – select a disk
type       – select (define) a disk type
partition  – select (define) a partition table
current    – describe the current disk
format     – format and analyze the disk
repair     – repair a defective sector
show       – translate a disk address
label      – write label to the disk
analyze    – surface analysis
defect     – defect list management
backup     – search for backup labels
verify     – read and display labels
save       – save new disk/partition definitions
volname    – set 8-character volume name
!<cmd>     – execute <cmd>, then return
quit
format> l
[0] SMI Label
[1] EFI Label
Specify Label type[0]:0 <-Enter
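After relabeling, the new SMI partition table can be sanity-checked before rerunning root.sh (slice 0 here is only an example; use the slice your OCR device actually points to):

# prtvtoc /dev/rdsk/c0d1s0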

Categories: clusterware

Oracle 11.1.0.6 Clusterware Installation Hung at root.sh

April 18, 2012

Node 1:

root.sh ran successfully

Node 2:

root.sh hung

[root@yecdb02 crs]# ./root.sh
Checking to see if Oracle CRS stack is already configured

Setting the permissions on OCR backup directory
Setting up Network socket directories
Oracle Cluster Registry configuration upgraded successfully
clscfg: EXISTING configuration version 4 detected.
clscfg: version 4 is 11 Release 1.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node <nodenumber>: <nodename> <private interconnect name> <hostname>
node 1: yecdb01 yecdb01-priv yecdb01
node 2: yecdb02 yecdb02-priv yecdb02
clscfg: Arguments check out successfully.

NO KEYS WERE WRITTEN. Supply -force parameter to override.
-force is destructive and will destroy any previous cluster
configuration.
Oracle Cluster Registry for cluster has already been initialized
Startup will be queued to init within 30 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.

 

Log messages in $CRS_HOME/log/yecdb02/crsd/crsd.log:

2012-04-17 17:03:33.441: [ COMMCRS][1084229952]clsc_connect: (0x1385ffb0) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_yecdb02_))

2012-04-17 17:03:33.441: [ CSSCLNT][2916376800]clsssInitNative: failed to connect to (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_yecdb02_)), rc 9

2012-04-17 17:03:33.442: [  CRSRTI][2916376800] CSS is not ready. Received status 3 from CSS. Waiting for good status ..

2012-04-17 17:03:34.445: [ COMMCRS][1084229952]clsc_connect: (0x1385ff90) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_yecdb02_))

2012-04-17 17:03:34.445: [ CSSCLNT][2916376800]clsssInitNative: failed to connect to (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_yecdb02_)), rc 9

2012-04-17 17:03:34.446: [  CRSRTI][2916376800] CSS is not ready. Received status 3 from CSS. Waiting for good status ..

 

Fix procedures

1. Clean up the failed node configuration with rootdelete.sh in $CRS_HOME/install

[root@yecdb02 install]# ./rootdelete.sh
Getting local node name
NODE = yecdb02
PRKO-2006 : Invalid node name: yecdb02
Stopping resources.
This could take several minutes.
Error while stopping resources. Possible cause: CRSD is down.
Stopping Cluster Synchronization Services.
Unable to communicate with the Cluster Synchronization Services daemon.
Oracle CRS stack is not running.
Oracle CRS stack is down now.
Removing script for Oracle Cluster Ready services
Updating ocr file for downgrade
Cleaning up SCR settings in ‘/etc/oracle/scls_scr’
Cleaning up Network socket directories

2. Rerun root.sh

[root@yecdb02 crs]# ./root.sh
Checking to see if Oracle CRS stack is already configured

Setting the permissions on OCR backup directory
Setting up Network socket directories
Oracle Cluster Registry configuration upgraded successfully
clscfg: EXISTING configuration version 4 detected.
clscfg: version 4 is 11 Release 1.
Successfully accumulated necessary OCR keys.
Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.
node <nodenumber>: <nodename> <private interconnect name> <hostname>
node 1: yecdb01 yecdb01-priv yecdb01
node 2: yecdb02 yecdb02-priv yecdb02
clscfg: Arguments check out successfully.

NO KEYS WERE WRITTEN. Supply -force parameter to override.
-force is destructive and will destroy any previous cluster
configuration.
Oracle Cluster Registry for cluster has already been initialized
Startup will be queued to init within 30 seconds.
Adding daemons to inittab
Expecting the CRS daemons to be up within 600 seconds.
Cluster Synchronization Services is active on these nodes.
yecdb01
yecdb02
Cluster Synchronization Services is active on all the nodes.
Waiting for the Oracle CRSD and EVMD to start
Oracle CRS stack installed and running under init(1M)
Running vipca(silent) for configuring nodeapps

Creating VIP application resource on (2) nodes…
Creating GSD application resource on (2) nodes…
Creating ONS application resource on (2) nodes…
Starting VIP application resource on (2) nodes…
Starting GSD application resource on (2) nodes…
Starting ONS application resource on (2) nodes…

Done.
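With root.sh complete on both nodes, a quick sanity check (assuming $CRS_HOME/bin is in the PATH) is:

# crsctl check crs
# crs_stat -t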

Categories: clusterware

CRS-0184: Cannot communicate with the CRS daemon

August 4, 2011

Last week, a development 11gR1 RAC cluster on Red Hat could not be started. When I checked the cluster status with crs_stat -t, it returned “CRS-0184: Cannot communicate with the CRS daemon”. The following is the fix procedure.

1. Check system messages

# cd /var/log

[root@SGRAC1 log]# tail messages
Jul 25 11:48:21 SGRAC1 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.5380.
Jul 25 11:49:21 SGRAC1 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.5380.
Jul 25 11:49:21 SGRAC1 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.5301.
Jul 25 11:49:21 SGRAC1 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.5559.
Jul 25 11:50:21 SGRAC1 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.5559.
Jul 25 11:50:21 SGRAC1 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.5380.
Jul 25 11:50:21 SGRAC1 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.5301.
Jul 25 11:51:21 SGRAC1 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.5559.
Jul 25 11:51:21 SGRAC1 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.5301.
Jul 25 11:51:21 SGRAC1 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.5380.

2. Check the details of the cluster messages in /tmp/crsctl.5380

[root@SGRAC1 log]# cat /tmp/crsctl.5380
clsscfg_vhinit: unable(1) to open disk (/dev/sdc1) <– this shows that it cannot open the disk
Internal Error Information:
Category: 1234
Operation: scls_block_open
Location: open
Other: open failed /dev/sdc1
Dep: 13
Failure 1 checking the Cluster Synchronization Services voting disk ‘/dev/sdc1’.
Not able to read adequate number of voting disks

3. Check raw /dev/sdc1 status

[root@SGRAC1 log]# ls -l /dev/sdc1
brw-r----- 1 root disk 8, 33 Jul 25 11:29 /dev/sdc1 <– the oracle user has no permission on the device

4. Change owner to oracle

[root@SGRAC1 log]# chown oracle:dba /dev/sdc1

After that, the cluster was back to normal
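Note that ownership set with chown on a device node does not survive a reboot on Red Hat, so a persistent rule is usually needed as well; a sketch assuming a udev-based release and that the device keeps the kernel name sdc1 (the rule file name is only an example):

# cat /etc/udev/rules.d/99-oracle-votedisk.rules
KERNEL=="sdc1", OWNER="oracle", GROUP="dba", MODE="0660"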

Categories: clusterware

Modifying current VIP configuration

May 30, 2011

The current VIP configuration is stored in the OCR (Oracle Cluster Registry). Four parameters can be modified: the VIP hostname, the VIP IP address, the VIP subnet mask, and the interface name. The following is an example of modifying the VIP interface name. For details, refer to support.oracle.com [ID 276434.1].

We are going to change the VIP interface from vnet1 to vnet100001 because a VLAN interface is now used.

1. Check the current VIP configuration

$ srvctl config nodeapps -n tvmdb01 -a
VIP exists.: /tvmdb01-vip/172.31.10.102/255.255.0.0/vnet1

This output shows that:
The VIP Hostname is ‘tvmdb01-vip’
The VIP IP address is ‘172.31.10.102’
The VIP subnet mask is ‘255.255.0.0’
The Interface Name used by the VIP is called ‘vnet1’

2. Stop the instance, ASM, and nodeapps

$ srvctl stop instance -d tdb -i tdb1
$ srvctl stop asm -n tvmdb01
$ srvctl stop nodeapps -n tvmdb01

3. Unplumb vnet1 and plumb vnet100001 in the OS

4. Bring up the vnet100001 interface in the OS with the IP address previously used by vnet1
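On Solaris, steps 3 and 4 would look roughly like this, run as root (the public IP and netmask are placeholders for whatever vnet1 was using):

# ifconfig vnet1 unplumb
# ifconfig vnet100001 plumb
# ifconfig vnet100001 <public_ip_of_vnet1> netmask <netmask> up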

5. su to root and modify the VIP interface

su – root
source /export/home/oracle/.profile
# srvctl modify nodeapps -n tvmdb01 -A 172.31.10.102/255.255.255.0/vnet100001
# srvctl config nodeapps -n tvmdb01 -a
VIP exists.: /tvmdb01-vip/172.31.10.102/255.255.255.0/vnet100001 <– this is changed

6. Start up nodeapps, ASM, and the instances

Categories: clusterware

CRS-0215: Could not start resource ‘ora.tvmdb02.LISTENER_TVMDB02.lsnr’.

May 30, 2011

Symptoms

When starting the nodeapps on a cluster node, it returned the messages below:

$ srvctl start nodeapps -n tvmdb02
tvmdb02:ora.tvmdb02.vip:checkIf: Default gateway is not defined (host=tvmdb02)
tvmdb02:ora.tvmdb02.vip:Interface vnet100001 checked failed (host=tvmdb02)
tvmdb02:ora.tvmdb02.vip:Failed to start VIP 172.31.10.103 (host=tvmdb02)
tvmdb02:ora.tvmdb02.vip:checkIf: Default gateway is not defined (host=tvmdb02)
tvmdb02:ora.tvmdb02.vip:Interface vnet100001 checked failed (host=tvmdb02)
tvmdb02:ora.tvmdb02.vip:Failed to start VIP 172.31.10.103 (host=tvmdb02)
CRS-0215: Could not start resource ‘ora.tvmdb02.LISTENER_TVMDB02.lsnr’.

Checking the routing table showed that the default gateway was missing:

$ netstat -nr
Routing Table: IPv4
Destination           Gateway           Flags  Ref     Use     Interface
-------------------- -------------------- ----- ----- ---------- ---------
10.0.0.0             10.1.2.201           U         1        130 vnet2
172.31.10.0          172.31.10.101        U         1         17 vnet100001
192.168.77.0         192.168.77.221       U         1          2 vnet0
127.0.0.1            127.0.0.1            UH       28    3831355 lo0

Fix

Add the default gateway to the system as root:

#  route add default 172.31.10.254
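To make the default route persist across reboots on Solaris, it can also be recorded in /etc/defaultrouter:

# echo "172.31.10.254" > /etc/defaultrouter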

Restart the nodeapps, which now succeeds:

$  srvctl start nodeapps -n tvmdb02

Categories: clusterware