DBA의 정석

How to Test Failover on Oracle RAC 10g

 

Checking the System Network Configuration

 

DB Server Network

/etc/hosts

kblotdb1@oracle10:/home2/oracle10/work>cat /etc/hosts

# @(#)B.11.11_LRhosts $Revision: 1.9.214.1 $ $Date: 96/10/08 13:20:01 $

#

# The form for each entry is:

# <internet address> <official hostname> <aliases>

#

# For example:

# 192.1.2.34 hpfcrm loghost

#

# See the hosts(4) manual page for more information.

# Note: The entries cannot be preceded by a space.

# The format described in this file is the correct format.

# The original Berkeley manual page contains an error in

# the format description.

#

 

127.0.0.1 localhost loopback

10.55.50.201 kblotdb1

10.55.50.202 kblotdb2

10.55.49.206 kblotdb1_int

10.55.49.207 kblotdb2_int

10.55.50.208 kblotdb1_vip

10.55.50.209 kblotdb2_vip

 

netstat

kblotdb1@oracle10:/home2/oracle10>netstat -in

Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll

lan3* 1500 none none 0 0 0 0 0

lan2 1500 10.55.49.0 10.55.49.206 11357335 0 11975506 0 0

lan1 1500 10.55.50.0 10.55.50.201 6916147 0 13666471 0 0

lan0 1500 192.1.1.0 192.1.1.1 1044011 0 1973741 0 0

lo0 4136 127.0.0.0 127.0.0.1 34678497 0 34678513 0 0

lan4* 1500 none none 0 0 0 0 0

lan1:1 1500 10.55.50.0 10.55.50.208 7737385 0 2486558 0 0

modify nodeapps

If lan1 fails, lan3 takes over the IP address of lan1, so the Oracle VIP must also carry the lan3 interface information. The following change is required (it must be run as the root user).

/home2/oracle10/bin/srvctl modify nodeapps -n kblotdb1 -o /home2/oracle10 -A kblotdb1_vip/255.255.255.0/lan1\|lan3

/home2/oracle10/bin/srvctl modify nodeapps -n kblotdb2 -o /home2/oracle10 -A kblotdb2_vip/255.255.255.0/lan1\|lan3
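The two commands above differ only in the node name and the VIP host name, so they can be generated per node. The following is a minimal sketch, assuming Python is available on an administration host; the helper name `nodeapps_vip_args` is hypothetical, and all values are copied from the commands above. Building an argument list avoids the `\|` shell escaping the typed commands need.

```python
def nodeapps_vip_args(node, oracle_home, vip_host, netmask, interfaces):
    """Build the argument list for `srvctl modify nodeapps` (hypothetical helper).

    interfaces is an ordered list such as ["lan1", "lan3"]; the -A value
    joins them with "|", as in the commands shown in this article.
    """
    if not interfaces:
        raise ValueError("at least one interface is required")
    vip_spec = "/".join([vip_host, netmask, "|".join(interfaces)])
    return [
        oracle_home + "/bin/srvctl", "modify", "nodeapps",
        "-n", node,
        "-o", oracle_home,
        "-A", vip_spec,
    ]


# Node name -> VIP host name, as listed in /etc/hosts in this article.
NODES = {"kblotdb1": "kblotdb1_vip", "kblotdb2": "kblotdb2_vip"}

commands = [
    nodeapps_vip_args(node, "/home2/oracle10", vip, "255.255.255.0", ["lan1", "lan3"])
    for node, vip in sorted(NODES.items())
]

for args in commands:
    # Printed for review only; actually running them (as root) is left to the operator.
    print(" ".join(args))
```

The list form can be handed to `subprocess.run` without a shell, so the `|` between interface names is passed through literally.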

 

 

 

Cluster Interconnect

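The interconnect can be inferred from the configuration above, but it can also be confirmed by connecting as sysdba, running the following commands in SQL, and reading the trace file they generate. The trace file is created in the udump directory.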

SQL> oradebug setmypid

SQL> oradebug ipc

SSKGXPT 0x275efc flags SSKGXPT_READPENDING info for network 0

socket no 8 IP 10.55.49.206 UDP 54216

sflags SSKGXPT_UP

info for network 1

socket no 0 IP 0.0.0.0 UDP 0

sflags SSKGXPT_DOWN

context timestamp 0

no ports

sconno accono ertt state seq# sent async sync rtrans acks

ach accono sconno admno state seq# rcv rtrans acks

 

SSKGXPT 0x275fb4 flags SSKGXPT_READPENDING info for network 0

socket no 8 IP 10.55.49.207 UDP 51946

sflags SSKGXPT_UP

info for network 1

socket no 0 IP 0.0.0.0 UDP 0

sflags SSKGXPT_DOWN

context timestamp 0

no ports

sconno accono ertt state seq# sent async sync rtrans acks

ach accono sconno admno state seq# rcv rtrans acks
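Reading the trace by eye works for two nodes, but the same check can be scripted. The following is a rough sketch, assuming the trace keeps the line layout shown above (`info for network N`, then `socket no ... IP ... UDP ...`, then `sflags ...`); the function name `parse_ipc_trace` is hypothetical and the sample text is copied from the first trace block above.

```python
import re

NETWORK_RE = re.compile(r"info for network (\d+)")
SOCKET_RE = re.compile(r"socket no (\d+)\s+IP (\S+)\s+UDP (\d+)")
SFLAGS_RE = re.compile(r"sflags (\S+)")


def parse_ipc_trace(text):
    """Return one dict per 'info for network N' entry found in the trace text."""
    networks = []
    current = None
    for line in text.splitlines():
        m = NETWORK_RE.search(line)
        if m:
            current = {"network": int(m.group(1))}
            networks.append(current)
            continue
        if current is None:
            continue
        m = SOCKET_RE.search(line)
        if m:
            current["ip"] = m.group(2)
            current["udp_port"] = int(m.group(3))
            continue
        m = SFLAGS_RE.search(line)
        if m:
            current["up"] = m.group(1) == "SSKGXPT_UP"
    return networks


SAMPLE = """\
SSKGXPT 0x275efc flags SSKGXPT_READPENDING info for network 0
socket no 8 IP 10.55.49.206 UDP 54216
sflags SSKGXPT_UP
info for network 1
socket no 0 IP 0.0.0.0 UDP 0
sflags SSKGXPT_DOWN
"""

networks = parse_ipc_trace(SAMPLE)
active_ips = [n["ip"] for n in networks if n.get("up")]
print(active_ips)
```

For the sample, the only address flagged as up is 10.55.49.206, which is the kblotdb1_int entry in /etc/hosts, so the instance is using lan2 as its interconnect.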

 

 

 

REMOTE_LISTENER

Before the test

Before the failure test, the listener information on the server was as follows:

kblotdb1@oracle10:/home2/oracle10/admin/dslot/udump>lsnrctl ser LISTENER_KBLOTDB1

 

LSNRCTL for HPUX: Version 10.1.0.4.0 - Production on 21-SEP-2005 14:59:44

 

Copyright (c) 1991, 2004, Oracle. All rights reserved.

 

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=kblotdb1_vip)(PORT=1521)))

Services Summary...

Service "PLSExtProc" has 1 instance(s).

Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...

Handler(s):

"DEDICATED" established:0 refused:0

LOCAL SERVER

Service "dslot" has 2 instance(s).

Instance "dslot1", status READY, has 2 handler(s) for this service...

Handler(s):

"DEDICATED" established:481 refused:0 state:ready

LOCAL SERVER

"DEDICATED" established:0 refused:0 state:ready

REMOTE SERVER

(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=kblotdb1_vip)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=dslot)(INSTANCE_NAME=dslot1)))

Instance "dslot2", status READY, has 1 handler(s) for this service...

Handler(s):

"DEDICATED" established:131 refused:0 state:ready

REMOTE SERVER

(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=kblotdb2_vip)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=dslot)(INSTANCE_NAME=dslot2)))

The command completed successfully

 

The reason the REMOTE SERVER handlers are registered becomes clear from the following init.ora and tnsnames.ora contents. (An spfile is not currently in use.)

remote_listener=LISTENERS_DSLOT

dslot1.local_listener ='LOCAL_DSLOT1'

dslot2.local_listener ='LOCAL_DSLOT2'

 

LOCAL_DSLOT2 =

(DESCRIPTION =

(ADDRESS = (PROTOCOL = TCP)(HOST = kblotdb2_vip)(PORT = 1521))

(CONNECT_DATA =

(SERVER = DEDICATED)

(SERVICE_NAME = dslot)

(INSTANCE_NAME = dslot2)

)

)

 

LOCAL_DSLOT1 =

(DESCRIPTION =

(ADDRESS = (PROTOCOL = TCP)(HOST = kblotdb1_vip)(PORT = 1521))

(CONNECT_DATA =

(SERVER = DEDICATED)

(SERVICE_NAME = dslot)

(INSTANCE_NAME = dslot1)

)

)

 

LISTENERS_DSLOT =

(ADDRESS_LIST =

(ADDRESS = (PROTOCOL = TCP)(HOST = kblotdb1_vip)(PORT = 1521))

(ADDRESS = (PROTOCOL = TCP)(HOST = kblotdb2_vip)(PORT = 1521))

)

 

 

REMOTE_LISTENER

When REMOTE_LISTENER is defined, connection load balancing takes place on the server side, so a DB connection may be established to an instance the client did not intend.

 

 

Test setup

For the test, init.ora and tnsnames.ora were therefore configured as follows so that REMOTE_LISTENER is not used.

#remote_listener=LISTENERS_DSLOT

dslot1.local_listener ='LISTENER_DSLOT1'

dslot2.local_listener ='LISTENER_DSLOT2'

 

LISTENER_DSLOT1 =

(ADDRESS_LIST =

(ADDRESS = (PROTOCOL = TCP)(HOST = kblotdb1_vip)(PORT = 1521))

)

 

LISTENER_DSLOT2 =

(ADDRESS_LIST =

(ADDRESS = (PROTOCOL = TCP)(HOST = kblotdb2_vip)(PORT = 1521))

)

 


With this configuration, the listener information on each server is as follows:

kblotdb1@oracle10:/home2/oracle10/dbs>lsnrctl ser LISTENER_KBLOTDB1

 

LSNRCTL for HPUX: Version 10.1.0.4.0 - Production on 21-SEP-2005 15:42:03

 

Copyright (c) 1991, 2004, Oracle. All rights reserved.

 

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=kblotdb1_vip)(PORT=1521)))

Services Summary...

Service "PLSExtProc" has 1 instance(s).

Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...

Handler(s):

"DEDICATED" established:0 refused:0

LOCAL SERVER

Service "dslot" has 1 instance(s).

Instance "dslot1", status READY, has 1 handler(s) for this service...

Handler(s):

"DEDICATED" established:22 refused:0 state:ready

LOCAL SERVER

The command completed successfully

 

kblotdb2|/home2/oracle10/dbs> lsnrctl ser LISTENER_KBLOTDB2

 

LSNRCTL for HPUX: Version 10.1.0.4.0 - Production on 21-SEP-2005 15:46:24

 

Copyright (c) 1991, 2004, Oracle. All rights reserved.

 

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=kblotdb2_vip)(PORT=1521)))

Services Summary...

Service "PLSExtProc" has 1 instance(s).

Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...

Handler(s):

"DEDICATED" established:0 refused:0

LOCAL SERVER

Service "dslot" has 1 instance(s).

Instance "dslot2", status READY, has 1 handler(s) for this service...

Handler(s):

"DEDICATED" established:40 refused:0 state:ready

LOCAL SERVER

The command completed successfully
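The difference between the two configurations is whether any REMOTE SERVER handler appears under the dslot service. The following is a small sketch of that check, assuming the `lsnrctl ser` output keeps the layout shown above; the function names `count_handlers` and `remote_handlers` are hypothetical, and the sample text is abbreviated from the listener output on kblotdb1 before and after the change.

```python
import re

# \W* tolerates straight or typographic quotes around the names.
SERVICE_RE = re.compile(r"^Service\s+\W*(\w+)\W*\s+has")
INSTANCE_RE = re.compile(r"^Instance\s+\W*(\w+)\W*,\s*status\s+(\w+)")


def count_handlers(text):
    """Count LOCAL SERVER / REMOTE SERVER handlers per (service, instance)."""
    counts = {}
    service = instance = None
    for raw in text.splitlines():
        line = raw.strip()
        m = SERVICE_RE.match(line)
        if m:
            service, instance = m.group(1), None
            continue
        m = INSTANCE_RE.match(line)
        if m:
            instance = m.group(1)
            counts[(service, instance)] = {"LOCAL": 0, "REMOTE": 0}
            continue
        if instance and line in ("LOCAL SERVER", "REMOTE SERVER"):
            counts[(service, instance)][line.split()[0]] += 1
    return counts


def remote_handlers(counts):
    """Return only the entries that still have at least one remote handler."""
    return {key: c["REMOTE"] for key, c in counts.items() if c["REMOTE"] > 0}


BEFORE = """
Service "dslot" has 2 instance(s).
Instance "dslot1", status READY, has 2 handler(s) for this service...
Handler(s):
"DEDICATED" established:481 refused:0 state:ready
LOCAL SERVER
"DEDICATED" established:0 refused:0 state:ready
REMOTE SERVER
Instance "dslot2", status READY, has 1 handler(s) for this service...
Handler(s):
"DEDICATED" established:131 refused:0 state:ready
REMOTE SERVER
"""

AFTER = """
Service "dslot" has 1 instance(s).
Instance "dslot1", status READY, has 1 handler(s) for this service...
Handler(s):
"DEDICATED" established:22 refused:0 state:ready
LOCAL SERVER
"""

before = remote_handlers(count_handlers(BEFORE))
after = remote_handlers(count_handlers(AFTER))
print(before)
print(after)
```

An empty result for the second sample matches the intent of the test setup: with remote_listener commented out, each listener only hands out connections to its own local instance.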

 

 

RAC 10g Failover Tests

 

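Forced Termination of an Oracle Instance

When an arbitrary instance in the RAC is forcibly terminated, the client WebLogic service must fail over to a surviving RAC instance.

Immediately after the instance on node 1 was forcibly terminated, the alert.log on node 2 showed the following: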

Wed Sep 21 16:31:31 2005

Reconfiguration started (old inc 5, new inc 6)

List of nodes:

1

Global Resource Directory frozen

* dead instance detected - domain 0 invalid = TRUE

Update rdomain variables

Communication channels reestablished

Master broadcasted resource hash value bitmaps

Non-local Process blocks cleaned out

Wed Sep 21 16:31:31 2005

LMS 1: 0 GCS shadows cancelled, 0 closed

Wed Sep 21 16:31:31 2005

LMS 0: 0 GCS shadows cancelled, 0 closed

Set master node info

Submitted all remote-enqueue requests

Dwn-cvts replayed, VALBLKs dubious

All grantable enqueues granted

Post SMON to start 1st pass IR

Wed Sep 21 16:31:32 2005

LMS 1: 2988 GCS shadows traversed, 0 replayed

Wed Sep 21 16:31:32 2005

LMS 0: 2871 GCS shadows traversed, 0 replayed

Wed Sep 21 16:31:32 2005

Submitted all GCS remote-cache requests

Post SMON to start 1st pass IR

Fix write in gcs resources

Wed Sep 21 16:31:32 2005

Instance recovery: looking for dead threads

Wed Sep 21 16:31:32 2005

Beginning instance recovery of 1 threads

Reconfiguration complete

Wed Sep 21 16:31:33 2005

Started redo scan

Wed Sep 21 16:31:33 2005

Completed redo scan

240 redo blocks read, 104 data blocks need recovery

Wed Sep 21 16:31:33 2005

Started redo application at

Thread 1: logseq 7, block 1392, scn 0.0

Wed Sep 21 16:31:33 2005

Recovery of Online Redo Log: Thread 1 Group 2 Seq 7 Reading mem 0

Mem# 0 errs 0: /dev/kblotdb_vgdb01/rredo112.dbf

Mem# 1 errs 0: /dev/kblotdb_vgdb02/rredo212.dbf

Wed Sep 21 16:31:33 2005

Completed redo application

Wed Sep 21 16:31:34 2005

Completed instance recovery at

Thread 1: logseq 7, block 1632, scn 0.3659612

84 data blocks read, 128 data blocks written, 240 redo blocks read
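The alert.log excerpt above can be reduced to a single figure: how long the surviving instance took from the start of reconfiguration to the end of instance recovery. The following is a minimal sketch, assuming the timestamp-line format shown above and the default C locale for day and month names; the function name `event_times` is hypothetical and the sample is abbreviated from the log above. The timestamps have one-second resolution, so the result is approximate.

```python
from datetime import datetime

TS_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. "Wed Sep 21 16:31:31 2005"


def event_times(text, markers):
    """Map each marker to the most recent timestamp line seen before it."""
    current = None
    found = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        try:
            current = datetime.strptime(line, TS_FORMAT)
            continue
        except ValueError:
            pass  # not a timestamp line, treat it as a message
        for marker in markers:
            if line.startswith(marker) and marker not in found:
                found[marker] = current
    return found


SAMPLE = """\
Wed Sep 21 16:31:31 2005
Reconfiguration started (old inc 5, new inc 6)
Wed Sep 21 16:31:32 2005
Beginning instance recovery of 1 threads
Reconfiguration complete
Wed Sep 21 16:31:33 2005
Completed redo application
Wed Sep 21 16:31:34 2005
Completed instance recovery at
"""

found = event_times(
    SAMPLE,
    ["Reconfiguration started", "Reconfiguration complete", "Completed instance recovery"],
)
elapsed = (found["Completed instance recovery"] - found["Reconfiguration started"]).total_seconds()
print(elapsed)
```

For this excerpt the interval is about three seconds, which reflects the small amount of redo (240 blocks) that had to be applied.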

 

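DB Server Shutdown

When a failure is induced on an arbitrary DB server in the RAC, the client WebLogic service must fail over to a surviving server (instance) in the RAC.

After the DB server of node 1 failed, the logs on node 2 showed the following: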

$ORA_CRS_HOME/css/log/ocssd2.log

2005-09-22 02:08:52.076 [4] >WARNING: clssnmeventhndlr: Receive failure with node 1, rc=11

2005-09-22 02:08:52.441 [3] >TRACE: clssnm_skgxncheck: CSS daemon failed on node 1

2005-09-22 02:08:55.330 [8] >WARNING: clssnmPollingThread: node(1) missed(4) checkin(s)

2005-09-22 02:08:56.340 [8] >WARNING: clssnmPollingThread: node(1) missed(5) checkin(s)

2005-09-22 02:08:57.350 [8] >WARNING: clssnmPollingThread: Eviction started for node 1, flags 0x0001, state 3, wt4c 0

2005-09-22 02:09:02.402 [8] >TRACE: clssnmDoSyncUpdate: Initiating sync 15

2005-09-22 02:09:02.402 [4] >TRACE: clssnmHandleSync: Acknowledging sync: src[2] seq[1] sync[15]

2005-09-22 02:09:02.871 [1] >USER: NMEVENT_SUSPEND [00][00][00][04]

2005-09-22 02:09:06.441 [8] >TRACE: clssnmEvict: Evicting node 1, birth 10, death 0, killme 1

2005-09-22 02:09:06.443 [4] >USER: clssnmHandleUpdate: SYNC(15) from node(2) completed

2005-09-22 02:09:06.443 [4] >USER: clssnmHandleUpdate: NODE(2) IS ACTIVE MEMBER OF CLUSTER

2005-09-22 02:09:06.911 [13] >USER: NMEVENT_RECONFIG [00][00][00][04]

2005-09-22 02:09:06.911 [13] >TRACE: clssgmCleanupGrocks: cleaning up grock DBDSLOT type 2

2005-09-22 02:09:06.912 [13] >TRACE: clssgmCleanupGrocks: cleaning up grock DGDSLOT type 2

2005-09-22 02:09:06.912 [13] >TRACE: clssgmCleanupGrocks: cleaning up grock DAALL_DB type 2

2005-09-22 02:09:06.912 [13] >TRACE: clssgmCleanupGrocks: cleaning up grock ocr_crs type 2

2005-09-22 02:09:06.912 [13] >TRACE: clssgmCleanupGrocks: cleaning up grock IGDSLOTALL type 2

2005-09-22 02:09:06.912 [13] >TRACE: clssgmCleanupGrocks: cleaning up grock RES ora.dslot.dslot.dslot2.srv type 3

2005-09-22 02:09:06.912 [13] >TRACE: clssgmEstablishConnections: 1 nodes in cluster incarn 15

2005-09-22 02:09:06.912 [7] >TRACE: clssgmPeerListener: connects done (1/1)

CLSS-3000: reconfiguration successful, incarnation 15 with 1 nodes

 

CLSS-3001: local node number 2, master node number 2

 

2005-09-22 02:09:06.985 [13] >TRACE: clssnmpostev: leave event posted, node 1

$ORA_CRS_HOME/crs/log/kblotdb2.log

2005-09-22 02:09:07.001: Processing MemberLeave

2005-09-22 02:09:07.001: [MEMBERLEAVE:717] Processing member leave for kblotdb1, incarnation: 15

2005-09-22 02:09:07.217: [RESOURCE:717] Not failing resource ora.dslot.dslot.dslot2.srv because it was locked.

2005-09-22 02:09:07.218: [RESOURCE:717] X_RES_Unavailable : Resource ora.dslot.dslot.dslot2.srv is locked

(File: rti.cpp, line: 812)

2005-09-22 02:09:07.351: Attempting to start ora.kblotdb1.vip on member kblotdb2

2005-09-22 02:09:35.059: Start of ora.kblotdb1.vip on member kblotdb2 succeeded.

2005-09-22 02:09:35.194: Attempting to start ora.dslot.dslot.cs on member kblotdb2

2005-09-22 02:09:35.755: Start of ora.dslot.dslot.cs on member kblotdb2 succeeded.

2005-09-22 02:09:35.865: Attempting to start ora.dslot.db on member kblotdb2

2005-09-22 02:09:36.319: Start of ora.dslot.db on member kblotdb2 succeeded.

2005-09-22 02:09:36.323: [MEMBERLEAVE:717] Do failover for: kblotdb1

2005-09-22 02:09:36.324: [MEMBERLEAVE:717] Post recovery done evmd event for: kblotdb1
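The CSS and CRS logs above share a millisecond timestamp prefix, so the time from the first sign of trouble to the VIP coming up on the surviving node can be computed directly. The following is a rough sketch, assuming both logs keep the prefix formats shown above; the function name `first_match` is hypothetical and the sample lines are copied from the two logs.

```python
import re
from datetime import datetime

# Matches both "2005-09-22 02:08:52.076 [4] >WARNING: ..." (CSS log)
# and "2005-09-22 02:09:07.351: Attempting ..." (CRS log).
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})[: ]\s*(.*)$")


def first_match(lines, needle):
    """Return the timestamp of the first log line whose message contains needle."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and needle in m.group(2):
            return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
    return None


LINES = [
    "2005-09-22 02:08:52.076 [4] >WARNING: clssnmeventhndlr: Receive failure with node 1, rc=11",
    "2005-09-22 02:09:06.441 [8] >TRACE: clssnmEvict: Evicting node 1, birth 10, death 0, killme 1",
    "2005-09-22 02:09:07.351: Attempting to start ora.kblotdb1.vip on member kblotdb2",
    "2005-09-22 02:09:35.059: Start of ora.kblotdb1.vip on member kblotdb2 succeeded.",
    "2005-09-22 02:09:36.319: Start of ora.dslot.db on member kblotdb2 succeeded.",
]

detected = first_match(LINES, "Receive failure with node 1")
vip_attempt = first_match(LINES, "Attempting to start ora.kblotdb1.vip")
vip_up = first_match(LINES, "Start of ora.kblotdb1.vip")

detect_to_vip = (vip_up - detected).total_seconds()
vip_start_secs = (vip_up - vip_attempt).total_seconds()
print(detect_to_vip, vip_start_secs)
```

In this run roughly 43 seconds pass between the first receive failure and the node 1 VIP being online on kblotdb2, and most of that (about 28 seconds) is spent starting the VIP itself.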

netstat

kblotdb2|/home2/oracle10/work> netstat -in

Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll

lan3* 1500 none none 0 0 0 0 0

lan2 1500 10.55.49.0 10.55.49.207 12518382 0 12044400 0 0

lan1:1 1500 10.55.50.0 10.55.50.209 6161297 0 1696506 0 0

lan1 1500 10.55.50.0 10.55.50.202 12894636 0 22979733 0 0

lan0* 1500 192.1.1.0 192.1.1.2 2111713 0 1247138 0 0

lo0 4136 127.0.0.0 127.0.0.1 36147569 0 36147578 0 0

lan1:2 1500 10.55.50.0 10.55.50.208 1575 0 169 0 0

lan4* 1500 none none 0 0 0 0 0

 

DB Server Network Failure

Public LAN Failure

Under normal conditions, the network state of node 2 is as follows:

kblotdb2|/home2/oracle10> netstat -in

Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll

lan3* 1500 none none 0 0 0 0 0

lan2 1500 10.55.49.0 10.55.49.207 12518881 0 12044830 0 0

lan1:1 1500 10.55.50.0 10.55.50.209 6485707 0 1702063 0 0

lan1 1500 10.55.50.0 10.55.50.202 12911405 0 23327953 0 0

lan0 1500 192.1.1.0 192.1.1.2 2112194 0 1247528 0 0

lo0 4136 127.0.0.0 127.0.0.1 36181974 0 36181983 0 0

lan1:2 1500 10.55.50.0 10.55.50.208 2914 0 272 0 0

lan4* 1500 none none 0 0 0 0 0

 

When lan1, the public LAN on node 2, is switched over, the state changes as follows:

kblotdb2|/home2/oracle10> netstat -in

Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll

lan3 1500 10.55.50.0 10.55.50.202 1346 0 3331 0 0

lan2 1500 10.55.49.0 10.55.49.207 102765 0 102573 0 0

lan1* 1500 none none 13257 0 22091 0 0

lan0 1500 192.1.1.0 192.1.1.2 3766 0 6839 0 0

lo0 4136 127.0.0.0 127.0.0.1 140637 0 140637 0 0

lan3:1 1500 10.55.50.0 10.55.50.209 1 0 0 0 0

lan4* 1500 none none 2621 0 2677 0 0

Once lan1 on node 2 is restored, the configuration returns to its original state, as shown below:

kblotdb2|/home2/oracle10> netstat -in

Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll

lan3* 1500 none none 3106 0 7692 0 0

lan2 1500 10.55.49.0 10.55.49.207 107982 0 107628 0 0

lan1:1 1500 10.55.50.0 10.55.50.209 12 0 0 0 0

lan1 1500 10.55.50.0 10.55.50.202 13752 0 22982 0 0

lan0 1500 192.1.1.0 192.1.1.2 4122 0 7513 0 0

lo0 4136 127.0.0.0 127.0.0.1 152489 0 152489 0 0

lan4* 1500 none none 2621 0 2677 0 0
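Comparing `netstat -in` snapshots by eye is error-prone, so the address-to-interface mapping can be diffed instead. The following is a minimal sketch, assuming the nine-column HP-UX layout shown above; the function names `parse_netstat_in` and `moved_addresses` are hypothetical, and the sample rows are taken from the normal and lan1-failure outputs above.

```python
def parse_netstat_in(text):
    """Map each configured address to the interface name that holds it."""
    table = {}
    for raw in text.splitlines():
        fields = raw.split()
        # Data rows have 9 columns: Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
        if len(fields) != 9 or fields[0] == "Name":
            continue
        name, address = fields[0], fields[3]
        if address == "none":
            continue  # interface without an address
        table[address] = name.rstrip("*")  # drop the trailing "*" marker
    return table


def moved_addresses(before, after):
    """Return {address: (old_interface, new_interface)} for addresses that moved."""
    return {
        addr: (before[addr], after[addr])
        for addr in before
        if addr in after and before[addr] != after[addr]
    }


NORMAL = """\
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
lan3* 1500 none none 0 0 0 0 0
lan1:1 1500 10.55.50.0 10.55.50.209 6485707 0 1702063 0 0
lan1 1500 10.55.50.0 10.55.50.202 12911405 0 23327953 0 0
"""

LAN1_FAILED = """\
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll
lan3 1500 10.55.50.0 10.55.50.202 1346 0 3331 0 0
lan1* 1500 none none 13257 0 22091 0 0
lan3:1 1500 10.55.50.0 10.55.50.209 1 0 0 0 0
"""

moved = moved_addresses(parse_netstat_in(NORMAL), parse_netstat_in(LAN1_FAILED))
print(moved)
```

For these samples both the public address 10.55.50.202 and the VIP 10.55.50.209 move from lan1 to lan3, which is the behaviour the `lan1|lan3` setting in `srvctl modify nodeapps` is meant to allow.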

 

 

cluster_interconnect LAN Failure

When lan2, the cluster_interconnect LAN on node 2, is switched over, the state is as follows:

kblotdb2|/home2/oracle10> netstat -in

Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Coll

lan3 1500 10.55.49.0 10.55.49.207 5519 0 10620 0 0

lan2* 1500 none none 108070 0 107778 0 0

lan1:1 1500 10.55.50.0 10.55.50.209 487 0 13 0 0

lan1 1500 10.55.50.0 10.55.50.202 15595 0 24175 0 0

lan0 1500 192.1.1.0 192.1.1.2 4354 0 7953 0 0

lo0 4136 127.0.0.0 127.0.0.1 160463 0 160463 0 0

lan4* 1500 none none 2621 0 2677 0 0
