System Network Configuration Check
DB Server Network
/etc/hosts
kblotdb1@oracle10:/home2/oracle10/work>cat /etc/hosts
# @(#)B.11.11_LRhosts $Revision: 1.9.214.1 $ $Date: 96/10/08 13:20:01 $
#
# The form for each entry is:
# <internet address>    <official hostname>    <aliases>
#
# For example:
# 192.1.2.34    hpfcrm    loghost
#
# See the hosts(4) manual page for more information.
# Note: The entries cannot be preceded by a space.
# The format described in this file is the correct format.
# The original Berkeley manual page contains an error in
# the format description.
#
127.0.0.1       localhost       loopback
10.55.50.201    kblotdb1
10.55.50.202    kblotdb2
10.55.49.206    kblotdb1_int
10.55.49.207    kblotdb2_int
10.55.50.208    kblotdb1_vip
10.55.50.209    kblotdb2_vip
- kblotdb1/kblotdb2 are the public IPs, kblotdb1_int/kblotdb2_int are the cluster interconnect addresses, and kblotdb1_vip/kblotdb2_vip are the Oracle VIPs.
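Before going further, it is worth confirming that every address in /etc/hosts resolves and answers on both nodes. The loop below is only a minimal sketch using the hostnames listed above; note that each VIP answers only on the node that currently owns it, and the ping count syntax shown is the HP-UX form.

# Check the local hosts entries and basic reachability (run on each node).
for h in kblotdb1 kblotdb2 kblotdb1_int kblotdb2_int kblotdb1_vip kblotdb2_vip
do
    echo "== $h =="
    grep -w "$h" /etc/hosts      # entry present in the local hosts file?
    ping "$h" -n 2               # HP-UX: send two probes; adjust on other platforms
done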
netstat
kblotdb1@oracle10:/home2/oracle10>netstat -in
Name      Mtu   Network       Address         Ipkts      Ierrs  Opkts      Oerrs  Coll
lan3*     1500  none          none            0          0      0          0      0
lan2      1500  10.55.49.0    10.55.49.206    11357335   0      11975506   0      0
lan1      1500  10.55.50.0    10.55.50.201    6916147    0      13666471   0      0
lan0      1500  192.1.1.0     192.1.1.1       1044011    0      1973741    0      0
lo0       4136  127.0.0.0     127.0.0.1       34678497   0      34678513   0      0
lan4*     1500  none          none            0          0      0          0      0
lan1:1    1500  10.55.50.0    10.55.50.208    7737385    0      2486558    0      0
- Both servers use the 10.55.50.x addresses on lan1 as their public IPs and the 10.55.49.x addresses on lan2 as the cluster interconnect.
- The Oracle VIP is configured on top of the public interface lan1 (lan1:1).
- lan3 and lan4 are configured as standby interfaces, providing redundancy (see the sketch after this list).
- lan0 is configured as the heartbeat for the HP MC/SG cluster.
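The layout described above can be cross-checked directly on the servers. This is only a sketch assuming the standard HP-UX tools; an interface that is still in standby typically reports no address (or is not plumbed) until a failover happens.

# List every LAN interface and its hardware state (HP-UX).
lanscan

# Show the interfaces that currently carry the public IP and the Oracle VIP.
ifconfig lan1
ifconfig lan1:1

# Standby NICs (lan3/lan4) normally carry no IP address at this point;
# ifconfig may report them as not plumbed until a failover occurs.
ifconfig lan3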
modify nodeapps
If lan1 fails, lan3 takes over lan1's IP address, so the Oracle VIP must also know about lan3. The following commands are required (they must be run as the root user):
/home2/oracle10/bin/srvctl modify nodeapps -n kblotdb1 -o /home2/oracle10 -A kblotdb1_vip/255.255.255.0/lan1\|lan3
/home2/oracle10/bin/srvctl modify nodeapps -n kblotdb2 -o /home2/oracle10 -A kblotdb2_vip/255.255.255.0/lan1\|lan3
- The database and the nodeapps must both be stopped before making this change (see MetaLink Note 296874.1 for details); a sketch of the full sequence follows.
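The exact sequence is not spelled out above, so the following is only a sketch of the stop / modify / verify / start cycle. It assumes the database is registered with CRS under the name dslot (consistent with the instance names dslot1/dslot2 used elsewhere in this document); the modify step is the one that must be run as root.

# 1) Stop the database and the nodeapps on both nodes (as the oracle user).
/home2/oracle10/bin/srvctl stop database -d dslot
/home2/oracle10/bin/srvctl stop nodeapps -n kblotdb1
/home2/oracle10/bin/srvctl stop nodeapps -n kblotdb2

# 2) Change the VIP definition (as root) - these are the commands shown above.
/home2/oracle10/bin/srvctl modify nodeapps -n kblotdb1 -o /home2/oracle10 -A kblotdb1_vip/255.255.255.0/lan1\|lan3
/home2/oracle10/bin/srvctl modify nodeapps -n kblotdb2 -o /home2/oracle10 -A kblotdb2_vip/255.255.255.0/lan1\|lan3

# 3) Verify the new VIP/interface list, then restart everything (as oracle).
/home2/oracle10/bin/srvctl config nodeapps -n kblotdb1 -a
/home2/oracle10/bin/srvctl config nodeapps -n kblotdb2 -a
/home2/oracle10/bin/srvctl start nodeapps -n kblotdb1
/home2/oracle10/bin/srvctl start nodeapps -n kblotdb2
/home2/oracle10/bin/srvctl start database -d dslot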
Cluster Interconnect
Although this is already apparent from the configuration above, it can also be verified by examining the trace file produced by the following commands in a session connected as sysdba. The trace file is written to udump.
SQL> oradebug setmypid
SQL> oradebug ipc
SSKGXPT 0x275efc flags SSKGXPT_READPENDING
info for network 0
        socket no 8     IP 10.55.49.206         UDP 54216
        sflags SSKGXPT_UP
info for network 1
        socket no 0     IP 0.0.0.0      UDP 0
        sflags SSKGXPT_DOWN
context timestamp 0
        no ports
 sconno  accono  ertt  state  seq#  sent  async  sync  rtrans  acks
 ach  accono  sconno  admno  state  seq#  rcv  rtrans  acks

SSKGXPT 0x275fb4 flags SSKGXPT_READPENDING
info for network 0
        socket no 8     IP 10.55.49.207         UDP 51946
        sflags SSKGXPT_UP
info for network 1
        socket no 0     IP 0.0.0.0      UDP 0
        sflags SSKGXPT_DOWN
context timestamp 0
        no ports
 sconno  accono  ertt  state  seq#  sent  async  sync  rtrans  acks
 ach  accono  sconno  admno  state  seq#  rcv  rtrans  acks
- The IP addresses shown for the UDP entries (10.55.49.206 and 10.55.49.207) match the cluster interconnect addresses identified earlier.
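As an alternative to oradebug, the interconnect each instance actually picked up can be queried directly. This is a minimal sketch assuming the gv$cluster_interconnects view available in 10g:

sqlplus -S "/ as sysdba" <<'EOF'
-- Interface name, IP, and where Oracle obtained it (OCR, parameter, OS default)
select inst_id, name, ip_address, is_public, source
from   gv$cluster_interconnects
order  by inst_id;
EOF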
REMOTE_LISTENER Test
Before
Prior to the failover tests, the listener information on each server was as follows:
kblotdb1@oracle10:/home2/oracle10/admin/dslot/udump>lsnrctl ser LISTENER_KBLOTDB1
LSNRCTL for HPUX: Version 10.1.0.4.0 - Production on 21-SEP-2005 14:59:44
Copyright (c) 1991, 2004, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=kblotdb1_vip)(PORT=1521)))
Services Summary...
Service "PLSExtProc" has 1 instance(s).
  Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:0 refused:0
         LOCAL SERVER
Service "dslot" has 2 instance(s).
  Instance "dslot1", status READY, has 2 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:481 refused:0 state:ready
         LOCAL SERVER
      "DEDICATED" established:0 refused:0 state:ready
         REMOTE SERVER
         (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=kblotdb1_vip)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=dslot)(INSTANCE_NAME=dslot1)))
  Instance "dslot2", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:131 refused:0 state:ready
         REMOTE SERVER
         (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=kblotdb2_vip)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=dslot)(INSTANCE_NAME=dslot2)))
The command completed successfully
The reason the REMOTE SERVER handlers are registered can be seen in the init.ora and tnsnames.ora contents below. (An spfile is not currently in use.)
remote_listener=LISTENERS_DSLOT
dslot1.local_listener = 'LOCAL_DSLOT1'
dslot2.local_listener = 'LOCAL_DSLOT2'
LOCAL_DSLOT2 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = kblotdb2_vip)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = dslot)
      (INSTANCE_NAME = dslot2)
    )
  )

LOCAL_DSLOT1 =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = kblotdb1_vip)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = dslot)
      (INSTANCE_NAME = dslot1)
    )
  )

LISTENERS_DSLOT =
  (ADDRESS_LIST =
    (ADDRESS = (PROTOCOL = TCP)(HOST = kblotdb1_vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = kblotdb2_vip)(PORT = 1521))
  )
REMOTE_LISTENER
When REMOTE_LISTENER is defined, connection load balancing is performed on the server side, so client connections can end up on an instance the client did not intend.
- Because BEA WebLogic's connection pool is being used, there is no real need for REMOTE_LISTENER.
- In addition, the CONNECT_DATA section in the tnsnames.ora entries referenced by LOCAL_LISTENER is unnecessary.
Test configuration
Accordingly, init.ora and tnsnames.ora were configured as shown below so that REMOTE_LISTENER is no longer used (a sketch of how to apply the change follows the listings).
#remote_listener=LISTENERS_DSLOT
dslot1.local_listener = 'LISTENER_DSLOT1'
dslot2.local_listener = 'LISTENER_DSLOT2'
LISTENER_DSLOT1 =
  (ADDRESS_LIST =
    (ADDRESS = (PROTOCOL = TCP)(HOST = kblotdb1_vip)(PORT = 1521))
  )

LISTENER_DSLOT2 =
  (ADDRESS_LIST =
    (ADDRESS = (PROTOCOL = TCP)(HOST = kblotdb2_vip)(PORT = 1521))
  )
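Since no spfile is in use, the edited init.ora is only picked up at the next instance startup. The following is a sketch of one way to apply the change and confirm that the REMOTE SERVER handlers have disappeared; it again assumes the CRS database name dslot:

# Bounce each instance so the modified init.ora is re-read.
/home2/oracle10/bin/srvctl stop instance -d dslot -i dslot1
/home2/oracle10/bin/srvctl start instance -d dslot -i dslot1
/home2/oracle10/bin/srvctl stop instance -d dslot -i dslot2
/home2/oracle10/bin/srvctl start instance -d dslot -i dslot2

# Force immediate listener registration, then check the service handlers.
sqlplus -S "/ as sysdba" <<'EOF'
alter system register;
EOF
lsnrctl services LISTENER_KBLOTDB1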
With this configuration, each server's listener information is as follows:
kblotdb1@oracle10:/home2/oracle10/dbs>lsnrctl ser LISTENER_KBLOTDB1
LSNRCTL for HPUX: Version 10.1.0.4.0 - Production on 21-SEP-2005 15:42:03
Copyright (c) 1991, 2004, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=kblotdb1_vip)(PORT=1521)))
Services Summary...
Service "PLSExtProc" has 1 instance(s).
  Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:0 refused:0
         LOCAL SERVER
Service "dslot" has 1 instance(s).
  Instance "dslot1", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:22 refused:0 state:ready
         LOCAL SERVER
The command completed successfully
kblotdb2|/home2/oracle10/dbs> lsnrctl ser LISTENER_KBLOTDB2
LSNRCTL for HPUX: Version 10.1.0.4.0 - Production on 21-SEP-2005 15:46:24
Copyright (c) 1991, 2004, Oracle. All rights reserved.
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=kblotdb2_vip)(PORT=1521)))
Services Summary...
Service "PLSExtProc" has 1 instance(s).
  Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:0 refused:0
         LOCAL SERVER
Service "dslot" has 1 instance(s).
  Instance "dslot2", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:40 refused:0 state:ready
         LOCAL SERVER
The command completed successfully
RAC10g Failover Test
Forced termination of an Oracle instance
If any one instance of the RAC is forcibly terminated, the WebLogic client's service must fail over to the surviving RAC instance.

Immediately after the instance on node 1 was killed, node 2's alert.log showed the following (one way to force such a termination is sketched at the end of this subsection):
Wed Sep 21 16:31:31 2005
Reconfiguration started (old inc 5, new inc 6)
List of nodes: 1
 Global Resource Directory frozen
 * dead instance detected - domain 0 invalid = TRUE
 Update rdomain variables
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
Wed Sep 21 16:31:31 2005
 LMS 1: 0 GCS shadows cancelled, 0 closed
Wed Sep 21 16:31:31 2005
 LMS 0: 0 GCS shadows cancelled, 0 closed
 Set master node info
 Submitted all remote-enqueue requests
 Dwn-cvts replayed, VALBLKs dubious
 All grantable enqueues granted
 Post SMON to start 1st pass IR
Wed Sep 21 16:31:32 2005
 LMS 1: 2988 GCS shadows traversed, 0 replayed
Wed Sep 21 16:31:32 2005
 LMS 0: 2871 GCS shadows traversed, 0 replayed
Wed Sep 21 16:31:32 2005
 Submitted all GCS remote-cache requests
 Post SMON to start 1st pass IR
 Fix write in gcs resources
Wed Sep 21 16:31:32 2005
Instance recovery: looking for dead threads
Wed Sep 21 16:31:32 2005
Beginning instance recovery of 1 threads
 Reconfiguration complete
Wed Sep 21 16:31:33 2005
Started redo scan
Wed Sep 21 16:31:33 2005
Completed redo scan
 240 redo blocks read, 104 data blocks need recovery
Wed Sep 21 16:31:33 2005
Started redo application at
 Thread 1: logseq 7, block 1392, scn 0.0
Wed Sep 21 16:31:33 2005
Recovery of Online Redo Log: Thread 1 Group 2 Seq 7 Reading mem 0
  Mem# 0 errs 0: /dev/kblotdb_vgdb01/rredo112.dbf
  Mem# 1 errs 0: /dev/kblotdb_vgdb02/rredo212.dbf
Wed Sep 21 16:31:33 2005
Completed redo application
Wed Sep 21 16:31:34 2005
Completed instance recovery at
 Thread 1: logseq 7, block 1632, scn 0.3659612
 84 data blocks read, 128 data blocks written, 240 redo blocks read
- Instance recovery of the failed instance completed in roughly 3 seconds.
- The WebLogic 5.1 connection pool, which uses the 10g JDBC thin driver, failed over to the surviving RAC instance.
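The exact method used to kill the instance is not recorded above. For this kind of test, two common ways to force-terminate one instance (again assuming the CRS database name dslot) are sketched below:

# Option 1: abort instance 1 through CRS, from either node.
/home2/oracle10/bin/srvctl stop instance -d dslot -i dslot1 -o abort

# Option 2: abort it directly from a sysdba session on node 1.
sqlplus -S "/ as sysdba" <<'EOF'
shutdown abort
EOF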
DB server shutdown
If any one DB server of the RAC fails, the WebLogic client's service must fail over to the surviving server (instance).

After the node 1 DB server went down, node 2's logs were as follows.
$ORA_CRS_HOME/css/log/ocssd2.log
2005-09-22 02:08:52.076 [4] >WARNING: clssnmeventhndlr: Receive failure with node 1, rc=11
2005-09-22 02:08:52.441 [3] >TRACE: clssnm_skgxncheck: CSS daemon failed on node 1
2005-09-22 02:08:55.330 [8] >WARNING: clssnmPollingThread: node(1) missed(4) checkin(s)
2005-09-22 02:08:56.340 [8] >WARNING: clssnmPollingThread: node(1) missed(5) checkin(s)
2005-09-22 02:08:57.350 [8] >WARNING: clssnmPollingThread: Eviction started for node 1, flags 0x0001, state 3, wt4c 0
2005-09-22 02:09:02.402 [8] >TRACE: clssnmDoSyncUpdate: Initiating sync 15
2005-09-22 02:09:02.402 [4] >TRACE: clssnmHandleSync: Acknowledging sync: src[2] seq[1] sync[15]
2005-09-22 02:09:02.871 [1] >USER: NMEVENT_SUSPEND [00][00][00][04]
2005-09-22 02:09:06.441 [8] >TRACE: clssnmEvict: Evicting node 1, birth 10, death 0, killme 1
2005-09-22 02:09:06.443 [4] >USER: clssnmHandleUpdate: SYNC(15) from node(2) completed
2005-09-22 02:09:06.443 [4] >USER: clssnmHandleUpdate: NODE(2) IS ACTIVE MEMBER OF CLUSTER
2005-09-22 02:09:06.911 [13] >USER: NMEVENT_RECONFIG [00][00][00][04]
2005-09-22 02:09:06.911 [13] >TRACE: clssgmCleanupGrocks: cleaning up grock DBDSLOT type 2
2005-09-22 02:09:06.912 [13] >TRACE: clssgmCleanupGrocks: cleaning up grock DGDSLOT type 2
2005-09-22 02:09:06.912 [13] >TRACE: clssgmCleanupGrocks: cleaning up grock DAALL_DB type 2
2005-09-22 02:09:06.912 [13] >TRACE: clssgmCleanupGrocks: cleaning up grock ocr_crs type 2
2005-09-22 02:09:06.912 [13] >TRACE: clssgmCleanupGrocks: cleaning up grock IGDSLOTALL type 2
2005-09-22 02:09:06.912 [13] >TRACE: clssgmCleanupGrocks: cleaning up grock RES ora.dslot.dslot.dslot2.srv type 3
2005-09-22 02:09:06.912 [13] >TRACE: clssgmEstablishConnections: 1 nodes in cluster incarn 15
2005-09-22 02:09:06.912 [7] >TRACE: clssgmPeerListener: connects done (1/1)
CLSS-3000: reconfiguration successful, incarnation 15 with 1 nodes
CLSS-3001: local node number 2, master node number 2
2005-09-22 02:09:06.985 [13] >TRACE: clssnmpostev: leave event posted, node 1
- The surviving node 2 detected node 1's failure and evicted node 1 from the cluster.
$ORA_CRS_HOME/crs/log/kblotdb2.log
2005-09-22 02:09:07.001: Processing MemberLeave
2005-09-22 02:09:07.001: [MEMBERLEAVE:717] Processing member leave for kblotdb1, incarnation: 15
2005-09-22 02:09:07.217: [RESOURCE:717] Not failing resource ora.dslot.dslot.dslot2.srv because it was locked.
2005-09-22 02:09:07.218: [RESOURCE:717] X_RES_Unavailable : Resource ora.dslot.dslot.dslot2.srv is locked (File: rti.cpp, line: 812)
2005-09-22 02:09:07.351: Attempting to start
2005-09-22 02:09:35.059: Start of
2005-09-22 02:09:35.194: Attempting to start
2005-09-22 02:09:35.755: Start of
2005-09-22 02:09:35.865: Attempting to start
2005-09-22 02:09:36.319: Start of
2005-09-22 02:09:36.323: [MEMBERLEAVE:717] Do failover for: kblotdb1
2005-09-22 02:09:36.324: [MEMBERLEAVE:717] Post recovery done evmd event for: kblotdb1
- Next, CRS failed over the Oracle VIP that had been on node 1 to the surviving node 2.
netstat
kblotdb2|/home2/oracle10/work> netstat -in
Name      Mtu   Network       Address         Ipkts      Ierrs  Opkts      Oerrs  Coll
lan3*     1500  none          none            0          0      0          0      0
lan2      1500  10.55.49.0    10.55.49.207    12518382   0      12044400   0      0
lan1:1    1500  10.55.50.0    10.55.50.209    6161297    0      1696506    0      0
lan1      1500  10.55.50.0    10.55.50.202    12894636   0      22979733   0      0
lan0*     1500  192.1.1.0     192.1.1.2       2111713    0      1247138    0      0
lo0       4136  127.0.0.0     127.0.0.1       36147569   0      36147578   0      0
lan1:2    1500  10.55.50.0    10.55.50.208    1575       0      169        0      0
lan4*     1500  none          none            0          0      0          0      0
- This confirms that node 1's Oracle VIP actually failed over to lan1:2 on node 2 (see the sketch after this list).
- The WebLogic service was unaffected.
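The VIP relocation can also be confirmed from the clusterware side. This is only a sketch using the standard 10g CRS tools; the ora.<node>.vip resource names follow the usual convention and should be checked against the actual crs_stat output.

# Show all CRS resources and the node each one is currently running on;
# after the failure, ora.kblotdb1.vip should be ONLINE on kblotdb2.
$ORA_CRS_HOME/bin/crs_stat -t

# Per-node status of the nodeapps (VIP, GSD, ONS, listener).
/home2/oracle10/bin/srvctl status nodeapps -n kblotdb1
/home2/oracle10/bin/srvctl status nodeapps -n kblotdb2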
DB server network failure
Public LAN failure
In the normal state, node 2's network configuration is as follows:
kblotdb2|/home2/oracle10> netstat -in
Name      Mtu   Network       Address         Ipkts      Ierrs  Opkts      Oerrs  Coll
lan3*     1500  none          none            0          0      0          0      0
lan2      1500  10.55.49.0    10.55.49.207    12518881   0      12044830   0      0
lan1:1    1500  10.55.50.0    10.55.50.209    6485707    0      1702063    0      0
lan1      1500  10.55.50.0    10.55.50.202    12911405   0      23327953   0      0
lan0      1500  192.1.1.0     192.1.1.2       2112194    0      1247528    0      0
lo0       4136  127.0.0.0     127.0.0.1       36181974   0      36181983   0      0
lan1:2    1500  10.55.50.0    10.55.50.208    2914       0      272        0      0
lan4*     1500  none          none            2621       0      2677       0      0
When the link for lan1, node 2's public LAN, was disconnected, the configuration changed as follows:
kblotdb2|/home2/oracle10> netstat -in
Name      Mtu   Network       Address         Ipkts    Ierrs  Opkts    Oerrs  Coll
lan3      1500  10.55.50.0    10.55.50.202    1346     0      3331     0      0
lan2      1500  10.55.49.0    10.55.49.207    102765   0      102573   0      0
lan1*     1500  none          none            13257    0      22091    0      0
lan0      1500  192.1.1.0     192.1.1.2       3766     0      6839     0      0
lo0       4136  127.0.0.0     127.0.0.1       140637   0      140637   0      0
lan3:1    1500  10.55.50.0    10.55.50.209    1        0      0        0      0
lan4*     1500  none          none            2621     0      2677     0      0
- The public IP moved to lan3, which had been a standby interface, and accordingly the Oracle VIP came up on lan3:1.
- The WebLogic service was unaffected.
Once node 2's lan1 was restored, the configuration returned to its original state, as shown below:
kblotdb2|/home2/oracle10> netstat -in
Name      Mtu   Network       Address         Ipkts    Ierrs  Opkts    Oerrs  Coll
lan3*     1500  none          none            3106     0      7692     0      0
lan2      1500  10.55.49.0    10.55.49.207    107982   0      107628   0      0
lan1:1    1500  10.55.50.0    10.55.50.209    12       0      0        0      0
lan1      1500  10.55.50.0    10.55.50.202    13752    0      22982    0      0
lan0      1500  192.1.1.0     192.1.1.2       4122     0      7513     0      0
lo0       4136  127.0.0.0     127.0.0.1       152489   0      152489   0      0
lan4*     1500  none          none            2621     0      2677     0      0
cluster_interconnect LAN failure
When the link for lan2, node 2's cluster_interconnect LAN, was disconnected, the result was as follows:
kblotdb2|/home2/oracle10> netstat -in
Name      Mtu   Network       Address         Ipkts    Ierrs  Opkts    Oerrs  Coll
lan3      1500  10.55.49.0    10.55.49.207    5519     0      10620    0      0
lan2*     1500  none          none            108070   0      107778   0      0
lan1:1    1500  10.55.50.0    10.55.50.209    487      0      13       0      0
lan1      1500  10.55.50.0    10.55.50.202    15595    0      24175    0      0
lan0      1500  192.1.1.0     192.1.1.2       4354     0      7953     0      0
lo0       4136  127.0.0.0     127.0.0.1       160463   0      160463   0      0
lan4*     1500  none          none            2621     0      2677     0      0
- The cluster_interconnect IP moved to the standby interface lan3 (a database-side re-check is sketched after this list).
- Neither the Oracle nor the WebLogic services were affected.
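If any doubt remained, the checks used earlier can simply be repeated on node 2: the interconnect address should now answer via lan3, and the instance should still report the same 10.55.49.x binding. A sketch (HP-UX ping count syntax; the trace is written to udump as before):

netstat -in | grep 10.55.49.207      # same IP, now carried by lan3
ping kblotdb2_int -n 2               # HP-UX: send two probes

sqlplus -S "/ as sysdba" <<'EOF'
oradebug setmypid
oradebug ipc
EOF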