Please beware: these procedures are not for the faint-hearted. Unless you are responsible for, and know how to handle, the majority of Unix/Solaris stuff, I strongly urge you to stay away... and "Contact your System Administrator" (lol... typical Msft disclaimer) accordingly.
The terminology is fairly simple, but do some research if you need to know more.
Fellow coffee-mongers and czars of the tech world, read on. Caution: Extremely Detailed
12K/15K CPU (System Board) Replacement using DR
Overview: The following procedure is used to replace a CPU board (System Board) in a 15K / 12K domain. Dynamic Reconfiguration (DR) commands are used to replace a CPU board with minimal interruption to the domain's operation. All the operations described below are performed on the MAIN System Controller, except for the commands in steps 2a, 2d and 4b, which are run as root on the domain itself.
These steps need to be followed if a Sun technician determines that a CPU board (System Board) in a domain is faulty and needs replacement. The board can be replaced without shutting down the domain to which it belongs.
Steps:
1. Log into the MAIN SC as a domain administrator for the domain on which the CPU board is to be replaced. Before powering off the CPU board, confirm with the Sun technician which board needs to be replaced. The list of boards in each domain can be obtained from the showboards command.
For example, to find out the CPU boards in domain A, type:
sms-svc:> showboards -d a
Retrieving board information. Please wait.
......
Location  Pwr  Type of Board  Board Status  Test Status  Domain
--------  ---  -------------  ------------  -----------  ------
SB0       On   CPU            Active        Passed       xxxx-A
SB1       On   CPU            Active        Passed       yyyy-A
SB2       -    Empty Slot     Available     -            Isolated
SB3       -    Empty Slot     Available     -            Isolated
......
IO0       On   HPCI           Active        Passed       xxxxx-A
IO1       On   HPCI           Active        Passed       yyyyy-A
IO2       -    Empty Slot     Available     -            Isolated
IO3       -    Empty Slot     Available     -            Isolated
......
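If you have captured the showboards output to a file or a string, a few lines of Python can pull out the CPU boards for you. This is only a minimal sketch: the function name parse_showboards is made up for illustration, and it assumes that the board status, test status and domain columns are always single words, as they are in the sample above.

```python
def parse_showboards(text):
    """Parse whitespace-separated showboards rows into dictionaries."""
    boards = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a slot name such as SB0 or IO1 and
        # carry at least six fields; everything else is skipped.
        if len(fields) < 6 or not fields[0].startswith(("SB", "IO")):
            continue
        boards.append({
            "location": fields[0],
            "power": fields[1],
            # The board type may be more than one word ("Empty Slot").
            "type": " ".join(fields[2:-3]),
            "status": fields[-3],
            "test": fields[-2],
            "domain": fields[-1],
        })
    return boards


# Sample rows copied from the showboards output above.
SAMPLE = """\
Location Pwr Type of Board Board Status Test Status Domain
-------- --- ------------- ------------ ----------- ------
SB0 On CPU Active Passed xxxx-A
SB1 On CPU Active Passed yyyy-A
SB2 - Empty Slot Available - Isolated
IO0 On HPCI Active Passed xxxxx-A
"""

boards = parse_showboards(SAMPLE)
cpu_boards = [b["location"] for b in boards if b["type"] == "CPU"]
print(cpu_boards)
```

Compare the resulting list with what the Sun technician tells you before touching anything.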
2. Check whether the domain that contains the board to be replaced is running Sun Cluster.
a. NOTE: The following command needs to be executed as root on the domain (NOT THE SC) that is having the problem.
# scstat
-- Cluster Nodes --
Node name Status
--------- ------
Cluster node: xxxxx Online
Cluster node: yyyy Online
b. If scstat returns output, the domain is running Sun Cluster. In that case, if the board to be replaced holds the permanent memory, DR can’t be used: the domain will need to be brought down before the board can be replaced.
c. Find out which board holds the domain's permanent memory.
sms-user:> rcfgadm -d domainID -val | grep perm
If the SB# in the first field of the output is the System Board that is being replaced, the OS (domain) will be suspended for approximately 3 minutes during step 4. If the board doesn’t contain the permanent memory, skip step 2d and go directly to step 4. If the board contains the permanent memory, stop the xntpd daemon (step 2d) before proceeding to step 4.
For example, to check which board contains the permanent memory in domain A:
sms-user:> rcfgadm -d a -val | grep perm
SB0::memory connected configured ok base address 0x16000000000, 8388608 KBytes total, 1097200 KBytes permanent
The output of the above command shows that the permanent memory for domain A resides on SB0.
d. NOTE: The following command needs to be executed as root on the domain (NOT THE SC) that is having the problem.
# /etc/rc2.d/S74xntpd stop
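The decision in step 2 boils down to two questions: is Sun Cluster running, and does the board hold the permanent memory? Here is a rough Python sketch of that logic, assuming you have saved the rcfgadm output as text. The function names are hypothetical, and the regular expression is only modelled on the single sample line shown above, so treat it as a starting point.

```python
import re

# Matches lines such as:
#   SB0::memory connected configured ok ... 1097200 KBytes permanent
PERM_RE = re.compile(r"^(SB\d+)::memory\b.*?(\d+)\s+KBytes permanent")


def boards_with_permanent_memory(rcfgadm_output):
    """Return {board: permanent KBytes} for every matching line."""
    found = {}
    for line in rcfgadm_output.splitlines():
        match = PERM_RE.search(line.strip())
        if match:
            found[match.group(1)] = int(match.group(2))
    return found


def replacement_plan(board, perm_boards, cluster_running):
    """Summarise what step 2 says to do for the given board."""
    has_perm = board.upper() in perm_boards
    if has_perm and cluster_running:
        return "bring the domain down; DR cannot be used"
    if has_perm:
        return "use DR; stop xntpd first and expect a suspend of about 3 minutes"
    return "use DR"


SAMPLE = ("SB0::memory connected configured ok base address 0x16000000000, "
          "8388608 KBytes total, 1097200 KBytes permanent")

perm = boards_with_permanent_memory(SAMPLE)
print(perm)
print(replacement_plan("sb0", perm, cluster_running=False))
print(replacement_plan("sb3", perm, cluster_running=False))
```

The cluster_running flag is something you set yourself after looking at scstat on the domain; the sketch does not try to run or parse scstat.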
4.
a. Power off the CPU board to be replaced.
The deleteboard command unconfigures the CPUs and memory of this system board from the running domain. Before running it, make sure you have checked whether the board contains the permanent memory (step 2c).
sms-user:> deleteboard board
where:
board: Specifies the board that is being replaced. For a 15K, the CPU boards can range from SB0…SB17. For a 12K, the CPU boards can range from SB0…SB8.
For example, to replace CPU board SB3:
sms-svc:> deleteboard sb3
request delete capacity (4 cpus)
request delete capacity (1048576 pages)
request delete capacity SB3 done
....
disconnect SB3
disconnect SB3 done
poweroff SB3
poweroff SB3 done
SB3 successfully unassigned
The CPU board is now powered off and ready to be physically removed.
b. NOTE: If you needed to stop the xntpd daemon in step 2d, it should now be restarted. This command needs to be executed as root on the domain (NOT THE SC) that is having the problem.
# /etc/rc2.d/S74xntpd start
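Two cheap sanity checks around deleteboard can save you from a typo: validate the board name against the slot ranges quoted above before you type it, and confirm the closing lines of the output afterwards. The sketch below is Python with made-up helper names; the success markers are simply the last lines of the example output above, not an official list.

```python
import re

# Highest CPU board slot per platform, from the ranges quoted above
# (SB0...SB17 on a 15K, SB0...SB8 on a 12K).
MAX_SLOT = {"15K": 17, "12K": 8}


def is_valid_cpu_board(name, platform):
    """True if name looks like SBn and n is within the platform's range."""
    match = re.fullmatch(r"[Ss][Bb](\d{1,2})", name)
    if match is None or platform not in MAX_SLOT:
        return False
    return int(match.group(1)) <= MAX_SLOT[platform]


def deleteboard_finished(output, board):
    """True if the output shows the board powered off and unassigned."""
    name = board.upper()
    wanted = (f"poweroff {name} done", f"{name} successfully unassigned")
    return all(marker in output for marker in wanted)


# Tail of the deleteboard example shown above.
SAMPLE = """\
disconnect SB3
disconnect SB3 done
poweroff SB3
poweroff SB3 done
SB3 successfully unassigned
"""

print(is_valid_cpu_board("sb3", "15K"))
print(is_valid_cpu_board("SB17", "12K"))
print(deleteboard_finished(SAMPLE, "sb3"))
```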
5. You or your on-site technician should now remove the faulty CPU Board from the slot and replace it with a new CPU board.
6. Power on the CPU Board.
Type:
sms-user:> poweron board
For example, to power on the CPU board SB3 that was just replaced:
sms-svc:> poweron sb3
7. Update the firmware of the new CPU Board. This is required to bring the firmware of the new CPU board in sync with the firmware of the existing boards. Execute this step even if the firmware of the new CPU board is of a higher revision than the firmware of the other CPU boards.
Type:
sms-user:> flashupdate -f /opt/SUNWSMS/hostobjs/sgcpu.flash board
Current System Board FPROM Information
========================================
CPU at SB1, FPROM 0:
POST 05/02/03 16:05:00 Release 5.14.5 Build 2.0 I/F 12
OBP 05/02/03 16:05:00 Release 5.14.5 Build 2.0
Ver 05/02/03 16:05:00 Release 5.14.5 Build 2.0
CPU at SB1, FPROM 1:
POST 05/02/03 16:05:00 Release 5.14.5 Build 2.0 I/F 12
OBP 05/02/03 16:05:00 Release 5.14.5 Build 2.0
Ver 05/02/03 16:05:00 Release 5.14.5 Build 2.0
Flash Image Information
==========================
POST 05/02/03 16:05:00 Release 5.14.5 Build 2.0 I/F 12
OBP 05/02/03 16:05:00 Release 5.14.5 Build 2.0
Ver 05/02/03 16:05:00 Release 5.14.5 Build 2.0
Do you wish to update the FPROM (yes/no)? yes
sms-svc:>
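The flashupdate output is easy to misread when several FPROMs are listed, so here is a small Python sketch that compares each FPROM against the flash image. The function names are hypothetical and the parsing assumes the layout shown above (a "CPU at ..." header per FPROM and a "Flash Image Information" header for the image). It is only a reading aid: the procedure still says to run the update regardless.

```python
import re

# Captures the component name plus its release and build numbers.
RELEASE_RE = re.compile(
    r"^(POST|OBP|Ver)\s+\S+\s+\S+\s+Release\s+(\S+)\s+Build\s+(\S+)")


def parse_flashupdate(text):
    """Return {section: {component: (release, build)}}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("CPU at ") and line.endswith(":"):
            current = line[:-1]          # e.g. "CPU at SB1, FPROM 0"
            sections[current] = {}
        elif line.startswith("Flash Image Information"):
            current = "image"
            sections[current] = {}
        else:
            match = RELEASE_RE.match(line)
            if match and current is not None:
                sections[current][match.group(1)] = (match.group(2), match.group(3))
    return sections


def proms_out_of_sync(sections):
    """List the FPROM sections whose versions differ from the image."""
    image = sections.get("image", {})
    return [name for name, comps in sections.items()
            if name != "image" and comps != image]


SAMPLE = """\
CPU at SB1, FPROM 0:
POST 05/02/03 16:05:00 Release 5.14.5 Build 2.0 I/F 12
OBP 05/02/03 16:05:00 Release 5.14.5 Build 2.0
Ver 05/02/03 16:05:00 Release 5.14.5 Build 2.0
Flash Image Information
==========================
POST 05/02/03 16:05:00 Release 5.14.5 Build 2.0 I/F 12
OBP 05/02/03 16:05:00 Release 5.14.5 Build 2.0
Ver 05/02/03 16:05:00 Release 5.14.5 Build 2.0
"""

sections = parse_flashupdate(SAMPLE)
print(proms_out_of_sync(sections))
```

An empty list means every listed FPROM already matches the image, which is what you want to see when you run flashupdate a second time to verify.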
8. Configure the CPU Board in the domain.
Type:
sms-user:> addboard -d domain board
where:
-d domain: Specifies the domain to which the board belongs.
board: Specifies the board that is being added. For a 15K, the CPU boards can range from SB0…SB17. For a 12K, the CPU boards can range from SB0…SB8.
For example, to add the replaced CPU board SB3 back into domain A:
sms-svc:> addboard -d a sb3
configure SB3
configure SB3 done
notify online SUNW_cpu/cpu96
notify online SUNW_cpu/cpu97
notify online SUNW_cpu/cpu98
notify online SUNW_cpu/cpu99
notify add capacity (4 cpus)
notify add capacity (1048576 pages)
notify add capacity SB3 done
SB3 assigned to domain: A
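A quick way to confirm that every CPU on the new board came online is to count the "notify online" lines and compare them with the "add capacity" figure. The Python sketch below does just that; the function name is made up, and the patterns are taken from the example output above, so other SMS releases may word things differently.

```python
import re


def cpus_brought_online(addboard_output):
    """Return (list of CPU ids reported online, CPU count from 'add capacity')."""
    online = re.findall(r"notify online SUNW_cpu/cpu(\d+)", addboard_output)
    capacity = re.search(r"notify add capacity \((\d+) cpus\)", addboard_output)
    expected = int(capacity.group(1)) if capacity else None
    return [int(n) for n in online], expected


# addboard example output from above.
SAMPLE = """\
configure SB3
configure SB3 done
notify online SUNW_cpu/cpu96
notify online SUNW_cpu/cpu97
notify online SUNW_cpu/cpu98
notify online SUNW_cpu/cpu99
notify add capacity (4 cpus)
notify add capacity (1048576 pages)
notify add capacity SB3 done
SB3 assigned to domain: A
"""

online, expected = cpus_brought_online(SAMPLE)
print(online, expected, len(online) == expected)
```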
That's it, you are all done!
12K/15K CPU (System Board) Replacement without DR
Follow this procedure when the system is at the ok prompt or when DR can’t be used.
(For this example, domain a and sb3 will be used.)
1. Identify the failed board and the domain it resides in.
sms-svc:> showboards -v
2. Check the status of the domain. Verify whether the domain is at the ok> prompt or needs to be brought down. If the domain is up, have the system administrator bring it down.
sms-svc:> console -d a
3. Power off the domain.
sms-svc:> setkeyswitch -d a off
Verify the keyswitch setting.
sms-svc:> showkeyswitch -d a
4. Replace the system board. Verify that the amber light is on before pulling the board. After replacing the board, wait until the light turns green before proceeding.
5. Power on the domain to standby mode.
sms-svc:> setkeyswitch -d a standby
Verify the keyswitch setting.
sms-svc:> showkeyswitch -d a
6. Verify that all SBs are powered on before flashing.
sms-svc:> showboards -d a
7. Update the SB firmware.
sms-svc:> flashupdate -f /opt/SUNWSMS/hostobjs/sgcpu.flash sb3
If the firmware on the new board doesn’t match the existing boards, answer YES to: Do you wish to update the FPROM?
Run the command again to verify that the update was successful.
sms-svc:> flashupdate -f /opt/SUNWSMS/hostobjs/sgcpu.flash sb3
8. Power on the domain.
sms-svc:> setkeyswitch -d a on
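If you do this often, it can help to print the whole non-DR command sequence for a given domain and board before you start, so nothing gets skipped. The Python sketch below only builds the list of command strings from the steps in this section; it does not run anything on the SC. The function name and the default image path argument are my own choices for illustration.

```python
def non_dr_plan(domain, board, flash_image="/opt/SUNWSMS/hostobjs/sgcpu.flash"):
    """Return the SC commands for a non-DR board swap, in order.

    Lines starting with '#' are manual steps, not commands.
    """
    flash = "flashupdate -f " + flash_image + " " + board
    return [
        "showboards -v",
        "console -d " + domain,
        "setkeyswitch -d " + domain + " off",
        "showkeyswitch -d " + domain,
        "# replace the board: amber light on before pulling, green before continuing",
        "setkeyswitch -d " + domain + " standby",
        "showkeyswitch -d " + domain,
        "showboards -d " + domain,
        flash,
        flash,  # second run only verifies that the update took
        "setkeyswitch -d " + domain + " on",
    ]


plan = non_dr_plan("a", "sb3")
for step in plan:
    print(step)
```

Keeping the manual step in the list as a comment line makes it harder to forget to wait for the green light before powering the domain back to standby.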
There you go, that covers both means of System Board replacement on a 12K/15K Sun Fire.