Project Phoenix
Table of contents
- Overview
- Background
- Project objectives
- Major challenges
- Key technical work
- Outcome
- Technical Implementation
- Data Collection
- Phase 1 - New Production Cluster
- 1. Capture Imaging of Primary Hosts
- 2. Hardware Preparation
- 3. OS Installation on New Hardware
- 4. Post-Installation Cleanup
- 5. Cluster Clean up - remove nodes
- 6. Create the New Cluster
- 7. Storage Configuration
- 8. Data & Filesystem Migration
- 9. Configure IPMP Network Redundancy
- 10. Recreating the Cluster Configuration xml
- 11. Recreate the Cluster Resource Groups
- 12. Resource Registration
- 13. Resource Creation
- 14. Testing
- 15. Final Data Sync
- 16. Failover/Migration to New Cluster
- Phase 2 - New Disaster Recovery Cluster
- Phase 3 - Third Copy Cluster
Overview
The client
A major pan‑European financial institution serving around 15 million clients across Italy, Germany, Austria, and Central & Eastern Europe.
Its markets and investment banking division operates at multi‑billion‑euro scale, supporting international trading, treasury, and structured finance activities. The organisation is known for its large, complex technology landscape, combining legacy platforms with modern digitalisation initiatives to support high‑volume, real‑time financial operations.
The Project
To deliver a full relocation and expansion of a Tier‑1 business‑critical trading cluster from Germany to Italy.
Replacing the original 2-node active/active (prod/DR) environment with three 2-node clusters spanning production, disaster recovery, and a dedicated non-production replica.
Application Stack
The platform provided front‑to‑back support for repo and securities‑lending operations, combining trade capture, pricing, position management, and lifecycle processing in a single integrated system.
It was used across front‑office, middle‑office, and back‑office teams, supporting execution, risk, settlement, collateral, and operational workflows.
The system was tightly integrated with a market‑connectivity layer that handled electronic interfaces to repo markets and trading venues, enabling real‑time interaction with external liquidity sources.
Note:
All system names, domains, client references, and geographic details have been anonymised for confidentiality.
Background
The existing environment consisted of:
- A 2-node active/active cluster across 2 German data centers
- Each node had the capacity to handle the full load, so together they acted as both Production and Disaster Recovery
- Sun SPARC Enterprise M4000 servers (circa 2007–2008)
- Oracle Solaris 10
- Solaris Cluster 3.x
Key Implications of the Legacy Stack
- No new patches, firmware, or vendor fixes
- Deeply embedded customisations, operational tooling, and additional components unavailable for reinstall
- No ability to reinstall or rebuild from scratch
- Hardware too old and fragile to be physically relocated
- Cluster version incompatible with modern hardware
- Zero vendor support for troubleshooting
Project objectives
The project was split into three major phases:
Phase 1 - New Production Cluster
Replicate the 2-node cluster to Italian Data Center 1, active/active, as Production only.
The German cluster then becomes the Disaster Recovery site.
Phase 2 - New Disaster Recovery Cluster
Replicate the new production cluster in Italy to Data Center 2.
Work out how to synchronise data between the two clusters for failover.
This second cluster becomes the Disaster Recovery site.
The German cluster can then be decommissioned.
Phase 3 - Third Copy Cluster
Create another copy of the production cluster in Italian Data Center 3.
This becomes the inactive third copy.
Major challenges
1. Hardware too old to move or power‑cycle safely
The original SPARC M4000 systems were at high risk of failure if transported or even rebooted. This forced a strategy of non‑intrusive extraction, cloning, and remote analysis.
2. No installation media for critical software
Many components of the original application stack and cluster configuration were no longer available from the vendor, not stored internally, and not reproducible from scratch. This made image‑based cloning and rsync‑driven reconstruction the only viable method.
3. Zero vendor support
With hardware EOSL and Solaris 10/Cluster 3.x in limited Extended Support, there was effectively no vendor assistance, no updated documentation, no new patches, and no troubleshooting help. Every step — from cluster reconstruction to storage provisioning — had to be researched, tested, and validated manually.
4. Rebuilding a cluster without reinstalling Solaris
Because fresh installation was impossible, the new clusters had to be built from system images, re‑parameterised, re‑networked, re‑clustered, re‑quorumed, and re‑storage‑mapped, all without breaking compatibility with the legacy application stack.
5. Multi‑site, multi‑cluster consistency
Three clusters had to behave identically despite different hardware, storage arrays, network topologies, interconnects,
and site‑level constraints. This required repeatable automation, custom scripts, and extensive failover testing.
6. Metadbs not recognised after replication
After the image-based replication, the SVM state database replicas (metadbs) were not recognised on the new hosts and had to be recreated on the new storage before the metasets could be rebuilt.
Key technical work
System imaging and reconstruction
- Created Solaris FLAR images from the original systems to capture a consistent baseline.
- Rebuilt environments on newer SPARC hardware using image‑based deployment.
- Phase 1: Used rsync to migrate application, configuration, and database data.
- Phase 2+3: data synchronised via SAN level LUN synchronization.
Cluster rebuild
- Reconstructed Solaris Cluster configuration from exported definitions.
- Re‑created resource groups, HAStoragePlus, GDS, and application services.
- Re‑established quorum devices and private interconnect networks.
- Validated fencing, failover, and recovery behaviour end‑to‑end.
Storage and filesystem engineering
- Replicated all SVM metasets and metadevices: 16 LUNs, 3 metasets.
- Mapped new LUNs and rebuilt storage configurations on the target arrays.
- Recreated filesystems with application‑specific parameters for database workloads.
- Ensured compatibility with existing database and application expectations.
Network and redundancy
- Implemented IPMP for network interface failover and redundancy.
- Rebuilt interconnect networks with strict isolation to avoid cross‑cluster interference.
- Reconstructed resolver, LDAP, and related service configurations for the new sites.
Testing and validation
- Performed controlled failover simulations across all clusters.
- Validated cluster membership, quorum behaviour, and fencing logic.
- Ensured application services behaved consistently across all three environments.
- Documented procedures to support future migrations and operational tasks.
Outcome
Despite the absence of vendor support, missing installation media, and the fragility of the original hardware, the project delivered three fully functional Solaris/SPARC clusters with identical behaviour across environments. The work provided a safe migration path away from EOSL hardware, a validated disaster‑recovery strategy, and a reproducible process for future rebuilds and migrations.
This engagement required deep knowledge of Solaris internals, clustering, storage, and legacy systems, as well as extensive problem‑solving in an environment with no practical vendor safety net.
Technical Implementation
The following section provides a reconstructed and sanitised walkthrough of the technical steps involved in rebuilding and migrating the legacy Solaris/SPARC cluster environment. All hostnames, domains, and client-specific identifiers have been removed. Disk sizes, set names, etc. have also been changed - but not the number of devices or resources, so the complexity of the project remains unaltered.
Data Collection
Hosts
- Original German Production/DR Cluster: originhost01, originhost02
- New Italian Production Cluster: prodhost01, prodhost02
- New Italian Disaster Recovery Cluster: prodhost03, prodhost04
- New Italian 3rd Copy Cluster: prodhost05, prodhost06
Number and size of LUNs provisioned for each of the 4 clusters:
- 2 x 1GB
- 1 x 5GB
- 1 x 50GB
- 8 x 100GB
- 2 x 150GB
- 2 x 500GB
- Application metaset: APP-DS
  - d100 -m d101 (2 x 150GB stripe) -> /opt/app
- Database metaset: DBSID-DS
  - d110 -m d111 (4 x 100GB stripe) -> /data/DBSID/sybase01
  - d120 -m d121 (4 x 100GB stripe) -> /data/DBSID/sybase02
  - d130 -m d131 (2 x 500GB stripe) -> /data/DBSID/backup
  - d140 -m d141 (1 x 1GB) -> /opt/sybase/admin/DBSID
- MQ File Transfer Edition metaset: MQFTE-DS
  - d150 -m d151 (1 x 5GB) -> /data/mqfte/config
  - d160 -m d161 (1 x 50GB) -> /data/mqfte/files
Resource groups:
- APP-rg
- DBSID-rg
- MQFTE-rg
Phase 1 - New Production Cluster
1. Capture Imaging of Primary Hosts
1.1 Create FLAR images of the original SPARC systems
This was the first time-critical event, which had to be arranged with the bank's business side. The system had to be taken down to single-user mode, to ensure a minimal number of files were altered.
Only the network interface and routing were started, to allow the image to be written to the NFS server.
# flarcreate -x -x -S -n originhost01 -L cpio /mnt/originhost01.flar
# flarcreate -x -x -S -n originhost02 -L cpio /mnt/originhost02.flar
1.2. Export Cluster Configuration
# cluster export > /mnt/origHost1/clusterconfig.xml
1.3 Copy Disk configuration settings
# metaset | grep -i set
# metastat -s DBSID-DS -p > /mnt/origHost1/DBSID-DS.lst
# metastat -s APP-DS -p > /mnt/origHost1/APP-DS.lst
# metastat -s MQFTE-DS -p > /mnt/origHost1/MQFTE-DS.lst
# metadb -i > /mnt/origHost1/metadb-i.lst
# cat /etc/vfstab > /mnt/origHost1/vfstab.lst
# cat /etc/hosts > /mnt/origHost1/hosts.lst
# echo | format > /mnt/origHost1/format.lst
# cfgadm -al > /mnt/origHost1/cfgadm-al.lst
# devfsadm -v > /mnt/origHost1/devfsadm-v.lst
# mpathadm list lu > /mnt/origHost1/mpathadm-list.lst
# luxadm probe > /mnt/origHost1/luxadm-probe.lst
# scdidadm -l > /mnt/origHost1/scdidadm-l.lst
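The collection steps above lend themselves to a small wrapper script so nothing is missed during the outage window. A minimal sketch, assuming the same output directory and command list as above (adjust both for the real environment); on a non-Solaris host the Solaris-only commands are simply skipped:

```shell
#!/bin/sh
# Sketch: run each collection command once and save its output under OUTDIR.
OUTDIR=${OUTDIR:-/tmp/collect}

collect() {                        # collect <outfile> <command...>
    out="$OUTDIR/$1"; shift
    "$@" > "$out" 2>&1 || echo "WARN: '$*' failed, see $out" >&2
}

mkdir -p "$OUTDIR"
if [ "$(uname -s)" = "SunOS" ]; then    # Solaris-only commands
    collect metadb-i.lst    metadb -i
    collect vfstab.lst      cat /etc/vfstab
    collect hosts.lst       cat /etc/hosts
    collect cfgadm-al.lst   cfgadm -al
    collect scdidadm-l.lst  scdidadm -l
fi
```

Having one runner also gives a uniform place to log which capture failed, rather than eyeballing sixteen redirections by hand.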
2. Hardware Preparation
- Rack and cable new SPARC systems
- Install FC and network cards
- Configure ILOM/XSCF
- Request switch ports and validate connectivity
- Insert Solaris 10 installation media
3. OS Installation on New Hardware
3.1 Connect to the eXtended System Control Facility (XSCF)
ssh prodhost01-rsa -l admin
XSCF> poweron -a
XSCF> showhardconf
3.2 Connect to the domain console
XSCF> console -d0
3.3 Confirm Disks are ok, and then boot off cdrom
{0} ok probe-scsi-all
{0} ok boot cdrom
3.4 The Solaris Installation Program
Networked [X] Yes
Network interfaces [X] nxge3
Use DHCP: No
Host name: prodhost01
IP address: 10.10.10.121
System part of a subnet: Yes
Netmask: 255.255.255.0
Enable IPv6: No
Default Route: Specify one
Router IP Address: 10.10.10.1
System identification complete.
Starting Solaris installation program...
Executing JumpStart preinstall phase...
Searching for SolStart directory...
Checking rules.ok file...
Using begin script: install_begin
Using finish script: patch_finish
Executing SolStart preinstall phase...
Executing begin script "install_begin"...
Begin script install_begin execution completed.
3.5 Exit installer, configure netmask, NIC/VLAN/IP and NFS mount
Press F5 (or ESC-5) to exit the installer:
If you exit the Solaris Interactive Installation program, your
profile is deleted. However, you can restart the Solaris
Interactive Installation program from the console window.
F2_Exit Installation F5_Cancel
Press F2 (ESC-2) to continue to a shell prompt:
To restart the Solaris installation program,
type "install-solaris".
Solaris installation program exited.
# echo "10.0.0.128 255.255.255.224" >> /etc/netmasks
# echo 10.0.0.130 nfsfiler >> /etc/hosts
# ifconfig nxge3777 plumb
# ifconfig nxge3777 down
# ifconfig nxge3777 10.0.0.131 netmask 255.255.255.224 broadcast +
# ifconfig nxge3777 up
# route add net default 10.0.0.1
# mount nfsfiler:/import/SPARC /mnt
3.6a Troubleshooting
# ls -la /mnt/originhost01.flar
ls: can't read ACL on /mnt/originhost01.flar: Permission denied
# chown nobody:nobody /mnt/originhost01.flar
# getfacl /mnt/originhost01.flar
# file: /mnt/originhost01.flar
# owner: nobody4
# group: nogroup
user::rwx
group::rwx #effective:rwx
mask:rwx
other:rwx
# mount -o vers=3 nfsfiler:/import/SPARC/ /mnt
3.7 Continue Installer
# install-solaris
Select install from flar:
First DR host - select nfsfiler:/import/SPARC/originhost01.flar
2nd DR host - select nfsfiler:/import/SPARC/originhost02.flar
Select disk, and other OS install options:
Installation Option: Flash
Boot Device: c0t0d0
Root File System Type: ZFS
Client Services: None
Software: 1 Flash Archive
local file: originhost01.flar
Pool Name: rpool
Boot Environment Name: s10s_u11wos_24a
Pool Size: 858407 MB
Devices in Pool: c0t0d0
c0t1d0
Preparing system for Flash install
Configuring disk (c0t0d0)
- Creating Solaris disk label (VTOC)
Configuring disk (c0t1d0)
- Creating Solaris disk label (VTOC)
- Creating pool rpool
- Creating swap zvol for pool rpool
3.8. Reconfiguring OS Cloned image
# mkdir /tmp/A
# zfs set mountpoint=/tmp/A rpool/ROOT/s10s_u11wos_24a
# zfs mount rpool/ROOT/s10s_u11wos_24a
# cd /tmp/A/etc
# vi passwd (add adminx user)
# vi shadow (add adminx user)
# vi /etc/sudoers (add adminx user)
# vi /etc/nsswitch.conf (change to files, as ldap will not work on new network)
# zfs umount rpool/ROOT/s10s_u11wos_24a
# zpool export rpool
# sync;sync; halt
{0} ok boot -x
4. Post-Installation Cleanup
System is in single-user maintenance mode.
4.1 Update IP address, netmask, hostname, nodename, vips
Set originhosts to loopback address:
echo 127.0.0.1 originhost01 originhost01.domain.net >> /etc/hosts
echo 127.0.0.1 originhost02 originhost02.domain.net >> /etc/hosts
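A minimal sketch of making those hosts-file edits idempotent, so re-running the runbook never duplicates entries. HOSTS points at a stand-in file here rather than the real /etc/hosts:

```shell
# Append a hosts entry only if it is not already present.
HOSTS=$(mktemp)
add_host() {                      # add_host "<ip> <hostnames...>"
    grep -qF "$1" "$HOSTS" || echo "$1" >> "$HOSTS"
}
add_host "127.0.0.1 originhost01 originhost01.domain.net"
add_host "127.0.0.1 originhost02 originhost02.domain.net"
add_host "127.0.0.1 originhost01 originhost01.domain.net"   # repeat: no-op
```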
4.2 Comment out all SVM mounts in /etc/vfstab
vi /etc/vfstab
4.3 Update DNS servers and domain search paths
vi /etc/resolv.conf
4.4 Disable ldap
svcadm disable ldapclient
4.5 Reconfigure ldap for the new location
4.6 Restart ldap
svcadm enable ldapclient
4.7 Identify HBA WWNs
List the connected HBAs:
root@prodhost01:~ 12:46:24 luxadm -e port |grep CONNECTED
/devices/pci@1,700000/SUNW,qlc@0/fp@0,0:devctl CONNECTED
/devices/pci@3,700000/SUNW,qlc@0/fp@0,0:devctl CONNECTED
Verify FC ports are connected and configured:
root@prodhost01:~ 12:48:36 cfgadm -al -o show_FCP_dev |grep fc-fabric
c1 fc-fabric connected configured unknown
c2 fc-fabric connected configured unknown
Request SAN storage to be provisioned to these WWNs.
4.8 Create a backup boot environment
# lucreate -n s10u11_ProdImage.01clean
5. Cluster Clean up - remove nodes
5.1 Remove each host from the old cluster
# clnode remove
Verifying that no unexpected global mounts remain in /etc/vfstab ... done
Verifying that no device services still reference this node ... done
Archiving the following to /var/cluster/uninstall/uninstall.29656/archive:
/etc/cluster ...
/etc/path_to_inst ...
/etc/vfstab ...
/etc/nsswitch.conf ...
Removing the private hostname from "ntp.conf.sc" on node "originhost01" ...done
Removing the private hostname from "ntp.conf.sc" on node "originhost02" ...done
dumb: Unknown terminal type
clnode: Unable to remove "etc/cluster/nodeid" entry from the boot archive ("/boot/solaris/filelist.ramdisk")
Attempting to contact the cluster ...
Trying "originhost01" ... timed out
Trying "originhost02" ... timed out
Unable to contact the cluster.
Additional housekeeping may be required to unconfigure
originhost02 from the active cluster.
Removing the following:
/etc/cluster ...
/dev/global ...
/dev/md/shared ...
/.globaldevices ...
/dev/did ...
/devices/pseudo/did@0:* ...
The private host entry of this node has been removed from
/etc/inet/ntp.conf.sc, but the NTP service is still enabled. If you
have no further use for the NTP service, you can disable it after the
uninstall command has completed.
The /var/cluster directory has not been removed.
Among other things, this directory contains
uninstall logs and the uninstall archive.
You may remove this directory once you are satisfied
that the logs and archive are no longer needed.
Log file - /var/cluster/uninstall/uninstall.29656/log
# devfsadm -Cv
# init 6
5.2 Create another backup boot environment
# lucreate -n s10u11_ProdImage.02.nocluster
6. Create the New Cluster
6.1 Solaris Cluster Install:
# scinstall
*** Main Menu ***
Please select from one of the following (*) options:
* 1) Create a new cluster or add a cluster node
2) Configure a cluster to be JumpStarted from this install server
3) Manage a dual-partition upgrade
4) Upgrade this cluster node
* 5) Print release information for this cluster node
* ?) Help with menu options
* q) Quit
Option: 1
*** New Cluster and Cluster Node Menu ***
Please select from any one of the following options:
1) Create a new cluster
2) Create just the first node of a new cluster on this machine
3) Add this machine as a node in an existing cluster
?) Help with menu options
q) Return to the Main Menu
Option: 1
*** Create a New Cluster ***
This option creates and configures a new cluster.
You must use the Oracle Solaris Cluster installation media to install
the Oracle Solaris Cluster framework software on each machine in the
new cluster before you select this option.
If the "remote configuration" option is unselected from the Oracle
Solaris Cluster installer when you install the Oracle Solaris Cluster
framework on any of the new nodes, then you must configure either the
remote shell (see rsh(1)) or the secure shell (see ssh(1)) before you
select this option. If rsh or ssh is used, you must enable root access
to all of the new member nodes from this node.
Press Control-D at any time to return to the Main Menu.
Do you want to continue (yes/no) [yes]? yes
>>> Typical or Custom Mode <<<
This tool supports two modes of operation, Typical mode and Custom
mode. For most clusters, you can use Typical mode. However, you might
need to select the Custom mode option if not all of the Typical mode
defaults can be applied to your cluster.
For more information about the differences between Typical and Custom
modes, select the Help option from the menu.
Please select from one of the following options:
1) Typical
2) Custom
?) Help
q) Return to the Main Menu
Option [1]: 2
What is the name of the cluster you want to establish [sc_prodapp]?
Node name (Control-D to finish): prodhost01
Node name (Control-D to finish): prodhost02
Do you need to use DES authentication (yes/no) [no]?
Should this cluster use at least two private networks (yes/no) [yes]?
Does this two-node cluster use switches (yes/no) [yes]?
What is the name of the first switch in the cluster [switch1]?
What is the name of the second switch in the cluster [switch2]?
Select the first cluster transport adapter: nxge1
Will this be a dedicated cluster transport adapter (yes/no) [yes]? yes
For node "prodhost01",
Name of the switch to which "nxge1" is connected [switch1]?
For node "prodhost01",
Use the default port name for the "nxge1" connection (yes/no) [yes]?
Select the second cluster transport adapter:nxge2
Will this be a dedicated cluster transport adapter (yes/no) [yes]?
For node "prodhost01",
Name of the switch to which "nxge2" is connected [switch2]?
For node "prodhost01",
Use the default port name for the "nxge2" connection (yes/no) [yes]?
For all other nodes,
Autodiscovery is the best method for configuring the cluster
transport. However, you can choose to manually configure the remaining
adapters and cables.
Is it okay to use autodiscovery for the other nodes (yes/no) [yes]?
Is it okay to accept the default network address (yes/no) [yes]?
Is it okay to accept the default netmask (yes/no) [yes]?
Do you want to turn off global fencing (yes/no) [no]?
Global Devices File System
The default is to use lofi.
For node "prodhost01",
Is it okay to use this default (yes/no) [yes]?
For node "prodhost02",
Is it okay to use this default (yes/no) [yes]?
Configuring global device using lofi on prodhost02: done
Is it okay to create the new cluster (yes/no) [yes]?
Interrupt cluster creation for cluster check errors (yes/no) [no]?
Cluster Creation
Log file - /var/cluster/logs/install/scinstall.log.10359
Starting discovery of the cluster transport configuration.
The following connections were discovered:
prodhost01:nxge1 switch1 prodhost02:nxge1
prodhost01:nxge2 switch2 prodhost02:nxge2
Completed discovery of the cluster transport configuration.
Started cluster check on "prodhost01".
Started cluster check on "prodhost02".
cluster check failed for "prodhost01".
cluster check failed for "prodhost02".
The cluster check command failed on both of the nodes.
Refer to the log file for details.
The name of the log file is /var/cluster/logs/install/scinstall.log.10359.
Configuring "prodhost02" ... done
Rebooting "prodhost02" ... done
Configuring "prodhost01" ... done
Rebooting "prodhost01" ...
Log file - /var/cluster/logs/install/scinstall.log.10359
Note: nxge1 and nxge2 are the interconnects; there are no physical switches - the links are direct crossover cables.
7. Storage Configuration
7.1 Confirm HBAs are connected
List the connected HBAs:
root@prodhost01:~ 12:46:24 luxadm -e port |grep CONNECTED
/devices/pci@1,700000/SUNW,qlc@0/fp@0,0:devctl CONNECTED
/devices/pci@3,700000/SUNW,qlc@0/fp@0,0:devctl CONNECTED
Verify FC ports are connected and configured:
root@prodhost01:~ 12:48:36 cfgadm -al -o show_FCP_dev |grep fc-fabric
c1 fc-fabric connected configured unknown
c2 fc-fabric connected configured unknown
7.2 Scan for the new LUNs
Note: the disk IDs of the LUNs have no apparent sequential order in relation to the metadevices.
This is because storage had been added, removed, and migrated so many times over the years.
SAN storage administrators provided a list of WWNs for all LUNs.
# cfgadm -c configure cX
# cfgadm -al
# cfgadm -al -o show_SCSI_LUN
# devfsadm -Cv
# scgdevs
Confirm all 16 new LUNs are visible:
# luxadm probe | grep "Logical Path" | wc -l
Confirm the sizes of the LUNs, and sort by the number at each size:
luxadm probe |grep Logical|awk -F\: '{print"echo "$2";luxadm display "$2"|grep capacity"}'\
|sh|grep capacity|awk '{print $3" "$4}'| sort | uniq -c |sort
1 5120 MBytes (1 x 5GB)
1 51200 MBytes (1 x 50GB)
2 1024 MBytes (2 x 1GB)
2 153600 MBytes (2 x 150GB)
2 512000 MBytes (2 x 500GB)
8 102400 MBytes (8 x 100GB)
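For repeatability, the same tally can be produced with a single awk pass over a saved capture of the `luxadm display` capacity lines. A sketch - the three sample lines are stand-ins whose format is assumed from the output above:

```shell
# Stand-in capacity lines, one per LUN, as captured from "luxadm display".
cat > /tmp/luns.txt <<'EOF'
Unformatted capacity: 1024 MBytes
Unformatted capacity: 1024 MBytes
Unformatted capacity: 5120 MBytes
EOF
# Tally how many LUNs exist at each size.
awk '/capacity/ { n[$(NF-1)]++ }
     END { for (s in n) printf "%d x %s MBytes\n", n[s], s }' /tmp/luns.txt | sort
```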
Create a table with the LUN IDs, and sizes
| LUN ID | Size |
| <WWN_SAN_ID>000000000A1d0 | 1GB |
7.3 Confirm all paths to Storage are active
Each LUN should have 4 paths and all operational:
mpathadm list lu
/dev/rdsk/c3t<WWN_SAN_ID>00A1d0s2
Total Path Count: 4
Operational Path Count: 4
Confirm all 16 Luns have 4 paths, and all operational:
mpathadm list lu | grep "Total Path Count: 4" | wc -l
mpathadm list lu | grep "Operational Path Count: 4" | wc -l
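The two greps above count healthy LUNs but will not name a degraded one. A sketch of a stricter check over a saved capture - the sample lines are stand-ins for real `mpathadm list lu` output:

```shell
# Stand-in for a saved "mpathadm list lu" capture.
cat > /tmp/mpath.txt <<'EOF'
/dev/rdsk/c3t0000A1d0s2
    Total Path Count: 4
    Operational Path Count: 4
/dev/rdsk/c3t0000A2d0s2
    Total Path Count: 4
    Operational Path Count: 4
EOF
# Remember the current LUN line; name any LUN whose operational count is not 4.
awk '/rdsk/ { lun = $1 }
     /Operational Path Count/ && $NF != 4 { print lun" has only "$NF" paths"; bad = 1 }
     END { exit bad }' /tmp/mpath.txt
```

The non-zero exit code makes it usable as a gate in a runbook script, not just a visual check.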
7.4 Format and Label each disk
Confirm there are 18 disks showing - 2 OS internal disks + 16 LUNs:
# echo | format | wc -l
Format, select each disk in turn, and label it.
# format c3t<WWN_SAN_ID>000000000A1d0
selecting c3t<WWN_SAN_ID>000000000A1d0
[disk formatted]
Disk not labeled. Label it now? y
format> q
7.5 Add the DID for each LUN to the table
Add DID to the LUN table:
| LUN ID | Size | DID |
| <WWN_SAN_ID>000000000A1d0 | 1GB | d4 |
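Filling the DID column by hand across 16 LUNs is error-prone; the controller-device to DID mapping can be pulled from the `scdidadm -l` output captured in step 1.3. A sketch with stand-in lines, assuming the format `instance host:device did-path`:

```shell
# Stand-in for the saved "scdidadm -l" capture from step 1.3.
cat > /tmp/scdidadm.txt <<'EOF'
4   prodhost01:/dev/rdsk/c3t0000A1d0   /dev/did/rdsk/d4
10  prodhost01:/dev/rdsk/c3t0000A2d0   /dev/did/rdsk/d10
EOF
# Print "controller-device -> DID" pairs for pasting into the table.
awk '{ n = split($2, a, "/"); sub(".*/", "", $3); print a[n]" -> "$3 }' /tmp/scdidadm.txt
```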
7.6 Determine the metasets for each disk
We have the LUN sizes, so next label which metaset each LUN belongs to:
| LUN ID | Size | DID | metaset |
| <WWN_SAN_ID>000000000A1d0 | 1GB | d4 | quorum |
| <WWN_SAN_ID>000000000A2d0 | 1GB | d10 | DBSID-DS |
| <WWN_SAN_ID>000000000A3d0 | 5GB | d5 | MQFTE-DS |
| <WWN_SAN_ID>0000000012Cd0 | 150GB | d19 | APP-DS |
7.7 Configure the Quorum
7.7.1 Check initial status
clq status
=== Cluster Quorum ===
--- Quorum Votes Summary from (latest node reconfiguration) ---
Needed Present Possible
------ ------- --------
1 1 1
--- Quorum Votes by Node (current status) ---
Node Name Present Possible Status
--------- ------- -------- ------
prodhost01 1 1 Online
prodhost02 0 0 Online
7.7.2 Add the shared LUN to the quorum
clq add d4
7.7.3 Recheck Status
clq status
=== Cluster Quorum ===
--- Quorum Votes Summary from (latest node reconfiguration) ---
Needed Present Possible
------ ------- --------
2 3 3
--- Quorum Votes by Node (current status) ---
Node Name Present Possible Status
--------- ------- -------- ------
prodhost01 1 1 Online
prodhost02 1 1 Online
--- Quorum Votes by Device (current status) ---
Device Name Present Possible Status
----------- ------- -------- ------
d4 1 1 Online
7.8 Create Slices for the Metadbs on the quorum disk
Create 2 x 128MB slices on the quorum disk.
Use the first slice for the metadbs of the first host:
metadb -f -a -c 3 /dev/dsk/c3t000000000A1d0s0
Use the second slice for the metadbs of the second host:
metadb -f -a -c 3 /dev/dsk/c3t000000000A1d0s1
7.9 Assign metadevice IDs and mount references to the LUNs table
| LUN | Size | DID | metaset | metadevice | mount |
| <WWN_SAN_ID>00A1d0 | 1GB | d4 | quorum | quorum | N/A |
| <WWN_SAN_ID>00A2d0 | 1GB | d10 | DBSID-DS | d141 | admin |
| <WWN_SAN_ID>00A3d0 | 5GB | d5 | MQFTE-DS | d151 | config |
| <WWN_SAN_ID>00A4d0 | 50GB | d22 | MQFTE-DS | d161 | files |
| <WWN_SAN_ID>011Ad0 | 100GB | d18 | DBSID-DS | d111 | sybase01 |
| <WWN_SAN_ID>011Bd0 | 100GB | d9 | DBSID-DS | d111 | sybase01 |
| <WWN_SAN_ID>011Cd0 | 100GB | d11 | DBSID-DS | d111 | sybase01 |
| <WWN_SAN_ID>011Dd0 | 100GB | d7 | DBSID-DS | d111 | sybase01 |
| <WWN_SAN_ID>011Ed0 | 100GB | d15 | DBSID-DS | d121 | sybase02 |
| <WWN_SAN_ID>011Fd0 | 100GB | d17 | DBSID-DS | d121 | sybase02 |
| <WWN_SAN_ID>012Ad0 | 100GB | d28 | DBSID-DS | d121 | sybase02 |
| <WWN_SAN_ID>012Bd0 | 100GB | d35 | DBSID-DS | d121 | sybase02 |
| <WWN_SAN_ID>012Cd0 | 150GB | d19 | APP-DS | d101 | app |
| <WWN_SAN_ID>012Dd0 | 150GB | d25 | APP-DS | d101 | app |
| <WWN_SAN_ID>012Ed0 | 500GB | d6 | DBSID-DS | d131 | backup |
| <WWN_SAN_ID>012Fd0 | 500GB | d14 | DBSID-DS | d131 | backup |
7.10 Recreate metasets and metadevices
The metasets must be recreated in the same order as on the original production hosts:
Set name = APP-DS, Set number = 1
Set name = DBSID-DS, Set number = 2
Set name = MQFTE-DS, Set number = 3
7.10.1 Purge any references to the old metasets transferred during the initial setup:
metaset -s APP-DS -P
metaset -s DBSID-DS -P
metaset -s MQFTE-DS -P
7.10.2 Check Cluster Disk Group Status; if the old sets are still shown, they must be removed:
cldg status
/usr/cluster/lib/sc/dcs_config -c remove -s APP-DS
/usr/cluster/lib/sc/dcs_config -c remove -s DBSID-DS
/usr/cluster/lib/sc/dcs_config -c remove -s MQFTE-DS
7.10.3 Recreate the application disk set, devices and filesystem:
metaset -s APP-DS -a -h prodhost01 prodhost02
metaset -s APP-DS -a /dev/did/rdsk/d19
metaset -s APP-DS -a /dev/did/rdsk/d25
metainit -s APP-DS d101 2 1 /dev/did/rdsk/d19s0 1 /dev/did/rdsk/d25s0
=> APP-DS/d101: Concat/Stripe is setup
metainit -s APP-DS d100 -m d101
=> APP-DS/d100: Mirror is setup
newfs /dev/md/APP-DS/rdsk/d100
=> newfs: construct a new file system /dev/md/APP-DS/rdsk/d100: (y/n)? y
mount /dev/md/APP-DS/dsk/d100 /opt/app
df -h !$
umount !$
7.10.4 Recreate the database disk set, devices and filesystem:
metaset -s DBSID-DS -a -h prodhost01 prodhost02
metaset -s DBSID-DS -a /dev/did/rdsk/d18
metaset -s DBSID-DS -a /dev/did/rdsk/d9
metaset -s DBSID-DS -a /dev/did/rdsk/d11
metaset -s DBSID-DS -a /dev/did/rdsk/d7
metaset -s DBSID-DS -a /dev/did/rdsk/d15
metaset -s DBSID-DS -a /dev/did/rdsk/d17
metaset -s DBSID-DS -a /dev/did/rdsk/d28
metaset -s DBSID-DS -a /dev/did/rdsk/d6
metaset -s DBSID-DS -a /dev/did/rdsk/d14
metaset -s DBSID-DS -a /dev/did/rdsk/d35
metaset -s DBSID-DS -a /dev/did/rdsk/d10
metainit -s DBSID-DS d111 4 1 /dev/did/rdsk/d18s0 1 /dev/did/rdsk/d9s0 1 /dev/did/rdsk/d11s0 1 /dev/did/rdsk/d7s0
=> DBSID-DS/d111: Concat/Stripe is setup
metainit -s DBSID-DS d110 -m d111
=> DBSID-DS/d110: Mirror is setup
newfs /dev/md/DBSID-DS/rdsk/d110
mount /dev/md/DBSID-DS/dsk/d110 /data/DBSID/sybase01
df -h !$
umount !$
metainit -s DBSID-DS d121 4 1 /dev/did/rdsk/d15s0 1 /dev/did/rdsk/d17s0 1 /dev/did/rdsk/d28s0 1 /dev/did/rdsk/d35s0
=> DBSID-DS/d121: Concat/Stripe is setup
metainit -s DBSID-DS d120 -m d121
=> DBSID-DS/d120: Mirror is setup
newfs /dev/md/DBSID-DS/rdsk/d120
mount /dev/md/DBSID-DS/dsk/d120 /data/DBSID/sybase02
df -h !$
umount !$
metainit -s DBSID-DS d131 2 1 /dev/did/rdsk/d6s0 1 /dev/did/rdsk/d14s0
metainit -s DBSID-DS d130 -m d131
newfs /dev/md/DBSID-DS/rdsk/d130
mount /dev/md/DBSID-DS/dsk/d130 /data/DBSID/backup
df -h !$
umount !$
metainit -s DBSID-DS d141 1 1 /dev/did/rdsk/d10s0
metainit -s DBSID-DS d140 -m d141
newfs /dev/md/DBSID-DS/rdsk/d140
mount /dev/md/DBSID-DS/dsk/d140 /opt/sybase/admin/DBSID
df -h !$
umount !$
7.10.5 Recreate the Message Queue disk set, devices and filesystem:
metaset -s MQFTE-DS -a -h prodhost01 prodhost02
metaset -s MQFTE-DS -a /dev/did/rdsk/d5
metaset -s MQFTE-DS -a /dev/did/rdsk/d22
metainit -s MQFTE-DS d151 1 1 /dev/did/rdsk/d5s0
=> MQFTE-DS/d151: Concat/Stripe is setup
metainit -s MQFTE-DS d150 -m d151
=> MQFTE-DS/d150: Mirror is setup
newfs /dev/md/MQFTE-DS/rdsk/d150
mount /dev/md/MQFTE-DS/dsk/d150 /data/mqfte/config
umount !$
metainit -s MQFTE-DS d161 1 1 /dev/did/rdsk/d22s0
metainit -s MQFTE-DS d160 -m d161
newfs /dev/md/MQFTE-DS/rdsk/d160
mount /dev/md/MQFTE-DS/dsk/d160 /data/mqfte/files
umount !$
7.10.6 Re-enable metadevice mounts
Add the updated mount entries:
vi /etc/vfstab
/dev/md/APP-DS/dsk/d100 /dev/md/APP-DS/rdsk/d100 /opt/app ufs 2 no logging,forcedirectio,largefiles,noatime
/dev/md/DBSID-DS/dsk/d110 /dev/md/DBSID-DS/rdsk/d110 /data/DBSID/sybase01 ufs 2 no logging,forcedirectio,largefiles,noatime
/dev/md/DBSID-DS/dsk/d120 /dev/md/DBSID-DS/rdsk/d120 /data/DBSID/sybase02 ufs 2 no logging,forcedirectio,largefiles,noatime
/dev/md/DBSID-DS/dsk/d130 /dev/md/DBSID-DS/rdsk/d130 /data/DBSID/backup ufs 2 no logging,forcedirectio,largefiles
/dev/md/DBSID-DS/dsk/d140 /dev/md/DBSID-DS/rdsk/d140 /opt/sybase/admin/DBSID ufs 2 no logging,forcedirectio,largefiles,noatime
/dev/md/MQFTE-DS/dsk/d150 /dev/md/MQFTE-DS/rdsk/d150 /data/mqfte/config ufs 2 no logging,largefiles,noatime
/dev/md/MQFTE-DS/dsk/d160 /dev/md/MQFTE-DS/rdsk/d160 /data/mqfte/files ufs 2 no logging,largefiles,noatime
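Before the next reboot it is worth checking that every new vfstab line has the seven fields Solaris expects, since a malformed entry drops the boot into maintenance mode. A sketch over a stand-in copy of the entries above:

```shell
# Stand-in for the metadevice entries just added to /etc/vfstab.
cat > /tmp/vfstab.add <<'EOF'
/dev/md/APP-DS/dsk/d100 /dev/md/APP-DS/rdsk/d100 /opt/app ufs 2 no logging,forcedirectio,largefiles,noatime
/dev/md/MQFTE-DS/dsk/d150 /dev/md/MQFTE-DS/rdsk/d150 /data/mqfte/config ufs 2 no logging,largefiles,noatime
EOF
# Every non-comment, non-blank line must have exactly 7 fields.
awk '!/^#/ && NF && NF != 7 { print "bad entry, line "NR": "$0; bad = 1 }
     END { exit bad }' /tmp/vfstab.add
```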
7.11 Create another backup boot environment
# lucreate -n s10u11_ProdImage.03withSAN
8. Data & Filesystem Migration
This is a time-critical activity, as the live production systems can only be taken down for 24 hours, on a Sunday. The database and all applications must be stopped to ensure no data is altered during the copy.
8.1 Application files
Activity RunBook:
- take originhost01 down to single user mode: init S
- start network interface, and set route
- Copy data using rsync:
# cd /opt/app; rsync -rugpotvl . prodhost01:/opt/app
Activity RunBook:
- take originhost02 down to single user mode: init S
- start network interface, and set route
- with almost 1TB of data, the copy needs to be optimised
- 6 concurrent rsync sessions proved to be the optimum before the transfer rate was impacted:
# cd /data/sybase01; rsync -rugpotvl --progress hist* prodhost02:/data/sybase01/
# cd /data/sybase01; rsync -rugpotvl --progress repo* prodhost02:/data/sybase01/
# cd /data/sybase01; rsync -rugpotvl --progress temp* prodhost02:/data/sybase01/
# cd /data/sybase02; rsync -rugpotvl --progress hist* prodhost02:/data/sybase02/
# cd /data/sybase02; rsync -rugpotvl --progress repo* prodhost02:/data/sybase02/
# cd /data/sybase02; rsync -rugpotvl --progress temp* prodhost02:/data/sybase02/
- when those large files finish, clean up the remaining files:
# cd /data/sybase01; rsync -rugpotvl --progress . prodhost02:/data/sybase01/
# cd /data/sybase02; rsync -rugpotvl --progress . prodhost02:/data/sybase02/
# cd /data/DBSID/backup; rsync -rugpotvl --progress . prodhost02:/data/DBSID/backup/
# cd /opt/sybase/DBSID/admin; rsync -rugpotvl --progress . prodhost02:/opt/sybase/DBSID/admin/
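The six-session copy can also be scripted so the operator is not juggling terminals: start each rsync in the background and wait for all of them. A sketch; the echo jobs are stand-ins for the real rsync commands shown above:

```shell
# Run each argument as a background job, then block until all complete.
run_parallel() {
    for job in "$@"; do
        sh -c "$job" &
    done
    wait                           # returns once every background copy is done
}
# Real usage would pass the six rsync commands, e.g.
#   run_parallel "cd /data/sybase01 && rsync -rugpotvl hist* prodhost02:/data/sybase01/" ...
run_parallel "echo copy1 done" "echo copy2 done" "echo copy3 done"
```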
9. Configure IPMP Network Redundancy
Rebuild IPMP configuration for the new environment.
Ensure interconnect networks are isolated from existing clusters.
# svccfg -s network/rpc/bind setprop config/local_only=false
# svcadm refresh network/rpc/bind
# scinstall
# svcadm enable /network/rpc/scrinstd
# scp nfsfiler:/mnt/origHost1/clusterconfig.xml /var/tmp/clusterconfig.xml
10. Recreating the Cluster Configuration xml
10.1 Copy Original Cluster Config
mkdir /var/tmp/NewCluster
cd !$
scp nfsfiler:/mnt/origHost1/clusterconfig.xml .
10.2 Create newfooter.xml with all the Resource Configs
The file needs to be split at these two lines:
</devicegroupList>
<resourcetypeList>
Determine which line numbers these are at: cat -n clusterconfig.xml
/resourcetypeList
will show, for example:
949 </devicegroupList>
950 <resourcetypeList>
Make a copy of the config file as a new footer: cp clusterconfig.xml newfooter.xml
Remove all entries up to the resourcetypeList: vi newfooter.xml
:1,949d
10.3 Update newfooter.xml to reflect the new cluster config:
:%s/originhost01/prodhost01/g
:%s/originhost02/prodhost02/g
:%s/origincluster/prodcluster/g
10.4 Export XML for the new Production Cluster in Verona
# cluster export > /var/tmp/NewCluster/NewClusterConfig.xml
Copy it as the new header:
# cd /var/tmp/NewCluster; cp NewClusterConfig.xml newheader.xml
10.5 Create newheader.xml with the new cluster settings
This will remove all the resource definitions.
Determine which line numbers these are at:
cat -n newheader.xml
/resourcetypeList
will show, for example:
735 </devicegroupList>
740 <resourcetypeList>
Remove all the resource entries: vi newheader.xml
:740
dG
10.6 Merge the two configs
cat newheader.xml newfooter.xml > MergedCluster.xml
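The vi-based split in 10.2 and 10.5 can also be done non-interactively, which avoids hard-coding line numbers that differ between the two files. A sketch using awk, demonstrated on a tiny stand-in file (the real inputs are the exported cluster configs):

```shell
#!/bin/sh
# Stand-in for clusterconfig.xml, just to demonstrate the split points.
cat > /tmp/clusterconfig.xml <<'EOF'
<cluster>
  <devicegroupList>
    <devicegroup name="dg1"/>
  </devicegroupList>
  <resourcetypeList>
    <resourcetype name="SUNW.gds"/>
  </resourcetypeList>
  <resourceList>
    <resource name="APP-stor" node="originhost01"/>
  </resourceList>
</cluster>
EOF
# Footer: everything from <resourcetypeList> to end of file
# (equivalent to deleting lines 1..N in vi).
awk '/<resourcetypeList>/,0' /tmp/clusterconfig.xml > /tmp/newfooter.xml
# Header: everything up to and including </devicegroupList>
# (equivalent to :N then dG in vi).
awk '{print} /<\/devicegroupList>/ {exit}' /tmp/clusterconfig.xml > /tmp/newheader.xml
# Rename the old hosts to the new ones, then merge, as in 10.3 and 10.6.
sed -e 's/originhost01/prodhost01/g' -e 's/originhost02/prodhost02/g' \
    /tmp/newfooter.xml > /tmp/newfooter.renamed.xml
cat /tmp/newheader.xml /tmp/newfooter.renamed.xml > /tmp/MergedCluster.xml
```

The merge is well-formed because the header ends exactly where the footer begins, so concatenation reproduces a complete cluster document.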
11. Recreate the Cluster Resource Groups
- # clrg create -i /var/tmp/NewCluster/MergedCluster.xml APP-rg
- # clrg create -i /var/tmp/NewCluster/MergedCluster.xml DBSID-rg
- # clrg create -i /var/tmp/NewCluster/MergedCluster.xml MQFTE-rg
12. Resource Registration
List the resource types, and their versions, installed on the system:
# clrt list -v
Resource Type Node-List
---------------------- ---------
SUNW.LogicalHostname:2 <ALL>
SUNW.SharedAddress:2 <ALL>
SUNW.gds:6 <ALL>
SUNW.HAStoragePlus:8 <ALL>
SUNW.apache:4.2 <ALL>
SUNW.sybase:5 <ALL>
Register these specific versions:
# clrt register SUNW.LogicalHostname:2
# clrt register SUNW.SharedAddress:2
# clrt register SUNW.gds:6
# clrt register SUNW.HAStoragePlus:8
# clrt register SUNW.apache:4.2
# clrt register SUNW.sybase:5
"Already registered" errors may appear for SUNW.HAStoragePlus and SUNW.LogicalHostname if they were pre-registered with the cluster.
13. Resource Creation
13.1 Create and enable Storage Resources
- # clrs create -g APP-rg -t SUNW.HAStoragePlus -i /var/tmp/NewCluster/MergedCluster.xml APP-Stor
- # clrs enable APP-Stor
- # clrs create -g DBSID-rg -t SUNW.HAStoragePlus -i /var/tmp/NewCluster/MergedCluster.xml DBSID-Stor
- # clrs enable DBSID-Stor
- # clrs create -g MQFTE-rg -t SUNW.HAStoragePlus -i /var/tmp/NewCluster/MergedCluster.xml MQFTE-Stor
- # clrs enable MQFTE-Stor
- # clrs create -g APP-rg -t SUNW.LogicalHostname -i /var/tmp/NewCluster/MergedCluster.xml APP-VIP
- # clrs enable APP-VIP
- # clrs create -g DBSID-rg -t SUNW.LogicalHostname -i /var/tmp/NewCluster/MergedCluster.xml DBSID-VIP
- # clrs enable DBSID-VIP
Create the database resources:
# clrs create -g DBSID-rg -i /var/tmp/NewCluster/MergedCluster.xml DBSID-Syb
# clrs enable DBSID-Syb
# clrs create -g DBSID-rg -i /var/tmp/NewCluster/MergedCluster.xml DBSID-ssh
# clrs enable DBSID-ssh
Online the resource group:
# clrg online -M DBSID-rg
13.2 Create and enable the Application Resources
There are multiple dependencies, so each resource must be created and enabled in order:
# clrs create -g APP-rg -i /var/tmp/NewCluster/MergedCluster.xml APP-ssh
# clrs enable APP-ssh
# clrs create -g APP-rg -i /var/tmp/NewCluster/MergedCluster.xml Net_server
# clrs create -g APP-rg -i /var/tmp/NewCluster/MergedCluster.xml Naming_service
# clrs enable Net_server Naming_service
# clrs create -g APP-rg -i /var/tmp/NewCluster/MergedCluster.xml Monitoring
# clrs create -g APP-rg -i /var/tmp/NewCluster/MergedCluster.xml Listener
# clrs enable Monitoring Listener
# clrs create -g APP-rg -i /var/tmp/NewCluster/MergedCluster.xml Apache-APP
# clrs enable Apache-APP
# clrs create -g APP-rg -i /var/tmp/NewCluster/MergedCluster.xml WebGUI
# clrs enable WebGUI
# clrs create -g APP-rg -i /var/tmp/NewCluster/MergedCluster.xml Gds-agt-APP_QUEUE001_PROD
# clrs create -g APP-rg -i /var/tmp/NewCluster/MergedCluster.xml GServBSInput
# clrs create -g APP-rg -i /var/tmp/NewCluster/MergedCluster.xml GServXMLTransTicket
# clrs enable Gds-agt-APP_QUEUE001_PROD GServBSInput GServXMLTransTicket
# clrs create -g APP-rg -i /var/tmp/NewCluster/MergedCluster.xml Ticketdaemon
# clrs create -g APP-rg -i /var/tmp/NewCluster/MergedCluster.xml Feeddaemon
# clrs create -g APP-rg -i /var/tmp/NewCluster/MergedCluster.xml Tradeserv
# clrs enable Ticketdaemon Feeddaemon Tradeserv
# clrs create -g APP-rg -i /var/tmp/NewCluster/MergedCluster.xml PositionPublisher
# clrs enable PositionPublisher
# clrs create -g APP-rg -i /var/tmp/NewCluster/MergedCluster.xml Tran
# clrs create -g APP-rg -i /var/tmp/NewCluster/MergedCluster.xml Ticketsorter
# clrs create -g APP-rg -i /var/tmp/NewCluster/MergedCluster.xml Trade
# clrs create -g APP-rg -i /var/tmp/NewCluster/MergedCluster.xml Position
# clrs create -g APP-rg -i /var/tmp/NewCluster/MergedCluster.xml PosPubApp
# clrs create -g APP-rg -i /var/tmp/NewCluster/MergedCluster.xml Prepaytran
# clrs create -g APP-rg -i /var/tmp/NewCluster/MergedCluster.xml Cache
# clrs create -g APP-rg -i /var/tmp/NewCluster/MergedCluster.xml Cacheupdate
# clrs create -g APP-rg -i /var/tmp/NewCluster/MergedCluster.xml Editserver
# clrs create -g APP-rg -i /var/tmp/NewCluster/MergedCluster.xml Limits
# clrs create -g APP-rg -i /var/tmp/NewCluster/MergedCluster.xml Margin
Enable all remaining resources in the group:
# clrs enable -g APP-rg +
Online the resource group:
# clrg online -M APP-rg
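As an alternative to relying purely on creation and enable order, start-order dependencies can be declared explicitly so the RGM enforces them itself. A hypothetical sketch: the dependency pairs shown are illustrative, inferred from the enable sequence above, and `DRYRUN` echoes the commands since `clrs` needs a live cluster:

```shell
#!/bin/sh
# Declare explicit start-order dependencies on existing resources.
# DRYRUN defaults to echo (prints the commands without running them);
# on the real cluster, set DRYRUN= to execute.
DRYRUN="${DRYRUN:-echo}"
# Monitoring should only start once the network server and naming service are up.
$DRYRUN clrs set -p Resource_dependencies=Net_server,Naming_service Monitoring
# The web GUI depends on its Apache instance.
$DRYRUN clrs set -p Resource_dependencies=Apache-APP WebGUI
# The position publisher depends on the trade server.
$DRYRUN clrs set -p Resource_dependencies=Tradeserv PositionPublisher
```

With dependencies in place, a single group-wide enable brings the resources up in the correct order.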
14. Testing
- Perform full failover and recovery tests
- Validate application behaviour across nodes
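The failover tests can be driven as a drill that bounces each resource group to the other node and back, checking status after every move. A sketch (commands echoed by default via `DRYRUN`, since `clrg`/`clrs` need a live cluster):

```shell
#!/bin/sh
# Failover drill: switch each resource group to the standby node and back,
# verifying resource status after every move.
# DRYRUN=echo prints the commands; set DRYRUN= on the cluster to run them.
DRYRUN="${DRYRUN:-echo}"
for rg in APP-rg DBSID-rg MQFTE-rg; do
    for node in prodhost02 prodhost01; do
        $DRYRUN clrg switch -n "$node" "$rg"
        $DRYRUN clrs status -g "$rg"
    done
done
```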
15. Final Data Sync
Repeat all actions in Step 8, Data & Filesystem Migration.
16. Failover/Migration to New Cluster
Immediately after the final data sync, we failed over fully to the new Italian hosts:
- DNS for APP-VIP & DBSID-VIP changed to the Italian addresses
Phase 2 - New Disaster Recovery Cluster
- blacklist LUNs
- map prod -> dr