Exchange 2010 - High Availability and Disaster Recovery With Only 3 Servers
One of my customers wants to know how to use Exchange 2010 to provide high availability (surviving a server failure) and disaster recovery (surviving a site failure) with the minimum number of servers. Here is a walk-through of the reference design and the site failover experience:
Production Site:
- DC (FSW)
- Hardware Load Balancer (VIP for CAS Array)
- EX2010-1 (CAS/HTS/MBX Roles)
- EX2010-2 (CAS/HTS/MBX Roles)
DR Site:
- DC-DR (Alternate FSW)
- EX2010-3 (CAS/HTS/MBX Roles)
Configuring Disaster Recovery with One Additional Exchange 2010 Server
The first step is to configure my DAG to handle a site failure. This entails setting DatacenterActivationMode to DagOnly and adding an Alternate File Share Witness using the AlternateWitnessServer and AlternateWitnessDirectory parameters. Setting DatacenterActivationMode to DagOnly is required so that I can manually modify the DAG during a site failover and to prevent split-brain when the Production site is restored.
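As a rough sketch, these settings could be applied from the Exchange Management Shell along the following lines. The DAG name DAG1 and the witness directory C:\DAGFSW are placeholders I am assuming for illustration; substitute your own values.

```powershell
# Assumed placeholders: DAG1 (DAG name) and C:\DAGFSW (witness directory on DC-DR).
Set-DatabaseAvailabilityGroup -Identity DAG1 `
    -DatacenterActivationMode DagOnly `
    -AlternateWitnessServer DC-DR `
    -AlternateWitnessDirectory C:\DAGFSW

# Read the settings back to confirm they were saved.
Get-DatabaseAvailabilityGroup -Identity DAG1 |
    Format-List Name,DatacenterActivationMode,AlternateWitnessServer,AlternateWitnessDirectory
```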
At this point I will simulate a site failure by shutting down all of the servers in my Prod site (DC, EX2010-1, EX2010-2, and my hardware load balancer). In a three-server DAG, cluster quorum is maintained by node majority, which requires two of the three nodes to be online. With two nodes offline, the remaining server cannot hold quorum, so my database is dismounted and cannot be re-mounted.
My Outlook clients
are all showing as Disconnected.
In order to restore service, I must first get my database mounted. To do this I first need to mark my Prod servers as stopped in the DAG using the Stop-DatabaseAvailabilityGroup cmdlet.
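A minimal sketch of this step, run from EX2010-3, might look like the following. DAG1 and the Active Directory site name Prod are assumed placeholders. Because the Prod servers are unreachable, the sketch uses the ConfigurationOnly switch, which records the servers as stopped in Active Directory without trying to contact them.

```powershell
# Assumed placeholders: DAG1 (DAG name) and Prod (AD site name of the failed site).
# -ConfigurationOnly updates Active Directory only; the offline servers are not contacted.
Stop-DatabaseAvailabilityGroup -Identity DAG1 -ActiveDirectorySite Prod -ConfigurationOnly
```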
Next I need to stop the Cluster service on EX2010-3 using the Services snap-in.
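If you prefer the shell to the Services snap-in, the same step can be sketched as below, assuming it is run locally on EX2010-3. ClusSvc is the service name of the Windows Cluster service.

```powershell
# Run locally on the surviving DR server (EX2010-3).
Stop-Service -Name ClusSvc

# Check that the service reports a Stopped status before continuing.
Get-Service -Name ClusSvc
```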
Next I need to restore my DAG in my DR site using the Restore-DatabaseAvailabilityGroup cmdlet, which evicts the stopped Prod servers from the cluster so that EX2010-3 can regain quorum.
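A hedged sketch of the restore step follows. DAG1 and the Active Directory site name DR are assumed placeholders; the alternate witness is not passed here because it was already configured on the DAG earlier.

```powershell
# Assumed placeholders: DAG1 (DAG name) and DR (AD site name of the recovery site).
Restore-DatabaseAvailabilityGroup -Identity DAG1 -ActiveDirectorySite DR

# Review which members the DAG now treats as started and stopped.
Get-DatabaseAvailabilityGroup -Identity DAG1 -Status |
    Format-List Name,StartedMailboxServers,StoppedMailboxServers
```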
I can now mount my database in my DR site.
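Mounting from the shell could look like this, assuming a database named DB1 (a placeholder, since the database name is not given here).

```powershell
# Assumed placeholder: DB1 is the mailbox database name.
Mount-Database -Identity DB1

# The copy on EX2010-3 should now report a status of Mounted.
Get-MailboxDatabaseCopyStatus -Identity DB1\EX2010-3
```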
Although my database has been mounted, my Outlook clients are still offline because they are pointing to my hardware load balancer, which is offline along with the rest of the Prod site. I can restore service to my clients by updating the DNS entries for internal.test.local and external.test.local to point to EX2010-3. Shortly thereafter my Outlook clients will be able to reconnect.
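One way to script the DNS change is with dnscmd. This is only a sketch under several assumptions: the test.local zone is hosted on DC-DR, both names are plain A records, and the two IP addresses below are made-up placeholders for the load balancer VIP and EX2010-3.

```powershell
# All values below are assumptions for illustration; replace them with your own.
$dnsServer = 'DC-DR'       # DNS server hosting the test.local zone
$oldIp     = '10.0.0.10'   # placeholder: hardware load balancer VIP
$newIp     = '10.1.0.30'   # placeholder: EX2010-3

foreach ($name in 'internal', 'external') {
    # Remove the A record that points at the failed VIP, then add one for EX2010-3.
    dnscmd $dnsServer /RecordDelete test.local $name A $oldIp /f
    dnscmd $dnsServer /RecordAdd test.local $name A $newIp
}
```

How quickly clients reconnect depends on the TTL of the old records; running ipconfig /flushdns on a client clears its cached entry.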
Failing Back to the Production Site
When my production site comes back online, I will want to fail back. Fortunately this process is fairly easy (provided that I don't have to re-seed my database replicas).
Once my Production
site is back online, my servers will start synchronizing with the active
replica on EX2010-3.
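To judge when synchronization has finished, one option is to watch the copy and replay queues drain. The database name DB1 is an assumed placeholder.

```powershell
# Assumed placeholder: DB1 is the mailbox database name.
# Healthy passive copies show a Status of Healthy with queue lengths at or near zero.
Get-MailboxDatabaseCopyStatus -Identity DB1 |
    Format-List Name,Status,CopyQueueLength,ReplayQueueLength
```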
After that process is complete, I can restart my DAG members in the Prod site using the Start-DatabaseAvailabilityGroup cmdlet. Note that all of the Exchange servers are now listed in the StartedMailboxServers field.
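A sketch of this step is shown below, again assuming DAG1 and Prod as placeholder names. The bare Set-DatabaseAvailabilityGroup call follows Microsoft's switchover guidance for re-evaluating the DAG's witness configuration after the members are started; treat it as a suggested extra rather than part of the procedure described here.

```powershell
# Assumed placeholders: DAG1 (DAG name) and Prod (AD site name of the restored site).
Start-DatabaseAvailabilityGroup -Identity DAG1 -ActiveDirectorySite Prod

# Re-run with no parameters so the DAG re-checks its quorum and witness settings.
Set-DatabaseAvailabilityGroup -Identity DAG1

# All three servers should now appear under StartedMailboxServers.
Get-DatabaseAvailabilityGroup -Identity DAG1 -Status |
    Format-List Name,StartedMailboxServers,StoppedMailboxServers
```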
I can now re-activate my database on EX2010-1 and update my DNS records for internal.test.local and external.test.local to point back to my VIP.
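Re-activating the database on EX2010-1 can be sketched as a single command, with DB1 once more standing in as an assumed database name.

```powershell
# Assumed placeholder: DB1 is the mailbox database name.
Move-ActiveMailboxDatabase -Identity DB1 -ActivateOnServer EX2010-1
```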

