Overview 

Although 3-way replication relies on async replication, there is a subtle but significant difference when it comes to the operations 3-way supports:

  • Normally, when you "Run Recovery Plan" on the DR system, the SRA performs the change-role operation on the replica on the target system
  • However, in a 3-way replication, the async replica on system C does not support change-role
  • So, instead of performing change-role, the SRA deletes the replica on system C
  • To recover the replication, delete the async replica on system A or B (whichever was the source), and then recreate the replica from scratch
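
The difference described above can be illustrated with a small toy model. This is only a sketch: `Site` and `run_recovery_plan` are hypothetical names invented for illustration and are not part of the SRA, SRM, or any InfiniBox API. The model only mirrors the behaviour stated in the bullets (change-role for a plain async replica, deletion for the 3-way replica on system C).

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """Toy stand-in for the replicas held on the DR system (hypothetical, not a real API)."""
    replicas: dict = field(default_factory=dict)  # replica name -> role ("SOURCE" or "TARGET")


def run_recovery_plan(site: Site, name: str, three_way: bool) -> str:
    """Mimic what the article says the SRA does on the DR system."""
    if three_way:
        # 3-way: change-role is not supported on system C, so the replica is deleted.
        del site.replicas[name]
        return "deleted"
    # Plain async: the target replica is promoted through change-role.
    site.replicas[name] = "SOURCE"
    return "role-changed"


plain = Site({"cg1": "TARGET"})
three = Site({"cg1": "TARGET"})
print(run_recovery_plan(plain, "cg1", three_way=False), plain.replicas)
print(run_recovery_plan(three, "cg1", three_way=True), three.replicas)
```

After the 3-way run, the toy DR site holds no replica at all, which is why the replication has to be recreated rather than simply reversed.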

The scenario

This article does not cover the "Test Recovery Plan" operation on the DR site, which brings up a sandboxed environment on system C. It describes how to recover after using the "Run Recovery Plan" operation on the DR site.

The information below relates to the following situation:

  • InfiniBox systems A and B are used for vMSC (vSphere Metro Storage Cluster), where volumes/CGs (i.e. datastores) are Active-Active replicated between A and B
    The VM cluster may use either a uniform or a non-uniform storage access topology
  • The volumes/CGs on system A are also replicated asynchronously to a third system C, which constitutes 3-way replication
  • The administrator used SRM (Site Recovery Manager) to run a DR to the site with system C, either for testing purposes or for a true DR event
  • The administrator wants to recover the 3-way replication

The method of recovery depends on the particular scenario.

  • Case 1: Temporary activation of system C for testing purposes (the real data remains on systems A and B).
  • Case 2: An actual DR event causes complete loss of the primary site, or a situation where the real data now exists only on system C.
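
The choice between the two cases can be sketched as a small decision function. This is a minimal illustration, assuming the administrator can answer two yes/no questions; the names `RecoveryCase`, `pick_case`, and both parameters are hypothetical and are not taken from SRM or InfiniBox.

```python
from enum import Enum


class RecoveryCase(Enum):
    TEST_ACTIVATION = 1  # Case 1: the real data remains on systems A and B
    ACTUAL_DR = 2        # Case 2: the real data now exists only on system C


def pick_case(primary_data_intact: bool, production_moved_to_c: bool) -> RecoveryCase:
    """Return the recovery case that matches the situation (illustrative only)."""
    if primary_data_intact and not production_moved_to_c:
        return RecoveryCase.TEST_ACTIVATION
    # Either the primary site was lost or production now lives on system C.
    return RecoveryCase.ACTUAL_DR


print(pick_case(primary_data_intact=True, production_moved_to_c=False).name)
print(pick_case(primary_data_intact=False, production_moved_to_c=True).name)
```

The sketch deliberately treats any doubt about where the real data lives as Case 2, since following the Case 1 procedure would discard the changes made on system C.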

Case 1: Temporary activation for testing purposes

Customers may use the "Run Recovery Plan" operation to test DR usability:

  • They disconnect the two sites (so that systems A and B continue to serve), and then run the DR on system C.
  • They don't intend to really move any production workload to the DR site; it is just a test.

The recovery procedure:

  1. On InfiniBox system A:
    1. Delete the async replica.
    2. Create an asynchronous replica to system C, using the same target CGs, relying on the snapshots retained when the SRA deleted the replica.
  2. On the SRM:
    1. Delete the Protection Group. 
    2. Run array discovery.
    3. Create a new Protection Group and add it to the recovery plan.
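
Because the order of the steps matters, the procedure above can be tracked as an ordered checklist. The following is only a sketch of such a tracker: the `Runbook` class is a hypothetical helper that performs no storage or SRM operations, and the step texts are paraphrased from the list above.

```python
from dataclasses import dataclass

# (where, action) pairs in the order given by the Case 1 procedure.
CASE_1_STEPS = [
    ("InfiniBox system A", "Delete the async replica"),
    ("InfiniBox system A", "Create an async replica to system C on the same target CGs"),
    ("SRM", "Delete the Protection Group"),
    ("SRM", "Run array discovery"),
    ("SRM", "Create a new Protection Group and add it to the recovery plan"),
]


@dataclass
class Runbook:
    steps: list    # (where, action) tuples in the order they must run
    done: int = 0  # number of steps completed so far

    def next_step(self):
        """Return the next pending (where, action) pair, or None when finished."""
        return self.steps[self.done] if self.done < len(self.steps) else None

    def complete(self, action: str) -> None:
        """Mark a step as done, refusing anything other than the next pending step."""
        expected = self.next_step()
        if expected is None or expected[1] != action:
            raise ValueError(f"out of order: expected {expected}, got {action!r}")
        self.done += 1


case_1 = Runbook(CASE_1_STEPS)
case_1.complete("Delete the async replica")
print(case_1.next_step())
```

Attempting to complete an SRM step before both InfiniBox steps are done raises `ValueError`, which mirrors the requirement that the replica is recreated before array discovery runs.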

Case 2: Actual DR recovery

  1. At the primary site: recover systems A and B.
  2. On system A: delete the async replica (if it still exists).
  3. On systems A and B: delete the Active-Active replica (if it still exists). Keep the staging area.
  4. On system C: create an async replica to system A, using a base snapshot (if it exists).
  5. On the SRM:
    1. Delete the Protection Group.
    2. Run array discovery.
    3. Create the Protection Group and add it to the recovery plan.
  6. On system C: wait until the async replica is synchronized, with a small RPO.
  7. On the SRM: perform a planned, graceful fail-over from system C to system A.
  8. On system C: delete the async replica to system A. Keep the staging area.
  9. On system A:
    1. Create an Active-Active replica between systems A and B, using a base snapshot (if it exists).
    2. Create an async replica to system C.
  10. On the SRM:
    1. Delete the Protection Group.
    2. Run array discovery. 
    3. Create the Protection Group and add it to the recovery plan.
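
Step 6 above is a wait condition that gates the planned fail-over in step 7. A minimal polling sketch follows, assuming the current replication lag can be read through some callable. `wait_until_synced` and `get_lag_seconds` are hypothetical names, and the thresholds are placeholders rather than recommended values; how the lag is actually obtained from the system is left outside the sketch.

```python
import time
from typing import Callable


def wait_until_synced(get_lag_seconds: Callable[[], float],
                      max_lag_seconds: float,
                      poll_interval: float = 30.0,
                      timeout: float = 3600.0,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the reported lag is at or below max_lag_seconds.

    Returns True once the lag is small enough, or False if the timeout
    elapses first. get_lag_seconds is supplied by the caller (an assumption).
    """
    waited = 0.0
    while True:
        if get_lag_seconds() <= max_lag_seconds:
            return True
        if waited >= timeout:
            return False
        sleep(poll_interval)
        waited += poll_interval


# Usage with fake readings instead of a real system; sleep is stubbed out.
readings = iter([900.0, 300.0, 40.0])
ok = wait_until_synced(lambda: next(readings), max_lag_seconds=60,
                       poll_interval=1, sleep=lambda seconds: None)
print(ok)
```

Injecting `sleep` and the lag reader keeps the sketch testable without touching a real system; the fail-over in step 7 would be started only after the function returns True.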




