Disaster recovery scenarios for async and sync replication

Disaster recovery using target snapshots

Because the target dataset is always write-protected, InfiniBox supports taking a snapshot of the target for DR scenarios.

In async replicas, the snapshot that is taken on the target is always consistent with the last replication cycle.

This ability to take a snapshot on the target is ideal for Disaster Recovery tests that aim to verify the integrity of the data on the target without affecting the replication process.

It allows the user to map the snapshot to the remote host without stopping the replication of the source dataset to the target.

Note that it is also possible to map the target dataset itself to a remote host. However, the dataset becomes consistent and write-enabled only after the replica's role has been changed to source or the replica has been deleted.
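The behavior described above can be illustrated with a small Python model. This is only a sketch under stated assumptions: `TargetDataset`, `Snapshot`, and all of their methods are hypothetical names invented for illustration, not InfiniBox API objects or commands. The model shows that a snapshot of the target keeps the data of the last replication cycle while replication to the target continues, and that the target itself stays write-protected as long as it has the target role.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    """Hypothetical point-in-time copy of a target dataset."""
    name: str
    data: dict
    mapped_hosts: list = field(default_factory=list)

    def map_to_host(self, host: str) -> None:
        # Mapping the snapshot does not touch the replication process.
        self.mapped_hosts.append(host)


@dataclass
class TargetDataset:
    """Toy model of a replication target (not an InfiniBox API object)."""
    name: str
    role: str = "target"
    replicating: bool = True
    data: dict = field(default_factory=dict)

    @property
    def write_enabled(self) -> bool:
        # The target is write-protected while it still has the target role.
        return self.role == "source"

    def apply_replication_cycle(self, changes: dict) -> None:
        # Data arriving from the source during a replication cycle.
        if not self.replicating:
            raise RuntimeError("replication is not running")
        self.data.update(changes)

    def take_snapshot(self, snap_name: str) -> Snapshot:
        # The snapshot captures the state left by the last cycle.
        return Snapshot(snap_name, dict(self.data))


target = TargetDataset("vol1-replica")
target.apply_replication_cycle({"block0": "v1"})

snap = target.take_snapshot("dr-test")
snap.map_to_host("dr-host")                        # DR test runs on the snapshot

target.apply_replication_cycle({"block0": "v2"})   # replication keeps running
```

After the second cycle, the snapshot still holds the data of the first cycle, which is what makes it suitable for verifying the target data without pausing replication.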

Testing the disaster recovery site (Firedrill)

Failover and failback are operations that handle a situation in which the connectivity between the local and the remote systems is down. These operations switch the roles of the source and the target.

Failover

  • The link between the source and the target is down and the target is connected to a host and can serve it.
  • The target's role has to be changed to source, after which it accepts host writes.
  • The replica on the original source side enters an auto-suspended state.
  • During this phase, the target and the source are no longer consistent.

Failback

The user returns both source and target to their original roles, and the replica returns to the state it was in prior to the failover.

  • The original target should be changed back to target (it was changed to source during the failover).
  • The replica should be resumed from the original source side.
  • A sync job will start in order to return the replica to a synchronized state.
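The firedrill failover and failback steps above can be sketched as a simple state model in Python. The `Replica` class, its field names, and the `failover` and `failback` functions are hypothetical and exist only for this illustration; they are not InfiniBox commands. The sketch also assumes that the link is restored before failback, and it represents the completed sync job by setting a single flag.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """Toy model of one replica as seen from both systems (hypothetical)."""
    local_role: str = "source"     # role on the original source system
    remote_role: str = "target"    # role on the original target system
    local_state: str = "active"    # replica state on the original source side
    link_up: bool = True
    in_sync: bool = True


def failover(r: Replica) -> None:
    # The scenario in the text: the link between the systems is down.
    if r.link_up:
        raise RuntimeError("this failover models a link-down situation")
    r.remote_role = "source"           # the target now accepts host writes
    r.local_state = "auto-suspended"   # original source side auto-suspends
    r.in_sync = False                  # the two sides are no longer consistent


def failback(r: Replica) -> None:
    # Assumption: connectivity has been restored before failback starts.
    if not r.link_up:
        raise RuntimeError("link must be up to resume the replica")
    r.remote_role = "target"           # original target changed back to target
    r.local_state = "active"           # resumed from the original source side
    r.in_sync = True                   # stands in for the finished sync job


replica = Replica(link_up=False)
failover(replica)
after_failover = (replica.remote_role, replica.local_state, replica.in_sync)

replica.link_up = True
failback(replica)
```

At the end of the sequence the replica is back in its pre-failover roles and state, matching the description of the firedrill.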

Switching the replica in the case of a real disaster

In some disaster scenarios, the workload has to be moved to the target system, and the applications continue to run there.

In this case, the user might want to replicate the data changed on the target datasets back to the source datasets.

To do so, the user will need to do the following:

Failover

  • The source system is down and the target is connected to a host and can serve it.
  • The target's role has to be changed to source, after which it accepts host writes.
  • During this phase, the source datasets are unavailable to the user due to the disaster.

Failback

The user switches the original roles of the replica and synchronizes the data from the new source to the old source.

  • The original source should be changed to target.
  • The replica should be resumed from the new source side.
  • A sync job will start in order to return the replica to a synchronized state.

After this procedure, the user can decide whether to keep the replica roles reversed or to change them back to the original sites. To prevent data loss in the process, change the roles back either by using switch role (for sync replicas, if possible) or by changing the role on both sides when there is no new data on the source.
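The real-disaster procedure can be sketched in Python as follows. Everything in this example is an assumption made for illustration: the `System` class, the `failover`, `failback`, and `safe_to_change_roles_back` functions, and the site names are hypothetical and do not correspond to InfiniBox commands. The sync job is modelled as a plain copy, and "no new data on the source" is interpreted here as both sides holding identical data.

```python
from dataclasses import dataclass, field


@dataclass
class System:
    """Toy model of one side of a replica (hypothetical)."""
    name: str
    role: str                      # "source" or "target"
    available: bool = True
    data: dict = field(default_factory=dict)


def failover(old_source: System, old_target: System) -> None:
    # The scenario in the text: the source system is down.
    if old_source.available:
        raise RuntimeError("this failover models a source-down disaster")
    old_target.role = "source"     # the target now accepts host writes


def failback(old_source: System, new_source: System) -> None:
    # Assumption: the original source system is reachable again.
    if not old_source.available:
        raise RuntimeError("old source must be back online")
    old_source.role = "target"                # original source becomes target
    old_source.data = dict(new_source.data)   # sync job: new source -> old source


def safe_to_change_roles_back(new_source: System, new_target: System) -> bool:
    # Interpretation: no data exists on the source that the target lacks.
    return new_source.data == new_target.data


site_a = System("site-a", "source", data={"blk": "v1"})
site_b = System("site-b", "target", data={"blk": "v1"})

site_a.available = False           # disaster on the source system
failover(site_a, site_b)
site_b.data["blk"] = "v2"          # applications keep writing on the new source

site_a.available = True            # original source is repaired
failback(site_a, site_b)
roles_can_return = safe_to_change_roles_back(site_b, site_a)
```

After `failback`, the roles are reversed relative to the original setup and the old source holds the data written during the outage, so the roles can be changed back without losing data.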


Last edited: 2021-11-29 12:52:11 UTC
