Replica Create

Create the replica on the source system because that system holds the dataset data and connects to the application active host.

When creating a replica, provide the following information:

  • Replication type
  • Source dataset - the volume, filesystem, or consistency group that will be replicated
  • Target system
    • The target system must be connected to the source system via a link
    • The link must be defined prior to the replica creation
    • The link must be in a Connected state
  • Target pool or target dataset
    • If the target pool is provided, the dataset is created as a part of the replica creation in the specified pool
    • If the target dataset is provided (for volumes/CG only), it must be created in advance, empty, and the same size as the source dataset. If it's a consistency group, see the CG section below.
  • Interval and RPO - only required for the creation of an async replica
  • Preferred system (optional) - can only be provided for the creation of an Active-Active replica
    • Unless specifically stated, the preferred system is the one where the replica is created
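The rules in the list above can be sketched as a small pre-flight check. This is a minimal illustration only: the class, field names, and string values (`ReplicaCreateRequest`, `link_state`, `"ACTIVE_ACTIVE"`, and so on) are hypothetical and are not taken from any InfiniBox API or CLI.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical request model; names and values are assumptions for illustration.
@dataclass
class ReplicaCreateRequest:
    replication_type: str                 # "ASYNC", "SYNC", or "ACTIVE_ACTIVE"
    source_dataset: str                   # volume, filesystem, or consistency group
    target_system: str
    link_state: str                       # state of the link to the target system
    target_pool: Optional[str] = None
    target_dataset: Optional[str] = None
    interval_seconds: Optional[int] = None
    rpo_seconds: Optional[int] = None
    preferred_system: Optional[str] = None


def validate_create_request(req: ReplicaCreateRequest) -> List[str]:
    """Return the list of violated rules; an empty list means no violation found."""
    errors = []
    if req.link_state != "CONNECTED":
        errors.append("the link to the target system must be in a Connected state")
    if req.target_pool is None and req.target_dataset is None:
        errors.append("a target pool or a target dataset must be provided")
    if req.replication_type == "ASYNC":
        if req.interval_seconds is None or req.rpo_seconds is None:
            errors.append("interval and RPO are required for an async replica")
    if req.preferred_system is not None and req.replication_type != "ACTIVE_ACTIVE":
        errors.append("preferred system applies to Active-Active replicas only")
    return errors
```

A caller would run the check before issuing the create command and show the returned messages to the user instead of sending a request that is known to fail.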

Replica Delete

If possible, a replica entity should be deleted from the source replica. The system automatically deletes the target replica as well.

When the replica is deleted, the pairing between the replicated datasets is deleted and both the source and target datasets return to regular usage. Deleting the replica does not delete the replicated datasets on either side.

If there is no connectivity between the source and target systems, or if there is a configuration mismatch between the replicas, there is an option to delete the replica locally using a force flag. This requires the user to do the cleanup on the other system.

When deleting an async or active-active replica, there is an option to retain the staging area (the last snapshot replicated) and expose it to the user for future use. For more information, see the InitialSync section of Overview of InfiniBox replica. If the replica is in Initializing state, there are no snapshots to retain.

When deleting an active-active replica, the serial number of the remote volume is changed, and so the remote volume must be unmapped before the replica is deleted.

If the force_keep_serial flag is used, the serial is kept on the local volume. This must be handled carefully because it can cause a "split brain" between the two volumes, where both volumes have the same serial number but are not connected to each other.

Replica Suspend and Resume (sync and async replicas only)

The user can suspend and resume the replica at any time from the source replica only.

The Suspend Replica command causes the source replica to stop transferring data to the target replica. The Resume Replica command resumes the data transfer between the datasets, if possible.

If there is no connectivity between the source and target systems, or if there is a configuration mismatch between the replicas, the resume command fails.

Replica Suspend and Resume (Active-Active replicas)

The user can suspend and resume the replica at any time from either InfiniBox system. This affects both the data transfer and the availability to hosts of the underlying volumes.

The Suspend Replica command causes the replica to stop transferring data to the other system, and it turns the volume on the other system offline.

Only the paths to the volume on the system where the command is issued remain online. Note: this operation ignores the preferred system settings of the replica. 

The Resume Replica command resumes the data transfer between the datasets, and turns the volume on the other system online once the two volumes complete the sync.

The user must suspend an Active-Active replica in order to:

  • Resize the replicated volume.
  • Add a new member to a replicated CG.
  • Change the preferred system settings of the replica.

Replica Change Role (sync and async replicas only)

The user can change the replica role at any time, except when the replica is not yet initialized.

The Change Role command can be issued on either the source or the target replica. The source replica must be suspended prior to changing the role.

When changing source to target, the source dataset is changed to target dataset and will not accept user writes. This may cause a loss of updated source data that was not replicated to the target yet.

When changing target to source, the target dataset is changed to source dataset and will accept host writes. I/O from the other system will be blocked.

After a Change Role command, the replica must be manually resumed in order to continue replicating.

Replica Switch Role (for sync replicas only)

In sync replication, there is an option to switch the replica direction by synchronously changing both sides' roles.

The Switch Role command can only be used from the source replica, and only if the link between the systems is connected and the replica is in synchronized state.
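The three preconditions for Switch Role can be expressed as a guard function. The sketch below uses a hypothetical `SyncReplicaPair` model with assumed state strings; it only illustrates the checks described above and does not represent the system's actual implementation.

```python
from dataclasses import dataclass


# Hypothetical model; field names and state strings are assumptions.
@dataclass
class SyncReplicaPair:
    local_role: str   # role on the system where the command is issued
    link_state: str   # e.g. "CONNECTED"
    sync_state: str   # e.g. "SYNCHRONIZED"


def switch_role(pair: SyncReplicaPair) -> SyncReplicaPair:
    """Return the pair with the local role swapped, or raise if a precondition fails."""
    if pair.local_role != "SOURCE":
        raise ValueError("Switch Role can only be issued from the source replica")
    if pair.link_state != "CONNECTED":
        raise ValueError("the link between the systems must be connected")
    if pair.sync_state != "SYNCHRONIZED":
        raise ValueError("the replica must be in synchronized state")
    # Both sides change role together, so the local side becomes the target.
    return SyncReplicaPair("TARGET", pair.link_state, pair.sync_state)
```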

Consistency group replica (sync and async replicas only)

A consistency group (CG) is an entity that groups several volumes together and allows the user to take a consistent snapshot of these volumes.

To ensure consistency, all the volume snapshots of a replicated CG are taken at the same point-in-time.

The consistency group is replicated as a whole.

Create a CG replica

Creating a replicated CG is similar to a volume or filesystem Replica Create.

When using CG replicas, there are several options for the local dataset at the time of creation:

  • An empty CG - The replica entity is created between the two empty CGs. When the user adds volumes to the CG, they will automatically be replicated.
  • A CG with volumes - All the volumes in the source CG are paired with target volumes, and the replication process starts.
    • The members on the target side can be either created automatically, or previously created by the user and paired specifically.

Delete a CG replica

Deleting a CG replica is similar to deleting a volume replica.

The CG replica delete should be done from the source replica, if possible. It will delete the replica entity on both sides.

Deleting the CG replica will not delete the CG itself on either side or change it in any way.

If there is no connectivity between the source and target systems, or if there is a configuration mismatch between the replicas, there is an option to delete the replica locally using a force flag. This requires the user to do the cleanup on the other system.

When deleting an async CG replica, there is an option to retain the staging area. This exposes a snap-group of all the last replicated snapshots of the CG.

Add a member to an async CG replica

Adding a member to a replicated CG can be done on an async replica only. For a sync CG replica, the user must change the replica type of the CG replica to async and wait for the completion of the sync job prior to the add operation.

Adding a member to a replicated CG can be done on the source replica only.

The added member can be a volume or an async replica. In both cases, the new member will get the async replica definitions from the CG (RPO and interval).

Adding a member to a replicated CG changes the CG replica sync state to initializing until all the volumes are replicated and the targets are consistent.

Remove a member from an async CG replica

Removing a member from a replicated CG can be done on an async replica only. For a sync CG replica, the user must change the replica type of the CG replica to async and wait for the completion of the sync job prior to the remove operation.

Removing a member from a replicated CG can be done on the source replica only.

When removing a member, the replica link must be connected and the replica must be of async type. Before removing a member from a sync CG, the replica type must be changed to async.

When removing a member, if the user chooses to keep the member replicated, a new replica entity is created for the removed member.

The user can also choose to retain the staging area for the removed member, as can be done when deleting a replica.

Async replication specifics

Async replication mechanism

The InfiniBox async replication feature is snapshot-based replication, based on sync jobs that are scheduled automatically by the system.

The sync job creates a snapshot on the source dataset and delivers it to the target. The next sync job takes a new snapshot of the source, calculates the delta from the previous snapshot, and sends only the data that was changed since the previous sync job. 

The amount of time between two scheduled sync jobs is called the sync interval. The sync interval can be changed by the user. Changes to the replica sync interval take effect on the next sync job. 

The async snapshots are internal snapshots and are not visible to the user. The capacity is presented to the user in the replica information as staging area capacity.
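The delta step can be illustrated with a toy model in which a snapshot is a mapping from block address to block content. This is a sketch under that assumption only; `Snapshot`, `snapshot_delta`, and `apply_delta` are hypothetical names, and the real on-disk mechanism is not described in this article.

```python
from typing import Dict

# Toy model (assumption): a snapshot maps a block address to that block's content.
Snapshot = Dict[int, bytes]


def snapshot_delta(previous: Snapshot, current: Snapshot) -> Snapshot:
    """Blocks that are new or changed in `current` relative to `previous`.

    Blocks that exist only in `previous` are ignored in this simplified model.
    """
    return {addr: data for addr, data in current.items()
            if previous.get(addr) != data}


def apply_delta(target: Snapshot, delta: Snapshot) -> Snapshot:
    """Return a copy of the target's data with the delta applied on top."""
    merged = dict(target)
    merged.update(delta)
    return merged
```

In this model, the first sync job sends the whole snapshot (a delta against an empty mapping), and each later job sends only `snapshot_delta(previous, current)`.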

Sync Now command

The user can use the Sync now command to trigger a sync job on any async replica, regardless of the interval defined. If no sync job is currently replicating, a sync job will be initiated.

The user can define a None interval for a replica. In that case, the system does not initiate any sync jobs for the replica, and all sync jobs must be initiated via a Sync now command.
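The interaction between the interval, a None interval, and Sync now can be summarized as a single decision function. This is a minimal sketch: the function name and parameters are hypothetical, a None interval is modeled as Python `None`, and measuring elapsed time from the previous sync job is an assumption.

```python
from typing import Optional


def should_start_sync_job(interval_seconds: Optional[int],
                          seconds_since_last_job: float,
                          job_running: bool,
                          sync_now_requested: bool) -> bool:
    """Decide whether a new sync job starts (illustrative model only)."""
    if job_running:
        return False          # a sync job is already replicating
    if sync_now_requested:
        return True           # Sync now ignores the defined interval
    if interval_seconds is None:
        return False          # None interval: jobs start only via Sync now
    return seconds_since_last_job >= interval_seconds
```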

Sync job states

When the replica is in Active state, a sync job is initiated as needed. InfiniBox manages the sync job through the following states:

  • Pending - The sync job is planned but not yet executed
  • Initializing - The initial sync job that replicates all of the source data to the target
  • Replicating - The sync job is now running
  • Done - The sync job has finished
  • Paused - The sync job is paused because the replica is suspended by the user
  • Stalled - The sync job is stalled due to link problems

The sync job states and the replica state are only visible on the source replica. On the target replica, the state is N/A.

RPO state

In addition to the replica state, for async replicas there is also an RPO state.

The RPO state is presented on both sides of the replica and is calculated locally.

The RPO state can be:

  • RPO OK - The replica recovery point is within the defined RPO
  • RPO Lagging - The replica passed the defined RPO, and there is a potential large data loss in case of a disaster. This state might be reached when there are connectivity issues preventing a proper data flow or an incorrect RPO definition.
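The RPO state amounts to comparing the age of the latest recovery point with the defined RPO. The following sketch assumes the recovery point is identified by a timestamp; the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta


def rpo_state(last_recovery_point: datetime, now: datetime, rpo: timedelta) -> str:
    """Return "RPO OK" while the recovery point age is within the RPO."""
    age = now - last_recovery_point
    return "RPO OK" if age <= rpo else "RPO Lagging"
```

For example, with an RPO of 10 minutes, a recovery point taken 4 minutes ago yields RPO OK, while one taken 11 minutes ago yields RPO Lagging.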

Possible async replica states

  • Replica state: Active
    • Sync job states: Pending, Initializing, Replicating, Done, Stalled
  • Replica state: Suspended
    • Sync job states: Paused, Done
  • Replica state: Auto Suspend
    • Sync job states: Paused, Done
  • RPO states: RPO OK / RPO Lagging

Best practices for setting the sync interval and RPO

Since the recovery point is based on the sync interval defined, the best practice is to set the RPO to be at least twice as large as the sync interval.
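One way to see where the factor of two comes from: if sync jobs start one interval apart and a job may take up to a full interval to deliver its snapshot, then just before a job completes, the target still holds a snapshot taken one interval plus one job duration earlier. The sketch below encodes that reasoning; the start-to-start timing is an assumption and the function names are hypothetical.

```python
def worst_case_recovery_point_age(interval_seconds: int,
                                  sync_job_duration_seconds: int) -> int:
    """Oldest the target's recovery point gets right before the next job completes.

    Assumes sync jobs start exactly one interval apart.
    """
    return interval_seconds + sync_job_duration_seconds


def meets_rpo_best_practice(interval_seconds: int, rpo_seconds: int) -> bool:
    """True when the RPO is at least twice the sync interval."""
    return rpo_seconds >= 2 * interval_seconds
```

With a 300-second interval and sync jobs that take up to 300 seconds, the recovery point can be up to 600 seconds old, so an RPO below 600 seconds would report RPO Lagging even when replication is healthy.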

Sync replication specifics

Sync replication mechanism

In sync replication, each host write is replicated to the target system prior to acknowledging the host. The replica depends on the quality of the link between the source and the target.

InfiniBox takes measures to handle the synchronous replica if the link cannot support the replica, including the safe return to synchronous replication when the connectivity conditions are back to normal.

Internal fail-over to async replication

If a problem prevents the replica from being synchronized, the replica automatically changes to an internal async replication mode.

The replica operates as if it is async until a Synchronized state is reached.

The replica state is Out Of Sync, or Sync In Progress, until InfiniBox returns the replica to a Synchronized state.

The replication type remains sync at all times. The user cannot perform async replication operations, such as sync now, and cannot configure the replica RPO and Interval.

The return to sync replication does not require an initialization and is done automatically by the system.

Replica sync states

Sync and Active-Active replicas have a sync state similar to the sync job states in async.

Possible sync states:

  • Synchronized - The source and target dataset data are identical
  • Sync in progress - The replica is in the process of returning to synchronized state (using the async internal mode)
  • Initializing - The initial replica process is running, copying all of the source data to the target
  • Initializing pending - The initialization process will start once other initialization processes end
  • Out of sync - The replication process is paused because the replica source cannot send data to the target

The state of the replica is only visible on the source. On the target, the state is N/A.

Possible sync replica states of the source replica

  • Replica state: Active
    • Sync states: Synchronized, Initializing, Sync in progress, Out of sync
  • Replica state: Suspended
    • Sync states: Out of sync
  • Replica state: Auto Suspend
    • Sync states: Out of sync
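The combinations listed above can be captured in a lookup table, which is handy when a monitoring script needs to flag an unexpected pairing. This is an illustrative sketch: the dictionary and function names are hypothetical, and the state strings simply mirror the names used in this article.

```python
# Replica state -> sync states listed for it on the source replica.
# "Initializing pending" is omitted here because the listing above omits it.
ALLOWED_SYNC_STATES = {
    "Active": {"Synchronized", "Initializing", "Sync in progress", "Out of sync"},
    "Suspended": {"Out of sync"},
    "Auto Suspend": {"Out of sync"},
}


def is_listed_combination(replica_state: str, sync_state: str) -> bool:
    """True if the pair of states appears in the listing above."""
    return sync_state in ALLOWED_SYNC_STATES.get(replica_state, set())
```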

Last edited: 2022-05-09 13:35:15 UTC