- Replication Planning
- Physical connectivity
- Logical network settings
- Replication performance and network interfaces
- Network performance requirements for asynchronous replication
- Network performance requirements for synchronous replication and active-active replication
- Controlling the Replication Bandwidth
- Measuring the bandwidth between the sites
- Creating the Replication Service
- Active-Active replication witness
- Creating a replica
Scope of document
This document provides guidelines and instructions for setting up the InfiniBox Replication service.
This document is part of a series that includes:
- InfiniBox Best Practices Guide for Setting Up the Network Connectivity
- Describes how to set up the physical network prior to setting up the services
- InfiniBox Best Practices Guide for Setting Up the Replication Service (this document)
- InfiniBox Best Practices Guide for Setting Up a NAS Service
For best practices and configurations for using Infinidat InfiniBox Active-Active Replication with VMware vSphere Metro Storage Cluster (vMSC), see VMware vSphere Metro Storage Cluster with Infinidat InfiniBox Active-Active Replication.
InfiniBox allows customers to replicate volumes, consistency groups, and filesystems across multiple InfiniBox systems. Each dataset is replicated either synchronously (using sync and Active-Active replication) or asynchronously (async replication).
Active-Active replication is available when both InfiniBox systems run release 5.0 or above. For further details about Active-Active prerequisites and limitations, see Active-Active replication.
InfiniBox asynchronous replication is a snapshot-based solution that allows users to protect their data by replicating it to a remote site, without adding latency to the host I/Os. Users can set an RPO (Recovery Point Objective) as low as 4 seconds if the link quality requirements between the sites are fulfilled.
InfiniBox synchronous replication allows users to protect their data with zero RPO by sending the I/O to the remote site before acknowledging the host. Synchronous replication has an impact on the latency of the write operations because the acknowledgement to the host is sent only after the data is written in both sites.
Synchronous replication during network outage
InfiniBox synchronous replication depends significantly on the quality of the link between the sites. Synchronous replication can fail during a network outage. When this happens, InfiniBox uses its InfiniSnap technology to keep track of these changes. Once connectivity is restored, InfiniBox uses asynchronous replication to bring the DR system up to date, and then switches back to synchronous replication automatically, without dropping I/O.
InfiniSnap technology eliminates the need for a complete re-synchronization of the data or for generating inflated journals of the data that need to be sent to the DR site after network connectivity is restored.
Replicating over IP
With the increase in Ethernet performance and reliability, Fibre Channel (FC) is no longer the sole option as a replication infrastructure. The following differences drive the cost of FC higher than comparable Ethernet:
- FC requires either dedicated fiber optics between sites or a xWDM channel dedicated to the Fibre Channel fabric, plus another one for redundancy.
- IP WAN links are cheaper, can easily be shared, and are always deployed between production sites and DR sites to facilitate administration, monitoring, clustering, etc.
- IP provides a robust framework for devising an optimized protocol for replication.
- The ubiquity of IP and Ethernet provides a rich set of algorithms, toolkits, and expertise for optimizing various line conditions, as well as troubleshooting.
InfiniBox benefits from these advantages by using Ethernet networks to replicate data between sites.
However, just like with FC, using Ethernet for replication requires users to think ahead about proper bandwidth sizing. Customers must plan and test their networks to ensure they can sustain the additional bandwidth requirements of replication. This is especially important when using synchronous replication.
InfiniBox version requirements for replication
Volumes, consistency groups, and filesystems can be replicated across multiple InfiniBox systems that either run the same major version or else are one major version apart. The major version is the first term of the InfiniBox version number. For example, you can replicate entities between InfiniBox versions 5.0.3 and 5.5.40 (same major version), and between InfiniBox versions 5.0.3 and 6.0.20 (one major version apart).
Replication is not supported across systems that are more than one major version apart. For example, versions 4.0.60 and 6.0.20.
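As an illustrative sketch only (this helper is hypothetical, not part of any InfiniBox tool), the compatibility rule comes down to comparing the first terms of the two version numbers:

```python
def replication_supported(version_a: str, version_b: str) -> bool:
    """Replication is supported when the major versions differ by at most 1."""
    major_a = int(version_a.split(".")[0])
    major_b = int(version_b.split(".")[0])
    return abs(major_a - major_b) <= 1

print(replication_supported("5.0.3", "5.5.40"))   # same major version: True
print(replication_supported("5.0.3", "6.0.20"))   # one major version apart: True
print(replication_supported("4.0.60", "6.0.20"))  # two major versions apart: False
```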
Understanding the requirements
For a successful replication deployment, administrators must make sure the environment conforms to three types of requirements:
- Physical connectivity
These components allow the source and target systems to reach each other physically. Examples include LACP interfaces, routers, and firewall rules.
- Logical network settings
These settings allow the source and target systems to communicate. Examples include IP addresses and default gateway / routing rules.
- Network performance requirements
These requirements focus on the “quality” of the network: its ability to support the required bandwidth, latency, stability, etc.
Connecting the two sites
The two sites must be connected using any form of physical connectivity that can sustain the required bandwidth and latency. (See more information below.) Each system is connected to a local switch fabric, using one or more ports in an LACP (Link Aggregation Control Protocol) port group. The switch allows the system to access the remote site, usually passing through a combination of routers, firewalls, and VPNs.
Connecting InfiniBox to the replication network
The InfiniBox on each site must be connected to the switch that has access to the link between the sites. This connectivity must be resilient to overcome single point failures. For each site, follow the instructions on this page.
Use two switches that support creating LACP port-groups (aka a LAG) that spread across the switches. This is also known as a Port Channel. Typically, such a configuration is supported by stacked switches and by some non-stacked switches (such as Cisco Nexus Virtual Port Channel).
It is recommended to:
- Set the relevant port-group (or port channel) LACP setting to "fast" rate, to match the InfiniBox port group configuration, which uses 802.3ad (aka LACP) for availability and load-balancing of the physical ports.
- Configure the switch spanning-tree settings so that the ports connected to InfiniBox transition immediately to the forwarding state and do not wait for topology recalculations.
The switch configuration semantics of many network vendor implementations refer to this as “portfast”, “edge” or “edge-port”.
- Enable RX/TX flow-control for the entire network path between the two InfiniBox systems.
- Set the MTU of all the switches and hosts in the replication path to 9000 bytes.
- Connect the port to the switches:
- Select one port from each node, and connect it to Switch A.
- Select one port from each node, and connect it to Switch B.
Creating a Network Interface
Configure a LAG for the ports paired as described above.
- In the InfiniBox Management Console, click the Settings icon on the left toolbar.
- Select the Network Interfaces tab, click the Create button, and select Port Group.
- Create a new InfiniBox Port Group that includes the ports connected to all the nodes.
Some customers deploy network accelerators to compress data over the WAN. This is acceptable for asynchronous replication. However, it is harmful in synchronous replication scenarios as it adds latency to the I/O, even in environments that are below the maximum supported latency.
Logical network settings
Each InfiniBox system uses a set of IP addresses for replication, some for the data plane and others for the control plane. These IP addresses are specified when you create the Network Space, and allow one system to communicate directly with a remote system. The number of IP addresses allocated to each system is tightly coupled with the types of replication used.
The number of IP addresses to allocate depends on the replication types the network space will serve: a network space used for asynchronous replication only requires 7 addresses, while a network space that also serves sync or Active-Active replication requires 10 (see the IP Configuration step below).
Some networks require routing to cross from one site to the other. InfiniBox supports both a “default gateway” configuration (simple, common) and a static route table (advanced, more flexible).
It is recommended that the storage administrator get the routing definitions from the network team in advance to prevent delays during the replication configuration.
InfiniBox replication uses TCP/IP for both asynchronous and synchronous replications, over the following TCP ports:
- Management: 80 (HTTP) or 443 (HTTPS)
- Data: 8067
InfiniBox systems need to communicate bidirectionally, for example in the event of a primary site failure and recovery, when data needs to be sent from the DR site back to the production site. This requires firewall rules to be open in both directions.
The firewall rules must also allow access from any IP address on the network space on one system to all IP addresses on the network space on the remote system.
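As an illustrative sketch only (assuming a Linux router running iptables; the subnets are hypothetical placeholders for the two replication network spaces, and your firewall syntax will differ), rules opening the management and data ports in both directions might look like:

```shell
# Allow management (443) and replication data (8067) traffic from site A to site B...
iptables -A FORWARD -p tcp -s 10.1.0.0/24 -d 10.2.0.0/24 -m multiport --dports 443,8067 -j ACCEPT
# ...and from site B back to site A (the bidirectional requirement above).
iptables -A FORWARD -p tcp -s 10.2.0.0/24 -d 10.1.0.0/24 -m multiport --dports 443,8067 -j ACCEPT
```

If management uses HTTP, open port 80 instead of 443.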
Infinidat recommends installing a witness when using Active-Active replication. For more information see Active-Active replication witness.
The InfiniBox systems communicate with the witness IP address from the three InfiniBox node IP addresses, over the following TCP ports:
- Heartbeat: 443 (HTTPS)
- Troubleshooting: 22 (SSH)
The firewall rules must also allow access between each InfiniBox system's management IP address and node IP addresses (a total of 4 IP addresses per system) and the witness server.
Creating a Replication Network Space
- In the InfiniBox Management Console, click the Settings icon on the left toolbar.
- Select the Network Spaces tab, and click Create.
The Create Network Space step opens.
- Enter a network space name.
- In the Service drop-down menu, select Replication.
- (Optional) In the Rate Limit per Node field, you can specify the throughput limit of the replication network space bandwidth per node, in megabits (not megabytes) per second.
This limit only affects asynchronous replication, the resynchronization of synchronous replication failover, and the Initialization phase of the replication. It does not affect synchronous replication.
- Select Async Only if you don't plan to use Sync or Active-Active replicas through this network space.
Choosing Async Only assigns all the IP addresses to the async replicas. You will not be able to define Sync or Active-Active replicas on links using this network space.
- In the MTU field, enter the size, in bytes, of the Ethernet transfer over the wire. It is recommended to set this to 9000.
- For each node, select its Ethernet interface from the drop-down menu.
If the desired interface was not created earlier from the Network Interfaces tab and is not in the drop-down menu, you can add a new one to the menu. See the Creating Ethernet Interfaces section in this document.
- (Optional) To group the interfaces into a Virtual LAN, click the Create VLAN button.
- Click Next to proceed to the IP Configuration step.
- Enter the networking data:
  - Enter the first IP address in the network range.
  - For each individual IP address or IP range (in the format 172.16.34.5-12), enter the IP address or range within the subnet range, and then click Add to verify the validity of the IP address.
  - Enter 10 IP addresses for a service that will run sync and async replications, or 7 IP addresses for a service that will run async-only replications.
- Click Finish.
The network space is displayed in the window.
Creating Ethernet Interfaces
You can create Ethernet interfaces from the Network Interfaces tab, or from the Create Network Space window.
- In the Create Network Space window, click the Create new option from a node's Ethernet interface field's drop-down menu, or click the Create Interfaces button at the bottom of the window.
The Create Ethernet Interfaces tab opens.
- You can rename the default interface.
- Select a single port for replication from the available Ethernet ports.
Ports that are already taken by other interfaces are greyed-out.
- Click Create.
- The new interface is now selected in the Create Network Space window.
Replication performance and network interfaces
For asynchronous replication, each replicated dataset (either volume or filesystem) is replicated from a single replication IP address on the source system to a single replication IP address on the target system. As a direct outcome, each dataset is replicated from a single node on the source system to a single node on the target system.
Because of the underlying LACP interface behavior, a single replicated dataset is limited to the bandwidth of a single physical Ethernet port. If multiple datasets replicate in parallel, they might utilize the aggregated bandwidth of the Ethernet interfaces in the replication network space, but this is not guaranteed.
For synchronous and Active-Active replication, each replicated dataset (volumes only) is replicated from the three nodes on the source system to the three nodes on the target system.
Because of the underlying LACP interface behavior, all replicated datasets together are limited to three times the bandwidth of a single physical Ethernet port.
Network performance requirements for asynchronous replication
Latency and reliability
For asynchronous replication, latency has little effect, as InfiniBox leverages TCP/IP in a highly optimized fashion.
However, reliability of the connection is of utmost importance: intermittent failures such as packet drops and TCP re-transmissions will severely degrade the actual throughput and prevent Asynchronous replication from achieving the desired RPO. When you design and verify your network connection between the systems, make sure to test for these conditions over time: even periodic re-transmission rates of 1% could degrade performance and cause RPO lagging.
Data written to the storage needs to be sent to the DR site. In Asynchronous replications, this happens periodically (at the interval set by the storage admin) and will usually take a short time to complete. This creates a “bursty” behavior.
The bandwidth required for async replication depends on the I/O pattern and the interval at which async replication is triggered for a particular dataset.
- Not every I/O that hosts send will eventually be replicated: if a host writes data to an LBA and shortly afterwards overwrites the LBA with new data or unmaps it, the original write operation will not be sent (unless a replication job is triggered in between).
- InfiniBox async replication also tries to identify specific portions of data that have changed, even if a host overwrites an existing LBA with similar content.
Because of this, it is not easy to predict how much bandwidth will be required for async replication. Customers may begin by estimating this based on the throughput of their hosts, and test various interval settings to understand the application behavior.
As application behavior changes over time and additional replicas may be added, it is important to measure the bandwidth utilization periodically.
Before starting to use a replication link it is important to create a baseline of the available bandwidth.
Testing your network for its available bandwidth should be done over a period of time and several times during the day, as network traffic may vary. It is also critically important to coordinate such a test with your network team, as the test may compete for bandwidth with other applications.
For more information about measuring the bandwidth between the sites see Measuring the bandwidth between the sites below.
Network performance requirements for synchronous replication and active-active replication
Latency and distance
For Synchronous replication and Active-Active replication, latency is important as each I/O has to traverse between the two storage arrays before it is acknowledged back to the host.
InfiniBox Synchronous replication and Active-Active replication require a round-trip latency of up to 5ms and a maximum distance of 100 kilometers.
In addition, the reliability of the connection is of utmost importance: intermittent failures such as packet drops and TCP re-transmissions will severely degrade the actual latency and throughput and will cause high response times for your applications I/O. When you design and verify your network connection between the systems, make sure to test for these conditions over time: even periodic re-transmission rates of 1% could degrade performance.
Measuring the latency between the sites
It is highly recommended to test the latency between the sites ahead of the installation, checking for both the average latency and its jitter - the fluctuations in latency. The simplest way of testing this is using a ping command running 1,000 samples in each measurement, and repeating the test several times per day.
This can help uncover many potential issues such as:
- High level of jitter - the response time of each sample varies dramatically
- High latency - many samples take longer than 5ms to complete
- Different latency in different hours of the day - this usually indicates a bottleneck somewhere in the network (for example - the WAN link) at some hours of the day, which will affect the ability of the system to send data to the remote system and will result in high latency for synchronous replications
- Packet loss - packets get lost due to low link quality or congestion
An ideal environment will show consistent sub-5ms response time without losing any of the 1000 samples in the process. Some variance in the response times is expected, as long as the 5ms rule is kept.
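As an illustrative sketch only (this helper is hypothetical, not an Infinidat tool), the collected ping samples can be summarized programmatically to check the average latency, the jitter, and the 5ms rule:

```python
import statistics

def analyze_ping_samples(rtts_ms):
    """Summarize ping round-trip times (in ms): average, jitter, and samples over 5ms."""
    avg = statistics.mean(rtts_ms)
    jitter = statistics.stdev(rtts_ms)          # fluctuation around the average
    over_limit = sum(1 for r in rtts_ms if r > 5.0)
    return avg, jitter, over_limit

# Hypothetical samples collected with a command like `ping -c 1000 <remote-ip>`:
samples = [1.2, 1.3, 1.1, 1.4, 1.2, 1.3]
avg, jitter, over = analyze_ping_samples(samples)
print(f"avg={avg:.2f}ms jitter={jitter:.2f}ms samples_over_5ms={over}")
```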
Sizing the replication link bandwidth for Synchronous replication
For Synchronous replication, the required replication link bandwidth derives from the WRITE/XCOPY throughput on the volumes planned to be replicated.
It is recommended to size the replication link bandwidth to at least 130%-150% of the observed aggregate WRITE/XCOPY throughput to the volumes that are planned to be replicated.
This additional headroom enables smooth handling of the following scenarios that affect the required replication bandwidth:
- Bursts of I/O that may occur normally as part of your workload
- Gradual increase in I/O in the future
- Synchronous replications that failed and are currently in re-synchronizing state, i.e. running the replication asynchronously in order to close the gap
- Minimal overhead that the replication protocol incurs
Note: Undersizing the replication link bandwidth may result in increased latency on the replicated volumes, and potentially in an inability to sustain Sync replication, resulting in fallback to Async.
Link sizing example
Assume hosts are writing to a volume at the rate of 100MB/s, and that this volume needs to be replicated synchronously to a remote system.
The minimal bandwidth for the link dedicated to this volume's replication should be 130-150MB/s (130%-150% of the observed 100MB/s, per the sizing guideline above), in order to allow for small bursts of I/O and for some future growth. It is also important that the latency for sending data at a rate of 100MB/s over the WAN to the remote system is 5ms or less.
Use InfiniMetrics to observe the I/O on the relevant volume.
Select the system, and go to the VOLUMES tab, which shows all the sampled volume activity. You can filter the list using the search box to locate the relevant volume whose activity you want to examine.
Click on the volume name to drill-in and view its activity over time.
Select the time range in the bar: it is recommended to investigate the activity over an entire day or week.
Look at the THROUGHPUT graph, and the legend that appears on the right side. Remove the green line for READ operations by clicking the toggle next to Read, and add the blue line for XCOPY by clicking the toggle next to X-Copy.
The example above shows consistent activity of 300MB/s of WRITE operations and 0MB/s of XCOPY, totaling 300MB/s; adding 50% overhead brings this to 450MB/s.
As a result, the replication link between the InfiniBox systems should be capable of a sustained throughput of 450MB/s.
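The sizing rule can be sketched as a small calculation (the function name is hypothetical; inputs are the observed WRITE and XCOPY throughput in MB/s):

```python
def required_link_bandwidth(write_mbps: float, xcopy_mbps: float, headroom: float = 1.5) -> float:
    """Size the replication link at 130%-150% of observed WRITE+XCOPY throughput (MB/s)."""
    return (write_mbps + xcopy_mbps) * headroom

# The example above: 300MB/s WRITE, 0MB/s XCOPY, 50% headroom.
print(required_link_bandwidth(300, 0))  # 450.0
```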
Contention between synchronous and asynchronous replications or other network consumers
Synchronous and Asynchronous replications may coexist and compete over shared bandwidth. This is clearly the case when both Synchronous and Asynchronous replications are defined in the system. It can also happen when no async replications are defined, if some replicas are synchronized while others have fallen out of sync and are resynchronizing asynchronously.
In such cases, bursts of asynchronous replication may incur increased latency for Synchronous replications. To prevent this, it is recommended to set a low replication interval for the asynchronous replications (yielding smaller, more frequent transfers), and to set a rate limit on the network space used for Asynchronous replication.
Note: The rate limit of the replication network space only affects the Asynchronous replication, the resynchronization of Synchronous replication failover, and the Initialization phase of the replication.
To calculate the rate limit, you will need to know the total verified available bandwidth for replication between the two sites. Make sure to verify this bandwidth beforehand (see Measuring the bandwidth between the sites).
You will also need to know the total bandwidth required for sync replication of volumes and consistency groups, as measured from the existing I/O to the volumes (see Sizing the replication link bandwidth for Synchronous replication).
The formula for the rate limit is simple (divide by 3 because the rate limit is per node):
1.1 * 8 * ( bandwidth between the systems - sync replication bandwidth ) / 3
Note: The formula above yields the rate limit in bits/second: the rate limit on the network space is measured in bits/second (bps), whereas the bandwidth measurements above are in bytes/second (Bps), hence the multiplication by 8. Multiplying by 1.1 allows for some imbalance between the nodes' traffic.
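As an illustrative sketch (the function name and the sample numbers are hypothetical), the formula can be computed as follows:

```python
def per_node_rate_limit_mbps(link_mbytes_per_s: float, sync_mbytes_per_s: float) -> float:
    """Per-node rate limit in megabits/s: 1.1 * 8 * (link bandwidth - sync bandwidth) / 3.
    Inputs are in MB/s (bytes-based measurements); output is in Mb/s (bits-based limit)."""
    return 1.1 * 8 * (link_mbytes_per_s - sync_mbytes_per_s) / 3

# Hypothetical example: a 1000 MB/s link, of which 400 MB/s is needed for sync replication.
print(round(per_node_rate_limit_mbps(1000, 400), 1))  # 1760.0 Mb/s per node
```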
Set the rate limit on the network space where replication is running, using the result of this formula, by modifying the network space.
If you share the replication network between InfiniBox and other devices, it is important to make sure the network provides consistent bandwidth and performance to every device. Note that this requirement is not "on average" or "over time": it must hold at ANY time. Typically, only network QoS (quality of service) can provide these capabilities.
Controlling the Replication Bandwidth
The mechanism for controlling the bandwidth utilized for replication relies on limiting the bandwidth rate for network spaces in InfiniBox.
Rate limit for Network Spaces
Every network space, replication included, allows you to set a limit on the rate at which each InfiniBox node sends data. This rate limit feature is a key element in controlling the bandwidth used for async replication in many scenarios, and in avoiding contention with other traffic on the same network.
One example is avoiding increased latencies in sync replication caused by async replication saturating the network line. Another example is avoiding congestion on the network when there is other (non-InfiniBox) traffic taking place.
The rate limit can be set when you create a network space, or modified later on.
- The rate limit is specified in mega-bits/seconds (not mega-bytes/seconds).
- The rate limit is specified per node. Set the limit to slightly more than 1/3rd of the actual bandwidth intended for the InfiniBox system.
The rate limit of the replication network space only affects the Asynchronous replication, the resynchronization of Synchronous replication failover, and the Initialization phase of the replication.
The rate limit ensures that the InfiniBox system does not flood the network, by slowing the transmission rate.
The rate limit applies only at the following points:
- Rate limit only operates on the egress.
- For NAS, the rate limit only affects reads.
What rate limit does not do:
- Rate limit does not limit the amount of write operations.
- Rate limit does not limit ingress.
- Rate limit is not a replacement for QoS.
Measuring the bandwidth between the sites
The best tool for testing the available bandwidth is iPerf, a free traffic generating tool supporting multiple operating systems.
To use iPerf you will need two Linux hosts, one in the production site and another in the DR site. It is highly recommended to use a physical host, or at least a dedicated network interface, for this test, to be sure iPerf does not compete with anything else for the bandwidth of that interface.
Choose a host in the DR site to be the server, which has connectivity to the Network Space of the replication link.
Run the following command on that host:
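For example, assuming iPerf3 (the exact flags differ slightly for iPerf2):

```shell
# On the DR-site host: start iPerf3 in server mode (listens on TCP port 5201 by default).
iperf3 -s
```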
Choose a host in the production site to be the client, which has connectivity to the Network Space of the replication link.
Run the following command where BANDWIDTH is replaced with the theoretical total allocated bandwidth for all replications between the two systems:
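For example, assuming iPerf3 (the server address 10.2.0.10 and the 500M bandwidth value are hypothetical placeholders):

```shell
# On the production-site host: run the client against the DR-site server for 60 seconds,
# capping the test at the theoretical allocated replication bandwidth (here, 500 Mbit/s).
iperf3 -c 10.2.0.10 -b 500M -t 60
```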
Use the -b parameter to make sure the iPerf test does not overuse the link capacity, which might adversely affect other applications that use the same link.
Make sure the connectivity between the two hosts is using the same switches and routes as the replication link.
Creating the Replication Service
Creating a link
- In the InfiniBox Management Console, click the Replication icon on the left toolbar.
- Select the Links tab, and click Create.
- Enter the IP address of the relevant replication network space of the remote system, whose type is Replication Control.
To find the IP addresses, click the Settings icon on the left toolbar, select the Network Spaces tab, and select a replication service network space to drill down into.
- Select the local network space that runs the Replication service.
- If you plan to use this replication link for Active-Active replicas, add the witness IP address (optional).
- If the remote system credentials are not identical to the credentials of the local system, you will be asked to enter them.
- Click Create.
The two systems are now connected. The Links tab displays the link status.
Active-Active replication witness
The witness is an arbitrator entity residing in a third site (separate from the two InfiniBox systems involved in Active-Active replication) that acts as a quorum in case of failure. It is lightweight, stateless software deployed as a VM.
For more information, please see Active-Active replication witness.
Infinidat strongly recommends that you use a witness system when you deploy Active-Active replication.
Should you want to use Active-Active replication without a witness, enter the witness address as 0.0.0.0. Be advised that with this method, if the preferred system goes offline, the replicated volumes will also go offline.
Adding a witness to a link
The witness can be defined when creating a new replication link, or it can be added to an existing link so that the link can be used for Active-Active replication.
Update a witness
If the witness IP address has changed (for example, a new witness was deployed), update the link to use the new witness IP address.
Creating a replica
After following the steps above, you can now create a replica on the source system.
Select the Async, Sync or Active-Active tab, and click Create Replica.
Fill in the following details:
|Replication type|Select the type of the replication: Async, Sync or Active-Active|
|Remote system|The system that stores the target of the replica|
|Dataset type|The type of dataset to replicate: filesystem, volume or consistency group|
|Source|The name of the dataset that holds the initial data|
|Remote pool|Select the pool that will store the remote dataset|
|Interval|The interval between two consecutive sync jobs (relevant to Async replicas only)|
|RPO|The maximum allowed lag between the source and target systems (relevant to Async replicas only)|
Click Create. The replica is created.
The replica automatically initializes and its replication progress is visible on the replicas screen.