
Abstract

Modern storage systems vary in both front-end and back-end topology. Many of the high-end systems in the market today still carry legacy components and architectural choices made decades ago. They require complex configurations on the host side while failing to fully leverage current OS / Fabric capabilities that can maximize performance while minimizing management costs.

The InfiniBox storage system was designed from the ground up for the age of Analytics, Big Data and Cognitive Computing. This gives applications hosted on InfiniBox higher, more predictable performance, as well as much simpler, easier-to-manage host-side configurations.

Applications migrated to InfiniBox benefit from a much lower TCO. Storage and application administrators are no longer burdened by mundane and error-prone configuration choices required by legacy arrays.

Introduction

The InfiniBox storage system is a modern, fully symmetric, all-active controller (node) system with an advanced multi-layer caching architecture and a double-parity, wide-stripe data distribution model. The system requires very little optimization: maximum performance is achieved with no tuning, provided the best practices and deployment guides are followed, and its self-optimizing and self-healing attributes also mitigate typical user errors. InfiniBox uses standard, off-the-shelf hardware components (CPU, memory, HDDs, SSDs) to extract the maximum performance from the 480 Near-Line SAS drives used in the InfiniBox architecture.

One of the key elements developed in the core system code is the ability to analyze real application profiles and to surgically define cache pre-fetch and de-stage algorithms. The system design specifically targets real life profiles such as OLTP/DBO to provide optimum performance.

Due to the unique, self-optimizing architecture of our system, this document focuses primarily on host and fabric connectivity settings and best practices derived directly from field experience and from Infinidat’s lab environment. InfiniBox Host PowerTools™ can automatically set up and create the most optimized host environment (multipath configuration, kernel drivers, etc.), leaving only the SAN zoning parameters for the user to configure.

It is recommended that performance tests be constructed in a way that most closely resembles the I/O load for each individual application (based upon workload and requirements). Realistic applications and workloads yield far better results than synthetic testing, and should be employed where possible. It is understood that in certain cases, synthetic testing is the only mechanism at your disposal when real-life environments are difficult to replicate.

Host PowerTools™

Infinidat Host PowerTools employs several methods for analyzing the host configuration and aligning it with the best-practice configuration for both performance and reliability. These configurations include HBA queue depth, multipath and path failover settings, and OS patch/KB fixes. In general, allowing Host PowerTools to apply the built-in configuration achieves the best performance from InfiniBox, but you have the option to change these settings.

Queue Depth

Host HBA queue depth controls how many I/O operations are permitted to be sent to InfiniBox at the same time. Queue depth is related to performance by Little’s Law, which states that Rate (R) = Queue (Q) / Response Time (T).

Defining a relatively large queue depth (Q) allows InfiniBox to increase parallelism and engage more system resources simultaneously. This increases R for a given T: more I/O operations are completed and returned from InfiniBox to the host concurrently, resulting in a higher I/O rate.
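
As a hypothetical worked example of Little’s Law: a host that keeps Q = 64 I/O operations outstanding against a volume with an average response time of T = 0.5 ms sustains R = 64 / 0.0005 s = 128,000 IOPS, while the same host limited to Q = 16 at the same response time can sustain only 32,000 IOPS.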

Given today’s multi-socket, multi-core, multi-threaded CPUs and applications, it generally makes sense to use higher queue depths. When increasing or decreasing the queue depth on a given host, carefully consider the number of available queue entries on the InfiniBox (combined target queue depth) versus the performance demands of the attached hosts. As of this writing, the queue depth on InfiniBox FC target ports is 2048 per port.

To provide better I/O response time when connecting numerous hosts with a combined queue depth greater than 2048 per InfiniBox FC port, lower the queue depth on each host port. It is recommended to set the queue depth to 128. Depending on the environment and peak I/O times across the enterprise, it is fine to oversubscribe the InfiniBox ports in order to simplify and standardize the roll-out of multiple systems.

Host PowerTools can change the queue depth on Linux, but not on any other OS.
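
For reference, a hedged sketch of inspecting and setting the queue depth manually on a Linux host is shown below. The device name and driver module parameter are illustrative examples only; the authoritative parameter names come from your HBA vendor’s documentation.

    # Example device name (sdb) and HBA module (qla2xxx) are illustrative only.
    # Inspect the current per-LUN queue depth:
    cat /sys/block/sdb/device/queue_depth

    # Set the recommended value of 128 at runtime (not persistent across reboots):
    echo 128 | sudo tee /sys/block/sdb/device/queue_depth

    # Persist via the HBA driver module options (QLogic example):
    echo "options qla2xxx ql2xmaxqdepth=128" | sudo tee /etc/modprobe.d/qla2xxx-qdepth.conf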

Multipath

InfiniBox employs a true active-active-active I/O handling mechanism that ensures proper load balancing of all resources under all conditions, including failure scenarios. It is highly recommended to engage all three (3) nodes for all hosts. This ensures best performance, increases overall host reliability, and provides consistent performance during failure scenarios.

Multipath Policy

We recommend that you choose either Round Robin or Least Queue (or equivalent algorithms) as the multipath policy.

These multipath policies use all the paths connected to the storage, with minor differences in how the path is selected for each I/O operation. Example commands for applying these policies follow the list below.

  • For Linux and UNIX hosts, the recommended multipath policy is Round Robin.
  • For Windows hosts, the recommended multipath policy is Least Queue Depth (LQD).
  • For vSphere hosts running version 6.5 or below, the recommended multipath policy is Round Robin with IOPS policy (iops policy set to 1).
  • For vSphere hosts running version 6.7 or above, the recommended multipath policy is Round Robin with latency policy (policy set to latency).
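
As a reference sketch for the vSphere recommendations above, the policy can be applied per device from the ESXi shell; the device identifier below is a placeholder for the naa ID of an InfiniBox LUN, and claim rules or Host PowerTools normally make manual per-device changes unnecessary.

    # Ensure the device uses the Round Robin PSP (placeholder device ID):
    esxcli storage nmp device set --device=naa.xxxxxxxxxxxxxxxx --psp=VMW_PSP_RR

    # vSphere 6.5 and below: switch paths after every I/O (IOPS policy of 1)
    esxcli storage nmp psp roundrobin deviceconfig set \
        --device=naa.xxxxxxxxxxxxxxxx --type=iops --iops=1

    # vSphere 6.7 and above: use the latency-based policy
    esxcli storage nmp psp roundrobin deviceconfig set \
        --device=naa.xxxxxxxxxxxxxxxx --type=latency

On Windows, the equivalent LQD policy can be set globally with the MPIO PowerShell module’s Set-MSDSMGlobalDefaultLoadBalancePolicy cmdlet.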

Multipathing (Path Changing Frequency)

InfiniBox’s data distribution algorithm ensures utilization of all the drives all the time. To achieve the best utilization of all nodes, multipath/OS drivers spread I/O across all the paths on a regular schedule. To yield the best overall results, set the frequency for switching between paths to a value smaller than the queue depth; it is recommended to set this frequency to 1 so that the path is switched for each I/O. For more information regarding items that are configured and checked by Host PowerTools, refer to List of items being checked by Host PowerTools.
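
On Linux hosts running device-mapper-multipath, Host PowerTools applies the appropriate settings automatically. For reference only, a minimal sketch of an equivalent /etc/multipath.conf stanza is shown below; the vendor and product strings are assumptions and should be verified against the output of multipath -ll on your host.

    # Sketch only -- Host PowerTools applies equivalent settings automatically.
    # Add to /etc/multipath.conf; vendor/product strings are assumptions.
    devices {
        device {
            vendor                "NFINIDAT"
            product               "InfiniBox"
            path_grouping_policy  multibus
            # Round Robin path selector
            path_selector         "round-robin 0"
            # Switch paths after every I/O request (frequency of 1)
            rr_min_io_rq          1
            failback              immediate
        }
    }
    # Then reload the daemon to pick up the change:
    # sudo multipathd reconfigure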

Cabling & Zoning

To achieve the best performance from InfiniBox and the highest availability for hosts, zoning each host to all three (3) active-active-active nodes on the system is strongly recommended. By doing this, we ensure balanced utilization of all the resources on the system in an optimal way. However, in some environments there may be too many hosts or insufficient FC switch ports available, so tradeoffs may have to be made. Using the InfiniBand interconnect between the nodes, InfiniBox will still make a best effort to utilize and balance all the nodes, even if a given host sends I/O to just one or two of them.

Generally speaking, the following guidelines should be observed when zoning hosts to an InfiniBox.

  • Each physical host should be zoned to all 3 storage nodes via at least 2 independent HBA ports (initiators) on 2 independent SAN fabrics.
  • A maximum 1-to-3 fan-out from host (initiator) to storage (target) port should normally be used. Thus for a host with two HBA ports, there will be 2 x 3 = 6 paths per LUN. For a host with 4 HBA ports, there will be 4 x 3 = 12 paths per LUN.
  • Zones should be created as single-initiator WWPN zones. Thus each zone will normally have a maximum of 4 WWPNs assigned: 1 WWPN from the host (initiator) and 3 from the storage (target).
  • As discussed in the previous section, hosts should only use supported and properly configured multipathing software to ensure proper load balancing across the storage nodes and maximum performance.
  • The table below shows examples of common host port configurations with the maximum theoretical BW achievable for either read or write. Performance for any individual host will depend heavily on the actual host I/O profile.

Bandwidth/Fan-out Table

| Host Port Speed | # Host Ports | Recommended Fan-out | Paths Per LUN | Peak Theoretical Bandwidth |
|-----------------|--------------|---------------------|---------------|----------------------------|
| 8Gb             | 2            | 1:3                 | 6             | 1.6 GB/s                   |
| 8Gb             | 4            | 1:3                 | 12            | 3.2 GB/s                   |
| 8Gb             | 6            | 1:1 or 1:2          | 6 or 12       | 4.8 GB/s                   |
| 8Gb             | 12           | 1:1                 | 12            | 9.6 GB/s                   |
| 16Gb            | 2            | 1:3                 | 6             | 3.2 GB/s                   |
| 16Gb            | 4            | 1:3                 | 12            | 6.4 GB/s                   |
| 16Gb            | 6            | 1:1 or 1:2          | 6 or 12       | 9.6 GB/s                   |
| 16Gb            | 12           | 1:1                 | 12            | 19.2 GB/s                  |
| 32Gb            | 2            | 1:3                 | 6             | 6.4 GB/s                   |
| 32Gb            | 4            | 1:3                 | 12            | 12.8 GB/s                  |
| 32Gb            | 6            | 1:1 or 1:2          | 6 or 12       | 19.2 GB/s                  |
| 32Gb            | 12           | 1:1                 | 12            | 38.4 GB/s                  |
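
As a rough sanity check on the table (assuming the commonly cited usable throughput of roughly 0.8 GB/s per 8Gb FC port, 1.6 GB/s per 16Gb port and 3.2 GB/s per 32Gb port, per direction), the peak figure is simply the number of host ports multiplied by the per-port throughput. For example, four 16Gb host ports top out at about 4 x 1.6 GB/s = 6.4 GB/s; the fan-out affects path count and resilience, not this host-side ceiling.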

To achieve the recommended zoning goals, the physical cabling between the SAN fabrics and InfiniBox needs to be performed in a manner that maximizes both performance and availability. This means cabling two separate SAN fabrics to each InfiniBox target HBA with as many connections as necessary to support the host infrastructure. Typical installations use either 12 of the InfiniBox FC ports or all 24 of them. Consider the diagram below as a basic guide for the physical cabling, where each green line represents 2-4 physical connections between the InfiniBox and the SAN fabrics.

As stated previously, when connecting InfiniBox to the Fibre Channel fabric, special care must be taken to connect ports from all three nodes and their HBAs to each switch. To avoid confusion, InfiniBox comes with a simple, color-coded patch panel that simplifies cabling. There are two types of patch panels, depending on the release.

The patch panel has 8 ports for each node on one row. The port numbering 1-4 and 5-8 represents the division between the two HBAs per node: HBA one (1) uses ports 1-4, and HBA two (2) uses ports 5-8.

For availability, simplicity and serviceability, a SAN fabric cabling strategy that makes use of both InfiniBox node HBAs is recommended. Thus, for a 24-port dual-SAN configuration, connect the InfiniBox to the switches as follows:

  • SAN A: Nodes 1-3, ports 1, 2, 5 & 6
  • SAN B: Nodes 1-3, ports 3, 4, 7 & 8

And for a 12-port dual-SAN configuration, connect the InfiniBox to the switches as follows:

  • SAN A: Nodes 1-3, ports 1 & 5
  • SAN B: Nodes 1-3, ports 2 & 6
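
As an illustration of the single-initiator zoning guideline above, the sketch below uses Brocade FOS-style commands to zone one host HBA port on SAN A to one port from each of the three nodes. All aliases and WWPNs are placeholders, and the syntax for other fabric vendors will differ.

    # All WWPNs below are placeholders -- substitute the real initiator and
    # InfiniBox target WWPNs for the ports cabled to this fabric (SAN A).
    alicreate "host01_hba0",  "10:00:00:00:00:00:00:01"
    alicreate "ibox_n1_p1",   "20:00:00:00:00:00:01:01"
    alicreate "ibox_n2_p1",   "20:00:00:00:00:00:02:01"
    alicreate "ibox_n3_p1",   "20:00:00:00:00:00:03:01"

    # Single-initiator zone: 1 host WWPN + 3 InfiniBox target WWPNs
    zonecreate "z_host01_hba0_ibox", "host01_hba0; ibox_n1_p1; ibox_n2_p1; ibox_n3_p1"

    # Add the zone to the active configuration and enable it
    cfgadd "SAN_A_cfg", "z_host01_hba0_ibox"
    cfgsave
    cfgenable "SAN_A_cfg"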

LUN and Snapshot Considerations

LUN sizes

LUN size on the InfiniBox can vary from 1GB to the entire capacity of the system (above one (1) petabyte). From the perspective of the system there is no difference between a 1GB LUN and a 1PB LUN. Performance will not change based on the LUN size, because the system distributes the data in 64KB sections across all the drives in the array under all conditions. Based on this fact, the best practice is to provision LUN sizes that are ideal for the selected application and host OS. Generally, using fewer, larger LUNs yields better operational simplicity while maintaining the best performance the system can deliver.

Number of LUNs

As opposed to traditional systems, which might require many LUNs to engage multiple drives, the InfiniBox array spreads the data evenly across all the drives regardless of the number or size of the LUNs. Be aware that, depending on the OS, LUNs have queuing and data structures that can limit overall performance if over-consolidation is attempted; more LUNs will help mitigate this issue if it applies to your environment. Also, certain applications need multiple LUNs to achieve separation for reliable backup and DR; for example, a database may require separate LUNs for database logs, as well as separate table spaces, indexes, temp space or binaries. If your application allocates threads or processes based on the number of LUNs provisioned, you might consider increasing the LUN count, allowing the application to parallelize I/O execution. However, if the application is sophisticated enough to define multiple threads independent of the number of LUNs, or the number of LUNs has no effect on application threads, there is no compelling reason to have multiple LUNs. The overall number of LUNs should generally be lower than the number one would provision on legacy storage systems.

Data awareness

The InfiniBox system is aware of the status of each of the 64KB chunks comprising each LUN. If a specific portion of a LUN has never been written, any read requests to those sections will be answered directly from cache (no drive access). In addition, if an application or benchmark writes zeroes (for example, dd if=/dev/zero), the InfiniBox system will recognize this and send zeroes from cache in response to read requests. This data awareness can create unrealistic test results which might not match a real application I/O profile. As such, when testing the system, make sure your test suite writes data other than zeroes, or better yet, use real application data.
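
For example, a simple way to seed a LUN with non-zero data before a read benchmark is shown below. The multipath device name is a placeholder, and the command is destructive, so use it only against a dedicated test volume.

    # WARNING: destructive -- overwrites data on the target device.
    # /dev/mapper/mpatha is a placeholder for the InfiniBox test LUN.
    dd if=/dev/urandom of=/dev/mapper/mpatha bs=1M count=10240 oflag=direct status=progress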

Snapshots

InfiniBox uses a unique, patented mechanism of creating time-based entries to mark a snapshot start point on the system. This technology enables the instantaneous creation of snapshots without any impact on capacity utilization, drives, CPU, memory, or performance in general. Modern CPUs can store a timestamp-based pointer in a non-blocking, atomic task with no overhead, and InfiniBox takes advantage of this to create a timestamp-based entry for each snapshot.

Once a snapshot has been created, InfiniBox applies a Fast Overwrite, an in-cache process that minimizes access to disk. For example, when overwriting a section in cache that is dirty (not yet written out to disk), the system will not discard the section if there are snapshots associated with that data; a quick, in-memory manipulation of the snapshot tree ensures snapshot and volume data are always consistent. During the de-stage process, InfiniBox employs a Redirect-on-Write mechanism (rather than Copy-on-Write, which involves extra I/O) for any snapshot changes that have taken place. This technique, in concert with InfiniRAID’s ability to span data across all the drives in a virtual schema, also contributes to a zero-impact operation.

In summary, the end result of this innovative technology is shown in the chart below: response times remain constant throughout the test as a large number of snapshots are created and later removed from association with the volume, a process that is usually intrusive to operations on existing platforms. The interval and usage of snapshots depends on the requirements of the individual application and customer; usually an RPO (recovery point objective) is in place and will ultimately guide you in this area. As a rule of thumb, users can apply at least one snapshot per LUN per day as a basic implementation strategy.


