Modern storage systems vary in topology in both their front end and their back end, often as a direct result of the fact that many high-end systems in the market today still carry legacy components and architectural choices made decades ago. As a result, they require much more complex configurations on the host side while failing to fully leverage current OS and fabric capabilities that could maximize performance while minimizing management costs.
Designed from the ground up for the age of analytics, big data and cognitive computing, the InfiniBox system benefits tremendously from avoiding these legacy architectural decisions. This gives applications hosted on InfiniBox higher, more predictable performance as well as much simpler, easier-to-manage host-side configurations.
The net result is a much lower TCO for applications migrated to InfiniBox, and freedom from the mundane, error-prone configuration choices that legacy arrays require of storage and application administrators.
The InfiniBox storage system is a modern, fully symmetric, all-active controller (node) system with an advanced multi-layer caching architecture that encompasses a double-parity (wide-stripe) data distribution model. The system requires very little optimization to achieve maximum performance, with unparalleled ease of use, and its self-optimization and self-healing attributes also mitigate typical user errors: maximum performance is achieved with no tuning or optimization, provided the best practices and deployment guides are followed. InfiniBox uses industry-standard 'off the shelf' hardware components (CPU/memory/HDDs/SSDs) capable of extracting the maximum performance from the 480 near-line SAS drives used in the InfiniBox architecture. Most of the optimization requirements addressed in this document relate to fabric and host settings and best practices derived directly from field experience; they have also been extensively tested in INFINIDAT's lab environment.
One of the key elements developed in the core system code is the ability to analyze real application profiles and surgically define cache pre-fetch and de-stage algorithms. The system's design specifically targets real-life profiles such as OLTP/DBO and provides optimum performance under such conditions.
Due to the unique, self-optimizing architecture of the system, this document will primarily focus on host and fabric connectivity. InfiniBox Host PowerTools™ can automatically set up the most optimized host environment (multipath configuration, kernel drivers, etc.), leaving only the SAN zoning parameters for the user to configure.
It is also worth briefly mentioning that any performance test should be constructed in a way that most closely resembles the I/O load of each individual application (based upon its workload and requirements). Realistic applications and workloads yield far better results than synthetic testing, and therefore should be employed where available and feasible. It is understood that in certain cases synthetic testing is the only mechanism at your disposal, since real live environments can be hard to replicate. Simply stated, the goal is to mimic a real production environment as closely as possible, given the limitations of the performance tools selected.
INFINIDAT Host PowerTools deploys several methods to analyze the host configuration and apply the best-practice configuration for both performance and reliability, covering settings such as HBA queue depth, multipathing and path failover, and OS patch/KB fixes. In general, allowing Host PowerTools to apply this canned configuration will extract the best performance from InfiniBox; you may consider changing only the specific settings discussed below.
Host HBA queue depth controls how many I/O operations are permitted to be sent to InfiniBox at the same time. Queue depth is related to performance by Little’s Law, which states that Rate (R) = Queue (Q) / Response Time (T).
As such, defining a relatively large queue depth (Q) allows InfiniBox to increase parallelism and engage more system resources concurrently, thereby increasing R for a given T. The end result is that more I/O operations are in flight between the host and InfiniBox at any given moment, and a higher I/O rate can be achieved.
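As a rough illustration of this relationship, the sketch below applies Little's Law with a hypothetical response time (the 0.5 ms figure is an example, not an InfiniBox measurement; the 128 queue depth matches the Host PowerTools default):

```python
def iops(queue_depth: int, response_time_ms: float) -> float:
    """Little's Law: Rate (R) = Queue (Q) / Response Time (T)."""
    return queue_depth / response_time_ms * 1000.0

# Hypothetical 0.5 ms average response time on one initiator port:
print(iops(32, 0.5))    # 64000.0  -> queue depth 32 caps the port at 64K IOPS
print(iops(128, 0.5))   # 256000.0 -> queue depth 128 quadruples that ceiling
```

The same relationship also explains why a deeper queue cannot help once response time starts climbing: R only improves while T holds steady.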
Given today's multi-socket, multi-core CPUs and modern applications, it generally makes sense to use higher queue depths. The queue depth configured on supported OSes by the current Host PowerTools release is 128 per host initiator port. The decision to increase or decrease the queue depth on a given host should be carefully weighed against the number of available queue entries on the InfiniBox (combined target queue depth) versus the performance demands of the attached hosts. As of this writing, the buffer queue on InfiniBox FC target ports is set to 2048 per port.
Overall, InfiniBox has a combined buffer queue of 2048 x 24 ports = 49,152 entries. Given the 128 default queue depth on each host port, the maximum number of supported host ports without any oversubscription is 49,152 / 128 = 384 initiators. This implies support for up to 192 dual-homed hosts. Note that this assumes even distribution across all ports with the round-robin default multipath policy as configured by the INFINIDAT Host PowerTools. If there is a requirement to connect more hosts, lowering the queue depth to 64 on each host port would allow up to 768 initiators per InfiniBox. Depending on the environment and peak I/O times across the enterprise, it may be perfectly fine to oversubscribe the InfiniBox ports in order to simplify and standardize the roll-out of multiple systems.
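The queue-budget arithmetic above can be written out as a quick sketch (all figures come straight from this section):

```python
# InfiniBox target-side queue budget, per the figures above.
TARGET_PORTS = 24
QUEUE_PER_TARGET_PORT = 2048
COMBINED_TARGET_QUEUE = TARGET_PORTS * QUEUE_PER_TARGET_PORT  # 49,152

def max_initiators(host_queue_depth: int) -> int:
    """Initiator ports supportable with no oversubscription,
    assuming even distribution across all target ports."""
    return COMBINED_TARGET_QUEUE // host_queue_depth

print(max_initiators(128))        # 384 initiators
print(max_initiators(128) // 2)   # 192 dual-homed hosts
print(max_initiators(64))         # 768 initiators at the reduced queue depth
```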
InfiniBox employs a true active-active-active I/O handling mechanism, which ensures proper load balancing of all resources under all conditions (including failure scenarios). It is therefore highly recommended to engage all three (3) nodes for all hosts whenever possible. Not only will this ensure the best performance, it will also provide consistent performance during failure scenarios and increase overall host reliability.
When choosing the multipath policy we recommend either Round-Robin or Least Queue (or equivalent algorithms). Round-robin will use all the paths connected to the storage without querying any information such as current queue size, recent IO activity or type of resources on the host prior to making the path decision. In a few lab test cases (demonstrated below), Least Queue performed slightly better than Round-Robin. Since most of the multipath drivers support round-robin, Host PowerTools will configure Round-Robin as the default multipath configuration during install.
Multipathing (Path Changing Frequency)
InfiniBox's data distribution algorithm ensures utilization of all the drives all the time. All that remains is for the multipath / OS drivers to spread I/O across all the paths on a regular schedule in order to achieve the best utilization of all the nodes. Generally, setting the frequency for switching between paths to a value smaller than the queue depth will yield the best results overall. As an example, the Linux rr_min_io / rr_min_io_rq parameters are currently set to 1, while the default queue depth is set to 128.
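For reference, on Linux these settings would be expressed in /etc/multipath.conf roughly as in the fragment below. This is an illustrative sketch only: the vendor/product strings and the exact attribute set are assumptions, and Host PowerTools applies the validated configuration automatically, so hand-editing should rarely be needed.

```
devices {
    device {
        vendor               "NFINIDAT"      # assumed SCSI vendor string
        product              "InfiniBox"
        path_grouping_policy multibus        # one path group across all paths
        path_selector        "round-robin 0"
        # Switch paths on every I/O; keep this well below the
        # default queue depth of 128, as described above.
        rr_min_io            1
        rr_min_io_rq         1
    }
}
```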
Cabling & Zoning
To achieve the best performance from InfiniBox and the highest availability for hosts, zoning each host to all three (3) active-active-active nodes on the system is strongly recommended. Doing so ensures balanced utilization of all the resources on the system in an optimal way. However, in some environments there may be too many hosts or insufficient FC switch ports available, so tradeoffs may have to be made. InfiniBox will still make a best effort to utilize and balance all the nodes, even if a given host sends I/O to just one or two nodes, by using the InfiniBand interconnect between the nodes.
Generally speaking, the following guidelines should be observed when zoning hosts to an InfiniBox.
- Each physical host should be zoned to all 3 storage nodes via at least 2 independent HBA ports (initiators) on 2 independent SAN fabrics.
- A maximum 1-to-3 fan-out from host (initiator) to storage (target) port should normally be used. Thus for a host with two HBA ports, there will be 2 x 3 = 6 paths per LUN. For a host with 4 HBA ports, there will be 4 x 3 = 12 paths per LUN.
- Zones should be created as single-initiator WWPN zones. Thus each zone will normally have a maximum of 4 WWPNs assigned: 1 WWPN from the host (initiator) and 3 from the storage (target).
- As discussed in the previous section, hosts should only use supported and properly configured multipathing software to ensure proper load balancing across the storage nodes and maximum performance.
- The table below shows examples of common host port configurations with the maximum theoretical BW achievable for either read or write. Performance for any individual host will depend heavily on the actual host I/O profile.
[Table: common host port configurations. Columns: Host Port Speed, # Host Ports, Paths Per LUN, Peak Theoretical Bandwidth. Only fragments of the rows survive in this copy: the Paths Per LUN cells show 6 or 12, matching 2- or 4-port hosts at the 1-to-3 fan-out; the port speed and bandwidth values did not survive extraction.]
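The zoning guidelines above reduce to simple arithmetic; the sketch below just restates the 1-to-3 fan-out and single-initiator zoning rules as code:

```python
TARGETS_PER_INITIATOR = 3  # 1-to-3 fan-out: each initiator reaches all 3 nodes

def paths_per_lun(host_ports: int) -> int:
    """Each host HBA port contributes one path per zoned target port."""
    return host_ports * TARGETS_PER_INITIATOR

def wwpns_per_zone() -> int:
    """Single-initiator zone: 1 host WWPN plus 3 storage WWPNs."""
    return 1 + TARGETS_PER_INITIATOR

print(paths_per_lun(2))  # 6  paths per LUN for a dual-port host
print(paths_per_lun(4))  # 12 paths per LUN for a quad-port host
print(wwpns_per_zone())  # 4  WWPNs per zone, maximum
```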
To achieve the recommended zoning goals, the physical cabling between the SAN fabrics and InfiniBox needs to be performed in a manner that maximizes both performance and availability. This means cabling two separate SAN fabrics to each InfiniBox target HBA with as many connections as necessary to support the host infrastructure. Typical installations use either 12 of the InfiniBox FC ports or all 24 of them. Consider the diagram below as a basic guide for the physical cabling, where each green line represents 2-4 physical connections between the InfiniBox and the SAN fabrics.
As stated previously, when connecting InfiniBox to the Fibre Channel fabric, special care must be taken to connect ports from all three nodes and their HBAs to each switch. To avoid confusion, InfiniBox comes with a simple, color-coded patch panel that makes the cabling simpler. There are two types of patch panels, depending on the release.
The patch panel has 8 ports for each node in one row. The numbering, 1-4 & 5-8, represents the division between the two HBAs per node: HBA one (1) – ports 1-4; HBA two (2) – ports 5-8.
For simplicity and serviceability, a SAN fabric cabling strategy that makes use of both InfiniBox Node HBAs is recommended. Thus, for a 24-port dual-SAN configuration, connect the InfiniBox to the switches as follows:
- SAN A: Nodes 1-3, ports 1, 2, 3 & 4
- SAN B: Nodes 1-3, ports 5, 6, 7 & 8
And for a 12-port dual-SAN configuration, connect the InfiniBox to the switches as follows:
- SAN A: Nodes 1-3, ports 1 & 2
- SAN B: Nodes 1-3, ports 5 & 6
In both cases a given host can be zoned to ports 1 & 5, or 2 & 6, and so on to make maximum balanced use of the InfiniBox resources.
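One way to keep that balance when rolling out many hosts is to rotate the port pairs per host. The helper below is hypothetical (not part of any INFINIDAT tooling) and assumes the 24-port layout, where SAN A uses ports 1-4 and SAN B uses ports 5-8 on each node:

```python
def port_pair(host_index: int) -> tuple[int, int]:
    """Rotate hosts across the patch-panel port pairs:
    (1, 5), (2, 6), (3, 7), (4, 8), then wrap around."""
    san_a_port = 1 + host_index % 4
    return san_a_port, san_a_port + 4  # SAN B counterpart sits 4 ports over

# Hosts 0-4 land on balanced port pairs across both fabrics:
print([port_pair(i) for i in range(5)])
# [(1, 5), (2, 6), (3, 7), (4, 8), (1, 5)]
```

Each host would then be zoned to its pair of patch-panel ports on all three nodes, per the guidelines above.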
LUN and Snapshots Considerations
LUN size on the InfiniBox can vary from 1GB to the entire capacity of the system (above one (1) petabyte). From the system's perspective there is no difference between a 1GB LUN and a 1PB LUN: performance does not change with LUN size, because the system distributes data in 64KB sections across all the drives in the array under all conditions. The best practice is therefore to provision LUN sizes that are ideal for the selected application and host OS. Generally, using fewer, larger LUNs yields better operational simplicity while maintaining the best performance the system can deliver.
Number of LUNs
As opposed to traditional systems, which might require many LUNs to engage multiple drives, the InfiniBox array spreads data evenly across all the drives regardless of the number or size of the LUNs. Be aware that, depending on the OS, LUNs have queuing and data structures that can limit overall performance if over-consolidation is attempted; more LUNs will help mitigate this issue if it applies to your environment. Also, certain applications need multiple LUNs to achieve separation for reliable backup and DR; e.g., a database may require separate LUNs for database logs, as well as separate table spaces, indexes, temp space or binaries. If your application allocates threads or processes based on the number of LUNs provisioned, you might consider increasing the LUN count, allowing the application to parallelize I/O execution. However, if the application can define multiple threads independent of the number of LUNs, or the number of LUNs has no effect on application threads, there is no compelling reason to have multiple LUNs. The overall number of LUNs should generally be lower than the number one would use on legacy storage systems.
The InfiniBox system is aware of the status of each of the 64KB sections comprising each LUN. If a specific portion of a LUN has never been written, any read requests to those sections are answered directly from cache (no drive access). In addition, if an application or benchmark writes zeroes (e.g., dd if=/dev/zero), InfiniBox detects this and serves zeroes from cache in response to read requests. This data awareness can produce unrealistic test results that do not match a real application I/O profile. As such, when testing the system, make sure your test suite writes data other than zeros, or better yet, use real application data.
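A minimal way to guard a benchmark against this zero-detection is to generate pseudo-random test data up front; the sketch below is generic (the file path is a placeholder, so point it at your actual test target):

```python
import os
import tempfile

SECTION = 64 * 1024  # matches the 64KB section size described above

def write_test_data(path: str, sections: int) -> int:
    """Fill the target with pseudo-random (non-zero) data so reads
    exercise the real data path rather than zero-detection."""
    with open(path, "wb") as f:
        for _ in range(sections):
            f.write(os.urandom(SECTION))
    return sections * SECTION

# Placeholder target; substitute the real benchmark file or device.
target = os.path.join(tempfile.gettempdir(), "bench.dat")
written = write_test_data(target, 16)
print(written)  # 1048576 bytes written (16 x 64KB)
```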
InfiniBox uses a unique, patented mechanism that creates time-based entries to mark a snapshot start point on the system. This technology enables the instantaneous creation of snapshots without any impact on utilization, drives, CPU, memory or performance in general: modern CPUs can store a timestamp-based pointer in a non-blocking, atomic task with no overhead, and InfiniBox leverages that capability to create a timestamp-based entry.

Once a snapshot has been created, InfiniBox applies a Fast Overwrite, an in-cache process that minimizes access to disk. For example, when overwriting a section in cache that is dirty (not yet written out to disk), the system will not discard the section if there are snapshots associated with that data; a quick, in-memory manipulation of the snapshot tree ensures snapshot and volume data are always consistent. During the de-stage process, InfiniBox employs a Redirect-On-Write mechanism (not Copy-On-Write, which involves extra I/O) for any snapshot changes that have taken place. This technique, in concert with InfiniRAID's ability to span data across all the drives in a virtual schema, also contributes to a zero-impact operation.

The end result of this technology is shown in the chart below: response times remain constant throughout the test as a large number of snapshots are created and then later removed from association with the volume, a process that is usually intrusive to operations on existing platforms today. The interval and usage of snapshots depends on the requirements of the individual application and customer; usually an RPO (recovery point objective) is in place and will ultimately guide you here. As a rule of thumb, users can apply at least one snapshot per LUN per day as a basic implementation strategy.
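The Redirect-On-Write behavior described above can be illustrated with a toy model (a conceptual sketch only, not InfiniBox's actual data structures): a snapshot is just a frozen copy of the volume's pointers, and an overwrite redirects new data to a fresh location instead of copying the old data aside.

```python
# Toy Redirect-On-Write model: volume and snapshot are pointer maps
# into a shared block store; overwrites never copy old data.
store = {}      # location -> data
next_loc = 0

def write(pointers: dict, lba: int, data: bytes) -> None:
    global next_loc
    store[next_loc] = data    # redirect: new data goes to a fresh location
    pointers[lba] = next_loc  # only the pointer moves (no extra I/O)
    next_loc += 1

volume = {}
write(volume, 0, b"v1")

snapshot = dict(volume)   # snapshot creation: an instantaneous pointer copy

write(volume, 0, b"v2")   # overwrite after the snapshot was taken

print(store[volume[0]])    # b'v2' -- the volume sees the new data
print(store[snapshot[0]])  # b'v1' -- the snapshot still sees the old data
```

The contrast with Copy-On-Write is that nothing here ever reads or rewrites the old block on behalf of the snapshot; the old data simply stays where it was.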