Flash memory is used to achieve epic performance in the data center, but the decision doesn't always come down to choosing this silicon approach over hard disk. Even once Flash is your choice, there's an application-focused approach to deciding between putting it directly into the server and going network-attached. In this article, we'll cover specific use cases to determine whether PCIe Flash cards or a Flash array is best suited to your needs.
The types of applications that are best targeted at PCIe cards include:
- Applications where fault tolerance is built into the software layer, as is common in many BigData/NoSQL/NewSQL software platforms. Generally these are designed for non-shared storage (DAS) and store multiple copies over the network.
- Read-caching use cases where DRAM alone is too expensive in CapEx and power, but the application can take advantage of a large read cache.
- Applications that are not business-critical, don't need capacities beyond what fits in a PCIe card, and can accept some loss of availability if the server goes down or needs to be rebooted.
Flash arrays, on the other hand, are best used for applications that don't fit the profiles above and require higher levels of fault tolerance, as well as the ability to share resources among multiple servers across a network (FC/iSCSI/IB).
Applications with Fault Tolerance Built-In
Many newer applications have built fault tolerance into the software layer; this is especially common in BigData/NoSQL/NewSQL software platforms. Generally, these are designed for non-shared storage (DAS) and synchronously replicate writes over an IP network, though nothing precludes one from using SAN storage in these deployments. While these extra data copies multiply the effective $/GB of the storage, the applications turn that necessity into a performance virtue by spreading the read and compute activity across the servers. For these applications, the RAID overhead inherent in creating a highly reliable array does not result in increased availability, so cards are generally the preferred option.
Another advantage of cards over arrays in these deployments is that performance can scale linearly with the number of servers. Assuming the IP switches can keep up, the absence of shared storage resources allows performance to grow up to the limits of the software platforms.
When evaluating storage options for these platforms, consider that most of them do relatively large writes (128KB and above is common) and have both small and large reads. This means write IOPS are low but write data volume can be large, whereas read IOPS and read volume will both be high. Assuming a 50/50 read/write load from a bandwidth point of view is a reasonable model; most PCIe cards are not designed with this high a write load in mind.
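As a rough back-of-the-envelope sketch of why a 50/50 bandwidth split still means a read-heavy IOPS profile, the snippet below converts a bandwidth mix into IOPS figures. The aggregate throughput and block sizes are illustrative assumptions, not measurements from any particular platform.

```python
def workload_profile(total_mb_per_s, write_fraction, write_block_kb, avg_read_block_kb):
    """Convert a bandwidth-based read/write split into rough IOPS figures.

    All inputs are illustrative assumptions: aggregate MB/s, the fraction
    of that bandwidth that is writes, and the typical block sizes.
    """
    write_mb = total_mb_per_s * write_fraction
    read_mb = total_mb_per_s * (1 - write_fraction)
    write_iops = write_mb * 1024 / write_block_kb
    read_iops = read_mb * 1024 / avg_read_block_kb
    return round(write_iops), round(read_iops)

# Hypothetical: 1 GB/s aggregate, 50/50 by bandwidth,
# 128KB writes vs. a 16KB average read size
w_iops, r_iops = workload_profile(1000, 0.5, 128, 16)
print(w_iops, r_iops)  # 4000 write IOPS vs 32000 read IOPS
```

Even with bandwidth split evenly, the large write blocks mean the device services far fewer write operations than reads, yet must still sustain half its total throughput as writes and erases.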
Read-Caching Use Cases
There are many IO-intensive applications that really only need reads to be as fast as possible, whereas writes are of relatively little concern. In these workflows, one can use either a PCIe Flash card or an array, but cards tend to be the most cost-effective and highest-performing option when the software supports it.
There are other caching workflows like VDI and swap files that may well be over 50 percent writes, but that data is transient and doesn’t have high availability requirements. One would imagine that PCIe cards would be the preferred choice here, but be careful. Arrays have historically been better able to handle mixed read/write loads than PCIe cards due to their ability to spread the write and erase load across many more Flash modules. Arrays are also better able to maximize the longevity of the Flash than many independent cards.
Finally, the caching concept can be extended to workflows where another copy of the source data exists elsewhere in the environment. It is common for analytics applications to work on a copy of the source data on other servers to avoid affecting the performance of a high-priority workflow. In this case, even though the analytics application may write data of its own, that data can generally be recomputed should it be lost, so the requirement for high availability is much lower. This would again appear to be a use case for cards, but there is a more efficient option.
Instead of going through the overhead in time and networking bandwidth to make a copy of the source data, it’s far better to give that analytics application a read-only or read-write clone of the source data served by the same SAN-attached array. Flash arrays generally have enough performance to support the added IOPS required by the primary and secondary workflows and clones take far less overhead than full copies over the network. Additionally, cloning allows the analytics application to operate on far more current data than might be possible when copying over the network, resulting in more accurate and timely results.
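To see why array-side clones are attractive, consider the wall-clock cost of refreshing a full copy over the network. The sketch below estimates that cost; the dataset size, link speed, and efficiency factor are all assumptions for illustration, while a clone on the array is a metadata operation that completes in seconds regardless of dataset size.

```python
def copy_time_hours(dataset_gb, net_gbit_per_s, efficiency=0.7):
    """Hours needed to push a full copy of the source data over the network.

    Illustrative only; `efficiency` is an assumed factor for protocol
    overhead and contention on the link.
    """
    gb_per_s = net_gbit_per_s / 8 * efficiency  # bits -> bytes, derated
    return dataset_gb / gb_per_s / 3600

# Hypothetical: a 10 TB dataset over a 10 Gbit/s link at 70% efficiency
print(round(copy_time_hours(10_000, 10), 1))  # roughly 3.2 hours per refresh
```

Every refresh of the analytics copy pays that multi-hour cost again, which is also why network-copied data tends to lag the source; a clone sidesteps both the time and the bandwidth.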
Flash Array Use Cases
The most common use cases for Flash arrays are OLTP/OLAP and virtualized servers. Mission-critical relational databases are written to a clustered, shared-storage model, which typically means SAN arrays; for Oracle RAC, SAP, clustered Microsoft SQL Server, large data warehouses, and similar applications, an array will be the preferred solution. Virtualized servers generally require shared storage to enable virtual machines to move across physical servers. In these enterprise deployments, SAN arrays rule because the applications require them: the high performance, relatively high write loads, and high fault-tolerance requirements mandate a Flash array solution.
Where individual database instances are so small in capacity that an array seems like overkill, be sure to look at the larger environment. Often there are many small databases that in aggregate can cost-justify an array for database consolidation; while no single instance may seem mission-critical, in aggregate these dozens or hundreds of databases have a major impact on the business.
To Card or to Array
In the end, the most important difference between Flash cards and arrays is also the most obvious: cards are not highly available. No matter how high-quality the card, if the server is down you cannot access the data on it, whereas arrays are generally highly fault-tolerant. This difference in availability tends to determine which applications and workflows are more appropriate for arrays and which can leverage the lower-cost card alternative.
Jonathan Goldick is the Violin Memory CTO of Software, responsible for driving the data management software architecture. As the former CTO at OnStor, he set the technical direction for scale-out network attached storage (NAS) hardware and software design.
Image: Stefan Petru Andronache/Shutterstock.com