Ethan Miller
The storage industry is in the midst of a flash revolution. Today’s smartphones, cameras, and many laptops all use flash storage, but the $30-billion-a-year enterprise storage market is still dominated by spinning disk. Flash has large advantages in speed and power consumption, but its disadvantages (cost, limited overwrites, and large erase blocks) have kept it from being a drop-in replacement for disk in enterprise storage environments. This talk will describe the techniques we’ve developed at Pure Storage to overcome these obstacles and build a high-performance flash storage array from commodity SSDs. We’ll describe the design of the Pure FlashArray, an enterprise storage array built from the ground up on relatively inexpensive consumer flash storage. The array and its software, Purity, exploit the advantages of flash while minimizing its downsides.
Purity performs all writes to flash in multiples of the SSD erase block size, and keeps data in a key-value store that persists approximate answers, further reducing writes at the cost of extra (cheap) reads. The key-value store includes medium-grained identifiers, which enable large numbers of snapshots, and a key range invalidation table. Together these provide further advantages: nearly instantaneous, zero-overhead snapshots, and metadata structures whose size stays bounded even though many identifiers are unique and monotonically increasing. Purity also reduces the amount of user data stored on flash through compression, deduplication, and thin provisioning. The system relies on RAID both for reliability and for consistent performance: by avoiding reads to devices that are being written, and serving those reads from the remaining devices in the RAID stripe instead, it makes writes more efficient and eliminates long-latency reads.
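The RAID point relies on a standard property of parity: if a stripe carries redundancy, a block on a device that is busy absorbing writes can be rebuilt from the other devices rather than waiting for that device. Below is a minimal Python sketch of the idea, assuming single XOR parity (RAID-5-style) with one block per device. The function names and the `busy` set are hypothetical, and the abstract does not say which parity scheme Purity actually uses.

```python
from functools import reduce
from typing import List, Set


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks: List[bytes]) -> List[bytes]:
    """Hypothetical layout: data blocks on devices 0..n-1, one XOR parity block last."""
    return data_blocks + [xor_blocks(data_blocks)]


def read_block(stripe: List[bytes], index: int, busy: Set[int]) -> bytes:
    """Read block `index`, avoiding its device if that device is busy writing.

    `busy` is a hypothetical set of device indices currently handling writes.
    """
    if index not in busy:
        return stripe[index]  # device is idle: read it directly
    if any(i in busy for i in range(len(stripe)) if i != index):
        # Single parity can only reconstruct around one unavailable device.
        raise RuntimeError("more than one busy device in this stripe")
    # XOR of every other block (data plus parity) equals the skipped block.
    others = [blk for i, blk in enumerate(stripe) if i != index]
    return xor_blocks(others)


if __name__ == "__main__":
    stripe = make_stripe([b"\x01\x02", b"\x10\x20", b"\xff\x00"])
    # Device 1 is busy with a write, so its block is rebuilt from the others.
    assert read_block(stripe, 1, busy={1}) == b"\x10\x20"
```

The trade-off is extra reads on the other devices in exchange for never queueing a read behind a slow flash write; a real array would also need to schedule which devices are allowed to write at once so that enough of each stripe stays readable.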
The net result is a flash array that delivers sustained read/write performance of over 500,000 8 KB I/O requests per second while maintaining uniform sub-millisecond latency, with a data reduction ratio of 6x averaged across installed systems.
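To put these figures in bandwidth and capacity terms, here is the arithmetic. It assumes that "8 KB" means 8 KiB (8,192 bytes) per request and that the data reduction ratio is logical data stored divided by physical flash consumed; the abstract states neither assumption explicitly.

```latex
\begin{aligned}
\text{throughput} &> 500{,}000\ \tfrac{\text{requests}}{\text{s}} \times 8{,}192\ \tfrac{\text{B}}{\text{request}}
  = 4.096 \times 10^{9}\ \tfrac{\text{B}}{\text{s}} \approx 4.1\ \text{GB/s} \approx 3.8\ \text{GiB/s} \\
\text{logical capacity} &\approx 6 \times \text{physical capacity}
  \quad (\text{e.g. } 10\ \text{TB of flash} \rightarrow \approx 60\ \text{TB of user data})
\end{aligned}
```

Reading 8 KB as 8,000 bytes instead gives 4.0 GB/s, so under either convention the array sustains roughly 4 GB/s of small-block I/O.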