Overview of storage performance fundamentals, including metrics, workload modeling, latency, and differences between local and network-attached storage.
Storage performance is defined by several key concepts that affect how your system reads, writes, and handles real-world workloads. To understand what to expect from your storage, it helps to start with how performance is measured, how synthetic benchmarks compare to real-world workloads, and how your choice of storage type impacts the performance you can expect.
Most storage performance is measured as the number of read and write I/O operations that can be performed per second (IOPS) for a given mix of operation types at a given block size. However, because a storage system may be limited either by the number of operations per second it can perform or by the maximum throughput it can attain, performance at the extremes of block size is often reported differently.
That is to say, generally:
Operations per second x block size in bytes = throughput in bytes per second

At small block sizes, the limitation is usually how fast operation requests can be answered. At large block sizes, the limitation is usually how fast data can be sent to (or received by) the storage client. For this reason, you will usually see benchmarks at small block sizes report IOPS numbers and benchmarks at large block sizes report throughput numbers.
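This relationship is simple arithmetic, but it explains why the two extremes are reported differently. A minimal sketch, with hypothetical example figures (a drive doing 100,000 IOPS at 4 KB, and 2,000 IOPS at 4 MB):

```python
def throughput_bytes_per_sec(iops: float, block_size_bytes: int) -> float:
    """Throughput is simply operations per second times block size."""
    return iops * block_size_bytes

# Small blocks: high IOPS, but modest throughput.
small = throughput_bytes_per_sec(100_000, 4 * 1024)        # 4 KB blocks
# Large blocks: far fewer operations already saturate the data path.
large = throughput_bytes_per_sec(2_000, 4 * 1024 * 1024)   # 4 MB blocks

print(f"100k IOPS @ 4 KB = {small / 1e6:.0f} MB/s")  # ~410 MB/s
print(f"2k IOPS @ 4 MB = {large / 1e6:.0f} MB/s")    # ~8389 MB/s
```

Note how the small-block case is bounded by the operation rate while the large-block case is bounded by the data path, even though the same formula governs both.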
There is also the question of latency. Each request takes a certain amount of time to be processed: the data must be read or written, and the response sent back to the client. Generally speaking, latency rises with IOPS at a given block size and climbs dramatically as the storage approaches its throughput limit. Total IOPS or throughput often benefits from increased parallelism, meaning the number of workers issuing operations at the same time. Individual requests may carry too much latency to reach maximum IOPS or throughput when issued serially. In other words, your storage system can usually handle far more concurrent requests than per-operation latency would suggest, but only if you issue many requests in parallel.
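The relationship between latency, parallelism, and achievable IOPS follows Little's Law: concurrency equals arrival rate times time in system. A short sketch using hypothetical numbers (200 microseconds per request, a 100,000 IOPS target):

```python
def required_queue_depth(target_iops: float, latency_seconds: float) -> float:
    """Little's Law: concurrency = arrival rate * time in system.
    To sustain target_iops when each request takes latency_seconds,
    roughly this many requests must be in flight at once."""
    return target_iops * latency_seconds

# With 200 microsecond latency, one serial worker tops out at:
serial_iops = 1 / 0.0002  # 5,000 IOPS
# ...so reaching 100,000 IOPS needs about 20 requests in flight.
depth = required_queue_depth(100_000, 0.0002)
print(f"serial limit: {serial_iops:.0f} IOPS, queue depth needed: {depth:.0f}")
```

This is why benchmark results are so sensitive to queue depth: a single serial client measures latency, not the device's limits.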
Once your storage system can provide more IOPS and throughput than a single client can process, the limit becomes the rate at which the client can process I/O requests. Clients with little to no protocol offload capability may be bound by CPU, memory, or both. Protocol offload may reduce the load on the CPU, but offload engines still have limits on the throughput or IOPS they can process.
Getting the best performance, then, depends on having headroom at both ends: the storage system must be able to deliver the IOPS and throughput, and the storage client must have sufficient processing power to consume them.
An additional complicating factor with file system and object storage performance is that reads and writes are not the only operations that consume time or processing power. Metadata operations, such as checking access permissions on a file, reading a directory, or listing a bucket, can have profound performance implications as well.
Benchmarks usually simulate a well-defined workload. A typical example is 100% random reads at a 4 KB block size. Another is 100% sequential writes at a 4 MB block size. A common mixed workload is 70% read, 30% write with 70% random, 30% sequential access. But in the real world, your application's workload may not be nearly so clean. Odds are good that your read-to-write and random-to-sequential ratios vary from moment to moment, sometimes quite wildly, and your block sizes may be mixed within the workload as well.
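To make the idea of a mixed workload model concrete, here is a small sketch that draws synthetic operations from the 70/30 read/write, 70/30 random/sequential mix described above. The function name and parameters are illustrative, not part of any benchmark tool:

```python
import random

def synthetic_op(read_fraction=0.7, random_fraction=0.7, rng=random):
    """Draw one I/O operation from a mixed workload model:
    read_fraction of operations are reads (the rest writes),
    random_fraction use random access (the rest sequential)."""
    op = "read" if rng.random() < read_fraction else "write"
    pattern = "random" if rng.random() < random_fraction else "sequential"
    return op, pattern

rng = random.Random(42)  # fixed seed so the sketch is reproducible
ops = [synthetic_op(rng=rng) for _ in range(10_000)]
reads = sum(1 for op, _ in ops if op == "read")
print(f"observed read fraction: {reads / len(ops):.2%}")  # close to 70%
```

Real applications differ from this model in exactly the ways the paragraph above describes: the fractions drift over time, and block sizes vary per operation rather than staying fixed.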
Never expect your workload to be served at precisely the same IOPS and throughput rates as a given synthetic workload. Benchmark numbers should always be taken with a grain of salt, even these benchmark numbers. At best, they are a guide to what you can likely expect if your workload resembles the ones used in the benchmarking. Still, they are valuable for making an informed decision about how much performance you can get out of a given storage system.
Local NVMe storage will always outperform network-attached storage. The reason is right there in the name: it is local to the machine where your instance is running and does not have to be accessed over a network. Latency will therefore be much lower, IOPS will be higher, and throughput will generally be higher than you can attain through your network interface. The downside of local storage is that it is not as redundant as Vultr's network-attached storage offerings. Should anything happen to the host, local storage may not be quickly recoverable, whereas network-attached storage can simply be moved to a different host where your instance is restarted.
This is an important difference to understand, because some people hear names like NVMe Block and expect the same performance as local NVMe. Yes, NVMe Block uses NVMe drives, but those drives are accessed over the network. A very fast network, to be sure, but that is not the same as local.