Storage Performance for Vultr File System

Updated on 25 February 2026

Comprehensive performance analysis of Vultr File System, including fio and smallfile benchmarks, metadata optimization, and tuning guidance.


Vultr File System is intended for clusters of CPU or GPU systems that share stored data, with cache coherency so that a write made by any one client is visible to subsequent reads by other clients. Because Vultr File System tends to be used with large numbers of clients accessing the same file system at the same time, metadata operations have also been heavily optimized.

For additional context on the performance metrics referenced throughout this document and to better understand how storage performance is measured, benchmarked, and compared, see What Are the Fundamentals of Storage Performance?.

Rate Limits

To make performance tuning simple for large clusters, there are presently no rate limits imposed on Vultr File System by default. We reserve the option to impose limits in the future if their absence is abused. Until then, Vultr File System can provide very high throughput for workloads that demand it, such as keeping data flowing into GPU memory in AI training clusters.

Benchmark Results

With VFS, we performed two different sets of benchmarks.

  • To test generalized read/write performance we used the utility fio to measure performance for 4 KB, 64 KB, 512 KB, 1024 KB, and 4096 KB block sizes. We ran tests with 100% read (both random and sequential), 100% write (again both random and sequential), and a mixed workload of both reads and writes.

  • We also used the benchmarking utility smallfile to measure file system performance at 64 KB, 512 KB, 1024 KB, and 4096 KB total file sizes. This tool exercises file system semantics and metadata operations, which a tool like fio cannot properly measure.

We performed these tests across a wide variety of VM instance plans to find break points where insufficient CPU or memory in the plan could limit storage performance. We did not find such a break point with block performance, even on plans with only 1 core and 1 GB of RAM.

Fio

There are three different classes of hypervisor hosts at Vultr, each with different performance characteristics. Because of how the corresponding plan types consume host resources, they may or may not be able to optimally support VFS. Both Vultr Cloud GPU (VCG) and VX1 plans perform optimally, whereas older GPU plans deliver lower performance at the time of this writing.

VCG and VX1 plans have the following performance profile.

IO Type Block Size Mean IOPS Mean Throughput (MiB/s) Mean Latency (ms)
randwrite 4 KB 12,398.0 48.4 78.1
randwrite 64 KB 7,882.7 492.7 125.7
randwrite 512 KB 2,826.2 1,413.1 368.0
randwrite 1 MB 1,788.6 1,788.6 568.3
randwrite 4 MB 383.2 1,532.6 2,520.0
randread 4 KB 33,706.7 131.7 32.0
randread 64 KB 21,353.8 1,334.6 52.0
randread 512 KB 7,468.4 3,734.2 130.0
randread 1 MB 5,287.3 5,287.3 185.0
randread 4 MB 1,094.0 4,375.9 893.7
randrw 4 KB 16,981.3 66.3 56.8
randrw 64 KB 10,634.7 664.7 86.1
randrw 512 KB 3,900.8 1,950.4 254.3
randrw 1 MB 2,505.4 2,505.3 394.5
randrw 4 MB 507.2 2,029.0 1,885.0

The following table shows the performance characteristics of the “other” plan types.

IO Type Block Size Mean IOPS Mean Throughput (MiB/s) Mean Latency (ms)
randwrite 4 KB 1,489.2 5.4 20.5
randwrite 64 KB 974.4 60.9 36.1
randwrite 512 KB 357.2 188.75 84.58
randwrite 1 MB 236.2 236.25 136.25
randwrite 4 MB 63.0 252 508.25
randread 4 KB 4,371.2 17.08 6.32
randread 64 KB 2,560.0 160 12.23
randread 512 KB 782.0 391 41.25
randread 1 MB 427.8 427.75 77.8
randread 4 MB 101.0 404 327
randrw 4 KB 4,416.0 17.25 11.8
randrw 64 KB 2,752.8 172.05 21.83
randrw 512 KB 868.0 434 62.46
randrw 1 MB 517.0 517 103.3
randrw 4 MB 146.9 587.5 409.38

Smallfile

The smallfile benchmark exercises file system metadata operations. Repeatedly opening and closing small files results in significant metadata overhead, and many metadata operations are costly compared to simply reading and writing data. Using fio alone to gauge the performance of your file system storage would miss this, since it exercises only read and write operations without accounting for metadata overhead.

As a result of our testing, we did a significant amount of performance optimization work to strike the right balance between the impact on client CPU and memory resources and the overall performance of VFS. We paid particular attention to performance when small files are opened, read, and then closed one after another. These hypervisor-level optimizations should deliver the best possible performance even when faced with what is colloquially known as “the small file problem”.

The following table can give you an idea of how VFS will perform when these metadata operations become a significant portion of your overall workload. For example, stat operations on files can be particularly taxing, so being able to perform them on a large number of files per second has been a significant focus for us.

IO Type File Size Files/s or IOPS Total Data Size (GiB) Throughput (MiB/s)
create 64 KB 4,808.8 56.2 300.6
create 512 KB 2,954.5 433.7 1,477.4
create 1024 KB 1,702.3 735.5 1,702.6
create 4096 KB 3,249.0 1,166.9 3,249.5
append 64 KB 4,021.8 56.6 251.4
append 512 KB 2,295.5 459.7 1,147.8
append 1024 KB 1,646.0 851.8 1,646.5
append 4096 KB 4,036.7 1,504.4 4,037.1
stat 64 KB 13,548.8 N/A N/A
stat 512 KB 10,944.3 N/A N/A
stat 1024 KB 10,365.8 N/A N/A
stat 4096 KB 13,714.3 N/A N/A
chmod 64 KB 10,815.8 N/A N/A
chmod 512 KB 9,773.8 N/A N/A
chmod 1024 KB 7,313.5 N/A N/A
chmod 4096 KB 10,175.0 N/A N/A
read 64 KB 8,428.0 60.7 526.8
read 512 KB 4,279.3 485.3 2,139.9
read 1024 KB 2,482.8 835.8 2,483.1
read 4096 KB 5,966.0 1,448.6 5,966.4
overwrite 64 KB 3,458.3 57.6 216.2
overwrite 512 KB 2,524.8 464.8 1,262.5
overwrite 1024 KB 1,507.8 822.2 1,508.0
overwrite 4096 KB 3,469.3 1,514.0 3,469.7
delete 64 KB 6,715.8 N/A N/A
delete 512 KB 5,717.8 N/A N/A
delete 1024 KB 7,642.3 N/A N/A
delete 4096 KB 6,686.7 N/A N/A

Some interesting things to note about the table above:

  • Some of the IO types transfer only metadata, not data; as a result, they have no corresponding data size or throughput for the test. In those cases, the results represent the number of files processed per second rather than the number of IOPS.
  • The 4096 KB file size aligns with the object size of the underlying storage, so it tends to perform better than the trend with smaller file sizes might predict because it leads to less back-end storage system communication between storage cluster nodes.

Replicating These Results

Vultr File System is benchmarked using two different methods.

Using fio

You can replicate these results for yourself by running fio at any of the block sizes mentioned, with any operation mix you like.

  1. Install the fio utility and any dependencies. It can be found on git.kernel.org, but your distribution likely has it available as a package; in most distributions the package is simply called fio. You should also install the libaio development package so that it is available to fio. It is usually called either libaio-dev or libaio-devel, depending on your distribution.

  2. When running fio, you will need to create a job configuration file that you can reference and then run a command line that points at the job file.

    ini
    [FIOJOB]
    filename=/mnt/vbs/fio.raw
    size=500G
    random_generator=lfsr
    buffered=0
    direct=1
    invalidate=0
    ioengine=libaio
    rw=randwrite
    bs=4k
    iodepth=4
    numjobs=16
    runtime=900
    loops=1
    time_based=1
    

    Key values to change to match the workload you are testing are:

    • filename= This should be a file on the file system you are testing. If you point fio at a raw block device instead, understand that the test is destructive: it will destroy any file system and data on that device.
    • direct= 1 enables O_DIRECT, 0 disables it. Use direct=1 with ioengine=libaio.
    • ioengine= We recommend libaio for best results, but you may wish to compare it with sync or psync.
    • rw= randread, randwrite, and randrw are the most useful options.
    • bs= Block size.
    • iodepth= The queue depth per job.
    • numjobs= The number of simultaneous jobs to run.
  3. Then you can reference the job config from the command line:

    console
    $ fio \
        --eta=never \
        --status-interval=5000ms \
        --output-format=json+ \
        $FIOJOBFILE
    

    Where $FIOJOBFILE is the path to the job file created above. See the fio documentation for more details.
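If you capture the json+ output, you can reduce it to the same columns used in the tables above with a short script. The following is a sketch that assumes the jobs, read/write, iops, bw (KiB/s), and lat_ns field names emitted by recent fio releases; verify these keys against your own fio version's output.

```python
import json

def summarize_fio(report):
    """Reduce a fio --output-format=json or json+ report to per-job
    IOPS, throughput, and mean latency. Field names follow recent fio
    releases; check your fio's actual output if keys differ."""
    rows = []
    for job in report.get("jobs", []):
        for direction in ("read", "write"):
            d = job.get(direction, {})
            if not d.get("iops"):
                continue  # skip directions this job did not exercise
            rows.append({
                "job": job.get("jobname"),
                "dir": direction,
                "iops": round(d["iops"], 1),
                "throughput_mib_s": round(d["bw"] / 1024, 1),  # fio reports bw in KiB/s
                "mean_lat_ms": round(d["lat_ns"]["mean"] / 1e6, 2),
            })
    return rows

# Trimmed-down illustrative report (values are placeholders, not real output):
sample = {"jobs": [{"jobname": "FIOJOB",
                    "read": {"iops": 0},
                    "write": {"iops": 12398.0, "bw": 49592,
                              "lat_ns": {"mean": 78100000}}}]}
print(summarize_fio(sample))
```

In a real run you would load the report with json.load() from the file or stream produced by the command above.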

Using smallfile

  1. To obtain smallfile you can download it from its GitHub repository. It requires Python.

  2. You need to create a YAML jobfile describing the job parameters. Refer to the smallfile documentation on its GitHub repository for full details, but the jobfile takes the following form.

    yaml
    top: /mnt/vfs/smallfile
    operation: create
    output-json: /tmp/smallfile/smallfileresults.json
    files: 1000
    threads: 100
    auto-pause: true
    file-size: 512
    files-per-dir: 1000
    dirs-per-dir: 1000
    hash-into-dirs: true
    xattr-size: 128
    finish: true
    

    Key values to change to match the workload you are testing are:

    • top: This is the path to the filesystem you’re testing. Change it to match your VFS mount point.
    • operation: This is the operation benchmark type. See the smallfile documentation for a full list of available benchmarks.
    • output-json: This is the path to the output file for your results.
    • threads: The number of threads to execute the benchmark. We used 100.
    • file-size: The size of the file used for testing. We used 4k, 64k, 512k, 1M, and 4M in our testing.
  3. Once the job file is created, you can execute the benchmark with the command:

    console
    $ python3 smallfile_cli.py \
        --yaml-input-file=$SMALLFILEJOBFILE
    

    In the above command, $SMALLFILEJOBFILE is the path to the jobfile created above.

Tuning Tips for Best Performance with Vultr File System

  • Run the latest kernel possible in your instance. Recent kernels have significant optimizations for the virtio-fs filesystem type used by VFS.

  • Be sure to mount the file system with the relatime or noatime mount options if your application permits. Updating atime is expensive if your application doesn’t require it. See the documentation for the mount command and the /etc/fstab configuration file for your OS distribution for more details.
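As an illustration, an /etc/fstab entry for a virtio-fs mount with noatime might look like the following. The "vfs" mount tag and /mnt/vfs mount point are placeholders; substitute the values from your own VFS attachment.

```ini
# /etc/fstab (illustrative entry: "vfs" is a placeholder mount tag and
# /mnt/vfs a placeholder mount point; substitute your own values)
vfs  /mnt/vfs  virtiofs  defaults,noatime  0  0
```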

  • Metadata operations are expensive. Avoid opening and closing files repeatedly whenever possible. Instead, open files and keep them open until they are no longer needed. Consolidating data into larger files that can be opened once and kept open (such as databases and big data file formats) can make a huge difference in file system performance (for all file systems, not just VFS).
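The difference can be sketched as follows. Both functions below return the same data, but the first pays an open/close metadata round trip on every access while the second pays it once (the file name and contents are illustrative):

```python
import os
import tempfile

# Scratch file standing in for a dataset of small records (illustrative).
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"payload\n" * 100)

def read_reopening(path, accesses):
    """Anti-pattern: every access pays an open() and close() metadata cost."""
    data = None
    for _ in range(accesses):
        with open(path, "rb") as f:  # open + close on each iteration
            data = f.read()
    return data

def read_keeping_open(path, accesses):
    """Better: open once, keep the descriptor, seek back for repeat reads."""
    data = None
    with open(path, "rb") as f:      # a single open/close pair in total
        for _ in range(accesses):
            f.seek(0)
            data = f.read()
    return data

result = read_keeping_open(path, 10)
assert result == read_reopening(path, 10)  # same data, far fewer metadata ops
os.unlink(path)
```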

  • Avoid synchronous IO such as psync under most workloads. Instead, use libaio for the benefit of greater queue depths. Test both to determine best performance. Some applications may be able to leverage libaio via a configurable option. If you’re writing your own application, see the documentation for libaio as used in the language of your choice. For fio, enabling libaio is achieved with the option ioengine=libaio.

  • Direct I/O allows you to bypass Linux caches. Consider using O_DIRECT for increased performance. In fio this is enabled with the combination of two options: direct=1 and buffered=0.

  • Enable high levels of parallelism by making more simultaneous requests. Increase the number of processes, threads or workers making requests. How to do this in your specific application will vary. Many applications will have configuration options, and many libraries will have methods for doing this. In the fio benchmarking utility you can increase parallelism by increasing numjobs.

    If you are trying to discover best possible performance limits on a system where you do not already know the potential bottlenecks you can try repeatedly increasing levels of parallelism until you discover the point where increasing parallelism does not result in improved performance. Typically this will be some multiple of the number of available threads on your CPU. Start with 1 x vCPU and work up from there.
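The sweep described above can be sketched as a simple knee-finding loop. The throughput figures below are synthetic placeholders; in practice each pair would come from an fio run at that numjobs value.

```python
def find_knee(results, min_gain=0.05):
    """Given (parallelism, throughput) pairs in increasing parallelism
    order, return the level beyond which the relative throughput gain
    drops below min_gain (5% by default)."""
    for prev, cur in zip(results, results[1:]):
        if (cur[1] - prev[1]) / prev[1] < min_gain:
            return prev[0]
    return results[-1][0]  # never plateaued: keep increasing parallelism

# Synthetic throughput curve (MiB/s) standing in for real fio runs at
# numjobs = 1, 2, 4, 8, 16.
sweep = [(1, 400), (2, 780), (4, 1500), (8, 1560), (16, 1570)]
print(find_knee(sweep))  # → 4 (going past numjobs=4 gains under 5%)
```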

  • Increasing queue depths will allow for more requests to be in flight waiting for replies so as not to allow high latency to lower your throughput artificially. How to do this in your specific application will vary. Many applications will have configuration options, and many libraries will have methods for doing this. In the fio benchmarking utility you can increase the number of requests each job will allow to be in flight without a response by increasing iodepth.

    If you are trying to discover best possible performance limits on a system where you do not know the potential bottlenecks you can try repeatedly increasing the queue depth until you discover the point where a deeper queue does not result in improved performance. Typically this is a function of average latency and the number of requests you can queue while waiting for a typical response. The goal is to have a queue that is always full.
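The relationship described here is Little's law: the number of requests in flight equals throughput multiplied by latency. A quick sketch, using illustrative target numbers:

```python
import math

def required_queue_depth(target_iops, mean_latency_s):
    """Little's law: in-flight requests = throughput x latency. This is
    the minimum total queue depth (iodepth x numjobs, in fio terms)
    needed to sustain target_iops at the given mean latency."""
    return math.ceil(target_iops * mean_latency_s)

# Illustrative target: 10,000 IOPS at 2 ms mean latency.
print(required_queue_depth(10_000, 0.002))  # → 20
```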