Measuring the speed of an NVMe PCIe SSD in PetaLinux

With FPGA Drive we can connect an NVM Express SSD to an FPGA, but what kind of real-world read and write speeds can we achieve with an FPGA? The answer is: it depends. The R/W speed of an SSD depends as much on the SSD as it does on the system it’s connected to. If I connect my SSD to a 286, I can’t expect to get the same performance as when it’s connected to a Xeon. And depending on how it’s configured, the FPGA can perform more like a Xeon or more like a 286.

To get the highest performance from the SSD, the FPGA must be a pure hardware design, implementing the NVMe protocol in RTL to minimize latency and maximize throughput. But that’s hard work, and not very flexible, which is why most people will opt for the less efficient configuration in which the FPGA implements a microprocessor running an operating system. In this configuration, we typically won’t be able to exploit the full bandwidth of NVMe SSDs because our processor is just not powerful enough.

But we still want to know: what speeds do we get from an FPGA running PetaLinux? To answer this question, I’ve done tests on two platforms: the KC705 board, running PetaLinux on the MicroBlaze soft processor, and the PicoZed 7030, running PetaLinux on the ARM Cortex-A9 processor.

How to measure the speed of an SSD in Linux

There are many ways to measure the read and write speed of an SSD in Linux, but the only one available to us in PetaLinux is the dd command, and on its own it won’t suffice. Normally dd reports the read/write speed itself, but PetaLinux is built with a leaner version of dd that does not make this calculation for us. So we have to use the time command as well, and make the calculation ourselves.

The dd command lets us specify an input device (if), an output device (of), a block size (bs) and the number of blocks to transfer (count). The commands below transfer 1000 blocks of 2M each, or roughly 2 Gigabytes of data, between the input device and the output device.

Write test:

    time dd if=/dev/zero of=/dev/nvme0n1p1 bs=2M count=1000

Read test:

    time dd if=/dev/nvme0n1p1 of=/dev/null bs=2M count=1000
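Since this dd build does not print a transfer rate, the calculation from the time output has to be done by hand. Below is a minimal sketch of that arithmetic, assuming awk is available on the target (it usually is in BusyBox-based images, but check yours). The helper name mbps and its argument order are my own invention for illustration, and it counts each 2M block as 2 MB, which is the convention used for the figures in this post.

```shell
#!/bin/sh
# mbps: hypothetical helper, not part of PetaLinux or dd.
# Computes throughput in MB/s from the dd transfer size and the
# "real" time printed by the time command.
# usage: mbps <block_size_MB> <block_count> <minutes> <seconds>
mbps() {
    awk -v bs="$1" -v count="$2" -v min="$3" -v sec="$4" \
        'BEGIN { printf "%.1f\n", (bs * count) / (min * 60 + sec) }'
}

# Example: 1000 blocks of 2 MB that took 7 minutes 45 seconds
mbps 2 1000 7 45    # prints 4.3
```

The same call with the measured read time gives the read speed.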

KC705 Test


From the screenshot above, we can see that the write test transfers 2 Gigabytes of data in 7 minutes and 45 seconds. The read test transfers 2 Gigabytes in 2 minutes and 21 seconds.

The video was taken during the write test to show the SSD activity LED. Notice that the SSD has a couple of seconds of inactivity at regular intervals. I’m not sure exactly what is happening during those couple seconds, but something is holding up the show. Although not shown in the video, during the read test, the SSD doesn’t seem to go through these same periods of inactivity.

PicoZed Test


You can see in the screenshot above that I used the lsblk command to get the name of a partition on the SSD (nvme0n1p1). I use this device name as the output device in the write test, and the input device in the read test. The transfers went a lot faster in the PicoZed test, so I used a larger transfer size just to make the test last a bit longer and improve the accuracy of the result. The write test transfers 16 Gigabytes of data in 3 minutes and 9 seconds. The read test transfers 16 Gigabytes in 2 minutes and 12 seconds.

The video was taken during the write test to show the SSD activity LED. Notice that there are no breaks in SSD activity, in contrast to the MicroBlaze design. The same is true during the read test.


Kintex-7 KC705 MicroBlaze processor clocked at 125MHz

  • Write speed: 4.3 MBps
  • Read speed: 14.2 MBps

Zynq-7000 PicoZed 7030 ARM Cortex-A9 clocked at 667MHz

  • Write speed: 84.7 MBps
  • Read speed: 121.2 MBps

So there’s a massive difference between the performance of the PicoZed and that of the KC705. The Zynq gets almost 20 times faster write speeds, and about 8.5 times faster read speeds. This isn’t only due to the faster processor. In the Zynq design, the AXI Memory Mapped to PCIe IP connects to the system memory via a high-performance (HP) AXI slave interface, and it doesn’t have to share that interface with the processor. In the MicroBlaze design, both the processor and the PCIe IP have to share access to the MIG through an AXI Interconnect.
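To check the speed-up ratios quoted above, here is a small sketch that divides the PicoZed figures by the KC705 figures. The function name speedup is hypothetical, and the inputs are simply the four MBps values listed in the results.

```shell
#!/bin/sh
# speedup: hypothetical helper that reports how many times faster
# one throughput figure is than another.
# usage: speedup <faster_MBps> <slower_MBps>
speedup() {
    awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.1f\n", fast / slow }'
}

echo "Write: $(speedup 84.7 4.3)x"     # PicoZed vs KC705 write speed
echo "Read:  $(speedup 121.2 14.2)x"   # PicoZed vs KC705 read speed
```

This prints a write ratio of 19.7 and a read ratio of 8.5.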

Neither design comes close to the performance in the Samsung specs or the test results by Ars Technica, which show sequential write speeds of 944 MBps and read speeds of 2,299 MBps. So this raises the next question: can a hardware NVMe IP core running on this same hardware actually reach those speeds? If you want to help me find out, please get in touch.
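To put a number on how far off we are, the sketch below expresses the PicoZed results as a percentage of the 944 MBps and 2,299 MBps reference figures. The helper name percent_of is made up for this example.

```shell
#!/bin/sh
# percent_of: hypothetical helper that expresses a measured speed
# as a percentage of a reference speed.
# usage: percent_of <measured_MBps> <reference_MBps>
percent_of() {
    awk -v got="$1" -v ref="$2" 'BEGIN { printf "%.1f\n", 100 * got / ref }'
}

echo "PicoZed write: $(percent_of 84.7 944)% of reference"
echo "PicoZed read:  $(percent_of 121.2 2299)% of reference"
```

That works out to about 9.0% of the reference write speed and 5.3% of the reference read speed.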

If you want my tutorials on how to get an NVMe SSD up and running in PetaLinux, follow these links:

