Writings on high-performance digital design by Jeff Johnson, maker of the Ethernet FMC and other FPGA solutions.

Measuring the speed of an NVMe PCIe SSD in PetaLinux

Measuring the speed of an NVMe PCIe SSD in PetaLinux

With FPGA Drive we can connect an NVM Express SSD to an FPGA, but what kind of real-world read and write speeds can we achieve with an FPGA? The answer is: it depends. The R/W speed of an SSD depends as much on the SSD as it does on the system it’s connected to. If I connect my SSD to a 286, I can’t expect to get the same performance as when it’s connected to a Xeon. And depending on how it’s configured, the FPGA can be performing more like a Xeon or more like a 286. To get the highest performance from the SSD, the FPGA must be a pure hardware design, implementing NVMe protocol in RTL to minimize latency and maximize throughput. But that’s hard work, and not very flexible, which is why most people will opt for the less efficient configuration whereby the FPGA implements a microprocessor running an operating system. In this configuration, we typically wont be able to exploit the full bandwidth of NVMe SSDs because our processor is just not powerful enough.

But we still want to know, what speeds do we get from an FPGA running PetaLinux? To answer this question, I’ve done tests on two platforms. One on the KC705 board, running PetaLinux on the Microblaze soft processor, and another on the PicoZed 7030, running PetaLinux on the ARM Cortex-A9 processor.

How to measure the speed of an SSD in Linux. There are many ways to measure the read and write speed of an SSD in Linux, but the only one available to us in PetaLinux is the dd command. But that wont suffice. Normally the dd command alone gives us the read/write speed, but PetaLinux is built with a leaner version of dd that does not make this calculation for us. So we have to use the time command as well, and make the calculation ourselves.

The dd command lets us specify an input device, an output device and the number of bytes to transfer between them. The commands below will transfer 2 Gigabytes of data between the input device and output device.

Write test: time dd if=/dev/zero of=/dev/nvme0n1p1 bs=2M count=1000
Read test: time dd if=/dev/nvme0n1p1 of=/dev/null bs=2M count=1000

KC705 Test


From the screenshot above, we can see that the write test transfers 2 Gigabytes of data in 7 minutes and 45 seconds. The read test transfers 2 Gigabytes in 2 minutes and 21 seconds.

The video was taken during the write test to show the SSD activity LED. Notice that the SSD has a couple of seconds of inactivity at regular intervals. I’m not sure exactly what is happening during those couple seconds, but something is holding up the show. Although not shown in the video, during the read test, the SSD doesn’t seem to go through these same periods of inactivity.

PicoZed Test


You can see in the screenshot above that I used the lsblk command to get the name of a partition on the SSD (nvme0n1p1). I use this device name as the input device in the write test, and the output device in the read test. The transfers went a lot faster in the PicoZed test, so I used a larger transfer size just to make the test last a bit longer and improve the accuracy of the result. The write test transfers 16 Gigabytes of data in 3 minutes and 9 seconds. The read test transfers 16 Gigabytes in 2 minutes and 12 seconds.

The video was taken during the write test to show the SSD activity LED. Notice that there are no breaks in SSD activity, in contrast to the Microblaze design. During the read test, same thing.


Kintex-7 KC705 MicroBlaze processor clocked at 125MHz

  • Write speed: 4.3 MBps
  • Read speed: 14.2 MBps
Zynq-7000 PicoZed 7030 ARM Cortex-A9 clocked at 667MHz

  • Write speed: 84.7 MBps
  • Read speed: 121.2 MBps

So there’s a massive difference between the performance of the PicoZed and that of the KC705. The Zynq gets almost 20 times faster write speeds, and about 9 times faster read speeds. This isn’t only due to the faster processor, in the Zynq design, the AXI Memory Mapped to PCIe IP connects to the system memory via a high-performance (HP) AXI slave interface, and it doesn’t have to share that interface with the processor. In the Microblaze design, both the processor and the PCIe IP have to share access to the MIG through an AXI Interconnect.

Neither design comes close to the performance in the Samsung specs nor test results by Arstechnica, showing sequential write speeds of 944MBps and read speeds of 2,299MBps. So this raises the next question: Can a hardware NVMe IP core running on this same hardware actually reach those speeds? If you want to help me find out, please get in touch.

If you want my tutorials on how to get an NVMe SSD up and running in PetaLinux, follow these links:

Jeff holds a bachelors degree in Electrical Engineering from the University of Sydney, and has more than a decade of experience in electronic and FPGA design. He has worked for design houses in Australia and Canada developing electronic products for a wide range of industries and markets. Jeff now works as an electronic design consultant and offers electronic and FPGA design services through his company Opsero.

Facebook Twitter LinkedIn 

Recent posts

Bye bye Platform Cable USB II, Hello JTAG HS3

Bye bye Platform Cable USB II, Hello JTAG HS3

Now that I think about it, I’ve been using my Xilinx Platform Cable USB II for 10 years now!!! That’s a terrific run in my opinion, I got it in a kit for the Virtex-5 ML505 board in 2006 and I would have kept using it if I didn’t start getting these...

FMC for Connecting an SSD to an FPGA

FMC for Connecting an SSD to an FPGA

Here’s a first look at the FMC version of the FPGA Drive product, featured with the Samsung VNAND 950 Pro SSD. The FMC version can carry M-keyed M.2 modules for PCI Express and is designed to support up to 4-lanes. It has a HPC FMC connector which can be used on...

Multi-port Ethernet in PetaLinux

Multi-port Ethernet in PetaLinux

Many FPGA-based embedded designs require connections to multiple Ethernet devices such as IP cameras, and control of those devices under an operating system, typically Linux. The development of such applications can be accelerated through the use of development boards...



ethernet_fmc_6Academic discounts now available!

Check it out!