With FPGA Drive we can connect an NVM Express SSD to an FPGA, but what kind of real-world read and write speeds can we achieve? The answer is: it depends. The R/W speed of an SSD depends as much on the SSD as it does on the system it's connected to. If I connect my SSD to a 286, I can't expect the same performance as when it's connected to a Xeon. And depending on how it's configured, the FPGA can perform more like a Xeon or more like a 286. To get the highest performance from the SSD, the FPGA must be a pure hardware design, implementing the NVMe protocol in RTL to minimize latency and maximize throughput. But that's hard work, and not very flexible, which is why most people will opt for the less efficient configuration in which the FPGA implements a microprocessor running an operating system. In this configuration, we typically won't be able to exploit the full bandwidth of NVMe SSDs because our processor just isn't powerful enough.
But we still want to know: what speeds do we get from an FPGA running PetaLinux? To answer this question, I've run tests on two platforms: the KC705 board, running PetaLinux on the MicroBlaze soft processor, and the PicoZed 7030, running PetaLinux on the ARM Cortex-A9 processor.
How to measure the speed of an SSD in Linux

There are many ways to measure the read and write speed of an SSD in Linux, but the only one available to us in PetaLinux is the `dd` command. That alone won't suffice, though. Normally the `dd` command gives us the read/write speed on its own, but PetaLinux is built with a leaner version of `dd` that does not make this calculation for us. So we have to use the `time` command as well, and make the calculation ourselves.

The `dd` command lets us specify an input device, an output device and the number of bytes to transfer between them. The commands below will transfer 2 gigabytes of data between the input device and output device.
time dd if=/dev/zero of=/dev/nvme0n1p1 bs=2M count=1000
time dd if=/dev/nvme0n1p1 of=/dev/null bs=2M count=1000
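Since the PetaLinux `dd` doesn't print a transfer rate, the speed has to be derived by hand from the byte count and the "real" time reported by `time`. A minimal sketch of that calculation, wrapped in a hypothetical helper function (the function name and the sample figures below are mine, not from the tests):

```shell
#!/bin/sh
# mbps: compute throughput from a dd transfer.
#   $1 = megabytes transferred (bs * count)
#   $2 = elapsed "real" seconds reported by time(1)
# POSIX shell has no floating-point arithmetic, so delegate
# the division to awk.
mbps() {
    awk -v mb="$1" -v s="$2" 'BEGIN { printf "%.1f MBps\n", mb / s }'
}

# Example: 2000 MB written in 465 seconds (7m45s)
mbps 2000 465
```

The same helper works for the read test; only the megabyte count and the elapsed time change.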
From the screenshot above, we can see that the write test transfers 2 gigabytes of data in 7 minutes and 45 seconds. The read test transfers 2 gigabytes in 2 minutes and 21 seconds.
The video was taken during the write test to show the SSD activity LED. Notice that the SSD has a couple of seconds of inactivity at regular intervals. I’m not sure exactly what is happening during those couple seconds, but something is holding up the show. Although not shown in the video, during the read test, the SSD doesn’t seem to go through these same periods of inactivity.
You can see in the screenshot above that I used the `lsblk` command to get the name of a partition on the SSD (`nvme0n1p1`). I use this device name as the output device in the write test, and the input device in the read test. The transfers went a lot faster in the PicoZed test, so I used a larger transfer size just to make the test last a bit longer and improve the accuracy of the result. The write test transfers 16 gigabytes of data in 3 minutes and 9 seconds. The read test transfers 16 gigabytes in 2 minutes and 12 seconds.
The video was taken during the write test to show the SSD activity LED. Notice that there are no breaks in SSD activity, in contrast to the MicroBlaze design. The same was true during the read test.
KC705 (MicroBlaze) results:

- Write speed: 4.3 MBps
- Read speed: 14.2 MBps
PicoZed (Zynq) results:

- Write speed: 84.7 MBps
- Read speed: 121.2 MBps
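These figures are the "calculation ourselves" step mentioned earlier: megabytes transferred divided by the elapsed seconds that `time` reported. As a sketch, the measured times from both tests reproduce the numbers above:

```shell
#!/bin/sh
# Derive the throughput figures from the measured transfer times.
# awk handles the floating-point division that POSIX shell lacks.
awk 'BEGIN { printf "KC705 write:   %.1f MBps\n", 2000  / (7*60+45) }'
awk 'BEGIN { printf "KC705 read:    %.1f MBps\n", 2000  / (2*60+21) }'
awk 'BEGIN { printf "PicoZed write: %.1f MBps\n", 16000 / (3*60+9)  }'
awk 'BEGIN { printf "PicoZed read:  %.1f MBps\n", 16000 / (2*60+12) }'
```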
So there's a massive difference between the performance of the PicoZed and that of the KC705. The Zynq gets almost 20 times faster write speeds, and about 9 times faster read speeds. This isn't only due to the faster processor: in the Zynq design, the AXI Memory Mapped to PCIe IP connects to the system memory via a high-performance (HP) AXI slave interface, and it doesn't have to share that interface with the processor. In the MicroBlaze design, both the processor and the PCIe IP have to share access to the MIG through an AXI Interconnect.
Neither design comes close to the performance in the Samsung specs or the test results by Ars Technica, which show sequential write speeds of 944 MBps and read speeds of 2,299 MBps. So this raises the next question: can a hardware NVMe IP core running on this same hardware actually reach those speeds? If you want to help me find out, please get in touch.
If you want to follow my tutorials on how to get an NVMe SSD up and running in PetaLinux, use these links:
- MicroBlaze PCI Express Root Complex design in Vivado
- Zynq PCI Express Root Complex design in Vivado
- Connecting an SSD to an FPGA running PetaLinux