Update 2014-08-06: This tutorial is now available in a Vivado version – Using the AXI DMA in Vivado
One of the essential devices for maximizing performance in FPGA designs is the DMA engine. DMA stands for Direct Memory Access, and a DMA engine allows you to transfer data from one part of your system to another without the processor copying each word. The simplest use of a DMA is to transfer data from one part of memory to another; however, a DMA engine can also transfer data from any data producer (e.g. an ADC) to a memory, or from a memory to any data consumer (e.g. a DAC). In older systems, the processor handled all data transfers between memories and devices. As the complexity and speed of systems increased, this approach became unsustainable. DMA was invented to remove that bottleneck and free the processor from having to move data from one place to another. In high-performance digital and FPGA systems, the data throughput is typically far too high for the processor to handle, so a DMA is essential.
Xilinx provides us with an AXI DMA Engine IP core in its EDK design tool. In this tutorial, I’ll write about how to add a DMA engine to your design and how to connect it to a data producer/consumer. We will test the design on the ZC706 evaluation board. We’ll use the Xilinx DMA engine IP core and connect it to the processor memory. The data producer/consumer will be created using the Peripheral Wizard, which will generate a custom IP core that implements an AXI streaming input (data consumer) and an AXI streaming output (data producer). Internally, the AXI streams will be connected in loopback so that we can test the design. Afterwards, you will be able to break the loop and insert whatever devices you like, be it an IP core for processing data, an ADC, a DAC, you name it.
Start with the base project
You will need to use the Base System Builder to create the base EDK project. If you are not familiar with the BSB, I have gone through this process in another tutorial here: Using the base system builder. Otherwise you can download the base project from my Github page at the link below:
In this tutorial, I have copied the base project files into a folder called “zc706-axi-dma”.
Add the DMA Engine
In the IP catalog, open the “DMA and Timer” branch and find the “AXI DMA Engine” IP core.
What just happened?
Over those few steps, quite a bit of magic was performed behind the curtains. Here are a few things that the EDK has done for you:
- An AXI interconnect was added to the design and labelled “axi_interconnect_1”. The base design had only an AXI lite interface to connect the processor to the GPIO peripherals DIP_Switches_4Bits, GPIO_SWs and LEDs_3Bits. For a high performance DMA, you need a full AXI interconnect.
- The DMA bus ports have been connected. I’ll explain these buses in another post.
- The DMA interrupts have been connected to the processor. You have to click on the Ports tab to see this.
- The DMA engine has been given an address on the memory map. You have to click on the Addresses tab to see this.
Notice that there are four buses that are not connected to anything:
The last two are control buses which we will not use. The first two buses are the AXI streaming master and slave interfaces (the data producer and data consumer respectively). We will have to connect these up to the custom peripheral that we will generate in the next few steps.
Create the data producer/consumer peripheral
We’ll now use the Peripheral Wizard to create an IP core that will serve as our data producer/consumer. It will have an AXI streaming master interface (output/producer) and an AXI streaming slave interface (input/consumer).
Now you have to name the peripheral. I called mine “axi_stream_generator” but you can use any name you like. In a real-world design, this peripheral would be wrapping your data producer or data consumer, so it might be called “axi_adc” or “axi_dac” depending on what device you are getting data from or pushing data to.
On the next page we provide information specific to the loopback example that the EDK will generate. The example peripheral will take in a number of 32-bit words on the AXI-stream slave interface (let’s call it a packet), calculate the sum of those values and then output the sum on the AXI-stream master interface. This page of the wizard allows us to specify the packet size. Leave the default of 8 x 32-bit words and click Next.
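To make the behaviour of the example peripheral concrete, here is a minimal software model of it in C. This is only a sketch for reasoning about the data flow, not the VHDL that the wizard generates: the function name stream_sum_model and the PACKET_WORDS constant are my own, and I am assuming the sum is repeated once per output word, as in the test described later in this tutorial.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PACKET_WORDS 8u  /* default packet size chosen in the wizard */

/* Software model of the example peripheral (hypothetical helper, not EDK
 * code): consume one packet of 32-bit words, then emit the sum once per
 * output word. */
void stream_sum_model(const uint32_t *in, uint32_t *out, size_t words)
{
    assert(in != NULL && out != NULL);

    uint32_t sum = 0;
    for (size_t i = 0; i < words; i++) {
        sum += in[i];   /* unsigned arithmetic wraps modulo 2^32 */
    }
    for (size_t i = 0; i < words; i++) {
        out[i] = sum;   /* every output word carries the same sum */
    }
}
```

Feeding it the packet 0, 1, 2, 3, 4, 5, 6, 7 fills the output buffer with eight copies of 28.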
If you now go down to the bottom of your IP catalog, you should see your custom peripheral listed in the Project Local PCores->USER branch.
The template that the EDK just generated for us is great; however, it doesn’t quite satisfy the requirements for the AXI streaming interfaces of the DMA Engine. The AXI streaming protocol includes a signal called TLAST, which should be asserted when the last data word is sent. Unfortunately, the template peripheral generated by the Peripheral Wizard does not drive the TLAST signal, so we have to make a minor modification to the code.
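The role of TLAST can be illustrated with a small C model of a stream. This is a simplified sketch under my own assumptions: only TDATA and TLAST are modelled, the TVALID/TREADY handshake is left out, and the type and function names (axis_beat_t, axis_send_packet, axis_receive_packet) are hypothetical. The producer marks only the final word of a packet, and the consumer relies on that mark to know where the packet ends.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One transfer ("beat") on a simplified AXI-stream link. */
typedef struct {
    uint32_t tdata;
    bool     tlast;
} axis_beat_t;

/* Producer side: turn a buffer into beats, marking only the final one. */
void axis_send_packet(const uint32_t *data, size_t words, axis_beat_t *beats)
{
    assert(words > 0);
    for (size_t i = 0; i < words; i++) {
        beats[i].tdata = data[i];
        beats[i].tlast = (i == words - 1);
    }
}

/* Consumer side: accept beats until TLAST is seen and return the packet
 * length in words. Returns 0 if TLAST never arrives within max_beats. */
size_t axis_receive_packet(const axis_beat_t *beats, size_t max_beats,
                           uint32_t *dest)
{
    for (size_t i = 0; i < max_beats; i++) {
        dest[i] = beats[i].tdata;
        if (beats[i].tlast) {
            return i + 1;
        }
    }
    return 0;
}
```

In this model, a producer that never sets tlast leaves the consumer unable to tell that the packet is complete, which is the kind of problem the modified VHDL is meant to avoid.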
In your favourite text editor, open the file “\zc706-axi-dma\EDK\pcores\axi_stream_generator_v1_00_a\hdl\vhdl\axi_stream_generator.vhd”. This is the VHDL code for the peripheral template we just generated.
Replace ALL the code with the following code you can get from Github:
Save and close the file.
If you eventually want to modify the custom peripheral to suit your application, this is the file you will have to change, so I suggest you read the code and try to get a good idea of how it works.
Add the Custom Peripheral to the project
We must now connect the buses as follows:
Now you can see that we have an AXI streaming interface going from the DMA to our peripheral, and another going from our peripheral to the DMA.
Normally the Xilinx tools would connect up the clock and reset signals for our custom peripheral when we make the bus connections. In this case they haven’t, so we have to do it manually.
Using your favourite text editor, open the system.mhs file from the EDK project folder.
Go to the bottom of the file and find the following code:
BEGIN axi_stream_generator
 PARAMETER INSTANCE = axi_stream_generator_0
 PARAMETER HW_VER = 1.00.a
 BUS_INTERFACE S_AXIS = axi_dma_0_M_AXIS_MM2S
 BUS_INTERFACE M_AXIS = axi_stream_generator_0_M_AXIS
END
Add two lines to make it the following:
BEGIN axi_stream_generator
 PARAMETER INSTANCE = axi_stream_generator_0
 PARAMETER HW_VER = 1.00.a
 BUS_INTERFACE S_AXIS = axi_dma_0_M_AXIS_MM2S
 BUS_INTERFACE M_AXIS = axi_stream_generator_0_M_AXIS
 PORT ACLK = processing_system7_0_FCLK_CLK0
 PORT ARESETN = processing_system7_0_FCLK_RESET0_N_0
END
Save the file and close it.
Generate the bitstream
Software Development Kit
- The SDK should automatically open after the design is exported.
- When the SDK starts up, it will ask you which workspace to open. Create a folder called SDK in the zc706-axi-dma folder (or the project folder you are using) and select this as your workspace. Click OK.
Now we need to create an application that will run on our ZC706 evaluation board and test our DMA engine. We will use the UART as an output console so that we can put print statements in our code to make it easier to see what is going on.
Modify the application code
Now we will add code to the template to test our DMA. Double click the helloworld.c file to open it in the SDK, then replace ALL the code with the following code on Github:
When you select Save, the SDK should automatically start building the application.
By the way, if you didn’t know about it already, that folder contains heaps of examples that you will find useful. I suggest you check it out.
Once the application is built, you’re ready to run it on the ZC706 evaluation board.
Load the FPGA with the bitstream
1. Turn on your hardware platform (ZC706 or whatever you are using).
2. Connect a USB cable from your board’s UART port (J21 on the ZC706) to your computer’s USB port.
3. Open your terminal program (e.g. Putty or Miniterm) and connect to the COM port that corresponds to your UART-over-USB device. Make sure the port settings are 115200 baud, 8 data bits, no parity, 1 stop bit.
The Zynq will then be programmed with the bitstream and the console window should give you the message:
FPGA configured successfully with bitstream "E:/Github/fpgadeveloper/zc706-axi-dma/SDK/EDK_hw_platform/system.bit"
Run the Software Application
If you go through the application code, you will see that the test is run 10 times. This is what happens in each test:
- We write a packet of 8 words (specifically 0,1,2,3,4,5,6,7) to a transmit buffer that is located in memory
- We set up and trigger a DMA transfer from our peripheral to the receive buffer (streaming to memory mapped) – at this point there is no data being sent by our peripheral, but we set up the RX in preparation because there soon will be.
- We set up and trigger a DMA transfer from the transmit buffer to our peripheral (memory mapped to streaming) – this triggers the DMA to send the data from memory to its AXI-streaming master interface, which is connected to the AXI-streaming slave interface of our custom peripheral. That data then gets summed and the answer gets pumped out of the peripheral’s AXI-streaming master interface 8 times (the size of one packet).
- We wait for both transfers to complete.
- We read the receive buffer, which is also located in memory and which the DMA should have just filled with the received data.
- We print the received data to the console.
Each received word should be 0+1+2+3+4+5+6+7 = 28, which is 0x1C in hexadecimal!
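The test sequence above can be sketched as a small C program that runs on a PC. Please treat it as an illustration only, not the application code from my Github page: the DMA engine and the peripheral are replaced by a plain function (loopback_transfer, a name I made up for this sketch), so there are no driver calls, no interrupts and no waiting for transfers to complete. It only shows the buffer handling and the expected result.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PACKET_WORDS 8u
#define NUM_TESTS    10u

/* Stand-in for the hardware path (memory -> DMA -> peripheral -> DMA ->
 * memory). On the board this work is done by the DMA engine and the custom
 * peripheral; here it is an ordinary function. */
static void loopback_transfer(const uint32_t *tx, uint32_t *rx)
{
    uint32_t sum = 0;
    for (unsigned i = 0; i < PACKET_WORDS; i++) {
        sum += tx[i];
    }
    for (unsigned i = 0; i < PACKET_WORDS; i++) {
        rx[i] = sum;
    }
}

/* Runs the test sequence and returns the number of unexpected words. */
int run_loopback_tests(void)
{
    uint32_t tx[PACKET_WORDS];
    uint32_t rx[PACKET_WORDS];
    int errors = 0;

    for (unsigned test = 0; test < NUM_TESTS; test++) {
        uint32_t expected = 0;
        for (unsigned i = 0; i < PACKET_WORDS; i++) {
            tx[i] = i;              /* packet is 0,1,2,...,7 */
            expected += i;
        }
        memset(rx, 0, sizeof rx);   /* clear so stale data cannot pass */

        loopback_transfer(tx, rx);

        printf("Test %u:", test);
        for (unsigned i = 0; i < PACKET_WORDS; i++) {
            printf(" 0x%02X", (unsigned)rx[i]);
            if (rx[i] != expected) {
                errors++;
            }
        }
        printf("\n");
    }
    return errors;
}
```

Calling run_loopback_tests() prints ten lines of eight 0x1C values and returns 0. On the real hardware, the call to loopback_transfer corresponds to the two DMA transfers and the wait for completion described in the list above.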
If you want the source code for this project, you can get it from my Github page at the link below: