Update 2014-08-06: This tutorial is now available in a Vivado version - Using the AXI DMA in Vivado
One of the essential devices for maximizing performance in FPGA designs is the DMA engine. DMA stands for Direct Memory Access, and a DMA engine allows you to transfer data from one part of your system to another without the processor moving each word itself. The simplest use of a DMA is to transfer data from one part of memory to another; however, a DMA engine can also transfer data from any data producer (e.g. an ADC) to a memory, or from a memory to any data consumer (e.g. a DAC). In older systems, the processor would handle all data transfers between memories and devices. As the complexity and speed of systems increased over time, this method became unsustainable. DMA was invented to remove the bottleneck and free the processor from having to move data from one place to another. In high-performance digital and FPGA systems, the data throughput is typically far too high for the processor to handle, so a DMA is essential.
Xilinx provides us with an AXI DMA Engine IP core in its EDK design tool. In this tutorial, I’ll write about how to add a DMA engine into your design and how to connect it up to a data producer/consumer. We will test the design on the ZC706 evaluation board. We’ll use the Xilinx DMA engine IP core and we’ll connect it to the processor memory. The data producer/consumer will be created using the Peripheral Wizard, which will generate a custom IP core that implements an AXI streaming input (data consumer) and an AXI streaming output (data producer). Internally, the AXI streams will be connected in loopback to enable us to test the design. Afterwards, you will be able to break the loop and insert whatever devices you like, be it an IP core for processing data, an ADC, a DAC, you name it.
Start with the base project
You will need to use the Base System Builder to create the base EDK project. If you are not familiar with the BSB, I have gone through this process in another tutorial here: Using the base system builder. Otherwise you can download the base project from my Github page at the link below:
In this tutorial, I have copied the base project files into a folder called
Add the DMA Engine
Open the base EDK project using Xilinx Platform Studio 14.7. Your screen should look somewhat like the image below.
In the IP catalog, open the “DMA and Timer” branch and find the “AXI DMA Engine” IP core.
Right click on the AXI DMA Engine and select “Add IP”.
Click Yes to confirm.
EDK will now open the settings for the AXI DMA Engine.
Disable the Scatter Gather Engine and click OK. EDK will then propose to make the connections to the processor for you. Click OK.
The EDK will then place the DMA into our base design. Click on the “Bus Interfaces” tab to see the AXI DMA Engine in our design and how it’s connected.
Open the axi_dma_0 branch to see the bus connections.
What just happened?
Over those few steps, quite a bit of magic was performed behind the curtains. Here are a few things that the EDK has done for you:
- An AXI interconnect was added to the design and labelled axi_interconnect_1. The base design had only an AXI lite interface to connect the processor to the GPIO peripherals DIP_Switches_4Bits, GPIO_SWs and LEDs_3Bits. For a high performance DMA, you need a full AXI interconnect.
- The DMA bus ports have been connected. I’ll explain these buses in another post.
- The DMA interrupts have been connected to the processor. You have to click on the Ports tab to see this.
- The DMA engine has been given an address on the memory map. You have to click on the Addresses tab to see this.
Notice that there are four buses that are not connected to anything:
The last two are control buses which we will not use. The first two buses are the AXI streaming master and slave interfaces (the data producer and data consumer respectively). We will have to connect these up to the custom peripheral that we will generate in the next few steps.
Create the data producer/consumer peripheral
We’ll now use the Peripheral Wizard to create an IP core that will serve as our data producer/consumer. It will have an AXI streaming master interface (output/producer) and an AXI streaming slave interface (input/consumer).
From EDK, select Hardware->Create or Import Peripheral.
The Peripheral Wizard will open to the welcome screen. Click Next.
Select “Create templates for a new peripheral” and click Next.
The next window wants to know where you will place the peripheral files. Tick “To an XPS project”, make sure that the folder is your current project and click Next.
Now you have to name the peripheral. I called mine axi_stream_generator, but you can use any name you like. In a real-world design, this peripheral would be wrapping your data producer or data consumer, so it might be called axi_dac depending on what device you are pushing data to or getting data from.
Now you have to choose the type of AXI interface for this peripheral. We want to use AXI streaming.
On the next page we provide information specific to the loopback example that the EDK will generate. The example peripheral will take in a number of 32-bit words on the AXI-stream slave interface (let’s call it a packet), calculate the sum of those values and then output the sum on the AXI-stream master interface. This page of the wizard allows us to specify the packet size. Leave the default of 8 x 32-bit words and click Next.
Just click Next on the page for optional file generations. We won’t need any of that.
Click Finish on the last page and EDK will generate the template for our new custom peripheral.
If you now go down to the bottom of your IP catalog, you should see your custom peripheral listed in the Project Local PCores->USER branch.
The template that the EDK just generated for us is great; however, it doesn’t quite satisfy the requirements for the AXI streaming interfaces of the DMA Engine. The AXI streaming protocol includes a signal called TLAST, which should be asserted when the last data is sent. Unfortunately, the template peripheral generated by the Peripheral Wizard does not drive the TLAST signal, so we have to make a minor modification to the code.
In your favourite text editor, open the file \zc706-axi-dma\EDK\pcores\axi_stream_generator_v1_00_a\hdl\vhdl\axi_stream_generator.vhd. This is the VHDL code for the peripheral template we just generated.
Replace ALL the code with the following code you can get from Github:
Save and close the file.
If you want to eventually modify the custom peripheral to suit your application, this is the file you will have to modify, so I suggest you read the code and try to get a good idea of how it works.
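Before reading the VHDL, it can help to see the intended behaviour in software. The following is a minimal C sketch of what the modified peripheral is described as doing: it takes one packet of 8 words, sums them, and sends the sum back out 8 times with TLAST set on the final word. The names axis_beat_t and stream_generator_model are hypothetical and exist only for this illustration; this is a behavioural model under those assumptions, not a translation of the actual VHDL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PACKET_WORDS 8  /* packet size chosen in the Peripheral Wizard */

/* One beat on an AXI-stream link: a data word plus the TLAST flag.
 * (Hypothetical type, used only in this model.) */
typedef struct {
    uint32_t tdata;
    bool     tlast;
} axis_beat_t;

/*
 * Software model of the loopback peripheral: consume PACKET_WORDS input
 * words, sum them, then emit the sum PACKET_WORDS times. TLAST is set
 * only on the final output beat, marking the end of the packet.
 */
void stream_generator_model(const uint32_t in[PACKET_WORDS],
                            axis_beat_t out[PACKET_WORDS])
{
    uint32_t sum = 0;

    assert(in != NULL && out != NULL);

    for (size_t i = 0; i < PACKET_WORDS; i++) {
        sum += in[i];  /* unsigned addition wraps modulo 2^32 */
    }
    for (size_t i = 0; i < PACKET_WORDS; i++) {
        out[i].tdata = sum;
        out[i].tlast = (i == PACKET_WORDS - 1);
    }
}
```

Calling stream_generator_model with the input words 0 to 7 fills every output beat with 0x1C and sets tlast only on the eighth beat.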
Add the Custom Peripheral to the project
Right click on the IP core we just created (“axi_stream_generator” or whatever you called it) and select “Add IP”.
Click Yes to confirm.
EDK will now open the configuration window for the peripheral. Just leave the defaults and click OK.
Now go into the Bus Interfaces tab and open up the axi_stream_generator branch to display its buses.
We must now connect the buses as follows:
- S_AXIS of the axi_stream_generator_0 must be connected to axi_dma_0_M_AXIS_MM2S
- S_AXIS_S2MM of the axi_dma_0 must be connected to axi_stream_generator_0_M_AXIS
After making those connections, your Bus Interfaces window should look like the image below.
Shift over the bus visualization window to see the AXI streaming buses in a light blue colour.
Now you can see that we have an AXI streaming interface going from the DMA to our peripheral, and another going from our peripheral to the DMA.
Normally the Xilinx tools would connect up the clock and reset signals for our custom peripheral when we make the bus connections. In this case, they haven’t done so, so we have to do it manually.
Using your favourite text editor, open the system.mhs file from the EDK project folder.
Go to the bottom of the file and find the following code:
BEGIN axi_stream_generator
 PARAMETER INSTANCE = axi_stream_generator_0
 PARAMETER HW_VER = 1.00.a
 BUS_INTERFACE S_AXIS = axi_dma_0_M_AXIS_MM2S
 BUS_INTERFACE M_AXIS = axi_stream_generator_0_M_AXIS
END
Add two lines to make it the following:
BEGIN axi_stream_generator
 PARAMETER INSTANCE = axi_stream_generator_0
 PARAMETER HW_VER = 1.00.a
 BUS_INTERFACE S_AXIS = axi_dma_0_M_AXIS_MM2S
 BUS_INTERFACE M_AXIS = axi_stream_generator_0_M_AXIS
 PORT ACLK = processing_system7_0_FCLK_CLK0
 PORT ARESETN = processing_system7_0_FCLK_RESET0_N_0
END
Save the file and close it.
Generate the bitstream
In EDK click Generate Bitstream.
Once the bitstream has been generated, click “Export Design” to bring the design into SDK.
Click “Export and Launch SDK”.
Software Development Kit
- The SDK should automatically open after the design is exported.
- When the SDK starts up, it will ask you which workspace to open. Create a folder called SDK in the zc706-axi-dma folder (or the project folder you are using) and select this as your workspace. Click OK.
SDK opens up with a welcome screen that should look like the following image.
Now we need to create an application that will run on our ZC706 evaluation board and test our DMA engine. We will use the UART as an output console so that we can put print statements in our code to make it easier to see what is going on.
Select “File->New->Application project”.
In the dialog box that appears, type the name of the project as dma_test and click “Next”.
We’re now asked if we would like to use a template for the application. Select the “hello world” template and click “Finish”.
The SDK will now build the dma_test application and the dma_test BSP (board support package). When it is finished, your Project Explorer should look like the image below.
Modify the application code
Now we will add code to the template to test our DMA. Double click the helloworld.c file to open it in the SDK, then replace ALL the code with the following code on Github:
When you select Save, the SDK should automatically start building the application.
The Xilinx AXI DMA driver ships with a simple polling example at:
C:\Xilinx\14.7\ISE_DS\EDK\sw\XilinxProcessorIPLib\drivers\axidma_v7_02_a\examples\xaxidma_example_simple_poll.c
By the way, if you didn’t know about it already, that folder contains heaps of examples that you will find useful. I suggest you check it out.
Once the application is built, you’re ready to run it on the ZC706 evaluation board.
Load the FPGA with the bitstream
Turn on your hardware platform (ZC706 or whatever you are using).
Connect a USB cable from your board’s UART port (J21 on the ZC706) to your computer’s USB port.
Open your terminal program (e.g. PuTTY or Miniterm) and connect to the COM port that corresponds to your UART over USB device. Make sure the port settings are 115200 baud, 8 data bits, no parity, 1 stop bit.
From the SDK menu, select “Xilinx Tools->Program FPGA”.
- In the “Program FPGA” dialog box, the defaults should already specify the correct bitstream for the hardware project. Make sure they correspond to the image below and click Program.
The Zynq will then be programmed with the bitstream and the console window should give you the message:
FPGA configured successfully with bitstream `E:/Github/fpgadeveloper/zc706-axi-dma/SDK/EDK_hw_platform/system.bit`
Run the Software Application
- First make sure that the dma_test application is selected in the Project Explorer, then select “Run->Run” or click the icon with the green play symbol in the toolbar.
- In the “Run As” dialog box, select “Launch on Hardware” and click OK.
- SDK will then program the Zynq with the dma_test application and run it. You should see the following output in your terminal window.
If you go through the application code, you will see that the test is run 10 times. This is what we did in each test:
- We write a packet of 8 words (specifically 0,1,2,3,4,5,6,7) to a transmit buffer that is located in memory
- We set up and trigger a DMA transfer from our peripheral to the receive buffer (streaming to memory mapped) - at this point there is no data being sent by our peripheral, but we set up the RX in preparation because there soon will be.
- We set up and trigger a DMA transfer from the transmit buffer to our peripheral (memory mapped to streaming) - this triggers the DMA to send the data from memory to its AXI-streaming master interface, which is connected to the AXI-streaming slave interface of our custom peripheral. The peripheral sums that data and sends the result out of its own AXI-streaming master interface 8 times (the size of one packet).
- We wait for both transfers to complete.
- We read the receive buffer, which is also located in memory and which the DMA should have just filled with the received data.
- We print the received data to the console.
The result should be 0+1+2+3+4+5+6+7=28=0x1C in hexadecimal!
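The ordering of those steps can be sketched on a PC with a toy stand-in for the hardware. The C code below is not the application from Github and does not use the Xilinx driver; sim_dma_t, sim_start_rx, sim_start_tx and sim_busy are hypothetical names invented for this illustration, and the "DMA" completes instantly. It only shows the sequence: fill the transmit buffer, arm the receive side, start the transmit side, wait, then check that every received word is 0x1C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PACKET_WORDS 8
#define NUM_TESTS    10

/* Hypothetical stand-in for the DMA engine plus the loopback peripheral. */
typedef struct {
    uint32_t *rx_dest;  /* where an armed receive transfer will land */
    bool      rx_busy;
    bool      tx_busy;
} sim_dma_t;

/* Arm the streaming to memory-mapped (receive) side. */
static void sim_start_rx(sim_dma_t *dma, uint32_t *dest)
{
    dma->rx_dest = dest;
    dma->rx_busy = true;
}

/* Start the memory-mapped to streaming (transmit) side. In this model the
 * peripheral's response (the sum, repeated) is delivered immediately. */
static void sim_start_tx(sim_dma_t *dma, const uint32_t *src)
{
    uint32_t sum = 0;

    assert(dma->rx_busy);  /* simplification of this model: RX armed first */
    dma->tx_busy = true;
    for (size_t i = 0; i < PACKET_WORDS; i++) sum += src[i];
    for (size_t i = 0; i < PACKET_WORDS; i++) dma->rx_dest[i] = sum;
    dma->tx_busy = false;
    dma->rx_busy = false;
}

/* True while either simulated transfer is still in progress. */
static bool sim_busy(const sim_dma_t *dma)
{
    return dma->rx_busy || dma->tx_busy;
}

/* Run the test sequence NUM_TESTS times; return how many runs passed. */
int run_loopback_tests(void)
{
    sim_dma_t dma = {0};
    int passed = 0;

    for (int t = 0; t < NUM_TESTS; t++) {
        uint32_t tx[PACKET_WORDS];
        uint32_t rx[PACKET_WORDS] = {0};
        bool ok = true;

        for (uint32_t i = 0; i < PACKET_WORDS; i++) tx[i] = i;  /* 0..7 */

        sim_start_rx(&dma, rx);      /* set up RX in preparation */
        sim_start_tx(&dma, tx);      /* then trigger TX */
        while (sim_busy(&dma)) { }   /* wait for both transfers */

        for (size_t i = 0; i < PACKET_WORDS; i++) {
            ok = ok && (rx[i] == 0x1Cu);  /* 0+1+...+7 = 28 */
        }
        passed += ok;
    }
    return passed;
}
```

In this model run_loopback_tests returns 10, one pass per test. On the real board the waiting step is where the processor polls the DMA until both channels report that they are idle.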
If you want the source code for this project, you can get it from my Github page at the link below: