Update 2014-08-06: This tutorial is now available in a Vivado version – Using the AXI DMA in Vivado

One of the essential devices for maximizing performance in FPGA designs is the DMA engine. DMA stands for Direct Memory Access, and a DMA engine allows you to transfer data from one part of your system to another without the processor moving each word itself. The simplest use of a DMA engine is to transfer data from one part of memory to another; however, it can also transfer data from any data producer (e.g. an ADC) to a memory, or from a memory to any data consumer (e.g. a DAC). In older systems, the processor handled all data transfers between memories and devices. As the complexity and speed of systems increased, this method became unsustainable. DMA was invented to remove the bottleneck and free the processor from having to move data from one place to another. In high-performance digital and FPGA systems, the data throughput is typically far too high for the processor to handle, so a DMA engine is essential.

Xilinx provides us with an AXI DMA Engine IP core in its EDK design tool. In this tutorial, I’ll write about how to add a DMA engine into your design and how to connect it up to a data producer/consumer. We will test the design on the ZC706 evaluation board. We’ll use the Xilinx DMA engine IP core and we’ll connect it to the processor memory. The data producer/consumer will be created using the Peripheral Wizard, which will generate a custom IP core that implements an AXI streaming input (data consumer) and an AXI streaming output (data producer). Internally, the AXI streams will be connected in loopback to enable us to test the design. Afterwards, you will be able to break the loop and insert whatever devices you would like, be it an IP core for processing data, an ADC, a DAC, you name it.


Start with the base project

You will need to use the Base System Builder to create the base EDK project. If you are not familiar with the BSB, I have gone through this process in another tutorial here: Using the base system builder. Otherwise you can download the base project from my Github page at the link below:


In this tutorial, I have copied the base project files into a folder called “zc706-axi-dma”.


Add the DMA Engine

Open the base EDK project using Xilinx Platform Studio 14.7. Your screen should look somewhat like the image below.


In the IP catalog, open the “DMA and Timer” branch and find the “AXI DMA Engine” IP core.

Right click on the AXI DMA Engine and select “Add IP”.


Click Yes to confirm.


EDK will now open the settings for the AXI DMA Engine.


Disable the Scatter Gather Engine and click OK. EDK will then propose to make the connections to the processor for you. Click OK.


The EDK will then place the DMA into our base design. Click on the “Bus Interfaces” tab to see the AXI DMA Engine in our design and how it’s connected.



Expand the axi_dma_0 branch to see the bus connections.

What just happened?

Over those few steps, quite a bit of magic was performed behind the curtain. Here are a few things that the EDK has done for you:

  • An AXI interconnect was added to the design and labelled “axi_interconnect_1”. The base design had only an AXI-Lite interface to connect the processor to the GPIO peripherals DIP_Switches_4Bits, GPIO_SWs and LEDs_3Bits. For a high performance DMA, you need a full AXI interconnect.
  • The DMA bus ports have been connected. I’ll explain these buses in another post.
  • The DMA interrupts have been connected to the processor. You have to click on the Ports tab to see this.
  • The DMA engine has been given an address on the memory map. You have to click on the Addresses tab to see this.

Notice that there are four buses that are not connected to anything:


The last two are control buses which we will not use. The first two buses are the AXI streaming master and slave interfaces (the data producer and data consumer respectively). We will have to connect these up to the custom peripheral that we will generate in the next few steps.


Create the data producer/consumer peripheral

We’ll now use the Peripheral Wizard to create an IP core that will serve as our data producer/consumer. It will have an AXI streaming master interface (output/producer) and an AXI streaming slave interface (input/consumer).

From EDK, select Hardware->Create or Import Peripheral.


The Peripheral Wizard will open to the welcome screen. Click Next.

Select “Create templates for a new peripheral” and click Next.


The next window wants to know where you will place the peripheral files. Tick “To an XPS project”, make sure that the folder is your current project and click Next.


Now you have to name the peripheral. I called mine “axi_stream_generator” but you can use any name you like. In a real-world design, this peripheral would be wrapping your data producer or data consumer, so it might be called “axi_adc” or “axi_dac” depending on what device you are pushing data to or getting data from.


Now you have to choose the type of AXI interface for this peripheral. We want to use AXI streaming.


On the next page we provide information specific to the loopback example that the EDK will generate. The example peripheral will take in a number of 32-bit words on the AXI-stream slave interface (let’s call it a packet), calculate the sum of those values and then output the sum on the AXI-stream master interface. This page of the wizard allows us to specify the packet size. Leave the default of 8 x 32-bit words and click Next.


Just click Next on the page for optional file generations. We won’t need any of that.


Click Finish on the last page and EDK will generate the template for our new custom peripheral.


If you now go down to the bottom of your IP catalog, you should see your custom peripheral listed in the Project Local PCores->USER branch.


Patch time

The template that the EDK just generated for us is great, but it doesn’t quite satisfy the requirements of the DMA Engine’s AXI streaming interfaces. The AXI streaming protocol includes a signal called TLAST, which should be asserted when the last word of a packet is sent. Unfortunately, the template peripheral generated by the Peripheral Wizard does not drive the TLAST signal, so we have to make a minor modification to the code.

In your favourite text editor, open the file “\zc706-axi-dma\EDK\pcores\axi_stream_generator_v1_00_a\hdl\vhdl\axi_stream_generator.vhd”. This is the VHDL code for the peripheral template we just generated.

Replace ALL the code with the following code you can get from Github:


Save and close the file.

If you eventually want to modify the custom peripheral to suit your application, this is the file you will have to change, so I suggest you read the code and get a good idea of how it works.


Add the Custom Peripheral to the project

Right click on the IP core we just created (“axi_stream_generator” or whatever you called it) and select “Add IP”.


Click Yes to confirm.


EDK will now open the configuration window for the peripheral. Just leave the defaults and click OK.


Now go into the Bus Interfaces tab and open up the axi_stream_generator branch to display its buses.


We must now connect the buses as follows:

  • S_AXIS of the axi_stream_generator_0 must be connected to “axi_dma_0_M_AXIS_MM2S”


  • S_AXIS_S2MM of the axi_dma_0 must be connected to “axi_stream_generator_0_M_AXIS”


After making those connections, your Bus Interfaces window should look like the image below.


Shift over the bus visualization window to see the AXI streaming buses in a light blue colour.


Now you can see that we have an AXI streaming interface going from the DMA to our peripheral, and another going from our peripheral to the DMA.

You will not find our custom peripheral in the Addresses tab because, being an AXI streaming peripheral, it is not on the memory map.


Patch time

Normally the Xilinx tools would connect up the clock and reset signals for our custom peripheral when we make the bus connections. In this case they haven’t done so, so we have to do it manually.

Using your favourite text editor, open the system.mhs file from the EDK project folder.

Go to the bottom of the file and find the following code:

BEGIN axi_stream_generator
 PARAMETER INSTANCE = axi_stream_generator_0
 BUS_INTERFACE M_AXIS = axi_stream_generator_0_M_AXIS

Add two lines to make it the following:

BEGIN axi_stream_generator
 PARAMETER INSTANCE = axi_stream_generator_0
 BUS_INTERFACE M_AXIS = axi_stream_generator_0_M_AXIS
 PORT ACLK = processing_system7_0_FCLK_CLK0
 PORT ARESETN = processing_system7_0_FCLK_RESET0_N_0

Save the file and close it.


Generate the bitstream

In EDK click Generate Bitstream.


Once the bitstream has been generated, click “Export Design” to bring the design into SDK.


Click “Export and Launch SDK”.


Software Development Kit

  1. The SDK should automatically open after the design is exported.
  2. When the SDK starts up, it will ask you which workspace to open. Create a folder called SDK in the zc706-axi-dma folder (or the project folder you are using) and select this as your workspace. Click OK.


SDK opens up with a welcome screen that should look like the following image.


Now we need to create an application that will run on our ZC706 evaluation board and test our DMA engine. We will use the UART as an output console so that we can put print statements in our code to make it easier to see what is going on.

Select “File->New->Application project”.


In the dialog box that appears, type the name of the project as “dma_test” and click “Next”.


We’re now asked if we would like to use a template for the application. Select the “hello world” template and click “Finish”.


The SDK will now build the dma_test application and the dma_test BSP (board support package). When it is finished, your Project Explorer should look like the image below.


Modify the application code

Now we will add code to the template to test our DMA. Double click the helloworld.c file to open it in the SDK, then replace ALL the code with the following code on Github:


When you select Save, the SDK should automatically start building the application.

The code at the link above comes from an example provided by Xilinx in the installation files. You can find it at this location on your PC:


By the way, if you didn’t know about it already, that folder contains heaps of examples that you will find useful. I suggest you check it out.

Once the application is built, you’re ready to run it on the ZC706 evaluation board.


Load the FPGA with the bitstream

1. Turn on your hardware platform (ZC706 or whatever you are using).

2. Connect a USB cable from your board’s UART port (J21 on the ZC706) to your computer’s USB port.

3. Open your terminal program (e.g. Putty or Miniterm) and connect to the COM port that corresponds to your UART over USB device. Make sure the port settings are 115200 baud, 8 data bits, no parity, 1 stop bit.


4. From the SDK menu, select “Xilinx Tools->Program FPGA”.


5. In the “Program FPGA” dialog box, the defaults should already specify the correct bitstream for the hardware project. Make sure they correspond to the image below and click Program.


The Zynq will then be programmed with the bitstream and the console window should give you the message:

FPGA configured successfully with bitstream "E:/Github/fpgadeveloper/zc706-axi-dma/SDK/EDK_hw_platform/system.bit"

Run the Software Application

1. First make sure that the dma_test application is selected in the Project Explorer, then select “Run->Run” or click the icon with the green play symbol in the toolbar.


2. In the “Run As” dialog box, select “Launch on Hardware” and click OK.


3. SDK will then program the Zynq with the dma_test application and run it. You should see the following output in your terminal window.


If you go through the application code, you will see that the test is run 10 times. This is what we did in each test:

  • We write a packet of 8 words (specifically 0,1,2,3,4,5,6,7) to a transmit buffer that is located in memory
  • We set up and trigger a DMA transfer from our peripheral to the receive buffer (streaming to memory mapped) – at this point there is no data being sent by our peripheral, but we set up the RX in preparation because there soon will be.
  • We set up and trigger a DMA transfer from the transmit buffer to our peripheral (memory mapped to streaming) – this triggers the DMA to send the data from memory to the AXI-streaming master interface, which is connected to the AXI-streaming slave interface of our custom peripheral. That data then gets summed and the answer gets pumped out of the AXI-streaming master interface 8 times (the size of one packet).
  • We wait for both transfers to complete.
  • We read the receive buffer, which is also located in memory and which the DMA should have just filled with the received data.
  • We print the received data to the console.

The result should be 0+1+2+3+4+5+6+7=28=0x1C in hexadecimal!

If you want the source code for this project, you can get it from my Github page at the link below:


Jeff is passionate about FPGAs, SoCs and high-performance computing, and has been writing the FPGA Developer blog since 2008. As the owner of Opsero, he leads a small team of FPGA all-stars providing start-ups and tech companies with FPGA design capability that they can call on when needed.
