Processorless Ethernet: Part 3

State machine based Ethernet on FPGA

For those of you who want to experiment with processorless Ethernet on FPGAs, I’ve just released a 4-port example design that supports these Xilinx FPGA development boards:

Here’s the Git repo for the project: Processorless Ethernet on FPGA

Why processorless?

Pure hardware designs can trump software where the need for low latency and/or high throughput outweighs the need for flexibility and complexity (e.g. support for complex protocols). Many applications rely on hardware-based packet processing to achieve their performance. High frequency trading platforms are often fed market pricing over multicast UDP, so their profitability is directly linked to their ability to process UDP with the lowest possible latency. Network security devices that monitor traffic usually need to be as transparent as possible while also detecting threats and taking action with the lowest possible delay. Whatever your reason for processing Ethernet frames in the FPGA fabric, make sure that you consider both sides of the coin:

  • Pros: Speed and ultra-low latency
    This is the natural advantage of running your algorithms closer to the machine: FPGAs allow you to perform packet processing on dedicated hardware, without the overheads that slow down software-based designs.

  • Cons: Difficulty of design and lack of flexibility
    The possibilities for processorless Ethernet are limited by the difficulty of designing state machines and logic to handle complex IP protocols. Furthermore, handling updates and maintenance is much more difficult and costly with pure hardware designs.

For most applications, a processor brings far more value to the design than it costs in resources and complexity. These example designs can get you started with packet processing on FPGA, but there’s obviously nothing stopping you from running a processor alongside them.

4-port design

The block diagram below illustrates the new design, which has four ports versus the single port of the original design. To create it, we built an IP block called “Ethernet driver”, shown as the yellow blocks in the diagram. This IP block contains the key elements of the original example design, without the TEMAC, clocking and reset logic. We split things up this way so that we can hook everything together in a block design in Vivado IP integrator, and easily extend the design to 4 ports.

TEMAC Example block diagram

The resulting block diagram (in Vivado) only uses three block types:

  1. Clocking wizard
  2. Tri-mode Ethernet MAC
  3. Ethernet driver (module)

The block design has 4x TEMACs and 4x Ethernet drivers, so that each port has its own TEMAC and Ethernet driver. Each port operates independently of the others.

Basic operation

As before, the design has two main modes of operation: loopback mode and packet generating/checking mode. In loopback mode, the packets received on a port are sent back out of the same port, after having their destination and source addresses swapped. In packet generating/checking mode, the port sends a stream of packets and checks the received packets to make sure that they match the format of the outgoing frames.
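
To make the loopback behaviour concrete, here is a minimal software model of the address swap, written in Python purely for illustration (the design itself does this in Verilog). It assumes the frame is seen as it would be at the MAC's client interface, i.e. starting at the destination address, with no preamble and no FCS. The function name `swap_mac_addresses` is hypothetical and is not taken from the example design.

```python
ETH_HEADER_LEN = 14  # destination MAC (6) + source MAC (6) + length/type (2)


def swap_mac_addresses(frame: bytes) -> bytes:
    """Return a copy of an Ethernet frame with destination and source MACs swapped.

    Assumption: `frame` starts at the destination address and carries
    no preamble and no FCS.
    """
    if len(frame) < ETH_HEADER_LEN:
        raise ValueError("frame is shorter than the 14-byte Ethernet header")
    dst = frame[0:6]    # bytes 0..5: destination MAC
    src = frame[6:12]   # bytes 6..11: source MAC
    rest = frame[12:]   # length/type field and payload, left untouched
    return src + dst + rest


if __name__ == "__main__":
    rx = bytes.fromhex("ffffffffffff" "001122334455" "0800") + b"payload"
    tx = swap_mac_addresses(rx)
    print(tx[:12].hex())
```

Because the frame check sequence covers the address fields, a looped-back frame needs a freshly calculated FCS; this model leaves that to the MAC.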

A more detailed description of operation can be found in the TEMAC Product Guide. For more instructions on using the example designs, refer to the GitHub page and my original post.

Play with packets

The best place to start playing around with the packet processing in this design is to check out the packet generator and checker, found in files tri_mode_ethernet_mac_0_axi_pat_gen.v and tri_mode_ethernet_mac_0_axi_pat_check.v respectively. They’re both written in Verilog, which, like VHDL, is a great language for designing Ethernet packet processing in FPGAs. Another option is to use a high level synthesis tool like Vivado HLS, and replace the pattern generator/checker with your HLS core. I have mixed opinions of high level synthesis and I would only recommend using it if your application requires a complex algorithm that would be very difficult to code in Verilog or VHDL. Even then, you are sometimes still better off with Verilog or VHDL, because timing and placement can become more of an issue in big, complex designs.
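
Before modifying the generator and checker, it can help to write a small software reference model of the frames you want to send and accept, and then use it to produce test vectors for simulation. The sketch below is one way to do that in Python. The frame layout is an assumption made for illustration (destination MAC, source MAC, a 2-byte length field, then an incrementing byte pattern) and is not claimed to match what the pattern generator and checker files actually implement. The names `generate_frame` and `check_frame` are hypothetical.

```python
import struct

HEADER_LEN = 14  # destination MAC (6) + source MAC (6) + length field (2)


def expected_payload(length: int) -> bytes:
    """Incrementing byte pattern 0x00, 0x01, ... wrapping at 0xFF (assumed format)."""
    return bytes(i & 0xFF for i in range(length))


def generate_frame(dst: bytes, src: bytes, payload_len: int) -> bytes:
    """Build one test frame, without preamble or FCS."""
    if len(dst) != 6 or len(src) != 6:
        raise ValueError("MAC addresses must be 6 bytes long")
    if not 46 <= payload_len <= 1500:
        raise ValueError("payload length must be between 46 and 1500 bytes")
    # The length field is sent most significant byte first (network order).
    return dst + src + struct.pack("!H", payload_len) + expected_payload(payload_len)


def check_frame(frame: bytes, expected_dst: bytes, expected_src: bytes) -> bool:
    """Return True if the frame matches the format produced by generate_frame."""
    if len(frame) < HEADER_LEN:
        return False
    (length,) = struct.unpack("!H", frame[12:14])
    payload = frame[HEADER_LEN:]
    return (
        frame[0:6] == expected_dst
        and frame[6:12] == expected_src
        and length == len(payload)
        and payload == expected_payload(length)
    )


if __name__ == "__main__":
    dst = bytes.fromhex("da0102030405")
    src = bytes.fromhex("5a0102030405")
    frame = generate_frame(dst, src, 64)
    print(check_frame(frame, dst, src))
```

A model like this gives you a known-good answer to compare against when the hardware checker flags an error, which is usually quicker than reading waveforms alone.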

Add a TCP/IP core

One interesting thing to do with this design would be to add an IP core that implements the full TCP/IP stack. There are lots of TCP/IP cores on the market, but don’t expect them to be cheap. Here is a list of some of them:

Another alternative (if you have lots of time) is to experiment with an open source TCP/IP core:

The end

If you find this design useful, or you do anything interesting with it, I’d be keen to know about it. Here’s the link to the Git repo again:

Processorless Ethernet on FPGA