Let’s say we want to switch dynamically between two (or more) clocks. Virtex FPGAs have a primitive that does exactly this: the BUFGCTRL. The BUFGCTRL is a global clock buffer (like the BUFG) with two clock inputs and a set of control inputs that select between them. Its great advantage is that it can switch between clocks “glitch free”.
If you have two clock inputs and you want to switch between them without glitches at the output, use this code:
BufGCtrlMux_l : BUFGCTRL
  generic map (
    INIT_OUT     => 0,
    PRESELECT_I0 => FALSE,
    PRESELECT_I1 => FALSE)
  port map (
    O       => ClkOutputMux,
    CE0     => not ClkSel,   -- expression as a port actual: needs VHDL-2008
    CE1     => ClkSel,
    I0      => ClkInput0,
    I1      => ClkInput1,
    IGNORE0 => '0',          -- keep glitch-free switching enabled
    IGNORE1 => '0',
    S0      => '1',          -- Clock select0 input
    S1      => '1'           -- Clock select1 input
  );
One problem with using the BUFGCTRL in the “glitch free” configuration is that both clocks must be running at all times, because the switchover waits for edges on both the old and the new clock. If the selected clock suddenly drops out, you will not be able to switch to the other clock. If, for example, your clocks come from external sources that come and go, you cannot use the “glitch free” configuration and will instead have to use the BUFGCTRL in asynchronous mode, with IGNORE0 and IGNORE1 tied high. In this mode, the output switches as soon as the select inputs change, so you can switch between the clocks as you like and the buffer will never get locked onto one of them; the price is a possible glitch at the switchover.
Use this code for the asynchronous clock MUX if you don’t need glitch-free operation:
BufGCtrlMux_l : BUFGCTRL
  generic map (
    INIT_OUT     => 0,
    PRESELECT_I0 => FALSE,
    PRESELECT_I1 => FALSE)
  port map (
    O       => ClkOutputMux,
    CE0     => '1',
    CE1     => '1',
    I0      => ClkInput0,
    I1      => ClkInput1,
    IGNORE0 => '1',          -- bypass the glitch-free switching conditions
    IGNORE1 => '1',
    S0      => not ClkSel,   -- Clock select0 input (expression actual: VHDL-2008)
    S1      => ClkSel        -- Clock select1 input
  );
What if you have four clocks to choose from? You can use three BUFGCTRLs to build a 4-to-1 clock multiplexer: two BUFGCTRLs each select one clock from a pair, and a third selects between their outputs. The select signal becomes a 2-bit signal. Use this code for a 4-input asynchronous clock MUX:
-- First level: ClkSel(0) picks one clock from each pair.
-- 'not ClkSel(n)' as a port actual needs VHDL-2008; in VHDL-93, drive an
-- inverted copy of ClkSel from a separate signal instead.
BufGCtrlMuxA_l : BUFGCTRL
  generic map (
    INIT_OUT     => 0,
    PRESELECT_I0 => FALSE,
    PRESELECT_I1 => FALSE)
  port map (
    O       => ClkOutputMuxA,
    CE0     => '1',
    CE1     => '1',
    I0      => ClkInput0,
    I1      => ClkInput1,
    IGNORE0 => '1',
    IGNORE1 => '1',
    S0      => not ClkSel(0),  -- Clock select0 input
    S1      => ClkSel(0)       -- Clock select1 input
  );

BufGCtrlMuxB_l : BUFGCTRL
  generic map (
    INIT_OUT     => 0,
    PRESELECT_I0 => FALSE,
    PRESELECT_I1 => FALSE)
  port map (
    O       => ClkOutputMuxB,
    CE0     => '1',
    CE1     => '1',
    I0      => ClkInput2,
    I1      => ClkInput3,
    IGNORE0 => '1',
    IGNORE1 => '1',
    S0      => not ClkSel(0),  -- Clock select0 input
    S1      => ClkSel(0)       -- Clock select1 input
  );

-- Second level: ClkSel(1) picks between the two first-level outputs.
BufGCtrlMux_l : BUFGCTRL
  generic map (
    INIT_OUT     => 0,
    PRESELECT_I0 => FALSE,
    PRESELECT_I1 => FALSE)
  port map (
    O       => ClkOutputMux,
    CE0     => '1',
    CE1     => '1',
    I0      => ClkOutputMuxA,
    I1      => ClkOutputMuxB,
    IGNORE0 => '1',
    IGNORE1 => '1',
    S0      => not ClkSel(1),  -- Clock select0 input
    S1      => ClkSel(1)       -- Clock select1 input
  );
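This example assumes the following library clauses and signal declarations (in the two-input examples, ClkSel is a single std_logic):
-- Context clause, placed before the entity: BUFGCTRL is a Xilinx UNISIM primitive
library IEEE;
use IEEE.std_logic_1164.all;
library UNISIM;
use UNISIM.vcomponents.all;

-- Architecture declarative region
signal ClkOutputMuxA : std_logic;
signal ClkOutputMuxB : std_logic;
signal ClkOutputMux  : std_logic;
signal ClkInput0     : std_logic;
signal ClkInput1     : std_logic;
signal ClkInput2     : std_logic;
signal ClkInput3     : std_logic;
signal ClkSel        : std_logic_vector(1 downto 0);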
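To check which clock the three-buffer tree passes for each ClkSel value, here is a small behavioral sketch in Python. It models only the select logic implied by the port maps (each buffer in asynchronous mode passes I1 when its select bit is '1' and I0 otherwise); the function names are made up for illustration, and the model says nothing about timing or glitches.

```python
# Behavioral sketch of the 3-BUFGCTRL clock MUX tree (select logic only).
# Assumption: each BUFGCTRL in asynchronous mode (IGNORE0/IGNORE1 = '1',
# CE0/CE1 = '1', S0 = not sel, S1 = sel) forwards I1 when sel = 1, else I0.
# Timing and glitch behaviour are not modelled.

def bufgctrl_async(i0: str, i1: str, sel: int) -> str:
    """Return the name of the input this BUFGCTRL forwards."""
    return i1 if sel else i0


def clock_mux4(clk_sel: int) -> str:
    """Model the 4:1 tree; clk_sel is the 2-bit ClkSel value (0..3)."""
    if not 0 <= clk_sel <= 3:
        raise ValueError("ClkSel is a 2-bit value")
    bit0 = clk_sel & 1          # ClkSel(0) drives both first-level buffers
    bit1 = (clk_sel >> 1) & 1   # ClkSel(1) drives the output buffer
    mux_a = bufgctrl_async("ClkInput0", "ClkInput1", bit0)  # BufGCtrlMuxA_l
    mux_b = bufgctrl_async("ClkInput2", "ClkInput3", bit0)  # BufGCtrlMuxB_l
    return bufgctrl_async(mux_a, mux_b, bit1)               # BufGCtrlMux_l


if __name__ == "__main__":
    for sel in range(4):
        print(f'ClkSel = "{sel:02b}" -> {clock_mux4(sel)}')
```

Running it shows that the binary value of ClkSel is simply the number of the selected input; for example, "10" selects ClkInput2.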
Remember that the BUFGCTRL is a global clock buffer, and every Virtex device has only a limited number of them (the 4-input MUX above uses three), so check how many your device provides. For more information on BUFG and BUFGCTRL, read the Clocking Resources User Guide for your specific FPGA device.
For those occasions when you find yourself with a netlist file and you don’t know where it came from or what version it is, this post explains how to interpret the netlist file (i.e., convert it into something readable).
Today I found myself with two netlists and needed to know whether they were the same. Of course, you could compare the two files with a program such as Beyond Compare, but if the netlists were compiled on different dates, the binary files may differ even when the designs don’t, and it is hard to tell from raw binary data what actually changed. The best thing to do in this case is to convert the netlists to EDIF files, a readable text version of the netlist. Another option is to convert the netlists into VHDL or Verilog code. Here is how you can do this:
To convert a netlist (.ngc) to an EDIF file (.edf)
- Open a command window: in Windows, type “cmd” in the Start->Run dialog; in Linux, open a terminal window.
- Use the “cd” command to move to the folder in which you keep your netlist.
- Type “ngc2edif infilename.ngc outfilename.edf” where infilename and outfilename correspond to the input and output filenames respectively.
- Open the .edf file with any text editor to view the netlist.
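If you end up doing this comparison often, a small script can diff the two EDIF files while skipping the lines that carry the compile date. This is a minimal Python sketch, assuming the dates sit on lines containing the EDIF keyword timeStamp; the script name edif_compare.py is hypothetical, and other metadata such as tool version strings may still show up as differences.

```python
# Sketch: compare two EDIF netlists while ignoring timestamp lines.
# Assumption: the compile date appears on lines containing the EDIF keyword
# "timeStamp"; other metadata (e.g. a tool version string) may still differ.
from __future__ import annotations

import difflib
import sys
from typing import Iterable


def significant_lines(lines: Iterable[str]) -> list[str]:
    """Drop timestamp lines and trailing whitespace (including newlines)."""
    return [line.rstrip() for line in lines if "timestamp" not in line.lower()]


def read_edif(path: str) -> list[str]:
    """Read an EDIF text file, e.g. one produced by ngc2edif."""
    with open(path, encoding="latin-1") as f:
        return f.readlines()


def edif_diff(lines_a: Iterable[str], lines_b: Iterable[str],
              name_a: str = "first.edf", name_b: str = "second.edf") -> list[str]:
    """Unified diff of the significant lines; [] means no differences."""
    return list(difflib.unified_diff(
        significant_lines(lines_a), significant_lines(lines_b),
        fromfile=name_a, tofile=name_b, lineterm=""))


if __name__ == "__main__":
    if len(sys.argv) == 3:
        diff = edif_diff(read_edif(sys.argv[1]), read_edif(sys.argv[2]),
                         sys.argv[1], sys.argv[2])
        print("\n".join(diff) if diff else "No differences outside timestamp lines.")
    else:
        print("usage: python edif_compare.py first.edf second.edf")
```

Run it as python edif_compare.py old.edf new.edf after converting both netlists with ngc2edif; an empty result suggests the two netlists differ only in their timestamp lines.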
To reverse engineer a netlist with ISE versions older than 6.1i
- Convert the netlist to an EDIF file using the above instructions.
- Type “edif2ngd filename.edf filename.ngd” to convert the EDIF file into an NGD file (Xilinx Native Generic Database file).
- To convert the netlist into VHDL, type “ngd2vhdl filename.ngd filename.vhd”.
- To convert the netlist into Verilog, type “ngd2ver filename.ngd filename.v”.
To reverse engineer a netlist with ISE versions 6.1i and up
- To convert the netlist into VHDL, type “netgen -ofmt vhdl filename.ngc”. Netgen will create a filename.vhd file.
- To convert the netlist into Verilog, type “netgen -ofmt verilog filename.ngc”. Netgen will create a filename.v file.
Now you should have all the tools you need to read an NGC netlist file.
Back in 2009 I did a presentation on how companies should be using FPGAs in high-frequency trading:
1. Stop competing in the arms race
The profits from being first to the game are gone. Hardware will advance more quickly than you can develop strategies to run on it. Don’t compete in the arms race unless you can buy out Xilinx or Altera.
2. Stop focusing on speed of execution
Trying to get your order out faster than anyone else is a crowded game. Find intelligent strategies rather than fast and stupid strategies. Use FPGAs for what they are good at: fast parallel number crunching. Focus on processing market data to find trade opportunities, not on crunching protocols to save 2 microseconds.
3. Leverage existing hardware
Don’t waste your time developing your own custom hardware. The kind of hardware used in high-frequency trading costs too much money to develop and involves too much risk (ironically). But the main problem is the development lead time: by the time you can trade on it, you could buy something else that is cheaper and faster.
4. Use more data
The next profits will come from FPGA trading platforms that process data streams coming from everywhere and everything. Bring together data from a multitude of sources that are not yet being looked at and find the intercorrelations that can only be exploited by the speed of an FPGA.
Not long ago I discovered GitHub, the social coding website. Basically, it’s a place where you can share your code and manage open source projects online. I think it’s mainly used by non-HDL programmers, but the concept is not language specific, so I figured it would be a good place to share FPGA designs. Gradually I will bring the source code of all our tutorials onto GitHub so that people can more easily share it, modify it and contribute to it.
To start things off, I’ve uploaded the most popular project (at this time): Microblaze 16×2 LCD Driver.
Here’s the GitHub repository: Microblaze 16×2 LCD Driver on GitHub
Here’s how it’s organized:
- Each project will have its own repository.
- The first folder within the repository will be named after the hardware platform (e.g., ML505, XUPV5, XUPV2P).
- The second folder will be named after the software and its version number (e.g., edk10-1, ise10-1).
- Below that, we will use the folder structure of the software itself, whether it be EDK, ISE or something else.
As a new GitHub user, I admit that this might not be the best layout, and I’m open to suggestions, so by all means let me know in the comments if you see any problems with it.
Here’s what I want you to do:
- Get on GitHub if you are not already.
- Share FPGA developer projects with your friends and colleagues.
- Contribute to FPGA developer projects. What can you contribute?
  - If you make a project work on a different hardware platform, add your code to the repository.
  - If you make a project work in a different version of EDK/ISE/etc., add your code to the repository.
  - If you can improve on a project, fork it and start a new one.
In general, I want you to share, learn and enjoy!
Recently, what looks to be the first open-source FPGA Bitcoin miner was released on GitHub. The code targets the Terasic DE2-115 development board featuring the Altera Cyclone IV; however, the author says the design should be applicable to any other FPGA. Maybe we should make it work on a Xilinx FPGA? Here is what they say about its performance:
Project is fully functional and allows mining of Bitcoins both in a Pool and Solo. It also supports Namecoins.
Current Performance: 109 MHash/s On a Terasic DE2-115 Development Board
Note: The included default configuration file, and source files, are built for 50 MHash/s performance (downclocked). This is meant to prevent damage to your valuable chip if you don’t provide an appropriate cooling solution.
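For context on the MHash/s figures: assuming the standard Bitcoin proof-of-work, each “hash” the miner counts is SHA-256 applied twice to an 80-byte block header, with only the 32-bit nonce changing between attempts. The Python sketch below (function names are illustrative) reproduces this computation for the well-known genesis block header and works out how long one sweep through every nonce value takes at the quoted rates.

```python
# Sketch of the work behind the "MHash/s" figure, assuming the standard
# Bitcoin proof-of-work: one "hash" = SHA-256 applied twice to an 80-byte
# block header; only the 32-bit nonce changes between attempts.
import hashlib
import struct


def block_header_hash(version: int, prev_hash: str, merkle_root: str,
                      timestamp: int, bits: int, nonce: int) -> str:
    """Return the block hash in the usual (byte-reversed) hex display form."""
    header = struct.pack(
        "<I32s32sIII",                     # little-endian, 80 bytes total
        version,
        bytes.fromhex(prev_hash)[::-1],    # displayed hashes are byte-reversed
        bytes.fromhex(merkle_root)[::-1],
        timestamp,
        bits,
        nonce,
    )
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return digest[::-1].hex()


def seconds_per_nonce_sweep(mhash_per_s: float) -> float:
    """Time to try every 32-bit nonce value once at the given hash rate."""
    return 2**32 / (mhash_per_s * 1e6)


if __name__ == "__main__":
    # Header fields of the Bitcoin genesis block (block 0).
    print(block_header_hash(
        1,
        "00" * 32,
        "4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b",
        1231006505,
        0x1D00FFFF,
        2083236893,
    ))
    for rate in (50, 109):  # MHash/s figures quoted above
        print(f"{rate} MHash/s: {seconds_per_nonce_sweep(rate):.1f} s per nonce sweep")
```

At 109 MHash/s one full nonce sweep takes about 39 seconds, versus about 86 seconds at the downclocked 50 MHash/s default.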
For more information about bitcoins: http://bitcoin.org/
I wonder what performance we could get on the ML505/XUPV5? If anyone has done it, let us know. More importantly, I wonder if anyone is making money with this…
This is the first part of a series of posts I will write on various code structures and examples for HDL designs. Here I want to talk about the generate statement and particularly the for loop.
Most programmers think of a for loop as a code segment that is repeated during execution of the program. The generate for loop is similar in concept; the difference is that the code segment is repeated at compile time. For example, I could write the code:
for (int i = 0; i < 8; i++) printf("hello world\n");
To achieve the same functional effect, I could have written the printf statement 8 times. Of course you wouldn’t do this, because it’s not good coding practice, and likewise you would not do it in HDL. But what does the compiler do with the for loop? A C compiler generally keeps it as a loop that runs at execution time (an optimizing compiler may unroll it, but that is an optimization, not the meaning of the code). In the case of the generate for loop, however, the synthesis tool really does replace the loop with 8 copies of the code segment! That is precisely the point of the generate for loop: it saves you from writing the same code segment multiple times, reduces copy-and-paste errors and makes for cleaner code.
The example below shows a generate for loop that creates 8 regional clock buffers (BUFR) sharing the same clock enable (CE) and clear (CLR) signals, each with its own clock input and output. Each buffer’s input and output connect to a different bit of the clk_i and clk_o vectors, selected by the loop variable index.
VHDL generate for loop:
gen_code_label: for index in 0 to 7 generate
begin
  BUFR_inst : BUFR
    generic map (
      BUFR_DIVIDE => "BYPASS")
    port map (
      O   => clk_o(index),
      CE  => ce,
      CLR => clear,
      I   => clk_i(index)
    );
end generate gen_code_label;
Verilog generate for loop:
genvar index;
generate
  for (index = 0; index < 8; index = index + 1) begin: gen_code_label
    BUFR #(
      .BUFR_DIVIDE("BYPASS")
    ) BUFR_inst (
      .O(clk_o[index]),   // Clock buffer output
      .CE(ce),            // Clock enable input
      .CLR(clear),        // Clock buffer reset input
      .I(clk_i[index])    // Clock buffer input
    );
  end
endgenerate
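To make the “repeated at compile time” idea concrete, the Python sketch below mimics what elaboration does with the VHDL loop by printing the eight unrolled BUFR instantiations as text. It is only an illustration: the per-copy labels BUFR_inst_0 to BUFR_inst_7 are hypothetical, since real tools name generated instances in their own way.

```python
# Conceptual sketch of elaboration: the body of
# "gen_code_label: for index in 0 to 7 generate" is stamped out once per
# index value. This only builds text; the per-copy labels (BUFR_inst_0, ...)
# are hypothetical, since each tool names generated instances its own way.

TEMPLATE = """\
BUFR_inst_{index} : BUFR
  generic map (BUFR_DIVIDE => "BYPASS")
  port map (O => clk_o({index}), CE => ce, CLR => clear, I => clk_i({index}));"""


def unroll_generate(first: int, last: int) -> str:
    """Expand 'for index in first to last generate' into explicit copies."""
    return "\n".join(TEMPLATE.format(index=i) for i in range(first, last + 1))


if __name__ == "__main__":
    print(unroll_generate(0, 7))
```

The printed text is exactly the repetitive code the generate loop spares you from writing by hand: eight copies that differ only in the index.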
You might ask when you would need to write the same code segment many times, so here are a couple of examples where the generate for loop is useful:
- Instantiating multiple RocketIO transceivers, HDL modules, buffers, etc. Using the generate for loop makes your code cleaner and easier to check and debug later on. Going up a notch, when you have to instantiate hundreds of thousands of something, the generate loop becomes absolutely necessary, not just convenient.
- Making a large number of connections between several signals. Writing out the connections for a hundred signal vectors can be made easier by grouping the vectors into an array and writing a generate for loop to make the connections.
So if you have written code that contains lots of repetitive stuff, try using the generate loop to clean it up. If you’ve got questions about the generate for loop, leave them in the comments below.
With the top two FPGA companies taking 89% of the FPGA market, you could be forgiven for thinking there is no one else out there. Xilinx and Altera have done a good job of defending their duopoly, but a few companies are gradually winning market share by targeting specific applications and sub-markets. Here is a list of the top 5 FPGA companies by revenue.
Xilinx
Market share: 49% ($2,369.45 million) 12 months ending 2011-01-02
The leader in FPGAs for many years, Xilinx has a good range of FPGAs in terms of cost and performance. In recent years, the popular Spartan series has covered the low-to-mid-end market while the Virtex series has covered the high end. Recently, Xilinx released the “7” family of FPGAs, built on a 28-nm process, which for the first time introduced the Artix-7 and Kintex-7 series; these provide better coverage of the low- and mid-end applications previously covered by the Spartan series. The Kintex-7 recently won a “Highly Commended Prize” in the 2011 Semiconductor of the Year awards.
Altera
Market share: 40% ($1,954.43 million) 12 months ending 2011-01-02
The Altera FPGAs cover the low, mid and upper end markets with the Cyclone, Arria and Stratix series respectively. The most recent offerings from Altera are the Cyclone-V, Arria-V and Stratix-V, all built on 28-nm process technology.
Larger than Xilinx in market value, Altera has made great progress in winning market share in recent years. Many people would say that its software tools are much better than those of Xilinx, which has likely been an important factor in its success.
Lattice Semiconductor
Market share: 6% ($297.77 million) 12 months ending 2011-01-02
Lattice Semiconductor tackles the low-power, low-cost end of the FPGA market. It markets its products as the “high-value FPGAs” of the industry, providing the best performance per cost. With the explosion in portable electronics, this has been a good strategy for Lattice. Lattice claims to have the industry’s lowest-power, lowest-price SERDES-capable FPGA: the LatticeECP3. Obviously they didn’t follow the trend of naming FPGAs after Greek mythology or meteorological phenomena (not saying it’s a bad move!).
Microsemi (was Actel)
Market share: 4% ($207.49 million) 12 months ending 2011-01-02
Microsemi specializes in low-power and mixed-signal FPGAs. Here are some of Microsemi’s claims:
- The industry’s lowest power FPGA: the IGLOO.
- The industry’s only FPGA with hard 32-bit ARM Cortex-M3 microcontroller: the SmartFusion.
QuickLogic
Market share: 1% ($26.20 million) 12 months ending 2011-01-02
QuickLogic focuses on the mobile devices industry, which means ultra-low power, small form-factor packaging and high design security. Rather than selling “FPGAs”, they pitch “customizable semiconductors”. You will not find the word “FPGA” on the front page of their website.
“Our patented ViaLink® interconnect technology enables QuickLogic to deliver the lowest power, most routable FPGA in the industry,” says Brian Faith, QuickLogic’s Manager of FPGA products.
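As a quick sanity check on the numbers, the sketch below recomputes each market share from the quoted revenues. It assumes (the post does not say) that each percentage is a share of the combined revenue of these five vendors.

```python
# Recompute the market-share figures from the quoted revenues.
# Assumption: each percentage is the company's share of the combined revenue
# of these five vendors (the denominator is not stated in the post).
from __future__ import annotations

REVENUE_MUSD = {  # 12 months ending 2011-01-02, in $ millions
    "Xilinx": 2369.45,
    "Altera": 1954.43,
    "Lattice": 297.77,
    "Microsemi": 207.49,
    "QuickLogic": 26.20,
}


def shares_percent(revenue: dict[str, float]) -> dict[str, float]:
    """Each entry's revenue as a percentage of the total of all entries."""
    total = sum(revenue.values())
    return {name: 100 * value / total for name, value in revenue.items()}


if __name__ == "__main__":
    shares = shares_percent(REVENUE_MUSD)
    for name, pct in shares.items():
        print(f"{name:<11} {pct:5.2f}%")
    print(f"Top two: {shares['Xilinx'] + shares['Altera']:.2f}%")
```

The results round to 49%, 40%, 6%, 4% and 1%, with 89% for the top two, which matches the figures above; so the percentages appear to be shares of the five-company total rather than of the whole FPGA market.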
I’m interested to hear from FPGA developers who have worked on Lattice, Microsemi and QuickLogic FPGAs. Share your experience with us in the comments below. What are their tools like? How do they compare in performance and price to Xilinx and Altera? Can anyone see them breaking the duopoly?
You might already know I’m interested in the application of FPGAs in the financial markets, a field that has been growing over the last few years. JP Morgan has been working on this for the last 3 years, and it’s paying off.
Prior to the implementation, JP Morgan would take eight hours to do a complete risk run, and an hour to run a present value, on its entire book. If anything went wrong with the analysis, there was no time to re-run it.
It has now reduced that to about 238 seconds, with an FPGA time of 12 seconds.
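Putting the JP Morgan numbers side by side gives a sense of the scale of the improvement. This back-of-the-envelope Python sketch assumes the 12 seconds of FPGA time is part of the 238-second total.

```python
# Back-of-the-envelope arithmetic on the quoted JP Morgan figures.
# Assumption: the 12 s of FPGA time is included in the 238 s total.
BEFORE_S = 8 * 3600   # eight-hour risk run, in seconds
AFTER_S = 238         # total run time after the FPGA implementation
FPGA_S = 12           # portion of the new run spent on the FPGA

speedup = BEFORE_S / AFTER_S
fpga_fraction = FPGA_S / AFTER_S

print(f"Overall speed-up: {speedup:.0f}x")
print(f"FPGA share of the new run time: {fpga_fraction:.1%}")
```

That works out to roughly a 121x overall speed-up, with the FPGA accounting for only about 5% of the new run time; the remaining 226 seconds or so are spent outside the FPGA.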