Chapter 6
Real World Design: Tools, Techniques, and Trade-offs

The Real World is specific, not generic. It is theoretically possible to write portable code that will run on any vendor’s hardware (including ASIC processes), but the required compromises in performance and efficiency are generally not worth the trade-off. We, as over-worked FPGA designers, will find ourselves using vendor-specific libraries and techniques to achieve tight (fast and small) designs.

Now we have to choose an FPGA vendor. Eighty percent of the FPGA market is dominated by two powerhouses: Xilinx and Altera. Both are good companies with great technology. This book focuses on Xilinx FPGAs. We’ll look at specific architecture differences between competing companies in Chapter 7. For compilers, at this writing, the market leaders are Exemplar Logic and Synplicity. Both are excellent products. In addition, Synopsys’ FPGA Express is close enough to usable that it bears watching, and not just because, when bundled with Xilinx Design Manager, it’s the cheapest package available. We are using Exemplar Logic’s LeonardoSpectrum for this book.

The design flow and tools we will use are as follows:

• Specify the design. It doesn’t make sense to start coding until the job is defined. In the Real World we often have to start a job before marketing has fully defined the requirements, but we’ll try to get the job scoped out as much as possible first.

• Partition the design. Divide the job into sections. Reuse old designs as much as possible. We want our modules to be 5,000 to 10,000 gates. My estimate is approximately 20 gates per line (which can vary wildly), so this is 250 to 500 lines of code (semicolons) per module.

• Write the code. Use a color-coded editor to help avoid syntax errors (the color coding acts as an on-the-fly syntax checker and is remarkably useful). Implement area, timing, clock/reset resource-assignment, and pin-assignment constraints.

• To help locate syntax problems, try compiling your design with every tool you can find, including different simulators. You’ll find that each vendor provides differing error messages with differing levels of helpfulness.

• If possible, use a lint program like Verilint. There are several errors that a Verilog compiler will accept, like mismatched vectors and the creation of unwanted latches, that Verilint will catch. Pay close attention to warnings that may indicate problems with synthesis.

• Simulate the design. Write test fixtures and use automated testing and waveforms to verify the design. In this book, this means using Simucad’s Silos III to simulate the design at as high a level as possible.

• Compile the code. In this book, this means using Exemplar Logic’s LeonardoSpectrum to create a netlist. Watch the gate counts and speed estimates. Use the schematic viewer to ensure that your code is being implemented in the manner you expect. Examine how clocks and resets are implemented. Make sure global signals are detected and handled in the manner you expect.

• Place and route the netlist. In this book, this means using Xilinx Design Manager to create a downloadable configuration file. Manipulate the place/route controls and perform as many place/route passes as necessary to achieve the design requirements.

• Download the design and test it in the target hardware. FPGA designers tend to jump to this step too soon, owing either to not having the right tools or to impatience. The designer should be very sure the design is good before testing in circuit.
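To make the lint point concrete, here is a minimal sketch of a vector-width mismatch. It is perfectly legal Verilog, so compilers and simulators accept it without complaint, yet a lint tool would typically flag it. The module and signal names are hypothetical, chosen only for illustration.

```verilog
// Hypothetical lint example: legal Verilog that a compiler accepts
// silently, but that a lint tool would typically flag as a width mismatch.
module width_mismatch (sum_bad, sum_ok, a, b);
    output  [3:0]   sum_bad;    // Too narrow for an 8-bit + 8-bit sum.
    output  [8:0]   sum_ok;     // 8 bits of sum plus the carry.
    input   [7:0]   a, b;

    // Truncated to the low 4 bits; the upper sum bits and carry are lost.
    assign sum_bad = a + b;

    // The 9-bit left-hand side widens the addition, so the carry is kept.
    assign sum_ok  = a + b;
endmodule
```

In simulation, sum_bad simply shows the low four bits of the sum. Nothing reports that the carry and upper bits were discarded, which is exactly why a lint pass is worth the time.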

COMPILING WITH LEONARDOSPECTRUM

LeonardoSpectrum has a graphical user interface and a wizard that leads the designer through the design requirements. Very quickly, however, the designer will find scripts to be a faster and more efficient way to create a netlist. The design script created by the wizard can be captured and run. Listing 6-1 is an example script created by the design wizard.

Listing 6-1 Example LeonardoSpectrum Script

    set register2register 50
    set input2register 50
    set input2output 50
    set register2output 50
    set output_file "C:/verilog/latch.edf"
    set novendor_constraint_file FALSE
    _gc_read_init
    _gc_run_init
    set input_file_list { "C:/verilog/latch.v" }
    set part 4013xlPQ160
    set process 3
    set wire_table 4013xl-3_avg
    set nowrite_eqn FALSE
    set chip TRUE
    set area TRUE
    set report brief
    set global_sr reset
    set output_file "C:/verilog/latch.edf"
    set target xi4xl
    _gc_read
    set register2register 50
    set input2register 50
    set input2output 50
    set register2output 50
    set output_file ""

Let’s look at this script line-by-line.

set register2register 50 In the design wizard, I selected an overall constraint of 20 MHz, which gives a clock period of 50 nsec. This constraint means that all signals between registers (from a register output to a register input) must resolve in 50 nsec.

set input2register 50 Based on the overall design requirement of running with a 20 MHz clock, all signals between the device input and a register must resolve in 50 nsec. The designer must also ensure that this requirement is met by the logic outside the device. Depending on the timing of the external circuitry, a much tighter constraint may have to be applied to these nodes. Devices that have I/O registers make this problem much easier to solve.

set input2output 50 Based on the overall clock requirement of 20 MHz, all purely combinational paths from a device input pin to a device output pin must resolve in 50 nsec. This constraint may need to be much tighter to satisfy the circuitry outside the device.

set register2output 50 Based on the overall clock requirement of 20 MHz, all signals between a register and a device output pin must be resolved in 50 nsec. This constraint may need to be much tighter to satisfy the circuitry outside the device. Devices that have I/O registers make this problem much easier to solve.

set global_sr reset Connect the global set/reset resource to the reset signal. Xilinx supports the connection of a user-defined global reset, which can be used by any register in the device. The signal still has to be identified and used in every always block where the reset is desired.

lut_max_fanout 4 To control output loading (which affects the area and speed of the design), LeonardoSpectrum allows the designer to set the maximum number of loads that a LUT output may drive. In this case, a light load of 4 is used. This will result in many buffers being used to reduce loading.

set output_file “C:/verilog/latch.edf” The netlist created by the compiler will be in the form of an EDIF (.edf) file and will be saved in the indicated path. Note usage of UNIX-style forward slashes in the path! Options for file output include: .edf (edif), .edif (edif), .eds(edif), .sdf (standard delay format), .v (verilog), .verilog (verilog), .vhd (vhdl), .vhdl (vhdl), .xdb (binary dump), .xnf (Xilinx netlist format).

set novendor_constraint_file FALSE This double negative means that we will create an FPGA vendor (in this case, Xilinx) constraint file and use it to guide the place and route of our logic.

set input_file_list { “C:/verilog/latch.v” } This is the list of input files to be linked together. In this case just one file is used to create the design. Note usage of UNIX-style forward slashes in the path. Options for file input include: .edf (edif), .edif (edif), .eds(edif), .sdf (standard delay format), .v (verilog), .verilog (verilog), .vhd (vhdl), .vhdl (vhdl), .xdb (binary dump), .xnf (Xilinx netlist format).

set part 4013xlPQ160 The device we will implement this design in is a Xilinx 4013XL (roughly 13,000 gates) in a PQ160 (160-pin surface-mount) package.

set process 3 We are using the LeonardoSpectrum Level 3 design flow. Levels 1 and 2 are subsets of Level 3: Level 1 is a single-vendor FPGA design flow, Level 2 is a multivendor FPGA flow, and Level 3 is multivendor and includes ASIC flows.

set wire_table 4013xl-3_avg The delays will be based on average (as compared to worst-case) loading for a -3 speed grade device.

set nowrite_eqn FALSE Here’s another double negative that means we will write device equations into the schematic when the schematic is extracted from the netlist.

set chip TRUE The netlist will be compiled as a complete device and will include I/O buffers (pins) for the ports at the top level.

set area TRUE The design will be compiled for area optimization. The alternative is to compile for speed. LeonardoSpectrum Level 3 allows individual modules to be compiled for either area or speed, which is a great feature.

set report brief The report will be concise.

hierarchy_preserve TRUE Normally, LeonardoSpectrum will combine modules in an attempt to reduce logic. With hierarchy_preserve set TRUE, the hierarchy is maintained and this reduction is not allowed. Setting this TRUE during debugging is useful because it is more likely that your signal names will be preserved.

set target xi4xl Implement the design using primitives from the Xilinx 4000XL library.
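The 50 in each timing constraint above is simply the clock period for the 20 MHz target. As a worked sketch of why input2register may need to be tighter, assume (hypothetically) that the external driver and board wiring consume 20 ns of the period; setup time and clock skew are ignored here.

```latex
T_{clk} = \frac{1}{f_{clk}} = \frac{1}{20\ \text{MHz}} = 50\ \text{ns}

t_{\text{input2register}} \le T_{clk} - t_{ext} = 50\ \text{ns} - 20\ \text{ns} = 30\ \text{ns}
```

Under that assumption, the realistic input2register budget would be 30 rather than 50.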

To refresh our memory, Listing 6-2 is the design we’re working with. This design has a problem: an inadvertent latch is created. LeonardoSpectrum is polite enough to point this out to us in the message log of Listing 6-3 (see the warning that q is not always assigned).

Listing 6-2 Verilog Latch

    // Your Basic Latch.
    module latch2(q, q_not, set, reset);
        output         q, q_not;
        reg            q;
        input          set, reset;

        wire           set, reset;

        assign q_not = ~q;

        always @ (set or reset)
        begin
            if (set)
                q = 1;
            else if (reset)
                q = 0;
        end
    endmodule

Listing 6-3 LeonardoSpectrum Message Log for Verilog Latch

    -- Reading target technology xi4xl
    Reading library file 'C:\EXEMPLAR\LEOSPEC\V19991D\lib\xi4xl.syn'...
    Library version = 1.8
    Delays assume: Process=3
    -- read -tech xi4xl { "C:/Verilog/SourceCode/latch2.v" }
    -- Reading file 'C:/Verilog/SourceCode/latch2.v'...
    -- Loading module latch2
    -- Compiling root module 'latch2'
    "C:/Verilog/SourceCode/latch2.v",line 4: Warning, q is not always assigned. latches could be needed.
    -- Pre Optimizing Design .work.latch2.INTERFACE
    Info: Finished reading design
    ->_gc_run
    -- Run Started On Mon Sep 06 10:42:20 Pacific Daylight Time 1999
    --
    -- optimize -target xi4xl -effort quick -chip -area -hierarchy=auto
    Using wire table: 4013xl-3_avg
    Info, Inferred net 'set' as GSR net.
    -- Start optimization for design .work.latch2.INTERFACE
    Using wire table: 4013xl-3_avg

          Pass     Area    Delay     DFFs  PIs   POs  --CPU--
                   (FGs)   (ns)                       min:sec
          1        0       7         0     2     2    00:00
    Info, Added global buffer BUFG for port reset 
    Using wire table: 4013xl-3_avg
    -- Start timing optimization for design .work.latch2.INTERFACE
    No critical paths to optimize at this level
    
    *******************************************************
    
    Cell: latch2    View: INTERFACE    Library: work
    
    *******************************************************
    
     Number of ports :                               4
     Number of nets :                               10
     Number of instances :                           9
     Number of references to this view :             0
    
    Total accumulated area : 
     Number of BUFG :                                1
     Number of CLB Latches :                         1
     Number of IBUF :                                1
     Number of OBUF :                                2
     Number of STARTUP :                             1
    
    ***********************************************
    Device Utilization for 4010xlPQ100
    ***********************************************
    Resource                Used    Avail   Utilization
    -----------------------------------------------
    IOs                     4       77        5.19%
    FG Function Generators  0       800       0.00%
    H Function Generators   0       400       0.00%
    CLB Flip Flops          0       800       0.00%
    
    -----------------------------------------------
                            Clock Frequency Report
    
        Clock                : Frequency
          ------------------------------------
      
        reset                : 3333.3 MHz

Some items in the message log bear comment.

• Reading library file ‘C:\EXEMPLAR\LEOSPEC\V19991D\lib\xi4xl.syn’…

The library that LeonardoSpectrum uses to implement the latch design is the xi4xl library for the Xilinx 4xxxXL family.

• “C:/verilog/latch.v”, line 6: Warning, q is not always assigned. latches could be needed.

LeonardoSpectrum has very politely warned that a latch has been created. Generally, this is an error in the code caused by not defining all output conditions completely.

• optimize -target xi4xl -effort quick -chip -area -flatten=TRUE

We have selected a Xilinx 4010XL as a target device. We have selected a quick optimization as compared to an extended (multipass) compilation where multiple trials are evaluated. We have selected the chip mode, so device pins will be assigned at the top level. We have selected area optimization instead of optimization for speed. The netlist is flattened into one merged netlist; the hierarchy (where each module has a different section of the netlist) is dissolved.

• Info, Inferred net ‘set’ as GSR net.

LeonardoSpectrum has selected the set signal to be used as a global set (GSR stands for Global Set-Reset) resource. Xilinx has a globally routed signal that can be used for a set or reset without consuming the generic routing of the device; generally this network is used for a global reset.

    Pass     Area    Delay     DFFs  PIs   POs  --CPU--
             (FGs)   (ns)                       min:sec
    1        0       7         0     2     2    00:00

We selected a 1-pass optimization; this pass resulted in a delay of 7 nsec. This design uses no D flipflops, uses two input ports and two output ports, and took zero seconds to compile. All right, not 0 seconds, but it compiled fast.

• Info, Added global buffer BUFG for port reset

In addition to the Global SR resource, the 4xxxXL family has eight global signals available (BUFG). Generally they are used for clocks, but LeonardoSpectrum has automatically extracted the reset signal and assigned it to a Global Buffer.

• Info, setting outputs in top level view ‘INTERFACE’ to fast.

The output pins assigned in this module use fast buffers. Generally, the designer should use slow buffers where possible to reduce power consumption and noise.

• Using wire table: 4013xl-3_avg

Use average loading during analysis. The alternative is to use worst-case loading that includes the worst-case effects of temperature and power-supply voltage. The -default mode is for quick and dirty lab testing. The -default mode can also be used when the speed effects are not pertinent—for example, if the FPGA is being used to emulate a design that will be implemented in a faster technology (an ASIC).

• IOs                     4       77        5.19%

We’ve used a very small part of the 4010XL device.

• Writing file C:/verilog/latch.edf

The output of the LeonardoSpectrum tool is an EDIF netlist which will be used by the Xilinx place-and-route tool to create a device configuration file (.bit file).
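Returning to the latch warning: when a latch is not intended, the usual cure is to make sure every path through a combinational always block assigns the output. The following is a minimal sketch, assuming the intent is a purely combinational set/reset decode (the module name setreset_comb is hypothetical). Assigning a default at the top of the block means q is always assigned, so no latch should be inferred. If the design really does need to remember state, a clocked flip-flop is usually a better choice than a latch in an FPGA.

```verilog
// Hypothetical combinational rewrite of the latch2 decode. Because q is
// assigned on every path through the always block, no latch is implied.
module setreset_comb (q, q_not, set, reset);
    output  q, q_not;
    reg     q;
    input   set, reset;

    assign q_not = ~q;

    always @ (set or reset)
    begin
        q = 0;              // Default covers "neither input active".
        if (set)
            q = 1;          // set has priority, as in latch2.
        else if (reset)
            q = 0;
    end
endmodule
```

Note the behavioral difference: with both inputs low, latch2 holds its previous value, while this version returns q to 0. That difference is exactly the storage the synthesizer was warning about.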

To get control of LeonardoSpectrum’s configuration settings, look under the Tools toolbar. There you’ll find a tab called Variable Editor; it pulls down a list of all the LeonardoSpectrum settings, some of which (like xlx_fast_slew, which sets the pin default drive to fast slew rate unless otherwise constrained) are not available in the GUI.

Running LeonardoSpectrum in the Batch Mode

Once you’re familiar with LeonardoSpectrum and want to get things done faster, and in a more repeatable and controlled manner than the GUI allows, you can run it in batch mode with the spectrum executable (this program was called elsyn in previous versions of LeonardoSpectrum). Make sure the DOS PATH environment setting in autoexec.bat points to the spectrum program. For example, in my environment, this path is c:\exemplar\LeoSpec\v1999.1d\bin\win32.

For example, an elementary command line that compiles our basic latch design might look like this:

    spectrum -source basiclatch.v -edif_file basiclatch.edf -ta xi4e

Another way is to cut and paste from the GUI filtered command window and create a file like basiclatch.run as shown in Listing 6-4.

Listing 6-4 Sample LeonardoSpectrum Executable Script File

    restore_project_script C:/Verilog/verilog/basiclatch.scr
    _gc_read_init
    _gc_run_init
    set input_file_list { "C:/Verilog/verilog/basiclatch.v" }
    set part 4013xlPQ160
    set process 3
    set wire_table 4000xl-default
    set pack_clbs FALSE
    set timespec_generate FALSE
    set nowrite_eqn FALSE
    set chip TRUE
    set macro FALSE
    set area TRUE
    set delay FALSE
    set report brief
    set hierarchy_preserve FALSE
    set output_file "C:/Verilog/verilog/basiclatch.edf"
    set novendor_constraint_file FALSE
    set target xi4xl
    _gc_read
    _gc_run

This file was invoked with the command line: spectrum -file basiclatch.scr. Type “spectrum -batchhelp” to list all the command-line options (similar to Listing 6-5).

Listing 6-5 LeonardoSpectrum Batch Mode Commands

   -nomap_global_bufs

Don’t use global buffers for clocks and other global signals (Xilinx/Actel).

   -use_qclk_bufs

Use quadrant clocks for Actel 3200dx architecture.

   -insert_global_bufs

Use global buffers for clocks and other global signals (Xilinx/Actel).

   -max_cap_load <float>

Override default max_cap_load if specified in the library.

   -max_fanout_load <float>

Override default max_fanout_load if specified in the library.

   -lut_max_fanout <integer>

Specify net fanout for LUT technologies (Xilinx, Altera Flex, and Lucent ORCA).

   -noenable_dff_map

Disable clock-enable detection from HDLs.

   -enable_dff_map_optimize

Enable use of flipflop clock-enable extracted from random logic.

   -exclude <list>

Don’t use listed gate in mapping.

   -include <list>

Map to specified synchronous DFFs and DLATCHes.

   -pal_device

Disable map to complex IOs for Actel.

   -wire_tree <string>

Interconnect wire tree : best|balanced|worst = default.

   -wire_table <string>

Wire load model to use for interconnect delays.

   -nowire_table

Ignore interconnect delays during delay analysis.

   -nobreak_loops_in_delay

Don’t break combinational loops statically for timing analysis.

   -crit_path_analysis_mode <string>

maximum(report setup violations) | minimum(report hold violations) | both = default.

   -num_crit_paths <integer>

Report <integer> number of critical paths.

   -crit_path_slack <float>

Slack threshold in nanoseconds.

   -crit_path_arrival <float>

Arrival threshold in nanoseconds.

   -crit_path_longest

Show longest paths rather than critical paths.

   -crit_path_detail <string>

full(detailed point-to-point)(default) | short(startpoint-endpoint)

   -crit_path_no_io_terminals

Don’t report paths terminating in primary outputs.

   -crit_path_no_int_terminals

Don’t report paths terminating in internal endpoints.

   -crit_paths_from <list>

Report only paths starting at this <list> port, port_inst or instance.

   -crit_paths_to <list>

Report only paths ending at this <list> port, port_inst or instance.

   -crit_paths_thru <list>

Report only critical paths through the <list> net.

   -crit_paths_not_thru <list>

Report only critical paths that do not go through <list> net.

   -crit_path_report_input_pins

Report input pins of gates. Default = off.

   -crit_path_report_nets

Report net names. Default = off.

   -nocounter_extract

Disable automatic extraction of counters.

   -noram_extract

Disable automatic extraction of rams.

   -nodecoder_extract

Disable automatic extraction of decoders.

   -optimize_cpu_limit <integer>

Set a CPU limit for optimization.

   -notimespec_generate

Don’t create TIMESPEC info from user constraints; Xilinx only.

   -nopack_clbs

Don’t pack look-up tables (LUTs) into CLBs; for Xilinx 4K families only.

   -write_clb_packing

Print CLB packing (HBLKNM) info, if available, in XNF/EDIF.

   -crit_path_rpt <string>

Write critical path reporting in this file.

   -nocrit_path_rpt

Don’t create a critical path reporting file.

   -report_brief| -report_full

Generate a concise design summary or a detailed one. Default = full.

   -map_area_weight <float>

A number between 0 and 1.0. The larger this number, the more mapping will try to minimize area.

   -map_delay_weight <float>

A number between 0 and 1.0. The larger this number, the more mapping will try to minimize delay.

   -simple_port_names

Create simple names for vector ports: %s%d instead of %s(%d).

   -bus_name_style <string>

Naming style for vector ports and nets: default %s(%d)| simple %s%d| old_galileo %s_%d

   -nobus

Write busses in expanded form. This may be required for the Xilinx EDIF reader.

   -nowrite_eqn

Don’t write equations in output; use technology primitives instead.

   -nopld_xor_decomp

Don’t do XOR decomposition for Altera MAX and Xilinx CPLD technologies.

   -noglobal_symbol

Delete startup (GSR) block.

   -notime_opt

Don’t run timing optimization.

   -max_frequency <float>

Desired maximum operating frequency in MHz.

   -edifin_ground_net_names <list>

Specify that net(s) with <list> name(s) are ground nets.

   -edifin_power_net_names <list>

Specify that net(s) with <list> name(s) are power nets.

   -edifin_ground_port_names <list>

Specify that port(s) with <list> name(s) are ground ports.

   -edifin_power_port_names <list>

Specify that port(s) with <list> name(s) are power ports.

   -edifin_ignore_port_names <list>

Specify that port(s) with <list> name(s) are ignored ports.

   -edifout_power_ground_style_is_net

Write out power and ground as undriven nets with an extracted or inferred net name.

   -edifout_power_net_name <string>

Use <string> name for power nets when ‘edifout_power_ground_style_is_net’ is TRUE; default = ‘VCC’.

   -edifout_ground_net_name <string>

Use <string> name for ground nets when ‘edifout_power_ground_style_is_net’ is TRUE; default = ‘GND’.

COMPLETE DESIGN FLOW, 8-BIT EQUALITY COMPARATOR

So far, we’ve done only half the design work: the design entry and synthesis. To finish the job, we need to run the Xilinx place-and-route tool, the Design Manager. To illustrate how this tool is used, we’ll take an example design all the way through the process. This design is similar to an HC688, an 8-bit equality comparator. It compares two bytes and generates a signal called equal if they are equivalent. A cascade input is also provided to expand the inputs that are compared; if cascade is not asserted, the equal output is inhibited. As a matter of personal preference, I’ve made a couple of design changes: all signals are active high, and I made the equal output synchronous. See Listing 6-6 for the Verilog code for this design.

Listing 6-6 8-Bit Equality Comparator

    // Synchronous 8-bit equality comparator.
    // All signals changed to be active high.
    // Output made synchronous.
    module hc688s (equal, clock, reset, cascade, a, b); 
    output              equal; 
    input               clock, reset; 
    input               cascade; 
    input          [7:0]  a, b; 
    reg                 equal; 
     
    always @ (posedge clock or posedge reset)
        begin
        if (reset)
            equal <= 0;
        else if (~cascade)
            equal <= 0;
        else if (a == b)
            equal <= 1;
        else
            equal <= 0;     // Make sure all input cases are covered.
        end
    endmodule 

The Verilog code is simple enough; equal can go high only if cascade is high and the a and b input bytes are equal. Let’s see what LeonardoSpectrum makes of this design by looking at the extracted schematic of Figure 6-1.

Figure 6-1 HC688s LeonardoSpectrum RTL Schematic


From Figure 6-1 we can see that the equal output is created by a flipflop and that the clock and reset were implemented as intended. LeonardoSpectrum has instantiated a library function from its module generator (modgen) to do the equality-test logic. For greater detail, LeonardoSpectrum has another schematic view option, the gate-level schematic, shown in Figure 6-2.

Figure 6-2 HC688s LeonardoSpectrum Gate-Level Schematic


The gate-level schematic shows the logic as it is mapped into Xilinx hardware. The Xilinx Configurable Logic Block (CLB) will be explored in more detail in Chapter 7; for now we can note the assignment of our logic to 2-, 3-, and 4-input look-up tables (LUTs), the use of global buffers for clock and reset, and the flipflop that drives the equal output signal.

A couple of things should be noted about these schematic views. First, they are graphical representations of the netlist that LeonardoSpectrum synthesized; the Xilinx Design Manager (the place-and-route tool) still has some processing to do on the design. This schematic serves as a sanity check: if the design is not being synthesized effectively, the designer can try different compilation options or design in a more structural way. For example, the designer can replace the high-level equality operator (==) with structural gates to assert more control over how the design is synthesized.
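A minimal sketch of that structural approach follows, using the same ports and behavior as hc688s. The module name hc688s_struct and the internal wire names are hypothetical. Each XNOR gate produces a 1 when its pair of bits match, and one wide AND gate combines the eight matches with cascade. Keep in mind that the synthesizer is still free to restructure gate primitives during optimization, so this is not guaranteed to map differently from the == version.

```verilog
// Hypothetical structural variant of hc688s (same ports and behavior):
// the == operator is replaced by XNOR gates (per-bit compare) feeding
// a single AND gate (all bits match and cascade is asserted).
module hc688s_struct (equal, clock, reset, cascade, a, b);
    output          equal;
    input           clock, reset;
    input           cascade;
    input   [7:0]   a, b;
    reg             equal;
    wire    [7:0]   bit_match;  // bit_match[i] is 1 when a[i] == b[i].
    wire            all_match;  // 1 when a == b and cascade is high.

    xnor x0 (bit_match[0], a[0], b[0]);
    xnor x1 (bit_match[1], a[1], b[1]);
    xnor x2 (bit_match[2], a[2], b[2]);
    xnor x3 (bit_match[3], a[3], b[3]);
    xnor x4 (bit_match[4], a[4], b[4]);
    xnor x5 (bit_match[5], a[5], b[5]);
    xnor x6 (bit_match[6], a[6], b[6]);
    xnor x7 (bit_match[7], a[7], b[7]);

    and  g0 (all_match, bit_match[0], bit_match[1], bit_match[2],
             bit_match[3], bit_match[4], bit_match[5], bit_match[6],
             bit_match[7], cascade);

    // Register the result, with the same asynchronous reset as hc688s.
    always @ (posedge clock or posedge reset)
        if (reset)
            equal <= 0;
        else
            equal <= all_match;
endmodule
```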

LeonardoSpectrum provides one last view of the schematic, the critical path, as shown in Figure 6-3.

Figure 6-3 HC688s LeonardoSpectrum Critical-Path Schematic


The critical path is the longest delay path through the design. If the design needs to be optimized for greater speed, the designer should focus on redesigning this path to remove layers of logic. From this schematic, we can see that the longest delay path is from the b[4] input to the equal output, and there are four layers of logic in this path. As with the adders we studied earlier, there is probably a way to add extra logic to “look ahead” and streamline this path, if necessary.

For this design, compiling to optimize for delay didn’t change anything, but for most designs optimizing for delay will change how the design is implemented, hopefully for the better.

There is one more view that has some value. The output of the synthesizer is a netlist, in this case an EDIF (.edf) file, but this type of file is not intended to be read by humans. LeonardoSpectrum can also generate a structural version of the netlist in a Verilog format. In fact, one great feature of LeonardoSpectrum is the ability to translate between netlists of various types. Anyway, we’re learning Verilog, so let’s look at the Verilog version of the netlist as shown in Listing 6-7.

Listing 6-7 8-Bit Equality Comparator Structural Netlist

// 
// Verilog description for cell hc688s,  
// 09/06/99 11:00:40 
// 
 
module hc688s ( equal, clock, reset, cascade, a, b ) ; 
 
    output  equal ; 
    input  clock ; 
    input  reset ; 
    input cascade ; 
    input  [7:0]a ; 
    input  [7:0]b ; 
 
    wire nx12, modgen_eq_2_nx21, modgen_eq_2_nx22, 
modgen_eq_2_nx23, modgen_eq_2_nx28, modgen_eq_2_nx29, clock_int, 
reset_int, cascade_int, a_7__int, a_6__int, a_5__int, a_4__int, 
a_3__int, a_2__int, a_1__int, a_0__int, b_7__int, b_6__int, 
b_5__int, b_4__int, b_3__int, b_2__int, b_1__int, b_0__int, nx15; 
    wire [8:0] $dummy ; 
 
    assign modgen_eq_2_nx22 = ( ~a_7__int &&  ~b_7__int &&  
~a_6__int &&  ~b_6__int) || ( ~a_7__int &&  ~ b_7__int && a_6__int 
&& b_6__int) || (a_7__int && b_7__int &&  ~ a_6__int &&  
~b_6__int) || (a_7__int && b_7__int && a_6__int && b_6__int) ; 
 
    assign modgen_eq_2_nx23 = ( ~a_5__int &&  ~b_5__int &&  
~a_4__int &&  ~b_4__int) || ( ~a_5__int &&  ~ b_5__int && a_4__int 
&& b_4__int) || (a_5__int && b_5__int &&  ~ a_4__int &&  
~b_4__int) || (a_5__int && b_5__int && a_4__int && b_4__int) ; 
 

    assign  modgen_eq_2_nx21 = (modgen_eq_2_nx22 && 
modgen_eq_2_nx23) ; 
 
    assign  modgen_eq_2_nx28 = ( ~a_3__int &&  ~b_3__int &&  
~a_2__int &&  ~b_2__int) || ( ~a_3__int &&  ~ b_3__int && a_2__int 
&& b_2__int) || (a_3__int && b_3__int &&  ~ a_2__int &&  
~b_2__int) || (a_3__int && b_3__int && a_2__int && b_2__int) ; 
 
    assign  modgen_eq_2_nx29 = ( ~a_1__int &&  ~b_1__int &&  
~a_0__int &&  ~b_0__int) || ( ~a_1__int &&  ~ b_1__int && a_0__int 
&& b_0__int) || (a_1__int && b_1__int &&  ~ a_0__int &&  
~b_0__int) || (a_1__int && b_1__int && a_0__int && b_0__int) ; 
 
    assign nx12 = (modgen_eq_2_nx28 && modgen_eq_2_nx29 && 
modgen_eq_2_nx21); 
 
    STARTUP ix63 (.Q2 ($dummy [0]), .Q3 ($dummy [1]), .Q1Q4 
($dummy [2]), .DONEIN ($dummy [3]), .GSR (reset_int), .GTS 
($dummy [4]), .CLK ( 
            $dummy [5])) ; 
 
     IBUF b_0__ibuf (.O (b_0__int), .I (b[0])) ; 
     IBUF b_1__ibuf (.O (b_1__int), .I (b[1])) ; 
     IBUF b_2__ibuf (.O (b_2__int), .I (b[2])) ; 
     IBUF b_3__ibuf (.O (b_3__int), .I (b[3])) ; 
     IBUF b_4__ibuf (.O (b_4__int), .I (b[4])) ; 
     IBUF b_5__ibuf (.O (b_5__int), .I (b[5])) ; 
     IBUF b_6__ibuf (.O (b_6__int), .I (b[6])) ; 
     IBUF b_7__ibuf (.O (b_7__int), .I (b[7])) ; 
     IBUF a_0__ibuf (.O (a_0__int), .I (a[0])) ; 
     IBUF a_1__ibuf (.O (a_1__int), .I (a[1])) ; 
     IBUF a_2__ibuf (.O (a_2__int), .I (a[2])) ; 
     IBUF a_3__ibuf (.O (a_3__int), .I (a[3])) ; 
     IBUF a_4__ibuf (.O (a_4__int), .I (a[4])) ; 
     IBUF a_5__ibuf (.O (a_5__int), .I (a[5])) ; 
     IBUF a_6__ibuf (.O (a_6__int), .I (a[6])) ; 
     IBUF a_7__ibuf (.O (a_7__int), .I (a[7])) ; 
    IBUF cascade_ibuf (.O (cascade_int), .I (cascade)) ; 
     IBUF reset_ibuf (.O (reset_int), .I (reset)) ; 
    OFDX reg_equal (.Q (equal), .C (clock_int), .D (nx15), .CE 
($dummy [6]), .GSR ($dummy [7]), .GTS ($dummy [8])) ; 
 
     BUFG clock_ibuf (.O (clock_int), .I (clock)) ; 
 
    assign nx15 = (nx12 && cascade_int) ; 
endmodule

This is a bit of an ugly mess, but there are a few things we can extract from it. Note the _int suffix attached to the internal signals. This is very polite; some synthesizers convert a useful signal name like clock into a name like ifght_2746 instead of clock_int, which makes netlists very difficult to search. We want the synthesizer to do whatever is necessary to isolate a signal as it gets routed, but to keep some part of the signal name we assigned. The equality logic comes from the module generator (the modgen_eq_2 signals), and it is wired to the input buffers (IBUFs). The equal register is an OFDX (output D flipflop); note the connections for the Q output, clock, data, and clock enable. GTS is the global tristate control, and GSR is the global set/reset control.

The place-and-route tool works on the netlist that is extracted from the input design and influenced by the design constraints and synthesis controls. If there is a problem with synthesized logic, it may help to look at the netlist and make sure things are being synthesized in a reasonable manner.

Another netlist form is .xnf (Xilinx Netlist Format), which is very readable. Sadly, though, Xilinx is moving to standardize on the much-less-readable EDIF format.

8-BIT EQUALITY COMPARATOR WITH HIERARCHY

Let’s hook up a few of our equality comparators and see what effect a hierarchical design has on the resulting netlist. The hier688 design, shown in Listing 6-8, instantiates three of our hc688s designs to create a 24-bit address decoder.

Listing 6-8 8-Bit Equality Comparator Hierarchical Example

    module hier688 (chip_select, output_enable, addr, rwn, clock, reset);
    output              chip_select, output_enable;
    input       [23:0]  addr;
    input               rwn, clock, reset;
    wire                low, middle, high;
    reg                 chip_select, output_enable;
    parameter   low_range   =   8'h80;
    parameter   mid_range   =   8'ha0;
    parameter   high_range  =   8'hff;

    // Tie off cascade input for low address comparator.
    hc688s u1 (low,    clock, reset, 1'b1,   addr[7:0],   low_range);
    hc688s u2 (middle, clock, reset, low,    addr[15:8],  mid_range);
    hc688s u3 (high,   clock, reset, middle, addr[23:16], high_range);

    // Synchronize the module outputs.
    always @ (posedge clock or posedge reset)
        begin
        if (reset)
            begin
            chip_select     <=  0;
            output_enable   <=  0;
            end
        else
            begin
            chip_select     <=  high;
            output_enable   <=  (high & ~rwn);
            end
        end

    endmodule

Figure 6-4 Hierarchical HC688s Gate-Level Schematic


The schematic of Figure 6-4 is not very legible, but you can see that our structural use of the HC688 decoders results in cascaded logic. This design is not going to be very fast, but it is easy to put together because it reuses predesigned HC688 modules. Although we're not going to analyze the critical path, clearly it will be from a low-order address input to the output_enable output signal.
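For contrast, here is a minimal sketch of a flat version of the same decoder: one 24-bit comparison against the concatenated ranges (8'hff, 8'ha0, 8'h80) instead of three cascaded hc688s instances. The module name flat688 and the match_addr parameter are hypothetical, and the rwn gating assumes AND as the combining operator. It is not cycle-for-cycle equivalent to Listing 6-8. The netlist shows each hc688s registering its own equal output, so the cascaded version passes through several register stages, while this one registers the decode once. The synthesizer is free to build the wide compare as a shallow tree of LUTs. Whether that actually runs faster in a particular part is something the timing reports would have to confirm.

```verilog
// Hypothetical single-level alternative to hier688: one 24-bit compare
// instead of three cascaded, individually registered comparators.
module flat688 (chip_select, output_enable, addr, rwn, clock, reset);
  output        chip_select, output_enable;
  input  [23:0] addr;
  input         rwn, clock, reset;
  reg           chip_select, output_enable;

  // Same byte ranges as Listing 6-8, concatenated high:middle:low.
  parameter match_addr = 24'hffa080;

  wire hit = (addr == match_addr);

  always @ (posedge clock or posedge reset)
    if (reset)
      begin
        chip_select   <= 1'b0;
        output_enable <= 1'b0;
      end
    else
      begin
        chip_select   <= hit;
        output_enable <= hit & ~rwn;   // rwn gating as in Listing 6-8 (AND assumed)
      end
endmodule
```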

Let’s carry this design into a real device. We do this by placing and routing the design and creating a configuration file for the Xilinx device where our design will live. We will open the Design Manager, create a new project, and browse (see Figure 6-5) until we find the hier688.edf netlist. The Design Manager has a one-button operation (the idea is: if the designer falls over dead, his or her head will hit the keyboard, and a place-and-route will still take place). We’ll play dumb and just run the default Design Manager flow and see what we get.

Figure 6-5 Opening a Design With Xilinx Design Manager


A convenient way to execute the Design Manager is to create a shortcut icon on your Windows desktop. For example, in my environment the command line is: C:\Xilinx\bin\nt\dsgnmgr.exe.

Listing 6-9 8-Bit Equality Comparator Hierarchical Example, Xilinx Translation Report

ngdbuild:  version M1.5.19
Copyright (c) 1995-1998 Xilinx, Inc.  All rights reserved.

Command Line: ngdbuild -p xc4010xl-3-pq100 -dd ..
C:\Verilog\SourceCode\hier688.edf hier688.ngd

Launcher: Executing edif2ngd "C:\Verilog\SourceCode\hier688.edf"
"C:\Verilog\SourceCode\xproj\ver1\hier688.ngo"
Reading NGO file "C:/Verilog/SourceCode/xproj/ver1/hier688.ngo"
…
Reading component libraries for design expansion…

Checking timing specifications …

Checking expanded design …

NGDBUILD Design Results Summary:
  Number of errors:      0
  Number of warnings:    0

Writing NGD file "hier688.ngd" …

Writing NGDBUILD log file "hier688.bld"…

Figure 6-6 shows the Report Browser window. If we click on the Translation Report, we will see the report of Listing 6-9, and we can see that the input design was read without error. The EDIF netlist is converted to a Xilinx binary netlist file: a .ngo file.

Figure 6-6 Design Manager Reports


Listing 6-10 8-Bit Equality Comparator Hierarchical Example, Xilinx Place and Route Report

Starting Constructive Placer.  REAL time: 7 secs
Placer score = 13350 
Placer score = 9810 
Placer score = 6780 
Placer score = 5730 
Placer score = 5190 
Placer score = 4440 
Placer score = 3720 
Placer score = 3570 
Placer score = 3480 
Placer score = 3270 
Placer score = 3090 
Finished Constructive Placer.  REAL time: 7 secs

Listing 6-10 is a clip from the Xilinx place-and-route report. Like a printed circuit board autorouter, the place-and-route tool tries different placements and selects the ones with the better results. At this point an estimate of the timing can be extracted.

Listing 6-11 Equality Comparator Hierarchical Example, Xilinx Average Delay Report

The Number of signals not completely routed for this design is: 0 
 
   The Average Connection Delay for this design is:      1.929 ns 
   The Average Connection Delay on critical nets is:     0.000 ns 
   The Average Clock Skew for this design is:            0.098 ns 
   The Maximum Pin Delay is:                             5.937 ns 
   The Average Connection Delay on the 10 Worst Nets is: 2.983 ns 
 
   Listing Pin Delays by value: (ns) 
 
   d <= 10   10 < d <= 20   20 < d <= 30   30 < d <= 40   40 < d <= 50   d > 50
   -------   ------------   ------------   ------------   ------------   ------
        37              0              0              0              0        0

The signal delays are binned as shown in Listing 6-11. This is a moderately fast design (it looks like it would run at 100 MHz to me), but only because very little of the device is used! As the device fills up and more logic competes for routing resources, the design will get slower.

Listing 6-12 8-Bit Equality Comparator Hierarchical Example, Xilinx Pad Report

# Pinout constraints listing
# These constraints are in PCF grammar format
# and may be cut and pasted into the PCF file
# after the "SCHEMATIC END ;" statement to
# preserve this pinout for future design iterations.

COMP "addr(0)" LOCATE = SITE "P90" ;
COMP "addr(1)" LOCATE = SITE "P89" ;
COMP "addr(10)" LOCATE = SITE "P36" ;
COMP "addr(11)" LOCATE = SITE "P35" ;
COMP "addr(12)" LOCATE = SITE "P37" ;
COMP "addr(13)" LOCATE = SITE "P39" ;
COMP "addr(14)" LOCATE = SITE "P44" ;
COMP "addr(15)" LOCATE = SITE "P42" ;
COMP "addr(16)" LOCATE = SITE "P32" ;
COMP "addr(17)" LOCATE = SITE "P22" ;
COMP "addr(18)" LOCATE = SITE "P30" ;
COMP "addr(19)" LOCATE = SITE "P31" ;
COMP "addr(2)" LOCATE = SITE "P93" ;
COMP "addr(20)" LOCATE = SITE "P23" ;
COMP "addr(21)" LOCATE = SITE "P21" ;
COMP "addr(22)" LOCATE = SITE "P24" ;
COMP "addr(23)" LOCATE = SITE "P33" ;
COMP "addr(3)" LOCATE = SITE "P95" ;
COMP "addr(4)" LOCATE = SITE "P97" ;
COMP "addr(5)" LOCATE = SITE "P94" ;
COMP "addr(6)" LOCATE = SITE "P88" ;
COMP "addr(7)" LOCATE = SITE "P96" ;
COMP "addr(8)" LOCATE = SITE "P38" ;
COMP "addr(9)" LOCATE = SITE "P43" ;
COMP "chip_select" LOCATE = SITE "P20" ;
COMP "clock" LOCATE = SITE "P5" ;
COMP "output_enable" LOCATE = SITE "P18" ;
COMP "reset" LOCATE = SITE "P56" ;
COMP "rwn" LOCATE = SITE "P17" ;

We did not assign pin locations in the input design. The first time through, it is not a bad idea to let the place-and-route tool assign the pins (particularly with Altera devices). FPGA architectures try to allow pins to be assigned in a universal manner (i.e., not be sensitive to pin usage by the designer, and allow any I/O pin to be used with logic anywhere on the chip), but some assumptions are built in. For example, data is assumed to flow horizontally (relative to the Pin 1 location on the device) and control vertically. On the other hand, for the PWB (printed wiring board) design, you may want to control the pin locations, keep addresses together, and that sort of thing. Once the circuit board has been designed, we don't want the compiler reassigning pins, so we are going to constrain the pin locations. The pins assigned by the Xilinx place-and-route tool can be found in the Pad Report, as shown in Listing 6-12. This file can be cut, pasted, and edited into the LeonardoSpectrum constraint file to lock down pin assignments, as shown in Listing 6-13. This can also be done in Xilinx Design Manager, but I prefer to lock these pins in the design capture environment.

Listing 6-13 8-Bit Equality Comparator, Xilinx Pin Assignments

addr(0)             INPUT          P90 
addr(1)             INPUT          P89 
addr(10)            INPUT          P36 
addr(11)            INPUT          P35 
addr(12)            INPUT          P37 
addr(13)            INPUT          P39 
addr(14)            INPUT          P44 
addr(15)            INPUT          P42 
addr(16)            INPUT          P32 
addr(17)            INPUT          P22 
addr(18)            INPUT          P30 
addr(19)            INPUT          P31 
addr(2)             INPUT          P93 
addr(20)            INPUT          P23 
addr(21)            INPUT          P21 
addr(22)            INPUT          P24 
addr(23)            INPUT          P33 
addr(3)             INPUT          P95 
addr(4)             INPUT          P97 
addr(5)             INPUT          P94 
addr(6)             INPUT          P88 
addr(7)             INPUT          P96 
addr(8)             INPUT          P38 
addr(9)             INPUT          P43 
chip_select         OUTPUT         P20 
clock               INPUT          P5 
output_enable       OUTPUT         P18 
reset               INPUT          P56 
rwn                 INPUT          P17 

These pins can be assigned in the LeonardoSpectrum environment by going to the Constraints tab, finding the Input or Output tab, and filling in the entry box for Pin Location. Make sure to click the Apply button once all the pin assignments are filled in (see Figure 6-7). They can also be assigned in batch mode, as follows:

set_attribute -port {<hierarchical net name>} -name PIN_NUMBER -value PXX

    Note: replace XX with the desired pin number.

Figure 6-7 LeonardoSpectrum Pin Assignment using the GUI


These are not the only required pin assignments on the circuit board. We must hook up the dedicated signals including power, ground, and configuration signals on the board-level schematic.

Listing 6-14 8-Bit Equality Comparator Hierarchical Example, Xilinx Asynchronous Delay Report

The 20 Worst Net Delays are: 
------------------------------- 
| Max Delay (ns)  | Netname    | 
------------------------------- 
   5.937            low 
   4.154            middle 
   3.314            high 
   2.751            clock_int 
   2.508            reset_int 
   2.490            addr(23)_int 
   2.228            addr(17)_int 
   2.187            addr(21)_int 
   2.179            addr(4)_int 
   2.097            addr(1)_int 
   2.085            addr(7)_int 
   1.823            addr(14)_int 
   1.823            addr(10)_int 
   1.767            addr(9)_int 
   1.754            addr(22)_int 
   1.739            addr(0)_int 
   1.705            addr(15)_int 
   1.693            addr(11)_int 
   1.637            addr(6)_int 
   1.557            addr(16)_int

The top 20 delays can be viewed in the Asynchronous Delay Report, as shown in Listing 6-14. From this, we can guess that this design would run at 168 MHz, not bad for a slow -3 speed grade part. Again, we're using only a tiny percentage of the device. Still, this is not the full story: these are just the delays on individual nets. To get the full delay we have to run a full timing analysis, with this result:

Timing constraint: Default period analysis
 34 items analyzed, 0 timing errors detected.
 Minimum period is   9.967ns.

Delay:     9.967ns low to middle (8.027ns delay plus 1.940ns setup)

Path low to middle contains 2 levels of logic:
Path starting from Comp: CLB_R1C10.K (from clock_int)
To                 Delay type         Delay(ns)  Physical Resource
                                                 Logical Resource(s)
-------------------------------------------------  --------
CLB_R1C10.XQ       Tcko                  2.090R  low
                                                 u1_reg_equal
CLB_R20C10.C2      net (fanout=1)        5.937R  low
CLB_R20C10.K       Thh1ck                1.940R  middle
                                                 modgen_eq_3_ix18
                                                 u2_reg_equal
-------------------------------------------------
Total (4.030ns logic, 5.937ns route)       9.967ns (to clock_int)
      (40.4% logic, 59.6% route)

This tells us that the worst-case delay from flipflop to flipflop is 9.967 nsec, so we can really only run our clock at 100 MHz, not nearly so impressive.
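Both estimates are simple reciprocals. The route-only guess uses the worst net delay from Listing 6-14, and the full figure adds the clock-to-out (Tcko) and setup (Thh1ck) times from the path above:

```latex
\begin{aligned}
f_{est} &= \frac{1}{5.937\ \text{ns}} \approx 168.4\ \text{MHz} \\
T_{min} &= T_{cko} + T_{net} + T_{setup} = 2.090 + 5.937 + 1.940 = 9.967\ \text{ns} \\
f_{max} &= \frac{1}{9.967\ \text{ns}} \approx 100.3\ \text{MHz} \\
\text{logic share} &= \frac{2.090 + 1.940}{9.967} = \frac{4.030}{9.967} \approx 40.4\%
\end{aligned}
```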

OPTIMIZATION OPTIONS IN THE XILINX ENVIRONMENT

The Xilinx place-and-route tool, called the Design Manager, converts the EDIF netlist into a configuration file that can be loaded into a target device. Some of the place-and-route tool optimization parameters are configurable by the designer. To get into the options menu, select options from the implementation menu as shown in Figure 6-8.

Figure 6-8 Xilinx Design Manager Options Selection


MAPPING OPTIONS

The synthesized netlist has some placeholders for precompiled library elements. The mapper finds the library elements (.ngo files, a binary netlist format) and merges them in. The mapper then converts the merged netlist into a physical netlist with specific hardware elements assigned to all the netlist logic elements. The mapper output is an .ncd (physical netlist format) file. The user can configure the mapping process with the following options from the Implementation Options window shown in Figure 6-9.

Figure 6-9 Xilinx Design Manager Implementation Options


Trim Unconnected Logic

If the mapper encounters logic that is not used, this logic can be deleted from the design. This simplifies the logic and speeds up the place-and-route process. However, the designer might want to keep the unused logic because it will be used in a later version of the design. Leaving the logic in may give a better estimate of the resources and timing related to the final design.

Replicate Logic to Allow Logic Level Reduction

Redundant logic can be added to the design to reduce driver loading and speed up the design (the basic area/speed trade-off).
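Done by hand, replication might look like the sketch below. An enable register that would otherwise drive every bit of a wide register is duplicated, so each copy carries half of the load. The module and signal names are hypothetical. Be aware that many synthesizers merge equivalent registers back into one unless told not to. The way to prevent that is tool-specific, so it is not shown here. The mapper option performs this kind of replication for you after synthesis.

```verilog
// Hypothetical example: one enable register duplicated so that each
// copy drives the clock enables of only half of a 16-bit register.
module dup_enable (q, d, enable_in, clock, reset);
  output [15:0] q;
  input  [15:0] d;
  input         enable_in, clock, reset;
  reg    [15:0] q;
  reg           enable_lo, enable_hi;   // two copies of the same register

  always @ (posedge clock or posedge reset)
    if (reset)
      begin
        enable_lo <= 1'b0;
        enable_hi <= 1'b0;
      end
    else
      begin
        enable_lo <= enable_in;
        enable_hi <= enable_in;
      end

  // Each enable copy loads only eight flipflops.
  always @ (posedge clock or posedge reset)
    if (reset)
      q <= 16'h0000;
    else
      begin
        if (enable_lo) q[7:0]  <= d[7:0];
        if (enable_hi) q[15:8] <= d[15:8];
      end
endmodule
```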

Generate 5-Input Functions

Generally, the basic Xilinx logic element is a 4-input look-up table. However, in some Xilinx families the CLB logic can be configured to create 5-input LUTs.
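For example, the hypothetical function below depends on five inputs (sel, a, b, c, d), one more than a single 4-input LUT can accept. In the XC4000 family, for instance, the two 4-input function generators can compute a & b and c | d while the CLB's third function generator selects between them, so the whole function fits in one CLB. Without that option it needs a second level of logic.

```verilog
// Hypothetical 5-input function: sel chooses between two 2-input terms.
// Five inputs is one more than a single 4-input LUT can take.
module five_input (y, sel, a, b, c, d);
  output y;
  input  sel, a, b, c, d;

  assign y = sel ? (a & b) : (c | d);
endmodule
```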

CLB Packing Strategy

The mapper uses a set of rules to attempt to utilize the CLBs effectively. The CLB Packing Strategy modifies the logic partitioning to allow less signal sharing and allows the use of a CLB flipflop without the associated LUT. Again, this is a speed/area trade-off; the CLB Packing Strategy can use more logic but may allow the design to run at a higher operating speed. The Fit Device option packs the CLBs with possibly unrelated logic until the design fits into the target device or until no more packing is possible. Turning this option Off allows only related logic (logic with shared inputs) to be packed into a CLB.

Pack CLB Registers for Minimum Area or Structure

This option controls register ordering by analyzing bussed signal names. The Minimum Area option will result in a denser design with registers mapped in a more random order. The Structure option enables register-ordering analysis.

Pack I/O Registers/Latches into IOBs for Inputs Only, Outputs Only, Inputs and Outputs, and Off

Normally, the synthesis tool assigns logic to I/O buffers (IOBs). However, this option allows the mapper to assign IOBs and can result in better CLB packing. Use the Off option to allow the synthesis tool to control IOB assignment.
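Whichever tool makes the assignment, a register generally has to sit right at the pin to be moved into an IOB. That means an input register fed directly by the pad, or an output register that drives the pad directly, like the OFDX in the netlist earlier in this section. The sketch below, with hypothetical names, is written that way, with the logic placed before the output register rather than after it.

```verilog
// Hypothetical example with registers written right at the pins:
// din_reg is fed directly by the input pads and dout drives the
// output pads directly, so a mapper could pack both into IOBs.
module io_regs (dout, din, clock, reset);
  output [7:0] dout;
  input  [7:0] din;
  input        clock, reset;
  reg    [7:0] din_reg;   // input register: pad -> flipflop
  reg    [7:0] dout;      // output register: flipflop -> pad

  always @ (posedge clock or posedge reset)
    if (reset)
      begin
        din_reg <= 8'h00;
        dout    <= 8'h00;
      end
    else
      begin
        din_reg <= din;        // no logic between pad and register
        dout    <= ~din_reg;   // logic goes before the output register
      end
endmodule
```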

Use Generic Clock Buffers (BUFGs) in Place of BUFGPs and BUFGSs

Older Xilinx devices used primary (BUFGP) and secondary (BUFGS) global buffers for global signals, so some synthesis tools may make these assignments. Newer Xilinx devices use a pool of generic global buffers (BUFGs). Enabling this option will allow the replacement of BUFGSs and BUFGPs with BUFGs.

Place-and-Route Options

Place & Route Effort Level

Another trade-off is the amount of time spent optimizing a design versus the optimization results as shown in the Place and Route menu in Figure 6-10. If the place-and-route tool tries longer, it will have more options to select from, and the area/speed results will probably be better. Higher effort levels will increase the run time.

Figure 6-10 Xilinx Design Manager Place-and-Route Options


Router Option, Run Routing Passes

The designer can select the number of routing passes. Each routing pass is a complete attempt at placement. Once the router has met the design requirements (the design fits into the device with all timing constraints met), the router exits.

Run Delay-Based Cleanup Passes

Once a design has been placed, the timing can probably be improved. With this option the designer can run 1 to 5 additional cleanup passes to attempt to improve the operating speed.

Use Timing Constraints During Place-and-Route

The timing constraints can be used to influence the place-and-route and achieve higher operating speeds. Using timing constraints trades off processing time for design performance. Turn this option Off to ignore timing constraints and speed up the place-and-route process.


Produce Logic Level Timing Report

For a quick view of the timing performance of the design, a logic level timing report can be produced by selecting the check box shown in Figure 6-11. These estimated results can be reviewed without going through the complete (and often very time-consuming) place-and-route process.

Figure 6-11 Xilinx Design Manager Implementation Options


Produce Post Layout Timing Report

A top-level report of the device timing can be reviewed with this brief timing report. The maximum clock speed is reported. For error and path reports the entries are sorted by constraint and delay value. Negative slack-time values indicate a constraint that was not met.

Limit Report to n Paths per Timing Constraint

This setting, either Summary, No Limit, or a number from one to ten, limits the reported number of worst-case paths per timing constraint.

Report Paths Using Advanced Design Analysis (No Timing Constraints)

This option provides a timing analysis when no user constraints are present. The analysis includes all clocks, the required offset for each clock, and a listing of combinational paths sorted by delay value.

Report Paths in Timing Constraints

This option generates a timing report based on timing constraints. The number of paths reported per constraint is per the selection made in the Limit Report to n Paths per Timing Constraint dialog box.

Listing 6-15 is an example of a timing report for a signal in the hier688.v design. All the delay paths between rwn and output_enable are listed, along with the positive slack time (good!). Note that 80% of the delay is in logic. This percentage will get smaller (possibly much smaller) as the design gets more dense and the logic fights for routing resources.

Listing 6-15 Example, Xilinx Timing Report

=================================================================
Timing constraint: TS01 = MAXDELAY FROM TIMEGRP "PADS" TO TIMEGRP "FFS" 50nS;
 30 items analyzed, 0 timing errors detected.
 Maximum delay is  13.354ns.
-----------------------------------------------------------------
Slack:    36.646ns path rwn to output_enable relative to
          50.000ns delay constraint

Path rwn to output_enable contains 3 levels of logic:
Path starting from Comp: P102.PAD

To                   Delay type         Delay(ns)  Physical Resource
                                                   Logical Resource(s)
-------------------------------------------------  --------
P102.I1              Tpid                  3.000R  rwn
                                                   IPAD_rwn
                                                   ix46
CLB_R7C14.F2         net (fanout=1)        1.215R  rwn_int
CLB_R7C14.X          Tilo                  2.700R  D
                                                   ix79
P99.O                net (fanout=1)        1.439R  D
P99.OK               Took                  5.000R  output_enable
                                                   reg_output_enable
-------------------------------------------------
Total (10.700ns logic, 2.654ns route)     13.354ns (to clock_int)
      (80.1% logic, 19.9% route)
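The totals at the bottom of Listing 6-15 can be checked by hand. The component delays (Tpid, Tilo, Took) count as logic, the two net delays count as routing, and slack is the constraint minus the path delay, so a negative result would flag a missed constraint:

```latex
\begin{aligned}
T_{logic} &= 3.000 + 2.700 + 5.000 = 10.700\ \text{ns} \\
T_{route} &= 1.215 + 1.439 = 2.654\ \text{ns} \\
T_{path}  &= 10.700 + 2.654 = 13.354\ \text{ns} \\
\text{slack} &= 50.000 - 13.354 = 36.646\ \text{ns} \\
T_{logic} / T_{path} &= 10.700 / 13.354 \approx 80.1\%
\end{aligned}
```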

Report Paths Failing Timing Constraints

This option generates a report of signals and paths that fail the timing constraints, listed from worst to best. The logic and routing delays are identified and the failing path delays are broken out to show all the delays that build up to cause the problem. A close examination of the delays will provide clues to areas that can be pipelined or simplified to make the design run faster or identify areas where the constraint is over-specified.

The number of paths reported per constraint is per the selection made in the Limit Report to n Paths per Timing Constraint dialog box.

Interface Options

Macro Search Path

When the netlist is merged and .ngo files are inserted, the compiler searches for the proper file to insert. The user can add other search paths. Multiple search paths can be entered, using a semicolon as the path separator.

Rules File

To be merged into the .ngd netlist, a file must be in .ngo format. The rules file path can point to a utility for converting other netlist file formats to the .ngo format.

Create I/O Pads from Ports

Some design tools convert PAD symbols into module port symbols. This checkbox option will convert top-level module ports into PADs (device pins).

Simulation Options

Simulation Data Options

Xilinx can create a timing-annotated netlist in three flavors: EDIF, VHDL, and Verilog. We’ll want to use the Verilog option to support Verilog simulation, of course. Vendors supported for this version of the Xilinx place-and-route tool include generic EDIF, generic Verilog, generic VHDL, ActiveVHDL, Concept NC-Verilog, Concept Verilog-XL, Foundation EDIF, ModelSim Verilog (for the purposes of this book, this is the option we will use), ModelSim VHDL, NC-Verilog, Quicksim, Verilog-XL, Viewsim-XL, Viewsim-EDIF, VSS, and Default.

Correlate Simulation Data to Input Design

To use your logic gate and signal names instead of the names assigned by the place-and-route tool in the optimized netlist, check this checkbox.

Simulation Netlist Name

Define the filename for the simulation output file. If you want to keep multiple versions of the simulation file, enter the filenames here; otherwise, the new file will overwrite the previous one.

VHDL/VERILOG SIMULATION OPTIONS

Bring Out Global Set/Reset Net as a Port

For simulation purposes, it can be handy to have the internal Set/Reset node available as a port at the toplevel of the design. The signal name that drives the Global Set/Reset (GSR) resource can be entered in the dialog box to match the HDL design.

Bring Out Global Tristate Net as a Port

For simulation purposes, it can be handy to have the internal tristate control node available as a port at the toplevel of the design. The signal name that drives the Global Tristate (GTS) resource can be entered in the dialog box to match the HDL design. This tristate controls all device outputs and is useful for isolating a device from a circuit board being tested (stimulated) with external equipment.

Generate Test Fixture/Testbench File

Check this checkbox to create a Verilog test fixture (.tv) template file.

Include uselib Directive in Verilog File

Xilinx provides a set of timing-annotated SIMPRIM (SIMulation PRIMitive) files. The path to these files can be automatically inserted in the Verilog test-fixture file by checking this checkbox.

Generate Pin File

Check this checkbox to create a signal-to-pin (.pin) mapping file.

Retain Hierarchy in Netlist

The Verilog test-fixture file can maintain the input design hierarchy or flatten the netlist into one big file. Check this checkbox to maintain the input design hierarchy.

Configuration Options

Xilinx devices are SRAM based and must have their configuration loaded after each power-on. There are many configuration modes, including serial PROM, parallel master, parallel slave, download cable, etc.

Configuration Rate

Selects a slow (about 1 MHz) or fast (about 8 MHz) internal configuration clock for the master modes. These are approximate speeds.

Threshold Levels (XC4000E and XC4000EX Only)

Select between a TTL-compatible input threshold (nominally 30% of the power-supply value) or CMOS threshold (nominally 50% of the power-supply value) and output drive. Select Read from Design to use the TTL/CMOS input level defined in the physical constraints (PCF) file.

Configuration Pins

Various pull-up and pull-down options are available for the TDO, Mode, and Done configuration pins, including a tristate mode.

Perform CRC During Configuration

The internal Xilinx configuration logic can perform a four-bit partial CRC check of configuration data frames or just do a simple check of the 0110 pattern at the end of each frame.

Produce ASCII Configuration File

The normal configuration file is a binary .bit file. An ASCII version (.rbt) of this configuration bitstream file can also be created.

5V Tolerant I/Os (XC4000XLA and XC4000XV Only)

I/O pins on a low-voltage device can be configured to withstand higher drive voltages for mixed-power-supply operation.

Start-Up Options

Start-up Clock

Configuration can be started based on an internal (CCLK) or external clock (User Clock) source.

Synchronize Start-up to DONE Input Pin

The status of the open-drain DONE pin can be monitored. In cases where multiple FPGA DONE pins are wire-ORed together, enabling this feature will cause all devices to start up when the last device has finished configuring.

Output Events

Control signals can be asserted or released with different timing. These status signals include Done, Enable Outputs, and Release Set/Reset.

Readback

The device configuration can be read when readback is enabled (readback can be disabled for design security reasons). This tab includes options for the readback clock source (internal or external) and termination of the readback process.

Tie Unused Interconnect

Unused pins can be tied high or low to reduce noise and power consumption.

Advanced Options

In the master parallel configuration mode, where the FPGA generates address lines to control a parallel memory device, the configuration address lines can be configured for 18 or 22 lines.

OTHER DESIGN MANAGER TOOLS

Design Manager tools include the Flow Engine (which we used to perform the place-and-route process), Timing Analyzer, Floor Planner, PROM File Formatter, Hardware Debugger (which includes the FPGA download utility), and the EPIC Design Editor.

Timing Analyzer

The Timing Analyzer will provide a report of selected paths in the design. For example, it is possible to examine all clocks in the design. Specific paths can be excluded.

Listing 6-16 Example, Xilinx Timing Report

=================================================================
Timing constraint: Default period analysis
 12 items analyzed, 0 timing errors detected.
 Maximum delay is  11.647ns.
-----------------------------------------------------------------
Delay:    11.647ns device_bus2(0) to device_bus1(2)

Path device_bus2(0) to device_bus1(2) contains 3 levels of logic:
Path starting from Comp: P46.PAD
To                   Delay type         Delay(ns)  Physical Resource
                                                   Logical Resource(s)
-------------------------------------------------  --------
P46.I2               Tpid                  1.560R  device_bus2(0)
                                                   IPAD_device_bus2(0)
                                                   ix57
CLB_R24C1.G2         net (fanout=3)        2.016R  device_bus2(0)_int
CLB_R24C1.Y          Tilo                  1.590R  device_bus1_dup0(3)
                                                   ix66
P44.O                net (fanout=1)        2.441R  device_bus1_dup0(2)
P44.PAD              Topf                  4.040R  device_bus1(2)
                                                   ix50
                                                   OPAD_device_bus1(2)
-------------------------------------------------
Total (7.190ns logic, 4.457ns route)      11.647ns
      (61.7% logic, 38.3% route)

Listing 6-16 shows a generic timing report for the worst path (the critical path) in the hier688 design. The maximum delay for this path is 11.647 nsec. Note the division of time between logic and routing listed at the bottom. As the design gets denser, routing will account for a higher percentage of the delay.

Floorplanning

Floorplanning is a procedure in which the arrangement and location of logic inside the FPGA are manipulated and optimized. Figure 6-12 illustrates a typical device floorplan. Some aspects of the design are obvious to the designer and may or may not be recognized by the automated place-and-route tools. Which parts of the design are critical and should be located adjacent to other logic elements? Can things be switched around to get a faster and more efficient design? Humans are better at these types of tasks than computers.

Figure 6-12 Xilinx Design Manager Floorplanner Tool


Figure 6-13 shows a zoom view of the hier688 logic, the pin assignments, the CLBs, and a rats-nest view of the signal routing.

Figure 6-13 Xilinx Design Manager Floorplanner Tool, hier688 Design Zoom View


PROM File Formatter

Xilinx supports serial and parallel configuration PROM versions. A file can also be created and linked into a microcontroller PROM. Large devices may require multiple PROMs. The PROM File Formatter allows the design to be split into multiple configuration devices as shown in Figure 6-14.

Figure 6-14 Xilinx Design Manager PROM File Formatter Options


Hardware Debugger

The Hardware Debugger provides communication options (shown in Figure 6-15) that allow a device to be configured through a PC serial port, a parallel port, or a Xilinx Xchecker cable (which also connects to a PC parallel port). A header (standard 0.025-inch square posts on 0.1-inch centers) wired per Figure 6-16 must be on the circuit board to support this download. Xilinx also supports 4-wire (TDI, TMS, TCK, TDO) JTAG serial-port programming.

Figure 6-15 Communication Setup Options


Figure 6-16 Xilinx Xchecker Cable Header Wiring


Epic Design Editor

This tool provides a graphical representation of the design as if you were looking down at the physical device itself (see Figure 6-17). Pins, pin buffers and registers, global signals, signal routing, and CLBs are all visible. Some routing can be done at this level. For example, it is possible to hook up test-points without resynthesizing and recompiling.

Figure 6-17 EPIC Representation of the hier688 Design

