14.4. Accelerating the Fast Fourier Transform

The heart of the decimation-in-frequency FFT algorithm is the “butterfly” operation, which resides at the algorithm’s innermost loop. Each butterfly operation requires six additions and four multiplications to compute the real and imaginary components of a radix-2 butterfly result. Using the TIE language, it’s possible for a design team to augment the Xtensa processor’s pipeline with four adders and two multipliers so that the processor can compute half of an FFT butterfly in one cycle.

The Xtensa processor’s configurable memory interfaces can be configured to be as wide as 128 bits so that all four real and imaginary integer input terms for each butterfly can be loaded into special-purpose FFT input registers in one cycle. A processor with 128-bit memory interfaces can store all four computed output components into memory in one cycle as well.

Practically speaking, it’s very hard to create single-cycle, synthesizable multipliers for SOCs that operate at clock rates of several hundred MHz. It’s better to stretch the multiplication across two cycles so that the multiplier doesn’t become the critical timing element on the SOC. The additional multiplier latency does not affect the throughput of the butterfly computations in this example and, if necessary, even longer latencies can be accommodated through additional state storage in the butterfly execution unit.

This approach adds a SIMD (single-instruction, multiple data) butterfly computation unit to the processor (using fewer than 35,000 gates including the two 24 × 24-bit multipliers). The performance improvements appear in Table 14.1. The table also shows the code size of the FFT programs with and without the TIE extensions.

Table 14.1. Acceleration results from processor augmentation with FFT instructions
  C code with software multiplicationC code with hardware multiplicationC code with TIE-based butterfly instruction extensionPerformance improvement
Code size 430 bytes + libraries430 bytes158 bytes 
 FFT length    
Performance128-point7635481697392269337×
(cycles)256-point17876453864984711379×
 512-point39752458671339841404×
 1024-point9241893192264420603449×

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset
3.144.221.19