Dedicated processors outperform DSP boards in FFT processing

Representing a key mathematical transform used in both military and medical image processing, the fast Fourier transform (FFT) is also one of the most computationally expensive. In some military systems, FFT processing can require dozens of digital-signal processors (DSPs) tightly coupled across standard buses, such as VME, and high-speed open-standards interconnects, such as Raceway from Mercury Computer Systems, Sky Channel from Sky Computers (Chelmsford, MA), and Myrinet from Myricom (Arcadia, CA).

To process signal and image data rapidly, the FFT is often partitioned among many processors. As Eric Fowler, systems engineer at Mercury Computer Systems, points out, selecting the optimal section length of data requires certain trade-offs. For example, whereas the smallest possible section length favors maximum performance, larger sections yield a higher percentage of useful output data points.
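The trade-off Fowler describes can be illustrated with a toy model. The sketch below assumes the sectioning follows the common overlap-save scheme, which the article does not name, and uses a hypothetical filter length and a rough N log2 N cost per FFT; none of these figures come from Mercury's benchmark.

```python
import math

def useful_fraction(section_len, filter_len):
    """Fraction of each overlap-save section that is valid (useful) output."""
    return (section_len - filter_len + 1) / section_len

def ops_per_useful_point(section_len, filter_len):
    """Rough cost model: N*log2(N) operations per FFT, spread over valid outputs."""
    valid = section_len - filter_len + 1
    return section_len * math.log2(section_len) / valid

FILTER_LEN = 512  # hypothetical filter length; not given in the article

for k in (1, 2, 4, 8, 16):
    n = k * 1024
    print(f"{k:2d}-k section: {useful_fraction(n, FILTER_LEN):.1%} useful, "
          f"{ops_per_useful_point(n, FILTER_LEN):.2f} ops per useful point")
```

In this simplified model the useful fraction rises steadily with section length, while the cost per useful point falls at first and then climbs again as the FFT itself grows. Real hardware adds memory and data-movement effects that the model ignores.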

To demonstrate this, Fowler computed the time required to perform varying numbers of operations for section lengths ranging from 1-k to 16-k points on the 21060 SHARC DSP from Analog Devices (Norwood, MA). The 8-k element sections provide the highest throughput per SHARC processor at 1.27 million samples/s (see table). Therefore, concludes Fowler, processing data at 50 million samples/s requires a minimum of forty 21060 SHARC processors.
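The processor count follows from dividing the required rate by the per-processor rate and rounding up. A minimal sketch of that arithmetic, with variable and function names chosen here for illustration:

```python
import math

TARGET_RATE = 50e6       # samples/s, required data rate quoted in the article
PER_SHARC_RATE = 1.27e6  # samples/s per 21060 SHARC with 8-k sections (article figure)

def processors_needed(target_rate, per_processor_rate):
    """Smallest whole number of processors whose combined rate meets the target."""
    return math.ceil(target_rate / per_processor_rate)

# 50 / 1.27 is roughly 39.4, so the count rounds up to 40
print(processors_needed(TARGET_RATE, PER_SHARC_RATE))
```

This assumes the workload divides evenly across processors with no interconnect overhead, which is why the article calls forty a minimum.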

To meet these demands, Mercury, Sky, and CSPI (Billerica, MA) have implemented multiple DSPs on single VME boards and lowered the board-count requirement to implement such systems. Despite these gains, however, multiple boards are still required when the FFT must be accomplished in real time. To reduce the number of boards used in such systems, companies such as Catalina Research (Colorado Springs, CO) and Valley Technologies (Tamaqua, PA) are building processing boards purely to perform FFTs. In the second quarter of this year, Valley Technologies is expected to market PCI- and VME-based processor boards based on the DSP-24 IC from DSP Architectures (Vancouver, WA).

"Signing an early engagement agreement and working closely with DSP Architectures," says Jerry Petrole, president of Valley Technologies, "has helped us design board-level products in parallel with chip developments, making VTI the first board vendor to provide DSP-24-based products." Valley Tech nologies is expected to introduce the VT-5260, a VME-based DSP processor capable of carrying two DSP-24 devices on separate FFT modules (see figure). With a rated clock speed of 100 MHz, each processing module is capable of performing a 1024-point complex FFT in 21 µs, proving a maximum computational rate of 11 µs when the two modules are used in parallel or cascade.

Like Mercury Computer Systems, Valley Technologies has also benchmarked its VT-5260 configured with a single processor module against a range of data points. The time to compute an 8-k point, real FFT is 0.166 ms, making the processing module approximately 15 times faster than the original 21060 SHARC processor. "Only two DSP-24 devices are required to perform a 50-million-sample/s, 8-k complex FFT, as opposed to more than 40 SHARC processors. With the same two DSP-24s, the 8-k real FFT can be sustained at more than 90 million samples/s," claims Petrole.
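The quoted transform times can be turned into approximate sample rates by dividing transform length by transform time. The sketch below makes that simplifying assumption (no I/O or setup overhead, perfect scaling across two modules); it is a plausibility check, not Valley Technologies' own calculation.

```python
def samples_per_second(points, seconds):
    """Sustained rate if transforms run back to back with no overhead."""
    return points / seconds

rate_1k_complex = samples_per_second(1024, 21e-6)   # 1024-point complex FFT in 21 µs
rate_8k_real = samples_per_second(8192, 0.166e-3)   # 8-k real FFT in 0.166 ms

print(f"1024-point complex: {rate_1k_complex / 1e6:.1f} Msamples/s per module")
print(f"8-k real: {rate_8k_real / 1e6:.1f} Msamples/s per module, "
      f"{2 * rate_8k_real / 1e6:.1f} Msamples/s for two modules")
```

Under these assumptions a single module lands just under 50 million samples/s, and two modules together exceed the 90 million samples/s cited for the 8-k real FFT.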

Adds Steve Byrd, VTI sales and marketing manager, "The VT-5260 architecture provides not only high-speed but also fast input and output buffer memory circuits to allow sustained high sample-rate processing, consistent with the performance of the DSP. Data-routing bottleneck issues don't exist with this approach, as they often do with multiple general-purpose DSP implementations."

For image-processing applications, Valley Technologies has populated the board with banks of 64k × 48-bit SRAM. With this amount of memory, FFTs of images as large as 128 × 128 pixels can be performed on-board.

"To increase this image size, we will populate future versions of the board with 256k ¥ 48-bit and 512k ¥ 48-bit deep memory, allowing 512 ¥ 512-bit and 1024 ¥ 1024-bit deep images to be computed on-board," says Byrd. Such 512k ¥ 48-bit deep memories are expected to be available at the end of this year, he adds.

To challenge the role of DSPs in this area, Valley Technologies has incorporated the Raceway interface onto the board. This interface allows existing Mercury customers and other VME/Raceway systems integrators to integrate the VT-5260 into their systems. "Because the Raceway interface is implemented as a field-programmable gate array (FPGA)," says Petrole, "we can reconfigure the board for Sky Channel and Myrinet interfaces as required."

To ease the systems-integration task, Valley Technologies is planning to offer a development language, compiler, and simulator for the board. Dubbed VectorWare, the software toolset consists of VectorBuilder, an optimizing compiler that generates vector microcode for the VT-5260, and VectorSIM, which allows DSP-24 code to be simulated as it would run on the VT-5260 board.

VectorWare also includes VectorCode, a high-level, line-oriented vector DSP language that lets developers program at an intuitive level using single-line vector instructions. To implement a 1024-point fast convolution, performing an FFT, frequency-domain filtering, and an inverse FFT, for example, requires just a few lines of code. In this way, the VectorCode language implements complex signal- or image-processing applications in easy-to-understand vector instructions, leaving the detailed register-level programming of the DSP's memory management units to the compiler.
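The three steps of a fast convolution (forward FFT, frequency-domain multiply, inverse FFT) can be sketched in plain Python. This is not VectorCode, whose syntax the article does not show; the function names and the short test signal are illustrative assumptions.

```python
import cmath

def fft(x):
    """Recursive radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(spectrum):
    """Inverse FFT via the conjugate trick: conj(fft(conj(X))) / N."""
    n = len(spectrum)
    y = fft([v.conjugate() for v in spectrum])
    return [v.conjugate() / n for v in y]

def fast_convolve(signal, taps, n=1024):
    """n-point circular convolution: FFT both inputs, multiply, inverse FFT."""
    x = list(signal) + [0.0] * (n - len(signal))  # zero-pad to n points
    h = list(taps) + [0.0] * (n - len(taps))
    product = [a * b for a, b in zip(fft(x), fft(h))]
    return ifft(product)

y = fast_convolve([1.0, 2.0, 3.0, 4.0], [1.0, 1.0])
print([round(v.real, 6) for v in y[:5]])
```

Because both inputs are zero-padded well beyond their combined length, the circular result matches the ordinary linear convolution, whose first five values here are 1, 3, 5, 7, and 4.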

Even before the VT-5260 is available, Integrated Sensors Inc. (ISI; Utica, NY), a systems integrator of image-processing and machine-vision systems, has selected the board for a synthetic-aperture-radar application. According to Walter Szczepanski, ISI senior staff engineer, the board was chosen over several other candidate DSP solutions. The VT-5260 is expected to be integrated into a heterogeneous multiprocessing system using the ISI RTExpress rapid-prototyping software environment.

"Priced at $42,000 in single quantities with two processing modules, the VT-5260 VME board may, at first glance, seem expensive," says Szczepanski. "However, when price/ performance comparisons against competing architectures are made, substantial cost savings can be real- ized with the Valley Technologies approach," he adds.
