Andrew Wilson, Editor
With nearly double the performance of comparable FPGAs, the Arrix family of field-programmable object arrays (FPOAs) from MathStar (Beaverton, OR, USA; www.mathstar.com) is supported by the company’s Machine Vision and Pro Video Libraries of IP cores. The Arrix family can incorporate up to 400 MAC, ALU, and RF “objects,” is clocked at 1 GHz, and provides a total of 64 Gbits/s of throughput across LVDS, general-purpose, and external-memory I/O. Since the introduction of its FPOA family last year, MathStar has secured a number of design wins, most notably with Adaptive Microware (Ft. Wayne, IN, USA; www.adaptivemicro.com), Alacron (Nashua, NH, USA; www.alacron.com), and Honeywell (Morris Township, NJ, USA; www.honeywell.com). While Adaptive Microware and Honeywell will target the video broadcast and military DSP markets, Alacron has incorporated an FPOA in its Fast-X GigE series of PCI Express-based frame grabbers.
MathStar is leveraging third-party support from IP companies such as Barco Silex (Louvain-la-Neuve, Belgium; www.barco.com) and Cadre Codesign Technologies (St.-Laurent, QC, Canada; cadrecodesign.com) in the development of specialty codecs. “Barco Silex has already developed MPEG 2 and other codecs for the FPOA,” says Tim Teckman, vice president of engineering at MathStar, “while Cadre Codesign supplies us and our customers with JPEG 2000 encoders.”
MathStar also offers a number of IP cores for the machine-vision, medical, defense, and test-and-measurement markets. These include functions for color space conversion, flat-field correction, and image scaling and rotation, as well as more complex algorithms for CT-filtered back-projection and beam-forming applications.
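To make one of these functions concrete, the following is a minimal Python sketch of flat-field correction using the standard textbook formula, corrected = (raw - dark) x mean(flat - dark) / (flat - dark). It is not MathStar's implementation; the function name `flat_field_correct` and the list-of-rows image layout are assumptions made only for illustration.

```python
def flat_field_correct(raw, dark, flat):
    """Textbook flat-field correction on 2-D images given as lists of rows.

    Illustrative sketch only; names and data layout are hypothetical.
    """
    # Per-pixel gain reference: flat-field frame minus dark frame.
    gain = [[f - d for f, d in zip(frow, drow)]
            for frow, drow in zip(flat, dark)]
    values = [g for row in gain for g in row]
    mean_gain = sum(values) / len(values)

    corrected = []
    for rrow, drow, grow in zip(raw, dark, gain):
        out_row = []
        for r, d, g in zip(rrow, drow, grow):
            # Guard against pixels whose gain reference is zero.
            out_row.append((r - d) * mean_gain / g if g != 0 else 0.0)
        corrected.append(out_row)
    return corrected


# A uniform scene viewed through non-uniform pixel gains.
dark = [[10, 10], [10, 10]]
flat = [[110, 210], [60, 110]]
raw = [[60, 110], [35, 60]]
result = flat_field_correct(raw, dark, flat)
```

In this example every pixel of `result` comes out at the same value, since the scene was uniform and only the per-pixel gains differed.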
The company’s cofounder and CEO, Doug Pihl, is also chairman of Vital Images (Minnetonka, MN, USA; www.vitalimages.com), a developer of visualization and analysis software for the medical market. “More and more,” says Teckman, “designers of machine-vision and medical-imaging systems are looking for ways to incorporate reprogrammable parts in their systems so that embedded algorithms can be upgraded in the field.” In the high-performance digital camera market, for example, companies such as Imperx (Boca Raton, FL, USA; www.imperx.com) have incorporated FPGAs into the data pipeline of their cameras, allowing such functions as flat-field and defective-pixel correction to be uploaded into the camera systems (see Vision Systems Design, July 2006, p. 20).
“While these applications currently use FPGAs to perform such functions, rapid increases in image sizes and frame rates may limit their use in real-time applications,” says Teckman. One alternative to this is the use of custom ASICs that can perform such functions at gigahertz rates.
“Unfortunately, these ASIC designs are expensive and costly to fabricate, and, once deployed, cannot be upgraded,” he says. FPOAs offer an alternative approach that increases speed while retaining the flexibility of FPGA programmability. In designing with FPOAs, developers take advantage of the fact that all the MACs, ALUs, and RFs clock at 1 GHz.
“In conventional FPGA designs,” says Teckman, “developers use HDLs to develop their application. After logic synthesis, floor planning, gate mapping, and place and route, the timing must be verified” (see figure). Very often, once timing has been verified, it turns out that the device will clock more slowly than anticipated, and the designer must re-evaluate and optimize the HDL code. This iterative process can be time-consuming, as well as frustrating. Because FPOA building blocks clock at a constant speed, timing closure is achieved as part of the design and verification process, eliminating the need for the designer to perform a separate timing-closure step.
Because nearest-neighbor interconnections between objects can clock at 1 GHz, simple functions such as color space conversion that use eight objects can run at 1 Gpixel/s. More-complex programs may consist of multiple objects connected by the FPOA’s party-line links. Each object has ten of these links, which can reach objects up to three positions away in one clock cycle.
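The 1-Gpixel/s figure follows from producing one pixel per clock cycle at 1 GHz. The Python sketch below shows that arithmetic alongside one common color space conversion. The article does not say which color spaces the MathStar function handles, so RGB to YCbCr with the full-range BT.601 coefficients used in JFIF is an assumption, as are the names `rgb_to_ycbcr` and `pixel_rate`.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to full-range YCbCr (BT.601/JFIF).

    The choice of color spaces is an assumption; the article names none.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    # Round and clamp each component to the 8-bit range.
    return tuple(max(0, min(255, round(v))) for v in (y, cb, cr))


def pixel_rate(clock_hz, pixels_per_clock=1):
    """Pixels per second for a pipeline emitting a fixed number per clock."""
    return clock_hz * pixels_per_clock


white = rgb_to_ycbcr(255, 255, 255)
black = rgb_to_ycbcr(0, 0, 0)
rate = pixel_rate(1e9)  # one pixel per clock at 1 GHz
```

Each output component is a three-term weighted sum of R, G, and B, the kind of multiply-accumulate work that MAC objects are suited to. How the function is actually divided among its eight objects is not described in the article.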
More complex programs, such as a 40-object flat-field error correction, use these links to move data between objects more than three blocks apart, which requires pipelining to maintain gigapixel processing rates. “In essence,” says Teckman, “this still provides a much greater performance increase over the use of conventional FPGAs.”
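As a rough illustration of why pipelining is needed, the Python sketch below estimates how many party-line hops a value needs to cross a given distance. It assumes that each hop spans at most three objects, costs exactly one 1-GHz clock cycle, and is registered so that a new value can enter the chain every cycle. Only the reach of three objects and the 1-GHz clock come from the article; the rest, including the function names, is assumed.

```python
import math

PARTY_LINE_REACH = 3   # objects reachable in one clock cycle (from the article)
CLOCK_HZ = 1e9         # object clock rate (from the article)


def party_line_hops(distance):
    """Hops needed to move data `distance` objects away (assumed model)."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return math.ceil(distance / PARTY_LINE_REACH)


def transfer_latency_ns(distance):
    """Latency in nanoseconds, assuming one clock cycle per hop."""
    return party_line_hops(distance) * 1e9 / CLOCK_HZ


hops_near = party_line_hops(3)    # within reach: a single hop
hops_far = party_line_hops(10)    # beyond reach: several pipelined hops
latency_far = transfer_latency_ns(10)
```

Under these assumptions, extra hops add latency but not a throughput penalty: once the pipeline is full, one result still emerges per clock cycle.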