MIPS-based processor targets imaging applications

OEM suppliers of smart cameras and add-in boards can use a number of processors to perform image-processing and machine-vision tasks. By choosing general-purpose microprocessors such as Intel's Pentium, developers can leverage a large number of software-development tools and off-the-shelf image-processing code, but at the expense of execution speed. For this reason, RISC-based architectures and digital signal processors (DSPs) are often used to accelerate the multiply/accumulate operations at the heart of image-processing algorithms.

To increase processing speed further, specific functions such as the fast Fourier transform (FFT) are now offered as hardwired implementations by companies such as DSP Architectures (Vancouver, WA, USA; www.dsparchitectures.com) or as programmable gate-array-based cores from Xilinx (San Jose, CA, USA; www.xilinx.com) and Altera (San Jose, CA, USA; www.altera.com). Because image-processing applications have a large amount of inherent data parallelism, single-instruction-multiple-data (SIMD) architectures provide a better programming model. In such architectures, an array of multiplier/accumulators (often square) is fed a single instruction while the data or image matrix is moved to the processing elements.
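To make the SIMD model concrete, the following Python sketch simulates one multiply/accumulate instruction applied in lockstep across a 4 × 4 array of processing elements: the instruction is the same everywhere, only the data differs. The array size, the signed 32-bit wraparound, and the names to_int32 and simd_mac are illustrative assumptions, not any vendor's API.

```python
# Simulation of one SIMD instruction on a 4 x 4 array of processing elements.
# Assumptions (illustrative only): signed 32-bit integer data, lockstep execution.

ROWS, COLS = 4, 4

def to_int32(x):
    """Wrap a Python int to a signed 32-bit two's-complement value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def simd_mac(acc, a, b):
    """Apply the same multiply/accumulate to every element:
    acc[r][c] + a[r][c] * b[r][c], wrapped to 32 bits."""
    return [[to_int32(acc[r][c] + a[r][c] * b[r][c]) for c in range(COLS)]
            for r in range(ROWS)]

# Usage: one "instruction" updates all 16 accumulators at once.
zeros = [[0] * COLS for _ in range(ROWS)]
pixels = [[4 * r + c for c in range(COLS)] for r in range(ROWS)]
gains = [[2] * COLS for _ in range(ROWS)]
result = simd_mac(zeros, pixels, gains)
```

On real hardware the 16 updates happen in the same cycle; the nested loop here only models that they are independent of one another.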

Functions such as convolution, which apply a standard template across the whole image, can be computed rapidly using such architectures. Despite valiant efforts by many universities and some image-processing vendors to make such architectures commercially available, however, the lack of high-level software for programming them has limited their use.
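The Python sketch below shows why convolution maps well onto SIMD hardware: each output pixel is an independent sum of multiply/accumulate operations between the flipped template (kernel) and an image patch, so many output pixels can be computed in parallel. It computes only the "valid" region with no border padding; the function name convolve2d_valid is hypothetical.

```python
def convolve2d_valid(image, kernel):
    """2D convolution over the 'valid' region (no border padding).

    Each output pixel is an independent sum of multiply/accumulate operations,
    which is what makes the operation a good fit for SIMD hardware."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for y in range(ih - kh + 1):
        row = []
        for x in range(iw - kw + 1):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    # True convolution flips the kernel in both axes.
                    acc += image[y + i][x + j] * kernel[kh - 1 - i][kw - 1 - j]
            row.append(acc)
        out.append(row)
    return out

# Usage: a 3 x 3 box template over a 4 x 4 image of ones gives a 2 x 2 result.
image = [[1] * 4 for _ in range(4)]
box = [[1] * 3 for _ in range(3)]
smoothed = convolve2d_valid(image, box)
```

For symmetric templates such as the box filter, flipping makes no difference, which is why many image-processing libraries implement correlation and still call it convolution.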

To ease the programmer's task in deploying such systems, Intrinsity (Austin, TX, USA; www.intrinsity.com) has developed a novel single-chip architecture that combines a 32-bit fixed-point MIPS core with a 4 × 4 SIMD array of 32-bit processing elements, each with its own local register file. A 2.5-GHz adaptive signal processor based on this architecture, dubbed FastMATH, was unveiled at the October In-Stat/MDR Microprocessor Forum.

Capable of executing more than 688,000 1024-point FFTs/s, the processor is now sampling at speeds of 1.5 and 2.0 GHz. Using a MIPS core as the heart of the FastMATH processor gives it a standard programming model that is already supported by major vendors such as Green Hills Software (Santa Barbara, CA, USA; www.ghs.com) and Wind River Systems (Alameda, CA, USA; www.windriver.com). The MIPS core sends the matrix instructions in the instruction stream, along with scalar data from the MIPS core register file as needed, to the matrix engine.
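For a sense of scale, the sketch below pairs a textbook recursive radix-2 Cooley-Tukey FFT with some arithmetic on the quoted benchmark. A 1024-point radix-2 FFT needs (1024/2) × log2(1024) = 5120 butterflies, and 688,000 transforms/s works out to roughly 1.45 µs per transform. The article does not say which clock speed the benchmark applies to, so cycles_per_fft takes the clock as a parameter; nothing here reflects how Intrinsity's libraries actually implement the FFT.

```python
import cmath

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd term
        out[k] = even[k] + t            # butterfly, upper output
        out[k + n // 2] = even[k] - t   # butterfly, lower output
    return out

def butterflies(n):
    """Butterflies in an n-point radix-2 FFT: (n / 2) * log2(n)."""
    return (n // 2) * (n.bit_length() - 1)

FFTS_PER_SECOND = 688_000  # benchmark figure quoted in the article

def cycles_per_fft(clock_hz):
    """Clock cycles available per transform at the quoted throughput."""
    return clock_hz / FFTS_PER_SECOND

microseconds_per_fft = 1e6 / FFTS_PER_SECOND  # roughly 1.45 us
```

Assuming the figure applies to the 2.5-GHz part, cycles_per_fft(2.5e9) comes to about 3634 cycles per 1024-point transform.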

The Intrinsity FastMATH processor is a hybrid MIPS/SIMD device that has been tested at speeds of 2.5 GHz. Targeted at signal- and image-processing applications, the processor can execute more than 688,000 1024-point FFTs/s.

FastMATH processor evaluation kits are available now. They consist of a development environment with signal-processing libraries that speed software development for the processor's matrix and vector instruction extensions, along with library source code and evaluation copies of the Green Hills Software MULTI and Wind River Tornado integrated development environments.

To speed application development, Intrinsity is also offering a 30-day free trial of the Green Hills tools so that developers can test image-processing code on Intrinsity's PC-based simulator, a cycle-accurate tool that simulates the performance of the FastMATH processor. Header files included with the FastMATH toolchain packages also provide C intrinsics: a set of C macros and inline functions that express simple operations such as shift-add and matrix multiplication as compact C statements. An evaluation board, priced at $7000, is available for assessing the device's performance.
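Shift-add is a typical example of the simple operations such intrinsics wrap. The sketch below models a hypothetical elementwise shift-add over a 4 × 4 matrix of signed 32-bit values, defined here as (a << shift) + b; the real FastMATH intrinsic names, operand order, and semantics are not given in the article and may differ. Composing two shift-adds multiplies every element by a constant without using a multiplier.

```python
# Hypothetical model of a matrix shift-add intrinsic on a 4 x 4 matrix of
# signed 32-bit values: out = (a << shift) + b, element by element.
# Name and semantics are assumptions, not the FastMATH header definitions.

def wrap32(x):
    """Wrap a Python int to a signed 32-bit two's-complement value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def m_shift_add(a, b, shift):
    """Shift each element of a left by `shift` bits and add b, with 32-bit wraparound."""
    return [[wrap32((a[r][c] << shift) + b[r][c]) for c in range(4)]
            for r in range(4)]

def m_times10(x):
    """Multiply every element by 10 using shift-adds only: 10x = (x << 3) + (x << 1)."""
    zeros = [[0] * 4 for _ in range(4)]
    x2 = m_shift_add(x, zeros, 1)   # x << 1
    return m_shift_add(x, x2, 3)    # (x << 3) + (x << 1)

data = [[r - c for c in range(4)] for r in range(4)]  # includes negative values
scaled = m_times10(data)
```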

"From a programmer's perspective," says Ken McCormick, senior applications engineer with Intrinsity, "the individual register files form a set of 16 matrix registers, each holding 64 bytes of data, arranged as a 4 × 4 matrix of 32-bit elements that can each synchronously execute the same matrix instruction in parallel." In the matrix unit, the processing elements are connected by rows and by columns in a full mesh, unlike systolic arrays or other systems where the connections are to nearest neighbor only. Thus, each element can broadcast a value to all the other elements in its row or column and each element can use operands from local registers or from a broadcast during each operation.

The processor is expected to be priced at $300 in 1000-piece quantities. A number of smart-camera vendors are already evaluating the device. According to McCormick, three third-party image-processing-board vendors are also evaluating it and are expected to offer products based on it sometime next year.
