VLIW design pushes processor performance
"Media processing applications, such as video compression and decompression, image synthesis, and image understanding, demand very high arithmetic rates," says Scott Rixner of Stanford University (Stanford, CA).
"Media processing applications, such as video compression and decompression, image synthesis, and image understanding, demand very high arithmetic rates," says Scott Rixner of Stanford University (Stanford, CA). "To operate in real time on large images, these applications currently demand 1010 to 10 11 operations per second. To achieve these high operation rates, processor architectures with tens to hundreds of arithmetic units are required."
To meet these needs, researchers at Stanford are developing a programmable architecture, dubbed Imagine, that is claimed to achieve the performance of special-purpose hardware on image- and signal-processing tasks. To date, simulations of the Imagine architecture have sustained up to 7.2 GFLOPs while performing an FFT and an average of 9.4 GOPs across a range of media-processing kernels. According to Rixner, Imagine will perform 10 to 20 times faster than the TMS320C6201 and C6700 from Texas Instruments (Dallas, TX).
Imagine is designed to achieve the performance of a special-purpose image-processing engine with the flexibility of a general-purpose programmable processor. A single Imagine chip is expected to sustain in excess of 10 GFLOPs on problems such as image processing, polygon-based graphics, and signal processing.
Imagine is a programmable single-chip processor that supports stream programming. The processor provides a three-tiered storage bandwidth hierarchy consisting of a streaming memory system, a large stream register file, and direct forwarding of results among arithmetic units.
Imagine achieves its performance through a combination of vector processing, VLIW arithmetic clusters, a streaming memory system, and conditional stream operations. Vector performance is achieved by interleaving stream elements among the eight computation clusters, each consisting of many distinct arithmetic logic units (ALUs). "Each Imagine chip has a network interface to allow communication among Imagine chips, so that multiprocessor designs can be built if individual chips do not provide enough performance," says Rixner.
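The interleaving of stream elements among clusters can be pictured with a minimal sketch. The round-robin mapping (element i goes to cluster i mod 8) and the function names `interleave` and `run_kernel` are assumptions made for illustration; the article does not specify the exact mapping Imagine uses.

```python
NUM_CLUSTERS = 8  # from the article: eight computation clusters

def interleave(stream, num_clusters=NUM_CLUSTERS):
    """Deal stream elements round-robin (assumed mapping):
    element i is assigned to cluster i % num_clusters."""
    lanes = [[] for _ in range(num_clusters)]
    for i, element in enumerate(stream):
        lanes[i % num_clusters].append(element)
    return lanes

def run_kernel(stream, kernel, num_clusters=NUM_CLUSTERS):
    """Apply the same kernel in every cluster, then restore stream order."""
    lanes = interleave(stream, num_clusters)
    # Every cluster runs the identical operation on its own elements.
    results = [[kernel(x) for x in lane] for lane in lanes]
    out = []
    longest = max((len(lane) for lane in results), default=0)
    # Read one element per cluster per step to rebuild the original order.
    for step in range(longest):
        for lane in results:
            if step < len(lane):
                out.append(lane[step])
    return out

if __name__ == "__main__":
    pixels = list(range(10))
    print(interleave(pixels)[:3])
    print(run_kernel(pixels, lambda p: p * p))
```

In this model each cluster sees every eighth element, so a stream of N elements takes roughly N/8 kernel steps rather than N.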
At the heart of the device is a general-purpose stream register file that is connected to clusters, the network, the memory system, and a host processor. The stream register file is program controlled and serves as a staging area for data that are used by the other units. Memory accesses are reduced by keeping frequently used data in the stream register file.
The Imagine chip is controlled by a host processor, which accesses Imagine control and status registers and issues commands via a host interface. The host interface also is connected to the stream register file, so that data can be loaded into the machine and then sent to the memory system, network, or any other unit connected to the register file. Imagine is programmed in C++ through library calls on the host processor, and these calls pass instructions to Imagine using the processor's stream instruction set. Load and store instructions move streams between the stream register file and memory.
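A toy model can illustrate how a host might sequence stream loads, kernels, and stores through the stream register file. The class `ImagineModel` and its methods `load`, `kernel`, and `store` are hypothetical names invented for this sketch; they are not the actual Imagine library or stream instruction set.

```python
class ImagineModel:
    """Hypothetical toy model of a host driving a stream processor."""

    def __init__(self, memory):
        self.memory = memory        # off-chip memory: address -> list of values
        self.srf = {}               # stream register file: name -> stream
        self.memory_accesses = 0    # counts stream loads and stores only

    def load(self, name, address):
        """Stream load: copy a stream from memory into the register file."""
        self.srf[name] = list(self.memory[address])
        self.memory_accesses += 1

    def kernel(self, func, src, dst):
        """Run a kernel that reads and writes the register file only."""
        self.srf[dst] = [func(x) for x in self.srf[src]]

    def store(self, name, address):
        """Stream store: copy a stream from the register file to memory."""
        self.memory[address] = list(self.srf[name])
        self.memory_accesses += 1

if __name__ == "__main__":
    chip = ImagineModel({0: [1, 2, 3]})
    chip.load("in", 0)
    chip.kernel(lambda x: x * 2, "in", "tmp")   # intermediate stays on chip
    chip.kernel(lambda x: x + 1, "tmp", "out")
    chip.store("out", 100)
    print(chip.memory[100], chip.memory_accesses)
```

In this sketch the intermediate stream `tmp` never leaves the register file, so two chained kernels cost one load and one store, which mirrors the article's point that keeping frequently used data in the stream register file reduces memory accesses.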