Parallel DSPs exploit MIMD architectures
In image-processing applications such as medical imaging, simple imaging operators such as convolution kernels must be applied repeatedly across large data sets. Hardwired multiply-accumulators can perform these tasks, but they cannot be reprogrammed for other imaging tasks. That limitation has led researchers at Data Flux Systems (Berkeley, CA) and the University of California (Berkeley, CA) to turn to multiple-instruction/multiple-data (MIMD) approaches.
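To make the repeated-operation workload concrete, the minimal Python sketch below applies a 3 × 3 convolution kernel directly and counts the multiply-accumulate (MAC) operations it performs. The function name `convolve3x3` and the choice to leave border pixels at zero are illustrative assumptions. They are not details of the Data Flux system.

```python
def convolve3x3(image, kernel):
    """Direct 3x3 convolution over the interior pixels of a 2-D list.

    Border pixels are left at 0 (an illustrative choice). Returns the
    filtered image and the number of multiply-accumulate operations used.
    """
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    macs = 0
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            acc = 0
            for kr in range(3):
                for kc in range(3):
                    # True convolution: the kernel is flipped in both axes.
                    acc += kernel[2 - kr][2 - kc] * image[r + kr - 1][c + kc - 1]
                    macs += 1  # one multiply and one add per kernel tap
            out[r][c] = acc
    return out, macs


if __name__ == "__main__":
    ones = [[1] * 5 for _ in range(5)]
    box = [[1] * 3 for _ in range(3)]
    filtered, count = convolve3x3(ones, box)
    print(filtered[2][2], count)
```

Every interior pixel costs nine MACs, so a hypothetical 512 × 512 frame (510 × 510 interior pixels) needs 2,340,900 MACs per pass. A fixed, repetitive workload of this kind is what a hardwired multiply-accumulator handles well and what makes it hard to repurpose.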
Using an array of eight custom VLSI digital-signal-processing (DSP) integrated circuits (ICs), each containing 48 16-bit processors, Dr. Srini and his team at Data Flux, an R&D company, have constructed a board for an image-processing system that is capable of image digitization, processing, and display. This work was funded by the US Air Force Research Laboratory (Eglin AFB, FL) under an STTR contract, with the University of California at Berkeley acting as a partner under the leadership of Jan Rabaey in the electrical engineering and computer science department. The multiprocessor DSP board is an experimental system designed to show that high-speed image-processing tasks can be performed by networking simple processors.
"In the design of the board," says Srini, "we used off-the-shelf parts for image digitization and display and an interface to the Sbus." These parts consisted of a Brooktree Bt812 image decoder, a Bt855 video encoder, and LSI Logic's L64853A Sbus interface chip.
To process images at high data rates, however, Srini and his colleagues turned to the PADDI-II, an MIMD parallel-processing IC developed at the University of California at Berkeley. This IC contains 48 processors organized into 12 clusters of four processors each. Each processor has a 16-bit data path; 13 instructions, including add, subtract, shift, and invert; a control unit; and a scan chain for loading data. The processors within each cluster are linked by a 16-bit data bus and four 1-bit control buses, and the 12 clusters are in turn linked by additional 16-bit data and control buses.
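The instructions named in the article (add, subtract, shift, and invert) do not include a multiply, yet many useful convolution kernels can still be built from adds and shifts alone. As an illustration only (the article does not describe how kernels are mapped onto the PADDI-II), the Python sketch below applies the 3 × 3 binomial blur kernel [1 2 1; 2 4 2; 1 2 1]/16 using only addition and shifts. It masks every result to 16 bits to mimic a 16-bit data path. The helper names and the separable row-then-column decomposition are assumptions.

```python
MASK16 = 0xFFFF  # model a 16-bit data path: results wrap modulo 2**16


def add16(a, b):
    """16-bit add with wraparound."""
    return (a + b) & MASK16


def shl16(a, n):
    """16-bit left shift; bits shifted past bit 15 are lost."""
    return (a << n) & MASK16


def tap121(a, b, c):
    """Apply the 1-D kernel [1 2 1] as a + (b << 1) + c: no multiplier needed."""
    return add16(add16(a, shl16(b, 1)), c)


def binomial_blur(image):
    """3x3 binomial blur using only adds and shifts (border pixels left at 0)."""
    rows, cols = len(image), len(image[0])
    # Horizontal pass: [1 2 1] along each row.
    horiz = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(1, cols - 1):
            horiz[r][c] = tap121(image[r][c - 1], image[r][c], image[r][c + 1])
    # Vertical pass: [1 2 1] down each column, then divide by 16 with a shift.
    out = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            total = tap121(horiz[r - 1][c], horiz[r][c], horiz[r + 1][c])
            out[r][c] = total >> 4  # 16 is the sum of the kernel weights
    return out


if __name__ == "__main__":
    flat = [[100] * 5 for _ in range(5)]
    print(binomial_blur(flat)[2][2])
```

Because the kernel is separable, each output pixel needs four adds and three shifts instead of nine multiply-accumulates. With 8-bit input pixels the largest intermediate sum is 255 × 16 = 4080, which fits comfortably in 16 bits.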
To program the processors on the device, a 7464-bit scan chain snakes through the IC, setting registers and loading instruction memory. In this way, multiple processors can be programmed to operate on imaging data in either MIMD or single-program, multiple-data (SPMD) mode.
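A scan chain behaves like one long shift register: configuration bits are clocked in serially until every register and instruction-memory cell holds its intended value. The Python sketch below models that behavior for a 7464-bit chain. The class and helper names are hypothetical, and so is the field layout used in the example (a 16-bit value followed by an 8-bit value). The article does not document how the PADDI-II's 7464 bits are assigned.

```python
from collections import deque

CHAIN_BITS = 7464  # scan-chain length quoted for the PADDI-II


class ScanChain:
    """A serial scan chain modeled as one long shift register.

    On each clock, one bit enters at position 0, every stored bit moves one
    position along the chain, and the bit at the far end is shifted out.
    """

    def __init__(self, length):
        self.length = length
        self.cells = deque([0] * length)

    def clock(self, bit_in):
        bit_out = self.cells.pop()          # bit leaving the far end
        self.cells.appendleft(bit_in & 1)   # new bit enters at position 0
        return bit_out

    def load(self, bits):
        """Shift a full configuration in; afterwards cells[i] == bits[i]."""
        if len(bits) != self.length:
            raise ValueError("bit stream must match the chain length")
        for b in reversed(bits):            # last bit enters first and travels farthest
            self.clock(b)


def pack_fields(fields):
    """Concatenate (value, width) fields, most significant bit first."""
    bits = []
    for value, width in fields:
        bits.extend((value >> (width - 1 - i)) & 1 for i in range(width))
    return bits


def unpack_field(bits, offset, width):
    """Read back a field of `width` bits starting at `offset`."""
    value = 0
    for b in bits[offset:offset + width]:
        value = (value << 1) | b
    return value


if __name__ == "__main__":
    # Hypothetical layout: a 16-bit register value, then an 8-bit field.
    stream = pack_fields([(0xBEEF, 16), (0x2A, 8)])
    stream += [0] * (CHAIN_BITS - len(stream))  # pad the rest of the chain
    chain = ScanChain(CHAIN_BITS)
    chain.load(stream)
    config = list(chain.cells)
    print(hex(unpack_field(config, 0, 16)), hex(unpack_field(config, 16, 8)))
```

Modeling the chain with a `deque` keeps each clock O(1). Loading the full configuration always takes one clock per bit, here 7464 clocks, regardless of how many fields the stream contains.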
"At 50 MHz, the board provides approximately 20-GOP/s processing power," says Srini. For future work, the researchers are planning a PCI-bus board for complex processing tasks, based on the VGI-1 chip, a 2-W, 4-GOP/s processor containing 16 clusters of six processors each. "Using such processors, scalable systems using dies of the chip can be constructed for greater than 50-GOP/s performance," adds Srini.
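The quoted 20 GOP/s is consistent with simple peak-rate arithmetic: 8 chips × 48 processors × 50 MHz = 19.2 × 10^9 operations per second. That figure assumes each processor completes one operation per clock cycle, which the article does not state. The Python sketch below packages that arithmetic. It also gives a rough count of how many 4-GOP/s VGI-1 dies would be needed to exceed 50 GOP/s if throughput scaled linearly with die count, which is likewise an assumption. The function names are hypothetical.

```python
import math


def peak_gops(chips, processors_per_chip, clock_hz, ops_per_cycle=1):
    """Peak throughput in GOP/s, assuming every processor completes
    `ops_per_cycle` operations on every clock cycle."""
    return chips * processors_per_chip * clock_hz * ops_per_cycle / 1e9


def chips_to_exceed(target_gops, gops_per_chip):
    """Smallest chip count whose combined peak is strictly greater than
    `target_gops`, assuming throughput scales linearly with chip count."""
    return math.floor(target_gops / gops_per_chip) + 1


if __name__ == "__main__":
    # Current board: eight PADDI-II chips, 48 processors each, 50-MHz clock.
    print(peak_gops(8, 48, 50e6))
    # VGI-1 dies at 4 GOP/s and 2 W each, target of more than 50 GOP/s.
    n = chips_to_exceed(50, 4)
    print(n, n * 4, n * 2)
```

Under the linear-scaling assumption, 13 dies would give a 52-GOP/s peak at roughly 26 W. These are idealized peak figures derived from the article's numbers, not measured results.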