Extreme processors use multiple architectures

At the annual InStat/MDR (Scottsdale, AZ, USA; www.instat.com) Analyst Choice Awards (Jan. 5, 2004), a number of new semiconductor vendors announced what In-Stat is touting as "extreme" processors.

Mar 1st, 2004
Th 144663

At the annual InStat/MDR (Scottsdale, AZ, USA; www.instat.com) Analyst Choice Awards (Jan. 5, 2004), a number of new semiconductor vendors announced what In-Stat is touting as "extreme" processors. Targeted at DSP-based compute-intensive applications such as embedded image processing and DSP, these processors are using established SIMD (single instruction/ multiple data), MIMD (multiple instruction/multiple data), RISC, and VLIW (very-long instruction word) architectures-and sometimes a combination of multiple architectures-to achieve speeds as high as 25 GFLOPs. At the Extreme Processor session, Intrinsity (Austin, TX, USA; www.intrinsity.com), Cradle Technologies (Mountain View, CA, USA; www.cradle.com), and ClearSpeed (Los Gatos, CA, USA; www.clearspeed.com) were recognized for their excellence in technology innovation, design, and implementation.

Targeted at embedded-video and image-processing applications, Cradle Technologies' single-chip ECE 3400 multiprocessor features no less than eight independent DSPs, four RISC processors, on-board RAM, and programmable I/O. Unlike the combined RISC/SIMD approach taken by recently introduced processors such as the FastMATH processor from Intrinsity (see Vision Systems Design, December 2003, p. 8), Cradle's approach is a shared-memory MIMD device that uses a single 32-bit address space for all register and memory elements. According to Cradle, using a repeated sequence of DSP and RISC processors, the multiprocessor DSP (MDSP) hardware architecture also achieves a processing density superior to VLIW architectures while delivering linear scaling performance.


FIGURE 1. Cradle Technologies MIMD device uses a sequence of DSP and RISC processors. A cluster of processors is connected by a local bus and shares local memory. Groupings of these processors are then connected by a high-speed global bus and communicate with external DRAM and I/O interfaces.
Click here to enlarge image

Cradle's MDSP architecture consists of multiple processors hierarchically connected by two levels of buses (see Fig. 1). Each processor subsystem consists of four RISC-like processors called processing engines, eight DSP processors, and one memory-transfer engine used for data movement. Programmed using standard ANSI C or a C-like assembly language, the chip is supplied with GNU-based optimizing C-compilers, assemblers, linkers, debuggers, a functional and performance accurate simulator, code profilers, and analysis tools. For real-time implementations, real-time kernels such as the open source, royalty-free, eCOS available from RedHat (Raleigh, NC, USA; sources.redhat.com/ecos/) can be used.

Combining different architectural processing types was also on the mind of ClearSpeed designers, who were also honored at the In-Stat/MDR Awards. The company's processor, the ClearSpeed CS30, sports an architecture that combines both a RISC-like control unit and a SIMD-like array of processing elements. In a white paper entitled "Multithreaded Array Processor Architecture," available on the company's Web site, the details of the design are compared with those of a conventional RISC-based processor (see Fig. 2).


FIGURE 2. In developing the CS30, ClearSpeed has built a processor architecture that combines both a standard RISC-like control unit that is coupled to a series of parallel execution units. While one execution unit handles program flow control, the other elements process parallel data.
Click here to enlarge image

The device appears as a single processor running a single C program where the RISC-like control unit executes a single instruction stream sending instructions to the execution unit that handles flow control, other processing elements, or one of the I/O controllers. Like Intrinsity and Cradle Technologies, ClearSpeed offers a C compiler, graphical debugger, suite of supporting tools and libraries, and PC-based add-in development boards for developers wishing to evaluate the part.

The development of such parts is a shining achievement in semiconductor design. As many industry analysts have pointed out, however, developing such technologies is not always successful. It is also a function of price/performance, ease of porting already established C code, time-to-market, market timing, third-party source availability, company reputation, and numerous other factors. Such architectures do, however, show that the future will not be a choice of a single architecture. More than likely, future systems will rely on a combination of two or more architectures.

More in Boards & Software