Novel memory architecture speeds image processing

For years, designers of image-processing systems have tried to match their algorithms to specific hardware architectures; von Neumann, SIMD, MIMD, and pipelined architectures have all been proposed to speed up specific algorithms. Associative Computing (Raanana, Israel) has recently developed a novel twist on the single-instruction multiple-data (SIMD) architecture: an intelligent, memory-based associative processor called the Xium-2.

According to Avidan Akerib, vice president of Associative Computing, associative processing is a natural way to manipulate massive amounts of data in parallel by changing the content of the system memory where the data are stored. Unlike conventional systems that use separate microprocessor and memory devices, the Xium IC combines the two into one device (see figure). As a result, the device takes the same execution time to process 100 numbers as it does to process 2000 numbers. The exact execution time depends on the specific arithmetic function performed.

Because the associative memory on the associative processing array is intelligent, every computing function is based on two operators: compare and write. Each instruction on the device is performed as a combination of compares and writes. If, for example, the memory content in the associative array represents a portion of an image, the values of the pixels could all be changed in parallel by comparing the content of every memory location against a pattern and writing new values to the matching locations, depending on the required result.
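
The compare-and-write idea can be sketched in software. The Python model below is only an illustration and not the Xium-2 instruction set: the function names `compare`, `write`, and `threshold_128` are hypothetical, as is the choice of 8-bit pixels. Each list comprehension stands in for a step that an associative array would apply to every memory word at once.

```python
def compare(memory, pattern, mask):
    """Tag every word whose masked bits equal the masked pattern."""
    return [(word & mask) == (pattern & mask) for word in memory]


def write(memory, tags, value, mask):
    """Overwrite the masked bits of every tagged word with the given value."""
    return [
        (word & ~mask) | (value & mask) if tag else word
        for word, tag in zip(memory, tags)
    ]


def threshold_128(pixels):
    """Binarize 8-bit pixels: values >= 128 become 255, all others become 0.

    The test looks only at the top bit (0x80), so the whole operation is
    two compares followed by two writes, however many pixels are stored.
    """
    bright = compare(pixels, 0x80, 0x80)  # top bit set   -> pixel >= 128
    dark = compare(pixels, 0x00, 0x80)    # top bit clear -> pixel < 128
    pixels = write(pixels, bright, 0xFF, 0xFF)
    pixels = write(pixels, dark, 0x00, 0xFF)
    return pixels


print(threshold_128([0, 127, 128, 200, 255]))
```

In this model the threshold costs two compares and two writes whether the list holds 100 pixels or 2000. The Python comprehensions still loop over the data; only in an associative array would each step touch all words simultaneously.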

Because input image data are usually too large to store in the processor's internal associative array, the Xium-2 must process the data in regions. Three operations are performed in parallel: loading the next region, processing the current region, and clocking out the previously processed region. To overlap data input/output with data processing in this way, the Xium-2 architecture splits the associative array into two independent sections: one serves as the video input/output channel, and the other serves as the associative computational core. While the associative core is processing a block of data, the input/output section is outputting the previously processed block and accepting the next data block (see table on p. 10).
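
The overlap of loading, processing, and output can be pictured as a three-stage pipeline. The sketch below is a sequential Python simulation under stated assumptions, not a description of the chip's actual control logic: `run_pipeline`, `kernel`, and the region size are hypothetical names and parameters chosen for illustration.

```python
def run_pipeline(data, region_size, kernel):
    """Process data region by region with load/process/output overlapped.

    Returns the processed data and, for each step, a tuple of the region
    indices being (loaded, processed, clocked out); None means idle.
    """
    regions = [data[i:i + region_size] for i in range(0, len(data), region_size)]
    n = len(regions)
    loaded = None      # region held in the I/O section, waiting for the core
    processed = None   # result held in the I/O section, waiting to leave
    output, schedule = [], []

    def index_or_idle(i):
        return i if 0 <= i < n else None

    for step in range(n + 2):
        # In hardware these three activities would happen at the same time.
        if processed is not None:
            output.extend(processed)                      # clock out region step-2
        next_processed = kernel(loaded) if loaded is not None else None
        next_loaded = regions[step] if step < n else None
        schedule.append(
            (index_or_idle(step), index_or_idle(step - 1), index_or_idle(step - 2))
        )
        processed, loaded = next_processed, next_loaded
    return output, schedule


out, sched = run_pipeline([1, 2, 3, 4, 5, 6], 2, lambda region: [2 * x for x in region])
print(out)
for load, core, leave in sched:
    print(f"load={load} process={core} output={leave}")
```

With three regions the simulation takes five steps: two to fill and drain the pipeline, and one step in the middle where all three activities are busy at once. For longer images the fill and drain steps become negligible, so input/output time is largely hidden behind processing.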

"A main advantage of the processor is that it produces a linear increase in performance with the increase in the number of processors cascaded together," says Akerib. "The execution time of each instruction is independent of the number of processors, and the instruction is executed in all processors concurrently. If there were four cascaded Xium-2 processors in the system, the performance would be four times faster," he adds.

In addition to developing the Xium-2 processor, Akerib has designed a Xium-based video-capture and processing board that serves both as a development platform and as a reference design for system integrators. The PCI-based board incorporates as many as four Xium-2 processors, program memory, frame buffers, and control logic. Associated compilers, assemblers, and simulators allow developers to use the C++ programming language to develop applications and evaluate system performance. For more information, contact Akerib at avidan@asp.co.il.

To perform parallel image processing using the Xium-2 processor, the associative memory is split into two independent sections. The first-in, first-out (FIFO) section serves as the video input/output array. The other, the associative-array section, performs all of the associative computing.
