Imaging vendors put G4 processor on PCI boards
Andrew Wilson, Editor
To increase image-processing performance, many board-level vendors have optimized their PC-based software offerings for Intel's MMX instruction set. Realizing the benefits of such an approach, companies such as Datacube (Peabody, MA) and Coreco Imaging (St. Laurent, Quebec, Canada) now offer image-processing boards with on-board Pentium processors and optimized image-processing software.
FastVision is a PCI-based image-processing board with on-board FireWire, Channel Link, and packet-switched interface ports. Using dual G4 processors, the board is capable of a peak 6.4-GOPs performance on 8-bit video data.
During the past few months, however, a number of companies including Alacron (Nashua, NH), Matrox (Dorval, Quebec, Canada), and Mercury Computer Systems (Chelmsford, MA) have shunned this approach. Instead, they have introduced PCI-based signal- and image-processing products using the MPC7400 (or "G4") processor from Motorola (Austin, TX).
"The trend to move away from Pentium-based on-board processing to RISC-based microprocessors with on-board vector extensions," says Joe Delfino, Alacron director of sales and marketing, "stems from a number of performance issues. These include Intel's need to maintain software compatibility with older x86-based processors, the number of registers available to the programmer, the efficiency of the instruction set and available programming and development tools," he says.
Alacron's latest PCI-based image-processing board, the FastVision, harnesses two 400-MHz AltiVec-enabled G4 processors, each with 64 to 128 Mbytes of SDRAM (expandable to 528 Mbytes), and is capable of more than 2.6 to 3.6 GFLOPs. On-board Channel Link, FireWire, and 64-bit I/O interfaces allow the board to be connected to a variety of cameras.
"The biggest problem with MMX and MMX2 (also known as Katmai's New Instructions; KNI) is that the x86 only supports 8 x 128-bit registers compared with Altivec's 32 x 128 bit registers," says software engineer David K. Every (dke@MacKiDo.com). "Since "load/stores" and being register-starved are things you cannot work around, algorithm speed is penalized by using KNI's fewer registers," he says.
Some vendors, such as Coreco Imaging, take issue with analyzing a microprocessor's architecture and then extrapolating from those data to judge a system's performance. "Although there are differences in the register architecture of the two processors," says Yvon Bouchard, manager of the applications group at Coreco, "other factors such as how well the CPU manages RAM access, pipelines instructions, and the configuration of the processor's chipset may be more important than just the number of on-board registers."
Performance is also affected by the efficiency of the processor's instruction set. In image and signal processing, many algorithms use some form of multiply-and-add instruction. In the AltiVec processor, this single instruction is performed in one cycle.
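The multiply-and-add pattern can be sketched in portable C. This is only an illustration under stated assumptions: `vec4f`, `vec4f_madd`, and `dot_madd` are hypothetical names, and the four-lane struct merely emulates a 128-bit register holding four 32-bit floats. It does not use real AltiVec or KNI instructions.

```c
#include <stddef.h>

#define LANES 4  /* four 32-bit floats fill one 128-bit register */

/* Hypothetical stand-in for a 128-bit vector register. */
typedef struct { float lane[LANES]; } vec4f;

/* Combined multiply-add on every lane: a * b + c.
   On hardware with a multiply-add instruction this is one operation;
   without one, the multiply and the add are issued separately. */
vec4f vec4f_madd(vec4f a, vec4f b, vec4f c)
{
    vec4f r;
    for (int i = 0; i < LANES; i++)
        r.lane[i] = a.lane[i] * b.lane[i] + c.lane[i];
    return r;
}

/* Dot product of two arrays whose length n is a multiple of LANES.
   Each loop iteration performs one multiply-add per group of four elements. */
float dot_madd(const float *x, const float *y, size_t n)
{
    vec4f acc = {{0.0f, 0.0f, 0.0f, 0.0f}};
    for (size_t i = 0; i < n; i += LANES) {
        vec4f a, b;
        for (int k = 0; k < LANES; k++) {
            a.lane[k] = x[i + k];
            b.lane[k] = y[i + k];
        }
        acc = vec4f_madd(a, b, acc);
    }
    /* Horizontal sum of the four partial results. */
    return acc.lane[0] + acc.lane[1] + acc.lane[2] + acc.lane[3];
}
```

Filters, correlations, and transforms reduce to loops of this shape, which is why the cost of the multiply-add step matters so much to image-processing throughput.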
"Unfortunately," says Every, "KNI does not have a multiply and add instruction, and a multiply instruction must be followed by an add. So the KNI is at least half as fast as AltiVec." Intel's MMX is also compatible with processors without MMX. "To accomplish this," says Every, "Intel could not add new registers but "shared" them with the floating-point unit. Every time the CPU switches between floating-point and MMX mode, the computer stalls while saving the processor's states," he says.
"Although this may be true," says Philip Colet, director of marketing at Coreco, "what is more important is how fast a board can perform an image-processing application. In a typical machine-vision application, he says, image filtering may be followed by blob analysis and image classification. Although image filtering may be faster on an AliVec machine, the scalar nature of blob analysis may be more suited to the Pentium. "Overall system performance across image-processing applications should remain the benchmark for these processors," he says.
Already, independent benchmarks are verifying the effectiveness of the AltiVec architecture in image processing. To benchmark the processor, Nicholas Coult, an industrial postdoctoral member of the University of Minnesota (Minneapolis, MN), has used the Hierarchical INTegration (HINT) computer-performance analyzer from Technology Labs (Clear Lake, IA) to measure the performance improvement that AltiVec provides.
"On modified HINT floating-point benchmark code, the use of Altivec instructions over code that is not optimized results in a 58% peak performance improvement," he says. Apple Computer's own benchmark of image-processing code confirms these benchmarks (see developer.apple.com/hardware/ve/summary. html). In performing matrix multiplies, for example, the AltiVec-enabled PowerPC takes half as many clock cycles.
Programming tools are also an important consideration. According to Every, Motorola's C-like compiler, emulators, and free source-code libraries make it far less costly to develop code for AltiVec than for KNI. Hardware vendors are also offering tools of their own. For its FastVision board, for example, Alacron offers CodeWarrior, a GUI-based development environment from Metrowerks (Austin, TX). SAGE, an application-development tool codeveloped with Honeywell Space Systems (Clearwater, FL), allows multiprocessor applications to be developed on a single Windows NT or Sun Solaris workstation and then mapped across multiple processors.