FPGA-based algorithms give depth

By Andrew Wilson

Adding depth perception to machine-vision systems is important in applications such as mobile robot navigation, people tracking, gesture recognition, targeting, and 3-D surface visualization. While the algorithms for performing depth perception are widely known, their computational demands often yield slow response times even when executed on today’s fastest computers.

To address this, Focus Robotics (Hudson, NH, USA; www.focusrobotics.com) has introduced its nDepth Vision processor, an OEM subassembly that allows system designers to inexpensively add depth measurements to their systems. According to cofounder Andrew Worcester, the nDepth processor provides 752 × 480 pixels of depth information at a rate of up to 60 frames/s. “This is approximately 30 times faster than using conventional depth perception software running on a 3-GHz Pentium processor,” he says.


Focus Robotics FPGA development depth perception evaluation system features two off-the-shelf CMOS imagers interfaced to an FPGA development board from Xilinx.
Click here to enlarge image

At the recent Robobusiness Conference (May 2005; Cambridge, MA, USA; www.roboevent.com), Focus Robotics unveiled a prototype of the nDepth processor systems that consists of two MT9V022 CMOS microcameras from Micron Technology (Boise, ID, USA; www.micron.com) and a PCI-33 Xilinx Spartan FPGA development board from Xilinx (San Jose, CA, USA; www.xilinx.com).

To digitize images from the cameras, image data are transferred over two LVDS links directly to the Xilinx Spartan 3. Image-sensor control is provided by an I2C interface from the FPGA. The development board’s FPGA, in turn, provides an interface to both USB-based peripherals and additional FPGAs, if required. “To perform image depth measurement,” explains Worcester, “a technique called computational stereo vision (CSV) is used.”

In CSV, a single point in 3-D space projects to two unique camera image locations, left and right. As it is possible to locate these corresponding points in the camera images, the 3-D location of the single point in space can be computed by triangulation. To perform this triangulation, the position and orientation, lens distortion, focal lengths, and optical centers of each camera must be known.

“By preprogramming camera setup information in the FPGA using a checkerboard calibration target,” says Worcester, “it is possible to remove radial distortion and rectify each new image in real time.” Finding the points in the left and right images that correspond to the same physical point in space is called the correspondence problem. “From a theoretical perspective,” says Worcester, “no general solution can exist given the ambiguity which results from texture-less regions, occlusion, and specularities that may exist within images. And from a computational standpoint, trying to match each of the pixels in one image to each of the pixels in the other image is extremely difficult because of the massive number of comparisons required.”

To reduce this computational demand, the nDepth processor first rotates and aligns left and right images so that only a search along one horizontal scan line is required for a corresponding pixel. This image rectification image also reduces any false matches that may occur. “To increase the likelihood of a correct match, the processor examines 9 × 9-pixel regions around each pixel in the left image and searches for the best matching region of equal size in the right.

To compare matches, the sum of absolute differences algorithm is used. Because the horizontal distance (or disparity) searched across the image is inversely proportional to distance, the greater the matched disparity, the closer the object is to the cameras.

“This same algorithm is used by NASA’s twin Mars Exploration Rovers-Spirit and Opportunity,” says Worcester. “However, while the rovers’ on-board general-purpose processor can only achieve a few frames per second on 256 × 256 images, the nDepth processor is capable of 30 frames/s on 752 × 480 images with 92 disparity levels.”

To further reduce errors, the processor also computes correspondence between the right image and left image. If the results differ, they are marked as invalid. This is effective at object borders and occlusions where points are only visible in one camera and no triangulation is possible. In total, the nDepth pipelined processor is capable of 2 billion pixel-disparity operations per second.