Programmable embedded processor optimizes feature-extraction algorithms

With support from the National Science Foundation and the Gigascale Systems Research Center, a team from the University of Michigan has developed a multicore processor specifically designed to increase the speed of feature-extraction algorithms.

Programmable EFFEX embedded processor optimizes feature extraction algorithms
Programmable EFFEX embedded processor optimizes feature extraction algorithms

With support from the National Science Foundation (NSF; Arlington, VA, USA) and the Gigascale Systems Research Center (GSRC; Princeton, NJ, USA), Silvio Savarese of the University of Michigan (Ann Arbor, MI, USA) and his colleagues have developed a multicore processor specifically designed to increase the speed of feature-extraction algorithms.

Announced at the June 2011 Design Automation Conference (DAC) held in San Diego, CA, the EFFEX processor uses a programmable architecture, making it applicable to a range of feature-extraction algorithms such as the Features from Accelerated Segment Test (FAST), the Histogram of Oriented Gradients (HOG), and the Scale Invariant Feature Transform (SIFT).

“These algorithms represent a tradeoff of quality and performance, ranging from the high-speed, low-quality FAST algorithm to the very high-quality and computationally expensive SIFT algorithm. In addition, these algorithms are representative of the type of operations typically found in feature-extraction algorithms,” says Savarese.

Although these algorithms are complex, many of the operations used within them are repetitive, allowing them to be optimized by a combination of functional units and parallel processing.

For example, one of the major computational tasks performed in the FAST algorithm is corner location—performed by comparing a single pixel to a number of surrounding pixels—a task that lends itself to implementation in a hardware-based multiplier accumulator (MAC). Since SIFT finds image features by applying a difference of Gaussian function in scale-space to a series of smooth and resampled images, this also requires multiple convolutions, and its speed can also be increased using a MAC.

Similarly, the most computationally intensive function within the HOG algorithm is building image feature descriptors using a histogram of the gradients of pixel intensities within a region of the image, a function whose speed can also be increased in hardware.

To exploit the parallelism inherent in these algorithms, the EFFEX processor consists of one complex core surrounded by a variable number of simple cores in a mesh network (see figure). While the complex core is used to perform high-level system tasks, the simple cores are dedicated to the repetitive tasks associated with pattern-recognition algorithms.

Three application-specific functional units are incorporated into the processor to speed the feature-extraction tasks. These include a one-to-many compare unit, a MAC used for convolution, and a gradient unit.

“Since searching for feature point locations in algorithms such as FAST and SIFT involves comparing a single pixel location and its neighbors across an entire image, this processing can be performed by the one-to-many processing unit in hardware and in parallel using an instruction extension of the EFFEX processor,” says Savarese. This simple one-to-many compare unit can also be used for histogram binning by comparing a value to the limits of the histogram bins.

Convolution functions used in SIFT and HOG are similarly performed in parallel using the convolution multiply accumulate (CMAC) unit. To speed the gradient computation functions associated with SIFT and HOG, the processor employs a single-instruction multiple-data (SIMD) gradient processor that computes the gradient of an image patch using a Prewitt version of the Sobel convolution kernel.

To benchmark the processor, Savarese and his colleagues simulated the architecture with a single complex core based on a 1-GHz ARM A5 (with floating point) model and eight specialized simple cores.

For the SIFT, HOG, and FAST algorithms, the resulting speed increases over a single ARM core running at the same speed were 4x, 12x, and 4x faster, respectively. At present, Savarese is looking to expand the processor’s functionality to include the machine-learning-based algorithms used in analyzing extracted features.

--By Andrew Wilson, Vision Systems Design

More in Boards & Software