Combining scene understanding and object recognition

While many object-recognition systems identify single objects, scene-understanding systems can label regions within images. Although recent object-recognition research has concentrated on recognizing complex objects through deformable models, few available systems combine machine learning with computer vision.

To address this problem, researchers Craig Dillon and Terry Caelli of the Department of Computer Science, Curtin University of Technology (Perth, Australia) have developed a general-purpose scene-understanding and object-recognition system named Cite. Using supervised incremental learning to construct a hierarchical knowledge base, the software hierarchically interprets each scene and segments each image in the scene.

"Cite combines hierarchical knowledge representation with incremental learning of object descriptions and relationships to provide an integrated scene-understanding and object-recognition system," says Dillon. In operation, the system stores information in three structures: a knowledge base, a scene interpreter, and a visual interpreter. The knowledge base contains a hierarchy of parts, views, and types that is learned incrementally from a series of training scenes; after each scene is processed, incorrect classifications are corrected through supervised learning.
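To make the incremental, supervised loop concrete, here is a minimal Python sketch of a hierarchical knowledge base that learns scene by scene. It is not Cite's published method: the `KBNode` class, the numeric feature vectors, and the nearest-prototype classifier are all hypothetical stand-ins chosen for illustration. A node learns a new example only when the supervisor's label shows that the current classification was wrong.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class KBNode:
    """One node in a hypothetical parts/views/types hierarchy."""
    name: str
    level: str  # assumed levels: "root", "type", "view" or "part"
    children: list[KBNode] = field(default_factory=list)
    examples: list[tuple[float, ...]] = field(default_factory=list)

    def add_child(self, child: KBNode) -> KBNode:
        self.children.append(child)
        return child

    def prototype(self) -> tuple[float, ...]:
        # Mean of the training feature vectors stored at this node.
        n = len(self.examples)
        return tuple(sum(col) / n for col in zip(*self.examples))


def classify(parent: KBNode, features: tuple[float, ...]) -> KBNode | None:
    """Nearest-prototype choice among the children that have examples."""
    trained = [c for c in parent.children if c.examples]
    if not trained:
        return None

    def dist2(node: KBNode) -> float:
        return sum((a - b) ** 2 for a, b in zip(node.prototype(), features))

    return min(trained, key=dist2)


def learn_scene(parent: KBNode, labelled: list[tuple[tuple[float, ...], str]]) -> int:
    """Process one training scene; the supervisor's label corrects each
    misclassification by storing the example at the correct node."""
    by_name = {c.name: c for c in parent.children}
    corrections = 0
    for features, true_name in labelled:
        guess = classify(parent, features)
        if guess is None or guess.name != true_name:
            by_name[true_name].examples.append(features)
            corrections += 1
    return corrections


root = KBNode("world", "root")
pine = root.add_child(KBNode("pine", "type"))
oak = root.add_child(KBNode("oak", "type"))
pine.add_child(KBNode("pine/side-view", "view"))  # deeper levels nest the same way

print(learn_scene(root, [((0.9, 0.1), "pine"), ((0.2, 0.8), "oak")]))     # 2
print(learn_scene(root, [((0.85, 0.15), "pine"), ((0.25, 0.75), "oak")]))  # 0
```

The first scene needs two corrections because the knowledge base starts empty; by the second scene both objects are classified correctly, so no further learning is triggered.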

"Most previous object-recognition systems used a single segmentation algorithm with a fixed set of parameters to segment images. Cite operates on knowledge-driven resegmentation. Therefore, when an object is only partially identified, the software can resegment the appropriate image region using segmentation knowledge stored at each knowledge-base node," he says.
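The contrast between one fixed parameter set and per-node segmentation knowledge can be sketched on a single row of pixel intensities. In this hypothetical Python example, `SegParams.max_step` stands in for whatever segmentation knowledge a knowledge-base node might store, and `is_partial` is a placeholder for a real "only partially identified" test; neither reflects Cite's actual segmentation algorithms.

```python
from dataclasses import dataclass


@dataclass
class SegParams:
    """Hypothetical segmentation knowledge: the largest intensity step
    allowed between neighbouring pixels inside one region."""
    max_step: int


def segment_row(row, params):
    """Split a 1-D intensity row into half-open (start, end) regions."""
    if not row:
        return []
    regions, start = [], 0
    for i in range(1, len(row)):
        if abs(row[i] - row[i - 1]) > params.max_step:
            regions.append((start, i))
            start = i
    regions.append((start, len(row)))
    return regions


def interpret_row(row, global_params, node_params, is_partial):
    """First pass with one fixed parameter set; any region that is only
    partially matched is resegmented with the node's own parameters."""
    result = []
    for start, end in segment_row(row, global_params):
        if is_partial(row[start:end]):
            sub = segment_row(row[start:end], node_params)
            result.extend((start + a, start + b) for a, b in sub)
        else:
            result.append((start, end))
    return result


def wide_is_partial(region):
    # Stand-in for a real match score: treat wide regions as partial matches.
    return len(region) > 4


row = [10, 12, 14, 40, 42, 90, 92]
print(interpret_row(row, SegParams(max_step=30), SegParams(max_step=5), wide_is_partial))
# [(0, 3), (3, 5), (5, 7)]
```

The coarse global pass merges pixels 0 to 4 into one region; because that region is flagged as a partial match, only it is resegmented with the finer node-specific setting, while the rest of the row keeps its first-pass result.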

The scene interpretation is represented as a similarly constructed graph, with nodes representing the objects, and views of objects, found in the scene. Hypothesis links connect nodes in the scene interpreter to nodes in the knowledge base. Each image of the scene, taken from a different camera viewpoint, is represented as a hierarchical segmentation called the visual interpretation. The nodes within each image's segmentation are linked to nodes in the scene interpretation, which represents the current state of the scene contents.
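The three linked structures described above can be pictured with a few small data classes. This Python sketch is a hypothetical model, not Cite's code: the class names, the bounding-box `region`, and the numeric `confidence` on each hypothesis link are assumptions. It follows links from an image segment to its scene-interpretation node and then to the highest-scoring knowledge-base hypothesis.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class KnowledgeNode:
    name: str


@dataclass
class Hypothesis:
    kb_node: KnowledgeNode
    confidence: float  # hypothetical score in [0, 1]


@dataclass
class SceneNode:
    """An object in the scene interpretation, with hypothesis links."""
    label: str
    hypotheses: list[Hypothesis] = field(default_factory=list)

    def best(self) -> Hypothesis | None:
        return max(self.hypotheses, key=lambda h: h.confidence, default=None)


@dataclass
class SegmentNode:
    """A region in one image's hierarchical segmentation."""
    image_id: str  # one visual interpretation per camera viewpoint
    region: tuple[int, int, int, int]  # hypothetical bounding box
    children: list[SegmentNode] = field(default_factory=list)
    scene_links: list[SceneNode] = field(default_factory=list)


def kb_labels_for(segment: SegmentNode) -> list[str]:
    """Follow segment -> scene node -> best hypothesis, recursing into
    sub-segments of the hierarchical segmentation."""
    labels = []
    for scene_node in segment.scene_links:
        best = scene_node.best()
        if best is not None:
            labels.append(best.kb_node.name)
    for child in segment.children:
        labels.extend(kb_labels_for(child))
    return labels


pine, oak = KnowledgeNode("pine"), KnowledgeNode("oak")
obj1 = SceneNode("object-1", [Hypothesis(oak, 0.3), Hypothesis(pine, 0.8)])
# The same scene object seen from two camera viewpoints.
seg_a = SegmentNode("view-A", (0, 0, 32, 64), scene_links=[obj1])
seg_b = SegmentNode("view-B", (10, 4, 40, 60), scene_links=[obj1])
image_a = SegmentNode("view-A", (0, 0, 128, 128), children=[seg_a])
print(kb_labels_for(image_a), kb_labels_for(seg_b))  # ['pine'] ['pine']
```

Because segments from both viewpoints link to the same scene node, evidence from either image resolves to a single shared hypothesis about the object.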

The X11/Unix-based software currently runs only on workstations from Silicon Graphics (Mountain View, CA) and can be downloaded from the World Wide Web at www.cs.curtin.edu.au/~cdillon. According to Dillon, however, plans are underway to port the software to other computer systems.

For more information, contact Craig Dillon and Terry Caelli via e-mail at cdillon@cs.curtin.edu.au and tmc@cs.curtin.edu.au, respectively.