Understanding image understanding
Andy Wilson, Editor at Large
Off-the-shelf imaging products are now being used to develop sophisticated image-processing systems for a variety of applications. While many imaging systems use proven mathematical tools, such as template matching and Fourier analysis, to recognize and analyze images, the newer systems are using such advanced techniques as neural networks and wavelet analysis to understand features within images (see p. 32).
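To see what the first of these proven tools involves, consider template matching by normalized cross-correlation. The sketch below, written in Python with NumPy, slides a template across an image and scores every window; it is a brute-force illustration of the mathematics, not the optimized routine any particular product ships, and the function name and shapes are purely illustrative.

```python
import numpy as np

def match_template(image: np.ndarray, template: np.ndarray) -> tuple[int, int]:
    """Brute-force normalized cross-correlation template match.

    Returns the (row, col) of the best-matching window. Purely
    illustrative; production systems use FFT-based correlation.
    """
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t * t).sum())
    best, best_pos = -np.inf, (0, 0)
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            window = image[r:r + th, c:c + tw]
            w = window - window.mean()
            denom = np.sqrt((w * w).sum()) * t_norm
            if denom == 0:
                continue  # flat window; correlation undefined
            score = (w * t).sum() / denom
            if score > best:
                best, best_pos = score, (r, c)
    return best_pos
```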
Despite rapid advances in solid-state cameras, processors, software, and storage systems, current vision systems are still primitive when compared to the human visual system and cognitive process. Unlike machine-vision systems that perform image-processing functions with computer-based hardware and software, the human visual system performs image recognition in a distributed fashion. Receptors in the human eye separate gray-scale and color images, while the visual cortex provides preprocessing functions such as edge detection. Storing, retrieving, and recognizing images are also performed in a complex, distributed fashion that is not yet well understood.
In machine-vision systems, image sensors are starting to rival the resolution of the human eye. Image processors can perform edge detection and other primitive operations as fast as the visual cortex. And image-storage systems are now capable of storing millions of images. But many components of commercial image-processing systems remain unintelligent, relegating the processing of images to dedicated hardware and software.
In some applications, however, OEMs are recognizing that a distributed approach to machine vision offers a more effective solution. In web-inspection systems, for example, CCD cameras with on-board DSPs or custom-built hardware are being used to preprocess images, reducing the amount of data that the host CPU must process.
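A minimal sketch of that division of labor, assuming a hypothetical on-camera routine (the function name and threshold are illustrative, not drawn from any vendor's firmware) that reduces each scanline to run-length-encoded defect candidates before anything reaches the host:

```python
import numpy as np

def preprocess_scanline(line: np.ndarray, threshold: int = 200) -> list[tuple[int, int]]:
    """Hypothetical on-camera step for a web-inspection system:
    threshold one scanline and return (start, length) runs of
    suspect pixels, so the host CPU sees only defect candidates
    rather than the full line of raw pixel values."""
    mask = line > threshold
    runs = []
    start = None
    for i, flagged in enumerate(mask):
        if flagged and start is None:
            start = i                      # a suspect run begins
        elif not flagged and start is not None:
            runs.append((start, i - start))  # the run ends
            start = None
    if start is not None:
        runs.append((start, len(mask) - start))
    return runs
```

On a mostly clean web, a handful of runs per line replaces thousands of raw pixel values, which is precisely the data reduction an on-board DSP buys.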
Making such sensors intelligent by using on-board CPUs is certainly a step in the right direction. But it is still a long way from a fully distributed image processor. Building such a system requires an approach in which every aspect of the system is fully understood. To emulate the human eye, for example, intelligence could be added to every sensing element on an imaging device in the form of a primitive image processor. Processing functions, such as edge detection, could then be performed very rapidly by using a single-instruction, multiple-data (SIMD) approach.
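To make the SIMD idea concrete, the following sketch expresses a Sobel edge detector as whole-array operations, the software analog of issuing one instruction to a processor at every pixel; NumPy stands in here for the per-element hardware.

```python
import numpy as np

def sobel_edges(img: np.ndarray) -> np.ndarray:
    """Sobel gradient magnitude as whole-array (SIMD-style) operations:
    every pixel is transformed by the same instruction sequence at
    once, mirroring one primitive processor per sensing element."""
    img = img.astype(np.float64)
    # Shifted copies of the padded image stand in for each pixel's neighbors.
    p = np.pad(img, 1, mode="edge")
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[1:-1, :-2] - p[2:, :-2])
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[:-2, 1:-1] - p[:-2, 2:])
    return np.hypot(gx, gy)
```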
To build more sophisticated systems with image understanding, feature-extraction techniques are being incorporated. At present, such techniques are typically implemented in software running on von Neumann architectures. Although such software is proving useful in applications from fingerprint analysis to target recognition, it is most often application-specific and cannot be generalized to other applications.
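As a generic illustration of such software, here is one simple feature extractor, a magnitude-weighted histogram of edge orientations. The descriptor is a textbook choice picked for brevity, not one drawn from the fingerprint or target-recognition systems mentioned above.

```python
import numpy as np

def orientation_histogram(img: np.ndarray, bins: int = 8) -> np.ndarray:
    """Illustrative feature extractor: a histogram of edge orientations,
    weighted by gradient magnitude. Compact feature vectors like this
    are what application-specific recognition software compares."""
    img = img.astype(np.float64)
    gy, gx = np.gradient(img)
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx)  # orientation in radians, -pi..pi
    hist, _ = np.histogram(ang, bins=bins, range=(-np.pi, np.pi), weights=mag)
    total = hist.sum()
    return hist / total if total > 0 else hist
```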
Here, the underlying problem is determining which distributed architecture best fits an image-understanding system. To emulate the human visual process, the architecture must be both trainable and able to store feature sets of images. And it is likely that a neural-network-like approach realizes such a system far better than a von Neumann architecture.
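As a toy illustration of "trainable and able to store feature sets," the sketch below is a single-layer perceptron, the simplest textbook neural network; its learned weight vector is, in effect, the stored feature set. It stands in for, and is far simpler than, the distributed architectures this column has in mind.

```python
import numpy as np

class Perceptron:
    """Minimal trainable classifier: the learned weights, not
    hand-coded logic, carry the system's image knowledge."""

    def __init__(self, n_features: int, lr: float = 0.1):
        self.w = np.zeros(n_features)  # the "stored feature set"
        self.b = 0.0
        self.lr = lr

    def predict(self, x: np.ndarray) -> int:
        return 1 if x @ self.w + self.b > 0 else 0

    def train(self, X: np.ndarray, y: np.ndarray, epochs: int = 20) -> None:
        # Classic perceptron rule: nudge weights toward misclassified examples.
        for _ in range(epochs):
            for xi, yi in zip(X, y):
                err = yi - self.predict(xi)
                self.w += self.lr * err * xi
                self.b += self.lr * err
```

Fed the orientation histograms from the previous sketch, it becomes a crude feature-based recognizer in which training, rather than application-specific programming, determines what the system recognizes.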
But by using von Neumann-like processors and off-the-shelf development tools, system developers are realizing relatively sophisticated vision systems. Because of this success, the development of more sophisticated, distributed imaging architectures has been left largely to research laboratories and classified defense programs. This is unfortunate, because von Neumann machines remain inadequate for building distributed image-understanding systems. Only when such systems are developed in a unified, distributed way will image understanding become truly sophisticated.