Understanding image-segmentation basics (Part 1)

PETER EGGLESTON

In designing automated systems for the interpretation or manipulation of image data, system developers often need to perform software imaging operations, called segmentation, that extract information about the structure of objects and that separate and discern the various parameters of interest within the data. Measurements or attributes of these objects, known as features, can then be calculated and used for defect inspection, quality control, or clinical analysis.
Accordingly, common vision processes deal with the identification of discrete objects within an image. Such processes transform single-pixel representations of the image data into geometric descriptors representing groups of pixel elements. These descriptors, known as objects, take the form of points, lines, regions (blobs), polygons, or other unique representations.

Segmentation techniques are divided into two basic categories: edge-based and region-based (see Fig. 1). Edge-based segmentation is primarily used to look for image discontinuities. The technique is generally applied where changes of gray-level intensity occur in the image. The assumption is that changes occur in the data at the boundary between objects of interest. The output of edge-segmentation schemes can be:

x and y gradient (rate of change): two images are used to represent the edges found, one in the x (horizontal) direction and one in the y (vertical) direction

gradient strength and direction

binary edge map

edge representation.

In contrast, region-based segmentation is used to look for similarities between adjacent pixels. That is, pixels that possess similar attributes are grouped into unique regions. The assumption is made that each region represents one object of interest. Using gray-level intensity is the most common means of assigning similarity, but many other possibilities exist, such as variance, color, and multispectral features (see Fig. 2).
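The region-based idea can be sketched in a few lines. The routine below is an illustrative region-growing sketch, not a method from the article: it starts from a seed pixel and groups 4-connected neighbors whose gray levels stay within a tolerance of the seed value. The function name, tolerance rule, and image values are all assumptions made for the example.

```python
import numpy as np
from collections import deque

def grow_region(image, seed, tol):
    """Group 4-connected pixels whose gray level is within tol of the seed's."""
    h, w = image.shape
    seed_val = float(image[seed])
    region = np.zeros((h, w), dtype=np.uint8)
    region[seed] = 1
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if (0 <= rr < h and 0 <= cc < w and not region[rr, cc]
                    and abs(float(image[rr, cc]) - seed_val) <= tol):
                region[rr, cc] = 1
                queue.append((rr, cc))
    return region

# A dark 2 x 2 patch in a bright field; growing from (0, 0) labels only the patch.
img = np.array([[ 10,  12, 200],
                [ 11,  13, 205],
                [198, 202, 210]], dtype=np.uint8)
mask = grow_region(img, (0, 0), tol=5)
```

Here the similarity attribute is gray-level intensity, per the text; variance, color, or multispectral features would simply replace the tolerance test.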

Most commercial vision systems use region-based segmentation schemes based on pixel-intensity values. These segmentation techniques assume that the objects of interest possess uniform shading and that a significant and constant gray-level change occurs between the object(s) of interest and the background. However, in many vision applications, these assumptions have proven erroneous. Therefore, these techniques are considered fragile and commonly require controlled conditions or human supervision.

Effects of uneven sample illumination, shadowing, partial occlusion, clutter, noise, and subtle object-to-background changes can all contribute to errors in basic segmentation processes. They generally result in false segmentations of the background, partial segmentations of the objects of interest, clumping of objects, or inadequate segmentations. Errors in the segmentation of the data can also result in the calculation of erroneous features. Therefore, it is essential that the segmentation method chosen support the final processing goals of the vision system.

Some applications, such as robotic vision, can endure a few errors in the segmentation process, as long as the objects are classified correctly. In other applications, such as clinical analysis, exact feature measurements might be critically important and may dictate the use of a sophisticated and computationally expensive segmentation algorithm.

Thresholding

Thresholding is perhaps the most common segmentation technique and is the most basic region-segmentation technique. The technique separates pixels into background and foreground (object of interest) classes based upon their similarities in gray-level intensity. To implement this technique, a threshold (T) value is chosen. Every pixel in the image is then compared to the T value. Each pixel is given a region label of "0" (background) if the pixel value is less than or equal to T or "1" (foreground) if greater than T. This form of region segmentation results in a binary image, in which each region is either white (1) or black (0). Many variations exist within the general concept of segmentation by thresholding, which will be discussed in a future column.
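The comparison described above amounts to one test per pixel. The following sketch shows the idea; the image values are made up for illustration:

```python
import numpy as np

def threshold(image, t):
    """Label pixels 0 (background) if <= t, 1 (foreground) if > t."""
    return (image > t).astype(np.uint8)

# Hypothetical 4 x 4 gray-level image: dark left half, bright right half.
img = np.array([[ 10,  20, 200, 210],
                [ 15,  25, 205, 215],
                [ 12,  22, 202, 212],
                [ 18,  28, 208, 218]], dtype=np.uint8)

binary = threshold(img, 128)   # 0/1 image: the bright right half is foreground
```

The result is the binary image the text describes: each region is either 1 (white) or 0 (black).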

The threshold value is critical, as it sets the discriminating criterion for the segmentation process. Most often, the threshold selection is arrived at by user interaction with a series of test data. An alternative method is to set threshold points at valleys in the histogram of the pixel data. This makes the threshold process more adaptive and can be used to select multiple threshold points if multiple object classes are desired. However, this method will work only if the objects truly have uniform pixel distributions within the classes, which is most often not a valid assumption.
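One way to sketch the valley-selection idea is to find the two strongest histogram modes and place the threshold at the lowest point between them. The smoothing width and the mode-separation distance below are arbitrary assumptions, and the bimodal test data is synthetic:

```python
import numpy as np

def valley_threshold(image):
    """Place the threshold at the deepest histogram point between two modes."""
    hist, _ = np.histogram(image, bins=256, range=(0, 256))
    smooth = np.convolve(hist, np.ones(5) / 5.0, mode="same")  # light smoothing
    p1 = int(np.argmax(smooth))                  # strongest mode
    masked = smooth.copy()
    masked[max(0, p1 - 32):p1 + 32] = 0          # suppress levels near the first mode
    p2 = int(np.argmax(masked))                  # second-strongest mode
    a, b = sorted((p1, p2))
    return a + int(np.argmin(smooth[a:b + 1]))   # valley between the modes

# Hypothetical bimodal data: 60 dark pixels near 40, 40 bright pixels near 200.
img = np.concatenate([np.full(60, 40), np.full(40, 200)]).astype(np.uint8)
t = valley_threshold(img)
```

On cleanly bimodal data like this the valley is unambiguous; as the text notes, real pixel distributions are rarely this well separated.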

Uneven lighting can be extremely detrimental to this thresholding process. Prefiltering techniques, such as low-stop filtering, can be used to correct uneven lighting conditions (see Vision Systems Design, June 1998, p. 19). Alternatively, local pixel neighborhood-based threshold selection, rather than image-wide or global threshold selection, can be used. In this way, the threshold operator is made adaptive and can account to some degree for changes across the image (see Fig. 3).
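A minimal sketch of such a local, neighborhood-based threshold compares each pixel to the mean of its own window instead of a single global value. The window size and the comparison rule here are illustrative choices:

```python
import numpy as np

def local_threshold(image, size=3):
    """Label a pixel foreground when it exceeds its own neighborhood mean."""
    pad = size // 2
    padded = np.pad(image.astype(float), pad, mode="edge")
    out = np.zeros(image.shape, dtype=np.uint8)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            mean = padded[r:r + size, c:c + size].mean()
            out[r, c] = 1 if image[r, c] > mean else 0
    return out

# A single bright pixel on a dark field is found without choosing any global T.
img = np.zeros((5, 5), dtype=np.uint8)
img[2, 2] = 255
mask = local_threshold(img)
```

Because the criterion travels with the neighborhood, a slow illumination gradient across the image shifts the local means along with the pixel values.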

Whereas thresholding attempts to build regions in the image based upon similarities in the pixels, another popular segmentation technique attempts to identify points in the image where there are dissimilarities, or differences in pixel values. The most popular of these techniques is boundary detection, which locates object boundaries at the points of maximal gradient, or change.

One method used to find these boundaries is to calculate the zero-crossing points in a Laplacian of the Gaussian two-dimensional second-derivative analog of the image data, as proposed by D. Marr and E. Hildreth.1 Because this technique involves filtering out some portion of the high-frequency information in the data, the results obtained with this method may either not yield enough detail due to the smoothing action of the Gaussian or yield too much detail due to noise present in the data.
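The Marr-Hildreth scheme can be roughed out as: build a Laplacian-of-Gaussian kernel, filter the image, and mark sign changes in the response. The kernel size, sigma, and the small-magnitude cutoff below are arbitrary assumptions for the sketch:

```python
import numpy as np

def log_kernel(size=9, sigma=1.4):
    """Discrete Laplacian-of-Gaussian; zero mean so flat areas give no response."""
    ax = np.arange(size) - size // 2
    x, y = np.meshgrid(ax, ax)
    r2 = x**2 + y**2
    k = (r2 - 2 * sigma**2) / sigma**4 * np.exp(-r2 / (2 * sigma**2))
    return k - k.mean()

def convolve2d(image, kernel):
    """Brute-force filtering with a square, symmetric kernel."""
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(image.astype(float), pad, mode="edge")
    out = np.zeros(image.shape)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            out[r, c] = (padded[r:r + k, c:c + k] * kernel).sum()
    return out

def zero_crossings(response, eps=1e-6):
    """Mark pixels where the filtered image changes sign with a neighbor."""
    sign = np.sign(np.where(np.abs(response) < eps, 0.0, response))
    zc = np.zeros(response.shape, dtype=np.uint8)
    zc[:, :-1] |= (sign[:, :-1] * sign[:, 1:] < 0).astype(np.uint8)
    zc[:-1, :] |= (sign[:-1, :] * sign[1:, :] < 0).astype(np.uint8)
    return zc

# Vertical step edge: the zero crossings line up along the step.
img = np.zeros((16, 16)); img[:, 8:] = 255.0
edges = zero_crossings(convolve2d(img, log_kernel()))
```

The sigma of the Gaussian directly controls the trade-off the text describes: a larger sigma smooths away detail, a smaller one lets noise through as spurious crossings.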

Another popular segmentation technique is that of averaging edge detectors, edge templates, or compass masks such as the Sobel, Prewitt, and Roberts operators. This method locates points in the image where significant gradient changes are occurring and uses pixel-neighborhood averaging to reduce the effects of noise. However, the result of applying these operators is not, by itself, an image segmentation. Most often, the application of these operators is followed by a thresholding step such as the one discussed in Fig. 3.
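For instance, a Sobel pass followed by a threshold produces a binary edge map. The 3 x 3 masks below are the standard Sobel coefficients; the function name, threshold value, and step-edge test image are illustrative:

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def sobel_edges(image, t):
    """Gradient magnitude from the two Sobel masks, then a threshold step."""
    pad = np.pad(image.astype(float), 1, mode="edge")
    gx = np.zeros(image.shape)
    gy = np.zeros(image.shape)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            win = pad[r:r + 3, c:c + 3]
            gx[r, c] = (win * SOBEL_X).sum()   # horizontal gradient
            gy[r, c] = (win * SOBEL_Y).sum()   # vertical gradient
    mag = np.hypot(gx, gy)
    return (mag > t).astype(np.uint8)          # binary edge map

# Vertical step edge between columns 3 and 4 is marked; flat areas are not.
img = np.zeros((8, 8), dtype=np.uint8); img[:, 4:] = 200
edges = sobel_edges(img, 100)
```

Note that the intermediate gx and gy arrays are exactly the x- and y-gradient output form listed earlier for edge-segmentation schemes.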

Pattern matching

Various morphological and transform techniques can be used to locate objects of interest through a process of pattern matching. The most common technique uses a binary or gray-level prototype or template. The premise here is that a high degree of correlation exists between the template and the object being matched. While this process may work well for well-controlled conditions such as those in manufacturing identical parts or components, it leaves much to be desired for use in other uncontrolled environments. It is often the case that objects of interest in clinical analysis, security, and medical applications are irregular, ambiguous, and amorphous.

The result of pattern matching is most often considered to be the final result: an identification of the part being matched and a point in the image on which the match is centered. In more sophisticated systems, pattern matching is used to define an area of interest, so that more intensive processing or other segmentation techniques can be applied about the match. The object match can also be used to control subsequent process branching or to set processing parameters (see Fig. 4).
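A gray-level template match of this kind can be sketched as a brute-force normalized-correlation search; the function name and the tiny test pattern are invented for illustration:

```python
import numpy as np

def match_template(image, template):
    """Slide the template; return top-left of the best normalized correlation."""
    th, tw = template.shape
    t = template - template.mean()
    best, best_pos = -2.0, (0, 0)
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            win = image[r:r + th, c:c + tw].astype(float)
            w = win - win.mean()
            denom = np.sqrt((w * w).sum() * (t * t).sum())
            score = (w * t).sum() / denom if denom > 0 else 0.0
            if score > best:
                best, best_pos = score, (r, c)
    return best_pos, best

# Plant the template in an empty scene; the search recovers its position.
img = np.zeros((10, 10))
tmpl = np.array([[1.0, 2.0], [3.0, 4.0]])
img[4:6, 5:7] = tmpl
pos, score = match_template(img, tmpl)
```

Normalizing by the window and template energies makes the score a correlation in [-1, 1], which is why the method degrades gracefully under uniform brightness changes but not under the irregular, amorphous objects the text warns about.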

Part 2 of "Understanding image-segmentation basics" will present more details on segmentation and will look at some interesting variations and combinations of region- and edge-based segmentation techniques.

REFERENCE

1. D. Marr and E. Hildreth, "Theory of Edge Detection," Proc. Royal Society of London, B 207, 187 (1980).

PETER EGGLESTON is senior director of business development, Amerinex Applied Imaging Inc., Amherst, MA; e-mail: eggleston@aai.com.
