3D imaging gives vision systems a different perspective
Numerous methods of implementing 3D vision systems are adding a new dimension to machine vision systems.
Andrew Wilson, European Editor
If there was any doubt left that adding 3D capability to machine vision systems was important, it was quashed by last October’s announcement that Cognex (Natick, MA, USA; www.cognex.com) had acquired both EnShape (Jena, Germany; www.enshape.de) and Aqsense (Girona, Spain; www.aqsense.com). While Cognex gains EnShape’s expertise in 3D pattern projection hardware and software for full-field 3D shape measurement, the company will no doubt also leverage Aqsense’s SAL3D software, which allows developers to manipulate point cloud data and to graphically display, analyze and edit rendered images.
Such acquisitions are important since they demonstrate the interest now being paid by major companies in 3D machine vision. Needless to say, however, while large imaging companies may have developed or purchased important technologies such as stereo imaging, structured light scanning and pattern projection, these are only a few methods of performing 3D imaging.
Figure 1. As an example of a passive stereo 3D measurement system, the Ricoh SV-M-S1 uses two calibrated cameras and stereo triangulation to capture image data. The camera achieves an accuracy of ±1 mm at a measurement distance of 1 m and provides a measurement visual field of 500 × 400 mm and a working distance of between 800 and 1,200 mm.
As Srikanth Varanesi of the Department of Applied Signal Processing at the Blekinge Institute of Technology (Karlskrona, Sweden; www.bth.se) points out in his Master’s Thesis “3D Object Reconstruction using XBOX Kinect v2.0,” the many methods of performing 3D can be classified as either passive or active (http://bit.ly/VSD-KIN).
Passive methods do not require controlled illumination and instead rely on the features of an object to be imaged by a single or multiple cameras under ambient lighting conditions. As such, 3D features of an object may be discerned by methods such as structure from motion, stereo vision, shape from texture and shape from focus.
Active methods, however, employ controlled illumination methods to illuminate the object being scanned and include such technologies as structured light imaging, pattern projection, shape from shading and time-of-flight (ToF) imaging. A review of many of these methods can be found in “Robot Guidance Using Machine Vision Techniques in Industrial Environments: A Comparative Review,” by Luis Pérez of the Fundación PRODINTEC (Gijón, Asturias, Spain; www.prodintec.es) that can be accessed at http://bit.ly/VSD-ROBGUI.
Many of today’s passive 3D imaging systems are implemented with fixed stereo vision cameras that incorporate two factory-calibrated cameras set at a fixed baseline (distance between cameras). Examples of such fixed stereo vision cameras are the Bumblebee2 and Bumblebee XB3 from FLIR (Richmond, BC, Canada; www.flir.com/mv), the Ensenso 3D N and X Series from IDS Imaging Development Systems (IDS; Obersulm, Germany; https://en.ids-imaging.com), the SV-M-S1 3D industrial stereo camera from Ricoh (Tokyo, Japan; www.ricoh.com), shown in Figure 1, and the Scorpion 3D Stinger Camera from Tordivel (Oslo, Norway; www.scorpionvision.com).
In choosing such a camera to perform 3D measurements, systems integrators must consider the application in which it will be used. Because the two cameras have a fixed baseline, capturing two images of the same scene and comparing the positions of corresponding points yields a disparity map: a 2D grayscale image that encodes the difference in horizontal coordinates of matched points, a difference that is inversely proportional to the scene depth at each pixel. Since disparity is proportional to the camera separation, the accuracy of the computed depth increases with a larger baseline. As the camera separation grows, however, it becomes more difficult to find corresponding points in the two images. A mathematical explanation of this concept can be found on the website of National Instruments (Austin, TX, USA; www.ni.com) at http://bit.ly/VSD-NI.
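The inverse relationship described above can be sketched in a few lines (a minimal illustration; the focal length, baseline and disparity values below are hypothetical, not taken from any camera mentioned in this article):

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth Z = f * B / d for a rectified stereo pair.

    focal_px     -- focal length in pixels
    baseline_m   -- distance between the two cameras in metres
    disparity_px -- horizontal pixel offset of a matched point
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px

# Hypothetical values: 1000 px focal length, 10 cm baseline, 25 px disparity
z = depth_from_disparity(1000.0, 0.10, 25.0)
print(z)  # 4.0 -- halving the disparity would double the computed depth
```

Note that a one-pixel matching error matters far more for distant points than for near ones, which is why the larger baseline improves accuracy.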
This stereo correspondence problem may be compounded by the fact that there is little or no texture on the object or that it is highly reflective. If this is the case, then depth information may be difficult or impossible to calculate. In such cases, it is necessary to use stereo cameras in conjunction with a pattern projector to illuminate a known pattern on the object from which corresponding points can be calculated. This is the principle behind the Ensenso camera from IDS that uses projected texture stereo vision to provide a more detailed disparity map and thus more complete depth information of the scene being imaged.
Systems developers wishing to build custom systems to perform 3D imaging may wish to purchase individual cameras to perform the task. To aid such developers, Nerian Vision Technologies (Leinfelden-Echterdingen, Germany; www.nerian.com) has developed an online calculator that helps select the correct lenses for such set-ups, the required focal length, the corresponding field of view, the baseline distance and the expected depth error. The calculator can be accessed at: http://bit.ly/VSD-NER.
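The kind of estimate such a calculator produces follows from differentiating the depth equation Z = fB/d with respect to disparity. The sketch below is a generic first-order model with illustrative numbers, not Nerian's actual formula:

```python
def expected_depth_error(focal_px, baseline_m, depth_m, disparity_err_px=0.25):
    """First-order depth uncertainty dZ ~ Z^2 / (f * B) * dd, obtained by
    differentiating Z = f * B / d. The default matching accuracy of a
    quarter pixel is an assumed, illustrative value.
    """
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_err_px

# Doubling the baseline halves the expected error at the same depth
e1 = expected_depth_error(1400.0, 0.10, 2.0)
e2 = expected_depth_error(1400.0, 0.20, 2.0)
print(round(e1 / e2, 1))  # 2.0
```

The quadratic growth of the error with depth explains why stereo systems are usually specified for a bounded working distance, such as the 800-1,200 mm range of the Ricoh camera in Figure 1.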
Figure 2. Pre-calibrated laser/camera systems such as the ECCO family from SmartRay alleviate the need for systems integrators to perform laser/camera calibration.
While the advantage of such an approach is that specific camera baselines, and thus depth accuracies, can be more easily specified, such cameras require calibration software. Images of a known 3D target, such as three orthogonal planes bearing chessboard patterns of black and white squares, are captured, and from these the correspondences between the two cameras are determined (see “Passive 3D Imaging” by Stephen Se and Nick Pears; http://bit.ly/VSD-PASS).
Generating an accurate disparity map demands that specific features be extracted from each corresponding image. To do so, a number of different methods exist. Perhaps the most well-known of these is the Scale Invariant Feature Transform (SIFT), originally conceived in 1999 by Dr. David Lowe, now a Senior Research Scientist at Google (Seattle, WA, USA; www.google.com). Other methods include both feature and texture-based algorithms such as color histogram analysis, the sum of absolute differences (SAD), Features from Accelerated Segment Test (FAST), Speeded Up Robust Features (SURF) and the Kanade–Lucas–Tomasi (KLT) feature tracker.
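Of the listed methods, the sum of absolute differences is the simplest to illustrate. The toy scanline below stands in for one row of a rectified stereo pair; all pixel values are contrived:

```python
def sad(a, b):
    """Sum of absolute differences between two equal-length pixel windows."""
    return sum(abs(x - y) for x, y in zip(a, b))

def best_disparity(left_row, right_row, x, window=3, max_disp=4):
    """Find the disparity that minimises SAD for the window starting at
    column x of the left row, searching leftwards in the right row (points
    in the left image appear shifted left in the right image)."""
    ref = left_row[x:x + window]
    scores = {}
    for d in range(0, max_disp + 1):
        if x - d < 0:
            break
        scores[d] = sad(ref, right_row[x - d:x - d + window])
    return min(scores, key=scores.get)

# Toy scanline: the pattern at column 4 of 'left' appears at column 2 of 'right'
left  = [0, 0, 0, 0, 9, 5, 9, 0, 0, 0]
right = [0, 0, 9, 5, 9, 0, 0, 0, 0, 0]
print(best_disparity(left, right, 4))  # 2
```

Real implementations aggregate such costs over the whole image and add consistency checks, but the per-window search is the same idea.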
Dr. M.M. El-gayar of Mansoura University (Mansoura, Egypt; www.mans.edu.eg) evaluated five feature detection methods (excluding SAD and KLT) and found that Fast SIFT (F-SIFT) has the best overall performance above SIFT and SURF, but detects comparatively few features and therefore few matches (see “A comparative study of image low level feature extraction algorithms,” http://bit.ly/VSD-FEAT).
Shape from motion
Just as feature-detection algorithms can be used to extract features from pairs of fixed stereo cameras, they are also used in single camera implementations. Here, instead of using two cameras, a single camera, often mounted on a robot, is used to position the camera at different points around an object to produce stereo image pairs. To do so, the image taken by the camera at one position must be mapped to the image taken at the second position. This can be accomplished using feature-extraction algorithms, optical flow methods in which the pixels are tracked from one image to another, or by using a combination of both feature extraction and optical flow.
An explanation of these methods and how to implement them using OpenCV can be found at “Structure from Motion and 3D reconstruction on the easy in OpenCV 2.3+” (http://bit.ly/VSD-STRUC). Once these pixels are mapped, the motion of the camera can be determined and thus a disparity map generated from the two images. Such a task has also been implemented by The MathWorks Inc. (Natick, MA, USA; www.mathworks.com) in MATLAB, see “Structure From Motion From Two Views,” (http://bit.ly/VSD-STRUC2) and by Chris Sweeney in the Theia computer vision library (see “Structure from Motion (SfM)” at http://bit.ly/VSD-STRUC3). To date, a number of companies have commercialized such shape from motion techniques. These include Motoman (Miamisburg, OH, USA; www.motoman.com) with its Motosight 3D Cortex Vision system (http://bit.ly/VSD-MOT), Robotic Vision Technology (Silver Spring, MD, USA; www.roboticvisiontech.com) with its eVisionFactory (eVF) and ISRA (Darmstadt, Germany; www.isravision.com) with its Adapted Uncalibrated Robot Automation (AURA) 3D robot guidance system.
“In addition to binocular disparity, shading, texture, and focus all play a role in how shape can be perceived. The study of how shape can be inferred from such cues is sometimes called shape from X, since the individual instances are called shape from shading, shape from texture, and shape from focus,” says Richard Szeliski in his book “Computer Vision: Algorithms and Applications” (http://bit.ly/VSD-ALAP). While generating 3D images by analyzing the texture of an object still appears to be the subject of much research (see “Shape from Texture;” http://bit.ly/VSD-SHAPE), shape from focus methods have, it seems, been limited to microscopy applications (see “Shape from focus: fully automated 3D reconstruction and visualization of microscopic objects;” http://bit.ly/VSD-SHAPE2).
Indeed, the most successful commercial application of the shape from X family is shape from shading. While the original concept of shape from shading is to compute the 3D shape of a surface from a single image of the surface, in practice this yields a number of ambiguous surface normals. Illuminating the object from three or more directions resolves this ambiguity, making such illumination systems a necessity for practical implementations of the technology.
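The three-light variant, often called photometric stereo, reduces to solving a small linear system per pixel: for a Lambertian surface each measured intensity is I = ρ(L·n), so inverting the 3×3 matrix of light directions recovers the scaled normal. A minimal sketch with hypothetical light directions and intensities (a real system calibrates these from hardware):

```python
def solve3(m, v):
    """Solve a 3x3 linear system m @ x = v by Cramer's rule."""
    def det(a):
        return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
              - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
              + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))
    d = det(m)
    result = []
    for i in range(3):
        mi = [row[:] for row in m]
        for r in range(3):
            mi[r][i] = v[r]          # replace column i with the RHS vector
        result.append(det(mi) / d)
    return result

# Three hypothetical light directions (one per row) and the measured
# intensities of a single pixel. Solving L @ g = I gives g = rho * n;
# normalising g recovers the unit surface normal n.
L = [[ 1.0, 0.0, 1.0],
     [ 0.0, 1.0, 1.0],
     [-1.0, 0.0, 1.0]]
I = [0.5, 0.5, 0.5]
g = solve3(L, I)
norm = sum(x * x for x in g) ** 0.5
n = [x / norm for x in g]
print(n)  # for these contrived values the normal points straight along z
```

Repeating this solve at every pixel produces a normal map, which is then integrated into a height field.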
This is the principle behind the DotScan system (Figure 3) from In-situ (Sauerlach, Germany; www.in-situ.de), the LumiTrax system from Keyence (Osaka, Japan; www.keyence.com), and Sirius Advanced Cybernetics’ (SAC; Karlsruhe, Germany; www.sac-vision.de) Trevista system (see “3D expands the dimensions of vision systems;” Vision Systems Design, January 2016; http://bit.ly/VSD-3DEX).
While such systems are active 3D imaging systems (since the lighting used is not ambient), other active systems use laser light sources with which to illuminate an object. Several methods exist to generate a 3D image from such projected light patterns. By using a single structured laser light, for example, projected across the object, the 3D surface shape is extracted by digitizing the reflection of the distorted projected structured light pattern using a CCD or CMOS-based camera. This can be accomplished in a number of ways.
Figure 3. Shape from shading techniques are used by In-situ in its DotScan system designed to recognize embosser malfunctions in the early production of Braille books.
Origin Technologies (Madison, AL, USA; www.origintech.com), for example, has developed a cross-hair imaging method for scanning circular features such as fasteners, using two perpendicular laser stripes to provide two independent scans in two planes and obviating the need for multiple scans.
More commonly, 3D images of such objects are built by using a calibrated camera and laser combination in which a series of profiles is digitized from the reflected laser light as the object moves past the camera/laser system’s field-of-view. By measuring the X, Y and Z co-ordinates, a point cloud can then be generated to render a 3D map of the object’s surface.
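A simple way to digitize each profile is to locate the brightest pixel in every column of the camera image; the sketch below uses a contrived toy image and omits the subpixel refinement (such as a centre-of-gravity fit) a real system would apply:

```python
def laser_profile(image):
    """For each column of a grayscale image (given as a list of rows),
    return the row index of the brightest pixel -- the position of the
    reflected laser line in that column."""
    rows = len(image)
    cols = len(image[0])
    profile = []
    for c in range(cols):
        column = [image[r][c] for r in range(rows)]
        profile.append(column.index(max(column)))
    return profile

# Toy 4x5 image: the laser line sits on row 1 for the first columns and
# drops to row 2 where the surface height changes
img = [
    [0, 0, 0, 0, 0],
    [9, 9, 9, 0, 0],
    [0, 0, 0, 9, 9],
    [0, 0, 0, 0, 0],
]
print(laser_profile(img))  # [1, 1, 1, 2, 2]
```

Stacking one such profile per motion step, and converting row indices to heights via the triangulation calibration, yields the point cloud described above.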
To develop such systems, both the laser and cameras can be purchased separately and calibrated manually or purchased in pre-calibrated systems. Structured laser lights from Coherent (Santa Clara, CA, USA; www.coherent.com), Osela (Lachine, QC, Canada; www.osela.com), ProPhotonix (Salem, NH, USA; www.prophotonix.com) and Z-LASER (Freiburg, Germany; www.z-laser.com) can all be used to provide the structured light in such systems. However, as Wallace Latimer, former Product Line Manager at Coherent points out, such laser line projection systems can be implemented in several different ways, each of which has its own unique characteristics, advantages and disadvantages (see “Understanding laser-based 3D triangulation methods,” Vision Systems Design, June 2015; http://bit.ly/VSD-3DTRI).
Rather than develop such laser/camera systems, pre-calibrated systems that alleviate the need for systems integrators to perform laser/camera calibration are available from numerous vendors. These include the DS1000 Series displacement sensors from Cognex (Natick, MA, USA; www.cognex.com), the L-V Series In-line Profilometer from Keyence (Itasca, IL, USA; www.keyence.com), the Gocator series from LMI Technologies (Burnaby, BC, Canada; www.lmi-technologies.com), the TriSpector1000 Series from SICK (Waldkirch, Germany; www.sick.com), the ECCO family (Figure 2) from SmartRay (Wolfratshausen, Germany; www.smartray.com) and the Scorpion 3D Stinger from Tordivel.
Unlike single line laser/camera combinations where the laser/camera combination captures single laser line reflections as the object or scanner moves, pattern projection methods can be used to capture a complete image in a single scan without motion. Such pattern projection systems can be used where an object is stationary for a known period of time such that spatially varying multiple-cycle patterns can be projected across the object. As the geometric shape of the object changes, the reflected distortion of this structured light pattern on the captured image can then be compared with the known projection pattern and the 3D geometric shape computed.
Numerous pattern-projection methods exist to perform this task, including binary coding that uses black and white stripes to form a sequence of projection patterns, gray-code projection and phase-shift techniques and spatially-varying color patterns. All of these have been well documented by Jason Geng of the IEEE Intelligent Transportation System Society (Rockville, MD, USA; www.ieee-itss.org) in his paper “Structured-light 3D surface imaging: a tutorial,” that can be accessed at http://bit.ly/VSD-STRUC4.
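The gray-code patterns mentioned above number each projector stripe so that adjacent stripes differ in exactly one bit, which limits decoding errors at stripe boundaries to a single column. The encoding and its inverse are short enough to show in full:

```python
def to_gray(n):
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)

def from_gray(g):
    """Invert the Gray code by cumulatively XOR-ing shifted copies."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

# Consecutive codes differ in exactly one bit, so a camera pixel that
# straddles a stripe boundary decodes to one of the two adjacent columns
codes = [to_gray(n) for n in range(8)]
print(codes)  # [0, 1, 3, 2, 6, 7, 5, 4]
assert all(from_gray(to_gray(n)) == n for n in range(1024))
```

In a full system, each bit of the code corresponds to one projected black-and-white pattern, and the camera decodes the stripe index per pixel from the captured sequence.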
To date, a number of different manufacturers have used this technology to develop pattern projection systems. Among these are the RV1100 3D Machine Vision System from Canon (Tokyo, Japan; www.canon.com), the Inspect and Detect products from EnShape and the StereoScan neo color projection system from AICON 3D Systems (Braunschweig, Germany; www.aicon3d.com). To demonstrate the capability of such scanners, Denso (Southfield, MI, USA; www.denso.com) teamed up with EnShape to develop a 3D bin picking system that can recognize and process asymmetrical complex parts within 1.5s, even when they are piled randomly.
Time of flight
Just as stereo vision, shape from X techniques, structured light and pattern projection systems can be used to generate 3D models of an object, so too can ToF cameras. Such active 3D systems incorporate illumination sources that are either pulsed or continuous-wave. While pulsed-light cameras directly measure the time for a light pulse to travel from the illumination source to the object and back, continuous-wave-modulated cameras measure the phase difference between the emitted and received signals. In either case, the distance can then be computed. A comparison of the differences between stereo vision, structured light and ToF systems can be found in Larry Li’s paper “Time-of-Flight Camera – An Introduction” at http://bit.ly/VSD-TOF.
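For the continuous-wave case, the phase-to-distance conversion is a one-line formula; the 30 MHz modulation frequency below is illustrative and not tied to any particular camera:

```python
import math

C = 299_792_458.0  # speed of light, m/s

def cw_tof_distance(phase_rad, mod_freq_hz):
    """Distance from the phase shift of a continuous-wave ToF camera:
    d = c * phase / (4 * pi * f_mod). The light travels out and back,
    hence the factor of two folded into the 4*pi."""
    return C * phase_rad / (4.0 * math.pi * mod_freq_hz)

def ambiguity_range(mod_freq_hz):
    """Beyond c / (2 * f_mod) the phase wraps and distances alias."""
    return C / (2.0 * mod_freq_hz)

# At 30 MHz modulation, a pi/2 phase shift is a quarter of the ambiguity range
print(round(cw_tof_distance(math.pi / 2, 30e6), 3))  # 1.249
print(round(ambiguity_range(30e6), 3))               # 4.997
```

The ambiguity range is why commercial CW cameras trade off modulation frequency (higher frequency gives finer resolution) against unambiguous working distance, or combine several frequencies.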
Several manufacturers offer products that incorporate both pulsed-light and continuous wave imaging techniques. Among those that supply pulsed light ToF cameras are Advanced Scientific Concepts (Santa Barbara, CA, USA; www.advancedscientificconcepts.com) with its Peregrine 3D Flash LIDAR camera, odos imaging (Edinburgh, Scotland; www.odos-imaging.com) with its real.iZ-1K-VS vision system and the ToF Camera (shown in Figure 4) from Basler (Ahrensburg, Germany; www.baslerweb.com).
Microsoft (Redmond, WA, USA; www.microsoft.com), a pioneer in the development of continuous wave 3D cameras with its Kinect system, also has a number of competitors, most notably ifm efector (Exton, PA, USA; www.ifm.com) with its O3D Smart Sensor, which uses a ToF imager from PMD Technologies (Siegen, Germany; www.pmdtec.com). Other companies that offered such products include Fotonic (Stockholm, Sweden; www.fotonic.com), recently acquired by Autoliv (Stockholm, Sweden; www.autoliv.com); according to Fotonic, the company will not develop or launch any further products. Likewise, the SR4000/5000 systems from Mesa Imaging (Zurich, Switzerland; www.mesa-imaging.ch), recently acquired by Heptagon (Santa Clara, CA, USA; www.hptg.com), were discontinued last month.
Figure 4. Among those that supply pulsed light time-of-flight cameras is Basler with its Time-of-Flight (ToF) Camera. The camera has an integrated light source with eight LEDs that operate in the 850 nm range.
According to Radu Horaud and his colleagues at INRIA (Montbonnot Saint-Martin, France; www.inria.fr), the accuracy of depth measurements of such ToF devices depends on multiple factors, such as the surface properties of the scene objects, illumination conditions and frame rate. In his excellent paper, “An Overview of Depth Cameras and Range Scanners Based on Time-of-Flight Technologies,” Horaud compares many ToF systems, existing designs, prototypes and commercially available devices, and discusses the benefits and challenges of combined ToF and color camera systems (http://bit.ly/VSD-TOF2).
With the interest now being shown in 3D imaging, it is no wonder then that large corporations such as Cognex are investing in this technology. Cognex, however, is not alone. Last year, for example, the assets of German manufacturer DAVID Vision Systems were acquired by HP (Palo Alto, CA, USA; www.hp.com), which now offers a line of 3D scanning systems (http://bit.ly/VSD-HP3D). More importantly, many more international and domestic suppliers are entering the market (see “TOP 10 low-cost 3D scanners” at http://bit.ly/VSD-TOP103D). With the number of different 3D technologies available increasing and becoming mainstream, imaging software vendors such as Matrox Imaging (Dorval, QC, Canada; www.matrox.com) and MVTec Software (Munich, Germany; www.mvtec.com) will likely find an increasing demand for their products.
Advanced Scientific Concepts
Santa Barbara, CA, USA
Blekinge Institute of Technology
Karlskrona, Sweden
Coherent
Santa Clara, CA, USA
Cognex
Natick, MA, USA
DAVID Vision Systems
(now part of HP)
Denso
Southfield, MI, USA
FLIR
Richmond, BC, Canada
Fundación PRODINTEC
Gijón, Asturias, Spain
Google
Seattle, WA, USA
Heptagon
Santa Clara, CA, USA
HP
Palo Alto, CA, USA
AICON 3D Systems
Braunschweig, Germany
IEEE Intelligent Transportation System Society
Rockville, MD, USA
IDS Imaging Development Systems (IDS)
Obersulm, Germany
INRIA
Montbonnot Saint-Martin, France
LMI Technologies
Burnaby, BC, Canada
Matrox Imaging
Dorval, QC, Canada
Microsoft
Redmond, WA, USA
Motoman
Miamisburg, OH, USA
National Instruments
Austin, TX, USA
Nerian Vision Technologies
Leinfelden-Echterdingen, Germany
Origin Technologies
Madison, AL, USA
Osela
Lachine, QC, Canada
ProPhotonix
Salem, NH, USA
Robotic Vision Technology
Silver Spring, MD, USA
Sirius Advanced Cybernetics (SAC)
Karlsruhe, Germany
The MathWorks Inc.
Natick, MA, USA
For more information about 3D imaging companies and products, visit Vision Systems Design’s Buyer’s Guide at buyersguide.vision-systems.com