At the University of California, Berkeley (www.berkeley.edu), researchers have developed a portable laser backpack that can automatically produce realistic 3-D maps of building interiors. Funded by the Air Force Office of Scientific Research (Arlington, VA, USA; www.wpafb.af.mil) and the Army Research Office (Adelphi, MD, USA; www.aro.army.mil), Avideh Zakhor and her colleagues have also developed sensor fusion algorithms that use cameras, laser range finders, and inertial measurement units to generate textured, photorealistic 3-D models without requiring GPS.
To generate these models realistically, image data from cameras and laser range scanners must be fused with positional and temporal information. To capture image data, the backpack is equipped with three 1.3-Mpixel FireWire cameras from Point Grey Research (Richmond, BC, Canada; www.ptgrey.com). Because the cameras are mounted orthogonally, a 3-D image of the scene can be captured as the user traverses a building.
Because these cameras provide only visual information about the scene, depth information must also be gathered. To accomplish this, three 2-D laser scanners are also mounted orthogonally to provide 3-D point cloud data maps by computing the pitch, yaw, and roll of the backpack. Two vertically mounted 10-Hz URG-04LX 2-D laser scanners from Hokuyo (Osaka, Japan; www.hokuyo-aut.jp) capture pitch and roll data at a 4-m range over a field of view of 240°, while a horizontally mounted 40-Hz Hokuyo UTM-30LX 2-D scanner with a 30-m range and a field of view of 270° supplies the scans from which yaw is estimated using scan-matching algorithms.
The scans are transferred to the host PC on the backpack over a USB interface to perform the angle calculation. The vertical scanners serve the dual purpose of estimating pitch and roll and providing the 3-D geometry needed in a 3-D point cloud.
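As a rough illustration of the first processing step on the host PC, the sketch below converts one 2-D laser scan into Cartesian points in the scanner's own plane. It is not the Berkeley team's code: the function name `scan_to_points`, the evenly spaced beam layout, and the centring of the field of view on the forward axis are all assumptions made for the example. Only the 270° field of view and 30-m range come from the scanner figures quoted above.

```python
import math

def scan_to_points(ranges_m, fov_deg, max_range_m):
    """Convert one 2-D laser scan into (x, y) points in the scanner's plane.

    Assumption for this sketch: beams are evenly spaced across the field of
    view, which is centred on the scanner's forward (x) axis.
    """
    n = len(ranges_m)
    if n < 2:
        raise ValueError("need at least two beams")
    start = -math.radians(fov_deg) / 2.0
    step = math.radians(fov_deg) / (n - 1)
    points = []
    for i, r in enumerate(ranges_m):
        # Skip returns that are missing or beyond the rated range.
        if r is None or r <= 0.0 or r > max_range_m:
            continue
        angle = start + i * step
        points.append((r * math.cos(angle), r * math.sin(angle)))
    return points

# Toy scan: three beams at -135, 0, and +135 degrees; the last is out of range.
pts = scan_to_points([1.0, 2.0, 40.0], fov_deg=270.0, max_range_m=30.0)
```

In this toy scan the 40-m reading exceeds the 30-m range and is discarded, so only two points remain. A real scan would contain many hundreds of beams per sweep.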
Although 2-D and 3-D data can be captured by these cameras and laser scanners, it is also necessary to understand the position of the backpack both temporally and spatially. To compute this position, many systems use global positioning systems to provide the data. Unfortunately, in indoor applications, such systems are ineffective since satellite signal strengths are reduced considerably.
To overcome this limitation, Zakhor and her colleagues have developed a number of localization algorithms, which for the most part use the same sensors used for geometry and texture capture to compute the six degrees of freedom (DoF) movement of the backpack over time, namely x, y, z, yaw, pitch, and roll. To improve the localization obtained from the scanners, however, they also employ an InertiaCube3 inertial measurement unit (IMU) from InterSense (Billerica, MA, USA; www.intersense.com) to provide roll and pitch orientation parameters to the CPU at a rate of 180 Hz.
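To make the six-DoF pose concrete, the following sketch shows one common way to apply such a pose to a point measured in the backpack's frame. The rotation order (yaw about z, then pitch about y, then roll about x) and the function names `rotation_matrix` and `to_world` are assumptions chosen for illustration; the article does not state which convention the Berkeley algorithms use.

```python
import math

def rotation_matrix(yaw, pitch, roll):
    """Return R = Rz(yaw) * Ry(pitch) * Rx(roll) as a 3x3 nested list.

    Angles are in radians. The Z-Y-X order is an assumed convention.
    """
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]

def to_world(point, pose):
    """Map a 3-D point from the backpack frame into the world frame.

    pose is (x, y, z, yaw, pitch, roll): metres for position, radians for angles.
    """
    x, y, z, yaw, pitch, roll = pose
    R = rotation_matrix(yaw, pitch, roll)
    px, py, pz = point
    return (
        R[0][0] * px + R[0][1] * py + R[0][2] * pz + x,
        R[1][0] * px + R[1][1] * py + R[1][2] * pz + y,
        R[2][0] * px + R[2][1] * py + R[2][2] * pz + z,
    )

# Backpack at (1, 2, 0), turned 90 degrees to the left (yaw only).
world = to_world((1.0, 0.0, 0.0), (1.0, 2.0, 0.0, math.pi / 2, 0.0, 0.0))
```

Here a point 1 m ahead of the backpack ends up at roughly (1, 3, 0) in the world frame. Applying the same transform to every laser return, using the pose estimated at the moment of the scan, is what places the individual scans in a common 3-D coordinate frame.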
During algorithm development, a second, navigation-grade IMU, the HG9900 from Honeywell (Morristown, NJ, USA; www51.honeywell.com), equipped with Applanix software (Richmond Hill, ON, Canada; www.applanix.com), was used to generate the ground truth data for localization. Although this system is too heavy and power-hungry to be deployed in a commercial version of the backpack, and requires frequent stops during operation for self-correction, it was used during software development to aid in the choice of sensors and to characterize the performance of the algorithms used for localization and model construction.
To generate a 3-D model, point cloud data from the three laser scanners are first transformed into a 3-D coordinate frame from which a 2-D triangulated surface model can be generated. A texture map is then created by projecting temporally close, projectively transformed images onto the model, resulting in a lifelike image (see figure).
Using a combination of laser scanners, cameras, and inertial navigation systems, researchers at the University of California have developed a backpack that can be used to generate lifelike interiors of buildings.
To align the images accurately on the model, images used for texture mapping are also used to refine the position/localization information of the backpack. Once again, a sensor serves a dual purpose: the cameras both provide texture and help align the final texture-mapped models.
Generating such models is, however, computationally intensive. For example, on an eight-core Intel Xeon CPU with 4 Gbytes of RAM and a QuadFX 4600 graphics card from Nvidia (Santa Clara, CA, USA; www.nvidia.com), running graphical rendering software developed at the University of California, Berkeley, it took up to two hours to compute "loop closure," that is, to detect when the human operator returns to an approximate location that has been visited previously.
This loop closure is performed by identifying spatially close images using a scale-invariant feature transform (SIFT) technique for 100 images representing 3 min of data capture. Excluding the loop closure detection, the run time for generating a 3-D model for a 3-min walk is about 1 hr. Since the backpack is a human-operated system, it can be used to model complex environments such as staircases, where robots or other wheeled systems cannot be used.
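The loop closure search described above can be pictured with a small sketch. It assumes that feature descriptors have already been extracted for each image and looks for image pairs that share many descriptor matches but were captured far apart in time. The nearest-neighbour ratio test, the thresholds, and the names `count_matches` and `loop_closure_candidates` are assumptions for illustration, not details reported for the Berkeley software. The toy descriptors are 2-D for readability, whereas real SIFT descriptors have 128 dimensions.

```python
import math

def count_matches(desc_a, desc_b, ratio=0.8):
    """Count descriptors in desc_a whose nearest neighbour in desc_b is
    clearly closer than the second nearest (a ratio test; assumed here)."""
    matches = 0
    for d in desc_a:
        dists = sorted(math.dist(d, e) for e in desc_b)
        if len(dists) >= 2 and dists[0] < ratio * dists[1]:
            matches += 1
    return matches

def loop_closure_candidates(frames, min_gap_s, min_matches):
    """Return index pairs of frames that look alike but are far apart in time.

    frames is a list of (timestamp_s, descriptors) in capture order.
    """
    pairs = []
    for i in range(len(frames)):
        for j in range(i + 1, len(frames)):
            t_i, d_i = frames[i]
            t_j, d_j = frames[j]
            if t_j - t_i < min_gap_s:
                continue  # consecutive views always look alike; skip them
            if count_matches(d_i, d_j) >= min_matches:
                pairs.append((i, j))
    return pairs

# Toy walk: frame 2 revisits the place seen in frame 0.
frames = [
    (0.0, [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]),
    (60.0, [(50.0, 50.0), (60.0, 50.0), (50.0, 60.0)]),
    (170.0, [(0.1, 0.0), (10.0, 0.1), (0.0, 10.1)]),
]
candidates = loop_closure_candidates(frames, min_gap_s=30.0, min_matches=3)
```

In this toy walk only frames 0 and 2 are flagged as a candidate. An exhaustive comparison like this one grows quadratically: with 100 images it examines at most 100 × 99 / 2 = 4,950 pairs, each involving many descriptor comparisons, which gives some intuition for why this stage is costly.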