Daniel Lau discusses advances in 3-D imaging, structured light illumination, GPUs/CPUs, and image-processing algorithms
Daniel Lau is associate professor at the University of Kentucky, and president/CEO of Lau Consulting. His research interests include 3-D imaging sensors, 3-D fingerprint identification, and multispectral color acquisition and display. He holds a PhD in electrical engineering from the University of Delaware.
VSD: Which aspects of image processing interest you? What current projects are you pursuing in image processing?
Lau: As an undergraduate in Jan Allebach’s class at Purdue University, I learned about halftoning and, because of that, my advisor at the University of Delaware, Gonzalo Arce, pulled me into a related project that evolved into my PhD work. Of course, a strongly related topic to halftoning is print quality assessment, which is what got me involved in digital cameras and optics for the purpose of taking pictures of printed dots on paper.
When I started my work at Kentucky, I needed a research topic that I could get funded through government grants because halftoning is largely limited to industry funding sources—which won’t get you tenure. Instead, I started collaborating with Larry Hassebrook, whose research focus was structured light illumination.
The attacks of 9/11 occurred three weeks after my arrival at Kentucky, so as government funding immediately turned toward homeland security and biometrics, so did my research interest in developing real-time scanning methods for biometric applications. Our early proposals focused on face recognition in 3-D space. Eventually, we dropped the face recognition component and focused purely on sensor development, which led to the Flashscan3D fingerprint-scanning project (www.flashscan3D.com) and to our most recent real-time system.
Separately, I got involved with an ophthalmology professor who was interested in high-throughput screening for drug discovery. That project has led to developing an automated means of making vascular assays, which are basically tiny spheroids of cells responsible for blood vessel growth in the eye. For this project, I’m building a gantry system that searches a Petri dish for assays.
Assays of good size and shape are plucked out of the dish using a steel surgical tube and syringe pump and transplanted to a 384-well plate for later processing. I am building the system from scratch because it was the only way to truly customize the software for analyzing the images at high magnification. It also proved to be a lot cheaper, since a commercial system was about $70K while my system is under $10K.
VSD: In the area of 3-D imaging, what advances in hardware and software have been or will be made to make structured light illumination more practical? How does your work compare with other methods such as stereoscopy and laser scanning?
Lau: Probably the biggest technological improvement for structured light is the introduction of low-cost, high-performance GPU processing as well as multicore CPUs. This is important because structured light is a method of 3-D imaging in which each pixel of the received image is independently processed. An architecture like a GPU works well only when data can be processed in small “chunks” massively in parallel. A chunk would be the same pixel taken over several frames of video, with each chunk independently processed.
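To make the idea of a per-pixel “chunk” concrete, here is a minimal sketch in plain Python. It assumes a common phase-shifting form of structured light in which frame n of N is lit by a sinusoid shifted by 2πn/N; the interview does not say which pattern set Lau's system uses, and the names `wrapped_phase`, `phase_image`, and `synth_pixel` are hypothetical.

```python
import math

def wrapped_phase(samples):
    """Wrapped phase in radians for ONE pixel.

    `samples` is that pixel's intensity in each of N >= 3 frames.
    Assumption: frame n was lit by a sinusoid shifted by 2*pi*n/N.
    """
    n_frames = len(samples)
    s = sum(v * math.sin(2 * math.pi * n / n_frames) for n, v in enumerate(samples))
    c = sum(v * math.cos(2 * math.pi * n / n_frames) for n, v in enumerate(samples))
    return math.atan2(s, c)

def phase_image(frames):
    """Apply wrapped_phase to every pixel of a stack of N frames.

    `frames` is a list of N images, each a list of rows of intensities.
    No pixel reads any other pixel's data, so every call is independent.
    """
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[wrapped_phase([f[r][c] for f in frames]) for c in range(cols)]
            for r in range(rows)]

def synth_pixel(phi, n_frames, offset=128.0, amplitude=100.0):
    """Hypothetical test data: one pixel's samples for a known phase phi."""
    return [offset + amplitude * math.cos(phi - 2 * math.pi * n / n_frames)
            for n in range(n_frames)]

# Recover a known phase from four synthetic samples of a single pixel.
recovered = wrapped_phase(synth_pixel(0.5, 4))
```

Because no pixel depends on its neighbors here, each call could in principle run on its own GPU thread or CPU core.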
As a result, one can process structured light illumination (SLI) video in real time without much headache. Of course, having broken this bottleneck, the new bottleneck becomes the projector and the sensor. Having GigE cameras greatly reduces the barrier to entry for high-speed cameras such as the one we are using in our real-time system, since I’ve almost completely abandoned Camera Link or any frame grabber hardware.
As for smart cameras and FPGAs, they certainly have their advantages as one could easily implement the SLI video processing on an FPGA such that the camera could output phase video, but the effort it takes to train a student on this type of system is very high, not to mention the price of the hardware and software. C-to-FPGA tools certainly help, but not enough so that I would abandon the PC platform that I currently employ.
I think with regard to other 3-D imaging technologies, the choice of one versus another is largely application dependent. Certainly with the introduction of a real-time SLI system such as ours, SLI can now be applied to a much wider array of applications where one might have employed stereo vision. But with difficulties in bright ambient light environments, even our system has limits. So stereo vision becomes the default. But stereo vision does not—nor can I see it ever—achieve the same resolution as SLI.
With regard to laser scanning, I consider it to be the same as SLI, except that SLI illuminates the entire scene versus just along a single plane of light. I think there are applications, such as scanning objects as they travel along a conveyor belt, where laser scanning benefits from simple setup. But I’ve also been in competitions with laser scanners where our SLI system was chosen hands down; again, it is application dependent.
Lau and his colleagues at the University of Kentucky developed a low-cost, real-time, structured-light scanning system. Currently, the projector/camera pair enables them to produce fully volumetric 3-D point clouds at 150 frames/s.
VSD: How do you think this research will impact future generations of image-processing markets such as machine vision?
Lau: Real-time operation has always been our goal since Larry Hassebrook and I started working together; only recently have we achieved true real time at low cost, which by our definition is producing fully volumetric 3-D point clouds at the speed of the camera using commodity hardware. In our setup, we can process 640 × 480-pixel video at up to 250 frames/s, but the camera/projector pair limits us to 150 frames/s (see figure). We do have a projector built from a Boulder Nonlinear Systems (Lafayette, CO, USA) spatial light modulator that will work at up to 1000 frames/s, but this isn’t commodity hardware.
Machine vision benefits greatly from such a system since, in most cases, illumination is tightly controlled. And the cost is extremely low, since we could have disassembled a commodity DLP projector instead of the $25K R&D system we are currently using. A sub-$5K unit is entirely conceivable at retail prices. I think the system will also have a significant impact on work involving human-computer interfacing and biometrics, where methods of recognition benefit from the additional dimension of information now that the data can be acquired at rates greater than 30 frames/s.
VSD: What hardware and software advances in cameras, interfaces, and processors will enable sophisticated algorithms for motion tracking and 3-D modeling to be adopted more rapidly?
Lau: A tremendous amount of work is being done in computer vision that relies heavily on affine transformations and low-level (Harris and edge) feature detection. So I think the integration of Intel Atom-based processing engines inside smart cameras, which can handle these operations, is a major step forward, but the price is way too high at present.
As for the “I wish they made one of these although I might be the only customer” camera, I think that cameras having multiple, reconfigurable data paths would be of great value. What I’m thinking about here is a 1280 × 960-pixel camera that outputs four separate 640 × 480 video signals across four different GigE ports so that you can get a high-speed, 200 frames/s feed. However, these windows need not be the four nonoverlapping quadrants but could be configured to overlap or be split into even and odd rows, for example.
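A rough back-of-the-envelope calculation shows why splitting the sensor across four ports is attractive. The sketch below assumes 8 bits per pixel and a nominal 1 Gbit/s line rate with no protocol overhead; neither figure comes from the interview, and `stream_bits_per_s` and `fits_on_gige` are hypothetical helper names.

```python
GIGE_BITS_PER_S = 1_000_000_000  # nominal line rate; real throughput is lower

def stream_bits_per_s(width, height, fps, bits_per_pixel=8):
    """Raw video data rate in bits per second (assumes uncompressed pixels)."""
    return width * height * fps * bits_per_pixel

def fits_on_gige(width, height, fps, bits_per_pixel=8):
    """True if the raw stream is below the nominal GigE line rate."""
    return stream_bits_per_s(width, height, fps, bits_per_pixel) < GIGE_BITS_PER_S

full_frame = stream_bits_per_s(1280, 960, 200)  # 1_966_080_000 bits/s
one_window = stream_bits_per_s(640, 480, 200)   # 491_520_000 bits/s

print(fits_on_gige(1280, 960, 200))  # False: full sensor exceeds one link
print(fits_on_gige(640, 480, 200))   # True: each 640 x 480 window fits
```

Under these assumptions the full 1280 × 960 sensor at 200 frames/s needs roughly 2 Gbit/s, while each 640 × 480 window needs about 0.5 Gbit/s, which is within reach of a single GigE port.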
I also think that alternate processing architectures should be developed. In particular, I have a PlayStation 3 (PS3) in my lab that we’ve been programming that can do amazing things with video for 3-D imaging, but with the new PS3 slim design, you can no longer run Linux. Hence, there is no source for inexpensive Cell-based processing hardware.
As for software, clearly there needs to be significant improvement in basic programming skills development for multicore architectures. And this goes back to GPU processing being limited to independently processed chunks of data. I want to run a pipeline of processing on the same data in a neighborhood-dependent way with some level of parallelism—but there isn’t much research being done to develop these non-SIMD types of architectures as everyone is so interested in jumping on the GPU bandwagon.
VSD: How will you incorporate such developments into your systems?
Lau: The affine transformations are a basic process in calibrating the camera and projector for an SLI system as well as a stereo system. The more of this fundamental processing that we can move onto the camera, the better. Someone also needs to implement the arctangent function in hardware so that it takes no more processing cycles than a floating-point multiply operation.
As for the multiple data paths idea, we could process larger data sets since we could push the processing onto multiple CPUs. Another thing we could do is bring stereo vision into our SLI systems to start assembling real-time hybrid systems that give us wrap-around properties. The multicore architectures allow us to increase the amount of data we process, thereby allowing for more sophisticated algorithms.
VSD: What algorithms, technology, and software developments do you see emerging in the next five years?
Lau: We will continue to see the number of processing cores inside a single processor package multiply. So the ability to process pixels will skyrocket, but will the bandwidth be there to get the pixels out of the image sensor and into the processor?
As our system demonstrates, we are processing faster than we can get the data over the GigE port. At the same time, while I may be able to increase the bandwidth of my camera connection through 10GigE, that doesn’t mean that my pixel exposure will give me good dynamic range at a high frame rate.
This gets back to my answer to a previous question—if I had reconfigurable data paths, then I could split pixels up into even and odd rows such that the even rows are taken on the rising edge of the clock and the odd rows on the falling edge. I would then send the even row image on one Ethernet line and the odd rows on a second line. Both would be going at some rate F, but by multiplexing frames, I would have an equivalent frame rate of 2F while maintaining an integration/exposure time that is limited by the F rate. Now I can do that with two cameras placed side by side, but there are issues when trying to image objects at varying distances simultaneously, not to mention the cost of two cameras, two lenses, and two power supplies.
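The timing argument can be sketched in a few lines of Python. This is only an illustration of the interleaving idea, not a model of any real sensor: the half-period offset stands in for the rising and falling clock edges, and `split_fields`, `field_timestamps`, and `merged_rate` are hypothetical names.

```python
def split_fields(frame):
    """Split a frame (a list of rows) into its even-row and odd-row fields."""
    return frame[0::2], frame[1::2]

def field_timestamps(rate_f, count):
    """Capture times for `count` even fields and `count` odd fields.

    Each field stream runs at rate_f. The odd stream is delayed by half a
    period, standing in for 'rising edge' versus 'falling edge'.
    """
    period = 1.0 / rate_f
    even = [k * period for k in range(count)]
    odd = [k * period + period / 2 for k in range(count)]
    return even, odd

def merged_rate(rate_f, count=4):
    """Effective rate of the two field streams once interleaved in time."""
    even, odd = field_timestamps(rate_f, count)
    times = sorted(even + odd)
    gaps = [b - a for a, b in zip(times, times[1:])]
    return 1.0 / (sum(gaps) / len(gaps))

# A tiny 4-row frame: rows 0 and 2 go to one link, rows 1 and 3 to the other.
even_rows, odd_rows = split_fields([[0, 0], [1, 1], [2, 2], [3, 3]])
effective = merged_rate(100.0)  # two 100 frames/s streams, interleaved
```

In this sketch each stream still has a full 1/F period between its own captures, so the exposure budget is set by F, while the merged sequence delivers a field every 1/(2F) seconds. Each field carries only half the rows of the full frame.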
VSD: What is your experience in commercializing research-based systems? How will future government funding impact the advancement and deployment of such systems?
Lau: We’ve built scanners for Toyota Manufacturing in Georgetown, KY, as well as for Gentle Giant Studios in Burbank, CA. So we have experience building systems for commercial use. At the same time, we helped establish the company Flashscan3D (Richardson, TX, USA) thanks to our work with the Department of Homeland Security on the fingerprint SLI scanning system.
We have received an STTR award from NASA in collaboration with Boulder Nonlinear Systems and an SBIR award in collaboration with Michigan Aerospace. And we are hoping to find an industry partner interested in commercializing our real-time technology.
With regard to government funding, our work is pretty much focused on fingerprint scanning, but we are trying to branch out into applications such as deceit detection. Also, there is a need for 3-D scanning to generate data for virtual reality systems used in first-responder training and that’s something we want to tap into.