Deep learning for vision processing: The emerging algorithm alternative
As the recent 4-to-1 drubbing of Go world champion Lee Sedol by Google's DeepMind AlphaGo program signifies, artificial intelligence has entered mainstream awareness. It's enabled by the evolution of traditional neural network approaches, the steadily increasing processing acceleration "muscle" of FPGAs, GPUs and dedicated co-processors, and the steadily decreasing cost of system memory. Among the most compelling uses for so-called "deep learning" techniques such as convolutional neural networks is object identification in images, where the approach offers significant advantages over conventional computer vision algorithms.
Traditional rule-based object recognition algorithms require the mathematical modeling and algorithmic coding of a software function capable of reliably identifying a particular object within a still image or video frame. Unfortunately, even if such an approach can be made reasonably reliable under some conditions, the quality of results frequently falls apart under others, for example when the camera’s viewing angle is non-ideal or the ambient lighting is degraded.
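To make that brittleness concrete, here is a minimal sketch of a hand-coded rule, not taken from any real vision library. The rule itself ("the object is any pixel brighter than 200"), the threshold value, the function names and the tiny 4x4 test frame are all assumptions chosen for illustration.

```python
# Hypothetical hand-coded rule: "the object is any pixel brighter than 200".
THRESHOLD = 200  # assumed value, tuned by hand for good lighting


def detect_bright_object(image, threshold=THRESHOLD):
    """Return (row, col) coordinates of pixels the rule labels as object."""
    return [
        (r, c)
        for r, row in enumerate(image)
        for c, pixel in enumerate(row)
        if pixel > threshold
    ]


def dim(image, factor):
    """Simulate degraded ambient lighting by scaling every pixel."""
    return [[int(pixel * factor) for pixel in row] for row in image]


# A 4x4 grayscale frame (0-255) with a bright 2x2 object in the center.
well_lit = [
    [10, 10, 10, 10],
    [10, 240, 240, 10],
    [10, 240, 240, 10],
    [10, 10, 10, 10],
]

found_well_lit = detect_bright_object(well_lit)
found_dim = detect_bright_object(dim(well_lit, 0.5))

print(len(found_well_lit))  # 4: all object pixels found in good lighting
print(len(found_dim))       # 0: the same object is missed at half brightness
```

The rule works only under the lighting it was tuned for. Halving the brightness drops the object's pixels from 240 to 120, below the fixed threshold, and the detection silently fails.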
Conversely, a convolutional neural network tuned to identify a particular object or set of objects self-trains in response to being "fed" a set of reference images. The more examples of the object to be identified, the more accurate the results; variations in lighting, color and perspective can even be computer-generated. And re-tuning the network to identify different objects involves only discarding the existing neural array "weights" and re-training the network with a different set of reference images, versus re-coding a new algorithm from scratch.
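The computer-generated variations mentioned above can be sketched in a few lines. This is an illustrative sketch only. The function names, the brightness range of 0.5 to 1.5 and the use of a horizontal flip as a crude stand-in for a perspective change are assumptions, and the network training step itself is not shown.

```python
import random


def adjust_brightness(image, factor):
    """Scale pixel values to mimic brighter or dimmer lighting, clamped to 0-255."""
    return [[max(0, min(255, int(p * factor))) for p in row] for row in image]


def flip_horizontal(image):
    """Mirror the image left to right, a crude stand-in for a change in perspective."""
    return [list(reversed(row)) for row in image]


def augment(image, copies, seed=0):
    """Generate `copies` synthetic variants of one reference image."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    variants = []
    for _ in range(copies):
        # Assumed lighting range: half to one-and-a-half times the original.
        variant = adjust_brightness(image, rng.uniform(0.5, 1.5))
        if rng.random() < 0.5:
            variant = flip_horizontal(variant)
        variants.append(variant)
    return variants


# A tiny 2x3 grayscale "reference image" (0-255), made up for illustration.
reference = [
    [0, 50, 100],
    [150, 200, 250],
]

# One real example becomes six training examples.
training_set = [reference] + augment(reference, copies=5)
print(len(training_set))  # 6
```

In practice a deep learning framework would apply transformations like these to every reference image before or during training, so that the network sees the object under many more conditions than were actually photographed.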
One of the most exciting aspects of deep learning for vision is the large amount of ongoing development by vibrant open-source communities, along with the extensive open-source contributions by industry. Caffe, for example, is an open-source convolutional neural network framework that originated at and is primarily maintained by UC Berkeley's Vision and Learning Center. As a recent interview I conducted with the development team there made clear, Caffe's embrace by the open-source community and the technology industry is extensive and accelerating.
Extensions and other improvements made by many well-known companies that leverage Caffe are often merged into the framework, to the benefit of all users. Alternative open-source deep learning frameworks include Minerva, Theano and Torch. And don't forget about the company-developed neural network software packages and data sets that are now open source-licensed, such as Baidu's Warp-CTC, Facebook's FAIR software modules and hardware accelerator designs, Google's TensorFlow, Microsoft's CNTK, Yahoo's CaffeOnSpark, and many others.
Reflecting its rapid and pervasive adoption by the computer vision community, deep learning is getting a dedicated day at the upcoming Embedded Vision Summit, a multi-day conference sponsored by the Embedded Vision Alliance. The May 2nd Deep Learning Day begins with the keynote "Large-Scale Deep Learning for Building Intelligent Computer Systems," from Jeff Dean, Senior Fellow at Google Research. Two parallel presentation tracks are then available: a technical tutorial, focusing on designing, implementing and training CNNs, and a set of business insight presentations covering deep-learning-enabled and other computer vision applications and markets.
Also available is a half-day hands-on Caffe/CNN tutorial, taught by the Caffe development team at UC Berkeley. Note, too, that deep-learning-related content will extend beyond May 2 to the remainder of the conference, with a number of talks on deep learning implementation techniques and enabling technologies. The Embedded Vision Summit, an educational forum for product creators interested in incorporating visual intelligence into electronic systems and software, takes place in Santa Clara, California, May 2-4, 2016. Register now, as space is limited and seats are filling up!
Editor-in-Chief, Embedded Vision Alliance