Is deep learning the solution to all computer vision problems?

March 1, 2019

The following blog post is from Jeff Bier, Founder of the Embedded Vision Alliance, Co-Founder & President of Berkeley Design Technology, Inc.

At the May 2017 Embedded Vision Summit, I had the privilege of hearing a brilliant keynote presentation from Professor Jitendra Malik of UC Berkeley. Malik, whose research and teaching have helped shape the field of computer vision for 30 years, explained that he had been skeptical about the value of deep neural networks for computer vision, but ultimately changed his mind in the face of a growing body of impressive results.

There’s no question that deep neural networks (DNNs) have transformed the field of computer vision. DNNs are delivering superior results on recognizing objects, localizing objects within a frame, and determining which pixels belong to which object. Even problems like optical flow and stereo correspondence, which had been solved quite well with conventional techniques, are now finding better solutions using deep learning techniques. And the success of deep learning goes well beyond computer vision, to tasks like speech recognition.

As a result of these impressive successes, deep learning has attracted huge attention and investment, both among researchers and in industry. This focus and investment are accelerating progress both in deep learning algorithms and in techniques to implement these algorithms efficiently, enabling them to be integrated into a growing range of systems, including those with significant cost and power constraints.

This naturally raises the question: If you are incorporating computer vision functionality into your system or application, should you consider anything other than deep learning? In my company’s consulting practice, we’re increasingly hearing from clients who want to solve a computer vision problem using deep learning. But we’ve found that in some cases, other types of algorithms are preferable. Why?

First, the visual world is infinitely varied, and there are an infinite number of ways in which system designers can use visual data. A few of these use cases, like object recognition and localization, are well addressed by published deep learning techniques. So, if your application requires an algorithm to recognize furniture, for example, you’re in luck: You can select a deep neural network algorithm from the published literature and retrain it with your own data set.

But let’s talk about that data set for a moment. Training data is critical to effective deep learning algorithms. Training a DNN typically requires many thousands of labeled training images (i.e., images labeled with the desired output), and many thousands more labeled images for evaluating candidate trained algorithms. And, of course, the nature of this data is important: the training and validation data must represent the diversity of cases that the algorithm is expected to handle. If obtaining enough diverse training data is difficult or impossible, you may be better off with conventional techniques.

Another reason to consider techniques other than DNNs is if you need to perform a computer vision task that hasn’t yet been addressed by a DNN algorithm in the published literature. In this scenario, you could try to use an existing DNN algorithm that was created for another purpose. Or you could try to create a new DNN algorithm tailored to your requirements. Either way, you’re in the realm of research. This can be daunting, because few people and organizations have experience developing novel deep neural network algorithms. And, it’s difficult to know whether you’ll succeed within the available time, effort and computing resources.

When we delve into our customers’ requirements, we often find that what starts out looking like a single visual perception problem can be broken down into several sub-tasks. Often, some of these sub-tasks are a natural fit for DNNs, while others are not. For these projects, a solution that combines DNNs with conventional techniques is often better than trying to force the entire problem into a DNN.

It’s also important to remember that machine learning techniques are many and varied. Long before deep neural networks became popular, other machine learning techniques (such as support vector machines) were being used to good effect on many vision problems, and they remain useful today.

Given the huge investments being made in DNN research and technology, it’s clear that the range of problems for which DNNs are the preferred solution will continue to expand rapidly. Nevertheless, for the foreseeable future, many applications will be best served by conventional techniques (including other forms of machine learning), or by a combination of deep learning and conventional algorithms.

The tradeoffs between conventional, DNN and hybrid (conventional plus DNN) computer vision techniques are among the topics that will be discussed in detail in the presentations, demonstrations and one-on-one conversations with subject experts at the 2019 Embedded Vision Summit, taking place May 20-23 in Santa Clara, California. Over the past six years, the Summit has become the preeminent event for people building products incorporating vision. In 2019, both conventional algorithms and DNN approaches will once again be focus areas for the Summit program. Mark your calendar and plan to be there. Registration is now open on the Summit website.

Jeff Bier
Founder, Embedded Vision Alliance
President, BDTI

About the Author

Jeff Bier | Founder, Embedded Vision Alliance

Jeff Bier is the founder of the Embedded Vision Alliance, a partnership of 90+ technology companies that works to enable the widespread use of practical computer vision. The Alliance’s annual conference, the Embedded Vision Summit (May 20-23, 2019 in Santa Clara, California) is the preeminent event where engineers, product designers, and business people gather to make vision-based products a reality.

When not running the Alliance, Jeff is the president of BDTI, an engineering services firm. For over 25 years, BDTI has helped hundreds of companies select the right technologies and develop optimized, custom algorithms and software for demanding applications in audio, video, machine learning and computer vision. If you are choosing between processor options for your next design, need a custom algorithm to solve a unique visual perception problem, or need to fit demanding algorithms into a small cost/size/power envelope, BDTI can help.

https://www.embedded-vision.com/
