Five steps for building and deploying a deep learning neural network

June 15, 2020
Accelerating machine vision implementation with deep learning is nothing to fear.

Brian Cha

Free tools and training data, easy-to-find tutorials, and low hardware costs mean deep learning is no longer a method available only to researchers or to people with highly specialized skills and big budgets.

This presents both opportunities and threats as new players emerge to disrupt established names and spur innovation. It also allows a machine vision system to do things previously unimaginable. For example, deep learning can recognize unexpected anomalies, a task that is very difficult or nearly impossible with traditional rules-based coding.

Deep learning fundamentals

Deep learning, a subset of machine learning inspired by how the human brain works, takes place in two stages: training and inference.

The training phase involves defining the number of neurons and layers that will comprise the neural network and exposing the network to labeled training data, usually good images of objects that would pass inspection and bad images of objects that would fail.

The neural network then figures out properties of each grade, such as size, shape, color, consistency of color, and so on. Manual definition of these characteristics or programming the parameters of a good or bad product are not required. The neural network trains itself.

In the inference phase the trained neural network, when presented with fresh images, will provide an inference as to the quality of the object and the neural network’s confidence in its assessment.
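As a minimal sketch of these two phases, the example below trains a single "neuron" (a logistic classifier, in pure Python rather than a real framework) on hypothetical labeled feature values, then runs inference on fresh inputs to get a label and a confidence score. The data values are invented for illustration only.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy labeled data: one feature per sample (e.g. a defect score),
# label 1 = pass, 0 = fail. Values are hypothetical.
data = [(0.1, 1), (0.2, 1), (0.3, 1), (0.8, 0), (0.9, 0), (1.0, 0)]

# --- Training phase: the network adjusts its own weights ---
w, b, lr = 0.0, 0.0, 0.5
random.seed(0)
for _ in range(5000):
    x, y = random.choice(data)
    p = sigmoid(w * x + b)
    w -= lr * (p - y) * x   # gradient of the cross-entropy loss
    b -= lr * (p - y)

# --- Inference phase: fresh input -> label plus confidence ---
def infer(x):
    p = sigmoid(w * x + b)
    label = "pass" if p >= 0.5 else "fail"
    return label, max(p, 1.0 - p)   # confidence in the assessment
```

A real vision network has many layers and learns from images, but the two phases work the same way: fit weights to labeled examples, then score fresh inputs.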

Step 1 - Identify the appropriate deep learning function

Four of the most common deep learning tasks include classification, detection and localization, segmentation, and anomaly detection.

Classification involves sorting images into different classes and then grouping images based on common properties, most often into categories of pass and fail. Any item classified pass continues down the production line. An item classified fail does not.

Detection and localization can identify features in an image and draw a bounding box around those features to determine their position and size. This function can provide a more detailed assessment of why an item deserves a fail classification, for example, by detailing the location of the fault.

Related: Deep learning continues growth in machine vision

Segmentation (Figure 1) identifies which pixels in an image belong to which corresponding objects, to determine the context of an object and its relationship to other objects. Advanced driver assistance systems (ADAS) use segmentation routines to identify cars, street signs, or other objects while the car moves.

Anomaly detection functions can identify regions on an image that do not match a pattern. For instance, a deep learning system could process an empty shelf in a grocery store as an anomaly compared to nearby, full shelves, and mark the empty shelf as requiring a restock.
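For the detection-and-localization task above, the standard way to score how well a predicted bounding box matches a labeled one is intersection-over-union (IoU). A short sketch, with boxes given as (xmin, ymin, xmax, ymax) pixel coordinates:

```python
def iou(box_a, box_b):
    # boxes as (xmin, ymin, xmax, ymax); returns intersection-over-union
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)   # overlap corner (top-left)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)   # overlap corner (bottom-right)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(iou((0, 0, 10, 10), (0, 0, 10, 10)))  # identical boxes -> 1.0
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # half-overlapping boxes
```

An IoU of 1.0 means a perfect match; detection benchmarks commonly count a prediction as correct above a threshold such as 0.5.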

Step 2 - Select a framework

A framework, or a toolset used to develop a neural network, usually includes a starter neural network and tools for training and testing the network. Free, easy to use frameworks like PyTorch, TensorFlow, and Caffe2 provide great documentation and include examples to allow novice users to train and deploy neural networks with minimum effort.

PyTorch, an open source solution now maintained by Facebook (Menlo Park, CA, USA), is simple and easy to use and employed in many research projects, but it is not commonly used for large deployments and is fully supported only for the Python programming language.

TensorFlow, by Google (Mountain View, CA, USA), has a large user base supported by good documentation. It offers scalable production and deployment and supports mobile deployment. It has a steeper learning curve than PyTorch, however.

Caffe2, by Facebook, is a lightweight option that translates to efficient deployment. One of the oldest frameworks, Caffe2 has widely supported libraries for convolutional neural networks and computer vision applications and is best suited for mobile devices using OpenCV.

The optimal framework for a task ultimately depends on complexity and required inference speed. The more layers a neural network has, the slower the inference.

Step 3 - Prepare training data for the neural network

The number of images required for training depends on the type of data a neural network will evaluate. Generally, every characteristic, and every grade of that characteristic, that the neural network must assess requires its own set of training images (Figure 2). The more images provided for each category, the more finely the neural network can learn to assess those categories.

For common use cases, free or purchasable pre-labeled datasets that match specific requirements may exist online. Companies such as Cvedia (Arlington, VA, USA) can create synthetic datasets annotated and optimized for neural network training. In the absence of other options, images may need to be produced and labeled in-house. Turning a single image into many through rotation, resizing, stretching, and brightening or darkening can save time.
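Those augmentations are simple array operations. As a dependency-free sketch, the snippet below treats a tiny grayscale "image" as a list of rows and produces rotated, flipped, and brightness-shifted copies; real pipelines would apply the same transforms to full-size images with a library.

```python
def rotate90(img):
    # rotate a row-major image (list of rows) 90 degrees clockwise
    return [list(row) for row in zip(*img[::-1])]

def flip_horizontal(img):
    # mirror each row left-to-right
    return [row[::-1] for row in img]

def brighten(img, delta):
    # shift pixel intensities, clamped to the 0-255 range
    return [[min(255, max(0, p + delta)) for p in row] for row in img]

# a tiny 2x2 grayscale "image" with hypothetical pixel values
img = [[10, 20],
       [30, 40]]

# one source image yields several distinct training samples
augmented = [rotate90(img), flip_horizontal(img),
             brighten(img, 50), brighten(img, -5)]
```

Each transformed copy keeps the original label, multiplying the effective size of the training set at no capture cost.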

Related: What is deep learning and how do I deploy it in imaging?

Several developers in the deep learning market open source their image labeling solutions and share them for free. LabelImg, particularly useful for unlabeled datasets, provides a graphical image annotation tool for labeling objects with bounding boxes within images. Alternatively, third parties can handle the labeling process. Preparing training data can become even more important in light of specific hardware limitations or preferences, as some deep learning tools support only a finite set of hardware.
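LabelImg can save its bounding boxes as Pascal VOC-style XML files. As a small sketch of consuming such annotations in a training script (the filename, object name, and coordinates below are hypothetical):

```python
import xml.etree.ElementTree as ET

# A minimal Pascal VOC-style annotation of the kind LabelImg produces.
# All values here are hypothetical, for illustration only.
annotation = """
<annotation>
  <filename>part_001.jpg</filename>
  <object>
    <name>scratch</name>
    <bndbox><xmin>48</xmin><ymin>102</ymin><xmax>95</xmax><ymax>160</ymax></bndbox>
  </object>
</annotation>
"""

def parse_boxes(xml_text):
    # return a list of (label, (xmin, ymin, xmax, ymax)) tuples
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        coords = tuple(int(bb.findtext(t)) for t in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((obj.findtext("name"), coords))
    return boxes

print(parse_boxes(annotation))
```

A loader like this turns a folder of labeled images into (image, label, box) samples ready for the training step.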

Step 4 - Train and validate the neural network to ensure accuracy

This stage involves configuring and running the scripts on a computer until the training process delivers acceptable levels of accuracy for a specific use case. Separating training and test data ensures a neural network does not accidentally train on data used later for evaluation. Taking advantage of transfer learning, that is, utilizing a pre-trained network and repurposing it for another task, can accelerate this process. A neural network already trained for feature extraction, for example, may only need a fresh set of images to identify a new feature. Frameworks like Caffe2 and TensorFlow provide pre-trained networks for free.
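The train/test separation mentioned above is typically a shuffled split of the labeled dataset before training begins. A minimal sketch, using hypothetical image filenames in place of the real dataset:

```python
import random

# Hypothetical labeled image files; in practice these come from Step 3.
samples = [f"img_{i:03d}.png" for i in range(100)]

random.seed(42)          # fixed seed so the split is reproducible
random.shuffle(samples)

split = int(0.8 * len(samples))               # common 80/20 split
train_set, test_set = samples[:split], samples[split:]

# No sample may appear in both sets, or accuracy numbers are inflated.
assert not set(train_set) & set(test_set)
```

The network trains only on train_set; test_set is held back to measure accuracy on data the network has never seen.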

Several graphical user interface-based software options for neural network training exist, like Matrox Imaging Library X (MIL) from Matrox Imaging (Dorval, QC, Canada), which work with different frameworks and make the training and deployment process very intuitive, even for less experienced users.

Step 5 - Deploy the neural network and run inference on new data

The last step entails deployment of a trained neural network on the selected hardware to test performance and collect data in the field. The first few phases of inference, ideally used in the field to collect additional test data, may provide training data for future iterations.
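At inference time, a classification network's raw output scores are commonly converted into per-class probabilities with a softmax, and the top probability is reported as the confidence described earlier. A sketch, with hypothetical raw scores standing in for a deployed network's output:

```python
import math

def softmax(logits):
    # convert raw network outputs into a probability distribution
    m = max(logits)                           # subtract max for stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores from a deployed pass/fail network
classes = ["pass", "fail"]
logits = [2.2, -0.4]

probs = softmax(logits)
best = max(range(len(classes)), key=lambda i: probs[i])
print(classes[best], probs[best])   # predicted label and its confidence
```

Low-confidence results flagged this way are good candidates for the additional field data collection mentioned above, feeding future training iterations.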

Cloud deployment offers significant savings on hardware cost and the ability to scale up quickly and deploy and propagate changes in several locations. Internet connection issues can cause critical failures, however, and cloud deployment has high latency compared to edge deployment.

Edge deployment on a highly customizable PC suits high-performance applications. Components can be selected to fit a specific application, which makes pricing flexible. Edge deployment on a PC still costs more than other options, however, and the footprint of the needed hardware requires consideration.

Related: Edge device uses inexpensive, off-the-shelf components for deep learning inference

Edge deployment on ARM, FPGA, or inference cameras like the Firefly DL camera from FLIR Systems, Inc. (Wilsonville, OR, USA) requires less power than other options, offers savings in peripheral hardware, and has high reliability. This creates a secure system isolated from other hardware, ideal for compact applications, but it may not handle computationally demanding tasks effectively.

Potential shortcomings of deep learning

Deep learning remains a black box for the most part, making it difficult to explain how a neural network arrives at its decisions. While inconsequential for some applications, companies in the medical, health, and life sciences fields face strict documentation requirements for product approval by the FDA or its counterparts in other regions. In some cases, full awareness of how the deep learning software functions, and documentation of the entire operation in fine detail, may be required.

Optimizing a neural network in a predictable manner can also present a challenge. Many neural networks rely on transfer learning to retrain existing networks, with very little optimization beyond that.

Even minor errors in labeling training data can throw off the accuracy of the neural network. Debugging the problem becomes extremely tedious as it requires reviewing all training data individually to find incorrect labels.

In addition to these shortcomings, logic-based solutions better suit some applications. For instance, logic-based solutions may provide better results than deep learning for well-defined, deterministic, and predictable problems. Typical examples include barcode reading, part alignment, and precise measurements.


Even with these shortcomings, for certain applications the potential benefits of deep learning, such as rapid development, the ability to solve complex problems, and ease of use and deployment, outweigh the negatives. Deep learning also continually improves to address these shortcomings.

Also, with wider adoption, many companies now develop their own neural networks instead of relying on transfer learning, which improves performance and tailors the solution to a specific problem.

Even in applications well-suited for logic-based programming, deep learning can assist the underlying logic to increase the overall accuracy of the system. As a parting note, it's getting easier and cheaper than ever before to get started on developing a deep learning system.

Brian Cha is a Technical Product Manager at FLIR Systems, Inc. (Arlington, VA, USA).
