Software-defined flow helps developers boost performance per watt in embedded vision systems.
Giles Peckham and Adam Taylor
Up to now, one tradeoff of using a programmable system on a chip (SoC) compared with GPU and typical SoC industry alternatives is the amount of hardware expertise required to implement image-processing pipelines and machine-learning inference engines.
To remove that design barrier, new software-defined programming gives software and system engineers the opportunity to use industry-standard libraries and frameworks to create a system model.
Figure 1: Example of agricultural drone in use
This approach increases access to more developers seeking to use programmable SoCs in an effort to boost performance per watt and incorporate the latest neural networks, algorithms, sensors and interfaces into their embedded vision system designs.
Embedded vision systems can be split into two high-level categories: systems that perceive the environment, and systems that perceive the environment and take action. Vision-guided robots and drones fall into the second category, because they must respond to situations such as sensing obstacles and avoiding collisions.
Within the civil space, drones used in commercial, medical, agricultural, broadcast, and law-enforcement applications offer significant advantages, such as reducing cost by providing capabilities that have previously required the use of helicopters (Figure 1).
Drones can also be deployed for custom services, such as the forthcoming Amazon Prime delivery service or the delivery of medical products to remote areas of Uganda. For agricultural applications, drones can use hyperspectral imaging to determine the health of crops.
Such diverse applications demonstrate current trends in the wider embedded vision world. First is intelligence from machine learning. With embedded intelligence, drones are able to extract information from the scene captured by their cameras and act upon that information.
Next are open, high-level languages and frameworks for implementing intelligence. The most commonly used are open-source, multi-platform frameworks: OpenCV for computer vision and OpenVX for cross-platform computer vision acceleration in the embedded vision world, and the Caffe deep learning framework in the machine-learning sphere.
Third is multi-level security to ensure the drone remains operational and uncompromised, which requires security at the device, system and network levels. And the final trend is the ubiquity of embedded vision itself. While vision-guided robots and drones are not yet as ubiquitous as our smartphones, applications using them are experiencing significant growth as developers exploit new use cases.
Figure 2: Stereo Block Matching disparity map generated from two imagers.
Drones typically have three critical subsystems: a real-time motor control system, software-defined radio for bi-directional communication, and an embedded vision system. Typically, the vision system uses cameras with frame rates high enough and processing capability powerful enough such that the drone can react to the scene faster than a human. Moreover, because most vision-guided drones are battery powered, the lower the power consumption, the longer the drone can remain deployed before recharging.
For many applications, sensor fusion is standard. Homogeneous sensor fusion involves employing multiple sensors of the same type, such as stereo or multi-camera vision, where data streams are combined to create a 360° 3D map of the environment around the drone. Heterogeneous sensor fusion refers to applications employing different sensor technologies such as hyperspectral or infrared to observe different bands of the electromagnetic spectrum.
Simultaneous localization and mapping (SLAM) and dense optical flow algorithms provide the platform with an enhanced perception and obstacle-avoidance capability. To further aid platform interaction within its environment, depth perception can be provided using two imagers separated by a small distance and implementing a Stereo Block Matching algorithm.
Coupled with more traditional pattern and object-recognition algorithms, such a platform requires not only significant processing capabilities with a low and deterministic latency, but power-optimization capability and a scalable, future-proof solution (Figure 2).
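The depth-perception step described above can be illustrated with a minimal software sketch of stereo block matching. It is not the accelerated implementation discussed in this article; it assumes rectified grayscale images, a sum-of-absolute-differences (SAD) cost, and a hypothetical function name and parameters (`block_match_disparity`, `block`, `max_disp`) chosen only for illustration.

```python
import numpy as np

def block_match_disparity(left, right, block=5, max_disp=16):
    """Return an integer disparity map using a SAD cost (illustrative sketch).

    Assumes rectified images, so a point at column x in the left image
    appears at column x - d in the right image on the same row.
    """
    left = left.astype(np.int32)
    right = right.astype(np.int32)
    h, w = left.shape
    half = block // 2
    disparity = np.zeros((h, w), dtype=np.int32)
    for y in range(half, h - half):
        for x in range(half, w - half):
            ref = left[y - half:y + half + 1, x - half:x + half + 1]
            best_cost, best_d = None, 0
            # Only test shifts that keep the candidate window inside the image.
            for d in range(0, min(max_disp, x - half) + 1):
                cand = right[y - half:y + half + 1,
                             x - d - half:x - d + half + 1]
                cost = int(np.abs(ref - cand).sum())
                if best_cost is None or cost < best_cost:
                    best_cost, best_d = cost, d
            disparity[y, x] = best_d
    return disparity

# Synthetic check: the right image is the left image shifted by 4 pixels.
rng = np.random.default_rng(0)
left_img = rng.integers(0, 256, size=(12, 24))
right_img = np.roll(left_img, -4, axis=1)
disp = block_match_disparity(left_img, right_img)
```

For a calibrated camera pair, disparity d relates to depth Z through Z = f·B/d, where f is the focal length in pixels and B is the distance between the two imagers, so larger disparities correspond to closer objects.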
Pairing processing power with programmability
The All Programmable Zynq-7000 SoC or Zynq UltraScale+ MPSoC devices are suitable for these applications because they provide high-performance logic coupled with high-performance Arm Cortex-A9 or Cortex-A53 processors, respectively. For real-time control, as may be required for motor control, the Zynq UltraScale+ MPSoC also provides a real-time processing unit that contains dual Arm Cortex-R5 processors capable of implementing a safety processor.
Figure 3: The reVISION Acceleration stack and its constituent frameworks and libraries.
Such a tightly integrated, heterogeneous processing unit enables efficient segmentation of the functionality within either the processor or programmable logic, and makes it possible to perform real-time vision analytics and decision making. Coupled with the programmable logic fabric, such SoCs provide the ability to detect objects in real-time and then classify them using the processor system.
Once classified, the vision-guided robotic system or drone can take appropriate action for the class of object detected. To enable real-time decision making, developers are deploying machine-learning inference techniques.
Executing algorithms in programmable logic
Traditionally, implementing image-processing pipelines and machine-learning inference engines within programmable logic has required an HDL (hardware description language) specialist to replicate the high-level system algorithmic model, increasing development time and cost. The reVISION Acceleration stack for All Programmable Zynq SoC or Zynq UltraScale+ MPSoC devices enables developers to work directly with high-level industry-standard frameworks and libraries to create a system model (Figure 3).
To support this ability, the stack is arranged into three distinct layers:
- Platform layer. This is the lowest level of the stack and is the one on which the remaining layers are built. This layer includes platform definitions of the hardware and software environment. If developers choose not to use predefined platforms, they can generate a custom platform using the Vivado Design Suite.
- Algorithm layer. The SDSoC Design Environment and the platform definition for the target hardware are used to create the application. It is within this algorithm layer that developers can use the acceleration-ready OpenCV functions, along with predefined and optimized implementations for convolutional neural network (CNN) developments such as inference accelerators. Designers can efficiently build these within programmable logic.
- Application layer. The highest layer of the stack development is where developers use high-level frameworks such as Caffe and OpenVX to complete the application.
With reVISION, developers can accelerate several OpenCV functions into the programmable logic within the Zynq-7000 SoC or Zynq UltraScale+ MPSoC device selected. Parallel pipelines enable more responsive image processing. Parallel execution within programmable logic removes the main bottleneck of a GPU-based design, which is transferring data on and off chip to double data rate (DDR) memory. Keeping data within a device also reduces the power dissipation of a solution.
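As a loose software analogy for keeping data inside the pipeline, the sketch below chains two image-processing stages as Python generators so that each row flows from one stage to the next without an intermediate full frame being stored. The stage names and the threshold level are hypothetical, and generators only mimic the dataflow; they say nothing about how the accelerated OpenCV functions are actually built in programmable logic.

```python
import numpy as np

def read_rows(frame):
    """Source stage: emit the frame one row at a time, like a pixel stream."""
    for row in frame:
        yield row

def threshold_rows(rows, level):
    """Stage 1: binarize each row as soon as it arrives."""
    for row in rows:
        yield (row > level).astype(np.int16)

def horizontal_edges(rows):
    """Stage 2: mark horizontal transitions within each binarized row."""
    for row in rows:
        yield np.abs(np.diff(row))

def run_pipeline(frame, level=128):
    """Chain the stages; only the final result is collected into a frame."""
    stream = horizontal_edges(threshold_rows(read_rows(frame), level))
    return np.stack(list(stream))

rng = np.random.default_rng(1)
frame = rng.integers(0, 256, size=(6, 10))
streamed = run_pipeline(frame)

# Frame-at-a-time reference that materializes the intermediate image.
binary = (frame > 128).astype(np.int16)
reference = np.abs(np.diff(binary, axis=1))
```

Both versions produce the same output; the difference is that the frame-at-a-time reference writes and re-reads a complete intermediate image, which is the kind of memory traffic the paragraph above describes.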
Because machine learning inference is often used to enable decision making in autonomous applications, reVISION also makes it possible to implement machine learning inference engines within programmable logic. reVISION can take network and training information directly from a Caffe prototxt file definition of the network. By using programmable logic, designers can accelerate an inference engine and thus provide a more responsive, power-efficient solution (Figure 4).
Accelerating inference engines
Designers achieve a machine learning inference engine within programmable logic using the int8 number system, in which weights and activations are represented as signed 8-bit integers.
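To make the int8 representation concrete, here is a minimal sketch of one common way to map floating-point values to signed 8-bit integers and perform a dot product on them. The article does not state which quantization scheme reVISION uses; the symmetric, single-scale scheme and the function names below are assumptions for illustration only.

```python
import numpy as np

def quantize_int8(values):
    """Map floats to int8 with one shared scale (assumed symmetric scheme)."""
    values = np.asarray(values, dtype=np.float64)
    max_abs = float(np.max(np.abs(values)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(values / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_dot(qa, scale_a, qb, scale_b):
    """Multiply-accumulate on integers in a 32-bit accumulator, then rescale."""
    acc = np.dot(qa.astype(np.int32), qb.astype(np.int32))
    return float(acc) * scale_a * scale_b

weights = [0.5, -1.0, 0.25]
activations = [2.0, 1.0, -4.0]
qw, sw = quantize_int8(weights)
qa, sa = quantize_int8(activations)
approx = int8_dot(qw, sw, qa, sa)
exact = float(np.dot(weights, activations))
```

The integer result differs from the floating-point dot product only by a small rounding error, which is why fixed-point multiply-accumulate hardware can be used for inference.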
The int8 operations use DSP48E2 slices available within the UltraScale+ architecture. These DSP elements provide a performance increase as they are dedicated multiply accumulate blocks designed for performing fixed-point math.
Figure 4: OpenCV functions capable of being accelerated into the programmable logic today
The structure of these DSP blocks also enables resource efficiency as each can perform up to two int8 multiply accumulate operations if they use the same kernel weights. This approach can provide up to a 1.75 times throughput improvement and enables a cost-optimized solution that delivers two to six times increased power efficiency (expressed in giga operations per second per Watt) when compared with competing devices.
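The arithmetic idea behind sharing one multiplier between two int8 products with a common weight can be sketched as follows: the two inputs are packed into a single wide operand, multiplied once by the weight, and the two products are then separated. This is only an illustration of the principle; the 18-bit offset and the function name are assumptions, and the article does not describe how the DSP48E2 slice is actually configured.

```python
SHIFT = 18  # assumed offset, wide enough to keep the two products apart

def packed_dual_multiply(a, b, w):
    """Compute (a*w, b*w) for int8 inputs using a single wide multiplication."""
    for v in (a, b, w):
        if not -128 <= v <= 127:
            raise ValueError("operands must fit in signed 8 bits")
    packed = (a << SHIFT) + b          # both inputs in one wide operand
    product = packed * w               # one multiply: (a*w << SHIFT) + b*w
    # The low SHIFT bits hold b*w in two's complement; sign-extend them.
    low = product & ((1 << SHIFT) - 1)
    if low >= 1 << (SHIFT - 1):
        low -= 1 << SHIFT
    # Removing the low product leaves a*w shifted up by SHIFT bits.
    high = (product - low) >> SHIFT
    return high, low

pair = packed_dual_multiply(-3, 7, 5)
```

The separation works because the product of two signed 8-bit values always fits within the lower field, so the two results never overlap once the sign of the lower product has been corrected for.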
Using reVISION to design image-processing systems for vision-guided robots and drones reduces development time and provides a more responsive, power-efficient and flexible solution.
Compared with a GPU-based solution, an All Programmable Zynq-7000 SoC or Zynq UltraScale+ MPSoC offers one quarter the latency along with a performance and power efficiency increase of up to 4X images per second per Watt for machine learning applications and up to 42X frames per second per Watt for embedded vision.
Due to the remote nature of vision-guided drones and robots, the optimal design solution must also consider security to prevent malicious hacking and unauthorized access to, or modification of, the system and its data.
Although developing a secure design must be considered from the system level downwards, an All Programmable Zynq-7000 SoC or Zynq UltraScale+ MPSoC provides several device and system-level security aspects that can be put into effect.
With these devices, developers can encrypt and authenticate the boot and configuration process, and can make use of Arm TrustZone technology. With TrustZone and a hypervisor, the development team can create orthogonal worlds, a practice that limits software access to the underlying hardware.
Meanwhile, the inbuilt Xilinx analog-to-digital converter (XADC) or System Monitor can be used to monitor device voltages and temperatures along with external parameters to provide for an anti-tamper approach. Depending on system requirements, there are several additional design choices, particularly within the Zynq UltraScale+ MPSoC, that can further strengthen security.
During this time of significant growth, developers of vision systems for vision-guided robotics and drones can optimize performance per watt by using an All Programmable Zynq-7000 SoC or Zynq UltraScale+ MPSoC designed with the reVISION software-defined programming flow. This approach also gives them the opportunity to use industry-standard high-level frameworks and libraries to design a power-efficient, responsive, flexible and secure system. For more information, please visit: http://bit.ly/VSD-REV.