Optimized GigE Vision Implementation Drives High-Speed Imaging Advancements

Aug. 9, 2023
Why RDMA and TCP options for GigE Vision are a step backwards for High-Speed GigE Cameras.

The quest for the optimal interface medium and protocol for transmitting image sensor data from source to destination is one that manufacturers within the machine vision industry have pursued since the first attempts at standardization more than 20 years ago. Over the decades, sensors have evolved: resolutions have grown and clock rates have increased dramatically, resulting in image sensors capable of transmitting data at rates of 100+ Gbps. The evolution of sensor technology has opened up many new applications, but along with the growth in addressable market we’ve seen challenges emerge, not only in transmitting data reliably but also in transmitting it over long distances with latency and jitter low enough, and well enough controlled, to succeed in the most demanding imaging applications.

These challenges have continuously been met with innovation, with industry stepping forward to develop new capabilities or to repurpose existing technology with enhancements that make it suitable for solving the problem of the day. Many of these innovations have traditionally been point-to-point interface solutions, utilizing specialized protocols and relying on purpose-built acquisition boards or grabbers to receive and reassemble transmitted data. Some manufacturers, however, have chosen to invest in high-performance data transmission over Ethernet, a ubiquitous technology that already serves as the basis for GigE Vision, the machine vision industry’s leading camera interface technology. Ethernet enables the use of off-the-shelf cabling, switches, and network interface cards (NICs) and is readily supported by Windows and Linux. Furthermore, Ethernet is incredibly scalable.

GigE Vision Origins

Ratified in 2006, the GigE Vision standard uses a UDP-based protocol called GVSP to facilitate image transfer over Ethernet. As data rates have increased, some manufacturers have had difficulty achieving the required performance with GVSP, particularly when data rates approach 10Gbps or higher. This has led to experimentation with TCP or RDMA protocols to mitigate those difficulties, but in Emergent Vision’s view these options push the GigE Vision standard toward point-to-point protocols like CXP and USB.

Zero Copy Image Transfer

With GigE Vision, the difficulty comes from the receiver having to strip the headers from many Ethernet packets so that the image data can be handed to the application in contiguous form. This can be done in software, but at a large cost: triple the memory bandwidth and higher CPU utilization (this, incidentally, is the baseline RDMA proponents use when comparing traditional GigE Vision with RDMA). We avoid this cost by using the built-in header-splitting features of modern NICs (network interface cards) to perform Zero copy image transfer.
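To make the software cost concrete, here is a minimal Python sketch of the receive path that header splitting avoids. It is only an illustration: the six-byte header layout, the function names, and the payload size are assumptions made for this example and do not reproduce the actual GVSP packet format or any Emergent implementation. The point is simply that, after the NIC has already written each packet to memory, the CPU must read every payload and write it again into a contiguous frame buffer.

```python
import struct

# Hypothetical simplified stream header (NOT the real GVSP layout):
# 16-bit block (frame) id followed by a 32-bit packet index.
HEADER = struct.Struct("!HI")


def make_packets(frame: bytes, block_id: int, payload_size: int) -> list[bytes]:
    """Split a frame into header+payload packets, as a sender would."""
    packets = []
    for index, offset in enumerate(range(0, len(frame), payload_size)):
        payload = frame[offset:offset + payload_size]
        packets.append(HEADER.pack(block_id, index) + payload)
    return packets


def reassemble(packets: list[bytes], frame_size: int, payload_size: int) -> bytearray:
    """Software path: strip each header and copy the payload into place."""
    frame = bytearray(frame_size)
    view = memoryview(frame)
    for packet in packets:
        _block_id, index = HEADER.unpack_from(packet)
        payload = memoryview(packet)[HEADER.size:]
        start = index * payload_size
        # This per-packet copy is the extra memory traffic that
        # NIC header splitting is meant to eliminate.
        view[start:start + len(payload)] = payload
    return frame
```

Because each packet carries its own index, the sketch tolerates out-of-order arrival. With header splitting, the NIC places payloads directly into the application's buffer, so a loop like `reassemble` is not needed on the host.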

GigE Vision with Support for TCP

TCP is one protocol explored by some to improve the performance of GigE Vision. Some even claim it is a guaranteed transfer mechanism, which is false. TCP is not a Zero copy process, so it triples the required memory bandwidth. In addition, TCP is point-to-point, which moves GigE Vision toward CXP and USB and all but eliminates its benefits over those protocols, especially since CXP is adopting the Ethernet physical layer in newer revisions to address its own deficiencies. In all senses, TCP is a non-starter for performance applications.

GigE Vision with Support for RDMA/RoCE

RDMA/RoCE is another protocol explored by some for the same reasons. Some will continue to claim that THIS is now the guaranteed transfer mechanism, which is again false. RDMA is a Zero copy process, which is its primary benefit, but, as with TCP, it is a point-to-point protocol and incurs network overhead to support its connected nature. It is important to remember that RDMA and TCP were really designed for large data transfers on the internet, with many hops through switches and routers and with dropped and out-of-order packets. In machine vision, the systems are closed, with controlled routing if switches are used. A reminder also that TCP and RDMA are far from ratified into the GigE Vision standard, but Emergent will integrate the RDMA addition if and when support is ratified, as this would represent a small effort and would be backward compatible with all existing products we sell and support.

Zero Copy GigE Vision with Mature GVSP Protocol

Zero copy with header splitting is indeed possible with modern NICs from NVIDIA/Mellanox, Broadcom, Intel, and Marvell. Emergent has implementations deployed with NVIDIA/Mellanox and Broadcom NICs, which are the primary NICs explored by those experimenting with RDMA/RoCE, and this eliminates any concerns surrounding interoperability. In fact, Emergent has been using this same method for over 15 years and has the maximum design-in densities of any interface standard, with reliability to match. The same approach is also used for ST2110 in the massive media and entertainment market.

Zero copy does not guarantee Zero data loss in any interface or protocol implementation. Any performance system still needs proper design and margining to achieve the desired results. This goes for CXP, RDMA/RoCE, and even optimized GVSP implementations. But we can guarantee that the optimal GVSP implementation will equal or better RDMA/RoCE without turning GigE Vision into a point-to-point protocol and eliminating what has made GigE Vision the most popular interface over the years. It is important to note that when the retransmission feature of RDMA engages, it is a sign of a backup in the system, which usually also means undesired latency and jitter.

It is also important to note that CXP doesn’t use resends or flow control yet is able to sustain high data transfer rates with optimal receiver performance and low latency and jitter. Much of this can be attributed to adequate buffering on the purpose-built frame grabbers required for CXP. Low-cost NICs often lack sufficient buffering capability; however, modern NICs with ample physical buffering are readily available at cost-effective price points.

It is worth noting that at 25Gbps and higher, PoE (Power over Ethernet) is dead. Thus, new deployments should focus on SFP technologies and distributed power systems. It is also noteworthy that even at 10GigE speeds the big NIC providers do not support PoE, which forces camera vendors to sell their own proprietary card solutions.

GPU Direct – Better than Zero copy

Zero copy minimizes CPU and memory bandwidth utilization by writing to memory only once, but we can avoid that transfer altogether by writing directly to the GPU. This is called GPU Direct. In many performance applications it makes sense to send data directly to the GPU for processing and then pass the lower-bandwidth results to the CPU and memory for user or system interaction. Emergent has been supporting GPU Direct with NVIDIA GPUs on Windows and Linux for over four years in a variety of applications. NVIDIA RTX A6000/A5000/A4000, Orin, and Xavier are used in many applications with Emergent cameras. Unfortunately for RDMA users, NVIDIA/Mellanox allows GPU Direct on Windows only for select partners such as Emergent, and this is the OS where 80% of machine vision applications continue to be deployed. Linux, however, does remain an option for RDMA with GPU Direct for all.

Integrated Interface and Processing Cards for Ultimate Performance

Zero copy is great. GPU Direct improves on this. But the ultimate achievement would be to receive and process the data from the cameras all on one card. In this case, the CPU, memory, and all other server resources are not used at all. Emergent supports AMD/Xilinx Alveo cards for this very purpose and has multiple performance applications leveraging this technology. Emergent is also working closely with NVIDIA to bring BlueField NIC support. Think of BlueField as the merging of NVIDIA NICs with NVIDIA GPUs. In both cases, the computer can be a low-end PC which primarily supplies power to the chosen card.

Multicast (Not Supported by RDMA/RoCE and TCP)

While not in use by all applications, many of Emergent’s largest deployments utilize the Multicast feature of the Ethernet standard. Point-to-point protocols like TCP and RDMA cannot support multicasting by their nature. RDMA does have other modes it can operate in which essentially remove its flow control and packet retransmission features, but this is tantamount to the current GVSP standard. Multicast has two primary benefits: redundancy and distributed processing. Redundancy gives critical systems the fastest failover to avoid downtime. In larger systems, switches are present, and back-up servers can be set up to take over camera streams when one server has a problem.
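As a rough illustration of why multicast suits redundancy, the Python sketch below shows how any number of receivers can subscribe to the same UDP stream using the standard socket API. The function names are made up for this example, and the group address and port in the usage note are placeholders; a real deployment would take them from the camera and switch configuration.

```python
import socket


def membership_request(group: str, interface: str = "0.0.0.0") -> bytes:
    """Build the ip_mreq structure: group address, then local interface."""
    return socket.inet_aton(group) + socket.inet_aton(interface)


def open_multicast_receiver(group: str, port: int) -> socket.socket:
    """Open a UDP socket and join the given IPv4 multicast group."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    # Allow several receivers on one host to bind the same port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    # Ask the network stack (and, via IGMP, the switch) to deliver the group.
    sock.setsockopt(
        socket.IPPROTO_IP,
        socket.IP_ADD_MEMBERSHIP,
        membership_request(group),
    )
    return sock
```

A primary and a back-up server could each call something like `open_multicast_receiver("239.192.0.1", 50000)` (placeholder values) and receive the same packets without the sender doing anything differently, which is not possible over a connected point-to-point transport.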

Distributed processing becomes especially important as the number and speed of cameras increase, and it also depends very much on the type of processing required. Some applications will simply take multicast camera data to another system for display while the heavy processing is done in other systems. Even on the same server, the switch can send copies with virtually Zero delay to different GPUs for parallel processing. It is nice to start with technology that allows for such an architecture even if it is not immediately required. One representative deployment uses 240 Emergent Bolt HB-25000SB 25GigE 25MP cameras running at 90fps across six mid-range servers. That is 40 25GigE cameras per mid-range server, which is unparalleled and miles ahead of any solution out there.
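A quick back-of-the-envelope calculation gives a feel for the scale of that deployment. The sketch below assumes 8 bits per pixel, which the figures above do not state, and it counts raw pixel payload only, ignoring packet headers and any blanking or protocol overhead.

```python
def stream_gbps(megapixels: float, fps: float, bits_per_pixel: int) -> float:
    """Raw pixel payload rate of one camera stream in Gbps."""
    return megapixels * 1e6 * fps * bits_per_pixel / 1e9


CAMERAS = 240
SERVERS = 6
BITS_PER_PIXEL = 8  # assumption: 8-bit pixels; not stated in the article

per_camera = stream_gbps(25, 90, BITS_PER_PIXEL)   # one 25MP camera at 90fps
per_server = per_camera * (CAMERAS // SERVERS)     # 40 cameras per server
total = per_camera * CAMERAS                       # whole system

print(f"per camera: {per_camera:.0f} Gbps")
print(f"per server: {per_server:.0f} Gbps")
print(f"total:      {total:.0f} Gbps")
```

Under these assumptions each camera produces about 18 Gbps, which fits within a 25GigE link, and each server ingests roughly 720 Gbps of pixel data.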

The Big Picture

Many camera manufacturers focus their attention on enabling the transfer of data from camera to receiver. They claim success once sensor data has arrived in system memory, leaving the integrator or customer with the responsibility of moving that data to processing nodes. In some applications, system memory and the CPU are sufficient for managing and processing the incoming data stream(s), particularly when post-processing can be employed. In others, however, particularly where multiple 10GigE, 25GigE, or 100GigE streams are being used, real-time processing requires offloading to more capable processing nodes. In the concepts and proposals for alternate interface or protocol methods, this seldom comes up. We need to see the big picture. Over the past 15 years, Emergent has pioneered and developed 10GigE, 25GigE, and 100GigE area and line scan cameras and created an ecosystem to support the most reliable, highest-speed imaging applications.
