Vision Systems Design had a few questions about this technology, so we reached out to Jeff Mahler, CTO and co-founder of Ambi Robotics.
Editor’s Note: The following Q&A may have been edited for style and clarity.
Vision Systems Design (VSD): Can you provide more technical details on the AI models and algorithms behind AmbiVision's cognitive OCR and item identification features?
Jeff Mahler (JM): AmbiVision uses vision language models (VLMs) trained on a combination of simulated and real data collected from over 250K operation hours across live commercial operations.
VSD: How do these algorithms handle the variability and inconsistency of real-world labels and text?
JM: The VLM can reason about inconsistencies in text. For example, if an item needs to be sorted by quantity and the label says “total units,” the VLM will understand that both refer to the same thing. The data we’ve collected enables us to achieve 99.9% accuracy while generalizing to a wide variety of item shapes, sizes, materials, and appearances.
VSD: What has been your approach and the key technical challenges in integrating AmbiVision with existing vision hardware such as the Cognex DataMan 380? How did you ensure compatibility and performance?
JM: Key to our integration is having the AI operate on images from the scanner. We worked closely with Cognex to develop that integration, using their DataMan 380 hardware. We’ve already tested this capability in commercial operations with existing AmbiStack customers.
VSD: Could you explain the specific pose estimation and dimension measurement techniques AmbiVision uses to support robotic handling and palletizing applications?
JM: First, AmbiVision detects whether any items are present. From there, it determines each item’s size and uses that information to assign the object a pose. That pose is then carried forward to track the item over time.
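The detect → size → pose → track flow described above can be sketched as a simple nearest-centroid tracker. This is a minimal illustration under assumed names (Detection, ConveyorTracker, etc.); it is not the AmbiVision API, and a production system would use a far more robust association method.

```python
import math
from dataclasses import dataclass

@dataclass
class Detection:
    """One detected item in a frame: centroid, dimensions, and orientation."""
    x: float       # centroid along the conveyor (mm)
    y: float       # centroid across the conveyor (mm)
    length: float  # measured item dimensions (mm)
    width: float
    height: float
    yaw: float     # in-plane orientation (radians)

@dataclass
class TrackedItem:
    item_id: int
    last: Detection  # most recent pose, carried forward frame to frame

class ConveyorTracker:
    """Greedy nearest-centroid matching to keep item identity across frames."""

    def __init__(self, max_jump_mm: float = 150.0):
        self.max_jump_mm = max_jump_mm  # beyond this, treat as a new item
        self.items: list[TrackedItem] = []
        self._next_id = 0

    def update(self, detections: list[Detection]) -> list[TrackedItem]:
        updated: list[TrackedItem] = []
        unmatched = list(self.items)
        for det in detections:
            best, best_d = None, self.max_jump_mm
            for item in unmatched:
                d = math.hypot(det.x - item.last.x, det.y - item.last.y)
                if d < best_d:
                    best, best_d = item, d
            if best is not None:            # same item, moved along the belt
                unmatched.remove(best)
                best.last = det
                updated.append(best)
            else:                            # new item entering the scene
                updated.append(TrackedItem(self._next_id, det))
                self._next_id += 1
        self.items = updated
        return updated
```

The design choice worth noting is that pose is not re-derived from scratch each frame: identity is carried forward, so downstream automation receives a consistent per-item pose history rather than disconnected detections.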
VSD: How do these techniques improve automation precision?
JM: This is very useful for downstream handling tasks. Most automation today is rigid and requires items to be fixed in the exact same position and orientation every time. AmbiVision enables automation even when position and orientation vary along the surface of the conveyor, informing downstream automation of incoming package orientations and the other information needed for 99.9% reliability.
VSD: How does AmbiVision maintain real-time performance and robustness in fast-paced, noisy logistics environments?
JM: AmbiVision uses best-in-class commercially available hardware, including the Cognex DataMan 380, to capture dozens of high-resolution images of each case. Neural networks then identify text regions for Cognitive OCR processing, with optional validation against known data sources. The system supports conveyor speeds up to 165 ft/min, processes 600-1,200 cases per hour, and delivers OCR results within six seconds of a case exiting the tunnel, with >98% identification accuracy and >95% coverage for in-spec cases.
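As a quick sanity check of those throughput figures (my arithmetic, not Ambi's): at the top rate of 1,200 cases per hour on a 165 ft/min belt, a case arrives every 3 seconds, spaced roughly 8.25 ft apart.

```python
# Back-of-the-envelope check of the quoted figures.
belt_ft_per_min = 165
cases_per_hour = 1200  # upper end of the stated 600-1,200 range

belt_ft_per_s = belt_ft_per_min / 60           # belt speed in ft/s
seconds_per_case = 3600 / cases_per_hour       # time between cases
spacing_ft = belt_ft_per_s * seconds_per_case  # implied case pitch on the belt
```

The six-second OCR latency therefore spans about two case intervals at peak rate, which is consistent with results being delivered shortly after a case exits the tunnel rather than in lockstep with each arrival.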
VSD: Does Ambi Robotics provide any developer tools, APIs, or SDKs to enable machine vision engineers and integrators to customize or extend AmbiVision's AI capabilities for their specific workflows?
JM: Currently we do not, but perhaps in the future.