Visions Podcast: Vision Models in Manufacturing: Part 2
Key Highlights
- VLMs are transforming manufacturing and robotics by enabling more accurate inspections and safer work environments.
- Integration with digital twins and quality standards creates traceable systems with comprehensive audit logs.
- Practical guidance is provided for plant managers on selecting use cases and measuring ROI of VLM deployments.
- The episode addresses misconceptions about job loss, emphasizing the complementary role of humans and AI.
- VLMs improve inspection safety, consistency, and worker productivity through real-world applications.
Here is the second installment of a two-part podcast on Vision Language Models. In this episode, host Jim Tatum speaks with Dijam Panigrahi, co-founder and COO of GridRaster, about VLMs and their expanding role in machine vision and robotics.
The conversation covers real-world deployments in manufacturing and depot environments, how VLMs integrate with quality standards and digital twins to create living systems with traceability and audit logs, and offers practical guidance for plant managers on selecting use cases and measuring ROI.
The episode also addresses misconceptions about job loss, emphasizes the human-in-the-loop approach, and highlights how VLMs can improve inspection safety, consistency, and worker productivity.
Visions: A Machine Vision and Automation Solutions Podcast, is the podcast for engineers, designers, integrators, and end users who want to keep an informed eye on the imaging and machine vision industry. Every Tuesday we will explore the latest in imaging trends, developments and solutions. Here you will find interesting, useful insights and observations from expert interviews, solo episodes, even the occasional panel discussion, all of which aim to expand your knowledge on imaging and machine vision.
Related: Harnessing VLMs for Real Time Factory Decision Making
Related: VLMs Explained: Augmenting Machine Vision and Robotics in Manufacturing
Transcript
Well, hello and welcome to Visions: A Machine Vision and Automation Solutions Podcast. I'm your host, Jim Tatum, senior editor of Vision Systems Design. Visions is an Endeavor Business Media production from your friends at Vision Systems Design. Here you'll find the latest on everything from end user machine vision solutions to trends, developments, and perspectives on all things machine vision and imaging. Whether you've been working in the industry for a while or you're just starting to take a closer look at it, this podcast is designed to grow your knowledge and bring greater focus to your understanding of the imaging and machine vision industry. And now on to our show.
Well, hi everybody, and welcome back to Visions. I'm Jim Tatum, senior editor of Vision Systems Design, and this is the second half of a two-part podcast on the very interesting and exciting technology of Vision Language Models. So what would you say if I told you that there are robots that can make independent decisions instead of simply following pre-programmed instructions? Well, as it turns out, this has been happening for about a decade. A next-generation AI technology, the VLM is designed to augment traditional legacy vision systems by providing interpretive insights and active guidance, transforming factory automation, and empowering workers with real-time expert knowledge. By combining visual data and natural language to analyze and understand visual scenes, reason, and make decisions, the technology is starting to gain traction in a number of areas, including machine vision and robotics. With the technology becoming more scalable in recent times, VLMs are starting to go beyond the lab and into the real world, operating successfully in manufacturing settings such as factory floors and performing such functions as enabling robots to actually look at a complex manufactured component, reason about what they see against learned expert behavior and documented standards, and make quality decisions autonomously. Intrigued? So are we. So we reached out to Dijam Panigrahi, co-founder and COO of GridRaster, a Mountain View, California-based company that specializes in spatial AI and extended reality.
What makes inspection a safer place to launch autonomy as opposed to somewhere in production?
Because I think with inspection, the volume of data that you're analyzing is always huge, right? You're looking at each and every portion of it. It is tedious. It is laborious. The quality part is always that. And I think this is where you are able to get the maximum benefit with a system that can automate many of these tasks over humans doing them. And that's the only reason, right? That is what makes it the most attractive, because I think you're going to get the most value out of it. In a lot of operational settings, there is one particular operation that is happening, and in that case it's pretty well defined in many ways. The VLMs come in really handy in a high-mix, low-volume kind of environment. Let's suppose you go to any of the big factories over here, like Tesla or anybody. You will have one robot which will be just doing the doors. It just does the doors. There is nothing there that you could really make use of all this for. But if I go to a depot environment, in the same depot they have to repair the aircraft that comes in. It has a nose, it has a radome, it has a wing. The whole aircraft needs to be repaired. You can't say a single robot is dedicated to only the wing. And then you have different platforms. There are different aircraft that are being used. So sometimes it's the wing, sometimes the spar, sometimes the radome, sometimes the nose. You are allowing the same robot to be adaptive to this environment. This is where the VLMs come in. They allow you to adapt to these changes that are happening in real time and provide you the necessary information, and then the robot can perform the job, whether it's grinding, sanding, painting, repainting, whatever needs to be done.
So in those environments, VLMs are extremely useful.
Okay. How do VLMs interface with things like quality standards and language?
End of the day, the inspection is not complete without those, right? So you are feeding in those tolerances and the specific guidelines for what needs to be done and what it needs to be analyzed against. We do certain things in 3D as well, right? You take the 3D understanding of the asset using the headsets. Essentially, the VLMs are visual models. They work on images and videos. Video is the next stretch, and not just in 2D but in 3D. That's where we are using the headsets with the lidar, the camera, 3D depth sensing, and all of that. Many times you're scanning the assets in three dimensions to understand the tolerances, because when you manufacture something, there could be variation in how the part really comes out from how it has been defined. Now, what is good enough needs to be evaluated. And that's where you can define thresholds, and the VLM understands them and gives you the corresponding heat maps of what those variations are. Those thresholds are provided through text. You have the manuals, you have many ways to provide that. It can then generate the heat map of where all those deviations are and what is good enough, and define pass and fail as well.
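The threshold-and-heat-map idea described above can be illustrated with a minimal Python sketch. It is not GridRaster's implementation: the function name `inspect_surface`, the grid-of-depths representation, and the single symmetric tolerance in millimeters are all assumptions made for illustration. It compares a scanned surface against its nominal definition, keeps the per-cell deviations (the raw data behind a heat map), and derives pass or fail from the tolerance.

```python
from dataclasses import dataclass


@dataclass
class InspectionResult:
    deviations: list[list[float]]            # signed deviation per grid cell, in mm
    out_of_tolerance: list[tuple[int, int]]  # (row, col) of cells beyond tolerance
    passed: bool


def inspect_surface(nominal, scanned, tolerance_mm):
    """Compare a scanned depth grid with the nominal grid (hypothetical helper).

    Both grids are lists of rows of depths in mm. The tolerance is assumed
    to be symmetric; real standards may define per-region or one-sided limits.
    """
    same_shape = len(nominal) == len(scanned) and all(
        len(n_row) == len(s_row) for n_row, s_row in zip(nominal, scanned)
    )
    if not same_shape:
        raise ValueError("nominal and scanned grids must have the same shape")

    # Signed deviation per cell: this grid is what a heat map would render.
    deviations = [
        [s - n for n, s in zip(n_row, s_row)]
        for n_row, s_row in zip(nominal, scanned)
    ]
    # Cells whose absolute deviation exceeds the tolerance.
    bad_cells = [
        (r, c)
        for r, row in enumerate(deviations)
        for c, d in enumerate(row)
        if abs(d) > tolerance_mm
    ]
    return InspectionResult(deviations, bad_cells, passed=not bad_cells)


nominal = [[10.0, 10.0, 10.0], [10.0, 10.0, 10.0]]
scanned = [[10.02, 9.97, 10.0], [10.31, 10.0, 9.95]]
result = inspect_surface(nominal, scanned, tolerance_mm=0.1)
print(result.passed, result.out_of_tolerance)  # False [(1, 0)]
```

In this sketch the tolerance is passed in as a number; in the workflow described in the interview it would come from manuals or other text sources.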
Okay. And then to follow that up, are, uh, manufacturers getting comfortable with autonomous pass and fail judgment?
There are scenarios where they need it to function one hundred percent. Either you're one hundred percent right or not. And in those scenarios, obviously, I think there is still some distance to go. But in many places they are able to automate, as I was saying, the measurement. You can break down the whole process into multiple parts. For example, if I'm doing an inspection and classification of a defect, the VLMs are allowing them to do many of these steps very, very confidently. And then you're bringing in the human in the loop to just make a judgment in certain places that it looks good, but the collection of the data, the measurement of the data, and all of those things have been done automatically by the system itself.
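One way to picture the human-in-the-loop split described above is a simple routing rule, sketched here in Python. The `Finding` fields, the `safety_critical` flag, and the 0.95 confidence threshold are hypothetical choices for illustration, not details from the interview: measurements and classifications are produced automatically, and anything the system is unsure about, or that must be right every time, goes to a person for the final judgment.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    defect_type: str
    confidence: float      # assumed model confidence in the range 0.0 to 1.0
    safety_critical: bool  # assumed flag for cases that must be right every time


def route(finding, auto_threshold=0.95):
    """Decide who makes the final call on an automatically measured finding."""
    # Safety-critical or low-confidence findings always get a human judgment.
    if finding.safety_critical or finding.confidence < auto_threshold:
        return "human_review"
    return "auto_accept"


findings = [
    Finding("surface scratch", 0.99, safety_critical=False),
    Finding("surface scratch", 0.80, safety_critical=False),
    Finding("structural crack", 0.99, safety_critical=True),
]
for f in findings:
    print(f.defect_type, f.confidence, route(f))
```

The data collection happens regardless of the route taken; only the final judgment is gated.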
How did VLMs turn, uh, digital twins from static models into living systems that improve quality decisions? Is that just more learning and data input?
Yes, because you're bringing in the real-time learning and attaching it, rather than it being just a visual 3D model that you have created. Now you're not only able to bring in all the sensor data, which was there even earlier, but you can also quickly embed the variations that are happening in the environment into those digital twins. And that then allows you to make it a living system rather than a static digital twin.
Yeah. Okay. What role does the twin play in traceability and audit requirements?
Huge. Huge. Just take an example of some real-life things that we are doing right now. You have this whole aircraft that comes in for repair, and you take the wing off of it, and now you're looking to repair that wing. What we do is, using the camera and the lidar, we are able to quickly take a scan of that physical asset and convert that into a 3D model. But we are also able to attach the metadata, like, here is where I see the defects, and I can attach all those defects to exactly the location where they are on the physical asset. So first I'm digitizing the whole thing into a twin and recording the specific damage everywhere it occurs. Some damage can be seen visually, and some damage is structural. For example, I may use microwave to detect where that damage is. We use sensor-based tools to detect whether there are some surface-level problems. Some use eddy currents to figure that out. There are different mechanisms used for detecting where the damage is. Now, as we detect the damage, we can capture how it looks and attach it to the virtual model, the digital twin that we created of the physical asset. And as you do the repair, even the repair is captured into that twin. So you always have a complete spatial log of the defects, the repairs, the time, who did it, the metadata. All that information is there, and that is what delivers, from an audit perspective, the whole traceability: what was done and who did it.
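The spatial log described above can be sketched as a small data structure. This is only an illustration under stated assumptions: the class names `TwinEvent` and `DigitalTwin`, the field names, the 3D coordinate convention, and the example asset and operator names are hypothetical, not GridRaster's schema. Each defect or repair is stored with its location on the asset, who recorded it, how it was detected or performed, and when, so the history at any point on the part can be pulled for an audit.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class TwinEvent:
    kind: str                             # "defect" or "repair"
    location: tuple[float, float, float]  # position on the asset, twin coordinates
    description: str
    operator: str                         # who recorded the defect or did the repair
    method: str                           # e.g. "visual", "eddy current", "sanding"
    timestamp: datetime


@dataclass
class DigitalTwin:
    asset_id: str
    events: list[TwinEvent] = field(default_factory=list)

    def record(self, kind, location, description, operator, method, timestamp=None):
        """Append a defect or repair to the twin's log, which is never edited in place."""
        if kind not in ("defect", "repair"):
            raise ValueError("kind must be 'defect' or 'repair'")
        when = timestamp or datetime.now(timezone.utc)
        event = TwinEvent(kind, location, description, operator, method, when)
        self.events.append(event)
        return event

    def history_near(self, location, radius):
        """Return every logged event within `radius` of a point, oldest first."""
        def dist_sq(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))
        return [e for e in self.events if dist_sq(e.location, location) <= radius ** 2]


wing = DigitalTwin(asset_id="wing-example-001")
wing.record("defect", (1.0, 0.5, 0.0), "subsurface delamination", "inspector_a", "eddy current")
wing.record("repair", (1.0, 0.5, 0.0), "patched and resealed", "technician_b", "sanding")
wing.record("defect", (5.0, 5.0, 5.0), "paint chip", "inspector_a", "visual")

for event in wing.history_near((1.0, 0.5, 0.0), radius=0.1):
    print(event.kind, event.operator, event.method)
```

Keeping events append-only and immutable is a design choice made here so that the log reads as an audit trail.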
What the defect was or what kind of defect it was, it's always available to you. Comprehensive. Yes, absolutely.
Well, I know you mentioned earlier the fears some people have of these things eventually taking over human functions and human jobs and that sort of thing. Are there other misconceptions out there that are kind of keeping people leery of this at all, or is it moving ahead as it should?
I mean, when anything new comes, I think we as humans are just resistant to any change, right? That's just a given. And specifically where it's something that you begin to feel threatened by. Suddenly you were doing certain things, and now the system is able to do those things, right. Yeah, I think the way I see it is always as a tool, basically. People who are able to utilize this available tool will be more effective. And in the long run, I think they would be the most desired workers or operators that people will seek. It's similar to how there was a time when we used to do the wrenching, you know, using the tools with our own force. Then came the torque tools and all that. I may decide to still do it, but now I can do more without putting so much stress on my body, right? Similar to that, from a reasoning angle, it allows you to do many things much more easily. But yeah, as I said, the role definitions are bound to change, because many of these things can now be taken care of by the system. But this is where your domain expertise and the operational expertise and all of these things come to the forefront. So basically five years from now, there will be certain areas where VLMs are absolutely indispensable, but there will always be areas where people are also indispensable. Absolutely. I mean, people in general will be indispensable, I think, in many ways. It may sound a little weird, but the way I see it is a lot of this AI will help us spend more time being more human, right? The things that AI will never be able to do, right? Right. I tend to agree with that. Yeah.
And in many ways, in my day-to-day, there are many things that I have to do. There are activities I know just need to be done, right? But I would prefer to spend more time with my kids, you know, and have a little more time for myself to do certain things. I believe that AI will allow me to get through some of the stuff which I have to pour my hours into because it needs to be done, and give me that time that I can really spend with them, maybe spending a little more time on my wellness, just spending more time with my friends or maybe with the family. It will open up time for me, which today I have to spend on other things. AI is definitely helping me do that better.
Okay. All right. Um, I think one last thing we can get into here if you've got a minute. Um, if a plant manager's already investing in spatial AI and digital twins, what's, what are some of the big practical questions they need to ask about VLMs?
I think the really first thing is understanding which scenario you can really benefit from in a big way, right? Many times we get swayed by a new technology and try to apply it everywhere, and sometimes we know some things are not really made for it. I think in the places where there is a lot of visual information that needs to be taken into consideration to do the job, VLMs will be extremely helpful in terms of making it much better and faster, and even quality-wise, right? So we need to assess those use cases through those lenses before making a judgment on where you bring in those VLMs. Even with a lot of customers, when we go in and they see the capabilities we have, they get really excited. Oh, I can use it here. I can use it here. I can use it here. The moment those conversations go to more than two, I feel that's going to fail for sure, because it's not that the value is across the board and equal, right? So you start with where the real value is. And sometimes, even within the whole use case, you may want to break it down to a particular part of the process where it's an extremely good fit. Just address that first, and reap and understand the ROI and benefits coming out of it, and that will lay the ground for taking it to other use cases. Understood. But start with, I would say, one. But there is always a difference, right? Somebody sees that this one is more attractive, or maybe it's closer to a use case for them. So maybe two.
Well, that's a wrap for this episode of Visions, produced by Endeavor Business Media, a division of Endeavor B2B. Thanks very much for tuning in. If you enjoyed today's show, be sure to subscribe to the podcast and share this episode with a colleague who would find it helpful. Until our next episode, you can find us at vision-systems.com or on LinkedIn or Facebook for more insights, updates, and breaking news to keep you in the know. Thanks for tuning in. Until next time, stay focused on your visions.
About the Author
Jim Tatum
Senior Editor
VSD Senior Editor Jim Tatum has more than 25 years of experience in print and digital journalism, covering business/industry/economic development issues, regional and local government/regulatory issues, and more. In 2019, he transitioned from newspapers to business media full time, joining VSD in 2023.



