Boston Dynamics and Google DeepMind Teach Spot to Reason
<img src="https://spectrum.ieee.org/media-library/photo-of-yellow-boston-dynamics-robot-dog-using-its-arm-to-load-laundry-into-a-white-basket.png?id=65521323&width=1200&height=800&coordinates=150%2C0%2C150%2C0"/><br/><br/><p><span><strong></strong><strong></strong>The amazing and frustrating thing about robots is that they can do almost anything you want them to do, as long as you know how to ask properly. In the not-so-distant past, asking properly meant writing code, and while we’ve thankfully moved beyond that brittle constraint, there’s still an irritatingly inverse correlation between ease of use and complexity of task. </span></p><p><span>AI has promised to change that. The idea is that when AI is embodied within robots—giving AI software a physical presence in the world—those robots will be imbued with with reasoning and understanding. This is cutting-edge stuff, though, and while we’ve seen plenty of examples of embodied AI in a research context, finding applications where reasoning robots can provide reliable commercial value has not been easy. <a href="https://bostondynamics.com/" target="_blank">Boston Dynamics</a> is one of the few companies to commercially deploy legged robots at any appreciable scale; there are now several thousand hard at work. Today the company is <a href="https://bostondynamics.com/blog/tools-for-your-to-do-list-with-spot-and-gemini-robotics/" target="_blank">announcing</a> that its quadruped robot <a href="https://spectrum.ieee.org/tag/spot-robot" target="_self">Spot</a> is now equipped with <a href="https://deepmind.google/blog/gemini-robotics-er-1-6/">Google DeepMind’s Gemini Robotics-ER 1.6</a>, a <a href="https://spectrum.ieee.org/gemini-robotics" target="_blank">high-level embodied reasoning model</a> that brings usability and intelligence to complex tasks.</span></p><p class="shortcode-media shortcode-media-youtube"> <span class="rm-shortcode" data-rm-shortcode-id="155eddc016bd1bedcfb5b83c4b4a54c3" style="display:block;position:relative;padding-top:56.25%;"><iframe frameborder="0" height="auto" lazy-loadable="true" scrolling="no" src="https://www.youtube.com/embed/LP4-c5AK30g?rel=0" style="position:absolute;top:0;left:0;width:100%;height:100%;" width="100%"></iframe></span><small class="image-media media-photo-credit" placeholder="Add Photo Credit...">YouTube.com</small></p><p><span>Although this video shows Spot in a home context, the focus of this partnership is on one of the very few applications where legged robots have proven themselves to be commercially viable: inspection. That is, wandering around industrial facilities, checking to make sure that nothing is imminently exploding. With the new AI onboard, Spot is now able to autonomously look for dangerous debris or spills, read complex gauges and sight glasses, and call on tools like vision-language-action models when it needs help understanding what’s going on in the environment around it.</span></p><p>“Advances like Gemini Robotics ER 1.6 mark an important step toward robots that can better understand and operate in the physical world,” <a href="https://www.linkedin.com/in/marco-da-silva-447b72/" target="_blank">Marco da Silva</a>, Vice President and General Manager of Spot at Boston Dynamics, says <a href="https://bostondynamics.com/blog/aivi-learning-now-powered-google-gemini-robotics/" target="_blank">in a press release</a>. 
“Advances like Gemini Robotics ER 1.6 mark an important step toward robots that can better understand and operate in the physical world,” [Marco da Silva](https://www.linkedin.com/in/marco-da-silva-447b72/), Vice President and General Manager of Spot at Boston Dynamics, says [in a press release](https://bostondynamics.com/blog/aivi-learning-now-powered-google-gemini-robotics/). “Capabilities like instrument reading and more reliable task reasoning will enable Spot to see, understand, and react to real-world challenges completely autonomously.”

## Understanding Robot Understanding

The words “reasoning” and “understanding” are increasingly being applied to AI and robotics, but as [Toyota Research Institute’s Gill Pratt recently pointed out](https://spectrum.ieee.org/humanoid-robots-gill-pratt-darpa), what those words actually *mean* for robots in practice isn’t always clear. “The benchmark we measure ourselves against when it comes to understanding is that the system should answer the way a human would,” [Carolina Parada](https://www.linkedin.com/in/carolinaparada/), Head of Robotics at Google DeepMind, explained in an interview. For robots to reliably and safely perform tasks, this connection between how robots understand the world and how humans do is critical. Otherwise, there may be a disconnect between the instructions a human gives a robot and how the robot decides to carry out the task.

Boston Dynamics’ video above is a potentially messy example of this. One of the instructions to Spot was to “recycle any cans in the living room.” It has no problem completing the task, as the video shows, but in doing so it grips the can sideways, which is not going to end well for cans with leftover liquid in them. We humans would avoid this because we can draw on a lifetime of experience of how cans should be held, but robots don’t (yet) have that kind of world knowledge.

Parada says that Gemini Robotics-ER 1.6 approaches situations like this from a safety perspective. “If you ask the robot to bring you a cup of water, it will reason not to place it on the edge of a table where it could fall. We track this using our [ASIMOV benchmark](https://asimov-benchmark.github.io/v1/), which includes a whole lot of natural language examples of things the robot should not do.” The current version of Spot doesn’t use these semantic safety models for manipulation, but the plan is to make future versions reason about how to hold objects safely.

[Video: Spot inspection demo (YouTube)](https://www.youtube.com/embed/kBwxmlI2yHQ?rel=0)

There does still seem to be a disconnect between Gemini Robotics-ER 1.6 as a high-level reasoning model for a robot and the robot itself as an interface with the physical world. One of the new features of 1.6 is *success detection*, which combines multiple camera angles to tell more reliably when Spot has successfully grasped an object. This is great if you’re relying entirely on vision for your object interaction, but robots have all kinds of other well-established ways to detect a successful grasp, including touch sensors and force sensors, that 1.6 is not using. The reason speaks to a fundamental problem that the robotics field is still trying to figure out: how to train models when you need physical data.
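Neither company has said how success detection works beyond the fact that it combines multiple camera angles, but the simplest version of that idea is a vote across per-view estimates, and the article’s point about touch and force sensing amounts to adding one more, cheaper signal to that vote. The toy sketch below illustrates the contrast under those assumptions; the camera names, thresholds, and the fused force reading are invented for the example and do not describe Spot’s implementation.

```python
# Toy sketch of grasp "success detection" as a vote over weak signals.
# Assumptions, not Spot's implementation: per-camera classifier scores arrive
# as probabilities, and a gripper force reading (when available) is folded in
# as one extra vote alongside the vision-only estimate described in the article.
from statistics import mean


def vision_only_success(view_scores: dict[str, float], threshold: float = 0.7) -> bool:
    """Call the grasp successful if the average per-view score clears a threshold."""
    return mean(view_scores.values()) >= threshold


def fused_success(
    view_scores: dict[str, float],
    gripper_force_n: float | None,
    min_force_n: float = 2.0,
    threshold: float = 0.7,
) -> bool:
    """Same idea, but a force sensor (when present) contributes a hard vote:
    measurable force on a closed gripper is cheaper and more direct evidence
    of a grasp than another camera angle."""
    votes = list(view_scores.values())
    if gripper_force_n is not None:
        votes.append(1.0 if gripper_force_n >= min_force_n else 0.0)
    return mean(votes) >= threshold


if __name__ == "__main__":
    scores = {"wrist_cam": 0.85, "body_cam_left": 0.55, "body_cam_right": 0.75}
    print(vision_only_success(scores))                 # True: mean is about 0.72
    print(fused_success(scores, gripper_force_n=0.1))  # False: near-zero force drags the vote below 0.7
```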
“At the moment, these models are strictly vision only,” Parada explains. “There is lots of [visual] information on the web about how to pick up a pen. If we had enough data with touch information, we could easily learn it, but there is not a lot of data with touch sensing on the internet.” Some of that data will come from Spot itself: customers who use these new inspection capabilities will be required to share their data with Boston Dynamics.

## Real-World Robots That Are Useful

The fact that Boston Dynamics *has* customers makes the company something of an anomaly when it comes to legged robots that rely on AI in commercial deployments. And those customers will have to be able to trust the robot—[always a problem when AI is involved](https://spectrum.ieee.org/ai-hallucination). “We take this very seriously,” da Silva said in an interview. “We roll out new DeepMind capabilities through beta programs to a smaller set of customers to understand what to anticipate, and we only actively advertise features we are confident will work.” There’s a threshold of usefulness that robots like Spot need to reach, and fortunately, the real world doesn’t demand perfection. “Most critical infrastructure in a facility will be instrumented to tell you whether something is wrong,” da Silva says. “But there is a lot of stuff that is not instrumented that can still cause a problem if you aren’t paying attention to it. We’ve found that somewhere north of 80 percent is the threshold where it’s not annoying. Below that, basically the robot is crying wolf, and the operators will start ignoring it.”

Both da Silva and Parada agree that there’s still plenty of room for improvement in robotic inspection. As Parada points out, Spot’s rarefied status as a scalable commercial platform provides a valuable opportunity to learn how models like Gemini Robotics-ER 1.6 can be most useful, and then apply that knowledge to other embodied AI platforms, including [Boston Dynamics’ Atlas](https://spectrum.ieee.org/boston-dynamics-atlas-scott-kuindersma). Does that mean Atlas is going to be the next industrial inspection robot? Probably not. But if this real-world experience can get us closer to safe and reliable robots that can pick up laundry, take a dog for a walk, and clear away soda cans without making a mess, that’s something we can all get excited about.
- Published: Apr 14, 2026, 7:45 PM
- Source: IEEE Spectrum AI