Google DeepMind Integrates Gemini 1.5 Pro into Robots for Real-World Navigation

Vansh

8 months ago

Did you know? According to Google DeepMind’s latest post, Google is using Gemini 1.5 Pro to improve its robots’ ability to navigate real-world environments. Gemini 1.5 Pro is multimodal: it can take photos and videos along with text as input and generate a response by processing that information.

How They Trained Robots Using Gemini:

In its latest posts on Twitter and Instagram, Google DeepMind shared that it is using Gemini 1.5 Pro’s 1 million token context window to train its robots. The context window is the amount of input the AI can take in and process at once when answering a query, so a larger window lets it consider a much broader range of information.

For example, suppose someone asks the AI about the “best places to visit in Paris” and gives it travel guides to draw on. With a small context window, only a little of that material fits at once, so the AI might list just a few well-known tourist spots. With a larger context window, the AI can take in many more sources together and provide a more comprehensive list of popular attractions, including hidden gems and local favorites.
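To make the idea of a context window concrete, here is a minimal Python sketch. It is an illustration only, not how Gemini works internally: the `count_tokens` and `fit_to_context` helpers are hypothetical, the word-based token count is a crude stand-in for a real tokenizer, and the sample sources are made up for the example.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def fit_to_context(documents: list[str], window: int) -> list[str]:
    """Keep documents, in order, until the next one would exceed the token budget."""
    kept, used = [], 0
    for doc in documents:
        cost = count_tokens(doc)
        if used + cost > window:
            break  # the remaining material does not fit in the window
        kept.append(doc)
        used += cost
    return kept


# Made-up reference material for the Paris question.
sources = [
    "The Eiffel Tower is a famous landmark",
    "The Louvre houses the Mona Lisa",
    "A small bakery near Canal Saint-Martin is a local favorite",
]

small = fit_to_context(sources, window=15)         # only the first two sources fit
large = fit_to_context(sources, window=1_000_000)  # everything fits

print(len(small), len(large))
```

With the small budget, the "local favorite" never reaches the model, so it cannot appear in the answer. With the large budget, all three sources are available at once.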

How can Gemini 1.5 Pro’s long context window help robots navigate the world? 🤖

A thread of our latest experiments. 🧵 pic.twitter.com/ZRQqQDEw98

— Google DeepMind (@GoogleDeepMind) July 11, 2024

By leveraging this long context, DeepMind is training its robots to better understand and follow human commands while using common sense. The robots can use video tours and other inputs to navigate their surroundings intelligently.

According to Google DeepMind, the company achieved success rates of 86% and 90% when it tested the robots in an office and in a home-like environment.

DeepMind is using this extensive context window to train its robots in real-world settings. The team wants to see whether robots can remember specific details about an environment and assist users based on vague or contextual questions. In a video shared on Instagram and Twitter, they demonstrated a robot guiding a user to a whiteboard when asked for a place to draw.

By utilizing Gemini 1.5 Pro’s large context window, DeepMind aims to improve the way robots navigate and assist in real-world environments, making them more effective and intuitive for everyday use.

Collaborative Technology:

Alongside Gemini, DeepMind is also using its Robotic Transformer 2 (RT-2) model, which combines vision, language, and action. This model learns from both online information and hands-on experience with robots, making it better at understanding and interacting with the world.

DeepMind’s advancements in AI and robotics represent a major leap forward in building smarter machines that can help people with everyday tasks and navigate complex situations more effectively.