In the field of robotics, vision-based learning systems are a promising strategy for enabling machines to interpret and interact with their environment, said the AI Institute today. It introduced the Theia vision foundation model to facilitate robot training.
Vision-based learning systems must provide robust representations of the world, allowing robots to understand and respond to their surroundings, said the AI Institute. Traditional approaches typically focus on single-task models—such as classification, segmentation, or object detection—which individually do not encapsulate the diverse understanding of a scene required for robot learning.
This shortcoming highlights the need for a more holistic solution capable of interpreting a broad spectrum of visual cues efficiently, said the Cambridge, Mass.-based institute, which is developing Theia to address this gap.
In a paper presented at the Conference on Robot Learning (CoRL), the AI Institute introduced Theia, a model designed to distill the expertise of multiple off-the-shelf vision foundation models (VFMs) into a single model. By combining the strengths of multiple VFMs, each trained for a specific visual task, Theia generates a richer, unified visual representation that can be used to improve robot learning performance.
Robot policies trained using Theia’s encoder achieved an average task success rate of 80.97% when evaluated on 12 robot simulation tasks, a statistically significant improvement over other representation choices.
Furthermore, in real robot experiments, where the institute used behavior cloning to learn robot policies across four multi-step tasks, the trained policy success rate using Theia was on average 15 percentage points higher than policies trained using the next-best representation.
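Behavior cloning, as mentioned above, trains a policy by supervised learning on demonstration data: the robot imitates the actions a human (or scripted controller) took for each observation. The following is a minimal sketch of that idea, not the institute’s implementation; the feature and action dimensions, the linear policy, and the synthetic demonstrations are all illustrative assumptions (in the real experiments, the observation features would come from Theia’s encoder).

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy dataset of (visual-feature, action) demonstration pairs.
# Dimensions are illustrative, not from the paper.
FEAT_DIM, ACT_DIM, N = 16, 4, 200
X = rng.normal(size=(N, FEAT_DIM))                     # observation features
true_W = rng.normal(size=(FEAT_DIM, ACT_DIM))          # hidden "expert" mapping
Y = X @ true_W + 0.01 * rng.normal(size=(N, ACT_DIM))  # demonstrated actions

# Behavior cloning = supervised regression from observations to
# demonstrated actions; here, a least-squares linear policy.
W_policy, *_ = np.linalg.lstsq(X, Y, rcond=None)

def policy(obs):
    """Predict an action from an observation feature vector."""
    return obs @ W_policy

imitation_error = np.mean((policy(X) - Y) ** 2)
print(f"mean imitation error: {imitation_error:.5f}")
```

A richer visual representation (such as Theia’s) improves the `X` side of this regression, which is why the choice of encoder directly affects policy success rates.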
Theia designed to combine visual models
Theia’s design is based on a distillation process that integrates the strengths of multiple VFMs such as CLIP (vision language), DINOv2 (dense visual correspondence), and ViT (classification), among others. By carefully selecting and combining these models, Theia is able to produce robust visual representations that can improve downstream robot learning performance, said the AI Institute.
At its core, Theia consists of a visual encoder (backbone) and a set of feature translators, which work in tandem to incorporate the knowledge from multiple VFMs into a unified model. The visual encoder generates latent representations that capture diverse visual insights.
These representations are then processed by the feature translators, which refine them by comparing the output features against ground truth. This comparison serves as a supervisory signal, optimizing Theia’s latent representations to enhance their diversity and accuracy.
These optimized latent representations are subsequently used to fine-tune policy learning models, enabling robots to perform a wide range of tasks with greater accuracy.
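The architecture described above can be sketched in a few lines: a shared encoder produces a latent representation, and one lightweight feature translator per teacher maps that latent into the teacher’s feature space, with the distillation loss comparing each translated output against the frozen teacher’s features. This is a toy illustration, assuming linear layers, made-up dimensions, and random stand-ins for the real CLIP, DINOv2, and ViT outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: a flattened image, a shared latent, and each
# teacher VFM's feature size. All sizes are illustrative.
IMG_DIM, LATENT_DIM = 64, 16
TEACHER_DIMS = {"clip": 32, "dinov2": 24, "vit": 10}

# Shared encoder (backbone): one linear map stands in for the real network.
W_enc = rng.normal(size=(IMG_DIM, LATENT_DIM)) * 0.1

# One lightweight feature translator per teacher model.
W_heads = {name: rng.normal(size=(LATENT_DIM, d)) * 0.1
           for name, d in TEACHER_DIMS.items()}

def encode(x):
    """Map an image to a Theia-style shared latent representation."""
    return x @ W_enc

def distill_loss(x, teacher_feats):
    """Sum of per-teacher MSE between translated latents and the
    'ground truth' features produced by each frozen teacher."""
    z = encode(x)
    loss = 0.0
    for name, target in teacher_feats.items():
        pred = z @ W_heads[name]          # feature translator head
        loss += np.mean((pred - target) ** 2)
    return loss

# Random vectors stand in for frozen CLIP/DINOv2/ViT features.
x = rng.normal(size=(IMG_DIM,))
teachers = {n: rng.normal(size=(d,)) for n, d in TEACHER_DIMS.items()}
print(f"distillation loss: {distill_loss(x, teachers):.3f}")
```

Minimizing this loss over many images is what pushes the single latent to carry the combined knowledge of all the teachers at once.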
Robots learn in the lab
Researchers at the AI Institute tested Theia in simulation and on a number of robot platforms, including Boston Dynamics’ Spot and a WidowX robot arm. For one round of lab testing, the team used Theia to train a policy enabling a robot to open a small microwave, place toy food inside, and close the microwave door.
Previously, researchers would have needed to combine all the VFMs, which is slow and computationally expensive, or select which VFM to use to represent the scene in front of the robot. For example, they could choose a segmentation image from a segmentation model, a depth image from a depth model, or a text class name from an image classification model. Each provided different types and granularity of information about the scene.
Generally, a single VFM might work well for a single task with known objects but might not be the right choice for other tasks or other robots.
With Theia, the same image from the robot can be fed through the encoder to generate a single representation with all the key information. That representation can then be input into Theia’s segmentation decoder to output a segmentation image. The same representation can be input into Theia’s depth decoder to output a depth image, and so on.
Each decoder uses the same representation as input because the shared representation possesses the information required to generate all the outputs from the original VFMs. This streamlines the training process and makes actions transferable to a broader range of situations, said the researchers.
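The encode-once, decode-many workflow can be illustrated as follows. This is a schematic sketch, assuming linear decoder heads and made-up sizes; the real decoders are learned networks, and the names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

LATENT_DIM, H, W = 16, 8, 8  # illustrative latent size and output grid

# Hypothetical decoder heads, one per downstream modality.
W_seg = rng.normal(size=(LATENT_DIM, H * W))    # segmentation-style head
W_depth = rng.normal(size=(LATENT_DIM, H * W))  # depth-style head

def decode_all(z):
    """Reuse one shared latent for every downstream output."""
    return {
        "segmentation": (z @ W_seg).reshape(H, W),
        "depth": (z @ W_depth).reshape(H, W),
    }

z = rng.normal(size=(LATENT_DIM,))  # the image is encoded only once
outputs = decode_all(z)
print(sorted(outputs))  # ['depth', 'segmentation']
```

The key point is that the expensive encoding step runs once per image, and each cheap decoder head reads from the same latent, which is what replaces running several full VFMs side by side.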
While it sounds easy for a person, the microwaving task represents a more complex behavior because it requires successful completion of multiple steps: picking up the object, placing it into the microwave, and closing the microwave door. The policy trained with Theia is among the top performers for each of these steps, comparable only to E-RADIO, another approach which also combines multiple VFMs, although not specifically for robotics applications.
Theia prioritizes efficiency
One of Theia’s main advantages over other VFMs is its efficiency, said the AI Institute. Training Theia requires about 150 GPU hours on datasets like ImageNet, reducing the computational resources needed compared to other models.
This high efficiency does not come at the expense of performance, making Theia a practical choice for both research and application. With a smaller model size and reduced need for training data, Theia conserves computational resources during both the training and fine-tuning processes.
AI Institute sees transformation in robot learning
Theia enables robots to learn and adapt more quickly and effectively by refining knowledge from multiple vision models into compact representations for classification, segmentation, depth prediction, and other modalities.
While there is still much work to be done before reaching a 100% success rate on complex robotics tasks using Theia or other VFMs, Theia makes progress toward this goal while using less training data and fewer computational resources.
The AI Institute invited researchers and developers to explore Theia and further evaluate its capabilities to improve how robots learn and interpret their environments.
“We’re excited to see how Theia can contribute to both academic research and practical applications in robotics,” it said. Visit the AI Institute’s project page and demo page to learn more about Theia.