AGIBOT holds World Challenge 2026 to see how AI models perform on real tasks

Participants in the challenge tested and debugged robots working on different tasks. | Source: AGIBOT

AGIBOT Innovation Technology Co. last week hosted the AGIBOT World Challenge 2026 alongside ICRA 2026 in Vienna. The company brought together 526 research and enterprise teams from 27 countries to compete across two embodied AI tracks: “Reasoning to Action” and “World Model.”

Shanghai-based AGIBOT said the competition highlighted a key shift in how embodied AI is evaluated: the industry is moving beyond simulation scores toward closed-loop testing on real robots, real tasks, and standardized benchmarks.

The competition adopted a benchmark-driven format that combined online automated evaluation with an offline real-robot final in Vienna. Built on AGIBOT’s EWMBench and Genie Sim Benchmark, the framework enabled automated testing, standardized metrics, and reproducible results.

During the offline final, finalist teams completed tasks using the AGIBOT G2 humanoid robot. By incorporating real-robot validation into the evaluation process, the competition placed robot stability, real-world adaptability, and long-horizon task reliability at the center of the scoring system. The company, also known as Zhiyuan Robotics Co., said this more closely aligns technical evaluation with practical deployment needs.

The challenge drew research and industry teams from leading institutions and companies, including the Chinese Academy of Sciences, Tsinghua University, the University of Science and Technology of China, the University of California San Diego, Russia’s Sber Robotics Center, Alibaba, Amap, and vivo. More than 100 teams surpassed the official baseline.

What’s the difference between the R2A and WM tracks?

The two tracks at the AGIBOT World Challenge 2026 reflected the broader evolution of embodied AI from task execution toward understanding, prediction, and decision-making, according to AGIBOT.

The Reasoning to Action (R2A) track evaluated how robots understand tasks, plan actions, and execute them in physical environments. The R2A track, upgraded from the 2025 Manipulation track, expanded the evaluation from action execution to the full process of environment understanding, task planning, and physical execution.

The World Model (WM) track focused on how AI systems predict physical-world changes and model interactions based on robot actions and sensor inputs.

For the R2A track, teams trained reasoning-and-manipulation models using the AGIBOT WORLD open-source dataset and evaluated them through Genie Sim 3.0. The benchmark covered language understanding, spatial reasoning, atomic skills, disturbance adaptation, and zero-shot transfer.

In the final ranking, PrismBot from vivo won the championship with 43.47 points, followed by Shanghai RoboParty’s RP-VLA with 35.66 points and Russia’s GreenVLA with 33.19 points.

AGIBOT targets supermarket tasks with the challenge

Alongside the competition, AGIBOT and Dexmal launched a supermarket benchmark track focused on end-to-end decision-making and whole-body control. This track incorporated non-ideal physical interactions, including object drops and grasping failures, to better reflect the complexity of real-world interaction and provide a more practical evaluation framework for world model research.

Set in a realistic retail environment, the track required models to complete the full mobile manipulation process, from autonomous navigation and item picking to item transport and placement, under physical constraints such as shelf height limits and randomized item placement. Through API-based remote control, participants’ algorithms directly controlled real robots, creating a practical benchmark for evaluating embodied intelligence in deployment-oriented scenarios.

In the World Model (WM) track, NeoVerse-ABot, a joint team from the Institute of Automation of the Chinese Academy of Sciences and Amap CV Lab, won first place. The PAI@IAII team from the Institute of Industrial Artificial Intelligence at the Chinese Academy of Sciences ranked second. The Loop team from the University of Science and Technology of China placed third.

With the World Challenge, AGIBOT hoped to contribute to a more practical and reproducible evaluation framework for embodied AI. | Source: AGIBOT

AGIBOT releases full-stack toolchain for robot validation

Beyond the competition itself, AGIBOT opened a full-stack toolchain covering real-world data, simulation evaluation, and real-robot testing. The toolchain included the AGIBOT WORLD open-source dataset, Genie Sim 3.0, and the AGIBOT G2 robot platform, helping developers validate models across the path from training to simulation and physical deployment.

EWMBench and Genie Sim Benchmark supported standardized metrics, automated evaluation, and comparable results across simulation and physical testing. They addressed common challenges such as inconsistent evaluation criteria and the gap between simulated performance and real-world deployment.

AGIBOT said it will integrate the technical and ecosystem resources developed through the competition with its ongoing benchmark development and open-source efforts. The company also plans to launch an online simulation leaderboard, introduce more test tasks and diversified benchmarks, and support more comprehensive quantitative evaluation of model capabilities.

In addition, AGIBOT said it will continue to refine its benchmarks and full-stack toolchain, working with global research institutions, developers, and industry partners. Its stated goal is to help embodied AI move from individual algorithmic advances toward systems that can be deployed and scaled in real-world settings.

In other benchmark news, Fraunhofer IPA last month offered a new test benchmark for humanoid robots, and NIST proposed its own baseline performance benchmark for humanoids.
