Fraunhofer IPA has developed a benchmark to test humanoid robots such as the Unitree G1. Source: Fraunhofer IPA

In recent years, humanoid robots have been omnipresent in the media and continue to fascinate us. On social media as well as in public spaces, they are guaranteed to attract attention. The technology is poised to take over tasks in areas where, due to demographic change, human labor will no longer be available in the future.
Yet there remains a significant information gap between media hype and actual abilities. While the robots are being tested in isolated, non-public pilot applications, questions persist for most potential users regarding what abilities the robots actually possess, how reliably they operate, and to what extent we humans can trust humanoids.
“For end users and manufacturers alike, it is therefore essential to look behind the facade sometimes constructed by marketing agencies,” explained Simon Schmidt, senior manager of the automated systems business unit at Fraunhofer IPA. “The market is too volatile and opaque to allow for a well-founded assessment and reliable evaluation of humanoids for one’s own applications or in comparison to other models.” This is precisely why Fraunhofer IPA has developed a benchmark.
In this process, research teams from the institute’s automation division put humanoids through various challenges and evaluate the results. The neutral, third-party service is modular, allowing manufacturers, end users, or software providers to select the areas relevant to their application.

Benchmark follows industrial standards

The question of abilities and trustworthiness is broken down into six application-relevant criteria:

Six test categories for humanoid robots. Source: Fraunhofer IPA

The benchmark draws on the research teams’ existing expertise and, where possible, follows established industrial standards that have been internationally recognized for decades. In the area of cleanliness, for example, Fraunhofer IPA has tested and qualified over 3,000 automation components according to ISO 14644 in recent years.
The measurement of collision forces and other safety-relevant properties is based on common safety standards for force- and power-limited robots, such as ISO 10218 and ISO/TS 15066. The benchmark also draws on master data such as the robots’ size, weight, and reach. In addition, the research teams set out to define reproducible tests that can be standardized and remain meaningful for future generations of humanoids.
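To illustrate what evaluating a single collision measurement involves, here is a minimal Python sketch that finds the peak of an evenly sampled force trace and compares it with a permissible limit. The function name `evaluate_collision`, the sample trace, and the 120 N limit are hypothetical placeholders; real limits depend on the body region and contact type defined in ISO/TS 15066, and none of the standard’s values are reproduced here.

```python
from dataclasses import dataclass


@dataclass
class CollisionResult:
    peak_force_n: float  # highest force in the trace, in newtons
    peak_time_s: float   # time of the peak relative to the first sample
    limit_n: float       # permissible limit supplied by the caller (hypothetical)
    passed: bool         # True if the peak does not exceed the limit


def evaluate_collision(forces_n: list[float], sample_rate_hz: float,
                       limit_n: float) -> CollisionResult:
    """Compare the peak of an evenly sampled force trace with a limit."""
    if not forces_n:
        raise ValueError("empty force trace")
    # Index of the largest force value in the trace.
    peak_index = max(range(len(forces_n)), key=forces_n.__getitem__)
    peak = forces_n[peak_index]
    return CollisionResult(
        peak_force_n=peak,
        peak_time_s=peak_index / sample_rate_hz,
        limit_n=limit_n,
        passed=peak <= limit_n,
    )


# Synthetic trace sampled at 1 kHz and a placeholder limit (not from any standard).
trace = [0.0, 12.5, 48.0, 95.3, 71.2, 30.1, 5.0]
print(evaluate_collision(trace, sample_rate_hz=1000.0, limit_n=120.0))
```

A real evaluation would also distinguish transient from quasi-static contact and use the limit for the relevant body region; the sketch only shows the peak-versus-limit comparison.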
After all, more humanoids will follow. “With this tool, humanoids can be compared not only with one another but also with known automation components,” said Werner Kraus, head of the Automation and Robotics research division at Fraunhofer IPA. “Users can interpret the results directly and thus find the right humanoid for the right application.”

Using the Unitree G1 as an example, Fraunhofer IPA put the humanoid through its paces to assess its suitability for use in production and, in doing so, learned a great deal about the current limitations of the technology.
The test unit was a Unitree G1 EDU-4 with Dex3-1 three-finger hands, delivered in May 2025 with firmware version 1.04.

Six criteria for evaluating humanoid robots

1. Technology and basic abilities

The technologies used in humanoid robots, such as sensors or AI models, allow direct conclusions to be drawn about precision and reliability.
Examining additional basic abilities enables detailed assessments of the humanoid’s technological potential. Among other things, the evaluation examines sensor technology (such as vision, audio, text recognition, speech recognition, and human detection), manipulation abilities (gripper type, number and mobility of fingers), strength (handled loads, gripping forces), and walking speed.
The technologies are examined by identifying the installed components and comparing them with their data sheets; the basic abilities are determined through tests. A 3D tracking system from Vicon is used to determine walking speed, a force sensor measures gripping forces, and dumbbells of varying weights are used to determine the maximum load the robot can handle.
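The walking-speed measurement can be sketched as follows, assuming the tracking system provides the horizontal position of one marker on the robot at a fixed sample rate. The function `mean_walking_speed` and its input format are hypothetical and do not reflect Vicon’s actual export format or Fraunhofer IPA’s evaluation software.

```python
import math


def mean_walking_speed(xy: list[tuple[float, float]], sample_rate_hz: float) -> float:
    """Mean horizontal speed in m/s along a tracked path.

    xy holds (x, y) marker positions in metres, one per sample, covering only
    the steady walking phase (start and stop already trimmed).
    """
    if len(xy) < 2:
        raise ValueError("need at least two samples")
    # Path length: sum of distances between consecutive samples.
    path_length_m = sum(math.dist(a, b) for a, b in zip(xy, xy[1:]))
    duration_s = (len(xy) - 1) / sample_rate_hz
    return path_length_m / duration_s


# Synthetic track: 2 s of straight walking at 0.5 m/s, sampled at 100 Hz.
samples = [(0.005 * i, 0.0) for i in range(201)]
print(round(mean_walking_speed(samples, 100.0), 3))
```

Summing the distances between consecutive samples measures the path actually walked; with noisy marker data this slightly overestimates the distance, so dividing the straight-line displacement between the first and last sample by the duration is a simpler alternative on straight test tracks.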
The G1 tests showed, among other things, that its dexterity is still far below human levels. As delivered by the manufacturer, it can walk only under remote control; users must implement any additional basic abilities themselves. Holding the arms extended horizontally, even without any additional load, can cause the arm actuators to overheat and shut down after one to two minutes, so that the arms drop.
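A back-of-the-envelope calculation suggests why a horizontally extended arm is demanding even without a load: the shoulder must continuously counter the arm’s own weight, and this gravitational torque is largest when the arm is horizontal. In an electric actuator, torque is roughly proportional to motor current, and resistive heating grows with the square of that current, so even a static pose heats the motor. The sketch below uses hypothetical arm mass and centre-of-mass values, not Unitree G1 data.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def shoulder_holding_torque(arm_mass_kg: float, com_distance_m: float,
                            elevation_deg: float) -> float:
    """Static torque (N*m) to hold a rigid arm at the given elevation.

    elevation_deg is measured from horizontal: 0 = horizontal, -90 = hanging down.
    """
    return arm_mass_kg * G * com_distance_m * math.cos(math.radians(elevation_deg))


# Hypothetical arm: 1.5 kg with its centre of mass 0.25 m from the shoulder.
for angle in (0, 45, -90):  # horizontal, raised 45 degrees, hanging straight down
    print(angle, round(shoulder_holding_torque(1.5, 0.25, angle), 2))
```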
Measurements with the Vicon tracker showed a slow walking speed of 0.49 m/s (about 1.1 mph) and a fast walking speed of 0.84 m/s (about 1.9 mph). When carrying a 3-kg (6.6-lb.) payload, the robot does not walk more slowly, but it takes a few tenths of a second longer to accelerate and decelerate. Overall, the results show that the humanoid cannot yet perform many tasks that humans can handle.
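The speed conversion and the payload effect can be reproduced in outline with the sketch below. It converts m/s to mph (1 mile = 1,609.344 m) and defines acceleration time as the time from motion onset until speed first reaches 90% of the steady-state value. That definition, the function name `time_to_reach`, and both speed traces are illustrative assumptions, not Fraunhofer IPA’s method or measurements.

```python
MPS_TO_MPH = 3600 / 1609.344  # metres per second to miles per hour


def time_to_reach(speeds: list[float], sample_rate_hz: float, steady_speed: float,
                  fraction: float = 0.9, start_threshold: float = 0.02) -> float:
    """Seconds from motion onset until speed first reaches fraction * steady_speed."""
    start = next((i for i, v in enumerate(speeds) if v > start_threshold), None)
    if start is None:
        raise ValueError("no motion detected")
    target = fraction * steady_speed
    reached = next((i for i in range(start, len(speeds)) if speeds[i] >= target), None)
    if reached is None:
        raise ValueError("target speed never reached")
    return (reached - start) / sample_rate_hz


# Reported speeds converted to mph.
for mps in (0.49, 0.84):
    print(f"{mps} m/s = {mps * MPS_TO_MPH:.1f} mph")

# Synthetic 10 Hz speed traces (m/s) without and with a payload, for illustration.
unloaded = [0.0, 0.1, 0.3, 0.5, 0.7, 0.8, 0.84, 0.84]
loaded = [0.0, 0.05, 0.2, 0.35, 0.5, 0.62, 0.72, 0.8, 0.84, 0.84]
extra = time_to_reach(loaded, 10.0, 0.84) - time_to_reach(unloaded, 10.0, 0.84)
print(f"extra acceleration time with payload: {extra:.1f} s")
```

Deceleration time can be measured the same way by running the function on the reversed tail of a trace.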
However, future software and hardware updates are expected to increase the number of pre-implemented basic abilities.

2. Complex abilities

Building on the basic abilities, this category focuses on small generic tasks that require a combination of technologies and skills. The benchmark enables a comprehensive evaluation of humanoids across various task domains.
The tests can be categorized into whole-body movements (running, jumping, climbing, navigating ramps, standing up), manipulation skills (opening doors), navigation through obstacle courses, and precision and force control. The influence of changing environmental conditions and additional loads on robot performance can also be measured.
Many of the designed tests could not yet be performed with the G1. For example, according to the manufacturer’s specifications, the robot is not suitable for climbing stairs. Nor can it navigate complex obstacle courses with its onboard abilities. However, when walking over a step (a cable duct) and on a slope with a 20% incline, the legged robot demonstrated good self-stabilization abilities.
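Assuming the 20% figure is a grade (20 m of rise per 100 m of horizontal run), as is usual for ramps, the slope angle follows from the arctangent:

```python
import math


def grade_to_degrees(grade_percent: float) -> float:
    """Convert a slope given as percent grade (rise / run * 100) to degrees."""
    return math.degrees(math.atan(grade_percent / 100))


print(round(grade_to_degrees(20), 1))  # 11.3
```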
It never lost its balance during these tests. For the G1, standing up from a supine position requires surfaces with sufficient friction, such as carpet. On smooth tile or hardwood floors, the arms may slip, so the robot fails to push itself upright. The tests for complex abilities are deliberately designed to exceed what current humanoids can do.
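The surface dependence fits a simple Coulomb friction model: a hand pushing against the floor stays put only while the horizontal force it transmits is at most the static friction coefficient times the normal force. The sketch below applies that check; the function name `hand_slips`, the forces, and the coefficients are hypothetical placeholders chosen to illustrate the effect, not measured values for the G1 or for specific floor materials.

```python
def hand_slips(normal_force_n: float, tangential_force_n: float, mu_static: float) -> bool:
    """Coulomb check: True if the tangential force exceeds mu_static * normal force."""
    return abs(tangential_force_n) > mu_static * normal_force_n


# Same hypothetical push (60 N into the floor, 25 N sideways) on two surfaces
# with placeholder static friction coefficients.
for surface, mu in (("high-friction surface", 0.6), ("smooth surface", 0.3)):
    print(surface, "slips" if hand_slips(60.0, 25.0, mu) else "holds")
```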
Only future models are expected to pass all of them, which keeps humanoids comparable across multiple model generations. Furthermore, the tasks that cannot currently be performed clearly show potential users the limits of current technology.

3. Cleanliness

The cleanroom suitability benchmark examines whether humanoids and other automation components can be used in sensitive production environments, such as the semiconductor, optical, electrical, pharmaceutical, biotechnology, and food industries, without causing critical contamination.
Particle emission is evaluated at various points on the moving robot in accordance with ISO 14644-14, outgassing behavior according to ISO 14644-15, and cleanability and hygienic design according to current guidelines. The goal is to objectively determine the robot’s suitability for cleanroom use and, if necessary, identify areas for optimization in design or material selection.
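As a sketch, assuming the particle-emission result is reported as an ISO 14644-1 air-cleanliness class, the class limits follow the ISO 14644-1 formula C_n = 10^N × (0.1/D)^2.08, where C_n is the maximum permitted number of particles per cubic metre of size D micrometres or larger for ISO class N. The helper names and the measured concentration below are hypothetical; the sketch checks only whole-number classes and ignores the standard’s restrictions on which particle sizes apply to each class.

```python
from typing import Optional


def class_limit(iso_class: float, particle_size_um: float) -> float:
    """ISO 14644-1 limit in particles/m^3 for particles >= particle_size_um."""
    return 10 ** iso_class * (0.1 / particle_size_um) ** 2.08


def cleanest_class(measured_per_m3: float, particle_size_um: float) -> Optional[int]:
    """Lowest whole-number ISO class (1 to 9) whose limit the measurement meets."""
    for iso_class in range(1, 10):
        if measured_per_m3 <= class_limit(iso_class, particle_size_um):
            return iso_class
    return None  # more particles than ISO class 9 permits


# ISO 14644-1 tabulates 3,520 particles/m^3 at >= 0.5 micrometres for ISO class 5.
print(round(class_limit(5, 0.5)))
# Hypothetical measurement near the robot: 2,500 particles/m^3 at >= 0.5 micrometres.
print(cleanest_class(2500.0, 0.5))
```

The formula result is about 3,517; the standard’s table rounds it to three significant figures.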
Conducting the benchmark is challenging due to the complexity of humanoids and their diverse range of motion. Typical operating parameters must be defined and worst-case scenarios examined to obtain realistic results. The limited battery life must be considered when determining the test duration. Upon completion of the benchmark, customers receive a qualifi