Google DeepMind’s new AI can follow commands inside 3D games it hasn’t seen before


Google DeepMind has presented new research on an AI agent capable of carrying out tasks in 3D games it has never seen before. The team has long experimented with AI models that can master games like Go and chess, and even learn games without being taught the rules. Now, for the first time, according to DeepMind, an AI agent has shown it can understand a wide range of game worlds and carry out tasks within them based on natural-language instructions.

The researchers teamed up with studios and publishers like Hello Games (No Man's Sky), Tuxedo Labs (Teardown) and Coffee Stain (Valheim and Goat Simulator 3) to train a Scalable Instructable Multiworld Agent (SIMA) across nine games. The team also used four research environments, including one built in Unity in which agents were tasked with building sculptures out of blocks. This gave SIMA, described as a "generalist AI agent for 3D virtual settings," a range of environments to learn from, with varied graphical styles and perspectives (first- and third-person).

“Each game in SIMA’s portfolio opens up a new interactive world, including a range of skills to learn, from simple navigation and menu use to extracting resources, flying a spaceship or crafting a helmet,” the researchers wrote in a blog post. Learning to follow instructions for such tasks in video game worlds could lead to more useful AI agents in any environment, they noted.

A flowchart detailing how Google DeepMind trained the SIMA AI agent. The team used gameplay video paired with keyboard and mouse inputs for the AI to learn from.

Google DeepMind

The researchers recorded humans playing the games, along with the keyboard and mouse inputs used to carry out their actions. They used that information to train SIMA, which combines "precise image-language mapping and a video model that predicts what will happen next on-screen." The result is an AI that can perceive a variety of environments and perform tasks within them to achieve a given goal.
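To make that setup concrete, here is a minimal behavioral-cloning sketch of the kind of training the description implies. It is an illustration only: the model, input shapes and action encoding are assumptions, not DeepMind's architecture.

```python
# Illustrative behavioral-cloning sketch (not DeepMind's code): a policy maps
# (screen frames, text instruction) to a keyboard/mouse action, and is trained
# to imitate the actions recorded from human players.
import torch
import torch.nn as nn

class ToyInstructableAgent(nn.Module):
    def __init__(self, vocab_size: int = 10_000, num_actions: int = 32):
        super().__init__()
        # Toy stand-ins for SIMA's pretrained vision and language components.
        self.vision = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.text = nn.EmbeddingBag(vocab_size, 16)    # mean-pools token embeddings
        self.policy = nn.Linear(16 + 16, num_actions)  # fused features -> action logits

    def forward(self, frames, instruction_tokens):
        fused = torch.cat([self.vision(frames), self.text(instruction_tokens)], dim=-1)
        return self.policy(fused)

agent = ToyInstructableAgent()
optimizer = torch.optim.Adam(agent.parameters(), lr=1e-4)

# One imitation step on a fake batch of human demonstrations.
frames = torch.rand(8, 3, 96, 96)                # screen pixels
instructions = torch.randint(0, 10_000, (8, 6))  # tokenized "pick up mushrooms", etc.
human_actions = torch.randint(0, 32, (8,))       # recorded key/mouse action IDs

optimizer.zero_grad()
loss = nn.functional.cross_entropy(agent(frames, instructions), human_actions)
loss.backward()                                  # learn to match the human's action
optimizer.step()
```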

The researchers say SIMA doesn't need access to a game's source code or APIs; it works on commercial versions of the games. It also requires just two inputs: the images on screen and instructions from the user. And because it uses the same keyboard and mouse outputs as a human player, DeepMind says SIMA can potentially work in almost any virtual environment.
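That interface can be sketched in a few lines; the names below are hypothetical, not DeepMind's API, but they capture the contract the article describes: pixels plus an instruction in, keyboard and mouse events out.

```python
# Hypothetical interface sketch (assumed names, not DeepMind's API): the agent
# sees only pixels plus an instruction, and acts through the same keyboard and
# mouse channel a human player would use.
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Observation:
    screen_rgb: bytes     # raw pixels captured from the game window
    instruction: str      # e.g. "turn right" or "open the map"

@dataclass
class Action:
    keys_down: list[str]  # keyboard keys held this step, e.g. ["w"]
    mouse_dx: int         # relative mouse movement
    mouse_dy: int
    left_click: bool

class InstructableAgent(Protocol):
    def act(self, obs: Observation) -> Action: ...
```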

The agent is evaluated on hundreds of basic skills that can be performed in around 10 seconds, spanning categories such as navigation ("turn right"), object interaction ("pick up mushrooms") and menu use (opening a map or crafting an item). Ultimately, DeepMind hopes to be able to order agents to perform more complex, multi-step tasks based on natural-language instructions, such as "find resources and build a camp."
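A toy evaluation harness along those lines might look like the following; the skill list and the env.run_task helper are invented for illustration, not taken from the SIMA paper.

```python
# Toy evaluation harness (the skill list and env.run_task are invented for
# illustration): score an agent per category on short, ~10-second skills.
SKILLS = {
    "navigation": ["turn right", "go to the river"],
    "object interaction": ["pick up mushrooms", "chop down the tree"],
    "menu use": ["open the map", "craft an item"],
}

def evaluate(agent, env, time_limit_s: float = 10.0) -> dict[str, float]:
    """Return the success rate per skill category for one agent in one game."""
    rates = {}
    for category, tasks in SKILLS.items():
        passed = sum(env.run_task(agent, task, time_limit_s) for task in tasks)
        rates[category] = passed / len(tasks)
    return rates
```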

In terms of performance, SIMA did well across a number of training setups. The researchers trained an agent on a single game (say, Goat Simulator 3) and used its performance in that same title as a baseline. Across all nine games, the agent trained on the full set significantly outperformed the agent trained only on Goat Simulator 3.

Chart showing the relative performance of Google DeepMind's SIMA AI agent based on different training data.

Google DeepMind

Particularly interesting: a version of SIMA trained on eight of the games and then tested in the ninth, which it had never played, performed nearly as well on average as an agent trained specifically on that last game. "The ability to operate in novel environments highlights SIMA's ability to generalize beyond its training," DeepMind said. "This is a promising preliminary result, but more research is needed for SIMA to perform at a human level in both seen and unseen games."
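The chart's "relative performance" framing can be made concrete with invented numbers: each variant's success rate is divided by that of the specialist agent trained only on the evaluation game.

```python
# Invented numbers, for illustration only: each agent variant's success rate
# is divided by that of a specialist trained solely on the evaluation game.
specialist = 0.40  # trained only on, say, Goat Simulator 3
multi_game = 0.60  # trained on all nine games
zero_shot = 0.38   # trained on the other eight games, never saw this one

print(multi_game / specialist)  # 1.5  -> multi-game training helps
print(zero_shot / specialist)   # 0.95 -> near-specialist in an unseen game
```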

For SIMA to be truly successful, language input is essential. In tests where the agent received no language training or instructions, it fell back on common behaviors, gathering resources, for example, instead of walking where it was told. In such cases, SIMA "behaves appropriately but aimlessly," the researchers said. So it's not just us mere mortals: AI models sometimes need a little nudge to get things right, too.

DeepMind notes that this is preliminary research and that the results "show the potential to develop a new wave of generalist, language-driven AI agents." The team expects the AI to become more versatile and generalizable as it is exposed to more training environments, and the researchers hope future versions of the agent will improve SIMA's understanding and its ability to carry out more complex tasks. "Ultimately, our research builds towards more general AI systems and agents that can understand and safely carry out a wide range of tasks in a way that is helpful to people online and in the real world," DeepMind said.


