OpenAI has built a Minecraft bot trained on 70,000 hours of video of people playing the popular PC game. It is an example of a powerful new technique that could be used to train machines to perform a wide range of tasks by drawing on sites such as YouTube, a vast and largely untapped source of training data.
The Minecraft AI has learned to perform complex sequences of keyboard and mouse inputs to complete in-game tasks such as chopping trees and crafting tools. It is the first Minecraft bot that can craft so-called diamond tools, a task that typically takes good players 20 minutes of clicking, or about 24,000 actions.
This result was made possible thanks to a breakthrough in a method known as imitation learning, in which neural networks are trained to perform tasks by observing how people do them. Imitation learning can be used to teach AI how to operate robotic arms, drive cars, or navigate web pages.
The Internet holds a huge number of videos of people performing all kinds of tasks. By tapping into this resource, the researchers hope to do for imitation learning what GPT-3 did for large language models. “Over the past few years, we have seen the rise of this GPT-3 paradigm, where we see amazing opportunities coming from large models trained on huge data from the Internet,” says OpenAI’s Bowen Baker, one of the developers behind the new Minecraft bot. “A lot of that has to do with the fact that we’re modeling what people do when they go online.”
The problem with existing imitation learning approaches is that video demonstrations must be labeled at each step: doing this action makes this happen, doing that action makes that happen, and so on. Annotating by hand in this way takes a lot of work, so these datasets tend to be small. Baker and his colleagues wanted to find a way to turn the millions of videos available online into a new dataset.
The team’s approach, called Video Pre-Training (VPT), gets around this bottleneck in imitation learning by training another neural network to label videos automatically. The researchers first hired crowdworkers to play Minecraft and recorded their keyboard and mouse inputs along with video from their screens. This gave them 2,000 hours of annotated Minecraft play, which they used to train a model to match actions to on-screen results. For example, clicking a mouse button in a certain situation makes the character swing its axe.
The next step was to use this model to create action labels for 70,000 hours of unlabeled video taken from the Internet, and then train a Minecraft bot on this large dataset.
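The two-stage idea described above (learn to infer actions from a small labeled set, then use that to label a much larger pile of video and imitate it) can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration: the "world" is a position on a line, a "video" is a list of positions, and both models are simple counting tables rather than the neural networks the researchers used. The function names are hypothetical and are not taken from OpenAI's code.

```python
from collections import Counter, defaultdict

def train_inverse_dynamics(labeled):
    """Learn which action most often explains each observed change between frames.

    `labeled` holds (frame_before, frame_after, action) triples, standing in
    for the small set of recordings where the inputs were logged.
    """
    votes = defaultdict(Counter)
    for before, after, action in labeled:
        votes[after - before][action] += 1
    return {delta: counts.most_common(1)[0][0] for delta, counts in votes.items()}

def pseudo_label(idm, frames):
    """Attach an inferred action to each consecutive frame pair of an unlabeled video."""
    pairs = []
    for before, after in zip(frames, frames[1:]):
        delta = after - before
        if delta in idm:  # skip changes the labeling model has never seen
            pairs.append((before, idm[delta]))
    return pairs

def behavior_clone(pairs):
    """Imitation step: for each observation, copy the most frequently labeled action."""
    votes = defaultdict(Counter)
    for observation, action in pairs:
        votes[observation][action] += 1
    return {obs: counts.most_common(1)[0][0] for obs, counts in votes.items()}

# Small labeled set (toy stand-in for the recorded crowdworker play).
labeled = [(0, 1, "right"), (1, 2, "right"), (2, 1, "left"), (3, 3, "stay"), (5, 4, "left")]
# Larger unlabeled set (toy stand-in for video collected from the Internet).
videos = [[0, 1, 2, 3, 4, 5, 5], [1, 2, 3, 4, 5], [7, 6, 5]]

idm = train_inverse_dynamics(labeled)
pairs = [pair for video in videos for pair in pseudo_label(idm, video)]
policy = behavior_clone(pairs)
print(policy[0], policy[5], policy[7])  # right stay left
```

The point of the sketch is the data flow: the labeled set is only ever used to build the labeling model, and the final policy is trained entirely on labels that model produced.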
“Video is a learning resource with great potential,” says Peter Stone, CEO of Sony AI America, who has previously worked on imitation learning.
Imitation learning is an alternative to reinforcement learning, in which a neural network learns to perform a task from scratch through trial and error. Reinforcement learning is the technique behind many of the biggest breakthroughs in artificial intelligence over the past few years. It has been used to train models that can beat humans at games, control a fusion reactor, and discover faster ways to perform fundamental calculations.
The problem is that reinforcement learning works best for tasks that have a clear goal, where random actions can lead to accidental success. Reinforcement learning algorithms reward these accidental successes to make them more likely to happen again.
But Minecraft is a game without a clear goal. Players can do what they like: roam the computer-generated world, mine various materials and combine them to create different items.
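The reward loop described above can be shown with a very small sketch. It uses an epsilon-greedy bandit, a standard textbook setup chosen here as a stand-in; it is not the algorithm used in any of the systems mentioned in this article. The function name `train`, the success probabilities, and the other parameters are all assumptions made for illustration.

```python
import random

def train(success_prob, steps=5000, epsilon=0.1, seed=0):
    """Trial and error: try actions, and favor the ones that happened to succeed.

    `success_prob[a]` is the (hidden) chance that action `a` is rewarded.
    Returns the learned value estimates and how often each action was taken.
    """
    rng = random.Random(seed)
    n_actions = len(success_prob)
    value = [0.0] * n_actions  # running average reward per action
    count = [0] * n_actions    # number of times each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)  # explore: act at random
        else:
            action = max(range(n_actions), key=lambda a: value[a])  # exploit
        reward = 1.0 if rng.random() < success_prob[action] else 0.0
        count[action] += 1
        # Incremental average: a success pulls the estimate up, a failure pulls it down.
        value[action] += (reward - value[action]) / count[action]
    return value, count

value, count = train([0.1, 0.8])
```

After training, the action that succeeds more often ends up with the higher estimate and is chosen far more frequently. The scheme depends on there being a reward signal to stumble upon, which is exactly what an open-ended game does not hand out.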
The openness of Minecraft makes it a good learning environment for AI. Baker was one of the researchers behind Hide & Seek, a project that released bots into a virtual playground where they used reinforcement learning to figure out how to cooperate and use tools to win simple games. But the bots soon outgrew their surroundings. “Agents kind of took over the universe; they had nothing else to do,” Baker says. “We wanted to expand it and we thought Minecraft was a great area to work in.”
Minecraft is becoming an important testing ground for new methods of artificial intelligence. MineDojo, a Minecraft environment with dozens of pre-designed challenges, won an award at this year’s NeurIPS, one of the largest artificial intelligence conferences.
Using VPT, the OpenAI bot was able to perform tasks that would not be possible with reinforcement learning alone, such as crafting planks and turning them into a table, which takes about 970 sequential actions. However, the team found that the best results came from using imitation learning and reinforcement learning together. When the researchers took a bot trained with VPT and fine-tuned it with reinforcement learning, it was able to perform tasks involving more than 20,000 sequential actions.
The researchers claim that their approach can be used to train AI to perform other tasks. The most immediate application is bots that use a keyboard and mouse to navigate websites, for example to book airline tickets or buy groceries online. In theory, it could also be used to train robots to perform physical tasks in the real world by copying first-person videos of people doing these things.