DeepMind's "Gato" is humble, so why did they build it?  |  ZDNet


DeepMind’s “Gato” neural network excels at many tasks including controlling robot arms that stack blocks, playing Atari 2600 games, and annotating images.

DeepMind

The world is used to seeing headlines about the latest breakthrough in deep learning forms of artificial intelligence. However, the latest achievement of Google’s DeepMind division can be summed up as “a single AI program that does a lot of things.”

Gato, as the program is called, was unveiled this week as a so-called multimodal program, one that can play video games, chat, write compositions, caption photos, and control a robotic arm stacking blocks. It is a single neural network that can work with multiple types of data to perform multiple types of tasks.

“Using one set of weights, Gato can engage in dialogue, annotate images, stack blocks with a real robot arm, outsmart humans at playing Atari games, navigate 3D simulated environments, follow instructions, and more,” wrote lead author Scott Reed and colleagues in their paper, “A Generalist Agent,” posted on the arXiv preprint server.

DeepMind co-founder Demis Hassabis cheered on the team, exclaiming in a tweet, “Our most general agent to date!! Great job from the team!”

The only catch is that Gato isn’t actually great at many tasks.

On the one hand, the program is able to do a better job than a dedicated machine learning program at controlling a Sawyer robotic arm that stacks blocks. On the other hand, it produces photo captions that are in many cases very poor. Likewise, its ability at standard chat dialogue with a human interlocutor is mediocre, sometimes producing contradictory and nonsensical statements.

Its performance at Atari 2600 video games also falls below that of most dedicated machine learning programs designed to compete in the benchmark Arcade Learning Environment.

Why make a program that does some things well and a bunch of other things not so well? Precedent and expectation, according to the authors.

There is precedent for more general types of software becoming state-of-the-art in artificial intelligence, and there is an expectation that increasing amounts of computing power will in the future make up for the shortcomings.

Generality can tend to win in AI. As the authors note, citing artificial intelligence scientist Richard Sutton, “Historically, general models that are better at making use of computation also tend to eventually overtake more specialized, domain-specific methods.”

As Sutton writes in his own blog, “The biggest lesson that can be read from 70 years of AI research is that general methods that take advantage of computation are ultimately the most effective, and by a large margin.”

Stating their thesis formally, Reed and his team wrote: “Here we test the hypothesis that training an agent generally capable of a large number of tasks is possible; and that such a general agent can be adapted with little additional data to succeed on an even larger number of tasks.”

The model, in this case, is indeed very general. It is a version of the Transformer, the dominant kind of attention-based model that has become the basis for many programs including GPT-3. A Transformer models the probability of an element given the elements that surround it, such as words in a sentence.

In Gato’s case, DeepMind scientists can use the same conditional probability search on many types of data.
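One common way to write this kind of conditional-probability modeling is the autoregressive factorization below. It is a sketch of the standard formulation for sequence models, not a quotation from the paper. The symbols are notation introduced here: s_l is the l-th token, L is the sequence length, and θ stands for the network weights. In this form each token is conditioned on the tokens that precede it, whether those tokens encode words, pixels, or joint movements.

```latex
\log p_\theta(s_1, \dots, s_L) \;=\; \sum_{l=1}^{L} \log p_\theta\left(s_l \mid s_1, \dots, s_{l-1}\right)
```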

As Reed and colleagues describe the task of training Gato,

During the Gato training phase, data from different tasks and modalities is serialized into a flat sequence of tokens, batched, and processed by a transformer neural network similar to a large language model. The loss is masked so that Gato only predicts action and text targets.

In other words, Gato does not treat tokens differently whether they are words in a conversation or motion vectors in a block stacking exercise. It’s all the same.
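The idea of flattening mixed data into one token stream and masking the loss can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not DeepMind's code: the `Segment` type, the segment labels, the token ids, and both function names are hypothetical, and a real system would use a tokenizer per modality and a neural network to produce the log-probabilities.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    kind: str          # hypothetical labels: "text", "image", "observation", "action"
    tokens: List[int]  # integer token ids (made-up values in this sketch)


# Assumption for this sketch: only text and action tokens are prediction targets.
PREDICTED_KINDS = {"text", "action"}


def serialize(segments: List[Segment]) -> Tuple[List[int], List[int]]:
    """Flatten segments into one token sequence plus a 0/1 loss mask."""
    tokens: List[int] = []
    mask: List[int] = []
    for seg in segments:
        tokens.extend(seg.tokens)
        keep = 1 if seg.kind in PREDICTED_KINDS else 0
        mask.extend([keep] * len(seg.tokens))
    return tokens, mask


def masked_nll(log_probs: List[float], mask: List[int]) -> float:
    """Mean negative log-likelihood over the positions the mask keeps."""
    kept = [-lp for lp, m in zip(log_probs, mask) if m]
    return sum(kept) / len(kept) if kept else 0.0


# One made-up episode mixing four kinds of data.
episode = [
    Segment("image", [101, 102, 103]),
    Segment("text", [7, 8]),
    Segment("observation", [55]),
    Segment("action", [900, 901]),
]
tokens, mask = serialize(episode)
print(tokens)
print(mask)
```

Once serialized, the model sees only integers. The mask is the one place where the kind of data still matters, because it decides which positions contribute to the training loss.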


Gato training scenario.

Reed et al 2022

Buried within Reed and his team’s hypothesis is a corollary: that more and more computing power will eventually win. At the moment, Gato is limited by the response time of the Sawyer robot arm that does the block stacking. At 1.18 billion network parameters, Gato is much smaller than very large AI models such as GPT-3. As deep learning models get bigger, performing inference incurs latency that can fail in the non-deterministic world of a real-world robot.

However, Reed and his colleagues expect that limit to be surpassed as AI hardware gets faster at processing.

“We focus our training at the operating point of model scale that allows real-time control of real-world robots, currently around 1.2B parameters in the case of Gato,” they wrote. “As hardware and model architectures improve, this operating point will naturally increase the feasible model size, pushing generalist models higher up the scaling law curve.”
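The trade-off between model size and real-time control can be made concrete with some back-of-the-envelope arithmetic. The sketch below is not from the paper. It assumes the common rule of thumb that a dense Transformer forward pass costs roughly 2 floating-point operations per parameter per token, and every number in it (control rate, tokens per control step, hardware throughput) is hypothetical, chosen only to show how the feasible model size moves with hardware speed.

```python
def forward_flops_per_token(n_params: float) -> float:
    """Approximate forward-pass cost of a dense Transformer, per token.

    Rule of thumb only: about 2 FLOPs per parameter per token. It ignores
    attention over the context and memory-bandwidth limits.
    """
    return 2.0 * n_params


def max_params_for_realtime(control_hz: float,
                            tokens_per_step: int,
                            effective_flops_per_s: float) -> float:
    """Largest parameter count whose inference fits in one control period."""
    budget_s = 1.0 / control_hz                        # time allowed per control step
    flops_per_step = budget_s * effective_flops_per_s  # compute available in that time
    flops_per_param = tokens_per_step * forward_flops_per_token(1.0)
    return flops_per_step / flops_per_param


# Hypothetical values, for illustration only.
CONTROL_HZ = 20.0          # control steps per second
TOKENS_PER_STEP = 50       # tokens generated or processed per control step
EFFECTIVE_FLOPS = 2.0e12   # sustained hardware throughput in FLOP/s

limit_today = max_params_for_realtime(CONTROL_HZ, TOKENS_PER_STEP, EFFECTIVE_FLOPS)
limit_faster = max_params_for_realtime(CONTROL_HZ, TOKENS_PER_STEP, 4 * EFFECTIVE_FLOPS)
print(f"{limit_today:.3g} parameters fit the latency budget")
print(f"{limit_faster:.3g} parameters fit with 4x faster hardware")
```

Under these assumptions the feasible model size scales linearly with hardware throughput, which is the sense in which faster hardware moves the operating point toward larger models.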

Hence, Gato is really a model for how scale of computing will continue to be the main vector of machine learning development, by making general models bigger and bigger. In other words, bigger is better.


Gato improves as the size of the neural network increases in parameters.

Reed et al 2022

The authors have some evidence for this. Gato does seem to get better as it gets bigger. They compare average scores across all benchmark tasks for three sizes of the model by parameter count: 79 million, 364 million, and the main model, 1.18 billion. “We can see that for an equivalent token count, there is a significant improvement in performance with increasing scale,” the authors wrote.

An interesting future question is whether a generalist program is more dangerous than other types of AI programs. The authors spend a good deal of the paper discussing the fact that there are potential risks that are not yet well understood.

The idea of a program that handles multiple tasks suggests to the layperson some kind of human adaptability, but that could be a dangerous misconception. “For example, physical embodiment may lead to users anthropomorphizing the agent, resulting in misplaced trust in the event of a malfunctioning system, or it can be exploited by bad actors,” Reed and his team wrote.

Additionally, the authors note, while transfer of knowledge across domains is often a goal in ML research, it can lead to unexpected and undesirable outcomes if certain behaviors (such as arcade game fighting) are transferred to the wrong context.

Hence, they wrote, “Ethics and safety considerations for knowledge transfer may require substantial new research as generalist systems advance.”

(As an interesting side note, the Gato paper uses a scheme for describing risks devised by former Google AI researcher Margaret Mitchell and her colleagues, called Model Cards. Model Cards provide a brief summary of what an AI program is, what it does, and what factors influence how it works. Mitchell wrote last year that she was forced out of Google for supporting her former colleague, Timnit Gebru, whose ethical concerns about AI ran afoul of Google’s AI leadership.)

Gato is by no means unique in its generalizing tendency. It is part of the broad trend toward generalization, and toward larger models that use buckets of horsepower. The world got its first taste of Google’s tilt in this direction last summer, with Google’s “Perceiver” neural network, which combined text Transformer tasks with images, audio, and LiDAR spatial coordinates.

Among its peers is PaLM, the Pathways Language Model, introduced this year by Google scientists, a model of 540 billion parameters that uses a new technology for orchestrating thousands of chips, known as Pathways, also invented by Google. A neural network released in January by Meta, called “data2vec,” uses Transformers for image data, speech audio waveforms, and text language representations all in one.

What’s new about Gato, it seems, is the intent to take AI used for non-robotics tasks and push it into the world of robotics.

The creators of Gato, who point to the achievements of Pathways and other generalist approaches, see the ultimate achievement in artificial intelligence as a system that can operate in the real world, on any kind of task.

“Future work should consider how to unify these text capabilities into a complete generalist agent that can also operate in real time in the real world, in diverse environments and embodiments,” the authors wrote.

You can then consider Gato as an important step on the way to solving the most difficult AI problem, which is robotics.

2022-05-14 13:22:00
