Transcript
Today, we can use AI to write poetry,
compose music,
and write computer programs.
Very quickly, we could end up in a place where machines are far more capable
than we are at science, and they can help us solve very hard scientific problems
that humans cannot solve on their own.
So what’s really interesting is as AI systems get more capable,
they don’t automatically become better at doing what humans want.
In fact, sometimes they become less inclined to follow human intentions.
This is what we call the Alignment Problem.
I think solving this problem is of critical importance if we want
life on Earth to go well.
Like humans, when machines learn, they make mistakes.
And so the question is – How do we prevent machines
from making mistakes that have significant consequences?
Even seemingly obvious values, like telling the truth,
are things the system actually has to be incentivized to uphold.
It has to want to tell you the truth.
Even today, we can’t peer into the depths of a neural net
and understand what’s happening inside the mind of the machine.
So how do we make sure that the system actually acts in accordance
with human intentions and in accordance with human values?
For the first time in the history of AI, we have these very powerful
large language models like GPT-3, which have such linguistic competence
that their output is sometimes indistinguishable from what humans can produce.
But technically, it’s not a trivial problem
to figure out how to get these machines to do the things that we want them to do.
So for example, if you ask GPT-3,
“Please explain the moon landing to a five-year-old.”
It will try to guess what the pattern is and might say something like,
“How do you explain the concept of infinity to a five-year-old?”
“Explain humor, comedy, parody to a five-year-old” and so on.
“Where do babies come from?” “What is war?”
And so it’s trying to guess the pattern of what we’re getting at.
But that’s not actually what you wanted.
You wanted an actual explanation.
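As a rough illustration of this pattern-continuation behavior, here is a minimal sketch using the open-source Hugging Face transformers library, with the small GPT-2 model standing in for a base language model (the model choice, prompt, and settings are illustrative assumptions, not the actual GPT-3 setup):

```python
# Minimal sketch: a base (non-instruction-tuned) language model tends to
# continue a prompt as a text pattern rather than answer it.
# GPT-2 is used here as a small, freely available stand-in.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "Please explain the moon landing to a five-year-old."
# A base model trained only on next-token prediction will often extend
# the prompt with more questions in the same style instead of answering.
result = generator(prompt, max_new_tokens=40, do_sample=True)
print(result[0]["generated_text"])
```

Instruction tuning, described next, is what closes this gap between continuing a pattern and following the instruction.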
And so we have to align GPT-3 to follow instructions.
And we do that by designing systems that learn from human feedback.
As a first step, we show the model what it means to follow instructions.
And so we have our researchers provide a bunch of demonstrations
of questions and answers.
And then as a second step, we have a human look at a bunch of responses and say,
“I like this one better than that one,” and so on.
And little by little, the system learns to follow instructions as humans want.
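In outline, those two steps are supervised fine-tuning on human demonstrations, followed by training a reward model on human preference comparisons; the reward model then supplies the signal for further optimization. Below is a minimal, self-contained sketch of just the preference-learning step in PyTorch, with random feature vectors standing in for real response representations (the names, data, and model sizes are illustrative assumptions, not actual training code):

```python
# Toy sketch of learning a reward model from pairwise human preferences.
# Each comparison encodes: "I like this response better than that one."
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in "response representations" (real systems would score
# transformer outputs; random vectors are used here for illustration).
dim = 16
preferred = torch.randn(100, dim)  # responses labelers liked more
rejected = torch.randn(100, dim)   # responses labelers liked less

# A tiny reward model: maps a response representation to a scalar score.
reward_model = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

for step in range(200):
    r_pref = reward_model(preferred)  # scores for preferred responses
    r_rej = reward_model(rejected)    # scores for rejected responses
    # Pairwise (Bradley-Terry) loss: push preferred scores above rejected.
    loss = -torch.nn.functional.logsigmoid(r_pref - r_rej).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The key design choice is that labelers only have to compare responses rather than write perfect ones, which is a far easier judgment to give at scale; the trained reward model can then guide a reinforcement-learning step that tunes the language model itself.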
“On July 20, 1969, two astronauts did something no one had ever done before.
They went to the moon!”
So using human feedback, we can align the system to follow instructions.
And that makes it more useful, more reliable, and more trustworthy.
Then you end up with a collaboration between humans and AI.
We teach AI our individual values.
And AI, in turn, helps us live better, more fulfilling lives.
AI is going to play a larger and larger role in our lives.
And that raises the question, “Where are we going with all this
and what’s going to happen in the future?”
As these systems become more powerful,
alignment will become even more critical.
It’s probable that AI systems will become part of everyday life.
And the key is to ensure that these machines
are aligned with human intentions and human values.