Transcript
Today, we can use AI to write poetry,
compose music,
and write computer programs.
Very quickly, we could end up in a place where machines are far more capable
than we are at science, and they can help us solve very hard scientific problems
that humans cannot solve on their own.
So what’s really interesting is as AI systems get more capable,
they don’t automatically become better at doing what humans want.
In fact, sometimes they become less inclined to follow human intentions.
This is what we call the Alignment Problem.
I think solving this problem is of critical importance if we want
life on Earth to go well.
Like humans, when machines learn, they make mistakes.
And so the question is – How do we prevent machines
from making mistakes that have significant consequences?
Even seemingly obvious values, like telling the truth,
are things the system actually has to be incentivized to uphold.
It has to want to tell you the truth.
Even today, we can’t peer into the depths of a neural net
and understand what’s happening inside the mind of the machine.
So how do we make sure that the system actually acts in accordance
with human intentions and in accordance with human values?
For the first time in the history of AI, we have these very powerful
large language models like GPT-3, which have such linguistic competence
that their output is sometimes indistinguishable from what humans can produce.
But technically, it’s not a trivial problem
to figure out how to get these machines to do the things that we want them to do.
So for example, if you ask GPT-3,
“Please explain the moon landing to a five-year-old.”
It will try to guess what the pattern is and might say something like,
“How do you explain the concept of infinity to a five-year-old?”
“Explain humor, comedy, parody to a five-year-old” and so on.
“Where do babies come from?” “What is war?”
And so it’s trying to guess the pattern of what we’re getting at.
But that’s not actually what you wanted.
You wanted an actual explanation.
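As a rough illustration of this pattern-continuation behavior, here is a minimal sketch using the open-source Hugging Face transformers library, with the small GPT-2 model standing in for a base language model (the model choice, prompt, and settings are illustrative assumptions, not the actual GPT-3 setup):

```python
# Minimal sketch: a base (non-instruction-tuned) language model tends to
# continue a prompt as a text pattern rather than answer it.
# GPT-2 is used here as a small, freely available stand-in.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = "Please explain the moon landing to a five-year-old."
# A base model trained only on next-token prediction will often extend
# the prompt with more questions in the same style instead of answering.
result = generator(prompt, max_new_tokens=40, do_sample=True)
print(result[0]["generated_text"])
```

Instruction tuning, described next, is what closes this gap between continuing a pattern and following the instruction.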
And so we have to align GPT-3 to follow instructions.
And we do that by designing systems that learn from human feedback.
As a first step, we show the model what it means to follow instructions.
And so we have our researchers provide a bunch of demonstrations
of questions and answers.
And then as a second step, we have a human look at a bunch of responses and say,
“I like this one better than that one,” and so on.
And little by little, the system learns to follow instructions as humans want.
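In outline, those two steps are supervised fine-tuning on human demonstrations, followed by training a reward model on human preference comparisons; the reward model then supplies the signal for further optimization. Below is a minimal, self-contained sketch of just the preference-learning step in PyTorch, with random feature vectors standing in for real response representations (the names, data, and model sizes are illustrative assumptions, not actual training code):

```python
# Toy sketch of learning a reward model from pairwise human preferences.
# Each comparison encodes: "I like this response better than that one."
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in "response representations" (real systems would score
# transformer outputs; random vectors are used here for illustration).
dim = 16
preferred = torch.randn(100, dim)  # responses labelers liked more
rejected = torch.randn(100, dim)   # responses labelers liked less

# A tiny reward model: maps a response representation to a scalar score.
reward_model = nn.Sequential(nn.Linear(dim, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

for step in range(200):
    r_pref = reward_model(preferred)  # scores for preferred responses
    r_rej = reward_model(rejected)    # scores for rejected responses
    # Pairwise (Bradley-Terry) loss: push preferred scores above rejected.
    loss = -torch.nn.functional.logsigmoid(r_pref - r_rej).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The key design choice is that labelers only have to compare responses rather than write perfect ones, which is a far easier judgment to give at scale; the trained reward model can then guide a reinforcement-learning step that tunes the language model itself.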
“On July 20, 1969, two astronauts did something no one had ever done before.
They went to the moon!”
So using human feedback, we can align the system to follow instructions.
And that makes it more useful, more reliable, and more trustworthy.
Then you end up with a collaboration between humans and AI.
We teach AI our individual values.
And AI, in turn, helps us live better, more fulfilling lives.
AI is going to play a larger and larger role in our lives.
And that raises the question, “Where are we going with all this
and what’s going to happen in the future?”
As these systems become more powerful,
alignment will become even more critical.
It’s probable that AI systems will become part of everyday life.
And the key is to ensure that these machines
are aligned with human intentions and human values.