Lex Fridman Podcast - #9 - Stuart Russell: Long-Term Future of AI

The following is a conversation with Stuart Russell. He’s a professor of computer science at

UC Berkeley and a coauthor of a book that introduced me and millions of other people

to the amazing world of AI called Artificial Intelligence: A Modern Approach. So it was an

honor for me to have this conversation as part of the MIT course in artificial general intelligence

and the artificial intelligence podcast. If you enjoy it, please subscribe on YouTube,

iTunes or your podcast provider of choice, or simply connect with me on Twitter at Lex Fridman,

spelled F R I D. And now here’s my conversation with Stuart Russell.

So you’ve mentioned that in 1975, in high school, you created one of your first AI programs,

one that played chess. Were you ever able to build a program that beat you at chess or another board

game? So my program never beat me at chess. I actually wrote the program at Imperial College.

So I used to take the bus every Wednesday with a box of cards this big and shove them into the

card reader. And they gave us eight seconds of CPU time. It took about five seconds to read the cards

in and compile the code. So we had three seconds of CPU time, which was enough to make one move,

you know, with a not very deep search. And then we would print that move out and then

we’d have to go to the back of the queue and wait to feed the cards in again.

How deep was the search? Are we talking about one move, two moves, three moves?

No, I think we got to eight moves, a depth of eight, with alpha beta. And we had some tricks of our

own about move ordering and some pruning of the tree. But you were still able to beat that program?

Yeah, yeah. I was a reasonable chess player in my youth. I did an Othello program and a

backgammon program. So when I got to Berkeley, I worked a lot on what we call meta reasoning,

which really means reasoning about reasoning. And in the case of a game playing program,

you need to reason about what parts of the search tree you’re actually going to explore because the

search tree is enormous, bigger than the number of atoms in the universe. And the way programs

succeed and the way humans succeed is by only looking at a small fraction of the search tree.

And if you look at the right fraction, you play really well. If you look at the wrong fraction,

if you waste your time thinking about things that are never going to happen,

moves that no one’s ever going to make, then you’re going to lose because you won’t be able

to figure out the right decision. So that question of how machines can manage their own computation,

how they decide what to think about, is the meta reasoning question. And we developed some methods

for doing that. And very simply, the machine should think about whatever thoughts are going

to improve its decision quality. We were able to show that both for Othello, which is a standard

two player game, and for Backgammon, which includes dice rolls, so it’s a two player game

with uncertainty. For both of those cases, we could come up with algorithms that were actually

much more efficient than the standard alpha beta search, which chess programs at the time were

using. And those programs could beat me. And I think you can see the same basic ideas in AlphaGo

and AlphaZero today. The way they explore the tree is using a form of meta reasoning to select

what to think about based on how useful it is to think about it. Are there any insights you can

describe, without Greek symbols, about how we select which paths to go down? There’s really

two kinds of learning going on. So as you say, AlphaGo learns to evaluate board positions. So

it can look at a Go board. And it actually has probably a superhuman ability to instantly tell

how promising that situation is. To me, the amazing thing about AlphaGo is not that it can

be the world champion with its hands tied behind its back, but the fact that if you stop it from

searching altogether, so you say, okay, you’re not allowed to do any thinking ahead. You can just

consider each of your legal moves and then look at the resulting situation and evaluate it. So

what we call a depth one search. So just the immediate outcome of your moves and decide if

that’s good or bad. That version of AlphaGo can still play at a professional level.
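
The depth one search described above can be sketched in a few lines. This is only an illustration under stated assumptions: `depth_one_move`, the toy position (an integer) and the toy evaluator are hypothetical names invented here, and the evaluator is a stand-in for a learned evaluation function, not anything taken from AlphaGo.

```python
def depth_one_move(state, legal_moves, apply_move, evaluate):
    """Pick the move whose immediate resulting position scores highest.

    No look-ahead beyond one move: each legal move is applied once and the
    resulting position is scored by `evaluate`.
    """
    return max(legal_moves(state), key=lambda move: evaluate(apply_move(state, move)))


# Toy stand-ins (assumptions for illustration only): a "position" is an
# integer, a move adds to it, and the evaluator prefers positions near 10.
def toy_moves(state):
    return [1, 2, 3]


def toy_apply(state, move):
    return state + move


def toy_eval(state):
    return -abs(10 - state)


best = depth_one_move(8, toy_moves, toy_apply, toy_eval)
```

From position 8 the candidate positions are 9, 10 and 11, so the sketch picks the move 2. All of the playing strength in such a scheme comes from the quality of the evaluator.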

And human professionals are sitting there for five, 10 minutes deciding what to do, and AlphaGo

in less than a second can instantly intuit what is the right move to make based on its ability to

evaluate positions. And that is remarkable because we don’t have that level of intuition about Go.

We actually have to think about the situation. So anyway, that capability that AlphaGo has is one

big part of why it beats humans. The other big part is that it’s able to look ahead 40, 50, 60 moves

into the future. And if it was considering all possibilities, 40 or 50 or 60 moves into the

future, that would be 10 to the 200 possibilities. So way more than atoms in the universe and so on.

So it’s very, very selective about what it looks at. So let me try to give you an intuition about

how you decide what to think about. It’s a combination of two things. One is how promising

it is. So if you’re already convinced that a move is terrible, there’s no point spending a lot more

time convincing yourself that it’s terrible because it’s probably not going to change your mind. So

the real reason you think is because there’s some possibility of changing your mind about what to do.

And it’s that changing your mind that would result then in a better final action in the real world.

So that’s the purpose of thinking is to improve the final action in the real world. So if you

think about a move that is guaranteed to be terrible, you can convince yourself it’s terrible,

you’re still not going to change your mind. But on the other hand, suppose you had a choice between

two moves. One of them you’ve already figured out is guaranteed to be a draw, let’s say. And then

the other one looks a little bit worse. It looks fairly likely that if you make that move, you’re

going to lose. But there’s still some uncertainty about the value of that move. There’s still some

possibility that it will turn out to be a win. Then it’s worth thinking about that. So even though

it’s less promising on average than the other move, which is guaranteed to be a draw, there’s still some

purpose in thinking about it because there’s a chance that you will change your mind and discover

that in fact it’s a better move. So it’s a combination of how good the move appears to be

and how much uncertainty there is about its value. The more uncertainty, the more it’s worth thinking

about because there’s a higher upside if you want to think of it that way.
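
One rough way to make this intuition concrete is to score each move by the expected gain from changing your mind about it. The sketch below is a simplification under assumptions that are not from the conversation: each move's value is modeled as a Gaussian with a mean (how promising it looks) and a standard deviation (how uncertain it is), and the function names are hypothetical. It is not the selection rule used by AlphaGo or by the Othello and backgammon programs mentioned earlier.

```python
import math


def expected_gain(mean, std, threshold):
    """E[max(0, X - threshold)] for X ~ Normal(mean, std)."""
    if std == 0:
        return max(0.0, mean - threshold)
    z = (mean - threshold) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)
    cdf = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return (mean - threshold) * cdf + std * pdf


def value_of_thinking(moves):
    """moves maps name -> (mean, std); needs at least two moves.

    Returns a rough value of thinking about each move: how much the final
    choice is expected to improve if the move's true value were revealed.
    """
    ranked = sorted(moves, key=lambda name: moves[name][0], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    values = {}
    for name, (mean, std) in moves.items():
        if name == best:
            # Thinking only pays off if the current favourite turns out to be
            # worse than the runner-up: E[max(0, runner_up_mean - X)].
            values[name] = expected_gain(-mean, std, -moves[runner_up][0])
        else:
            # Thinking only pays off if this move turns out to beat the favourite.
            values[name] = expected_gain(mean, std, moves[best][0])
    return values


# Illustrative numbers only: win = +1, draw = 0, loss = -1.
moves = {
    "draw": (0.0, 0.0),       # already known to be a guaranteed draw
    "risky": (-0.3, 0.5),     # looks a bit worse, but quite uncertain
    "terrible": (-1.0, 0.05)  # almost certainly losing
}
values = value_of_thinking(moves)
most_worth_thinking_about = max(values, key=values.get)
```

With these numbers the uncertain move comes out as the one most worth thinking about, while the guaranteed draw and the clearly terrible move score at or near zero, which matches the argument: thinking is only useful where it might change the decision.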

And of course in the beginning, especially in the AlphaGo Zero formulation, everything is shrouded

in uncertainty. So you’re really swimming in a sea of uncertainty. So it benefits you to,

I mean, actually following the same process as you described, but because you’re so uncertain

about everything, you basically have to try a lot of different directions.

Yeah. So the early parts of the search tree are fairly bushy that it will look at a lot

of different possibilities, but fairly quickly, the degree of certainty about some of the moves,

I mean, if a move is really terrible, you’ll pretty quickly find out, right? You lose half

your pieces or half your territory and then you’ll say, okay, this is not worth thinking

about anymore. And then so further down the tree becomes very long and narrow and you’re following

various lines of play, 10, 20, 30, 40, 50 moves into the future. And that again is something that

human beings have a very hard time doing mainly because they just lack the short term memory.

You just can’t remember a sequence of moves that’s 50 moves long. And you can’t imagine

the board correctly for that many moves into the future.

Of course, the top players, I’m much more familiar with chess, but the top players probably have,

they have echoes of the same kind of intuition instinct that in a moment’s time AlphaGo applies

when they see a board. I mean, they’ve seen those patterns, human beings have seen those patterns

before at the top, at the grandmaster level. It seems that there are some similarities, or maybe

it’s our imagination that creates a vision of those similarities, but it feels like this kind of

pattern recognition that the AlphaGo approaches are using is similar to what human beings at the

top level are using.

I think there’s, there’s some truth to that, but not entirely. Yeah. I mean, I think the,

the extent to which a human grandmaster can reliably instantly recognize the right move

and instantly recognize the value of the position. I think that’s a little bit overrated.

But if you sacrifice a queen, for example, I mean, there’s these beautiful games of

chess with Bobby Fischer or somebody, where he’s seemingly making a bad move. And I’m not sure

there’s a perfect degree of calculation involved where they’ve calculated all the possible things

that happen, but there’s an instinct there, right? That somehow adds up to

Yeah. So I think what happens is you get a sense that there’s some possibility in the

position, even if you make a weird looking move, that it opens up some lines of

calculation that otherwise would be definitely bad. And it’s that intuition that there’s

something here in this position that might yield a win.

And then you follow that, right? And in some sense, when a chess player is

following a line in his or her mind, they’re mentally simulating what the other person

is going to do, what the opponent is going to do. And they can do that as long as the moves are kind

of forced, right? As long as there’s what we call a forcing variation,

where the opponent doesn’t really have much choice how to respond. And then you

see if you can force them into a situation where you win.

You know, we see plenty of mistakes even, even in grandmaster games where they just miss some

simple three, four, five move combination that, you know, wasn’t particularly apparent in,

in the position, but was still there. That’s the thing that makes us human.

Yeah. So you mentioned that in Othello, the program, after some meta reasoning

improvements and research, was able to beat you. How did that make you feel?

Part of the meta reasoning capability that it had was based on learning and, and you could

sit down the next day and you could just feel that it had got a lot smarter, you know, and all of a

sudden you really felt like you’re sort of pressed against the wall because it was, it was much more

aggressive and, and was totally unforgiving of any minor mistake that you might make. And, and

actually it seemed to understand the game better than I did. And Garry Kasparov has this quote where

during his match against Deep Blue, he said, he suddenly felt that there was a new kind of

intelligence across the board. Do you think that’s a scary or an exciting

possibility for, for Kasparov and for yourself in, in the context of chess, purely sort of

in this, like that feeling, whatever that is? I think it’s definitely an exciting feeling.

You know, this is what made me work on AI in the first place was as soon as I really understood

what a computer was, I wanted to make it smart. You know, I started out with the first program

I wrote was for the Sinclair programmable calculator. And I think you could write a

21 step algorithm. That was the biggest program you could write, something like that. And do

little arithmetic calculations. So I think I implemented Newton’s method for square

roots and a few other things like that. But then, you know, I thought, okay, if I just had more

space, I could make this thing intelligent. And so I started thinking about AI and,

and I think the, the, the thing that’s scary is not, is not the chess program

because, you know, chess programs, they’re not in the taking over the world business.

But if you extrapolate, you know, there are things about chess that don’t resemble

the real world, right? We know, we know the rules of chess.

The chess board is completely visible to the program, whereas of course the real world is not.

Most of the real world is not visible from wherever you’re sitting, so to speak.

And to overcome those kinds of problems, you need qualitatively different algorithms. Another thing

about the real world is that, you know, we, we regularly plan ahead on the timescales involving

billions or trillions of steps. Now we don’t plan those in detail, but you know, when you

choose to do a PhD at Berkeley, that’s a five year commitment and that amounts to about a trillion

motor control steps that you will eventually be committed to. Including going up the stairs,

opening doors, drinking water. Yeah. I mean, every, every finger movement while you’re typing,

every character of every paper and the thesis and everything. So you’re not committing in

advance to the specific motor control steps, but you’re still reasoning on a timescale that

will eventually reduce to trillions of motor control actions. And so for all of these reasons,

you know, AlphaGo and Deep Blue and so on don’t represent any kind of threat to humanity,

but they are a step towards it, right? And progress in AI occurs by essentially removing

one by one these assumptions that make problems easy. Like the assumption of complete observability

of the situation, right? We remove that assumption, you need a much more complicated

kind of computing design. It needs, it needs something that actually keeps track of all the

things you can’t see and tries to estimate what’s going on. And there’s inevitable uncertainty

in that. So it becomes a much more complicated problem. But, you know, we are removing those

assumptions. We are starting to have algorithms that can cope with much longer timescales,

that can cope with uncertainty, that can cope with partial observability.
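
A minimal sketch of what keeping track of things you can't see can look like is a belief state updated by Bayes' rule. Everything specific here is an assumption made for illustration: the hidden state (whether a car is in the blind spot), the sensor probabilities and the function name are hypothetical and do not describe any particular system from the conversation.

```python
def bayes_update(belief, likelihood):
    """Update a belief over hidden states after one observation.

    belief:     dict mapping state -> prior probability
    likelihood: dict mapping state -> P(observation | state)
    """
    unnormalized = {state: belief[state] * likelihood[state] for state in belief}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation is impossible under the current belief")
    return {state: p / total for state, p in unnormalized.items()}


# Hidden state: is there a car in the blind spot? Start fully uncertain.
belief = {"car": 0.5, "clear": 0.5}

# A noisy sensor pings. Assumed (made-up) sensor model:
# it pings 90% of the time when a car is there, 20% of the time when not.
sensor_ping = {"car": 0.9, "clear": 0.2}

belief = bayes_update(belief, sensor_ping)
```

After one ping the belief in "car" rises from 0.5 to about 0.82. The system never observes the hidden state directly; it only carries a probability distribution forward, which is where the inevitable uncertainty comes from.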

And so each of those steps sort of magnifies by a thousand the range of things that we can

do with AI systems. So the way I started in AI, I wanted to be a psychiatrist for a long time. I

wanted to understand the mind in high school, and of course to program and so on. And I showed up

at the University of Illinois, at an AI lab, and they said, okay, I don’t have time for you,

but here’s a book, AI: A Modern Approach. I think it was the first edition at the time.

Here, go, go, go learn this. And I remember the lay of the land was, well, it’s incredible that

we solved chess, but we’ll never solve Go. I mean, it was pretty certain that Go,

in the way we thought about systems that reason, wasn’t possible to solve. And now we’ve solved

this. So it’s a very… Well, I think I would have said that it’s unlikely we could take

the kind of algorithm that was used for chess and just get it to scale up and work well for Go.

And at the time what we thought was that in order to solve Go, we would have to do something similar

to the way humans manage the complexity of Go, which is to break it down into kind of sub games.

So when a human thinks about a Go board, they think about different parts of the board as sort

of weakly connected to each other. And they think about, okay, within this part of the board, here’s

how things could go; in that part of the board, here’s how things could go. And then you try to sort of

couple those two analyses together and deal with the interactions and maybe revise your views of

how things are going to go in each part. And then you’ve got maybe five, six, seven, ten parts of

the board. And that actually resembles the real world much more than chess does because in the

real world, we have work, we have home life, we have sport, different kinds of activities,

shopping, these all are connected to each other, but they’re weakly connected. So when I’m typing

a paper, I don’t simultaneously have to decide which order I’m going to get the milk and the

butter, that doesn’t affect the typing. But I do need to realize, okay, I better finish this

before the shops close because I don’t have anything, I don’t have any food at home. So

there’s some weak connection, but not in the way that chess works where everything is tied into a

single stream of thought. So the thought was that to solve Go, we’d have to make progress on stuff

that would be useful for the real world. And in a way, AlphaGo is a little bit disappointing,

right? Because the program design for AlphaGo is actually not that different from Deep Blue

or even from Arthur Samuel’s checker playing program from the 1950s. And in fact, the two

things that make AlphaGo work are, one, this amazing ability to evaluate the positions,

and the other is the meta reasoning capability, which allows it to

explore some paths in the tree very deeply and to abandon other paths very quickly.

So this word meta reasoning, while technically correct, suggests perhaps the wrong degree of

power that AlphaGo has. For example, the word reasoning is a powerful word. So let me ask you,

sort of, you were part of the symbolic AI world for a while. There were a lot of

excellent, interesting ideas there that unfortunately met a winter. And so do you think it reemerges?

So I would say, yeah, it’s not quite as simple as that. So the AI winter,

or the first winter that was actually named as such, was the one in the late 80s.

And that came about because in the mid 80s, there was really a concerted attempt to push AI

out into the real world using what was called expert system technology. And for the most part,

that technology was just not ready for primetime. They were trying, in many cases, to do a form of

uncertain reasoning, judgment, combinations of evidence, diagnosis, those kinds of things,

which was simply invalid. And when you try to apply invalid reasoning methods to real problems,

you can fudge it for small versions of the problem. But when it starts to get larger,

the thing just falls apart. So many companies found that the stuff just didn’t work, and they

were spending tons of money on consultants to try to make it work. And there were other

practical reasons, like they were asking the companies to buy incredibly expensive

Lisp machine workstations, which were literally between $50,000 and $100,000 in 1980s money,

which would be like between $150,000 and $300,000 per workstation in current prices.

And then the bottom line, they weren’t seeing a profit from it.

Yeah, in many cases. I think there were some successes, there’s no doubt about that. But

people, I would say, overinvested. Every major company was starting an AI department, just like

now. And I worry a bit that we might see similar disappointments, not because the current technology

is invalid, but it’s limited in its scope. And it’s almost the dual of the scope problems that

expert systems had. So what have you learned from that hype cycle? And what can we do to

prevent another winter, for example? Yeah, so when I’m giving talks these days,

that’s one of the warnings that I give. So this is a two part warning slide. One is that rather

than data being the new oil, data is the new snake oil. That’s a good line. And then the other

is that we might see a kind of very visible failure in some of the major application areas. And I think

self driving cars would be the flagship. And I think when you look at the history,

so the first self driving car was on the freeway, driving itself, changing lanes, overtaking in 1987.

And so it’s more than 30 years. And that kind of looks like where we are today, right? You know,

prototypes on the freeway, changing lanes and overtaking. Now, I think a lot of progress

has been made, particularly on the perception side. So we worked a lot on autonomous vehicles

in the early mid 90s at Berkeley. And we had our own big demonstrations. We put congressmen into

self driving cars and had them zooming along the freeway. And the problem was clearly perception.

At the time, the problem was perception. Yeah. So in simulation, with perfect perception,

you could actually show that you can drive safely for a long time, even if the other cars are

misbehaving and so on. But simultaneously, we worked on machine vision for detecting cars and

tracking pedestrians and so on. And we couldn’t get the reliability of detection and tracking

up to a high enough level, particularly in bad weather conditions, nighttime,

rainfall. Good enough for demos, but perhaps not good enough to cover the general operation.

Yeah. So the thing about driving is, you know, suppose you’re a taxi driver, you know,

and you drive every day, eight hours a day for 10 years, right? That’s 100 million seconds of

driving, you know, and any one of those seconds, you can make a fatal mistake. So you’re talking

about eight nines of reliability, right? Now, if your vision system only detects 98.3% of the

vehicles, right, then that’s sort of, you know, one and a bit nines of reliability. So you have

another seven orders of magnitude to go. And this is what people don’t understand. They think,

oh, because I had a successful demo, I’m pretty much done. But you’re not even within seven orders

of magnitude of being done. And that’s the difficulty. And it’s not the, can I follow a

white line? That’s not the problem, right? We follow a white line all the way across the country.

But it’s the weird stuff that happens. It’s all the edge cases, yeah.
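
The reliability arithmetic above can be checked with a short calculation. The figures (eight hours a day, ten years, 98.3% detection) come from the conversation; reading "nines" as the negative base 10 logarithm of the failure rate, and comparing a per-second requirement with a per-detection rate, are simplifying assumptions made here for illustration.

```python
import math

# Eight hours a day, every day, for ten years, counted in seconds.
seconds_driving = 8 * 3600 * 365 * 10  # 105,120,000, i.e. about 10^8


def nines(reliability):
    """Number of 'nines' in a reliability figure, e.g. 0.999 -> 3.0."""
    return -math.log10(1 - reliability)


# Surviving roughly 10^8 one-second decisions without a fatal mistake
# needs a per-second failure rate of about 10^-8, i.e. about eight nines.
required_nines = math.log10(seconds_driving)

# A vision system that detects 98.3% of vehicles.
detector_nines = nines(0.983)

# Remaining gap, in orders of magnitude.
gap = required_nines - detector_nines
```

Under these assumptions the detector sits at a little under two nines and the requirement at about eight, so the gap is between six and seven orders of magnitude, in line with the rough figure quoted above.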

The edge case, other drivers doing weird things. You know, so if you talk to Google, right, so

they had actually a very classical architecture where, you know, you had machine vision which

would detect all the other cars and pedestrians and the white lines and the road signs. And then

basically that was fed into a logical database. And then you had a classical 1970s rule based

expert system telling you, okay, if you’re in the middle lane and there’s a bicyclist in the right

lane who is signaling this, then you do that, right? And what they found was that every day

they’d go out and there’d be another situation that the rules didn’t cover. You know, so they’d

come to a traffic circle and there’s a little girl riding her bicycle the wrong way around

the traffic circle. Okay, what do you do? We don’t have a rule. Oh my God. Okay, stop.

And then, you know, they come back and add more rules and they just found that this was not really

converging. And if you think about it, right, how do you deal with an unexpected situation,

meaning one that you’ve never previously encountered and the sort of reasoning required

to figure out the solution for that situation has never been done. It doesn’t match any previous

situation in terms of the kind of reasoning you have to do. Well, you know, in chess programs,

this happens all the time, right? You’re constantly coming up with situations you haven’t

seen before and you have to reason about them and you have to think about, okay, here are the

possible things I could do. Here are the outcomes. Here’s how desirable the outcomes are and then

pick the right one. You know, in the 90s, we were saying, okay, this is how you’re going to have to

do automated vehicles. They’re going to have to have a look ahead capability, but the look ahead

for driving is more difficult than it is for chess because there’s humans and they’re less

predictable than chess pieces. Well, then you have an opponent in chess who’s also somewhat

unpredictable. But for example, in chess, you always know the opponent’s intention. They’re

trying to beat you, right? Whereas in driving, you don’t know is this guy trying to turn left

or has he just forgotten to turn off his turn signal or is he drunk or is he changing the

channel on his radio or whatever it might be. You’ve got to try and figure out the mental state,

the intent of the other drivers to forecast the possible evolutions of their trajectories.

And then you’ve got to figure out, okay, which is the trajectory for me that’s going to be safest.

And those all interact with each other because the other drivers are going to react to your

trajectory and so on. So, you know, they’ve got the classic merging onto the freeway problem where

you’re kind of racing a vehicle that’s already on the freeway and you’re going to pull ahead of

them or you’re going to let them go first and pull in behind and you get this sort of uncertainty

about who’s going first. So all those kinds of things mean that you need a decision making

architecture that’s very different from either a rule based system or it seems to me kind of an

end to end neural network system. So just as AlphaGo is pretty good when it doesn’t do any

look ahead, but it’s way, way, way, way better when it does, I think the same is going to be

true for driving. You can have a driving system that’s pretty good when it doesn’t do any look

ahead, but that’s not good enough. And we’ve already seen multiple deaths caused by poorly

designed machine learning algorithms that don’t really understand what they’re doing.
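
The look ahead idea discussed above (list the possible actions, forecast the outcomes while uncertain about the other driver's intent, weigh how desirable each outcome is, and pick the best) can be sketched as a one step expected value calculation. The intents, probabilities and outcome values below are made-up numbers for illustration, and `best_action` is a hypothetical helper. A real system would look many steps ahead and would also model how other drivers react to its own trajectory.

```python
def best_action(actions, intents, outcome_value):
    """Pick the action with the highest expected value.

    actions:       list of candidate actions
    intents:       dict mapping the other driver's possible intent -> probability
    outcome_value: function (action, intent) -> how desirable the outcome is
    """
    def expected_value(action):
        return sum(p * outcome_value(action, intent) for intent, p in intents.items())

    return max(actions, key=expected_value)


# The other car's turn signal is on. Assumed belief about what that means.
intents = {"turning_left": 0.6, "forgot_signal": 0.4}

# Assumed desirability of each (action, intent) outcome.
values = {
    ("proceed", "turning_left"): -100.0,  # likely collision
    ("proceed", "forgot_signal"): 1.0,    # no conflict, no delay
    ("wait", "turning_left"): 0.0,        # safe
    ("wait", "forgot_signal"): -1.0,      # safe, but a small needless delay
}

choice = best_action(["proceed", "wait"], intents, lambda a, i: values[(a, i)])
```

With these numbers the sketch chooses to wait, because a small certain delay outweighs a 60% chance of a very bad outcome. The point is that the decision comes from reasoning over outcomes rather than from a rule written in advance for this exact situation.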

Yeah. On several levels, I think on the perception side, there’s mistakes being made by those

algorithms where the perception is very shallow. On the planning side, the look ahead, like you

said, and the thing that we come up against that’s really interesting when you try to deploy systems

in the real world is you can’t think of an artificial intelligence system as a thing that

responds to the world always. You have to realize that it’s an agent that others will respond to as

well. So in order to drive successfully, you can’t just try to do obstacle avoidance.

Right. You can’t pretend that you’re invisible, right? You’re the invisible car.

Right. It doesn’t work that way.

I mean, you have to assert yourself; others have to be scared of you. We’re all,

there’s this tension, there’s this game. So we do a lot of work with pedestrians:

if you approach pedestrians as purely an obstacle avoidance problem, so you’re not doing look ahead as in

modeling the intent, they’re going to take advantage of you. They’re

not going to respect you at all. There has to be a tension, a fear, some amount of uncertainty.

That’s how we have created.

Or at least just a kind of a resoluteness. You have to display a certain amount of

resoluteness. You can’t be too tentative. And yeah, so the solutions then become

pretty complicated, right? You get into game theoretic analyses. And so at Berkeley now,

we’re working a lot on this kind of interaction between machines and humans.

And that’s exciting.

And so my colleague, Anca Dragan, actually, if you formulate the problem game theoretically,

you just let the system figure out the solution. It does interesting unexpected things. Like

sometimes at a stop sign, if no one is going first, the car will actually back up a little,

right? And just to indicate to the other cars that they should go. And that’s something it

invented entirely by itself. We didn’t say this is the language of communication at stop signs.

It figured it out.

That’s really interesting. So let me just step back for a second. Just this beautiful

philosophical notion. So Pamela McCorduck in 1979 wrote, AI began with the ancient wish to

forge the gods. So when you think about the history of our civilization, do you think

that there is an inherent desire to create, let’s not say gods, but to create superintelligence?

Is it inherent to us? Is it in our genes? That the natural arc of human civilization is to create

things that are of greater and greater power and perhaps echoes of ourselves. So to create the gods

as Pamela said. Maybe. I mean, we’re all individuals, but certainly we see over and over

again in history, individuals who thought about this possibility. Hopefully I’m not being too

philosophical here, but if you look at the arc of this, where this is going and we’ll talk about AI

safety, we’ll talk about greater and greater intelligence. Do you see that there in, when you

created the Othello program and you felt this excitement, what was that excitement? Was it

excitement of a tinkerer who created something cool like a clock? Or was there a magic or was

it more like a child being born? Yeah. So I mean, I certainly understand that viewpoint. And if you

look at the Lighthill report, which was, so in the 70s, there was a lot of controversy in the UK

about AI and whether it was for real and how much money the government should invest. And

there was a long story, but the government commissioned a report by Lighthill, who was a

physicist, and he wrote a very damning report about AI, which I think was the point. And he

said that these are frustrated men who, unable to have children, would like to create

a life as a kind of replacement, which I think is really pretty unfair. But there is a kind of magic,

I would say, when you build something and what you’re building in is really just, you’re building

in some understanding of the principles of learning and decision making. And to see those

principles actually then turn into intelligent behavior in specific situations, it’s an

incredible thing. And that is naturally going to make you think, okay, where does this end?

And so there’s magical optimistic views of where it ends, whatever your view of optimism is,

whatever your view of utopia is, it’s probably different for everybody. But you’ve often talked

about concerns you have of how things may go wrong. So I’ve talked to Max Tegmark. There’s a

lot of interesting ways to think about AI safety. You’re one of the seminal people thinking about

this problem amongst sort of being in the weeds of actually solving specific AI problems. You’re

also thinking about the big picture of where are we going? So can you talk about several elements

of it? Let’s just talk about maybe the control problem. So this idea of losing the ability to control

the behavior of our AI systems. So how do you see that? How do you see that coming about?

What do you think we can do to manage it?

Well, so it doesn’t take a genius to realize that if you make something that’s smarter than you,

you might have a problem. Alan Turing wrote about this and gave lectures about this in 1951.

He did a lecture on the radio and he basically says, once the machine thinking method starts,

very quickly they’ll outstrip humanity. And if we’re lucky, we might be able to turn off the power

at strategic moments, but even so, our species would be humbled. Actually, he was wrong about

that. If it’s a sufficiently intelligent machine, it’s not going to let you switch it off. It’s

actually in competition with you. So what do you think is most likely going to happen?

What do you think he meant, just as a quick tangent, by saying that even if we shut off this super intelligent

machine, our species would be humbled? I think he means that we would realize that

we are inferior, right? That we only survive by the skin of our teeth because we happen to get

to the off switch just in time. And if we hadn’t, then we would have lost control over the earth.

Are you more worried when you think about this stuff about super intelligent AI,

or are you more worried about super powerful AI that’s not aligned with our values? So the

paperclip scenarios kind of… So the main problem I’m working on is the control problem, the problem

of machines pursuing objectives that are, as you say, not aligned with human objectives. And

this has been the way we’ve thought about AI since the beginning.

You build a machine for optimizing, and then you put in some objective, and it optimizes, right?

And we can think of this as the King Midas problem, right? Because if the King Midas put

in this objective, everything I touch should turn to gold. And the gods, that’s like the machine,

they said, okay, done. You now have this power. And of course, his food,

his drink, and his family all turned to gold. And then he dies of misery and starvation. And

it’s a warning, it’s a failure mode that pretty much every culture in history has had some story

along the same lines. There’s the genie that gives you three wishes, and the third wish is always,

you know, please undo the first two wishes because I messed up. And when Arthur Samuel wrote his

checker-playing program, it learned to play checkers considerably better than

Arthur Samuel could play, and actually reached a pretty decent standard.

Norbert Wiener, who was one of the major mathematicians of the 20th century,

he’s sort of the father of modern automation control systems. He saw this and he basically

extrapolated, as Turing did, and said, okay, this is how we could lose control.

And specifically, that we have to be certain that the purpose we put into the machine is the

purpose which we really desire. And the problem is, we can’t do that.

You mean we’re not, it’s very difficult to encode,

to put our values on paper is really difficult, or you’re just saying it’s impossible?

So theoretically, it’s possible, but in practice, it’s extremely unlikely that we could

specify correctly in advance, the full range of concerns of humanity.

You talked about cultural transmission of values,

which I think is how human-to-human transmission of values happens, right?

Well, we learn, yeah, I mean, as we grow up, we learn about the values that matter,

how things should go, what is reasonable to pursue and what isn’t reasonable to pursue.

You think machines can learn in the same kind of way?

Yeah, so I think that what we need to do is to get away from this idea that

you build an optimizing machine, and then you put the objective into it.

Because if it’s possible that you might put in a wrong objective, and we already know this is

possible because it’s happened lots of times, right? That means that the machine should never

take an objective that’s given as gospel truth. Because once it takes the objective as gospel

truth, then it believes that whatever actions it’s taking in pursuit of that objective are

the correct things to do. So you could be jumping up and down and saying, no, no, no,

no, you’re going to destroy the world, but the machine knows what the true objective is and is

pursuing it, and tough luck to you. And this is not restricted to AI, right? This is, I think,

many of the 20th century technologies, right? So in statistics, you minimize a loss function,

the loss function is exogenously specified. In control theory, you minimize a cost function.

In operations research, you maximize a reward function, and so on. So in all these disciplines,

this is how we conceive of the problem. And it’s the wrong problem because we cannot specify

with certainty the correct objective, right? We need uncertainty, we need the machine to be

uncertain about what it is that it’s supposed to be maximizing.

Favorite idea of yours, I’ve heard you say somewhere, well, I shouldn’t pick favorites,

but it just sounds beautiful, we need to teach machines humility. It’s a beautiful way to put it,

I love it.

That they’re humble, they know that they don’t know what it is they’re supposed to be doing,

and that those objectives, I mean, they exist, they’re within us, but we may not be able to

explicate them, we may not even know how we want our future to go.


And the machine, a machine that’s uncertain is going to be deferential to us. So if we say,

don’t do that, well, now the machine’s learned something a bit more about our true objectives,

because something that it thought was reasonable in pursuit of our objective,

turns out not to be, so now it’s learned something. So it’s going to defer because

it wants to be doing what we really want. And that point, I think, is absolutely central

to solving the control problem. And it’s a different kind of AI when you take away this

idea that the objective is known, then in fact, a lot of the theoretical frameworks that we’re so

familiar with, you know, Markov decision processes, goal based planning, you know,

standard games research, all of these techniques actually become inapplicable.

And you get a more complicated problem because now the interaction with the human becomes part

of the problem. Because the human by making choices is giving you more information about

the true objective and that information helps you achieve the objective better.

And so that really means that you’re mostly dealing with game theoretic problems where

you’ve got the machine and the human and they’re coupled together,

rather than a machine going off by itself with a fixed objective.
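The deference described above can be illustrated with a toy calculation. This is only a minimal sketch under stated assumptions, not the formulation discussed in the conversation: the two candidate objectives, the two actions, the utility numbers, and the probabilities that the human objects are all hypothetical, chosen to show how a machine that is uncertain about the objective changes course after hearing "don’t do that".

```python
# Illustrative sketch only: objectives, actions, and numbers are hypothetical.

# Candidate objectives the machine considers; each assigns a utility to every action.
UTILITIES = {
    "output_only": {"run_factory": 1.0, "strip_mine_city": 5.0},
    "output_without_harm": {"run_factory": 1.0, "strip_mine_city": -100.0},
}

# Assumed feedback model: the human objects to an action with high probability
# when it is harmful under the true objective, and rarely otherwise.
P_OBJECT_IF_HARMFUL = 0.95
P_OBJECT_IF_FINE = 0.05


def expected_utility(belief, action):
    """Average the action's utility over the machine's belief about the objective."""
    return sum(p * UTILITIES[obj][action] for obj, p in belief.items())


def best_action(belief):
    """Pick the action with the highest expected utility under the current belief."""
    actions = next(iter(UTILITIES.values())).keys()
    return max(actions, key=lambda a: expected_utility(belief, a))


def update_on_objection(belief, action):
    """Bayes' rule: reweight each objective by how well it explains the objection."""
    unnormalized = {}
    for obj, p in belief.items():
        harmful = UTILITIES[obj][action] < 0
        likelihood = P_OBJECT_IF_HARMFUL if harmful else P_OBJECT_IF_FINE
        unnormalized[obj] = p * likelihood
    total = sum(unnormalized.values())
    return {obj: w / total for obj, w in unnormalized.items()}


# The machine starts out almost sure that only output matters.
prior = {"output_only": 0.99, "output_without_harm": 0.01}
first_choice = best_action(prior)

# The human says "don't do that"; the machine treats this as evidence.
posterior = update_on_objection(prior, first_choice)
second_choice = best_action(posterior)
print(first_choice, "->", second_choice)
```

With these made-up numbers, a single objection is enough to shift the belief so that the harmful action is no longer the best choice. A machine that assigned probability 1 to the first objective would never change its behavior, because Bayes’ rule cannot move a belief that is already certain.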

Which is fascinating on the machine and the human level that we, when you don’t have an

objective, means you’re together coming up with an objective. I mean, there’s a lot of philosophy

that, you know, you could argue that life doesn’t really have meaning. We together agree on what

gives it meaning and we kind of culturally create things that give why the heck we are on this earth

anyway. We together as a society create that meaning and you have to learn that objective.

And one of the biggest, I thought that’s where you were going to go for a second,

one of the biggest troubles we run into outside of statistics and machine learning and AI

and just human civilization is when you look at, I came from, I was born in the Soviet Union

and the history of the 20th century, we ran into the most trouble, us humans, when there was a

certainty about the objective and you do whatever it takes to achieve that objective, whether you’re

talking about Germany or communist Russia. You get into trouble with humans.

I would say with, you know, corporations, in fact, some people argue that, you know,

we don’t have to look forward to a time when AI systems take over the world. They already have

and they’re called corporations, right? That corporations happen to be using people as

components right now, but they are effectively algorithmic machines and they’re optimizing

an objective, which is quarterly profit that isn’t aligned with overall wellbeing of the human race.

And they are destroying the world. They are primarily responsible for our inability to tackle

climate change. So I think that’s one way of thinking about what’s going on with corporations,

but I think the point you’re making is valid that there are many systems in the real world where

we’ve sort of prematurely fixed on the objective and then decoupled the machine from those that it’s

supposed to be serving. And I think you see this with government, right? Government is supposed to

be a machine that serves people, but instead it tends to be taken over by people who have their

own objective and use government to optimize that objective regardless of what people want.

Do you find appealing the idea of almost arguing machines where you have multiple AI systems with

a clear fixed objective. We have in government, the red team and the blue team, they’re very fixed on

their objectives and they argue and they kind of may disagree, but it kind of seems to make it

work somewhat that the duality of it. Okay. Let’s go a hundred years back when there was still was

going on or at the founding of this country, there was disagreements and that disagreement is where,

so it was a balance between certainty and forced humility because the power was distributed.

Yeah. I think that the nature of debate and disagreement argument takes as a premise,

the idea that you could be wrong, which means that you’re not necessarily absolutely convinced

that your objective is the correct one. If you were absolutely convinced, there’d be no point

in having any discussion or argument because you would never change your mind and there wouldn’t

be any sort of synthesis or anything like that. I think you can think of argumentation as an

implementation of a form of uncertain reasoning. I’ve been reading recently about utilitarianism

and the history of efforts to define in a sort of clear mathematical way,

if you like a formula for moral or political decision making. It’s really interesting that

the parallels between the philosophical discussions going back 200 years and what you see now in

discussions about existential risk because it’s almost exactly the same. Someone would say,

okay, well here’s a formula for how we should make decisions. Utilitarianism is roughly each

person has a utility function and then we make decisions to maximize the sum of everybody’s

utility. Then people point out, well, in that case, the best policy is one that leads to

an enormously vast population, all of whom are living a life that’s barely worth living.

This is called the repugnant conclusion. Another version is that we should maximize

pleasure and that’s what we mean by utility. Then you’ll get people effectively saying, well,

in that case, we might as well just have everyone hooked up to a heroin drip. They didn’t use those

words, but that debate was happening in the 19th century as it is now about AI that if we get the

formula wrong, we’re going to have AI systems working towards an outcome that in retrospect

would be exactly wrong. Do you think there’s, as beautifully put, so the echoes are there,

but do you think, I mean, if you look at Sam Harris, our imagination worries about the AI

version of that because of the speed at which the things going wrong in the utilitarian context

could happen. Is that a worry for you? Yeah. I think that in most cases, not in all, but if we

have a wrong political idea, we see it starting to go wrong and we’re not completely stupid and so

we say, okay, maybe that was a mistake. Let’s try something different. Also, we’re very slow and

inefficient about implementing these things and so on. So you have to worry when you have

corporations or political systems that are extremely efficient. But when we look at AI systems

or even just computers in general, they have this different characteristic from ordinary

human activity in the past. So let’s say you were a surgeon, you had some idea about how to do some

operation. Well, and let’s say you were wrong, that way of doing the operation would mostly

kill the patient. Well, you’d find out pretty quickly, like after three, maybe three or four

tries. But that isn’t true for pharmaceutical companies because they don’t do three or four

operations. They manufacture three or four billion pills and they sell them and then they find out

maybe six months or a year later that, oh, people are dying of heart attacks or getting cancer from

this drug. And so that’s why we have the FDA, right? Because of the scalability of pharmaceutical

production. And there have been some unbelievably bad episodes in the history of pharmaceuticals

and adulteration of products and so on that have killed tens of thousands or paralyzed hundreds

of thousands of people. Now with computers, we have that same scalability problem that you can

sit there and type for I equals one to five billion do, right? And all of a sudden you’re

having an impact on a global scale. And yet we have no FDA, right? There’s absolutely no controls

at all over what a bunch of undergraduates with too much caffeine can do to the world.

And we look at what happened with Facebook, well, social media in general and click through

optimization. So you have a simple feedback algorithm that’s trying to just optimize click

through, right? That sounds reasonable, right? Because you don’t want to be feeding people ads

that they don’t care about or not interested in. And you might even think of that process as

simply adjusting the feeding of ads or news articles or whatever it might be

to match people’s preferences, right? Which sounds like a good idea.

But in fact, that isn’t how the algorithm works, right? You make more money,

the algorithm makes more money if it can better predict what people are going to click on,

because then it can feed them exactly that, right? So the way to maximize click through

is actually to modify the people to make them more predictable. And one way to do that is to

feed them information, which will change their behavior and preferences towards extremes that

make them predictable. Whatever is the nearest extreme or the nearest predictable point,

that’s where you’re going to end up. And the machines will force you there.

And I think there’s a reasonable argument to say that this, among other things,

is contributing to the destruction of democracy in the world.

And where was the oversight of this process? Where were the people saying, okay,

you would like to apply this algorithm to 5 billion people on the face of the earth.

Can you show me that it’s safe? Can you show me that it won’t have various kinds of negative

effects? No, there was no one asking that question. There was no one placed between

the undergrads with too much caffeine and the human race. They just did it.

But some way outside the scope of my knowledge, so economists would argue that the, what is it,

the invisible hand, so the capitalist system, it was the oversight. So if you’re going to corrupt

society with whatever decision you make as a company, then that’s going to be reflected in

people not using your product. That’s one model of oversight.

We shall see, but in the meantime, but you might even have broken the political system

that enables capitalism to function. Well, you’ve changed it.

We shall see.

Change is often painful. So my question is absolutely, it’s fascinating. You’re absolutely

right that there was zero oversight on algorithms that can have a profound civilization changing

effect. So do you think it’s possible? I mean, I haven’t, have you seen government? So do you

think it’s possible to create regulatory bodies oversight over AI algorithms, which are inherently

such cutting edge set of ideas and technologies?

Yeah, but I think it takes time to figure out what kind of oversight, what kinds of controls.

I mean, it took time to design the FDA regime, you know, and some people still don’t like it and

they want to fix it. And I think there are clear ways that it could be improved.

But the whole notion that you have stage one, stage two, stage three, and here are the criteria

for what you have to do to pass a stage one trial, right? We haven’t even thought about what those

would be for algorithms. So, I mean, I think there are things we could do right now with regard to

bias, for example, we have a pretty good technical handle on how to detect algorithms that are

propagating bias that exists in data sets, how to de bias those algorithms, and even what it’s going

to cost you to do that. So I think we could start having some standards on that. I think there are

things to do with impersonation and falsification that we could work on.

Fakes, yeah.

A very simple point. So impersonation is a machine acting as if it was a person.

I can’t see a real justification for why we shouldn’t insist that machines self identify

as machines. Where is the social benefit in fooling people into thinking that this is really

a person when it isn’t? I don’t mind if it uses a human like voice, that’s easy to understand,

that’s fine, but it should just say, I’m a machine in some form.

And how many people are speaking to that? I would think relatively obvious facts.

Yeah, I mean, there is actually a law in California that bans impersonation, but only in certain

restricted circumstances. So for the purpose of engaging in a fraudulent transaction and for the

purpose of modifying someone’s voting behavior. So those are the circumstances where machines have

to self identify. But I think arguably, it should be in all circumstances. And

then when you talk about deep fakes, we’re just at the beginning, but already it’s possible to

make a movie of anybody saying anything in ways that are pretty hard to detect.

Including yourself because you’re on camera now and your voice is coming through with high


Yeah, so you could take what I’m saying and replace it with pretty much anything else you

wanted me to be saying. And

even it would change my lips and facial expressions to fit. And there’s actually not much

in the way of real legal protection against that. I think in the commercial area, you could say,

yeah, you’re using my brand and so on. There are rules about that. But in the political sphere,

I think at the moment, anything goes. That could be really, really damaging.

And let me just try to make not an argument, but try to look back at history and say something dark

in essence is while regulation seems to be, oversight seems to be exactly the right thing to

do here. It seems that human beings, what they naturally do is they wait for something to go

wrong. If you’re talking about nuclear weapons, you can’t talk about nuclear weapons being dangerous

until somebody actually like the United States drops the bomb or Chernobyl melting. Do you think

we will have to wait for things going wrong in a way that’s obviously damaging to society,

not an existential risk, but obviously damaging? Or do you have faith that…

I hope not, but I think we do have to look at history.

And so the two examples you gave, nuclear weapons and nuclear power are very, very interesting

because nuclear weapons, we knew in the early years of the 20th century that atoms contained

a huge amount of energy. We had E equals MC squared. We knew the mass differences between

the different atoms and their components. And we knew that

you might be able to make an incredibly powerful explosive. So HG Wells wrote science fiction book,

I think in 1912. Frederick Soddy, who was the guy who discovered isotopes, the Nobel prize winner,

he gave a speech in 1915 saying that one pound of this new explosive would be the equivalent

of 150 tons of dynamite, which turns out to be about right. And this was in World War I,

so he was imagining how much worse the world war would be if we were using that kind of explosive.

But the physics establishment simply refused to believe that these things could be made.

Including the people who are making it.

Well, so they were doing the nuclear physics. I mean, eventually were the ones who made it.

You talk about Fermi or whoever.

Well, so up to the development was mostly theoretical. So it was people using sort of

primitive kinds of particle acceleration and doing experiments at the level of single particles

or collections of particles. They weren’t yet thinking about how to actually make a bomb or

anything like that. But they knew the energy was there and they figured if they understood it

better, it might be possible. But the physics establishment, their view, and I think because

they did not want it to be true, their view was that it could not be true. That this could not

provide a way to make a super weapon. And there was this famous speech given by Rutherford,

who was the sort of leader of nuclear physics. And it was on September 11th, 1933. And he said,

anyone who talks about the possibility of obtaining energy from transformation of atoms

is talking complete moonshine. And the next morning, Leo Szilard read about that speech

and then invented the nuclear chain reaction. And so as soon as he invented, as soon as he had that

idea that you could make a chain reaction with neutrons, because neutrons were not repelled by

the nucleus, so they could enter the nucleus and then continue the reaction. As soon as he has that

idea, he instantly realized that the world was in deep doo doo. Because this is 1933, right? Hitler

had recently come to power in Germany. Szilard was in London and eventually became a refugee

and came to the US. And in the process of having the idea about the chain reaction,

he figured out basically how to make a bomb and also how to make a reactor. And he patented the

reactor in 1934. But because of the situation, the great power conflict situation that he could see

happening, he kept that a secret. And so between then and the beginning of World War II, people

were working, including the Germans, on how to actually create neutron sources, what specific

fission reactions would produce neutrons of the right energy to continue the reaction.

And that was demonstrated in Germany, I think in 1938, if I remember correctly.

The first nuclear weapon patent was 1939 by the French. So this was actually going on well before

World War II really got going. And then the British probably had the most advanced capability

in this area. But for safety reasons, among others, and just resources, they moved the program

from Britain to the US and then that became Manhattan Project. So the reason why we couldn’t

have any kind of oversight of nuclear weapons and nuclear technology

was because we were basically already in an arms race and a war.

But you mentioned then in the 20s and 30s. So what are the echoes? The way you’ve described

this story, I mean, there’s clearly echoes. Why do you think most AI researchers,

folks who are really close to the metal, they really are not concerned about AI. They don’t

think about it, whether it’s they don’t want to think about it. But why do you think that is,

is what are the echoes of the nuclear situation to the current AI situation? And what can we do

about it? I think there is a kind of motivated cognition, which is a term in psychology that means

that you believe what you would like to be true, rather than what is true. And it’s unsettling

to think that what you’re working on might be the end of the human race, obviously. So you would

rather instantly deny it and come up with some reason why it couldn’t be true. And I have,

I collected a long list of reasons that extremely intelligent, competent AI scientists have come up

with for why we shouldn’t worry about this. For example, calculators are superhuman at arithmetic

and they haven’t taken over the world. So there’s nothing to worry about. Well, okay, my five year

old, you know, could have figured out why that was an unreasonable and really quite weak argument.

Another one was, while it’s theoretically possible that you could have superhuman AI destroy the

world, it’s also theoretically possible that a black hole could materialize right next to the

earth and destroy humanity. I mean, yes, it’s theoretically possible, quantum theoretically,

extremely unlikely that it would just materialize right there. But that’s a completely bogus analogy,

because, you know, if the whole physics community on earth was working to materialize a black hole

in near earth orbit, right? Wouldn’t you ask them, is that a good idea? Is that going to be safe?

You know, what if you succeed? Right. And that’s the thing, right? The AI community has sort of

refused to ask itself, what if you succeed? And initially I think that was because it was too hard,

but, you know, Alan Turing asked himself that, and he said, we’d be toast, right? If we were lucky,

we might be able to switch off the power, but probably we’d be toast. But there’s also an aspect

that because we’re not exactly sure what the future holds, it’s not clear exactly,

so technically what to worry about, sort of how things go wrong. And so there is something,

it feels like, maybe you can correct me if I’m wrong, but there’s something paralyzing about

worrying about something that logically is inevitable, but you have to think about it,

logically is inevitable, but you don’t really know what that will look like.

Yeah, I think that’s, it’s a reasonable point and, you know, it’s certainly in terms of

existential risks, it’s different from, you know, asteroid collides with the earth, right? Which,

again, is quite possible, you know, it’s happened in the past, it’ll probably happen again,

we don’t know right now, but if we did detect an asteroid that was going to hit the earth

in 75 years time, we’d certainly be doing something about it.

Well, it’s clear there’s got big rock and there’s,

we’ll probably have a meeting and see what do we do about the big rock with AI.

Right, with AI, I mean, there are very few people who think it’s not going to happen within the

next 75 years. I know Rod Brooks doesn’t think it’s going to happen, maybe Andrew Ng doesn’t

think it’s going to happen, but, you know, a lot of the people who work day to day, you know, as you say,

at the rock face, they think it’s going to happen. I think the median estimate from AI researchers is

somewhere in 40 to 50 years from now, or maybe, you know, I think in Asia, they think it’s going

to be even faster than that. I’m a little bit more conservative, I think it’d probably take

longer than that, but I think, you know, as happened with nuclear weapons, it can happen

overnight that you have these breakthroughs and we need more than one breakthrough, but,

you know, it’s on the order of half a dozen, I mean, this is a very rough scale, but sort of

half a dozen breakthroughs of that nature would have to happen for us to reach the superhuman AI.

But the, you know, the AI research community is vast now, the massive investments from governments,

from corporations, tons of really, really smart people, you know, you just have to look at the

rate of progress in different areas of AI to see that things are moving pretty fast. So to say,

oh, it’s just going to be thousands of years, I don’t see any basis for that. You know, I see,

you know, for example, the Stanford 100 year AI project, right, which is supposed to be sort of,

you know, the serious establishment view, their most recent report actually said it’s probably

not even possible. Oh, wow.

Right. Which if you want a perfect example of people in denial, that’s it. Because, you know,

for the whole history of AI, we’ve been saying to philosophers who said it wasn’t possible,

well, you have no idea what you’re talking about. Of course it’s possible, right? Give me an argument

for why it couldn’t happen. And there isn’t one, right? And now, because people are worried that

maybe AI might get a bad name, or I just don’t want to think about this, they’re saying, okay,

well, of course, it’s not really possible. You know, imagine if, you know, the leaders of the

cancer biology community got up and said, well, you know, of course, curing cancer,

it’s not really possible. There’d be complete outrage and dismay. And, you know, I find this

really a strange phenomenon. So, okay, so if you accept that it’s possible,

and if you accept that it’s probably going to happen, the point that you’re making that,

you know, how does it go wrong? A valid question. Without that, without an answer to that question,

then you’re stuck with what I call the gorilla problem, which is, you know, the problem that

the gorillas face, right? They made something more intelligent than them, namely us, a few million

years ago, and now they’re in deep doo doo. So there’s really nothing they can do. They’ve lost

the control. They failed to solve the control problem of controlling humans, and so they’ve

lost. So we don’t want to be in that situation. And if the gorilla problem is the only formulation

you have, there’s not a lot you can do, right? Other than to say, okay, we should try to stop,

you know, we should just not make the humans, or in this case, not make the AI. And I think

that’s really hard to do. I’m not actually proposing that that’s a feasible course of

action. I also think that, you know, if properly controlled AI could be incredibly beneficial.

But it seems to me that there’s a consensus that one of the major failure modes is this

loss of control, that we create AI systems that are pursuing incorrect objectives. And because

the AI system believes it knows what the objective is, it has no incentive to listen to us anymore,

so to speak, right? It’s just carrying out the strategy that it has computed as being the optimal

solution. And, you know, it may be that in the process, it needs to acquire more resources to

increase the possibility of success or prevent various failure modes by defending itself against

interference. And so that collection of problems, I think, is something we can address. The other

problems are, roughly speaking, you know, misuse, right? So even if we solve the control problem,

we make perfectly safe controllable AI systems. Well, why? You know, why is Dr. Evil going to

use those, right? He wants to just take over the world and he’ll make unsafe AI systems that then

get out of control. So that’s one problem, which is sort of a, you know, partly a policing problem,

partly a sort of a cultural problem for the profession of how we teach people what kinds

of AI systems are safe. You talk about autonomous weapon system and how pretty much everybody

agrees that there’s too many ways that that can go horribly wrong. This great slaughterbots movie

that kind of illustrates that beautifully. I want to talk about that. That’s another,

there’s another topic I’m happy to talk about. I just want to mention that what I see is the

third major failure mode, which is overuse, not so much misuse, but overuse of AI that we become

overly dependent. So I call this the WALL E problem. So if you’ve seen WALL E, the movie,

all right, all the humans are on the spaceship and the machines look after everything for them,

and they just watch TV and drink big gulps. And they’re all sort of obese and stupid and they

sort of totally lost any notion of human autonomy. And, you know, so in effect, right. This would

happen like the slow boiling frog, right? We would gradually turn over more and more of the

management of our civilization to machines as we are already doing. And this, you know, if this

if this process continues, you know, we sort of gradually switch from sort of being the masters

of technology to just being the guests. Right. So we become guests on a cruise ship, you know,

which is fine for a week, but not not for the rest of eternity. You know, and it’s almost

irreversible. Right. Once you once you lose the incentive to, for example, you know, learn to be

an engineer or a doctor or a sanitation operative or any other of the infinitely many ways that we

maintain and propagate our civilization. You know, if you if you don’t have the incentive to do any

of that, you won’t. And then it’s really hard to recover. And of course, AI is just one of the

technologies that could result in that third failure mode. There’s probably others; technology

in general detaches us from... It does a bit. But the difference is that in terms of

the knowledge to to run our civilization, you know, up to now, we’ve had no alternative but

to put it into people’s heads. Right. And if you software with Google, I mean, so software in

general, so computers in general, but but the, you know, the knowledge of how, you know, how a

sanitation system works, you know, that’s an AI has to understand that it’s no good putting it

into Google. So, I mean, we we’ve always put knowledge in on paper, but paper doesn’t run our

civilization and only runs when it goes from the paper into people’s heads again. Right. So we’ve

always propagated civilization through human minds. And we’ve spent about a trillion person

years doing that. Literally, right? You can work it out. It’s about right. There’s about just

over 100 billion people who’ve ever lived. And each of them has spent about 10 years learning

stuff to keep their civilization going. And so that’s a trillion person years we put into this

effort. Beautiful way to describe all civilization. And now we’re, you know, we’re in danger of

throwing that away. So this is a problem that AI can’t solve. It’s not a technical problem. It’s

you know, if we do our job right, the AI systems will say, you know, the human race doesn’t in the

long run want to be passengers in a cruise ship. The human race wants autonomy. This is part of

human preferences. So we, the AI systems, are not going to do this stuff for you. You’ve got to do

it for yourself. Right. I’m not going to carry you to the top of Everest in an autonomous

helicopter. You have to climb it if you want to get the benefit and so on. So, but I’m afraid that

because we are short-sighted and lazy, we’re going to override the AI systems. And there’s an

amazing short story that I recommend to everyone that I talk to about this, called The Machine

Stops, written in 1909 by E.M. Forster, who, you know, wrote novels about the British Empire and

sort of things that became costume dramas on the BBC. But he wrote this one science fiction story,

which is an amazing vision of the future. It has basically iPads, it has video conferencing,

it has MOOCs, it has computer-induced obesity. I mean, literally, what people spend their

time doing is giving online courses or listening to online courses and talking about ideas,

but they never get out there in the real world. They don’t really have a lot of face to face

contact. Everything is done online, you know, so all the things we’re worrying about now

were described in the story. And then the human race becomes more and more dependent on

the machine, loses knowledge of how things really run and then becomes vulnerable to collapse. And

so it’s a pretty unbelievably amazing story for someone writing in 1909 to imagine all

this. So there’s very few people that represent artificial intelligence more than you, Stuart

Russell. If you say so, okay, that’s very kind. So it’s all my fault. Right. You’re often brought

up as the person, well, Stuart Russell, like, the AI person, is worried about this. That’s why you

should be worried about it. Do you feel the burden of that? I don’t know if you feel that at all,

but when I talk to people, like people from outside of computer science,

when they think about this, Stuart Russell is worried about AI safety. You should be worried

too. Do you feel the burden of that? I mean, in a practical sense, yeah, because I get, you know,

a dozen, sometimes 25 invitations a day to talk about it, to give interviews, to write press

articles and so on. So in that very practical sense, I’m seeing that people are concerned and

really interested about this. Are you worried that you could be wrong as all good scientists are?

Of course. I worry about that all the time. I mean, that’s always been the way that

I’ve worked, you know, it’s like I have an argument in my head with myself, right? So

I have some idea and then I think, okay, how could that be wrong? Or did someone else already have

that idea? So I’ll go and, you know, search in as much literature as I can to see whether someone

else already thought of that or even refuted it. So, you know, right now I’m reading a

lot of philosophy because, you know, in the form of the debates over utilitarianism and

other kinds of moral formulas, shall we say, people have already thought through

some of these issues. But, you know, one of the things I’m not seeing in a lot of

these debates is this specific idea about the importance of uncertainty in the objective

that this is the way we should think about machines that are beneficial to humans. So this

idea of provably beneficial machines based on explicit uncertainty in the objective,

you know, it seems to be, you know, my gut feeling is this is the core of it. It’s going to have to

be elaborated in a lot of different directions and there are a lot of beneficial. Yeah. But there,

there are, I mean, it has to be right. We can’t afford, you know, hand-wavy beneficial because,

you know, whenever we do hand-wavy stuff, there are loopholes. And the thing about

superintelligent machines is they find the loopholes, you know, just like, you know, tax

evaders. If you don’t write your tax law properly, people will find the loopholes and end up paying

no tax. And so you should think of it this way, and getting those definitions right,

you know, it is really a long process. You know, you can define mathematical frameworks,

and within that framework, you can prove mathematical theorems that, yes,

you know, this theoretical entity will be provably beneficial to that theoretical entity,

but that framework may not match the real world in some crucial way. So it’s a long process,

thinking through it, iterating and so on. Last question. Yep. You have 10 seconds to answer it.

What is your favorite sci-fi movie about AI? I would say Interstellar has my favorite robots.

Oh, beats space. Yeah. Yeah. Yeah. So TARS, one of the robots in Interstellar, is

the way robots should behave. And I would say Ex Machina is, in some ways, the one

that makes you think, in a nervous kind of way, about where we’re going.

Well, Stuart, thank you so much for talking today. Pleasure.