Lex Fridman Podcast - #43 - Gary Marcus: Toward a Hybrid of Deep Learning and Symbolic AI

The following is a conversation with Gary Marcus.

He’s a professor emeritus at NYU,

founder of Robust AI and Geometric Intelligence.

The latter is a machine learning company

that was acquired by Uber in 2016.

He’s the author of several books,

on natural and artificial intelligence,

including his new book, Rebooting AI,

Building Machines We Can Trust.

Gary has been a critical voice,

highlighting the limits of deep learning and AI in general

and discussing the challenges before our AI community

that must be solved in order to achieve

artificial general intelligence.

As I’m having these conversations,

I try to find paths toward insight, towards new ideas.

I try to have no ego in the process.

It gets in the way.

I’ll often continuously try on several hats, several roles.

One, for example, is the role of a three year old

who understands very little about anything

and asks big what and why questions.

The other might be a role of a devil’s advocate

who presents counter ideas with the goal of arriving

at greater understanding through debate.

Hopefully, both are useful, interesting,

and even entertaining at times.

I ask for your patience as I learn

to have better conversations.

This is the Artificial Intelligence Podcast.

If you enjoy it, subscribe on YouTube,

give it five stars on iTunes, support it on Patreon,

or simply connect with me on Twitter

at Lex Fridman, spelled F R I D M A N.

And now, here’s my conversation with Gary Marcus.

Do you think human civilization will one day have

to face an AI driven technological singularity

that will, in a societal way,

modify our place in the food chain

of intelligent living beings on this planet?

I think our place in the food chain has already changed.

So there are lots of things people used to do by hand

that they do with machines.

If you think of a singularity as like one single moment,

which is, I guess, what it suggests,

I don’t know if it’ll be like that,

but I think that there’s a lot of gradual change

and AI is getting better and better.

I mean, I’m here to tell you why I think it’s not nearly

as good as people think, but the overall trend is clear.

Maybe Ray Kurzweil thinks it’s an exponential

and I think it’s linear.

In some cases, it’s close to zero right now,

but it’s all gonna happen.

I mean, we are gonna get to human level intelligence

or whatever you want, artificial general intelligence

at some point, and that’s certainly gonna change

our place in the food chain,

because a lot of the tedious things that we do now,

we’re gonna have machines do,

and a lot of the dangerous things that we do now,

we’re gonna have machines do.

I think our whole lives are gonna change

from people finding their meaning through their work

to people finding their meaning

through creative expression.

So the singularity will be a very gradual,

in fact, removing the meaning of the word singularity.

It’ll be a very gradual transformation in your view.

I think that it’ll be somewhere in between,

and I guess it depends what you mean by gradual and sudden.

I don’t think it’s gonna be one day.

I think it’s important to realize

that intelligence is a multidimensional variable.

So people sort of write this stuff

as if IQ was one number, and the day that you hit 262

or whatever, you displace the human beings.

And really, there’s lots of facets to intelligence.

So there’s verbal intelligence,

and there’s motor intelligence,

and there’s mathematical intelligence and so forth.

Machines, in their mathematical intelligence,

far exceed most people already.

In their ability to play games,

they far exceed most people already.

In their ability to understand language,

they lag behind my five year old,

far behind my five year old.

So there are some facets of intelligence

that machines have grasped, and some that they haven’t,

and we have a lot of work left to do

to get them to, say, understand natural language,

or to understand how to flexibly approach

some kind of novel MacGyver problem solving

kind of situation.

And I don’t know that all of these things will come at once.

I think there are certain vital prerequisites

that we’re missing now.

So for example, machines don’t really have common sense now.

So they don’t understand that bottles contain water,

and that people drink water to quench their thirst,

and that they don’t wanna dehydrate.

They don’t know these basic facts about human beings,

and I think that that’s a rate limiting step

for many things.

It’s a rate limiting step for reading, for example,

because stories depend on things like,

oh my God, that person’s running out of water.

That’s why they did this thing.

Or if they only had water, they could put out the fire.

So you watch a movie, and your knowledge

about how things work matters.

And so a computer can’t understand that movie

if it doesn’t have that background knowledge.

Same thing if you read a book.

And so there are lots of places where,

if we had a good machine interpretable set of common sense,

many things would accelerate relatively quickly,

but I don’t think even that is a single point.

There’s many different aspects of knowledge.

And we might, for example, find that we make a lot

of progress on physical reasoning,

getting machines to understand, for example,

how keys fit into locks, or that kind of stuff,

or how this gadget here works, and so forth and so on.

And so machines might do that long before they do

really good psychological reasoning,

because it’s easier to get kind of labeled data

or to do direct experimentation on a microphone stand

than it is to do direct experimentation on human beings

to understand the levers that guide them.

That’s a really interesting point, actually,

whether it’s easier to gain common sense knowledge

or psychological knowledge.

I would say the common sense knowledge

includes both physical knowledge and psychological knowledge.

And the argument I was making.

Well, you said physical versus psychological.

Yeah, physical versus psychological.

And the argument I was making is physical knowledge

might be more accessible, because you could have a robot,

for example, lift a bottle, try putting a bottle cap on it,

see that it falls off if it does this,

and see that it could turn it upside down,

and so the robot could do some experimentation.

We do some of our psychological reasoning

by looking at our own minds.

So I can sort of guess how you might react to something

based on how I think I would react to it.

And robots don’t have that intuition,

and they also can’t do experiments on people

in the same way or we’ll probably shut them down.

So if we wanted to have robots figure out

how I respond to pain by pinching me in different ways,

like that’s probably, it’s not gonna make it

past the human subjects board

and companies are gonna get sued or whatever.

So there’s certain kinds of practical experience

that are limited or off limits to robots.

That’s a really interesting point.

What is more difficult to gain a grounding in?

Because to play devil’s advocate,

I would say that human behavior is easier expressed

in data and digital form.

And so when you look at Facebook algorithms,

they get to observe human behavior.

So you get to study and manipulate even a human behavior

in a way that you perhaps cannot study

or manipulate the physical world.

So it’s true what you said, pain is like physical pain,

but that’s again, the physical world.

Emotional pain might be much easier to experiment with,

perhaps unethical, but nevertheless,

some would argue it’s already going on.

I think that you’re right, for example,

that Facebook does a lot of experimentation

in psychological reasoning.

In fact, Zuckerberg talked about AI

at a talk that he gave in NIPS.

I wasn’t there, but the conference

has been renamed NeurIPS,

but it used to be called NIPS when he gave the talk.

And he talked about Facebook basically

having a gigantic theory of mind.

So I think it is certainly possible.

I mean, Facebook does some of that.

I think they have a really good idea

of how to addict people to things.

They understand what draws people back to things.

I think they exploit it in ways

that I’m not very comfortable with.

But even so, I think that there are only some slices

of human experience that they can access

through the kind of interface they have.

And of course, they’re doing all kinds of VR stuff,

and maybe that’ll change and they’ll expand their data.

And I’m sure that that’s part of their goal.

So it is an interesting question.

I think love, fear, insecurity,

all of the things that,

I would say some of the deepest things

about human nature and the human mind

could be explored through digital form.

It’s that you’re actually the first person

just now that brought up,

I wonder what is more difficult.

Because I think folks who are the slow,

and we’ll talk a lot about deep learning,

but the people who are thinking beyond deep learning

are thinking about the physical world.

You’re starting to think about robotics

in the home robotics.

How do we make robots manipulate objects,

which requires an understanding of the physical world

and then requires common sense reasoning.

And that has felt to be like the next step

for common sense reasoning,

but you’ve now brought up the idea

that there’s also the emotional part.

And it’s interesting whether that’s hard or easy.

I think some parts of it are and some aren’t.

So my company that I recently founded with Rod Brooks,

from MIT for many years and so forth,

we’re interested in both.

We’re interested in physical reasoning

and psychological reasoning, among many other things.

And there are pieces of each of these that are accessible.

So if you want a robot to figure out

whether it can fit under a table,

that’s a relatively accessible piece of physical reasoning.

If you know the height of the table

and you know the height of the robot, it’s not that hard.

If you wanted to do physical reasoning about Jenga,

it gets a little bit more complicated

and you have to have higher resolution data

in order to do it.

With psychological reasoning,

it’s not that hard to know, for example,

that people have goals and they like to act on those goals,

but it’s really hard to know exactly what those goals are.

But ideas of frustration.

I mean, you could argue it’s extremely difficult

to understand the sources of human frustration

as they’re playing Jenga with you, or not.

You could argue that it’s very accessible.

There’s some things that are gonna be obvious

and some not.

So I don’t think anybody really can do this well yet,

but I think it’s not inconceivable

to imagine machines in the not so distant future

being able to understand that if people lose in a game,

that they don’t like that.

That’s not such a hard thing to program

and it’s pretty consistent across people.

Most people don’t enjoy losing

and so that makes it relatively easy to code.

On the other hand, if you wanted to capture everything

about frustration, well, people can get frustrated

for a lot of different reasons.

They might get sexually frustrated,

they might get frustrated,

they can’t get their promotion at work,

all kinds of different things.

And the more you expand the scope,

the harder it is for anything like the existing techniques

to really do that.

So I’m talking to Garry Kasparov next week

and he seemed pretty frustrated

with his game against Deep Blue, so.

Yeah, well, I’m frustrated with my game

against him last year,

because I played him, I had two excuses,

I’ll give you my excuses up front,

but it won’t mitigate the outcome.

I was jet lagged and I hadn’t played in 25 or 30 years,

but the outcome is he completely destroyed me

and it wasn’t even close.

Have you ever been beaten in any board game by a machine?

I have, I actually played the predecessor to Deep Blue.

Deep Thought, I believe it was called,

and that too crushed me.

And that was, and after that you realize it’s over for us.

Well, there’s no point in my playing Deep Blue.

I mean, it’s a waste of Deep Blue’s computation.

I mean, I played Kasparov

because we both gave lectures at the same event

and he was playing 30 people.

I forgot to mention that.

Not only did he crush me,

but he crushed 29 other people at the same time.

I mean, but the actual philosophical and emotional experience

of being beaten by a machine, I imagine is a,

I mean, to you who thinks about these things

may be a profound experience.

Or no, it was a simple mathematical experience.

Yeah, I think a game like chess particularly

where you have perfect information,

it’s two player closed end

and there’s more computation for the computer,

it’s no surprise the machine wins.

I mean, I’m not sad when a computer,

I’m not sad when a computer calculates

a cube root faster than me.

Like, I know I can’t win that game.

I’m not gonna try.

Well, with a system like AlphaGo or AlphaZero,

do you see a little bit more magic in a system like that

even though it’s simply playing a board game?

But because there’s a strong learning component?

You know, it’s funny you should mention that

in the context of this conversation

because Kasparov and I are working on an article

that’s gonna be called AI is not magic.

And, you know, neither one of us thinks that it’s magic.

And part of the point of this article

is that AI is actually a grab bag of different techniques

and some of them have,

or they each have their own unique strengths and weaknesses.

So, you know, you read media accounts

and it’s like, ooh, AI, it must be magical

or it can solve any problem.

Well, no, some problems are really accessible

like chess and Go, and other problems like reading

are completely outside the current technology.

And it’s not like you can take the technology

that drives AlphaGo and apply it to reading

and get anywhere.

You know, DeepMind has tried that a bit.

They have all kinds of resources.

You know, they built AlphaGo and they have,

you know, I wrote a piece recently that they lost

and you can argue about the word lost,

but they spent $530 million more than they made last year.

So, you know, they’re making huge investments.

They have a large budget

and they have applied the same kinds of techniques

to reading or to language.

It’s just much less productive there

because it’s a fundamentally different kind of problem.

Chess and Go and so forth are closed end problems.

The rules haven’t changed in 2,500 years.

There’s only so many moves you can make.

You can talk about the exponential

as you look at the combinations of moves,

but fundamentally, you know, the Go board has 361 squares.

That’s it.

That’s the only, you know, those intersections

are the only places that you can place your stone.

Whereas when you’re reading,

the next sentence could be anything.

You know, it’s completely up to the writer

what they’re gonna do next.

That’s fascinating that you think this way.

You’re clearly a brilliant mind

who points out the emperor has no clothes,

but so I’ll play the role of a person who says.

You’re gonna put clothes on the emperor?

Good luck with it.

It romanticizes the notion of the emperor, period,

suggesting that clothes don’t even matter.

Okay, so that’s really interesting

that you’re talking about language.

So there’s the physical world

of being able to move about the world,

making an omelet and coffee and so on.

There’s language where you first understand

what’s being written and then maybe even more complicated

than that, having a natural dialogue.

And then there’s the game of Go and chess.

I would argue that language is much closer to Go

than it is to the physical world.

Like it is still very constrained.

When you say the possibility of the number of sentences

that could come, it is huge,

but it nevertheless is much more constrained.

It feels maybe I’m wrong than the possibilities

that the physical world brings us.

There’s something to what you say

and some ways in which I disagree.

So one interesting thing about language

is that it abstracts away.

This bottle, I don’t know if it would be in the field of view

is on this table and I use the word on here

and I can use the word on here, maybe not here,

but that one word encompasses in analog space

sort of infinite number of possibilities.

So there is a way in which language filters down

the variation of the world and there’s other ways.

So we have a grammar and more or less

you have to follow the rules of that grammar.

You can break them a little bit,

but by and large we follow the rules of grammar

and so that’s a constraint on language.

So there are ways in which language is a constrained system.

On the other hand, there are many arguments

that say there’s an infinite number of possible sentences

and you can establish that by just stacking them up.

So I think there’s water on the table,

you think that I think there’s water on the table,

your mother thinks that you think that I think

that water’s on the table, your brother thinks

that maybe your mom is wrong to think

that you think that I think, right?

So we can make sentences of infinite length

or we can stack up adjectives.

This is a very silly example, a very, very silly example,

a very, very, very, very, very, very silly example

and so forth.

So there are good arguments

that there’s an infinite range of sentences.

In any case, it’s vast by any reasonable measure

and for example, almost anything in the physical world

we can talk about in the language world

and interestingly, many of the sentences that we understand,

we can only understand if we have a very rich model

of the physical world.

So I don’t ultimately want to adjudicate the debate

that I think you just set up, but I find it interesting.

Maybe the physical world is even more complicated

than language, I think that’s fair, but.

Language is really, really complicated.

It’s really, really hard.

Well, it’s really, really hard for machines,

for linguists, people trying to understand it.

It’s not that hard for children

and that’s part of what’s driven my whole career.

I was a student of Steven Pinker’s

and we were trying to figure out

why kids could learn language when machines couldn’t.

I think we’re gonna get into language,

we’re gonna get into communication intelligence

and neural networks and so on,

but let me return to the high level,

the futuristic for a brief moment.

So you’ve written in your book, in your new book,

it would be arrogant to suppose that we could forecast

where AI will be or the impact it will have

in a thousand years or even 500 years.

So let me ask you to be arrogant.

What do AI systems with or without physical bodies

look like 100 years from now?

If you would just, you can’t predict,

but if you were to philosophize and imagine, do.

Can I first justify the arrogance

before you try to push me beyond it?


I mean, there are examples like,

people figured out how electricity worked,

they had no idea that that was gonna lead to cell phones.

I mean, things can move awfully fast

once new technologies are perfected.

Even when they made transistors,

they weren’t really thinking that cell phones

would lead to social networking.

There are nevertheless predictions of the future,

which are statistically unlikely to come to be,

but nevertheless is the best.

You’re asking me to be wrong.

Asking you to be statistically.

In which way would I like to be wrong?

Pick the least unlikely to be wrong thing,

even though it’s most very likely to be wrong.

I mean, here’s some things

that we can safely predict, I suppose.

We can predict that AI will be faster than it is now.

It will be cheaper than it is now.

It will be better in the sense of being more general

and applicable in more places.

It will be pervasive.

I mean, these are easy predictions.

I’m sort of modeling them in my head

on Jeff Bezos’s famous predictions.

He says, I can’t predict the future,

not in every way, I’m paraphrasing.

But I can predict that people

will never wanna pay more money for their stuff.

They’re never gonna want it to take longer to get there.

So you can’t predict everything,

but you can predict something.

Sure, of course it’s gonna be faster and better.

But what we can’t really predict

is the full scope of where AI will be in a certain period.

I mean, I think it’s safe to say that,

although I’m very skeptical about current AI,

that it’s possible to do much better.

You know, there’s no in-principle argument

that says AI is an insolvable problem,

that there’s magic inside our brains

that will never be captured.

I mean, I’ve heard people make those kind of arguments.

I don’t think they’re very good.

So AI’s gonna come, and probably 500 years

is plenty to get there.

And then once it’s here, it really will change everything.

So when you say AI’s gonna come,

are you talking about human level intelligence?

So maybe I…

I like the term general intelligence.

So I don’t think that the ultimate AI,

if there is such a thing, is gonna look just like humans.

I think it’s gonna do some things

that humans do better than current machines,

like reason flexibly.

And understand language and so forth.

But it doesn’t mean they have to be identical to humans.

So for example, humans have terrible memory,

and they suffer from what some people

call motivated reasoning.

So they like arguments that seem to support them,

and they dismiss arguments that they don’t like.

There’s no reason that a machine should ever do that.

So you see that those limitations of memory

as a bug, not a feature.


I’ll say two things about that.

One is I was on a panel with Danny Kahneman,

the Nobel Prize winner, last night,

and we were talking about this stuff.

And I think what we converged on

is that humans are a low bar to exceed.

They may be outside of our skill right now,

but as AI programmers, but eventually AI will exceed it.

So we’re not talking about human level AI.

We’re talking about general intelligence

that can do all kinds of different things

and do it without some of the flaws that human beings have.

The other thing I’ll say is I wrote a whole book,

actually, about the flaws of humans.

It’s actually a nice bookend to the,

or counterpoint to the current book.

So I wrote a book called Kluge,

which was about the limits of the human mind.

The current book is kind of about those few things

that humans do a lot better than machines.

Do you think it’s possible that the flaws

of the human mind, the limits of memory,

our mortality, our bias,

is a strength, not a weakness,

that that is the thing that enables,

from which motivation springs and meaning springs or not?

I’ve heard a lot of arguments like this.

I’ve never found them that convincing.

I think that there’s a lot of making lemonade out of lemons.

So we, for example, do a lot of free association

where one idea just leads to the next

and they’re not really that well connected.

And we enjoy that and we make poetry out of it

and we make kind of movies with free associations

and it’s fun and whatever.

I don’t think that’s really a virtue of the system.

I think that the limitations in human reasoning

actually get us in a lot of trouble.

Like, for example, politically we can’t see eye to eye

because we have the motivated reasoning I was talking

about and something related called confirmation bias.

So we have all of these problems that actually make

for a rougher society because we can’t get along

because we can’t interpret the data in shared ways.

And then we do some nice stuff with that.

So my free associations are different from yours

and you’re kind of amused by them and that’s great.

And hence poetry.

So there are lots of ways in which we take

a lousy situation and make it good.

Another example would be our memories are terrible.

So we play games like Concentration where you flip over

two cards, try to find a pair.

Can you imagine a computer playing that?

Computer’s like, this is the dullest game in the world.

I know where all the cards are, I see it once,

I know where it is, what are you even talking about?

So we make a fun game out of having this terrible memory.

So we are imperfect in discovering and optimizing

some kind of utility function.

But you think in general, there is a utility function.

There’s an objective function that’s better than others.

I didn’t say that.

But see, the presumption, when you say…

I think you could design a better memory system.

You could argue about utility functions

and how you wanna think about that.

But objectively, it would be really nice

to do some of the following things.

To get rid of memories that are no longer useful.

Objectively, that would just be good.

And we’re not that good at it.

So when you park in the same lot every day,

you confuse where you parked today

with where you parked yesterday

with where you parked the day before and so forth.

So you blur together a series of memories.

There’s just no way that that’s optimal.

I mean, I’ve heard all kinds of wacky arguments

of people trying to defend that.

But in the end of the day,

I don’t think any of them hold water.

It’s just a bug.

Or memories of traumatic events would be possibly

a very nice feature to have to get rid of those.

It’d be great if you could just be like,

I’m gonna wipe this sector.

I’m done with that.

I didn’t have fun last night.

I don’t wanna think about it anymore.

Whoop, bye bye.

I’m gone.

But we can’t.

Do you think it’s possible to build a system…

So you said human level intelligence is a weird concept, but…

Well, I’m saying I prefer general intelligence.

General intelligence.

I mean, human level intelligence is a real thing.

And you could try to make a machine

that matches people or something like that.

I’m saying that per se shouldn’t be the objective,

but rather that we should learn from humans

the things they do well and incorporate that into our AI,

just as we incorporate the things that machines do well

that people do terribly.

So, I mean, it’s great that AI systems

can do all this brute force computation that people can’t.

And one of the reasons I work on this stuff

is because I would like to see machines solve problems

that people can’t, that combine the strength,

or that in order to be solved would combine

the strengths of machines to do all this computation

with the ability, let’s say, of people to read.

So I’d like machines that can read

the entire medical literature in a day.

7,000 new papers, or whatever the number is,

come out every day.

There’s no way for any doctor or whatever to read them all.

A machine that could read would be a brilliant thing.

And that would be strengths of brute force computation

combined with kind of subtlety and understanding medicine

that a good doctor or scientist has.

So if we can linger a little bit

on the idea of general intelligence.

So Yann LeCun believes that human intelligence

isn’t general at all, it’s very narrow.

How do you think?

I don’t think that makes sense.

We have lots of narrow intelligences for specific problems.

But the fact is, like, anybody can walk into,

let’s say, a Hollywood movie,

and reason about the content

of almost anything that goes on there.

So you can reason about what happens in a bank robbery,

or what happens when someone is infertile

and wants to go to IVF to try to have a child,

or you can, the list is essentially endless.

And not everybody understands every scene in the movie,

but there’s a huge range of things

that pretty much any ordinary adult can understand.

His argument is, is that actually,

the set of things seems large for us humans

because we’re very limited in considering

the kind of possibilities of experiences that are possible.

But in fact, the amount of experience that are possible

is infinitely larger.

Well, I mean, if you wanna make an argument

that humans are constrained in what they can understand,

I have no issue with that.

I think that’s right.

But it’s still not the same thing at all

as saying, here’s a system that can play Go.

It’s been trained on five million games.

And then I say, can it play on a rectangular board

rather than a square board?

And you say, well, if I retrain it from scratch

on another five million games, it can.

That’s really, really narrow, and that’s where we are.

We don’t have even a system that could play Go

and then without further retraining,

play on a rectangular board,

which any human could do with very little problem.

So that’s what I mean by narrow.

And so it’s just wordplay to say.

That is semantics, yeah.

Then it’s just words.

Then yeah, you mean general in a sense

that you can do all kinds of Go board shapes flexibly.

Well, that would be like a first step

in the right direction,

but obviously that’s not what I really mean.

You’re kidding.

What I mean by general is that you could transfer

the knowledge you learn in one domain to another.

So if you learn about bank robberies in movies

and there’s chase scenes,

then you can understand that amazing scene in Breaking Bad

when Walter White has a car chase scene

with only one person.

He’s the only one in it.

And you can reflect on how that car chase scene

is like all the other car chase scenes you’ve ever seen

and totally different and why that’s cool.

And the fact that the number of domains

you can do that with is finite

doesn’t make it less general.

So the idea of general is you could just do it

on a lot of, or transfer it across a lot of domains.

Yeah, I mean, I’m not saying humans are infinitely general

or that humans are perfect.

I just said a minute ago, it’s a low bar,

but it’s just, it’s a low bar.

But right now, like the bar is here and we’re there

and eventually we’ll get way past it.

So speaking of low bars,

you’ve highlighted in your new book as well,

but a couple of years ago wrote a paper

titled Deep Learning, A Critical Appraisal

that lists 10 challenges faced

by current deep learning systems.

So let me summarize them as data efficiency,

transfer learning, hierarchical knowledge,

open ended inference, explainability,

integrating prior knowledge, causal reasoning,

modeling on a stable world, robustness, adversarial examples

and so on.

And then my favorite probably is reliability

in the engineering of real world systems.

So whatever people can read the paper,

they should definitely read the paper,

should definitely read your book.

But which of these challenges, if solved, in your view

has the biggest impact on the AI community?

It’s a very good question.

And I’m gonna be evasive because I think that

they go together a lot.

So some of them might be solved independently of others,

but I think a good solution to AI

starts by having real,

what I would call cognitive models of what’s going on.

So right now we have an approach that’s dominant

where you take statistical approximations of things,

but you don’t really understand them.

So you know that bottles are correlated in your data

with bottle caps,

but you don’t understand that there’s a thread

on the bottle cap that fits with the thread on the bottle

and then that’s what tightens it.

If I tighten enough that there’s a seal

and the water won’t come out.

Like there’s no machine that understands that.

And having a good cognitive model

of that kind of everyday phenomena

is what we call common sense.

And if you had that,

then a lot of these other things start to fall

into at least a little bit better place.

Right now you’re like learning correlations between pixels

when you play a video game or something like that.

And it doesn’t work very well.

It works when the video game is just the way

that you studied it and then you alter the video game

in small ways,

like you move the paddle in Breakout a few pixels

and the system falls apart.

Because it doesn’t understand,

it doesn’t have a representation of a paddle,

a ball, a wall, a set of bricks and so forth.

And so it’s reasoning at the wrong level.

So the idea of common sense,

it’s full of mystery,

you’ve worked on it,

but it’s nevertheless full of mystery,

full of promise.

What does common sense mean?

What does knowledge mean?

So the way you’ve been discussing it now

is very intuitive.

It makes a lot of sense that that is something

we should have and that’s something

deep learning systems don’t have.

But the argument could be that we’re oversimplifying it

because we’re oversimplifying the notion of common sense

because that’s how it feels like we as humans

at the cognitive level approach problems.

So maybe.

A lot of people aren’t actually gonna read my book.

But if they did read the book,

one of the things that might come as a surprise to them

is that we actually say common sense is really hard

and really complicated.

So they would probably,

my critics know that I like common sense,

but that chapter actually starts by us beating up

not on deep learning,

but kind of on our own home team, as it were.

So Ernie and I are first and foremost

people that believe in at least some

of what good old fashioned AI tried to do.

So we believe in symbols and logic and programming.

Things like that are important.

And we go through why even those tools

that we hold fairly dear aren’t really enough.

So we talk about why common sense is actually many things.

And some of them fit really well with those

classical sets of tools.

So things like taxonomy.

So I know that a bottle is an object

or it’s a vessel, let’s say.

And I know a vessel is an object

and objects are material things in the physical world.

So I can make some inferences.

If I know that vessels need to not have holes in them,

then I can infer that in order to carry their contents,

then I can infer that a bottle

shouldn’t have a hole in it in order to carry its contents.

So you can do hierarchical inference and so forth.
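
The hierarchical inference described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionaries IS_A and PROPERTIES, the property names, and the helper functions are hypothetical and are not taken from the book or the paper.

```python
# Hypothetical toy taxonomy: each concept points to its parent category.
IS_A = {"bottle": "vessel", "vessel": "object"}

# Hypothetical properties attached directly to a category.
PROPERTIES = {
    "vessel": {"must_not_have_holes"},
    "object": {"is_material"},
}


def ancestors(concept):
    """Return the chain of parent categories, nearest first."""
    chain = []
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def inferred_properties(concept):
    """Collect properties of the concept and of everything it is a kind of."""
    found = set(PROPERTIES.get(concept, set()))
    for parent in ancestors(concept):
        found |= PROPERTIES.get(parent, set())
    return found


print(ancestors("bottle"))
print(sorted(inferred_properties("bottle")))
```

Nothing about holes is stored for the bottle itself; the constraint is inherited from the vessel category. That is the narrow kind of inference taxonomies handle well.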

And we say that’s great,

but it’s only a tiny piece of what you need for common sense.

We give lots of examples that don’t fit into that.

So another one that we talk about is a cheese grater.

You’ve got holes in a cheese grater.

You’ve got a handle on top.

You can build a model in the game engine sense of a model

so that you could have a little cartoon character

flying around through the holes of the grater.

But we don’t have a system yet.

Taxonomy doesn’t help us that much

that really understands why the handle is on top

and what you do with the handle,

or why all of those circles are sharp,

or how you’d hold the cheese with respect to the grater

in order to make it actually work.

Do you think these ideas are just abstractions

that could emerge on a system

like a very large deep neural network?

I’m a skeptic that that kind of emergence per se can work.

So I think that deep learning might play a role

in the systems that do what I want systems to do,

but it won’t do it by itself.

I’ve never seen a deep learning system

really extract an abstract concept.

I think there are principled reasons for that

stemming from how back propagation works,

how the architectures are set up.

One example is deep learning people

actually all build in something called convolution,

which Yann LeCun is famous for, which is an abstraction.

They don’t have their systems learn this.

So the abstraction is an object that looks the same

if it appears in different places.

And what LeCun figured out, and why,

essentially why he was a co winner of the Turing Award

was that if you programmed this in innately,

then your system would be a whole lot more efficient.
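
A minimal sketch of that built-in abstraction, assuming a one-dimensional signal and a single hand-set filter (real convolutional networks learn their filter values and work on images). The names correlate1d and place are hypothetical helpers for this illustration.

```python
def correlate1d(signal, kernel):
    """Slide one shared kernel across the signal (valid positions only)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def place(pattern, offset, length=10):
    """Put the pattern into an otherwise empty signal at the given offset."""
    signal = [0] * length
    signal[offset:offset + len(pattern)] = pattern
    return signal


pattern = [1, 2, 1]
kernel = [1, 2, 1]  # the same weights are reused at every position

left = correlate1d(place(pattern, 1), kernel)
right = correlate1d(place(pattern, 6), kernel)

# The response has the same strength wherever the pattern appears;
# only the location of the peak moves with the pattern.
peak_left = left.index(max(left))
peak_right = right.index(max(right))
print(peak_left, max(left))
print(peak_right, max(right))
```

Because the weights are shared across positions by construction, the detector does not have to relearn the pattern for each location. That is the part that is programmed in rather than learned.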

In principle, this should be learnable,

but people don’t have systems that kind of reify things

and make them more abstract.

And so what you’d really wind up with

if you don’t program that in advance is a system

that kind of realizes that this is the same thing as this,

but then I take your little clock there

and I move it over and it doesn’t realize

that the same thing applies to the clock.

So the really nice thing, you’re right,

that convolution is just one of the things

that’s like, it’s an innate feature

that’s programmed by the human expert.

We need more of those, not less.

Yes, but the nice feature is it feels like

that requires coming up with that brilliant idea,

can get you a Turing Award,

but it requires less effort than encoding

and something we’ll talk about, the expert system.

So encoding a lot of knowledge by hand.

So it feels like there’s a huge amount of limitations

which you clearly outline with deep learning,

but the nice feature of deep learning,

whatever it is able to accomplish,

it does a lot of stuff automatically

without human intervention.

Well, and that’s part of why people love it, right?

But I always think of this quote from Bertrand Russell,

which is it has all the advantages

of theft over honest toil.

It’s really hard to program into a machine

a notion of causality or even how a bottle works

or what containers are.

Ernie Davis and I wrote a, I don’t know,

45 page academic paper trying just to understand

what a container is,

which I don’t think anybody ever read the paper,

but it’s a very detailed analysis of all the things,

well, not even all of it,

some of the things you need to do

in order to understand a container.

It would be a whole lot nicer,

and I’m a coauthor on the paper,

I made it a little bit better,

but Ernie did the hard work for that particular paper.

And it took him like three months

to get the logical statements correct.

And maybe that’s not the right way to do it,

it’s a way to do it.

But on that way of doing it,

it’s really hard work to do something

as simple as understanding containers.

And nobody wants to do that hard work,

even Ernie didn’t want to do that hard work.

Everybody would rather just like feed their system in

with a bunch of videos with a bunch of containers

and have the systems infer how containers work.

It would be like so much less effort,

let the machine do the work.

And so I understand the impulse,

I understand why people want to do that.

I just don’t think that it works.

I’ve never seen anybody build a system

that in a robust way can actually watch videos

and predict exactly which containers would leak

and which ones wouldn’t or something like,

and I know someone’s gonna go out and do that

since I said it, and I look forward to seeing it.

But getting these things to work robustly

is really, really hard.

So Yann LeCun, who was my colleague at NYU for many years,

thinks that the hard work should go into defining

an unsupervised learning algorithm

that will watch videos, use the next frame basically

in order to tell it what’s going on.

And he thinks that’s the royal road

and he’s willing to put in the work

in devising that algorithm.

Then he wants the machine to do the rest.

And again, I understand the impulse.

My intuition, based on years of watching this stuff

and making predictions 20 years ago that still hold

even though there’s a lot more computation and so forth,

is that we actually have to do

a different kind of hard work,

which is more like building a design specification

for what we want the system to do,

doing hard engineering work to figure out

how we do things like what Yann did for convolution

in order to figure out how to encode complex knowledge

into the systems.

The current systems don’t have that much knowledge

other than convolution, which is again,

this idea of objects being in different places

and having the same perception, I guess I’ll say.

Same appearance.

People don’t want to do that work.

They don’t see how to naturally fit one with the other.

I think that’s, yes, absolutely.

But also on the expert system side,

there’s a temptation to go too far the other way.

So we’re just having an expert sort of sit down

and encode the description,

the framework for what a container is,

and then having the system reason the rest.

From my view, one really exciting possibility

is of active learning where it’s continuous interaction

between a human and machine.

As the machine, there’s kind of deep learning type

extraction of information from data patterns and so on,

but humans also guiding the learning procedures,

guiding both the process and the framework

of how the machine learns, whatever the task is.

I was with you with almost everything you said

except the phrase deep learning.

What I think you really want there

is a new form of machine learning.

So let’s remember, deep learning is a particular way

of doing machine learning.

Most often it’s done with supervised data

for perceptual categories.

There are other things you can do with deep learning,

some of them quite technical,

but the standard use of deep learning

is I have a lot of examples and I have labels for them.

So here are pictures.

This one’s the Eiffel Tower.

This one’s the Sears Tower.

This one’s the Empire State Building.

This one’s a cat.

This one’s a pig and so forth.

You just get millions of examples, millions of labels,

and deep learning is extremely good at that.

It’s better than any other solution that anybody has devised,

but it is not good at representing abstract knowledge.

It’s not good at representing things

like bottles contain liquid and have tops to them

and so forth.

It’s not very good at learning

or representing that kind of knowledge.

It is an example of having a machine learn something,

but it’s a machine that learns a particular kind of thing,

which is object classification.

It’s not a particularly good algorithm for learning

about the abstractions that govern our world.

There may be such a thing.

Part of what we counsel in the book

is maybe people should be working on devising such things.

So one possibility, just I wonder what you think about it,

is that deep neural networks do form abstractions,

but they’re not accessible to us humans

in terms of we can’t.

There’s some truth in that.

So is it possible that either current or future

neural networks form very high level abstractions,

which are as powerful as our human abstractions

of common sense.

We just can’t get a hold of them.

And so the problem is essentially

we need to make them explainable.

This is an astute question,

but I think the answer is at least partly no.

One of the kinds of classical neural network architecture

is what we call an auto associator.

It just tries to take an input,

goes through a set of hidden layers,

and comes out with an output.

And it’s supposed to learn essentially

the identity function,

that your input is the same as your output.

So you think of it as binary numbers.

You’ve got the one, the two, the four, the eight,

the 16, and so forth.

And so if you want to input 24,

you turn on the 16, you turn on the eight.

It’s like binary one, one, and a bunch of zeros.

So I did some experiments in 1998

with the precursors of contemporary deep learning.

And what I showed was you could train these networks

on all the even numbers,

and they would never generalize to the odd numbers.

A lot of people thought that I was, I don’t know,

an idiot or faking the experiment,

or it wasn’t true or whatever.

But it is true that with this class of networks

that we had in that day,

that they would never ever make this generalization.

And it’s not that the networks were stupid,

it’s that they see the world in a different way than we do.

They were basically concerned,

what is the probability that the rightmost output node

is going to be one?

And as far as they were concerned,

in everything they’d ever been trained on, it was a zero.

That node had never been turned on,

and so they figured, why turn it on now?
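
The effect can be reproduced in a heavily simplified form. The sketch below is not the 1998 setup: it assumes a single layer of logistic units with no hidden layers, four bits, zero initialization, and plain batch gradient descent, and every name in it is hypothetical. It only illustrates why an output node that was never on in training stays off.

```python
import math

N_BITS = 4  # numbers 0..15, least significant bit first


def to_bits(n):
    return [(n >> i) & 1 for i in range(N_BITS)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(weights, biases, x):
    """One logistic output unit per bit, each seeing all input bits."""
    return [
        sigmoid(sum(w * xi for w, xi in zip(weights[j], x)) + biases[j])
        for j in range(N_BITS)
    ]


def train(examples, epochs=2000, lr=1.0):
    """Batch gradient descent on cross-entropy; the target is the input itself."""
    weights = [[0.0] * N_BITS for _ in range(N_BITS)]
    biases = [0.0] * N_BITS
    for _ in range(epochs):
        grad_w = [[0.0] * N_BITS for _ in range(N_BITS)]
        grad_b = [0.0] * N_BITS
        for x in examples:
            out = forward(weights, biases, x)
            for j in range(N_BITS):
                err = out[j] - x[j]  # identity mapping: target equals input
                grad_b[j] += err
                for i in range(N_BITS):
                    grad_w[j][i] += err * x[i]
        for j in range(N_BITS):
            biases[j] -= lr * grad_b[j] / len(examples)
            for i in range(N_BITS):
                weights[j][i] -= lr * grad_w[j][i] / len(examples)
    return weights, biases


def predict(model, n):
    weights, biases = model
    out = forward(weights, biases, to_bits(n))
    return from_bits([1 if o > 0.5 else 0 for o in out])


# Train on even numbers only, so the lowest bit is 0 in every example.
evens = [to_bits(n) for n in range(0, 2 ** N_BITS, 2)]
model = train(evens)

print([predict(model, n) for n in (2, 6, 14)])  # even inputs
print([predict(model, n) for n in (3, 7, 15)])  # odd inputs
```

Under these assumptions the even inputs are reproduced, while an odd input such as 7 comes back as 6. The lowest output unit only ever saw a target of zero, and the weights leaving the lowest input bit never received any gradient.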

Whereas a person would look at the same problem and say,

well, it’s obvious,

we’re just doing the thing that corresponds.

The Latin for it is mutatis mutandis,

we’ll change what needs to be changed.

And we do this, this is what algebra is.

So I can do f of x equals y plus two,

and I can do it for a couple of values,

I can tell you if y is three,

then x is five, and if y is four, x is six.

And now I can do it with some totally different number,

like a million, then you can say,

well, obviously it’s a million and two,

because you have an algebraic operation

that you’re applying to a variable.
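
The contrast between memorized pairs and an operation over a variable can be made concrete with a toy sketch. The names memorized and rule are hypothetical, and the lookup table is only a stand-in for purely correlational learning, not a model of any real network.

```python
# A table that has only seen the two worked examples from the conversation.
memorized = {3: 5, 4: 6}


def rule(y):
    """An operation over a variable: it applies to any value bound to y."""
    return y + 2


print(memorized.get(1_000_000))  # None: the table has nothing to say
print(rule(1_000_000))           # 1000002
```

The rule extends to a million for free because it is stated over the variable rather than over the particular values that happened to be observed.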

And deep learning systems kind of emulate that,

but they don’t actually do it.

The particular example,

you could fudge a solution to that particular problem.

The general form of that problem remains,

that what they learn is really correlations

between different input and output nodes.

And they’re complex correlations

with multiple nodes involved and so forth.

Ultimately, they’re correlative,

they’re not structured over these operations over variables.

Now, someday, people may do a new form of deep learning

that incorporates that stuff,

and I think it will help a lot.

And there’s some tentative work on things

like differentiable programming right now

that fall into that category.

But the sort of classic stuff

like people use for ImageNet doesn’t have it.

And you have people like Hinton going around saying,

symbol manipulation, like what Marcus,

what I advocate is like the gasoline engine.

It’s obsolete.

We should just use this cool electric power

that we’ve got with the deep learning.

And that’s really destructive,

because we really do need to have the gasoline engine stuff

that represents, I mean, I don’t think it’s a good analogy,

but we really do need to have the stuff

that represents symbols.

Yeah, and Hinton as well would say

that we do need to throw out everything and start over.

Hinton said that to Axios,

and I had a friend who interviewed him

and tried to pin him down

on what exactly we need to throw out,

and he was very evasive.

Well, of course, because we can’t, if he knew.

Then he’d throw it out himself.

But I mean, you can’t have it both ways.

You can’t be like, I don’t know what to throw out,

but I am gonna throw out the symbols.

I mean, and not just the symbols,

but the variables and the operations over variables.

Don’t forget, the operations over variables,

the stuff that I’m endorsing

and which John McCarthy did when he founded AI,

that stuff is the stuff

that we build most computers out of.

There are people now who say,

we don’t need computer programmers anymore.

Not quite looking at the statistics

of how much computer programmers

actually get paid right now.

We need lots of computer programmers,

and most of them, they do a little bit of machine learning,

but they still do a lot of code, right?

Code where it’s like, if the value of X

is greater than the value of Y,

then do this kind of thing,

like conditionals and comparing operations over variables.

Like, there’s this fantasy you can machine learn anything.

There’s some things you would never wanna machine learn.

I would not use a phone operating system

that was machine learned.

Like, you made a bunch of phone calls

and you recorded which packets were transmitted

and you just machine learned it, it’d be insane.

Or to build a web browser by taking logs of keystrokes

and images, screenshots,

and then trying to learn the relation between them.

Nobody would ever,

no rational person would ever try to build a browser

that made, they would use symbol manipulation,

the stuff that I think AI needs to avail itself of

in addition to deep learning.

Can you describe your view of symbol manipulation

in its early days?

Can you describe expert systems

and where do you think they hit a wall

or a set of challenges?

Sure, so I mean, first I just wanna clarify,

I’m not endorsing expert systems per se.

You’ve been kind of contrasting them.

There is a contrast,

but that’s not the thing that I’m endorsing.

So expert systems tried to capture things

like medical knowledge with a large set of rules.

So if the patient has this symptom and this other symptom,

then it is likely that they have this disease.

So there are logical rules

and they were symbol manipulating rules

of just the sort that I’m talking about.

And the problem.

They encode a set of knowledge that the experts then put in.

And very explicitly so.

So you’d have somebody interview an expert

and then try to turn that stuff into rules.
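
A minimal sketch of that style of system, assuming simple if-then rules applied by forward chaining. The rule contents are placeholders with no medical meaning, and RULES and forward_chain are hypothetical names; historical expert systems were far larger and often attached certainty factors to their rules.

```python
# Each rule: if all the conditions are known facts, add the conclusion.
RULES = [
    ({"symptom_a", "symptom_b"}, "disease_x"),
    ({"disease_x", "symptom_c"}, "complication_y"),
]


def forward_chain(facts, rules):
    """Apply the rules repeatedly until no new conclusion can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived


print(sorted(forward_chain({"symptom_a", "symptom_b", "symptom_c"}, RULES)))
print(sorted(forward_chain({"symptom_a"}, RULES)))
```

Every rule here is written by hand and nothing is learned from data, which is the property contrasted with machine learning in the surrounding discussion.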

And at some level I’m arguing for rules.

But the difference is that what those guys did in the 80s

was almost entirely rules,

almost entirely handwritten with no machine learning.

What a lot of people are doing now

is almost entirely one species of machine learning

with no rules.

And what I’m counseling is actually a hybrid.

I’m saying that both of these things have their advantage.

So if you’re talking about perceptual classification,

how do I recognize a bottle?

Deep learning is the best tool we’ve got right now.

If you’re talking about making inferences

about what a bottle does,

something closer to the expert systems

is probably still the best available alternative.

And probably we want something that is better able

to handle quantitative and statistical information

than those classical systems typically were.

So we need new technologies

that are gonna draw on some of the strengths

of both the expert systems and the deep learning,

but are gonna find new ways to synthesize them.

How hard do you think it is to add knowledge at the low level?

So mine human intellects to add extra information

to symbol manipulating systems?

In some domains it’s not that hard,

but it’s often really hard.

Partly because a lot of the things that are important,

people wouldn’t bother to tell you.

So if you pay someone on Amazon Mechanical Turk

to tell you stuff about bottles,

they probably won’t even bother to tell you

some of the basic level stuff

that’s just so obvious to a human being

and yet so hard to capture in machines.

They’re gonna tell you more exotic things,

and they’re all well and good,

but they’re not getting to the root of the problem.

So untutored humans aren’t very good at knowing,

and why should they be,

what kind of knowledge the computer system developers

actually need?

I don’t think that that’s an irremediable problem.

I think it’s historically been a problem.

People have had crowdsourcing efforts,

and they don’t work that well.

There’s one at MIT, we’re recording this at MIT,

called Virtual Home, where,

and we talk about this in the book,

you can find the exact example there,

but people were asked to do things

like describe an exercise routine.

And the things that the people describe

are at a very low level

and don’t really capture what’s going on.

So they’re like, go to the room

with the television and the weights,

turn on the television,

press the remote to turn on the television,

lift weight, put weight down, whatever.

It’s like very micro level,

and it’s not telling you

what an exercise routine is really about,

which is like, I wanna fit a certain number of exercises

in a certain time period,

I wanna emphasize these muscles.

You want some kind of abstract description.

The fact that you happen to press the remote control

in this room when you watch this television

isn’t really the essence of the exercise routine.

But if you just ask people like, what did they do?

Then they give you this fine grain.

And so it takes a level of expertise

about how the AI works

in order to craft the right kind of knowledge.

So there’s this ocean of knowledge that we all operate on.

Some of it may not even be conscious,

or at least we’re not able to communicate it effectively.

Yeah, most of it we would recognize if somebody said it,

if it was true or not,

but we wouldn’t think to say that it’s true or not.

That’s a really interesting mathematical property.

This ocean has the property

that every piece of knowledge in it,

we will recognize it as true if we’re told,

but we’re unlikely to retrieve it in the reverse.

So that interesting property,

I would say there’s a huge ocean of that knowledge.

What’s your intuition?

Is it accessible to AI systems somehow?

Can we?

So you said this.

I mean, most of it is not,

well, I’ll give you an asterisk on this in a second,

but most of it has not ever been encoded

in machine interpretable form.

And so, I mean, if you say accessible,

there’s two meanings of that.

One is like, could you build it into a machine?


The other is like, is there some database

that we could go download and stick into our machine?

But the first thing, could we?

What’s your intuition?

I think we could.

I think it hasn’t been done right.

You know, the closest, and this is the asterisk,

is the CYC system, which tried to do this.

A lot of logicians worked for Doug Lenat

for 30 years on this project.

I think they stuck too closely to logic,

didn’t represent enough about probabilities,

tried to hand code it.

There are various issues,

and it hasn’t been that successful.

That is the closest existing system

to trying to encode this.

Why do you think there’s not more excitement

slash money behind this idea currently?

There was.

People view that project as a failure.

I think that they confuse the failure

of a specific instance that was conceived 30 years ago

for the failure of an approach,

which they don’t do for deep learning.

So in 2010, people had the same attitude

towards deep learning.

They’re like, this stuff doesn’t really work.

And all these other algorithms work better and so forth.

And then certain key technical advances were made,

but mostly it was the advent

of graphics processing units that changed that.

It wasn’t even anything foundational in the techniques.

And there was some new tricks,

but mostly it was just more compute and more data,

things like ImageNet that didn’t exist before

that allowed deep learning to work.

And it could be,

it could be that CYC just needs a few more things

or something like CYC,

but the widespread view is that that just doesn’t work.

And people are reasoning from a single example.

They don’t do that with deep learning.

They don’t say nothing that existed in 2010,

and there were many, many efforts in deep learning

was really worth anything.

I mean, really, there’s no model from 2010

in deep learning or the predecessors of deep learning

that has any commercial value whatsoever at this point.

They’re all failures.

But that doesn’t mean that there wasn’t anything there.

I have a friend, I was getting to know him,

and he said, I had a company too,

I was talking about I had a new company.

He said, I had a company too, and it failed.

And I said, well, what did you do?

And he said, deep learning.

And the problem was he did it in 1986

or something like that.

And we didn’t have the tools then, or 1990,

we didn’t have the tools then, not the algorithms.

His algorithms weren’t that different from modern algorithms,

but he didn’t have the GPUs to run it fast enough.

He didn’t have the data.

And so it failed.

It could be that symbol manipulation per se

with modern amounts of data and compute

and maybe some advance in compute

for that kind of compute might be great.

My perspective on it is not that we want to resuscitate

that stuff per se, but we want to borrow lessons from it,

bring together with other things that we’ve learned.

And it might have an ImageNet moment

where it would spark the world’s imagination

and there’ll be an explosion of symbol manipulation efforts.

Yeah, I think that people at AI2,

Paul Allen’s AI Institute, are trying to build data sets.

Well, they’re not doing it

for quite the reason that you say,

but they’re trying to build data sets

that at least spark interest in common sense reasoning.

To create benchmarks.

Benchmarks for common sense.

That’s a large part of what the AI2.org

is working on right now.

So speaking of compute,

Rich Sutton wrote a blog post titled The Bitter Lesson.

I don’t know if you’ve read it,

but he said that the biggest lesson that can be read

from so many years of AI research

is that general methods that leverage computation

are ultimately the most effective.

Do you think that?

The most effective at what?

Right, so they have been most effective

for perceptual classification problems

and for some reinforcement learning problems.

And he works on reinforcement learning.

Well, no, let me push back on that.

You’re actually absolutely right.

But I would also say they have been most effective generally

because everything we’ve done up to…

Would you argue against that?

Is, to me, deep learning is the first thing

that has been successful at anything in AI.

And you’re pointing out that this success

is very limited in focus,

but has there been something truly successful

before deep learning?

Sure, I mean, I want to make a larger point,

but on the narrower point, classical AI is used,

for example, in doing navigation instructions.

It’s very successful.

Everybody on the planet uses it now,

like multiple times a day.

That’s a measure of success, right?

So I don’t think classical AI was wildly successful,

but there are cases like that.

They’re just used all the time.

Nobody even notices them because they’re so pervasive.

So there are some successes for classical AI.

I think deep learning has been more successful,

but my usual line about this, and I didn’t invent it,

but I like it a lot,

is just because you can build a better ladder

doesn’t mean you can build a ladder to the moon.

So the bitter lesson is if you have

a perceptual classification problem,

throwing a lot of data at it is better than anything else.

But that has not given us any material progress

in natural language understanding,

common sense reasoning,

like a robot would need to navigate a home.

Problems like that, there’s no actual progress there.

So flip side of that, if we remove data from the picture,

another bitter lesson is that you just have

a very simple algorithm,

and you wait for compute to scale.

It doesn’t have to be learning.

It doesn’t have to be deep learning.

It doesn’t have to be data driven,

but just wait for the compute.

So my question for you,

do you think compute can unlock some of the things

with either deep learning or symbol manipulation that?

Sure, but I’ll put a proviso on that.

I think more compute’s always better.

Nobody’s gonna argue with more compute.

It’s like having more money.

I mean, there’s the data.

There’s diminishing returns on more money.

Exactly, there’s diminishing returns on more money,

but nobody’s gonna argue

if you wanna give them more money, right?

Except maybe the people who signed the giving pledge,

and some of them have a problem.

They’ve promised to give away more money

than they’re able to.

But the rest of us, if you wanna give me more money, fine.

I’m saying more money, more problems, but okay.

That’s true too.

What I would say to you is your brain uses like 20 watts,

and it does a lot of things that deep learning doesn’t do,

or that symbol manipulation doesn’t do,

that AI just hasn’t figured out how to do.

So it’s an existence proof

that you don’t need server resources

that are Google scale in order to have an intelligence.

I built, with a lot of help from my wife,

two intelligences that are 20 watts each,

and far exceed anything that anybody else

has built out of silicon.

Speaking of those two robots,

what have you learned about AI from having?

Well, they’re not robots, but.

Sorry, intelligent agents.

Those two intelligent agents.

I’ve learned a lot by watching my two intelligent agents.

I think that what’s fundamentally interesting,

well, one of the many things

that’s fundamentally interesting about them

is the way that they set their own problems to solve.

So my two kids are a year and a half apart.

They’re both five and six and a half.

They play together all the time,

and they’re constantly creating new challenges.

That’s what they do, is they make up games,

and they’re like, well, what if this, or what if that,

or what if I had this superpower,

or what if you could walk through this wall?

So they’re doing these what if scenarios all the time,

and that’s how they learn something about the world

and grow their minds, and machines don’t really do that.

So that’s interesting, and you’ve talked about this,

you’ve written about it, you’ve thought about it,

nature versus nurture.

So what innate knowledge do you think we’re born with,

and what do we learn along the way

in those early months and years?

Can I just say how much I like that question?

You phrased it just right, and almost nobody ever does,

which is what is the innate knowledge

and what’s learned along the way?

So many people dichotomize it,

and they think it’s nature versus nurture,

when it obviously has to be nature and nurture.

They have to work together.

You can’t learn this stuff along the way

unless you have some innate stuff,

but just because you have the innate stuff

doesn’t mean you don’t learn anything.

And so many people get that wrong, including in the field.

People think if I work in machine learning,

the learning side, I must not be allowed to work

on the innate side, or that will be cheating.

Exactly, people have said that to me,

and it’s just absurd, so thank you.

But you could break that apart more.

I’ve talked to folks who studied

the development of the brain,

and the growth of the brain in the first few days

in the first few months in the womb,

all of that, is that innate?

So that process of development from a stem cell

to the growth of the central nervous system and so on,

to the information that’s encoded

through the long arc of evolution.

So all of that comes into play, and it’s unclear.

It’s not just whether it’s a dichotomy or not.

It’s where most, or where the knowledge is encoded.

So what’s your intuition about the innate knowledge,

the power of it, what’s contained in it,

what can we learn from it?

One of my earlier books was actually trying

to understand the biology of this.

The book was called The Birth of the Mind.

Like how is it that genes even build innate knowledge?

And from the perspective of the conversation

we’re having today, there’s actually two questions.

One is what innate knowledge or mechanisms,

or what have you, people or other animals

might be endowed with.

I always like showing this video

of a baby ibex climbing down a mountain.

That baby ibex, a few hours after its birth,

knows how to climb down a mountain.

That means that it knows, not consciously,

something about its own body and physics

and 3D geometry and all of this kind of stuff.

So there’s one question about what does biology

give its creatures and what has evolved in our brains?

How is that represented in our brains?

That’s the question I thought about in the book

The Birth of the Mind.

And then there’s a question of what AI should have.

And they don’t have to be the same.

But I would say that it’s a pretty interesting

set of things that we are equipped with

that allows us to do a lot of interesting things.

So I would argue or guess, based on my reading

of the developmental psychology literature,

which I’ve also participated in,

that children are born with a notion of space,

time, other agents, places,

and also this kind of mental algebra

that I was describing before.

And certainly causation, if I didn’t just say that.

So at least those kinds of things.

They’re like frameworks for learning the other things.

Are they disjoint in your view

or is it just somehow all connected?

You’ve talked a lot about language.

Is it all kind of connected in some mesh

that’s language like?

Of understanding concepts all together, or?

I don’t think we know for people how they’re represented

and machines just don’t really do this yet.

So I think it’s an interesting open question

both for science and for engineering.

Some of it has to be at least interrelated

in the way that the interfaces of a software package

have to be able to talk to one another.

So the systems that represent space and time

can’t be totally disjoint because a lot of the things

that we reason about are the relations

between space and time and cause.

So I put this on and I have expectations

about what’s gonna happen with the bottle cap

on top of the bottle and those span space and time.

If the cap is over here, I get a different outcome.

If the timing is different, if I put this here,

after I move that, then I get a different outcome.

That relates to causality.

So obviously these mechanisms, whatever they are,

can certainly communicate with each other.

So I think evolution had a significant role

to play in the development of this whole kluge, right?

How efficient do you think is evolution?

Oh, it’s terribly inefficient except that.

Okay, well, can we do better?

Well, I’ll come to that in a sec.

It’s inefficient except that.

Once it gets a good idea, it runs with it.

So it took, I guess, a billion years,

if I went roughly a billion years, to evolve

to a vertebrate brain plan.

And once that vertebrate brain plan evolved,

it spread everywhere.

So fish have it and dogs have it and we have it.

We have adaptations of it and specializations of it,

but, and the same thing with a primate brain plan.

So monkeys have it and apes have it and we have it.

So there are additional innovations like color vision

and those spread really rapidly.

So it takes evolution a long time to get a good idea,

but, and I’m being anthropomorphic and not literal here,

but once it has that idea, so to speak,

which cashes out into one set of genes or in the genome,

those genes spread very rapidly

and they’re like subroutines or libraries,

I guess the word people might use nowadays

or be more familiar with.

They’re libraries that get used over and over again.

So once you have the library for building something

with multiple digits, you can use it for a hand,

but you can also use it for a foot.

You just kind of reuse the library

with slightly different parameters.

Evolution does a lot of that,

which means that the speed over time picks up.

So evolution can happen faster

because you have bigger and bigger libraries.

And what I think has happened in attempts

at evolutionary computation is that people start

with libraries that are very, very minimal,

like almost nothing, and then progress is slow

and it’s hard for someone to get a good PhD thesis

out of it and they give up.

If we had richer libraries to begin with,

if you were evolving from systems

that had a rich innate structure to begin with,

then things might speed up.

Or more PhD students. If the evolutionary process

indeed, in a meta way, runs away with good ideas,

you need to have a lot of ideas,

pool of ideas in order for it to discover one

that you can run away with.

And PhD students representing individual ideas as well.

Yeah, I mean, you could throw

a billion PhD students at it.

Yeah, the monkeys at typewriters with Shakespeare, yep.

Well, I mean, those aren’t cumulative, right?

That’s just random.

And part of the point that I’m making

is that evolution is cumulative.

So if you have a billion monkeys independently,

you don’t really get anywhere.

But if you have a billion monkeys,

and I think Dawkins made this point originally,

or probably other people, Dawkins made it very nicely

in either The Selfish Gene or The Blind Watchmaker.

If there is some sort of fitness function

that can drive you towards something,

I guess that’s Dawkins’s point.

And my point, which is a variation on that,

is that if the evolution is cumulative,

I mean, the related points,

then you can start going faster.
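The contrast between independent random typing and cumulative selection can be sketched in a few lines of Python. This is a minimal illustration in the spirit of the "weasel" program Dawkins describes in The Blind Watchmaker, not his original code. The function names, mutation rate, brood size, and attempt counts are all assumptions chosen for the demo.

```python
import random
import string

ALPHABET = string.ascii_uppercase + " "
TARGET = "METHINKS IT IS LIKE A WEASEL"  # phrase used in Dawkins's illustration


def fitness(candidate: str) -> int:
    """Count positions that already match the target."""
    return sum(a == b for a, b in zip(candidate, TARGET))


def mutate(parent: str, rate: float, rng: random.Random) -> str:
    """Copy the parent, re-rolling each character with probability `rate`."""
    return "".join(
        rng.choice(ALPHABET) if rng.random() < rate else ch for ch in parent
    )


def cumulative_selection(rng, offspring=100, rate=0.05, max_generations=10_000):
    """Keep the best string each generation; return the generations needed."""
    current = "".join(rng.choice(ALPHABET) for _ in TARGET)
    for generation in range(max_generations):
        if current == TARGET:
            return generation
        brood = [mutate(current, rate, rng) for _ in range(offspring)]
        # The parent stays in the pool, so fitness never decreases.
        current = max(brood + [current], key=fitness)
    return None  # did not converge within the (assumed) generation budget


def independent_monkeys(rng, attempts=100_000):
    """Type fresh random strings with no memory; return the best match count."""
    best = 0
    for _ in range(attempts):
        guess = "".join(rng.choice(ALPHABET) for _ in TARGET)
        best = max(best, fitness(guess))
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    print("generations with cumulative selection:", cumulative_selection(rng))
    print("best of the independent monkeys:", independent_monkeys(rng), "of", len(TARGET))
```

The only difference between the two functions is whether each attempt builds on the previous best. That difference is the whole point: the independent version keeps starting from nothing, while the cumulative version keeps what already works.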

Do you think something like the process of evolution

is required to build intelligent systems?

So if we…

Not logically.

So all the stuff that evolution did,

a good engineer might be able to do.

So for example, evolution made quadrupeds,

which distribute the load across a horizontal surface.

A good engineer could come up with that idea.

I mean, sometimes good engineers come up with ideas

by looking at biology.

There’s lots of ways to get your ideas.

Part of what I’m suggesting

is we should look at biology a lot more.

We should look at the biology of thought and understanding

and the biology by which creatures intuitively reason

about physics or other agents,

or like how do dogs reason about people?

Like they’re actually pretty good at it.

If we could understand, at my college we joked dognition,

if we could understand dognition well,

and how it was implemented, that might help us with our AI.

So do you think it’s possible

that the kind of timescale that evolution took

is the kind of timescale that will be needed

to build intelligent systems?

Or can we significantly accelerate that process

inside a computer?

I mean, I think the way that we accelerate that process

is we borrow from biology, not slavishly,

but I think we look at how biology has solved problems

and we say, does that inspire

any engineering solutions here?

Try to mimic biological systems

and then therefore have a shortcut.

Yeah, I mean, there’s a field called biomimicry

and people do that for like material science all the time.

We should be doing the analog of that for AI

and the analog for that for AI

is to look at cognitive science or the cognitive sciences,

which is psychology, maybe neuroscience, linguistics,

and so forth, look to those for insight.

What do you think is a good test of intelligence

in your view?

So I don’t think there’s one good test.

In fact, I tried to organize a movement

towards something called a Turing Olympics

and my hope is that Francois is actually gonna take,

Francois Chollet is gonna take over this.

I think he’s interested and I don’t,

I just don’t have place in my busy life at this moment,

but the notion is that there’d be many tests

and not just one because intelligence is multifaceted.

There can’t really be a single measure of it

because it isn’t a single thing.

Like just the crudest level,

the SAT has a verbal component and a math component

because they’re not identical.

And Howard Gardner has talked about multiple intelligences

like kinesthetic intelligence

and verbal intelligence and so forth.

There are a lot of things that go into intelligence

and people can get good at one or the other.

I mean, in some sense, like every expert has developed

a very specific kind of intelligence

and then there are people that are generalists

and I think of myself as a generalist

with respect to cognitive science,

which doesn’t mean I know anything about quantum mechanics,

but I know a lot about the different facets of the mind.

And there’s a kind of intelligence

to thinking about intelligence.

I like to think that I have some of that,

but social intelligence, I’m just okay.

There are people that are much better at that than I am.

Sure, but what would be really impressive to you?

I think the idea of a Turing Olympics is really interesting

especially if somebody like Francois is running it,

but to you in general, not as a benchmark,

but if you saw an AI system being able to accomplish

something that would impress the heck out of you,

what would that thing be?

Would it be natural language conversation?

For me personally, I would like to see

a kind of comprehension that relates to what you just said.

So I wrote a piece in the New Yorker in I think 2015

right after Eugene Goostman, which was a software package,

won a version of the Turing test.

And the way that it did this is,

well, the way you win the Turing test,

so-called win it, where the Turing test is you fool a person

into thinking that a machine is a person,

is you’re evasive, you pretend to have limitations

so you don’t have to answer certain questions and so forth.

So this particular system pretended to be a 13 year old boy

from Odessa who didn’t understand English

and was kind of sarcastic

and wouldn’t answer your questions and so forth.

And so judges got fooled into thinking briefly

with a very little exposure, it was a 13 year old boy,

and it ducked all the questions

Turing was actually interested in,

which is like how do you make the machine

actually intelligent?

So that test itself is not that good.

And so in the New Yorker, I proposed an alternative, I guess,

and the one that I proposed there

was a comprehension test.

And I must like Breaking Bad

because I’ve already given you one Breaking Bad example

and in that article, I have one as well,

which was something like if Walter,

you should be able to watch an episode of Breaking Bad

or maybe you have to watch the whole series

to be able to answer the question and say,

if Walter White took a hit out on Jesse,

why did he do that?

So if you could answer kind of arbitrary questions

about characters motivations, I would be really impressed

with that, if you built software to do that.

It could watch a film or there are different versions.

And so ultimately, I wrote this up with Praveen Paritosh

in a special issue of AI Magazine

that basically was about the Turing Olympics.

There were like 14 tests proposed.

The one that I was pushing was a comprehension challenge

and Praveen who’s at Google was trying to figure out

like how we would actually run it

and so we wrote a paper together.

And you could have a text version too

or you could have an auditory podcast version,

you could have a written version.

But the point is that you win at this test

if you can do, let’s say human level or better than humans

at answering kind of arbitrary questions.

Why did this person pick up the stone?

What were they thinking when they picked up the stone?

Were they trying to knock down glass?

And I mean, ideally these wouldn’t be multiple choice either

because multiple choice is pretty easily gamed.

So if you could have relatively open ended questions

and you can answer why people are doing this stuff,

I would be very impressed.

And of course, humans can do this, right?

If you watch a well constructed movie

and somebody picks up a rock,

everybody watching the movie

knows why they picked up the rock, right?

They all know, oh my gosh,

he’s gonna hit this character or whatever.

We have an example in the book about

when a whole bunch of people say, I am Spartacus,

you know, this famous scene.

The viewers understand,

first of all, that everybody or everybody minus one

has to be lying.

They can’t all be Spartacus.

We have enough common sense knowledge

to know they couldn’t all have the same name.

We know that they’re lying

and we can infer why they’re lying, right?

They’re lying to protect someone

and to protect things they believe in.

You get a machine that can do that,

that can say, this is why these guys all got up

and said, I am Spartacus.

I will sit down and say, AI has really achieved a lot.

Thank you.

Without cheating any part of the system.

Yeah, I mean, if you do it,

there are lots of ways you could cheat.

You could build a Spartacus machine

that works on that film.

That’s not what I’m talking about.

I’m talking about, you can do this

with essentially arbitrary films

or from a large set.

Even beyond films

because it’s possible such a system would discover

that the number of narrative arcs in film

is limited to 1930.

Well, there’s a famous thing

about the classic seven plots or whatever.

I don’t care.

If you wanna build in the system,

boy meets girl, boy loses girl, boy finds girl.

That’s fine.

I don’t mind having some head starts on it.

Innate knowledge.

Okay, good.

I mean, you could build it in innately

or you could have your system watch a lot of films again.

If you can do this at all,

but with a wide range of films,

not just one film in one genre.

But even if you could do it for all Westerns,

I’d be reasonably impressed.


So in terms of being impressed,

just for the fun of it,

because you’ve put so many interesting ideas out there

in your book,

challenging the community for further steps.

Is it possible on the deep learning front

that you’re wrong about its limitations?

That deep learning will unlock,

Yann LeCun next year will publish a paper

that achieves this comprehension.

So do you think that way often as a scientist?

Do you consider that your intuition

that deep learning could actually run away with it?

I’m more worried about rebranding

as a kind of political thing.

So, I mean, what’s gonna happen, I think,

is that deep learning is gonna start

to encompass symbol manipulation.

So I think Hinton’s just wrong.

Hinton says we don’t want hybrids.

I think people will work towards hybrids

and they will relabel their hybrids as deep learning.

We’ve already seen some of that.

So AlphaGo is often described as a deep learning system,

but it’s more correctly described as a system

that has deep learning, but also Monte Carlo tree search,

which is a classical AI technique.

And people will start to blur the lines

in the way that IBM blurred Watson.

First, Watson meant this particular system,

and then it was just anything that IBM built

in their cognitive division.

But purely, let me ask, for sure,

that’s a branding question and that’s like a giant mess.

I mean, purely, a single neural network

being able to accomplish reasonable comprehension.

I don’t stay up at night worrying

that that’s gonna happen.

And I’ll just give you two examples.

One is a guy at DeepMind thought he had finally outfoxed me.

@Zergylord, I think, is his Twitter handle.

And he said, he specifically made an example.

Marcus said that such and such.

He fed it into GPT2, which is the AI system

that is so smart that OpenAI couldn’t release it

because it would destroy the world, right?

You remember that a few months ago.

So he feeds it into GPT2, and my example

was something like a rose is a rose,

a tulip is a tulip, a lily is a blank.

And he got it to actually do that,

which was a little bit impressive.

And I wrote back and I said, that’s impressive,

but can I ask you a few questions?

I said, was that just one example?

Can it do it generally?

And can it do it with novel words,

which was part of what I was talking about in 1998

when I first raised the example.

So a dax is a dax, right?

And he sheepishly wrote back about 20 minutes later.

And the answer was, well, it had some problems with those.

So I made some predictions 21 years ago that still hold.

In the world of computer science, that’s amazing, right?

Because there’s a thousand or a million times more memory

and computation, you can do

a million times more operations per second

spread across a cluster.

And there’s been advances in replacing sigmoids

with other functions and so forth.

There’s all kinds of advances,

but the fundamental architecture hasn’t changed

and the fundamental limit hasn’t changed.

And what I said then is kind of still true.
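The "a rose is a rose, a dax is a dax" probe can be written down as a small test harness. The sketch below is an illustration of the idea only, not the test that was actually run on GPT2. The `Model` interface (prompt in, one word out) is hypothetical, and a real language model would need a wrapper to fit it. The two baselines are assumptions added for contrast: one applies the identity rule to any word, the other only recalls words it has stored.

```python
from typing import Callable, Dict, List

Model = Callable[[str], str]  # hypothetical interface: prompt in, one word out


def make_prompt(word: str) -> str:
    """Two worked examples of the 'an X is an X' pattern, then a blank."""
    return f"a rose is a rose, a tulip is a tulip, a {word} is a"


def accuracy(model: Model, words: List[str]) -> float:
    """Fraction of words for which the model completes 'a W is a' with W."""
    hits = sum(model(make_prompt(w)).strip() == w for w in words)
    return hits / len(words)


def rule_model(prompt: str) -> str:
    """Symbolic baseline: bind a variable to the last noun and copy it."""
    # The final clause is 'a <word> is a', so the noun is third from the end.
    return prompt.split()[-3]


def make_lookup_model(seen: Dict[str, str]) -> Model:
    """Memorizing baseline: answers only for words it has stored."""
    def model(prompt: str) -> str:
        return seen.get(prompt.split()[-3], "")
    return model


if __name__ == "__main__":
    familiar = ["lily", "daisy"]
    novel = ["dax", "blicket"]  # made-up words the memorizer has never stored
    lookup = make_lookup_model({w: w for w in familiar})
    for name, model in [("rule", rule_model), ("lookup", lookup)]:
        print(name, accuracy(model, familiar), accuracy(model, novel))
```

Scoring familiar and novel words separately is the design choice that matters here: a system can look perfect on words it has seen and still fail to generalize the rule to a word like dax.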

Then here’s a second example.

I recently had a piece in Wired

that’s adapted from the book.

And the book went to press before GPT2 came out,

but we described this children’s story

and all the inferences that you make in this story

about a boy finding a lost wallet.

And for fun, in the Wired piece, we ran it through GPT2,

using something called talktotransformer.com,

and your viewers can try this experiment themselves.

Go to the Wired piece that has the link

and it has the story.

And the system made perfectly fluent text

that was totally inconsistent

with the conceptual underpinnings of the story, right?

This is what, again, I predicted in 1998.

And for that matter, Chomsky and Miller

made the same prediction in 1963.

I was just updating their claim for a slightly new text.

So those particular architectures

that don’t have any built in knowledge,

they’re basically just a bunch of layers

doing correlational stuff.

They’re not gonna solve these problems.

So 20 years ago, you said the emperor has no clothes.

Today, the emperor still has no clothes.

The lighting’s better though.

The lighting is better.

And I think you yourself are also, I mean.

And we found out some things to do with naked emperors.

I mean, it’s not like stuff is worthless.

I mean, they’re not really naked.

It’s more like they’re in their briefs

than everybody thinks they are.

And so like, I mean, they are great at speech recognition,

but the problems that I said were hard.

I didn’t literally say the emperor has no clothes.

I said, this is a set of problems

that humans are really good at.

And it wasn’t couched as AI.

It was couched as cognitive science.

But I said, if you wanna build a neural model

of how humans do a certain class of things,

you’re gonna have to change the architecture.

And I stand by those claims.

So, and I think people should understand

you’re quite entertaining in your cynicism,

but you’re also very optimistic and a dreamer

about the future of AI too.

So you’re both, it’s just.

There’s a famous saying about being,

people overselling technology in the short run

and underselling it in the long run.

And so I actually end the book,

Ernie Davis and I end our book with an optimistic chapter,

which kind of killed Ernie

because he’s even more pessimistic than I am.

He describes me as a contrarian and him as a pessimist.

But I persuaded him that we should end the book

with a look at what would happen

if AI really did incorporate, for example,

the common sense reasoning and the nativism

and so forth, the things that we counseled for.

And we wrote it and it’s an optimistic chapter

that AI suitably reconstructed so that we could trust it,

which we can’t now, could really be world changing.

So on that point, if you look at the future trajectories

of AI, people have worries about negative effects of AI,

whether it’s at the large existential scale

or smaller short term scale of negative impact on society.

So you write about trustworthy AI,

how can we build AI systems that align with our values,

that make for a better world,

that we can interact with, that we can trust?

The first thing we have to do

is to replace deep learning with deep understanding.

So you can’t have alignment with a system

that traffics only in correlations

and doesn’t understand concepts like bottles or harm.

So Asimov talked about these famous laws

and the first one was first do no harm.

And you can quibble about the details of Asimov’s laws,

but we have to, if we’re gonna build real robots

in the real world, have something like that.

That means we have to program in a notion

that’s at least something like harm.

That means we have to have these more abstract ideas

that deep learning is not particularly good at.

They have to be in the mix somewhere.

And you could do statistical analysis

about probabilities of given harms or whatever,

but you have to know what a harm is

in the same way that you have to understand

that a bottle isn’t just a collection of pixels.

And also be able to, you’re implying

that you need to also be able to communicate

that to humans so the AI systems would be able

to prove to humans that they understand

that they know what harm means.

I might run it in the reverse direction,

but roughly speaking, I agree with you.

So we probably need to have committees

of wise people, ethicists and so forth.

Think about what these rules ought to be

and we shouldn’t just leave it to software engineers.

It shouldn’t just be software engineers

and it shouldn’t just be people

who own large mega corporations

that are good at technology. Ethicists

and so forth should be involved.

But there should be some assembly of wise people

as I was putting it that tries to figure out

what the rules ought to be.

And those have to get translated into code.

You can argue whether it’s code or neural networks or something.

They have to be translated into something

that machines can work with.

And that means there has to be a way

of working the translation.

And right now we don’t.

We don’t have a way.

So let’s say you and I were the committee

and we decide that Asimov’s first law is actually right.

And let’s say it’s not just two white guys,

which would be kind of unfortunate, and that we have a broad,

representative sample of the world

or however we wanna do this.

And the committee decides eventually,

okay, Asimov’s first law is actually pretty good.

There are these exceptions to it.

We wanna program in these exceptions.

But let’s start with just the first one

and then we’ll get to the exceptions.

First one is first do no harm.

Well, somebody has to now actually turn that into

a computer program or a neural network or something.

And one way of taking the whole book,

the whole argument that I’m making

is that we just don’t know how to do that yet.

And we’re fooling ourselves

if we think that we can build trustworthy AI

if we can’t even specify in any kind of,

we can’t do it in Python and we can’t do it in TensorFlow.

We’re fooling ourselves in thinking

that we can make trustworthy AI

if we can’t translate harm into something

that we can execute.

And if we can’t, then we should be thinking really hard

how could we ever do such a thing?

Because if we’re gonna use AI

in the ways that we wanna use it,

to make job interviews or to do surveillance,

not that I personally wanna do that or whatever.

I mean, if we’re gonna use AI

in ways that have practical impact on people’s lives

or medicine, it’s gotta be able

to understand stuff like that.

So one of the things your book highlights

is that a lot of people in the deep learning community,

but also the general public, politicians,

just people in all general groups and walks of life

have different levels of misunderstanding of AI.

So when you talk about committees,

what’s your advice to our society?

How do we grow, how do we learn about AI

such that such committees could emerge

where large groups of people could have

a productive discourse about

how to build successful AI systems?

Part of the reason we wrote the book

was to try to inform those committees.

So part of the reason we wrote the book

was to inspire a future generation of students

to solve what we think are the important problems.

So a lot of the book is trying to pinpoint

what we think are the hard problems

where we think effort would most be rewarded.

And part of it is to try to train people

who talk about AI, but aren’t experts in the field

to understand what’s realistic and what’s not.

One of my favorite parts in the book

is the six questions you should ask

anytime you read a media account.

So like number one is if somebody talks about something,

look for the demo.

If there’s no demo, don’t believe it.

Like the demo that you can try.

If you can’t try it at home,

maybe it doesn’t really work that well yet.

So if, we don’t have this example in the book,

but if Sundar Pichai says we have this thing

that allows it to sound like human beings in conversation,

you should ask, can I try it?

And you should ask how general it is.

And it turns out at that time,

I’m alluding to Google Duplex when it was announced,

it only worked on calling hairdressers,

restaurants and finding opening hours.

That’s not very general, that’s narrow AI.

And I’m not gonna ask your thoughts about Sophia,

but yeah, I understand that’s a really good question

to ask of any kind of hyped-up idea.

Sophia has very good material written for her,

but she doesn’t understand the things that she’s saying.

So a while ago you’ve written a book

on the science of learning, which I think is fascinating,

through the learning case study of playing guitar.

That’s called Guitar Zero.

I love guitar myself, I’ve been playing my whole life.

So let me ask a very important question.

What is your favorite song, rock song,

to listen to or try to play?

Well, those would be different,

but I’ll say that my favorite rock song to listen to

is probably All Along the Watchtower,

The Jimi Hendrix version.

It feels magic to me.

I’ve actually recently learned it, I love that song.

I’ve been trying to put it on YouTube, myself singing.

Singing is the scary part.

If you could party with a rock star for a weekend,

living or dead, who would you choose?

And pick their mind, it’s not necessarily about the partying.

Thanks for the clarification.

I guess John Lennon’s such an intriguing person,

and I think a troubled person, but an intriguing one.


Well, Imagine is one of my favorite songs.

Also one of my favorite songs.

That’s a beautiful way to end it.

Gary, thank you so much for talking to me.

Thanks so much for having me.
