Lex Fridman Podcast - #222 - Jay McClelland: Neural Networks and the Emergence of Cognition

The following is a conversation with Jay McClelland,

a cognitive scientist at Stanford

and one of the seminal figures

in the history of artificial intelligence

and specifically neural networks.

He wrote the Parallel Distributed Processing book

with David Rumelhart,

who coauthored the backpropagation paper with Geoff Hinton.

In their collaborations, they’ve paved the way

for many of the ideas

at the center of the neural network based

machine learning revolution of the past 15 years.

To support this podcast,

please check out our sponsors in the description.

This is the Lex Fridman podcast

and here is my conversation with Jay McClelland.

You are one of the seminal figures

in the history of neural networks.

At the intersection of cognitive psychology

and computer science,

what to you has over the decades emerged

as the most beautiful aspect about neural networks?

Both artificial and biological.

The fundamental thing I think about with neural networks

is how they allow us to link

biology with the mysteries of thought.

When I was first entering the field myself

in the late 60s, early 70s,

cognitive psychology had just become a field.

There was a book published in 67 called Cognitive Psychology.

And the author said that the study of the nervous system

was only of peripheral interest.

It wasn’t going to tell us anything about the mind.

And I didn’t agree with that.

I always felt, oh, look, I’m a physical being.

From dust to dust, you know,

ashes to ashes, and somehow I emerged from that.

So that’s really interesting.

So there was a sense with cognitive psychology

that in understanding the neuronal structure of things,

you’re not going to be able to understand the mind.

And then your sense is if we study these neural networks,

we might be able to get at least very close

to understanding the fundamentals of the human mind.

Yeah.

I used to think, or I used to talk about the idea

of awakening from the Cartesian dream.

So Descartes, you know, thought about these things, right?

He was walking in the gardens of Versailles one day,

and he stepped on a stone.

And a statue moved.

And he walked a little further,

he stepped on another stone, and another statue moved.

And he was like, why did the statue move

when I stepped on the stone?

And he went and talked to the gardeners,

and he found out that they had a hydraulic system

that allowed the physical contact with the stone

to cause water to flow in various directions,

which caused water to flow into the statue

and move the statue.

And he used this as the beginnings of a theory

about how animals act.

And he had this notion that these little fibers

that people had identified that weren’t carrying the blood,

you know, were these little hydraulic tubes

that if you touch something, there would be pressure,

and it would send a signal of pressure

to the other parts of the system,

and that would cause action.

So he had a mechanistic theory of animal behavior.

And he thought that the human had this animal body,

but that some divine something else

had to have come down and been placed in him

to give him the ability to think, right?

So the physical world includes the body in action,

but it doesn’t include thought according to Descartes, right?

And so the study of physiology at that time

was the study of sensory systems and motor systems

and things that you could directly measure

when you stimulated neurons and stuff like that.

And the study of cognition was something that, you know,

was tied in with abstract computer algorithms

and things like that.

But when I was an undergraduate,

I learned about the physiological mechanisms.

And so when I’m studying cognitive psychology

as a first year PhD student, I’m saying,

wait a minute, the whole thing is biological, right?

You know?

You had that intuition right away.

That always seemed obvious to you.

Yeah, yeah.

Isn’t that magical, though,

that from just a little bit of biology can emerge

the full beauty of the human experience?

Why is that so obvious to you?

Well, obvious and not obvious at the same time.

And I think about Darwin in this context, too,

because Darwin knew very early on

that none of the ideas that anybody had ever offered

gave him a sense of understanding

how evolution could have worked.

But he wanted to figure out how it could have worked.

That was his goal.

And he spent a lot of time working on this idea

and reading about things that gave him hints

and thinking they were interesting but not knowing why

and drawing more and more pictures of different birds

that differ slightly from each other and so on, you know.

And then he figured it out.

But after he figured it out, he had nightmares about it.

He would dream about the complexity of the eye

and the arguments that people had given

about how ridiculous it was to imagine

that that could have ever emerged

from some sort of, you know, unguided process, right?

That it hadn’t been the product of design.

And so he didn’t publish for a long time,

in part because he was scared of his own ideas.

He didn’t think they could possibly be true.

But then, you know, by the time

the 20th century rolls around, we all,

you know, we understand that,

many people understand or believe

that evolution produced, you know, the entire

range of animals that there are.

And, you know, Descartes’s idea starts to seem

a little wonky after a while, right?

Like, well, wait a minute.

There’s the apes and the chimpanzees and the bonobos

and, you know, like, they’re pretty smart in some ways.

You know, so what?

Oh, you know, somebody comes up,

oh, there’s a certain part of the brain

that’s still different.

They don’t, you know, there’s no hippocampus

in the monkey brain.

It’s only in the human brain.

Huxley had to do a surgery in front of many, many people

in the late 19th century to show to them

there’s actually a hippocampus in the chimpanzee’s brain.

You know, so the continuity of the species

is another element that, you know,

contributes to this sort of, you know, idea

that we are ourselves a total product of nature.

And that, to me, is the magic and the mystery,

how nature could actually, you know,

give rise to organisms that have the capabilities

that we have.

So it’s interesting because even the idea of evolution

is hard for me to keep all together in my mind.

So because we think of a human time scale,

it’s hard to imagine, like, the development

of the human eye would give me nightmares too.

Because you have to think across many, many, many

generations, and it’s very tempting to think about

kind of a growth of a complicated object

and it’s like, how is it possible for such a thing

to be built?

Because also, me, from a robotics engineering perspective,

it’s very hard to build these systems.

How, through an undirected process,

can a complex thing be designed?

It seems not, it seems wrong.

Yeah, so that’s absolutely right.

And I, you know, a slightly different career path

that would have been equally interesting to me

would have been to actually study the process

of embryological development flowing on

into brain development and the exquisite sort of laying

down of pathways and so on that occurs in the brain.

And I know the slightest bit about that; it’s not my field,

but there are, you know, fascinating aspects

to this process that eventually result in the, you know,

the complexity of various brains.

At least, you know, one thing we’re,

in the field, I think people have felt for a long time,

in the study of vision, the continuity between humans

and nonhuman animals has been second nature

for a lot longer.

I was having, I had this conversation with somebody

who is a vision scientist and he was saying,

oh, we don’t have any problem with this.

You know, the monkey’s visual system

and the human visual system, extremely similar

up to certain levels, of course, they diverge after a while.

But the first, the visual pathway from the eye

to the brain and the first few layers of cortex

or cortical areas, I guess one would say,

are extremely similar.

Yeah, so on the cognition side is where the leap

seems to happen with humans,

that it does seem we’re kind of special.

And that’s a really interesting question

when thinking about alien life

or if there’s other intelligent alien civilizations

out there, is how special is this leap?

So one special thing seems to be the origin of life itself.

However you define that, there’s a gray area.

And the other leap, this is a very biased perspective

of a human, is the origin of intelligence.

And again, from an engineer perspective,

it’s a difficult question to ask.

An important one is how difficult is that leap?

How special were humans?

Did a monolith come down?

Did aliens bring down a monolith

and some apes had to touch a monolith to get it?

That’s a lot like Descartes’s idea, right?

Exactly, but it just seems one heck of a leap

to get to this level of intelligence.

Yeah, and so Chomsky argued that some genetic fluke occurred

100,000 years ago and just happened

that some human, some hominin predecessor of current humans

had this one genetic tweak that resulted in language.

And language then provided this special thing that separates us

from all other animals.

I think there’s a lot of truth to the value and importance

of language, but I think it comes along

with the evolution of a lot of other related things related

to sociality and mutual engagement with others

and establishment of, I don’t know,

rich mechanisms for organizing and understanding

of the world, which language then plugs into.

Right, so language is a tool that

allows you to do this kind of collective intelligence.

And whatever is at the core of the thing that

allows for this collective intelligence is the main thing.

And it’s interesting to think about that one fluke, one

mutation could lead to the first crack opening of the door

to human intelligence.

All it takes is one.

Evolution just kind of opens the door a little bit,

and then time and selection takes care of the rest.

You know, there’s so many fascinating aspects

to these kinds of things.

So we think of evolution as continuous, right?

We think, oh, yes, OK, over 500 million years,

there could have been these relatively continuous changes.

But that’s not what anthropologists,

evolutionary biologists found from the fossil record.

They found hundreds of millions of years of stasis.

And then suddenly a change occurs.

Well, suddenly on that scale is a million years or something,

or even 10 million years.

But the concept of punctuated equilibrium

was a very important concept in evolutionary biology.

And that also feels somehow right about the stages

of our mental abilities.

We seem to have a certain kind of mindset at a certain age.

And then at another age, we look at that four year old

and say, oh, my god, how could they have thought that way?

So Piaget was known for this kind of stage theory

of child development, right?

And when you look at it closely, those stages

aren’t so discrete in their transitions.

But the difference between the four year old and the seven

year old is profound.

And that’s another thing that’s always interested me

is how something happens over the course of several years

of experience where at some point

we reach the point where something

like an insight or a transition or a new stage of development

occurs.

And these kinds of things can be understood

in complex systems research.

And so evolutionary biology, developmental biology,

cognitive development are all things

that have been approached in this kind of way.

Yeah.

Just like you said, I find both fascinating

those early years of human life, but also

the early minutes, days from the embryonic development

to how from embryos you get the brain.

That development, again, from an engineer perspective,

is fascinating.

So it’s not.

So the early, when you deploy the brain to the human world

and it gets to explore that world and learn,

that’s fascinating.

But just like the assembly of the mechanism

that is capable of learning, that’s amazing.

The stuff they’re doing with brain organoids

where you can build mini brains and study

that self-assembly of a mechanism from the DNA material,

that’s like, what the heck?

You have literally biological programs

that just generate a system, this mushy thing that’s

able to be robust and learn in a very unpredictable world

and learn seemingly arbitrary things,

or a very large number of things that enable survival.

Yeah.

Ultimately, that is a very important part

of the whole process of understanding

this emergence of mind from brain kind of thing.

And the whole thing seems to be pretty continuous.

So let me step back to neural networks

for another brief minute.

You wrote the Parallel Distributed Processing books

that explored ideas of neural networks in the 1980s

together with a few folks.

And you wrote those books with David Rumelhart,

who is the first author on the backpropagation

paper with Geoff Hinton.

So these are just some figures at the time

that were thinking about these big ideas.

What are some memorable moments of discovery

and beautiful ideas from those early days?

I’m going to start sort of with my own process in the mid 70s

and then into the late 70s when I met Geoff Hinton

and he came to San Diego and we were all together.

In my time in graduate school, as I’ve already described to you,

I had this sort of feeling of, OK, I’m

really interested in human cognition,

but this disembodied sort of way of thinking about it

that I’m getting from the current mode of thought about it

isn’t working fully for me.

And when I got my assistant professorship,

I went to UCSD and that was in 1974.

Something amazing had just happened.

Dave Rumelhart had written a book together

with another man named Don Norman

and the book was called Explorations in Cognition.

And it was a series of chapters exploring

interesting questions about cognition,

but in a completely sort of abstract, nonbiological kind

of way.

And I’m saying, gee, this is amazing.

I’m coming to this community where people can get together

and feel like they’re collectively exploring ideas.

And it was a book that had a lot of, I don’t know,

lightness to it.

And Don Norman, who was the more senior figure

to Rumelhart at that time, who led that project,

always created this spirit of playful exploration of ideas.

And so I’m like, wow, this is great.

But I was also still trying to get from the neurons

to the cognition.

And I realized at one point, I got this opportunity

to go to a conference where I heard a talk by a man named

James Anderson, who was an engineer,

but by then a professor in a psychology department, who

had used linear algebra to create neural network

models of perception and categorization and memory.

And it just blew me out of the water

that one could create a model that was simulating neurons,

not just engaged in a stepwise algorithmic process that

was construed abstractly.

But it was simulating remembering and recalling

and recognizing the prior occurrence of a stimulus

or something like that.
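Anderson’s models are only described in general terms here. As a rough, hedged illustration of the kind of idea involved, the sketch below is a linear associative memory: pattern pairs are stored by adding up outer products of each value with its key (a Hebbian-style rule), so every pair lives superimposed in one weight matrix, and recall is a single matrix-vector product. The function names, the toy vectors, and the choice of orthonormal keys (which makes recall exact) are assumptions for this example, not details of Anderson’s actual models.

```python
def store(pairs, n_in, n_out):
    """Superimpose every (key, value) pair in one weight matrix via outer products."""
    weights = [[0.0] * n_in for _ in range(n_out)]
    for key, value in pairs:
        for i in range(n_out):
            for j in range(n_in):
                weights[i][j] += value[i] * key[j]
    return weights


def recall(weights, key):
    """Each output unit takes a weighted sum of the key pattern (weights @ key)."""
    return [sum(w * k for w, k in zip(row, key)) for row in weights]


# Two orthonormal keys (unit length, dot product zero), so recall is exact.
key_a = [0.5, 0.5, 0.5, 0.5]
key_b = [0.5, -0.5, 0.5, -0.5]
value_a = [1.0, 0.0, -1.0]
value_b = [0.0, 1.0, 1.0]

memory = store([(key_a, value_a), (key_b, value_b)], n_in=4, n_out=3)
print(recall(memory, key_a))  # [1.0, 0.0, -1.0]
```

With keys that overlap, recall returns a blend of the stored values instead of an exact copy, which is one reason such models were interesting as accounts of memory.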

So for me, this was a bridge between the mind and the brain.

And I remember I was walking across campus one day in 1977,

and I almost felt like St. Paul on the road to Damascus.

I said to myself, if I think about the mind in terms

of a neural network, it will help

me answer the questions about the mind

that I’m trying to answer.

And that really excited me.

So I think that a lot of people were

becoming excited about that.

And one of those people was Jim Anderson, who I had mentioned.

Another one was Steve Grossberg, who

had been writing about neural networks since the 60s.

And Geoff Hinton was yet another.

And his PhD dissertation showed up in an applicant pool

to a postdoctoral training program

that Dave and Don, the two men I mentioned before,

Rumelhart and Norman, were administering.

And Rumelhart got really excited about Hinton’s PhD dissertation.

And so Hinton was one of the first people

who came and joined this group of postdoctoral scholars

that was funded by this wonderful grant that they got.

Another one who is also well known

in neural network circles is Paul Smolensky.

He was another one of that group.

Anyway, Geoff and Jim Anderson organized a conference

at UCSD where we were.

And it was called Parallel Models of Associative Memory.

And it brought all the people together

who had been thinking about these kinds of ideas

in 1979 or 1980.

And this began to kind of really resonate

with some of Rumelhart’s own thinking,

some of his reasons for wanting something

other than the kinds of computation

he’d been doing so far.

So let me talk about Rumelhart now for a minute,

OK, with that context.

Well, let me also just pause, because you

said so many interesting things before we go to Rumelhart.

So first of all, for people who are not familiar,

neural networks are at the core of the machine learning,

deep learning revolution of today.

Geoffrey Hinton that we mentioned

is one of the figures that were important in the history

like yourself in the development of these neural networks,

artificial neural networks that are then

used for machine learning applications.

Like I mentioned, backpropagation, from that paper,

is one of the optimization mechanisms

by which these networks can learn.
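As a hedged illustration of what backpropagation computes, the sketch below runs the chain rule backward through a tiny network (two inputs, two sigmoid hidden units, one sigmoid output, squared-error loss) and checks the resulting gradients against finite-difference estimates. The architecture, loss, and weight values are arbitrary assumptions for the example; they are not taken from the original paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Hidden layer, then the output unit; returns hidden activations and output."""
    w1, b1, w2, b2 = params
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    y = sigmoid(sum(w * hi for w, hi in zip(w2, h)) + b2)
    return h, y


def loss(params, x, target):
    _, y = forward(params, x)
    return 0.5 * (y - target) ** 2


def backprop(params, x, target):
    """Gradients of the loss with respect to every weight, via the chain rule."""
    _, _, w2, _ = params
    h, y = forward(params, x)
    # Error signal at the output unit: dL/dy times the sigmoid derivative y(1 - y).
    delta_out = (y - target) * y * (1.0 - y)
    grad_w2 = [delta_out * hi for hi in h]
    grad_b2 = delta_out
    # Send the error signal backward through w2 to each hidden unit.
    delta_hidden = [delta_out * w * hi * (1.0 - hi) for w, hi in zip(w2, h)]
    grad_w1 = [[d * xi for xi in x] for d in delta_hidden]
    grad_b1 = list(delta_hidden)
    return grad_w1, grad_b1, grad_w2, grad_b2


params = ([[0.1, -0.2], [0.4, 0.3]], [0.0, 0.1], [0.5, -0.6], 0.2)
x, target = [1.0, 0.5], 1.0
grad_w1, grad_b1, grad_w2, grad_b2 = backprop(params, x, target)

# Finite-difference check on one hidden-layer weight, w1[0][1].
eps = 1e-6
plus = ([[0.1, -0.2 + eps], [0.4, 0.3]], [0.0, 0.1], [0.5, -0.6], 0.2)
minus = ([[0.1, -0.2 - eps], [0.4, 0.3]], [0.0, 0.1], [0.5, -0.6], 0.2)
numeric = (loss(plus, x, target) - loss(minus, x, target)) / (2 * eps)
print(abs(numeric - grad_w1[0][1]) < 1e-7)  # True
```

A learning step would then nudge every weight a little in the direction opposite its gradient.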

And the word parallel is really interesting.

So from a computational perspective, it’s almost synonymous

with how you thought at the time about neural networks:

as parallel computation.

Would that be fair to say?

Well, yeah, the parallel, the word parallel in this

comes from the idea that each neuron is

an independent computational unit, right?

It gathers data from other neurons,

it integrates it in a certain way,

and then it produces a result. And it’s

a very simple little computational unit.

But it’s autonomous in the sense that it does its thing, right?

It’s in a biological medium where

it’s getting nutrients and various chemicals

from that medium.

But you can think of it as almost like a little computer

in and of itself.

So the idea is that our brains have, oh, look,

on the order of 100 billion

of these little neurons, right?

And they’re all capable of doing their work at the same time.

So it’s like instead of just a single central processor that’s

chugging along one step after another,

we have billions of these little computational units

working at the same time.
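To make the “many little computers” picture concrete, here is a hedged sketch: each hypothetical unit gathers a weighted sum of its inputs plus a bias and squashes it with a logistic function. Because no unit in a layer needs another unit’s output from the same layer, the units can be evaluated in any order or all at once; the thread pool below only illustrates that independence and is not a model of how neurons actually run. The weights and names are made up for the example.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def unit(weights, bias, inputs):
    """One neuron-like unit: gather inputs, integrate them, produce one output."""
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-net))


# A layer of three units that all read the same input pattern.
layer = [([0.2, -0.5, 1.0], 0.0), ([1.5, 0.3, -0.7], 0.1), ([-1.0, 0.8, 0.4], -0.2)]
inputs = [1.0, 0.0, 0.5]

# One unit after another, like a single central processor.
sequential = [unit(w, b, inputs) for w, b in layer]

# All units submitted at once; each uses only its own weights and the shared input.
with ThreadPoolExecutor() as pool:
    concurrent = list(pool.map(lambda wb: unit(wb[0], wb[1], inputs), layer))

print(sequential == concurrent)  # True: the order of evaluation does not matter
```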

So at the time that’s, I don’t know, maybe you can comment,

it seems to me, even still to me,

quite a revolutionary way to think about computation

relative to the development of theoretical computer science

alongside of that, where it’s very much about the sequential computer.

You’re analyzing algorithms that are running on a single computer.

You’re saying, wait a minute, why don’t we

take a really dumb, very simple computer

and just have a lot of them interconnected together?

And they’re all operating in their own little world

and they’re communicating with each other

and thinking of computation that way.

And from that kind of computation,

trying to understand how things like certain characteristics

of the human mind can emerge.

That’s quite a revolutionary way of thinking, I would say.

Well, yes, I agree with you.

And there’s still this sort of sense

of not sort of knowing how we kind of get all the way there,

I think.

And this very much remains at the core of the questions

that everybody’s asking about the capabilities

of deep learning and all these kinds of things.

But if I could just play this out a little bit,

a convolutional neural network or a CNN,

which many people may have heard of, is a set of,

you could think of it biologically as a set of

collections of neurons.

Each collection has maybe 10,000 neurons in it.

But there are many layers.

Some of these things are hundreds or even

1,000 layers deep.

But others are closer to the biological brain

and maybe they’re like 20 layers deep or something like that.

So within each layer, we have thousands of neurons

or tens of thousands maybe.

Well, in the brain, we probably have millions in each layer.

But we’re getting sort of similar in a certain way.

And then we think, OK, at the bottom level,

there’s an array of things that are like the photoreceptors.

In the eye, they respond to the amount

of light of a certain wavelength at a certain location

on the pixel array.

So that’s like the biological eye.

And then there’s several further stages going up,

layers of these neuron like units.

And when you go from that raw input array of pixels

to the classification, you’ve actually

built a system that could do the same kind of thing

that you and I do when we open our eyes and we look around

and we see there’s a cup, there’s a cell phone,

there’s a water bottle.

And these systems are doing that now, right?

So they are, in terms of the parallel idea

that we were talking about before,

they are doing this massively parallel computation

in the sense that each of the neurons in each

of those layers is thought of as computing

its little bit of something about the input

simultaneously with all the other ones in the same layer.

We get to the point of abstracting that away

and thinking, oh, it’s just one whole vector that’s

being computed, one activation pattern that’s

computed in a single step.

And that abstraction is useful, but it’s still that parallel

and distributed processing, right?

Each one of these guys is just contributing

a tiny bit to that whole thing.
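The convolutional part of a CNN can be sketched in a few lines. In the hedged example below, each output unit looks at one small patch of the pixel array and applies the same small set of weights (the kernel) used at every other location, followed by a rectification that passes on only positive evidence. The kernel is hand-set to respond to a dark-to-light vertical edge purely for illustration; in a trained CNN the kernels are learned, and many such layers are stacked.

```python
def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation: one output unit per kernel-sized patch."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[a][b] * image[r + a][c + b] for a in range(kh) for b in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Units only pass on positive evidence."""
    return [[max(0, v) for v in row] for row in feature_map]


# A 4x4 "pixel array": dark on the left, bright on the right.
image = [[0, 0, 1, 1] for _ in range(4)]
# Hypothetical vertical-edge detector: negative weights on the left, positive on the right.
edge_kernel = [[-1, 1], [-1, 1]]

print(relu(conv2d(image, edge_kernel)))  # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

Every unit in the output map computes its little bit independently of the others, which is the parallel, distributed picture described above.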

And that’s the excitement that you felt,

that from these simple things, something can emerge.

When you add these levels of abstraction on it,

you can start getting all the beautiful things

that we think about as cognition.

And so, OK, so you have this conference.

I forgot the name already, but it’s

Parallel and Something Associative Memory and so on.

Very exciting, technical and exciting title.

And you started talking about Dave Rumelhart.

So who is this person that was so,

you’ve spoken very highly of him.

Can you tell me about him, his ideas, his mind, who he was

as a human being, as a scientist?

So Dave came from a little tiny town in Western South Dakota.

And his mother was the librarian,

and his father was the editor of the newspaper.

And I know one of his brothers pretty well.

They grew up, there were four brothers,

and they grew up together.

And their father encouraged them to compete with each other

a lot.

They competed in sports, and they competed in mind games.

I don’t know, things like Sudoku and chess and various things

like that.

And Dave was a standout undergraduate.

He went at a younger age than most people

do to college at the University of South Dakota

and majored in mathematics.

And I don’t know how he got interested in psychology,

but he applied to the mathematical psychology

program at Stanford and was accepted as a PhD student

to study mathematical psychology at Stanford.

So mathematical psychology is the use of mathematics

to model mental processes.

So something that I think these days

might be called cognitive modeling, that whole space.

Yeah, it’s mathematical in the sense

that you say, if this is true and that is true,

then I can derive that this should follow.

And so you say, these are my stipulations

about the fundamental principles,

and this is my prediction about behavior.

And it’s all done with equations.

It’s not done with a computer simulation.

So you solve the equation, and that tells you

what the probability is that the subject

will be correct on the seventh trial of the experiment

or something like that.

So it’s a use of mathematics to descriptively characterize

aspects of behavior.
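For a concrete, hedged example of solving an equation to predict accuracy on a given trial, consider an all-or-none learning model, chosen here only as an illustration (the conversation does not say which models Rumelhart worked on). It assumes an item is either learned or not; after each trial an unlearned item becomes learned with probability c, and an unlearned item is still answered correctly by guessing with probability g. The probability of an error on trial n is then (1 - g)(1 - c)^(n - 1). The parameter values below are made up.

```python
def p_correct(trial, c, g):
    """All-or-none model: probability of a correct response on a given trial.

    The item starts unlearned. After each trial it becomes learned with
    probability c, so it is still unlearned on trial n with probability
    (1 - c) ** (n - 1). Learned items are always answered correctly;
    unlearned items are answered correctly by guessing with probability g.
    """
    p_unlearned = (1 - c) ** (trial - 1)
    return 1 - p_unlearned * (1 - g)


# Hypothetical parameters: 20% chance of learning per trial, two-choice guessing.
print(round(p_correct(7, c=0.2, g=0.5), 6))  # 0.868928
```

No simulation is needed: the prediction follows directly from the stipulated principles, which is the style of derivation described above.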

And Stanford at that time was the place

where there were several really, really strong

mathematical thinkers who were also connected with three

or four others around the country who brought

a lot of really exciting ideas onto the table.

And it was a very, very prestigious part

of the field of psychology at that time.

So Rumelhart comes into this.

He was a very strong student within that program.

And he got this job at this brand new university

in San Diego in 1967, where he’s one of the first assistant

professors in the Department of Psychology at UCSD.

So I got there in 74, seven years later,

and Rumelhart at that time was still

doing mathematical modeling.

But he had gotten interested in cognition.

He’d gotten interested in understanding.

And understanding, I think, remains,

what does it mean to understand anyway?

It’s an interesting sort of curious,

how would we know if we really understood something?

But he was interested in building machines

that would hear a couple of sentences

and have an insight about what was going on.

So for example, one of his favorite things at that time

was, Margie was sitting on the front step

when she heard the familiar jingle of the Good Humor man.

She remembered her birthday money and ran into the house.

What is Margie doing?

Why?

Well, there’s a couple of ideas you could have,

but the most natural one is that the Good Humor

man brings ice cream.

She likes ice cream.

She knows she needs money to buy ice cream,

so she’s going to run into the house and get her money

so she can buy herself an ice cream.

It’s a huge amount of inference that

has to happen to get those things to link up

with each other.

And he was interested in how the hell that could happen.

And he was trying to build good old fashioned AI style

models of representation of language and content of things

like has money.

So like formal logic and knowledge bases,

like that kind of stuff.

So he was integrating that with his thinking about cognition.

The mechanisms of cognition, how can they mechanistically

be applied to build these knowledge,

like to actually build something that

looks like a web of knowledge and thereby from there emerges

something like understanding, whatever the heck that is.

Yeah, he was grappling.

This was something that they grappled

with at the end of that book that I was describing,

Explorations in Cognition.

But he was realizing that the paradigm of good old fashioned

AI wasn’t giving him the answers to these questions.

By the way, that’s called good old fashioned AI now.

It wasn’t called that at the time.

Well, it was.

It was beginning to be called that.

Oh, because it was from the 60s.

Yeah, yeah.

By the late 70s, it was kind of old fashioned,

and it hadn’t really panned out.

And people were beginning to recognize that.

And Rumelhart was like, yeah, he was part of the recognition

that this wasn’t all working.

Anyway, so he started thinking in terms of the idea

that we needed systems that allowed us to integrate

multiple simultaneous constraints in a way that would

be mutually influencing each other.

So he wrote a paper that just really, first time I read it,

I said, oh, well, yeah, but is this important?

But after a while, it just got under my skin.

And it was called Toward an Interactive Model of Reading.

And in this paper, he laid out the idea

that every aspect of our interpretation of what’s

coming off the page when we read at every level of analysis

you can think of actually depends

on all the other levels of analysis.

So what are the actual pixels making up each letter?

And what do those pixels signify about which letters they are?

And what do those letters tell us about what words are there?

And what do those words tell us about what ideas

the author is trying to convey?

And so he had this model where we

have these little tiny elements that represent

each of the pixels of each of the letters,

and then other ones that represent the line segments

in them, and other ones that represent the letters,

and other ones that represent the words.

And at that time, his idea was there’s this set of experts.

There’s an expert about how to construct a line out of pixels,

and another expert about which sets of lines

go together to make which letters,

and another one about which letters go together

to make which words, and another one about what

the meanings of the words are, and another one about how

the meanings fit together, and things like that.

And all these experts are looking at this data,

and they’re updating hypotheses at other levels.

So the word expert can tell the letter expert,

oh, I think there should be a T there,

because I think there should be the word “the” here.

And the bottom up sort of feature to letter expert

could say, I think there should be a T there, too.

And if they agree, then you see a T, right?

And so there’s a top down, bottom up interactive process,

but it’s going on at all layers simultaneously.

So everything can filter all the way down from the top,

as well as all the way up from the bottom.

And it’s a completely interactive, bidirectional,

parallel distributed process.

That is somehow, because of the abstractions, it’s hierarchical.

So there’s different layers of responsibilities,

different levels of responsibilities.

First of all, it’s fascinating to think about it

in this kind of mechanistic way.

So not thinking purely from the structure

of a neural network or something like a neural network,

but thinking about these little guys

that work on letters, and then the letters become words

and words become sentences.

And that’s a very interesting hypothesis

that from that kind of hierarchical structure

can emerge understanding.

Yeah, so, but the thing is, though,

I wanna just sort of relate this

to the earlier part of the conversation.

When Rumelhart was first thinking about it,

there were these experts on the side,

one for the features and one for the letters

and one for how the letters make the words and so on.

And they would each be working,

sort of evaluating various propositions about,

you know, is this combination of features here

going to be one that looks like the letter T and so on.

And what he realized,

kind of after reading Hinton’s dissertation

and hearing about Jim Anderson’s

linear algebra based neural network models

that I was telling you about before

was that he could replace those experts

with neuron like processing units,

which just would have their connection weights

that would do this job.

So what ended up happening was

that Rumelhart and I got together

and we created a model

called the interactive activation model of letter perception,

which takes these little pixel level inputs,

constructs line segment features, letters and words.

But now we built it out of a set of neuron

like processing units that are just connected

to each other with connection weights.

So the unit for the word time has a connection

to the unit for the letter T in the first position

and the letter I in the second position, so on.

And because these connections are bidirectional,

if you have prior knowledge that it might be the word time,

that starts to prime the letters and the features.

And if you don’t, then it has to start bottom up.

But the directionality just depends

on where the information comes in first.

And if you have context together

with features at the same time,

they can convergently result in an emergent perception.
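A toy version of this interactive process can be sketched in plain Python. In the hedged example below, letter units (one set per position) and word units are connected in both directions: letters excite the words that contain them, words send top-down support back to their letters, and units at the same level compete. The three-word lexicon, the update rule, and every parameter value are illustrative assumptions, much simpler than the published interactive activation model. The first letter is ambiguous between T and L, but context from the rest of the input (“IME”) makes the word unit for TIME win, and its feedback tips the first letter toward T.

```python
# Toy lexicon; the words and all parameters below are illustrative assumptions.
LEXICON = ["TIME", "TAKE", "LAKE"]


def run(evidence, lexicon=LEXICON, top_down=0.3, inhibition=0.2, rate=0.1, steps=60):
    """evidence: one {letter: bottom-up strength} dict per letter position."""
    n = len(evidence)
    # Letter units for every letter seen in the input or used by a word at that position.
    letters = [
        {ch: 0.0 for ch in sorted(set(evidence[i]) | {w[i] for w in lexicon})}
        for i in range(n)
    ]
    words = {w: 0.0 for w in lexicon}

    def update(act, net):
        # Move a fraction of the way toward the net input, then clip to [0, 1].
        return min(1.0, max(0.0, act + rate * (net - act)))

    for _ in range(steps):
        word_total = sum(words.values())
        new_letters = []
        for i in range(n):
            pos_total = sum(letters[i].values())
            new_pos = {}
            for ch, act in letters[i].items():
                net = evidence[i].get(ch, 0.0)
                # Top-down: support from every word with this letter in this position.
                net += top_down * sum(a for w, a in words.items() if w[i] == ch)
                # Competition among the letter units in the same position.
                net -= inhibition * (pos_total - act)
                new_pos[ch] = update(act, net)
            new_letters.append(new_pos)
        new_words = {}
        for w, act in words.items():
            # Bottom-up: average support from this word's letters, minus word competition.
            net = sum(letters[i][w[i]] for i in range(n)) / n
            net -= inhibition * (word_total - act)
            new_words[w] = update(act, net)
        # Synchronous update: every unit used the previous step's activations.
        letters, words = new_letters, new_words
    return letters, words


# First letter is ambiguous between T and L; the rest clearly reads "IME".
evidence = [{"T": 0.5, "L": 0.5}, {"I": 1.0}, {"M": 1.0}, {"E": 1.0}]
letters, words = run(evidence)
print(letters[0]["T"] > letters[0]["L"])  # True: feedback from TIME tips the balance
print(max(words, key=words.get))  # TIME
```

Setting `top_down=0.0` turns off the word-to-letter connections, and the first letter then stays exactly balanced between T and L, which isolates the contribution of the top-down direction.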

And that was the piece of work that we did together

that sort of got us both completely convinced

that this neural network way of thinking

was going to be able to actually address the questions

that we were interested in as cognitive psychologists.

So the algorithmic side, the optimization side,

those are all details; when you first start, the idea

that you can get far with this kind of way of thinking,

that in itself is a profound idea.

So do you like the term connectionism

to describe this kind of set of ideas?

I think it’s useful.

It highlights the notion that the knowledge

that the system exploits is in the connections

between the units, right?

There isn’t a separate dictionary.

There’s just the connections between the units.

So I already sort of laid that on the table

with the connections from the letter units

to the unit for the word time, right?

The unit for the word time isn’t a unit for the word time

for any other reason than it’s got the connections

to the letters that make up the word time.

Those are the units on the input that excite it,

and when it’s excited, it in a sense represents

in the system that there’s support for the hypothesis

that the word time is present in the input.

But it’s not, the word time isn’t written anywhere

inside the model; it’s only written there

in the picture we drew of the model

to say that’s the unit for the word time, right?

And if somebody wants to tell me,

well, how do you spell that word?

You have to use the connections from that out

to then get those letters, for example.
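The point that a word unit is defined only by its connections can be illustrated with a hypothetical toy, assuming one unit per letter per position and all-or-none weights (an assumption for illustration, not the model's actual parameters). The lexicon strings are used only to wire the connections; recognition runs the weights in one direction and spelling runs the same weights in the other.

```python
# Hypothetical toy: word units are bare indices. The strings are only used to
# build the connections and are then thrown away; a word unit "means" TIME only
# because of which position-specific letter units it is wired to.
LETTER_UNITS = [(pos, ch) for pos in range(4) for ch in "ADEILMT"]


def build_connections(lexicon):
    """Weight 1.0 from word unit k to letter unit (pos, ch) if word k has ch at pos."""
    return [[1.0 if word[pos] == ch else 0.0 for (pos, ch) in LETTER_UNITS]
            for word in lexicon]


def letter_pattern(text):
    """Activation over letter units for a 4-letter input string."""
    return [1.0 if text[pos] == ch else 0.0 for (pos, ch) in LETTER_UNITS]


def recognize(weights, pattern):
    """Input direction: letter activations excite word units through the weights."""
    scores = [sum(w * a for w, a in zip(row, pattern)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)


def spell(weights, word_unit):
    """Output direction: activate one word unit and read off the letter units it excites."""
    active = [LETTER_UNITS[j] for j, w in enumerate(weights[word_unit]) if w > 0]
    return "".join(ch for _, ch in sorted(active))


weights = build_connections(["TIME", "TAME", "LIME"])  # the lexicon itself is not kept
```

Here `recognize(weights, letter_pattern("TIME"))` picks word unit 0, and `spell(weights, 2)` reads "LIME" back out of the connections, even though no string is stored with the unit.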

That’s such a, that’s a counterintuitive idea

where humans want to think in this logic way.

This idea of connectionism, it doesn’t, it’s weird.

It’s weird that this is how it all works.

Yeah, but let’s go back to that CNN, right?

That CNN with all those layers of neuron

like processing units that we were talking about before,

it’s gonna come out and say, this is a cat, that’s a dog,

but it has no idea why it said that.

It’s just got all these connections

between all these layers of neurons,

like from the very first layer to the,

you know, like whatever these layers are,

they just get numbered after a while

because, you know, somehow the further in you go,

the more abstract the features are,

but it’s a graded and continuous sort of process

of abstraction anyway.

And, you know, it goes from very local,

very specific to much more sort of global,

but it’s still, you know, another sort of pattern

of activation over an array of units.

And then at the output side, it says it’s a cat

or it’s a dog.

And when I open my eyes and say, oh, that’s Lex,

or, oh, you know, there’s my own dog

and I recognize my dog,

which is a member of the same species as many other dogs,

but I know this one

because of some slightly unique characteristics.

I don’t know how to describe what it is

that makes me know that I’m looking at Lex

or at my particular dog, right?

Or even that I’m looking at a particular brand of car.

Like I can say a few words about it,

but if I wrote you a paragraph about the car,

you would have trouble figuring out

which car is he talking about, right?

So the idea that we have propositional knowledge

of what it is that allows us to recognize

that this is an actual instance

of this particular natural kind

has always been something that it never worked, right?

You couldn’t ever write down a set of propositions

for visual recognition.

And so in that space, it sort of always seemed very natural

that something more implicit,

you don’t have access to what the details

of the computation were in between,

you just get the result.

So that’s the other part of connectionism,

you cannot, you don’t read the contents of the connections,

the connections only cause outputs to occur

based on inputs.

Yeah, and for us that like final layer

or some particular layer is very important,

the one that tells us that it’s our dog

or like it’s a cat or a dog,

but each layer is probably equally as important

in the grand scheme of things.

Like there’s no reason why the cat versus dog

is more important than the lower level activations,

it doesn’t really matter.

I mean, all of it is just this beautiful stacking

on top of each other.

And we humans live in these particular layers,

for us it’s useful to survive,

to use those cat versus dog, predator versus prey,

all those kinds of things.

It’s fascinating that it’s all continuous,

but then you then ask,

the history of artificial intelligence, you ask,

are we able to introspect and convert the very things

that allow us to tell the difference between cat and dog

into a logic, into formal logic?

That’s been the dream.

I would say that’s still part of the dream of symbolic AI.

And I’ve recently talked to Doug Lenat, who created Cyc,

and that’s a project that lasted for many decades

and still carries a sort of dream in it, right?

But we still don’t know the answer, right?

It seems like connectionism is really powerful,

but it also seems like there’s this building of knowledge.

And so how do we, how do you square those two?

Like, do you think the connections can contain

the depth of human knowledge and the depth

of what Dave Rumelhart was thinking about in terms of understanding?

Well, that remains the $64 question.

And I…

With inflation, that number is higher.

Okay, $64,000.

Maybe it’s the $64 billion question now.

You know, I think that from the emergentist side,

which, you know, I placed myself on.

So I used to sometimes tell people

I was a radical, eliminative connectionist

because I didn’t want them to think

that I wanted to build like anything into the machine.

But I don’t like the word eliminative anymore

because it makes it seem like it’s wrong to think

that there is this emergent level of understanding.

And I disagree with that.

So I think, you know, I would call myself

a radical emergentist connectionist

rather than eliminative connectionist, right?

Because I want to acknowledge

that these higher level kinds of aspects

of our cognition are real, but they’re not,

they don’t exist as such.

And there was an example that Doug Hofstadter used to use

that I thought was helpful in this respect.

Just the idea that we can think about sand dunes

as entities and talk about like how many there are even.

But we also know that a sand dune is a very fluid thing.

It’s a pile of sand that is capable

of moving around under the wind and reforming itself

in somewhat different ways.

And if we think about our thoughts as like sand dunes,

as being things that emerge from just the way

all the lower level elements sort of work together

and are constrained by external forces,

then we can say, yes, they exist as such,

but they also, we shouldn’t treat them

as completely monolithic entities that we can understand

without understanding sort of all of the stuff

that allows them to change in the ways that they do.

And that’s where I think the connectionist

feeds into the cognitive.

It’s like, okay, so if the substrate

is parallel distributed connectionist, then it doesn’t mean

that the contents of thought isn’t like abstract

and symbolic, but it’s maybe more fluid

than is easy to capture

with a set of logical expressions.

Yeah, that’s a heck of a sort of thing

to put at the top of a resume,

radical, emergentist, connectionist.

So there is, just like you said, a beautiful dance

between that, between the machinery of intelligence,

like the neural network side of it,

and the stuff that emerges.

I mean, the stuff that emerges seems to be,

I don’t know, I don’t know what that is,

that it seems like maybe all of reality is emergent.

When I think about this, it’s made most distinctly rich to me

when I look at cellular automata, look at game of life,

that from very, very simple things,

very rich, complex things emerge

that start looking very quickly like organisms

that you forget how the actual thing operates.

They start looking like they’re moving around,

they’re eating each other,

some of them are generating offspring.

You forget very quickly.
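For readers who haven't played with it, Conway's Game of Life fits in a few lines. This sketch, assuming an unbounded grid stored as a set of live cells, uses the standard rules (a dead cell with exactly three live neighbours is born, a live cell with two or three survives). The five-cell pattern below is the well-known glider, which reappears one cell down and one cell right every four generations even though no rule mentions movement.

```python
from collections import Counter


def step(live):
    """One Game of Life generation on an unbounded grid; live is a set of (row, col)."""
    # Count, for every cell next to a live cell, how many live neighbours it has.
    counts = Counter((r + dr, c + dc)
                     for (r, c) in live
                     for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                     if (dr, dc) != (0, 0))
    # Birth with exactly 3 neighbours; survival with 2 or 3.
    return {cell for cell, n in counts.items() if n == 3 or (n == 2 and cell in live)}


GLIDER = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}


def run_generations(cells, n):
    for _ in range(n):
        cells = step(cells)
    return cells
```

After `run_generations(GLIDER, 4)` the same shape is back, shifted by one row and one column: the "organism" has moved, though only the local birth and survival rules were ever applied.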

And it seems like maybe it’s something

about the human mind that wants to operate

in some layer of the emergent,

and forget about the mechanism

of how that emergence happens.

So it, just like you are in your radicalness,

I’m also, it seems like unfair

to eliminate the magic of that emergent,

like eliminate the fact that that emergent is real.

Yeah, no, I agree.

I’m not, that’s why I got rid of eliminative, right?

Eliminative, yeah.

Yeah, because it seemed like that was trying to say

that it’s all completely like.

An illusion of some kind, it’s not.

Well, who knows whether there isn’t,

there aren’t some illusory characteristics there.

And I think that philosophically many people

have confronted that possibility over time,

but it’s still important to accept it as magic, right?

So, I think of Fellini in this context,

I think of others who have appreciated the role of magic,

the role of magic, of actual trickery

in creating illusions that move us.

And Plato was on to this too.

It’s like somehow or other these shadows

give rise to something much deeper than that.

And that’s, so we won’t try to figure out what it is.

We’ll just accept it as given that that occurs.

And, you know, but he was still onto the magic of it.

Yeah, yeah, we won’t try to really, really,

really deeply understand how it works.

We’ll just enjoy the fact that it’s kind of fun.

Okay, but you worked closely with Dave Rumelhart.

He passed away as a human being.

What do you remember about him?

Do you miss the guy?

Absolutely, you know, he passed away 15ish years ago now.

And his demise was actually one of the most poignant

and, you know, like relevant tragedies, relevant to our conversation.

He started to undergo a progressive neurological condition

that isn’t far from what we’re used to.

A neurological condition that isn’t fully understood.

That is to say his particular course isn’t fully understood

because, you know, brain scans weren’t done at certain stages

and no autopsy was done or anything like that.

The wishes of the family.

We don’t know as much about the underlying pathology as we might,

but I had begun to get interested in this neurological condition

that might have been the very one that he was succumbing to

as my own efforts to understand another aspect of this mystery

that we’ve been discussing while he was beginning

to get progressively more and more affected.

So I’m going to talk about the disorder

and not about Rumelhart for a second, okay?

The disorder is something my colleagues and collaborators

have chosen to call semantic dementia.

So it’s a specific form of loss of mind

related to meaning, semantic dementia.

And it’s progressive in the sense that the patient loses the ability

to appreciate the meaning of the experiences that they have,

either from touch, from sight, from sound, from language.

They, I hear sounds, but I don’t know what they mean kind of thing.

So as this illness progresses, it starts with the patient

being unable to differentiate like similar breeds of dog

or remember the lower frequency unfamiliar categories

that they used to be able to remember.

But as it progresses, it becomes more and more striking

and the patient loses the ability to recognize things like

pigs and goats and sheep and calls all middle sized animals dogs

and can’t recognize rabbits and rodents anymore.

They call all the little ones cats

and they can’t recognize hippopotamuses and cows anymore.

They call them all horses.

So there was this one patient who went through this progression

where at a certain point, any four legged animal,

he would call it either a horse or a dog or a cat.

And if it was big, he would tend to call it a horse.

If it was small, he’d tend to call it a cat.

Middle sized ones, he called dogs.

This is just a part of the syndrome though.

The patient loses the ability to relate concepts to each other.

So my collaborator in this work, Carolyn Patterson,

developed a test called the pyramids and palm trees test.

So you give the patient a picture of pyramids

and they have a choice which goes with the pyramids,

palm trees or pine trees.

And she showed that this wasn’t just a matter of language

because the patient’s loss of this ability shows up

whether you present the material with words or with pictures.

The pictures, they can’t put the pictures together

with each other properly anymore.

They can’t relate the pictures to the words either.

They can’t do word picture matching.

They’ve lost the conceptual grounding

from either modality of input.

And so that’s why it’s called semantic dementia.

The very semantics is disintegrating.

And we understand this in terms of our idea

that distributed representation, a pattern of activation,

represents the concepts, really similar ones.

As you degrade them, they start being,

you lose the differences.

So the difference between the dog and the goat

is no longer part of the pattern anymore.

And since dog is really familiar,

that’s the thing that remains.

And we understand that in the way the models work and learn.
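The degradation story can be made concrete with a deliberately simplified sketch. Everything here is a hypothetical assumption rather than the published semantic dementia models: concepts are sets of binary features, damage is modeled as uniformly weakening the activation pattern, and familiarity is a fixed additive bias per name. As the pattern weakens, the features that distinguish goat from dog contribute less and less, and the familiar name wins.

```python
# Hypothetical feature sets; real models learn distributed patterns rather than
# using hand-picked features.
FEATURES = ["four_legs", "fur", "small", "medium", "large",
            "barks", "meows", "horns", "hooves"]
CONCEPTS = {
    "dog":   {"four_legs", "fur", "medium", "barks"},
    "cat":   {"four_legs", "fur", "small", "meows"},
    "goat":  {"four_legs", "fur", "medium", "horns", "hooves"},
    "horse": {"four_legs", "fur", "large", "hooves"},
}
# Hypothetical familiarity bias: frequent names get a head start.
FAMILIARITY = {"dog": 1.0, "cat": 1.0, "horse": 0.6, "goat": 0.1}


def pattern(concept, strength=1.0):
    """Activation pattern for a concept; strength < 1 stands in for degradation."""
    return [strength if f in CONCEPTS[concept] else 0.0 for f in FEATURES]


def name_concept(activation):
    """Pick the name whose features best match the pattern, plus its familiarity."""
    def score(name):
        overlap = sum(a for a, f in zip(activation, FEATURES) if f in CONCEPTS[name])
        return overlap + FAMILIARITY[name]
    return max(CONCEPTS, key=score)
```

`name_concept(pattern("goat", 1.0))` returns "goat", while `name_concept(pattern("goat", 0.2))` returns "dog"; a dog stays a dog at any strength, because its pattern and its familiarity both favor the same name.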

But Rumelhart underwent this condition.

So on the one hand, it’s a fascinating aspect

of parallel distributed processing to me.

It reveals this sort of texture of distributed representation

in a very nice way, I’ve always felt.

But at the same time, it was extremely poignant

because this is exactly the condition

that Rumelhart was undergoing.

And there was a period of time when he was this man

who had been the most focused, goal directed, competitive,

thoughtful person who was willing to work for years

to solve a hard problem, he starts to disappear.

And there was a period of time when it was hard for any of us

to really appreciate that he was sort of, in some sense,

not fully there anymore.

Do you know if he was able to introspect

the dissolution of the understanding mind?

I mean, this is one of the big scientists that thinks about this.

Was he able to look at himself and understand the fading mind?

You know, we can contrast Hawking and Rumelhart in this way.

And I like to do that to honor Rumelhart

because I think Rumelhart is sort of like the Hawking

of cognitive science to me in some ways.

Both of them suffered from a degenerative condition.

In Hawking’s case, it affected the motor system.

In Rumelhart’s case, it’s affecting the semantics.

And not just the pure object semantics,

but maybe the self semantics as well.

And we don’t understand that.

Concepts broadly.

So I would say he didn’t.

And this was part of what, from the outside,

was a profound tragedy.

But on the other hand, at some level, he sort of did

because there was a period of time when it finally was realized

that he had really become profoundly impaired.

This was clearly a biological condition.

It wasn’t just like he was distracted that day or something like that.

So he retired from his professorship at Stanford

and he became, he lived with his brother for a couple years

and then he moved into a facility for people with cognitive impairments.

One that many elderly people end up in when they have cognitive impairments.

And I would spend time with him during that period.

This was like in the late 90s, around 2000 even.

And we would go bowling and he could still bowl.

And after bowling, I took him to lunch and I said,

where would you like to go?

You want to go to Wendy’s?

And he said, nah.

And I said, okay, well, where do you want to go?

And he just pointed.

He said, turn here.

So he still had a certain amount of spatial cognition

and he could get me to the restaurant.

And then when we got to the restaurant, I said,

what do you want to order?

And he couldn’t come up with any of the words,

but he knew where on the menu the thing was that he wanted.

So it’s, you know, and he couldn’t say what it was,

but he knew that that’s what he wanted to eat.

And so it’s like it isn’t monolithic at all.

Our cognition is, you know, first of all, graded in certain kinds of ways,

but also multipartite and there’s many elements to it and things,

certain sort of partial competencies still exist

in the absence of other aspects of these competencies.

So this is what always fascinated me about what used to be called

cognitive neuropsychology, you know,

the effects of brain damage on cognition.

But in particular, this gradual disintegration part.

You know, I’m a big believer that the loss of a human being that you value

is as powerful as, you know, first falling in love with that human being.

I think it’s all a celebration of the human being.

So the disintegration itself too is a celebration in a way.

Yeah, yeah.

But just to say something more about the scientist

and the backpropagation idea that you mentioned.

So in 1982, Hinton had been there as a postdoc and organized that conference.

He’d actually gone away and gotten an assistant professorship

and then there was this opportunity to bring him back.

So Geoff Hinton was back on a sabbatical.

San Diego.

And Rumelhart and I had decided we wanted to do this, you know,

we thought it was really exciting and the papers on the interactive activation model

that I was telling you about had just been published

and we both sort of saw a huge potential for this work, and Geoff was there.

And so the three of us started a research group,

which we called the PDP Research Group.

And several other people came.

Francis Crick, who was at the Salk Institute, heard about it from Geoff

because Geoff was known among Brits to be brilliant

and Francis was well connected with his British friends.

So Francis Crick came.

That’s a heck of a group of people, wow.

And Paul Smolensky was one of the other postdocs.

He was still there as a postdoc.

And a few other people.

But anyway, Geoff talked to us about learning

and how we should think about how, you know, learning occurs in a neural network.

And he said, the problem with the way you guys have been approaching this

is that you’ve been looking for inspiration from biology

to tell you what the rules should be for how the synapses should change

the strengths of their connections, how the connections should form.

He said, that’s the wrong way to go about it.

What you should do is you should think in terms of

how you can adjust connection weights to solve a problem.

So you define your problem and then you figure out

how the adjustment of the connection weights will solve the problem.

And Rumelhart heard that and said to himself, okay,

so I’m going to start thinking about it that way.

I’m going to essentially imagine that I have some objective function,

some goal of the computation.

I want my machine to correctly classify all of these images.

And I can score that.

I can measure how well they’re doing on each image.

And I get some measure of error, or loss, as it’s typically called in deep learning.

And I’m going to figure out how to adjust the connection weights

so as to minimize my loss or reduce the error.

And that’s called, you know, gradient descent.

And engineers were already familiar with the concept of gradient descent.

And in fact, there was an algorithm called the delta rule

that had been invented by a professor in the electrical engineering department

at Stanford, Bernie Widrow and a collaborator named Hoff.

I never met him.

So gradient descent in continuous neural networks

with multiple neuron like processing units was already understood

for a single layer of connection weights.

We have some inputs over a set of neurons.

We want the output to produce a certain pattern.

We can define the difference between our target

and what the neural network is producing.

And we can figure out how to change the connection weights to reduce that error.
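The single-layer delta rule (also known as the Widrow-Hoff or least-mean-squares rule) can be sketched as follows, assuming linear output units and a hypothetical training set generated by y = 2*x1 - x2. Each weight moves in proportion to the error times its input, which is gradient descent on squared error for one layer of connections.

```python
def delta_rule(samples, n_inputs, lr=0.1, epochs=200):
    """Train one layer of weights for a linear output unit with the delta rule."""
    w = [0.0] * n_inputs
    for _ in range(epochs):
        for x, target in samples:
            y = sum(wi * xi for wi, xi in zip(w, x))   # what the network produces
            err = target - y                           # target minus output
            # Each weight changes in proportion to the error times its own input.
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
    return w


# Hypothetical data consistent with y = 2*x1 - x2.
SAMPLES = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0), ([2.0, 1.0], 3.0)]
weights = delta_rule(SAMPLES, n_inputs=2)
```

With a step size of 0.1 the learned weights approach [2, -1]; the rule only says how to change weights that connect directly to the outputs, which is exactly the limitation the next step removes.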

So what Rumelhart did was to generalize that

so as to be able to change the connections from earlier layers of units

to the ones at a hidden layer between the input and the output.

And so he first called the algorithm the generalized delta rule

because it’s just an extension of the gradient descent idea.

And interestingly enough, Hinton was thinking that this wasn’t going to work very well.

So Hinton had his own alternative algorithm at the time

based on the concept of the Boltzmann machine that he was pursuing.

So the paper on the Boltzmann machine came out in,

learning in Boltzmann machines came out in 1985.

But it turned out that back prop worked better than the Boltzmann machine learning algorithm.

So this generalized delta algorithm ended up being called back propagation, as you say, back prop.

Yeah. And probably that name is opaque to me.

What does that mean?

What it meant was that in order to figure out what the changes you needed to make

to the connections from the input to the hidden layer,

you had to back propagate the error signals from the output layer

through the connections from the hidden layer to the output

to get the signals that would be the error signals for the hidden layer.

And that’s how Rumelhart formulated it.

It was like, well, we know what the error signals are at the output layer.

Let’s see if we can get a signal at the hidden layer

that tells each hidden unit what its error signal is essentially.

So it’s back propagating through the connections

from the hidden to the output to get the signals to tell the hidden units

how to change their weights from the input.

And that’s why it’s called back prop.
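Here is a minimal sketch of that generalized delta rule for one hidden layer, assuming logistic units, squared error, and no bias terms (all simplifications; this is not Rumelhart's original code). The output error signals are passed back through the hidden-to-output weights to give each hidden unit its own error signal, and a finite-difference check confirms the gradients. The network sizes, seed, and data are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w_in, w_out, x):
    """Input -> hidden -> output, logistic units, no biases (a simplification)."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_in]
    y = [sigmoid(sum(w * hj for w, hj in zip(row, h))) for row in w_out]
    return h, y


def loss(w_in, w_out, x, t):
    _, y = forward(w_in, w_out, x)
    return 0.5 * sum((yk - tk) ** 2 for yk, tk in zip(y, t))


def backprop(w_in, w_out, x, t):
    """Gradients of the squared-error loss with respect to both weight layers."""
    h, y = forward(w_in, w_out, x)
    # Error signal for each output unit (loss derivative w.r.t. its net input).
    d_out = [(yk - tk) * yk * (1.0 - yk) for yk, tk in zip(y, t)]
    # Send those signals back through the hidden->output weights to get
    # an error signal for every hidden unit.
    d_hid = [hj * (1.0 - hj) * sum(d_out[k] * w_out[k][j] for k in range(len(w_out)))
             for j, hj in enumerate(h)]
    g_out = [[dk * hj for hj in h] for dk in d_out]
    g_in = [[dj * xi for xi in x] for dj in d_hid]
    return g_in, g_out


def numeric_grad_in(w_in, w_out, x, t, eps=1e-6):
    """Central finite differences on the input->hidden weights, for checking."""
    grad = []
    for j in range(len(w_in)):
        row = []
        for i in range(len(w_in[j])):
            plus = [r[:] for r in w_in]
            minus = [r[:] for r in w_in]
            plus[j][i] += eps
            minus[j][i] -= eps
            row.append((loss(plus, w_out, x, t) - loss(minus, w_out, x, t)) / (2 * eps))
        grad.append(row)
    return grad


rng = random.Random(0)
w_in = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]   # 3 inputs -> 4 hidden
w_out = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]  # 4 hidden -> 2 outputs
x, t = [0.5, -0.2, 1.0], [1.0, 0.0]
```

Subtracting a small multiple of `g_in` and `g_out` from the weights is then one step of gradient descent through both layers.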

Yeah. But so it came from Hinton having introduced the concept of, you know,

define your objective function, figure out how to take the derivative

so that you can adjust the connections so that they make progress towards your goal.

So stop thinking about biology for a second

and let’s start to think about optimization and computation a little bit more.

So what about Jeff Hinton?

You’ve gotten a chance to work with him in that little thing.

The set of people involved there is quite incredible.

The small set of people under the PDP flag,

it’s just given the amount of impact those ideas have had over the years,

it’s kind of incredible to think about.

But, you know, just like you said, like yourself,

Geoffrey Hinton is seen as one of the, not just like a seminal figure in AI,

but just a brilliant person,

just like the horsepower of the mind is pretty high up there for him

because he’s just a great thinker.

So what kind of ideas have you learned from him?

Have you influenced each other on?

Have you debated over what stands out to you in the full space of ideas here

at the intersection of computation and cognition?

Well, so Geoff has said many things to me that had a profound impact on my thinking.

And he’s written several articles which were way ahead of their time.

He had two papers in 1981, just to give one example,

one of which was essentially the idea of transformers

and another of which was an early paper on semantic cognition

which inspired him and Rumelhart and me throughout the 80s

and, you know, still I think sort of grounds my own thinking

about the semantic aspects of cognition.

He also, in a small paper that was never published that he wrote in 1977,

you know, before he actually arrived at UCSD or maybe a couple years even before that,

I don’t know, when he was a PhD student,

he described how a neural network could do recursive computation.

And it was a very clever idea that he’s continued to explore over time,

which was sort of the idea that when you call a subroutine,

you need to save the state that you had when you called it

so you can get back to where you were when you’re finished with the subroutine.

And the idea was that you would save the state of the calling routine

by making fast changes to connection weights.

And then when you finished with the subroutine call,

those fast changes in the connection weights would allow you to go back

to where you had been before and reinstate the previous context

so that you could continue on with the top level of the computation.

Anyway, that was part of the idea.
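One way to picture saving and reinstating a context with fast weight changes is a Hopfield-style autoassociative memory. This is a hypothetical illustration, not Hinton's 1977 formulation: the caller's state is a +/-1 pattern stored in one shot as a Hebbian outer product, and after the subroutine has overwritten the activity, a partial cue pushed through the fast weights restores the full pattern.

```python
def store(state, rate=1.0):
    """One-shot Hebbian fast-weight change: the outer product of the state with itself."""
    return [[rate * a * b for b in state] for a in state]


def reinstate(fast_weights, cue):
    """Pass a partial cue through the fast weights once and threshold each unit."""
    net = [sum(w * c for w, c in zip(row, cue)) for row in fast_weights]
    return [1 if n > 0 else -1 for n in net]


caller_state = [1, -1, -1, 1, 1, -1, 1, -1]  # hypothetical +/-1 context pattern
fast = store(caller_state)                    # saved just before the "call"
# ... the subroutine runs and overwrites the activity; only a fragment remains ...
cue = caller_state[:4] + [0, 0, 0, 0]
restored = reinstate(fast, cue)
```

Because every row of the fast weights is the stored pattern scaled by one of its own entries, any cue that overlaps positively with the old context drives each unit back to its saved sign.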

And I always thought, okay, that’s really, you know,

he had extremely creative ideas that were quite a lot ahead of his time

and many of them in the 1970s and early 1980s.

So another thing about Geoff Hinton’s way of thinking,

which has profoundly influenced my effort to understand

human mathematical cognition, is that he doesn’t write too many equations.

And people tell stories like, oh, in the Hinton Lab meetings,

you don’t get up at the board and write equations

like you do in everybody else’s machine learning lab.

What you do is you draw a picture.

And, you know, he explains aspects of the way deep learning works

by putting his hands together and showing you the shape of a ravine

and using that as a geometrical metaphor for what’s happening

in this gradient descent process.

You’re coming down the wall of a ravine.

If you take too big a jump, you’re going to jump to the other side.

And so that’s why we have to turn down the learning rate, for example.
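The ravine picture can be checked numerically. This sketch assumes the simplest possible ravine, a quadratic bowl f(x, y) = 0.5*(a*x^2 + b*y^2) that is ten times steeper across (x) than along (y); the numbers are illustrative. On such a surface each step multiplies x by (1 - lr*a), so a step size above 1/a jumps to the other wall, and above 2/a each jump overshoots further than the last.

```python
def descend(lr, steps=50, a=10.0, b=1.0, start=(1.0, 1.0)):
    """Gradient descent on f(x, y) = 0.5 * (a*x**2 + b*y**2).

    With a >> b the surface is a ravine: steep walls along x, a gentle floor along y.
    The x update is x <- (1 - lr*a) * x, which only shrinks when lr < 2/a.
    """
    x, y = start
    for _ in range(steps):
        # The gradient is (a*x, b*y); step against it.
        x, y = x - lr * a * x, y - lr * b * y
    return x, y
```

`descend(0.15, steps=1)` already lands on the opposite wall (x becomes negative), `descend(0.25)` blows up, and `descend(0.05)` settles across the ravine while still creeping slowly along its floor.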

And it speaks to me of the fundamentally intuitive character of deep insight

together with a commitment to really understanding

in a way that’s absolutely ultimately explicit and clear, but also intuitive.

Yeah, there’s certain people like that.

He’s an example, some kind of weird mix of visual and intuitive

and all those kinds of things.

Feynman is another example, different style of thinking, but very unique.

And when you’re around those people, for me in the engineering realm,

there’s a guy named Jim Keller who’s a chip designer, engineer.

Every time I talk to him, it doesn’t matter what we’re talking about.

Just having experienced that unique way of thinking transforms you

and makes your work much better.

And that’s the magic.

You look at Daniel Kahneman, you look at the great collaborations

throughout the history of science.

That’s the magic of that.

It’s not always the exact ideas that you talk about,

but it’s the process of generating those ideas.

Being around that, spending time with that human being,

you can come up with some brilliant work,

especially when it’s cross-disciplinary, as it was a little bit in your case with Geoff.

Yeah.

Geoff is a descendant of the logician Boole.

He comes from a long line of English academics.

And together with the deeply intuitive thinking ability that he has,

he also has, it’s been clear, he’s described this to me,

and I think he’s mentioned it from time to time in other interviews

that he’s had with people.

He’s wanted to be able to sort of think of himself as contributing

to the understanding of reasoning itself, not just human reasoning.

Like Boole is about logic, right?

It’s about what can we conclude from what else and how do we formalize that.

And as a computer scientist, logician, philosopher,

the goal is to understand how we derive truths from other,

from givens and things like this.

And the work that Geoff was doing in the early to mid 80s

on something called the Boltzmann machine was his way of connecting

with that Boolean tradition and bringing it into the more continuous,

probabilistic graded constraint satisfaction realm.

And it was a beautiful set of ideas linked with theoretical physics

as well as with logic.

And it’s always been, I mean, I’ve always been inspired

by the Boltzmann machine too.

It’s like, well, if the neurons are probabilistic rather than deterministic

in their computations, then maybe this somehow is part of the serendipity

or adventitiousness of the moment of insight, right?

It might not have occurred at that particular instant.

It might be sort of partially the result of a stochastic process.

And that too is part of the magic of the emergence of some of these things.
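The stochastic units of a Boltzmann machine are easy to sketch. Assuming binary units that turn on with the usual logistic probability of their net input divided by a temperature, the two-unit network below (its weights and biases are hypothetical) has two equally good solutions, both off or both on, and repeated Gibbs sweeps wander between them rather than always settling the same way.

```python
import math
import random


def p_on(net_input, temperature=1.0):
    """Probability that a stochastic binary unit turns on (logistic in net input / T)."""
    return 1.0 / (1.0 + math.exp(-net_input / temperature))


def gibbs_sweep(state, weights, biases, rng, temperature=1.0):
    """Visit each unit in turn; it turns on with probability p_on(its net input)."""
    for i in range(len(state)):
        net = biases[i] + sum(weights[i][j] * state[j]
                              for j in range(len(state)) if j != i)
        state[i] = 1 if rng.random() < p_on(net, temperature) else 0
    return state


# Hypothetical two-unit network: a symmetric positive weight says "agree",
# and the biases make "both off" and "both on" equally good solutions.
WEIGHTS = [[0.0, 4.0], [4.0, 0.0]]
BIASES = [-2.0, -2.0]


def visit_counts(sweeps=20000, seed=0):
    """How often each joint state is visited after successive sweeps."""
    rng = random.Random(seed)
    state = [0, 0]
    counts = {}
    for _ in range(sweeps):
        key = tuple(gibbs_sweep(state, WEIGHTS, BIASES, rng))
        counts[key] = counts.get(key, 0) + 1
    return counts
```

With these numbers the two agreeing states share roughly 88% of the visits, split about evenly, so which solution the network is "in" at any moment is partly a matter of chance.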

Well, you’re right with the Boolean lineage and the dream of computer science

is somehow, I mean, I certainly think of humans this way,

that humans are one particular manifestation of intelligence,

that there’s something bigger going on and you’re hoping to figure that out.

The mechanisms of intelligence, the mechanisms of cognition

are much bigger than just humans.

Yeah. So I think of, I started using the phrase computational intelligence

at some point to characterize the field that I thought, you know,

people like Geoff Hinton and many of the people I know at DeepMind

are working in and where I feel like I’m, you know,

I’m a kind of a human oriented computational intelligence researcher

in that I’m actually kind of interested in the human solution.

But at the same time, I feel like that’s where a huge amount

of the excitement of deep learning actually lies is in the idea that,

you know, we may be able to even go beyond what we can achieve

with our own nervous systems when we build computational intelligences

that are, you know, not limited in the ways that we are by our own biology.

Perhaps allowing us to scale the very mechanisms of human intelligence

just increases power through scale.

Yes. And I think that that, you know, obviously that’s the,

that’s being played out massively at Google Brain, at OpenAI

and to some extent at DeepMind as well.

I guess I shouldn’t say to some extent.

Just the massive scale of the computations that are used to succeed

at games like Go or to solve the protein folding problems

that they’ve been solving and so on.

Still not as many synapses and neurons as the human brain.

So we still got, we’re still beating them on that.

We humans are beating the AIs, but they’re catching up pretty quickly.

You write about modeling of mathematical cognition.

So let me first ask about mathematics in general.

There’s a paper titled Parallel Distributed Processing

Approach to Mathematical Cognition where in the introduction

there’s some beautiful discussion of mathematics.

And you referenced there Tristan Needham, who criticizes a narrow

view of mathematics by likening the study of mathematics

as symbol manipulation to studying music without ever hearing a note.

So from that perspective, what do you think is mathematics?

What is this world of mathematics like?

Well, I think of mathematics as a set of tools for exploring

idealized worlds that often turn out to be extremely relevant

to the real world but need not.

But they’re worlds in which objects exist with idealized properties

and in which the relationships among them can be characterized

with precision so as to allow the implications of certain facts

to then allow you to derive other facts with certainty.

So if you have two triangles and you know that there is an angle

in the first one that has the same measure as an angle in the second one

and you know that the lengths of the sides adjacent to that angle

in each of the two triangles, the corresponding sides adjacent

to that angle also have the same measure, then you can then conclude

that the triangles are congruent.

That is to say they have all of their properties in common.
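The side-angle-side case can be made concrete. This sketch, assuming Euclidean geometry and angles in radians, uses the law of cosines to show that two sides and the included angle already fix the third side and the remaining angles, which is why two triangles that share those three measurements share everything else.

```python
import math


def complete_sas(b, c, angle_a):
    """Sides b and c with included angle A (radians) -> (side a, angle B, angle C).

    Law of cosines for the third side, then again for a second angle; the
    angles of a Euclidean triangle sum to pi.
    """
    a = math.sqrt(b * b + c * c - 2 * b * c * math.cos(angle_a))
    cos_b = (a * a + c * c - b * b) / (2 * a * c)
    angle_b = math.acos(max(-1.0, min(1.0, cos_b)))  # clamp tiny rounding errors
    angle_c = math.pi - angle_a - angle_b
    return a, angle_b, angle_c
```

`complete_sas(3.0, 4.0, math.pi / 2)` gives the 3-4-5 right triangle: a third side of 5 and angles of about 0.6435 and 0.9273 radians.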

And that is something about triangles.

It’s not a matter of formulas.

These are idealized objects.

In fact, we built bridges out of triangles and we understand

how to measure the height of something we can’t climb by extending

these ideas about triangles a little further.

And all of the ability to get a tiny speck of matter launched

from the planet Earth to intersect with some tiny, tiny little body

way out beyond Pluto somewhere at exactly a predicted time

and date is something that depends on these ideas.

And it’s actually happening in the real physical world that these ideas

make contact with it in those kinds of instances.

But there are these idealized objects, these triangles or these distances

or these points, whatever they are, that allow for this set of tools

to be created that then gives human beings this incredible leverage

that they didn’t have without these concepts.

And I think this is actually already true when we think about just,

you know, the natural numbers.

I always like to include zero, so I’m going to say the nonnegative integers,

but that’s a place where some people prefer not to include zero.

We like zero here, natural numbers, zero, one, two, three, four, five,

six, seven, and so on.

Yeah. And because they give you the ability to be exact about

how many sheep you have.

I sent you out this morning, there were 23 sheep.

You came back with only 22. What happened?

The fundamental problem of physics, how many sheep you have.

It’s a fundamental problem of human society that you damn well better

bring back the same number of sheep as you started with.

And it allows commerce, it allows contracts, it allows the establishment

of records and so on to have systems that allow these things to be notated.

But they have an inherent aboutness to them that’s at one and the same time sort of

abstract and idealized and generalizable, while on the other hand,

potentially very, very grounded and concrete.

And one of the things that makes for the incredible achievements of the human mind

is the fact that humans invented these idealized systems that leverage

the power of human thought in such a way as to allow all this kind of thing to happen.

And so that’s what mathematics is to me: the development of systems for thinking about

the properties and relations among sets of idealized objects and

the mathematical notation system that we unfortunately focus way too much on

is just our way of expressing propositions about these properties.

It’s just like what we were talking about with Chomsky and language.

It’s the thing we’ve invented for the communication of those ideas.

They’re not necessarily the deep representation of those ideas.

So what’s a good way to model such powerful mathematical reasoning, would you say?

What are some ideas you have for capturing this in a model?

The insights that human mathematicians have had come from a combination of the

intuitive, connectionist-like knowledge that makes it so that something is just

obviously true so that you don’t have to think about why it’s true.

That then makes it possible to take the next step and ponder and reason and

figure out something that you previously didn’t have that intuition about.

It then ultimately becomes a part of the intuition that the next generation of

mathematical thinkers have to ground their own thinking on so that they can extend the ideas even further.

I came across this quotation from Henri Poincare while I was walking in the woods with my wife

in a state park in Northern California late last summer.

And what it said on the bench was, “It is by logic that we prove, but by intuition that we discover.”

And so for me the essence of the project is to understand how to bring the intuitive

connectionist resources to bear on letting the intuitive discovery arise from engagement in

thinking with this formal system.

So I think of the ability of somebody like Hinton or Newton or Einstein or Rumelhart or

Poincare. Archimedes is another example.

So suddenly a flash of insight occurs. It’s like the constellation of all of these

simultaneous constraints that somehow or other causes the mind to settle into a novel state that

it never did before and give rise to a new idea that then you can say, okay, well, now how can I

prove this? How do I write down the steps of that theorem that allow me to make it rigorous and certain?

And so I feel like the kinds of things that we’re beginning to see deep learning systems do of

their own accord kind of gives me this feeling of hope or encouragement that ultimately it’ll all happen.

So in particular as many people now have become really interested in thinking about, you know,

neural networks that have been trained with massive amounts of text can be given a prompt and they

can then sort of generate some really interesting, fanciful, creative story from that prompt.

And there’s kind of like a sense that they’ve somehow synthesized something novel out of

the, you know, all of the particulars of all of the billions and billions of experiences that went

into the training data that gives rise to something like this sort of intuitive sense of what would

be a fun and interesting little story to tell or something like that. It just sort of wells up out

of letting the thing play out its own imagining of what somebody might say given this prompt as

an input to get it to start to generate its own thoughts. And to me that sort of represents the

potential of capturing the intuitive side of this.

And there are other examples, I don’t know if you find them as captivating: on the

DeepMind side with AlphaZero, if you study chess, the kinds of solutions it has come up with

in chess, there are novel ideas there. It feels like there are brilliant moments of insight.

And in the mechanism they use, you can think of search as maybe more towards good old-fashioned AI, and

then there’s the connectionist part, the neural network that has the intuition of looking at a board,

looking at a set of patterns and saying, how good is this set of positions? And the next few

positions, how good are those? And that’s it. That’s just an intuition. Grandmasters have this

understanding, positionally and tactically, of how good the situation is and how it can be improved

without doing this full, like deep search. And then maybe doing a little bit of what human chess

players call calculation, which is the search, taking a particular set of steps down the line to

see how they unroll. But there are moments of genius in those systems too. So that’s another hopeful

illustration that this novel creation of an idea can emerge from neural networks.

Yes. And I think that, you know, Demis Hassabis has spoken about those

things. I heard him describe a move that was made in one of the Go matches against Lee Sedol in a

very similar way. And it made me really excited to collaborate with some of those

people at DeepMind and analyze it. But what I’d really like to emphasize here,

about mathematical cognition at least, is that philosophers

and logicians going back three or even a little more than 3000 years ago began to develop these

formal systems and gradually the whole idea about thinking formally got constructed. And, you know,

it preceded Euclid; it’s certainly present in the work of Thales and others. And I’m not the world’s

leading expert in all the details of that history, but Euclid’s Elements were the kind of the touch

point of a coherent document that sort of laid out this idea of an actual formal system within which

these objects were characterized and the system of inference that allowed new truths to be derived

from others was sort of like established as a paradigm. And what I find interesting is the

idea that the ability to become a person who is capable of thinking in this abstract formal way

is a result of the same kind of immersion in experience thinking in that way that we now

begin to think of our understanding of language as being, right? So, we immerse ourselves in a

particular language, in a particular world of objects and their relationships and we learn

to talk about that and we develop intuitive understanding of the real world. In a similar

way, we can think that what academia has created for us, what those early philosophers and their

academies in Athens and Alexandria and other places allowed was the development of these

schools of thought, modes of thought that then become deeply ingrained and it becomes what it

is that makes it so that somebody like Jerry Fodor would think that systematic thought is

the essential characteristic of the human mind as opposed to a derived and an acquired characteristic

that results from acculturation in a certain mode that’s been invented by humans.

Would you say it’s more fundamental than like language? If we start dancing, if we bring

Chomsky back into the conversation, first of all, is it unfair to draw a line between mathematical

cognition and language, linguistic cognition?

I think that’s a very interesting question and I think it’s one of the ones that I’m actually very

interested in right now, but I think the answer is in important ways, it is important to draw that

line, but then to come back and look at it again and see some of the subtleties and interesting

aspects of the difference. So if we think about Chomsky himself, he was born into an academic

family. His father was a professor of rabbinical studies at a small rabbinical college in

Philadelphia. He was deeply enculturated in a culture of thought and reason and brought to the

effort to understand natural language, this profound engagement with these formal systems. I

think that there was tremendous power in that and that Chomsky had some amazing insights into the

structure of natural language, but that, I’m going to use the word but there, the actual intuitive

knowledge of these things only goes so far and does not go as far as it does in people like

Chomsky himself. And this was something that was discovered in the PhD dissertation of Lila

Gleitman, who was actually trained in the same linguistics department with Chomsky. So what Lila

discovered was that the intuitions that linguists had about even the meaning of a phrase, not just

about its grammar, but about what they thought a phrase must mean were very different from the

intuitions of an ordinary person who wasn’t a formally trained thinker. And it has recently

become much more salient. I happened to have learned about this when I myself was a PhD student

at the University of Pennsylvania, but I never knew how to put it together with all of my other

thinking about these things. So I actually currently have the hypothesis that formally

trained linguists and other formally trained academics, whether it be linguistics, philosophy,

cognitive science, computer science, machine learning, mathematics,

have a mode of engagement with experience that is intuitively deeply structured to be more

organized around the systematicity and ability to be conformant with the principles of a system

than is actually true of the natural human mind without that immersion.

That’s fascinating. So the different fields and approaches with which you start to study the mind

actually take you away from the natural operation of the mind. So it makes it very difficult for you

to be somebody who introspects.

Yes. And this is where things about human belief and so called knowledge that we consider

private, not our business to manipulate in others. We are not entitled to tell somebody else what to

believe about certain kinds of things. What are those beliefs? Well, they are the product of this

sort of immersion and enculturation. That is what I believe.

And that’s limiting.

It’s something to be aware of.

Does that limit you from having a good model of cognition?

It can.

So when you look at mathematics or linguistics, I mean, what is that line then? So is Chomsky

unable to sneak up to the full picture of cognition? Are you, when you’re focusing on

mathematical thinking, are you also unable to do so?

I think you’re right. I think that’s a great way of characterizing it. And

I also think that it’s related to the concept of beginner’s mind and another concept called the

expert blind spot. So the expert blind spot is much more prosaic seeming than this point that

you were just making. But it’s something that plagues experts when they try to communicate

their understanding to non experts. And that is that things are self evident to them that

they can’t begin to even think about how they could explain it to somebody else.

Because it’s just like so patently obvious that it must be true. And

when Kronecker said, “God made the natural numbers; all else is the work of man,”

he was expressing that intuition that somehow or other, the basic fundamentals of discrete

quantities being countable and innumerable and indefinite in number was not something that

had to be discovered. But he was wrong. It turns out that many cognitive scientists

agreed with him for a time. There was a long period of time where the natural

numbers were considered to be a part of the innate endowment of core knowledge or to use

the kind of phrases that Spelke and Carey used to talk about what they believe are

the innate primitives of the human mind. And they no longer believe that. It’s actually

been more or less accepted by almost everyone that the natural numbers are actually a cultural

construction. And it’s so interesting to go back and study those few people who still exist who

don’t have those systems. So this is just an example to me of how a certain mode of thinking

about language itself, or a certain mode of thinking about geometry and those kinds of

relations, becomes so second nature that you don’t know what it is that you need to teach. And

in fact, we don’t really teach it all that explicitly anyway. You take a math class,

the professor sort of teaches it to you the way they understand it. Some of the students in the

class sort of like they get it. They start to get the way of thinking and they can actually do the

problems that get put on the homework that the professor thinks are interesting and challenging

ones. But most of the students, who don’t engage as deeply, don’t ever get it. And we think,

oh, that man must be brilliant. He must have this special insight. He must have some

biological sort of bit that’s different, that makes it so that he or she could have

that insight. Although I don’t want to dismiss biological individual differences completely,

I find it much more interesting to think about the possibility that it was that difference in the

dinner table conversation at the Chomsky house when he was growing up that made it so that he

had that cast of mind. Yeah. And there are a few topics we talked about that kind of interconnect

because I wonder the better I get at certain things, we humans, the deeper we understand

something, what are you starting to then miss about the rest of the world? We talked about David

Rumelhart and his degenerative illness. And, you know, when you look in the mirror and wonder how different

am I cognitively from the man I was a month ago, from the man I was a year ago, like, you know,

having thought about Chomsky’s ideas on language for 10, 20 years, what am I no longer

able to see? What is in my blind spot? And how big is that? And then to somehow be able to leap back

out of your deep, like structure that you form for yourself about thinking about the world,

leap back and look at the big picture again, or jump out of your current way of thinking.

And to be able to introspect, like what are the limitations of your mind? How is your mind less

powerful than it used to be or more powerful or different, powerful in different ways? So that

seems to be a difficult thing to do because we’re living, we’re looking at the world through the

lens of our mind, right? To step outside and introspect is difficult, but it seems necessary

if you want to make progress. You know, one of the threads of psychological research that’s always

been very, I don’t know, important to me to be aware of is the idea that our explanations of our

own behavior aren’t necessarily actually part of the causal process that caused that behavior to

occur, or even valid observations of the set of constraints that led to the outcome, but they are

post hoc rationalizations that we can give based on information at our disposal about what might

have contributed to the result that we came to when asked. And so this is an idea that was

introduced in a very important paper by Nisbett and Wilson about, you know, the limits on our ability

to be aware of the factors that cause us to make the choices that we make. And, you know, I think

it’s something that we really ought to be much more cognizant of, in general, as human beings,

is that our own insight into exactly why we hold the beliefs that we do and we hold the attitudes

and make the choices and feel the feelings that we do is not something that we totally control

or totally observe. And it’s subject to, you know, our culturally transmitted understanding of what

it is that is the mode that we give to explain these things when asked to do so as much as it is

about anything else. And so even our ability to introspect and think we have access to our own

thoughts is a product of culture and belief, you know, practice.

So let me ask you the big question of advice. So you’ve lived an incredible life in terms of the

ideas you’ve put out into the world, in terms of the trajectory you’ve taken through your career,

through your life. What advice would you give to young people today, in high school, in college,

about how to have a career or how to have a life they can be proud of?

Finding the thing that you are intrinsically motivated to engage with and then celebrating

that discovery is what it’s all about. When I was in college, I struggled with that. I had thought

I wanted to be a psychiatrist because I think I was interested in human psychology in high school.

And at that time, the only sort of information I had that had anything to do with the psyche was,

you know, Freud and Erich Fromm and sort of popular psychiatry kinds of things.

And so, well, they were psychiatrists, right? So I had to be a psychiatrist.

And that meant I had to go to medical school. And I got to college and I find myself taking,

you know, the first semester of a three quarter physics class, and it was mechanics. And this was

so far from what I was interested in, but it was also too early in the morning in the winter

semester. So I never made it to the physics class. And I wandered about for the rest of my

freshman year and most of my sophomore year until I found myself in the midst of this situation where

around me there was this big revolution happening. I was at Columbia University in 1968 and

the Vietnam War is going on. Columbia is building a gym in Morningside Heights, which is part of

Harlem. And people are thinking, oh, the big bad rich guys are stealing the parkland that

belongs to the people of Harlem. And, you know, they’re part of the military industrial complex,

which is enslaving us and sending us all off to war in Vietnam. And so there was a big revolution

that involved a confluence of black activism and, you know, SDS and social justice and the whole

university blew up and got shut down. And I got a chance to sort of think about

why people were behaving the way they were in this context. And I, you know, I happened to have

taken mathematical statistics. I happened to have been taking psychology that quarter, just psych

one. And somehow things in that space all ran together in my mind and got me really excited

about asking questions about why people, what made certain people go into the buildings and not

others and things like that. And so suddenly I had a path forward and I had just been wandering

around aimlessly. And at different points in my career, you know, I’d think, okay,

well, should I take this class or should I just read that book about some idea that I want to

understand better, you know, or should I pursue the thing that excites me and interests me or

should I, you know, meet some requirement? You know, that’s, I always did the latter.

So, as it turned out, my professors in psychology thought I was great. They wanted me to go to

graduate school. They nominated me for Phi Beta Kappa. And I went to the Phi Beta Kappa ceremony

and this guy came up and he said, oh, are you magna or summa? And I wasn’t even getting honors

based on my grades. They just happened to have thought I was interested enough in ideas to

belong to Phi Beta Kappa. So. I mean, would it be fair to say you kind of stumbled around a little

bit through accidents of too early morning of classes in physics and so on until you discovered

intrinsic motivation, as you mentioned, and then that’s it. It hooked you. And then you celebrate

the fact that this happens to human beings. Yeah. And what is it that made what I did intrinsically

motivating to me? Well, that’s interesting and I don’t know all the answers to it. And I don’t

think I want anybody to think that you should be sort of in any way, I don’t know, sanctimonious or

anything about it. You know, it’s like, I really enjoyed doing statistical analysis of data. I

really enjoyed running my own experiment, which was what I got a chance to do in the psychology

department. In chemistry and physics, I never imagined that mere mortals would ever do

an experiment in those sciences, except one that was in the textbook that you were told to do in

lab class. But in psychology, we were already like, even when I was taking psych one, it turned out

we had our own rat and we got to, after two set experiments, we got to, okay, do something you

think of with your rat. So it’s the opportunity to do it myself and to bring together a certain

set of things that engaged me intrinsically. And I think it has something to do with why

certain people turn out to be profoundly amazing musical geniuses, right? They get immersed in it

at an early enough point and it just sort of gets into the fabric. So my little brother had intrinsic

motivation for music as we witnessed when he discovered how to put records on the phonograph

when he was like 13 months old and recognized which one he wanted to play, not because he could read

the labels, because he could sort of see which ones had which scratches, which were the different,

you know, oh, that’s Rapsodie espagnole. And that’s, you know, and, and, and...

And he enjoyed that, that connected with him somehow.

Yeah. And, and there was something that it fed into and it, you’re extremely lucky if you have

that and if you can nurture it and can let it grow and let it be, be an important part of your life.

Yeah. Those are, those are the two things is like, be attentive enough to,

to feel it when it comes, like this is something special. I mean, I don’t know. For example,

I really like tabular data, like Excel sheets. Like it brings me a deep joy. I don’t know how

useful that is for anything. That’s part of what I’m talking about.

Exactly. So there’s like a million, not a million, but there’s a lot of things

like that. For me, you have to hear that for yourself, like be, like realize this is really

joyful. But then the other part that you’re mentioning, which is the nurture is take time

and stay with it, stay with it a while and see where that takes you in life.

Yeah. And I think, I think the, the, the motivational engagement results in the

immersion that then creates the opportunity to obtain the expertise. So, you know, we could call

it the Mozart effect, right? I mean, when I think about Mozart, I think about, you know,

the person who was born as the fourth member of the family string quartet, right? And, and they

handed him the violin when he was six weeks old. All right, start playing, you know, it’s like,

and so the, the level of immersion there was, was amazingly profound, but hopefully he also had,

you know, some, something, maybe this is where the more sort of the genetic part comes in.

Sometimes I think, you know, something in him resonated to the music so that that,

the synergy of the combination of that was so powerful. So, so that’s what I really consider

to be the Mozart effect. It’s sort of the, the synergy of something with, with experience that,

that then results in the unique flowering of a particular, you know, mind.

And so I know my siblings and I are all very different from each other. We’ve all gone in

our own different directions. And, you know, I mentioned my younger brother who was very musical.

My other younger brother was this amazing, intuitive engineer.

And one of my sisters was passionate about water conservation well

before it was, you know, such a hugely important issue that it is today. So we all sort of somehow

each find a different thing. And I don’t, I don’t mean to say it isn’t tied in with something about,

about us biologically, but, but it’s also when that happens, where you can find that, then,

you know, you can do your thing and you can be excited about it. So people can be excited about

fitting people on bicycles, as well as excited about making neural networks achieve insights

into human cognition, right? Yeah. Like for me personally, I’ve always been excited about

love and friendship between humans. And just like the actual experience of it,

since I was a child, just observing people around me, and I’ve also been excited about robots.

And there’s something in me that thinks I really would love to explore how those two things

combine. And it doesn’t make any sense. A lot of it is also timing, just to think of your own career

and your own life. You found yourself in certain places that happened to involve some of

the greatest thinkers of our time. And so it just worked out that like, you guys developed those

ideas. And there may be a lot of other people similar to you, and they were brilliant, and

they never found that right connection and place to where they, their ideas could flourish. So

it’s timing, it’s place, it’s people. And ultimately the whole ride, you know, it’s undirected.

Can I ask you about something you mentioned in terms of psychiatry when you were younger?

Because I had a similar experience of, you know, reading Freud and Carl Jung and just,

you know, those kind of popular psychiatry ideas. And that was a dream for me early on in high

school too. Like I hoped to understand the human mind by, somehow psychiatry felt like

the right discipline for that. Does that make you sad? That psychiatry is not

the mechanism by which you are able to explore the human mind. So for me, I was a little bit

disillusioned because of how much prescription medication and biochemistry is involved in the

discipline of psychiatry, as opposed to the Freud-like dream of using the mechanisms of language

to explore the human mind. So that was a little disappointing. And that’s why I kind of went to

computer science and thinking like, maybe you can explore the human mind by trying to build the

thing. Yes. I wasn’t exposed to the sort of the biomedical slash pharmacological aspects of

psychiatry at that point because I dropped that whole idea of premed, so I never even

found out about that until much later. But you’re absolutely right. So I was actually a member of the

National Advisory Mental Health Council. That is to say the board of scientists who advise the

director of the National Institute of Mental Health. And that was around the year 2000. And

in fact, at that time, the man who came in as the new director, I had been on this board for a year

when he came in, said, okay, schizophrenia is a biological illness. It’s a lot like cancer.

We’ve made huge strides in curing cancer. And that’s what we’re going to do with schizophrenia.

We’re going to find the medications that are going to cure this disease. And we’re not going

to listen to anybody’s grandmother anymore. And good old behavioral psychology is not something

we’re going to support any further. And he completely alienated me from the Institute

and from all of its prior policies, which had been much more holistic, I think, really at some level.

And the other people on the board were like psychiatrists, very biological psychiatrists.

It didn’t pan out; nothing has changed in our ability to help people with mental illness.

And so 20 years later, that particular path was a dead end, as far as I can tell.

Well, there’s some aspect to, and sorry to romanticize the whole philosophical conversation

about the human mind. But to me, psychiatrists, for a time, held the flag of we’re the deep thinkers.

In the same way that physicists are the deep thinkers about the nature of reality,

psychiatrists are the deep thinkers about the nature of the human mind. And I think that flag

has been taken from them and carried by people like you. It’s like, it’s more in the cognitive

psychology, especially when you have a foot in the computational view of the world, because you can

both build it, you can like, intuit about the functioning of the mind by building little models

and be able to see mathematical things and then deploying those models, especially in computers,

to say, does this actually work? They do like experiments. And then some combination of

neuroscience, where you’re starting to actually be able to observe, do certain experiments on

human beings and observe how the brain is actually functioning. And there, using intuition, you can

start being the philosopher. Like Richard Feynman is the philosopher, cognitive psychologists can

become the philosopher, and psychiatrists become much more like doctors. They’re like very medical.

They help people with medication, biochemistry, and so on. But they are no longer the book writers

and the philosophers, which of course I admire. I admire the Richard Feynman ability to do

great low level mathematics and physics and the high level philosophy.

Yeah, I think it was Fromm and Jung more than Freud that was sort of initially kind of like

made me feel like, oh, this is really amazing and interesting and I want to explore it further.

I actually, when I got to college and I lost that thread, I found more of it in sociology

and literature than I did in any place else. So I took quite a lot of both of those

disciplines as an undergraduate. And I was actually deeply ambivalent about

the psychology because I was doing experiments after the initial flurry of interest in

why people would occupy buildings during an insurrection and consider

being so overcommitted to their beliefs. But I ended up in the psychology laboratory running

experiments on pigeons. And so I had this profound dissonance between the kinds of issues

that would be explored when I was thinking about what I read about in modern British literature

versus what I could study with my pigeons in the laboratory. That got resolved when I went

to graduate school and I discovered cognitive psychology. And so for me, that was the path

out of this sort of like extremely sort of ambivalent divergence between the interest

in the human condition and the desire to do actual mechanistically oriented thinking about it. And I

think we’ve come a long way in that regard and that you’re absolutely right that nowadays this

is something that’s accessible to people through the pathway in through computer science or the

pathway in through neuroscience. You can get derailed in neuroscience down to the bottom of

the system where you might find the cures of various conditions, but you don’t get a chance

to think about the higher level stuff. So it’s in the systems and cognitive neuroscience and

computational intelligence miasma up there at the top that I think these opportunities

are richest right now. And so yes, I am indeed blessed by having had the opportunity to fall

into that space. So you mentioned the human condition, speaking of which you happen to be a

human being who’s unfortunately not immortal. That seems to be a fundamental part of the human

condition that this ride ends. Do you think about the fact that you’re going to die one day? Are you

afraid of death? I would say that I am not as much afraid of death as I am of degeneration. And

I say that in part for reasons of having, you know, seen some tragic degenerative situations

unfold. It’s exciting when you can continue to participate and feel like you’re near the place

where the wave is breaking on the shore, if you like. And I think about my own future potential.

If I were to begin to suffer from Alzheimer’s disease or semantic dementia or some other

condition, you know, I would sort of gradually lose the thread of that ability. And so one can

live on for a decade after, you know, sort of having to retire because one no longer has

these kinds of abilities to engage. And I think that’s the thing that I fear the most.

The losing of that, like the breaking of the wave, the flourishing of the mind,

where you have these ideas and they’re swimming around and you’re able to play with them.

Yeah. And collaborate with other people who, you know, are themselves

really helping to push these ideas forward. So, yeah.

What about the edge of the cliff? The end? I mean, the mystery of it. I mean…

My graded sort of conception of mind and, you know, sort of continuous sort of way of

thinking about most things makes it so that, to me, the discreteness of that transition is less

apparent than it seems to be to most people.

I see. I see. Yeah. Yeah. I wonder, so I don’t know if you know the work of Ernest Becker

and so on. I wonder what role mortality and our ability to be cognizant of it

and anticipate it and perhaps be afraid of it, what role that plays in our reasoning about the world.

I think that it can be motivating to people to think they have a limited period left.

I think in my own case, you know, it’s like seven or eight years ago now that I was

sitting around doing experiments on decision making that were

satisfying in a certain way because I could really get closure on whether the model fit the data

perfectly or not. And I could see how one could test, you know, the predictions in monkeys as well

as humans and really see what the neurons were doing. But I just realized, hey, wait a minute,

you know, I may only have about 10 or 15 years left here. And I don’t feel like I’m getting

towards the answers to the really interesting questions while I’m doing this particular level

of work. And that’s when I said to myself, okay, let’s pick something that’s hard. So that’s when

I started working on mathematical cognition. And I think it was more in terms of, well,

I got 15 more years possibly of useful life left. Let’s imagine that it’s only 10.

I’m actually getting close to the end of that now, maybe three or four more years.

But I’m beginning to feel like, well, I probably have another five after that. So, okay, I’ll give

myself another six or eight. But a deadline is looming, and therefore… it’s not going to go on

forever. And so, yeah, I got to keep thinking about the questions that I think are the interesting and

important ones for sure. What do you hope your legacy is? You’ve done some incredible work in

your life as a man, as a scientist. When human civilization is long gone

and the aliens are reading the encyclopedia about the human species. What do you hope is the

paragraph written about you? I would want it to sort of highlight

a couple of things: that I was able to see one path that was more exciting to me than the one that

seemed already to be there for a cognitive psychologist, but not for any super special

reason other than that I’d had the right context prior to that, but that I had gone ahead and

followed that lead. And then I forget the exact wording, but I said in this preface that

the joy of science is the moment in which a partially formed thought in the mind of one person

gets crystallized a little better in the discourse and becomes the foundation

of some exciting concrete piece of actual scientific progress. And I feel like that

moment happened when Rumelhart and I were doing the interactive activation model and when

Rumelhart heard Hinton talk about gradient descent and having the objective function to guide the

learning process. And it happened a lot in that period and I sort of seek that kind of

thing in my collaborations with my students. So the idea that this is a person who contributed

to science by finding exciting collaborative opportunities to engage with other people

is something that I certainly hope is part of the paragraph.

And like you said, taking a step maybe in directions that are non obvious. So it’s the

old Robert Frost road less traveled. So maybe because you said like this incomplete initial idea,

that step you take is a little bit off the beaten path.

If I could just say one more thing here. This was something that really contributed

to energizing me in a way that I feel it would be useful to share. My PhD dissertation project

was a completely empirical, experimental project. And I wrote a paper based on the two main

experiments that were the core of my dissertation and I submitted it to a journal. And at the end

of the paper, I had a little section where I laid out the beginnings of my theory about what I

thought was going on that would explain the data that I had collected. And I had submitted the

paper to the Journal of Experimental Psychology. So I got back a letter from the editor saying,

thank you very much. These are great experiments and we’d love to publish them in the journal.

But what we’d like you to do is to leave the theorizing to the theorists and take that part

out of the paper. And so I did, I took that part out of the paper. But I almost found myself labeled

as a non theorist by this. And I could have succumbed to that and said, okay, well, I guess

my job is to just go on and do experiments, right? But that’s not what I wanted to do. And so when I

got to my assistant professorship, although I continued to do experiments because I knew I had

to get some papers out, I also at the end of my first year submitted my first article to

Psychological Review, which was the theoretical journal where I took that section and elaborated

it and wrote it up and submitted it to them. And they didn’t accept that either, but they said,

oh, this is interesting. You should keep thinking about it this time. And then that was what got me

going to think, okay, you know, so it’s not a superhuman thing to contribute to the development

of theory. You know, you don’t have to be, you can do it as a mere mortal.

And the broader, I think, lesson is don’t succumb to the labels of a particular reviewer.

Yeah, that’s for sure. Or anybody labeling you, right?

Yeah, exactly. I mean that, yeah, exactly. And especially as you become successful,

labels get assigned to you for the thing you’re successful at.

Connectionist or cognitive scientist and not a neuroscientist.

And then you can, you can completely, that’s just, those are the stories of the past. You’re

today a new person that can completely revolutionize in totally new areas. So don’t

let those labels hold you back. Well, let me ask the big question. When you look into the,

you said it started with Columbia trying to observe these humans and they’re doing

weird stuff and you want to know why are they doing this stuff. So zoom out even bigger,

to the hundred-plus billion people who’ve ever lived on Earth. Why do you think we’re all

doing what we’re doing? What do you think is the meaning of it all? The big why question.

We seem to be very busy doing a bunch of stuff and we seem to be kind of directed towards somewhere.

But why?

Well, I myself think that we make meaning for ourselves and that we find inspiration

in the meaning that other people have made in the past. You know, and the great religious thinkers

of the first millennium BC and, you know, few that came in the early part of the second millennium,

you know, laid down some important foundations for us.

But I do believe that, you know, we are an emergent result of a process that happened

naturally without guidance and that meaning is what we make of it and that the creation of

efforts to reify meaning in like religious traditions and so on is just a part of the

expression of that goal that we have to, you know, not find out what the meaning is, but to

make it ourselves. And so, to me, it’s something that’s very personal. It’s very individual. It’s

like meaning will come for you through the particular combination of synergistic elements

that are your fabric and your experience and your context and, you know, you should…

It’s all made in a certain kind of a local context though, right? Here I am at UCSD with this brilliant

man, Rumelhart, who’s having, you know, these doubts about symbolic artificial intelligence

that resonate with my desire to see it grounded in the biology and let’s make the most of that,

you know? Yeah. And so, from that like little pocket, there’s some kind of peculiar little

emergent process that then, which is basically each one of us, each one of us humans is a kind of,

you know, you think of cells and they come together and it’s an emergent process that then tells fancy

stories about itself and then gets, just like you said, just enjoys the beauty of the stories

we tell about ourselves. It’s an emergent process that lives for a time, is defined by its local

pocket and context in time and space and then tells pretty stories and we write those stories

down and then we celebrate how nice the stories are and then it continues because we build stories

on top of each other and eventually we’ll colonize hopefully other planets, other solar systems,

other galaxies and we’ll tell even better stories. But it all starts here on Earth. Jay, speaking

of peculiar emergent processes, you’ve lived one heck of a story. You’re one of the

great scientists of cognitive science, of psychology, of computation. It’s a huge honor

that you would talk to me today, that you’d spend your very valuable time. I really enjoyed talking with

you and thank you for all the work you’ve done. I can’t wait to see what you do next.

Well, thank you so much and this has been an amazing opportunity for me to let ideas that I’ve

never fully expressed before come out because you asked such a wide range of the deeper questions

that we’ve all been thinking about for so long. So thank you very much for that.

Thank you. Thanks for listening to this conversation with Jay McClelland.

To support this podcast, please check out our sponsors in the description.

And now, let me leave you with some words from Geoffrey Hinton. In the long run,

curiosity driven research works best. Real breakthroughs come from people focusing

on what they’re excited about. Thanks for listening and hope to see you next time.
