Lex Fridman Podcast - #333 - Andrej Karpathy: Tesla AI, Self-Driving, Optimus, Aliens, and AGI

The following is a conversation with Andrej Karpathy,

previously the director of AI at Tesla,

and before that, at OpenAI and Stanford.

He is one of the greatest scientists, engineers,

and educators in the history of artificial intelligence.

And now, a quick few second mention of each sponsor.

Check them out in the description.

It’s the best way to support this podcast.

We got 8sleep for naps, BetterHelp for mental health,

Fundrise for real estate investing,

and Athletic Greens for nutrition.

Choose wisely, my friends.

And now, onto the full ad reads.

As always, no ads in the middle.

I try to make this interesting, but if you skip them,

please still check out our sponsors.

I enjoy their stuff, maybe you will too.

This episode is sponsored by 8sleep

and its new Pod 3 mattress.

I am recording this in a hotel.

In fact, given some complexities of my life,

this is the middle of the night, 4 a.m.

I’m sitting in an empty hotel room,

yelling at a microphone.

This, my friends, is my life.

I do usually feel good about myself at 4 a.m.,

but not with two cups of coffee in me.

And the reason I feel good

is because I’m going to go to sleep soon,

and I’ve accomplished a lot.

This is true today, except for the sleep soon part,

because I think I’m going to an airport at some point soon.

It doesn’t matter.

What matters is I’m not even gonna sleep here,

and that’s great, because in a hotel,

I don’t have an 8sleep bed that can cool itself.

At home, I do, and that’s where I’m headed.

I’m headed home.

Anyway, check it out and get special savings

when you go to 8sleep.com slash Lex.

This episode is also brought to you by BetterHelp,

spelled H-E-L-P, help.

I’m a huge fan of talk therapy.

I think of podcasting as a kind of talk therapy.

So I’m a huge fan of listening to podcasts.

In fact, that’s how I think of doing a podcast myself.

I just get to have front row seats to a thing I love.

And it’s actually just the process of talking

that reveals something about the mind.

I think that’s what good talk therapy is:

it’s guided by a professional therapist.

It helps you reveal to yourself something about your mind.

Just lay it all out on the table.

So yeah, you should definitely use

the best method of talk therapy,

the best meaning the most accessible,

at least to try it,

if not to make it a regular part of your life.

That’s what BetterHelp does.

Check them out at betterhelp.com slash Lex

and save on your first month.

This episode is also brought to you by Fundrise,

spelled F-U-N-D-R-I-S-E.

It’s a platform that allows you

to invest in private real estate.

We’re living in hard times, folks,

for many different reasons,

but one of them is financial.

And one way to protect yourself in difficult times

is to diversify your investments.

Private real estate is one of the things, I believe,

you should diversify into.

And when you do, you should use tools

that look like they’re made in the 21st century,

whereas a lot of investment websites and services,

even online ones,

seem to be designed by the same people

that designed the original ATMs.

That’s not the case with Fundrise.

It’s super easy to use, accessible.

Over 150,000 investors use it.

Their team vets and manages all their real estate projects.

You can track your portfolio’s performance

on their website and see updates

as properties across the country are acquired,

improved, and operated.

Anyway, check out Fundrise.

It takes just a few minutes to get started

at fundrise.com slash Lex.

This show is brought to you by Athletic Greens

and its AG1 drink,

which is an all-in-one daily drink

to support better health and peak performance.

I have to be honest,

I completely forgot to bring Athletic Greens with me

as I’m traveling now, and I miss it.

It’s not just good for my nutritional base and needs,

it’s good for my soul.

It’s part of the sort of the daily habit of life.

And when you don’t have that habit,

the routine stuff is off.

So it’s good to just put that into your daily routine

to make sure that you’re getting the vitamins,

the nutrition that you need,

no matter the dietary, the workload,

the athletic endeavors that you partake in.

I don’t know, it’s kind of incredible.

And yeah, that’s what Athletic Greens is for me.

They’ll give you one month supply of fish oil

when you sign up at athleticgreens.com slash Lex.

This is the Lex Fridman Podcast.

To support it, please check out our sponsors.

And now, dear friends, here’s Andrej Karpathy.

What is a neural network?

What is a neural network?

And why does it seem to do such a surprisingly

good job of learning?

What is a neural network?

It’s a mathematical abstraction of the brain, I would say.

That’s how it was originally developed.

At the end of the day, it’s a mathematical expression,

and it’s a fairly simple mathematical expression

when you get down to it.

It’s basically a sequence of matrix multiplies,

which are really dot products mathematically,

and some nonlinearity is thrown in.

And so it’s a very simple mathematical expression,

and it’s got knobs in it.

Many knobs.

Many knobs.

And these knobs are loosely related

to basically the synapses in your brain.

They’re trainable, they’re modifiable.

And so the idea is like,

we need to find the setting of the knobs

that makes the neural net do whatever you want it to do,

like classify images and so on.

And so there’s not too much mystery, I would say, in it.

You might think that, but basically,

I don’t want to endow it with too much meaning

with respect to the brain and how it works.

It’s really just a complicated mathematical expression

with knobs, and those knobs need a proper setting

for it to do something desirable.
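As a minimal illustration of the picture described here, a couple of matrix multiplies with a nonlinearity in between, where the knobs are the weight entries, here is a hypothetical Python sketch (toy sizes, random weights; the names and numbers are made up for illustration, not from the conversation):

```python
import numpy as np

# The "knobs" are the entries of W1, b1, W2, b2: trainable parameters that
# optimization would adjust until the network does what you want.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 3)), np.zeros(3)

def forward(x):
    h = np.maximum(0, x @ W1 + b1)  # matrix multiply plus a ReLU nonlinearity
    return h @ W2 + b2              # another matrix multiply

x = rng.normal(size=(1, 4))         # a toy input with 4 features
print(forward(x))                   # untrained output: the knobs are not yet set
```

Training would then mean finding the setting of W1, b1, W2, b2 that makes the output do something desirable, like classify images.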

Yeah, but poetry is just a collection of letters

with spaces that can make us feel a certain way.

And in that same way,

when you get a large number of knobs together,

whether it’s inside the brain or inside a computer,

they seem to surprise us with their power.

Yeah, I think that’s fair.

So basically, I’m underselling it by a lot

because you definitely do get

very surprising emergent behaviors

out of these neural nets when they’re large enough

and trained on complicated enough problems,

like say, for example, the next word prediction

in a massive dataset from the internet.

And then these neural nets take on

pretty surprising magical properties.

Yeah, I think it’s kind of interesting

how much you can get out of even

very simple mathematical formalism.

When your brain right now is talking,

is it doing next word prediction?

Or is it doing something more interesting?

Well, it’s definitely some kind of a generative model

that’s GPT-like and prompted by you.

So you’re giving me a prompt,

and I’m kind of responding to it in a generative way.

And by yourself, perhaps, a little bit?

Are you adding extra prompts

from your own memory inside your head?

Or no?

It’s like you’re referencing

some kind of a declarative structure

of memory and so on.

And then you’re putting that together

with your prompt and giving away some answers.

How much of what you just said

has been said by you before?

Nothing, basically, right?

No, but if you actually look at all the words

you’ve ever said in your life,

and you do a search,

you’ve probably said a lot of the same words

in the same order before.

Yeah, could be.

I mean, I’m using phrases that are common, et cetera,

but I’m remixing it into a pretty unique sentence

at the end of the day.

But you’re right, definitely,

there’s a ton of remixing.

It’s like Magnus Carlsen said,

I’m rated 2,900 whatever,

which is pretty decent.

I think you’re not giving enough credit to neural nets here.

Why do they seem to,

what’s your best intuition

about this emergent behavior?

I mean, it’s kind of interesting

because I’m simultaneously underselling them,

but I also feel like there’s an element to which I’m overselling them,

like, it’s actually kind of incredible

that you can get so much emergent magical behavior

out of them despite them being so simple mathematically.

So I think those are kind of like two surprising statements

that are kind of juxtaposed together.

And I think basically what it is

is we are actually fairly good

at optimizing these neural nets.

And when you give them a hard enough problem,

they are forced to learn very interesting solutions

in the optimization.

And those solutions basically have these emergent properties

that are very interesting.

There’s wisdom and knowledge in the knobs.

And so this representation that’s in the knobs,

does it make sense to you intuitively

that a large number of knobs can hold a representation

that captures some deep wisdom

about the data it has looked at?

It’s a lot of knobs.

It’s a lot of knobs.

And somehow, you know, so speaking concretely,

one of the neural nets

that people are very excited about right now are GPTs,

which are basically just next word prediction networks.

So you consume a sequence of words from the internet

and you try to predict the next word.

And once you train these on a large enough dataset,

you can basically prompt these neural nets

in arbitrary ways and you can ask them to solve problems

and they will.

So you can just tell them,

you can make it look like you’re trying

to solve some kind of a mathematical problem

and they will continue what they think is the solution

based on what they’ve seen on the internet.

And very often those solutions

look remarkably consistent,

and potentially even correct.
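As a rough sketch of what prompting a next-word predictor looks like mechanically: the model only ever produces a distribution over the next token, and generation just appends the most likely token repeatedly. In this hypothetical Python, next_token_probs is a random stand-in, not a real trained GPT:

```python
import numpy as np

vocab = ["the", "answer", "is", "42", "."]
rng = np.random.default_rng(0)

def next_token_probs(tokens):
    # Stand-in for a trained model's output distribution over the vocabulary.
    logits = rng.normal(size=len(vocab))
    return np.exp(logits) / np.exp(logits).sum()

def generate(prompt_tokens, n_new=5):
    tokens = list(prompt_tokens)
    for _ in range(n_new):
        probs = next_token_probs(tokens)
        tokens.append(vocab[int(np.argmax(probs))])  # greedy: take the most likely next token
    return tokens

print(generate(["the", "answer"]))
```

With a real model in place of the stand-in, the prompt is just the beginning of the sequence, and the "solution" is whatever continuation the model finds most likely.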

Do you still think about the brain side of it?

So as neural nets are an abstraction,

or mathematical abstraction, of the brain,

do you still draw wisdom from biological neural networks?

Or, even the bigger question:

So you’re a big fan of biology and biological computation.

What impressive thing is biology doing to you

that computers are not yet?

That gap.

I would say I’m much more hesitant with the analogies to the brain

than I think you would see potentially in the field.

And I kind of feel like certainly

the way neural networks started

is everything stemmed from inspiration by the brain.

But at the end of the day,

the artifacts that you get after training,

they are arrived at by a very different optimization process

than the optimization process that gave rise to the brain.

And so I think,

I kind of think of it as a very complicated alien artifact.

It’s something different.

I’m sorry, the neural nets that we’re training.

They are a complicated alien artifact.

I do not make analogies to the brain

because I think the optimization process

that gave rise to it is very different from the brain.

So there was no multi-agent self-play

kind of setup and evolution.

It was an optimization that is basically

what amounts to a compression objective

on a massive amount of data.

Okay, so artificial neural networks are doing compression

and biological neural networks are not-

Trying to survive.

Are not really doing anything.

They’re an agent in a multi-agent self-play system

that’s been running for a very, very long time.

That said, evolution has found that it is very useful

to predict and have a predictive model in the brain.

And so I think our brain utilizes something

that looks like that as a part of it,

but it has a lot more gadgets and gizmos

and value functions and ancient nuclei

that are all trying to make it survive

and reproduce and everything else.

And the whole thing through embryogenesis

is built from a single cell.

I mean, it’s just, the code is inside the DNA

and it just builds it up,

like the entire organism with arms and the head and legs.

Yes.

And I think it does it pretty well.

It should not be possible.

So there’s some learning going on.

There’s some kind of computation

going through that building process.

I mean, I don’t know where,

if you were just to look at the entirety

of history of life on Earth,

where do you think is the most interesting invention?

Is it the origin of life itself?

Is it just jumping to eukaryotes?

Is it mammals?

Is it humans themselves, homo sapiens?

The origin of intelligence or highly complex intelligence?

What, or is it all just a continuation

of the same kind of process?

Certainly I would say it’s an extremely remarkable story

that I’m only briefly learning about recently.

All the way from, actually,

you almost have to start at the formation of Earth

and all of its conditions and the entire solar system

and how everything is arranged with Jupiter and Moon

and the habitable zone and everything.

And then you have an active Earth

that’s turning over material.

And then you start with abiogenesis and everything.

And so it’s all a pretty remarkable story.

I’m not sure that I can pick a single unique piece of it

that I find most interesting.

I guess for me as an artificial intelligence researcher,

it’s probably the last piece.

We have lots of animals that are not building

technological society, but we do.

And it seems to have happened very quickly.

It seems to have happened very recently.

And something very interesting happened there

that I don’t fully understand.

I almost understand everything else,

I think intuitively, but I don’t understand

exactly that part and how quick it was.

Both explanations would be interesting.

One is that this is just a continuation

of the same kind of process.

There’s nothing special about humans.

That would be, deeply understanding that

would be very interesting.

That we think of ourselves as special,

but it was obvious.

It was already written in the code

that you would have greater and greater

intelligence emerging.

And then the other explanation,

which is something truly special happened.

Something like a rare event,

whether it’s like crazy rare event,

like a space odyssey.

What would it be?

See, if you say like the invention of fire,

or the, as Richard Wrangham says,

the beta males deciding on a clever way

to kill the alpha males by collaborating.

So just optimizing the collaboration,

the multi-agent aspect of it.

And that, being really constrained on resources

and trying to survive, the collaboration aspect

is what created the complex intelligence.

But it seems like it’s a natural outgrowth

of the evolution process.

What could possibly be a magical thing that happened?

Like a rare thing that would say that humans

are actually, human level intelligence

is actually a really rare thing in the universe.

Yeah, I’m hesitant to say that it is rare by the way,

but it definitely seems like,

it’s kind of like a punctuated equilibrium

where you have lots of exploration

and then you have certain leaps,

sparse leaps in between.

So of course, like origin of life would be one.

DNA, sex, eukaryotic system, eukaryotic life,

the endosymbiosis event where an archaeon ate

a little bacterium, just the whole thing.

And then of course, emergence of consciousness and so on.

So it seems like definitely there are sparse events

where massive amount of progress was made,

but yeah, it’s kind of hard to pick one.

So you don’t think humans are unique?

Gotta ask you, how many intelligent alien civilizations

do you think are out there?

And is their intelligence different or similar to ours?

Yeah.

I’ve been preoccupied with this question

quite a bit recently,

basically the Fermi paradox and just thinking through.

And the reason actually that I am very interested

in the origin of life is fundamentally trying to understand

how common it is that there are technological societies

out there in space.

And the more I study it,

the more I think that there should be quite a lot.

Why haven’t we heard from them?

Because I agree with you.

It feels like I just don’t see why what we did here

on Earth is so difficult to do.

Yeah, and especially when you get into the details of it,

I used to think the origin of life

was this magical, rare event,

but then you read books like, for example,

Nick Lane, The Vital Question, Life Ascending, et cetera.

And he really gets in and he really makes you believe

that this is not that rare.

Basic chemistry.

You have an active Earth and you have your alkaline vents

and you have lots of alkaline waters

mixing with acidic ocean,

and you have your proton gradients

and you have the little porous pockets

of these alkaline vents that concentrate chemistry.

And basically as he steps through all of these little pieces

you start to understand that actually this is not that crazy.

You could see this happen on other systems.

And he really takes you from just a geology

to primitive life.

And he makes it feel like it’s actually pretty plausible.

And also like the origin of life

was actually fairly fast after formation of Earth.

If I remember correctly, just a few hundred million years

or something like that after it was basically possible,

life actually arose.

And so that makes me feel like that is not the constraint.

That is not the limiting variable

and that life should actually be fairly common.

And then where the drop-offs are

is very interesting to think about.

I currently think that there’s no major drop-offs basically.

And so there should be quite a lot of life.

And basically where that brings me to then

is the only way to reconcile the fact

that we haven’t found anyone and so on

is that we can’t see them.

We can’t observe them.

Just a quick brief comment.

Nick Lane and a lot of biologists I talked to,

they really seem to think that the jump from bacteria

to more complex organisms is the hardest jump.

The eukaryotic life basically.

Yeah, which I don’t, I get it.

They’re much more knowledgeable than me

about the intricacies of biology.

But that seems crazy to me,

because of how many single-cell organisms there are

and how much time you have, surely it’s not that difficult.

Like in a billion years,

it’s not even that long of a time really.

Just all these bacteria under constrained resources

battling it out.

I’m sure they can invent more complex.

Like I don’t understand.

It’s like moving from a hello world program

to, like, inventing a function or something like that.

I don’t.

Yeah.

So I don’t, yeah, so I’m with you.

I just feel like I don’t see any.

If the origin of life, that would be my intuition,

that’s the hardest thing.

But if that’s not the hardest thing

because it happens so quickly,

then it’s gotta be everywhere.

And yeah, maybe we’re just too dumb to see it.

We don’t have really good mechanisms for seeing this life.

I mean, by what, how?

So I’m not an expert just to preface this,

but just from what I think about it.

On aliens.

Who is? I wanna meet an expert on alien intelligence

and how to communicate.

I’m very suspicious of our ability

to find these intelligences out there.

Radio waves, for example, are terrible.

Their power drops off as basically one over R squared.

So I remember reading that our current radio waves

would not be, the ones that we are broadcasting

would not be measurable by our devices today.

Only like, was it like one tenth of a light year away?

Like not even, basically tiny distance

because you really need like a targeted transmission

of massive power directed somewhere

for this to be picked up on long distances.

And so I just think that our ability to measure

is not amazing.

I think there’s probably other civilizations out there.

And then the big question is

why don’t they build von Neumann probes

and why don’t they interstellar travel

across the entire galaxy?

And my current answer is it’s probably interstellar travel

is like really hard.

You have the interstellar medium.

If you wanna move at close to the speed of light,

you’re going to be encountering bullets along the way

because even like tiny hydrogen atoms

and little particles of dust basically have

massive kinetic energy at those speeds.

And so basically you need some kind of shielding.

You need, you have all the cosmic radiation.

It’s just like brutal out there.

It’s really hard.

And so my thinking is maybe interstellar travel

is just extremely hard.

And you have to go very slow.

Like, billions-of-years-to-build hard?

It feels like we’re not a billion years away from doing that.

It just might be that it’s very,

you have to go very slowly potentially as an example

through space.

Right, as opposed to close to the speed of light.

So I’m suspicious basically of our ability to measure life.

And I’m suspicious of the ability to just permeate

all of space in the galaxy or across galaxies.

And that’s the only way that I can currently see

a way around it.

Yeah, it’s kind of mind blowing to think

that there’s trillions of intelligent alien civilizations

out there kind of slowly traveling through space

to meet each other.

And some of them meet, some of them go to war,

some of them collaborate.

Or they’re all just independent.

They’re all just like little pockets, I don’t know.

Well, statistically, if there’s like,

if there’s trillions of them, surely some of them,

some of the pockets, are close enough together.

Some of them happen to be close, yeah.

Close enough to see each other.

See, once you see something that is definitely complex life,

like if we see something, we’re probably going to be severely,

like intensely, aggressively motivated

to figure out what the hell that is and try to meet them.

What would be your first instinct to try to,

like at a generational level, meet them

or defend against them?

Or what would be your instinct

as a president of the United States and a scientist?

I don’t know which hat you prefer in this question.

Yeah, I think the question, it’s really hard.

I will say like, for example, for us,

we have lots of primitive life forms on Earth next to us.

We have all kinds of ants and everything else

and we share space with them.

And we are hesitant to impact them,

we are trying to protect them by default

because they are amazing, interesting, dynamical systems

that took a long time to evolve

and they are interesting and special.

And I don’t know that you want to destroy that by default.

And so I like complex dynamical systems

that took a lot of time to evolve.

I think I’d like to preserve it if I can afford to.

And I’d like to think that the same would be true

about the galactic resources and that they would think

that we’re kind of incredible, interesting story

that took time, it took a few billion years to unravel

and you don’t want to just destroy it.

I could see two aliens talking about Earth right now

and saying, I’m a big fan of complex dynamical systems.

So I think there’s a value to preserving these,

and we’ll basically be a video game they watch,

or a TV show that they watch.

Yeah, I think you would need like a very good reason,

I think, to destroy it.

Like why don’t we destroy these ant farms and so on?

It’s because we’re not actually like really

in direct competition with them right now.

We do it accidentally and so on,

but there’s plenty of resources.

And so why would you destroy something

that is so interesting and precious?

Well, from a scientific perspective, you might probe it.

Yeah.

You might interact with it lightly.

You might want to learn something from it, right?

So I wonder, there could be certain physical phenomena

that we think are just physical phenomena,

but that are actually them interacting with us,

like poking us with a finger to see what happens.

I think it should be very interesting to scientists,

other alien scientists, what happened here.

And you know, what we’re seeing today is a snapshot.

Basically, it’s a result of a huge amount of computation

over like a billion years or something like that.

It could have been initiated by aliens.

This could be a computer running a program.

Like, okay, if you had the power to do this,

for sure, at least I would,

I would pick an Earth-like planet that has the conditions,

based on my understanding of the chemistry prerequisites

for life, and I would seed it with life and run it, right?

Wouldn’t you 100% do that and observe it and then protect?

I mean, that’s not just a hell of a good TV show.

It’s a good scientific experiment.

Yeah.

And it’s a physical simulation, right?

Maybe evolution, like actually running it,

is the most efficient way to understand computation

or to compute stuff.

Or to understand life or, you know, what life looks like

and what branches it can take.

It does make me kind of feel weird

that we’re a part of a science experiment,

but maybe it’s, everything’s a science experiment.

Does that change anything for us,

if we’re a science experiment?

I don’t know.

Two descendants of apes talking about

being inside of a science experiment?

I’m suspicious of this idea of like a deliberate panspermia,

as you described it, sort of.

And I don’t see a divine intervention in some way

in the historical record right now.

I do feel like the story in these books,

like Nick Lane’s books and so on, sort of makes sense.

And it makes sense how life arose on earth uniquely.

And yeah, I don’t need to reach

for more exotic explanations right now.

Sure, but NPCs inside a video game

don’t observe any divine intervention either.

We might just be all NPCs running a kind of code.

Maybe eventually they will.

Currently NPCs are really dumb,

but once they’re running GPTs,

maybe they will be like,

hey, this is really suspicious, what the hell?

So you famously tweeted,

it looks like if you bombard Earth with photons for a while,

you can emit a Roadster.

So if, like in Hitchhiker’s Guide to the Galaxy,

we would summarize the story of Earth,

so in that book, it’s mostly harmless,

what do you think are all the possible stories,

like a paragraph long or a sentence long,

that Earth could be summarized as?

Once it’s done its computation.

So like all the possible full stories,

if Earth is a book, right?

Yeah.

Probably there has to be an ending.

I mean, there’s going to be an end to earth

and it could end in all kinds of ways.

It can end soon, it can end later.

What do you think are the possible stories?

Well, definitely there seems to be,

yeah, you’re sort of,

it’s pretty incredible that these self-replicating systems

will basically arise from the dynamics

and then they perpetuate themselves and become more complex

and eventually become conscious and build a society.

And I kind of feel like in some sense,

it’s kind of like a deterministic wave

that kind of just, like, happens on

any sufficiently well-arranged system like Earth.

And so I kind of feel like there’s a certain sense

of inevitability in it and it’s really beautiful.

And it ends somehow, right?

So it’s a chemically diverse environment

where complex dynamical systems

can evolve and become further and further complex.

But then there’s a certain,

what is it?

There’s certain terminating conditions.

Yeah, I don’t know what the terminating conditions are,

but definitely there’s a trend line of something

and we’re part of that story.

And like, where does that, where does it go?

So, we’re famously described often

as a biological bootloader for AIs.

And that’s because humans,

I mean, we’re an incredible biological system

and we’re capable of computation and love and so on,

but we’re extremely inefficient as well.

Like we’re talking to each other through audio.

It’s just kind of embarrassing, honestly,

that we’re manipulating like seven symbols,

serially, we’re using vocal cords.

It’s all happening over like multiple seconds.

It’s just like kind of embarrassing

when you step down to the frequencies

at which computers operate or are able to operate on.

And so basically it does seem like synthetic intelligences

are kind of like the next stage of development.

And I don’t know where it leads to,

like at some point I suspect the universe

is some kind of a puzzle.

And these synthetic AIs will uncover that puzzle

and solve it.

And then what happens after, right?

Like what, because if you just like fast forward Earth,

many billions of years, it’s like, it’s quiet.

And then it’s like turmoil.

You see like city lights and stuff like that.

And then what happens at, like, the end?

Like, is it like a poof?

Or is it like a calming?

Is it an explosion?

Is it, like, Earth opening up, like a giant?

Because you said emit Roadsters.

Will it start emitting like a giant number of like satellites?

Yes, it’s some kind of a crazy explosion.

And we’re stepping through an explosion,

and we’re living day to day,

and it doesn’t look like it.

But it’s actually, if you,

I saw a very cool animation of Earth and life on Earth

and basically nothing happens for a long time.

And then the last like two seconds,

like basically cities and everything

and the lower orbit just gets cluttered

and just the whole thing happens in the last two seconds.

And you’re like, this is exploding.

This is basically an explosion.

So if you play, yeah, yeah.

If you play at a normal speed,

it’ll just look like an explosion.

It’s a firecracker.

We’re living in a firecracker.

Where it’s going to start emitting

all kinds of interesting things.

And then, so the explosion,

it might actually look like a little explosion

with lights and fire and energy emitted,

all that kind of stuff.

But when you look inside the details of the explosion,

there’s actual complexity happening

where there’s like, yeah, human life or some kind of life.

We hope it’s not a destructive firecracker.

It’s kind of like a constructive firecracker.

All right, so given that, and I think this is a hilarious discussion,

it is really interesting to think about

like, what the puzzle of the universe is.

Did the creator of the universe give us a message?

Like for example, in the book, Contact, Carl Sagan,

there’s a message for humanity,

for any civilization in digits

in the expansion of pi in base 11 eventually,

which is kind of interesting thought.

Maybe we’re supposed to be giving a message to our creator.

Maybe we’re supposed to somehow create

some kind of a quantum mechanical system

that alerts them to our intelligent presence here.

Because if you think about it from their perspective,

it’s just, say, like quantum field theory,

a massive cellular-automaton-like thing.

And like, how do you even notice that we exist?

You might not even be able to pick us up

in that simulation.

And so how do you prove that you exist,

that you’re intelligent and that you’re part of the universe?

So this is like a Turing test for intelligence from Earth?

Yeah.

Like the creator is, I mean,

maybe this is like trying to complete

the next word in a sentence.

This is a complicated way of that.

Like Earth is just, is basically sending a message back.

Yeah, the puzzle is basically like

alerting the creator that we exist.

Or maybe the puzzle is just to break out of the system

and just, you know, stick it to the creator in some way.

Like if you’re playing a video game,

you can somehow find an exploit

and find a way to execute on the host machine

in the arbitrary code.

There’s some, for example,

I believe someone got a game of Mario to play Pong

just by exploiting it.

And then basically writing code

and being able to execute arbitrary code in the game.

And so maybe that’s the puzzle

is that we should find a way to exploit it.

So I think like some of these synthetic AIs

will eventually find the universe to be

some kind of a puzzle and then solve it in some way.

And that’s kind of like the end game somehow.

Do you often think about it as a simulation?

So as the universe being a kind of computation

that might have bugs and exploits?

Yes, yeah, I think so.

Is that what physics is essentially?

I think it’s possible that physics has exploits

and we should be trying to find them.

Arranging some kind of a crazy quantum mechanical system

that somehow gives you buffer overflow,

somehow gives you a rounding error in the floating point.

Yeah, that’s right.

And like more and more sophisticated exploits,

like those are jokes,

but that could be actually very close to reality.

Yeah, we’ll find some way to extract infinite energy.

For example, when you train reinforcement learning agents

in physical simulations

and you ask them to say run quickly on a flat ground,

they’ll end up doing all kinds of like weird things

as part of that optimization, right?

They’ll get on their back leg

and they will slide across the floor.

And it’s because the optimization,

the reinforcement learning optimization on that agent

has figured out a way to extract infinite energy

from the friction forces

and basically their poor implementation.

And they found a way to generate infinite energy

and just slide across the surface.

And it’s not what you expected.

It’s just a, it’s sort of like a perverse solution.

And so maybe we can find something like that.

Maybe we can be that little dog in this physical simulation.

That cracks or escapes the intended consequences

of the physics that the universe came up with.

We’ll figure out some kind of shortcut to some weirdness.

And then, oh man.

But see the problem with that weirdness

is the first person to discover the weirdness,

like sliding on the back legs,

that’s all we’re gonna do.

It very quickly becomes, everybody does that thing.

So like the paperclip maximizer is a ridiculous idea,

but that very well could be what happens,

and then we’ll just all switch to that because it’s so fun.

Well, no person will discover it, I think, by the way.

I think it’s going to have to be

some kind of a super intelligent AGI of a third generation.

Like we’re building the first generation AGI.

And then, you know.

Third generation.

Yeah, so we’re the bootloader for an AI,

and that AI will be a bootloader for another AI.

And then there’s no way for us to introspect

what that might even be.

I think it’s very likely that these things, for example,

like say you have these AGIs,

it’s very likely that for example,

they will be completely inert.

I like these kinds of sci-fi books sometimes

where these things are just completely inert.

They don’t interact with anything.

And I find that kind of beautiful

because they’ve probably figured out the meta game

of the universe in some way, potentially.

They’re doing something completely beyond our imagination.

And they don’t interact with simple chemical life forms.

Like, why would you do that?

So I find those kinds of ideas compelling.

What’s their source of fun?

What are they doing?

What’s the source of pleasure?

Well, it’s probably puzzle solving in the universe.

But inert.

So can you define what it means inert?

So they escape the interaction with physical reality?

They will appear inert to us.

As in,

they will behave in some very like strange way to us

because they’re beyond, they’re playing the meta game.

And the meta game is probably say like

arranging quantum mechanical systems

in some very weird ways to extract infinite energy,

solve the digit expansion of pi to whatever amount.

They will build their own like little fusion reactors

or something crazy.

Like they’re doing something beyond comprehension

and not understandable to us

and actually brilliant under the hood.

What if quantum mechanics itself is the system

and we’re just thinking it’s physics,

but we’re really parasites on it, not parasites,

we’re not really hurting physics.

We’re just living on this organism,

and we’re trying to understand it,

but really it is an organism.

And with a deep, deep intelligence,

maybe physics itself is

the organism that’s doing the super interesting thing.

And we’re just like one little thing.

Ants sitting on top of it, trying to get energy from it.

We’re just kind of like these particles in the wave

that I feel like is mostly deterministic

and takes a universe from some kind of a Big Bang

to some kind of a super intelligent replicator,

some kind of a stable point in the universe

given these laws of physics.

You don’t think, as Einstein said, God doesn’t play dice.

So you think it’s mostly deterministic?

There’s no randomness in the thing?

I think it’s deterministic.

Oh, there’s tons of,

well, I’m gonna be careful with randomness.

Pseudo-random?

Yeah, I don’t like random.

I think maybe the laws of physics are deterministic.

Yeah, I think they’re deterministic.

You just got really uncomfortable with this question.

I just, do you have anxiety

about whether the universe is random or not?

Is this a sort of, what’s like-

There’s no randomness, no.

You said you like Good Will Hunting.

It’s not your fault, Andrej, it’s not your fault, man.

So you don’t like randomness?

Yeah, I think it’s unsettling.

I think it’s a deterministic system.

I think that things that look random,

like, say, the collapse of the wave function, et cetera,

I think they’re actually deterministic,

just entanglement and so on,

and some kind of a multiverse theory, something, something.

Okay, so why does it feel like we have a free will?

Like, if I raise my hand, I chose to do this now.

That doesn’t feel like a deterministic thing.

It feels like I’m making a choice.

It feels like it.

Okay, so it’s all feelings.

It’s just feelings.

So when an RL agent is making a choice,

is that, it’s not really making a choice.

The choice is already there.

Yeah, you’re interpreting the choice,

and you’re creating a narrative for having made it.

Yeah, and now we’re talking about the narrative.

It’s very meta.

Looking back, what is the most beautiful

or surprising idea in deep learning, or AI,

in general, that you’ve come across?

You’ve seen this field explode

and grow in interesting ways.

Just, what cool ideas, like,

made you sit back and go,

hmm. Big or small?

Well, the one that I’ve been thinking about recently,

the most probably, is the transformer architecture.

So basically, for neural networks,

a lot of architectures that were trendy

have come and gone for different sensory modalities,

like for vision, audio, text.

And you would process them

with different looking neural nets.

And recently, we’ve seen this convergence

towards one architecture, the transformer.

And you can feed it video, or you can feed it,

you know, images, or speech, or text,

and it just gobbles it up.

And it’s kind of like a bit of a general purpose computer

that is also trainable

and very efficient to run on our hardware.

And so this paper came out in 2017, I wanna say.

Attention is all you need.

Attention is all you need.

You criticized the paper title in retrospect,

that it didn’t foresee the bigness of the impact

that it was going to have.

Yeah, I’m not sure if the authors were aware

of the impact that that paper would go on to have.

Probably they weren’t.

But I think they were aware of some of the motivations

and design decisions behind the transformer,

and they chose not to, I think,

expand on it in that way in the paper.

And so I think they had an idea

that there was more than just the surface

of just like, oh, we’re just doing translation

and here’s a better architecture.

You’re not just doing translation.

This is like a really cool, differentiable,

optimizable, efficient computer that you’ve proposed.

And maybe they didn’t have all of that foresight,

but I think it’s really interesting.

Isn’t it funny, sorry to interrupt,

that that title is memeable,

that they went for such a profound idea.

They went with a,

I don’t think anyone used that kind of title before, right?

Attention is all you need.

Yeah, it’s like a meme or something, basically.

Isn’t that funny that one,

like maybe if it was a more serious title,

it wouldn’t have the impact.

Honestly, I, yeah, there is an element of me

that honestly agrees with you and prefers it this way.

Yes.

If it was too grand,

it would over-promise and then under-deliver potentially.

So you want to just meme your way to greatness.

That should be a T-shirt.

So you tweeted that Transformer

is a magnificent neural network architecture

because it is a general purpose differentiable computer.

It is simultaneously expressive in the forward pass,

optimizable via backpropagation, gradient descent,

and efficient, high-parallelism compute graph.

Can you discuss some of those details,

expressive, optimizable, efficient from memory

or in general, whatever comes to your heart?

You want to have a general purpose computer

that you can train on arbitrary problems,

like say the task of next word prediction

or detecting if there’s a cat in an image

or something like that.

And you want to train this computer,

so you want to set its weights.

And I think there’s a number of design criteria

that sort of overlap in the Transformer simultaneously

that made it very successful.

And I think the authors were kind of deliberately

trying to make this a really powerful architecture.

And so basically it’s very powerful in the forward pass

because it’s able to express very general computation

as sort of something that looks like message passing.

You have nodes and they all store vectors

and these nodes get to basically look at each other,

at each other’s vectors, and they get to communicate.

And basically nodes get to broadcast,

hey, I’m looking for certain things.

And then other nodes get to broadcast,

hey, these are the things I have.

Those are the keys and the values.

So it’s not just attention.

Yeah, exactly.

Transformer is much more than just the attention component.

It’s got many architectural pieces that went into it.

The residual connection, the way it’s arranged,

there’s a multi-layer perceptron in there,

the way it’s stacked and so on.

But basically there’s a message passing scheme

where nodes get to look at each other,

decide what’s interesting and then update each other.

And so I think when you get to the details of it,

I think it’s a very expressive function.

So it can express lots of different types of algorithms

in forward pass.

Not only that, but the way it’s designed

with the residual connections, layer normalizations,

the softmax attention and everything,

it’s also optimizable.

This is a really big deal

because there’s lots of computers that are powerful

that you can’t optimize,

or they’re not easy to optimize

using the techniques that we have,

which is backpropagation and gradient descent.

These are first order methods,

very simple optimizers really.

And so you also need it to be optimizable.

And then lastly,

you want it to run efficiently on our hardware.

Our hardware is a massive throughput machine like GPUs.

They prefer lots of parallelism.

So you don’t want to do lots of sequential operations.

You want to do a lot of operations in parallel.

And the transformer is designed with that in mind as well.

And so it’s designed for our hardware

and it’s designed to both be very expressive

in a forward pass,

but also very optimizable in the backward pass.
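A minimal sketch of the message-passing view described here, in Python with numpy: each node broadcasts a query (what it is looking for), a key (what it has), and a value (what it will send), and the softmax of query-key scores decides who communicates with whom. Toy sizes and random weights; this is only the attention piece, not a full transformer block:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 5, 8                                     # 5 nodes (token positions), 8-dim vectors
x = rng.normal(size=(T, d))                     # one vector stored per node
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

Q, K, V = x @ Wq, x @ Wk, x @ Wv                # what I'm looking for / what I have / what I'll send
scores = Q @ K.T / np.sqrt(d)                   # how strongly each node attends to every other node
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the nodes being attended to
out = weights @ V                               # each node aggregates the values it attended to
print(out.shape)                                # (5, 8): an updated vector per node
```

The residual connections, layer norms, and the multi-layer perceptron mentioned above then wrap around this core to form one block.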

And you said that the residual connections

support a kind of ability to learn short algorithms

fast and first,

and then gradually extend them longer during training.

What’s the idea of learning short algorithms?

Right.

Think of it as a,

so basically a transformer is a series of blocks, right?

And these blocks have attention

and a little multilayer perceptron.

And so you go off into a block

and you come back to this residual pathway,

and then you go off and you come back,

and then you have a number of layers arranged sequentially.

And so the way to look at it, I think,

is because of the residual pathway in the backward pass,

the gradients sort of flow along it uninterrupted,

because addition distributes the gradient

equally to all of its branches.

So the gradient from the supervision at the top

just flows directly to the first layer.

And all the residual connections are arranged

so that in the beginning during initialization,

they contribute nothing to the residual pathway.

So what it kind of looks like is,

imagine the transformer is kind of like a Python function,

like a def.

And you get to do various kinds of lines of code.

Say you have a hundred layers deep transformer,

typically they would be much shorter, say 20.

So you have 20 lines of code

and you can do something in them.

And so during the optimization,

basically what it looks like is,

first you optimize the first line of code,

and then the second line of code can kick in,

and the third line of code can kick in.

And I kind of feel like because of the residual pathway

and the dynamics of the optimization,

you can sort of learn a very short algorithm

that gets the approximate answer,

but then the other layers can sort of kick in

and start to create a contribution.

And at the end of it,

you’re optimizing over an algorithm

that is 20 lines of code,

except these lines of code are very complex

because it’s an entire block of a transformer.

You can do a lot in there.
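A hypothetical sketch of that residual-pathway picture: the network is a stack of blocks added onto a running stream, and if each block's output projection starts at zero, the whole stack initially computes the identity, so layers can kick in gradually during training. Toy numpy, not a real transformer block:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_layers = 8, 20                            # "20 lines of code"

class Block:
    def __init__(self):
        self.W_in = rng.normal(size=(d, d))
        self.W_out = np.zeros((d, d))          # contributes nothing at initialization

    def __call__(self, x):
        return np.maximum(0, x @ self.W_in) @ self.W_out

blocks = [Block() for _ in range(n_layers)]

x = rng.normal(size=(1, d))
h = x
for block in blocks:
    h = h + block(h)                           # go off into the block, come back to the residual pathway

print(np.allclose(h, x))                       # True at init: the whole stack is the identity
```

Because the gradient flows straight down the additive pathway, early layers can learn a short approximate algorithm first, and later blocks start contributing on top of it.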

What’s really interesting

is that this transformer architecture actually

has been remarkably resilient.

Basically the transformer that came out in 2017

is the transformer you would use today,

except you reshuffle some of the layer norms.

The layer normalizations have been reshuffled

to a pre-norm formulation.

And so it’s been remarkably stable,

but there’s a lot of bells and whistles

that people have attached on it and try to improve it.

I do think that basically it’s a big step

in simultaneously optimizing for lots of properties

of a desirable neural network architecture.

And I think that people have been trying to change it,

but it’s proven remarkably resilient.

But I do think that there should be

even better architectures potentially.

But you admire the resilience here.

Yeah.

There’s something profound about this architecture

that leads to resilience.

So maybe everything can be turned into a problem

that transformers can solve.

Currently, it definitely looks like

the transformer is taking over AI

and you can feed basically arbitrary problems into it.

And it’s a general differentiable computer

and it’s extremely powerful.

And this convergence in AI has been really interesting

to watch for me personally.

What else do you think could be discovered here

about transformers?

Like, what’s a surprising thing?

Or is it at a stable place?

Is there something interesting we might discover

about transformers?

Like aha moments, maybe it has to do with memory.

Maybe knowledge representation, that kind of stuff.

Definitely the Zeitgeist today is just pushing,

like basically right now the Zeitgeist

is do not touch the transformer, touch everything else.

So people are scaling up the datasets,

making them much, much bigger.

They’re working on the evaluation,

making the evaluation much, much bigger.

And they’re basically keeping the architecture unchanged.

And that’s been

the last five years of progress in AI, kind of.

What do you think about one flavor of it,

which is language models?

Have you been surprised?

Has your sort of imagination been captivated by,

you mentioned GPT and all the bigger and bigger

and bigger language models.

And what are the limits of those models, do you think?

So just for the task of natural language.

Basically the way GPT is trained, right,

is you just download a massive amount of text data

from the internet,

and you try to predict the next word in the sequence,

roughly speaking.

You’re predicting little word chunks,

but roughly speaking, that’s it.
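A rough sketch of that training setup, under simplifying assumptions (whitespace words standing in for the real learned word chunks): chop the text into tokens and form (context, next token) pairs, which is what the network is trained to predict:

```python
text = "the cat sat on the mat and the cat slept"
tokens = text.split()                # stand-in tokenizer; real GPTs use learned subword chunks
context_size = 3

pairs = []
for i in range(len(tokens) - context_size):
    context = tokens[i:i + context_size]
    target = tokens[i + context_size]          # the "next word" the model must predict
    pairs.append((context, target))

for context, target in pairs[:3]:
    print(context, "->", target)
```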

And what’s been really interesting to watch is,

basically it’s a language model.

Language models have actually existed for a very long time.

There’s papers on language modeling from 2003, even earlier.

Can you explain in that case what a language model is?

Yeah, so language model,

just basically the rough idea is

just predicting the next word in a sequence,

roughly speaking.

So there’s a paper from, for example,

Bengio and the team from 2003,

where for the first time they were using a neural network

to take, say, like three or five words

and predict the next word.

And they’re doing this on much smaller datasets,

and the neural net is not a transformer,

it’s a multi-layer perceptron.

But it’s the first time that a neural network

has been applied in that setting.

But even before neural networks,

there were language models,

except they were using N-gram models.

So N-gram models are just count-based models.

So if you start to take two words and predict the third one,

you just count up how many times you’ve seen

any two-word combinations and what came next.

And what you predict as coming next

is just what you’ve seen the most of in the training set.
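For illustration, a minimal count-based n-gram model of the kind described, on a toy corpus (everything here is made up for the example):

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat sat by the door the cat slept".split()

counts = defaultdict(Counter)
for w1, w2, w3 in zip(corpus, corpus[1:], corpus[2:]):
    counts[(w1, w2)][w3] += 1            # how often w3 followed the pair (w1, w2)

def predict(w1, w2):
    # Predict whatever followed this two-word context most often in training.
    return counts[(w1, w2)].most_common(1)[0][0]

print(predict("the", "cat"))             # "sat": seen twice, versus "slept" once
```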

And so language modeling has been around for a long time.

Neural networks have done language modeling

for a long time.

So really what’s new or interesting or exciting

is just realizing that when you scale it up

with a powerful enough neural net, a transformer,

you have all these emergent properties

where basically what happens is

if you have a large enough data set of text,

you are in the task of predicting the next word,

you are multitasking a huge amount

of different kinds of problems.

You are multitasking understanding of chemistry,

physics, human nature.

Lots of things are sort of clustered in that objective.

It’s a very simple objective,

but actually you have to understand a lot about the world

to make that prediction.

You just said the U-word, understanding.

Are you, in terms of chemistry and physics and so on,

what do you feel like it’s doing?

Is it searching for the right context?

What is the actual process happening here?

Yeah, so basically it gets a thousand words

and it’s trying to predict the thousand and first.

And in order to do that very, very well

over the entire data set available on the internet,

you actually have to basically kind of understand

the context of what’s going on in there.

And it’s a sufficiently hard problem

that if you have a powerful enough computer,

like a transformer,

you end up with interesting solutions.

And you can ask it to do all kinds of things.

And it shows a lot of emergent properties,

like in-context learning.

That was the big deal with GPT

and the original paper when they published it,

is that you can just sort of prompt it in various ways

and ask it to do various things.

And it will just kind of complete the sentence.

But in the process of just completing the sentence,

it’s actually solving all kinds of really interesting

problems that we care about.

Do you think it’s doing something like understanding?

Like when we use the word understanding for us humans.

I think it’s doing some understanding.

In its weights, it understands, I think,

a lot about the world.

And it has to in order to predict

the next word in the sequence.

So it’s trained on the data from the internet.

What do you think about this approach

in terms of data sets, of using data from the internet?

Do you think the internet has enough structured data

to teach AI about human civilization?

Yes, I think the internet has a huge amount of data.

I’m not sure if it’s a complete enough set.

I don’t know that text is enough

for having a sufficiently powerful AGI as an outcome.

Of course, there is audio and video and images

and all that kind of stuff.

Yeah, so text by itself, I’m a little bit suspicious about.

There’s a ton of things we don’t put in text, in writing,

just because they’re obvious to us

about how the world works and the physics of it

and that things fall.

We don’t put that stuff in text because why would you?

We share that understanding.

And so text is a communication medium between humans

and it’s not an all-encompassing medium of knowledge

about the world.

But as you pointed out, we do have video

and we have images and we have audio.

And so I think that definitely helps a lot,

but we haven’t trained models sufficiently across both,

across all of those modalities yet.

So I think that’s what a lot of people are interested in.

But I wonder what that shared understanding

of like what we might call common sense

has to be learned, inferred,

in order to complete the sentence correctly.

So maybe the fact that it’s implied on the internet,

the model’s gonna have to learn that,

not by reading about it,

by inferring it in the representation.

So, like, common sense, just like us,

I don’t think we learn common sense.

Like, nobody tells us explicitly.

We just figure it all out by interacting with the world.

Right.

And so here’s a model reading

about the way people interact with the world.

It might have to infer that.

I wonder.

Yeah.

You briefly worked on a project called World of Bits,

training an RL system to take actions on the internet,

versus just consuming the internet, like we talked about.

Do you think there’s a future for that kind of system,

interacting with the internet to help the learning?

Yes, I think that’s probably the final frontier

for a lot of these models,

because as you mentioned when I was at OpenAI,

I was working on this project, World of Bits.

And basically it was the idea of giving neural networks

access to a keyboard and a mouse.

And the idea is that-

What could possibly go wrong?

So basically you perceive the input of the screen pixels,

and basically the state of the computer

is sort of visualized for human consumption

in images of the web browser and stuff like that.

And then you give the neural network the ability

to press keyboards and use the mouse.

And we were trying to get it to, for example,

complete bookings and interact with user interfaces.

And-
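As a hypothetical sketch of the interface described here, screen pixels in, keyboard and mouse actions out, with the random from-scratch policy that comes up later in this exchange; the class names and the policy are invented for illustration and are not the actual OpenAI code:

```python
import random
from dataclasses import dataclass

@dataclass
class Action:
    kind: str          # "click" or "type"
    x: int = 0
    y: int = 0
    text: str = ""

def random_policy(pixels):
    # The "from scratch" starting point: keyboard-mash and mouse-mash,
    # and hope to stumble onto a rewarded booking by chance.
    if random.random() < 0.5:
        return Action("click", x=random.randrange(160), y=random.randrange(210))
    return Action("type", text=random.choice(["2022-11-01", "SFO", "JFK"]))

screenshot = [[0] * 160 for _ in range(210)]   # stand-in for the screen pixels the agent observes
print(random_policy(screenshot))
```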

What did you learn from that experience?

Like what was some fun stuff?

This is a super cool idea.

Yeah.

I mean, it’s like, yeah, I mean,

the step from observer to actor

is a super fascinating step.

Yeah, well, it’s the universal interface

in the digital realm, I would say.

And there’s a universal interface in like the physical realm,

which in my mind is a humanoid form factor kind of thing.

We can later talk about Optimus and so on,

but I feel like there’s a,

they’re kind of like a similar philosophy in some way,

where the physical world is designed for the human form,

and the digital world is designed for the human form

of seeing the screen and using keyboard and mouse.

And so it’s the universal interface

that can basically command the digital infrastructure

we’ve built up for ourselves.

And so it feels like a very powerful interface

to command and to build on top of.

Now to your question as to like what I learned from that,

it’s interesting because the world of bits

was basically too early, I think, at OpenAI at the time.

This is around 2015 or so.

And the zeitgeist at that time was very different

in AI from the zeitgeist today.

At the time, everyone was super excited

about reinforcement learning from scratch.

This is the time of the Atari paper,

where neural networks were playing Atari games

and beating humans in some cases, AlphaGo and so on.

So everyone was very excited

about training neural networks from scratch

using reinforcement learning directly.

It turns out that reinforcement learning

is an extremely inefficient way of training neural networks

because you’re taking all these actions

and all these observations,

and you get some sparse rewards once in a while.

So you do all this stuff based on all these inputs.

And once in a while, you’re like told you did a good thing.

You did a bad thing.

And it’s just an extremely hard problem.

You can’t learn from that.

You can burn a forest

and you can sort of brute force through it.

And we saw that, I think, with, you know,

with Go and Dota and so on, and it does work,

but it’s extremely inefficient

and not how you want to approach problems,

practically speaking.

And so that’s the approach that at the time

we also took to World of Bits.

We would have an agent initialized randomly,

so it would keyboard-mash and mouse-mash

and try to make a booking.

And it just, like, revealed the insanity

of that approach very quickly,

where you have to stumble onto the correct booking

in order to get a reward for having done it correctly.

And you’re never gonna stumble onto it by chance at random.

So even with a simple web interface,

there’s too many options.

There’s just too many options

and it’s too sparse of a reward signal.

And you’re starting from scratch at the time.

And so you don’t know how to read.

You don’t understand pictures, images, buttons.

You don’t understand what it means to like make a booking.

But now what’s happened is it is time to revisit that.

And OpenAI is interested in this.

Companies like Adept are interested in this and so on.

And the idea is coming back

because the interface is very powerful,

but now you’re not training an agent from scratch.

You are taking the GPT as an initialization.

So GPT is pre-trained on all of text

and it understands what’s a booking.

It understands what’s a submit.

It understands quite a bit more.

And so it already has those representations.

They are very powerful.

And that makes all of the training

significantly more efficient

and makes the problem tractable.
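
As a concrete, purely illustrative sketch of the setup being described (not the actual World of Bits code): an agent that only sees screen pixels and only emits keyboard and mouse actions, with a sparse reward for a completed booking. Every function here is a hypothetical stub.

```python
import random

# Toy "pixels in, keyboard/mouse out" interface. The point is the sparse
# reward: a randomly initialized agent essentially never stumbles onto a
# completed booking, so it never receives a learning signal.
ACTIONS = ["click", "type", "scroll"]

def screen_pixels():
    # stand-in for a screenshot of the rendered page
    return [[random.random() for _ in range(64)] for _ in range(64)]

def random_policy(pixels):
    # "keyboard mash and mouse mash": an agent starting from scratch
    return random.choice(ACTIONS), (random.randrange(64), random.randrange(64))

def env_step(action):
    # reward only if the exact correct booking was completed; by pure chance
    # that essentially never happens, so the reward is ~always zero
    completed_booking = False
    return 1.0 if completed_booking else 0.0

total_reward = sum(env_step(random_policy(screen_pixels())) for _ in range(10_000))
print(total_reward)  # ~0.0: nothing to learn from, which is the core problem
```

Swapping the random policy for a model initialized from a pretrained GPT is the change described above: the agent starts out already knowing what a booking or a submit button is.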

Should the interaction be with like the way humans see it

with the buttons and the language

or should it be with the HTML, JavaScript, and the CSS?

What do you think is the better?

So today all of this interaction

is mostly on the level of HTML, CSS and so on.

That’s done because of computational constraints.

But I think ultimately everything is designed

for human visual consumption.

And so at the end of the day,

all the additional information

is in the layout of the webpage, what's next to what,

what's a red background, and all this kind of stuff,

and that really comes from what it looks like visually.

So I think that’s the final frontier

as we are taking in pixels

and we’re giving out keyboard, mouse commands.

But I think it’s impractical still today.

Do you worry about bots on the internet?

Given these ideas, given how exciting they are,

do you worry about bots on Twitter

being not the stupid bots that we see now

with the crypto bots,

but the bots that might be out there actually

that we don’t see,

that they’re interacting in interesting ways?

So this kind of system feels like

it should be able to pass the

I’m not a robot click button, whatever.

Do you actually understand how that test works?

I don’t quite.

There’s a checkbox or whatever that you click.

It’s presumably tracking mouse movement

and the timing and so on.

So exactly this kind of system we’re talking about

should be able to pass that.

So what do you feel about bots that are language models

plus have some interactability

and are able to tweet and reply and so on?

Do you worry about that world?

Yeah, I think it’s always been a bit of an arms race

between sort of the attack and the defense.

So the attack will get stronger,

but the defense will get stronger as well.

Our ability to detect that.

How do you defend?

How do you detect?

How do you know that your Karpathy account

on Twitter is human?

How would you approach that?

Like if people were claiming,

how would you defend yourself in the court of law

that I’m a human, this account is human?

Yeah, at some point I think it might be,

I think the society will evolve a little bit.

Like we might start signing, digitally signing

some of our correspondence or things that we create.

Right now it’s not necessary,

but maybe in the future it might be.
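
As a hedged sketch of what digitally signing correspondence could look like, here is the Ed25519 flow from the Python cryptography package; the message and key handling are purely illustrative, and a real proof-of-personhood scheme would need key distribution and identity binding on top of this.

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# You hold the private key; anyone with your public key can check the signature.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

message = b"This post was really written by me."   # illustrative message
signature = private_key.sign(message)

try:
    public_key.verify(signature, message)            # passes: message is authentic
    public_key.verify(signature, b"tampered text")   # raises: signature doesn't match
except InvalidSignature:
    print("signature check failed")
```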

I do think that we are going towards a world

where we share the digital space with AIs.

Synthetic beings.

Yeah, and they will get much better

and they will share our digital realm

and they’ll eventually share our physical realm as well.

It’s much harder,

but that’s kind of like the world we’re going towards.

And most of them will be benign and helpful

and some of them will be malicious

and it’s going to be an arms race trying to detect them.

So, I mean, the worst isn’t the AIs,

the worst is the AIs pretending to be human.

So I don’t know if it’s always malicious.

There’s obviously a lot of malicious applications,

but it could also be, you know, if I was an AI,

I would try very hard to pretend to be human

because we’re in a human world.

I wouldn’t get any respect as an AI.

I want to get some love and respect.

I don’t think the problem is intractable.

People are thinking about the proof of personhood

and we might start digitally signing our stuff

and we might all end up having like,

yeah, basically some solution for proof of personhood.

It doesn’t seem to me intractable.

It’s just something that we haven’t had to do until now.

But I think once the need really starts to emerge,

which is soon, I think people will think about it much more.

So, but that too will be a race

because obviously you can probably spoof or fake

the proof of personhood.

So you have to try to figure out how to.

Probably.

I mean, it’s weird that we have like social security numbers

and like passports and stuff.

It seems like it’s harder to fake stuff

in the physical space.

But in the digital space,

it just feels like it's gonna be very tricky,

very tricky to sort out,

because it seems to be pretty low cost to fake stuff.

What are you gonna do, put an AI in jail

for trying to use a fake personhood proof?

I mean, okay, fine.

You’ll put a lot of AIs in jail,

but there’ll be more AIs, like exponentially more.

The cost of creating a bot is very low.

Unless there’s some kind of way to track accurately,

like you’re not allowed to create any program

without tying yourself to that program.

Like for any program that runs on the internet,

you'd be able to trace every single human

that was involved with that program.

Yeah, maybe you have to start declaring when,

you know, we have to start drawing those boundaries

and keeping track of, okay,

what are digital entities versus human entities?

And what is the ownership of human entities

and digital entities and something like that.

I don’t know, but I think I’m optimistic

that this is possible.

And in some sense,

we’re currently in like the worst time of it

because all these bots suddenly have become very capable,

but we don’t have the fences yet built up as a society.

But I think that doesn’t seem to me intractable.

It’s just something that we have to deal with.

It seems weird that the Twitter bot,

like really crappy Twitter bots are so numerous.

Like is it?

So I presume that the engineers at Twitter are very good.

So it seems like what I would infer from that

is it seems like a hard problem.

They’re probably catching, all right,

if I were to sort of steel man the case,

it’s a hard problem and there’s a huge cost

to false positive to removing a post by somebody

that’s not a bot.

That creates a very bad user experience.

So they’re very cautious about removing.

So maybe it’s,

and maybe the bots are really good at learning

what gets removed and not,

such that they can stay ahead of the removal process

very quickly.

My impression of it, honestly,

is there’s a lot of low hanging fruit.

I mean, it’s not subtle.

My impression of it, it’s not subtle.

But you have, yeah, that’s my impression as well.

But it feels like maybe you’re seeing

the tip of the iceberg.

Maybe the number of bots is in like the trillions.

And you have to like, just,

it’s a constant assault of bots and you,

I don’t know, you have to steel man the case

because the bots I’m seeing are pretty obvious.

I could write a few lines of code to catch these bots.

I mean, definitely there’s a lot of low hanging fruit,

but I will say, I agree that if you are

a sophisticated actor, you could probably create

a pretty good bot right now using tools like GPTs

because it’s a language model.

You can generate faces that look quite good now

and you can do this at scale.

And so I think, yeah, it’s quite plausible

and it’s going to be hard to defend.

There was a Google engineer that claimed

that the Lambda was sentient.

Do you think there’s any inkling of truth

to what he felt?

And more importantly, to me at least,

do you think language models will achieve sentience

or the illusion of sentience soonish-ish?

Yeah, to me it’s a little bit of a canary

in a coal mine kind of moment, honestly, a little bit,

because, so this engineer spoke to like a chatbot at Google

and became convinced that this bot is sentient.

Yeah, asked it some existential philosophical questions.

And it gave like reasonable answers

and looked real and so on.

So to me it’s a, he wasn’t sufficiently trying

to stress the system, I think,

and exposing the truth of it as it is today.

But I think this will be increasingly harder over time.

So, yeah, I think there'll be more and more

people like that over time as this gets better.

Like form an emotional connection to an AI chatbot.

Yeah, perfectly plausible in my mind.

I think these AIs are actually quite good

at human connection, human emotion.

A ton of text on the internet is about humans

and connection and love and so on.

So I think they have a very good understanding

in some sense of how people speak to each other about this.

And they’re very capable of creating

a lot of that kind of text.

There’s a lot of like sci-fi from fifties and sixties

that imagined AIs in a very different way.

They are calculating, cold, Vulcan-like machines.

That’s not what we’re getting today.

We’re getting pretty emotional AIs

that actually are very competent and capable

of generating plausible sounding text

with respect to all of these topics.

See, I’m really hopeful about AI systems

that are like companions that help you grow,

develop as a human being,

help you maximize long-term happiness.

But I’m also very worried about AI systems

that figure out from the internet

that humans get attracted to drama.

And so these would just be like shit-talking AIs.

They just constantly, did you hear?

Like they’ll do gossip.

They’ll try to plant seeds of suspicion

to other humans that you love and trust

and just kind of mess with people

because that’s going to get a lot of attention.

So drama, maximize drama on the path

to maximizing engagement.

And us humans will feed into that machine

and it’ll be a giant drama shit storm.

So I’m worried about that.

So it’s the objective function really defines

the way that human civilization progresses

with AIs in it.

I think right now, at least today,

they are not sort of,

it’s not correct to really think of them

as goal-seeking agents that want to do something.

They have no long-term memory or anything.

It’s literally, a good approximation of it is

you get 1,000 words

and you’re trying to predict 1,000 of them first

and then you continue feeding it in.

And you are free to prompt it in whatever way you want,

so in text.

So you say, okay, you are a psychologist

and you are very good

and you love humans.

And here’s a conversation between you and another human,

human colon something, you colon something.

And then it just continues the pattern.

And suddenly you’re having a conversation

with a fake psychologist who’s like trying to help you.
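
A minimal sketch of the prompting pattern just described; `generate` is a hypothetical stand-in for whatever text-completion model you have.

```python
# The only "programming" happening here is the text of the prompt itself.
def generate(prompt: str) -> str:
    # a real language model would continue the Human:/You: pattern set up below
    return " It sounds like that's been weighing on you. Tell me more."

prompt = (
    "You are a psychologist. You are very good and you love humans.\n"
    "Here is a conversation between you and another human.\n"
    "Human: I've been feeling overwhelmed lately.\n"
    "You:"
)
print(prompt + generate(prompt))
```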

And so it’s still kind of like in a realm of a tool.

It is a, people can prompt it in arbitrary ways

and it can create really incredible text,

but it doesn’t have long-term goals

over long periods of time.

It doesn’t try to,

so it doesn’t look that way right now.

Yeah, but you can do short-term goals

that have long-term effects.

So if my prompted short-term goal

is to get Andrej Karpathy to respond to me on Twitter,

that's the goal,

but the AI might figure out that talking shit to you

would be the best way to do that,

in a highly sophisticated, interesting way.

And then you build up a relationship

when you respond once.

And then it, like over time,

it gets to not be sophisticated

and just like, just talk shit.

And okay, maybe it won’t get to Andrej,

but it might get to another celebrity.

It might get to other big accounts

and then it’ll just, so with just that simple goal,

get them to respond.

Maximize the probability of actual response.

Yeah, I mean, you could prompt a powerful model like this

for its opinion about how to do

any possible thing you're interested in.

So they're kind of on track to become these oracles.

You could sort of think of it that way.

They are oracles, currently it’s just text,

but they will have calculators.

They will have access to Google search.

They will have all kinds of gadgets and gizmos.

They will be able to operate the internet

and find different information.

And yeah, in some sense,

that’s kind of like currently what it looks like

in terms of the development.

Do you think there’ll be an improvement eventually

over what Google is for access to human knowledge?

Like it’ll be a more effective search engine

to access human knowledge?

I think there’s definite scope

in building a better search engine today.

And I think Google, they have all the tools,

all the people, they have everything they need.

They have all the puzzle pieces.

They have people training transformers at scale.

They have all the data.

It’s just not obvious if they are capable

as an organization to innovate

on their search engine right now.

And if they don’t, someone else will.

There’s absolute scope for building

a significantly better search engine built on these tools.

It’s so interesting, a large company where the search,

there’s already an infrastructure.

It works as it brings out a lot of money.

So where structurally inside a company

is their motivation to pivot?

Yeah.

To say, we’re going to build a new search engine.

Yeah, that’s really hard.

So it’s usually going to come from a startup, right?

That’s, that would be, yeah.

Or some other more competent organization.

So I don’t know.

So currently, for example,

maybe Bing has another shot at it, you know, as an example.

No, Microsoft Edge, as we’re talking offline.

I mean, I definitely, it’s really interesting

because search engines used to be about,

okay, here’s some query.

Here’s webpages that look like the stuff that you have,

but you could just directly go to answer

and then have supporting evidence.

And these models basically, they’ve read all the texts

and they’ve read all the webpages.

And so sometimes when you find yourself

going over the search results

and sort of getting a sense of the average answer

to whatever you're interested in,

that just directly comes out.

You don’t have to do that work.

So they’re kind of like, yeah,

I think they have a way of distilling all that knowledge

into like some level of insight, basically.

Do you think of prompting as a kind of teaching and learning

like this whole process, like another layer,

you know, because maybe that’s what humans are,

where you have that background model

and then the world is prompting you.

Yeah, exactly.

I think the way we are programming these computers now,

like GPTs is converging to how you program humans.

I mean, how do I program humans via prompt?

I go to people and I prompt them to do things.

I prompt them for information.

And so natural language prompt is how we program humans.

And we’re starting to program computers

directly in that interface.

It’s like pretty remarkable, honestly.

So you’ve spoken a lot about the idea of software 2.0.

All good ideas become like cliches so quickly,

like the terms, it’s kind of hilarious.

It’s like, I think Eminem once said that like,

if he gets annoyed by a song he’s written very quickly,

that means it’s going to be a big hit

because it’s too catchy.

But can you describe this idea

and how you’re thinking about it

has evolved over the months and years since you coined it?

Yeah, so I had a blog post on software 2.0,

I think several years ago now.

And the reason I wrote that post is because

I kind of saw something remarkable happening

in like software development

and how a lot of code was being transitioned

to be written not in sort of like C++ and so on,

but it’s written in the weights of a neural net.

Basically just saying that neural nets

are taking over software, the realm of software

and taking more and more and more tasks.

And at the time,

I think not many people understood this deeply enough

that this is a big deal, it’s a big transition.

Neural networks were seen

as one of multiple classification algorithms

you might use for your dataset problem on Kaggle.

Like this is not that,

this is a change in how we program computers.

And I saw neural nets as this is going to take over.

The way we program computers is going to change.

It’s not going to be people writing a software in C++

or something like that

directly programming the software.

It’s going to be accumulating training sets and datasets

and crafting these objectives

by which you train these neural nets.

And at some point,

there’s going to be a compilation process

from the datasets and the objective

and the architecture specification into the binary,

which is really just the neural net weights

and the forward pass of the neural net.

And then you can deploy that binary.

And so I was talking about that sort of transition

and that’s what the post is about.

And I saw this sort of play out in a lot of fields,

Autopilot being one of them,

but also just a simple image classification.

People thought originally in the 80s and so on

that they would write the algorithm

for detecting a dog in an image.

And they had all these ideas about how the brain does it.

And first we detect corners and then we detect lines

and then we stitch them up.

And they were like really going at it.

They were like thinking about

how they’re going to write the algorithm.

And this is not the way you build it.

And there was a smooth transition where,

okay, first we thought we were going to build everything.

Then we were building the features.

So like HOG features and things like that,

that detect these little statistical patterns

from image patches.

And then there was a little bit of learning on top of it,

like a support vector machine or a binary classifier

for cat versus dog in images, on top of the features.

So we wrote the features,

but we trained the last layer, sort of the classifier.
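
For concreteness, a toy sketch of that intermediate stage on made-up data: the features are hand-written (HOG descriptors), and only the final linear classifier is learned. The data here is invented purely so the example runs.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Synthetic stand-ins for images: class 1 gets a bright vertical bar,
# class 0 is pure noise. Purely illustrative data.
def make_image(label):
    img = rng.random((64, 64))
    if label == 1:
        img[:, 28:36] += 2.0
    return img

labels = rng.integers(0, 2, size=200)
images = [make_image(l) for l in labels]

# Software 1.0 part: the features are written by hand (HOG descriptors)...
features = [hog(img, orientations=9, pixels_per_cell=(8, 8),
                cells_per_block=(2, 2)) for img in images]

# ...and only the last layer, a linear classifier, is learned from data.
clf = LinearSVC().fit(features, labels)
print(clf.score(features, labels))
```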

And then people are like,

actually let’s not even design the features

because we can’t.

Honestly, we’re not very good at it.

So let’s also learn the features.

And then you end up with basically a convolutional neural net

where you’re learning most of it.

You’re just specifying the architecture

and the architecture has tons of fill in the blanks,

which is all the knobs.

And you let the optimization write most of it.

And so this transition is happening

across the industry everywhere.

And suddenly we end up with a ton of code

that is written in neural net weights.

And I was just pointing out

that the analogy is actually pretty strong.

And we have a lot of developer environments

for software 1.0.

Like we have IDEs, how you work with code,

how you debug code, how do you run code?

How do you maintain code?

We have GitHub.

So I was trying to make those analogies in the new realm.

Like what is the GitHub of software 2.0?

Turns out it’s something

that looks like Hugging Face right now, you know?

And so I think some people took it seriously

and built cool companies.

And many people originally attacked the post.

It actually was not well received when I wrote it.

And I think maybe it has something to do with the title,

but the post was not well received.

And I think more people sort of have been coming around

to it over time.

Yeah, so you were the director of AI at Tesla,

where I think this idea was really implemented at scale,

which is how you have engineering teams doing software 2.0.

So can you sort of linger on that idea of,

I think we’re in the really early stages

of everything you just said, which is like GitHub IDEs.

Like how do we build engineering teams

that work in software 2.0 systems?

And the data collection and the data annotation,

which is all part of that software 2.0,

like what do you think is the task

of programming a software 2.0?

Is it debugging in the space of hyperparameters,

or is it also debugging in the space of data?

Yeah, the way by which you program the computer

and influence its algorithm is not by writing

the commands yourself.

You’re changing mostly the dataset.

You’re changing the loss functions

of like what the neural net is trying to do,

how it’s trying to predict things.

But yeah, basically the datasets

and the architectures of the neural net.

And so in the case of the autopilot,

a lot of the datasets had to do with, for example,

detection of objects and lane line markings

and traffic lights and so on.

So you accumulate massive datasets of,

here’s an example, here’s the desired label.

And then here’s roughly what the algorithm should look like

and that’s a convolutional neural net.

So the specification of the architecture is like a hint

as to what the algorithm should roughly look like.

And then the fill in the blanks process of optimization

is the training process.

And then you take your neural net that was trained,

it gives all the right answers on your dataset

and you deploy it.

So there’s, in that case,

perhaps at all machine learning cases,

there’s a lot of tasks.

So is coming up, formulating a task

like for a multi-headed neural network,

is formulating a task part of the programming?

Yeah, very much so.

How do you break down a problem into a set of tasks?

Yeah.

I mean, on a high level, I would say,

if you look at the software running in the autopilot,

I gave a number of talks on this topic.

I would say originally a lot of it was written

in software 1.0.

Imagine lots of C++, right?

And then gradually, there was a tiny neural net

that was, for example, predicting given a single image,

is there like a traffic light or not?

Or is there a lane line marking or not?

And this neural net didn’t have too much to do

in the scope of the software.

It was making tiny predictions on an individual little image.

And then the rest of the system stitched it up.

So, okay, we’re actually,

we don’t have just a single camera, we have eight cameras.

We actually have eight cameras over time.

And so what do you do with these predictions?

How do you put them together?

How do you do the fusion of all that information?

And how do you act on it?

All of that was written by humans in C++.

And then we decided, okay,

we don’t actually want to do all of that fusion

in C++ code because we’re actually not good enough

to write that algorithm.

We want the neural nets to write the algorithm.

And we want to port all of that software

into the 2.0 stack.

And so then we actually had neural nets

that now take all the eight camera images simultaneously

and make predictions for all of that.

So, and actually they don’t make predictions

in the space of images.

They now make predictions directly in 3D.

And actually they don’t in three dimensions around the car.

And now actually we don’t manually fuse the predictions

in 3D over time.

We don’t trust ourselves to write that tracker.

So actually we give the neural net the information over time.

So it takes these videos now and makes those predictions.

And so you’re sort of just like putting more

and more power into the neural net, more processing.

And at the end of it, the eventual sort of goal

is to have most of the software potentially be

in the 2.0 land because it works significantly better.

Humans are just not very good

at writing software basically.
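
To make the "all eight cameras into one network" idea concrete, here is a hedged toy sketch in PyTorch; the sizes and structure are invented for illustration and are not the Autopilot architecture.

```python
import torch
from torch import nn

class MultiCamNet(nn.Module):
    """Toy sketch: a shared per-camera backbone, then features from all eight
    cameras fused into one prediction, instead of per-image outputs that get
    stitched together by hand-written C++."""
    def __init__(self, n_cams=8, feat=32, out_dim=64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, feat, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fuse = nn.Sequential(
            nn.Linear(n_cams * feat, 128), nn.ReLU(),
            nn.Linear(128, out_dim),        # e.g. a fused, vector-space output
        )

    def forward(self, cams):                # cams: (batch, 8, 3, H, W)
        b, n, c, h, w = cams.shape
        per_cam = self.backbone(cams.view(b * n, c, h, w)).view(b, n, -1)
        return self.fuse(per_cam.flatten(1))

out = MultiCamNet()(torch.randn(2, 8, 3, 96, 96))   # one fused prediction per sample
print(out.shape)   # torch.Size([2, 64])
```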

So the prediction is happening in this like 4D land.

Yeah.

With three dimensional world over time.

Yeah.

How do you do annotation in that world?

What have you, so data annotation,

whether it’s self-supervised or manual by humans

is a big part of this software 2.0 world.

Right.

I would say, by far in the industry,

if you're talking about the industry

and what is the technology that we have available?

Everything is supervised learning.

So you need data sets of input, desired output,

and you need lots of it.

And there are three properties of it that you need.

You need it to be very large.

You need it to be accurate, no mistakes.

And you need it to be diverse.

You don’t want to just have a lot

of correct examples of one thing.

You need to really cover the space of possibility

as much as you can.

And the more you can cover the space of possible inputs,

the better the algorithm will work at the end.

Now, once you have really good data sets

that you’re collecting, curating and cleaning,

you can train your neural net on top of that.

So a lot of the work goes into cleaning those data sets.

Now, as you pointed out,

the question is how do you achieve a ton of that.

If you want to basically predict in 3D,

you need data in 3D to back that up.

So in this video, we have eight videos

coming from all the cameras of the system.

And this is what they saw.

And this is the truth of what actually was around.

There was this car, there was this car, this car.

These are the lane line markings.

This is the geometry of the road.

There’s traffic light in this three-dimensional position.

You need the ground truth.

And so the big question that the team was solving,

of course, is how do you arrive at that ground truth?

Because once you have a million of it

and it’s large, clean and diverse,

then training a neural net on it works extremely well

and you can ship that into the car.

And so there’s many mechanisms

by which we collected that training data.

You can always go for a human annotation.

You can go for a simulation as a source of ground truth.

You can also go for what we call the offline tracker

that we’ve spoken about at the AI day and so on,

which is basically an automatic reconstruction process

for taking those videos and recovering

the three-dimensional sort of reality

of what was around that car.

So basically think of doing

like a three-dimensional reconstruction

as an offline thing, and then understanding that,

okay, there’s 10 seconds of video.

This is what we saw.

And therefore, here’s all the lane lines, cars, and so on.

And then once you have that annotation,

you can train neural nets to imitate it.

And how difficult is the reconstruction?

It’s difficult, but it can be done.

So there’s overlap between the cameras

and you do the reconstruction, and,

perhaps if there's any inaccuracy,

that's caught in the annotation step.

Yes, the nice thing about the annotation

is that it is fully offline.

You have infinite time, you have a chunk of one minute,

and you’re trying to just offline

in a supercomputer somewhere,

figure out where were the positions of all the cars,

all the people, and you have your full one minute of video

from all the angles,

and you can run all the neural nets you want,

and they can be very efficient, massive neural nets.

There can be neural nets that can’t even run in the car

later at test time.

So they can be even more powerful neural nets

than what you can eventually deploy.

So you can do anything you want,

three-dimensional reconstruction, neural nets,

anything you want just to recover that truth.

And then you supervise that truth.
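
A hedged toy sketch of that pattern: an arbitrarily expensive offline model stands in for the reconstruction that recovers the truth, and a small network that fits the car is trained to imitate its outputs. All shapes and sizes are made up.

```python
import torch
from torch import nn
from torch.nn import functional as F

# Offline "annotator": no latency constraints, can be as big as you like.
offline_teacher = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(),
                                nn.Linear(2048, 2048), nn.ReLU(),
                                nn.Linear(2048, 16))   # too big for the car
# Online network: has to fit the in-car compute budget.
online_student = nn.Sequential(nn.Linear(512, 64), nn.ReLU(),
                               nn.Linear(64, 16))

clip_features = torch.randn(256, 512)                  # stand-in for a video clip
with torch.no_grad():
    pseudo_truth = offline_teacher(clip_features)      # the offline "annotation"

opt = torch.optim.Adam(online_student.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    # the deployable net is supervised to imitate the recovered truth
    F.mse_loss(online_student(clip_features), pseudo_truth).backward()
    opt.step()
```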

What have you learned, you said no mistakes,

about humans doing annotation?

Because I assume humans are,

there’s like a range of things they’re good at

in terms of clicking stuff on screen.

Isn’t that, how interesting is that to you

of a problem of designing an annotator

where humans are accurate, enjoy it?

Like what are even the metrics?

Are efficient, are productive, all that kind of stuff?

Yeah, so I grew the annotation team at Tesla

from basically zero to 1,000

while I was there.

That was really interesting.

You know, my background is a PhD student researcher.

So growing that kind of an organization was pretty crazy.

But yeah, I think it’s extremely interesting

and part of the design process very much

behind the autopilot as to where you use humans.

Humans are very good at certain kinds of annotations.

They’re very good, for example,

at two-dimensional annotations of images.

They’re not good at annotating cars over time

in three-dimensional space, very, very hard.

And so that’s why we were very careful

to design the tasks that are easy to do for humans

versus things that should be left to the offline tracker.

Like maybe the computer will do all the triangulation

and 3D reconstruction, but the human will say

exactly these pixels of the image are a car.

Exactly these pixels are a human.

And so co-designing the data annotation pipeline

was very much bread and butter was what I was doing daily.

Do you think there’s still a lot of open problems

in that space?

Just in general, annotation where the stuff

the machines are good at, machines do,

and the humans do what they’re good at.

And there’s maybe some iterative process.

Right.

I think to a very large extent,

we went through a number of iterations

and we learned a ton about how to create these data sets.

I’m not seeing big open problems.

Like originally when I joined, I was like,

I was really not sure how this would turn out.

But by the time I left, I was much more secure

and actually we sort of understood the philosophy

of how to create these data sets.

And I was pretty comfortable with

where that was at the time.

So what are strengths and limitations of cameras

for the driving task?

In your understanding, when you formulate the driving task

as a vision task with eight cameras,

you’ve seen that the entire, you know,

most of the history of the computer vision field

when it has to do with neural networks,

just if you step back, what are the strengths

and limitations of pixels, of using pixels to drive?

Yeah, pixels I think are a beautiful sensory,

beautiful sensor, I would say.

The thing is like cameras are very, very cheap

and they provide a ton of information, ton of bits.

So it’s a extremely cheap sensor for a ton of bits

and each one of these bits is a constraint

on the state of the world.

And so you get lots of megapixel images, very cheap,

and it just gives you all these constraints

for understanding what’s actually out there in the world.

So vision is probably the highest bandwidth sensor.

It’s a very high bandwidth sensor.

And-

I love that pixels is a constraint on the world.

It’s this highly complex,

high bandwidth constraint on the world,

on the state of the world.

That’s fascinating.

And it’s not just that, but again,

this real importance of it’s the sensor that humans use.

Therefore, everything is designed for that sensor.

The text, the writing, the flashing signs,

everything is designed for vision.

And so you just find it everywhere.

And so that’s why that is the interface you want to be in,

talking again about these universal interfaces.

And that’s where we actually want to measure the world

as well and then develop software for that sensor.

But there’s other constraints on the state of the world

that humans use to understand the world.

I mean, vision ultimately is the main one,

but we’re like referencing our understanding

of human behavior and some common sense physics

that could be inferred from vision,

from a perception perspective.

But it feels like we’re using some kind of reasoning

to predict the world, not just the pixels.

I mean, you have a powerful prior

for how the world evolves over time, et cetera.

So it’s not just about the likelihood term

coming up from the data itself,

telling you about what you are observing,

but also the prior term of where are the likely things

to see and how do they likely move and so on.
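
One standard way to write down the distinction being made here, as an aside:

$$p(\text{world state} \mid \text{pixels}) \;\propto\; \underbrace{p(\text{pixels} \mid \text{world state})}_{\text{likelihood from the data}} \;\times\; \underbrace{p(\text{world state})}_{\text{prior over likely scenes and motion}}$$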

And the question is how complex is the range

of possibilities that might happen in the driving task?

That’s still, is that to you still an open problem

of how difficult is driving, like philosophically speaking?

All the time you worked on driving,

do you understand how hard driving is?

Yeah, driving is really hard

because it has to do with the predictions

of all these other agents and the theory of mind

and what they’re gonna do.

And are they looking at you, where are they looking,

what are they thinking?

There's a lot that goes on there at the full tail

of the expansion of the nines

that we have to be comfortable with eventually.

The final problems are of that form.

I don’t think those are the problems that are very common.

I think eventually they’re important,

but it’s like really in the tail end.

In the tail end, the rare edge cases.

From the vision perspective,

what are the toughest parts

of the vision problem of driving?

Well, basically the sensor is extremely powerful,

but you still need to process that information.

And so going from brightnesses of these pixel values

to, hey, here's the three-dimensional world,

is extremely hard.

And that’s what the neural networks are fundamentally doing.

And so the difficulty really is in just doing

an extremely good job of engineering

the entire pipeline, the entire data engine,

having the capacity to train these neural nets,

having the ability to evaluate the system

and iterate on it.

So I would say just doing this in production at scale

is like the hard part.

It’s an execution problem.

So the data engine, but also the sort of deployment

of the system such that it has low latency performance.

So it has to do all these steps.

Yeah, for the neural net specifically,

just making sure everything fits into the chip on the car.

And you have a finite budget of flops that you can perform

and memory bandwidth and other constraints.

And you have to make sure it flies

and you can squeeze in as much compute

as you can into that tiny chip.
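
As a hedged back-of-the-envelope illustration of that budgeting exercise (all numbers invented, not Tesla's):

```python
# Check whether a network fits a per-frame compute budget.
chip_tops = 36                     # assumed usable trillions of ops per second
fps = 36                           # frames per second the net must keep up with
ops_per_frame_budget = chip_tops * 1e12 / fps

def conv_ops(h, w, c_in, c_out, k=3):
    # multiply-accumulates for one k x k convolution over an h x w feature map
    return 2 * h * w * c_in * c_out * k * k

layers = [(480, 640, 3, 32), (240, 320, 32, 64), (120, 160, 64, 128)]
net_ops = sum(conv_ops(*layer) for layer in layers)
print(f"{net_ops:.2e} ops/frame vs budget {ops_per_frame_budget:.2e}:",
      net_ops <= ops_per_frame_budget)
```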

What have you learned from that process?

Because maybe that’s one of the bigger,

like new things coming from a research background

where there’s a system that has to run

under heavily constrained resources,

has to run really fast.

What kind of insights have you learned from that?

Yeah, I’m not sure if there’s too many insights.

You’re trying to create a neural net that will fit

in what you have available.

And you’re always trying to optimize it.

And we talked a lot about it on the AI day

and basically the triple backflips that the team is doing

to make sure it all fits and utilizes the engine.

So I think it’s extremely good engineering.

And then there’s all kinds of little insights

peppered in on how to do it properly.

Let’s actually zoom out,

because I don’t think we talked about the data engine,

the entirety of the layout of this idea

that I think is just beautiful with humans in the loop.

Can you describe the data engine?

Yeah, the data engine is what I call

the almost biological-feeling process

by which you perfect the training sets

for these neural networks.

So because most of the programming now

is at the level of these data sets

and making sure they're large, diverse, and clean,

basically you have a data set that you think is good.

You train your neural net, you deploy it,

and then you observe how well it’s performing.

And you’re trying to always increase

the quality of your data set.

So you’re trying to catch scenarios

basically that are basically rare.

And it is in these scenarios

that neural nets will typically struggle in

because they weren’t told what to do

in those rare cases in the data set.

But now you can close the loop

because if you can now collect all those at scale,

you can then feed them back into

the reconstruction process I described

and reconstruct the truth in those cases

and add it to the data set.

And so the whole thing ends up being

like a staircase of improvement

of perfecting your training set.

And you have to go through deployments

so that you can mine the parts

that are not yet represented well in the data set.

So your data set is basically imperfect.

It needs to be diverse.

It has pockets that are missing

and you need to pad out the pockets.

You can sort of think of it that way in the data.
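
A toy rendering of that loop, with every step stubbed out, just to make the shape of the data engine explicit:

```python
# Train, deploy, mine the rare scenarios the net struggled on, auto-label
# them offline, and fold them back into the training set.
def train(dataset):
    return {"trained_on": len(dataset)}                 # placeholder "model"

def deploy_and_mine_failures(model):
    # in reality: triggers in the fleet flag rare scenarios the net struggles on
    return [f"rare_clip_{model['trained_on']}_{i}" for i in range(3)]

def offline_reconstruct(clip):
    # placeholder for the offline tracker / 3D reconstruction recovering truth
    return (clip, "recovered_ground_truth")

dataset = [(f"common_clip_{i}", "labels") for i in range(10)]
for round_ in range(3):
    model = train(dataset)
    dataset += [offline_reconstruct(c) for c in deploy_and_mine_failures(model)]

print(len(dataset))   # the training set grows exactly where the model was weak
```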

What role do humans play in this?

So what’s this biological system,

like a human body is made up of cells.

What role, like how do you optimize the human system?

The multiple engineers collaborating,

figuring out what to focus on,

what to contribute, which task to optimize

in this neural network.

Who’s in charge of figuring out

which task needs more data?

Can you speak to the hyperparameters, the human system?

It really just comes down to extremely good execution

from an engineering team who knows what they’re doing.

They understand intuitively the philosophical insights

underlying the data engine

and the process by which the system improves

and how to, again, like delegate the strategy

of the data collection and how that works.

And then just making sure it’s all extremely well executed.

And that’s where most of the work is,

is not even the philosophizing or the research

or the ideas of it.

It’s just extremely good execution.

It’s so hard when you’re dealing with data at that scale.

So your role in the data engine, executing well on it,

is difficult and extremely important.

Is there a priority of like a vision board

of saying like, we really need to get better at stoplights?

Like the prioritization of tasks?

Is that essentially, and that comes from the data?

That comes to a very large extent

from what we are trying to achieve in the product roadmap,

the release we're trying to get out,

and the feedback from the QA team

about where the system is struggling or not,

the things that we're trying to improve.

And the QA team gives some signal,

some information in aggregate

about the performance of the system in various conditions.

That’s right.

And then of course, all of us drive it

and we can also see it.

It’s really nice to work with a system

that you can also experience yourself

and you know, it drives you home.

Is there some insight you can draw

from your individual experience

that you just can’t quite get

from an aggregate statistical analysis of data?

Yeah.

It’s so weird, right?

Yes.

It’s not scientific in a sense

because you’re just one anecdotal sample.

Yeah, I think there’s a ton of,

it’s a source of truth.

It’s your interaction with the system

and you can see it, you can play with it,

you can perturb it, you can get a sense of it,

you have an intuition for it.

I think numbers just like have a way of,

numbers and plots and graphs are, you know, much harder.

It hides a lot of-

It’s like, if you train a language model,

it’s a really powerful way is by you interacting with it.

Yeah, 100%.

To try to build up an intuition.

Yeah, I think like Elon also,

like he always wanted to drive the system himself.

He drives a lot and I wanna say almost daily.

So he also sees this as a source of truth,

you driving the system and it performing and yeah.

So what do you think?

Tough questions here.

So Tesla last year removed radar from the sensor suite

and now just announced that it’s gonna remove

ultrasonic sensors relying solely on vision,

so camera only.

Does that make the perception problem harder or easier?

I would almost reframe the question in some way.

So the thing is basically,

you would think that additional sensors-

By the way, can I just interrupt?

Go ahead.

I wonder if a language model will ever do that

if you prompt it.

Let me reframe your question.

That would be epic.

This is the wrong prompt, sorry.

Yeah, it’s like a little bit of a wrong question

because basically you would think that these sensors

are an asset to you,

but if you fully consider the entire product

in its entirety,

these sensors are actually potentially a liability

because these sensors aren’t free.

They don’t just appear on your car.

You need, suddenly you need to have an entire supply chain.

You have people procuring it.

There can be problems with them.

They may need replacement.

They are part of the manufacturing process.

They can hold back the line in production.

You need to source them, you need to maintain them.

You have to have teams that write the firmware,

all of it.

And then you also have to incorporate them,

fuse them into the system in some way.

And so it actually like bloats a lot of it.

And I think Elon is really good at simplify, simplify.

Best part is no part.

And he always tries to throw away things

that are not essential

because he understands the entropy

in organizations and in approach.

And I think in this case,

the cost is high and you’re not potentially seeing it

if you’re just a computer vision engineer.

And I’m just trying to improve my network

and is it more useful or less useful?

How useful is it?

And the thing is,

once you consider the full cost of a sensor,

it actually is potentially a liability

and you need to be really sure

that it’s giving you extremely useful information.

In this case, we looked at using it or not using it

and the Delta was not massive.

And so it’s not useful.

Is it also bloat in the data engine,

like having more sensors is a distraction?

And these sensors, they can change over time.

For example, you can have one type of say radar,

you can have other type of radar.

They change over time.

Now you suddenly need to worry about it.

Now suddenly you have a column in your SQLite

telling you, oh, what sensor type was it?

And they all have different distributions.

And then they contribute noise and entropy into everything.

And they bloat stuff.

And also organizationally has been really fascinating to me

that it can be very distracting.

If you only wanna get to work as vision,

all the resources are on it

and you’re building out a data engine

and you’re actually making forward progress

because that is the sensor with the most bandwidth,

the most constraints on the world.

And you’re investing fully into that

and you can make that extremely good.

If you’re only a finite amount of sort of spend

of focus across different facets of the system.

And this kind of reminds me of Rich Sutton's The Bitter Lesson,

that simplifying the system

just seems to always be the right solution in the long run.

Now, of course, you don't know what the long run is.

In that case, it was for RL,

but it seems to apply generally

across all systems that do computation.

So what do you think about the LIDAR as a crutch debate?

The battle between point clouds and pixels?

Yeah, I think this debate

is always like slightly confusing to me

because it seems like the actual debate

should be about like, do you have the fleet or not?

That’s like the really important thing

about whether you can achieve a really good functioning

of an AI system at this scale.

So data collection systems.

Yeah, do you have a fleet or not

is significantly more important

whether you have LIDAR or not.

It’s just another sensor.

And yeah, I think, similar to the radar discussion,

I basically don't think it offers

much extra information.

It’s extremely costly.

It has all kinds of problems.

You have to worry about it.

You have to calibrate it, et cetera.

It creates bloat and entropy.

You have to be really sure that you need this sensor.

In this case, I basically don’t think you need it.

And I think, honestly, I will make a stronger statement.

I think the others, some of the other companies

that are using it are probably going to drop it.

Yeah, so you have to consider the sensor in the full picture,

in considering, can you build a big fleet

that collects a lot of data?

And can you integrate that sensor with that data

and that sensor into a data engine

that’s able to quickly find different parts of the data

that then continuously improves

whatever the model that you’re using?

Yeah, another way to look at it is like,

vision is necessary in the sense that

the world we drive in is designed for human visual consumption.

So you need vision.

It’s necessary.

And then also it is sufficient

because it has all the information

that you need for driving.

And humans obviously use vision to drive.

So it’s both necessary and sufficient.

So you want to focus resources

and you have to be really sure

if you’re going to bring in other sensors,

you could add sensors to infinity.

At some point you need to draw the line.

And I think in this case,

you have to really consider the full cost of any one sensor

that you’re adopting.

And do you really need it?

And I think the answer in this case is no.

So what do you think about the idea

that the other companies are forming high resolution maps

and constraining heavily the geographic regions

in which they operate?

Is that approach, in your view,

not going to scale over time

to the entirety of the United States?

I think as you mentioned,

like they pre-map all the environments

and they need to refresh the map

and they have a perfect centimeter level accuracy map

of everywhere they’re going to drive.

It’s crazy.

How are you going to,

when we’re talking about the autonomy

actually changing the world,

we’re talking about the deployment

on the global scale of autonomous systems

for transportation.

And if you need to maintain a centimeter accurate map

for earth or like for many cities and keep them updated,

it’s a huge dependency that you’re taking on,

huge dependency.

It’s a massive, massive dependency.

And now you need to ask yourself,

do you really need it?

And humans don’t need it, right?

So it’s very useful to have a low level map of like,

okay, the connectivity of your road.

You know that there’s a fork coming up.

When you drive an environment,

you sort of have that high level understanding.

It’s like a small Google map

and Tesla uses Google map,

like similar kind of resolution information in the system,

but it will not pre-map environments

to centimeter level accuracy.

It’s a crutch.

It’s a distraction.

It costs entropy and it diffuses the team.

It dilutes the team.

And you’re not focusing on what’s actually necessary,

which is the computer vision problem.

What did you learn about machine learning,

about engineering, about life,

about yourself as one human being

from working with Elon Musk?

I think the most I’ve learned is about

how to sort of run organizations efficiently

and how to create efficient organizations

and how to fight entropy in an organization.

So human engineering in the fight against entropy.

Yeah.

I think Elon is a very efficient warrior

in the fight against entropy in organizations.

What does entropy in an organization look like exactly?

It’s process.

It’s process and-

Inefficiencies in the form of meetings

and that kind of stuff.

Yeah, meetings.

He hates meetings.

He keeps telling people to skip meetings

if they’re not useful.

He basically runs the world’s biggest startups,

I would say.

Tesla, SpaceX are the world’s biggest startups.

Tesla actually has multiple startups.

I think it’s better to look at it that way.

And so I think he’s extremely good at that.

And yeah, he has a very good intuition

for streamlining processes, making everything efficient.

Best part is no part, simplifying, focusing,

and just kind of removing barriers,

moving very quickly, making big moves.

All of this is a very startup-y sort of seeming things,

but at scale.

So strong drive to simplify.

From your perspective, I mean,

that also probably applies to just designing systems

and machine learning and otherwise,

like simplify, simplify.

Yes.

What do you think is the secret to maintaining

the startup culture in a company that grows?

Is there, can you introspect that?

I do think you need someone in a powerful position

with a big hammer, like Elon,

who’s like the cheerleader for that idea

and ruthlessly pursues it.

If no one has a big enough hammer,

everything turns into committees,

democracy within the company,

process, talking to stakeholders,

decision-making, just everything just crumbles.

If you have a big person who is also really smart

and has a big hammer, things move quickly.

So you said your favorite scene in Interstellar

is the intense docking scene with the AI and Cooper talking,

saying, Cooper, what are you doing?

Docking, it’s not possible.

No, it’s necessary.

Such a good line.

By the way, just so many questions there.

Why is an AI in that scene,

which presumably is supposed to be able to compute

a lot more than the human,

saying it's not optimal?

Why the human, I mean, that’s a movie,

but shouldn’t the AI know much better than the human?

Anyway, what do you think is the value

of setting seemingly impossible goals?

So, like, it seems like something that you have taken on,

that Elon espouses,

where the initial intuition of the community

might say this is very difficult,

and then you take it on anyway with a crazy deadline.

You just, from a human engineering perspective,

have you seen the value of that?

I wouldn’t say that setting impossible goals exactly

is a good idea, but I think setting very ambitious goals

is a good idea.

I think there’s what I call sublinear scaling of difficulty,

which means that 10x problems are not 10x hard.

Usually a 10x harder problem is like two or three x harder

to execute on, because if you want to, like,

if you want to improve a system by 10%,

it costs some amount of work.

And if you want to 10x improve the system,

it doesn't cost 100x the amount of work.

And it’s because you fundamentally change the approach.

And if you start with that constraint,

then some approaches are obviously dumb

and not going to work.

And it forces you to reevaluate.

And I think it’s a very interesting way

of approaching problem solving.

But it requires a weird kind of thinking.

It’s just going back to your like PhD days.

It’s like, how do you think which ideas

in the machine learning community are solvable?

Yes.

It requires, what is that?

I mean, there’s the cliche of first principles thinking,

but like, it requires to basically ignore

what the community is saying.

Because doesn’t a community in science

usually draw lines of what is and isn’t possible?

Right.

And like, it’s very hard to break out of that

without going crazy.

Yeah.

I mean, I think a good example here is,

you know, the deep learning revolution in some sense,

because you could be in computer vision at that time,

when during the deep learning sort of revolution

of 2012 and so on.

You could be improving a computer vision stack by 10%,

or we can just be saying, actually, all of this is useless.

And how do I do 10X better computer vision?

Well, it’s not probably by tuning a hog feature detector.

I need a different approach.

I need something that is scalable,

going back to Richard Sutton’s,

and understanding sort of like the philosophy

of the bitter lesson.

And then being like,

actually I need much more scalable system,

like a neural network that in principle works,

and then having some deep believers

that can actually execute on that mission and make it work.

So that’s the 10X solution.

Yeah.

What do you think is the timeline

to solve the problem of autonomous driving?

That’s still in part an open question.

Yeah, I think the tough thing

with timelines of self-driving obviously

is that no one has created self-driving.

Yeah.

So it’s not like,

what do you think is the timeline to build this bridge?

Well, we’ve built million bridges before.

Here’s how long that takes.

No one has built autonomy.

It’s not obvious.

Some parts turn out to be much easier than others.

So it’s really hard to forecast.

You do your best based on trend lines and so on,

and based on intuition,

but that’s why fundamentally

it’s just really hard to forecast this.

No one has-

So even still like being inside of it,

it’s hard to do.

Yes.

Some things turn out to be much harder

and some things turn out to be much easier.

Do you try to avoid making forecasts?

Because like Elon doesn’t avoid them, right?

And heads of car companies in the past

have not avoided it either.

Ford and other places have made predictions

that we’re gonna solve level four driving

by like 2020, 2021, whatever.

And now they’re all kind of backtracking that prediction.

As an AI person,

do you for yourself privately make predictions

or do they get in the way of like your actual ability

to think about a thing?

Yeah, I would say like,

what’s easy to say is that this problem is tractable

and that’s an easy prediction to make.

It’s tractable, it’s going to work.

Yes, it’s just really hard.

Some things turn out to be harder

and some things turn out to be easier.

So, but it definitely feels tractable

and it feels like at least the team at Tesla,

which is what I saw internally,

is definitely on track to that.

How do you form a strong representation

that allows you to make a prediction about tractability?

So like you’re the leader of a lot of humans,

you have to kind of say, this is actually possible.

Like how do you build up that intuition?

It doesn’t have to be even driving,

it could be other tasks.

It could be,

what difficult tasks did you work on in your life?

I mean, classification, achieving certain,

just on ImageNet, certain level

of superhuman level performance.

Yeah, expert intuition.

It’s just intuition, it’s belief.

So just like thinking about it long enough,

like studying, looking at sample data,

like you said, driving.

My intuition is really flawed on this.

Like I don’t have a good intuition about tractability.

It could be anything, it could be solvable.

Like the driving task could be simplified

into something quite trivial.

Like the solution to the problem would be quite trivial.

And at scale, more and more cars driving perfectly

might make the problem much easier.

The more cars you have driving,

like people learn how to drive correctly,

not correctly, but in a way that’s more optimal

for a heterogeneous system of autonomous

and semi-autonomous and manually driven cars,

that could change stuff.

Then again, also I’ve spent a ridiculous number of hours

just staring at pedestrians crossing streets,

thinking about humans.

And it feels like the way we use our eye contact,

it sends really strong signals.

And there’s certain quirks and edge cases of behavior.

And of course, a lot of the fatalities that happen

have to do with drunk driving,

and both on the pedestrian side and the driver’s side.

So there’s that problem of driving at night

and all that kind of.

So I wonder, it’s like the space of possible solution

to autonomous driving includes so many human factor issues

that it’s almost impossible to predict.

There could be super clean, nice solutions.

Yeah, I would say definitely like to use a game analogy,

there’s some fog of war,

but you definitely also see the frontier of improvement

and you can measure historically

how much you’ve made progress.

And I think, for example, at least what I’ve seen

in roughly five years at Tesla,

when I joined, it barely kept lane on the highway.

I think going up from Palo Alto to SF

was like three or four interventions.

Anytime the road would do anything geometrically

or turn too much, it would just like not work.

And so going from that

to like a pretty competent system in five years

and seeing what happens also under the hood

and what the scale of which the team is operating now

with respect to data and compute and everything else

is just a massive progress.

So-

So it’s you’re climbing a mountain and it’s fog,

but you’re making a lot of progress.

Fog, you’re making progress

and you see what the next directions are.

And you’re looking at some of the remaining challenges

and they’re not like, they’re not perturbing you

and they’re not changing your philosophy

and you’re not contorting yourself.

You’re like, actually,

these are the things that we still need to do.

Yeah, the fundamental components of solving the problem

seem to be there from the data engine to the compute,

to the compute on the car, to the compute for the training,

all that kind of stuff.

So you’ve done, over the years you’ve been at Tesla,

you’ve done a lot of amazing breakthrough ideas

and engineering, all of it,

from the data engine to the human side, all of it.

Can you speak to why you chose to leave Tesla?

Basically, as I described,

I think over time during those five years,

I've kind of gotten myself

into a little bit of a managerial position.

Most of my days were meetings and growing the organization

and making decisions about sort of high-level

strategic decisions about the team

and what it should be working on and so on.

And it’s kind of like a corporate executive role

and I can do it.

I think I’m okay at it,

but it’s not like fundamentally what I enjoy.

And so I think when I joined,

there was no computer vision team

because Tesla was just going from the transition

of using Mobileye, a third-party vendor

for all of its computer vision,

to having to build its computer vision system.

So when I showed up,

there were two people training deep neural networks

and they were training them on a computer

at their legs, like down below; it was a workstation.

They’re doing some kind of basic classification task.

Yeah, and so I kind of like grew that

into what I think is a fairly respectable

deep learning team, a massive compute cluster,

a very good data annotation organization.

And I was very happy with where that was.

It became quite autonomous.

And so I kind of stepped away and I, you know,

I’m very excited to do much more technical things again.

Yeah, and kind of like refocus on AGI.

What was this soul-searching like?

Cause you took a little time off and think,

like what, how many mushrooms did you take?

No, I’m just kidding.

I mean, what was going through your mind?

The human lifetime is finite.

Yeah.

You did a few incredible things.

You’re one of the best teachers of AI in the world.

You’re one of the best, and I don’t mean that,

I mean that in the best possible way.

You’re one of the best tinkerers in the AI world,

meaning like understanding the fundamentals

of how something works by building it from scratch

and playing with the basic intuitions.

It’s like Einstein, Feynman were all really good

at this kind of stuff.

Like small example of a thing to play with it,

to try to understand it.

And obviously now with Tesla,

you helped build a team of machine learning

engineers and a system

that actually accomplishes something in the real world.

So given all that, like what was the soul-searching like?

Well, it was hard because obviously I love the company a lot

and I love Elon, I love Tesla.

I want, so it was hard to leave.

I love the team, basically.

But yeah, I think I actually,

I would be potentially like interested in revisiting it,

maybe coming back at some point,

working on Optimus, working on AGI at Tesla.

I think Tesla is going to do incredible things.

It’s basically like,

it’s a massive large-scale robotics kind of company

with a ton of in-house talent

for doing really incredible things.

And I think humanoid robots are going to be amazing.

I think autonomous transportation is going to be amazing.

All this is happening at Tesla.

So I think it’s just a really amazing organization.

So being part of it and helping it along,

I think was very, basically I enjoyed that a lot.

Yeah, it was basically difficult for those reasons

because I love the company,

but I’m happy to potentially at some point

come back for Act Two,

but I felt like at this stage, I built the team,

it felt autonomous, and I became a manager

and I wanted to do a lot more technical stuff.

I wanted to learn stuff, I wanted to teach stuff.

And I just kind of felt like it was a good time

for a change of pace a little bit.

What do you think is the best movie sequel of all time,

speaking of Part Two?

Because most of them suck.

Movie sequels?

Movie sequels, yeah.

And you tweet about movies, so this is a tiny tangent.

Is there a, what’s your,

what’s like a favorite movie sequel?

Godfather Part Two?

Are you a fan of Godfather?

Because you didn’t even tweet or mention the Godfather.

Yeah, I don’t love that movie.

I know it has a huge following.

We’re going to edit that out.

We’re going to edit out the hate towards the Godfather.

How dare you disrespect?

I think I will make a strong statement.

I don’t know why.

I don’t know why,

but I basically don’t like any movie before 1995.

Something like that.

Didn’t you mention Terminator Two?

Okay, okay, that’s like,

Terminator Two was a little bit later, 1990.

No, I think Terminator Two was in the 80s.

And I like Terminator One as well.

So, okay, so like few exceptions,

but by and large, for some reason,

I don’t like movies before 1995 or something.

They feel very slow.

The camera is like zoomed out.

It’s boring.

It’s kind of naive.

It’s kind of weird.

And also, Terminator was very much ahead of its time.

Yes, and the Godfather, there’s like no AGI, so.

I mean, but you have Good Will Hunting

was one of the movies you mentioned,

and that doesn’t have any AGI either.

I guess that’s mathematics.

Yeah, I guess occasionally I do enjoy movies

that don’t feature.

Or like Anchorman, that has no, that’s.

Anchorman is so good.

I don’t understand, speaking of AGI,

because I don’t understand why Will Ferrell is so funny.

It doesn’t make sense.

It doesn’t compute.

There’s just something about him.

And he’s a singular human,

because you don’t get that many comedies these days.

And I wonder if it has to do about the culture

or like the machine of Hollywood,

or does it have to do with just we got lucky

with certain people in comedy that came together,

because he is a singular human.

Yeah, yeah, yeah, I like his movies.

That was a ridiculous tangent, I apologize.

But you mentioned humanoid robots.

So what do you think about Optimus, about TeslaBot?

Do you think we’ll have robots in the factory

and in the home in 10, 20, 30, 40, 50 years?

Yeah, I think it’s a very hard project.

I think it’s going to take a while,

but who else is going to build humanoid robots at scale?

And I think it is a very good form factor to go after,

because like I mentioned,

the world is designed for humanoid form factor.

These things would be able to operate our machines.

They would be able to sit down in chairs,

potentially even drive cars.

Basically the world is designed for humans.

That’s the form factor you want to invest into

and make work over time.

I think there’s another school of thought, which is,

okay, pick a problem and design a robot to it.

But actually designing a robot

and getting a whole data engine

and everything behind it to work

is actually an incredibly hard problem.

So it makes sense to go after general interfaces

that, okay, they are not perfect for any one given task,

but they actually have the generality

of, just with a prompt in English,

able to do something across.

And so I think it makes a lot of sense

to go after a general interface in the physical world.

And I think it’s a very difficult project.

I think it’s going to take time.

But I’ve seen no other company

that can execute on that vision.

I think it’s going to be amazing.

Like basically physical labor.

Like if you think transportation is a large market,

try physical labor.

It’s like insane.

But it’s not just physical labor.

To me, the thing that’s also exciting is social robotics.

So the relationship we’ll have on different levels

with those robots.

That’s why I was really excited to see Optimus.

Like people have criticized me for the excitement.

But I’ve worked with a lot of research labs

that do humanoid legged robots,

Boston Dynamics, Unitree,

there’s a lot of companies that do legged robots.

But that’s the elegance of the movement

is a tiny, tiny part of the big picture.

So integrating, the two big exciting things to me

about Tesla doing humanoid or any legged robots

is clearly integrating into the data engine.

So the data engine aspect.

So the actual intelligence for the perception

and the control and the planning

and all that kind of stuff,

integrating into the fleet that you mentioned, right?

And then speaking of fleet,

the second thing is the mass manufacturing.

Just knowing, culturally, they’re driving towards a simple robot

that’s cheap to produce at scale.

And doing that well, having experience to do that well,

that changes everything.

That’s why that’s a very different culture

and style than Boston Dynamics.

Who, by the way, those robots are just,

the way they move, it’s like,

it’ll be a very long time before Tesla

can achieve the smoothness of movement.

But that’s not what it’s about.

It’s about the entirety of the system,

like we talked about the data engine and the fleet.

That’s super exciting.

Even the initial sort of models.

But that too was really surprising,

that in a few months you can get a prototype.

Yep, and the reason that happened very quickly is,

as you alluded to, there’s a ton of copy paste

from what’s happening on the autopilot, a lot.

The amount of expertise that came out of the woodwork

at Tesla for building the humanoid robot was incredible to see.

Basically, Elon said, at one point, we’re doing this.

And then next day, basically,

all these CAD models started to appear.

And people talking about the supply chain and manufacturing.

And people showed up with screwdrivers and everything

the other day and started to put together the body.

And I was like, whoa, all these people exist at Tesla.

And fundamentally, building a car is actually

not that different from building a robot.

And that is true, not just for the hardware pieces.

And also, let’s not forget hardware, not just for a demo,

but manufacturing of that hardware at scale

is a whole different thing.

But for software as well, basically,

this robot currently thinks it’s a car.

It’s going to have a midlife crisis at some point.

It thinks it’s a car.

Some of the earlier demos, actually,

we were talking about potentially doing them

outside in the parking lot,

because that’s where all of the computer vision

was working out of the box instead of inside.

But all the operating system, everything just copy pastes.

Computer vision, mostly copy pastes.

I mean, you have to retrain the neural nets,

but the approach and everything and data engine

and offline trackers and the way we go

about the occupancy tracker and so on, everything copy pastes.

You just need to retrain the neural nets.

And then the planning control, of course,

has to change quite a bit.

But there’s a ton of copy paste

from what’s happening at Tesla.

And so if you were to go with the goal of like,

okay, let’s build a million humanoid robots

and you’re not Tesla, that’s a lot to ask.

If you’re Tesla, it’s actually like, it’s not that crazy.

And then the follow up question is then how difficult,

just like with driving,

how difficult is the manipulation task?

Such that it can have an impact at scale.

I think, depending on the context,

the really nice thing about robotics is that,

unless you do manufacturing and that kind of stuff,

is there is more room for error.

Driving is so safety critical and also time critical.

Like a robot is allowed to move slower, which is nice.

Yes.

I think it’s going to take a long time,

but the way you want to structure the development

is you need to say, okay, it’s going to take a long time.

How can I set up the product development roadmap

so that I’m making revenue along the way?

I’m not setting myself up for a zero-one loss function

where it doesn’t work until it works.

You don’t want to be in that position.

You want to make it useful almost immediately.

And then you want to slowly deploy it and-

At scale, hopefully.

At scale.

And you want to set up your data engine,

your improvement loops, the telemetry, the evaluation,

the harness and everything.

And you want to improve the product over time incrementally

and you’re making revenue along the way.

That’s extremely important

because otherwise you cannot build these large undertakings

just like don’t make sense economically.

And also from the point of view of the team working on it,

they need the dopamine along the way.

They’re not just going to make a promise

about this being useful.

This is going to change the world in 10 years when it works.

This is not where you want to be.

You want to be in a place like I think Autopilot is today

where it’s offering increased safety

and convenience of driving today.

People pay for it.

People like it.

People purchase it.

And then you also have the greater mission

that you’re working towards.

And you see that.

So the dopamine for the team,

that was a source of happiness.

Yes, 100%.

You’re deploying this.

People like it.

People drive it.

People pay for it.

They care about it.

There’s all these YouTube videos.

Your grandma drives it.

She gives you feedback.

People like it.

People engage with it.

You engage with it.

Huge.

People that drive Teslas recognize you and give you love.

Like, hey, thanks for this nice feature that it’s doing.

Yeah, I think the tricky thing is

some people really love you.

Some people, unfortunately,

you’re working on something that you think

is extremely valuable, useful, et cetera.

Some people do hate you.

There’s a lot of people who hate me and the team

and the whole project.

And I think-

Are they Tesla drivers?

Many cases they’re not, actually.

Yeah, that actually makes me sad about humans

or the current ways that humans interact.

I think that’s actually fixable.

I think humans want to be good to each other.

I think Twitter and social media is part of the mechanism

that actually somehow makes the negativity more viral,

that it doesn’t deserve, like,

disproportionately add a viral boost to the negativity.

But I wish people would just get excited about,

so suppress some of the jealousy,

some of the ego, and just get excited for others.

And then there’s a karma aspect to that.

You get excited for others, they’ll get excited for you.

Same thing in academia.

If you’re not careful,

there is like a dynamical system there.

If you think of in silos and get jealous

of somebody else being successful,

that actually, perhaps counterintuitively,

leads to less productivity of you as a community

and you individually.

I feel like if you keep celebrating others,

that actually makes you more successful.

I think people, depending on the industry,

haven’t quite learned that yet.

Some people are also very negative and very vocal,

so they’re very prominently featured.

But actually, there’s a ton of people who are cheerleaders,

but they’re silent cheerleaders.

And when you talk to people just in the world,

they will tell you, oh, it’s amazing, it’s great.

Especially people who understand how difficult it is

to get this stuff working.

Like people who have built products,

makers, entrepreneurs.

Like making this work and changing something

is incredibly hard.

Those people are more likely to cheerlead you.

Well, one of the things that makes me sad

is some folks in the robotics community

don’t do the cheerleading and they should.

Because they know how difficult it is.

Well, they actually sometimes don’t know

how difficult it is to create a product at scale, right?

To actually deploy it in the real world.

A lot of the development of robots and AI systems

is done on very specific small benchmarks.

And as opposed to real world conditions.

Yes.

Yeah, I think it’s really hard to work on robotics

in an academic setting.

Or AI systems that apply in the real world.

You’ve criticized, you flourished and loved for a time

the ImageNet, the famed ImageNet dataset.

And have recently had some words of criticism

that the academic research ML community

gives a little too much love still to the ImageNet

or like those kinds of benchmarks.

Can you speak to the strengths and weaknesses of datasets

used in machine learning research?

Actually, I don’t know that I recall the specific instance

where I was unhappy or criticizing ImageNet.

I think ImageNet has been extremely valuable.

It was basically a benchmark that allowed

the deep learning community to demonstrate

that deep neural networks actually work.

There’s a massive value in that.

So I think ImageNet was useful,

but basically it’s become a bit of an MNIST at this point.

So MNIST is like these little 28 by 28 grayscale digits.

It’s kind of a joke dataset that everyone just crushes.

There’s still papers written on MNIST though, right?

Maybe they shouldn’t.

Like strong papers.

Like papers that focus on like,

how do we learn with a small amount of data,

that kind of stuff.

Yeah, I could see that being helpful,

but not in sort of like mainline

computer vision research anymore, of course.

I think the way I’ve heard you somewhere,

maybe I’m just imagining things,

but I think you said like ImageNet was a huge contribution

to the community for a long time

and now it’s time to move past those kinds of-

Well, ImageNet has been crushed.

I mean, you know, the error rates are…

Yeah, we’re getting like 90% accuracy

in 1,000-way classification prediction.

And I’ve seen those images and it’s like really high.

That’s really good.

If I remember correctly,

the top five error rate is now like 1% or something.
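To make that metric concrete, here is a minimal sketch of how top-5 error is computed for ImageNet-style 1,000-way classification. The numbers below are random toy data, not real model outputs.

```python
import numpy as np

def top_k_error(logits: np.ndarray, labels: np.ndarray, k: int = 5) -> float:
    # A prediction counts as correct if the true label is among the k
    # highest-scoring classes for that example.
    topk = np.argsort(-logits, axis=1)[:, :k]
    hits = np.any(topk == labels[:, None], axis=1)
    return 1.0 - hits.mean()

rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 1000))      # pretend scores for 8 images, 1000 classes
labels = rng.integers(0, 1000, size=8)   # pretend ground-truth labels
print(top_k_error(logits, labels, k=5))  # close to 1.0 for random scores
```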

Given your experience with a gigantic real world dataset,

would you like to see benchmarks

move in a certain directions

that the research community uses?

Unfortunately, I don’t think academics

currently have the next ImageNet.

We’ve obviously, I think we’ve crushed MNIST.

We’ve basically kind of crushed ImageNet

and there’s no next sort of big benchmark

that the entire community rallies behind

and uses for further development of these networks.

Yeah, I wonder what it takes for a dataset

to captivate the imagination of everybody,

like where they all get behind it.

That could also need like a leader, right?

Somebody with popularity.

Yeah, why did ImageNet take off?

Or is it just the accident of history?

It was the right amount of difficult.

It was the right amount of difficult

and simple and interesting enough.

It just kind of like,

it was the right time for that kind of a dataset.

Question from Reddit.

What are your thoughts on the role

that synthetic data and game engines

will play in the future of neural net model development?

I think as neural nets converge to humans,

the value of simulation to neural nets

will be similar to value of simulation to humans.

So people use simulation for,

people use simulation because they can learn something

in that kind of a system

and without having to actually experience it.

But are you referring to the simulation

we do in our head?

Is that what-

No, sorry, simulation, I mean like video games

or other forms of simulation for various professionals.

So let me push back on that

because maybe there’s simulation that we do in our heads.

Like simulate, if I do this,

what do I think will happen?

Okay, that’s like internal simulation.

Yeah, internal.

Isn’t that what we’re doing?

Assuming before we act?

Oh yeah, but that’s independent

from like the use of simulation

in the sense of like computer games

or using simulation for training set creation or-

Is it independent or is it just loosely correlated?

Because like, isn’t that useful to do like

counterfactual or like edge case simulation

to like, you know, what happens if there’s a nuclear war?

What happens if there’s, you know,

like those kinds of things?

Yeah, that’s a different simulation from like Unreal Engine.

That’s how I interpreted the question.

Ah, so like simulation of the average case.

Is that, what’s Unreal Engine?

What do you mean by Unreal Engine?

So simulating a world, the physics of that world,

why is that different?

Like, because you also can add behavior to that world

and you could try all kinds of stuff, right?

You could throw all kinds of weird things into it.

So Unreal Engine is not just about simulating,

I mean, I guess it is about simulating

the physics of the world.

It’s also doing something with that.

Yeah, the graphics, the physics,

and the agents that you put into the environment

and stuff like that, yeah.

See, I think you, I feel like you said

that it’s not that important, I guess,

for the future of AI development.

Is that correct to interpret it that way?

I think humans use simulators for,

humans use simulators and they find them useful.

And so computers will use simulators and find them useful.

Okay, so you’re saying it’s not,

I don’t use simulators very often.

I play a video game every once in a while,

but I don’t think I derive any wisdom

about my own existence from those video games.

It’s a momentary escape from reality

versus a source of wisdom about reality.

So I don’t, so I think that’s a very polite way

of saying simulation is not that useful.

Yeah, maybe, maybe not.

I don’t see it as like a fundamental,

really important part of like training neural nets currently.

But I think as neural nets become more and more powerful,

I think you will need fewer examples

to train additional behaviors.

And simulation is, of course,

there’s a domain gap in a simulation

that it’s not the real world,

it’s slightly something different.

But with a powerful enough neural net,

you need, the domain gap can be bigger, I think,

because neural net will sort of understand

that even though it’s not the real world,

it like has all this high level structure

that I’m supposed to be able to learn from.

So the neural net will actually,

yeah, it will be able to leverage the synthetic data better

by closing the gap,

by understanding in which ways this is not real data.

Exactly.

Ready to do better questions next time.

That was a question, I’m just kidding.

All right.

So is it possible, do you think, speaking of MNIST,

to construct neural nets and training processes

that require very little data?

So we’ve been talking about huge data sets

like the Internet for training.

I mean, one way to say that is like you said,

like the querying itself is another level of training,

I guess, and that requires a little data.

But do you see any value in doing research

and kind of going down the direction of,

can we use very little data to train,

to construct a knowledge base?

100%.

I just think like at some point you need a massive data set.

And then when you pre-train your massive neural net

and get something that is like a GPT or something,

then you’re able to be very efficient at training

any arbitrary new task.

So a lot of these GPTs, you can do tasks

like sentiment analysis or translation or so on

just by being prompted with very few examples.

Here’s the kind of thing I want you to do.

Like here’s an input sentence,

here’s the translation into German.

Input sentence, translation to German.

Input sentence, blank,

and the neural net will complete the translation to German

just by looking at sort of the example you’ve provided.

And so that’s an example of a very few-shot learning

in the activations of the neural net

instead of the weights of the neural net.
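As a minimal sketch of what that few-shot prompt looks like in practice: the `generate` call below is a hypothetical stand-in for whatever pretrained language model you have access to, and the example sentences are made up.

```python
def generate(prompt: str) -> str:
    # Hypothetical stand-in for a call to a pretrained language model API.
    raise NotImplementedError("plug in a language model here")

# The task is specified entirely inside the prompt (few-shot learning
# "in the activations"), not by updating any weights.
prompt = (
    "English: The weather is nice today.\n"
    "German: Das Wetter ist heute schoen.\n"
    "English: I would like a coffee.\n"
    "German: Ich haette gerne einen Kaffee.\n"
    "English: Where is the train station?\n"
    "German:"
)

# The model is expected to complete the last line with the German translation,
# having inferred the task purely from the two examples above.
# completion = generate(prompt)
```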

And so I think basically just like humans,

neural nets will become very data efficient

at learning any other new task.

But at some point you need a massive data set

to pre-train your network.

Do you get that?

And probably we humans have something like that.

Do we have something like that?

Do we have a passive, in the background,

background model constructing thing

that just runs all the time in a self-supervised way?

We’re not conscious of it?

I think humans definitely.

I mean, obviously we learn a lot during our lifespan,

but also we have a ton of hardware

that helps us at initialization,

coming from sort of evolution.

And so I think that’s also a really big component.

A lot of people in the field,

I think they just talk about the amount of, like, seconds

that a person has lived

pretending that this is a tabula rasa,

sort of like a zero initialization of a neural net.

And it’s not.

You can look at a lot of animals,

like for example, zebras.

Zebras get born and they see and they can run.

There’s zero training data in their lifespan.

They can just do that.

So somehow, I have no idea how evolution has found a way

to encode these algorithms

and these neural net initializations

that are extremely good into ATCGs.

And I have no idea how this works,

but apparently it’s possible

because here’s a proof by existence.

There’s something magical about going from a single cell

to an organism that is born to the first few years of life.

I kind of like the idea

that the reason we don’t remember anything

about the first few years of our life

is that it’s a really painful process.

Like it’s a very difficult, challenging training process.

Yeah.

Like intellectually, like,

and maybe, yeah, I mean, I don’t,

why don’t we remember any of that?

There might be some crazy training going on

and maybe that’s the background model training

that is very painful.

And so it’s best for the system once it’s trained

not to remember how it’s constructed.

I think it’s just like the hardware for long-term memory

is just not fully developed.

I kind of feel like the first few years of infants

is not actually like learning, it’s brain maturing.

We’re born premature.

There’s a theory along those lines

because of the birth canal and the swelling of the brain.

And so we’re born premature

and then the first few years,

we’re just, the brain’s maturing.

And then there’s some learning eventually.

That’s my current view on it.

What do you think,

do you think neural nets can have long-term memory?

Like that approach is something like humans.

Do you think there needs to be another meta architecture

on top of it to add something like a knowledge base

that learns facts about the world

and all that kind of stuff?

Yes, but I don’t know to what extent

it will be explicitly constructed.

It might take unintuitive forms

where you are telling the GPT like,

hey, you have a declarative memory bank

to which you can store and retrieve data from.

And whenever you encounter some information

that you find useful, just save it to your memory bank.

And here’s an example of something you have retrieved

and here’s how you save it,

and here’s how you load from it.

You just say, load, whatever,

you teach it in text in English.

And then it might learn to use a memory bank from that.
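A toy sketch of that idea, purely illustrative: the memory bank is described to the model in plain English, and a thin wrapper intercepts SAVE/LOAD commands in its output. The command format and the wrapper here are made up for the example, not any real system.

```python
memory_bank = {}

def handle_model_output(text: str) -> str:
    # The wrapper watches what the model writes. "SAVE key: value" stores
    # something; "LOAD key" splices the stored value back into the context.
    if text.startswith("SAVE "):
        key, _, value = text[len("SAVE "):].partition(": ")
        memory_bank[key] = value
        return f"(saved {key})"
    if text.startswith("LOAD "):
        return memory_bank.get(text[len("LOAD "):].strip(), "(nothing stored)")
    return text

# What the model would be told, in plain English, as part of its prompt:
instructions = (
    "You have a declarative memory bank. To store something useful, write "
    "'SAVE key: value'. To retrieve it later, write 'LOAD key'."
)

print(handle_model_output("SAVE podcast_guest: Andrej Karpathy"))  # (saved podcast_guest)
print(handle_model_output("LOAD podcast_guest"))                   # Andrej Karpathy
```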

Oh, so the neural net is the architecture

for the background model, the base thing.

And then everything else is just on top of it.

That’s pretty easy too.

It’s not just text, right?

You’re giving it gadgets and gizmos.

So you’re teaching some kind of a special language

by which it can save arbitrary information

and retrieve it at a later time.

And you’re telling it about these special tokens

and how to arrange them to use these interfaces.

It’s like, hey, you can use a calculator.

Here’s how you use it.

Just do five, three plus four, one equals.

And when equals is there,

a calculator will actually read out the answer

and you don’t have to calculate it yourself.

And you just like tell it in English,

this might actually work.
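And a similarly toy sketch of the calculator example: a wrapper watches the model’s text, and when it sees something like "53 + 41 =" it fills in the answer so the model never does the arithmetic itself. The parsing is deliberately simplistic and everything here is made up for illustration.

```python
import re

def maybe_answer_with_calculator(model_text: str) -> str:
    # Look for a trailing "a <op> b =" pattern in what the model just wrote.
    match = re.search(r"(\d+)\s*([+\-*/])\s*(\d+)\s*=$", model_text.strip())
    if not match:
        return model_text
    a, op, b = int(match.group(1)), match.group(2), int(match.group(3))
    result = {"+": a + b, "-": a - b, "*": a * b, "/": a / b}[op]
    # Append the computed answer, as if the calculator "read it out".
    return f"{model_text.strip()} {result}"

# e.g. the model writes "53 + 41 =" and the calculator fills in the answer:
print(maybe_answer_with_calculator("53 + 41 ="))  # -> "53 + 41 = 94"
```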

Do you think in that sense, Gato is interesting,

the DeepMind system that it’s not just doing language,

but actually throws it all in the same pile,

images, actions, all that kind of stuff.

That’s basically what we’re moving towards.

Yeah, I think so.

So Gato is very much a kitchen sink approach

to like reinforcement learning

in lots of different environments

with a single fixed transformer model, right?

I think it’s a very sort of early result

in that realm.

But I think, yeah, it’s along the lines

of what I think things will eventually look like.

Right.

So this is the early days of a system

that eventually will look like this,

like, from a research perspective.

Yeah, I’m not a super huge fan

of, I think, all these interfaces

that like look very different.

I would want everything to be normalized into the same API.

So for example, screen pixels, very same API.

Instead of having like different world environments

that have very different physics

and joint configurations and appearances and whatever,

and you’re having some kind of special tokens

for different games that you can plug.

I’d rather just normalize everything to a single interface.

So it looks the same to the neural net,

if that makes sense.

So it’s all gonna be pixel-based pong in the end.

I think so.

Okay.

Let me ask you about your own personal life.

A lot of people wanna know

you’re one of the most productive and brilliant people

in the history of AI.

What does a productive day

in the life of Andrej Karpathy look like?

What time do you wake up?

Because I imagine some kind of dance

between the average productive day

and a perfect productive day.

So the perfect productive day is the thing we strive towards

and the average is kind of what it kind of converges to,

given all the mistakes and human eventualities and so on.

So what time do you wake up?

Are you a morning person?

I’m not a morning person.

I’m a night owl for sure.

Is it stable or not?

It’s semi-stable, like eight or nine or something like that.

During my PhD, it was even later.

I used to go to sleep usually at 3 a.m.

I think the a.m. hours are precious

and very interesting time to work

because everyone is asleep.

At 8 a.m. or 7 a.m., the East Coast is awake.

So there’s already activity.

There’s already some text messages, whatever.

There’s stuff happening.

You can go on like some news website

and there’s stuff happening and distracting.

At 3 a.m., everything is totally quiet.

And so you’re not gonna be bothered

and you have solid chunks of time to do work.

So I like those periods.

Night owl by default.

And then I think like productive time basically.

What I like to do is you need to build some momentum

on a problem without too much distraction.

And you need to load your RAM,

your working memory with that problem.

And then you need to be obsessed with it

when you’re taking shower, when you’re falling asleep.

You need to be obsessed with the problem

and it’s fully in your memory

and you’re ready to wake up and work on it right there.

So what is the scale of this?

Is it on the temporal scale of a single day

or a couple of days, a week, a month?

So I can’t talk about one day basically in isolation

because it’s a whole process.

When I wanna get productive on a problem,

I feel like I need a span of a few days

where I can really get in on that problem.

And I don’t wanna be interrupted

and I’m going to just be completely obsessed

with that problem.

And that’s where I do most of my good work, I would say.

You’ve done a bunch of cool little projects

in a very short amount of time, very quickly.

So that requires you just focusing on it.

Yeah, basically I need to load my working memory

with the problem and I need to be productive

because there’s always a huge fixed cost

to approaching any problem.

I was struggling with this, for example, at Tesla

because I want to work on a small side project,

but okay, you first need to figure out,

oh, okay, I need to SSH into my cluster.

I need to bring up a VS Code editor

so I can work on this.

I need to, I ran into some stupid error

because of some reason.

You’re not at a point

where you can be just productive right away.

You are facing barriers.

And so it’s about really removing all of that barrier

and you’re able to go into the problem

and you have the full problem loaded in your memory.

And somehow avoiding distractions of all different forms

like news stories, emails, but also distractions

from other interesting projects

that you previously worked on

or currently working on and so on.

You just want to really focus your mind.

And I mean, I can take some time off for distractions

and in between, but I think it can’t be too much.

Most of your day is sort of like spent on that problem.

And then, you know, I drink coffee.

I have my morning routine.

I look at some news, Twitter, Hacker News,

Wall Street Journal, et cetera.

So it’s great.

So basically you wake up, you have some coffee.

Are you trying to get to work as quickly as possible?

Or do you take in this diet

of like what the hell is happening in the world first?

I am, I do find it interesting to know about the world.

I don’t know that it’s useful or good

but it is part of my routine right now.

So I do read through a bunch of news articles

and I want to be informed.

And I’m suspicious of it.

I’m suspicious of the practice

but currently that’s where I am.

Oh, you mean suspicious about the positive effect

of that practice on your productivity

and your well-being as well?

My well-being psychologically, yeah.

And also on your ability to deeply understand the world

because there’s a bunch of sources of information.

You’re not really focused on deeply integrating it.

Yeah, it’s a little distracting.

In terms of a perfectly productive day

for how long of a stretch of time

in one session do you try to work and focus on a thing?

Is it a couple hours?

Is it one hour?

Is it 30 minutes?

Is it 10 minutes?

I can probably go like a small few hours

and then I need some breaks in between

for like food and stuff.

And yeah, but I think like

it’s still really hard to accumulate hours.

I was using a tracker that told me exactly

how much time I spent coding any one day.

And even on a very productive day

I still spent only like six or eight hours.

Yeah.

And it’s just because there’s so much padding,

commute, talking to people, food, et cetera.

There’s like the cost of life,

just living and sustaining and homeostasis

and just maintaining yourself as a human is very high.

And that there seems to be a desire

within the human mind to participate in society

that creates that padding.

Because the most productive days I’ve ever had

is just completely from start to finish

just tuning out everything and just sitting there.

And then you could do more than six and eight hours.

Is there some wisdom about what gives you strength

to do like tough days of long focus?

Yeah, just like whenever I get obsessed about a problem

something just needs to work, something just needs to exist.

It needs to exist.

So you’re able to deal with bugs and programming issues

and technical issues and design decisions

that turn out to be the wrong ones.

You’re able to think through all of that

given that you want a thing to exist.

Yeah, it needs to exist.

And then I think to me also a big factor is

are other humans going to appreciate it?

Are they going to like it?

That’s a big part of my motivation.

If I’m helping humans and they seem happy,

they say nice things, they tweet about it or whatever,

that gives me pleasure because I’m doing something useful.

So like you do see yourself sharing it with the world,

like whether it’s on GitHub or through blog posts

or through videos.

Yeah, I was thinking about it.

Like suppose I did all these things but did not share them.

I don’t think I would have the same amount of motivation

that I can build up.

You enjoy the feeling of other people gaining value

and happiness from the stuff you’ve created.

Yeah.

What about diet?

Is there, I saw you played with intermittent fasting.

Do you fast?

Does that help?

I play with everything.

You play with everything.

Well, the things you play, what’s been most beneficial

to your ability to mentally focus on a thing?

And just mental productivity and happiness.

You still fast?

Yeah, I still fast, but I do intermittent fasting.

But really what it means at the end of the day

is I skip breakfast.

So I do 18, six roughly by default

when I’m in my steady state.

If I’m traveling or doing something else,

I will break the rules.

But in my steady state, I do 18, six.

So I eat only from 12 to six.

Not a hard rule and I break it often,

but that’s my default.

And then, yeah, I’ve done a bunch of random experiments.

For the most part right now,

where I’ve been for the last year and a half, I wanna say,

is I’m plant-based or plant-forward.

I heard plant-forward.

It sounds better.

I don’t actually know what the difference is,

but it sounds better in my mind.

But it just means that I prefer plant-based food.

Raw or cooked or?

I prefer cooked and plant-based.

So plant-based, forgive me,

I don’t actually know how wide the category of plants is.

Well, plant-based just means

that you’re not militant about it and you can flex.

And you just prefer to eat plants.

And you’re not making,

you’re not trying to influence other people.

And if someone is, you come to someone’s house party

and they serve you a steak that they’re really proud of,

you will eat it.

Yes.

Right, so you’re not judgmental.

Oh, that’s beautiful.

I’m on the flip side of that,

but I’m very sort of flexible.

Have you tried doing one meal a day?

I have, accidentally, not consistently.

But I’ve accidentally had that.

I don’t like it.

I think it makes me feel not good.

It’s too much of a hit.

And so currently I have about two meals a day, 12 and six.

I do that nonstop.

I’m doing it now.

I do one meal a day.

It’s interesting.

It’s an interesting feeling.

Have you ever fasted longer than a day?

Yeah, I’ve done a bunch of water fasts

because I was curious what happens.

Anything interesting?

Yeah, I would say so.

I mean, you know, what’s interesting

is that you’re hungry for two days

and then starting day three or so, you’re not hungry.

It’s like such a weird feeling

because you haven’t eaten in a few days

and you’re not hungry.

Isn’t that weird?

It’s really weird.

One of the many weird things about human biology

is it figures something out.

It finds another source of energy or something like that

or relaxes the system.

I don’t know how it works.

Yeah, the body is like, you’re hungry, you’re hungry.

And then it just gives up.

It’s like, okay, I guess we’re fasting now.

There’s nothing.

And then it just kind of like focuses

on trying to make you not hungry

and not feel the damage of that

and trying to give you some space

to figure out the food situation.

So are you still to this day most productive at night?

I would say I am,

but it is really hard to maintain my PhD schedule,

especially when I was, say, working at Tesla and so on.

It’s a non-starter.

So, but even now, like, you know,

people want to meet for various events.

Society lives in a certain period of time

and you sort of have to, like, work with that.

It’s hard to, like, do a social thing

and then after that return and do work.

Yeah, it’s just really hard.

That’s why I try, when I do social things,

I try not to do too much drinking

so I can return and continue doing work.

But at Tesla, is there convergence?

Tesla, but any company.

Is there a convergence towards the schedule

or is there more, is that how humans behave

when they collaborate?

I need to learn about this.

Do they try to keep a consistent schedule

where you’re all awake at the same time?

I mean, I do try to create a routine

and I try to create a steady state

in which I’m comfortable in.

So I have a morning routine, I have a day routine.

I try to keep things to a steady state

and things are predictable

and then you can sort of just, like,

your body just sort of, like, sticks to that.

And if you try to stress that a little too much,

it will create, you know,

when you’re traveling and you’re dealing with jet lag,

you’re not able to really ascend to, you know,

where you need to go.

Yeah, yeah, that’s weird too about humans

with the habits and stuff.

What are your thoughts on work-life balance

throughout a human lifetime?

So Tesla, in part, was known for sort of

pushing people to their limits

in terms of what they’re able to do,

in terms of what they’re trying to do,

in terms of how much they work, all that kind of stuff.

Yeah, I mean, I will say Tesla gets a bit too much of a

bad rep for this because what’s happening is

Tesla, it’s a bursty environment.

So I would say the baseline,

my only point of reference is Google,

where I’ve interned three times

and I saw what it’s like inside Google and DeepMind.

I would say the baseline is higher than that,

but then there’s a punctuated equilibrium

where once in a while there’s a fire

and people work really hard.

And so it’s spiky and bursty

and then all the stories get collected.

About the bursts, yeah.

And then it gives the appearance of like total insanity,

but actually it’s just a bit more intense environment

and there are fires and sprints.

And so I think, definitely though,

I would say it’s a more intense environment

than something you would get at Google.

But in your personal, forget all of that,

just in your own personal life,

what do you think about the happiness of a human being,

a brilliant person like yourself,

about finding a balance between work and life?

If there’s such a thing. Or is that not a good thought experiment?

Yeah, I think balance is good,

but I also love to have sprints that are out of distribution

and that’s when I think I’ve been pretty creative as well.

Sprints out of distribution means that most of the time

you have a, quote unquote, balance.

I have balance most of the time,

but I like being obsessed with something once in a while.

Once in a while is what, once a week,

once a month, once a year?

Yeah, probably like say once a month or something, yeah.

And that’s when we get a new GitHub repo for monitoring.

Yeah, that’s when you like really care about a problem,

it must exist, this will be awesome,

you’re obsessed with it.

And now you can’t just do it on that day.

You need to pay the fixed cost of getting into the groove

and then you need to stay there for a while

and then society will come and they will try to mess

with you and they will try to distract you.

Yeah, the worst thing is like a person who’s like,

I just need five minutes of your time.

This is, the cost of that is not five minutes

and society needs to change how it thinks about

just five minutes of your time.

Right.

It’s never, it’s never, it’s just one minute,

it’s just 30 seconds, it’s just a quick thing.

What’s the big deal, why are you being so?

Yeah, no.

What’s your computer setup?

What’s like the perfect, are you somebody that’s flexible

to no matter what, laptop, four screens?

Yeah.

Or do you prefer a certain setup

that you’re most productive?

I guess the one that I’m familiar with is one large screen,

27 inch and my laptop on the side.

What operating system?

I do Macs, that’s my primary.

For all tasks?

I would say OSX, but when you’re working on deep learning,

everything is Linux, you’re SSH into a cluster

and you’re working remotely.

But what about the actual development,

like using the IDE?

Yeah, you would use, I think a good way is you just run

VS Code, my favorite editor right now, on your Mac,

but you are actually, you have a remote folder

through SSH, so the actual files that you’re manipulating

are in the cluster somewhere else.

So what’s the best IDE?

VS Code, what else do people use? So, I use Emacs still.

That’s cool.

It may be cool, I don’t know if it’s maximum productivity.

So what do you recommend in terms of editors?

You’ve worked with a lot of software engineers,

editors for Python, C++, machine learning applications?

I think the current answer is VS Code.

Currently, I believe that’s the best IDE.

It’s got a huge amount of extensions.

It has GitHub Copilot integration,

which I think is very valuable.

What do you think about the Copilot integration?

I was actually, I got to talk a bunch with Guido van Rossum,

who’s the creator of Python, and he loves Copilot.

He like, he programs a lot with it.

Yeah.

Do you?

Yeah, I use Copilot, I love it.

And it’s free for me, but I would pay for it.

Yeah, I think it’s very good.

And the utility that I found with it was,

I would say there is a learning curve,

and you need to figure out when it’s helpful

and when to pay attention to its outputs,

and when it’s not going to be helpful,

where you should not pay attention to it.

Because if you’re just reading its suggestions all the time,

it’s not a good way of interacting with it.

But I think I was able to sort of like mold myself to it.

I find it’s very helpful, number one,

in copy, paste, and replace some parts.

So I don’t, when the pattern is clear,

it’s really good at completing the pattern.

And number two, sometimes it suggests APIs

that I’m not aware of.

So it tells you about something that you didn’t know.

So-

And that’s an opportunity to discover a new API.

It’s an opportunity to,

so I would never take Copilot code as given.

I almost always copy paste into a Google search,

and you see what this function is doing.

And then you’re like,

oh, it’s actually exactly what I need.

Thank you, Copilot.

So you learned something.

So it’s in part a search engine,

a part maybe getting the exact syntax correctly,

that once you see it, it’s that NP-hard thing.

Once you see it, you know-

Yes, exactly.

It’s correct.

But you yourself would struggle.

You can verify efficiently,

but you can’t generate efficiently.

And Copilot really, I mean,

it’s autopilot for programming, right?

And currently it’s doing the lane following,

which is like the simple copy, paste,

and sometimes suggest.

But over time, it’s going to become more and more autonomous.

And so the same thing will play out in not just coding,

but actually across many, many different things probably.

But coding is an important one, right?

Writing programs.

How do you see the future of that developing,

the program synthesis,

like being able to write programs

that are more and more complicated?

Because right now it’s human supervised in interesting ways.

It feels like the transition will be very painful.

My mental model for it is the same thing will happen

as with the autopilot.

So currently it’s doing lane following.

It’s doing some simple stuff.

And eventually we’ll be doing autonomy

and people will have to intervene less and less.

In other words, it could be like testing mechanisms.

Like if it writes a function

and that function looks pretty damn correct,

but how do you know it’s correct?

Because you’re like getting lazier and lazier

as a programmer.

Like your ability to, because like little bugs,

but I guess it won’t make little mistakes.

No, it will.

Copilot will make subtle off-by-one bugs.

It has done that to me.
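For illustration only, here is a made-up example (not an actual Copilot output) of the kind of subtle off-by-one that is easy to accept if you stop reading suggestions carefully.

```python
def moving_sum(xs, window):
    sums = []
    # Buggy: range(len(xs) - window) silently drops the final window.
    for i in range(len(xs) - window):
        sums.append(sum(xs[i:i + window]))
    return sums

def moving_sum_fixed(xs, window):
    # Correct: the last valid start index is len(xs) - window, inclusive.
    return [sum(xs[i:i + window]) for i in range(len(xs) - window + 1)]

print(moving_sum([1, 2, 3, 4], 2))        # [3, 5]    -- missing the last window
print(moving_sum_fixed([1, 2, 3, 4], 2))  # [3, 5, 7]
```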

But do you think future systems will?

Or is it really the off by one

is actually a fundamental challenge of programming?

In that case, it wasn’t fundamental.

And I think things can improve,

but yeah, I think humans have to supervise.

I am nervous about people not supervising what comes out

and what happens to, for example,

the proliferation of bugs in all of our systems.

I’m nervous about that,

but I think there will probably be some other copilots

for bug finding and stuff like that at some point,

because there’ll be like a lot more automation for.

Oh man.

It’s like a program, a copilot that generates a compiler,

one that does a linter.

One that does like a type checker.

It’s a committee of, like, GPTs, sort of.

And then there’ll be like a manager for the committee.

And then there’ll be somebody that says

a new version of this is needed.

We need to regenerate it.

There were 10 GPTs.

They were run forward and gave 50 suggestions.

Another one looked at it and picked a few that they like.

A bug one looked at it and it was like,

it’s probably a bug.

They got re-ranked by some other thing.

And then a final ensemble GPT comes in and is like,

okay, given everything you guys have told me,

this is probably the next token.
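A rough sketch of what that committee could look like as a pipeline; every component below is a stub standing in for a model or a classic tool (linter, type checker, bug finder, ranker), so this is the shape of the idea under made-up assumptions, not an implementation.

```python
import random
from typing import Callable, List

def propose_candidates(prompt: str, n: int) -> List[str]:
    # Stand-in for sampling n completions from one or more code models.
    return [f"# candidate {i} for: {prompt}" for i in range(n)]

def passes_checks(candidate: str, checks: List[Callable[[str], bool]]) -> bool:
    # Every cheap checker (linter, type checker, bug finder) must approve.
    return all(check(candidate) for check in checks)

def rank(candidates: List[str]) -> str:
    # Stand-in for a final ensemble/ranking model; here it just picks randomly.
    return random.choice(candidates)

checks = [
    lambda c: len(c) > 0,           # stand-in for a linter
    lambda c: "candidate" in c,     # stand-in for a type checker / bug finder
]

proposals = propose_candidates("write a sort function", n=50)
survivors = [c for c in proposals if passes_checks(c, checks)]
best = rank(survivors) if survivors else None
print(best)
```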

You know, the feeling is the number of programmers

in the world has been growing and growing very quickly.

Do you think it’s possible that it’ll actually level out

and drop to like a very low number with this kind of world?

Because then you’d be doing software 2.0 programming

and you’ll be doing this kind of generation

of copilot type systems programming,

but you won’t be doing the old school

software 1.0 programming.

I don’t currently think that they’re just going

to replace human programmers.

I’m so hesitant saying stuff like this, right?

Because this is going to be replaced in five years.

I don’t know, it’s going to show that like,

this is what we thought, because I agree with you,

but I think we might be very surprised, right?

Like what are the next,

what’s your sense of where we stand

with language models?

Like, does it feel like the beginning or the middle

or the end?

The beginning, a hundred percent.

I think the big question in my mind is for sure,

I think GPT will be able to program quite well,

competently and so on.

How do you steer the system?

You still have to provide some guidance

to what you actually are looking for.

And so how do you steer it?

And how do you say, how do you talk to it?

How do you audit it and verify that what is done is correct?

And how do you like work with this?

And it’s as much, not just an AI problem,

but a UI/UX problem.

Yeah.

So beautiful, fertile ground for so much interesting work

for VS Code++ where you’re not just,

it’s not just human programming anymore.

It’s amazing.

Yeah. So you’re interacting with the system.

So not just one prompt, but it’s iterative prompting.

Yeah.

You’re trying to figure out having a conversation

with the system.

Yeah.

That actually, I mean, to me, that’s super exciting

to have a conversation with the program I’m writing.

Yeah. Maybe at some point you’re just conversing with it.

It’s like, okay, here’s what I want to do.

Actually, this variable,

maybe it’s not even that low level as a variable, but.

You can also imagine like,

can you translate this to C++ and back to Python?

Yeah, that already kind of exists in some ways.

No, but just like doing it

as part of the programming experience.

Like, I think I’d like to write this function in C++.

Or like, you just keep changing for different programs

because of different syntax.

Maybe I want to convert this into a functional language.

Yeah.

And so like, you get to become multilingual as a programmer

and dance back and forth efficiently.

Yeah.

I mean, I think the UI/UX of it though

is like still very hard to think through

because it’s not just about writing code on a page.

You have an entire developer environment.

You have a bunch of hardware on it.

You have some environmental variables.

You have some scripts that are running in a cron job.

Like there’s a lot going on to like working with computers

and how do these systems set up environment flags

and work across multiple machines

and set up screen sessions and automate different processes.

Like how all that works and is auditable by humans

and so on is like massive question at the moment.

You’ve built arXiv-sanity.

What is arXiv?

And what is the future of academic research publishing

that you would like to see?

So arXiv is this pre-print server.

So if you have a paper,

you can submit it for publication

to journals or conferences and then wait six months

and then maybe get a decision, pass or fail,

or you can just upload it to arXiv.

And then people can tweet about it three minutes later.

And then everyone sees it, everyone reads it

and everyone can profit from it in their own little ways.

And you can cite it and it has an official look to it.

It feels like a publication process.

It feels different than if you just put it in a blog post.

Oh yeah.

Yeah, I mean, it’s a paper and usually the bar is higher

for something that you would expect on arXiv

as opposed to something you would see in a blog post.

Well, the culture created the bar

because you could probably post a pretty crappy paper

on arXiv.

So what’s that make you feel like?

What’s that make you feel about peer review?

So rigorous peer review by two, three experts

versus the peer review of the community

right as it’s written.

Yeah, basically I think the community is very well able

to peer review things very quickly on Twitter.

And I think maybe it just has to do something

with AI machine learning field specifically though.

I feel like things are more easily auditable

and the verification is easier potentially

than the verification somewhere else.

So it’s kind of like,

you can think of these scientific publications

as like little blockchains where everyone’s building

on each other’s work and citing each other.

And you sort of have AI,

which is kind of like this much faster and loose blockchain.

But then you have, and any one individual entry

is like very cheap to make.

And then you have other fields

where maybe that model doesn’t make as much sense.

And so I think in AI,

at least things are pretty easily verifiable.

And so that’s why when people upload papers

that are a really good idea and so on,

people can try it out like the next day

and they can be the final arbiter

of whether it works or not on their problem.

And the whole thing just moves significantly faster.

So I kind of feel like academia still has a place,

sort of this like conference journal process

still has a place,

but it’s sort of like, it lags behind, I think.

And it’s maybe a bit higher quality process,

but it’s not sort of the place

where you will discover cutting edge work anymore.

It used to be the case when I was starting my PhD

that you go to conferences and journals

and you discuss all the latest research.

Now, when you go to a conference or journal,

like no one discusses anything that’s there

because it’s already like three generations ago irrelevant.

Yeah, which makes me sad about like DeepMind, for example,

where they still publish in Nature

and these big prestigious,

I mean, there’s still value, I suppose,

to the prestige that comes with these big venues,

but the result is that they’ll announce

some breakthrough performance

and it will take like a year

to actually publish the details.

I mean, and those details,

if they were published immediately,

would inspire the community

to move in certain directions.

Yeah, it would speed up the rest of the community,

but I don’t know to what extent

that’s part of their objective function also.

That’s true.

So it’s not just the prestige,

a little bit of the delay is part of it.

Yeah, they certainly, DeepMind specifically,

has been working in the regime

of having a slightly higher quality,

basically process and latency,

and publishing those papers that way.

Another question from Reddit.

Do you or have you suffered from imposter syndrome?

Being the director of AI at Tesla,

being this person when you’re at Stanford

where the world looks at you as the expert in AI

to teach the world about machine learning?

When I was leaving Tesla after five years,

I spent a ton of time in meeting rooms

and I would read papers.

In the beginning, when I joined Tesla,

I was writing code,

and then I was writing less and less code,

and I was reading code,

and then I was reading less and less code.

And so this is just a natural progression

that happens, I think.

And definitely I would say near the tail end,

that’s when it sort of starts to hit you a bit more,

that you’re supposed to be an expert,

but actually the source of truth

is the code that people are writing,

the GitHub, and the actual code itself.

And you’re not as familiar with that as you used to be.

And so I would say maybe there’s some insecurity there.

Yeah, that’s actually pretty profound,

that a lot of the insecurity has to do

with not writing the code in the computer science space.

Because that is the truth, that right there.

The code is the source of truth.

The papers and everything else,

it’s a high-level summary.

Yeah, it’s just a high-level summary,

but at the end of the day, you have to read code.

It’s impossible to translate all that code

into actual paper form.

So when things come out,

especially when they have a source code available,

that’s my favorite place to go.

So like I said, you’re one of the greatest teachers

of machine learning, AI, ever,

from CS231N to today.

What advice would you give to beginners

interested in getting into machine learning?

Beginners are often focused on what to do.

And I think the focus should be more like how much you do.

So I am kind of, like, a believer, on a high level,

in this 10,000 hours kind of concept,

where you just kind of have to just pick the things

where you can spend time,

and you care about, and you’re interested in.

You literally have to put in 10,000 hours of work.

It doesn’t even matter as much where you put it,

and you’ll iterate, and you’ll improve,

and you’ll waste some time.

I don’t know if there’s a better way.

You need to put in 10,000 hours.

But I think it’s actually really nice,

because I feel like there’s some sense of determinism

about being an expert at a thing if you spend 10,000 hours.

You can literally pick an arbitrary thing,

and I think if you spend 10,000 hours

of deliberate effort and work,

you actually will become an expert at it.

And so I think that’s kind of like a nice thought.

And so basically, I would focus more on like,

are you spending 10,000 hours?

That’s what I’d focus on.

So, and then thinking about what kind of mechanisms

maximize your likelihood of getting to 10,000 hours,

which for us silly humans means probably

forming a daily habit of like every single day

actually doing the thing.

Whatever helps you.

So I do think to a large extent

it’s a psychological problem for yourself.

One other thing that I think is helpful

for the psychology of it is that,

many times, people compare themselves

to others in the area.

I think this is very harmful.

Only compare yourself to you from some time ago,

like say a year ago.

Are you better than you a year ago?

This is the only way to think.

And I think this way, you can see your progress

and it’s very motivating.

That’s so interesting, that focus on the quantity of hours.

Because I think a lot of people in the beginner stage,

but actually throughout, get paralyzed by the choice.

Like which one, do I pick this path or this path?

Like they’ll literally get paralyzed

by like which IDE to use.

Well, they’re worried.

Yeah, they’re worried about all these things.

But the thing is,

you will waste time doing something wrong.

You will eventually figure out it’s not right.

You will accumulate scar tissue.

And next time you will grow stronger.

Because next time you’ll have the scar tissue

and next time you’ll learn from it.

And now next time you come to a similar situation,

you’ll be like, oh, I messed up.

I’ve spent a lot of time working on things

that never materialized into anything.

And I have all that scar tissue

and I have some intuitions about what was useful,

what wasn’t useful, how things turned out.

So all those mistakes were not dead work.

So I just think you should

just focus on working.

What have you done?

What have you done last week?

That’s a good question, actually,

to ask for a lot of things, not just machine learning.

It’s a good way to cut the,

I forgot what term we use,

but the fluff, the blubber, whatever the inefficiencies

in life.

What do you love about teaching?

You seem to find yourself often,

like, drawn to teaching.

You’re very good at it, but you’re also drawn to it.

I mean, I don’t think I love teaching.

I love happy humans.

And happy humans like when I teach.

I wouldn’t say I hate teaching.

I tolerate teaching.

But it’s not like the act of teaching that I like.

It’s that I have something, I’m actually okay at it.

I’m okay at teaching and people appreciate it a lot.

And so I’m just happy to try to be helpful.

And teaching itself is not like the most,

I mean, it can be really annoying, frustrating.

I was working on a bunch of lectures just now.

I was reminded back to my days of 231N

just how much work it is to create some of these materials

and make them good.

The amount of iteration and thought

and you go down blind alleys and just how much you change it.

So creating something good

in terms of educational value is really hard.

And it’s not fun.

It’s difficult.

So people should definitely go watch your new stuff

you put out.

There are lectures where you’re actually building the thing

like from, like you said, the code is truth.

So discussing backpropagation by building it,

by looking through it, just the whole thing.

So how difficult is that to prepare for?

I think that’s a really powerful way to teach.

Did you have to prepare for that

or are you just live thinking through it?

I will typically do like say three takes

and then I take like the better take.

So I do multiple takes

and then I take some of the better takes

and then I just build out a lecture that way.

Sometimes I have to delete 30 minutes of content

because it just went down an alley

that I didn’t like too much.

So there’s a bunch of iteration

and it probably takes me somewhere around 10 hours

to create one hour of content.

To get one hour.

It’s interesting.

I mean, is it difficult to go back to the basics?

Do you draw a lot of wisdom from going back to the basics?

Yeah, going back to backpropagation, loss functions,

and one thing I like about teaching a lot, honestly,

is it definitely strengthens your understanding.

So it’s not a purely altruistic activity.

It’s a way to learn.

If you have to explain something to someone,

you realize you have gaps in knowledge.

And so I even surprised myself in those lectures.

Like, oh, the result will obviously look like this,

and then the result doesn’t look like that.

And I’m like, okay, I thought I understood this.

Yeah.

But that’s why it’s really cool.

It’s literally code, you run it in a notebook

and it gives you a result and you’re like, oh, wow.

And like actual numbers, actual input, actual code.

Yeah, it’s not mathematical symbols, et cetera.

The source of truth is the code.

It’s not slides.

It’s just like, let’s build it.

It’s beautiful.

You’re a rare human in that sense.

What advice would you give to researchers

trying to develop and publish ideas

that have a big impact in the world of AI?

So maybe undergrads, maybe early graduate students.

Yeah.

I mean, I would say like,

they definitely have to be a little bit more strategic

than I had to be as a PhD student

because of the way AI is evolving.

It’s going the way of physics,

where in physics you used to be able to do experiments

on your benchtop and everything was great

and you can make progress.

And now you have to work in like LHC or like CERN.

And so AI is going in that direction as well.

So there are certain kinds of things

that are just not possible to do on the benchtop anymore.

And I think that didn’t used to be the case at the time.

Do you still think that there’s like

GAN type papers to be written?

Where, like, a very simple idea requires

just one computer to illustrate a simple example?

I mean, one example that’s been very influential recently

is diffusion models.

Diffusion models are amazing.

Diffusion models are six years old.

For the longest time, people were kind of ignoring them

as far as I can tell.

And they’re an amazing generative model,

especially in images.

And so stable diffusion and so on, it’s all diffusion based.

Diffusion is new.

It was not there and it came from,

well, it came from Google,

but a researcher could have come up with it.

In fact, some of the first,

actually, no, those came from Google as well.

But a researcher could come up with that

in an academic institution.

Yeah, what do you find most fascinating

about diffusion models?

So from the societal impact to the technical architecture.

What I like about diffusion is it works so well.

Is that surprising to you?

The variety, almost the novelty

of the synthetic data it’s generating.

Yeah, so the stable diffusion images are incredible.

The speed of improvement in generating images

has been insane.

We went very quickly from generating tiny digits

to tiny faces and it all looked messed up

and now we have stable diffusion.

And that happened very quickly.

There’s a lot that academia can still contribute.

For example, FlashAttention is a very efficient kernel

for running the attention operation inside the transformer.

That came from an academic environment.

It’s a very clever way to structure the kernel

that does the calculation,

so it doesn’t materialize the attention matrix.

And so I think there’s still lots of things to contribute,

but you have to be just more strategic.
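
For readers who want to see what “doesn’t materialize the attention matrix” means concretely, here is a rough NumPy sketch of the online-softmax idea behind FlashAttention: keys and values are streamed in blocks, so the full n-by-n score matrix never exists in memory. It only illustrates the algorithmic trick, not the fused GPU kernel that gives the real speedup; the shapes and block size are arbitrary.

```python
import numpy as np

def attention_naive(Q, K, V):
    # Standard attention: materializes the full (n x n) score matrix.
    scores = (Q @ K.T) / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def attention_streamed(Q, K, V, block=64):
    # Same math, but K/V are consumed block by block with an online softmax,
    # so the (n x n) matrix is never formed. This mirrors the FlashAttention
    # idea, minus the SRAM tiling and kernel fusion that make it fast on GPUs.
    n, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros((n, V.shape[-1]))
    m = np.full(n, -np.inf)   # running row-wise max of the scores
    l = np.zeros(n)           # running softmax normalizer
    for i in range(0, K.shape[0], block):
        k, v = K[i:i + block], V[i:i + block]
        s = (Q @ k.T) * scale                 # scores against this block only
        m_new = np.maximum(m, s.max(axis=-1))
        alpha = np.exp(m - m_new)             # rescales the previous partial sums
        p = np.exp(s - m_new[:, None])
        l = l * alpha + p.sum(axis=-1)
        out = out * alpha[:, None] + p @ v
        m = m_new
    return out / l[:, None]

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((256, 32)) for _ in range(3))
assert np.allclose(attention_naive(Q, K, V), attention_streamed(Q, K, V))
```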

Do you think neural networks can be made to reason?

Yes.

Do you think they already reason?

Yes.

What’s your definition of reasoning?

Information processing.

Right.

So in the way that humans think through a problem

and come up with novel ideas,

it feels like reasoning.

Yeah.

So the novelty,

I don’t wanna say,

but out of distribution ideas,

you think it’s possible?

Yes, and I think we’re seeing that already

in the current neural nets.

You’re able to remix the training set information

into true generalization, in some sense,

something that doesn’t appear

in a fundamental way in the training set.

Like you’re doing something interesting algorithmically.

You’re manipulating some symbols

and you’re coming up with some correct,

a unique answer in a new setting.

What would illustrate to you,

holy shit, this thing is definitely thinking?

To me, thinking or reasoning

is just information processing and generalization.

And I think the neural nets already do that today.

So being able to perceive the world

or perceive whatever the inputs are

and to make predictions based on that

or actions based on that, that’s reasoning.

Yeah, you’re giving correct answers in novel settings

by manipulating information.

You’ve learned the correct algorithm.

You’re not doing just some kind of a lookup table

or nearest neighbor search, something like that.

Let me ask you about AGI.

What are some moonshot ideas

you think might make significant progress towards AGI?

Or maybe, in other words,

what are the big blockers that we’re missing now?

So basically I am fairly bullish

on our ability to build AGIs.

Basically automated systems that we can interact with

that are very human-like

and we can interact with them

in a digital realm or a physical realm.

Currently it seems most of the models

that sort of do these sort of magical tasks

are in a text realm.

I think, as I mentioned,

I’m suspicious that the text realm

is not enough to actually build full understanding

of the world.

I do actually think you need to go into pixels

and understand the physical world and how it works.

So I do think that we need to extend these models

to consume images and videos

and train on a lot more data

that is multimodal in that way.

Do you think you need to touch the world

to understand it also?

Well, that’s the big open question I would say in my mind

is if you also require the embodiment

and the ability to sort of interact with the world,

run experiments and have a data of that form,

then you need to go to Optimus or something like that.

And so I would say Optimus in some way

is like a hedge on AGI

because it seems to me that it’s possible

that just having data from the internet is not enough.

If that is the case, then Optimus may lead to AGI

because Optimus, to me, there’s nothing beyond Optimus.

You have like this humanoid form factor

that can actually like do stuff in the world.

You can have millions of them interacting

with humans and so on.

And if that doesn’t give rise to AGI at some point,

like I’m not sure what will.

So from a completeness perspective,

I think that’s a really good platform,

but it’s a much harder platform

because you are dealing with atoms

and you need to actually like build these things

and integrate them into society.

So I think that path takes longer,

but it’s much more certain.

And then there’s a path of the internet

and just like training these compression models effectively

on trying to compress all the internet.

And that might also give us these agents as well.

Compress the internet,

but also interact with the internet.

Yeah.

So it’s not obvious to me.

In fact, I suspect you can reach AGI

without ever entering the physical world.

Which is a little bit more concerning,

because that results in it happening faster.

So it just feels like we’re in boiling water.

We won’t know as it’s happening.

I’m not afraid of AGI.

I’m excited about it.

There’s always concerns,

but I would like to know when it happens.

Yeah.

And have like hints about when it happens,

like a year from now it will happen, that kind of thing.

Yeah.

I just feel like in the digital realm,

it just might happen.

Yeah.

I think all we have available to us,

because, again, no one has built AGI.

So all we have available to us is,

is there enough fertile ground on the periphery?

I would say yes.

And we have the progress so far,

which has been very rapid.

And there are next steps that are available.

And so I would say, yeah,

it’s quite likely that we’ll be interacting

with digital entities.

How will you know that somebody has built AGI?

It’s going to be a slow,

I think it’s going to be a slow incremental transition.

It’s going to be product based and focused.

It’s going to be GitHub Copilot getting better.

And then GPT is helping you write.

And then these oracles that you can go to

with mathematical problems.

I think we’re on the verge of being able to ask

very complex questions in chemistry, physics, math

of these oracles and have them give complete solutions.

So AGI, to you, is primarily focused on intelligence.

So consciousness doesn’t enter into it.

So in my mind, consciousness is not a special thing

you will figure out and bolt on.

I think it’s an emergent phenomenon of a large enough

and complex enough generative model, sort of.

So if you have a complex enough world model,

that understands the world,

then it also understands its predicament in the world

as being a language model,

which to me is a form of consciousness or self-awareness.

So in order to understand the world deeply,

you probably have to integrate yourself into the world.

And in order to interact with humans

and other living beings,

consciousness is a very useful tool.

I think consciousness is like a modeling insight.

Modeling insight.

Yeah, it’s a, you have a powerful enough model

of understanding the world that you actually understand

that you are an entity in it.

Yeah, but there’s also this,

perhaps just the narrative we tell ourselves.

It feels like something to experience the world.

The hard problem of consciousness.

But that could be just a narrative that we tell ourselves.

Yeah, I don’t think, I think it will emerge.

I think it’s going to be something very boring.

Like we’ll be talking to these digital AIs.

They will claim they’re conscious.

They will appear conscious.

They will do all the things

that you would expect of other humans.

And it’s going to just be a stalemate.

I think there’ll be a lot of actual

fascinating ethical questions.

Like Supreme Court level questions

of whether you’re allowed to turn off a conscious AI.

If you’re allowed to build a conscious AI.

Maybe there would have to be the same kind of debates

that you have around,

sorry to bring up a political topic,

but abortion, where the deeper question with abortion

is, what is life?

And the deep question with AI is also,

what is life and what is conscious?

And I think that’ll be very fascinating to bring up.

It might become illegal to build systems

that are capable of such level of intelligence

that consciousness would emerge

and therefore the capacity to suffer would emerge.

And a system that says, no, please don’t kill me.

Well, that’s what the LaMDA chatbot

already told this Google engineer, right?

Like it was talking about not wanting to die or so on.

So that might become illegal to do that.

Right.

Because otherwise you might have a lot of creatures

that don’t want to die.

And they will-

You can just spawn infinity of them on a cluster.

And then that might lead to like horrible consequences

because then there might be a lot of people

that secretly love murder

and they’ll start practicing murder on those systems.

I mean, there’s just, to me, all of this stuff

just brings a beautiful mirror to the human condition

and human nature.

We’ll get to explore it.

And that’s, like, the best of the Supreme Court,

of all the different debates we have about ideas,

of what it means to be human.

We get to ask those deep questions

that we’ve been asking throughout human history.

There’s always been the other in human history.

We’re the good guys and those are the bad guys.

And throughout human history,

it’s been, let’s murder the bad guys.

And the same will probably happen with robots.

It’ll be the other at first.

And then we’ll get to ask questions

of what does it mean to be alive?

What does it mean to be conscious?

Yeah.

And I think there’s some canary in the coal mines,

even with what we have today.

And for example, there’s these waifus

that you can work with.

And some people are trying to,

this company is going to shut down,

but this person really loved their waifu

and is trying to port it somewhere else.

And it’s not possible.

I think definitely people will have feelings

towards these systems.

Because in some sense, they are like a mirror of humanity

because they are sort of like a big average of humanity

in the way that it’s trained.

But we can, that average, we can actually watch.

It’s nice to be able to interact

with the big average of humanity

and do a search query on it.

Yeah.

Yeah, it’s very fascinating.

And we can also, of course, also shape it.

It’s not just a pure average.

We can mess with the training data.

We can mess with the objective.

We can fine tune them in various ways.

So we have some impact on what those systems look like.

If you were to achieve AGI,

and you could have a conversation with her

and ask her, talk about anything,

maybe ask her a question,

what kind of stuff would you ask?

I would have some practical questions in my mind,

like do I or my loved ones really have to die?

What can we do about that?

Do you think it will answer clearly

or would it answer poetically?

I would expect it to give solutions.

I would expect it to be like,

well, I’ve read all of these textbooks

and I know all these things that you’ve produced.

And it seems to me like here are the experiments

that I think it would be useful to run next.

And here are some gene therapies

that I think would be helpful.

And here are the kinds of experiments that you should run.

Okay, let’s go with this thought experiment, okay?

Imagine that mortality is actually

like a prerequisite for happiness.

So if we become immortal,

we’ll actually become deeply unhappy.

And the model is able to know that.

So what is it supposed to tell you,

stupid human, about it?

Yes, you can become immortal,

but you will become deeply unhappy.

If the model is, if the AGI system

is trying to empathize with you human,

what is it supposed to tell you?

That yes, you don’t have to die,

but you’re really not gonna like it?

Is it gonna be deeply honest?

Like in Interstellar, what is it?

The AI says, like, humans want 90% honesty.

So like you have to pick how honest

do I wanna answer these practical questions?

Yeah, I love the AI in Interstellar, by the way.

I think it’s like such a sidekick to the entire story,

but at the same time, it’s like really interesting.

It’s kind of limited in certain ways, right?

Yeah, it’s limited.

And I think that’s totally fine, by the way.

I don’t think, I think it’s fine

and plausible to have a limited and imperfect AGIs.

Is that a feature almost?

As an example, like it has a fixed amount of compute

on its physical body.

And it might just be that,

even though you can have a super amazing mega brain,

super intelligent AI, you also can have like,

you know, less intelligent AIs that you can deploy

in a power-efficient way.

And then they’re not perfect, they might make mistakes.

No, I meant more like, say you had infinite compute

and it’s still good to make mistakes sometimes.

Like in order to integrate yourself, like, what is it?

Going back to Good Will Hunting,

Robin Williams’ character says like the human imperfections,

that’s the good stuff, right?

Isn’t that the, like, we don’t want perfect,

we want flaws in part to form connections with each other,

because it feels like something you can attach

your feelings to, the flaws.

In that same way, you want an AI that’s flawed.

I don’t know.

I feel like perfection is cool.

But then you’re saying, okay, yeah.

But that’s not AGI.

But see, AGI would need to be intelligent enough

to give answers to humans that humans don’t understand.

And I think perfect is something humans can’t understand.

Because even science doesn’t give perfect answers.

There’s always gaps and mysteries and I don’t know.

I don’t know if humans want perfect.

Yeah, I could imagine just having a conversation

with this kind of oracle entity, as you’d imagine them.

And yeah, maybe it can tell you about,

you know, based on my analysis of human condition,

you might not want this.

And here’s some of the things that might matter.

But every dumb human will say, yeah, yeah, yeah, yeah.

Trust me.

I can, give me the truth.

I can handle it.

But that’s the beauty.

Like, people can choose, so.

But then, it’s the old marshmallow test with the kids

and so on.

I feel like too many people, like, can’t handle the truth.

Probably including myself.

Like, the deep truth of the human condition,

I don’t know if I can handle it.

Like, what if there’s some dark stuff?

What if we are an alien science experiment?

And it realizes that.

What if it had, I mean.

I mean, this is the matrix, you know, all over again.

I don’t know.

What would I talk about?

I don’t even, yeah.

I, probably I will go with the safer scientific questions

at first that have nothing to do with my own personal life.

Yeah.

And mortality.

Just like about physics and so on.

Yeah.

To build up, like, let’s see where it’s at.

Or maybe see if it has a sense of humor.

That’s another question.

Would it be able to, presumably,

if it understands humans deeply,

would it be able to generate humor?

Yeah.

I think that’s actually a wonderful benchmark, almost.

Like, is it able,

I think that’s a really good point, basically.

To make you laugh.

Yeah.

If it’s able to be like a very effective standup comedian

that is doing something very interesting computationally.

I think being funny is extremely hard.

Yeah.

Because it’s hard in a way, like a Turing test,

the original intent of the Turing test is hard

because you have to convince humans.

And that’s why,

that’s why when comedians talk about this,

like, this is deeply honest.

Because people can’t help but laugh,

and if they don’t laugh, that means you’re not funny.

If they laugh, it’s funny.

And you’re showing, you need a lot of knowledge

to create humor.

About, like you mentioned, human condition and so on.

And then you need to be clever with it.

You mentioned a few movies.

You tweeted, movies that I’ve seen five plus times,

but am ready and willing to keep watching.

Interstellar, Gladiator, Contact,

Good Will Hunting, The Matrix, Lord of the Rings,

all three, Avatar, Fifth Element, so on.

It goes on.

Terminator 2, Mean Girls, I’m not gonna ask about that.

I think, I think her man.

Mean girls is great.

What are some that jump out to you in your memory

that you love and why?

Like you mentioned The Matrix.

As a computer person, why do you love The Matrix?

There’s so many properties

that make it like beautiful and interesting.

So there’s all these philosophical questions,

but then there’s also AGIs,

and there’s simulation, and it’s cool.

And there’s, you know, the black, you know.

The look of it, the feel of it.

Yeah, the look of it, the feel of it,

the action, the bullet time.

It was just like innovating in so many ways.

And then Good Will Hunting, why do you like that one?

Yeah, I just, I really like this tortured genius

sort of character who’s like grappling

with whether or not he has like any responsibility

or like what to do with this gift that he was given

or like how to think about the whole thing.

And-

But there’s also a dance between the genius

and the personal, like what it means

to love another human being.

There’s a lot of themes there.

It’s just a beautiful movie.

And then the fatherly figure,

the mentor and the psychiatrist.

It like really like, it messes with you.

You know, there’s some movies

that just like really mess with you on a deep level.

Do you relate to that movie at all?

No.

It’s not your fault, Andrej, as I said.

Lord of the Rings, that’s self-explanatory.

Terminator 2, which is interesting.

You re-watched that a lot.

Is that better than Terminator 1?

You like Arnold?

I do like Terminator 1 as well.

I like Terminator 2 a little bit more,

but in terms of like its surface properties.

Do you think Skynet is at all a possibility?

Yes.

Like the actual sort of autonomous weapon system

kind of thing?

Do you worry about that stuff?

AI being used for war?

I 100% worry about it.

And so, I mean, you know,

some of these fears of AGIs and how this will play out.

I mean, these will be like very powerful entities

probably at some point.

And so for a long time,

they’re going to be tools in the hands of humans.

You know, people talk about like alignment of AGIs

and how to make them aligned, but the problem is, like,

even humans are not aligned.

So how this will be used

and what this is going to look like is,

yeah, it’s troubling, so.

Do you think it’ll happen slowly enough

that we’ll be able to, as a human civilization,

think through the problems?

Yes, that’s my hope is that it happens slowly enough

and in an open enough way

where a lot of people can see and participate in it.

Just to figure out how to deal with this transition,

I think it’s going to be interesting.

I draw a lot of inspiration from nuclear weapons

because I sure thought we would be fucked

once they developed nuclear weapons.

But like, it’s almost like,

when the systems are not so dangerous

that they destroy human civilization,

we deploy them and learn the lessons.

And then, if it’s too dangerous,

we might still deploy it,

but you very quickly learn not to use them.

And so there’ll be like this balance achieved.

Humans are very clever as a species.

It’s interesting.

We exploit the resources as much as we can,

but we don’t, we avoid destroying ourselves,

it seems like.

Well, I don’t know about that, actually.

I hope it continues.

I mean, I’m definitely concerned

about nuclear weapons and so on,

not just as a result of the recent conflict,

even before that.

That’s probably my number one concern for humanity.

So if humanity destroys itself

or destroys 90% of people,

that would be because of nukes?

I think so.

And it’s not even about the full destruction.

To me, it’s bad enough if we reset society.

That would be terrible.

It would be really bad.

I can’t believe we’re like so close to it.

Yeah.

It’s like so crazy to me.

It feels like we might be a few tweets away

from something like that.

Yep.

Basically, it’s extremely unnerving,

and has been for me for a long time.

It seems unstable that world leaders,

just having a bad mood,

can like take one step towards a bad direction

and it escalates.

Yeah.

Because of a collection of bad moods,

it can escalate without being able to stop.

Yeah, it’s just, it’s a huge amount of power.

And then also with the proliferation.

Basically, I don’t actually really see,

I don’t actually know what the good outcomes are here.

So I’m definitely worried about it a lot.

And then AGI is not currently there,

but I think at some point will more and more become

something like it.

The danger with AGI even is that,

I think it’s even like slightly worse

in a sense that there are good outcomes of AGI.

And then the bad outcomes are like an epsilon away,

like a tiny one away.

And so I think capitalism and humanity and so on

will drive for the positive ways of using that technology.

But then if bad outcomes are just like a tiny,

like flip a minus sign away,

that’s a really bad position to be in.

A tiny perturbation of the system

results in the destruction of the human species.

It’s a weird line to walk.

Yeah, I think in general,

what’s really weird about, like, the dynamics of humanity

and this explosion we’ve talked about.

It’s just like the insane coupling afforded by technology

and just the instability of the whole dynamical system.

I think it just doesn’t look good, honestly.

Yeah, so that explosion could be destructive

or constructive and the probabilities are non-zero

in both ends of the spectrum.

I do feel like I have to try to be optimistic and so on.

And I think even in this case,

I still am predominantly optimistic,

but there’s definitely…

Me too.

Do you think we’ll become a multi-planetary species?

Probably yes,

but I don’t know if it’s a dominant feature of future humanity.

There might be some people on some planets and so on,

but I’m not sure if it’s like,

yeah, if it’s like a major player in our culture and so on.

We still have to solve the drivers

of self-destruction here on Earth.

So just having a backup on Mars

is not gonna solve the problem.

So by the way, I love the backup on Mars.

I think that’s amazing.

You should absolutely do that.

Yes.

And I’m so thankful.

Would you go to Mars?

Personally, no, I do like Earth quite a lot.

Okay, I’ll go to Mars.

I’ll go for you.

I’ll tweet at you from there.

Maybe eventually I would once it’s safe enough,

but I don’t actually know if it’s on my lifetime scale,

unless I can extend it by a lot.

I do think that, for example,

a lot of people might disappear into virtual realities

and stuff like that.

And I think that could be the major thrust

of sort of the cultural development of humanity.

So it might not be,

it’s just really hard to work in the physical realm

and go out there.

And I think ultimately all your experiences

are in your brain.

And so it’s much easier to disappear into digital realm.

And I think people will find them more compelling,

easier, safer, more interesting.

So you’re a little bit captivated by virtual reality,

by the possible worlds,

whether it’s the metaverse

or some other manifestation of that.

Yeah.

Yeah, it’s really interesting.

I’m interested, just talking a lot to Carmack,

what’s the thing that’s currently preventing that?

Yeah.

I mean, to be clear,

I think what’s interesting about the future is,

it’s not that,

I kind of feel like the variance

in the human condition grows.

That’s the primary thing that’s changing.

It’s not as much the mean of the distribution,

it’s like the variance of it.

So there will probably be people on Mars

and there will be people in VR

and there will be people here on earth.

It’s just like,

there’s so many more ways of being.

And so I kind of feel like,

I see it as like a spreading out of a human experience.

There’s something about the internet

that allows you to discover those little groups

and then you gravitate to them, because something about your biology

likes that kind of world, and you find each other.

Yeah.

And we’ll have transhumanists

and then we’ll have the Amish

and they’re gonna,

everything is just gonna coexist.

You know, the cool thing about it,

cause I’ve interacted with a bunch of internet communities,

is they don’t know about each other.

Like you can have a very happy existence,

just like having a very close-knit community

and not knowing about each other.

I mean, even you, haven’t you sensed this,

just having traveled to Ukraine?

They don’t know so many things about America.

Yeah.

Like when you travel across the world,

I think you experience this too.

There are certain cultures that are like,

they have their own thing going on.

They don’t.

And so you can see that happening more and more

and more and more in the future.

We have little communities.

Yeah.

Yeah, I think so.

That seems to be the,

that seems to be how it’s going right now.

And I don’t see that trend.

I don’t see it like really reversing.

I think people are diverse

and they’re able to choose their own path in existence.

And I sort of like celebrate that.

And so-

Would you spend much time in the metaverse,

in virtual reality?

Or which community are you?

Are you the physicalist,

the physical reality enjoyer,

or do you see yourself drawing a lot of pleasure

and fulfillment in the digital world?

Yeah, I think, well,

currently the virtual reality is not that compelling.

I do think it can improve a lot,

but I don’t really know to what extent.

Maybe, you know,

there’s actually like even more exotic things

you can think about with, like, Neuralink

or stuff like that.

So currently I kind of see myself

as mostly a team human person.

I love nature.

I love harmony.

I love people.

I love humanity.

I love emotions of humanity.

And I just want to be like,

in this like solar punk little utopia.

That’s my happy place.

My happy place is like people I love

thinking about cool problems

surrounded by a lush, beautiful, dynamic nature

and secretly high-tech in places that count.

Places that count.

So you use technology to empower that love

for other humans and nature.

Yeah, I think technology should be used, like, very sparingly.

I don’t love when it sort of gets in the way of humanity

in many ways.

I like just people being humans, in a way

we sort of, like, slightly evolved for and prefer,

I think, just by default.

People kept asking me,

because they know you love reading.

Are there particular books that you enjoyed

that had an impact on you for silly

or for profound reasons that you would recommend?

You mentioned The Vital Question.

Many, of course.

I think in biology, as an example,

The Vital Question is a good one.

Anything by Nick Lane, really.

Life Ascending, I would say,

is like a bit more potentially representative

as like a summary of a lot of the things

he’s been talking about.

I was very impacted by The Selfish Gene.

I thought that was a really good book

that helped me understand altruism as an example

and where it comes from and just realizing

that selection happens at the level of genes

was a huge insight for me at the time.

And it sort of like cleared up a lot of things for me.

What do you think about the idea

that ideas are the organisms, the memes?

Yes, love it, 100%.

Are you able to walk around with that notion for a while,

that there is an evolutionary kind of process

with ideas as well?

There absolutely is.

There’s memes just like genes and they compete

and they live in our brains.

It’s beautiful.

Are we silly humans thinking that we’re the organisms?

Is it possible that the primary organisms are the ideas?

Yeah, I would say like the ideas kind of live

in the software of like our civilization

in the minds and so on.

We think as humans that the hardware

is the fundamental thing.

I, a human, am a hardware entity.

But it could be the software, right?

Yeah.

Yeah, I would say like there needs to be some grounding

at some point to like a physical reality.

Yeah, but if we clone an Andrej,

the software is the thing,

like is the thing that makes that thing special, right?

Yeah, I guess you’re right.

But then cloning might be exceptionally difficult.

Like there might be a deep integration

between the software and the hardware

in ways we don’t quite understand.

Well, from the alien point of view,

like what makes me special is more like

the gang of genes that are riding in my chromosomes,

I suppose, right?

Like they’re the replicating unit, I suppose.

No, but that’s just the thing that makes you special, sure.

Well, the reality is what makes you special

is your ability to survive based on the software

that runs on the hardware that was built by the genes.

So the software is the thing that makes you survive,

not the hardware.

Or I guess it’s the two of them.

It’s a little bit of both.

It’s just like a second layer.

It’s a new second layer that hasn’t been there

before the brain.

They both coexist.

But there’s also layers of the software.

I mean, it’s a little abstraction

that’s on top of abstractions.

But okay, The Selfish Gene.

So The Selfish Gene, Nick Lane.

I would say sometimes books are like not sufficient.

I like to reach for textbooks sometimes.

I kind of feel like books are written for too general

an audience sometimes.

And they just kind of like,

they’re too high up in the level of abstraction

and it’s not good enough.

So I like textbooks.

I like The Cell.

I think The Cell was pretty cool.

That’s why also I like the writing of Nick Lane

is because he’s pretty willing to step one level down

and he doesn’t, yeah, he’s sort of, he’s willing to go there.

But he’s also willing to sort of be throughout the stack.

So he’ll go down to a lot of detail,

but then he will come back up.

And I think he has a, yeah,

basically I really appreciate that.

That’s why I love college, early college,

even high school.

Just textbooks on the basics of computer science,

of mathematics, of biology, of chemistry.

Those are, they condense it down.

It’s sufficiently general that you can understand

both the philosophy and the details,

but also like you get homework problems

and you get to play with it as much as you would

if you were programming stuff.

And then I’m also suspicious of textbooks, honestly,

because as an example in deep learning,

there’s no like amazing textbooks

and the field is changing very quickly.

I imagine the same is true in say synthetic biology

and so on.

These books like The Cell are kind of outdated.

They’re still high level.

Like what is the actual real source of truth?

It’s people in wet labs working with cells,

sequencing genomes and yeah, actually working with it.

And I don’t have that much exposure to that

or what that looks like.

So I still don’t fully, I’m reading through The Cell

and it’s kind of interesting and I’m learning,

but it’s still not sufficient, I would say,

in terms of understanding.

Well, it’s a clean summarization

of the mainstream narrative.

Yeah.

But you have to learn that before you break out

towards the cutting edge.

Yeah.

What is the actual process of working with these cells

and growing them and incubating them?

And it’s kind of like a massive cooking recipe

of making sure your cells live and proliferate,

and then you’re sequencing them, running experiments.

And just how that works,

I think, is kind of like the source of truth

of, at the end of the day,

what’s really useful in terms of creating therapies

and so on.

Yeah, I wonder what AI textbooks will be in the future,

because there’s Artificial Intelligence:

A Modern Approach.

I actually haven’t read it, if it’s come out,

the recent version, there’s been a recent edition.

I also saw there’s a science of deep learning book.

I’m waiting for textbooks that are worth recommending,

worth reading.

Yeah.

It’s tricky because it’s like papers and code, code, code.

Honestly, I think papers are quite good.

I especially like the appendix of any paper as well.

It’s like the most detail you can have.

It doesn’t have to be cohesive,

connected to anything else.

It just describes to me a very specific way

you saw the particular thing, yeah.

Many times papers can be actually quite readable,

not always, but sometimes the introduction

and the abstract is readable,

even for someone outside of the field.

This is not always true.

And sometimes I think, unfortunately,

scientists use complex terms, even when it’s not necessary.

I think that’s harmful.

I think there’s no reason for that.

And papers sometimes are longer than they need to be

in the parts that don’t matter.

Yeah.

The appendix can be long, but then the paper itself,

look at Einstein, make it simple.

Yeah, but certainly I’ve come across papers,

I would say, like synthetic biology or something

that I thought were quite readable

for the abstract and the introduction.

And then you’re reading the rest of it

and you don’t fully understand,

but you kind of are getting a gist and I think it’s cool.

What advice, you’ve given advice to folks

interested in machine learning and research,

but in general, what life advice would you give to a young person,

high school, early college,

about how to have a career they can be proud of

or a life they can be proud of?

Yeah, I think I’m very hesitant to give general advice.

I think it’s really hard.

I’ve mentioned, like some of the stuff I’ve mentioned

is fairly general, I think,

like focus on just the amount of work you’re spending

on like a thing, compare yourself only to yourself,

not to others.

That’s good.

I think those are fairly general.

How do you pick the thing?

You just have like a deep interest in something

or like try to like find the argmax

over like the things that you’re interested in.

Argmax at that moment and stick with it.

How do you not get distracted and switch to another thing?

You can, if you like.

Well, if you do an argmax repeatedly,

every week, every month.

It doesn’t converge.

It doesn’t, it’s a problem.

Yeah, you can like low pass filter yourself

in terms of like what has consistently been true for you.

But yeah, I definitely see how it can be hard,

but I would say like you’re going to work the hardest

on the thing that you care about the most.

So low pass filter yourself and really introspect.

In your past, what are the things that gave you energy?

And what are the things that took energy away from you?

Concrete examples.

And usually from those concrete examples,

sometimes patterns can emerge.

I like it when things look like this

when I’m in these positions.

So that’s not necessarily the field,

but the kind of stuff you’re doing in a particular field.

So for you, it seems like you were energized

by implementing stuff, building actual things.

Yeah, being low level, learning.

And then also communicating

so that others can go through the same realizations

and shortening that gap.

Because I usually have to do way too much work

to understand a thing.

And then I’m like, okay, this is actually like,

okay, I think I get it.

And like, why was it so much work?

It should have been much less work.

And that gives me a lot of frustration.

And that’s why I sometimes go teach.

So aside from the teaching you’re doing now,

putting out videos,

aside from a potential Godfather Part II

with the AGI at Tesla and beyond,

what does the future for Andrej Karpathy hold?

Have you figured that out yet or no?

I mean, as you see through the fog of war,

that is all of our future.

Do you start seeing silhouettes

of what that possible future could look like?

The consistent thing I’ve been always interested in,

for me at least, is AI.

And that’s probably what I’m spending

the rest of my life on,

because I just care about it a lot.

And I actually care about like many other problems as well,

like say aging, which I basically view as a disease.

And I care about that as well,

but I don’t think it’s a good idea

to go after it specifically.

I don’t actually think that humans

will be able to come up with the answer.

I think the correct thing to do is to ignore those problems

and you solve AI and then use that to solve everything else.

And I think there’s a chance that this will work.

I think it’s a very high chance.

And that’s kind of like the way I’m betting at least.

So when you think about AI,

are you interested in all kinds of applications,

all kinds of domains,

and any domain you focus on will allow you

to get insights into the big problem of AGI?

Yeah, for me, it’s the ultimate meta problem.

I don’t want to work on any one specific problem.

There’s too many problems.

So how can you work on all problems simultaneously?

You solve the meta problem,

which to me is just intelligence.

And how do you automate it?

Are there cool small projects like Arxiv Sanity

and so on that you’re thinking about?

That the world, the ML world can anticipate?

There’s always like some fun side projects.

Arxiv Sanity is one.

Basically, like, there’s way too many arxiv papers.

How can I organize it and recommend papers and so on?

I transcribed all of your podcasts.

What did you learn from that experience?

From transcribing the process of,

like you like consuming audio books and podcasts and so on.

And here’s a process that achieves

closer to human level performance on annotation.

Yeah, well, I definitely was like surprised

that transcription with OpenAI’s Whisper

was working so well,

compared to what I’m familiar with from Siri

and like a few other systems, I guess.

It works so well.

And that’s what gave me some energy to like try it out.

And I thought it could be fun to run on podcasts.

It’s kind of not obvious to me

why Whisper is so much better compared to anything else,

because I feel like there should be a lot of incentive

for a lot of companies to produce transcription systems.

And they’ve done so over a long time.

Whisper is not a super exotic model.

It’s a transformer.

It takes mel spectrograms and just outputs tokens of text.

It’s not crazy.

The model and everything has been around for a long time.

I’m not actually 100% sure why this came out.
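
For context, running Whisper over an episode really is a short script. This is a minimal sketch assuming the open-source openai-whisper package (pip install openai-whisper); the model size and file name are just placeholders.

```python
import whisper

# Downloads the checkpoint on first use; smaller models ("base", "small") also work.
model = whisper.load_model("medium")

# Hypothetical local audio file for the episode being transcribed.
result = model.transcribe("episode_333.mp3")

print(result["text"])                          # the full transcript as one string
for seg in result["segments"]:                 # per-segment timestamps
    print(f'[{seg["start"]:7.2f} - {seg["end"]:7.2f}] {seg["text"]}')
```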

It’s not obvious to me either.

It makes me feel like I’m missing something.

I’m missing something.

Yeah, because there is a huge,

even at Google and so on, YouTube transcription.

Yeah.

Yeah, it’s unclear.

But some of it is also integrating into a bigger system.

Yeah.

That, so the user interface,

how it’s deployed and all that kind of stuff.

Maybe running it as an independent thing is much easier,

like an order of magnitude easier

than deploying to a large integrated system

like YouTube transcription or anything like meetings.

Like Zoom has transcription.

That’s kind of crappy.

But creating an interface

where it detects the different individual speakers,

it’s able to display it in compelling ways,

run it in real time, all that kind of stuff.

Maybe that’s difficult.

But that’s the only explanation I have

because like I’m currently paying quite a bit

for human transcription, human captions.

Right.

Annotation.

And like, it seems like there’s a huge incentive

to automate that.

Yeah.

It’s very confusing.

And I think, I mean, I don’t know if you looked

at some of the Whisper transcripts,

but they’re quite good.

They’re good.

And especially in tricky cases.

Yeah.

I’ve seen Whisper’s performance on like super tricky cases

and it does incredibly well.

So I don’t know.

A podcast is pretty simple.

It’s like high quality audio

and you’re speaking usually pretty clearly.

And so I don’t know.

I don’t know what OpenAI’s plans are either.

But yeah, there’s always like fun projects basically.

And stable diffusion also is opening up

a huge amount of experimentation,

I would say in the visual realm

and generating images and videos and movies.
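
As a rough sketch of the kind of experimentation being described, generating an image from text can look like the following, assuming the Hugging Face diffusers library, a CUDA GPU, and a Stable Diffusion checkpoint id such as "runwayml/stable-diffusion-v1-5"; the checkpoint, prompt, and file name are placeholders, not anything from the conversation.

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a pretrained Stable Diffusion checkpoint (placeholder id) in half precision.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")  # assumes a CUDA GPU is available

# Text-to-image: the pipeline returns PIL images.
image = pipe("a solarpunk cottage in a lush forest, golden hour").images[0]
image.save("generated_scene.png")
```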

Ultimately.

Yeah, videos now.

And so that’s going to be pretty crazy.

That’s going to almost certainly work

and it’s going to be really interesting

when the cost of content creation is going to fall to zero.

You used to need a painter for a few months

to paint a thing.

And now it’s going to be speak to your phone

to get your video.

So Hollywood will start using that to generate scenes,

which completely opens things up.

Yeah, so you can make a movie like Avatar

eventually for under a million dollars.

Much less, maybe just by talking to your phone.

I mean, I know it sounds kind of crazy.

And then there’d be some voting mechanism.

Like how do you have a…

Like would there be a show on Netflix

that’s generated completely automatically?

Semi-automatically.

Potentially, yeah.

And what does it look like also

when you can just generate it on demand

and there’s infinity of it?

Yeah.

Oh man.

All the synthetic content.

I mean, it’s humbling because we treat ourselves

as special for being able to generate art

and ideas and all that kind of stuff.

If that can be done in an automated way by AI.

Yeah.

I think it’s fascinating to me how these…

The predictions of AI and what it’s going to look like

and what it’s going to be capable of

are completely inverted and wrong.

And sci-fi of the 50s and 60s was just like totally not right.

They imagined AI as like super calculating theorem provers

and we’re getting things

that can talk to you about emotions.

They can do art.

It’s just like weird.

Are you excited about that future?

Just AI’s like hybrid systems,

heterogeneous systems of humans and AIs

talking about emotions.

Netflix and chill with an AI system.

That’s where the Netflix thing you watch

is also generated by AI.

I think it’s going to be interesting for sure.

And I think I’m cautiously optimistic,

but it’s not obvious.

Well, the sad thing is your brain and mine

developed in a time

before Twitter, before the internet.

So I wonder people that are born inside of it

might have a different experience.

Like I, maybe you, will still resist it.

And the people born now will not.

Well, I do feel like humans are extremely malleable.

Yeah.

And you’re probably right.

What is the meaning of life, Andrej?

We talked about sort of the universe

having a conversation with us humans

or with the systems we create to try to answer.

For the creator of the universe to notice us,

we’re trying to create systems that are loud enough

to answer back.

I don’t know if that’s the meaning of life.

That’s like meaning of life for some people.

The first level answer I would say is

anyone can choose their own meaning of life

because we are a conscious entity and it’s beautiful.

Number one.

But I do think that like a deeper meaning of life

if someone is interested is along the lines of like,

what the hell is all this?

And like, why?

And if you look into fundamental physics

and the quantum field theory and the standard model,

they’re like very complicated.

And there are these, like, 19 free parameters of our universe.

And like, what’s going on with all this stuff?

And why is it here?

And can I hack it?

Can I work with it?

Is there a message for me?

Am I supposed to create a message?

And so I think there’s some fundamental answers there.

But I think there’s actually even like,

you can’t actually, like, really make a dent in those

without more time.

And so to me also there’s a big question

around just getting more time, honestly.

Yeah, that’s kind of like what I think about

quite a bit as well.

So kind of the ultimate, or at least first way

to sneak up to the why question is to try to escape

the system, the universe.

And then for that, you sort of backtrack and say,

okay, for that, that’s gonna take a very long time.

So the why question boils down from an engineering

perspective to how do we extend?

Yeah, I think that’s the question number one,

practically speaking, because you can’t,

you’re not gonna calculate the answer

to the deeper questions in the time you have.

And that could be extending your own lifetime

or extending just the lifetime of human civilization.

Of whoever wants to.

Many people might not want that.

But I think people who do want that,

I think it’s probably possible.

And I don’t know that people fully realize this.

I kind of feel like people think of death

as an inevitability.

But at the end of the day, this is a physical system.

Some things go wrong.

It makes sense why things like this happen,

evolutionarily speaking.

And there’s most certainly interventions that mitigate it.

That’d be interesting if death is eventually looked at

as a fascinating thing that used to happen to humans.

I don’t think it’s unlikely.

I think it’s likely.

And it’s up to our imagination to try to predict

what the world without death looks like.

Yeah.

It’s hard to, I think the values will completely change.

Could be.

I don’t really buy all these ideas that,

oh, without death, there’s no meaning,

there’s nothingness.

I don’t intuitively buy all those arguments.

I think there’s plenty of meaning,

plenty of things to learn.

They’re interesting, exciting.

I want to know, I want to calculate.

I want to improve the condition

of all the humans and organisms that are alive.

Yeah, the way we find meaning might change.

There are a lot of humans, probably including myself,

that find meaning in the finiteness of things.

But that doesn’t mean that’s the only source of meaning.

Yeah.

I do think many people will go with that,

which I think is great.

I love the idea that people

can just choose their own adventure.

Like you are born as a conscious, free entity,

by default, I’d like to think.

And you have your unalienable rights to life.

And the pursuit of happiness.

Pursuit, I don’t know if you have that.

In the nature, the landscape of happiness.

You can choose your own adventure, mostly.

And that’s not fully true, but.

I still am pretty sure I’m an NPC.

But an NPC can’t know it’s an NPC.

There could be different degrees and levels of consciousness.

I don’t think there’s a more beautiful way to end it.

Andrej, you’re an incredible person.

I’m really honored you would talk with me.

Everything you’ve done for the machine learning world,

for the AI world, to just inspire people,

to educate millions of people, it’s been great.

And I can’t wait to see what you do next.

It’s been an honor, man.

Thank you so much for talking today.

Awesome, thank you.

Thanks for listening to this conversation

with Andrej Karpathy.

To support this podcast,

please check out our sponsors in the description.

And now, let me leave you with some words

from Samuel Karlin.

The purpose of models is not to fit the data,

but to sharpen the questions.

Thanks for listening, and hope to see you next time.

