Lex Fridman Podcast - #70 - Jim Keller: Moore’s Law, Microprocessors, Abstractions, and First Principles

The following is a conversation with Jim Keller,

legendary microprocessor engineer

who has worked at AMD, Apple, Tesla, and now Intel.

He’s known for his work on AMD K7, K8, K12,

and Zen microarchitectures, Apple A4 and A5 processors,

and coauthor of the specification

for the x86-64 instruction set

and HyperTransport interconnect.

He’s a brilliant first principles engineer

and out of the box thinker,

and just an interesting and fun human being to talk to.

This is the Artificial Intelligence Podcast.

If you enjoy it, subscribe on YouTube,

give it five stars on Apple Podcasts,

follow on Spotify, support it on Patreon,

or simply connect with me on Twitter,

at Lex Fridman, spelled F R I D M A N.

I recently started doing ads

at the end of the introduction.

I’ll do one or two minutes after introducing the episode

and never any ads in the middle

that can break the flow of the conversation.

I hope that works for you

and doesn’t hurt the listening experience.

This show is presented by Cash App,

the number one finance app in the App Store.

I personally use Cash App to send money to friends,

but you can also use it to buy, sell,

and deposit Bitcoin in just seconds.

Cash App also has a new investing feature.

You can buy fractions of a stock, say $1 worth,

no matter what the stock price is.

Broker services are provided by Cash App Investing,

a subsidiary of Square and member SIPC.

I’m excited to be working with Cash App

to support one of my favorite organizations called FIRST,

best known for their FIRST Robotics and Lego competitions.

They educate and inspire hundreds of thousands of students

in over 110 countries and have a perfect rating

at Charity Navigator,

which means that donated money

is used to maximum effectiveness.

When you get Cash App from the App Store or Google Play

and use code LEXPODCAST,

you’ll get $10 and Cash App will also donate $10 to FIRST,

which again is an organization

that I’ve personally seen inspire girls and boys

to dream of engineering a better world.

And now here’s my conversation with Jim Keller.

What are the differences and similarities

between the human brain and a computer

with the microprocessor at its core?

Let’s start with the philosophical question perhaps.

Well, since people don’t actually understand

how human brains work, I think that’s true.

I think that’s true.

So it’s hard to compare them.

Computers are, you know, there’s really two things.

There’s memory and there’s computation, right?

And to date, almost all computer architectures

are global memory, which is a thing, right?

And then computation where you pull data

and you do relatively simple operations on it

and write data back.

So it’s decoupled in modern computers.

And you think in the human brain,

everything’s a mesh, a mess that’s combined together?

What people observe is there’s, you know,

some number of layers of neurons

which have local and global connections

and information is stored in some distributed fashion

and people build things called neural networks in computers

where the information is distributed

in some kind of fashion.

You know, there’s a mathematics behind it.

I don’t know that the understanding of that is super deep.

The computations we run on those

are straightforward computations.

I don’t believe anybody has said

a neuron does this computation.

So to date, it’s hard to compare them, I would say.

So let’s get into the basics before we zoom back out.

How do you build a computer from scratch?

What is a microprocessor?

What is a microarchitecture?

What’s an instruction set architecture?

Maybe even as far back as what is a transistor?

So the special charm of computer engineering

is there’s a relatively good understanding

of abstraction layers.

So down at the bottom, you have atoms

and atoms get put together in materials like silicon

or doped silicon or metal and we build transistors.

On top of that, we build logic gates, right?

And then functional units, like an adder or a subtractor

or an instruction parsing unit.

And then we assemble those into processing elements.

Modern computers are built out of probably 10 to 20

locally organic processing elements

or coherent processing elements.

And then that runs computer programs, right?

So there’s abstraction layers and then software,

there’s an instruction set you run

and then there’s assembly language, C, C++, Java, JavaScript.

There’s abstraction layers,

essentially from the atom to the data center, right?

So when you build a computer,

first there’s a target, like what’s it for?

Like how fast does it have to be?

Which today there’s a whole bunch of metrics

about what that is.

And then in an organization of 1,000 people

who build a computer, there’s lots of different disciplines

that you have to operate on.

Does that make sense?

And so…

So there’s a bunch of levels of abstraction

in an organization like Intel and in your own vision,

there’s a lot of brilliance that comes in

at every one of those layers.

Some of it is science, some of it is engineering,

some of it is art, what’s the most,

if you could pick favorites,

what’s the most important, your favorite layer

on these layers of abstractions?

Where does the magic enter this hierarchy?

I don’t really care.

That’s the fun, you know, I’m somewhat agnostic to that.

So I would say for relatively long periods of time,

instruction sets are stable.

So the x86 instruction set, the ARM instruction set.

What’s an instruction set?

So it says, how do you encode the basic operations?

Load, store, multiply, add, subtract, conditional, branch.

You know, there aren’t that many interesting instructions.

Look, if you look at a program and it runs,

you know, 90% of the execution is on 25 opcodes,

you know, 25 instructions.

And those are stable, right?

What does it mean, stable?

Intel architecture’s been around for 25 years.

It works.

And that’s because the basics, you know,

are defined a long time ago, right?

Now, the way an old computer ran is you fetched

instructions and you executed them in order.

Do the load, do the add, do the compare.

The way a modern computer works is you fetch

large numbers of instructions, say 500.

And then you find the dependency graph

between the instructions.

And then you execute in independent units

those little micrographs.

So a modern computer, like people like to say,

computers should be simple and clean.

But it turns out the market for simple,

clean, slow computers is zero, right?

We don’t sell any simple, clean computers.

No, you can, how you build it can be clean,

but the computer people want to buy,

that’s, say, in a phone or a data center,

fetches a large number of instructions,

computes the dependency graph,

and then executes it in a way that gets the right answers.

And optimizes that graph somehow.

Yeah, they run deeply out of order.

And then there’s semantics around how memory ordering works

and other things work.

So the computer sort of has a bunch of bookkeeping tables

that says what order should these operations finish in

or appear to finish in?

But to go fast, you have to fetch a lot of instructions

and find all the parallelism.
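
As an illustrative aside, here is a minimal sketch of the "fetch a lot of instructions and find the dependency graph" idea. The instruction format, register names, and the `issue_waves` helper are hypothetical and not taken from any real instruction set. Only read-after-write dependencies are tracked, on the assumption that every destination register is unique.

```python
# Toy model: each instruction is (name, destination, sources).
# The format and register names are hypothetical, for illustration only.
PROGRAM = [
    ("load_a", "r1", []),
    ("load_b", "r2", []),
    ("add",    "r3", ["r1", "r2"]),
    ("load_c", "r4", []),
    ("mul",    "r5", ["r3", "r4"]),
    ("sub",    "r6", ["r1", "r4"]),
]

def dependency_graph(program):
    """Map each instruction index to the earlier indices whose results it reads."""
    last_writer = {}
    deps = {}
    for i, (_, dest, sources) in enumerate(program):
        deps[i] = {last_writer[s] for s in sources if s in last_writer}
        last_writer[dest] = i
    return deps

def issue_waves(program):
    """Group instructions into waves; a wave only needs results from earlier waves."""
    deps = dependency_graph(program)
    level = {}
    for i in range(len(program)):
        # An instruction sits one level below its deepest dependency.
        level[i] = 1 + max((level[d] for d in deps[i]), default=-1)
    waves = {}
    for i, lvl in level.items():
        waves.setdefault(lvl, []).append(program[i][0])
    return [waves[k] for k in sorted(waves)]

waves = issue_waves(PROGRAM)
for number, wave in enumerate(waves):
    print(number, wave)
```

In this toy program the six instructions collapse into three waves, so a machine with enough independent units could finish in three steps instead of six.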

Now, there’s a second kind of computer,

which we call GPUs today.

And I call it the difference.

There’s found parallelism, like you have a program

with a lot of dependent instructions.

You fetch a bunch and then you go figure out

the dependency graph and you issue instructions out of order.

That’s because you have one serial narrative to execute,

which, in fact, can be done out of order.

Did you call it a narrative?


Oh, wow.

Yeah, so humans think of serial narrative.

So read a book, right?

There’s a sentence after sentence after sentence,

and there’s paragraphs.

Now, you could diagram that.

Imagine you diagrammed it properly and you said,

which sentences could be read in any order,

any order without changing the meaning, right?

That’s a fascinating question to ask of a book, yeah.

Yeah, you could do that, right?

So some paragraphs could be reordered,

some sentences can be reordered.

You could say, he is tall and smart and X, right?

And it doesn’t matter the order of tall and smart.

But if you say the tall man is wearing a red shirt,

what colors, you can create dependencies, right?

And so GPUs, on the other hand,

run simple programs on pixels,

but you’re given a million of them.

And the first order, the screen you’re looking at

doesn’t care which order you do it in.

So I call that given parallelism.

Simple narratives around the large numbers of things

where you can just say,

it’s parallel because you told me it was.

So found parallelism where the narrative is sequential,

but you discover like little pockets of parallelism versus.

Turns out large pockets of parallelism.

Large, so how hard is it to discover?

Well, how hard is it?

That’s just transistor count, right?

So once you crack the problem, you say,

here’s how you fetch 10 instructions at a time.

Here’s how you calculate the dependencies between them.

Here’s how you describe the dependencies.

Here’s, you know, these are pieces, right?

So once you describe the dependencies,

then it’s just a graph.

Sort of, it’s an algorithm that finds,

what is that?

I’m sure there’s a graph theoretical answer here

that’s solvable.

In general, programs, modern programs

that human beings write,

how much found parallelism is there in them?

What does 10X mean?

So if you execute it in order, you would get

what’s called cycles per instruction,

and it would be about, you know,

three instructions, three cycles per instruction

because of the latency of the operations and stuff.

And a modern computer executes it at,

like, 0.2, 0.25 cycles per instruction.

So it’s about, we today find 10X.
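
The arithmetic behind the "10X" figure can be checked with a short sketch. The function name `speedup` is an assumption for illustration; the inputs are the rough numbers quoted in the conversation (about three cycles per instruction in order, 0.2 to 0.25 out of order).

```python
def speedup(in_order_cpi, out_of_order_cpi):
    """Ratio of cycles per instruction: how many times faster the second machine is."""
    return in_order_cpi / out_of_order_cpi

for cpi in (0.2, 0.25):
    print(f"CPI {cpi}: about {speedup(3.0, cpi):.0f}x")
```

With these round numbers the ratio lands between 12 and 15, the same order of magnitude as the quoted 10X.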

And there’s two things.

One is the found parallelism in the narrative, right?

And the other is the predictability of the narrative, right?

So certain operations say, do a bunch of calculations,

and if greater than one, do this, else do that.

That decision is predicted in modern computers

to high 90% accuracy.

So branches happen a lot.

So imagine you have a decision

to make every six instructions,

which is about the average, right?

But you want to fetch 500 instructions,

figure out the graph, and execute them all in parallel.

That means you have, let’s say,

if you fetch 600 instructions and it’s every six,

you have to fetch, you have to predict

99 out of 100 branches correctly

for that window to be effective.
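
A rough way to see why prediction accuracy sets the useful window size, assuming one branch every six instructions (the figure quoted above) and assuming each branch is mispredicted independently, which is a simplification. The function name `expected_window` is hypothetical.

```python
def expected_window(instructions_per_branch, accuracy):
    """Average instructions fetched up to the first misprediction.

    Assumes independent mispredictions, so the number of branches until
    the first miss follows a geometric distribution with mean 1 / (1 - accuracy).
    """
    return instructions_per_branch / (1.0 - accuracy)

for accuracy in (0.85, 0.92, 0.99):
    print(f"{accuracy:.0%} accurate: about {expected_window(6, accuracy):.0f} instructions")
```

Under these assumptions, 99% accuracy gives a window of about 600 instructions, while 85% gives only about 40.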

Okay, so parallelism, you can’t parallelize branches.

Or you can.

No, you can predict.

You can predict.

What does predicted branch mean?

So imagine you do a computation over and over.

You’re in a loop.

So while n is greater than one, do.

And you go through that loop a million times.

So every time you look at the branch,

you say, it’s probably still greater than one.

And you’re saying you could do that accurately.

Very accurately.

Modern computers.

My mind is blown.

How the heck do you do that?

Wait a minute.

Well, you want to know?

This is really sad.

20 years ago, you simply recorded

which way the branch went last time

and predicted the same thing.
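
A minimal sketch of that "same as last time" scheme. The branch trace is made up for illustration (a loop of ten iterations run one hundred times), and the function name is hypothetical.

```python
def last_outcome_accuracy(outcomes, initial=True):
    """Predict that each branch goes the same way it went last time."""
    prediction = initial
    correct = 0
    for taken in outcomes:
        correct += (prediction == taken)
        prediction = taken  # remember only the most recent outcome
    return correct / len(outcomes)

# Hypothetical trace: the loop branch is taken 9 times, then falls through once.
loop_trace = ([True] * 9 + [False]) * 100
print(last_outcome_accuracy(loop_trace))
```

On this trace the predictor misses twice per loop: once when the loop exits, and once more when it re-enters, because the single remembered outcome has flipped.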



What’s the accuracy of that?


So then somebody said, hey, let’s keep a couple of bits

and have a little counter so when it predicts one way,

we count up and then it pins.

So say you have a three bit counter.

So you count up and then you count down.

And you can use the top bit as the signed bit

so you have a signed two bit number.

So if it’s greater than one, you predict taken.

And less than one, you predict not taken, right?

Or less than zero, whatever the thing is.

And that got us to 92%.
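
A sketch of the counter idea, written as a two-bit saturating counter, which is one common textbook formulation and an assumption here (the conversation describes the bit width only loosely). The trace is made up for illustration, and the accuracy it produces is not meant to reproduce the 92% figure.

```python
def saturating_counter_accuracy(outcomes, bits=2):
    """Predict 'taken' when the counter is in its upper half; count up on
    taken, down on not taken, and pin at both ends."""
    top = (1 << bits) - 1
    threshold = 1 << (bits - 1)
    counter = top  # arbitrary starting state: strongly taken
    correct = 0
    for taken in outcomes:
        prediction = counter >= threshold
        correct += (prediction == taken)
        counter = min(top, counter + 1) if taken else max(0, counter - 1)
    return correct / len(outcomes)

# Hypothetical trace: a ten-iteration loop run one hundred times.
loop_trace = ([True] * 9 + [False]) * 100
print(saturating_counter_accuracy(loop_trace))
```

Because one loop exit only nudges the counter down a single step, the predictor still says "taken" on re-entry, so it misses once per loop rather than twice.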


Okay, no, it gets better.

This branch depends on how you got there.

So if you came down the code one way,

you’re talking about Bob and Jane, right?

And then said, does Bob like Jane?

It went one way.

But if you’re talking about Bob and Jill,

does Bob like Jane?

You go a different way.

Right, so that’s called history.

So you take the history and a counter.
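
A minimal sketch of "history and a counter", assuming a small table of two-bit counters indexed by the last few branch outcomes. Table size, starting state, and the trace are all illustrative assumptions, not details from the conversation.

```python
def history_predictor_accuracy(outcomes, history_bits=2):
    """Index a table of two-bit counters by the recent outcome history."""
    table = [2] * (1 << history_bits)  # every counter starts weakly taken
    mask = (1 << history_bits) - 1
    history = 0
    correct = 0
    for taken in outcomes:
        prediction = table[history] >= 2
        correct += (prediction == taken)
        if taken:
            table[history] = min(3, table[history] + 1)
        else:
            table[history] = max(0, table[history] - 1)
        # Shift the newest outcome into the history register.
        history = ((history << 1) | int(taken)) & mask
    return correct / len(outcomes)

# Hypothetical repeating pattern: taken, taken, not taken.
pattern_trace = [True, True, False] * 300
print(history_predictor_accuracy(pattern_trace))
```

A single counter would miss every third branch of this pattern. With history, the "how you got there" context separates the cases, and after one early miss the predictions are all correct.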

That’s cool, but that’s not how anything works today.

They use something that looks a little like a neural network.

So modern, you take all the execution flows.

And then you do basically deep pattern recognition

of how the program is executing.

And you do that multiple different ways.

And you have something that chooses what the best result is.

There’s a little supercomputer inside the computer.

That’s trying to predict branching.

That calculates which way branches go.

So the effective window that it’s worth finding graphs

in gets bigger.

Why was that gonna make me sad?

Because that’s amazing.

It’s amazingly complicated.

Oh, well.

Well, here’s the funny thing.

So to get to 85% took 1,000 bits.

To get to 99% takes tens of megabits.

So this is one of those, to get the result,

to get from a window of say 50 instructions to 500,

it took three orders of magnitude

or four orders of magnitude more bits.

Now if you get the prediction of a branch wrong,

what happens then?

You flush the pipe.

You flush the pipe, so it’s just the performance cost.

But it gets even better.


So we’re starting to look at stuff that says,

so they executed down this path,

and then you had two ways to go.

But far away, there’s something that doesn’t matter

which path you went.

So you took the wrong path.

You executed a bunch of stuff.

Then you had the misprediction.

You backed it up.

You remembered all the results you already calculated.

Some of those are just fine.

Like if you read a book and you misunderstand a paragraph,

your understanding of the next paragraph

sometimes is invariant to that understanding.

Sometimes it depends on it.

And you can kind of anticipate that invariance.

Yeah, well, you can keep track of whether the data changed.

And so when you come back through a piece of code,

should you calculate it again or do the same thing?

Okay, how much of this is art and how much of it is science?

Because it sounds pretty complicated.

Well, how do you describe a situation?

So imagine you come to a point in the road

where you have to make a decision, right?

And you have a bunch of knowledge about which way to go.

Maybe you have a map.

So you wanna go the shortest way,

or do you wanna go the fastest way,

or do you wanna take the nicest road?

So there’s some set of data.

So imagine you’re doing something complicated

like building a computer.

And there’s hundreds of decision points,

all with hundreds of possible ways to go.

And the ways you pick interact in a complicated way.


And then you have to pick the right spot.

Right, so that’s.

So that’s art or science, I don’t know.

You avoided the question.

You just described the Robert Frost problem

of road less taken.

I described the Robert Frost problem?

That’s what we do as computer designers.

It’s all poetry.



Yeah, I don’t know how to describe that

because some people are very good

at making those intuitive leaps.

It seems like just combinations of things.

Some people are less good at it,

but they’re really good at evaluating the alternatives.

Right, and everybody has a different way to do it.

And some people can’t make those leaps,

but they’re really good at analyzing it.

So when you see computers are designed

by teams of people who have very different skill sets.

And a good team has lots of different kinds of people.

I suspect you would describe some of them

as artistic, but not very many.

Unfortunately, or fortunately.


Well, you know, computer design’s hard.

It’s 99% perspiration.

And the 1% inspiration is really important.

But you still need the 99.

Yeah, you gotta do a lot of work.

And then there are interesting things to do

at every level of that stack.

So at the end of the day,

if you run the same program multiple times,

does it always produce the same result?

Is there some room for fuzziness there?

That’s a math problem.

So if you run a correct C program,

the definition is every time you run it,

you get the same answer.

Yeah, well that’s a math statement.

But that’s a language definitional statement.

So for years when people did,

when we first did 3D acceleration of graphics,

you could run the same scene multiple times

and get different answers.


Right, and then some people thought that was okay

and some people thought it was a bad idea.

And then when the HPC world used GPUs for calculations,

they thought it was a really bad idea.
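
The conversation does not say why the results differed from run to run. One common cause on parallel hardware, offered here only as an assumption, is that floating-point addition is not associative, so the order in which partial sums are combined can change the last bits of the answer.

```python
import math

# The same three numbers, added in two different groupings.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

print(left_to_right == right_to_left)
print(abs(left_to_right - right_to_left))

# math.fsum returns the correctly rounded sum, so it does not depend on order.
print(math.fsum([0.1, 0.2, 0.3]) == math.fsum([0.3, 0.2, 0.1]))
```

The two groupings differ only in the final bit, which is harmless for a rendered pixel but matters when a numerical result has to be reproduced exactly.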

Okay, now in modern AI stuff,

people are looking at networks

where the precision of the data is low enough

that the data is somewhat noisy.

And the observation is the input data is unbelievably noisy.

So why should the calculation be not noisy?

And people have experimented with algorithms

that say can get faster answers by being noisy.

Like as a network starts to converge,

if you look at the computation graph,

it starts out really wide and then it gets narrower.

And you can say is that last little bit that important

or should I start the graph on the next rev

before we whittle it all the way down to the answer, right?

So you can create algorithms that are noisy.

Now if you’re developing something

and every time you run it, you get a different answer,

it’s really annoying.

And so most people think even today,

every time you run the program, you get the same answer.

No, I know, but the question is

that’s the formal definition of a programming language.

There is a definition of languages

that don’t get the same answer,

but people who use those, you always want something

because you get a bad answer and then you’re wondering

is it because of something in the algorithm

or because of this?

And so everybody wants a little switch that says

no matter what, do it deterministically.

And it’s really weird because almost everything

going into modern calculations is noisy.

So why do the answers have to be so clear?

Right, so where do you stand?

I design computers for people who run programs.

So if somebody says I want a deterministic answer,

like most people want that.

Can you deliver a deterministic answer,

I guess is the question.

Like when you.

Yeah, hopefully, sure.

What people don’t realize is you get a deterministic answer

even though the execution flow is very undeterministic.

So you run this program 100 times,

it never runs the same way twice, ever.

And the answer, it arrives at the same answer.

But it gets the same answer every time.

It’s just amazing.
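
A toy sketch of the same point: the order of execution varies from run to run, but as long as every step waits for its inputs, the answer does not. The step table and the `run` helper are hypothetical, and a seeded random choice stands in for whatever makes real hardware timing vary.

```python
import random

# Hypothetical program: each step lists its dependencies and how to compute it.
STEPS = {
    "a": ([], lambda env: 2),
    "b": ([], lambda env: 5),
    "c": (["a", "b"], lambda env: env["a"] + env["b"]),
    "d": (["a"], lambda env: env["a"] * 10),
    "e": (["c", "d"], lambda env: env["c"] - env["d"]),
}

def run(seed):
    """Execute the steps in a randomly chosen order that respects dependencies."""
    rng = random.Random(seed)
    env, order = {}, []
    pending = set(STEPS)
    while pending:
        ready = sorted(s for s in pending
                       if all(dep in env for dep in STEPS[s][0]))
        step = rng.choice(ready)
        env[step] = STEPS[step][1](env)
        order.append(step)
        pending.remove(step)
    return order, env["e"]

for seed in range(3):
    print(run(seed))
```

Every run ends with the same value for `e`, even though the list of steps taken to get there can differ.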

Okay, you’ve achieved, in the eyes of many people,

legend status as a chip architect.

What design creation are you most proud of?

Perhaps because it was challenging,

because of its impact, or because of the set

of brilliant ideas that were involved in bringing it to life?

I find that description odd.

And I have two small children, and I promise you,

they think it’s hilarious.

This question.


I do it for them.

So I’m really interested in building computers.

And I’ve worked with really, really smart people.

I’m not unbelievably smart.

I’m fascinated by how they go together,

both as a thing to do and as an endeavor that people do.

How people and computers go together?


Like how people think and build a computer.

And I find sometimes that the best computer architects

aren’t that interested in people,

or the best people managers aren’t that good

at designing computers.

So the whole stack of human beings is fascinating.

So the managers, the individual engineers.

Yeah, yeah.

Yeah, I said I realized after a lot of years

of building computers, where you sort of build them

out of transistors, logic gates, functional units,

computational elements, that you could think of people

the same way, so people are functional units.

And then you could think of organizational design

as a computer architecture problem.

And then it was like, oh, that’s super cool,

because the people are all different,

just like the computational elements are all different.

And they like to do different things.

And so I had a lot of fun reframing

how I think about organizations.

Just like with computers, we were saying execution paths,

you can have a lot of different paths that end up

at the same good destination.

So what have you learned about the human abstractions

from individual functional human units

to the broader organization?

What does it take to create something special?

Well, most people don’t think simple enough.

All right, so the difference between a recipe

and the understanding.

There’s probably a philosophical description of this.

So imagine you’re gonna make a loaf of bread.

The recipe says get some flour, add some water,

add some yeast, mix it up, let it rise,

put it in a pan, put it in the oven.

It’s a recipe.

Understanding bread, you can understand biology,

supply chains, grain grinders, yeast, physics,

thermodynamics, there’s so many levels of understanding.

And then when people build and design things,

they frequently are executing some stack of recipes.

And the problem with that is the recipes

all have limited scope.

Like if you have a really good recipe book

for making bread, it won’t tell you anything

about how to make an omelet.

But if you have a deep understanding of cooking,

right, then bread, omelets, you know, sandwich,

you know, there’s a different way of viewing everything.

And most people, when you get to be an expert at something,

you know, you’re hoping to achieve deeper understanding,

not just a large set of recipes to go execute.

And it’s interesting to watch groups of people

because executing recipes is unbelievably efficient

if it’s what you want to do.

If it’s not what you want to do, you’re really stuck.

And that difference is crucial.

And everybody has a balance of, let’s say,

deeper understanding and recipes.

And some people are really good at recognizing

when the problem is to understand something deeply.

Does that make sense?

It totally makes sense. At every stage of development,

is deep understanding on the team needed?

Oh, this goes back to the art versus science question.


If you constantly unpack everything

for deeper understanding, you never get anything done.

And if you don’t unpack understanding when you need to,

you’ll do the wrong thing.

And then at every juncture, like human beings

are these really weird things because everything you tell them

has a million possible outputs, right?

And then they all interact in a hilarious way.

Yeah, it’s very interesting.

And then having some intuition about what you tell them,

what you do, when do you intervene, when do you not,

it’s complicated.

Right, so.

It’s essentially computationally unsolvable.

Yeah, it’s an intractable problem, sure.

Humans are a mess.

But with deep understanding,

do you mean also sort of fundamental questions

of things like what is a computer?

Or why, like the why questions,

why are we even building this, like of purpose?

Or do you mean more like going towards

the fundamental limits of physics,

sort of really getting into the core of the science?

In terms of building a computer, think a little simpler.

So common practice is you build a computer,

and then when somebody says, I wanna make it 10% faster,

you’ll go in and say, all right,

I need to make this buffer bigger,

and maybe I’ll add an add unit.

Or I have this thing that’s three instructions wide,

I’m gonna make it four instructions wide.

And what you see is each piece

gets incrementally more complicated, right?

And then at some point you hit this limit,

like adding another feature or buffer

doesn’t seem to make it any faster.

And then people will say,

well, that’s because it’s a fundamental limit.

And then somebody else will look at it and say,

well, actually the way you divided the problem up

and the way the different features are interacting

is limiting you, and it has to be rethought, rewritten.

So then you refactor it and rewrite it,

and what people commonly find is the rewrite

is not only faster, but half as complicated.

From scratch? Yes.

So how often in your career have you seen

that it’s needed, maybe more generally,

to just throw the whole thing out and start over?

This is where I’m on one end of it,

every three to five years.

Which end are you on?

Rewrite more often.

Rewrite, and three to five years is?

If you wanna really make a lot of progress

on computer architecture, every five years

you should do one from scratch.

So where does the x86-64 standard come in?

How often do you?

I was the coauthor of that spec in 98.

That’s 20 years ago.

Yeah, so that’s still around.

The instruction set itself has been extended

quite a few times.

And instruction sets are less interesting

than the implementation underneath.

There’s been, on x86 architecture, Intel’s designed a few,

AMD designed a few very different architectures.

And I don’t wanna go into too much of the detail

about how often, but there’s a tendency

to rewrite it every 10 years,

and it really should be every five.

So you’re saying you’re an outlier in that sense.

Rewrite more often.

Well, and here’s the problem.

Isn’t that scary?

Yeah, of course.

Well, scary to who?

To everybody involved, because like you said,

repeating the recipe is efficient.

Companies wanna make money.

No, individual engineers wanna succeed,

so you wanna incrementally improve,

increase the buffer from three to four.

Well, this is where you get

into the diminishing return curves.

I think Steve Jobs said this, right?

So every, you have a project, and you start here,

and it goes up, and you have diminishing return.

And to get to the next level, you have to do a new one,

and the initial starting point will be lower

than the old optimization point, but it’ll get higher.

So now you have two kinds of fear,

short term disaster and long term disaster.

And you’re, you’re haunted.

So grown ups, right, like, you know,

people with a quarter by quarter business objective

are terrified about changing everything.

And people who are trying to run a business

or build a computer for a long term objective

know that the short term limitations block them

from the long term success.

So if you look at leaders of companies

that had really good long term success,

every time they saw that they had to redo something, they did.

And so somebody has to speak up.

Or you do multiple projects in parallel,

like you optimize the old one while you build a new one.

But the marketing guys are always like,

promise me that the new computer

is faster on every single thing.

And the computer architect says,

well, the new computer will be faster on the average,

but there’s a distribution of results and performance,

and you’ll have some outliers that are slower.

And that’s very hard,

because they have one customer who cares about that one.

So speaking of the long term, for over 50 years now,

Moore’s Law has served, for me and millions of others,

as an inspiring beacon of what kind of amazing future

brilliant engineers can build.


I’m just making your kids laugh all of today.

That was great.

So first, in your eyes, what is Moore’s Law,

if you could define for people who don’t know?

Well, the simple statement was, from Gordon Moore,

was double the number of transistors every two years.

Something like that.

And then my operational model is,

we increase the performance of computers

by two X every two or three years.

And it’s wiggled around substantially over time.

And also, how we deliver performance has changed.

But the foundational idea was

two X the transistors every two years.

The current cadence is something like,

they call it a shrink factor, like 0.6 every two years,

which is not 0.5.

But that’s referring strictly, again,

to the original definition of just.

A transistor count.

A shrink factor’s just getting them

smaller and smaller and smaller.

Well, it’s for a constant chip area.

If you make the transistors smaller by 0.6,

then you get one over 0.6 more transistors.
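
A short worked version of that arithmetic, treating the shrink factor as an area scaling the way the speaker does. The function names are hypothetical, and the assumption that the 0.6 cadence repeats every two years is taken from the figure quoted just above.

```python
import math

def density_gain(area_shrink):
    """Transistors per unit area multiply by 1 / shrink when each one gets smaller."""
    return 1.0 / area_shrink

def years_to_double(area_shrink, years_per_step=2.0):
    """How long it takes to double density if each step shrinks area by this factor."""
    steps = math.log(0.5) / math.log(area_shrink)
    return steps * years_per_step

print(f"0.5 shrink: {density_gain(0.5):.2f}x per step")
print(f"0.6 shrink: {density_gain(0.6):.2f}x per step")
print(f"0.6 shrink: doubling about every {years_to_double(0.6):.1f} years")
```

A 0.6 shrink gives about 1.67 times the transistors per step rather than 2, so doubling stretches from two years to a little under three.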

So can you linger on it a little longer?

What’s a broader, what do you think should be

the broader definition of Moore’s Law?

When you mentioned how you think of performance,

just broadly, what’s a good way to think about Moore’s Law?

Well, first of all, I’ve been aware

of Moore’s Law for 30 years.

In which sense?

Well, I’ve been designing computers for 40.

You’re just watching it before your eyes kind of thing.

And somewhere where I became aware of it,

I was also informed that Moore’s Law

was gonna die in 10 to 15 years.

And then I thought that was true at first.

But then after 10 years, it was gonna die in 10 to 15 years.

And then at one point, it was gonna die in five years.

And then it went back up to 10 years.

And at some point, I decided not to worry

about that particular prognostication

for the rest of my life, which is fun.

And then I joined Intel and everybody said

Moore’s Law is dead.

And I thought that’s sad,

because it’s the Moore’s Law company.

And it’s not dead.

And it’s always been gonna die.

And humans like these apocalyptic kind of statements,

like we’ll run out of food, or we’ll run out of air,

or we’ll run out of room, or we’ll run out of something.

Right, but it’s still incredible

that it’s lived for as long as it has.

And yes, there’s many people who believe now

that Moore’s Law is dead.

You know, they can join the last 50 years

of people who had the same idea.

Yeah, there’s a long tradition.

But why do you think, if you can try to understand it,

why do you think it’s not dead?

Well, let’s just think, people think Moore’s Law

is one thing, transistors get smaller.

But actually, under the sheet,

there’s literally thousands of innovations.

And almost all those innovations

have their own diminishing return curves.

So if you graph it, it looks like a cascade

of diminishing return curves.

I don’t know what to call that.

But the result is an exponential curve.

Well, at least it has been.

So, and we keep inventing new things.

So if you’re an expert in one of the things

on a diminishing return curve, right,

and you can see its plateau,

you will probably tell people, well, this is done.

Meanwhile, some other pile of people

are doing something different.

So that’s just normal.

So then there’s the observation of

how small could a switching device be?

So a modern transistor is something like

a thousand by a thousand by a thousand atoms, right?

And you get quantum effects down around two to 10 atoms.

So you can imagine the transistor

as small as 10 by 10 by 10.

So that’s a million times smaller.
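
The "million times smaller" figure follows directly from the two cube sizes quoted. A quick check, with a hypothetical helper name:

```python
def atoms_in_cube(side):
    """Number of atoms in a cube that is `side` atoms along each edge."""
    return side ** 3

today = atoms_in_cube(1000)
limit = atoms_in_cube(10)
print(today // limit)
```

That is a factor of 100 in each of three dimensions, or 10 to the 6th by volume.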

And then the quantum computational people

are working away at how to use quantum effects.


A thousand by a thousand by a thousand.


That’s a really clean way of putting it.

Well, a fin, like a modern transistor,

if you look at the fin, it’s like 120 atoms wide,

but we can make that thinner.

And then there’s a gate wrapped around it,

and then there’s spacing.

There’s a whole bunch of geometry.

And a competent transistor designer

could count the atoms in every single direction.

Like there’s techniques now to already put down atoms

in a single atomic layer.

And you can place atoms if you want to.

It’s just from a manufacturing process,

if placing an atom takes 10 minutes

and you need to put 10 to the 23rd atoms together

to make a computer, it would take a long time.
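
To put "a long time" in numbers, here is the arithmetic with the figures quoted, assuming atoms are placed strictly one after another and using a 365-day year. The constant names are hypothetical.

```python
MINUTES_PER_ATOM = 10
ATOMS = 10 ** 23
MINUTES_PER_YEAR = 60 * 24 * 365  # 365-day year, an approximation

total_minutes = MINUTES_PER_ATOM * ATOMS
years = total_minutes / MINUTES_PER_YEAR
print(f"about {years:.1e} years")
```

The result is on the order of 10 to the 18th years, which is why placing atoms one at a time is not a manufacturing method.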

So the methods are both shrinking things

and then coming up with effective ways

to control what’s happening.

Manufacture stably and cheaply.


So the innovation stack’s pretty broad.

There’s equipment, there’s optics, there’s chemistry,

there’s physics, there’s material science,

there’s metallurgy, there’s lots of ideas

about when you put different materials together,

how do they interact, are they stable,

is it stable over temperature, like are they repeatable?

There’s like literally thousands of technologies involved.

But just for the shrinking, you don’t think

we’re quite yet close to the fundamental limits of physics?

I did a talk on Moore’s Law and I asked for a roadmap

to a path of 100 and after two weeks,

they said we only got to 50.

100 what, sorry?

100 X shrink.

100 X shrink?

We only got to 50.

And I said, why don’t you give it another two weeks?

Well, here’s the thing about Moore’s Law, right?

So I believe that the next 10 or 20 years

of shrinking is gonna happen, right?

Now, as a computer designer, you have two stances.

You think it’s going to shrink, in which case

you’re designing and thinking about architecture

in a way that you’ll use more transistors.

Or conversely, not be swamped by the complexity

of all the transistors you get, right?

You have to have a strategy, you know?

So you’re open to the possibility and waiting

for the possibility of a whole new army

of transistors ready to work.

I’m expecting more transistors every two or three years

by a number large enough that how you think about design,

how you think about architecture has to change.

Like, imagine you build buildings out of bricks,

and every year the bricks are half the size,

or every two years.

Well, if you kept building bricks the same way,

so many bricks per person per day,

the amount of time to build a building

would go up exponentially, right?
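The brick analogy can be written down as a toy model. It assumes "half the size" means half the volume, so the same building needs twice as many bricks after each halving. The function names and numbers are illustrative.

```python
def bricks_needed(base_bricks, halvings):
    """Bricks for the same building after brick volume has halved `halvings` times."""
    return base_bricks * 2 ** halvings

def build_days(base_bricks, halvings, bricks_per_day):
    """Days to finish the building at a given laying rate."""
    return bricks_needed(base_bricks, halvings) / bricks_per_day

# Fixed laying rate: build time doubles with every halving.
print([build_days(10_000, h, 500) for h in range(4)])

# Laying rate that doubles along with the brick count: build time stays flat.
print([build_days(10_000, h, 500 * 2 ** h) for h in range(4)])
```

With a fixed rate the time grows as 2 to the power of the number of halvings. It only stays flat if the tooling improves at the same pace as the bricks shrink.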

But if you said, I know that’s coming,

so now I’m gonna design equipment that moves bricks faster,

uses them better, because maybe you’re getting something

out of the smaller bricks, more strength, thinner walls,

you know, less material, efficiency out of that.

So once you have a roadmap with what’s gonna happen,

transistors, we’re gonna get more of them,

then you design all this collateral around it

to take advantage of it, and also to cope with it.

Like, that’s the thing people don’t understand.

It’s like, if I didn’t believe in Moore’s Law,

and then Moore’s Law transistors showed up,

my design teams would all drown.

So what’s the hardest part of this inflow

of new transistors?

I mean, even if you just look historically,

throughout your career, what’s the thing,

what fundamentally changes when you add more transistors

in the task of designing an architecture?

Well, there’s two constants, right?

One is people don’t get smarter.

By the way, there’s some science showing

that we do get smarter because of nutrition or whatever.

Sorry to bring that up.

Flynn effect.


Yeah, I’m familiar with it.

Nobody understands it, nobody knows if it’s still going on.

So that’s a…

Or whether it’s real or not.

But yeah, it’s a…

I sort of…

Anyway, but not exponentially.

I would believe for the most part,

people aren’t getting much smarter.

The evidence doesn’t support it, that’s right.

And then teams can’t grow that much.


Right, so human beings, you know,

we’re really good in teams of 10,

you know, up to teams of 100, they can know each other.

Beyond that, you have to have organizational boundaries.

So you’re kind of, you have,

those are pretty hard constraints, right?

So then you have to divide and conquer,

like as the designs get bigger,

you have to divide it into pieces.

You know, the power of abstraction layers is really high.

We used to build computers out of transistors.

Now we have a team that turns transistors into logic cells

and another team that turns them into functional units,

another one that turns them into computers, right?

So we have abstraction layers in there

and you have to think about when do you shift gears on that.

We also use faster computers to build faster computers.

So some algorithms run twice as fast on new computers,

but a lot of algorithms are N squared.

So, you know, a computer with twice as many transistors

and it might take four times as long to run.

So you have to refactor the software.

Like simply using faster computers

to build bigger computers doesn’t work.
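A rough way to see the scaling argument, assuming a design tool whose work grows as the design size raised to some exponent. The function and its parameters are illustrative, not a model of any real tool.

```python
def relative_runtime(size_factor, speed_factor, exponent):
    """Runtime relative to the old design on the old machine for an O(n**exponent) tool."""
    return size_factor ** exponent / speed_factor

# Quadratic tool, design doubles, same machine: four times as long.
print(relative_runtime(size_factor=2, speed_factor=1, exponent=2))

# Same tool on a machine that is twice as fast: still twice as long.
print(relative_runtime(size_factor=2, speed_factor=2, exponent=2))

# A linear tool keeps pace with the faster machine.
print(relative_runtime(size_factor=2, speed_factor=2, exponent=1))
```

Even crediting the new machine with a full 2x speedup, the quadratic tool falls behind every generation, which is the case for refactoring the software.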

So you have to think about all these things.

So in terms of computing performance

and the exciting possibility

that more powerful computers bring,

is shrinking the thing which you’ve been talking about,

for you, one of the biggest exciting possibilities

of advancement in performance?

Or is there other directions that you’re interested in,

like in the direction of sort of enforcing given parallelism

or like doing massive parallelism

in terms of many, many CPUs,

you know, stacking CPUs on top of each other,

that kind of parallelism or any kind of parallelism?

Well, think about it a different way.

So old computers, you know, slow computers,

you said A equal B plus C times D, pretty simple, right?

And then we made faster computers with vector units

and you can do proper equations and matrices, right?

And then modern like AI computations

or like convolutional neural networks,

where you convolve one large data set against another.

And so there’s sort of this hierarchy of mathematics,

you know, from simple equation to linear equations,

to matrix equations, to deeper kind of computation.
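The hierarchy described here can be illustrated with the same multiply-and-add idea applied to progressively larger objects. This is a plain-Python sketch with illustrative function names; real systems would use vector hardware or a numerical library.

```python
def scalar_op(b, c, d):
    """A = B + C * D on single numbers."""
    return b + c * d

def vector_op(b, c, d):
    """The same equation applied element by element, as a vector unit would."""
    return [bi + ci * di for bi, ci, di in zip(b, c, d)]

def matmul(a, b):
    """Matrix product: every output entry is a sum of multiplies."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]

def convolve1d(signal, kernel):
    """Sliding dot product of a kernel over a signal (no padding, no kernel flip)."""
    width = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(width))
            for i in range(len(signal) - width + 1)]
```

Each level reuses the level below it: the matrix product is many vector dot products, and the convolution is one small dot product repeated across the data.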

And the data sets are getting so big

that people are thinking of data as a topology problem.

You know, data is organized in some immense shape.

And then the computation, which sort of wants to be,

get data from immense shape and do some computation on it.

So what computers have allowed people to do

is have algorithms go much, much further.

So that paper you reference, the Sutton paper,

they talked about, you know, like when AI started,

it was apply rule sets to something.

That’s a very simple computational situation.

And then when they did the first chess thing,

they solved deep searches.

So have a huge database of moves and results, deep search,

but it’s still just a search, right?

Now we take large numbers of images

and we use it to train these weight sets

that we convolve across.

It’s a completely different kind of phenomena.

We call that AI.

Now they’re doing the next generation.

And if you look at it,

they’re going up this mathematical graph, right?

And then computations, both computation and data sets

support going up that graph.

Yeah, the kind of computation that might,

I mean, I would argue that all of it is still a search,


Just like you said, a topology problem with data sets,

you’re searching the data sets for valuable data

and also the actual optimization of neural networks

is a kind of search for the…

I don’t know, if you had looked at the inner layers

of finding a cat, it’s not a search.

It’s a set of endless projections.

So, you know, a projection,

here’s a shadow of this phone, right?

And then you can have a shadow of that on the something

and a shadow on that of something.

And if you look in the layers, you’ll see

this layer actually describes pointy ears

and round eyeness and fuzziness.

But the computation to tease out the attributes

is not search.

Like the inference part might be search,

but the training’s not search.

And then in deep networks, they look at layers

and they don’t even know what’s represented.

And yet, if you take the layers out, it doesn’t work.

So I don’t think it’s search.

But you’d have to talk to a mathematician

about what that actually is.

Well, we could disagree, but it’s just semantics,

I think, it’s not, but it’s certainly not…

I would say it’s absolutely not semantics, but…

Okay, all right, well, if you want to go there.

So optimization to me is search,

and we’re trying to optimize the ability

of a neural network to detect cat ears.

And the difference between chess and the space,

the incredibly multidimensional,

100,000 dimensional space that neural networks

are trying to optimize over is nothing like

the chessboard database.

So it’s a totally different kind of thing.

And okay, in that sense, you can say it loses the meaning.

I can see how you might say, if you…

The funny thing is, it’s the difference

between given search space and found search space.

Right, exactly.

Yeah, maybe that’s a different way to describe it.

That’s a beautiful way to put it, okay.

But you’re saying, what’s your sense

in terms of the basic mathematical operations

and the architectures, computer hardware

that enables those operations?

Do you see the CPUs of today still being

a really core part of executing

those mathematical operations?


Well, the operations continue to be add, subtract,

load, store, compare, and branch.

It’s remarkable.

So it’s interesting, the building blocks

of computers are transistors, and under that, atoms.

So you got atoms, transistors, logic gates, computers,

functional units of computers.

The building blocks of mathematics at some level

are things like adds and subtracts and multiplies,

but the space mathematics can describe

is, I think, essentially infinite.

But the computers that run the algorithms

are still doing the same things.

Now, a given algorithm might say, I need sparse data,

or I need 32 bit data, or I need, you know,

like a convolution operation that naturally takes

eight bit data, multiplies it, and sums it up a certain way.

So like the data types in TensorFlow

imply an optimization set.

But when you go right down and look at the computers,

it’s AND and OR gates doing adds and multiplies.

Like that hasn’t changed much.
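As a concrete picture of gates doing adds, here is a minimal sketch of an 8-bit ripple-carry adder built only from AND, OR, and NOT on single bits. The function names are illustrative, and real hardware uses faster adder structures. This only shows that addition reduces to gate logic.

```python
def and_gate(a, b):
    return a & b

def or_gate(a, b):
    return a | b

def not_gate(a):
    return 1 - a

def xor_gate(a, b):
    """XOR composed from AND, OR, and NOT."""
    return and_gate(or_gate(a, b), not_gate(and_gate(a, b)))

def full_adder(a, b, carry_in):
    """Add three bits, returning (sum_bit, carry_out)."""
    partial = xor_gate(a, b)
    sum_bit = xor_gate(partial, carry_in)
    carry_out = or_gate(and_gate(a, b), and_gate(partial, carry_in))
    return sum_bit, carry_out

def add_8bit(x, y):
    """Ripple the carry through eight full adders; the final carry is dropped."""
    carry, result = 0, 0
    for bit in range(8):
        sum_bit, carry = full_adder((x >> bit) & 1, (y >> bit) & 1, carry)
        result |= sum_bit << bit
    return result
```

Because the carry out of the top bit is discarded, `add_8bit` wraps modulo 256, just as a fixed-width hardware register would.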

Now, the quantum researchers think

they’re going to change that radically,

and then there’s people who think about analog computing

because you look in the brain, and it

seems to be more analogish.

You know, that maybe there’s a way to do that more


But we have a million X on computation,

and I don’t know the relationship

between computational, let’s say,

intensity and ability to hit mathematical abstractions.

I don’t know any way to describe that, but just like you saw

in AI, you went from rule sets to simple search

to complex search to, say, found search.

Like those are orders of magnitude more computation

to do.

And as we get the next two orders of magnitude,

like a friend, Raja Koduri, said,

like every order of magnitude changes the computation.

Fundamentally changes what the computation is doing.


Oh, you know the expression the difference in quantity

is the difference in kind.

You know, the difference between ant and anthill, right?

Or neuron and brain.

You know, there’s this indefinable place

where the quantity changed the quality, right?

And we’ve seen that happen in mathematics multiple times,

and you know, my guess is it’s going to keep happening.

So your sense is, yeah, if you focus head down

on shrinking the transistor.

Well, it’s not just head down, we’re aware of the software

stacks that are running and the computational loads,

and we’re kind of pondering what do you

do with a petabyte of memory that wants

to be accessed in a sparse way and have, you know,

the kind of calculations AI programmers want.

So there’s a dialogue interaction,

but when you go in the computer chip,

you know, you find adders and subtractors and multipliers.

So if you zoom out then with, as you mentioned, Rich Sutton,

the idea that most of the development in the last many

decades in AI research came from just leveraging computation

and just simple algorithms waiting for the computation

to improve.

Well, software guys have a thing that they call it

the problem of early optimization.

So you write a big software stack,

and if you start optimizing like the first thing you write,

the odds of that being the performance limiter is low.

But when you get the whole thing working,

can you make it 2x faster by optimizing the right things?


While you’re optimizing that, could you

have written a new software stack, which

would have been a better choice?


Now you have creative tension.


But the whole time as you’re doing the writing,

that’s the software we’re talking about.

The hardware underneath gets faster and faster.

Well, this goes back to the Moore’s law.

If Moore’s law is going to continue, then your AI research

should expect that to show up, and then you

make a slightly different set of choices then.

We’ve hit the wall.

Nothing’s going to happen.

And from here, it’s just us rewriting algorithms.

That seems like a failed strategy for the last 30

years of Moore’s law’s death.

So can you just linger on it?

I think you’ve answered it, but I’ll just

ask the same dumb question over and over.

So why do you think Moore’s law is not going to die?

Which is the most promising, exciting possibility

of why it won’t die in the next 5, 10 years?

So is it the continued shrinking of the transistor,

or is it another S curve that steps in and it totally sort

of matches up?

Shrinking the transistor is literally

thousands of innovations.

Right, so there’s stacks of S curves in there.

There’s a whole bunch of S curves just kind

of running their course and being reinvented

and new things.

The semiconductor fabricators and technologists have all

announced what’s called nanowires.

So they took a fin, which had a gate around it,

and turned that into little wires

so you have better control of that, and they’re smaller.

And then from there, there are some obvious steps

about how to shrink that.

The metallurgy around wire stacks and stuff

has very obvious abilities to shrink.

And there’s a whole combination of things there to do.

Your sense is that we’re going to get a lot

of this innovation from just that, shrinking.

Yeah, like a factor of 100 is a lot.

Yeah, I would say that’s incredible.

And it’s totally unknown.

It’s only 10 or 15 years.

Now, you’re smarter, you might know,

but to me it’s totally unpredictable

of what that 100x would bring in terms

of the nature of the computation that people would be.

Yeah, are you familiar with Bell’s law?

So for a long time, it was mainframes, minis, workstation,

PC, mobile.

Moore’s law drove faster, smaller computers.

And then when we were thinking about Moore’s law,

Raja Koduri said, every 10x generates a new computation.

So scalar, vector, matrix, topological computation.

And if you go look at the industry trends,

there was mainframes, and then minicomputers, and then PCs,

and then the internet took off.

And then we got mobile devices.

And now we’re building 5G wireless

with one millisecond latency.

And people are starting to think about the smart world

where everything knows you, recognizes you.

The transformations are going to be unpredictable.

How does it make you feel that you’re

one of the key architects of this kind of future?

So we’re not talking about the architects

of the high level people who build the Angry Bird apps,

and Snapchat.

Angry Bird apps.

Who knows?

Maybe that’s the whole point of the universe.

I’m going to take a stand at that,

and the attention distracting nature of mobile phones.

I’ll take a stand.

But anyway, in terms of the side effects of smartphones,

or the attention distraction, which part?

Well, who knows where this is all leading?

It’s changing so fast.

My parents used to yell at my sisters

for hiding in the closet with a wired phone with a dial on it.

Stop talking to your friends all day.

Now my wife yells at my kids for talking to their friends

all day on text.

It looks the same to me.

It’s always echoes of the same thing.

But you are one of the key people

architecting the hardware of this future.

How does that make you feel?

Do you feel responsible?

Do you feel excited?

So we’re in a social context.

So there’s billions of people on this planet.

There are literally millions of people working on technology.

I feel lucky to be doing what I do and getting paid for it,

and there’s an interest in it.

But there’s so many things going on in parallel.

The actions are so unpredictable.

If I wasn’t here, somebody else would do it.

The vectors of all these different things

are happening all the time.

You know, there’s a, I’m sure, some philosopher

or metaphilosopher is wondering about how

we transform our world.

So you can’t deny the fact that these tools are

changing our world.

That’s right.

Do you think it’s changing for the better?

I read this thing recently.

It said the two disciplines with the highest GRE scores in college

are physics and philosophy.

And they’re both sort of trying to answer the question,

why is there anything?

And the philosophers are on the kind of theological side,

and the physicists are obviously on the material side.

And there’s 100 billion galaxies with 100 billion stars.

It seems, well, repetitive at best.

So you know, there’s on our way to 10 billion people.

I mean, it’s hard to say what it’s all for,

if that’s what you’re asking.

Yeah, I guess I am.

Things do tend to significantly increase in complexity.

And I’m curious about how computation,

like our physical world inherently

generates mathematics.

It’s kind of obvious, right?

So we have x, y, z coordinates.

You take a sphere, you make it bigger.

You get a surface that grows by r squared.

Like, it generally generates mathematics.
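The r squared remark can be checked with the standard surface-area formula for a sphere, 4 times pi times r squared. The function name is illustrative.

```python
import math

def sphere_surface_area(radius):
    """Surface area of a sphere: 4 * pi * r**2."""
    return 4.0 * math.pi * radius ** 2

# Doubling the radius quadruples the surface; tripling it gives nine times.
for scale in (1, 2, 3):
    print(scale, sphere_surface_area(scale) / sphere_surface_area(1))
```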

And the mathematicians and the physicists

have been having a lot of fun talking to each other for years.

And computation has been, let’s say, relatively pedestrian.

Like, computation in terms of mathematics

has been doing binary algebra, while those guys have

been gallivanting through the other realms of possibility.

Now recently, the computation lets

you do mathematical computations that

are sophisticated enough that nobody understands

how the answers came out.

Machine learning.

It used to be you get data set, you guess at a function.

The function is considered physics

if it’s predictive of new functions, new data sets.

Modern, you can take a large data set

with no intuition about what it is

and use machine learning to find a pattern that

has no function, right?

And it can arrive at results that I

don’t know if they’re completely mathematically describable.

So computation has kind of done something interesting compared

to A equal B plus C.

There’s something reminiscent of that step

from the basic operations of addition

to taking a step towards neural networks that’s

reminiscent of what life on Earth at its origins was doing.

Do you think we’re creating sort of the next step

in our evolution in creating artificial intelligence

systems that will?

I don’t know.

I mean, there’s so much in the universe already,

it’s hard to say.

Where we stand in this whole thing.

Are human beings working on additional abstraction

layers and possibilities?

Yeah, it appears so.

Does that mean that human beings don’t need dogs?

You know, no.

Like, there’s so many things that

are all simultaneously interesting and useful.

Well, you’ve seen, throughout your career,

you’ve seen greater and greater level abstractions built

in artificial machines, right?

Do you think, when you look at humans,

do you think that all life on Earth

is a single organism building this thing,

this machine with greater and greater levels of abstraction?

Do you think humans are the peak,

the top of the food chain in this long arc of history

on Earth?

Or do you think we’re just somewhere in the middle?

Are we the basic functional operations of a CPU?

Are we the C++ program, the Python program,

or the neural network?

Like, somebody’s, you know, people

have calculated, like, how many operations does the brain do?

Something, you know, I’ve seen the number 10 to the 18th

a bunch of times, arrived at different ways.

So could you make a computer that

did 10 to the 20th operations?



Do you think?

We’re going to do that.

Now, is there something magical about how brains compute things?

I don’t know.

You know, my personal experience is interesting,

because, you know, you think you know how you think,

and then you have all these ideas,

and you can’t figure out how they happened.

And if you meditate, you know, what you can be aware of

is interesting.

So I don’t know if brains are magical or not.

You know, the physical evidence says no.

Lots of people’s personal experience says yes.

So what would be funny is if brains are magical,

and yet we can make brains with more computation.

You know, I don’t know what to say about that.

But do you think magic is an emergent phenomenon?

Could be.

I have no explanation for it.

Let me ask Jim Keller: what, in your view, is consciousness?

What’s consciousness?

Yeah, like what, you know, consciousness, love,

things that are these deeply human things that

seem to emerge from our brain, is that something

that we’ll be able to encode in chips that get

faster and faster and faster and faster?

That’s like a 10 hour conversation.

Nobody really knows.

Can you summarize it in a couple of sentences?

Many people have observed that organisms run

at lots of different levels, right?

If you had two neurons, somebody said

you’d have one sensory neuron and one motor neuron, right?

So we move towards things and away from things.

And we have physical integrity and safety or not, right?

And then if you look at the animal kingdom,

you can see brains that are a little more complicated.

And at some point, there’s a planning system.

And then there’s an emotional system

that’s happy about being safe or unhappy about being threatened.

And then our brains have massive numbers of structures,

like planning and movement and thinking and feeling

and drives and emotions.

And we seem to have multiple layers of thinking systems.

And we have a dream system that nobody understands whatsoever,

which I find completely hilarious.

And you can think in a way that those systems are

more independent.

And you can observe the different parts of yourself

can observe them.

I don’t know which one’s magical.

I don’t know which one’s not computational.


Is it possible that it’s all computation?


Is there a limit to computation?

I don’t think so.

Do you think the universe is a computer?

It seems to be.

It’s a weird kind of computer.

Because if it was a computer, like when

they do calculations on how much calculation

it takes to describe quantum effects, it’s unbelievably high.

So if it was a computer, wouldn’t you

have built it out of something that was easier to compute?

That’s a funny system.

But then the simulation guys pointed out

that the rules are kind of interesting.

When you look really close, it’s uncertain.

And the speed of light says you can only look so far.

And things can’t be simultaneous,

except for the odd entanglement problem where they seem to be.

The rules are all kind of weird.

And somebody said physics is like having

50 equations with 50 variables to define 50 variables.

Physics itself has been a shit show for thousands of years.

It seems odd when you get to the corners of everything.

It’s either uncomputable or undefinable or uncertain.

It’s almost like the designers of the simulation

are trying to prevent us from understanding it perfectly.

But also, the things that require calculations

require so much calculation that our idea

of the universe as a computer is absurd,

because every single little bit of it

takes all the computation in the universe to figure out.

So that’s a weird kind of computer.

You say the simulation is running

in a computer, which has, by definition, infinite computation.

Not infinite.

Oh, you mean if the universe is infinite?


Well, every little piece of our universe

seems to take infinite computation to figure out.

Not infinite, just a lot.

Well, a lot.

Some pretty big number.

To compute this little teeny spot takes all the mass

in the local one light year by one light year space.

It’s close enough to infinite.

Well, it’s a heck of a computer if it is one.

I know.

It’s a weird description, because the simulation

description seems to break when you look closely at it.

But the rules of the universe seem to imply something’s up.

That seems a little arbitrary.

The universe, the whole thing, the laws of physics,

it just seems like, how did it come out to be the way it is?

Well, lots of people talk about that.

Like I said, the two smartest groups of humans

are working on the same problem.

From different aspects.

And they’re both complete failures.

So that’s kind of cool.

They might succeed eventually.

Well, after 2,000 years, the trend isn’t good.

Oh, 2,000 years is nothing in the span

of the history of the universe.

That’s for sure.

We have some time.

But the next 1,000 years doesn’t look good either.

That’s what everybody says at every stage.

But with Moore’s law, as you’ve just described,

not being dead, the exponential growth of technology,

the future seems pretty incredible.

Well, it’ll be interesting, that’s for sure.

That’s right.

So what are your thoughts on Ray Kurzweil’s sense

that exponential improvement in technology

will continue indefinitely?

Is that how you see Moore’s law?

Do you see Moore’s law more broadly,

in the sense that technology of all kinds

has a way of stacking S curves on top of each other,

where it’ll be exponential, and then we’ll see all kinds of…

What does an exponential of a million mean?

That’s a pretty amazing number.

And that’s just for a local little piece of silicon.

Now let’s imagine you, say, decided

to get 1,000 tons of silicon to collaborate in one computer

at a million times the density.

Now you’re talking, I don’t know, 10 to the 20th more

computation power than our current, already unbelievably

fast computers.

Nobody knows what that’s going to mean.

The sci fi guys call it computronium,

like when a local civilization turns the nearby star

into a computer.

I don’t know if that’s true, but…

So just even when you shrink a transistor, the…

That’s only one dimension.

The ripple effects of that.

People tend to think about computers as a cost problem.

So computers are made out of silicon and minor amounts

of metals and this and that.

None of those things cost any money.

There’s plenty of sand.

You could just turn the beach and a little bit of ocean water

into computers.

So all the cost is in the equipment to do it.

And the trend on equipment is once you

figure out how to build the equipment,

the trend of cost is zero.

Elon said, first you figure out what

configuration you want the atoms in,

and then how to put them there.

His great insight is people are how-constrained.

I have this thing, I know how it works,

and then little tweaks to that will generate something,

as opposed to what do I actually want,

and then figure out how to build it.

It’s a very different mindset.

And almost nobody has it, obviously.

Well, let me ask on that topic,

you were one of the key early people

in the development of Autopilot, at least on the hardware

side. Elon Musk believes that Autopilot

and vehicle autonomy, if you just look at that problem,

can follow this kind of exponential improvement.

In terms of the how question that we’re talking about,

there’s no reason why you can’t.

What are your thoughts on this particular space

of vehicle autonomy, and your part of it

and Elon Musk’s and Tesla’s vision for vehicle autonomy?

Well, the computer you need to build is straightforward.

And you could argue, well, does it need to be

two times faster or five times or 10 times?

But that’s just a matter of time or price in the short run.

So that’s not a big deal.

You don’t have to be especially smart to drive a car.

So it’s not like a super hard problem.

I mean, the big problem with safety is attention,

which computers are really good at, not skills.

Well, let me push back on one.

You see, everything you said is correct,

but we as humans tend to take for granted

how incredible our vision system is.

So you can drive a car with 20/50 vision,

and you can train a neural network to extract

the distance of any object and the shape of any surface

from video and data.

Yeah, but that’s really simple.

No, it’s not simple.

That’s a simple data problem.

It’s not, it’s not simple.

It’s because it’s not just detecting objects,

it’s understanding the scene,

and it’s being able to do it in a way

that doesn’t make errors.

So the beautiful thing about the human vision system

and our entire brain around the whole thing

is we’re able to fill in the gaps.

It’s not just about perfectly detecting cars.

It’s inferring the occluded cars.

It’s trying to, it’s understanding the physics.

I think that’s mostly a data problem.

So you think it’s data and compute,

with improvement of computation

with improvement in collection of data?

Well, there is a, you know, when you’re driving a car

and somebody cuts you off, your brain has theories

about why they did it.

You know, they’re a bad person, they’re distracted,

they’re dumb, you know, you can listen to yourself, right?

So, you know, if you think that narrative is important

to be able to successfully drive a car,

then current autopilot systems can’t do it.

But if cars are ballistic things with tracks

and probabilistic changes of speed and direction,

and roads are fixed and given, by the way,

they don’t change dynamically, right?

You can map the world really thoroughly.

You can place every object really thoroughly.

Right, you can calculate trajectories

of things really thoroughly, right?
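As a sketch of what "calculating trajectories" can mean in the simplest case, the code below treats each vehicle as a point moving at constant velocity and finds when two tracks come closest. This is an illustrative toy under that assumption, not a description of any production autopilot. The function names are hypothetical, and the probabilistic changes of speed and direction mentioned above are left out.

```python
import math

def predict(position, velocity, t):
    """Constant-velocity extrapolation of an (x, y) position after t seconds."""
    return (position[0] + velocity[0] * t, position[1] + velocity[1] * t)

def closest_approach(p1, v1, p2, v2):
    """Return (time, distance) at which two constant-velocity tracks are nearest."""
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]   # relative position
    vx, vy = v2[0] - v1[0], v2[1] - v1[1]   # relative velocity
    speed_sq = vx * vx + vy * vy
    # Same velocity means the gap never changes; otherwise minimize |r + v*t| for t >= 0.
    t = 0.0 if speed_sq == 0 else max(0.0, -(rx * vx + ry * vy) / speed_sq)
    return t, math.hypot(rx + vx * t, ry + vy * t)

# One car heading east at 10 m/s, another heading north at 10 m/s toward the same point.
print(closest_approach((0.0, 0.0), (10.0, 0.0), (50.0, -50.0), (0.0, 10.0)))
```

A closest-approach distance near zero flags a potential conflict at the returned time. A fuller treatment would carry uncertainty on each track and update it as new measurements arrive.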

But everything you said about really thoroughly

has a different degree of difficulty, so.

And you could say at some point,

computer autonomous systems will be way better

at things that humans are lousy at.

Like, they’ll be better at attention,

they’ll always remember there was a pothole in the road

that humans keep forgetting about,

they’ll remember that this set of roads

has these weirdo lines on it

that the computers figured out once,

and especially if they get updates,

so if somebody changes a given,

like, the key to robots and stuff somebody said

is to maximize the givens, right?


So having a robot pick up this bottle cap

is way easier if you put a red dot on the top,

because then you won’t have to figure it out,

and if you wanna do a certain thing with it,

maximize the givens is the thing.

And autonomous systems are happily maximizing the givens.

Like, humans, when you drive someplace new,

you remember it, because you’re processing it

the whole time, and after the 50th time you drove to work,

you get to work, you don’t know how you got there, right?

You’re on autopilot, right?

Autonomous cars are always on autopilot.

But the cars have no theories about why they got cut off,

or why they’re in traffic.

So they also never stop paying attention.

Right, so I tend to believe you do have to have theories,

meta models of other people,

especially with pedestrians, cyclists,

but also with other cars.

So everything you said is actually essential to driving.

Driving is a lot more complicated than people realize,

I think, so to push back slightly, but to…

So to cut into traffic, right?


You can’t just wait for a gap,

you have to be somewhat aggressive.

You’ll be surprised how simple a calculation for that is.

I may be on that particular point,

but there’s, maybe I actually have to push back.

I would be surprised.

You know what, yeah, I’ll just say where I stand.

I would be very surprised,

but I think you might be surprised how complicated it is.

I tell people, progress disappoints in the short run,

and surprises in the long run.

It’s very possible, yeah.

I suspect in 10 years it’ll be just taken for granted.

Yeah, probably.

But you’re probably right, not look like…

It’s gonna be a $50 solution that nobody cares about.

It’s like GPSes, like, wow, GPSes.

We have satellites in space

that tell you where your location is.

It was a really big deal, now everything has a GPS in it.

Yeah, that’s true, but I do think that systems

that involve human behavior are more complicated

than we give them credit for.

So we can do incredible things with technology

that don’t involve humans, but when you…

I think humans are less complicated than people,

you know, frequently ascribe.

Maybe I feel…

We tend to operate out of large numbers of patterns

and just keep doing it over and over.

But I can’t trust you because you’re a human.

That’s something a human would say.

But my hope is on the point you’ve made is,

even if, no matter who’s right,

I’m hoping that there’s a lot of things

that humans aren’t good at

that machines are definitely good at,

like you said, attention and things like that.

Well, they’ll be so much better

that the overall picture of safety and autonomy

will be, obviously cars will be safer,

even if they’re not as good at understanding.

I’m a big believer in safety.

I mean, there are already the current safety systems,

like cruise control that doesn’t let you run into people

and lane keeping.

There are so many features

that you just look at the parade of accidents

and knocking off like 80% of them is super doable.

Just to linger on the autopilot team

and the efforts there,

it seems to be that there’s a very intense scrutiny

by the media and the public in terms of safety,

the pressure, the bar put before autonomous vehicles.

What are your, sort of as a person there

working on the hardware and trying to build a system

that builds a safe vehicle and so on,

what was your sense about that pressure?

Is it unfair?

Is it expected of new technology?

Yeah, it seems reasonable.

I was interested, I talked to both American

and European regulators,

and I was worried that the regulations

would write into the rules technology solutions,

like modern brake systems imply hydraulic brakes.

So if you read the regulations,

to meet the letter of the law for brakes,

it sort of has to be hydraulic, right?

And the regulator said they’re interested in the use cases,

like a head on crash, an offset crash,

don’t hit pedestrians, don’t run into people,

don’t leave the road, don’t run a red light or a stoplight.

They were very much into the scenarios.

And they had all the data about which scenarios

injured or killed the most people.

And for the most part, those conversations were like,

what’s the right thing to do to take the next step?

Now, Elon’s very interested also in the benefits

of autonomous driving for freeing people’s time

and attention, as well as safety.

And I think that’s also an interesting thing,

but building autonomous systems so they’re safe

and safer than people seemed,

since the goal is to be 10X safer than people,

having the bar to be safer than people

and scrutinizing accidents seems philosophically correct.

So I think that’s a good thing.

What is different from the things you worked on at

Intel, AMD, Apple, with autopilot chip design

and hardware design, what are interesting

or challenging aspects of building this specialized

kind of computing system in the automotive space?

I mean, there’s two tricks to building

like an automotive computer.

One is the software team, the machine learning team

is developing algorithms that are changing fast.

So as you’re building the accelerator,

you have this, you know, worry or intuition

that the algorithms will change enough

that the accelerator will be the wrong one, right?

And there’s the generic thing, which is,

if you build a really good general purpose computer,

say its performance is one, and then GPU guys

will deliver about 5X the performance

for the same amount of silicon,

because instead of discovering parallelism,

you’re given parallelism.

And then special accelerators get another two to 5X

on top of a GPU, because you say,

I know the math is always eight bit integers

into 32 bit accumulators, and the operations

are a subset of mathematical possibilities.

So AI accelerators have a claimed performance benefit

over GPUs because in the narrow math space,

you’re nailing the algorithm.
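
A minimal sketch of the narrow math mentioned here: eight bit integers multiplied and summed into a 32 bit accumulator. The function names (`int8_dot_int32`, `max_safe_terms`, `stacked_speedup`) are hypothetical and written in plain Python purely for illustration; they are not taken from any real accelerator or from the conversation. The last function only multiplies out the rough ratios quoted above (about 5X for a GPU, another two to 5X for an accelerator), which are claims from the conversation rather than measurements.

```python
INT8_MIN, INT8_MAX = -128, 127
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def int8_dot_int32(a, b):
    """Dot product of two int8 vectors, accumulated as a 32 bit integer."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    acc = 0
    for x, y in zip(a, b):
        # Inputs are restricted to the signed 8 bit range.
        if not (INT8_MIN <= x <= INT8_MAX and INT8_MIN <= y <= INT8_MAX):
            raise ValueError("inputs must fit in signed 8 bits")
        acc += x * y  # a single int8 * int8 product has magnitude at most 16384
        # Python integers do not overflow, so the 32 bit limit is checked by hand.
        if not (INT32_MIN <= acc <= INT32_MAX):
            raise OverflowError("accumulator exceeded 32 bits")
    return acc


def max_safe_terms():
    """Number of worst-case products a signed 32 bit accumulator can always hold."""
    worst_product = 128 * 128  # (-128) * (-128)
    return INT32_MAX // worst_product


def stacked_speedup(gpu_over_cpu=5.0, accel_over_gpu=(2.0, 5.0)):
    """Multiply the quoted ratios, with general purpose performance taken as 1."""
    low, high = accel_over_gpu
    return gpu_over_cpu * low, gpu_over_cpu * high


if __name__ == "__main__":
    print(int8_dot_int32([1, 2, 3], [4, 5, 6]))
    print(max_safe_terms())
    print(stacked_speedup())
```

Under the quoted ratios, an accelerator would land at roughly 10X to 25X a general purpose computer for the same silicon. The wide accumulator is what makes the narrow inputs practical: more than a hundred thousand worst-case eight bit products can be summed before a 32 bit overflow becomes possible.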

Now, you still try to make it programmable,

but the AI field is changing really fast.

So there’s a, you know, there’s a little

creative tension there of, I want the acceleration

afforded by specialization without being over specialized

so that the new algorithm is so much more effective

that you’d have been better off on a GPU.

So there’s a tension there.

To build a good computer for an application

like automotive, there’s all kinds of sensor inputs

and safety processors and a bunch of stuff.

So one of Elon’s goals is to make it super affordable.

So every car gets an autopilot computer.

So some of the recent startups you look at,

and they have a server in the trunk,

because they’re saying, I’m gonna build

this autopilot computer, replaces the driver.

So their cost budget’s 10 or $20,000.

And Elon’s constraint was, I’m gonna put one in every car,

whether people buy autonomous driving or not.

So the cost constraint he had in mind was great, right?

And to hit that, you had to think about the system design.

That’s complicated, and it’s fun.

You know, it’s like, it’s like, it’s craftsman’s work.

Like, you know, a violin maker, right?

You can say, Stradivarius is this incredible thing,

the musicians are incredible.

But the guy making the violin, you know,

picked wood and sanded it, and then he cut it,

you know, and he glued it, you know,

and he waited for the right day

so that when he put the finish on it,

it didn’t, you know, do something dumb.

That’s craftsman’s work, right?

You may be a genius craftsman

because you have the best techniques

and you discover a new one,

but most engineering is craftsman’s work.

And humans really like to do that.

You know the expression?

Smart humans.

No, everybody.

All humans.

I don’t know.

I used to, I dug ditches when I was in college.

I got really good at it.




Digging ditches is also craftsman’s work.

Yeah, of course.

So there’s an expression called complex mastery behavior.

So when you’re learning something,

that’s fun, because you’re learning something.

When you do something that’s relatively simple,

it’s not that satisfying.

But if the steps that you have to do are complicated

and you’re good at them, it’s satisfying to do them.

And then if you’re intrigued by it all,

as you’re doing them, you sometimes learn new things

that you can raise your game.

But craftsman’s work is good.

And engineers, like engineering is complicated enough

that you have to learn a lot of skills.

And then a lot of what you do is then craftsman’s work,

which is fun.

Autonomous driving, building a very resource

constrained computer.

So a computer has to be cheap enough

to put in every single car.

That essentially boils down to craftsman’s work.

It’s engineering, it’s innovation.

Yeah, you know, there’s thoughtful decisions

and problems to solve and trade offs to make.

Do you need 10 camera input ports or eight?

You know, you’re building for the current car

or the next one.

You know, how do you do the safety stuff?

You know, there’s a whole bunch of details.

But it’s fun.

It’s not like I’m building a new type of neural network,

which has a new mathematics and a new computer to work.

You know, that’s like, there’s more invention than that.

But the reduction to practice,

once you pick the architecture, you look inside

and what do you see?

Adders and multipliers and memories and, you know,

the basics.

So computers is always this weird set of abstraction layers

of ideas and thinking that reduction to practice

is transistors and wires and, you know, pretty basic stuff.

And that’s an interesting phenomenon.

By the way, like factory work,

like lots of people think factory work

is rote assembly stuff.

I’ve been on the assembly line.

Like the people who work there really like it.

It’s a really great job.

It’s really complicated.

Putting cars together is hard, right?

And the car is moving and the parts are moving

and sometimes the parts are damaged

and you have to coordinate putting all the stuff together

and people are good at it.

They’re good at it.

And I remember one day I went to work

and the line was shut down for some reason

and some of the guys sitting around were really bummed

because they had reorganized a bunch of stuff

and they were gonna hit a new record

for the number of cars built that day.

And they were all gung ho to do it.

And these were big, tough buggers.

And, you know, but what they did was complicated

and you couldn’t do it.

Yeah, and I mean.

Well, after a while you could,

but you’d have to work your way up

because, you know, like putting the bright,

what’s called the brights, the trim on a car

on a moving assembly line

where it has to be attached 25 places

in a minute and a half is unbelievably complicated.

And human beings can do it, it’s really good.

I think that’s harder than driving a car, by the way.

Putting together, working at a.

Working on a factory.

Two smart people can disagree.


I think driving a car.

We’ll get you in the factory someday

and then we’ll see how you do.

No, not for us humans driving a car is easy.

I’m saying building a machine that drives a car

is not easy.

No, okay.


Driving a car is easy for humans

because we’ve been evolving for billions of years.

Drive cars.

Yeah, I noticed that.

The Paleolithic cars are super cool.

No, now you join the rest of the internet

in mocking me.


I wasn’t mocking, I was just.

Yeah, yeah.

Intrigued by your anthropology.

Yeah, it’s.

I’ll have to go dig into that.

There’s some inaccuracies there, yes.

Okay, but in general,

what have you learned in terms of

thinking about passion, craftsmanship,

tension, chaos.


The whole mess of it.

What have you learned, have taken away from your time

working with Elon Musk, working at Tesla,

which is known to be a place of chaos, innovation,

craftsmanship, and all of those things.

I really like the way he thought.

You think you have an understanding

about what first principles of something is,

and then you talk to Elon about it,

and you didn’t scratch the surface.

He has a deep belief that no matter what you do,

it’s a local maximum, right?

And I had a friend, he invented a better electric motor,

and it was a lot better than what we were using.

And one day he came by, he said,

I’m a little disappointed, because this is really great,

and you didn’t seem that impressed.

And I said, when the super intelligent aliens come,

are they going to be looking for you?

Like, where is he?

The guy who built the motor.


Probably not.

You know, like, but doing interesting work

that’s both innovative and, let’s say,

craftsman’s work on the current thing

is really satisfying, and it’s good.

And that’s cool.

And then Elon was good at taking everything apart,

and like, what’s the deep first principle?

Oh, no, what’s really, no, what’s really?

You know, that ability to look at it without assumptions

and how constraints is super wild.

You know, he built a rocket ship, and an electric car,

and you know, everything.

And that’s super fun, and he’s into it, too.

Like, when they first landed two SpaceX rockets, at Tesla,

we had a video projector in the big room,

and like, 500 people came down,

and when they landed, everybody cheered,

and some people cried.

It was so cool.

All right, but how did you do that?

Well, it was super hard, and then people say,

well, it’s chaotic, really?

To get out of all your assumptions,

you think that’s not gonna be unbelievably painful?

And is Elon tough?

Yeah, probably.

Do people look back on it and say,

boy, I’m really happy I had that experience

to go take apart that many layers of assumptions?

Sometimes super fun, sometimes painful.

So it could be emotionally and intellectually painful,

that whole process of just stripping away assumptions.

Yeah, imagine 99% of your thought process

is protecting your self conception,

and 98% of that’s wrong.

Now you got the math right.

How do you think you’re feeling

when you get back into that one bit that’s useful,

and now you’re open,

and you have the ability to do something different?

I don’t know if I got the math right.

It might be 99.9, but it ain’t 50.

Imagining it, the 50% is hard enough.

Now, for a long time, I’ve suspected you could get better.

Like you can think better, you can think more clearly,

you can take things apart.

And there’s lots of examples of that, people who do that.

And Elon is an example of that, you are an example.

I don’t know if I am, I’m fun to talk to.


I’ve learned a lot of stuff.

Well, here’s the other thing, I joke, like I read books,

and people think, oh, you read books.

Well, no, I’ve read a couple of books a week for 55 years.

Well, maybe 50,

because I didn’t learn to read until I was eight or something.

And it turns out when people write books,

they often take 20 years of their life

where they passionately did something,

reduce it to 200 pages.

That’s kind of fun.

And then you go online,

and you can find out who wrote the best books

and who liked, you know, that’s kind of wild.

So there’s this wild selection process,

and then you can read it,

and for the most part, understand it.

And then you can go apply it.

Like I went to one company,

I thought, I haven’t managed much before.

So I read 20 management books,

and I started talking to them,

and basically compared to all the VPs running around,

I’d read 19 more management books than anybody else.

It wasn’t even that hard.

And half the stuff worked, like first time.

It wasn’t even rocket science.

But at the core of that is questioning the assumptions,

or sort of entering the thinking,

first principles thinking,

sort of looking at the reality of the situation,

and using that knowledge, applying that knowledge.

So that’s.

So I would say my brain has this idea

that you can question first assumptions.

But I can go days at a time and forget that,

and you have to kind of like circle back that observation.

Because it is emotionally challenging.

Well, it’s hard to just keep it front and center,

because you operate on so many levels all the time,

and getting this done takes priority,

or being happy takes priority,

or screwing around takes priority.

Like how you go through life is complicated.

And then you remember, oh yeah,

I could really think first principles.

Oh shit, that’s tiring.

But you do for a while, and that’s kind of cool.

So just as a last question in your sense,

from the big picture, from the first principles,

do you think, you kind of answered it already,

but do you think autonomous driving is something

we can solve on a timeline of years?

So one, two, three, five, 10 years,

as opposed to a century?

Yeah, definitely.

Just to linger on it a little longer,

where’s the confidence coming from?

Is it the fundamentals of the problem,

the fundamentals of building the hardware and the software?

As a computational problem, understanding ballistics,

roads, topography, it seems pretty solvable.

And you can see this, like speech recognition,

for a long time people were doing frequency

domain analysis, and all kinds of stuff,

and that didn’t work at all, right?

And then they did deep learning about it,

and it worked great.

And it took multiple iterations.

And autonomous driving is way past

the frequency analysis point.

Use radar, don’t run into things.

And the data gathering’s going up,

and the computation’s going up,

and the algorithm understanding’s going up,

and there’s a whole bunch of problems

getting solved like that.

The data side is really powerful,

but I disagree with both you and Elon.

I’ll tell Elon once again, as I did before,

that when you add human beings into the picture,

it’s no longer a ballistics problem.

It’s something more complicated,

but I could be very well proven wrong.

Cars are highly damped in terms of rate of change.

Like the steering system’s really slow

compared to a computer.

The acceleration of the acceleration’s really slow.

Yeah, on a certain timescale, on a ballistics timescale,

but human behavior, I don’t know.

I shouldn’t say.

Human beings are really slow too.

Weirdly, we operate half a second behind reality.

Nobody really understands that one either.

It’s pretty funny.

Yeah, yeah.

We very well could be surprised,

and I think with the rate of improvement

in all aspects on both the compute

and the software and the hardware,

there’s gonna be pleasant surprises all over the place.

Speaking of unpleasant surprises,

many people have worries about a singularity

in the development of AI.

Forgive me for such questions.


When AI improves exponentially

and reaches a point of superhuman level

general intelligence, beyond the point,

there’s no looking back.

Do you share this worry of existential threats

from artificial intelligence,

from computers becoming superhuman level intelligent?

No, not really.

We already have a very stratified society,

and then if you look at the whole animal kingdom

of capabilities and abilities and interests,

and smart people have their niche,

and normal people have their niche,

and craftsmen have their niche,

and animals have their niche.

I suspect that the domains of interest

for things that are astronomically different,

like the whole something got 10 times smarter than us

and wanted to track us all down because what?

We like to have coffee at Starbucks?

Like, it doesn’t seem plausible.

No, is there an existential problem

that how do you live in a world

where there’s something way smarter than you,

and you based your kind of self esteem

on being the smartest local person?

Well, there’s what, 0.1% of the population who thinks that?

Because the rest of the population’s been dealing with it

since they were born.

So the breadth of possible experience

that can be interesting is really big.

And, you know, superintelligence seems likely,

although we still don’t know if we’re magical,

but I suspect we’re not.

And it seems likely that it’ll create possibilities

that are interesting for us,

and its interests will be interesting for that,

for whatever it is.

It’s not obvious why its interests would somehow

want to fight over some square foot of dirt,

or, you know, whatever the usual fears are about.

So you don’t think it’ll inherit

some of the darker aspects of human nature?

Depends on how you think reality’s constructed.

So for whatever reason,

human beings are in, let’s say,

creative tension and opposition

with both our good and bad forces.

Like, there’s lots of philosophical understanding of that.

I don’t know why that would be different.

So you think the evil is necessary for the good?

I mean, the tension.

I don’t know about evil,

but like we live in a competitive world

where your good is somebody else’s evil.

You know, there’s the malignant part of it,

but that seems to be self limiting,

although occasionally it’s super horrible.

But yes, there’s a debate over ideas,

and some people have different beliefs,

and that debate itself is a process.

So the arriving at something.

Yeah, and why wouldn’t that continue?


But you don’t think that whole process

will leave humans behind in a way that’s painful?

Emotionally painful, yes.

For the 0.1%, they’ll be.

Why isn’t it already painful

for a large percentage of the population?

And it is.

I mean, society does have a lot of stress in it,

about the 1%, and about the this, and about the that,

but you know, everybody has a lot of stress in their life

about what they find satisfying,

and you know, know yourself seems to be the proper dictum,

and pursue something that makes your life meaningful

seems proper, and there’s so many avenues on that.

Like, there’s so much unexplored space

at every single level, you know.

I’m somewhat of, my nephew called me a jaded optimist.

And you know, so it’s.

There’s a beautiful tension in that label,

but if you were to look back at your life,

and could relive a moment, a set of moments,

because they were the happiest times of your life,

outside of family, what would that be?

I don’t want to relive any moments.

I like that.

I like that situation where you have some amount of optimism

and then the anxiety of the unknown.

So you love the unknown, the mystery of it.

I don’t know about the mystery.

It sure gets your blood pumping.

What do you think is the meaning of this whole thing?

Of life, on this pale blue dot?

It seems to be what it does.

Like, the universe, for whatever reason,

makes atoms, which makes us, which we do stuff.

And we figure out things, and we explore things, and.

That’s just what it is.

It’s not just.

Yeah, it is.

Jim, I don’t think there’s a better place to end it.

It’s a huge honor, and.

Well, that was super fun.

Thank you so much for talking today.

All right, great.

Thanks for listening to this conversation,

and thank you to our presenting sponsor, Cash App.

Download it, use code LexPodcast.

You’ll get $10, and $10 will go to FIRST,

a STEM education nonprofit that inspires hundreds

of thousands of young minds to become future leaders

and innovators.

If you enjoy this podcast, subscribe on YouTube.

Give it five stars on Apple Podcast.

Follow on Spotify, support it on Patreon,

or simply connect with me on Twitter.

And now, let me leave you with some words of wisdom

from Gordon Moore.

If everything you try works,

you aren’t trying hard enough.

Thank you for listening, and hope to see you next time.
