Lex Fridman Podcast - #18 - Elon Musk: Tesla Autopilot

The following is a conversation with Elon Musk.

He’s the CEO of Tesla, SpaceX, Neuralink, and a cofounder of several other companies.

This conversation is part of the Artificial Intelligence podcast.

The series includes leading researchers in academia and industry, including CEOs

and CTOs of automotive, robotics, AI, and technology companies.

This conversation happened after the release of the paper from our group at MIT on driver functional vigilance during use of Tesla’s Autopilot.

The Tesla team reached out to me offering a podcast conversation with Mr.

Musk.

I accepted, with full control of questions I could ask and the choice

of what is released publicly.

I ended up editing out nothing of substance.

I’ve never spoken with Elon before this conversation, publicly or privately.

Neither he nor his companies have any influence on my opinion, nor on the rigor

and integrity of the scientific method that I practice in my position at MIT.

Tesla has never financially supported my research. I’ve never owned a Tesla vehicle, and I’ve never owned Tesla stock.

This podcast is not a scientific paper.

It is a conversation.

I respect Elon as I do all other leaders and engineers I’ve spoken with.

We agree on some things and disagree on others.

My goal with these conversations is always to understand the way the guest sees the world.

One particular point of disagreement in this conversation was the extent to

which camera based driver monitoring will improve outcomes and for how long

it will remain relevant for AI assisted driving.

As someone who works on and is fascinated by human centered artificial

intelligence, I believe that if implemented and integrated effectively,

camera based driver monitoring is likely to be of benefit in both the short

term and the long term.

In contrast, Elon and Tesla’s focus is on the improvement of autopilot such that its statistical safety benefits override any concern of human behavior and psychology.

Elon and I may not agree on everything, but I deeply respect the engineering

and innovation behind the efforts that he leads.

My goal here is to catalyze a rigorous, nuanced, and objective discussion in industry and academia on AI assisted driving.

One that ultimately makes for a safer and better world.

And now here’s my conversation with Elon Musk.

What was the vision, the dream, of autopilot in the beginning? The big picture, system level, when it was first conceived and started being installed in 2014, the hardware in the cars, what was the vision, the dream?

I wouldn’t characterize it as a vision or dream, simply that there are obviously two massive revolutions in the automobile industry.

One is the transition to electrification and then the other is autonomy.

And it became obvious to me that in the future, any car that does not have

autonomy would be about as useful as a horse, which is not to say that

there’s no use, it’s just rare and somewhat idiosyncratic if somebody

has a horse at this point.

It’s just obvious that cars will drive themselves completely.

It’s just a question of time.

And if we did not participate in the autonomy revolution, then our cars

would not be useful to people relative to cars that are autonomous.

I mean, an autonomous car is arguably worth five to 10 times more than

a car which is not autonomous.

In the long term.

Depends on what you mean by long term, but let’s say at least for the next five years, perhaps 10 years.

So there are a lot of very interesting design choices with autopilot early on.

First is showing, on the instrument cluster, or in the Model 3 on the center stack display, what the combined sensor suite sees. What was the thinking behind that choice?

Was there a debate?

What was the process?

The whole point of the display is to provide a health check on the

vehicle’s perception of reality.

So the vehicle’s taking information from a bunch of sensors, primarily

cameras, but also radar and ultrasonics, GPS, and so forth.

And then that information is rendered into vector space, you know, with a bunch of objects with properties like lane lines and traffic lights and other cars.

And then in vector space, that is re-rendered onto a display.

So you can confirm whether the car knows what’s going on or not

by looking out the window.
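As a rough illustration of the pipeline described here, a minimal sketch in Python: perception reduces raw sensor input to a list of typed objects with properties, and the display is just a re-rendering of that list. The class and field names are illustrative assumptions, not Tesla’s internal schema.

```python
# Hypothetical sketch of a "vector space" scene: typed objects with
# properties, produced by sensor fusion and re-rendered for the driver.
from dataclasses import dataclass

@dataclass
class TrackedObject:
    kind: str        # "car", "lane_line", "traffic_light", ...
    x: float         # position in the vehicle frame, meters
    y: float
    heading: float   # radians
    velocity: float  # m/s

def render_to_display(scene: list[TrackedObject]) -> None:
    """Re-render the vector-space scene so the driver can sanity-check
    the car's perception against what they see out the window."""
    for obj in scene:
        print(f"{obj.kind:>13} at ({obj.x:+6.1f}, {obj.y:+6.1f}) m, "
              f"{obj.velocity:4.1f} m/s")

# Fused output from cameras, radar, and ultrasonics would populate this.
render_to_display([
    TrackedObject("car", 18.0, 0.3, 0.0, 24.5),
    TrackedObject("traffic_light", 62.0, -3.1, 0.0, 0.0),
])
```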

Right.

I think that’s an extremely powerful thing for people, to get an understanding, to become one with the system and understand what the system is capable of.

Now, have you considered showing more?

So if we look at the computer vision, you know, like road segmentation, lane detection, vehicle detection, object detection, underlying the system, there is, at the edges, some uncertainty.

Have you considered revealing the uncertainty in the system, the sort of probabilities associated with, say, image recognition or something like that?

Yeah.

So right now it shows, like, the vehicles in the vicinity as a very clean, crisp image.

And people do confirm: there’s a car in front of me, and the system sees there’s a car in front of me. But what about helping people build an intuition of what computer vision is by showing some of the uncertainty?

Well, in my car I always look at, sort of, the debug view.

And there are two debug views.

One is augmented vision, which I’m sure you’ve seen, where we basically draw boxes and labels around objects that are recognized.

And then there’s what we call the visualizer, which is basically a vector space representation, summing up the input from all sensors. It does not show any pictures; it basically shows the car’s view of the world in vector space.

But I think this is very difficult for people, for normal people, to understand. They would not know what they’re looking at.

So it’s almost an HMI challenge: the current things that are being displayed are optimized for the general public’s understanding of what the system is capable of.

It’s like, if you have no idea how computer vision works or anything, you can sort of look at the screen and see if the car knows what’s going on.

And then if you’re a development engineer, or if you have the development build like I do, then you can see all the debug information. But that would just be total gibberish to most people.

What’s your view on how to best distribute effort?

So there are three, I would say, technical aspects of autopilot that are really important: the underlying algorithms, like the neural network architecture; the data that it’s trained on; and the hardware development.

There may be others, but look: algorithm, data, hardware. You only have so much money and only so much time. What do you think is the most important thing to allocate resources to, or do you see it as pretty evenly distributed between those three?

We automatically get vast amounts of data, because all of our cars have eight external facing cameras and radar, and usually 12 ultrasonic sensors, GPS obviously, and IMU.

And so we basically have a fleet that has, we’ve got about 400,000 cars on the road that have that level of data. I think you keep quite close track of it, actually.

Yes.

Yeah.

So we’re approaching half a million cars on the road that have the full sensor suite.

I’m not sure how many other cars on the road have this sensor suite, but I would be surprised if it’s more than 5,000, which means that we have 99% of all the data.

So there’s this huge inflow of data.

Absolutely.

Massive inflow of data. And then it’s taken us about three years, but we’ve now finally developed our full self driving computer, which can process an order of magnitude as much as the Nvidia system that we currently have in the cars. And to use it, you just unplug the Nvidia computer and plug the Tesla computer in, and that’s it.

In fact, we’re still exploring the boundaries of its capabilities, but we’re able to run the cameras at full frame rate, full resolution, not even crop the images, and it’s still got headroom, even on one of the systems. The full self driving computer is really two computers, two systems on a chip, that are fully redundant.

So you could put a bolt through basically any part of that system and it still

works.

The redundancy, are they perfect copies of each other? Is it purely for redundancy, as opposed to an arguing machines kind of architecture where they’re both making decisions?

This is purely for redundancy.

I think it’s more like, if you have a twin engine commercial aircraft, the system will operate best if both engines are operating, but it’s capable of operating safely on one.

But as it is right now, we haven’t even hit the edge of performance, so there’s no need to actually distribute functionality across both SoCs. We can actually just run a full duplicate on each one.
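A minimal sketch of that redundancy scheme, assuming two SoCs that each run the identical full computation and cross-check their outputs; the compute_plan interface and fault model are hypothetical.

```python
# Hypothetical sketch: full duplication across two systems-on-a-chip.
class HardwareFault(Exception):
    """Raised when one system-on-a-chip fails."""

def redundant_plan(soc_a, soc_b, sensor_frame):
    """Run the full driving computation on both SoCs; survive one failure."""
    plans = []
    for soc in (soc_a, soc_b):
        try:
            plans.append(soc.compute_plan(sensor_frame))
        except HardwareFault:
            continue  # "a bolt through any part": the other SoC carries on
    if not plans:
        raise RuntimeError("both compute units failed")
    if len(plans) == 2 and plans[0] != plans[1]:
        # Disagreement between duplicates is itself a fault indicator.
        print("warning: SoC outputs disagree; flag for diagnostics")
    return plans[0]
```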

So you haven’t really explored or hit the limit of this?

Not at the limit yet, no.

So the magic of deep learning is that it gets better with data.

You said there’s a huge inflow of data, but the thing about driving is that the really valuable data to learn from is the edge cases.

I’ve heard you talk somewhere about autopilot disengagements being an important moment of time to use. Are there other edge cases, or can you speak to those edge cases? What aspects of them might be valuable, or if you have other ideas, how do you discover more and more edge cases in driving?

Well, there are a lot of things that are learned.

There are certainly edge cases where, say, somebody is on autopilot and they take over. That’s a trigger that goes to our system that says, okay, did they take over for convenience, or did they take over because the autopilot wasn’t working properly?

There’s also, let’s say we’re trying to figure out what is the optimal spline for traversing an intersection. Then the ones where there are no interventions are the right ones. So you then say, okay, when it looks like this, do the following. And then you get the optimal spline for navigating a complex intersection.
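A minimal sketch of that idea, assuming intervention-free traversals of one intersection have already been extracted and resampled to a common length; the average-then-smooth approach is an illustration, not Tesla’s actual method.

```python
# Hypothetical sketch: fit a smooth reference spline from traversals
# where the driver never intervened (i.e., the "right ones").
import numpy as np
from scipy.interpolate import splprep, splev

def fit_intersection_spline(traversals, num_points=100):
    """traversals: list of (N, 2) arrays of intervention-free paths,
    resampled to a common length N. Returns a (num_points, 2) path."""
    stacked = np.stack(traversals)        # (num_paths, N, 2)
    mean_path = stacked.mean(axis=0)      # average the no-intervention paths
    tck, _ = splprep(mean_path.T, s=1.0)  # smoothing B-spline through them
    u = np.linspace(0.0, 1.0, num_points)
    x, y = splev(u, tck)
    return np.column_stack([x, y])
```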

So there’s kind of the common case, where you’re trying to capture a huge amount of samples of a particular intersection when things went right, and then there’s the edge case where, as you said, not for convenience, but because something didn’t go exactly right, somebody took over, somebody asserted manual control from autopilot.

And really, the way to look at this is: view all input as error. If the user had to do input, if they had to do something, then all input is error.

That’s a powerful line.

That’s a powerful way to think of it, because it may very well be an error, but what if you want to exit the highway, or make a navigation decision that autopilot is not currently designed to do?

Then the driver takes over.

How do you know the difference?

That’s going to change with Navigate on Autopilot, which we just released, and without stalk confirm. So navigation, like having to use a certain control in order to do a lane change, or exit a freeway, or take a highway interchange, the vast majority of that will go away with the release that just went out.

Yeah.

I don’t think people quite understand how big of a step that is.

Yeah, they don’t.

So if you drive the car, then you do.

So you still have to keep your hands on the steering wheel currently when

it does the automatic lane change.

So there are these big leaps through the development of autopilot, through its history. What stands out to you as the big leaps?

I would say this one, Navigate on Autopilot without having to confirm, is a huge leap.

It is a huge leap.

It also automatically overtakes slow cars.

So it’s both navigation and seeking the fastest lane. It’ll, you know, overtake slow cars, exit the freeway, and take highway interchanges.

And then we have traffic light recognition, which is introduced initially as a warning. I mean, on the development version that I’m driving, the car fully stops and goes at traffic lights.

So those are the steps, right?

You’ve just mentioned something that’s sort of an inkling of a step towards full autonomy.

What would you say are the biggest technological roadblocks

to full self driving?

Actually, I don’t think there are. The full self driving computer that we just developed, what we call the FSD computer, is now in production.

So if you order any Model S or X, or any Model 3 that has the full self driving package, you’ll get the FSD computer.

That was important, to have enough base computation. Then it’s refining the neural net and the control software, but all of that can just be provided as an over-the-air update.

The thing that’s really profound, and what I’ll be emphasizing at that investor day that we’re having focused on autonomy, is that the car currently being produced, with the hardware currently being produced, is capable of full self driving. But capable is an interesting word, because the hardware is, and as we refine the software, the capabilities will increase dramatically, and then the reliability will increase dramatically, and then it will receive regulatory approval.

So essentially buying a car today is an investment in the future.

You’re essentially buying… I think the most profound thing is that if you buy a Tesla today, I believe you are buying an appreciating asset, not a depreciating asset.

So that’s a really important statement there, because if the hardware is capable enough, that’s usually the hard thing to upgrade.

Exactly.

So then the rest is a software problem.

Yes.

Software has no marginal cost really.

But what’s your intuition on the software side?

How hard are the remaining steps to get it to where, you know, the experience, not just the safety but the full experience, is something that people would enjoy?

Well, I think people enjoy it very much on the highways.

It’s a total game changer for quality of life, using Tesla autopilot on the highways. So it’s really just extending that functionality to city streets, adding in the traffic light recognition, navigating complex intersections, and then being able to navigate complicated parking lots, so the car can exit a parking space and come find you, even if it’s in a complete maze of a parking lot. And then it can just drop you off and find a parking spot by itself.

Yeah.

In terms of enjoyability, and something that people would actually find a lot of use from: the parking lot. It’s really, you know, rich with annoyance when you have to do it manually, so there’s a lot of benefit to be gained from automation there.

So let me start injecting the human into this discussion a little bit. Let’s talk about full autonomy.

If you look at the current level four vehicles being tested on the road, like Waymo and so on, they’re only technically autonomous. They’re really level two systems with just a different design philosophy, because there’s always a safety driver in almost all cases, and they’re monitoring the system.

Right.

Do you see Tesla’s full self driving as still, for a time to come, requiring supervision by a human being? So its capabilities are powerful enough to drive, but it nevertheless requires the human to still be supervising, just like a safety driver is in other fully autonomous vehicles?

I think it will require detecting hands on the wheel for at least six months or something like that from here. It really is a question, from a regulatory standpoint, of how much safer than a person autopilot needs to be for it to be okay to not monitor the car, and this is a debate that one can have.

But you need, you know, a large sample, a large amount of data, so you can prove with high confidence, statistically speaking, that the car is dramatically safer than a person, and that adding in the person monitoring does not materially affect the safety.

So it might need to be like 200 or 300% safer than a person.

And how do you prove that? Incidents per mile?

Incidents per mile. Crashes and fatalities. Fatalities would be a factor, but there are just not enough fatalities to be statistically significant at scale. But there are enough crashes; there are far more crashes than there are fatalities. So you can assess what is the probability of a crash. Then there’s another step: the probability of injury, the probability of permanent injury, and the probability of death. And all of those need to be much better than a person, by at least, perhaps, 200%.
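As a back-of-the-envelope sketch of that statistical argument, here is a one-sided Poisson test showing how fleet miles and crash counts could translate into confidence that the system clears a 2x-safer bar. The baseline rate and fleet numbers are illustrative assumptions, not Tesla figures.

```python
# Hypothetical sketch: prove "at least 2x safer than a person" from
# crash counts, since fatalities alone are too rare to be significant.
from scipy import stats

HUMAN_CRASH_RATE = 1 / 500_000  # assumed: one crash per 500k miles
TARGET_FACTOR = 2.0             # claim: at least 2x safer than a person

def is_significantly_safer(fleet_miles, fleet_crashes, alpha=0.01):
    """One-sided test: is the observed crash count significantly below
    what the 2x-safer threshold rate would produce?"""
    expected = fleet_miles * HUMAN_CRASH_RATE / TARGET_FACTOR
    # Probability of seeing <= fleet_crashes if the true rate sat exactly
    # at the threshold; a small p-value means safer than the threshold.
    p_value = stats.poisson.cdf(fleet_crashes, expected)
    return p_value < alpha, p_value

# Example: one billion fleet miles with 800 crashes (~1 per 1.25M miles).
ok, p = is_significantly_safer(1_000_000_000, 800)
print(f"significantly more than 2x safer: {ok} (p = {p:.2e})")
```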

And do you think there’s the ability to have a healthy discourse with the regulatory bodies on this topic?

I mean, there’s no question that regulators pay a disproportionate amount of attention to that which generates press. This is just an objective fact. And Tesla generates a lot of press.

In the United States, there are, I think, almost 40,000 automotive deaths per year. But if there are four in a Tesla, they’ll probably receive a thousand times more press than anyone else.

So the psychology of that is actually fascinating.

I don’t think we’ll have enough time to talk about that, but I have to talk to

you about the human side of things.

So myself and our team at MIT recently released the paper on functional

vigilance of drivers while using autopilot.

This is work we’ve been doing since autopilot was first released publicly

over three years ago, collecting video of driver faces and driver body.

So I saw that you tweeted a quote from the abstract, so I can at least guess that you’ve glanced at it.

Yeah, I read it.

Can I talk you through what we found?

Sure.

Okay.

So it appears, in the data that we’ve collected, that drivers are maintaining functional vigilance. We’re looking at 18,900 disengagements from autopilot, and annotating: were they able to take over control in a timely manner?

So they were present, looking at the road, ready to take over control.
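As a toy illustration of that annotation scheme; the fields and timeliness threshold below are assumptions made for the sketch, not the paper’s exact criteria.

```python
# Hypothetical sketch: each disengagement is annotated for whether the
# driver was attentive and took over in time; the headline metric is
# the fraction of timely takeovers.
disengagements = [
    {"id": 1, "eyes_on_road": True,  "takeover_s": 0.8},
    {"id": 2, "eyes_on_road": True,  "takeover_s": 1.4},
    {"id": 3, "eyes_on_road": False, "takeover_s": 3.9},
]
TIMELY_S = 2.0  # assumed threshold, not the paper's actual criterion

timely = [d for d in disengagements
          if d["eyes_on_road"] and d["takeover_s"] <= TIMELY_S]
print(f"functional vigilance: {len(timely) / len(disengagements):.1%}")
```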

Okay.

So this goes against what many would predict from the body of literature on vigilance with automation.

Now, the question is, do you think these results hold across the broader

population?

So ours is just a small subset.

Do you think… one of the criticisms is that, you know, there’s a small minority of drivers that may be highly responsible, where their vigilance decrement would increase with autopilot use?

I think this is all really going to be swept… I mean, the system’s improving so much, so fast, that this is going to be a moot point very soon. Where vigilance is, like, if something’s many times safer than a person, then adding a person, the effect on safety is limited.

And in fact, it could be negative.

That’s really interesting.

So the fact that some percent of the population may exhibit a vigilance decrement will not affect the overall statistics, the numbers, of safety?

No. In fact, I think it will become, very quickly, maybe even towards the end of this year, but I’d be shocked if it’s not next year at the latest, that having a human intervene will decrease safety.

It’s like, imagine if you’re in an elevator, and it used to be that there were elevator operators, and you couldn’t go in an elevator by yourself and work the lever to move between floors. And now nobody wants an elevator operator, because the automated elevator that stops at the floors is much safer than the elevator operator. And in fact, it would be quite dangerous to have someone with a lever that can move the elevator between floors.

So that’s a really powerful statement, and a really interesting one.

But I also have to ask, from a user experience and from a safety perspective: one of the passions for me, algorithmically, is camera based detection of just sensing the human, detecting what the driver is looking at, cognitive load, body pose. On the computer vision side, that’s a fascinating problem.

And there are many in industry who believe you have to have camera based driver monitoring.

Do you think there could be benefit gained from driver monitoring?

If you have a system that’s at or below human level reliability, then driver monitoring makes sense. But if your system is dramatically better, more reliable than a human, then driver monitoring does not help much.

And, like I said, just as you wouldn’t want someone in the elevator: if you’re in an elevator, do you really want someone with a big lever, some random person, operating the elevator between floors? I wouldn’t trust that. I’d rather have the buttons.

Okay.

So you’re optimistic about the pace of improvement of the system, that from what you’ve seen with the full self driving computer, the rate of improvement is exponential.

So one of the other very interesting design choices early on that connects

to this is the operational design domain of autopilot.

So, where autopilot is able to be turned on. In contrast, another vehicle system that we’re studying is the Cadillac Super Cruise system.

That’s, in terms of ODD, very constrained to particular kinds of highways, well mapped, tested, but it’s much narrower than the ODD of Tesla vehicles.

There’s pros and…

It’s like ADD.

Yeah.

That’s good. That’s a good line.

What was the design decision in that different philosophy of thinking? There are pros and cons. What we see with a wide ODD is that Tesla drivers are able to explore the limitations of the system more, at least early on, and together with the instrument cluster display, they start to understand what the capabilities are.

So that’s a benefit.

The con is you’re letting drivers use it basically anywhere… anywhere that it could detect lanes with confidence.

Was there a philosophy there, design decisions that were challenging, that were being made? Or from the very beginning, was that done on purpose, with intent?

Well, I mean, frankly, I think it’s pretty crazy letting people drive a two ton death machine manually. That’s crazy. In the future, people will be like, I can’t believe anyone was just allowed to drive one of these two ton death machines, and they just drove wherever they wanted.

Just like elevators. It was like, move the elevator with that lever, wherever you want. It can stop halfway between floors if you want.

It’s pretty crazy.

So it’s going to seem like a mad thing in the future that people were driving cars.

So I have a bunch of questions about human psychology, about behavior and so on, that would become moot, because you have faith in the AI system, not faith, but confidence, both on the hardware side and in the deep learning approach of learning from data, that it will be just far safer than humans.

Yeah, exactly.

Recently, there were a few hackers who tricked autopilot into acting in unexpected ways with adversarial examples.

We all know that neural network systems are very sensitive to minor disturbances, to these adversarial examples, on input.

Do you think it’s possible to defend against something like this, for the broader industry?

Sure.

So can you elaborate on the confidence behind that answer?

Well, you know, a neural net is just a bunch of basic matrix math. You’d have to be very sophisticated, somebody who really understands neural nets, and basically reverse engineer how the matrices are being built, and then create a little thing that exactly causes the matrix math to be slightly off.

But it’s very easy to then block that, by basically having negative recognition. If the system sees something that looks like a matrix hack, exclude it. It’s such an easy thing to do.

So you learn both on the valid data and the invalid data. Basically, learn on the adversarial examples to be able to exclude them.

Yeah.

You basically want to both know what is a car and what is definitely not a car. And you train for this is a car, and this is definitely not a car. Those are two different things.
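A minimal sketch of that train-on-both idea: craft adversarial perturbations (FGSM here) and map them to an explicit "definitely not" class. This illustrates the concept, not Tesla’s pipeline; the model is assumed to have one extra output class for rejection.

```python
# Hypothetical sketch: adversarial inputs are trained toward a reject
# class ("definitely not a car"), alongside the normal clean labels.
import torch
import torch.nn.functional as F

REJECT_CLASS = 10  # assumed extra label: "looks like a matrix hack"

def fgsm_perturb(model, x, y, eps=0.03):
    """Fast Gradient Sign Method: nudge x to maximize the loss."""
    x = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x), y).backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

def train_step(model, optimizer, x, y):
    # Clean pass learns "this is a car" (and the other real classes);
    # adversarial pass learns "this is definitely not a valid input".
    x_adv = fgsm_perturb(model, x, y)
    y_adv = torch.full_like(y, REJECT_CLASS)
    loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y_adv)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```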

People have no idea about neural nets, really. They probably think a neural net is, you know, like a fishing net.

So, as you know, taking a step beyond just Tesla and autopilot, current deep learning approaches still seem, in some ways, to be far from general intelligence systems.

Do you think the current approaches will take us to general intelligence or do

totally new ideas need to be invented?

I think we’re missing a few key ideas for artificial general intelligence, but it’s going to be upon us very quickly.

And then we’ll need to figure out what shall we do if we even have that choice?

But it’s amazing how people can’t differentiate between, say, the narrow AI that allows a car to figure out what a lane line is and navigate streets, versus general intelligence.

Like these are just very different things.

Like your toaster and your computer are both machines, but one’s much

more sophisticated than another.

You’re confident that with Tesla, you can create the world’s best toaster?

The world’s best toaster, yes. The world’s best self driving? Yes.

To me, right now, this seems game, set, match.

I mean, I don’t want to be complacent or overconfident, but that’s what it appears. That is just literally how it appears right now. I could be wrong, but it appears to be the case that Tesla is vastly ahead of everyone.

Do you think we will ever create an AI system that we can love, and that loves us back in a deep, meaningful way? Like in the movie Her?

I think AI will be capable of convincing you to fall in love with it very well.

And that’s different than us humans.

You know, we start getting into a metaphysical question of like, do emotions

and thoughts exist in a different realm than the physical?

And maybe they do.

Maybe they don’t.

I don’t know.

But from a physics standpoint, I tend to think of things, you know, physics was my main sort of training, and from a physics standpoint, essentially, if it loves you in a way that you can’t tell whether it’s real or not, it is real.

That’s a physics view of love.

Yeah.

If you cannot prove that it does not, if there’s no test that you can apply that would allow you to tell the difference, then there is no difference.

Right.

And it’s similar to seeing our world as a simulation. There may not be a test to tell the difference between the real world and the simulation, and therefore, from a physics perspective, it might as well be the same thing.

Yes.

And there may be ways to test whether it’s a simulation.

There might be. I’m not saying there aren’t. But you could certainly imagine that a simulation could correct for that: once an entity in the simulation found a way to detect the simulation, it could either restart, you know, pause the simulation, start a new simulation, or do one of many other things that then correct for that error.

So when maybe you or somebody else creates an AGI system and you get to ask

her one question, what would that question be?

What’s outside the simulation?

Elon, thank you so much for talking today.

It was a pleasure.

All right.

Thank you.
