The following is a conversation with Elon Musk.
He’s the CEO of Tesla, SpaceX, Neuralink, and a cofounder of several other companies.
This conversation is part of the Artificial Intelligence podcast.
The series includes leading researchers in academia and industry, including CEOs
and CTOs of automotive, robotics, AI, and technology companies.
This conversation happened after the release of the paper from our group at MIT
on Driver Functional Vigilance during use of Tesla's Autopilot.
The Tesla team reached out to me offering a podcast conversation with Mr.
Musk.
I accepted, with full control of questions I could ask and the choice
of what is released publicly.
I ended up editing out nothing of substance.
I’ve never spoken with Elon before this conversation, publicly or privately.
Neither he nor his companies have any influence on my opinion, nor on the rigor
and integrity of the scientific method that I practice in my position at MIT.
Tesla has never financially supported my research, and I’ve never owned a Tesla
vehicle, and I’ve never owned Tesla stock.
This podcast is not a scientific paper.
It is a conversation.
I respect Elon as I do all other leaders and engineers I’ve spoken with.
We agree on some things and disagree on others.
My goal with these conversations is always to understand the way
the guest sees the world.
One particular point of disagreement in this conversation was the extent to
which camera-based driver monitoring will improve outcomes, and for how long
it will remain relevant for AI-assisted driving.
As someone who works on and is fascinated by human-centered artificial
intelligence, I believe that if implemented and integrated effectively,
camera-based driver monitoring is likely to be of benefit in both the short
term and the long term.
In contrast, Elon and Tesla's focus is on the improvement of Autopilot such
that its statistical safety benefits override any concern of human behavior
and psychology.
Elon and I may not agree on everything, but I deeply respect the engineering
and innovation behind the efforts that he leads.
My goal here is to catalyze a rigorous, nuanced, and objective discussion in
industry and academia on AI-assisted driving.
One that ultimately makes for a safer and better world.
And now here’s my conversation with Elon Musk.
What was the vision, the dream, of Autopilot in the beginning, at the
big-picture system level, when it was first conceived and the hardware
started being installed in the cars in 2014?
I wouldn't characterize it as a vision or dream. It's simply that there are,
obviously, two massive revolutions in the automobile industry.
One is the transition to electrification and then the other is autonomy.
And it became obvious to me that in the future, any car that does not have
autonomy would be about as useful as a horse, which is not to say that
there’s no use, it’s just rare and somewhat idiosyncratic if somebody
has a horse at this point.
It’s just obvious that cars will drive themselves completely.
It’s just a question of time.
And if we did not participate in the autonomy revolution, then our cars
would not be useful to people relative to cars that are autonomous.
I mean, an autonomous car is arguably worth five to 10 times more than
a car which is not autonomous.
In the long term.
Depends what you mean by long term, but let's say at least for the
next five years, perhaps ten years.
So there are a lot of very interesting design choices with Autopilot early on.
The first is showing, on the instrument cluster, or in the Model 3 on the
center stack display, what the combined sensor suite sees. What was the
thinking behind that choice?
Was there a debate?
What was the process?
The whole point of the display is to provide a health check on the
vehicle’s perception of reality.
So the vehicle’s taking information from a bunch of sensors, primarily
cameras, but also radar and ultrasonics, GPS, and so forth.
And then that information is rendered into vector space, with a bunch of
objects with properties like lane lines and traffic lights and other cars.
And then that vector space is re-rendered onto a display.
So you can confirm whether the car knows what’s going on or not
by looking out the window.
Right.
I think that's an extremely powerful thing for people, to get an understanding,
to sort of become one with the system and understand what
the system is capable of.
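To make that pipeline concrete, here is a minimal sketch in Python of the flow described above: per-sensor detections fused into a vector-space scene of typed objects, then re-rendered for the driver. All names, types, and the trivial fusion step are illustrative assumptions, not Tesla's actual code.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Kind(Enum):
    CAR = auto()
    LANE_LINE = auto()
    TRAFFIC_LIGHT = auto()

@dataclass
class SceneObject:
    kind: Kind
    x_m: float  # longitudinal position in the ego frame, meters
    y_m: float  # lateral position in the ego frame, meters

def fuse(*per_sensor_detections):
    """Merge detections from cameras, radar, ultrasonics, etc. into one
    vector-space scene. A real tracker would associate and filter across
    sensors and time; this sketch just concatenates."""
    scene = []
    for detections in per_sensor_detections:
        scene.extend(detections)
    return scene

def render(scene):
    """Re-render the vector space for the driver display, so the driver
    can check the car's perception against the view out the window."""
    for obj in scene:
        print(f"{obj.kind.name:>13} at ({obj.x_m:+6.1f}, {obj.y_m:+6.1f}) m")

render(fuse([SceneObject(Kind.CAR, 12.0, 0.0)],
            [SceneObject(Kind.LANE_LINE, 0.0, -1.8)]))
```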
Now, have you considered showing more?
So if we look at the computer vision — you know, road segmentation, lane
detection, vehicle detection, object detection — underlying the system, there
is, at the edges, some uncertainty. Have you considered revealing the
uncertainty in the system, the sort of probabilities associated with, say,
image recognition or something like that?
Yeah.
So right now it shows the vehicles in the vicinity as a very clean, crisp image.
And people do confirm that there's a car in front of me, and the system
sees there's a car in front of me. But to help people build an intuition
of what computer vision is — why not show some of the uncertainty?
Well, in my car, I always look at the sort of debug view.
And there are two debug views. One is augmented vision, which I'm sure you've
seen, where we basically draw boxes and labels around objects that are
recognized. And then there's what we call the visualizer, which is basically a
vector-space representation summing up the input from all sensors. It doesn't
show any pictures; it basically shows the car's view of the world in vector space.
But I think this is very difficult for people — normal people — to
understand; they would not know what they're looking at.
So it's almost an HMI challenge: the current things that are being
displayed are optimized for the general public's understanding of
what the system is capable of.
It's like, if you have no idea how computer vision works, or anything,
you can still look at the screen and see if the car knows what's going on.
And then if you're a development engineer, or if you have the development
build like I do, then you can see all the debug information. But that would
just be, like, total gibberish to most people.
What's your view on how to best distribute effort?
So there are three, I would say, technical aspects of Autopilot
that are really important: the underlying algorithms, like the neural network
architecture; the data it's trained on; and the hardware development.
There may be others, but of algorithm, data, hardware — you only have so much
money, only have so much time. What do you think is the most important thing
to allocate resources to? Or do you see it as pretty evenly distributed
between those three?
We automatically get vast amounts of data, because all of our cars have eight
external-facing cameras and radar, and usually 12 ultrasonic sensors, GPS
obviously, and IMU. And so we basically have a fleet of — we've got about
400,000 cars on the road that have that level of data.
I think you keep quite close track of it, actually.
Yes.
Yeah.
So we're approaching half a million cars on the road that have the full sensor
suite.
I'm not sure how many other cars on the road have this sensor suite, but I'd
be surprised if it's more than 5,000, which means that we have 99% of all the data.
So there’s this huge inflow of data.
Absolutely.
Massive inflow of data. And then it's taken us about three years, but we've
now finally developed our full self-driving computer, which can process an
order of magnitude more than the NVIDIA system that we currently have in the
cars. And to use it, you unplug the NVIDIA computer and plug the Tesla
computer in, and that's it.
In fact, we're still exploring the boundaries of its capabilities, but we're
able to run the cameras at full frame rate, full resolution, not even crop
the images, and it still has headroom, even on one of the two systems. The
full self-driving computer is really two computers, two systems on a chip,
that are fully redundant.
So you could put a bolt through basically any part of that system and it
still works.
On the redundancy: are they perfect copies of each other? So it's purely for
redundancy, as opposed to an arguing-machines kind of architecture where
they're both making decisions?
This is purely for redundancy. It's more like a twin-engine commercial
aircraft: the system will operate best if both systems are operating, but
it's capable of operating safely on one.
As it is right now, we haven't even hit the edge of performance, so there's
no need to actually distribute functionality across both SoCs. We can
actually just run a full duplicate on each one.
So you haven't really explored or hit the limit of the system?
Not yet — we haven't hit the limit.
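As a rough illustration of the redundancy being described, here is a minimal sketch, assuming a hypothetical plan() function standing in for the full perception-and-planning stack that runs as a duplicate on each SoC. The failover logic and all names are illustrative assumptions, not Tesla's design.

```python
def plan(sensor_frame):
    """Stand-in for the full perception + planning stack; the same
    software runs as a full duplicate on each system-on-a-chip."""
    return {"steer_rad": 0.01, "accel_mps2": 0.3}

def run_soc(fn, *args):
    """Return the stack's output, or None if that compute path died
    (the 'bolt through any part of the system' case)."""
    try:
        return fn(*args)
    except Exception:
        return None

def redundant_step(sensor_frame):
    out_a = run_soc(plan, sensor_frame)  # SoC A
    out_b = run_soc(plan, sensor_frame)  # SoC B
    if out_a is not None and out_b is not None:
        # Normal case: both SoCs healthy; full duplicates should agree.
        assert out_a == out_b
        return out_a
    # Degraded case: keep driving on whichever output survived.
    return out_a if out_a is not None else out_b
```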
So the magic of deep learning is that it gets better with data.
You said there's a huge inflow of data, but the thing about driving is that
the really valuable data to learn from is the edge cases. I've heard you talk
somewhere about Autopilot disengagements being an important moment of time to
use. Are there other edge cases, or can you speak to those edge cases? What
aspects of them might be valuable, or, if you have other ideas, how do you
discover more and more edge cases in driving?
Well, there are a lot of things that are learned. There are certainly edge
cases where, say, somebody is on Autopilot and they take over. And then, okay,
that's a trigger that goes to our system and says, okay, did they take over
for convenience, or did they take over because the Autopilot wasn't working
properly?
There's also, let's say we're trying to figure out what is the optimal spline
for traversing an intersection. Then the traversals where there are no
interventions are the right ones. So you then say, okay, when it looks like
this, do the following. And then you get the optimal spline for navigating a
complex intersection.
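A minimal sketch of that idea: collect intervention-free traversals of one intersection and fit a smooth spline to their average. The resampling scheme, the smoothing factor, and all names are assumptions for illustration, not Tesla's method.

```python
import numpy as np
from scipy.interpolate import splprep, splev

def optimal_spline(traversals, n=50):
    """traversals: list of (N, 2) x/y paths through one intersection
    where the driver never intervened (the 'no input' cases)."""
    resampled = []
    for path in traversals:
        # Resample every path to a common number of points.
        t = np.linspace(0.0, 1.0, len(path))
        ti = np.linspace(0.0, 1.0, n)
        resampled.append(np.column_stack([np.interp(ti, t, path[:, 0]),
                                          np.interp(ti, t, path[:, 1])]))
    mean_path = np.mean(resampled, axis=0)  # average the good traversals
    # Fit a smooth parametric spline to the averaged path.
    tck, _ = splprep([mean_path[:, 0], mean_path[:, 1]], s=1.0)
    return tck  # evaluate with splev(np.linspace(0, 1, 200), tck)
```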
So there's kind of the common case — you're trying to capture a huge amount
of samples of a particular intersection, when things went right — and then
there's the edge case where, as you said, not for convenience, but because
something didn't go exactly right, somebody took over, somebody asserted
manual control from Autopilot.
And really, the way to look at this is: view all input as error.
If the user had to do input, if they had to do something, all input is error.
That's a powerful line to think of it that way, because it may very well be
error. But if you want to exit the highway, or if you want — it's a
navigation decision that Autopilot is not currently designed to make — then
the driver takes over.
How do you know the difference?
That's going to change with Navigate on Autopilot, which we just
released, and without stalk confirm. So the navigation decisions — making a
control input in order to do a lane change, or exit a freeway, or take a
highway interchange — the vast majority of that will go away with the
release that just went out.
Yeah.
So I don't think people quite understand how big of a step that is.
Yeah, they don’t.
So if you drive the car, then you do.
So you still have to keep your hands on the steering wheel currently when
it does the automatic lane change.
So there are these big leaps through the development of Autopilot, through
its history. What stands out to you as the big leaps?
I would say this one — Navigate on Autopilot without having to confirm — is
a huge leap.
It is a huge leap.
It also automatically overtakes slow cars. So it's both navigation and
seeking the fastest lane. It'll overtake slow cars, and exit the freeway,
and take highway interchanges.
And then we have traffic light recognition, which was introduced initially
as a warning. I mean, on the development version that I'm driving, the car
fully stops and goes at traffic lights.
So those are the steps, right?
You've just mentioned an inkling of a step towards full autonomy.
What would you say are the biggest technological roadblocks
to full self driving?
Actually, I don't think there are big roadblocks. The full self-driving
computer that we just developed — what we call the FSD computer — is now in
production. So if you order any Model S or X, or any Model 3 that has the
full self-driving package, you'll get the FSD computer.
That was important: to have enough base computation. Then it's refining the
neural net and the control software, but all of that can just be provided as
an over-the-air update.
The thing that's really profound — and what I'll be emphasizing at the
investor day that we're having, focused on autonomy — is that the cars
currently being produced, with the hardware currently being produced, are
capable of full self-driving. But capable is an interesting word, because
the hardware is there, and as we refine the software, the capabilities will
increase dramatically, and then the reliability will increase dramatically,
and then it will receive regulatory approval.
So essentially, buying a car today is an investment in the future.
I think the most profound thing is that if you buy a Tesla today, I believe
you are buying an appreciating asset, not a depreciating asset.
So that's a really important statement there, because if the hardware is
capable enough, that's the hard thing to upgrade, usually.
Exactly.
So then the rest is a software problem.
Yes.
Software has no marginal cost really.
But what's your intuition on the software side?
How hard are the remaining steps to get it to where the experience — not
just the safety, but the full experience — is something that people would enjoy?
Well, I think people enjoy it very much on the highways. It's a total game
changer for quality of life, using Tesla Autopilot on the highways. So it's
really just extending that functionality to city streets: adding in traffic
light recognition, navigating complex intersections, and then being able to
navigate complicated parking lots, so the car can exit a parking space and
come find you, even if it's in a complete maze of a parking lot. And then it
can just drop you off and find a parking spot by itself.
Yeah.
In terms of enjoyability, and something that people would actually find a
lot of use from: the parking lot. It's rife with annoyance when you have to
do it manually, so there's a lot of benefit to be gained from automation there.
So let me start injecting the human into this discussion a little bit.
Let's talk about full autonomy.
If you look at the current level four vehicles being tested on the road,
like Waymo and so on, they're only technically autonomous. They're really
level two systems with just a different design philosophy, because there's
always a safety driver in almost all cases, and they're monitoring the system.
Right.
Do you see Tesla's full self-driving as still, for a time to come, requiring
supervision by a human being? So its capabilities are powerful enough to
drive, but it nevertheless requires a human to still be supervising, just
like a safety driver is in other fully autonomous vehicles?
I think it will require detecting hands on the wheel for at least six months
or something like that from here. It really is a question, from a regulatory
standpoint, of how much safer than a person Autopilot needs to be for it to
be okay to not monitor the car. And this is a debate that one can have.
But you need a large sample, a large amount of data, so you can prove with
high confidence, statistically speaking, that the car is dramatically safer
than a person, and that adding in the person monitoring does not materially
affect the safety.
So it might need to be, like, 200 or 300% safer than a person.
And how do you prove that? Incidents per mile?
Incidents per mile. Crashes and fatalities — fatalities would be a factor,
but there are just not enough fatalities to be statistically significant at
scale. But there are enough crashes — there are far more crashes than there
are fatalities. So you can assess what is the probability of a crash; then
there's another step, the probability of injury, the probability of permanent
injury, the probability of death. And all of those need to be much better
than a person, by at least, perhaps, 200%.
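To make the statistical bar concrete, here is a minimal sketch of one textbook way to state it: compare crash rates per mile under a Poisson model and require the confidence interval on the rate ratio to clear a safety multiple. All numbers are hypothetical, and this is an illustrative formulation, not Tesla's actual methodology.

```python
import math

def crash_rate_ratio_ci(crashes_ap, miles_ap, crashes_base, miles_base, z=1.96):
    """95% CI for the crash-rate ratio (baseline / Autopilot) under a
    Poisson model; a lower bound >= 2 would match the '200% safer' bar."""
    rate_ap = crashes_ap / miles_ap
    rate_base = crashes_base / miles_base
    ratio = rate_base / rate_ap
    # Normal approximation on the log rate ratio.
    se = math.sqrt(1.0 / crashes_ap + 1.0 / crashes_base)
    lo = math.exp(math.log(ratio) - z * se)
    hi = math.exp(math.log(ratio) + z * se)
    return ratio, lo, hi

# Hypothetical counts and mileages, for illustration only.
print(crash_rate_ratio_ci(crashes_ap=300, miles_ap=1.5e9,
                          crashes_base=4500, miles_base=9.0e9))
# -> ratio ~2.5, CI roughly (2.2, 2.8): would clear a 2x bar with confidence.
```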
And you think there's the ability to have a healthy discourse with the
regulatory bodies on this topic?
I mean, there's no question that regulators pay a disproportionate amount of
attention to that which generates press. This is just an objective fact. And
Tesla generates a lot of press. In the United States there are, I think,
almost 40,000 automotive deaths per year. But if there are four in a Tesla,
they'll probably receive a thousand times more press than anyone else.
The psychology of that is actually fascinating.
I don’t think we’ll have enough time to talk about that, but I have to talk to
you about the human side of things.
So myself and our team at MIT recently released a paper on functional
vigilance of drivers while using Autopilot. This is work we've been doing
since Autopilot was first released publicly, over three years ago,
collecting video of driver faces and driver body.
So I saw that you tweeted a quote from the abstract, so I can at least
guess that you've glanced at it.
Yeah, I read it.
Can I talk you through what we found?
Sure.
Okay.
So it appears that, in the data we've collected, drivers are maintaining
functional vigilance. We looked at 18,900 disengagements from Autopilot and
annotated: were they able to take over control in a timely manner? So, were
they present, looking at the road, ready to take over control?
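For reference, a minimal sketch of the annotation metric just described: each disengagement is labeled from the driver-facing video and scored for a timely takeover. The 2-second threshold and the field names are assumptions for illustration, not the study's exact protocol.

```python
from dataclasses import dataclass

@dataclass
class Disengagement:
    eyes_on_road: bool       # annotated from the driver-facing video
    reaction_time_s: float   # time from the takeover event to manual control

def timely_fraction(events, max_reaction_s=2.0):
    """Fraction of disengagements where the driver was attentive and took
    over within a threshold (the threshold here is an assumption)."""
    timely = sum(1 for e in events
                 if e.eyes_on_road and e.reaction_time_s <= max_reaction_s)
    return timely / len(events)

# Toy data; the study described above annotated ~18,900 such events.
events = [Disengagement(True, 0.8), Disengagement(True, 1.4),
          Disengagement(False, 3.1)]
print(f"{timely_fraction(events):.2f}")
```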
Okay.
So this goes against what many would predict from the body of literature on
vigilance with automation.
Now, the question is: do you think these results hold across the broader
population? Ours is just a small subset. One of the criticisms is that, you
know, ours may be a small minority of drivers that are highly responsible,
and that the vigilance decrement would increase with Autopilot use across
the broader population.
I think this is all really going to be swept...
I mean, the system is improving so much, so fast, that this is going to be a
moot point very soon. Where vigilance is — like, if something's many times
safer than a person, then adding a person, the effect on safety is limited.
And, in fact, it could be negative.
That’s really interesting.
So the fact that some percent of the population may exhibit a vigilance
decrement will not affect the overall statistics, the numbers on safety?
No. In fact, I think it will become, very, very quickly — maybe even towards
the end of this year, I'd be shocked if it's not next year at the latest —
that having a human intervene will decrease safety.
It's like, imagine if you're in an elevator, and it used to be that there
were elevator operators, and you couldn't go in an elevator by yourself and
work the lever to move between floors. And now nobody wants an elevator
operator, because the automated elevator that stops at the floors is much
safer than the elevator operator. And, in fact, it would be quite dangerous
to have someone with a lever that can move the elevator between floors.
So that's a really powerful statement, and a really interesting one. But I
also have to ask, from a user experience and from a safety perspective: one
of the passions for me, algorithmically, is camera-based detection of just
sensing the human — detecting what the driver is looking at, cognitive load,
body pose. On the computer vision side, that's a fascinating problem. And
many in industry believe you have to have camera-based driver monitoring.
Do you think there could be benefit gained from driver monitoring?
If you have a system that's at or below human-level reliability, then driver
monitoring makes sense. But if your system is dramatically better, more
reliable than a human, then driver monitoring does not help much.
And, like I said, it's as if you're in an elevator: do you really want some
random person with a big lever operating the elevator between floors? I
wouldn't trust that. I'd rather have the buttons.
Okay.
You're optimistic about the pace of improvement of the system — from what
you've seen with the full self-driving computer, the rate of improvement is
exponential?
So one of the other very interesting design choices early on that connects
to this is the operational design domain of Autopilot: where Autopilot is
able to be turned on. By contrast, another vehicle system that we're studying
is the Cadillac Super Cruise system. In terms of ODD, it's very constrained
to particular kinds of highways — well mapped, tested — but it's much
narrower than the ODD of Tesla vehicles. There are pros and...
It’s like ADD.
Yeah.
That’s good.
That's a good line.
What was the design decision in that difference of philosophy? There are
pros and cons: what we see with a wide ODD is that Tesla drivers are able to
explore more of the limitations of the system, at least early on, and,
together with the instrument cluster display, they start to understand what
the capabilities are. So that's a benefit. The con is that you're letting
drivers use it basically anywhere...
Anywhere that it could detect lanes with confidence.
Was there a philosophy there — design decisions that were challenging that
were being made? Or, from the very beginning, was that done on purpose,
with intent?
Well, I mean, frankly, it's pretty crazy letting people drive a two-ton
death machine manually. That's crazy. In the future, people will be like,
"I can't believe anyone was just allowed to drive one of these two-ton death
machines, and they just drove wherever they wanted. Just like elevators —
you'd just move the elevator with that lever, wherever you want; it can stop
halfway between floors if you want." It's pretty crazy. So it's going to
seem like a mad thing in the future that people were driving cars.
So I have a bunch of questions about the human psychology, about behavior
and so on, but those become moot, because you have — not faith, but
confidence that both the hardware and the deep learning approach of learning
from data will make it just far safer than humans.
Yeah, exactly.
Recently, there were a few hackers who tricked Autopilot into acting in
unexpected ways with adversarial examples. We all know that neural network
systems are very sensitive to minor disturbances, these adversarial examples,
on input. Do you think it's possible to defend against something like this
for the broader industry?
Sure.
So can you elaborate on the confidence behind that answer?
Well, you know, a neural net is basically just a bunch of matrix math. You'd
have to be very sophisticated — somebody who really understands neural nets —
and basically reverse-engineer how the matrix is being built, and then create
a little thing that's just exactly what causes the matrix math to be slightly
off. But it's very easy to then block that, by having basically anti-negative
recognition. If the system sees something that looks like a matrix hack,
exclude it. It's such an easy thing to do.
So learn both on the valid data and the invalid data — basically, learn on
the adversarial examples to be able to exclude them?
Yeah.
Yeah, you basically want to both know what is a car and what is definitely
not a car. And you train for "this is a car" and "this is definitely not a
car." Those are two different things.
People have no idea of what neural nets really are. They probably think a
neural net is, like, a fishing net or something.
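Musk doesn't spell out the mechanism here, but one standard technique from the literature for training on both valid and adversarial data is adversarial training: generate perturbed examples and learn on them alongside clean ones. A minimal PyTorch sketch using the fast gradient sign method (FGSM), with the epsilon value and all names as illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def fgsm_examples(model, x, y, eps=0.03):
    """Craft adversarially perturbed inputs (fast gradient sign method)."""
    x_adv = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_adv), y).backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()

def train_step(model, optimizer, x, y):
    """One step of adversarial training: learn 'this is a car' on clean
    inputs and keep the same label on perturbed ones, so hack-like
    perturbations stop fooling the net."""
    x_adv = fgsm_examples(model, x, y)
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```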
So, taking a step beyond just Tesla and Autopilot: current deep learning
approaches still seem, in some ways, to be far from general intelligence
systems. Do you think the current approaches will take us to general
intelligence, or do totally new ideas need to be invented?
I think we're missing a few key ideas for artificial general intelligence.
But it's going to be upon us very quickly, and then we'll need to figure out
what we shall do, if we even have that choice.
But it's amazing how people can't differentiate between, say, the narrow AI
that allows a car to figure out what a lane line is and navigate streets,
versus general intelligence. These are just very different things. Like,
your toaster and your computer are both machines, but one's much more
sophisticated than the other.
You're confident with Tesla you can create the world's best toaster?
The world's best toaster, yes. The world's best self-driving... yes. To me,
right now, this seems game, set, match. I don't... I mean, that sounds... I
don't want to be complacent or overconfident, but that is just literally how
it appears right now. I could be wrong, but it appears to be the case that
Tesla is vastly ahead of everyone.
Do you think we will ever create an AI system that we can love, and that
loves us back in a deep, meaningful way, like in the movie Her?
I think AI will be capable of convincing you to fall in love with it very well.
And that’s different than us humans.
You know, we start getting into a metaphysical question of like, do emotions
and thoughts exist in a different realm than the physical?
And maybe they do.
Maybe they don’t.
I don’t know.
But from a physics standpoint — physics was my main sort of training — from
a physics standpoint, essentially, if it loves you in a way that you can't
tell whether it's real or not, it is real.
That’s a physics view of love.
Yeah.
If you cannot prove that it does not — if there's no test that you can apply
that would allow you to tell the difference — then there is no difference.
Right.
And it's similar to seeing our world as a simulation: there may not be a
test to tell the difference between the real world and the simulation, and
therefore, from a physics perspective, it might as well be the same thing.
Yes.
And there may be ways to test whether it's a simulation. There might be; I'm
not saying there aren't. But you could certainly imagine that a simulation
could correct for that: once an entity in the simulation found a way to
detect the simulation, it could either restart or pause the simulation,
start a new simulation, or do one of many other things that then correct
for that error.
So when, maybe you or somebody else, creates an AGI system, and you get to
ask her one question, what would that question be?
What’s outside the simulation?
Elon, thank you so much for talking today.
It was a pleasure.
All right.
Thank you.