The following is a conversation with Andrew Ng,
one of the most impactful educators, researchers, innovators, and leaders
in the artificial intelligence and technology space in general.
He cofounded Coursera and Google Brain,
launched deeplearning.ai, Landing AI, and the AI Fund,
and was the chief scientist at Baidu.
As a Stanford professor and with Coursera and deeplearning.ai,
he has helped educate and inspire millions of students, including me.
This is the Artificial Intelligence Podcast.
If you enjoy it, subscribe on YouTube, give it five stars on Apple Podcast,
support it on Patreon, or simply connect with me on Twitter
at Lex Fridman, spelled F R I D M A N.
As usual, I’ll do one or two minutes of ads now
and never any ads in the middle that can break the flow of the conversation.
I hope that works for you and doesn’t hurt the listening experience.
This show is presented by Cash App, the number one finance app in the App Store.
When you get it, use code LEXPODCAST.
Cash App lets you send money to friends, buy Bitcoin,
and invest in the stock market with as little as $1.
Brokerage services are provided by Cash App Investing,
a subsidiary of Square, and member SIPC.
Since Cash App allows you to buy Bitcoin,
let me mention that cryptocurrency in the context of the history of money is fascinating.
I recommend The Ascent of Money as a great book on this history.
Debits and credits on ledgers started over 30,000 years ago.
The US dollar was created over 200 years ago,
and Bitcoin, the first decentralized cryptocurrency, released just over 10 years ago.
So given that history, cryptocurrency is still very much in its early days of development,
but it’s still aiming to and just might redefine the nature of money.
So again, if you get Cash App from the App Store or Google Play
and use the code LEXPODCAST, you’ll get $10,
and Cash App will also donate $10 to FIRST,
one of my favorite organizations that is helping to advance robotics and STEM education
for young people around the world.
And now, here’s my conversation with Andrew Ng.
The courses you taught on machine learning at Stanford
and later on Coursera that you cofounded have educated and inspired millions of people.
So let me ask you, what people or ideas inspired you
to get into computer science and machine learning when you were young?
When did you first fall in love with the field, is another way to put it.
Growing up in Hong Kong and Singapore, I started learning to code when I was five or six years old.
At that time, I was learning the BASIC programming language,
and I would take these books and they would tell you,
type this program into your computer, so I typed those programs into my computer.
And as a result of all that typing, I would get to play these very simple shoot-'em-up games
that I had implemented on my little computer.
So I thought it was fascinating as a young kid that I could write this code.
I was really just copying code from a book into my computer
to then play these cool little video games.
Another moment for me was when I was a teenager and my father,
who’s a doctor, was reading about expert systems and about neural networks.
So he got me to read some of these books, and I thought it was really cool.
You could write a computer program that started to exhibit intelligence.
Then I remember doing an internship while I was in high school, this was in Singapore,
where I did a lot of photocopying as an office assistant.
And the highlight of my job was when I got to use the shredder.
So the teenage me, I remember thinking, boy, this is a lot of photocopying.
If only we could write software, build a robot, something to automate this,
maybe I could do something else.
So I think a lot of my work since then has centered on the theme of automation.
Even the way I think about machine learning today,
we’re very good at writing learning algorithms that can automate things that people can do.
Or even launching the first MOOCs, Massive Open Online Courses, that later led to Coursera.
I was trying to automate what could be automatable in how I was teaching on campus.
The process of education, trying to automate parts of that
so a single teacher, a single educator, can have more impact.
Yeah, you know, teaching at Stanford,
I was teaching machine learning to about 400 students a year at the time.
And I found myself filming the exact same video every year,
telling the same jokes in the same room.
And I thought, why am I doing this?
Why don’t we just take last year’s video?
And then I can spend my time building a deeper relationship with students.
So that process of thinking through how to do that,
that led to the first MOOCs that we launched.
And then you have more time to write new jokes.
Are there favorite memories from your early days at Stanford,
teaching thousands of people in person and then millions of people online?
You know, teaching online, what not many people know was that a lot of those videos
were shot between the hours of 10 p.m. and 3 a.m.
At the time, we were launching the first MOOCs at Stanford.
We had already announced the course, about 100,000 people signed up.
We just started to write the code and we had not yet actually filmed the videos.
So a lot of pressure, 100,000 people waiting for us to produce the content.
So many Fridays, Saturdays, I would go out, have dinner with my friends,
and then I would think, OK, do you want to go home now?
Or do you want to go to the office to film videos?
And the thought of being able to help 100,000 people potentially learn machine learning,
fortunately, that made me think, OK, I want to go to my office,
go to my tiny little recording studio.
I would adjust my Logitech webcam, adjust my Wacom tablet,
make sure my lapel mic was on,
and then I would start recording often until 2 a.m. or 3 a.m.
I think fortunately, it doesn't show that it was recorded that late at night,
but it was really inspiring the thought that we could create content
to help so many people learn about machine learning.
How did that feel?
The fact that you’re probably somewhat alone,
maybe a couple of friends recording with a Logitech webcam
and kind of going home alone at 1 or 2 a.m.,
and knowing that that’s going to reach sort of thousands of people,
eventually millions of people, what’s that feeling like?
I mean, is there a feeling of just satisfaction of pushing through?
I think it’s humbling.
And I wasn’t thinking about what I was feeling.
I think one thing that I’m proud to say we got right from the early days
was I told my whole team back then that the number one priority
is to do what’s best for learners, do what’s best for students.
And so when I went to the recording studio,
the only thing on my mind was, what can I say,
how can I design my slides,
what do I need to draw, to make these concepts as clear as possible for learners?
I've seen that sometimes for instructors it's tempting to say,
hey, let's talk about my work.
Maybe if I teach you about my research,
someone will cite my papers a couple more times.
And I think one of the things we got right,
launching the first few MOOCs and later building Coursera,
was putting in place that bedrock principle of
let’s just do what’s best for learners and forget about everything else.
And I think that guiding principle
turned out to be really important to the rise of the MOOC movement.
And the kind of learner you imagined in your mind
is as broad as possible, as global as possible.
So really try to reach as many people
interested in machine learning and AI as possible.
I really want to help anyone that had an interest in machine learning
to break into the field.
And I think sometimes I’ve actually had people ask me,
hey, why are you spending so much time explaining gradient descent?
And my answer was, if I look at what I think the learner needs
and would benefit from, I felt that having
a good understanding of the foundations, coming back to the basics,
would put them in better stead to then build a long-term career.
So I try to consistently make decisions on that principle.
So one of the things you actually revealed to the narrow AI community
at the time and to the world is that the number of people
who are actually interested in AI is much larger than we imagined.
By you teaching the class and how popular it became,
it showed that, wow, this isn’t just a small community
of sort of people who go to NeurIPS and it’s much bigger.
It’s developers, it’s people from all over the world.
I mean, I’m Russian, so everybody in Russia is really interested.
There’s a huge number of programmers who are interested in machine learning,
India, China, South America, everywhere.
There’s just millions of people who are interested in machine learning.
So how big do you sense the number of people
who are interested is, from your perspective?
I think the number has grown over time.
I think it’s one of those things that maybe it feels like it came out of nowhere,
but it’s an insight that building it, it took years.
It’s one of those overnight successes that took years to get there.
My first foray into this type of online education
was when we were filming my Stanford class
and sticking the videos on YouTube and some other things.
We had uploaded the whole lectures and so on,
but it was basically the hour-and-fifteen-minute videos that we put on YouTube.
And then we had four or five other versions of websites that I had built,
most of which you would never have heard of
because they reached small audiences,
but that allowed me to iterate,
allowed my team and me to iterate,
to learn what are the ideas that work and what doesn’t.
For example, one of the features I was really excited about
and really proud of was building this website
where multiple people could be logged in at the same time.
So today, if you go to a website,
if you are logged in and then I want to log in,
you need to log out because it’s the same browser, the same computer.
But I thought, well, what if two people say you and me
were watching a video together in front of a computer?
What if a website could have you type your name and password,
have me type my name and password,
and then now the computer knows both of us are watching together
and it gives both of us credit for anything we do as a group.
So I built this feature and rolled it out in a high school in San Francisco.
We had about 20-something users.
Where was this?
Sacred Heart Cathedral Prep, and the teacher was great.
I mean, guess what?
Zero people used this feature.
It turns out people studying online,
they want to watch the videos by themselves.
So you can play back, pause at your own speed rather than in groups.
So that was one example of a tiny lesson learned out of many
that allowed us to hone in on the right set of features.
It sounds like a brilliant feature.
So I guess the lesson to take from that is
there’s something that looks amazing on paper and then nobody uses it.
It doesn’t actually have the impact that you think it might have.
And so, yeah, I saw that you really went through a lot of different features
and a lot of ideas to arrive at Coursera,
the final kind of powerful thing that showed the world
that MOOCs can educate millions.
And I think with the whole machine learning movement as well,
I think it didn’t come out of nowhere.
Instead, what happened was, as more people learned about machine learning,
they would tell their friends, and their friends would see
how it's applicable to their work.
And then the community kept on growing.
And I think we’re still growing.
I don’t know in the future what percentage of all developers
will be AI developers.
I could easily see it being north of 50%, right?
Because the set of AI developers, broadly construed,
not just people doing the machine learning modeling,
but the people building infrastructure, data pipelines,
all the software surrounding the core machine learning model,
may be even bigger.
I feel like today almost every software engineer
has some understanding of the cloud.
Not all, but maybe there's the microcontroller developer
that doesn't need to deal with the cloud.
But I feel like the vast majority of software engineers today
are sort of having an appreciation of the cloud.
I think in the future, maybe we’ll approach nearly 100% of all developers
being in some way an AI developer
or at least having an appreciation of machine learning.
And my hope is that there’s this kind of effect
that there’s people who are not really interested in being a programmer
or being into software engineering, like biologists, chemists,
and physicists, even mechanical engineers,
all these disciplines that are now more and more sitting on large data sets.
And here they didn’t think they’re interested in programming
until they have this data set and they realize
there’s this set of machine learning tools
that allow you to use the data set.
So they actually become, they learn to program
and they become new programmers.
So, not just, as you mentioned,
a larger percentage of developers becoming machine learning people,
but it seems like, more and more, the range of people
who are becoming developers is also growing significantly.
Yeah, I think once upon a time,
only a small part of humanity was literate, could read and write.
And maybe you thought, maybe not everyone needs to learn to read and write.
You just go listen to a few monks read to you and maybe that was enough.
Or maybe you just need a handful of authors to write the bestsellers
and no one else needs to write.
But what we found was that by giving as many people,
in some countries, almost everyone, basic literacy,
it dramatically enhanced human to human communications.
And we can now write for an audience of one,
such as if I send you an email or you send me an email.
I think in computing, we’re still in that phase
where so few people know how to code
that the coders mostly have to code for relatively large audiences.
But if everyone, or most people became developers at some level,
similar to how most people in developed economies are somewhat literate,
I would love to see the owners of a mom and pop store
be able to write a little bit of code to customize the TV display
for their special this week.
And I think it will enhance human to computer communications,
which is becoming more and more important today as well.
So you think it’s possible that machine learning
becomes kind of similar to literacy,
where, like you said, the owners of a mom and pop shop,
basically everybody in all walks of life,
would have some degree of programming capability?
I could see society getting there.
There’s one other interesting thing.
If I go talk to the mom and pop store,
if I talk to a lot of people in their daily professions,
I previously didn’t have a good story for why they should learn to code.
We could give them some reasons.
But what I found with the rise of machine learning and data science is that
I think the number of people with a concrete use for data science
in their daily lives, in their jobs,
may be even larger than the number of people
who have concrete use for software engineering.
For example, if you run a small mom and pop store,
I think if you can analyze the data about your sales, your customers,
I think there’s actually real value there,
maybe even more than traditional software engineering.
So I find that for a lot of my friends in various professions,
be it recruiters or accountants or people that work in the factories,
which I deal with more and more these days,
I feel if they were data scientists at some level,
they could immediately use that in their work.
So I think that data science and machine learning
may be an even easier entree into the developer world
for a lot of people than software engineering.
That's interesting, and I agree with that.
That's beautifully put.
But we live in a world where most courses and talks have slides,
PowerPoint, keynote,
and yet you famously often still use a marker and a whiteboard.
The simplicity of that is compelling,
and for me at least, fun to watch.
So let me ask, why do you like using a marker and whiteboard,
even on the biggest of stages?
I think it depends on the concepts you want to explain.
For mathematical concepts,
it’s nice to build up the equation one piece at a time,
and the whiteboard marker or the pen and stylus
is a very easy way to build up the equation,
to build up a complex concept one piece at a time
while you’re talking about it,
and sometimes that enhances understandability.
The downside of writing is that it’s slow,
and so if you want a long sentence, it’s very hard to write that.
So I think there are pros and cons,
and sometimes I use slides,
and sometimes I use a whiteboard or a stylus.
The slowness of a whiteboard is also its upside,
because it forces you to reduce everything to the basics.
Some of your talks involve the whiteboard.
I mean, you go very slowly,
and you really focus on the most simple principles,
and that’s a beautiful,
that enforces a kind of a minimalism of ideas
that I think is surprising at least for me is great for education.
Like a great talk, I think, is not one that has a lot of content.
A great talk is one that just clearly says a few simple ideas,
and I think the whiteboard somehow enforces that.
Peter Abbeel, who’s now one of the top roboticists
and reinforcement learning experts in the world,
was your first PhD student.
So I bring him up just because I kind of imagine
this must have been an interesting time in your life,
and do you have any favorite memories of working with Pieter,
since he was your first student, in those uncertain times,
especially before deep learning really sort of blew up?
Any favorite memories from those times?
Yeah, I was really fortunate to have had Pieter Abbeel
as my first PhD student,
and I think even my long term professional success
builds on early foundations or early work
that Pieter was so critical to.
So I was really grateful to him for working with me.
What not a lot of people know is just how hard research was,
and still is.
Pieter's PhD thesis was using reinforcement learning
to fly helicopters.
And so, even today, the website heli.stanford.edu
is still up.
You can watch videos of us using reinforcement learning
to make a helicopter fly upside down,
fly loops and rolls, so it's cool.
It’s one of the most incredible robotics videos ever,
so people should watch it.
Oh yeah, thank you.
It’s inspiring.
That’s from like 2008 or seven or six, like that range.
Yeah, something like that.
Yeah, so it was over 10 years old.
That was really inspiring to a lot of people, yeah.
What not many people see is how hard it was.
So Pieter and Adam Coates and Morgan Quigley and I
were working on various versions of the helicopter,
and a lot of things did not work.
For example, it turns out one of the hardest problems we had
was when the helicopter’s flying around upside down,
doing stunts, how do you figure out the position?
How do you localize the helicopter?
So we wanted to try all sorts of things.
Having one GPS unit doesn’t work
because you’re flying upside down,
the GPS unit’s facing down, so you can’t see the satellites.
So we experimented trying to have two GPS units,
one facing up, one facing down.
So if you flip over, that didn’t work
because the downward facing one couldn’t synchronize
if you’re flipping quickly.
Morgan Quigley was exploring this crazy,
complicated configuration of specialized hardware
to interpret GPS signals,
looking at the FPGAs, completely insane.
We spent about a year working on that; it didn't work.
So I remember Pieter, great guy, him and me,
sitting down in my office looking at some of the latest things
we had tried that didn't work and saying,
darn it, what now?
Because we had tried so many things and it just didn't work.
In the end, what we did, and Adam Coates was crucial to this,
was put cameras on the ground and use cameras on the ground
to localize the helicopter.
And that solved the localization problem
so that we could then focus on the reinforcement learning
and inverse reinforcement learning techniques
to actually make the helicopter fly.
And I’m reminded, when I was doing this work at Stanford,
around that time, there was a lot of reinforcement learning
theoretical papers, but not a lot of practical applications.
So the autonomous helicopter work for flying helicopters
was one of the few practical applications
of reinforcement learning at the time,
which caused it to become pretty well known.
I feel like we might have almost come full circle today.
There’s so much buzz, so much hype, so much excitement
about reinforcement learning.
But again, we're hunting for more applications
of all of these great ideas that the community has come up with.
What was the drive sort of in the face of the fact
that most people are doing theoretical work?
What motivates you in the uncertainty and the challenges
to get the helicopter sort of to do the applied work,
to get the actual system to work?
Yeah, in the face of fear, uncertainty, sort of the setbacks
that you mentioned for localization.
I like stuff that works.
In the physical world.
So like, it’s back to the shredder.
You know, I like theory, but when I work on theory myself,
and this is personal taste,
I’m not saying anyone else should do what I do.
But when I work on theory, I personally enjoy it more
if I feel that the work I do will influence people,
have positive impact, or help someone.
I remember when many years ago,
I was speaking with a mathematics professor,
and I kind of just asked, hey, why do you do what you do?
And he had stars in his eyes when he answered.
And this mathematician, not from Stanford,
different university, he said, I do what I do
because it helps me to discover truth and beauty
in the universe.
He had stars in his eyes when he said that.
And I thought, that’s great.
I don’t want to do that.
I think it’s great that someone does that,
fully support the people that do it,
a lot of respect for people that do that.
But I am more motivated when I can see a line
to how the work that my teams and I are doing helps people.
The world needs all sorts of people.
I’m just one type.
I don’t think everyone should do things
the same way as I do.
But when I delve into either theory or practice,
if I personally have conviction that here’s a pathway
to help people, I find that more satisfying
to have that conviction.
That’s your path.
You were a proponent of deep learning
before it gained widespread acceptance.
What did you see in this field that gave you confidence?
What was your thinking process like in that first decade
of the, I don’t know what that’s called, 2000s, the aughts?
Yeah, I can tell you the thing we got wrong
and the thing we got right.
The thing we really got wrong was the importance of,
the early importance of unsupervised learning.
So early days of Google Brain,
we put a lot of effort into unsupervised learning
rather than supervised learning.
And there was this argument,
I think it was around 2005, after NeurIPS,
at that time called NIPS, had ended.
And Jeff Hinton and I were sitting in the cafeteria
outside the conference.
We had lunch, we were just chatting.
And Jeff pulled up this napkin.
He started sketching this argument on a napkin.
It was very compelling; I'll repeat it.
The human brain has about a hundred trillion,
so 10 to the 14, synaptic connections.
You will live for about 10 to the nine seconds.
That’s 30 years.
You actually live for two times 10 to the nine,
maybe three times 10 to the nine seconds,
so let's just say 10 to the nine.
So if each synaptic connection,
each weight in your brain’s neural network
has just a one bit parameter,
that’s 10 to the 14 bits you need to learn
in up to 10 to the nine seconds.
10 to the nine seconds of your life.
So via this simple argument,
which has a lot of problems and is very simplified,
that's 10 to the five bits per second
you need to learn in your life.
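The napkin arithmetic, spelled out as a quick back-of-the-envelope sketch; the numbers are just the rough orders of magnitude from the conversation:

```python
# Rough orders of magnitude from the napkin argument above.
synapses = 1e14          # ~10^14 synaptic connections in the human brain
bits_per_synapse = 1     # assume each weight stores just one bit
lifetime_seconds = 1e9   # ~30 years is on the order of 10^9 seconds

bits_per_second = synapses * bits_per_synapse / lifetime_seconds
print(f"{bits_per_second:.0e} bits per second")  # 1e+05: far more than explicit labels could supply
```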
And I have a one year old daughter.
I am not pointing out 10 to the five bits per second
of labels to her.
And I think I’m a very loving parent,
but I’m just not gonna do that.
So from this very crude, definitely problematic argument,
there’s just no way that most of what we know
is through supervised learning.
But where you get so many bits of information
is from sucking in images, audio,
those experiences in the world.
And so that argument,
and there are a lot of known flaws in this argument
we could go into,
really convinced me that there’s a lot of power
to unsupervised learning.
So that was the part that we actually maybe got wrong.
I still think unsupervised learning is really important,
but in the early days, 10, 15 years ago,
a lot of us thought that was the path forward.
Oh, so you’re saying that that perhaps
was the wrong intuition for the time.
For the time, that was the part we got wrong.
The part we got right was the importance of scale.
So Adam Coates, another wonderful person,
fortunate to have worked with him,
he was in my group at Stanford at the time
and Adam had run these experiments at Stanford
showing that the bigger we train a learning algorithm,
the better its performance.
And it was based on that.
There was a graph that Adam generated,
with the line going up and to the right:
the bigger you make this thing,
the better its performance, and accuracy is the vertical axis.
So it was really based on that chart that Adam generated
that gave me the conviction
that if we could scale these models way bigger
than what we could on a few CPUs,
which is what we had at Stanford,
we could get even better results.
And it was really based on that one figure
that Adam generated
that gave me the conviction to go with Sebastian Thrun
to pitch starting a project at Google,
which became the Google Brain project.
The Brain, you cofounded Google Brain.
And there the intuition was that scale
will bring performance for the system,
so we should chase larger and larger scale.
And I think people don't realize how groundbreaking that was.
It's simple, but it's a groundbreaking idea
that bigger data sets will result in better performance.
It was controversial at the time.
Some of my well meaning friends,
senior people in the machine learning community,
I won't name them, but some of whom we know,
my well meaning friends came
and were trying to give me friendly advice,
like, hey, Andrew, why are you doing this?
This is crazy.
It's all in the neural network architecture.
Look at these architectures we're building.
You just want to go for scale?
Like, this is a bad career move.
So my well meaning friends,
some of them were trying to talk me out of it.
But I find that if you want to make a breakthrough,
you sometimes have to have conviction
and do something before it’s popular,
since that lets you have a bigger impact.
Let me ask you just a small tangent on that topic.
I find myself arguing with people saying that greater scale,
especially in the context of active learning,
so very carefully selecting the data set,
but growing the scale of the data set
is going to lead to even further breakthroughs
in deep learning.
And there's currently pushback against that idea,
that larger data sets are no longer the way forward,
that instead you want to increase the efficiency of learning.
You want to make better learning mechanisms.
And I personally believe that bigger data sets will still,
with the same learning methods we have now,
will result in better performance.
What's your intuition at this time
on these two sides?
Do we need to come up with better architectures for learning
or can we just get bigger, better data sets
that will improve performance?
I think both are important and it’s also problem dependent.
So for a few data sets,
we may be approaching the Bayes error rate,
or approaching or surpassing human level performance,
and then there's that theoretical ceiling
that we will never surpass,
the Bayes error rate.
But then I think there are plenty of problems
where we’re still quite far
from either human level performance
or from Bayes error rate
and bigger data sets with neural networks
without further algorithmic innovation
will be sufficient to take us further.
But on the flip side,
if we look at the recent breakthroughs
using transformer networks or language models,
it was a combination of novel architecture
but also scale had a lot to do with it.
If we look at what happened with GPT-2 and BERT,
I think scale was a large part of the story.
Yeah, what's not often talked about
is the scale of the data set it was trained on
and the quality of the data set,
because there's some,
it was like Reddit threads that had
been upvoted highly.
So there’s already some weak supervision
on a very large data set
that people don’t often talk about, right?
I find that today we have maturing processes
for managing code,
things like Git, right?
Version control.
It took us a long time to evolve good processes.
I remember when my friends and I
were emailing each other C++ files in email,
but then we had,
was it CVS, then Subversion, then Git,
maybe something else in the future.
We're not yet as mature in terms of tools for managing data,
thinking about how to keep data clean,
and how to solve very messy data problems.
I think there’s a lot of innovation there
to be had still.
I love the idea that you were versioning through email.
I’ll give you one example.
When we work with manufacturing companies,
it’s not at all uncommon
for there to be multiple labelers
that disagree with each other, right?
And so, when we do work in visual inspection,
we will take, say, a plastic part
and show it to one inspector
and the inspector, sometimes very opinionated,
they’ll go, clearly, that’s a defect.
This scratch, unacceptable.
Gotta reject this part.
Take the same part to a different inspector,
also very opinionated.
Clearly, the scratch is small.
It’s fine.
Don’t throw it away.
You’re gonna make us, you know.
And then sometimes you take the same plastic part,
show it to the same inspector in the afternoon,
as opposed to in the morning.
In the morning, very opinionated,
they say, clearly, it's okay.
In the afternoon, equally confident,
clearly, this is a defect.
And so what is an AI team supposed to do
if sometimes even one person doesn’t agree
with himself or herself in the span of a day?
So I think these are the types of very practical,
very messy data problems that my teams wrestle with.
In the case of large consumer internet companies
where you have a billion users,
you have a lot of data.
You don’t worry about it.
Just take the average.
It kind of works.
But in the case of other industry settings,
we don't have big data,
just small data, very small data sets,
maybe around 100 defective parts
or 100 examples of a defect.
If you have only 100 examples,
these little labeling errors matter:
if 10 of your 100 labels are wrong,
that's 10% of your data set, and that has a big impact.
So how do you clean this up?
What are you supposed to do?
This is an example of the types of things
that my teams, and this is a Landing AI example,
are wrestling with to deal with small data,
which comes up all the time
once you’re outside consumer internet.
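For the big-data case Andrew mentions, "just take the average" amounts to something like a majority vote across repeated labels. A minimal sketch, with made-up part names, labels, and threshold purely for illustration:

```python
from collections import Counter

# Hypothetical repeated inspections of the same parts (all data made up).
inspections = {
    "part_01": ["defect", "ok", "defect"],
    "part_02": ["ok", "ok", "defect"],
    "part_03": ["defect", "ok"],          # an even split: no consensus
}

for part, labels in inspections.items():
    (label, votes), = Counter(labels).most_common(1)
    if votes / len(labels) > 0.5:         # strict majority wins
        print(part, "->", label)
    else:
        print(part, "-> no consensus, flag for re-inspection")
```

With only 100 examples, though, the point above stands: you may need to resolve the disagreements themselves, for example with clearer labeling instructions, rather than voting them away.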
Yeah, that’s fascinating.
So then you invest more effort and time
in thinking about the actual labeling process.
What are the labels?
How are disagreements resolved,
and all those kinds of pragmatic real world problems?
That’s a fascinating space.
Yeah, I find that actually when I’m teaching at Stanford,
I increasingly encourage students at Stanford
to try to find their own project
for the end of term project,
rather than just downloading someone else’s
nicely clean data set.
It’s actually much harder if you need to go
and define your own problem and find your own data set,
rather than you go to one of the several good websites,
very good websites with clean scoped data sets
that you could just work on.
You’re now running three efforts,
the AI Fund, Landing AI, and deeplearning.ai.
As you’ve said, the AI Fund is involved
in creating new companies from scratch.
Landing AI is involved in helping
already established companies do AI
and deeplearning.ai is for education of everyone else
or of individuals interested in getting into the field
and excelling in it.
So let’s perhaps talk about each of these areas.
First, deeplearning.ai.
The basic question,
how does a person interested in deep learning
get started in the field?
deeplearning.ai is working to create courses
to help people break into AI.
So my machine learning course that I taught through Stanford
is one of the most popular courses on Coursera.
To this day, it's probably one of the courses,
sort of, if I ask somebody,
how did you get into machine learning,
or how did you fall in love with machine learning,
or what got you interested,
it always goes back to Andrew Ng at some point.
I see, yeah, I’m sure.
You’ve influenced, the amount of people
you’ve influenced is ridiculous.
So for that, I'm sure I speak for a lot of people
when I say a big thank you.
No, yeah, thank you.
I was once reading a news article,
I think it was Tech Review,
and I'm gonna mess up the statistic,
but I remember reading an article that said
something like one third of all programmers are self taught.
I may have the number wrong,
maybe it was one third, maybe two thirds,
but when I read that article,
I thought this doesn’t make sense.
Everyone is self taught.
Because you teach yourself.
I don’t teach people.
That’s well put.
Yeah, so how does one get started in deep learning
and where does deeplearning.ai fit into that?
So the deep learning specialization offered by deeplearning.ai
was, I think, Coursera's top specialization.
It might still be.
So it’s a very popular way for people
to take that specialization
to learn about everything from neural networks,
to how to tune a neural network,
to what is a ConvNet, to what is an RNN
or a sequence model or what is an attention model.
And so the deep learning specialization
steps everyone through those algorithms
so you deeply understand it
and can implement it and use it for whatever application.
From the very beginning.
So what would you say are the prerequisites
for somebody to take the deep learning specialization
in terms of maybe math or programming background?
Yeah, you need to understand basic programming,
since there are programming exercises in Python
and the math prereq is quite basic.
So no calculus is needed.
If you know calculus, that's great, you get better intuitions,
but we deliberately tried to teach that specialization
without requiring calculus.
So I think high school math would be sufficient.
If you know how to multiply two matrices,
I think that’s great.
So a little basic linear algebra is great.
Basic linear algebra,
even very, very basic linear algebra in some programming.
I think that people that have done the machine learning course
will find the deep learning specialization a bit easier,
but it's also possible to jump
into the deep learning specialization directly,
but it will be a little bit harder,
since we tend to go faster over concepts
like how gradient descent works
and what an objective function is,
which are covered more slowly in the machine learning course.
Could you briefly mention some of the key concepts
in deep learning that students should learn
that you envision them learning in the first few months
in the first year or so?
So if you take the deep learning specialization,
you learn the foundations of what is a neural network.
How do you build up a neural network
from a single logistic unit to a stack of layers
to different activation functions.
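As a rough illustration of that progression, here is a minimal numpy sketch; the layer sizes and activations are arbitrary choices for the example, not the specialization's actual code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A single logistic unit: a weighted sum of inputs through a sigmoid.
def logistic_unit(x, w, b):
    return sigmoid(np.dot(w, x) + b)

# A stack of layers: each layer is many such units applied together,
# with ReLU activations on hidden layers and a sigmoid on the output.
def forward(x, layers):
    a = x
    for i, (W, b) in enumerate(layers):
        z = W @ a + b
        a = sigmoid(z) if i == len(layers) - 1 else np.maximum(0.0, z)
    return a

rng = np.random.default_rng(0)
x = rng.normal(size=3)                              # 3 input features
layers = [(rng.normal(size=(4, 3)), np.zeros(4)),   # hidden layer: 4 units
          (rng.normal(size=(1, 4)), np.zeros(1))]   # output layer: 1 unit
print(forward(x, layers))                           # a probability-like output
```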
You learn how to train the neural networks.
One thing I’m very proud of in that specialization
is we go through a lot of practical knowhow
of how to actually make these things work.
So what are the differences between different optimization algorithms?
What do you do if the algorithm overfits
or how do you tell if the algorithm is overfitting?
When do you collect more data?
When should you not bother to collect more data?
I find that even today, unfortunately,
there are engineers that will spend six months
trying to pursue a particular direction,
such as collecting more data,
because they heard more data is valuable,
but sometimes you could run some tests
and could have figured out six months earlier
that for this particular problem, collecting more data isn't going to cut it.
So just don’t spend six months collecting more data.
Spend your time modifying the architecture or trying something else.
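One simple version of the kind of test he's describing is a learning curve: train on increasing amounts of data and watch whether validation performance is still improving. A hedged sketch using scikit-learn on synthetic data; your own data and model would go in its place:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a real problem (flip_y adds label noise).
X, y = make_classification(n_samples=2000, n_features=20, flip_y=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

for n in [100, 300, 1000, len(X_train)]:
    model = LogisticRegression(max_iter=1000).fit(X_train[:n], y_train[:n])
    print(n, "train acc:", round(model.score(X_train[:n], y_train[:n]), 3),
          "val acc:", round(model.score(X_val, y_val), 3))
# If training and validation accuracy have already converged and flattened,
# more data likely won't help; change the model or features instead.
```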
So we go through a lot of the practical knowhow,
so that when you take the deep learning specialization,
you have those skills to be very efficient
in how you build these networks.
So dive right in to play with the network, to train it,
to do the inference on a particular data set,
to build intuition about it without building it up too big
to where you spend, like you said, six months
learning, building up your big project
without building any intuition of a small aspect of the data
that could already tell you everything you need to know about that data.
Yes, and also the systematic frameworks of thinking
for how to go about building practical machine learning.
Maybe to make an analogy, when we learn to code,
we have to learn the syntax of some programming language, right?
Be it Python or C++ or Octave or whatever.
But the equally important or maybe even more important part of coding
is to understand how to string together these lines of code
into coherent things.
So when should you put something in a function?
When should you not?
How do you think about abstraction?
So those frameworks are what makes a programmer efficient
even more than understanding the syntax.
I remember when I was an undergrad at Carnegie Mellon,
one of my friends would debug their code
by first trying to compile it, and this was C++ code.
And then, for every line with a syntax error,
they wanted to get rid of the syntax errors as quickly as possible.
So how do you do that?
Well, they would delete every single line of code with a syntax error.
So, really efficient for getting rid of syntax errors,
but a horrible debugging strategy.
So I think we learn how to debug.
And I think in machine learning,
the way you debug a machine learning program
is very different than the way you do binary search or whatever,
or use a debugger, trace through the code
in traditional software engineering.
So it’s an evolving discipline,
but I find that the people that are really good
at debugging machine learning algorithms
are easily 10x, maybe 100x faster at getting something to work.
And the basic process of debugging is,
so the bug in this case,
why isn’t this thing learning, improving,
sort of going into the questions of overfitting
and all those kinds of things?
That’s the logical space that the debugging is happening in
with neural networks.
Yeah, often the question is, why doesn’t it work yet?
Or can I expect it to eventually work?
And what are the things I could try?
Change the architecture, more data, more regularization,
different optimization algorithm,
different types of data.
So to answer those questions systematically,
so that you don't spend six months heading down a blind alley
before someone comes and says,
why did you spend six months doing this?
What concepts in deep learning
do you think students struggle the most with?
Or sort of, what is the biggest challenge for them
to get over, that hill where
it hooks them and it inspires them and they really get it?
Similar to learning mathematics,
I think one of the challenges of deep learning
is that there are a lot of concepts
that build on top of each other.
If you ask me what’s hard about mathematics,
I have a hard time pinpointing one thing.
Is it addition, subtraction?
Is it a carry?
Is it multiplication?
There’s just a lot of stuff.
I think one of the challenges of learning math
and of learning certain technical fields
is that there are a lot of concepts
and if you miss a concept,
then you’re kind of missing the prerequisite
for something that comes later.
So in the deep learning specialization,
we try to break down the concepts
to maximize the odds of each component being understandable.
So when you move on to the more advanced things,
like when we learn ConvNets,
hopefully you have enough intuitions
from the earlier sections
to then understand why we structure ConvNets
in a certain way,
and then eventually why we build RNNs and LSTMs
or attention models in a certain way,
building on top of the earlier concepts.
Actually, I’m curious,
you do a lot of teaching as well.
Do you have a favorite,
this is the hard concept moment in your teaching?
Well, I don’t think anyone’s ever turned the interview on me.
I'm glad you're the first.
I think that’s a really good question.
Yeah, it’s really hard to capture the moment
when they struggle.
I think you put it really eloquently.
I do think there’s moments
that are like aha moments
that really inspire people.
I think for some reason,
reinforcement learning,
especially deep reinforcement learning
is a really great way
to really inspire people
and get what the use of neural networks can do.
Even though neural networks
really are just a part of the deep RL framework,
but it’s a really nice way
to paint the entirety of the picture
of a neural network
being able to learn from scratch,
knowing nothing and explore the world
and pick up lessons.
I find that a lot of the aha moments
happen when you use deep RL
to teach people about neural networks,
which is counterintuitive.
I find like a lot of that inspired sort of fire
in people's passion,
in people's eyes,
comes from the RL world.
Do you find reinforcement learning
to be a useful part
of the teaching process or no?
I still teach reinforcement learning
in one of my Stanford classes
and my PhD thesis was on reinforcement learning.
So I clearly love the field.
I find that if I’m trying to teach
students the most useful techniques
for them to use today,
I end up shrinking the amount of time
I talk about reinforcement learning.
It’s not what’s working today.
Now, our world changes so fast.
Maybe this will be totally different
in a couple of years.
But I think we need a couple more things
for reinforcement learning to get there.
One of my teams is looking
at reinforcement learning
for some robotic control tasks.
So I see the applications,
but if you look at it as a percentage
of all of the impact
of the types of things we do,
it's, at least today, outside of
playing video games, right,
and a few other games, quite limited in scope.
Actually, at NeurIPS,
a bunch of us were standing around
saying, hey, what’s your best example
of an actual deploy reinforcement
learning application?
And among like
senior machine learning researchers, right?
And again, there are some emerging ones,
but there are not that many great examples.
I think you’re absolutely right.
The sad thing is there hasn’t been
a big impactful real world application
of reinforcement learning.
I think its biggest impact to me
has been in the toy domain,
in the game domain,
in the small example.
That's what I mean, for educational purposes.
It seems to be a fun thing to explore
neural networks with.
But I think from your perspective,
and I think that might be
the best perspective is
if you’re trying to educate
with a simple example
in order to illustrate
how this can actually be grown
to scale and have a real world impact,
then perhaps focusing on the fundamentals
of supervised learning
in the context of a simple data set,
even like an MNIST data set
is the right way,
is the right path to take.
The amount of fun I’ve seen people
have with reinforcement learning
has been great,
but not in the applied impact
in the real world setting.
So it’s a trade off,
how much impact you want to have
versus how much fun you want to have.
Yeah, that’s really cool.
And I feel like the world
actually needs all sorts.
Even within machine learning,
I feel like deep learning
is so exciting,
but an AI team
shouldn't just use deep learning.
I find that my teams
use a portfolio of tools.
And maybe that’s not the exciting thing
to say, but some days
we use a neural net,
some days we use a PCA.
Actually, the other day,
I was sitting down with my team
looking at PCA residuals,
trying to figure out what’s going on
with PCA applied
to a manufacturing problem.
And some days we use
a probabilistic graphical model,
some days we use a knowledge graph,
which is one of the things
that has tremendous industry impact.
But the amount of chatter
about knowledge graphs in academia
is really thin compared
to the actual real world impact.
So I think reinforcement learning
should be in that portfolio.
And then it’s about balancing
how much we teach all of these things.
And the world should have
diverse skills.
It’d be sad if everyone
just learned one narrow thing.
Yeah, diverse skills
help you discover the right tool
for the job.
What is the most beautiful,
surprising or inspiring idea
in deep learning to you?
Something that captivated
your imagination.
Is it the scale,
the performance that can be
achieved with scale?
Or is there other ideas?
I think that if my only job
was being an academic researcher,
and had an unlimited budget
and didn't have to worry
about short term impact
and only focus on long term impact,
I’d probably spend all my time
doing research on unsupervised learning.
I still think unsupervised learning
is a beautiful idea.
At both this past NeurIPS and ICML,
I was attending workshops
or listening to various talks
about self supervised learning,
which is one vertical segment
maybe of unsupervised learning
that I’m excited about.
Maybe just to summarize the idea,
I guess you know the idea,
or should I describe it briefly?
No, please.
So here’s the example
of self supervised learning.
Let’s say we grab a lot
of unlabeled images off the internet.
So with infinite amounts
of this type of data,
I’m going to take each image
and rotate it by a random
multiple of 90 degrees.
And then I’m going to train
a supervised neural network
to predict what was
the original orientation.
So was it rotated 90 degrees,
180 degrees, 270 degrees,
or zero degrees?
So you can generate
an infinite amount of labeled data,
because you rotated the image,
so you know what the
ground truth label is.
And so various researchers
have found that by taking
unlabeled data and making
up labeled data sets
and training a large neural network
on these tasks,
you can then take the hidden
layer representation and transfer
it to a different task
very powerfully.
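A minimal PyTorch sketch of that rotation pretext task; the tiny network and the random tensors standing in for unlabeled images are placeholders, not any particular group's setup:

```python
import torch
import torch.nn as nn

# Make up the labels: rotate each unlabeled image by a random multiple
# of 90 degrees; the multiple itself is the free ground-truth label.
def make_rotation_batch(images):                 # images: (N, C, H, W)
    k = torch.randint(0, 4, (images.shape[0],))  # 0, 90, 180, or 270 degrees
    rotated = torch.stack([torch.rot90(img, int(ki), dims=(1, 2))
                           for img, ki in zip(images, k)])
    return rotated, k

# A small stand-in network; its hidden-layer representation is what
# you would transfer to the downstream task.
net = nn.Sequential(nn.Flatten(), nn.Linear(1 * 32 * 32, 128), nn.ReLU(),
                    nn.Linear(128, 4))           # 4-way rotation classifier
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

unlabeled = torch.rand(64, 1, 32, 32)            # pretend unlabeled images
x, y = make_rotation_batch(unlabeled)
loss = loss_fn(net(x), y)
opt.zero_grad(); loss.backward(); opt.step()     # one training step
print(loss.item())
```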
Learning word embeddings,
where we take a sentence,
delete a word,
and predict the missing word,
is another example;
that's one of the ways
we learn word embeddings.
And I think there’s now
this portfolio of techniques
for generating these made up tasks.
Another one, called jigsaw,
would be if you take an image,
cut it up into a three by three grid,
so like a nine-piece,
three by three puzzle,
jumble up the nine pieces,
and have a neural network predict
which of the nine factorial
possible permutations
it came from.
So many groups,
including OpenAI,
Pieter Abbeel has been doing
some work on this too,
Facebook, Google Brain,
I think DeepMind,
oh actually,
Aaron van den Oord
has great work on the CPC objective.
So many teams are doing exciting work
and I think this is a way
to generate infinite label data
and I find this a very exciting
piece of unsupervised learning.
So long term you think
that’s going to unlock
a lot of power
in machine learning systems
is this kind of unsupervised learning.
I don't think it's
the whole enchilada,
I think it's just a piece of it,
and I think this one piece
unsupervised,
self supervised learning
is starting to get traction.
We’re very close
to it being useful.
Well, word embeddings
are really useful.
I think we’re getting
closer and closer
to just having a significant
real world impact
maybe in computer vision and video
but I think this concept
and I think there’ll be
other concepts around it.
You know, there are other unsupervised
learning things that I've worked on
and been excited about.
I was really excited
about sparse coding
and ICA,
slow feature analysis.
I think all of these are ideas
that various of us
were working on
about a decade ago
before we all got distracted
by how well supervised
learning was doing.
So would we return to the fundamentals
of representation learning
that really started
this movement of deep learning?
I think there’s a lot more work
that one could explore around
this theme of ideas
and other ideas
to come up with better algorithms.
So if we could return
to maybe talk quickly
about the specifics
of deep learning.ai
the deep learning specialization
perhaps how long does it take
to complete the course
would you say?
The official length
of the deep learning specialization
is I think 16 weeks
so about four months
but you can go at your own pace.
So if you subscribe
to the deep learning specialization
there are people that finished it
in less than a month
by working more intensely
and studying more intensely
so it really depends on
on the individual.
When we created
the deep learning specialization
we wanted to make it
very accessible
and very affordable.
And with, you know,
Coursera and deeplearning.ai's
education mission,
one of the things
that’s really important to me
is that if there’s someone
for whom paying anything
is a financial hardship
then just apply for financial aid
and get it for free.
If you were to recommend
a daily schedule for people
in learning whether it’s
through the deep learning.ai
specialization or just learning
in the world of deep learning
what would you recommend?
How do they go about day to day
sort of specific advice
about learning
about their journey in the world
of deep learning machine learning?
I think getting the habit of learning
is key and that means regularity.
So for example
we send out a weekly newsletter,
The Batch, every Wednesday,
so people know it's coming on Wednesday.
You can spend a little bit of time
on Wednesday
catching up on the latest news
through The Batch,
and for myself
I’ve picked up a habit of spending
some time every Saturday
and every Sunday reading or studying
and so I don’t wake up on the Saturday
and have to make a decision
do I feel like reading
or studying today or not
it’s just what I do
and the fact that it's a habit
makes it easier.
So I think if someone can get into that habit
it’s like you know
just like we brush our teeth every morning
I don’t think about it
if I thought about it
it’s a little bit annoying
to have to spend two minutes doing that
but it's a habit, so it takes
no cognitive load,
but this would be so much harder
if we have to make a decision every morning
and actually that’s the reason
why I wear the same thing every day as well
it’s just one less decision
I just get up and wear my blue shirt
so but I think if you can get that habit
that consistency of studying
then it actually feels easier.
So yeah it’s kind of amazing
in my own life
like I play guitar every day,
I force myself to play guitar
at least five minutes a day.
It's a ridiculously short period of time,
but because I’ve gotten into that habit
it’s incredible what you can accomplish
in a period of a year or two years
you can become
you know exceptionally good
at certain aspects of a thing
by just doing it every day
for a very short period of time
it’s kind of a miracle
that that’s how it works
it adds up over time.
Yeah, and I think it's often
not about the bursts of effort
and the all nighters,
because you could only do that
a limited number of times
it’s the sustained effort over a long time
I think you know reading two research papers
is a nice thing to do
but the power is not reading two research papers
it’s reading two research papers a week
for a year
then you read a hundred papers
and you actually learn a lot
when you read a hundred papers.
So regularity and making learning a habit
do you have other general study tips,
particularly for deep learning,
that people should follow
in their process of learning?
Are there some kinds of recommendations
or tips you have as they learn?
One thing I still do
when I’m trying to study something really deeply
is take handwritten notes
it varies
I know there are a lot of people
that take the deep learning courses
during a commute or something
where it may be more awkward to take notes
so I know it may not work for everyone
but when I’m taking courses on Coursera
and I still take some every now and then
the most recent one I took
was a course on clinical trials
because I was interested about that
I got out my little Moleskine notebook,
and sitting at my desk,
I was just taking down notes
on what the instructor was saying,
and we know that
the act of taking notes,
preferably handwritten notes
increases retention.
So as you’re sort of watching the video
just kind of pausing maybe
and then taking the basic insights down on paper.
Yeah so there have been a few studies
if you search online
you find some of these studies
that taking handwritten notes
because handwriting is slower
as we’re saying just now
it causes you to recode the knowledge
in your own words more
and that process of recoding
promotes long term retention
this is as opposed to typing
which is fine
again typing is better than nothing
and taking a class
and not taking notes is better
than not taking any class at all
but comparing handwritten notes
and typing
for a lot of people,
you can usually type faster
than you can handwrite notes,
and so when people type
they’re more likely to just transcribe
verbatim what they heard
and that reduces the amount of recoding
and that actually results
in less long term retention.
I don't know what the psychological effect
there is, but it rings so true.
There's something fundamentally different
about writing by hand.
I wonder what that is
I wonder if it is as simple
as just the fact that writing is slower.
yeah and because you can’t write
as many words
you have to take whatever they said
and summarize it into fewer words
and that summarization process
requires deeper processing of the meaning
which then results in better retention
that’s fascinating
oh and I think because of Coursera
I spent so much time studying pedagogy
this is actually one of my passions
I really love learning
how to more efficiently
help others learn
you know one of the things I do
both when creating videos
or when we write the batch is
I try to think, is one minute spent with us
going to be a more efficient learning experience
than one minute spent anywhere else
and we really try to you know
make it time efficient for the learners
because you know everyone’s busy
so when when we’re editing
I often tell my teams
every word needs to fight for its life
and if you can delete a word
let's just delete it,
let's not waste the learners' time.
Oh, it's so amazing
that you think that way,
because there are millions of people
that are impacted by your teaching,
and sort of that one minute spent
has a ripple effect right
through years of time
which is it’s just fascinating to think about
how does one make a career
out of an interest in deep learning
do you have advice for people
we just talked about
sort of the beginning early steps
but if you want to make it
an entire life’s journey
or at least a journey of a decade or two
how do you do it?
So the most important thing is to get started,
right? And I think in the early parts
of a career, coursework,
like the deep learning specialization,
is a very efficient way
to master this material,
because, you know, instructors,
be it me or someone else,
or, you know, Laurence Moroney, who
teaches our TensorFlow specialization,
or other things we're working on,
spend effort to try to make it time efficient
for you to learn a new concept.
So coursework is actually a very efficient way
for people to learn concepts
in the beginning parts of breaking
into a new field.
in fact one thing I see at Stanford
some of my PhD students want to jump
into research right away,
and I actually tend to say, look,
in your first couple of years of PhD,
spend time taking courses,
because it lays a foundation
it’s fine if you’re less productive
in your first couple years
you’ll be better off in the long term
Beyond a certain point,
there's material that doesn't exist in courses
because it's too cutting edge,
the course hasn't been created yet.
There's some practical experience
that we're not yet that good
at teaching in a course.
and I think after exhausting
the efficient coursework
then most people need to go on
to either ideally work on projects
and then maybe also continue their learning
by reading blog posts and research papers
and things like that
doing projects is really important
and again I think it’s important
to start small and just do something
Today you read about deep learning
and it feels like, oh, all these people are
doing such exciting things,
what if I'm not building a neural network
that changes the world,
then what's the point?
Well the point is sometimes building
that tiny neural network
you know be it MNIST or upgrade
to a fashion MNIST to whatever
so doing your own fun hobby project
that’s how you gain the skills
to let you do bigger and bigger projects
I find this to be true at the individual level
and also at the organizational level
for a company to become good at machine learning
sometimes the right thing to do
is not to tackle the giant project
but instead to do the small project
that lets the organization learn
and then build out from there
but this is true both for individuals
and for companies
taking the first step
and then taking small steps is the key
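To make the start-small advice concrete, here is a minimal sketch of the kind of tiny hobby project Andrew describes: a small MNIST classifier in Keras. The code is illustrative only, not something from the conversation, and the architecture and hyperparameters are arbitrary choices.

```python
# A tiny "start small" hobby project: a minimal MNIST classifier.
# Illustrative sketch only; architecture and hyperparameters are arbitrary.
import tensorflow as tf

# Load MNIST; swap in tf.keras.datasets.fashion_mnist to "upgrade" to Fashion-MNIST.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

# A deliberately small network: flatten, one hidden layer, softmax output.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=3, validation_split=0.1)
print(model.evaluate(x_test, y_test, verbose=0))  # [test loss, test accuracy]
```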
should students pursue a PhD
do you think? because you can do so much
that’s one of the fascinating things
in machine learning
you can have so much impact
without ever getting a PhD
so what are your thoughts
should people go to grad school
should people get a PhD?
I think that there are multiple good options
of which doing a PhD could be one of them
I think that if someone’s admitted
to a top PhD program
you know at MIT, Stanford, top schools
I think that’s a very good experience
or if someone gets a job
at a top organization
at the top AI team
I think that’s also a very good experience
there are some things you still need a PhD to do
if someone’s aspiration is to be a professor
you know at the top academic university
you just need a PhD to do that
but if the goal is to you know
start a company, build a company
do great technical work
I think a PhD is a good experience
but I would look at the different options
available to someone
you know where are the places
where you can get a job
and where can you get into a PhD program
and kind of weigh the pros and cons of those
So just to linger on that for a little bit longer
what dreams and goals
do you think people should have
and what options should they explore
so you can work in industry
for a large company
like Google, Facebook, Baidu
all these large sort of companies
that already have huge teams
of machine learning engineers
you can also go within industry
to more research-oriented groups
like Google Research, Google Brain
then you can also be
like we said a professor in academia
and what else
oh you can build your own company
you can do a startup
is there anything that stands out
between those options
or are they all beautiful different journeys
that people should consider
I think the thing that affects your experience more
is less whether you’re in this company
versus that company
or academia versus industry
I think the thing that affects your experience most
is who are the people you’re interacting with
on a daily basis
so even if you look at some of the large companies
the experience of individuals
in different teams is very different
and what matters most is not the logo above the door
when you walk into the giant building every day
what matters the most is who are the 10 people
who are the 30 people you interact with every day
so I actually tend to advise people
if you get a job from a company
ask who is your manager
who are your peers
who are you actually going to talk to
we’re all social creatures
we tend to become more like the people around us
and if you’re working with great people
you will learn faster
or if you get admitted
if you get a job at a great company
or a great university
maybe the logo you walk in under is great
but you’re actually stuck on some team
doing work that doesn’t excite you
and then that’s actually a really bad experience
so this is true both for universities
and for large companies
for small companies you can kind of figure out
who you’ll be working with quite quickly
and I tend to advise people
if a company refuses to tell you
who you will work with
if someone says oh join us
the rotation system will figure it out
I think that’s a worrying answer
because it means you may not actually
get sent to a team
with great peers and great people to work with
that’s actually really profound advice
that we sometimes sweep aside
and don’t consider too rigorously or carefully
the people around you really matter
especially when you accomplish great things
it seems the great things are accomplished
because of the people around you
so it’s not about
whether you learn this thing or that thing
or like you said
the logo that hangs up top
it’s the people, and that’s fascinating
and it’s such a hard search process
just like finding the right friends
or somebody to get married to
and that kind of thing
it’s a very hard search
it’s a people search problem
yeah but I think when someone interviews
you know at a university
or the research lab or the large corporation
it’s good to insist on just asking
who are the people
who is my manager
and if you refuse to tell me
I’m gonna think well maybe that’s
because you don’t have a good answer
it may not be someone I like
and if you don’t particularly connect
if something feels off with the people
then don’t stick to it
you know that’s a really important signal to consider
yeah, and actually
in my Stanford class CS230
as well as in an ACM talk
I gave like an hour long talk
on career advice
including on the job search process
and some of these things
so you can find those videos online
awesome, I’ll point people to them
beautiful
so the AI Fund helps AI startups
get off the ground
or perhaps you can elaborate
on all the fun things it’s involved with
what’s your advice
and how does one build a successful AI startup
you know in Silicon Valley
a lot of startup failures
come from building products
that no one wanted
you know, cool technology
but who’s going to use it
so I tend to be very outcome driven
and customer obsessed
ultimately we don’t get to vote
on whether we succeed or fail
it’s only the customer
they’re the only one
that gets a thumbs up or thumbs down vote
in the long term
in the short term
you know there are various people
that get various votes
but in the long term
that’s what really matters
so as you build the startup
you have to constantly ask the question
will the customer give a thumbs up on this
I think so
startups that are very customer focused
customer obsessed
deeply understand the customer
and are oriented to serve the customer
are more likely to succeed
with the proviso that
I think all of us should only do things
that we think create social good
and move the world forward
so I personally don’t want to build
addictive digital products
just to sell a lot of ads
or you know there are things
that could be lucrative
that I won’t do
but if we can find ways to serve people
in meaningful ways
I think those can be
great things to do
either in the academic setting
or in a corporate setting
or a startup setting
so can you give me the idea
of why you started the AI Fund
I remember when I was leading
the AI group at Baidu
I had two jobs
two parts of my job
one was to build an AI engine
to support the existing businesses
and that part just ran
just performed by itself
there was a second part of my job at the time
which was to try to systematically initiate
new lines of businesses
using the company’s AI capabilities
so you know the self driving car team
came out of my group
the smart speaker team
similar to what Amazon Echo or Alexa is in the US
but we actually announced it
before Amazon did
so Baidu wasn’t following Amazon
that came out of my group
and I found that to be
actually the most fun part of my job
so what I wanted to do was
to build AI fund as a startup studio
to systematically create new startups
from scratch
with all the things we can now do with AI
I think the ability to build new teams
to go after this rich space of opportunities
is a very important mechanism
to get these projects done
that I think will move the world forward
so I’ve been fortunate to build a few teams
that had a meaningful positive impact
and I felt that we might be able to do this
in a more systematic repeatable way
so a startup studio is a relatively new concept
there are maybe dozens of startup studios
right now
but I feel like all of us
many teams are still trying to figure out
how do you systematically build companies
with a high success rate
so I think even a lot of my
venture capital friends
seem to be more and more building companies
rather than investing in companies
but I find it a fascinating thing to do
to figure out the mechanisms
by which we could systematically build
successful teams, successful businesses
in areas that we find meaningful
so a startup studio is
a place and a mechanism
for startups to go from zero to success
to try to develop a blueprint?
it’s actually a place for us
to build startups from scratch
so we often bring in founders
and work with them
or maybe even have existing ideas
that we match founders with
and then this launches
you know hopefully into successful companies
so how close are you to figuring out
a way to automate the process
of starting from scratch
and building a successful AI startup
yeah I think we’ve been constantly
improving and iterating on our processes
how we do that
so things like you know
how many customer calls do we need to make
in order to get customer validation
how do we make sure this technology
can be built
quite a lot of our businesses
need cutting edge machine learning algorithms
you know the kind of algorithms
developed in the last one or two years
and even if it works in a research paper
it turns out taking it to production
is really hard
there are a lot of issues
in making these things work in real life
that are not widely addressed in academia
so how do we validate
that this is actually doable
how do you build a team
get the specialized domain knowledge
be it in education or health care
whatever sector we’re focusing on
so I think we’ve been getting much better
at giving the entrepreneurs
a high success rate
but I think the whole world is still
in the early phases of figuring this out
but do you think there are some aspects
of that process that are transferable
from one startup to another
to another to another
yeah very much so
you know starting a company
is to most entrepreneurs
a really lonely thing
and I’ve seen so many entrepreneurs
not know how to make certain decisions
like how do you do B2B sales, right
if you don’t know that
it’s really hard
or how do you market this efficiently
other than you know buying ads
which is really expensive
are there more efficient tactics for that
or for a machine learning project
you know basic decisions
can change the course of
whether the machine learning product works or not
and there are so many hundreds of decisions
that entrepreneurs need to make
and making a mistake
in a couple of key decisions
can have a huge impact
on the fate of the company
so I think a startup studio
provides a support structure
that makes starting a company
much less of a lonely experience
and also when facing these key decisions
like trying to hire your first
VP of engineering
what are good selection criteria
how do you decide
should I hire this person or not
by having an ecosystem
around the entrepreneurs
the founders to help
I think we help them at the key moments
and hopefully significantly
make the process more enjoyable
with a higher success rate
so there’s somebody to brainstorm with
in these very difficult decision points
and also to help them recognize
what they may not even realize
is a key decision point
that’s the first
and probably the most important part
yeah actually I can say one other thing
you know I think
building companies is one thing
but I feel like it’s really important
that we build companies
that move the world forward
for example within the AI Fund team
there was once an idea
for a new company
that if it had succeeded
would have resulted in people
watching a lot more videos
in a certain narrow vertical type of video
I looked at it
the business case was fine
the revenue case was fine
but I looked and just said
I don’t want to do this
I don’t actually
just want to have a lot more people
watch this type of video
it wasn’t educational
and so I killed the idea
on the basis that I didn’t think
it would actually help people
so whether building companies
or working in enterprises
or doing personal projects
I think it’s up to each of us
to figure out what’s the difference
we want to make in the world
With Landing AI
you help already established companies
grow their AI and machine learning efforts
how does a large company
integrate machine learning
into their efforts?
AI is a general purpose technology
and I think it will transform every industry
our community has already transformed
to a large extent
the software internet sector
most software internet companies
outside the top five or six
or three or four
already have reasonable
machine learning capabilities
or are getting there
there’s still room for improvement
but when I look outside
the software internet sector
everything from manufacturing
agriculture, healthcare
logistics transportation
there’s so many opportunities
that very few people are working on
so I think the next wave of AI
is for us to also transform
all of those other industries
there was a McKinsey study
estimating 13 trillion dollars
of global economic growth
US GDP is 19 trillion dollars
so 13 trillion is a big number
or PwC estimates 16 trillion dollars
so whichever number it is, it’s large
but the interesting thing to me
was a lot of that impact
will be outside
the software internet sector
so we need more teams
to work with these companies
to help them adopt AI
and I think this is one thing that will
you know help drive global economic growth
and make humanity more powerful
and like you said the impact is there
so what are the best industries
the biggest industries
where AI can help
perhaps outside the software tech sector
frankly I think it’s all of them
some of the ones I’m spending a lot of time on
are manufacturing, agriculture
and looking into healthcare
for example in manufacturing
we do a lot of work in visual inspection
where today there are people standing around
using the human eye
to check if you know
this plastic part or this smartphone
or this thing has a scratch
or a dent or something in it
we can use a camera to take a picture
use an algorithm
deep learning and other things
to check if it’s defective or not
and thus help factories improve yield
and improve quality
and improve throughput
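For readers who want a picture of what camera-based visual inspection can look like in code, here is a hedged sketch framing it as binary image classification (ok vs. defective). The directory layout, image size, and architecture are assumptions for illustration, not Landing AI’s actual pipeline.

```python
# Sketch of visual inspection as binary classification (ok vs. defective).
# The data/ directory layout and the model are illustrative assumptions.
import tensorflow as tf

# Assumes photos of parts sorted into data/ok/ and data/defective/.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data", image_size=(128, 128), batch_size=32)

# A small CNN is a reasonable starting point for part images.
model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255, input_shape=(128, 128, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # predicted P(defective)
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=5)
```

As the next exchange makes clear, getting a model like this to train is the easy part; the small datasets and changing factory conditions are where the real work starts.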
it turns out the practical problems
we run into are very different
than the ones you might read about
in in most research papers
the data sets are really small
so we face small data problems
you know the factories
keep on changing the environment
so it works well on your test set
but guess what
something changes in the factory
the lights go on or off
recently there was a factory
in which a bird flew through the factory
and pooped on something
and so that changed stuff
so we work on increasing our algorithms’
robustness to all the changes
that happen in the factory
I find that we run into a lot of practical problems
that are not as widely discussed
in academia
and it’s really fun
kind of being on the cutting edge
solving these problems before
maybe before many people are even aware
that there is a problem there
and that’s such a fascinating space
you’re absolutely right
but what is the first step
that a company should take
in this scary leap
into this new world of
going from the human eye inspecting
to digitizing that process
having a camera
having an algorithm
what’s the first step
like what’s the early journey
that you recommend
that you see these companies taking
I published a document
called the AI Transformation Playbook
that’s online
and taught briefly in the AI for Everyone
course on Coursera
about the long term journey
that companies should take
but the first step
is actually to start small
I’ve seen a lot more companies fail
by starting too big
than by starting too small
take even Google
you know most people don’t realize
how hard it was
and how controversial it was
in the early days
so when I started Google Brain
it was controversial
you know people thought
deep learning, neural nets
they tried it, it didn’t work
why would you want to do deep learning
so my first internal customer
within Google
was the Google speech team
which is not the most lucrative
project in Google
not the most important
it’s not web search or advertising
but by starting small
my team helped the speech team
build a more accurate speech recognition system
and this caused their peers
other teams to start
to have more faith in deep learning
my second internal customer
was the Google Maps team
where we used computer vision
to read house numbers
basically from Street View images
to more accurately locate houses
within Google Maps
so improve the quality of geodata
and it was only after those two successes
that I then started
a more serious conversation
with the Google Ads team
and so there’s a ripple effect
that you showed that it works
in these cases
and then it just propagates
through the entire company
that this thing has a lot of value
and use for us
I think the early small scale projects
help the teams gain faith
but also help the teams learn
what these technologies do
I still remember our first GPU server
it was a server under some guy’s desk
and that taught us
early important lessons about
how do you have multiple users
share a set of GPUs
which was really not obvious at the time
but those early lessons were important
we learned a lot from that first GPU server
that later helped the teams think through
how to scale it up
to much larger deployments
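As an aside, one simple way a shared GPU box is often divided up today (an assumption about current common practice, not a description of how that first Google Brain server worked) is to pin each job to one GPU and let TensorFlow allocate memory incrementally.

```python
# Pin this job to a single GPU so other users can use the rest of the box.
# Must be set before TensorFlow is imported.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # this process only sees GPU 0

import tensorflow as tf

# Allocate GPU memory as needed instead of grabbing it all up front,
# so several processes can coexist on the same device.
for gpu in tf.config.list_physical_devices("GPU"):
    tf.config.experimental.set_memory_growth(gpu, True)
```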
Are there concrete challenges
that companies face
that you see as important for them to solve?
I think building and deploying
machine learning systems is hard
there’s a huge gulf between
something that works
in a Jupyter notebook on your laptop
versus something that runs
in a production deployment setting
in a factory or agricultural plant or whatever
so I see a lot of people
get something to work on their laptop
and say wow look what I’ve done
and that’s great that’s hard
that’s a very important first step
but a lot of teams underestimate
the rest of the steps needed
so for example
I’ve heard this exact same conversation
between a lot of machine learning people
and business people
the machine learning person says
look my algorithm does well on the test set
and it’s a clean test set, I didn’t peek at it
and the business person says
thank you very much
but your algorithm sucks, it doesn’t work
and the machine learning person says
no wait I did well on the test set
and I think there is a gulf between
what it takes to do well on the test set
on your hard drive
versus what it takes to work well
in a deployment setting
some common problems
robustness and generalization
you deploy something in the factory
maybe they chop down a tree outside the factory
so the tree no longer covers the window
and the lighting is different
so the test set changes
and in machine learning
and especially in academia
we don’t know how to deal with test set distributions
that are dramatically different
than the training set distribution
you know there’s research
on stuff like domain adaptation
transfer learning
you know there are people working on it
but we’re really not good at this
so how do you actually get this to work
because your test set distribution
is going to change
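There is no full solution to distribution shift, but one common partial mitigation, offered here as a hedged sketch rather than anything prescribed in the conversation, is to augment training images with the kind of lighting variation you expect, using augmentation layers available in recent TensorFlow versions.

```python
# Partial mitigation sketch: train with random lighting variation so the
# model is less brittle to changes like factory lights going on or off.
# This does not solve distribution shift; it only widens the training set.
import tensorflow as tf

augment = tf.keras.Sequential([
    tf.keras.layers.RandomBrightness(0.3),  # random brightness shifts
    tf.keras.layers.RandomContrast(0.3),    # random contrast shifts
])

# Applied to an image batch only during training, e.g.:
# train_ds = train_ds.map(lambda x, y: (augment(x, training=True), y))
```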
and I think also if you look at the number of lines of code
in the software system
the machine learning model is maybe five percent
or even fewer
relative to the entire software system
you need to build
so how do you get all that work done
and make it reliable and systematic
so good software engineering work
is fundamental here
to building a successful machine learning system
yes and the software system
needs to interface with the machine learning system
and needs to interface with people’s workflows
so machine learning is automation on steroids
if we take one task out of many tasks
that are done in the factory
so the factory does lots of things
one task is visual inspection
if we automate that one task
it can be really valuable
but you may need to redesign a lot of other tasks
around that one task
for example say the machine learning algorithm
says this is defective
what are you supposed to do
do you throw it away
do you get a human to double check
do you want to rework it or fix it
so you need to redesign a lot of tasks
around that thing you’ve now automated
so planning for the change management
and making sure that the software you write
is consistent with the new workflow
and you take the time to explain to people
what needs to happen
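To make the workflow redesign concrete, here is a hypothetical sketch of what routing parts around an automated inspection step might look like; the thresholds and actions are invented for illustration, not Landing AI’s actual process.

```python
# Hypothetical routing around one automated task: what happens to a part
# after the model scores it. Thresholds and actions are illustrative only.
def route_part(defect_score: float) -> str:
    if defect_score > 0.95:
        return "scrap_or_rework"  # confident defect: pull it off the line
    if defect_score > 0.50:
        return "human_review"     # uncertain: a person double-checks
    return "pass"                 # confident ok: continue down the line

print(route_part(0.97))  # -> scrap_or_rework
```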
so I think what Landing AI has become good at
and I think we learned by making missteps
and you know painful experiences
what we’ve become good at is
working with our partners to think through
all the things beyond just the machine learning model
running in the Jupyter notebook
but to build the entire system
manage the change process
and figure out how to deploy this in a way
that has an actual impact
the processes that the large software tech companies
use for deploying don’t work
for a lot of other scenarios
for example when I was leading large speech teams
if the speech recognition system goes down
what happens, well alarms go off
and then someone like me would say hey
you 20 engineers, please fix this
but if you have a system go down in the factory
there are not 20 machine learning engineers
sitting around that you can page on duty
and have them fix it
so how do you deal with the maintenance
or the DevOps or the MLOps
or the other aspects of this
so these are concepts that I think Landing AI
and a few other teams are on the cutting edge of
but we don’t even have systematic terminology yet
to describe some of the stuff we do
because I think we’re inventing it on the fly.
So you mentioned some people are interested
in discovering mathematical beauty
and truth in the universe
and you’re interested in having
a big positive impact in the world
so let me ask, the two are not inconsistent
no, they go together
I’m only half joking
because you’re probably interested a little bit in both
but let me ask a romanticized question
so much of the work
your work and our discussion today
has been on applied AI
maybe you can even call narrow AI
where the goal is to create systems
that automate some specific process
that adds a lot of value to the world
but there’s another branch of AI
starting with Alan Turing
that kind of dreams of creating human level
or superhuman level intelligence
is this something you dream of as well
do you think we human beings
will ever build a human level intelligence
or superhuman level intelligence system?
I would love to get to AGI
and I think humanity will
but whether it takes 100 years
or 500 or 5000
I find it hard to estimate
do you have
some folks have worries
about the different trajectories
that path would take
even existential threats of an AGI system
do you have such concerns
whether in the short term or the long term?
I do worry about the long term fate of humanity
I do wonder as well
I do worry about overpopulation on the planet Mars
just not today
I think there will be a day
when maybe someday in the future
Mars will be polluted
there are all these children dying
and someone will look back at this video
and say Andrew how is Andrew so heartless?
He didn’t care about all these children
dying on the planet Mars
and I apologize to the future viewer
I do care about the children
but I just don’t know how to
productively work on that today
your picture will be in the dictionary
for the people who are ignorant
about the overpopulation on Mars
yes so it’s a long term problem
is there something in the short term
we should be thinking about
in terms of aligning the values of our AI systems
with the values of us humans
sort of something that Stuart Russell
and other folks are thinking about
as this system develops more and more
we want to make sure that it represents
the better angels of our nature
the ethics the values of our society
you know if you take self driving cars
the biggest problem with self driving cars
is not that there’s some trolley dilemma
and you teach this so you know
how many times when you are driving your car
did you face this moral dilemma
who do I crash into?
so I think self driving cars
will run into that problem roughly as often
as we do when we drive our cars
the biggest problem with self driving cars
is when there’s a big white truck across the road
and what you should do is brake
and not crash into it
and the self driving car fails
and it crashes into it
so I think we need to solve that problem first
I think the problem with some of these discussions
about AGI, you know alignment
the paperclip problem
is that they are a huge distraction
from the much harder problems
that we actually need to address today
I think bias is a huge issue
I worry about wealth inequality
the AI and internet are causing
an acceleration of concentration of power
because we can now centralize data
use AI to process it
and so industry after industry
we’ve affected every industry
so the internet industry has a lot of
winner take most
or winner take all dynamics
and we’ve infected all these other industries
so we’re also giving these other industries
winner take most or winner take all flavors
so look at what Uber and Lyft
did to the taxi industry
so we’re doing this type of thing
to a lot of industries
and so we’re creating tremendous wealth
but how do we make sure that the wealth
is fairly shared
and then how do we help
people whose jobs are displaced
you know I think education is part of it
there may be even more
that we need to do than education
I think bias is a serious issue
there are adverse uses of AI
like deepfakes being used
for various nefarious purposes
so I worry about some teams
maybe accidentally
and I hope not deliberately
making a lot of noise about things
that are problems only in the distant future
rather than focusing on
some of the much harder problems
yeah, and they overshadow the problems
that we have already today
which are exceptionally challenging
like those you said
and even the silly seeming ones
the ones that have a huge impact
like the lighting variation
outside of your factory window
that ultimately is
what makes the difference
between like you said
the Jupyter notebook
and something that actually transforms
an entire industry potentially
yeah, and I think
when a company
or a regulator comes to you
and says look your product
is messing things up
fixing it may have a revenue impact
well it’s much more fun to talk to them
about how you promise
not to wipe out humanity
than to face the actually really hard problems we face
so your life has been a great journey
from teaching to research
to entrepreneurship
two questions
one are there regrets
moments that if you went back
you would do differently
and two are there moments
you’re especially proud of
moments that made you truly happy
you know I’ve made so many mistakes
it feels like every time
I discover something
I go why didn’t I think of this
you know five years earlier
or even 10 years earlier
and sometimes I read a book
and I go, I wish I read this book 10 years ago
my life would have been so different
that actually happened recently
I was thinking
if only I had read this book
when we were starting up Coursera
I could have done so much better
but then I discovered the book
had not yet been written
when we were starting Coursera
so that made me feel better
but I find that the process of discovery
we keep on finding out things
that seem so obvious in hindsight
but it always takes us so much longer
than than I wish to to figure it out
so on the second question
are there moments in your life
that if you look back
that you’re especially proud of
or you’re especially happy
what would be the moments that filled you with happiness
and fulfillment
well, two answers
one is my daughter Nova
you know of her
yes of course
no matter how much time I spend with her
I just can’t spend enough time with her
congratulations by the way
thank you
and then second is helping other people
I think to me
the meaning of life
is helping others achieve
whatever their dreams are
and then also to try to move the world forward
making humanity more powerful as a whole
so the times that I felt most happy
and most proud were when I felt
someone else allowed me the good fortune
of helping them a little bit
on the path to their dreams
I think there’s no better way to end it
than talking about happiness
and the meaning of life
so Andrew it’s a huge honor
me and millions of people
thank you for all the work you’ve done
thank you for talking today
thank you so much thanks
thanks for listening to this conversation with Andrew Ng
and thank you to our presenting sponsor Cash App
download it, use code LEXPODCAST
you’ll get ten dollars
and ten dollars will go to FIRST
an organization that inspires and educates young minds
to become science and technology innovators of tomorrow
if you enjoy this podcast
subscribe on YouTube
give it five stars on Apple podcast
support it on Patreon
or simply connect with me on Twitter
at Lex Fridman
and now let me leave you with some words of wisdom from Andrew Ng
ask yourself
if what you’re working on succeeds beyond your wildest dreams
would you have significantly helped other people?
if not then keep searching for something else to work on
otherwise you’re not living up to your full potential
thank you for listening and hope to see you next time