Lex Fridman Podcast - #93 - Daphne Koller: Biomedicine and Machine Learning

The following is a conversation with Daphne Koller,

a professor of computer science at Stanford University,

a cofounder of Coursera with Andrew Ng,

and founder and CEO of insitro,

a company at the intersection

of machine learning and biomedicine.

We’re now in the exciting early days

of using the data-driven methods of machine learning

to help discover and develop new drugs

and treatments at scale.

Daphne and insitro are leading the way on this

with breakthroughs that may ripple

through all fields of medicine,

including ones most critical for helping

with the current coronavirus pandemic.

This conversation was recorded

before the COVID-19 outbreak.

For everyone feeling the medical, psychological,

and financial burden of this crisis,

I’m sending love your way.

Stay strong, we’re in this together, we’ll beat this thing.

This is the Artificial Intelligence Podcast.

If you enjoy it, subscribe on YouTube,

review it with five stars on Apple Podcasts,

support it on Patreon,

or simply connect with me on Twitter

at Lex Fridman, spelled F R I D M A N.

As usual, I’ll do a few minutes of ads now

and never any ads in the middle

that can break the flow of this conversation.

I hope that works for you

and doesn’t hurt the listening experience.

This show is presented by Cash App,

the number one finance app in the App Store.

When you get it, use code LEXPODCAST.

Cash App lets you send money to friends,

buy Bitcoin, and invest in the stock market

with as little as one dollar.

Since Cash App allows you to send

and receive money digitally,

peer to peer, and security in all digital transactions

is very important,

let me mention the PCI data security standard

that Cash App is compliant with.

I’m a big fan of standards for safety and security.

PCI DSS is a good example of that,

where a bunch of competitors got together

and agreed that there needs to be a global standard

around the security of transactions.

Now we just need to do the same for autonomous vehicles

and AI systems in general.

So again, if you get Cash App from the App Store

or Google Play and use the code LEXPODCAST,

you get $10 and Cash App will also donate $10 to FIRST,

an organization that is helping to advance robotics

and STEM education for young people around the world.

And now here’s my conversation with Daphne Koller.

So you cofounded Coursera and made a huge impact

in the global education of AI.

And after five years, in August 2016,

wrote a blog post saying that you’re stepping away

and wrote, quote,

it is time for me to turn to another critical challenge,

the development of machine learning

and its applications to improving human health.

So let me ask two far out philosophical questions.

One, do you think we’ll one day find cures

for all major diseases known today?

And two, do you think we’ll one day figure out

a way to extend the human lifespan,

perhaps to the point of immortality?

So one day is a very long time

and I don’t like to make predictions

of the type we will never be able to do X

because I think that smacks of hubris.

It seems that never in the entire eternity

of human existence will we be able to solve a problem.

That being said, curing disease is very hard

because oftentimes by the time you discover the disease,

a lot of damage has already been done.

And so to assume that we would be able to cure disease

at that stage assumes that we would come up with ways

of basically regenerating entire parts of the human body

in the way that actually returns it to its original state.

And that’s a very challenging problem.

We have cured very few diseases.

We’ve been able to provide treatment

for an increasingly large number,

but the number of things that you could actually define

to be cures is actually not that large.

So I think that there’s a lot of work

that would need to happen before one could legitimately say

that we have cured even a reasonable number,

far less all diseases.

On the scale of zero to 100,

where are we in understanding the fundamental mechanisms

of all major diseases?

What’s your sense?

So from the computer science perspective

that you’ve entered the world of health,

how far along are we?

I think it depends on which disease.

I mean, there are ones where I would say

we’re maybe not quite at a hundred

because biology is really complicated

and there’s always new things that we uncover

that people didn’t even realize existed.

But I would say there’s diseases

where we might be in the 70s or 80s,

and then there’s diseases, I would say

probably the majority, where we’re really close to zero.

Would Alzheimer’s and schizophrenia

and type two diabetes fall closer to zero or to 80?

I think Alzheimer’s is probably closer to zero than to 80.

There are hypotheses,

but I don’t think those hypotheses have as of yet

been sufficiently validated that we believe them to be true.

And there is an increasing number of people

who believe that the traditional hypotheses

might not really explain what’s going on.

I would also say that Alzheimer’s and schizophrenia

and even type two diabetes are not really one disease.

They’re almost certainly a heterogeneous collection

of mechanisms that manifest in clinically similar ways.

So in the same way that we now understand

that breast cancer is really not one disease,

it is a multitude of cellular mechanisms,

all of which ultimately translate

to uncontrolled proliferation, but it’s not one disease.

The same is almost undoubtedly true

for those other diseases as well.

And that understanding needs to precede

any understanding of the specific mechanisms

of any of those other diseases.

Now, in schizophrenia, I would say

we’re almost certainly closer to zero than to anything else.

Type two diabetes is a bit of a mix.

There are clear mechanisms that are implicated

that I think have been validated

that have to do with insulin resistance and such,

but there’s almost certainly there as well

many mechanisms that we have not yet understood.

You’ve also thought and worked a little bit

on the longevity side.

Do you see the disease and longevity as overlapping

completely, partially, or not at all as efforts?

Those mechanisms are certainly overlapping.

There’s a well known phenomenon that says

that for most diseases, other than childhood diseases,

the risk for contracting that disease

increases exponentially year on year,

every year from the time you’re about 40.

So obviously there’s a connection between those two things.

That’s not to say that they’re identical.

There’s clearly aging that happens

that is not really associated with any specific disease.

And there’s also diseases and mechanisms of disease

that are not specifically related to aging.

So I think overlap is where we’re at.


It is a little unfortunate that we get older

and it seems that there’s some correlation

with the occurrence of diseases

or the fact that we get older.

And both are quite sad.

I mean, there’s processes that happen as cells age

that I think are contributing to disease.

Some of those have to do with DNA damage

that accumulates as cells divide

where the repair mechanisms don’t fully correct for those.

There are accumulations of proteins

that are misfolded and potentially aggregate

and those too contribute to disease

and will contribute to inflammation.

There’s a multitude of mechanisms that have been uncovered

that are sort of wear and tear at the cellular level

that contribute to disease processes

and I’m sure there’s many that we don’t yet understand.

On a small tangent and perhaps philosophical,

the fact that things get older

and the fact that things die is a very powerful feature

for the growth of new things.

It’s a learning, it’s a kind of learning mechanism.

So it’s both tragic and beautiful.

So do you, so in trying to fight disease

and trying to fight aging,

do you think about sort of the useful fact of our mortality

or would you, like if you were, could be immortal,

would you choose to be immortal?

Again, I think immortal is a very long time

and I don’t know that that would necessarily be something

that I would want to aspire to

but I think all of us aspire to an increased health span,

I would say, which is an increased amount of time

where you’re healthy and active

and feel as you did when you were 20

and we’re nowhere close to that.

People deteriorate physically and mentally over time

and that is a very sad phenomenon.

So I think a wonderful aspiration would be

if we could all live to the biblical 120 maybe

in perfect health.

In high quality of life.

High quality of life.

I think that would be an amazing goal

for us to achieve as a society.

Now, is the right age 120 or 100 or 150?

I think that’s up for debate

but I think an increased health span

is a really worthy goal.

And anyway, in a grand time of the age of the universe,

it’s all pretty short.

So from the perspective,

you’ve done obviously a lot of incredible work

in machine learning.

So what role do you think data and machine learning

play in this goal of trying to understand diseases

and trying to eradicate diseases?

Up until now, I don’t think it’s played

very much of a significant role

because largely the data sets that one really needed

to enable powerful machine learning methods,

those data sets haven’t really existed.

There’s been dribs and drabs

and some interesting machine learning

that has been applied, I would say machine learning

slash data science,

but the last few years are starting to change that.

So we now see an increase in some large data sets

but equally importantly, an increase in technologies

that are able to produce data at scale.

It’s not typically the case that people have deliberately

proactively used those tools

for the purpose of generating data for machine learning.

They, to the extent that those techniques

have been used for data production,

they’ve been used for data production

to drive scientific discovery

and the machine learning came as a sort of byproduct

second stage of, oh, you know, now we have a data set,

let’s do machine learning on that

rather than a more simplistic data analysis method.

But what we are doing at insitro

is actually flipping that around and saying,

here’s this incredible repertoire of methods

that bioengineers, cell biologists have come up with,

let’s see if we can put them together in brand new ways

with the goal of creating data sets

that machine learning can really be applied on productively

to create powerful predictive models

that can help us address fundamental problems

in human health.

So really focus to get, make data the primary focus

and the primary goal and find,

use the mechanisms of biology and chemistry

to create the kinds of data sets

that could allow machine learning to benefit the most.

I wouldn’t put it in those terms

because that says that data is the end goal.

Data is the means.

So for us, the end goal is helping address challenges

in human health and the method that we’ve elected to do that

is to apply machine learning to build predictive models

and machine learning, in my opinion,

can only be really successfully applied

especially the more powerful models

if you give it data that is of sufficient scale

and sufficient quality.

So how do you create those data sets

so as to drive the ability to generate predictive models

which subsequently help improve human health?

So before we dive into the details of that,

let me take a step back and ask when and where

was your interest in human health born?

Are there moments, events, perhaps if I may ask,

tragedies in your own life that catalyzed this passion

or was it the broader desire to help humankind?

So I would say it’s a bit of both.

So on, I mean, my interest in human health

actually dates back to the early 2000s

when a lot of my peers in machine learning

and I were using data sets

that frankly were not very inspiring.

Some of us old timers still remember

the quote unquote 20 Newsgroups data set

where this was literally a bunch of texts

from 20 newsgroups,

a concept that doesn’t really even exist anymore.

And the question was, can you classify

which newsgroup a particular bag of words came from?

And it wasn’t very interesting.
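The classification task described here can be sketched in a few lines. This is a minimal illustration, not the method used in any particular paper: the two group names are real 20 Newsgroups categories, but the example texts, the `classify` helper, and the word-overlap scoring rule are all assumptions made for the sketch.

```python
from collections import Counter


def bag_of_words(text):
    """Reduce a text to word counts, discarding word order."""
    return Counter(text.lower().split())


def classify(text, group_profiles):
    """Return the group whose word profile overlaps most with the text.

    The overlap score (sum of shared word counts) is a toy stand-in for
    a real classifier such as naive Bayes.
    """
    words = bag_of_words(text)

    def overlap(profile):
        return sum(min(count, profile[word]) for word, count in words.items())

    return max(group_profiles, key=lambda group: overlap(group_profiles[group]))


# Hypothetical training text for two newsgroups (toy data).
group_profiles = {
    "sci.space": bag_of_words("rocket orbit launch moon orbit"),
    "rec.autos": bag_of_words("engine car wheel brake car"),
}

predicted = classify("the car engine stalled", group_profiles)
print(predicted)
```

A real system would train on thousands of posts per group and use a probabilistic or learned model, but the input representation is the same bag of words.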

The data sets at the time on the biology side

were much more interesting,

both from a technical and also from

an aspirational perspective.

They were still pretty small,

but they were better than 20 Newsgroups.

And so I started out, I think just by wanting

to do something that was more, I don’t know,

societally useful and technically interesting.

And then over time became more and more interested

in the biology and the human health aspects for themselves

and began to work even sometimes on papers

that were just in biology

without having a significant machine learning component.

I think my interest in drug discovery

is partly due to an incident I had

when my father sadly passed away about 12 years ago.

He had an autoimmune disease that settled in his lungs

and the doctors basically said,

well, there’s only one thing that we could do,

which is give him prednisone.

At some point, I remember a doctor even came and said,

hey, let’s do a lung biopsy to figure out

which autoimmune disease he has.

And I said, would that be helpful?

Would that change treatment?

He said, no, there’s only prednisone.

That’s the only thing we can give him.

And I had friends who were rheumatologists who said

the FDA would never approve prednisone today

because the ratio of side effects to benefit

is probably not large enough.

Today, we’re in a state where there’s probably four or five,

maybe even more, well, it depends for which autoimmune disease,

but there are multiple drugs that can help people

with autoimmune disease,

many of which didn’t exist 12 years ago.

And I think we’re at a golden time in some ways

in drug discovery where there’s the ability to create drugs

that are much more safe and much more effective

than we’ve ever been able to before.

And what’s lacking is enough understanding

of biology and mechanism to know where to aim that engine.

And I think that’s where machine learning can help.

So in 2018, you started and now lead a company, insitro,

which is, like you mentioned,

perhaps the focus is drug discovery

and the utilization of machine learning for drug discovery.

So you mentioned that, quote,

we’re really interested in creating

what you might call a disease in a dish model,

disease in a dish models,

places where diseases are complex,

where we really haven’t had a good model system,

where typical animal models that have been used for years,

including testing on mice, just aren’t very effective.

So can you try to describe what is an animal model

and what is a disease in a dish model?


So an animal model for disease

is where you create effectively,

it’s what it sounds like.

It’s oftentimes a mouse where we have introduced

some external perturbation that creates the disease

and then we cure that disease.

And the hope is that by doing that,

we will cure a similar disease in the human.

The problem is that oftentimes

the way in which we generate the disease in the animal

has nothing to do with how that disease

actually comes about in a human.

It’s what you might think of as a copy of the phenotype,

a copy of the clinical outcome,

but the mechanisms are quite different.

And so curing the disease in the animal,

which in most cases doesn’t happen naturally,

mice don’t get Alzheimer’s, they don’t get diabetes,

they don’t get atherosclerosis,

they don’t get autism or schizophrenia.

Those cures don’t translate over

to what happens in the human.

And that’s where most drugs fail,

just because the findings that we had in the mouse

don’t translate to a human.

The disease in the dish models is a fairly new approach.

It’s been enabled by technologies

that have not existed for more than five to 10 years.

So for instance, the ability for us to take a cell

from any one of us, you or me,

revert that say skin cell to what’s called stem cell status,

which is what’s called a pluripotent cell

that can then be differentiated

into different types of cells.

So from that pluripotent cell,

one can create a Lex neuron or a Lex cardiomyocyte

or a Lex hepatocyte that has your genetics,

but that right cell type.

And so if there’s a genetic burden of disease

that would manifest in that particular cell type,

you might be able to see it by looking at those cells

and saying, oh, that’s what potentially sick cells

look like versus healthy cells

and then explore what kind of interventions

might revert the unhealthy looking cell to a healthy cell.

Now, of course, curing cells is not the same

as curing people.

And so there’s still potentially a translatability gap,

but at least for diseases that are driven,

say by human genetics and where the human genetics

is what drives the cellular phenotype,

there is some reason to hope that if we revert those cells

in which the disease begins

and where the disease is driven by genetics

and we can revert that cell back to a healthy state,

maybe that will help also revert

the more global clinical phenotype.

So that’s really what we’re hoping to do.

That step, that backward step, I was reading about it,

the Yamanaka factors.


So it’s like that reverse step back to stem cells.


Seems like magic.

It is.

Honestly, before that happened,

I think very few people would have predicted

that to be possible.

It’s amazing.

Can you maybe elaborate, is it actually possible?

Like where, like how stable?

So this result was maybe like,

I don’t know how many years ago,

maybe 10 years ago was first demonstrated,

something like that.

Is this, how hard is this?

Like how noisy is this backward step?

It seems quite incredible and cool.

It is, it is incredible and cool.

It was much more, I think finicky and bespoke

at the early stages when the discovery was first made.

But at this point, it’s become almost industrialized.

There are what’s called contract research organizations,

vendors that will take a sample from a human

and revert it back to stem cell status.

And it works a very good fraction of the time.

Now there are people who will ask,

I think good questions.

Is this really truly a stem cell or does it remember

certain aspects of what,

of changes that were made in the human beyond the genetics?

Its past as a skin cell, yeah.

Its past as a skin cell or its past

in terms of exposures to different environmental factors

and so on.

So I think the consensus right now

is that these are not always perfect

and there is little bits and pieces of memory sometimes,

but by and large, these are actually pretty good.

So one of the key things,

well, maybe you can correct me,

but one of the useful things for machine learning

is size, scale of data.

How easy is it to do these kinds of reversals to stem cells

and then disease in a dish models at scale?

Is that a huge challenge or not?

So the reversal is not as of this point

something that can be done at the scale

of tens of thousands or hundreds of thousands.

I think the total number of stem cells or iPS cells

that are what’s called induced pluripotent stem cells

in the world I think is somewhere between five and 10,000

last I looked.

Now again, that might not count things that exist

in this or that academic center

and they may add up to a bit more,

but that’s about the range.

So it’s not something that you could at this point

generate iPS cells from a million people,

but maybe you don’t need to

because maybe that background is enough

because it can also be now perturbed in different ways.

And some people have done really interesting experiments

in for instance, taking cells from a healthy human

and then introducing a mutation into it

using one of the other miracle technologies

that’s emerged in the last decade

which is CRISPR gene editing

and introduced a mutation that is known to be pathogenic.

And so you can now look at the healthy cells

and the unhealthy cells, the one with the mutation

and do a one on one comparison

where everything else is held constant.

And so you could really start to understand specifically

what the mutation does at the cellular level.

So the iPS cells are a great starting point

and obviously more diversity is better

because you also wanna capture ethnic background

and how that affects things,

but maybe you don’t need one from every single patient

with every single type of disease

because we have other tools at our disposal.

Well, how much difference is there between people

you mentioned ethnic background, in terms of iPS cells?

So we’re all like, it seems like these magical cells

that can create anything

between different populations, different people.

Is there a lot of variability between stem cells?

Well, first of all, there’s the variability,

that’s driven simply by the fact

that genetically we’re different.

So a stem cell that’s derived from my genotype

is gonna be different from a stem cell

that’s derived from your genotype.

There’s also some differences that have more to do with

for whatever reason, some people’s stem cells

differentiate better than other people’s stem cells.

We don’t entirely understand why.

So there’s certainly some differences there as well,

but the fundamental difference

and the one that we really care about and is a positive

is the fact that the genetics are different

and therefore recapitulate my disease burden

versus your disease burden.

What’s a disease burden?

Well, a disease burden is just if you think,

I mean, it’s not a well defined mathematical term,

although there are mathematical formulations of it.

If you think about the fact that some of us are more likely

to get a certain disease than others

because we have more variations in our genome

that are causative of the disease,

maybe fewer that are protective of the disease.

People have quantified that

using what are called polygenic risk scores,

which look at all of the variations

in an individual person’s genome

and add them all up in terms of how much risk they confer

for a particular disease.

And then they’ve put people on a spectrum

of their disease risk.

And for certain diseases where we’ve been sufficiently

powered to really understand the connection

between the many, many small variations

that give rise to an increased disease risk,

there’s some pretty significant differences

in terms of the risk between the people,

say at the highest decile of this polygenic risk score

and the people at the lowest decile.

Sometimes those differences are a factor of 10 or 12 higher.

So there’s definitely a lot that our genetics

contributes to disease risk, even if it’s not

by any stretch the full explanation.
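The "add them all up" idea behind a polygenic risk score can be sketched as a weighted sum. This is a simplified illustration under stated assumptions: the variant names, effect weights, and genotypes below are hypothetical, and real scores draw their weights from large genetic association studies and cover far more variants.

```python
from typing import Dict


def polygenic_risk_score(genotype: Dict[str, int], weights: Dict[str, float]) -> float:
    """Sum, over variants, of (copies of the variant carried) x (effect weight).

    Positive weights stand for risk-increasing variants, negative weights
    for protective ones. Variants missing from the genotype count as zero copies.
    """
    return sum(weight * genotype.get(variant, 0) for variant, weight in weights.items())


# Hypothetical per-variant effect weights (not real data).
weights = {"variant_a": 0.5, "variant_b": 0.25, "variant_c": -0.25}

# Each person carries 0, 1, or 2 copies of each variant.
person_1 = {"variant_a": 2, "variant_b": 1, "variant_c": 0}
person_2 = {"variant_a": 0, "variant_b": 0, "variant_c": 2}

score_1 = polygenic_risk_score(person_1, weights)
score_2 = polygenic_risk_score(person_2, weights)
print(score_1, score_2)
```

Ranking a population by such a score is what places people on the spectrum described above, with the top and bottom deciles at the two extremes.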

And from a machine learning perspective,

there’s signal there.

There is definitely signal in the genetics

and there’s even more signal, we believe,

in looking at the cells that are derived

from those different genetics because in principle,

you could say all the signal is there at the genetics level.

So we don’t need to look at the cells,

but our understanding of the biology is so limited at this

point that seeing what actually happens at the cellular

level is a heck of a lot closer to the human clinical outcome

than looking at the genetics directly.

And so we can learn a lot more from it

than we could by looking at genetics alone.

So just to get a sense, I don’t know if it’s easy to do,

but what kind of data is useful

in this disease in a dish model?

Like what’s the source of raw data information?

And also from my outsider’s perspective,

so biology and cells are squishy things.

And then how do you connect the computer to that?

Which sensory mechanisms, I guess.

So that’s another one of those revolutions

that have happened in the last 10 years

in that our ability to measure cells very quantitatively

has also dramatically increased.

So back when I started doing biology in the late 90s,

early 2000s, that was the initial era

where we started to measure biology

in really quantitative ways using things like microarrays,

where you would measure in a single experiment

the activity level, what’s called expression level

of every gene in the genome in that sample.

And that ability is what actually allowed us

to even understand that there are molecular subtypes

of diseases like cancer, where up until that point,

it’s like, oh, you have breast cancer.

But then when we looked at the molecular data,

it was clear that there’s different subtypes

of breast cancer that at the level of gene activity

look completely different to each other.

So that was the beginning of this process.

Now we have the ability to measure individual cells

in terms of their gene activity

using what’s called single-cell RNA sequencing,

which basically sequences the RNA,

which is that activity level of different genes

for every gene in the genome.

And you could do that at single cell level.

So that’s an incredibly powerful way of measuring cells.

I mean, you literally count the number of transcripts.

So it really turns that squishy thing

into something that’s digital.
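Counting transcripts per cell can be sketched as follows. This is a deliberately simplified illustration: it assumes each sequenced read has already been assigned to a cell and a gene, and the cell and gene names are hypothetical. Real single-cell pipelines also handle alignment and duplicate reads, which are omitted here.

```python
from collections import Counter, defaultdict


def count_transcripts(reads):
    """Build a per-cell table of transcript counts.

    `reads` is an iterable of (cell_id, gene) pairs, one pair per
    sequenced transcript.
    """
    counts = defaultdict(Counter)
    for cell_id, gene in reads:
        counts[cell_id][gene] += 1
    return counts


# Hypothetical reads, already tagged with their cell and gene.
reads = [
    ("cell_1", "GENE_A"),
    ("cell_1", "GENE_A"),
    ("cell_1", "GENE_B"),
    ("cell_2", "GENE_B"),
]

expression = count_transcripts(reads)
print(dict(expression["cell_1"]))
```

The resulting table of integers, one row per cell and one column per gene, is the digital form that machine learning models can consume.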

Another tremendous data source that’s emerged

in the last few years is microscopy

and specifically even super resolution microscopy,

where you could use digital reconstruction

to look at subcellular structures,

sometimes even things that are below

the diffraction limit of light

by doing a sophisticated reconstruction.

And again, that gives you a tremendous amount of information

at the subcellular level.

There’s now more and more ways that amazing scientists

out there are developing for getting new types

of information from even single cells.

And so that is a way of turning those squishy things

into digital data.

Into beautiful data sets.

But so that data set then with machine learning tools

allows you to maybe understand the developmental,

like the mechanism of a particular disease.

And if it’s possible to sort of at a high level describe,

how does that help lead to a drug discovery

that can help prevent, reverse that mechanism?

So I think there’s different ways in which this data

could potentially be used.

Some people use it for scientific discovery

and say, oh, look, we see this phenotype

at the cellular level.

So let’s try and work our way backwards

and think which genes might be involved in pathways

that give rise to that.

So that’s a very sort of analytical method

to sort of work our way backwards

using our understanding of known biology.

Some people use it in a somewhat more,

sort of forward, if that was a backward,

this would be forward, which is to say,

okay, if I can perturb this gene,

does it show a phenotype that is similar

to what I see in disease patients?

And so maybe that gene is actually causal of the disease.

So that’s a different way.

And then there’s what we do,

which is basically to take that very large collection

of data and use machine learning to uncover the patterns

that emerge from it.

So for instance, what are those subtypes

that might be similar at the human clinical outcome,

but quite distinct when you look at the molecular data?

And then if we can identify such a subtype,

are there interventions that if I apply it

to cells that come from this subtype of the disease

and you apply that intervention,

it could be a drug or it could be a CRISPR gene intervention,

does it revert the disease state

to something that looks more like normal,

happy, healthy cells?

And so hopefully if you see that,

that gives you a certain hope

that that intervention will also have

a meaningful clinical benefit to people.

And there’s obviously a bunch of things

that you would wanna do after that to validate that,

but it’s a very different and much less hypothesis driven way

of uncovering new potential interventions

and might give rise to things that are not the same things

that everyone else is already looking at.

That’s, I don’t know, I’d just like to psychoanalyze

my own feeling about our discussion currently.

It’s so exciting to talk about sort of a machine,

fundamentally, well, something that’s been turned

into a machine learning problem

and that can have so much real world impact.

That’s how I feel too.

That’s kind of exciting because I’m so,

most of my day is spent with data sets

that are, I guess, closer to the newsgroups.

So this is a kind of, it just feels good to talk about.

In fact, I almost don’t wanna talk about machine learning.

I wanna talk about the fundamentals of the data set,

which is an exciting place to be.

I agree with you.

It’s what gets me up in the morning.

It’s also what attracts a lot of the people

who work at insitro to insitro

because I think all of the,

certainly all of our machine learning people

are outstanding and could go get a job selling ads online

or doing e-commerce or even self driving cars.

But I think they would want, they come to us

because they want to work on something

that has more of an aspirational nature

and can really benefit humanity.

What, with these approaches, what do you hope,

what kind of diseases can be helped?

We mentioned Alzheimer’s, schizophrenia, type 2 diabetes.

Can you just describe the various kinds of diseases

that this approach can help?

Well, we don’t know.

And I try and be very cautious about making promises

about some things that, oh, we will cure X.

People make that promise.

And I think it’s, I try to first deliver and then promise

as opposed to the other way around.

There are characteristics of a disease

that make it more likely that this type of approach

can potentially be helpful.

So for instance, diseases that have

a very strong genetic basis are ones

that are more likely to manifest

in a stem cell derived model.

We would want the cellular models

to be relatively reproducible and robust

so that you could actually get enough of those cells

and in a way that isn’t very highly variable and noisy.

You would want the disease to be relatively contained

in one or a small number of cell types

that you could actually create in an in vitro,

in a dish setting.

Whereas if it’s something that’s really broad and systemic

and involves multiple cells

that are in very distal parts of your body,

putting that all in the dish is really challenging.

So we want to focus on the ones

that are most likely to be successful today

with the hope, I think, that really smart bioengineers

out there are developing better and better systems

all the time so that diseases that might not be tractable

today might be tractable in three years.

So for instance, five years ago,

these stem cell derived models didn’t really exist.

People were doing most of the work in cancer cells

and cancer cells are very, very poor models

of most human biology because they’re,

A, they were cancer to begin with

and B, as you passage them and they proliferate in a dish,

they become, because of the genomic instability,

even less similar to human biology.

Now we have these stem cell derived models.

We have the capability to reasonably robustly,

not quite at the right scale yet, but close,

to derive what’s called organoids,

which are these teeny little sort of multicellular organ,

sort of models of an organ system.

So there’s cerebral organoids and liver organoids

and kidney organoids and.

Yeah, brain organoids.

That’s organoids.

It’s possibly the coolest thing I’ve ever seen.

Is that not like the coolest thing?


And then I think on the horizon,

we’re starting to see things like connecting

these organoids to each other

so that you could actually start,

and there’s some really cool papers that start to do that

where you can actually start to say,

okay, can we do multi organ system stuff?

There’s many challenges to that.

It’s not easy by any stretch, but it might,

I’m sure people will figure it out.

And in three years or five years,

there will be disease models that we could make

for things that we can’t make today.

Yeah, and this conversation would seem almost outdated

with the kind of scale that could be achieved

in like three years.

I hope so.

That’s the hope.

That would be so cool.

So you’ve cofounded Coursera with Andrew Ng

and were part of the whole MOOC revolution.

So to jump topics a little bit,

can you maybe tell the origin story of the history,

the origin story of MOOCs, of Coursera,

and in general, your teaching to huge audiences

on a very sort of impactful topic of AI in general?

So I think the origin story of MOOCs

emanates from a number of efforts

that occurred at Stanford University

around the late 2000s

where different individuals within Stanford,

myself included, were getting really excited

about the opportunities of using online technologies

as a way of achieving both improved quality of teaching

and also improved scale.

And so Andrew, for instance,

led the Stanford Engineering Everywhere,

which was sort of an attempt to take 10 Stanford courses

and put them online just as video lectures.

I led an effort within Stanford to take some of the courses

and really create a very different teaching model

that broke those up into smaller units

and had some of those embedded interactions and so on,

which got a lot of support from university leaders

because they felt like it was potentially a way

of improving the quality of instruction at Stanford

by moving to what’s now called the flipped classroom model.

And so those efforts eventually sort of started

to interplay with each other

and created a tremendous sense of excitement and energy

within the Stanford community

about the potential of online teaching

and led in the fall of 2011

to the launch of the first Stanford MOOCs.

By the way, MOOCs, it’s probably impossible

that people don’t know, but it’s, I guess, massive.

Open online courses. Open online courses.

We did not come up with the acronym.

I’m not particularly fond of the acronym,

but it is what it is. It is what it is.

Big bang is not a great term for the start of the universe,

but it is what it is. Probably so.

So anyway, so those courses launched in the fall of 2011,

and there were, within a matter of weeks,

with no real publicity campaign, just a New York Times article

that went viral, about 100,000 students or more

in each of those courses.

And I remember this conversation that Andrew and I had.

We were just like, wow, there’s this real need here.

And I think we both felt like, sure,

we were accomplished academics and we could go back

to our labs, write more papers.

But if we did that, then this wouldn’t happen.

And it seemed too important not to happen.

And so we spent a fair bit of time debating,

do we wanna do this as a Stanford effort,

kind of building on what we’d started?

Do we wanna do this as a for profit company?

Do we wanna do this as a nonprofit?

And we decided ultimately to do it as we did with Coursera.

And so, you know, we started really operating

as a company at the beginning of 2012.

And the rest is history.

But how did you, was that really surprising to you?

How did you at that time and at this time

make sense of this need for sort of global education

you mentioned that you felt that, wow,

the popularity indicates that there’s a hunger

for sort of globalization of learning.

I think there is a hunger for learning that,

you know, globalization is part of it,

but I think it’s just a hunger for learning.

The world has changed in the last 50 years.

It used to be that you finished college, you got a job,

by and large, the skills that you learned in college

were pretty much what got you through

the rest of your job history.

And yeah, you learn some stuff,

but it wasn’t a dramatic change.

Today, we’re in a world where the skills that you need

for a lot of jobs, they didn’t even exist

when you went to college.

And the jobs, and many of the jobs that existed

when you went to college don’t even exist today or are dying.

So part of that is due to AI, but not only.

And we need to find a way of keeping people,

giving people access to the skills that they need today.

And I think that’s really what’s driving

a lot of this hunger.

So I think if we even take a step back,

for you, all of this started in trying to think

of new ways to teach or to,

new ways to sort of organize the material

and present the material in a way

that would help the education process, the pedagogy, yeah.

So what have you learned about effective education

from this process of playing,

of experimenting with different ideas?

So we learned a number of things.

Some of which I think could translate back

and have translated back effectively

to how people teach on campus.

And some of which I think are more specific

to people who learn online,

more sort of people who learn as part of their daily life.

So we learned, for instance, very quickly

that short is better.

So people who are especially in the workforce

can’t do a 15 week semester long course.

They just can’t fit that into their lives.

Sure, can you describe the shortness of what?

The entirety, so every aspect,

so the little lecture, the lecture’s short,

the course is short.


We started out, the first online education efforts

were actually MIT’s OpenCourseWare initiatives.

And that was recording of classroom lectures and,

Hour and a half or something like that, yeah.

And that didn’t really work very well.

I mean, some people benefit.

I mean, of course they did,

but it’s not really a very palatable experience

for someone who has a job and three kids

and they need to run errands and such.

They can’t fit 15 weeks into their life

and the hour and a half is really hard.

So we learned very quickly.

I mean, we started out with short video modules

and over time we made them shorter

because we realized that 15 minutes was still too long.

If you wanna fit in when you’re waiting in line

for your kid’s doctor’s appointment,

it’s better if it’s five to seven.

We learned that 15 week courses don’t work

and you really wanna break this up into shorter units

so that there is a natural completion point,

gives people a sense of they’re really close

to finishing something meaningful.

They can always come back and take part two and part three.

We also learned that compressing the content works

really well because for some people that pace works well

and for others, they can always rewind and watch again.

And so people have the ability

to then learn at their own pace.

And so that flexibility, the brevity and the flexibility

are both things that we found to be very important.

We learned that engagement during the content is important

and the quicker you give people feedback,

the more likely they are to be engaged.

Hence the introduction of these,

which actually was an intuition that I had going in

and was then validated using data

that introducing some of these sort of little micro quizzes

into the lectures really helps.

Self graded as well as automatically graded assessments

really helped too because it gives people feedback.

See, there you are.

So all of these are valuable.

And then we learned a bunch of other things too.

We did some really interesting experiments, for instance,

on gender bias and how having a female role model

as an instructor can change the balance of men to women

in terms of, especially in STEM courses.

And you could do that online by doing AB testing

in ways that would be really difficult to do on campus.

Oh, that’s exciting.

But so the shortness, the compression,

I mean, that’s actually, so that probably is true

for all good editing is always just compressing the content,

making it shorter.

So that puts a lot of burden on the creator of the,

the instructor and the creator of the educational content.

Probably most lectures at MIT or Stanford

could be five times shorter

if enough preparation was put in.

So maybe people might disagree with that,

but like the crispness, the clarity that a lot of the,

like Coursera delivers is, how much effort does that take?

So first of all, let me say that it’s not clear

that that crispness would work as effectively

in a face to face setting

because people need time to absorb the material.

And so you need to at least pause

and give people a chance to reflect and maybe practice.

And that’s what MOOCs do is that they give you

these chunks of content and then ask you

to practice with it.

And that’s where I think some of the newer pedagogy

that people are adopting in face to face teaching

that have to do with interactive learning and such

can be really helpful.

But both those approaches,

whether you’re doing that type of methodology

in online teaching or in that flipped classroom,

interactive teaching.

What’s that, sorry to pause, what’s flipped classroom?

Flipped classroom is a way in which online content

is used to supplement face to face teaching

where people watch the videos perhaps

and do some of the exercises before coming to class.

And then when they come to class,

it’s actually to do much deeper problem solving

oftentimes in a group.

But any one of those different pedagogies

that are beyond just standing there and droning on

in front of the classroom for an hour and 15 minutes

require a heck of a lot more preparation.

And so it’s one of the challenges I think that people have

that we had when trying to convince instructors

to teach on Coursera.

And it’s part of the challenges that pedagogy experts

on campus have in trying to get faculty

to teach differently is that it’s actually harder

to teach that way than it is to stand there and drone.

Do you think MOOCs will replace in person education

or become the majority of in person of education

of the way people learn in the future?

Again, the future could be very far away,

but where’s the trend going do you think?

So I think it’s a nuanced and complicated answer.

I don’t think MOOCs will replace face to face teaching.

I think learning is in many cases a social experience.

And even at Coursera, we had people who naturally formed

study groups, even when they didn’t have to,

to just come and talk to each other.

And we found that that actually benefited their learning

in very important ways.

So there was more success among learners

who had those study groups than among ones who didn’t.

So I don’t think it’s just gonna,

oh, we’re all gonna just suddenly learn online

with a computer and no one else in the same way

that recorded music has not replaced live concerts.

But I do think that especially when you are thinking

about continuing education, the stuff that people get

when their traditional,

whatever high school, college education is done,

and they yet have to maintain their level of expertise

and skills in a rapidly changing world,

I think people will consume more and more educational content

in this online format because going back to school

for formal education is not an option for most people.

Briefly, it might be a difficult question to ask,

but there’s a lot of people fascinated

by artificial intelligence, by machine learning,

by deep learning.

Is there a recommendation for the next year

or for a lifelong journey of somebody interested in this?

How do they begin?

How do they enter that learning journey?

I think the important thing is first to just get started.

And there’s plenty of online content that one can get

for both the core foundations of mathematics

and statistics and programming.

And then from there to machine learning,

I would encourage people not to skip

too quickly past the foundations

because I find that there’s a lot of people

who learn machine learning, whether it’s online

or on campus without getting those foundations.

And they basically just turn the crank on existing models

in ways that A, don’t allow for a lot of innovation

and an adjustment to the problem at hand,

but also B, are sometimes just wrong

and they don’t even realize that their application is wrong

because there’s artifacts that they haven’t fully understood.

So I think the foundations,

machine learning is an important step.

And then actually start solving problems,

try and find someone to solve them with

because especially at the beginning,

it’s useful to have someone to bounce ideas off

and fix mistakes that you make

and you can fix mistakes that they make,

but then just find practical problems,

whether it’s in your workplace or if you don’t have that,

Kaggle competitions or such are a really great place

to find interesting problems and just practice.


Perhaps a bit of a romanticized question,

but what idea in deep learning do you find,

have you found in your journey the most beautiful

or surprising or interesting?

Perhaps not just deep learning,

but AI in general, statistics.

I’m gonna answer with two things.

One would be the foundational concept of end to end training,

which is that you start from the raw data

and you train something that is not like a single piece,

but rather towards the actual goal that you’re looking to.

From the raw data to the outcome,

like no details in between.

Well, not no details, but the fact that you,

I mean, you could certainly introduce building blocks

that were trained towards other tasks.

I’m actually coming to that in my second half of the answer,

but it doesn’t have to be like a single monolithic blob

in the middle.

Actually, I think that’s not ideal,

but rather the fact that at the end of the day,

you can actually train something that goes all the way

from the beginning to the end.

And the other one that I find really compelling

is the notion of learning a representation

that in its turn, even if it was trained to another task,

can potentially be used as a much more rapid starting point

to solving a different task.

And that’s, I think, reminiscent

of what makes people successful learners.

It’s something that is relatively new

in the machine learning space.

I think it’s underutilized even relative

to today’s capabilities, but more and more

of how do we learn sort of reusable representation?

And so end to end and transfer learning.


Is it surprising to you that neural networks

are able to, in many cases, do these things?

Maybe thinking back to when you first would dive deep

into neural networks or in general, even today,

is it surprising that neural networks work at all

and work wonderfully to do this kind of raw end to end

and end to end learning and even transfer learning?

I think I was surprised by how well

when you have large enough amounts of data,

it’s possible to find a meaningful representation

in what is an exceedingly high dimensional space.

And so I find that to be really exciting

and people are still working on the math for that.

There’s more papers on that every year.

And I think it would be really cool

if we figured that out, but that to me was a surprise

because in the early days when I was starting my way

in machine learning and the data sets were rather small,

I think we believed, I believed that you needed

to have a much more constrained

and knowledge rich search space

to really make, to really get to a meaningful answer.

And I think it was true at the time.

What I think is still a question

is will a completely knowledge free approach

where there’s no prior knowledge going

into the construction of the model,

is that gonna be the solution or not?

It’s not actually the solution today

in the sense that the architecture of a convolutional

neural network that’s used for images

is actually quite different

to the type of network that’s used for language

and yet different from the one that’s used for speech

or biology or any other application.

There’s still some insight that goes

into the structure of the network

to get the right performance.

Will you be able to come up

with a universal learning machine?

I don’t know.

I wonder if there always has to be some insight

injected somewhere or whether it can converge.

So you’ve done a lot of interesting work

with probabilistic graphical models in general,

Bayesian deep learning and so on.

Can you maybe speak high level,

how can learning systems deal with uncertainty?

One of the limitations I think of a lot

of machine learning models is that

they come up with an answer

and you don’t know how much you can believe that answer.

And oftentimes the answer is actually

quite poorly calibrated relative to its uncertainties.

Even if you look at where the confidence

that comes out of say the neural network at the end,

and you ask how much more likely

is an answer of 0.8 versus 0.9,

it’s not really in any way calibrated

to the actual reliability of that network

and how true it is.

And the further away you move from the training data,

the more, not only the more wrong the network is,

often it’s more wrong and more confident

in its wrong answer.

And that is a serious issue in a lot of application areas.
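
One common way to quantify the miscalibration described here is expected calibration error: bucket predictions by stated confidence, then compare the average confidence in each bucket with how often those predictions were actually right. The following is a minimal sketch rather than anything from the conversation itself; the function name, the ten-bin choice, and the toy predictions are all assumptions made for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin-size-weighted average gap between stated confidence and observed accuracy."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece


# Hypothetical outcomes: 6 correct answers out of 10 in both cases.
outcomes = [True] * 6 + [False] * 4

# A model that always reports 0.9 is overconfident on these outcomes.
overconfident = expected_calibration_error([0.9] * 10, outcomes)

# A model that always reports 0.6 matches its real hit rate.
calibrated = expected_calibration_error([0.6] * 10, outcomes)

print(overconfident, calibrated)
```

A score near zero means the reported confidence tracks the real hit rate. Note that this only measures calibration on data resembling the evaluation set, so it says nothing by itself about the outlier patient discussed next.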

So when you think for instance,

about medical diagnosis as being maybe an epitome

of how problematic this can be,

if you were training your network

on a certain set of patients

and a certain patient population,

and I have a patient that is an outlier

and there’s no human that looks at this,

and that patient is put into a neural network

and your network not only gives

a completely incorrect diagnosis,

but is supremely confident

in its wrong answer, you could kill people.

So I think creating more of an understanding

of how do you produce networks

that are calibrated in their uncertainty

and can also say, you know what, I give up.

I don’t know what to say about this particular data instance

because I’ve never seen something

that’s sufficiently like it before.

I think it’s going to be really important

in mission critical applications,

especially ones where human life is at stake

and that includes medical applications,

but it also includes automated driving

because you’d want the network to be able to say,

you know what, I have no idea what this blob is

that I’m seeing in the middle of the road.

So I’m just going to stop

because I don’t want to potentially run over a pedestrian

that I don’t recognize.

Are there good mechanisms, ideas of how to allow

learning systems to provide that uncertainty

along with their predictions?

Certainly people have come up with mechanisms

that involve Bayesian deep learning,

deep learning that involves Gaussian processes.

I mean, there’s a slew of different approaches

that people have come up with.

There’s methods that use ensembles of networks

trained with different subsets of data

or different random starting points.

Those are actually sometimes surprisingly good

at creating a sort of set of how confident

or not you are in your answer.

It’s very much an area of open research.
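
The ensemble idea mentioned above can be sketched in a few lines. This is an illustrative toy under stated assumptions, not a method described in the conversation: each member is a simple least-squares line rather than a neural network, members are fit on bootstrap resamples (different subsets of the data), and the disagreement between members is used to decide when to say "I give up." All names, the training data, and the `max_spread` threshold are hypothetical.

```python
import random
import statistics


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def train_ensemble(points, n_members=20, seed=0):
    """Fit each member on a bootstrap resample of the training points."""
    rng = random.Random(seed)
    members = []
    while len(members) < n_members:
        sample = [rng.choice(points) for _ in points]
        # A line needs at least two distinct x values; redraw otherwise.
        if len({x for x, _ in sample}) > 1:
            members.append(fit_line(sample))
    return members


def predict_or_abstain(members, x, max_spread):
    """Return (answer, spread); answer is None when members disagree too much."""
    preds = [slope * x + intercept for slope, intercept in members]
    spread = statistics.pstdev(preds)
    answer = statistics.mean(preds) if spread <= max_spread else None
    return answer, spread


# Hypothetical training data: a noisy line y = 2x + 1, seen only for x in [0, 1].
data_rng = random.Random(1)
train = [(i / 29, 2 * (i / 29) + 1 + data_rng.gauss(0, 0.2)) for i in range(30)]
ensemble = train_ensemble(train)

# A query inside the training range versus one far outside it.
near_answer, near_spread = predict_or_abstain(ensemble, 0.5, max_spread=0.3)
far_answer, far_spread = predict_or_abstain(ensemble, 10.0, max_spread=0.3)
print(near_answer, near_spread)
print(far_answer, far_spread)
```

Inside the training range the members largely agree, so an answer comes back. Far outside it, small differences in the fitted slopes are amplified, the spread grows, and the sketch abstains instead of returning a confident guess.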

Let’s cautiously venture back into the land of philosophy

and speaking of AI systems providing uncertainty,

somebody like Stuart Russell believes

that as we create more and more intelligent systems,

it’s really important for them to be full of self doubt

because if they’re given more and more power,

we want a way to maintain human control

over AI systems or human supervision, which is true.

Like you just mentioned with autonomous vehicles,

it’s really important to get human supervision

when the car is not sure because if it’s really confident

in cases when it can get in trouble,

it’s gonna be really problematic.

So let me ask about sort of the questions of AGI

and human level intelligence.

I mean, we’ve talked about curing diseases,

which is sort of fundamental thing

we can have an impact today,

but AI people also dream of both understanding

and creating intelligence.

Is that something you think about?

Is that something you dream about?

Is that something you think is within our reach

to be thinking about as computer scientists?

Well, boy, let me tease apart different parts

of that question.

The worst question.

Yeah, it’s a multi part question.

So let me start with the feasibility of AGI.

Then I’ll talk about the timelines a little bit

and then talk about, well, what controls does one need

when thinking about protections in the AI space?

So, I think AGI obviously is a longstanding dream

that even our early pioneers in the space had,

the Turing test and so on

are the earliest discussions of that.

We’re obviously closer than we were 70 or so years ago,

but I think it’s still very far away.

I think machine learning algorithms today

are really exquisitely good pattern recognizers

in very specific problem domains

where they have seen enough training data

to make good predictions.

You take a machine learning algorithm

and you move it to a slightly different version

of even that same problem, far less one that’s different

and it will just completely choke.

So I think we’re nowhere close to the versatility

and flexibility of even a human toddler

in terms of their ability to context switch

and solve different problems

using a single knowledge base, single brain.

So am I desperately worried about

the machines taking over the universe

and starting to kill people

because they want to have more power?

I don’t think so.

Well, so to pause on that,

so you kind of intuited that super intelligence

is a very difficult thing to achieve.

Even intelligence.

Intelligence, intelligence.

Super intelligence, we’re not even close to intelligence.

Even just the greater abilities of generalization

of our current systems.

But we haven’t answered all the parts

and we’ll take another.

I’m getting to the second part.

Okay, but maybe another tangent you can also pick up

is can we get in trouble with much dumber systems?

Yes, and that is exactly where I was going.

So just to wrap up on the threats of AGI,

I think that it seems to me a little early today

to figure out protections against a human level

or superhuman level intelligence

where we don’t even see the skeleton

of what that would look like.

So it seems that it’s very speculative

on how to protect against that.

But we can definitely and have gotten into trouble

on much dumber systems.

And a lot of that has to do with the fact

that the systems that we’re building are increasingly

complex, increasingly poorly understood.

And there’s ripple effects that are unpredictable

in changing little things that can have dramatic consequences

on the outcome.

And by the way, that’s not unique to artificial intelligence.

I think artificial intelligence exacerbates that,

brings it to a new level.

But heck, our electric grid is really complicated.

The software that runs our financial markets

is really complicated.

And we’ve seen those ripple effects translate

to dramatic negative consequences,

like for instance, financial crashes that have to do

with feedback loops that we didn’t anticipate.

So I think that’s an issue that we need to be thoughtful

about in many places,

artificial intelligence being one of them.

And I think it’s really important that people are thinking

about ways in which we can have better interpretability

of systems, better tests for, for instance,

measuring the extent to which a machine learning system

that was trained in one set of circumstances,

how well does it actually work

in a very different set of circumstances

where you might say, for instance,

well, I’m not gonna be able to test my automated vehicle

in every possible city, village,

weather condition and so on.

But if you trained it on this set of conditions

and then tested it on 50 or a hundred others

that were quite different from the ones

that you trained it on and it worked,

then that gives you confidence that the next 50

that you didn’t test it on might also work.

So effectively it’s testing for generalizability.
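
A test for generalizability along these lines might look like the following minimal sketch. It assumes evaluation records tagged with the condition they were collected under, scores a model separately on each condition it was not trained on, and reports the worst case as well as the average. The function names, the weather conditions, and the toy braking rule are all hypothetical.

```python
from collections import defaultdict


def accuracy_by_condition(records, predict):
    """records: iterable of (condition, features, label) -> {condition: accuracy}."""
    hits = defaultdict(int)
    counts = defaultdict(int)
    for condition, features, label in records:
        counts[condition] += 1
        hits[condition] += int(predict(features) == label)
    return {c: hits[c] / counts[c] for c in counts}


def generalization_report(records, predict, training_conditions):
    """Summarize accuracy on conditions that were not part of training."""
    per_condition = accuracy_by_condition(records, predict)
    held_out = {c: a for c, a in per_condition.items()
                if c not in training_conditions}
    if not held_out:
        raise ValueError("no held-out conditions to evaluate on")
    worst = min(held_out, key=held_out.get)
    return {
        "held_out_mean": sum(held_out.values()) / len(held_out),
        "held_out_worst": held_out[worst],
        "worst_condition": worst,
    }


# Hypothetical records: (condition, distance to obstacle in meters, correct action).
records = [
    ("sunny", 5, "stop"), ("sunny", 20, "go"),
    ("rain", 8, "stop"), ("rain", 15, "go"),
    ("snow", 12, "stop"), ("snow", 30, "go"),  # longer stopping distance on snow
    ("fog", 9, "stop"), ("fog", 25, "go"),
]


def toy_model(distance):
    """Hypothetical rule tuned on sunny data only: stop if closer than 10 m."""
    return "stop" if distance < 10 else "go"


report = generalization_report(records, toy_model, training_conditions={"sunny"})
print(report)
```

Reporting the worst held-out condition alongside the mean matters because an average can hide one condition where the system fails badly.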

So I think there’s ways that we should be

constantly thinking about to validate the robustness

of our systems.

I think it’s very different from the let’s make sure

robots don’t take over the world.

And then the other place where I think we have a threat,

which is also important for us to think about

is the extent to which technology can be abused.

So like any really powerful technology,

machine learning can be very much used badly

as well as to good.

And that goes back to many other technologies

that have come up with when people invented

projectile missiles and it turned into guns

and people invented nuclear power

and it turned into nuclear bombs.

And I think honestly, I would say that to me,

gene editing and CRISPR is at least as dangerous

a technology if used badly as machine learning.

You could create really nasty viruses and such

using gene editing that you would be really careful about.

So anyway, that’s something that we need

to be really thoughtful about whenever we have

any really powerful new technology.

Yeah, and in the case of machine learning

is adversarial machine learning.

So all the kinds of attacks like security almost threats

and there’s a social engineering

with machine learning algorithms.

And there’s face recognition and big brother is watching you

and there’s the killer drones that can potentially go

and targeted execution of people in a different country.

One can argue that bombs are not necessarily

that much better, but people wanna kill someone,

they’ll find a way to do it.

So in general, if you look at trends in the data,

there’s less wars, there’s less violence,

there’s more human rights.

So we’ve been doing overall quite good as a human species.

Are you optimistic?

Surprisingly sometimes.

Are you optimistic?

Maybe another way to ask is do you think most people

are good and fundamentally we tend towards a better world,

which is underlying the question,

will machine learning with gene editing

ultimately land us somewhere good?

Are you optimistic?

I think by and large, I’m optimistic.

I think that most people mean well,

that doesn’t mean that most people are altruistic do gooders,

but I think most people mean well,

but I think it’s also really important for us as a society

to create social norms where doing good

and being perceived well by our peers

are positively correlated.

I mean, it’s very easy to create dysfunctional norms

in emotional societies.

There’s certainly multiple psychological experiments

as well as sadly real world events

where people have devolved to a world

where being perceived well by your peers

is correlated with really atrocious,

often genocidal behaviors.

So we really want to make sure

that we maintain a set of social norms

where people know that to be a successful member of society,

you want to be doing good.

And one of the things that I sometimes worry about

is that some societies don’t seem to necessarily

be moving in the forward direction in that regard

where it’s not necessarily the case

that being a good person

is what makes you be perceived well by your peers.

And I think that’s a really important thing

for us as a society to remember.

It’s really easy to degenerate back into a universe

where it’s okay to do really bad stuff

and still have your peers think you’re amazing.

It’s fun to ask a world class computer scientist

and engineer a ridiculously philosophical question

like what is the meaning of life?

Let me ask, what gives your life meaning?

Or what is the source of fulfillment, happiness,

joy, purpose?

When we were starting Coursera in the fall of 2011,

that was right around the time that Steve Jobs passed away.

And so the media was full of various famous quotes

that he uttered and one of them that really stuck with me

because it resonated with stuff that I’d been feeling

for even years before that is that our goal in life

should be to make a dent in the universe.

So I think that to me, what gives my life meaning

is that I would hope that when I am lying there

on my deathbed and looking at what I’d done in my life

that I can point to ways in which I have left the world

a better place than it was when I entered it.

This is something I tell my kids all the time

because I also think that the burden of that

is much greater for those of us who were born to privilege.

And in some ways I was, I mean, I wasn’t born super wealthy

or anything like that, but I grew up in an educated family

with parents who loved me and took care of me

and I had a chance at a great education

and I always had enough to eat.

So I was in many ways born to privilege

more than the vast majority of humanity.

And my kids I think are even more so born to privilege

than I was fortunate enough to be.

And I think it’s really important that especially

for those of us who have that opportunity

that we use our lives to make the world a better place.

I don’t think there’s a better way to end it.

Daphne, it was an honor to talk to you.

Thank you so much for talking today.

Thank you.

Thanks for listening to this conversation

with Daphne Koller and thank you

to our presenting sponsor, Cash App.

Please consider supporting the podcast

by downloading Cash App and using code LEXPODCAST.

If you enjoy this podcast, subscribe on YouTube,

review it with five stars on Apple Podcast,

support it on Patreon, or simply connect with me

on Twitter at Lex Fridman.

And now let me leave you with some words from Hippocrates,

a physician from ancient Greece

who’s considered to be the father of medicine.

Wherever the art of medicine is loved,

there’s also a love of humanity.

Thank you for listening and hope to see you next time.
