Lex Fridman Podcast - #153 - Dmitry Korkin: Evolution of Proteins, Viruses, Life, and AI

The following is a conversation with Dmitry Korkin,

his second time on the podcast.

He’s a professor of bioinformatics

and computational biology at WPI,

where he specializes in bioinformatics of complex disease,

computational genomics, systems biology,

and biomedical data analytics.

He loves biology, he loves computing,

plus he is Russian and recites a poem in Russian

at the end of the podcast.

What else could you possibly ask for in this world?

Quick mention of our sponsors.

Brave Browser, NetSuite Business Management Software,

Magic Spoon Low Carb Cereal,

and Eight Sleep Self-Cooling Mattress.

So the choice is browsing privacy, business success,

healthy diet, or comfortable sleep.

Choose wisely, my friends,

and if you wish, click the sponsor links below

to get a discount and to support this podcast.

As a side note, let me say that to me,

the scientists that did the best apolitical,

impactful, brilliant work of 2020

are the biologists who study viruses without an agenda,

without much sleep, to be honest,

just a pure passion for scientific discovery

and exploration of the mysteries within viruses.

Viruses are both terrifying and beautiful.

Terrifying because they can threaten

the fabric of human civilization,

both biological and psychological.

Beautiful because they give us insights

into the nature of life on Earth

and perhaps even extraterrestrial life

of the not so intelligent variety

that might meet us one day

as we explore the habitable planets

and moons in our universe.

If you enjoy this thing, subscribe on YouTube,

review it on Apple Podcasts, follow on Spotify,

support on Patreon, or connect with me on Twitter

at Lex Fridman.

And now here’s my conversation with Dmitry Korkin.

It’s often said that proteins

and the amino acid residues that make them up

are the building blocks of life.

Do you think of proteins in this way

as the basic building blocks of life?

Yes and no.

So the protein indeed is the basic unit,

biological unit that carries out

important functions of the cell.

However, through studying the proteins

and comparing the proteins across different species,

across different kingdoms,

you realize that proteins are actually

much more complicated.

So they have so called modular complexity.

And so what I mean by that is an average protein

consists of several structural units.

So we call them protein domains.

And so you can imagine a protein as a string of beads

where each bead is a protein domain.

And in the past 20 years,

scientists have been studying

the nature of the protein domains

because we realize that it’s the unit.

Because if you look at the functions, right?

So many proteins have more than one function

and those protein functions are often carried out

by those protein domains.

So we also see that in the evolution,

those protein domains get shuffled.

So they act actually as a unit.

Also from the structural perspective, right?

So some people think of a protein

as a sort of a globular molecule,

but as a matter of fact,

the globular part of this protein is a protein domain.

So we often have this, again,

the collection of this protein domains

aligned on a string as beads.

And the protein domains are made up of amino acid residues.

So we’re talking about.

So this is the basic,

so you’re saying the protein domain

is the basic building block of the function

that we think about proteins doing.

So of course you can always talk

about different building blocks.

It’s turtles all the way down.

But there’s a point where there is,

at the point of the hierarchy

where it’s the most, the cleanest element block

based on which you can put them together

in different kinds of ways to form complex function.

And you’re saying protein domains,

why is that not talked about as often in popular culture?

Well, there are several perspectives on this.

And one of course is the historical perspective, right?

So historically scientists have been able

to structurally resolve,

to obtain the 3D coordinates of a protein

for smaller proteins.

And smaller proteins tend to be a single domain protein.

So we have a protein equal to a protein domain.

And so because of that,

the initial suspicion was that the proteins are,

they have globular shapes

and the more smaller proteins you obtained structurally,

the more you became convinced that that was the case.

And only later when we started having

alternative approaches.

So the traditional ones are X-ray crystallography

and NMR spectroscopy.

So this is sort of the two main techniques

that give us the 3D coordinates.

But nowadays there’s been a huge breakthrough

in cryo-electron microscopy.

So the more advanced methods that allow us

to get into the 3D shapes of much larger molecules,

molecular complexes,

just to give you one of the common examples

for this year, right?

So the first experimental structure

of a SARS-CoV-2 protein

was the cryo-EM structure of the S protein.

So the spike protein.

And so it was solved very quickly.

And the reason for that is the advancement

of this technology is pretty spectacular.

How many domains does the, is it more than one domain?

Oh yes.

Oh yes, I mean, so it’s a very complex structure.

And we, you know, on top of the complexity

of a single protein, right?

So this structure actually is a complex, a trimer.

So it needs to form a trimer in order to function properly.

What’s a complex?

So a complex is an agglomeration of multiple proteins.

And so we can have the same protein copied in multiple,

you know, made up in multiple copies

and forming something that we call a homo-oligomer.

Homo means the same, right?

So in this case, so the spike protein is the,

is an example of a homo-tetramer, homo-trimer, sorry.

So you need three copies of it?

Three copies.

In order to.


We have these three chains,

the three molecular chains coupled together

and performing the function.

That’s what, when you look at this protein from the top,

you see a perfect triangle.


So, but other, you know,

so other complexes are made up of, you know,

different proteins.

Some of them are completely different.

Some of them are similar.

The hemoglobin molecule, right?

So it’s actually, it’s a protein complex.

It’s made of four basic subunits.

Two of them are identical to each other.

Two other identical to each other,

but they are also similar to each other,

which sort of gives us some ideas about the evolution

of this, you know, of this molecule.

And perhaps, so one of the hypothesis is that, you know,

in the past, it was just a homo tetramer, right?

So four identical copies,

and then it became, you know, sort of modified,

it became mutated over the time

and became more specialized.

Can we linger on the spike protein for a little bit?

Is there something interesting

or like beautiful you find about it?

I mean, first of all,

it’s an incredibly challenging protein.

And so we, as a part of our sort of research

to understand the structural basis of this virus,

to sort of decode, structurally decode,

every single protein in its proteome,

which, you know, we’ve been working on this spike protein.

And one of the main challenges was that the cryoEM data

allows us to reconstruct or to obtain the 3D coordinates

of roughly two thirds of the protein.

The remaining one third of this protein

is the part that is buried in the membrane of the virus

and in the viral envelope.

And it also has a lot of unstable structures around it.

So it’s chemically interacting somehow

with whatever the heck it’s connecting to.

Yeah, so people are still trying to understand.

So the nature of, and the role of this one third,

because the top part, you know, the primary function

is to get attached to the ACE2 receptor, human receptor.

There is also beautiful mechanics

of how this thing happens, right?

So because there are three different copies of this chains,

you know, there are three different domains, right?

So we’re talking about domains.

So this is the receptor binding domains, RBDs,

that get untangled and get ready to attach

to the receptor.

And now they are not necessarily going in a sync mode.

As a matter of fact.

It’s asynchronous.

So yes, and this is where another level of complexity

comes into play because right now what we see is,

we typically see just one of the arms going out

and getting ready to be attached to the ACE2 receptors.

However, there was a recent mutation

that people studied in that spike protein.

And very recently, a group from UMass Medical School,

which we happen to collaborate with.

So this is the group of Jeremy Luban

and a number of other faculty.

They actually solved the mutated structure of the spike.

And they showed that actually, because of these mutations,

you have more than one arm opening up.

And so now, so the frequency of two arms going up

increased quite drastically.


Does that change the dynamics somehow?

It potentially can change the dynamics

because now you have two possible opportunities

to get attached to the ACE2 receptor.

It’s a very complex molecular process, mechanistic process.

But the first step of this process is the attachment

of this spike protein, of the spike trimer

to the human ACE2 receptor.

So this is a molecule that sits

on the surface of the human cell.

And that’s essentially what initiates,

what triggers the whole process of encapsulation.

If this was dating, this would be the first date.

So this is the…

In a way.


So is it possible to have the spike protein

just like floating about on its own?

Or does it need that interactability with the membrane?

Yeah, so it needs to be attached,

at least as far as I know.

But when you get this thing attached on the surface,

there is also a lot of dynamics

on how it sits on the surface.

So for example, there was a recent work in,

again, where people used cryo-electron microscopy

to get the first glimpse of the overall structure.

It’s a very low res, but you still get

some interesting details about the surface,

about what is happening inside,

because we had literally no clue until recent work

about how the capsid is organized.

What’s a capsid?

So a capsid is essentially,

it’s the inner core of the viral particle

where there is the RNA of the virus,

and it’s protected by another protein, N protein,

that essentially acts as a shield.

But now we are learning more and more,

so it’s actually, it’s not just this shield,

it potentially is used for the stability

of the outer shell of the virus.

So it’s pretty complicated.

And I mean, understanding all of this is really useful

for trying to figure out like developing a vaccine

or some kind of drug to attack,

any aspects of this, right?

So, I mean, there are many different implications to that.

First of all, it’s important to understand

the virus itself, right?

So in order to understand how it acts,

what is the overall mechanistic process

of this virus replication,

of this virus proliferation to the cell, right?

So that’s one aspect.

The other aspect is designing new treatments.

So one of the possible treatments

is designing nanoparticles.

And so some nanoparticles that will resemble the viral shape

that would have the spike integrated,

and essentially would act as a competitor to the real virus

by blocking the ACE2 receptors,

and thus preventing the real virus entering the cell.

Now, there are also, you know,

there is a very interesting direction

in looking at the membrane,

at the envelope portion of the protein

and attacking its M protein.

So there are, you know, to give you a, you know,

sort of a brief overview,

there are four structural proteins.

These are the proteins that make up

the structure of the virus.

So spike, the S protein, that acts as a trimer,

so it needs three copies.

E, the envelope protein, that acts as a pentamer,

so it needs five copies to act properly.

M is a membrane protein, it forms dimers,

and actually it forms beautiful lattice.

And this is something that we’ve been studying

and we are seeing it in simulations.

It actually forms a very nice grid

or, you know, threads, you know,

of different dimers attached next to each other.

Just a bunch of copies of each other,

and they naturally, when you have a bunch of copies

of each other, they form an interesting lattice.


And, you know, if you think about this, right?

So this complex, you know, the viral shape

needs to be organized somehow, self organized somehow, right?

So it, you know, if it was a completely random process,

you know, you probably wouldn’t have the envelope shell

of the ellipsoid shape, you know,

you would have something, you know,

pretty random, right, shape.

So there is some, you know, regularity

in how this, you know, how this M dimers

get to attach to each other

in a very specific directed way.

Is that understood at all?

It’s not understood.

We are now, we’ve been working in the past six months

since, you know, we met, actually,

this is where we started working on trying to understand

the overall structure of the envelope

and the key components that made up this, you know,


Wait, does the envelope also have the lattice structure

or no?

So the envelope essentially is the outer shell

of the viral particle.

The N, the nucleocapsid protein,

is something that is inside.

Got it.

But get that, the N is likely to interact with M.

Does it go M and E?

Like, where’s the E and the M?

So E, those different proteins,

they occur in different copies on the viral particle.

So E, this pentamer complex,

we only have two or three, maybe, per each particle, okay?

We have thousand or so of M dimers

that essentially made up,

that makes up the entire, you know, outer shell.

So most of the outer shell is the M.

M dimer.

And the M protein.

When you say particle, that’s the virion,

the virus, the individual virus.

It’s a single, yes.

Single element of the virus, it’s a single virus.

Single virus, right.

And we have about, you know, roughly 50 to 90 spike trimers.


So when you, you know, when you show a…

Per virus particle.

Sorry, what did you say, 50 to 90?

50 to 90, right?

So this is how this thing is organized.
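As a quick recap, the inventory just described can be captured in a small data structure. This is only a sketch; the dictionary layout is invented, but the oligomeric states and the approximate per-particle copy numbers are the figures quoted in the conversation:

```python
# The three outer structural proteins of the SARS-CoV-2 virion, as described
# above (the fourth, N, sits inside the particle shielding the RNA).
# Copy numbers are the rough per-particle figures quoted in the conversation.
structural_proteins = {
    "S": {"name": "spike",    "oligomer": "trimer",   "subunits": 3, "per_virion": "50-90 trimers"},
    "E": {"name": "envelope", "oligomer": "pentamer", "subunits": 5, "per_virion": "2-3 pentamers"},
    "M": {"name": "membrane", "oligomer": "dimer",    "subunits": 2, "per_virion": "~1000 dimers"},
}

for symbol, info in structural_proteins.items():
    print(f"{symbol}: {info['name']} protein, a {info['oligomer']} "
          f"of {info['subunits']} copies, {info['per_virion']} per particle")
```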

And so now, typically, right,

so you see these, the antibodies that target,

you know, spike protein,

certain parts of the spike protein,

but there could be some, also some treatments, right?

So these are, you know, these are small molecules

that bind strategic parts of these proteins,

disrupting its function.

So one of the promising directions,

it’s one of the newest directions,

is actually targeting the M dimer of the protein.

Targeting the proteins that make up this outer shell.

Because if you’re able to destroy the outer shell,

you’re essentially destroying the viral particle itself.

So preventing it from, you know, functioning at all.

So that’s, you think is,

from a sort of cyber security perspective,

virus security perspective,

that’s the best attack vector?

Is, or like, that’s a promising attack vector?

I would say, yeah.

So, I mean, there’s still tons of research needs to be,

you know, to be done.

But yes, I think, you know, so.

There’s more attack surface, I guess.

More attack surface.

But, you know, from our analysis,

from other evolutionary analysis,

this protein is evolutionarily more stable

compared to the, say, to the spike protein.

Oh, and stable means a more static target?

Well, yeah, so it doesn’t change.

It doesn’t evolve from the evolutionary perspective

so drastically as, for example, the spike protein.

There’s a bunch of stuff in the news

about mutations of the virus in the United Kingdom.

I also saw in South Africa something.

Maybe that was yesterday.

You just kind of mentioned about stability and so on.

Which aspects of this are mutatable

and which aspects, if mutated, become more dangerous?

And maybe even zooming out,

what are your thoughts and knowledge and ideas

about the way it’s mutated,

all the news that we’ve been hearing?

Are you worried about it from a biological perspective?

Are you worried about it from a human perspective?

So, I mean, you know, mutations are sort of a general way

for these viruses to evolve, right?

So, it’s, you know, it’s essentially,

this is the way they evolve.

This is the way they were able to jump

from one species to another.

We also see some recent jumps.

There were some incidents of this virus jumping

from humans to dogs.

So, you know, there is some danger in those jumps

because every time it jumps, it also mutates, right?

So, when it jumps to another species

and jumps back, right?

So, it acquires some mutations

that are sort of driven by the environment

of a new host, right?

And it’s different from the human environment.

And so, we don’t know whether the mutations

that are acquired in the new species

are neutral with respect to the human host

or maybe, you know, maybe damaging.

Yeah, change is always scary, but so are you worried about,

I mean, it seems like because the spread is,

during winter now, seems to be exceptionally high

and especially with a vaccine just around the corner

already being actually deployed,

is there some worry that this puts evolutionary pressure,

selective pressure on the virus for it to mutate?

Is that a source of worry?

Well, I mean, there is always this thought

in the scientist’s mind, you know, what will happen, right?

So, I know there’ve been discussions

about sort of the arms race between the ability

of the humanity to get vaccinated faster

than the virus, you know, essentially, you know,

it becomes, you know, resistant to the vaccine.

I mean, I don’t worry that much simply because,

you know, there is not that much evidence to that.

To aggressive mutation around the vaccine.

Exactly, you know, obviously there are mutations

around the vaccine, that’s the reason we get vaccinated

every year against the seasonal mutations, right?

But, you know, I think it’s important to study it.

No doubts, right?

So, I think one of the, you know, to me,

and again, I might be biased because, you know,

we’ve been trying to do that as well,

so, but one of the critical directions

in understanding the virus is to understand its evolution

in order to sort of understand the mechanisms,

the key mechanisms that lead the virus to jump,

you know, these zoonotic viruses to jump from one species

to another, and the mechanisms

that lead the virus to become resistant to vaccines,

also to treatments, right?

And hopefully that knowledge will enable us

to sort of forecast the evolutionary traces,

the future evolutionary traces of this virus.

I mean, what, from a biological perspective,

this might be a dumb question,

but is there parts of the virus that if souped up,

like through mutation, could make it more effective

at doing its job?

We’re talking about this specific coronavirus

because we were talking about the different, like,

the membrane, the M protein, the E protein,

the N and the S, the spike, is there some?

And there are 20 or so more in addition to that.

But is that a dumb way to look at it?

Like, which of these, if mutated,

could have the greatest impact, potentially damaging impact,

on the effectiveness of the virus?

So it’s actually, it’s a very good question

because, and the short answer is, we don’t know yet.

But of course there is capacity of this virus

to become more efficient.

The reason for that is, you know,

so if you look at the virus, I mean, it’s a machine, right?

So it’s a machine that does a lot of different functions,

and many of these functions are sort of nearly perfect,

but they’re not perfect.

And those mutations can have the greatest impact

and make those functions more perfect.

For example, the attachment to ACE2 receptor, right,

of the spike, right?

So, you know, has this virus reached the efficiency

in which the attachment is carried out?

Or are there some mutations that are still to be discovered,

right, that will make this attachment sort of stronger,

or, you know, something more, in a way more efficient

from the point of view of this virus functioning.

That’s sort of the obvious example.

But if you look at each of these proteins,

I mean, it’s there for a reason,

it performs certain function.

And it could be that certain mutations will, you know,

enhance this function.

It could be that some mutations will make this function

much less efficient, right?

So that’s also the case.

Let’s, since we’re talking about the evolutionary history

of a virus, let’s zoom back out

and look at the evolution of proteins.

I glanced at this 2010 Nature paper

on the quote, ongoing expansion of the protein universe.

And then, you know, it kind of implies and talks about

that proteins started with a common ancestor,

which is, you know, kind of interesting.

It’s interesting to think about like,

even just like the first organic thing

that started life on Earth.

And from that, there’s now, you know, what is it?

3.5 billion years later, there’s now millions of proteins.

And they’re still evolving.

And that’s, you know, in part,

one of the things that you’re researching.

Is there something interesting to you about the evolution

of proteins from this initial ancestor to today?

Is there something beautiful and insightful

about this long story?

So I think, you know, if I were to pick a single keyword

about protein evolution, I would pick modularity,

something that we talked about in the beginning.

And that’s the fact that the proteins are no longer

considered as, you know, as a sequence of letters.

There are hierarchical complexities

in the way these proteins are organized.

And these complexities are actually going

beyond the protein sequence.

It’s actually going all the way back to the gene,

to the nucleotide sequence.

And so, you know, again, these protein domains,

they are not only functional building blocks,

they are also evolutionary building blocks.

And so what we see in the sort of,

in the later stages of evolution,

I mean, once this stable structurally

and functionally building blocks were discovered,

they essentially, they stay, those domains stay as such.

So that’s why if you start comparing different proteins,

you will see that many of them will have similar fragments.

And those fragments will correspond to something

that we call protein domain families.

And so they are still different

because you still have mutations and, you know,

the, you know, different mutations are attributed

to, you know, diversification of the function

of these, you know, protein domains.

However, you don’t, you very rarely see, you know,

the evolutionary events that would split

this domain into fragments because,

and, you know, once you have the domain split,

you actually, you know,

you can completely cancel out its function

or at the very least you can reduce it.

And that’s not, you know, efficient from the point of view

of the, you know, of the cell functioning.

So the protein domain level

is a very important one.

Now, on top of that, right?

So if you look at the proteins, right,

so you have these structural units

and they carry out the function,

but then much less is known about things

that connect these protein domains,

something that we call linkers.

And those linkers are completely flexible, you know,

parts of the protein that nevertheless

carry out a lot of function.

So it’s like little tails, little heads.

So we do have tails.

They’re called termini, C and N termini.

So these are things right on one

and the other end of the protein sequence.

So they are also very important.

So they contribute to very specific interactions

between the proteins.


But you’re referring to the links between domains.

That connect the domains.

And, you know, apart from the, just the,

the simple perspective, if you have, you know,

a very short domain, you have, sorry, a very short linker,

you have two domains next to each other.

They are forced to be next to each other.

If you have a very long one,

you have the domains that are extremely flexible

and they carry out a lot of sort of

spatial reorganization, right?

That’s awesome.

But on top of that, right, just this linker itself,

because it’s so flexible, it actually can adapt

to a lot of different shapes.

And therefore it’s a very good interactor

when it comes to interaction between this protein

and other protein, right?

So these things also evolve, you know,

and they in a way have different sort of

driving laws that underlie their evolution

because they no longer need to,

to preserve certain structure, right?

Unlike protein domains.

And so on top of that,

you have something that is even less studied.

And this is something that relates to

the concept of alternative splicing.

So alternative splicing.

So it’s a very cool concept.

It’s something that we’ve been fascinated about for,

you know, over a decade in my lab

and trying to do research with that.

But so, you know, so typically, you know,

a simplistic perspective is that one gene

is equal one protein product, right?

So you have a gene, you know,

you transcribe it and translate it

and it becomes a protein.

In reality, when we talk about eukaryotes,

especially sort of more recent eukaryotes

that are very complex,

the gene is no longer equal to one protein.

It actually can produce multiple functionally,

you know, active protein products.

And each of them is, you know,

is called an alternatively spliced product.

The reason it happens is that if you look at the gene,

it actually also has blocks.

And it essentially goes like this.

So we have a block that will later be translated.

We call it exon.

Then we’ll have a block that is not translated, cut out.

We call it intron.

So we have exon, intron, exon, intron,

et cetera, et cetera, et cetera, right?

So sometimes you can have, you know,

dozens of these exons and introns.

So what happens is during the process

when the gene is converted to RNA,

we have things that are cut out,

the introns that are cut out,

and exons that now get assembled together.

And sometimes we will throw out some of the exons

and the remaining protein product will become…

Still be the same?


Oh, different.

So now you have fragments of the protein

that are no longer there.

They were cut out with the introns.

Sometimes you will essentially take one exon

and replace it with another one, right?

So there’s some flexibility in this process.
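The cut-and-assemble process just described can be sketched as a toy program. This is purely illustrative: the `splice` helper and the short sequences are invented for the example, and real splicing machinery is vastly more involved.

```python
# Toy model of splicing: a gene is an ordered list of (kind, sequence) blocks.
# All introns are cut out; alternative splicing can additionally skip exons.

def splice(gene, skip_exons=frozenset()):
    """Return the assembled product: introns removed, selected exons dropped."""
    exon_index = 0
    product = []
    for kind, seq in gene:
        if kind == "intron":
            continue  # introns are always cut out
        if exon_index not in skip_exons:
            product.append(seq)  # this exon is retained
        exon_index += 1
    return "".join(product)

gene = [("exon", "ATG"), ("intron", "gtaag"),
        ("exon", "CCT"), ("intron", "gtcag"),
        ("exon", "TAA")]

print(splice(gene))                  # canonical product: ATGCCTTAA
print(splice(gene, skip_exons={1}))  # exon-skipping isoform: ATGTAA
```

Dropping exon 1 yields a different, shorter protein product from the very same gene, which is the point made above.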

So that creates a whole new level of complexity.

Cause now.

Is this random though?

Is it random?

It’s not random.

We, and this is where I think now the appearance

of this modern single cell

and before that tissue level sequencing,

next generation sequencing techniques such as RNA-seq,

allows us to see that these are the events

that often happen in response.

It’s a dynamic event that happens in response

to disease or in response

to certain developmental stage of a cell.

And this is an incredibly complex layer

that also undergoes, I mean,

because it’s at the gene level, right?

So it undergoes certain evolution, right?

And now we have this interplay

between what is happening in the protein world

and what is happening in the gene and RNA world.

And for example, it’s often that we see

that the boundaries of this exons coincide

with the boundaries of the protein domains, right?

So there is this close interplay to that.

It’s not always, I mean, otherwise it would be too simple,


But we do see the connection

between those sort of machineries.

And obviously the evolution will pick up this complexity

and, you know.

Select for whatever is successful,

whatever is interesting function.

We see that complexity in play

and makes this question more complex, but more exciting.

Small detour, I don’t know if you think about this

into the world of computer science.

There’s a Douglas Hofstadter, I think,

who came up with the name of quine,

which are, I don’t know if you’re familiar

with these things, but it’s computer programs

that have, I guess, exon and intron,

and they copy, the whole purpose of the program

is to copy itself.

So it prints copies of itself,

but can also carry information inside of it.

So it’s a very kind of crude, fun exercise of,

can we sort of replicate these ideas from cells?

Can we have a computer program that when you run it,

just print itself, the entirety of itself,

and does it in different programming languages and so on.

I’ve been playing around and writing them.

It’s a kind of fun little exercise.

You know, when I was a kid, so you know,

it was essentially one of the sort of main stages

in informatics Olympiads that you have to reach

in order to be any good,

is you should be able to write a program

that replicates itself.

And so the task then becomes even sort of more complicated.

So what is the shortest program?

And of course, it’s a function of a programming language,

but yeah, I remember a long, long, long time ago

when we tried to make it shorter and shorter

and find the shortcut.
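For reference, one classic short Python quine goes like this; running it prints exactly its own two lines of source. The `%r` conversion inserts the string’s own repr, and `%%` becomes a literal `%`:

```python
s = 's = %r\nprint(s %% s)'
print(s % s)
```

Many shorter variants exist across languages, which is exactly what those shortest-program contests chase.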

There’s actually on Stack Exchange, there’s an entire site

called Code Golf, I think,

where the entirety is just the competition.

People just come up with whatever task, I don’t know,

like write code that reports the weather today.

And the competition is about whatever programming language,

what is the shortest program?

And it makes you actually, people should check it out

because it makes you realize

there’s some weird programming languages out there.

But just to dig on that a little deeper,

do you think, in computer science,

we don’t often think about programs,

just like the machine learning world now,

that’s still kind of basic programs.

And then there’s humans that replicate themselves, right?

And there’s these mutations and so on.

Do you think we’ll ever have a world

where there’s programs that kind of

have an evolutionary process?

So I’m not talking about evolutionary algorithms,

but I’m talking about programs that kind of

mate with each other and evolve

and like on their own replicate themselves.

So this is kind of the idea here is,

that’s how you can have a runaway thing.

So we think about machine learning as a system

that gets smarter and smarter and smarter and smarter.

At least the machine learning systems of today are like,

it’s a program that you can like turn off,

as opposed to throwing a bunch of little programs out there

and letting them like multiply and mate

and evolve and replicate.

Do you ever think about that kind of world,

when we jump from the biological systems

that you’re looking at to artificial ones?

I mean, it’s almost like you take the sort of the area

of intelligent agents, right?

Which are essentially the independent sort of codes

that run and interact and exchange the information, right?

So I don’t see why not.

I mean, it could be sort of a natural evolution

in this area of computer science.

I think it’s kind of an interesting possibility.

It’s terrifying too,

but I think it’s a really powerful tool.

Like to have like agents that, you know,

we have social networks with millions of people

and they interact.

I think it’s interesting to inject into that,

what’s already been injected into that, bots, right?

But those bots are pretty dumb.

You know, they’re probably pretty dumb algorithms.

You know, it’s interesting to think

that there might be bots that evolve together with humans.

And there’s the sea of humans and robots

that are operating first in the digital space.

And then you can also think, I love the idea.

Some people worked, I think at Harvard, at Penn,

there’s robotics labs that, you know,

take as a fundamental task to build a robot

that given extra resources can build another copy of itself,

like in the physical space,

which is super difficult to do, but super interesting.

I remember there’s like research on robots

that can build a bridge.

So they make a copy of themselves

and they connect themselves

and the sort of like self building bridge

based on building blocks.

You can imagine like a building that self assembles.

So it’s basically self assembling structures

from robotic parts.

But it’s interesting to, within that robot,

add the ability to mutate

and do all the interesting like little things

that you’re referring to in evolution

to go from a single origin protein building block

to like this weird complex.

And if you think about this, I mean, you know,

the bits and pieces are there, you know.

So you mentioned the evolutionary algorithm, right?

You know, so this is sort of,

and maybe sort of the goal is in a way different, right?

So the goal is to, you know, to essentially,

to optimize your search, right?

So, but sort of the ideas are there.

So people recognize that, you know,

that the recombination events lead to global changes

in the search trajectories, while the mutation events

are a more refined, you know, step in the search.
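
The recombination versus mutation contrast can be sketched as a toy genetic algorithm; the bit string target, the rates, and the population size below are all illustrative assumptions, not anything from the conversation.

```python
import random

# Toy genetic algorithm: crossover makes coarse global jumps in the search,
# mutation makes small local refinements. All settings here are illustrative.
random.seed(0)
TARGET = [1] * 20  # toy goal: evolve an all-ones bit string

def fitness(genome):
    return sum(g == t for g, t in zip(genome, TARGET))

def crossover(a, b):
    # Recombination: swap whole tails between two parents (a global change).
    point = random.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(genome, rate=0.05):
    # Mutation: flip individual bits with small probability (a local change).
    return [1 - g if random.random() < rate else g for g in genome]

population = [[random.randint(0, 1) for _ in range(20)] for _ in range(30)]
for _ in range(100):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]  # keep the fittest third as parents
    population = [
        mutate(crossover(random.choice(parents), random.choice(parents)))
        for _ in range(30)
    ]

best = max(population, key=fitness)
print(fitness(best))
```

On this toy problem the population converges close to the optimum within a few dozen generations.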

Then you have, you know, other sort of

nature inspired algorithms, right?

So one of the reasons that, you know,

I think it’s one of the funnest ones

is the slime mold based algorithm, right?

So it’s, I think the first was introduced

by the Japanese group,

where it was able to solve some pretty complex problems.

So that’s, and then I think there are still a lot of things

we’ve yet to, you know, borrow from the nature, right?

So there are a lot of sort of ideas

that nature, you know, gets to offer us that, you know,

it’s up to us to grab it and to, you know,

get the best use of it.

Including neural networks, you know, we have a very crude

inspiration from nature on neural networks.

Maybe there’s other inspirations to be discovered

in the brain or other aspects of the various systems,

even like the immune system, the way it interplays.

I recently started to understand that the,

like the immune system has something to do

with the way the brain operates.

Like there’s multiple things going on in there,

which all of which are not modeled

in artificial neural networks.

And maybe if you throw a little bit of that biological spice

in there, you’ll come up with something, something cool.

I’m not sure if you’re familiar with the Drake equation

that estimate, I just did a video on it yesterday

because I wanted to give my own estimate of it.

It’s an equation that combines a bunch of factors

to estimate how many alien civilizations are in the galaxy.

I’ve heard about it, yes.

So one of the interesting parameters, you know,

it’s like how many stars are born every year,

how many planets there are on average per star,

how many habitable planets are there.

And then the one that starts being really interesting

is the probability that life emerges on a habitable planet.
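
The factors just listed multiply together. Here is a minimal sketch of the Drake equation; every parameter value below is an illustrative assumption, not an estimate from the conversation.

```python
# The Drake equation is just a product of the factors mentioned above.
# Every value below is a made-up illustrative assumption.

def drake(r_star, f_p, n_e, f_l, f_i, f_c, lifetime):
    """Estimated number of detectable civilizations in the galaxy."""
    return r_star * f_p * n_e * f_l * f_i * f_c * lifetime

n = drake(
    r_star=2.0,       # stars born in the galaxy per year
    f_p=1.0,          # fraction of stars with planets
    n_e=0.2,          # habitable planets per star
    f_l=0.01,         # probability life emerges on a habitable planet
    f_i=0.1,          # fraction of living worlds that develop intelligence
    f_c=0.1,          # fraction of those that become detectable
    lifetime=10_000,  # years a civilization stays detectable
)
print(n)
```

Changing any single factor by an order of magnitude moves the whole estimate by an order of magnitude, which is why the life-emergence probability discussed here matters so much.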

So like, I don’t know if you think about,

you certainly think a lot about evolution,

but do you think about the thing

which evolution doesn’t describe,

which is like the beginning of evolution, the origin of life?

I think I put the probability of life developing

in a habitable planet at 1%.

This is very scientifically rigorous.

Okay, well, first at a high level for the Drake equation,

what would you put that percent at on earth?

And in general, do you have something,

do you have thoughts about how life might’ve started,

you know, like the proteins being the first kind of,

one of the early jumping points?

Yeah, so I think back in 2018,

there was a very exciting paper published in Nature

where they found one of the simplest amino acids,

glycine, in comet dust.

So this is, and I apologize if I don’t pronounce,

it’s a Russian named comet,

it’s, I think, Churyumov-Gerasimenko.

This is the comet where, and there was this mission

to get close to this comet and get the stardust

from its tail.

And when scientists analyzed it,

they actually found traces of, you know, of glycine,

which, you know, makes up, you know,

it’s one of the basic, one of the 20 basic amino acids

that makes up proteins, right?

So that was kind of very exciting, right?

But, you know, the question is very interesting, right?

So what, you know, if there is some alien life,

is it gonna be made of proteins, right?

Or maybe RNAs, right?

So we see that, you know, the RNA viruses are certainly,

you know, very well established sort of, you know,

group of molecular machines, right?

So, yeah, it’s a very interesting question.

What probability would you put?

Like, how hard is this job?

Like, how unlikely just on Earth do you think

this whole thing is that we got going?

Like, are we really lucky or is it inevitable?

Like, what’s your sense when you sit back

and think about life on Earth?

Is it higher or lower than 1%?

Well, because 1% is pretty low, but it still is like,

damn, that’s a pretty good chance.

Yes, it’s a pretty good chance.

I mean, I would, personally, but again, you know,

I’m, you know, probably not the best person

to do such estimations, but I would, you know,

intuitively, I would probably put it lower.

But still, I mean, you know, given.

So we’re really lucky here on Earth.

I mean.

Or the conditions are really good.

It’s, you know, I think that there was,

everything was right in a way, right?

So we still, it’s not, the conditions were not like ideal

if you try to look at, you know, what was, you know,

several billion years ago when life emerged.

So there is something called the Rare Earth Hypothesis

that, you know, as a counter to the Drake Equation says

that the, you know, the conditions of Earth,

if you actually were to describe Earth,

it’s quite a special place.

So special it might be unique in our galaxy

and potentially, you know, close to unique

in the entire universe.

Like it’s very difficult to reconstruct

those same conditions.

And what the Rare Earth Hypothesis argues

is all those different conditions are essential for life.

And so that’s sort of the counter to, you know,

all of us, you know,

thinking that Earth is pretty average.

I mean, I can’t really, I’m trying to remember

to go through all of them, but just the fact

that it is shielded from a lot of asteroids,

the, obviously the distance to the sun,

but also the fact that it’s like a perfect balance

between the amount of water and land

and all those kinds of things.

I don’t know, there’s a bunch of different factors

that I don’t remember, there’s a long list.

But it’s fascinating to think about if in order

for something like proteins and then DNA and RNA

to emerge, you need, and basic living organisms,

you need to be very close to an Earth like planet,

which will be sad or exciting, I don’t know which.

If you ask me, I, you know, in a way I put a parallel

between, you know, between our own research.

And I mean, from the intuitive perspective,

you know, you have those two extremes

and the reality very rarely falls

into the extremes.

The optimum is always reached somewhere in between.

So, and that’s what I tend to think.

I think that, you know, we’re probably somewhere in between.

So we’re not unique, but again,

the chances are, you know, reasonably small.

The problem is we don’t know the other extreme

is like, I tend to think that we don’t actually understand

the basic mechanisms of, like, what this all originated

from, like, it seems like we think of life

as this distinct thing, maybe intelligence

is a distinct thing, maybe the physics that,

from which planets and suns are born is a distinct thing.

But that could be a very, it’s like the Stephen Wolfram

thing, it’s like the, from simple rules emerges

greater and greater complexity.

So, you know, I tend to believe that just life finds a way.

Like, we don’t know the extreme of how common life is

because it could be life is like everywhere.

Like, so everywhere that it’s almost like laughable,

like that we’re such idiots to think we’re unique.

Like, it’s like ridiculous to even like think,

it’s like ants thinking that their little colony

is the unique thing and everything else doesn’t exist.

I mean, it’s also very possible that that’s the extreme

and we’re just not able to maybe comprehend

the nature of that life.

Just to stick on alien life for just a brief moment more,

there are some signs of life on Venus in gaseous form.

There’s hope for life on Mars, probably extinct.

We’re not talking about intelligent life.

Although that has been in the news recently.

We’re talking about basic like, you know, bacteria.

Yeah, and then also, I guess, there’s a couple moons.


Yeah, Europa, which is Jupiter’s moon.

I think there’s another one.

Are you, is that exciting or is it terrifying to you

that we might find life?

Do you hope we find life?

I certainly do hope that we find life.

I mean, it was very exciting to hear about this news

about the possible life on Venus.

It’d be nice to have hard evidence of something,

which is what the hope is for Mars and Europa.

But do you think those organisms

will be similar biologically

or would they even be sort of carbon based

if we do find them?

I would say they would be carbon based.

How similar, it’s a big question, right?

So it’s the moment we discover things outside Earth, right?

Even if it’s a tiny little single cell.

I mean, there is so much.

Just imagine that, that would be so.

I think that that would be another turning point

for the science, you know?

Especially if it’s different in some very new way.

That’s exciting.

Because that says, that’s a definitive statement,

not a definitive, but a pretty strong statement

that life is everywhere in the universe.

To me at least, that’s really exciting.

You brought up Joshua Lederberg in an offline conversation.

I think I’d love to talk to you about Alpha Fold

and this might be an interesting way

to enter that conversation because,

so he won the 1958 Nobel Prize in Physiology or Medicine

for discovering that bacteria can mate and exchange genes.

But he also did a ton of other stuff,

like we mentioned, helping NASA find life on Mars

and the…

Dendral. Dendral.

The chemical expert system.

Expert systems, remember those?

What do you find interesting about this guy

and his ideas about artificial intelligence in general?

So I have a kind of personal story to share.

So I started my PhD in Canada back in 2000.

And so essentially my PhD was,

so we were developing sort of a new language

for symbolic machine learning.

So it’s different from the feature based machine learning.

And one of the sort of cleanest applications

of this approach, of this formalism

was to cheminformatics and computer aided drug design.

So essentially we were, as a part of my research,

I developed a system that essentially looked

at chemical compounds of say the same therapeutic category,

you know, male hormones, right?

And try to figure out the structural fragments

that are the structural building blocks

that are important that define this class

versus structural building blocks

that are there just because, you know,

to complete the structure.

But they are not essentially the ones

that make up the chemical, the key chemical properties

of this therapeutic category.

And, you know, for me, it was something new.

I was trained as an applied mathematician, you know,

as with some machine learning background,

but, you know, computer aided drug design

was a completely new territory.

So because of that, I often find myself

asking lots of questions on one of these

sort of central forums.

Back then, there were no Facebooks or stuff like that.

There was a forum, you know, it’s a forum.

It’s essentially, it’s like a bulletin board.


On the internet.

Yeah, so you essentially, you have a bunch of people

and you post a question and you get, you know,

an answer from, you know, different people.

And back then, just like one of the most popular forums

was CCL, I think the Computational Chemistry List,

something like that,

but CCL, that was the forum.

And there, I, you know, I…

Asked a lot of dumb questions.

Yes, I asked questions.

Also shared some, you know, some information

about the formalism and what we do

and whether whatever we do makes sense.

And so, you know, and I remember that one of these posts,

I mean, I still remember, you know,

I would call it desperately looking

for a chemist’s advice, something like that, right?

And so I post my question, I explained, you know,

what the formalism is, what it does

and what kind of applications I’m planning to do.

And, you know, and it was, you know,

in the middle of the night and I went back to bed.

And next morning, I get a phone call from my advisor

who also looked at this forum.

It’s like, you won’t believe who replied to you.

And it’s like, who?

And he said, well, you know, there is a message

to you from Joshua Lederberg.

And my reaction was like, who is Joshua Lederberg?

Your advisor hung up. So, and essentially, you know,

Joshua wrote me that we had conceptually similar ideas

in the Dendral project.

You may wanna look it up.

And we should also, sorry, and it’s a side comment,

say that even though he won the Nobel Prize

at a really young age, in 58, but so he was,

I think he was what, 33.

It’s just crazy.

So anyway, so that’s, so hence in the early 2000s,

responding to young whippersnappers on the CCL forum.


And so back then he was already very senior.

I mean, he unfortunately passed away back in 2008,

but, you know, back in 2001, he was, I mean,

he was a professor emeritus at Rockefeller University.

And, you know, that was actually, believe it or not,

one of the reasons I decided to join, you know,

as a postdoc, the group of Andrej Sali,

who was at Rockefeller University,

with the hope that, you know, that I could actually,

you know, have a chance to meet Joshua in person.

And I met him very briefly, right?

Just because he was walking, you know,

there’s a little bridge that connects the,

sort of the research campus with the,

with the sort of skyscraper that Rockefeller owns,

where, you know, postdocs and faculty

and graduate students live.

And so I met him, you know,

and had a very short conversation, you know.

But so I started, you know, reading about Dendral

and I was amazed, you know, it’s,

we’re talking about 1960, right?

The ideas were so profound.

Well, what’s the fun about the ideas of it?

The reason he made this is even crazier.

So, Lederberg wanted to make a system

that would help him study the extraterrestrial molecules,


So, the idea was that, you know,

the way you study the extraterrestrial molecules

is you do the mass spec analysis, right?

And so the mass spec gives you sort of bits,

numbers that essentially give you ideas

about the possible fragments or, you know,

atoms, you know, and maybe little fragments,

pieces of this molecule that make up the molecule, right?

So now you need to sort of,

to decompose this information

and to figure out what was the whole

before it became fragments, bits and pieces, right?

So, in order to make this, you know,

to have this tool, the idea of Lederberg

was to connect chemistry, computer science,

and to design this so called expert system

that looks, that takes into account,

that takes as an input the mass spec data,

a database of possible molecules

and essentially try to sort of induce the molecule

that would correspond to this spectrum

or, you know, essentially what this project ended up being

was that, you know, it would provide a list of candidates

that then a chemist would look at and make final decision.


But the original idea, I suppose,

is to solve the entirety of this problem automatically.

Yes, yes.

So he, you know, so he,

back then he approached it. In the 60s.

Yes, can you believe it, it’s amazing.

I mean, it still blows my mind, you know, that it’s,

that’s, and this was essentially the origin

of the modern bioinformatics, cheminformatics,

you know, back in 60s.

So that’s, you know, every time you deal with projects

like this, with the, you know, research like this,

you just, you know, so the power of the, you know,

intelligence of this people is just, you know, overwhelming.

Do you think about expert systems, is there,

and why they kind of didn’t become successful,

especially in the space of bioinformatics,

where it does seem like there is a lot of expertise

in humans, and, you know, it’s possible to see

that a system like this could be made very useful.


And be built up.

So it’s actually, it’s a great question,

and this is something, so, you know, so, you know,

at my university, I teach artificial intelligence,

and, you know, we start, my first two lectures

are on the history of AI.

And there we, you know, we try to, you know,

go through the main stages of AI.

And so, you know, the question of why expert systems failed

or became obsolete, it’s actually a very interesting one.

And there are, you know, if you try to read the, you know,

the historical perspectives,

there are actually two lines of thoughts.

One is that they were essentially

not up to the expectations.

And so therefore they were replaced, you know,

by other things, right?

The other one was that completely opposite one,

that they were too good.

And as a result, they essentially became

sort of a household name,

and then essentially they got transformed.

I mean, in both cases, sort of the outcome was the same.

They evolved into something, right?

And that’s what I, you know, if I look at this, right?

So the modern machine learning, right?


So there’s echoes in the modern machine learning.

I think so, I think so, because, you know,

if you think about this, you know, and how we design,

you know, the most successful algorithms,

including AlphaFold, right?

You built in the knowledge about the domain

that you study, right?

So you built in your expertise.

So speaking of AlphaFold,

so DeepMind’s AlphaFold 2 recently was announced

to have, quote unquote, solved protein folding.

But how exciting is this to you?

It seems to be one of the,

one of the exciting things that have happened in 2020.

It’s an incredible accomplishment from the looks of it.

What part of it is amazing to you?

What part would you say is over hype

or maybe misunderstood?

It’s definitely a very exciting achievement.

To give you a little bit of perspective, right?

So in bioinformatics, we have several competitions.

And so the way, you know, you often hear

how those competitions have been explained

to sort of to non bioinformaticians is that, you know,

they call it bioinformatics Olympic games.

And there are several disciplines, right?

So historically, one of the first ones

was the discipline of predicting the protein structure,

predicting the 3D coordinates of the protein.

But there are some others.

So the predicting protein functions,

predicting effects of mutations on protein functions,

then predicting protein-protein interactions.

So the original one was CASP

or Critical Assessment of protein Structure Prediction.

And the, you know, typically what happens

during this competitions is, you know, scientists,

experimental scientists solve the structures,

but don’t put them into the protein data bank,

which is the centralized database

that contains all the 3D coordinates.

Instead, they hold it and release protein sequences.

And now the challenge of the community

is to predict the 3D structures of this proteins

and then use the experimentally resolved structures

to assess which one is the closest one, right?
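
That assessment step can be sketched with a plain RMSD between matched residue coordinates; CASP’s actual measures (GDT_TS and others) are more involved, and the coordinates below are invented for illustration.

```python
import math

# Scoring a predicted structure against the experimentally solved one with
# plain RMSD over matched residue positions. CASP's real measures (GDT_TS
# and others) are more elaborate; the coordinates here are made up.

def rmsd(pred, ref):
    """Root-mean-square deviation between two equal-length lists of 3D points."""
    assert len(pred) == len(ref)
    total = sum(math.dist(p, r) ** 2 for p, r in zip(pred, ref))
    return math.sqrt(total / len(pred))

ref = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]    # "experimental"
pred = [(0.0, 0.0, 0.0), (3.8, 0.5, 0.0), (7.6, 0.0, 0.5)]   # "predicted"
print(rmsd(pred, ref))  # lower is better; identical structures give 0.0
```
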

And this competition, by the way,

just a bunch of different tangents.

And maybe you can also say, what is protein folding?

Then this competition, CASP competition

has become the gold standard.

And that’s what was used to say

that protein folding was solved.

So just to add a little, just a bunch.

So if you could, whenever you say stuff,

maybe throw in some of the basics

for the folks that might be outside of the field.

Anyway, sorry.

So, yeah, so, you know, so the reason it’s, you know,

it’s relevant to our understanding of protein folding

is because, you know, we’ve yet to learn

how the folding mechanistically works, right?

So there are different hypotheses

about what happens in this folding process.

For example, there is a hypothesis that the folding happens

in, you know, a modular fashion, right?

So that, you know, we have protein domains

that get folded independently

because their structure is stable.

And then the whole protein structure gets formed.

But, you know, within those domains,

we also have a so called secondary structure,

the small alpha helices, beta sheets.

So these are, you know, elements that are structurally stable.

And so, and the question is, you know,

when do they get formed?

Because some of the secondary structure elements,

you have to have, you know, a fragment in the beginning

and say the fragment in the middle, right?

So you cannot potentially start having the full fold

from the get go, right?

So it’s still, you know, it’s still a big enigma,

what happens.

We know that it’s an extremely efficient

and stable process, right?

So there’s this long sequence

and the fold happens really quickly.


So that’s really weird, right?

And it happens like the same way almost every time.

Exactly, exactly.

That’s really weird.

That’s freaking weird.

It’s, yeah, that’s why it’s such an amazing thing.

But most importantly, right?

So it’s, you know, so when you see the, you know,

the translation process, right?

So when you don’t have the whole protein translated,

right, it’s still being translated,

you know, getting out from the ribosome,

you already see some structural, you know, formation.

So folding starts happening

before the whole protein gets produced, right?

And so this is obviously, you know,

one of the biggest questions in, you know,

in modern molecular biologists.

Not just, like, what happens,

like that’s maybe bigger than the question of folding.

That’s the question of, like,

some deeper fundamental idea of folding.

Yes. Behind folding.

Exactly, exactly.

So, you know, so obviously if we are able to predict

the end product of protein folding,

we are one step closer to understanding

sort of the mechanisms of the protein folding.

Because we can then potentially look and start probing

what are the critical parts of this process

and what are not so critical parts of this process.

So we can start decomposing this, you know,

so in a way this protein structure prediction algorithm

can be used as a tool, right?

So you change the, you know, you modify the protein,

you get back to this tool, it predicts,

okay, it’s completely unstable.

Yeah, which aspects of the input

will have a big impact on the output?

Exactly, exactly.

So what happens is, you know,

we typically have some sort of incremental advancement,

you know, at each stage of this CASP competition,

you have groups with incremental advancement

and, you know, historically the top performing groups

were, you know, they were not using machine learning.

They were using a very advanced biophysics

combined with bioinformatics,

combined with, you know, the data mining

and that was, you know, that would enable them

to obtain protein structures of those proteins

that don’t have any structurally solved relatives

because, you know, if we have another protein,

say the same protein, but coming from a different species,

we could potentially derive some ideas

and that’s so called homology or comparative modeling,

where we’ll derive some ideas

from the previously known structures

and that would help us tremendously

in, you know, in reconstructing the 3D structure overall.

But what happens when we don’t have these relatives?

This is when it becomes really, really hard, right?

So that’s so called de novo, you know,

de novo protein structure prediction.

And in this case, those methods were traditionally very good.

But what happened in the last year,

the original AlphaFold came in

and all of a sudden it’s much better than everyone else.

This is 2018.


Oh, and the competition is only every two years, I think.

And then, so, you know, it sort of sent a shockwave

to the bioinformatics community that, you know,

we have like a state of the art machine learning system

that does, you know, structure prediction.

And essentially what it does, you know,

so if you look at this, it actually predicts the contacts.

So, you know, so the process of reconstructing

the 3D structure starts by predicting the contacts

between the different parts of the protein.

And the contacts essentially are the parts of the proteins

that are in close proximity to each other.

Right, so actually the machine learning part

seems to be estimating, you can correct me if I’m wrong here,

but it seems to be estimating the distance matrix,

which is like the distance between the different parts.

Yeah, so we call it the contact map.

Contact map.

So once you have the contact map,

the reconstruction becomes more straightforward, right?

But so the contact map is the key.
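
To make the notion concrete, here is a sketch going in the easy direction, from known 3D coordinates to a contact map (the prediction problem runs the other way); the distance cutoff and the toy coordinates are illustrative assumptions.

```python
import math

# Sketch of a contact map: one 3D point per residue (e.g. the C-alpha atom),
# and two residues are "in contact" if they are closer than a cutoff
# (~8 Angstroms is a common choice). The coordinates below are invented.

def contact_map(coords, cutoff=8.0):
    """coords: list of (x, y, z) points -> symmetric N x N matrix of 0/1."""
    n = len(coords)
    return [[1 if math.dist(coords[i], coords[j]) < cutoff else 0
             for j in range(n)]
            for i in range(n)]

# Four residues along a line, roughly 3.8 Angstroms apart (a typical
# C-alpha spacing); only near neighbors end up in contact.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
for row in contact_map(coords):
    print(row)
```

Given a predicted matrix like this, reconstruction then searches for 3D coordinates consistent with it.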

And so, you know, so that’s what happened.

And now we started seeing in this current stage, right?

Well, in the most recent one,

we started seeing the emergence of these ideas

in other people’s work, right?

But yet here’s, you know, AlphaFold2

that again outperforms everyone else.

And also by introducing yet another wave

of the machine learning ideas.

Yeah, there does seem to also be an incorporation.

First of all, the paper is not out yet,

but there’s a bunch of ideas already out.

There does seem to be an incorporation of this other thing.

I don’t know if it’s something that you could speak to,

which is like the incorporation of like other structures,

like evolutionary similar structures

that are used to kind of give you hints.

Yes, so evolutionary similarity is something

that we can detect at different levels, right?

So we know, for example,

that the structure of proteins

is more conserved than the sequence.

The sequence could be very different,

but the structural shape is actually still very conserved.

So that’s sort of the intrinsic property that, you know,

in a way related to protein folds,

you know, to the evolution of the, you know,

of the proteins and protein domains, et cetera.

But we know that, I mean, there’ve been multiple studies.

And, you know, ideally, if you have structures,

you know, you should use that information.

However, sometimes we don’t have this information.

Instead, we have a bunch of sequences.

Sequences, we have a lot, right?

So we have, you know, hundreds, thousands

of, you know, different organisms sequenced, right?

And by taking the same protein,

but in different organisms and aligning it,

so making it, you know, making the corresponding positions

aligned, we can actually say a lot

about sort of what is conserved in this protein

and therefore, you know, structurally more stable,

what is diverse in this protein.
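
A toy sketch of reading conservation off an alignment; the protein fragment and the four sequences below are invented for illustration, not real proteins.

```python
from collections import Counter

# Reading conservation off a multiple sequence alignment: the same
# (hypothetical) protein fragment from four organisms, already aligned.
# Highly conserved columns hint at structurally or functionally important
# positions; variable columns are more tolerant to change.

alignment = [
    "MKVLAG",
    "MKVIAG",
    "MRVLSG",
    "MKVLTG",
]

def column_conservation(msa):
    """Per column: fraction of sequences sharing the most common residue."""
    scores = []
    for column in zip(*msa):
        most_common_count = Counter(column).most_common(1)[0][1]
        scores.append(most_common_count / len(msa))
    return scores

print(column_conservation(alignment))  # columns 0, 2, 5 are fully conserved
```
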

So on top of that, we could provide sort of the information

about the sort of the secondary structure

of this protein, et cetera, et cetera.

So this information is extremely useful

and it’s already there.

So while it’s tempting to, you know,

to do a complete ab initio,

so you just have a protein sequence and nothing else,

the reality is such that we are overwhelmed with this data.

So why not use it?

And so, yeah, so I’m looking forward

to reading this paper.

It does seem to, like they’ve,

in the previous version of Alpha Fold,

they didn’t, for this evolutionary similarity thing,

they didn’t use machine learning for that.

Or rather, they used it as like the input

to the entirety of the neural net,

like the features derived from the similarity.

It seems like there’s some kind of quote, unquote,

iterative thing where it seems to be part of the learning

process is the incorporation of this evolutionary similarity.

Yeah, I don’t think there is a bioRxiv paper, right?

There’s nothing.

No, there’s nothing.

There’s a blog post that’s written

by a marketing team, essentially,

which, you know, it has some scientific similarity,

probably, to the actual methodology used,

but it could be, it’s like interpreting scripture.

It could be just poetic interpretations of the actual work

as opposed to direct connection to the work.

So now, speaking about protein folding, right?

So, you know, in order to answer the question

whether or not we have solved this, right?

So we need to go back to the beginning of our conversation

with the realization that an average protein

is that typically what the CASP has been focusing on

is this competition has been focusing

on the single, maybe two domain proteins

that are still very compact.

And even those ones are extremely challenging to solve.

But now we talk about, you know,

an average protein that has two, three protein domains.

If you look at the proteins that are in charge

of the, you know, of the processes in the neural system,

right, perhaps one of the most recently evolved

sort of systems in an organism, right?

All of them, well, the majority of them

are highly multi domain proteins.

So they are, you know, some of them have five, six, seven,

you know, and more domains, right?

And, you know, we are very far away

from understanding how these proteins are folded.

So the complexity of the protein matters here.

The complexity of the protein modules

or the protein domains.

So you’re saying solved, so the definition

of solved here is particularly the CASP competition

achieving human level, not human level,

achieving experimental level performance

on these particular sets of proteins

that have been used in these competitions.

Well, I mean, you know, I do think that, you know,

especially with regards to the alpha fold,

you know, it is able to, you know, to solve,

you know, at the near experimental level,

pretty big majority of the more compact proteins

like, or protein domains.

Because again, in order to understand

how the overall protein, you know,

multi domain protein fold, we do need to understand

the structure of its individual domains.

I mean, unlike if you look at AlphaZero

or like even MuZero, if you look at that work,

you know, it’s nice reinforcement learning

self playing mechanisms are nice

cause it’s all in simulation.

So you can learn from just huge amounts.

Like you don’t need data.

It was like the problem with proteins,

like the size, I forget how many 3D structures

have been mapped, but the training data is very small.

No matter what, it’s like,

maybe a couple hundred thousand or something like that,

but it’s some very small number,

but like, it doesn’t seem like that’s scalable.

There has to be, I don’t know,

it feels like you want to somehow 10 X the data

or a hundred X the data somehow.

Yes, but we also can take advantage of homology models,

right, so the models that are of very good quality

because they are essentially obtained

based on the evolutionary information, right?

So you can, there is a potential to enhance this information

and, you know, use it again to empower the training set.

And it’s, I think, I am actually very optimistic.

I think it’s been one of these sort of, you know,

turning points where you have a system that is,

you know, a machine learning system

that is truly better than the non machine learning systems.

Better than the sort of the more conventional

biophysics based methods.

That’s a huge leap.

This is one of those fun questions,

but where would you put it in the ranking

of the greatest breakthroughs

in artificial intelligence history?

So like, okay, so let’s see who’s in the running.

Maybe you can correct me.

So you got like AlphaZero and AlphaGo

beating the world champion at the game of Go.

Thought to be impossible like 20 years ago.

Or at least the AI community was highly skeptical.

Then you got like also Deep Blue beating Kasparov.

You have deep learning itself,

like the maybe, what would you say,

the AlexNet, ImageNet moment.

So the first neural network

achieving human level performance.

Superhuman — no, that’s not true.

Achieving like a big leap in performance

on the computer vision problem.

There is OpenAI, the whole like GPT-3,

that whole space of transformers and language models

just achieving this incredible performance

of application of neural networks to language models.

Boston Dynamics, pretty cool.

Like robotics.

People are like, there’s no AI.

No, no, there’s no machine learning currently.

But AI is much bigger than machine learning.

So that just the engineering aspect,

I would say it’s one of the greatest accomplishments

in engineering side.

Engineering meaning like mechanical engineering

of robotics ever.

Then of course, autonomous vehicles.

You can argue for Waymo,

which is like the Google self driving car.

Or you can argue for Tesla,

which is like actually being used

by hundreds of thousands of people on the road today —

a machine learning system.

And I don’t know if you can, what else is there?

But I think that’s it.

And then AlphaFold, many people are saying

is up there, potentially number one.

Would you put them at number one?

Well, in terms of the impact on the science

and on the society beyond, it’s definitely,

to me would be one of the…

Top three?

If you want.

Maybe, I mean, I’m probably not the best person

to answer that.

But I do remember,

back in, I think, 1997, when Deep Blue

beat Kasparov, it was, I mean, it was a shock.

I mean, it was, and I think,

for a pretty substantial part of the world,

especially people who have some experience with chess,

realizing how incredibly human this game is,

how much brain power you need

to reach those grandmaster levels, right,

it was probably one of the first times.

And how good Kasparov was.

And again, yeah, so Kasparov’s arguably

one of the best ever, right?

And you get a machine that beats him.

All right, so it’s…

First time a machine probably beat a human

at that scale of a thing, of anything.

Yes, yes.

So that was, to me, that was like, you know,

one of the groundbreaking events in the history of AI.

Yeah, that’s probably number one.

Probably, like we don’t, it’s hard to remember.

It’s like Muhammad Ali versus, I don’t know,

Mike Tyson, something like that.

It’s like, nah, you gotta put Muhammad Ali at number one.

Same with Deep Blue,

even though it’s not machine learning based.

Still, it uses advanced search,

and search is the integral part of AI, right?

It’s not, people don’t think of it that way at this moment.

In vogue currently, search is not seen

as a fundamental aspect of intelligence,

but it very well, I mean, it very likely is.

In fact, I mean, that’s what neural networks are,

is they’re just performing search

on the space of parameters, and it’s all search.

All of intelligence is some form of search,

and you just have to become cleverer and cleverer

at that search problem.
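The “all of intelligence is search” point can be made concrete with a toy sketch (illustrative only, not from the conversation): gradient descent literally searches the parameter space of a one-parameter model for the value that minimizes the error.

```python
# Toy illustration: training a model is a search over parameters.
# Plain gradient descent "searches" the parameter space of the
# 1-parameter model y = w * x for the w minimizing squared error
# on a tiny dataset generated with w_true = 3.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]  # generated with w_true = 3

w = 0.0    # starting point of the search
lr = 0.01  # step size
for step in range(500):
    # gradient of mean squared error with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad  # move through parameter space against the gradient

print(round(w, 3))  # → 3.0
```

The same picture scales up: a neural network just has millions of coordinates in that search space instead of one.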

And I also have another one that you didn’t mention

that’s one of my favorite ones is,

so you’ve probably heard of this,

it’s, I think it’s called Deep Rembrandt.

It’s the project where they trained,

I think there was a collaboration

between the sort of the experts

in Rembrandt painting in Netherlands,

and a group, an artificial intelligence group,

where they trained an algorithm

to replicate the style of Rembrandt,

and they actually printed a portrait

that never existed before in the style of Rembrandt.

I think they printed it on canvas, you know,

using pretty much the same types of paints and stuff.

To me, it was mind blowing.

Yeah, and the space of art, that’s interesting.

There hasn’t been, maybe that’s it,

but I think there hasn’t been an ImageNet moment yet

in the space of art.

You haven’t been able to achieve

superhuman level performance in the space of art,

even though there’s this big famous thing

where a piece of art was purchased,

I guess for a lot of money.


Yeah, but it’s still, you know,

people are like in the space of music at least,

that’s, you know, it’s clear that human created pieces

are much more popular.

So there hasn’t been a moment where it’s like,

oh, this is, we’re now,

I would say in the space of music,

what makes a lot of money,

we’re talking about serious money,

it’s music and movies, or like shows and so on,

and entertainment.

There hasn’t been a moment where AI created,

AI was able to create a piece of music

or a piece of cinema, like Netflix show,

that is, you know, that’s sufficiently popular

to make a ton of money.


And that moment would be very, very powerful,

because that’s like, that’s an AI system

being used to make a lot of money.

And like direct, of course, AI tools,

like even Premiere, audio editing,

all the editing, everything I do,

to edit this podcast, there’s a lot of AI involved.

Actually, this is a program,

I wanna talk to those folks, just cause I wanna nerd out,

it’s called iZotope, I don’t know if you’re familiar with it.

They have a bunch of tools of audio processing,

and they have, I think they’re Boston based,

just, it’s so exciting to me to use it,

like on the audio here,

cause it’s all machine learning.

It’s not, cause most audio production stuff

is like any kind of processing you do,

it’s very basic signal processing,

and you’re tuning knobs and so on.

They have all of that, of course,

but they also have all of this machine learning stuff,

like where you actually give it training data,

you select parts of the audio you train on,

you train on it, and it figures stuff out.

It’s great, it’s able to detect,

like the ability of it to be able

to separate voice and music, for example,

or voice and anything, is incredible.

Like it just, it’s clearly exceptionally good

at applying these different neural networks models

to just separate the different kinds

of signals from the audio.

That, okay, so that’s really exciting.

Photoshop, Adobe people also use it,

but to generate a piece of music

that will sell millions, a piece of art, yeah.

No, I agree, and you know, it’s,

that’s, you know, as I mentioned,

I offer my AI class, and you know,

an integral part of this is the project, right?

So it’s my favorite, ultimate favorite part,

because it typically, we have these project presentations

the last two weeks of the classes,

right before, you know, the Christmas break,

and it’s sort of, it adds this cool excitement,

and every time, I mean, I’m amazed, you know,

with some projects that people, you know, come up with.

And so, and quite a few of them are actually, you know,

they have some link to arts.

I mean, you know, I think last year we had a group

who designed an AI producing haikus, Japanese poems.

Oh, wow.

So, you know,

it got trained on English-based haikus, right?

So, and some of them, you know,

they get to present, like, the top selection.

They were pretty good.

I mean, you know, I mean, of course, I’m not a specialist,

but you read them, and you see this is real.

It seems profound.

Yes, yeah, it seems real.

So it’s kind of cool.

We also had a couple of projects where people tried

to teach AI how to play, like, rock music, classical music.

I think, and popular music.


Interestingly enough, you know,

classical music was among the most difficult ones.

Oh, sure.

And, you know, of course, if you look at the

grandmasters of music, like Bach, right?

So there is a lot of almost math.

Yeah, well, he’s very mathematical.

Yeah, exactly.

So this is, I would imagine that at least some style

of this music could be picked up,

but then you have this completely different spectrum

of classical composers.

And so, you know, it’s almost like, you know,

you don’t have to sort of look at the data.

You just listen to it and say, nah, that’s not it, not yet.

That’s not it, yeah.

That’s how I feel too.

OpenAI has, I think, MuseNet

or something like that, the system.

It’s cool, but it’s like, eh,

it’s not compelling for some reason.

It could be a psychological reason too.

Maybe we need to have a human being,

a tortured soul behind the music.

I don’t know.

Yeah, no, absolutely.

I completely agree.

But yeah, whether or not we’ll have,

one day we’ll have, you know,

a song written by an AI engine

to be like in top charts, musical charts,

I wouldn’t be surprised.

I wouldn’t be surprised.

I wonder if we already have one

and it just hasn’t been announced.

We wouldn’t know.

How hard is the multi-protein folding problem?

Is that kind of something you’ve already mentioned,

which is baked into this idea of greater

and greater complexity of proteins?

Like multi-domain proteins —

do those basically become multi-protein complexes?

Yes, you got it right.

So it sort of has the components

of both protein folding

and protein-protein interactions.

Because in order for these domains,

many of these proteins actually,

they never form a stable structure.

One of my favorite proteins —

and pretty much everyone I know

who works with proteins

always has their favorite proteins —

right, so one of my favorite proteins,

probably my favorite protein,

the one that I worked on when I was a postdoc,

is the so-called postsynaptic density 95, PSD-95 protein.

So it’s one of the key actors

in the majority of neurological processes

at the molecular level.

And essentially it’s a key player

in the postsynaptic density.

So this is the crucial part of this synapse

where a lot of these chemical processes are happening.

So it has five domains, right?

So five protein domains.

So it’s a pretty large protein, I think 600-something amino acids.

But the way it’s organized itself, it’s flexible, right?

So it acts as a scaffold.

So it is used to bring in other proteins.

So they start acting in the orchestrated manner, right?

So, and the type of the shape of this protein,

it’s in a way, there are some stable parts of this protein,

but there are some flexible.

And this flexibility is built in into the protein

in order to become sort of this multifunctional machine.

So do you think that kind of thing is also learnable

through the AlphaFold 2 kind of approach?

I mean, the time will tell.

Is it another level of complexity?

Is it like how big of a jump in complexity

is that whole thing?

To me, it’s yet another level of complexity

because when we talk about protein-protein interactions,

there is actually a different challenge for this,

called CAPRI, that is focused specifically

on macromolecular interactions:

protein-protein, protein-DNA, et cetera.

So, but it’s, there are different mechanisms

that govern molecular interactions

and that need to be picked up,

say by a machine learning algorithm.

Interestingly enough, we actually,

we participated for a few years in this competition.

We typically don’t participate in competitions,

I don’t know, don’t have enough time,

because it’s very intensive, it’s a very intensive process.

But we participated back in about 10 years ago or so.

And the way we entered this competition,

so we designed a scoring function, right?

So the function that evaluates

whether or not your protein-protein interaction model

looks like an experimentally solved one, right?

So the scoring function is a very critical part

of the model prediction.

So we designed it to be a machine learning one.

And so it was one of the first machine-learning-based

scoring functions used in CAPRI.

And we essentially learned what should contribute,

what are the critical components contributing

to the protein-protein interaction.

So this could be converted into a learning problem

and thereby it could be learned?

I believe so, yes.
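As a toy illustration of what a machine-learned scoring function can look like — this is a hypothetical sketch, not the actual CAPRI scoring function described here, and the features and data are made up — a tiny logistic-regression model can be trained to score candidate poses from simple interface features:

```python
# Hypothetical sketch: learn a score mapping features of a candidate
# protein-protein pose (here, made-up contact and clash counts) to how
# native-like it looks. Logistic regression trained from scratch.
import math

# synthetic "poses": near-native ones have many contacts, few clashes
data = [((30, 1), 1), ((28, 0), 1), ((25, 2), 1),
        ((5, 9), 0), ((8, 12), 0), ((3, 7), 0)]

w = [0.0, 0.0]
b = 0.0
lr = 0.05
for _ in range(2000):
    for (contacts, clashes), label in data:
        z = w[0] * contacts + w[1] * clashes + b
        p = 1.0 / (1.0 + math.exp(-z))  # predicted probability
        err = p - label                 # gradient of log-loss wrt z
        w[0] -= lr * err * contacts
        w[1] -= lr * err * clashes
        b -= lr * err

def score(contacts, clashes):
    """Learned score: higher means more native-like."""
    return 1.0 / (1.0 + math.exp(-(w[0] * contacts + w[1] * clashes + b)))

print(score(29, 1) > score(6, 10))  # → True
```

Real scoring functions use far richer physics- and evolution-derived features, but the learning problem has this same shape.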

Do you think AlphaFold2 or something similar to it

from DeepMind or somebody else will be,

will result in a Nobel Prize or multiple Nobel Prizes?

So like, you know, obviously, maybe not so obviously,

you can’t give a Nobel Prize to a computer program.

At least for now, give it to the designers of that program.

But do you see one or multiple Nobel Prizes

where AlphaFold2 is like a large percentage

of what that prize is given for?

Would it lead to discoveries at the level of Nobel Prizes?

I mean, I think the Nobel Prize is definitely destined

to be evolving with the evolution of science,

and the evolution of science is such

that it now becomes really multifaceted, right?

So where you don’t really have a unique discipline,

you have a lot of cross-disciplinary talks

in order to achieve, you know,

really big advancements.

So I think, you know, the computational methods

will be acknowledged in one way or another.

And as a matter of fact, you know,

they were first acknowledged back in 2013, right?

Where, you know, three people were first, you know,

awarded the Nobel Prize for studying protein folding,

right, the principle.

And, you know, I think all three of them

are computational biophysicists, right?

So, you know, that I think is unavoidable.

You know, it will come with the time.

The fact that, you know, AlphaFold and

similar approaches — because again, it’s a matter of time

before people will embrace this, you know, principle,

and we’ll see more and more

such tools coming into play.

But, you know, these methods will be critical

in scientific discovery, no doubt about it.

On the engineering side, maybe a dark question,

but do you think it’s possible to use

these machine learning methods

to start to engineer proteins?

And the next question is something quite a few biologists

are against, some are for, for study purposes,

is to engineer viruses.

Do you think machine learning, like something like alpha fold

could be used to engineer viruses?

So, to answer the first question:

it has been a part of the research

in protein science — protein design

is a very prominent area of research.

Of course, you know, one of the pioneers is David Baker

and the Rosetta algorithm that, you know,

essentially was doing de novo design and was used

to design new proteins.

And design of proteins means design of function.

So like when you design a protein, you can control,

I mean, the whole point of a protein

with the protein structure comes a function,

like it’s doing something.


So you can design different things.

So you can, yeah, so you can, well,

you can look at the proteins from the functional perspective.

You can also look at the proteins

from the structural perspective, right?

So the structural building blocks.

So if you want to have a building block

of a certain shape, you can try to achieve it

by, you know, introducing a new protein sequence

and predicting, you know, how it will fold.

So with that, I mean, it’s a natural,

one of the, you know, natural applications

of these algorithms.

Now, talking about engineering a virus.

With machine learning.

With machine learning, right?

So, well, you know, so luckily for us,

I mean, we don’t have that much data, right?


We actually, right now, one of the projects

that we are carrying on in the lab

is we’re trying to develop a machine learning algorithm

that determines

whether or not the current strain is pathogenic.

And the current strain of the coronavirus.

Of the virus.

I mean, so there are applications to coronaviruses

because we have strains of SARS-CoV-2,

also SARS-CoV, MERS, that are pathogenic,

but we also have strains of other coronaviruses

that are, you know, not pathogenic.

I mean, the common cold viruses and, you know,

some other ones, right?

So, so pathogenic meaning spreading.

Pathogenic means actually inflicting damage.


There are also some, you know,

seasonal versus pandemic strains of influenza, right?

And determining what are the molecular determinants —

so, that are built into the protein sequence,

into the gene sequence, right?

So, and whether or not the machine learning

can determine those, those components, right?

Oh, interesting.

So, using machine learning to do that —

that’s really interesting.

The input is like the entire protein sequence,

and then you determine

if this thing is going to be able to do damage

to a biological system.


So, so I mean,

It’s a good machine learning problem —

you’re saying we don’t have enough data for that?

We — I mean, for this specific one, we do.

I might actually, you know,

have to back up on this, because we’re still in the process.

There was one work that appeared on bioRxiv

by Eugene Koonin, who is one of these, you know,

pioneers in evolutionary genomics.

And they tried to look at this, but, you know,

the methods were sort of standard, you know,

supervised learning methods.

And now the question is, you know,

can you advance it further by, by using, you know,

not so standard methods, you know?

So there’s obviously a lot of hope

in transfer learning, where you can actually try to transfer

the information that the machine learning learns about

the proper protein sequences, right?

And, you know, so, so there is some promise

in going this direction, but if we have this,

it would be extremely useful because then

we could essentially forecast the potential mutations

that would make the current strain

more or less pathogenic.

Anticipate them — for vaccine development,

for treatment, antiviral drug development.

That would be a very crucial task.
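To make the input/output shape of that idea concrete — and only that; this is an entirely synthetic toy, not a real pathogenicity predictor, with invented sequences and labels — one could represent each sequence by its k-mer counts and classify with a nearest-centroid rule:

```python
# Toy sketch of sequence classification: represent each sequence by
# k-mer counts and assign the label of the most similar class centroid.
# All sequences and the "pathogenic" label here are made up.
from collections import Counter

def kmer_counts(seq, k=2):
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

# made-up training sequences: class 1 is enriched in "RK" motifs
train = [("MRKRKRKAA", 1), ("AARKRKMRK", 1),
         ("MGGSGGSAA", 0), ("AAGGSGSMG", 0)]

def centroid(label):
    total = Counter()
    for seq, y in train:
        if y == label:
            total.update(kmer_counts(seq))
    return total

def classify(seq):
    c = kmer_counts(seq)
    def sim(cent):  # unnormalized dot-product similarity
        return sum(c[k] * cent[k] for k in c)
    return 1 if sim(centroid(1)) > sim(centroid(0)) else 0

print(classify("MMRKRKRKGG"))  # → 1 (looks like class 1)
```

Real work on this problem, as discussed, needs far richer representations and, crucially, far more labeled data.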

But you could also use that system to then say,

how would we potentially modify this virus

to make it more pathogenic?

This, that’s true.

That’s true.

And then, you know, the, again,

the hope is, well, several things, right?

So one is that, you know,

even if you design a sequence, right —

to carry out the actual experimental biology,

to ensure that all the components are working,

is a completely different matter.

Difficult process.


Then, you know, we’ve seen in the past,

there could be some regulation the moment

the scientific community recognizes

that it’s now becoming no longer

a sort of a fun puzzle for machine learning.

Could be open.

Yeah, so then there might be some regulation.

So I think back in, what, 2015, there was, you know,

there was an issue on regulating the research

on influenza strains, right?

There were several groups, you know,

used sort of the mutation analysis

to determine whether or not this strain will jump

from one species to another.

And I think there was like a half-year moratorium

on the research, on the paper being published,

until, you know, scientists analyzed it

and decided that it’s actually safe.

I forgot what that’s called.

Something of function, test of function.

Gain of function.

Gain of function, yeah.

Gain of function, loss of function, that’s right.


It’s like, let’s watch this thing mutate for a while

to see what kind of things we can observe.

I guess I’m not so much worried

about that kind of research if there’s a lot of regulation

and if it’s done very well and with competence and seriously.

I am more worried about kind of this, you know,

the underlying aspect of this question

is more like 50 years from now.

Speaking to the Drake equation,

one of the parameters in the Drake equation

is how long civilizations last.

And that seems to be the most important value actually

for calculating if there’s other alien

intelligent civilizations out there.

That’s where there’s most variability.

Assuming like if life, if that percentage

that life can emerge is like not zero,

like if we’re a super unique,

then it’s the how long we last

is basically the most important thing.

So from a selfish perspective,

but also from a Drake equation perspective,

I’m worried about our civilization lasting.
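The Drake equation mentioned above is just a product of factors, N = R* · fp · ne · fl · fi · fc · L, so the estimated number of civilizations N scales linearly with L once the other factors are fixed and nonzero — which is exactly why the lifetime term dominates. A quick sketch with made-up placeholder values:

```python
# Drake equation with illustrative placeholder values (not estimates).
# The point: with other factors fixed, N is linear in L, the average
# lifetime of a communicating civilization.
def drake(R_star, f_p, n_e, f_l, f_i, f_c, L):
    return R_star * f_p * n_e * f_l * f_i * f_c * L

base = dict(R_star=1.0, f_p=0.5, n_e=1.0, f_l=0.1, f_i=0.1, f_c=0.1)
print(round(drake(**base, L=1_000), 6))    # → 0.5
print(round(drake(**base, L=100_000), 6))  # → 50.0 (100x the lifetime, 100x the civilizations)
```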

And you kind of think about all the ways

in which machine learning can be used

to design greater weapons of destruction, right?

And I mean, one way to ask that

if you look sort of 50 years from now,

a hundred years from now,

would you be more worried about natural pandemics

or engineered pandemics?

Like who’s the better designer of viruses,

nature or humans if we look down the line?

I think, in my view, I would still be worried

about the natural pandemics, simply because of, I mean,

the capacity of nature for producing these.

It does a pretty good job, right?


And the motivation for using virus,

engineering viruses as a weapon is a weird one

because maybe you can correct me on this,

but it seems very difficult to target a virus, right?

The whole point of a weapon — the way a rocket works is

you have a starting point, you have an end point,

and you’re trying to hit a target.

To hit a target with a virus is very difficult.

It basically just spreads, right?

The target would be the human species.

Oh man.

Yeah, I have a hope in us.

I’m forever optimistic that we will not,

there’s insufficient evil in the world

to lead to that kind of destruction.

Well, I also hope that, I mean, that’s what we see.

I mean, with the way we are getting connected,

the world is getting connected.

I think it helps for the world to become more transparent.


So the information spread is,

I think it’s one of the key things for the society

to become more balanced one way or another.

This is something that people disagree with me on,

but I do think that the kind of secrecy

that governments have…

So you’re kind of speaking more to the other aspects,

like a research community being more open,

companies are being more open.

Government is still like,

we’re talking about like military secrets.

I think military secrets of the kind

that could destroy the world

will become also a thing of the 20th century.

It’ll become more and more open.


I think nations will lose power in the 21st century,

like lose sufficient power for secrecy.

Transparency is more beneficial than secrecy,

but of course it’s not obvious.

Let’s hope so.

Let’s hope so that the governments

will become more transparent.

What, so we last talked, I think in March or April,

what have you learned?

How has your philosophical, psychological,

biological worldview changed since then?

Or you’ve been studying it nonstop

from a computational biology perspective.

How has your understanding and thoughts about this virus

changed over those months from the beginning to today?

One thing that I was really amazed at is

how efficient the scientific community was.

I mean, and even just judging on this very narrow domain

of protein structure and understanding

the structural characterization of this virus

from the components point of view,

whole virus point of view.

If you look at SARS, something that happened less than 20,

but close enough, 20 years ago,

and you see what, when it happened,

what was sort of the response by the scientific community,

you see that the structural characterizations did occur,

but it took several years, right?

Now the things that took several years,

it’s a matter of months, right?

So we see that the research pop up.

We are at the unprecedented level

in terms of the sequencing, right?

Never before have we had a single virus sequenced so many times,

which allows us to actually trace very precisely

the sort of the evolutionary nature of this virus,

what happens.

And it’s not just this virus independently of everything —

it’s the sequence of this virus linked,

anchored to the specific geographic place,

to specific people,

because our genotype also influences

the evolution of this.

It’s always a host-pathogen co-evolution

that, you know, occurs.

It’d be cool if we also had a lot more data about,

so that the spread of this virus, not maybe,

well, it’d be nice if we had it for like contact tracing

purposes for this virus, but it’d be also nice if we had it

for the study for future viruses to be able to respond

and so on, but it’s already nice that we have geographical

data and the basic data from individual humans, yeah.

Exactly, no, I think contact tracing is obviously

a key component in understanding

the spread of this virus.

There are also a number of challenges, right?

So XPRIZE is one of them — we

just recently took part in

this competition; it’s the prediction of the

number of infections in different regions.

Oh, sure.

So, you know, obviously the AI

is the main topic in those predictions.

Yeah, but it’s still, the data, I mean, that’s a competition,

but the data is weak

on the training. Like, it’s great,

it’s much more than probably before, but like, it’d be nice if it was like

really rich. I talked to Michael Mina from

Harvard — I mean, he dreams that the community comes together with like a

weather map for viruses, right, like

really high-resolution sensors on how

the viruses travel from person to person, all the different kinds of viruses, right?

Because there’s a ton of them, and then you’d be able to tell

the story that you’ve spoken about

of the evolution of these viruses, like day to day mutations that

are occurring. I mean, that’d be fascinating just from a perspective of

study and from the perspective of being able to respond to future pandemics.

That’s ultimately what I’m worried about. People love

books. Is there some three

or whatever number of books, technical, fiction, philosophical, that

brought you joy in life, had an impact on your life,

and maybe some that you would recommend others?

I’ll give you three very different books, and I also have a special runner up.

Honorable mention.

I mean, it’s an audiobook, and there’s

some specific reason behind it. So the first book is

something that sort of impacted my earlier

stage of life, and I’m probably not going to be very original here.

It’s Bulgakov’s Master and Margarita.

For a Russian, maybe it’s not super original,

but it’s a really powerful book, even in English.

It is incredibly powerful, and

I mean, the way it ends.

I still have goosebumps when I read

the very last part — it’s called the epilogue — where

it’s just so powerful. What impact did it have on you? What ideas?

What insights did you get from it? I was just taken by

the fact that

you have those parallel lives,

many centuries apart, and

somehow they got sort of intertwined into

one story, and that

to me was fascinating. And of course

the romantic part of this book is like

it’s not just romance, it’s like the romance

empowered by sort of magic, right?

And maybe on top of that, you have some irony,

which is unavoidable, right? Because it was that

Soviet time. But it’s very deeply Russian, so that’s

the wit, the humor, the pain, the love,

all of that is one of the books that kind of captures

something about Russian culture that people outside of Russia

should probably read. I agree. What’s the second one? So the second one

is again another one that it happened

I read it later in my life. I think I read it

first time when I was a graduate student.

And that’s Solzhenitsyn’s Cancer Ward.

That is amazingly powerful book.

What is it about? It’s, I mean, essentially

based on the fact that Solzhenitsyn was

diagnosed with cancer when he was reasonably young, and he

made a full recovery. So this is

about a person who was sentenced

for life in one of these camps.

And he had some cancer,

so he was transported back to one of these

Soviet republics, I think it was

one of the Central Asian republics. And the

book is about

his experience being a

prisoner, being a patient in the

cancer clinic, in the cancer ward, surrounded

by people, many of whom die.

But in the way

it reads, first of all, later on I

read the accounts of the doctors

who describe the experiences

in the book by the

patient as incredibly accurate.

So I read that there was some doctor saying that

every single doctor should read this book to understand

what the patient feels. But

again, as many of the Solzhenitsyn’s

books, it has multiple levels of complexity.

And obviously if you look above

the cancer and the patient, the

tumor that was growing and then disappeared

in his

body with some consequences, this is

allegorically the

Soviet regime, and he actually

when he was asked, he said that this is what made him

think about this, how to combine these experiences.

Him being a part of the Soviet regime,

also being a part of the

someone sent to Gulag camp,

and also someone who experienced cancer

in his life. The Gulag Archipelago

and this book, these are the works that actually made him

receive a Nobel Prize. But to me

I’ve read

other books by Solzhenitsyn.

This one to me is the most powerful one.

And by the way, both this one and the previous one you read in Russian?

Yes. So now there is the third book is an English book

and it’s completely different. So we’re switching the gears

completely. So this is the book which, it’s not even

a book, it’s an essay by

John von Neumann called The Computer and the Brain.

And that was the book he was writing

knowing that he was dying of cancer.

So the book was released posthumously — it’s a very thin book.

But the power,

the intellectual power in this book, in this essay

is incredible. I mean you probably know that von Neumann

is considered to be one of the biggest

thinkers. So his intellectual power was incredible.

And you can actually feel this power

in this book where the person is writing knowing that he will be,

he will die. The book actually got published only after his

death back in 1958. He died in 1957.

So he tried to put down as many

ideas as he could, ones he still

hadn’t realized.

So this book is very difficult

to read because every single paragraph

is just compact,

filled with these ideas. And the ideas are incredible.

Even nowadays, so he tried

to put the parallels between the brain

computing power, the neural system, and the computers

as they were understood. Do you remember what year he was working on this?

1955, 1956. So that was right during,

when he was diagnosed with cancer and he was essentially…

Yeah, he’s one of those, there’s a few folks people mention,

I think Ed Witten is another that like

everyone that meets them, they say he’s just an intellectual powerhouse.

Yes. Okay, so who’s the honorable mention?

And this is, I mean, the reason I put it sort of in a separate section

because this is a book that I recently

listened to. So it’s an audio book.

And this is a book called Lab Girl by Hope Jahren.

So Hope Jahren, she is a

scientist, she’s a geochemist that essentially

studies the

fossil plants. And so she uses

the chemical analysis of these fossil plants to understand

what the climate was like back

a thousand years, hundreds of thousands of years ago.

And so something about this book incredibly

touched me: it was narrated by the author.

Nice. And it’s an incredibly

personal story, incredibly. So

certain parts of the book, you could actually hear the author crying.

And that, to me, I mean, I never experienced

anything like this reading a book, it was like

this connection between you and the author.

And I think this is really

a must read, but even better, a must listen

to audio book for anyone who

wants to learn about sort of

academia, science, research in general, because it’s

a very personal account about her becoming

a scientist. So

we’re just before New Year’s.

We talked a lot about some difficult topics of viruses and so on.

Do you have some exciting things you’re looking forward

to in 2021? Some New Year’s resolutions,

maybe silly or fun, or

something very important and fundamental to

the world of science or something completely unimportant?

Well, I’m definitely looking forward

to things becoming normal again.

So yes, I really miss traveling.

Every summer I go

to an international summer school. It’s called

the School for Molecular and Theoretical Biology. It’s held in Europe.

It’s organized by very good friends of mine. And this is

the school for gifted kids from all over the world, and

they’re incredibly bright. It’s like every time I go there, it’s like, you know,

it’s a highlight of the year. And

we couldn’t make it this August, so we

did this school remotely, but it’s different.

So I am definitely looking forward to next August

coming there. One of

my personal resolutions, I realized that

being in the house and working from home,

I realized that actually

I had apparently missed a lot

spending time with my family,

believe it or not. You see, typically, with all the

research and teaching and

everything related to the academic life,

I mean, you get distracted. And so

you don’t feel that

the fact that you are away from your family affects you,

because you are naturally distracted by other things.

So this time I realized that

that’s so important, right? Spending your time with

the family, with your kids. And so that

would be my New Year’s resolution, actually trying to

spend as much time with them as possible, even when the world opens up.

Yeah, that’s a beautiful message. That’s a beautiful reminder.

I asked you if there’s a Russian poem

that you could read, that I could force you to read, and you said, okay, fine, sure.

Do you mind reading?

And you said no paper was needed.

So this poem was written by my namesake,

another Dmitry, Dmitry Kemerefeld.

It’s a recent poem and it’s

called Sorceress, Ved’ma

in Russian, or actually

Koldunya. So that’s sort of another connotation of

sorceress or witch. And I really like it

and it’s one of just a handful of poems I actually

can recall by heart. I also have a very strong

association when I read this poem with Master and

Margarita, the main female character,

Margarita. And also it’s

about, it’s set around the same time we’re talking

now, around New Year,

around Christmas. Do you mind reading it in Russian?

I’ll give it a try.

So you narrowed your eyes,

that anyone who was blessed

was ready to give their soul to the devil

for this witch’s connection.

And I, without prejudice,

ran out to feel your

amazing breath on your lips,

to remember how you flew above the earth

in a white view,

in a white haze, in a white mist.

That’s beautiful. I love how it captures a moment of longing

and maybe love even.

Yes. To me it has a lot of meaning about

this something that is happening,

something that is far away, but still very close to you.

And yes, it’s the winter.

There’s something magical about winter, isn’t there?

I don’t know how to translate it, but a kiss in winter

is interesting. Lips in winter and all that kind of stuff.

It’s beautiful. Russian has a way. There’s a reason Russian poetry

is just, I’m a fan of poetry in both languages, but English

doesn’t capture some of the magic that Russian seems to, so

thank you for doing that. That was awesome. Dmitry,

it’s great to talk to you again. It’s contagious

how much you love what you do, how much you love life, so I really appreciate

you taking the time to talk today. And thank you for having me.

Thanks for listening to this conversation with Dmitry Korkin, and thank you to our

sponsors. Brave Browser, NetSuite Business Management

Software, Magic Spoon Low Carb Cereal, and

8sleep Self Cooling Mattress. So the choice is

browsing privacy, business success, healthy diet, or comfortable

sleep. Choose wisely, my friends. And if you wish,

click the sponsor links below to get a discount and to support this podcast.

And now, let me leave you with some words from Jeffrey Eugenides.

Biology gives you a brain.

Life turns it into a mind. Thank you for listening,

and hope to see you next time.