The following is a conversation with Vladimir Vapnik, part two, the second
time we spoke on the podcast.
He’s the coinventor of support vector machines, support vector clustering, VC
theory, and many foundational ideas in statistical learning.
He was born in the Soviet Union, worked at the Institute of Control Sciences
in Moscow, then in the US, worked at AT&T, NEC labs, Facebook AI research,
and now is a professor at Columbia University.
His work has been cited over 200,000 times.
The first time we spoke on the podcast was just over a year
ago, one of the early episodes.
This time we spoke after a lecture he gave titled “Complete Statistical Theory
of Learning” as part of the MIT series of lectures on deep learning
and AI that I organized.
I’ll release the video of the lecture in the next few days.
This podcast and lecture are independent from each other, so you don’t need
one to understand the other.
The lecture is quite technical and math heavy, so if you do watch both, I
recommend listening to this podcast first, since the podcast is
probably a bit more accessible.
This is the artificial intelligence podcast.
If you enjoy it, subscribe on YouTube, give it five stars on Apple podcasts,
support it on Patreon, or simply connect with me on Twitter
at Lex Fridman, spelled F R I D M A N.
As usual, I’ll do one or two minutes of ads now and never any ads in
the middle that can break the flow of the conversation.
I hope that works for you and doesn’t hurt the listening experience.
This show is presented by Cash App, the number one finance app in the app store.
When you get it, use code LexPodcast.
Cash App lets you send money to friends, buy Bitcoin, and invest in the
stock market with as little as $1.
Brokerage services are provided by Cash App Investing, a subsidiary of Square
and member SIPC. Since Cash App allows you to send and receive money
digitally, peer to peer, and security in all digital transactions is very important,
let me mention the PCI data security standard, PCI DSS level one,
that Cash App is compliant with.
I’m a big fan of standards for safety and security and PCI DSS is a good
example of that, where a bunch of competitors got together and agreed
that there needs to be a global standard around the security of transactions.
Now we just need to do the same for autonomous vehicles
and AI systems in general.
So again, if you get Cash App from the app store or Google Play and use the code
LexPodcast, you get $10 and Cash App will also donate $10 to FIRST, one of my
favorite organizations that is helping to advance robotics and STEM education
for young people around the world.
And now here’s my conversation with Vladimir Vapnik.
You and I talked about Alan Turing yesterday a little bit and that he, as the
father of artificial intelligence, may have instilled in our field, an ethic
of engineering and not science, seeking more to build intelligence
rather than to understand it.
What do you think is the difference between these two paths of engineering
intelligence and the science of intelligence?
It’s a completely different story.
Engineering is imitation of human activity.
You have to make a device which behaves as humans behave, have all the functions
of humans.
It doesn’t matter how you do it, but to understand what is intelligence
about, it’s quite a different problem.
So I think, I believe that it’s somehow related to the predicate we talked
yesterday about, because look at the Vladimir Propp’s idea.
He just found 31 predicates, he called them units, which can explain
human behavior, at least in Russian tales.
You look at Russian tales and derive from that.
And then people realize that it’s more wide than in Russian tales.
It is in TV, in movie serials and so on and so on.
So you’re talking about Vladimir Propp, who in 1928 published a book,
Morphology of the Folktale, describing 31 predicates that have this kind of
sequential structure that a lot of the stories, narratives follow in Russian
folklore and in other contexts.
We’ll talk about it.
I’d like to talk about predicates in a focused way, but let me, if you allow
me to stay zoomed out on our friend, Alan Turing, and, you know, he inspired
a generation with the imitation game.
Yes.
Do you think if we can linger on that a little bit longer, do you think we can
learn, do you think learning to imitate intelligence can get us closer to the
science, to understanding intelligence?
So why do you think imitation is so far from understanding?
I think that it is different between you have different goals.
So your goal is to create something, something useful.
Yeah.
And that is great.
And you can see how much things was done and I believe that it will be done even
more, it’s self driving cars and also the business, it is great.
And it was inspired by Turing’s vision.
But understanding is very difficult.
It’s more or less philosophical category.
What means understand the world?
I believe in scheme which starts from Plato, that there exists world of ideas.
I believe that intelligence, it is world of ideas, but it is world of pure ideas.
And when you combine them with reality things, it creates, as in my case,
invariants, which is very specific.
And that’s, I believe, the combination of ideas in way to constructing invariants.
Constructing invariant is intelligence.
But first of all, predicate, if you know, predicate and hopefully
then not too much predicate exists.
For example, 31 predicate for human behavior, it is not a lot.
Vladimir Propp used 31, you can even call them predicate, 31
predicates to describe stories, narratives.
Do you think human behavior, how much of human behavior, how much of our
world, our universe, all the things that matter in our existence can be
summarized in predicates of the kind that Propp was working with?
I think that we have a lot of form of behavior, but I think that
predicate is much less because even in this example, which I gave you
yesterday, you saw that predicate can be, one predicate can construct many
different invariants depending on your data.
They’re applying to different data and they give different invariants.
So, but pure ideas, maybe not so much.
Not so many.
I don’t know about that, but my guess, I hope that’s why challenge
about digit recognition, how much you need.
I think we’ll talk about computer vision and 2D images a little bit
in your challenge.
That’s exactly about intelligence.
That’s exactly, that’s exactly about, no, that hopes to be exactly about
the spirit of intelligence in the simplest possible way.
Yeah, absolutely you should start the simplest way, otherwise you
will not be able to do it.
Well, there’s an open question whether starting at the MNIST digit
recognition is a step towards intelligence or it’s an entirely different thing.
I think that to beat records using say 100, 200 times less examples,
you need intelligence.
You need intelligence.
So let’s, because you use this term and it would be nice, I’d like to
ask simple, maybe even dumb questions.
Let’s start with a predicate.
In terms of terms and how you think about it, what is a predicate?
I don’t know.
I have a feeling formally they exist, but I believe that predicate for
2D images, one of them is symmetry.
Hold on a second.
Sorry.
Sorry, sorry to interrupt and pull you back.
At the simplest level, we’re not even, we’re not being profound currently.
A predicate is a statement of something that is true.
Yes.
Do you think of predicates as somehow probabilistic in nature or is this binary?
This is truly constraints of logical statements about the world.
In my definition, the simplest predicate is function.
Function, and you can use this function to make inner product that is predicate.
What’s the input and what’s the output of the function?
Input is X, something which is input in reality.
Say if you consider digit recognition, it pixel space input, but it is
function which in pixel space, but it can be any function from pixel space and you
choose, and I believe that there are several functions which is important for
understanding of images.
One of them is symmetry.
It’s not so simple construction as I described with the derivative, with all
this stuff, but another, I believe, I don’t know how many, is how well
structurized is picture.
Structurized?
Yeah.
What do you mean by structurized?
It is formal definition.
Say something heavy on the left corner, not so heavy in the middle and so on.
You describe in general concept of what you assume.
Concepts, some kind of universal concepts.
Yeah, but I don’t know how to formalize this.
Do you?
So this is the thing.
There’s a million ways we can talk about this.
I’ll keep bringing it up, but we humans have such concepts when we look at
digits, but it’s hard to put them, just like you’re saying now, it’s
hard to put them into words.
You know, that is example, when critics in music, trying to describe music,
they use predicate and not too many predicate, but in different combination,
but they have some special words for describing music and the same
should be for images, but maybe there are critics who understand essence
of what this image is about.
Do you think there exists critics who can summarize the essence of
images, human beings?
I hope so, yes, but that…
Explicitly state them on paper.
The fundamental question I’m asking is, do you think there exists a small
set of predicates that will summarize images?
It feels to our mind, like it does, that the concept of what makes a two
and a three and a four…
No, no, no, it’s not on this level.
It should not describe two, three, four.
It describes some construction, which allow you to create invariance.
And invariance, sorry to stick on this, but terminology.
Invariance, it is property of your image.
Say, I can say, looking on my image, it is more or less symmetric, and I can
give you value of symmetry, say, level of symmetry, using this function which I gave
yesterday. And you can describe that your image has these characteristics
exactly in the way how musical critics describe music.
So, but this is invariant applied to specific data, to specific music,
to something.
I strongly believe in this Plato idea that there exists world of predicate
and world of reality, and predicate and reality is somehow connected,
and you have to know that.
Let’s talk about Plato a little bit.
So you draw a line from Plato, to Hegel, to Wigner, to today.
So Plato has forms, the theory of forms.
So there’s a world of ideas and a world of things, as you talk about,
and there’s a connection.
And presumably the world of ideas is very small, and the world of things
is arbitrarily big, but they’re all what Plato calls them like, it’s a shadow.
The real world is a shadow from the world of forms.
Yeah, you have projection of a world of ideas.
Yeah, very poetic.
In reality, you can realize this projection using these invariants
because it is projection for own specific examples, which create specific features
of specific objects.
So the essence of intelligence is while only being able to observe
the world of things, try to come up with a world of ideas.
Exactly.
Like in this music story, intelligent musical critics knows all these words
and have a feeling about what they mean.
I feel like that’s a contradiction, intelligent music critics.
But I think music is to be enjoyed in all its forms.
The notion of critic, like a food critic.
No, I don’t want touch emotion.
That’s an interesting question.
Does emotion…
There’s certain elements of the human psychology, of the human experience,
which seem to almost contradict intelligence and reason.
Like emotion, like fear, like love, all of those things,
are those not connected in any way to the space of ideas?
That I don’t know.
I just want to be concentrate on very simple story, on digit recognition.
So you don’t think you have to love and fear death in order to recognize digits?
I don’t know.
Because it’s so complicated.
It involves a lot of stuff which I never considered.
But I know about digit recognition.
And I know that for digit recognition,
to get records from small number of observations, you need predicate.
But not special predicate for this problem.
But universal predicate, which understand world of images.
Of visual information.
Visual, yes.
But on the first step, they understand, say, world of handwritten digits,
or characters, or something simple.
So like you said, symmetry is an interesting one.
No, that’s what I think one of the predicate is related to symmetry.
The level of symmetry.
Okay, degree of symmetry.
So you think symmetry at the bottom is a universal notion,
and there’s degrees of a single kind of symmetry,
or is there many kinds of symmetries?
Many kinds of symmetries.
There is a symmetry, antisymmetry, say, letter S.
So it has vertical antisymmetry.
And it could be diagonal symmetry, vertical symmetry.
So when you cut vertically the letter S…
Yeah, then the upper part and lower part in different directions.
Inverted, along the Y axis.
But that’s just like one example of symmetry, right?
Isn’t there like…
Right, but there is a degree of symmetry.
If you play all this iterative stuff to do tangent distance,
whatever I describe, you can have a degree of symmetry.
And that is what describing reason of image.
It is the same as you will describe this image.
Think about letter S, it has antisymmetry.
Digit three is symmetric.
More or less, look for symmetry.
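As a rough illustration of a "degree of symmetry" for a 2D image: the sketch below is not the function from the lecture (which is not given in this conversation, and which involves tangent distance). It assumes a much simpler measure, the normalized correlation between an image and its reflection, and the name `symmetry_degree` and the tiny S-shaped example are hypothetical. For non-negative pixel values the score lies between 0 and 1.

```python
import numpy as np

def symmetry_degree(image, kind="vertical"):
    """Score how close an image is to its reflection (1.0 = identical)."""
    img = np.asarray(image, dtype=float)
    if kind == "vertical":
        mirrored = img[:, ::-1]      # flip left-right (about the vertical axis)
    elif kind == "horizontal":
        mirrored = img[::-1, :]      # flip top-bottom (about the horizontal axis)
    elif kind == "diagonal":
        mirrored = img.T             # reflect across the main diagonal (square images only)
    elif kind == "rotation":
        mirrored = img[::-1, ::-1]   # rotate by 180 degrees, one reading of "antisymmetry"
    else:
        raise ValueError(f"unknown symmetry kind: {kind}")
    energy = np.sum(img * img)
    if energy == 0:
        return 1.0                   # a blank image equals all of its reflections
    # Normalized correlation between the image and its reflected copy.
    return float(np.sum(img * mirrored) / energy)

# A crude 3x3 "S": unchanged by a 180-degree rotation, changed by a left-right flip.
letter_s = [[0, 1, 1],
            [0, 1, 0],
            [1, 1, 0]]

print(symmetry_degree(letter_s, "rotation"))
print(symmetry_degree(letter_s, "vertical"))
```

Under this assumed measure, the S shape scores higher for the rotation than for the left-right flip, which matches the point that one idea of symmetry yields several graded descriptions of the same image.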
Do you think such concepts like symmetry,
predicates like symmetry, is it a hierarchical set of concepts?
Or are these independent, distinct predicates
that we want to discover as some set of…
No, there is an idea of symmetry.
And you can, this idea of symmetry, make very general.
Like degree of symmetry.
If degree of symmetry can be zero, no symmetry at all.
Or degree of symmetry, say, more or less symmetrical.
But you have one of these descriptions.
And symmetry can be different.
As I told, horizontal, vertical, diagonal,
and antisymmetry is also concept of symmetry.
What about shape in general?
I mean, symmetry is a fascinating notion, but…
No, no, I’m talking about digit.
I would like to concentrate on all I would like to know,
predicate for digit recognition.
Yes, but symmetry is not enough for digit recognition, right?
It is not necessarily for digit recognition.
It helps to create invariant, which you can use
when you will have examples for digit recognition.
You have regular problem of digit recognition.
You have examples of the first class or second class.
Plus, you know that there exists concept of symmetry.
And you apply, when you’re looking for decision rule,
you will apply concept of symmetry,
of this level of symmetry, which you estimate from…
So let’s talk.
Everything comes from weak convergence.
What is convergence?
What is weak convergence?
What is strong convergence?
I’m sorry, I’m gonna do this to you.
What are we converging from and to?
You’re converging, you would like to have a function.
The function which, say, indicator function,
which indicate your digit five, for example.
A classification task.
Let’s talk only about classification.
So classification means you will say
whether this is a five or not,
or say which of the 10 digits it is.
Right, right.
I would like to have these functions.
Then, I have some examples.
I can consider property of these examples.
Say, symmetry.
And I can measure level of symmetry for every digit.
And then I can take average from my training data.
And I will consider only functions
of conditional probability,
which I’m looking for my decision rule.
Which applying to digits will give me the same average
as I observe on training data.
So, actually, this is different level
of description of what you want.
You want not just, you show not one digit.
You show, this predicate, show general property
of all digits which you have in mind.
If you have in mind digit three,
it gives you property of digit three.
And you select as admissible set of function,
only function, which keeps this property.
You will not consider other functions.
So, you immediately looking for smaller subset of function.
That’s what you mean by admissible functions.
Admissible function, exactly.
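One way to write down the condition described here, as a sketch only: the notation is assumed rather than taken from the conversation. Let $\psi$ be the predicate (for example, a level of symmetry), $(x_i, y_i)$ the $\ell$ training examples with $y_i \in \{0, 1\}$, and $f$ a candidate for the conditional probability. A function is admissible if applying it to the training inputs reproduces the average observed on the training data:

```latex
\frac{1}{\ell} \sum_{i=1}^{\ell} \psi(x_i)\, f(x_i)
\;\approx\;
\frac{1}{\ell} \sum_{i=1}^{\ell} \psi(x_i)\, y_i
```

Each predicate $\psi$ contributes one such equality, and only functions $f$ that satisfy it are kept.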
Which is still a pretty large,
for the number three, is a large.
It is pretty large, but if you have one predicate.
But according to, there is a strong and weak convergence.
Strong convergence is convergence in function.
You’re looking for the function on one function,
and you’re looking for another function.
And square difference from them should be small.
If you take difference in any points,
make a square, make an integral, and it should be small.
That is convergence in function.
Suppose you have some function, any function.
So, I would say, I say that some function
converge to this function.
If integral from square difference between them is small.
That’s the definition of strong convergence.
That definition of strong convergence.
Two functions, the integral, the difference, is small.
Yeah, it is convergence in functions.
Yeah.
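Written out, with $f_n$ the sequence of functions and $f_0$ the function being sought (the symbol $f_n$ is an assumed name; $f_0$ is used later in the conversation), strong convergence as described here is:

```latex
\lim_{n \to \infty} \int \bigl( f_n(x) - f_0(x) \bigr)^2 \, dx = 0
```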
But you have different convergence in functionals.
You take any function, you take some function, phi,
and take inner product, this function, this f function.
f0 function, which you want to find.
And that gives you some value.
So, you say that set of functions converge
in inner product to this function,
if this value of inner product converge to value f0.
That is for one phi.
But weak convergence requires that it converge for any
function of Hilbert space.
If it converge for any function of Hilbert space,
then you will say that this is weak convergence.
You can think that when you take integral,
that is integral property of function.
For example, if you will take sine or cosine,
it is coefficient of, say, Fourier expansion.
So, if it converge for all coefficients of Fourier
expansion, so under some condition,
it converge to function you’re looking for.
But weak convergence means any property.
Convergence not point wise, but integral property
of function.
So, weak convergence means integral property of functions.
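In the same assumed notation, with $\langle \phi, f \rangle = \int \phi(x) f(x)\, dx$ the inner product in a Hilbert space $H$, weak convergence as described here is:

```latex
\lim_{n \to \infty} \langle \phi, f_n \rangle = \langle \phi, f_0 \rangle
\quad \text{for every } \phi \in H
```

Choosing $\phi$ to be a sine or a cosine makes $\langle \phi, f \rangle$ a Fourier coefficient of $f$, which is the example given above. A predicate, in this reading, is one particular choice of $\phi$.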
When I’m talking about predicate,
I would like to formulate which integral properties
I would like to have for convergence.
So, and if I will take one predicate, a function with
which I measure property, if I will use one predicate
and say, I will consider only function which give me
the same value as this predicate,
I selecting set of functions from functions
which is admissible in the sense that function which I’m
looking for in this set of functions
because I checking in training data, it gives the same.
Yeah, so it always has to be connected to the training
data in terms of?
Yeah, but property, you can know independent on training data.
And this guy, Propp, says that there is formal property,
31 property.
A fairy tale, a Russian fairy tale.
But Russian fairy tale is not so interesting.
More interesting that people apply this to movies,
to theater, to different things.
And the same works, they’re universal.
Well, so I would argue that there’s
a little bit of a difference between the kinds of things
that were applied to which are essentially stories
and digit recognition.
It is the same story.
You’re saying digits, there’s a story within the digit.
Yeah.
And so but my point is why I hope
that it possible to beat record using not 60,000,
but say 100 times less.
Because instead, you will give predicates.
And you will select your decision
not from wide set of functions, but from set of functions
which keeps this predicates.
But predicate is not related just to digit recognition.
Right.
Like in Plato’s case.
Do you think it’s possible to automatically discover
the predicates?
So you basically said that the essence of intelligence
is the discovery of good predicates.
Yeah.
Now, the natural question is that’s
what Einstein was good at doing in physics.
Can we make machines do these kinds
of discovery of good predicates?
Or is this ultimately a human endeavor?
That I don’t know.
I don’t think that machine can do.
Because according to theory about weak convergence,
any function from Hilbert space can be predicate.
So you have infinite number of predicate in upper.
And before, you don’t know which predicate is good and which.
But what Propp showed, and why people call it breakthrough,
that there is not too many predicate
which cover most of situation happened in the world.
Right.
So there’s a sea of predicates.
And only a small amount
are useful for the kinds of things
that happen in the world.
I think that I would say only small part of predicate
very useful.
Useful all of them.
Only very few are what we should let’s call them
good predicates.
Very good predicates.
So can we linger on it?
What’s your intuition?
Why is it hard for a machine to discover good predicates?
Even in my talk described how to do predicate.
How to find new predicate.
I’m not sure that it is very good.
What did you propose in your talk?
No.
In my talk, I gave example for diabetes.
Diabetes, yeah.
When we achieve some percent.
So then we’re looking for area where
some sort of predicate, which I formulate,
does not keeps invariant.
So if it doesn’t keep, I retrain my data.
I select only function which keeps this invariant.
And when I did it, I improved my performance.
I can looking for this predicate.
I know technically how to do that.
And you can, of course, do it using machine.
But I’m not sure that we will construct the smartest
predicate.
But this is the, allow me to linger on it.
Because that’s the essence.
That’s the challenge.
That is artificial.
That’s the human level intelligence
that we seek is the discovery of these good predicates.
You’ve talked about deep learning as a way to,
the predicates they use and the functions are mediocre.
You can find better ones.
Let’s talk about deep learning.
Sure, let’s do it.
I know only Yann LeCun’s convolutional network.
And what else?
I don’t know.
And it’s a very simple convolution.
There’s not much else to know.
To pixel left and right.
I can do it like that with one predicate.
Convolution is a single predicate.
It’s single.
It’s single predicate.
Yes, but that’s it.
You know exactly.
You take the derivative for translation and predicate.
This should be kept.
So that’s a single predicate.
But humans discovered that one.
Or at least.
Not it.
That is a risk.
Not too many predicates.
And that is big story because Yann did it 25 years ago
and nothing so clear was added to deep network.
And then I don’t understand why we
should talk about deep network instead of talking
about piecewise linear functions which keeps this predicate.
Well, a counter argument is that maybe the amount
of predicates necessary to solve general intelligence,
say in the space of images, doing
efficient recognition of handwritten digits
is very small.
And so we shouldn’t be so obsessed about finding.
We’ll find other good predicates like convolution, for example.
There have been other advancements.
If you look at the work with attention,
there are attentional mechanisms, especially used
in natural language, focusing the network’s ability
to learn which part of the input to look at.
The thing is, there’s other things besides predicates
that are important for the actual engineering mechanism
of showing how much you can really
do given these predicates.
I mean, that’s essentially the work of deep learning
is constructing architectures that are able to be,
given the training data, to be able to converge
towards a function that can generalize well.
It’s an engineering problem.
Yeah, I understand.
But let’s talk not on emotional level,
but on a mathematical level.
You have set of piecewise linear functions.
It is all possible neural networks.
It’s just piecewise linear functions.
It’s many, many pieces.
Large number of piecewise linear functions.
Exactly.
Very large.
Almost feels like too large.
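The claim that neural networks are piecewise linear functions holds for networks with ReLU activations, which is the assumption made in this small sketch. The weights are made up for illustration and the names `relu_net`, `breakpoints`, and `is_affine_on` are hypothetical. It builds a one-input network with three hidden units and checks that the output is a straight line between consecutive breakpoints.

```python
import numpy as np

# Made-up weights for a 1-input, 3-hidden-unit, 1-output ReLU network.
W1 = np.array([1.0, -2.0, 0.5])   # input-to-hidden weights
B1 = np.array([0.0, 1.0, -1.0])   # hidden biases
W2 = np.array([1.0, 0.5, -3.0])   # hidden-to-output weights
B2 = 0.2                          # output bias

def relu_net(x):
    """Evaluate the network on a 1D array of inputs."""
    x = np.asarray(x, dtype=float)
    hidden = np.maximum(0.0, np.outer(x, W1) + B1)   # ReLU hidden layer
    return hidden @ W2 + B2

# Each hidden unit switches on or off where W1 * x + B1 = 0.
breakpoints = np.sort(-B1 / W1)

def is_affine_on(a, b, points=5):
    """True if the output is a straight line on [a, b] (second differences vanish)."""
    ys = relu_net(np.linspace(a, b, points))
    return bool(np.allclose(np.diff(ys, n=2), 0.0, atol=1e-9))

print(breakpoints)
print(is_affine_on(0.5, 2.0))    # interval between two neighbouring breakpoints
print(is_affine_on(-1.0, 1.0))   # interval that straddles breakpoints
```

With more units and layers the number of linear pieces grows quickly, which is the sense in which the set is "very large" while still being only piecewise linear.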
It’s still simpler than, say, convolution,
than reproducing kernel Hilbert space, which
have a Hilbert set of functions.
What’s Hilbert space?
It’s space with infinite number of coordinates,
say, or Fourier expansion, something like that.
So it’s much richer.
And when I’m talking about closed form solution,
I’m talking about this set of function,
not piecewise linear set, which is particular case of it
is small part.
So neural networks is a small part
of the space of functions you’re talking about.
Say, small set of functions.
Let me take that.
But it is fine.
It is fine.
I don’t want to discuss the small or big.
You take advantage.
So you have some set of functions.
So now, when you’re trying to create architecture,
you would like to create admissible set of functions,
which all your tricks to use not all functions,
but some subset of this set of functions.
Say, when you’re introducing convolutional net,
it is way to make this subset useful for you.
But from my point of view, convolutional,
it is something you want to keep some invariants,
say, translation invariants.
But now, if you understand this and you cannot explain
on the level of ideas what neural network does,
you should agree that it is much better
to have a set of functions.
And they say, this set of functions should be admissible.
It must keep this invariant, this invariant,
and that invariant.
You know that as soon as you incorporate
new invariant, set of functions becomes smaller and smaller
and smaller.
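A toy sketch of how each added invariant shrinks the admissible set. Everything here is assumed for illustration: the one-dimensional data, the pool of threshold classifiers, the two predicates, and the rule that a candidate is kept only if its predicate-weighted average over the training inputs matches the one computed from the labels.

```python
import numpy as np

# Toy training set: 21 points on a line, labelled 1 to the right of 0.25.
X = np.linspace(-1.0, 1.0, 21)
y = (X > 0.25).astype(float)

# Candidate pool: threshold rules pointing right (x > t) and left (x < t).
thresholds = np.linspace(-0.95, 0.95, 20)
candidates = [(lambda x, t=t: (x > t).astype(float)) for t in thresholds]
candidates += [(lambda x, t=t: (x < t).astype(float)) for t in thresholds]

# Two hypothetical predicates: a constant and the coordinate itself.
predicates = [lambda x: np.ones_like(x), lambda x: x]

def keeps_invariant(f, psi, tol=1e-9):
    """Candidate f keeps the invariant if its psi-weighted average matches the labels'."""
    return abs(np.mean(psi(X) * f(X)) - np.mean(psi(X) * y)) <= tol

def shrink(pool, preds):
    """Apply the predicates one by one and record how many candidates survive."""
    counts = [len(pool)]
    admissible = list(pool)
    for psi in preds:
        admissible = [f for f in admissible if keeps_invariant(f, psi)]
        counts.append(len(admissible))
    return admissible, counts

admissible, counts = shrink(candidates, predicates)
print(counts)   # size of the admissible set after 0, 1, and 2 predicates
```

In this toy setting the counts can only go down, and a predicate is "good" in the sense discussed here when it removes many candidates at once.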
But all the invariants are specified by you, the human.
Yeah, but what I hope that there is a standard predicate,
like Propp showed, that’s what I want
to find for digit recognition.
If we start, it is completely new area,
what is intelligence about on the level,
starting from Plato’s idea, what is world of ideas.
And I believe that is not too many.
But it is amusing that mathematicians doing something,
a neural network in general function,
but people from literature, from art, they use this all
the time.
That’s right.
Invariants saying, it is great how people describe music.
We should learn from that.
And something on this level.
But so why Vladimir Propp, who was just theoretical,
who studied theoretical literature, he found that.
You know what?
Let me throw that right back at you,
because there’s a little bit of a,
that’s less mathematical and more emotional, philosophical,
Vladimir Propp.
I mean, he wasn’t doing math.
No.
And you just said another emotional statement,
which is you believe that this Plato world of ideas is small.
I hope.
Do you, what’s your intuition, though?
If we can linger on it.
You know, it is not just small or big.
I know exactly.
Then when I introducing some predicate,
I decrease set of functions.
But my goal to decrease set of function much.
By as much as possible.
Good predicate, which does this, then
I should choose next predicate, which decrease set
as much as possible.
So set of good predicate, it is such
that they decrease this amount of admissible function.
So if each good predicate significantly
reduces the set of admissible functions,
then there naturally should not be that many good predicates.
No, but if you reduce very well the VC dimension
of the function, of admissible set of function, it’s small.
And you need not too much training data to do well.
And VC dimension, by the way, is some measure of capacity
of this set of functions.
Right.
Roughly speaking, how many function in this set.
So you’re decreasing, decreasing.
And it makes easy for you to find function
you’re looking for.
But the most important part, to create good admissible set
of functions.
And it probably, there are many ways.
But the good predicates such that they can do that.
So for this duck, you should know a little bit about duck.
Because what are the three fundamental laws of ducks?
Looks like a duck, swims like a duck, and quacks like a duck.
You should know something about ducks to be able to.
Not necessarily.
Looks like, say, horse.
It’s also good.
So it’s not, it generalizes from ducks.
And talk like, and make sound like horse or something.
And run like horse, and moves like horse.
It is general, it is general predicate
that this applied to duck.
But for duck, you can say, play chess like duck.
You cannot say play chess like duck.
Why not?
So you’re saying you can, but that would not be a good.
No, you will not reduce a lot of functions.
You would not do, yeah, you would not
reduce the set of functions.
So you can, the story is formal story, mathematical story.
Is that you can use any function you want as a predicate.
But some of them are good, some of them are not,
because some of them reduce a lot of functions
to admissible set, some of them not.
But the question is, and I’ll probably
keep asking this question, but how do we find such,
what’s your intuition?
Handwritten recognition.
How do we find the answer to your challenge?
Yeah, I understand it like that.
I understand what.
What defined?
What it means, a new predicate.
Yeah.
Like guy who understand music can say this word,
which he described when he listened to music.
He understand music.
He use not too many different, oh, you can do like Propp.
You can make collection.
What he talking about music, about this, about that.
It’s not too many different situation he described.
Because we mentioned Vladimir Propp a bunch.
Let me just mention, there’s a sequence of 31
structural notions that are common in stories.
And I think.
You call it units.
Units.
And I think they resonate.
I mean, it starts just to give an example,
absentation: a member of the hero’s community
or family leaves the security of the home environment.
Then it goes to the interdiction,
a forbidding edict or command is passed upon the hero.
Don’t go there.
Don’t do this.
The hero is warned against some action.
Then step three, violation of interdiction.
Break the rules, break out on your own.
Then reconnaissance.
The villain makes an effort to attain knowledge,
needed to fulfill their plan, and so on.
It goes on like this, ends in a wedding, number 31.
Happily ever after.
No, he just gave description of all situations.
He understands this world.
Of folktales.
Yeah, not folktales, but stories.
And these stories not in just folktales.
These stories in detective serials as well.
And probably in our lives.
We probably live.
Read this.
And then they wrote that this predicate is good
for different situation.
From movie, for theater.
By the way, there’s also criticism, right?
There’s another way to interpret narratives
from Claude Lévi-Strauss.
I don’t know.
I am not in this business.
No, I know, it’s theoretical literature,
but it’s looking at paradigms behind things.
It’s always the discussion, yeah.
But at least there is units.
It’s not too many units that can describe.
But this guy probably gives another units.
Or another way of…
Exactly, another set of units.
Another set of predicates.
It doesn’t matter how.
But they exist.
Probably.
My question is, whether given those units,
whether without our human brains to interpret these units,
they would still hold as much power as they have.
Meaning, are those units enough
when we give them to an alien species?
Let me ask you.
Do you understand digit images?
No, I don’t understand.
No, no, no.
When you can recognize these digit images,
it means that you understand.
Yes, exactly.
You understand characters, you understand…
No, no, no, no.
It’s the imitation versus understanding question,
because I don’t understand the mechanism
by which I understand.
No, no, no.
I’m not talking about, I’m talking about predicates.
You understand that it involves symmetry,
maybe structure, maybe something else.
I cannot formulate.
I just was able to find symmetries, degree of symmetries.
That’s really good.
So this is a good line.
I feel like I understand the basic elements
of what makes a good handwriting recognition system, my own.
Like symmetry connects with me.
It seems like that’s a very powerful predicate.
My question is, is there a lot more going on
that we’re not able to introspect?
Maybe I need to be able to understand
a huge amount in the world of ideas,
thousands of predicates, millions of predicates
in order to do handwritten recognition.
I don’t think so.
So both your hope and your intuition
are such that very few predicates are enough.
You’re using digits, you’re using examples as well.
Theory says that if you will use all possible functions
from Hilbert space, all possible predicate,
you don’t need training data.
You just will have admissible set of function
which contain one function.
Yes.
So the trade off is when you’re not using all predicates,
you’re only using a few good predicates
you need to have some training data.
Yes, exactly.
The more good predicates you have,
the less training data you need.
Exactly.
That is intelligent.
Still, okay, I’m gonna keep asking the same dumb question,
handwritten recognition to solve the challenge.
You kind of propose a challenge that says
we should be able to get state of the art MNIST error rates
by using very few, 60, maybe fewer examples per digit.
What kind of predicates do you think it will look like?
That is the challenge.
So people who will solve this problem,
they will answer.
Do you think they’ll be able to answer it
in a human explainable way?
They just need to write function, that’s it.
But so can that function be written, I guess,
by an automated reasoning system?
Whether we’re talking about a neural network
learning a particular function or another mechanism?
No, I’m not against neural network.
I’m against admissible set of function
which create neural network.
You did it by hand.
You don’t do it by invariance, by predicate, by reason.
But neural networks can then reverse,
do the reverse step of helping you find a function
that just, the task of a neural network
is to find a disentangled representation, for example,
that they call, is to find that one predicate function
that really captures some kind of essence.
One, not the entire essence, but one very useful essence
of this particular visual space.
Do you think that’s possible?
Listen, I’m grasping, hoping there’s an automated way
to find good predicates, right?
So the question is what are the mechanisms
of finding good predicates, ideas
that you think we should pursue?
A young grad student listening right now.
I gave example.
So find situation where predicate which you’re suggesting
don’t create invariant.
It’s like in physics.
Find situation where existing theory cannot explain it.
Find situation where the existing theory
can’t explain it.
So you’re finding contradictions.
Find contradiction, and then remove this contradiction.
But in my case, what means contradiction,
you find function which, if you will use this function,
you’re not keeping invariants.
This is really the process of discovering contradictions.
Yeah.
It is like in physics.
Find situation where you have contradiction
for one of the property, for one of the predicate.
Then include this predicate, making invariants,
and solve again this problem.
Now you don’t have contradiction.
But it is not the best way, probably, I don’t know,
to looking for predicate.
That’s just one way, okay.
That, no, no, it is brute force way.
The brute force way.
What about the ideas of what,
big umbrella term of symbolic AI?
There’s what in the 80s with expert systems,
sort of logic reasoning based systems.
Is there hope there to find some,
through sort of deductive reasoning,
to find good predicates?
I don’t think so.
I think that just logic is not enough.
It’s kind of a compelling notion, though.
You know, that when smart people sit in a room
and reason through things, it seems compelling.
And making our machines do the same is also compelling.
So, everything is very simple.
When you have infinite number of predicate,
you can choose the function you want.
You have invariants and you can choose the function you want.
But you have to have not too many invariants
to solve the problem.
So, and have from infinite number of function
to select finite number
and hopefully small number of functions,
which is good enough to extract small set
of admissible functions.
So, they will be admissible, it’s for sure,
because every function just decrease set of function
and leaving it admissible.
But it will be small.
But why do you think logic based systems don’t,
can’t help, intuition, not?
Because you should know reality.
You should know life.
This guy like Propp, he knows something.
And he tried to put in invariant his understanding.
That’s the human, yeah, but see,
you’re putting too much value into Vladimir Propp
knowing something.
No, it is, in the story, what means you know life?
What it means?
You know common sense.
No, no, you know something.
Common sense, it is some rules.
You think so?
Common sense is simply rules?
Common sense is every, it’s mortality,
it’s fear of death, it’s love, it’s spirituality,
it’s happiness and sadness.
All of it is tied up into understanding gravity,
which is what we think of as common sense.
I don’t really need to discuss so wide.
I want to discuss, understand digit recognition.
Anytime I bring up love and death,
you bring it back to digit recognition, I like it.
No, you know, it is doable because there is a challenge.
Yeah.
Which I see how to solve it.
If I will have a student concentrate on this work,
I will suggest something to solve.
You mean handwritten recognition?
Yeah, it’s a beautifully simple, elegant, and yet.
I think that I know invariants which will solve this.
You do?
I think so, yes.
But it is not universal, it is maybe,
I want some universal invariants
which are good not only for digit recognition,
for image understanding.
So let me ask, how hard do you think
is 2D image understanding?
So if we, we can kind of intuit handwritten recognition.
How big of a step, leap, journey is it from that?
If I gave you good, if I solved your challenge
for handwritten recognition,
how long would my journey then be from that
to understanding more general, natural images?
Immediately, you will understand this
as soon as you will make a record.
Because it is not for free.
As soon as you will create several invariants
which will help you to get the same performance
that the best neural net did using 100,
there might be more than 100 times less examples,
you have to have something smart to do that.
And you’re saying?
That is invariant, it is predicate.
Because you should put some idea how to do that.
But okay, let me just pause.
Maybe it’s a trivial point, maybe not.
But handwritten recognition feels like a 2D,
two dimensional problem.
And it seems like how much complicated is the fact
that most images are projection of a three dimensional world
onto a 2D plane.
It feels like for a three dimensional world,
we need to start understanding common sense
in order to understand an image.
It’s no longer visual shape and symmetry.
It’s having to start to understand concepts
of, understand life.
Yeah, you’re talking that there are different invariant,
different predicate, yeah.
And potentially much larger number.
You know, maybe, but let’s start from simple.
Yeah, but you said that it would be immediate.
No, you know, I cannot think about things
which I don’t understand.
This I understand, but I’m sure that I don’t understand
everything there.
Yeah, that’s the difference.
Do as simple as possible, but not simpler.
And that is exact case.
With handwritten.
Yeah, but that’s the difference between you and I.
I welcome and enjoy thinking about things
I completely don’t understand.
Because to me, it’s a natural extension
without having solved handwritten recognition
to wonder how difficult is the next step
of understanding 2D, 3D images.
Because ultimately, while the science of intelligence
is fascinating, it’s also fascinating to see
how that maps to the engineering of intelligence.
And recognizing handwritten digits is not,
doesn’t help you, it might, it may not help you
with the problem of general intelligence.
We don’t know.
It’ll help you a little bit.
We don’t know how much.
It’s unclear.
Yeah.
It might very much.
But I would like to make a remark.
Yes.
I start not from very primitive problem,
make a challenge problem.
I start with very general problem, with Plato.
So you understand, and it comes from Plato
to digit recognition.
So you basically took Plato and the world
of forms and ideas and mapped and projected
into the clearest, simplest formulation
of that big world.
You know, I would say that I did not understand Plato
until recently, and until I consider
weak convergence and then predicate,
and then, oh, this is what Plato told.
So.
Can you linger on that?
Like why, how do you think about this world of ideas
and world of things in Plato?
No, it is metaphor.
It is.
It’s a metaphor, for sure.
Yeah.
It’s a compelling, it’s a poetic
and a beautiful metaphor.
Yeah, yeah, yeah.
But what, can you?
But it is a way how you should try to understand
how to talk ideas in the world.
So from my point of view,
it is very clear, but it is a line.
All the time, people looking for that.
Say, Plato, then Hegel: whatever reasonable, it exists;
whatever exists, it is reasonable.
I don’t know what he have in mind reasonable.
Right, these philosophers again,
their words.
No, no, no, no, no, no, no.
It is next stop of Wigner.
That mathematics understand something of reality.
It is the same Plato line.
And then it comes suddenly to Vladimir Propp.
Look, 31 ideas, 31 units, and this covers everything.
There’s abstractions, ideas that represent our world.
Our world, and we should always try to reach into that.
Yeah, but you should make a projection on reality.
But understanding is, it is abstract ideas.
You have in your mind several abstract ideas
which you can apply to reality.
And reality in this case,
so if you look at machine learning as data.
This example, data.
Data.
Okay, let me put this on you
because I’m an emotional creature.
I’m not a mathematical creature like you.
I find compelling the idea,
forget the space, the sea of functions.
There’s also a sea of data in the world.
And I find compelling that there might be,
like you said, teacher,
small examples of data that are most useful
for discovering good,
whether it’s predicates or good functions,
that the selection of data may be a powerful journey,
a useful, you know, coming up with a mechanism
for selecting good data might be useful too.
Do you find this idea of finding the right data set
interesting at all?
Or do you kind of take the data set as a given?
I think that it is, you know, my theme is very simple.
You have huge set of functions.
If you will apply, and you have not too many data,
if you pick up function which describes this data,
you will do not very well.
You will.
Like randomly pick up.
Yeah, you will overfit.
Yeah, it will be overfitting.
So you should decrease set of function
from which you’re picking up one.
So you should go somehow to admissible set of function.
And this is what weak convergence is about.
So, but from another point of view,
to make admissible set of function,
you need just a predicate, just function
which you will take in inner product,
which you will measure property of your function.
And that is how it works.
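The idea in this exchange, measuring a property of a candidate function by taking its inner product with a predicate and keeping only the functions that reproduce what the data shows, can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the method from the lecture: the names `invariant_gap` and `admissible`, the tolerance `tol`, the toy data, and the two predicates (the constant 1 and the identity) are all hypothetical choices. The check used is that the sample average of predicate(x) * f(x) should be close to the sample average of predicate(x) * y.

```python
import random

def invariant_gap(predicate, f, xs, ys):
    """Absolute difference between mean(predicate(x) * f(x)) and mean(predicate(x) * y)."""
    n = len(xs)
    lhs = sum(predicate(x) * f(x) for x in xs) / n
    rhs = sum(predicate(x) * y for x, y in zip(xs, ys)) / n
    return abs(lhs - rhs)

def admissible(candidates, predicates, xs, ys, tol=0.05):
    """Return the names of candidates whose gap is within tol for every predicate."""
    return [
        name
        for name, f in candidates.items()
        if all(invariant_gap(p, f, xs, ys) <= tol for p in predicates)
    ]

# Toy data (assumed for illustration): the label is 1 when x is positive.
random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(200)]
ys = [1.0 if x > 0 else 0.0 for x in xs]

# A small, hand-picked set of candidate functions.
candidates = {
    "step_at_0": lambda x: 1.0 if x > 0 else 0.0,
    "always_one": lambda x: 1.0,
    "always_zero": lambda x: 0.0,
    "step_at_0.5": lambda x: 1.0 if x > 0.5 else 0.0,
}

# Two simple predicates: the constant 1 and the identity.
predicates = [lambda x: 1.0, lambda x: x]

kept = admissible(candidates, predicates, xs, ys)
print(kept)
```

Each predicate removes candidates that disagree with the data on that one measured property, so the surviving set shrinks as predicates are added, while a function that matches the labels exactly is never removed.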
No, I get it, I get it, I understand it,
but do you, the reality is.
But let’s think about examples.
You have huge set of function,
and you have several examples.
If you just trying to keep, take function
which satisfies these examples, you still will overfit.
You need decrease, you need admissible set of function.
Absolutely, but what, say you have more data than functions.
So sort of consider the, I mean,
maybe not more data than functions,
because that’s impossible.
But what, I was trying to be poetic for a second.
I mean, you have a huge amount of data,
a huge amount of examples.
But amount of function can be even bigger.
It can get bigger, I understand.
Everything is.
There’s always a bigger boat.
Full Hilbert space.
I got you, but okay.
But you don’t find the world of data
to be an interesting optimization space.
Like the optimization should be in the space of functions.
Creating admissible set of functions.
Admissible set of functions.
No, you know, even from the classical VC theory,
from structural risk minimization,
you should organize function in the way
that they will be useful for you.
Right.
And that is admissible set.
The way you’re thinking about useful
is you’re given a small set of examples.
Useful small, small set of function
which contain function I’m looking for.
Yeah, but looking for based on
the empirical set of small examples.
Yeah, but that is another story.
I don’t touch it.
Because I believe that this small examples
is not too small.
Say 60 per class.
Law of large numbers works.
I don’t need uniform law.
The story is that in statistics there are two law.
Law of large numbers and uniform law of large numbers.
So I want to be in situation where I use
law of large numbers but not uniform law of large numbers.
Right, so 60 is law of large, it’s large enough.
I hope, no, it still need some evaluations,
some bounds.
But the idea is the following that
if you trust that
say this average gives you something close to expectations
so you can talk about that, about this predicate.
And that is basis of human intelligence.
Good predicates is the,
the discovery of good predicates is the basis of human intelligence.
It is discovery of your understanding world.
Of your methodology of understanding world.
Because you have several function
which you will apply to reality.
Can you say that again?
So you’re…
You have several functions predicate.
But they’re abstract.
Yes.
Then you will apply them to reality, to your data.
And you will create in this way predicate.
Which is useful for your task.
But predicate are not related specifically to your task.
To this your task.
It is abstract functions.
Which being applying, applied to…
Many tasks that you might be interested in.
It might be many tasks, I don’t know.
Or…
Different tasks.
Well they should be many tasks, right?
I believe like, like in Propp’s case.
It was for fairytales, but it’s happened everywhere.
Okay, so we talked about images a little bit.
But, can we talk about Noam Chomsky for a second?
No, I believe I…
I don’t know him very well.
Personally, well…
Not personally, I don’t know.
His ideas.
Well let me just say,
do you think language, human language,
is essential to expressing ideas?
As Noam Chomsky believes.
So like, language is at the core
of our formation of predicates.
The human language.
For me, language and all the story of language
is very complicated.
I don’t understand this.
And I am not…
I thought about…
Nobody does.
I am not ready to work on that.
Because it’s so huge.
It is not for me, and I believe not for our century.
The 21st century.
Not for 21st century.
You should learn something, a lot of stuff,
from simple task like digit recognition.
So you think, okay, you think digit recognition,
2D image, how would you more abstractly define
digit recognition?
It’s 2D image, symbol recognition, essentially.
I mean, I’m trying to get a sense,
sort of thinking about it now,
having worked with MNIST forever,
how small of a subset is this
of the general vision recognition problem
and the general intelligence problem?
Is it…
Yeah.
Is it a giant subset?
Is it not?
And how far away is language?
You know, let me refer to Einstein.
Take the simplest problem, as simple as possible,
but not simpler.
And this is challenge, this simple problem.
But it’s simple by idea, but not simple to get it.
When you will do this, you will find some predicate,
which helps it a bit.
Well, yeah, I mean, with Einstein, you can,
you look at general relativity,
but that doesn’t help you with quantum mechanics.
That’s another story.
You don’t have any universal instrument.
Yes, so I’m trying to wonder which space we’re in,
whether handwritten recognition is like general relativity,
and then language is like quantum mechanics.
So you’re still gonna have to do a lot of mess
to universalize it.
But I’m trying to see,
so what’s your intuition why handwritten recognition
is easier than language?
Just, I think a lot of people would agree with that,
but if you could elucidate sort of the intuition of why.
I don’t know, no, I don’t think in this direction.
I just think in directions that this is problem,
which if we will solve it well,
we will create some abstract understanding of images.
Maybe not all images.
I would like to talk to guys who doing in real images
in Columbia University.
What kind of images, unreal?
Real images.
Yeah, what they’re ready, is there a predicate,
what can be predicate?
I think still symmetry will play role in real life images,
in any real life images, 2D images.
Let’s talk about 2D images.
Because that’s what we know.
A neural network was created for 2D images.
So the people I know in vision science, for example,
the people who study human vision,
that they usually go to the world of symbols
and like handwritten recognition,
but not really, it’s other kinds of symbols
to study our visual perception system.
As far as I know, not much predicate type of thinking
is understood about our vision system.
They did not think in this direction.
They don’t, yeah, but how do you even begin
to think in that direction?
That’s a, I would like to discuss with them.
Yeah.
Because if we will be able to show that it is what working,
and theoretical scheme, it’s not so bad.
So the unfortunate, so if we compare to language,
language is like letters, finite set of letters,
and a finite set of ways you can put together those letters.
So it feels more amenable to kind of analysis.
With natural images, there is so many pixels.
No, no, no, letter, language is much, much more complicated.
It involves a lot of different stuff.
It’s not just understanding of very simple class of tasks.
I would like to see list of task with language involved.
Yes, so there’s a lot of nice benchmarks now
in natural language processing from the very trivial,
like understanding the elements of a sentence,
to question answering, to much more complicated
where you talk about open domain dialogue.
The natural question is, with handwritten recognition,
is really the first step of understanding
visual information.
Right.
But even our records show that we go in the wrong direction
because we need 60,000 digits.
So even this first step, so forget about talking
about the full journey, this first step
should be taking in the right direction.
No, no, wrong direction because 60,000 is unacceptable.
No, I’m saying it should be taken in the right direction
because 60,000 is not acceptable.
If you can talk, it’s great, we have half percent of error.
And hopefully the step from doing handwritten recognition
using very few examples, the step towards what babies do
when they crawl and understand their physical environment.
I know you don’t know about babies.
If you will do from very small examples,
you will find principles which are different
from what we’re using now.
And so it’s more or less clear.
That means that you will use weak convergence,
not just strong convergence.
Do you think these principles
will naturally be human interpretable?
Oh, yeah.
So like when we’ll be able to explain them
and have a nice presentation to show
what those principles are, or are they very,
going to be very kind of abstract kinds of functions?
For example, I talked yesterday about symmetry.
Yes.
And I gave very simple examples.
The same will be like that.
You gave like a predicate of a basic for?
For symmetries.
Yes, for different symmetries and you have for?
Degree of symmetries, that is important.
Not just symmetry.
Exists, doesn’t exist; degree of symmetry.
Yeah, for handwritten recognition.
No, it’s not for handwritten, it’s for any images.
But I would like apply to handwritten.
Right, in theory it’s more general, okay, okay.
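A degree-of-symmetry predicate of the kind mentioned here could be sketched as follows. The conversation does not give a formula, so this is an assumed, minimal version: the image is a list of rows with pixel values in [0, 1], and the score is 1 minus the mean absolute difference between the image and its mirror. The function names are hypothetical.

```python
def vertical_symmetry_degree(image):
    """Score in [0, 1]: 1.0 when the image equals its left-right mirror."""
    total = 0.0
    count = 0
    for row in image:
        # Compare each pixel with its mirror across the vertical axis.
        for a, b in zip(row, reversed(row)):
            total += abs(a - b)
            count += 1
    return 1.0 - total / count

def horizontal_symmetry_degree(image):
    """Score in [0, 1]: 1.0 when the image equals its top-bottom mirror."""
    total = 0.0
    count = 0
    # Compare each row with its mirror across the horizontal axis.
    for row, mirrored in zip(image, reversed(image)):
        for a, b in zip(row, mirrored):
            total += abs(a - b)
            count += 1
    return 1.0 - total / count

# A ring-like shape is symmetric both ways; an "L" shape is not.
ring = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
ell = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
]

print(vertical_symmetry_degree(ring), vertical_symmetry_degree(ell))
print(horizontal_symmetry_degree(ring), horizontal_symmetry_degree(ell))
```

The point of returning a number rather than a yes/no answer is the one made in the conversation: what matters is the degree of symmetry, not only whether symmetry exists.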
So a lot of the things we’ve been talking about
falls, we’ve been talking about philosophy a little bit,
but also about mathematics and statistics.
A lot of it falls into this idea,
a universal idea of statistical theory of learning.
What is the most beautiful and sort of powerful
or essential idea you’ve come across,
even just for yourself personally in the world
of statistics or statistical theory of learning?
Probably uniform convergence, which we did
with Alexey Chervonenkis.
Can you describe uniform convergence?
You have law of large numbers.
So for any function, expectation of function,
average of function converges to expectation.
But if you have set of functions,
for any function it is true.
But it should converge simultaneously
for all set of functions.
And for learning, you need uniform convergence.
Just convergence is not enough.
Because when you pick up one which gives minimum,
you can pick up one function which does not converge
and it will give you the best answer for this function.
So you need uniform convergence to guarantee learning.
So learning does not rely on trivial law of large numbers,
it relies on uniform law.
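The gap between the ordinary law of large numbers and the uniform one can be seen in a small simulation. This is a sketch under assumed settings (20 coin-flip labels, 5000 candidate functions that guess at random), not something taken from the conversation. Every candidate has a true error rate of 0.5, and for any single one the empirical error is a fair estimate of that. Once the candidate with the minimum empirical error is picked from a large set, its empirical error is far below 0.5 even though nothing was learned.

```python
import random

def empirical_error(predictions, labels):
    """Fraction of sample points where the prediction differs from the label."""
    return sum(p != y for p, y in zip(predictions, labels)) / len(labels)

random.seed(1)
n = 20  # sample size (assumed for illustration)
labels = [random.randint(0, 1) for _ in range(n)]

def random_function_error():
    """Empirical error of a function that guesses labels at random.

    Its true error rate is 0.5 by construction.
    """
    guesses = [random.randint(0, 1) for _ in range(n)]
    return empirical_error(guesses, labels)

# One fixed function: the law of large numbers applies to it directly.
single = random_function_error()

# Minimum over many functions: convergence has to hold for all of them at once.
best_of_many = min(random_function_error() for _ in range(5000))

print("one function:", single)
print("best of 5000:", best_of_many)
```

The selected function looks good only because the selection step searched for the largest deviation between average and expectation, which is why a guarantee has to hold simultaneously over the whole set.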
But idea of convergence exists in statistics for a long time.
But it is interesting that as I think about myself,
how stupid I was 50 years, I did not see weak convergence.
I work on strong convergence.
But now I think that most powerful is weak convergence.
Because it makes admissible set of functions.
And even in old proverbs,
when people try to understand recognition, about duck law,
looks like a duck and so on, they use weak convergence.
People in language, they understand this.
But when we’re trying to create artificial intelligence,
we went in different way.
We just consider strong convergence arguments.
So reducing the set of admissible functions,
you think there should be effort put into understanding
the properties of weak convergence?
You know, in classical mathematics, in Hilbert space,
there are only two ways,
two form of convergence, strong and weak.
Now we can use both.
That means that we did everything.
And it so happened that when we use Hilbert space,
which is very rich space, space of continuous functions,
which are square integrable.
So we can apply weak and strong convergence for learning
and have closed form solution.
So it is computationally simple.
For me, it is sign that it is right way.
Because you don’t need any heuristic here,
just do whatever you want.
But now the only what left is this concept
of what is predicate, but it is not statistics.
By the way, I like the fact that you think that heuristics
are a mess that should be removed from the system.
So closed form solution is the ultimate goal.
No, it so happened that when you’re using right instrument,
you have closed form solution.
Do you think intelligence, human level intelligence,
when we create it,
will have something like a closed form solution?
You know, now I’m looking on bounds,
which I gave bounds for convergence.
And when I’m looking for bounds,
I’m thinking what is the most appropriate kernel
for this bound would be.
So we know that in say,
all our businesses, we use radial basis function.
But looking on the bound,
I think that I start to understand that maybe
we need to make corrections to radial basis function
to be closer to work better for this bounds.
So I’m again trying to understand what type of kernel
have best approximation,
best fit to this bound.
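For reference, the standard radial basis function kernel being discussed is K(x, z) = exp(-gamma * ||x - z||^2). The sketch below shows only that standard form; the corrections hinted at here are not specified in the conversation and are not attempted. The function name and the default `gamma` are assumptions.

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """Standard RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

# Identical points give 1.0; the value decays as the points move apart.
print(rbf_kernel([0.0, 0.0], [0.0, 0.0]))
print(rbf_kernel([0.0, 0.0], [1.0, 0.0]))
print(rbf_kernel([0.0, 0.0], [3.0, 4.0], gamma=0.1))
```

The single width parameter `gamma` controls how quickly similarity decays with distance, so any change to the kernel's shape would be a change to this formula rather than to its parameter.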
Sure, so there’s a lot of interesting work
that could be done in discovering better functions
than radial basis functions for bounds you find.
It still comes from,
you’re looking to math and trying to understand what.
From your own mind, looking at the, I don’t know.
Then I’m trying to understand what will be good for that.
Yeah, but to me, there’s still a beauty.
Again, maybe I’m a descendant of Alan Turing to heuristics.
To me, ultimately, intelligence will be a mess of heuristics.
And that’s the engineering answer, I guess.
Absolutely.
When you’re doing say, self driving cars,
the great guy who will do this.
It doesn’t matter what theory behind that.
Who has a better feeling how to apply it.
But by the way, it is the same story about predicates.
Because you cannot create rule for,
situation is much more than you have rule for that.
But maybe you can have more abstract rule
than it will be less literal.
It is the same story about ideas
and ideas applied to specific cases.
But still you should reach.
You cannot avoid this.
Yes, of course.
But you should still reach for the ideas
to understand the science.
Okay, let me kind of ask, do you think neural networks
or functions can be made to reason?
So what do you think, we’ve been talking about intelligence,
but this idea of reasoning,
there’s an element of sequentially disassembling,
interpreting the images.
So when you think of handwritten recognition, we kind of think
that there’ll be a single, there’s an input and output.
There’s not a recurrence.
What do you think about sort of the idea of recurrence,
of going back to memory and thinking through this
sort of sequentially mangling the different representations
over and over until you arrive at a conclusion?
Or is ultimately all that can be wrapped up into a function?
No, you’re suggesting that let us use this type of algorithm.
When I started thinking, I first of all,
starting to understand what I want.
Can I write down what I want?
And then I’m trying to formalize.
And when I do that, I think I have to solve this problem.
And till now I did not see a situation where you need recurrence.
But do you observe human beings?
Yeah.
You try to, it’s the imitation question, right?
It seems that human beings reason
this kind of sequentially sort of,
does that inspire in you a thought that we need to add that
into our intelligence systems?
You’re saying, okay, I mean, you’ve kind of answered saying
until now I haven’t seen a need for it.
And so because of that, you don’t see a reason
to think about it.
You know, most of things I don’t understand.
In reasoning in human, it is for me too complicated.
For me, the most difficult part is to ask questions,
to good questions, how it works,
how people asking questions, I don’t know this.
You said that machine learning is not only
about technical things, speaking of questions,
but it’s also about philosophy.
So what role does philosophy play in machine learning?
We talked about Plato, but generally thinking
in this philosophical way, does it have,
how does philosophy and math fit together in your mind?
First ideas and then their implementation.
It’s like predicate, like say admissible set of functions.
It comes together, everything.
Because the first iteration of theory was done 50 years ago.
I told that, this is theory.
So everything’s there, if you have data you can,
and your set of function has not big capacity.
So low VC dimension, you can do that.
You can make structural risk minimization, control capacity.
But you was not able to make admissible set of function good.
Now when suddenly realize that we did not use
another idea of convergence, which we can,
everything comes together.
But those are mathematical notions.
Philosophy plays a role of simply saying
that we should be swimming in the space of ideas.
Let’s talk what is philosophy.
Philosophy means understanding of life.
So understanding of life, say people like Plato,
they understand on very high abstract level of life.
So, and whatever I doing,
just implementation of my understanding of life.
But every new step, it is very difficult.
For example, to find this idea
that we need weak convergence was not simple for me.
So that required thinking about life a little bit.
Hard to trace, but there was some thought process.
I’m working, I’m thinking about the same problem
for 50 years or more, and again, and again, and again.
I’m trying to be honest and that is very important.
Not to be very enthusiastic, but concentrate
on whatever we was not able to achieve, for example.
And understand why.
And now I understand that because I believe in math,
I believe that in Wigner’s idea.
But now when I see that there are only two way
of convergence and we’re using both,
that means that we must do as well as people doing.
But now, exactly in philosophy
and what we know about predicate,
how we understand life, can we describe as a predicate.
I thought about that and that is more or less obvious
level of symmetry.
But next, I have a feeling,
it’s something about structures.
But I don’t know how to formulate,
how to measure measure of structure and all this stuff.
And the guy who will solve this challenge problem,
then when we were looking how he did it,
probably just only symmetry is not enough.
But something like symmetry will be there.
Structure will be there.
Oh yeah, absolutely.
Symmetry will be there and level of symmetry will be there.
And level of symmetry, antisymmetry, diagonal, vertical.
And I even don’t know how you can use
in different direction idea of symmetry, it’s very general.
But it will be there.
I think that people very sensitive to idea of symmetry.
But there are several ideas like symmetry.
As I would like to learn.
But you cannot learn just thinking about that.
You should do challenging problems
and then analyze them, why it was able to solve them.
And then you will see.
Very simple things, it’s not easy to find.
But even with talking about this every time.
I was surprised, I tried to understand.
These people describe in language
strong convergence mechanism for learning.
I did not see, I don’t know.
But weak convergence, this duck story
and story like that when you will explain to kid,
you will use weak convergence argument.
It looks like a duck, it does like a duck does.
But when you try to formalize, you’re just ignoring this.
Why, why 50 years from start of machine learning?
And that’s the role of philosophy, thinking about life.
I think that maybe, I don’t know.
Maybe this is theory also, we should blame for that
because empirical risk minimization and all this stuff.
And if you read now textbooks,
they just about bound about empirical risk minimization.
They don’t looking for another problem like admissible set.
But on the topic of life, perhaps we,
you could talk in Russian for a little bit.
What’s your favorite memory from childhood?
Oh, music.
How about, can you try to answer in Russian?
Music?
It was very cool when…
What kind of music?
Classic music.
What’s your favorite?
Well, different composers.
At first, it was Vivaldi, I was surprised that it was possible.
And then when I understood Bach, I was absolutely shocked.
By the way, from him I think that there is a predicate,
like a structure.
In Bach?
Well, of course.
Because you can just feel the structure.
And I don’t think that different elements of life
are very much divided, in the sense of predicates.
Everywhere structure, in painting structure,
in human relations structure.
Here’s how to find these high level predicates, it’s…
In Bach and in life, everything is connected.
Now that we’re talking about Bach,
let’s switch back to English,
because I like Beethoven and Chopin, so…
Well, Chopin, it’s another amusing story.
But Bach, if we talk about predicates,
Bach probably has the most sort of
well defined predicates that underlie it.
It is very interesting to read what critics
are writing about Bach, which words they’re using.
They’re trying to describe predicates.
And then Chopin, it is very different vocabulary,
very different predicates.
And I think that if you will make collection of that,
so maybe from this you can describe predicate
for digit recognition as well.
From Bach and Chopin.
No, no, no, not from Bach and Chopin.
From the critic interpretation of the music, yeah.
When they’re trying to explain you music, what they use.
As they use, they describe high level ideas
of Plato’s ideas, what behind this music.
That’s brilliant.
So art is not self explanatory in some sense.
So you have to try to convert it into ideas.
It is ill-posed problems.
When you go from ideas to the representation,
it is easy way.
But when you’re trying to go back, it is ill-posed problems.
But nevertheless, I believe that when you’re looking
from that, even from art, you will be able to find
predicates for digit recognition.
That’s such a fascinating and powerful notion.
Do you ponder your own mortality?
Do you think about it?
Do you fear it?
Do you draw insight from it?
About mortality, no, yeah.
Are you afraid of death?
Not too much, not too much.
It is pity that I will not be able to do something
which I think I have a feeling to do that.
For example, I will be very happy to work with guys
theoretician from music to write this collection
of description, how they describe music,
how they use that predicate, and from art as well.
Then take what is in common and try to understand
predicate which is absolute for everything.
And then use that for visual recognition
and see if there is a connection.
Yeah, exactly.
Ah, there’s still time.
We got time.
Ha ha ha ha.
Yeah.
We got time.
It take years and years and years.
Yes, yeah, it’s a long way.
Well, see, you’ve got the patient mathematician’s mind.
I think it could be done very quickly and very beautifully.
I think it’s a really elegant idea.
Yeah, but also.
Some of many.
Yeah, you know, the most time,
it is not to make this collection to understand
what is the common to think about that once again
and again and again.
Again and again and again, but I think sometimes,
especially just when you say this idea now,
even just putting together the collection
and looking at the different sets of data,
language, trying to interpret music,
criticize music, and images,
I think there’ll be sparks of ideas that’ll come.
Of course, again and again, you’ll come up with better ideas,
but even just that notion is a beautiful notion.
I even have some example.
Yes, so I have friend
who was specialist in Russian poetry.
She is professor of Russian poetry.
She did not write poems,
but she know a lot of stuff.
She make book, several books,
and one of them is a collection of Russian poetry.
She have images of Russian poetry.
She collect all images of Russian poetry.
And I ask her to do following.
You have MNIST digit recognition,
and we get 100 digits,
or maybe less than 100.
I don’t remember, maybe 50 digits.
And try from poetical point of view,
describe every image which she see,
using only words of images of Russian poetry.
And she did it.
And then we tried to,
I call it learning using privileged information.
I call it privileged information.
You have two languages.
One language is just image of digit,
and another language, poetic description of this image.
And this is privileged information.
And there is an algorithm when you’re working
using privileged information, you’re doing better.
Much better, so.
So there’s something there.
Something there.
And there is a, in NEC,
she unfortunately died.
The collection of digits
and poetic descriptions of these digits.
Yeah.
So there’s something there in that poetic description.
But I think that there is a abstract ideas
on the Plato level of ideas.
Yeah, that they’re there.
That could be discovered.
And music seems to be a good entry point.
But as soon as we start with this challenge problem.
The challenge problem.
Listen.
It immediately connected to all this stuff.
Especially with your talk and this podcast,
and I’ll do whatever I can to advertise it.
It’s such a clean, beautiful Einstein like formulation
of the challenge before us.
Right.
Let me ask another absurd question.
We talked about mortality.
We talked about philosophy of life.
What do you think is the meaning of life?
What’s the predicate for mysterious existence here on earth?
I don’t know.
It’s very interesting how we have,
in Russia, I don’t know if you know the guy Strugatsky.
They are writing fiction.
They’re thinking about human, what’s going on.
And they have idea that there are developing
two type of people, common people and very smart people.
They just started.
And these two branches of people will go
in different direction very soon.
So that’s what they’re thinking about that.
So the purpose of life is to create two paths.
Two paths.
Of human societies.
Yes.
Simple people and more complicated people.
Which do you like best?
The simple people or the complicated ones?
I don’t know that it is just his fantasy,
but you know, every week we have guy
who is just a writer and also a theorist of literature.
And he explain how he understand literature
and human relationship.
How he see life.
And I understood that I’m just small kids
comparing to him.
He’s very smart guy in understanding life.
He knows this predicate.
He knows big blocks of life.
I am amazed every time when I listen to him.
And he just talking about literature.
And I think that I was surprised.
So the managers in big companies,
most of them are guys who study English language
and English literature.
So why?
Because they understand life.
They understand models.
And among them,
maybe many talented critics just analyzing this.
And this is big science like property.
This is blocks.
That’s very smart.
It amazes me that you are and continue to be humbled
by the brilliance of others.
I’m very modest about myself.
I see so smart guys around.
Well, let me be immodest for you.
You’re one of the greatest mathematicians,
statisticians of our time.
It’s truly an honor.
Thank you for talking again.
And let’s talk.
It is not.
I know my limits.
Let’s talk again when your challenge is taken on
and solved by grad student.
Especially when they use it.
It happens.
Maybe music will be involved.
Vladimir, thank you so much.
It’s been an honor. Thank you very much.
Thanks for listening to this conversation
with Vladimir Vapnik.
And thank you to our presenting sponsor, Cash App.
Download it, use code LexPodcast.
You’ll get $10 and $10 will go to FIRST,
an organization that inspires and educates young minds
to become science and technology innovators of tomorrow.
If you enjoy this podcast, subscribe on YouTube,
give us five stars on Apple Podcasts,
support it on Patreon,
or simply connect with me on Twitter at Lex Fridman.
And now, let me leave you with some words
from Vladimir Vapnik.
When solving a problem of interest,
do not solve a more general problem
as an intermediate step.
Thank you for listening.
I hope to see you next time.