The following is a conversation with Travis Oliphant,
one of the most impactful programmers
and data scientists ever.
He created NumPy, SciPy, and Anaconda.
NumPy formed the foundation
of tensor based machine learning in Python,
SciPy formed the foundation
of scientific programming in Python,
and Anaconda, specifically with Conda,
made Python more accessible to a much larger audience.
Travis’s life work across a large number of programming
and entrepreneurial efforts has and will continue
to have immeasurable impact on millions of lives
by empowering scientists and engineers
in big companies, small companies,
and open source communities to take on difficult problems
and solve them with the power of programming.
Plus, he’s a truly kind human being,
which is something that when combined with vision
and ambition makes for a great leader
and a great person to chat with.
To support this podcast,
please check out our sponsors in the description.
This is the Lex Fridman Podcast,
and here is my conversation with Travis Oliphant.
What was the first computer program you’ve ever written?
Do you remember?
Whoa, that’s a good question.
I think it was in fourth grade.
Just a simple loop in BASIC.
BASIC. BASIC, yeah, on an Atari 800,
Atari 400, I think, or maybe it was an Atari 800.
It was a part of a class,
and we were just writing BASIC loops to print things out.
Did you use goto statements?
Yes, yes, we used goto statements.
I remember in the early days,
that’s when I first realized
there’s like principles to programming,
when I was told, don’t use goto statements.
Those are bad software engineering principles,
like it goes against what great, beautiful code is.
I was like, oh, okay, there’s rules to this game.
I didn’t see that until high school
when I took an AP computer science course.
I did a lot of other kinds of just programming in TI,
but finally, when I took an AP computer science course
in Pascal.
Wow.
That’s, yeah, it was Pascal.
That’s when I, oh, there are these principles.
Not C or C++?
No, I didn’t take C until the next year in college.
I had a course in C, but I haven’t done much in Pascal,
just that AP computer science course.
Now, sorry for the romanticized question,
but when did you first fall in love with programming?
Oh, man, good question.
I think actually when I was 10,
my dad got us a TI Timex Sinclair,
and he was excited about the spreadsheet capability,
and then, but I made him get the BASIC,
the add-on so we could actually program in BASIC,
and just being able to write instructions
and have the computer do something.
Then we got a TI-99/4A when I was about 12,
and I would just, it had sprites and graphics and music.
You could actually program it to do music.
That’s when I really sort of fell in love with programming.
So this is a full, like a real computer
with like, with memory and storage,
processors and whatnot,
because you say TI. Yeah, the Timex Sinclair
was one of the very first, it was a cheap, cheap,
like, I think it was, well, it was still expensive,
but it was 2K of memory.
We got the 16K add on pack,
but yeah, it had memory, and you could program it.
You had the, in order to store your programs,
you had to attach a tape drive.
Remember that old sound that would play,
like with modems, when it would convert digital bits
to audio stored on a tape drive?
Still remember that sound, but that was the storage.
And what was the programming language, do you remember?
It was BASIC. It was BASIC.
And then they had a VisiCalc,
and so a little bit of spreadsheet programming
in VisiCalc, but mostly just some BASIC.
Do you remember what kind of things drew you to programming?
Was it working with data, was it video games?
Games, math, mathy stuff?
Yeah, I’ve always loved math,
and a lot of people think they don’t like math
because I think when they’re exposed to it early,
it’s about memory.
When you’re exposed to math early,
if you have a good short term memory,
you can remember your times tables.
And I do have a reasonably, I mean, not perfect,
but a reasonably long little short term memory buffer.
And so I did great at times tables.
I said, oh, I’m good at math.
But I started to really like math,
just the problem solving aspect.
And so computing was problem solving applied.
And so that’s always kind of been the draw,
kind of coupled with the mathematics.
Did you ever see the computer as like an extension
of your mind, like something able to achieve?
Not till later.
Okay.
Yeah, not then.
It’s just like a little set of puzzles
that you can play with and you can play with math puzzles.
Yeah, it was too rudimentary early on.
Like it was sort of, yeah, it was a lot of work
to actually take a thought you’d have
and actually get it implemented.
And that’s still work, but it’s getting easier.
And so yeah, I would say that’s definitely
what’s attracting me to Python
is that that was more real, right?
I could think in Python.
Speaking of foreign language,
I only speak one other language fluently besides English,
which is Spanish.
And I remember the day when I would dream in Spanish
and you start to think in that language.
And then you actually, I do definitely believe
that language limits or expands your thinking.
There’s some languages that actually lead you
to certain thought processes.
Yeah, like, so I speak Russian fluently
and that’s certainly a language that leads you
down certain thought processes.
Well, yeah, I mean, there’s a history
of the two world wars of millions of people starving
to death or near to death throughout its history
of suffering, of injustice, like this promise sold
to the people and then the carpet
or whatever is swept from under them.
And it’s like broken promises.
And all of that pain and melancholy is in the language,
the sad songs, the sad hopeful songs,
the over romanticized, like, I love you, I hate you,
the sort of the swings between all the various spectrums
of emotion, so that’s all within the language.
The way it’s twisted, there’s a strong culture
of rhyming poetry, so like the bards,
like the sync, there’s a musicality to the language too.
Did Dostoevsky write in Russian?
Yeah, so like Dostoevsky, Tolstoy, all the,
all the.
The ones that I know about, which are translated
and I’m curious how the translations.
So Dostoevsky did not use the musicality
of the language too much.
So it actually translates pretty well
because it’s so philosophically dense
that the story does a lot of the work,
but there’s a bunch of things that are untranslatable.
Certainly the poetry is not translatable.
I actually have a few conversations coming up offline
and also in this podcast with people
who’ve translated Dostoevsky.
And people who work in this field
know how difficult that is.
Sometimes you can spend months thinking
about a single sentence, right?
In context, like, cause there’s just the magic
captured by that sentence and how do you translate
just in the right way?
Because those words can be really powerful.
There’s a famous line,
beauty will save the world from Dostoevsky.
You know, there’s so many ways to translate that.
And you’re right, the language gives you the tools
with which to tell the story,
but it also leads your mind down certain trajectories
and paths to where over time,
as you think in that language,
you become a different human being.
Yes. Yeah.
Yeah, that’s a fascinating reality, I think.
I know people have explored that,
but it’s just rediscovered.
Well, we don’t, we live in our own like little pockets.
Like this is the sad thing is I feel like unfortunately,
given time and given getting older,
I’ll never know China, the Chinese world,
because I don’t truly know the language.
Same with Japanese, I don’t truly know Japanese
and Portuguese and Brazil,
that whole South American continent.
Like, yeah, I’ll go to Brazil and Argentina,
but will I truly understand the people
if I don’t understand the language?
It’s sad because I wonder how much,
how many geniuses we’re missing
because so much of the scientific world,
so much of the technical world is in English,
and so much of it might be lost
because it’s just we don’t have the common language.
I completely agree.
I’m very much in that vein of there’s a lot of genius
out there that we miss,
and it’s sort of fortunate when it bubbles up
into something that we can understand or process,
there’s a lot we miss.
So I tend to lean towards really loving democratization
or things that empower people,
and I’m very resistant to sort of authoritarian structures.
Fundamentally for that reason,
well, several reasons, but it just hurts us.
We’re soft.
So speaking of languages that empower you,
so Python was the first language for me
that I really enjoyed thinking in, as you said.
Sounds like you shared my experience too.
So when did you first,
do you remember when you first kind of connected with Python,
maybe even fell in love with Python?
It’s a good question.
It was a process.
It took about a year.
I first encountered Python in 1997.
I was a graduate student studying biomedical engineering
at the Mayo Clinic.
And I had previously,
I’d been involved in taking information from satellites.
I was an electrical engineering student
used to taking information
and trying to get something out of it,
doing some data processing, getting information out of it.
And I’d done that in MATLAB.
I’d done that in Perl.
I’d done that in scripting on a VMS.
There’s actually a VAX VMS system,
they had their own little scripting tools around Fortran.
Done a lot of that.
And then as a graduate student,
I was looking for something and encountered Python.
And because Python had an array,
had two things that made me not filter it away.
Because I was filtering a bunch of stuff,
as Yorick, I looked at Yorick,
I looked at a few other languages that are out there
at the time in 1997, but it had arrays.
There’s a library called Numeric
that had just been written in 95,
like not very, not too much earlier.
By an MIT alum, Jim Hugunin.
You know, and I went back and read the mailing list
to see the history of how it grew.
And there was a very interesting,
it’s fascinating to do that actually,
to see how this emergent cooperation,
unstructured cooperation happens in the open source world
that led to a lot of this collective programming,
which is something maybe we might get into a little later,
but what that looks like.
What gap did Numeric fill?
Numeric filled the gap of having an array object.
There was no array object.
There was no array.
There was a one dimensional byte concept,
but there was no n dimensional,
two, three, four dimensional tensor they call it now.
I’m still in the category that a tensor is another thing
and it’s just an ndarray we should call it,
but kind of lost that battle.
There’s many battles in this world,
some of which we win, some we lose.
That’s exactly right.
So, but it had no math to it.
So Numeric had math and a basic way to think in arrays.
So I was looking for that,
and it had complex numbers,
which a lot of programming languages don’t have.
And you can see it because,
if you’re just a computer scientist,
you think, ah, complex numbers are just two floats.
So you can, people can build that on.
But in practice, complex numbers
are one of the significant algebras
that helps connect a lot of physical
and mathematical ideas,
particularly FFT for an electrical engineer.
And it’s a really important concept
and not having it means you have to develop it
several times and those times may not share an approach.
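As a rough illustration of why first-class complex numbers matter for FFT work, here is a minimal sketch using present-day NumPy (the successor discussed later in this conversation, not the 1990s Numeric API). The signal, its length, and the variable names are made up for the example.

```python
import numpy as np

# Hypothetical test signal: 8 samples of a complex exponential that
# completes exactly 2 cycles over the window.
n = 8
k = 2
t = np.arange(n)
signal = np.exp(2j * np.pi * k * t / n)   # complex values are first-class

spectrum = np.fft.fft(signal)             # the result is a complex array

# All of the signal's energy lands in frequency bin k.
peak_bin = int(np.argmax(np.abs(spectrum)))
print(signal.dtype, peak_bin)
```

Because the array type itself understands complex values, the transform and the magnitude calculation share one representation instead of each library inventing its own pair of floats.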
One of the common things in programming,
one of the things programming enables is abstractions.
But when you have shared abstractions, it’s even better.
It sort of gets to the level of language
of actually we all think of this the same way,
which is both powerful and dangerous, right?
Because powerful in that we now can quickly
make bigger and higher level things
on top of those abstractions dangerous
because it also limits us as to the things
we maybe left behind in producing that abstraction,
which is at the heart of programming today
and actually building around the programming world.
I think it’s a fascinating philosophical topic.
Yeah, they will continue for many years, I think.
They’ll continue for many years.
As we build more and more and more abstractions.
Yes, I often think about, you know,
we have a world that’s built on these abstractions
that were they the only ones possible?
Certainly not, but they led to,
you know, it’s very hard to do it differently.
Like there’s an inertia that’s very hard to,
you know, push out, push away from.
That has implications for things like,
you know, the Julia language,
which you have heard of, I’m sure.
And I’ve met the creators and I liked Julia.
It’s a really cool language,
but they struggled to kind of against the,
just the tide of like this inertia of people using Python.
And, you know, there’s strategies to approach that,
but nonetheless, it’s a phenomenon.
And sometimes, so I love complex numbers
and I love arrays, so I looked at Python.
And then I had the experience, I did some stuff in Python
and I was just doing my PhD.
So I was out, my focus was on,
I was actually doing a combination of MRI and ultrasound
and looking at a phenomenon called elastography,
which is you push waves into the body
and observe those waves, like you can actually measure them.
And then you do mathematical inversion
to see what the elasticity is.
And so that’s the problem I was solving
is how to do that with both ultrasound and MRI.
I needed some tool to do that with.
So I was starting to use Python in 97.
In 98, I went back, looked at what I’d written
and realized I could still understand it,
which is not the experience I’d had
when doing Perl in 95, right?
I’d done the same thing and then I looked back
and I’d forgotten what I was even saying.
Now, you know, I’m not saying, so that made me go,
hey, this may work, I like this.
This is something I can retain
without becoming an expert per se.
And so that led me to go, I’m gonna push more into this.
And then that 98 was kind of when I started
to fall in love with Python, I would say.
A few peculiar things about Python.
So maybe compare it to Perl,
compare it to some of the other languages.
So there’s no braces.
Yeah.
So space is used, indentation, I should say,
is used as part of the language.
Yeah, right.
So did you, I mean, that’s quite a leap.
Were you comfortable with that leap
or were you just very open minded?
It’s a good question.
I was open minded, so I was cognizant of the concern.
And it definitely has, it has specific challenges.
You know, cut and pasting.
For example, when you’re cut and pasting code,
and if your editors aren’t supportive of that,
if you’re putting it into a terminal,
and particularly in the past when terminals
didn’t necessarily have the intelligence to manage it.
Now, IPython and Jupyter Notebooks
handle that just fine, so there’s really no problem.
But in the past, it created some challenges,
formatting challenges, also mixed tabs and spaces.
If editors weren’t, you weren’t clear
on what was happening, you would have these issues.
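The mixed tabs and spaces issue can be sketched in a few lines. This assumes current Python 3 behavior, where an ambiguous mix is rejected outright; the source string below is a made-up example of what a bad paste might look like.

```python
# Two indented lines that look aligned in an editor with 8-column tabs:
# the first is indented with one tab, the second with eight spaces.
source = "if True:\n\tx = 1\n        y = 2\n"

try:
    compile(source, "<pasted>", "exec")
    outcome = "compiled"
except TabError as exc:
    # Python 3 refuses to guess how wide a tab is.
    outcome = "TabError"
    print("rejected:", exc)
```

Using only spaces (or only tabs) for both lines makes the same snippet compile.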
So there were really concrete reasons about it
that I heard and understood.
I never really encountered a problem with it personally.
Like, it was occasional annoyances,
but I really liked the fact
that it didn’t have all these extra characters, right?
That these extra characters didn’t show up
in my visual field when I was just trying
to process understanding a snippet of code.
Yeah, there’s a cleanness to it.
But, I mean, the idea is supposed to be
that Perl also has a cleanness to it
because of the minimalism of how many characters
it takes to express a certain thing.
So it’s very compact.
But what you realize with that compactness comes,
there’s a culture that prizes compactness,
and so the code gets more and more compact
and less and less readable to a point where it’s like,
like, to be a good programmer in Perl,
you write code that’s basically unreadable.
There’s a culture, like.
Correct, and you’re proud of it.
Yeah, you’re proud of it.
Right, exactly, and it’s like, feels good.
And it’s really selective.
It means you have to be an expert in Perl to understand it.
Whereas Python allowed you not to have to be an expert.
You didn’t have to take all this brain energy.
You could leverage, what I say,
you could leverage your English language center,
which you’re using all the time.
I’ve wondered about other languages,
particularly non Latin based languages.
Latin based languages with the characters are at least similar.
I think people have an easier time,
but I don’t know what it’s like to be a Japanese
or a Chinese person trying to learn different syntax.
Like, what would computer programming look like in that?
I haven’t looked at that at all,
but it certainly doesn’t,
you know, leveraging your Chinese language center,
I’m not sure Python or any programming does that.
But that was a big deal.
The fact that it was accessible, I could be a scientist.
What I really liked is many programming languages
really demand a lot of you, and you can get a lot,
you know, you do a lot if you learn it.
But Python enables you to do a lot
without demanding a lot of you.
There’s nuance to that statement,
but it certainly was, it’s more accessible.
So more people could actually, as a scientist,
as somebody who, or an engineer,
who was trying to solve another problem
besides pure programming,
I could still use this language and get things done
and be happy about it.
And I was also comfortable in C at that time.
And MATLAB, you did a little bit of that.
And MATLAB, I did a lot before that, exactly.
So I was comfortable in,
those three languages were really the tools I used
during my studies and schooling.
But to your point about language helping you think,
one of the big things about MATLAB was it was,
and APL before it, I don’t know if you remember APL.
APL is actually the predecessor of array based programming,
which I think is really an underappreciated,
if I talk to people who are just steeped
in computer programming, computer science,
like most of the people that Microsoft has hired
in the past, for example,
Microsoft as a company generally did not understand
array based programming.
Like culturally, they didn’t understand it.
So they kept missing the boat,
kept missing the understanding of what this was.
They’ve gotten better,
but there’s still a whole culture of folks
that doesn’t, programming, that’s systems programming
or web programming or lists and maps.
And what about an n dimensional array?
Oh yeah, that’s just an implementation detail.
Well, you can think that,
but then actually if you have that as a construct,
you actually think differently.
APL was the first language to understand that.
And it was in the sixties, right?
The challenge of APL is APL had very dense,
not only glyphs, like new characters, new glyphs,
but they even had a new keyboard
because to produce those glyphs,
this is back in the early days in computing
when the QWERTY keyboard maybe wasn’t as established,
like, well, we can have a new keyboard, no big deal.
But it was a big deal and it didn’t catch on.
And the language APL, very much like Perl,
as people would pride themselves on how much,
could they write the game of life
in 30 characters of APL.
APL has characters that mean summation
and they have adverbs,
they would have adjectives and these things called adverbs,
which are like methods, like reduction,
reduction would be an adverb on an add operator, right?
So, but doing, using these tools you could construct
and then you start to think at that level,
you think in n dimensions is something I like to say,
and you start to think differently about data at that point.
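The idea of reduction as an adverb applied to an add operator survives fairly directly in modern NumPy, where reduce is a method on the add operation itself. A small sketch, with an arbitrary example array:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])

# "reduce" is a method on the add operation, close in spirit to
# applying a reduction adverb to an add verb.
total = np.add.reduce(a, axis=None)            # sum over every element
column_sums = np.add.reduce(a, axis=0)         # collapse the first dimension
row_products = np.multiply.reduce(a, axis=1)   # same adverb, different verb

print(total, column_sums, row_products)
```

The same reduce method exists on other binary operations such as np.multiply and np.maximum, which is what makes the adverb analogy reasonable.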
Now you’re, it really helps.
Yeah, I mean, outside of programming,
if you really internalize linear algebra as a course,
I mean, it’s philosophically allows you
to think of the world differently.
It’s almost like liberating, you don’t have to,
you don’t have to think about the individual numbers
in the n dimensional array.
You could think of it as an object in itself
and all of a sudden this world can open up.
You’re saying MATLAB and APL were like the early C,
I don’t know if many languages got that right ever.
No, no, no they didn’t.
Even still.
Even still, I would say.
I mean, NumPy is an inheritor of the traditions
that I would say APL, J was another version that was,
what it did is not have the glyphs,
just have short characters,
but still a Latin keyboard could type them.
And then Numeric inherited from that
in terms of let’s add arrays plus broadcasting
plus methods, reduction,
even some of the language like rank is a concept
that was in Python and is still in Python
for the number of dimensions, right?
That’s different than say the rank of a matrix
which people think of as well.
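The two meanings of rank are easy to confuse, so here is a short sketch in current NumPy, where the number of dimensions is exposed as ndim and the linear-algebra rank is a separate function. The arrays are arbitrary examples.

```python
import numpy as np

a = np.ones((3, 4))                       # every row is identical

ndim = a.ndim                             # number of dimensions
matrix_rank = np.linalg.matrix_rank(a)    # linear-algebra rank

stack = np.zeros((2, 3, 4))               # a 3-dimensional array
print(ndim, matrix_rank, stack.ndim)
```

Here the array has two dimensions but a matrix rank of one, since all of its rows are the same vector.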
So it came from that tradition,
but NumPy is a very pragmatic, practical tool.
NumPy inherited from Numeric
and we can get to where NumPy came from
which is the current array,
at least current as of 2015, 2017.
Now there’s a ton of them over the past two or three years.
We can get into that too.
So if we just linger on the early days
of what was your favorite feature of Python?
Do you remember like what?
So it’s so interesting to linger on like the,
what really makes you connect with a language?
I’m not sure it’s obvious to introspect that.
No, it isn’t.
And I’ve thought about that at some length.
I think definitely the fact that I could read it later,
that I could use it productively
without becoming an expert.
Other languages I had to put more effort into.
That’s like an empirical observation.
Like you’re not analyzing any one aspect of the language.
It just seems time after time when you look back,
It’s somehow readable.
Then it was sort of, I could take executable English
and translate it to Python more easily.
Like I didn’t have to go, there was no translation layer.
As an engineer or as a scientist,
I could think about what I wanted to do.
And then the syntax wasn’t that far behind it, right?
Now there are some warts there still.
It wasn’t perfect.
Like there’s some areas where I’m like,
ah, it’d be better if this were different
or if this were different.
Some of those things got added to the language too.
I was really grateful for some of the early pioneers
in the Python ecosystem back,
because Python got written in 91.
That’s when the first version came out.
But Guido was very open to users.
And one of the sets of users were people like Jim Hugunin
and David Ascher and Paul Dubois and Konrad Hinsen.
These were people that were on the mailing list.
And they were just asking for things like,
hey, we really should have complex numbers in this language.
So let’s, you know, there’s a j, there’s a 1j, right?
And the fact that they went the engineering route of j
is interesting.
I don’t think that’s entirely favoring engineers.
I think it’s because i is so often used
as the index of a for loop.
So I think that’s actually why.
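For readers who have not seen the notation, a minimal sketch of the built-in complex literal using only the standard library; the values are arbitrary.

```python
import cmath

z = 3 + 4j                  # the "j" suffix marks the imaginary part
magnitude = abs(z)          # length of the vector (3, 4)
product = 1j * 1j           # j squared

# Euler's identity, exact up to floating-point rounding.
euler = cmath.exp(1j * cmath.pi)

# The name "i" stays free for its usual job as a loop index.
total = 0
for i in range(4):
    total += i

print(magnitude, product, euler, total)
```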
Probably, I mean, there’s a pragmatic aspect.
But the fact that complex numbers were there, I love that.
The fact that I could write in the array constructs
and that reduction was there,
very simple to write summations and broadcasting was there.
I could do addition of whole arrays.
So that was cool.
Those are some things I loved about it.
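A small sketch of whole-array addition and broadcasting, written against present-day NumPy rather than the original Numeric library; the arrays are made-up examples.

```python
import numpy as np

a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
b = np.array([10.0, 20.0, 30.0])

doubled = a + a            # elementwise addition of whole arrays, no loop
shifted = a + b            # b is broadcast across both rows of a
scaled = a * (1 + 2j)      # mixing in a complex scalar just works

per_row = a.sum(axis=1)    # a summation written as a reduction

print(shifted)
print(per_row, scaled.dtype)
```

Broadcasting is what lets the one-dimensional b combine with the two-dimensional a without an explicit loop or a manual copy of b for each row.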
I don’t know what to start talking to you about
because you’ve created so many incredible projects
that basically changed the whole landscape of programming.
But okay, let’s start with,
let’s go chronologically with SciPy.
You created SciPy over two decades ago now?
Yes, yes, I love to talk about SciPy.
SciPy was really my baby.
What is it?
What was its goal?
What is its goal?
How does it work?
Yeah, fantastic.
So SciPy was effectively, here I am using Python
to do stuff that I previously used MATLAB to do.
And I was using Numeric, which is an array library
that made a lot of it possible.
But there’s things that were missing.
Like I didn’t have an ordinary differential equation solver
I could just call, right?
I didn’t have integration.
Hey, I wanted to integrate this function.
Okay, well, I don’t have just a function
I can call to do that.
These are things I remember being critical things
that I was missing.
Optimization.
I just wanna pass a function to an optimizer
and have it tell me what the optimal value is.
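The three missing pieces named here (integration, an ODE solver, an optimizer) can be sketched with today's SciPy. This uses the current scipy.integrate and scipy.optimize interfaces, which are not necessarily what the first releases looked like, and the test functions are hypothetical.

```python
import numpy as np
from scipy import integrate, optimize

# Integrate sin(x) from 0 to pi; the exact answer is 2.
area, abs_error = integrate.quad(np.sin, 0.0, np.pi)

# Pass a function to an optimizer; this parabola has its minimum at x = 3.
result = optimize.minimize_scalar(lambda x: (x - 3.0) ** 2)

# Solve dy/dt = -y with y(0) = 1 on [0, 1]; the exact solution is exp(-t).
solution = integrate.solve_ivp(
    lambda t, y: -y, (0.0, 1.0), [1.0], rtol=1e-8, atol=1e-10
)
y_end = solution.y[0, -1]

print(area, result.x, y_end)
```

In each case the caller only supplies a plain Python function, which is the "just pass a function" workflow described above.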
Those are things I’m like, well,
why don’t we just write a library that adds these tools?
And I started to post on the mailing list
and there’d previously been, people have discussed,
I remember Konrad Hinsen saying,
wouldn’t it be great if we had this optimizer library
or David Ascher would say this stuff.
And I’m an ambitious, ambitious is the wrong word,
I was eager and probably had more time than sense.
I was a poor graduate student.
My wife thinks I’m working on my PhD and I am,
but part of the PhD that I loved
was the fact that it’s exploratory.
You’re not just taking orders,
fulfilling a list of things to do,
you’re trying to figure out what to do.
And so I thought, well, I’m writing tools
for my own use in a PhD,
so I’ll just start this project.
And so in 99, 98 was when I first started
to write libraries for Python.
Definitely when I fell in love with Python in 98,
I thought, oh, well, there’s just a few things missing.
Like, oh, I need a reader to read DICOM files.
I was in medical imaging and DICOM was a format
that I want to be able to load that into Python.
Okay, how do I write a reader for that?
So I wrote something called, it was an IO package, right?
And that was my very first extension module, which is C.
So I wrote C code to extend Python
so that in Python I could write things more easily.
That combination kind of hooked me.
It was the idea that I could,
here’s this powerful tool I can use as a scripting language
and a high level language to think about,
but that I can extend easily, easily in C,
easily for me because I knew enough C.
And then Guido had written a link.
I mean, the only, the hard part of extending Python
was the way memory management works,
and you have to do reference counting.
And so there’s a tracking of reference counting
you have to do manually.
And if you don’t, you have memory leaks.
And so that’s hard.
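Reference counting can be observed from the Python side, which gives a feel for the bookkeeping a C extension author has to do by hand (in C this is done with the Py_INCREF and Py_DECREF macros). A minimal sketch, assuming the CPython interpreter; the variable names are arbitrary.

```python
import sys

data = []                          # one reference: the name "data"
before = sys.getrefcount(data)     # the call itself may add a temporary one

alias = data                       # a second name for the same object
holder = {"key": data}             # a container also holds a reference
after = sys.getrefcount(data)

del alias                          # dropping references lowers the count
holder.clear()
released = sys.getrefcount(data)

print(after - before, released == before)
```

In pure Python the interpreter does this counting automatically; in a C extension, forgetting the matching decrement is what produces the memory leaks mentioned above.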
Plus then C, you know, it’s just much more,
you have to put more effort into it.
It’s not just, I have to now think about pointers
and I have to think about stuff that is different.
I have to kind of,
you’re like putting a new cartridge in your brain.
Like, okay, I’m thinking about MRI.
Now I’m thinking about programming.
And there are distinct modules
you end up having to think about.
So it’s harder.
And when I was just in Python,
I could just think about MRI and high level writing,
but I could do that.
And that kind of, I liked it.
I found that to be enjoyable and fun.
And so I ended up, oh,
well, let me just add a bunch of stuff to Python
to do integration.
Well, and the cool thing is,
is that the power of the internet,
just looking around and I found,
oh, there’s this Netlib,
which has hundreds of Fortran routines
that people have written in the 60s and the 70s and the 80s
in Fortran 77, fortunately, it wasn’t Fortran 66.
So it had been ported to Fortran 77.
And Fortran 77 is actually a really great language.
Fortran 90 probably is my favorite Fortran
because it’s also, it’s got complex numbers,
got arrays and it’s pretty high level.
Now, the problem with it
is you’d never want to write a program in Fortran 90
or Fortran 77,
but it’s totally fine to write a subroutine in, right?
And so, and then Fortran kind of got a little off course
when they tried to compete with C++.
But at the time,
I just want libraries to do something like,
oh, here’s an ordinary differential equation.
Here’s integration.
Here’s Runge-Kutta integration.
Already done.
I don’t have to think about that algorithm.
I mean, you could,
but it’s nice to have somebody who’s already done one
and tested it.
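For a sense of what such a routine contains, here is a textbook fourth-order Runge-Kutta sketch in plain Python. It is not the Netlib Fortran code, which is far more careful about step-size control and error estimation; the function names and the test problem are made up for illustration.

```python
import math

def rk4_step(f, t, y, h):
    """Advance y' = f(t, y) by one step of size h with classic RK4."""
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h * k1 / 2)
    k3 = f(t + h / 2, y + h * k2 / 2)
    k4 = f(t + h, y + h * k3)
    return y + h * (k1 + 2 * k2 + 2 * k3 + k4) / 6

def solve(f, t0, y0, t_end, steps):
    """Integrate from t0 to t_end using a fixed number of RK4 steps."""
    h = (t_end - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        y = rk4_step(f, t, y, h)
        t += h
    return y

# Hypothetical test problem: dy/dt = -y, y(0) = 1, exact solution exp(-t).
approx = solve(lambda t, y: -y, 0.0, 1.0, 1.0, steps=100)
error = abs(approx - math.exp(-1.0))
print(approx, error)
```

Even this short version has details that are easy to get subtly wrong, which is the argument for wrapping a routine someone else has already written and tested.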
And so I sort of started this journey in 98, really.
If you look back at the mailing list,
there’s sort of this productive era of me
writing an extension module
to connect Runge-Kutta integration to Python
and making an ordinary differential equation solver.
And then releasing that as a package.
So we could call ODEPACK, I think I called it then.
QUADPACK.
And then I just made these packages.
Eventually that became Multipack
because they’re originally modular.
You can install them separately.
But a massive problem in Python
was actually just getting your stuff installed.
At the time, releasing software for me,
like today it’s people think, what does that mean?
Well, then it meant some poorly written webpage.
I had some bad webpage up and I put a tarball,
just a GZIP tarball of source code.
That was the release.
But okay, can we just stay on that?
Because the community aspect
of creating the package and sharing that, that’s rare.
That, to have, to both have the, at that time,
so like the raw.
Yeah, it was pretty early, yeah.
Oh, well, not rare.
Maybe you can correct me on this,
but it seems like in the scientific community,
so many people, you were basically solving the problems
you needed to solve to process the particular application,
the data that you need.
And to also have the mind
that I’m going to make this usable for others, that’s.
I would say I was inspired.
I’d been inspired by Linux,
been inspired by Linus and him making his code available.
And I was starting to use Linux at the time.
And I went, this is cool.
So I’d kind of been previously primed that way.
And generally I was into science
because I liked the sharing notion.
I liked the idea of, hey, let’s,
if collectively we build knowledge and share it,
we can all be better off.
Okay, so you were energized by that idea.
So I was energized by that idea already, right?
And I can’t deny that I was.
I sort of had this very,
I liked that part of science, that part of sharing.
And then all of a sudden, oh, wait, here’s something.
And here’s something I could do.
And then I slowly over years learned how to share better
so that you could actually engage more people faster.
One of the key things was actually giving people a binary
they could install, right?
So that it wasn’t just your source code, good luck.
Compile this and then.
It’s compiled, ready to install, just, you know.
So in fact, a lot of the journey from 98,
even through 2012 when I started Anaconda was about that.
Like it’s why, you know, it’s really the key
as to why a scientist with dreams of doing MRI research
ended up starting a software company
that installs software.
I work with a few folks now that don’t program
like on the creative side and the video side,
the audio side.
And because my whole life is running on scripts,
I have to try to get them,
I have the task of teaching them
how to do Python enough to run the scripts.
And so I’ve been actually facing this,
whether it’s Anaconda or something else, with the task of
how do I minimally explain basically to my mom
how to write a Python script.
And it’s an interesting challenge.
I have to, it’s a to do item for me to figure out like,
what is the minimal amount of information I have to teach?
What are the tools you use that one, you enjoy it,
two, you’re effective at it.
And they’re related, those are two related questions.
And then the debugging, like the iterative process
of running the script to figure out what the error is,
maybe even for some people to do the fix yourself.
So do you compile it?
Do you, like how do you distribute that code to them?
And it’s interesting because I think
it’s exactly what you’re talking about.
If you increase the circle of empathy,
the circle of people that are able to use your programs,
you increase its effectiveness and its power.
And so you have to think, can I write scripts?
Can I write programs that can be used by medical engineers,
by all kinds of people that don’t know programming
and actually maybe plant a seed,
have them catch the bug of programming
so that they start on a journey.
That’s a huge responsibility.
And ultimately it has to do with the Amazon one click buy.
Like how frictionless can you make the early steps?
Frictionless is actually really key.
To grow any community, any friction point,
you’re just gonna lose some people, right?
Now sometimes you may wanna intentionally do that.
If you’re early enough on, you need a lot of help.
You need people who have the skills.
You might actually, it’s helpful.
You don’t necessarily have too many users
as opposed to contributors if you’re early on.
Anyway, SciPy started in 98,
but it really emerged as this collection of modules
that I was just putting on the net.
People were downloading and I think I got 100 users, right?
By the end of that year.
But the fact that I got 100 users and more than that,
people started to email me with fixes.
And that was actually intoxicating, right?
That was the, here I’m writing papers
and I’m giving conferences and I get people to say hello,
but yeah, good job.
But mostly it was, you’re viewed with,
it’s competitive, right?
You publish a paper and people are like,
oh, it wasn’t my paper.
I was starting to see that sense of academic life
where it was so much,
I thought there was this cooperative effort,
but it sounds like we’re here just to one up each other.
And it’s not true across the board,
but a lot of that’s there.
But here in this world,
I was getting responses from people all over the world.
I remember Pearu Peterson in Estonia, right?
Was one of the first people.
And he sent me back this makefile,
cause the first thing was, yeah, your build thing stinks
and here’s a better makefile.
Now it was a complex makefile.
I don’t think I ever understood that makefile actually,
but it worked and it did a lot more.
And so I said, thanks, this is cool.
And that was my first kind of engagement
with community development.
But the process was, he sent me a patch file.
I had to upload a new tarball.
And I just found, I really love that.
And the style back then was here’s a mailing list.
It was very, it wasn’t as,
there certainly weren’t the tools that are available today.
It was very early on, but I really started to,
that’s the whole year.
I think I did about seven packages that year, right?
And then by the end of the year,
I collected them into a thing called Multipack.
So in 99, there was this thing called Multipack.
And that’s when a high school student,
no, he was a high school student at the time,
guy named Robert Kern,
took that package and made a Windows installer, right?
And then of course, a massive increase of usage.
So by the way, most of this development was under Linux.
Yes, yes, it was on Linux.
I was a Linux developer doing it on a Unix box.
I mean, at the time I was actually getting into,
I had a new hard drive,
did some kernel programming to make the hard drive work.
I mean, not programming, but modification to the kernel
so I could actually get a hard drive working.
I love that aspect of it.
I was also in, at school, I was building a cluster.
I took Mac computers and put Yellow Dog Linux on them.
At the Mayo Clinic, they were just,
they had all these Macs that were older,
they were just getting rid of.
And so I kind of got permission to go grab them together.
I put about 24 of them together in a cluster, in a cabinet,
and put Yellow Dog Linux on them all.
And I wrote a C++ program to do MRI simulation.
That was what I was doing at the same time
for my day job, so to speak.
So I was loving the whole process.
And the same time I was,
oh, I need an ordinary differential equation solver.
That’s why ordinary differential equations were key,
because the heart of simulating MRI
with the Bloch equation is an ODE solver.
And so that’s, but I actually did that,
it just happened at the same time.
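As a rough illustration of the point about the Bloch equation needing an ODE solver, here is a minimal sketch using the current SciPy interface, `scipy.integrate.solve_ivp` (which is not the 1998 code). It integrates the Bloch equation in the rotating frame after a 90 degree pulse. The relaxation times, the off-resonance frequency, and the names `bloch_rhs` and `OMEGA` are illustrative assumptions, not values from the conversation.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Illustrative (assumed) tissue parameters, in seconds.
T1, T2, M0 = 1.0, 0.1, 1.0
# gamma * B in the rotating frame: 10 Hz off-resonance about z, in rad/s.
OMEGA = np.array([0.0, 0.0, 2.0 * np.pi * 10.0])

def bloch_rhs(t, m):
    """dM/dt = M x (gamma B) plus T1/T2 relaxation toward equilibrium."""
    precession = np.cross(m, OMEGA)
    relaxation = np.array([-m[0] / T2, -m[1] / T2, (M0 - m[2]) / T1])
    return precession + relaxation

# Magnetization tipped fully into the transverse plane at t = 0.
m_start = np.array([1.0, 0.0, 0.0])
t_end = 0.5
sol = solve_ivp(bloch_rhs, (0.0, t_end), m_start, rtol=1e-8, atol=1e-10)

m_final = sol.y[:, -1]
transverse = np.hypot(m_final[0], m_final[1])
print("Mz at t_end:", m_final[2])
print("|Mxy| at t_end:", transverse)
```

For this simple case the numerical result can be checked against the closed forms Mz(t) = M0 (1 - exp(-t/T1)) and |Mxy|(t) = exp(-t/T2).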
That’s why it was kind of what you’re working on
and what you’re interested in, they’re coinciding.
I was definitely scratching my own itch
in terms of building stuff.
And which helped in the sense that I was using it for me,
so at least I had one user.
I had one person who was like, well, no, this is better.
I like this interface better.
And I had the experience of MATLAB
to guide some of what those APIs might look like.
But you’re just doing it yourself,
you’re building all this stuff.
But with the Windows installer,
it was the first time I realized, oh yeah,
the binary installer really helps people.
And so that led to spending more time
on that side of things.
So around 2000, so I graduated my PhD in 2000,
end of year, end of 2000.
So 99 doing a lot of work there,
98 doing a lot of work there,
99 kind of spending more time on my PhD,
helping people use the tools,
thinking about what do I want to go from here.
There was a company, there was a guy actually,
Eric Jones and Travis Vaught.
They were two friends who founded a company called Enthought.
It’s here in Austin, still here.
And they, Eric contacted me at the time
when I was a graduate student still.
And he said, hey, why don’t you come down?
We want to build a company.
We’re thinking of a scientific company
and we want to take what you’re doing
and kind of add it to some stuff that he’d done.
He’d written some tools.
And then Pearu Peterson had done F2PY.
Let’s come together and build,
pull this all together and call it SciPy.
So that’s the origin of the SciPy brand.
It came from Multipack
and a whole bunch of modules I’d written,
plus a few things from some other folks
and then pulled together in a single installer.
SciPy was really a distribution of Python
masquerading as a library.
How did you think about SciPy in context of Python,
in context of Numeric, like what?
So we saw SciPy as a way to make an R&D environment
for Python, like use Python, depended on Numeric.
So Numeric was the array library we depended on.
And then from there, extend it with a bunch of modules
that allowed for, and at the time,
the original vision of SciPy was to have plotting,
was to have the REPL environment
and kind of really a whole data environment
that you could then install and get going with.
And that was kind of the thinking.
It didn’t really evolve that way, right?
It sort of had a, for one,
it’s really hard to do massive scale projects
with open source collectives.
Actually, there’s sort of an intrinsic cooperation limit
as to which, too many cooks in the kitchen,
you can do amazing infrastructure work.
When it comes down to bringing it all together
into a single deliverable,
that actually requires a little more product management
that is not, that doesn’t really emerge
from the same dynamic.
So it struggled, struggled to get almost too many voices.
It’s hard to have everybody agree.
Consensus doesn’t really work at that scale.
You end up with politics,
with the same kind of things that’s happened
in large organizations trying to decide
what to do together.
So consensus building was challenging at scale
as more people came in, right?
Early on, it’s fine, because there’s nobody there.
So it works, but then as you get more successful
and more people use it, all of a sudden,
oh, there’s this scale at which this doesn’t work anymore
and we have to come up with different approaches.
So SciPy came out officially in 2001,
was the first release, most of the time.
I remember the days of getting that release ready.
It was a Windows installer and there were bugs
on how the Windows compiler handled complex numbers
and you were chasing segmentation faults.
And it was, it’s a lot of work.
There was a lot of effort that had nothing to do
with my area of study.
And at the same time, I had just gotten an offer.
So he wondered if I wanted to come down
and help him start that company with his friend.
And at the time I was like, I was intrigued,
but I was pursuing a path, an academic path.
And I had just got an offer to go and teach at my alma mater.
So I took that tenure track position.
And SciPy, and kind of, then I started to work on SciPy
as a professor too.
So I left the Mayo Clinic,
graduated, wrote my thesis using SciPy,
wrote, you know, there’s images that were created.
Now the plotting tool I used was something
from Yorick actually.
It was a plotting, a PLT kind of a plotting language
that I used.
Yorick is a programming language?
It was a programming language, had a plotting tool,
DISLIN, it had integration to DISLIN.
I ended up using DISLIN plus some of the plotting
from Yorick linked to from Python.
Anyway, it was, people don’t plot that way now,
but this is before, and SciPy was trying to add plotting.
Yeah. Right?
It didn’t have much success.
Really the success of plotting came from John Hunter,
who had a similar experience to my experience,
my kind of maverick experience as a person
just trying to get stuff done and kind of having more time
than money maybe, right?
And John Hunter created what?
Matplotlib.
He’s the creator of Matplotlib.
Yeah, so John Hunter was, you know,
he wasn’t a student at the time, but he was an,
he was working in the quant field and he said,
we need better plotting.
So he just went out and said, cool, I’ll make a new project
and we’ll call it Matplotlib.
And he released in 2001,
about the same time that SciPy came out
and it was a separate library, separate install,
used Numeric, SciPy used Numeric.
And so SciPy, you know, in 2001, we released SciPy
and then Enthought created a conference called SciPy,
which brought people together to talk about the space.
And that conference is still ongoing.
It’s one of the favorite conferences of a lot of people
because it’s, you know, it’s changed over the years,
but early on it was, you know, a collection of 50 people
who care about, scientists mostly, you know,
practicing scientists who want, who care about coding
and doing it well and not using MATLAB.
And I remember being driven by, you know, I liked MATLAB,
but I didn’t like the fact that,
so I’m not opposed to proprietary software.
I’m actually not an open source zealot.
I love open source for the, what it brings,
but I also see the role for proprietary software.
But what I didn’t like was the fact that I would develop
code and publish it and then effectively be telling somebody,
here, to run my code, you have to have
this proprietary software.
Right, and there’s also culture around MATLAB as much,
because I’ve talked to a few folks in,
MathWorks creates MATLAB?
Yeah.
I mean, there’s just a culture, they try really hard,
but it just, there’s this corporate IBM style culture
that’s like, or whatever.
I don’t want to say negative things about IBM or whatever,
but there’s a…
No, it’s really that connection.
It’s something I’m in the middle of right now
is the business of open source.
And how do you connect the ethos of cooperative development
with the necessity of creating profits, right?
And like right now today, I’m still in the middle of that.
That’s actually the early days of me exploring this question.
Cause I was writing SciPy, I mean, as an aside,
I also had, so I had three kids at the time.
I have six kids now.
I got married early, wanted a family.
I had three kids and I remember reading,
I read Richard Stallman’s post and I was a fan of Stallman.
I would read his work, I liked the collective ideas
he would have.
Certainly the ideas on IP law, I read a lot of his stuff.
But then he said, okay, well,
how do I make money with this?
How do I make a living?
How do I pay for my kids?
All this stuff was in my mind,
young graduate student making no money,
thinking I got to get a job.
And he said, well, I think just be like me
and don’t have kids, right?
That’s just, don’t, don’t.
That’s his take on it.
That was what he said in that moment, right?
That’s the thing I read and I went,
okay, this is a train I can’t get on.
There has to be a way to preserve the culture
of open source and still be able to make sufficient money
to feed your kids.
Yes, exactly, there’s gotta be.
Well, so that actually led me to a study of economics.
Because at the time I was ignorant and I really was.
I’m actually, I’m embarrassed for the educational system
that they could let me and I was valedictorian
in my high school class and I did super well in college.
And like academically I did great, right?
But the fact that I could do that and then be clueless
about this key part of life,
it led me to go, there’s a problem.
Like I should have learned this in fifth grade.
I should have learned this in eighth grade.
Like everybody should come out
with a basic knowledge of economics.
You’re an interesting example because you’ve created tools
that change the lives of probably millions of people
and the fact that you didn’t understand at the time
of the creation of those tools, the basic economics
of how to build up a giant system is the problem.
Yeah, it’s a problem.
And so during my PhD at the same time,
this is back in 98, 99 at the same time,
I was in a library, I was reading books on capitalism,
I was reading books on Marxism,
I was reading books on what is this thing?
What does it mean?
And I encountered, basically I encountered a set of writings
from people that said they were the inheritors of Adam Smith.
Read Adam Smith for the first time, right?
Which is The Wealth of Nations
and kind of this notion of emergent societies
and realized, oh, there’s this whole world out here
of people and the challenge of economics is also political.
Like, cause economics, people, different parties
running for office, they want their economic friends.
They want their economists to back them up, right?
Or to be their magicians, like the magicians
in Pharaoh’s court, right?
The people that kind of say, hey, this is,
you should listen to me because I’ve got the expert
who says this.
And so it gets really muddled, right?
But I was looking at it as a scientist going,
what is this space?
What does this mean?
How does Paris get fed?
How does, what is money?
How does it work?
And I found a lot of writings that I really loved.
I found some things that I really loved
and I learned from that.
It was writings from people like von Mises.
He wrote a paper in 1920 that still should be read
more than it is.
It was Economic Calculation
in the Socialist Commonwealth.
It was basically in response
to the Bolshevik revolution in 1917.
And his basic argument was it’s not gonna work
to not have private property.
You’re not gonna be able to come up with prices.
The bureaucrats aren’t gonna be able to determine
how to allocate resources without a price system.
And a price system emerges from people making trades.
And they can only make trades if they have authority
over the thing they’re trading.
And that creates information flow
that you just don’t have if you try to top down it.
Right.
And it’s like, huh, that’s a really good point.
Yeah, the prices have a signal that’s used.
And it’s important to have that signal
when you’re trying to build a community
of productive people like you would
in the software engineering space.
Yeah, the prices are actually
an important signaling mechanism.
Right, and that money is just a bartering tool.
Right, so this is the first time I’d encountered
any of these concepts, right, and the fact that,
oh, this is actually really critical.
Like it’s so critical to our prosperity
and that we’re dangerously not learning about this,
not teaching our children about this.
So you had the three kids,
you had to make some hard decisions.
I had to make some money, right, had to figure it out.
But I didn’t really care.
I mean, I’ve never been driven by money, just need it.
Yeah, right, need to eat.
So how did that resolve itself in terms of SciPy?
So I would say it didn’t really resolve itself.
It sort of started a journey that I’m continuing on.
I’m still on, I would say.
I don’t think it resolved itself.
But I will say I went in eyes wide open.
Like I knew that there were problems
with giving stuff away and creating the market externalities
that the fact that, yeah, people might use it
and I might not get paid for it
and I’ll have to figure something else out to get paid.
Like at least I can say I’m not bitter
that a lot of people have used stuff that I’ve written
and I haven’t necessarily benefited economically from it.
I’ve heard other people be bitter about that
when they write or they talk.
Like, oh, I should’ve got more value out of this.
And I’m also, I want to create systems
that let people like me who might have these desires
to do things, let them benefit.
So it actually creates more of the same.
Not to turn on your bitterness module,
but there’s some aspect, I wish there was mechanisms for me
to reward whoever created SciPy and NumPy
because it brought so much joy to my life.
I appreciate that.
You know what I mean?
The tip jar notion was there.
I appreciate that.
But there should be a very frictionless mechanism.
There should be a frictionless mechanism.
I totally agree.
I would love to talk about some of the ideas I have
because I actually came across,
I think I’ve come up with some interesting notions
that could work, but they’ll require anything that will work
takes time to emerge, right?
Like things don’t just turn overnight.
That’s definitely one thing I’ve also understood
and learned is any fixes, that’s why it’s kind of funny.
We often give credit to, oh, this president gets elected
and oh, look how great things have done.
And I saw that when I had a transition at Anaconda
when a new CEO came in, right?
And it’s like the success that’s happening,
there’s an inertia there.
Yeah, and sometimes the decision you made
like 10 years before is the reason why the success is the.
Right, exactly.
So we’re sort of just running around taking credit
for stuff.
The credit assignment has like a delay to it
that makes the credit assignment basically wrong
more than right.
Wrong more than right, exactly.
And so I’m like, oh, this is, you know,
that’s the stuff I would read a ton about, you know,
early on.
So I don’t, I feel like I’m with you.
Like I want the same thing.
I want to be able to, and honestly, not for personally,
I’ve been happy.
I feel like I don’t have any, I mean,
we’ve done reasonably okay, but I’ve had to pursue it.
Like that’s really what started my trajectory from academia
is reading that stuff led me to say,
oh, entrepreneurship matters.
So I love software, but we need more entrepreneurs
and I wanna understand that better.
So once I kind of had that virus infect my brain,
even though I was on a trajectory
to go to a tenure track position at a university
and I was there for six years,
I was kind of already out the door when I started.
And we can get into that, but.
Well, can I just ask you a quick question on,
is there some design principles
that were in your mind around SciPy?
Like, is there some key ideas
that were just like sticking to you
that this is the fundamental ideas?
Yeah, I would say so.
I would think it’s basically accessibility to scientists,
like give them, give scientists and engineers tools
that they don’t have to think a lot about programming.
So give them really good building blocks,
give them functions that they wanna call
and sort of just the right length of spelling.
There’s one tradition in programming where it’s like,
make very, very long names, right?
And you can see it in some programming languages
where the names get, take half the screen.
And in the Fortran world, names had to be six letters
early on, right?
And that’s way too much, too little.
But I was like, I liked to have names
that were informative but short.
So even though Python, well this is a different conversation,
but documentation is doing some work there.
So when you look at great scientific libraries
and functions, there’s a richness of documentation
that helps you get into the details.
The first glance at a function gives you the intuition
of all it needs to do by looking at the headers and so on.
But to get the depths of all the complexities involved,
all the options involved,
documentation does some of the work.
Documentation is essential, yeah.
So that was actually a, so we thought about several things.
One is we wanted plotting.
We wanted interactive environment.
We wanted good documentation.
These are things we knew, we wanted.
The reality is those took about 10 years to evolve, right?
Given the fact that we didn’t have a big budget,
it was all volunteer labor.
It was sort of, when Enthought got created
and they started to try to find projects,
people would pay for pieces
and they were able to fund some of it.
Not nearly enough to keep up with what was necessary.
And no criticism, just simply the reality.
I mean, it’s hard to start a business
and then do consulting and then also
promote an open source project that’s still fairly new.
SciPy is fairly niche.
We stayed connected all while I was a student,
sorry, a professor.
I went to BYU and started to teach.
Electrical engineering, all the applied math courses.
I loved teaching signal processing,
probability theory, electromagnetism.
I was, if you look at Rate My Professors,
which my kids loved to do,
I wasn’t, I got some bad reviews because people.
What was the criticism?
I would speak at too high of a level.
Like I definitely had a calibration problem
coming out of graduate work
where I hate to be condescending to people.
Like I really have a ton of respect for people fundamentally.
Like my fundamental thing is I respect people.
Sometimes that can lead to a,
I was thinking they had more knowledge than they did.
And so I would just speak at a very high level,
assume they got it.
But they need to rise to the standard that you set.
I mean, that’s one of the,
some of the greatest teachers do that.
And I agree.
And that was kind of what was inspiring me.
But you also have to,
I cannot say I was as articulate
as some of the greatest teachers, right?
I was, like one classic example,
when I first taught at BYU,
my very first class, it was overheads,
transparencies, overheads.
Before projectors were really that common,
I taught with transparencies.
I’m writing my notes out.
I go in, room’s half dark.
I’m just blaring through these transparencies.
Here it is, here it is, here it is.
And I did give a quiz after two weeks.
No one knew anything.
Nothing I had taught had gotten anywhere.
And I realized, okay, I’m not, this is not working.
So I put away the transparencies
and I turned around and just started using the chalkboard.
And what it did is it slowed me down, right?
The chalkboard just slowed me down
and gave people time to process and to think.
And then that made me focus.
My writing wasn’t great on the chalkboard,
but I really love that part of like the teaching.
So that entered SciPy’s world in terms of,
we always understood that there’s a didactic aspect
of SciPy, kind of how do you take the knowledge
and then produce it?
The challenge we had was the scope.
Like ultimately SciPy was everything, right?
And so 2001, when it first came out,
people were starting to use it.
No, this is cool, this is a tool we actually use.
At the same time, 2001 timeframe,
there was a little bit of like the Hubble Space Telescope,
the folks at Hubble that started to say,
hey, Python, we’re gonna use Python
for processing images from Hubble.
And so Perry Greenfield was a good friend
in running that program.
And he had called me before I left for BYU and said,
you know, we wanna do this,
but Numeric actually has some challenges in terms of,
you know, it’s not, the array doesn’t have enough types.
We need more operations.
You know, broadcasting needs to be a little more settled.
They wanted record arrays.
They wanted, you know, record arrays are like a data frame,
but a little bit different,
but they wanted more structured data.
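For readers unfamiliar with record arrays, here is a minimal sketch of the idea as it exists in NumPy today: an array whose elements are records with named, typed fields. The field names and the observation values are made up for illustration and have nothing to do with actual Hubble data.

```python
import numpy as np

# A structured dtype: each element is a record with three named fields.
# Field names and values are invented for this example.
obs_dtype = np.dtype([("target", "U8"), ("exposure_s", "f4"), ("counts", "i8")])
obs = np.array(
    [("M31", 120.0, 5400), ("M42", 60.0, 9100), ("M51", 300.0, 2200)],
    dtype=obs_dtype,
)

# Select a whole column by field name, like a lightweight data frame.
counts = obs["counts"]

# Filter rows with a boolean mask built from one field.
long_exposures = obs[obs["exposure_s"] > 100.0]

# Viewing the same memory as a recarray allows attribute-style field access.
rec = obs.view(np.recarray)
rate = rec.counts / rec.exposure_s

print(counts)
print(long_exposures["target"])
print(rate)
```

Unlike a data frame, all of the records live in one contiguous block of memory with a fixed layout, which is what makes this form convenient for binary instrument data.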
So he had called me even early on then,
and he said, you know, what,
would you wanna work on something to make this work?
And I said, yeah, I’m interested, but I’m going here,
and I, you know, we’ll see if I have time.
So in the meantime, while I was teaching
and SciPy was emerging, and I had a student,
I was constantly, while I was teaching,
trying to figure a way to fund this stuff.
So I had a graduate student, my only graduate student,
a Chinese fellow, Liu Hongze is his name, great guy.
He wrote a bunch of stuff for iterative linear algebra,
like got into writing some of the iterative
linear algebra tools that are currently there in SciPy,
and they’ve gotten better since,
but this is in 2005, kept working on SciPy,
but Perry had started working on a replacement
for Numeric called NumArray.
And in 2004, a package called nd_image,
it was an image processing library
that was written for NumArray,
and it had in it a morphology tool.
I don’t know if you know what morphology is.
It’s opening, dilation, closing, you know,
there was sort of this, as a medical imaging student,
I knew what it was,
because it was used in segmentation a lot.
And in fact, I’d wanted to do something like that
in Python, in SciPy, but just had never gotten around to it.
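As a small illustration of the morphology operations mentioned here, the sketch below uses the functions that SciPy ships today in `scipy.ndimage`. The toy mask, its size, and the 3x3 structuring element are assumptions chosen only to make the effect visible.

```python
import numpy as np
from scipy import ndimage

# Toy binary segmentation mask (all values are made up for illustration).
mask = np.zeros((13, 13), dtype=bool)
mask[4:11, 4:11] = True   # a 7x7 object
mask[5, 5] = False        # a one-pixel hole inside the object
mask[1, 11] = True        # an isolated one-pixel speck of noise

structure = np.ones((3, 3), dtype=bool)  # 3x3 structuring element

dilated = ndimage.binary_dilation(mask, structure=structure)  # grows regions
eroded = ndimage.binary_erosion(mask, structure=structure)    # shrinks regions
opened = ndimage.binary_opening(mask, structure=structure)    # erosion, then dilation
closed = ndimage.binary_closing(mask, structure=structure)    # dilation, then erosion

print("speck survives opening:", bool(opened[1, 11]))
print("hole filled by closing:", bool(closed[5, 5]))
```

Opening is the usual way to drop small specks from a segmentation, and closing is the usual way to fill small holes, which is why these tools show up so often in medical imaging.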
So when it came out, but it worked only on NumArray,
and SciPy needed Numeric,
and so we effectively had the beginning of this split.
And Numeric and NumArray didn’t share data,
they were just two, so you could have a gigabyte
of NumArray data, and a gigabyte of Numeric data,
and they wouldn’t share it.
And so you had these,
then you had these scientific libraries written on top.
I got really bugged by that.
I got really like, oh man, this is not good,
we’re not cooperating now,
we’re sort of redoing each other’s work,
and we’re just this young community.
So that’s what led me, even though I knew it was risky,
because my, you know, I was on a tenure track position,
2004 I got reviewed.
They said, hey, things are going okay,
you’re doing well, paper’s coming out,
but you’re kind of spending a lot of time
doing this open source stuff, maybe do a little less of that,
and a little more of the paper writing and grant writing,
which was naive, but it was definitely the thinking.
It still goes on.
Still goes on.
You’re basically creating a thing
which enables science in the 21st century.
Right.
Maybe don’t emphasize that so much in your three-year tenure review.
Right.
It illustrates some of the challenges.
Yes.
It does, and it’s, people mean well.
Yes.
Like, but we’ve gotten broken in a bunch of ways.
Certain things, programming,
understanding the role of software engineering,
programming in society is a little bit lacking.
Exactly.
Now, I was in an electrical engineering position.
Right.
That’s even worse there.
Yeah, it was very, they were very focused,
and so, you know, good people, and I had a great time,
I loved my time, I loved my teaching,
I loved all the things I did there.
The problem was, the split was happening
in this community that I loved, right?
I saw people, and I went, oh my gosh,
this is gonna be, this is not great,
and so I happened, you know, fate,
I had a class I had signed up for,
it’s a, I was trying to build an MRI system,
so I had a kind of a radio, instead of a radio,
a digital radio class, it was a digital MRI class.
And I had people sign up, two people signed up,
then they dropped, and so I had nobody in this class.
So, and I didn’t have any other courses to teach,
and I thought, oh, I’ve got some time,
and I’ll just write, I’ll just write a replacement,
a merger of Numeric and NumArray.
Like, I’ll basically take the Numeric code base,
add the features NumArray was adding,
and then kind of come up with a single array library
that everybody can use.
So that’s where NumPy came from,
was my thinking, hey, I can do this,
and who else is going to?
Because at that point, I’d been around the community
long enough, and I’d written enough C code,
I knew, I knew the structures, and I,
in fact, my first contribution to Numeric
had been writing the C API documentation
that went in the first documentation for NumPy,
for Numeric, sorry, this is Paul Dubois,
David Ascher, Konrad Hinsen, and myself.
I got credit because I wrote this chapter,
which is all the C API of Numeric, all the C stuff.
So I said, I’m probably the one to do it,
and nobody else is gonna do this.
So it was sort of, out of a sense of duty and passion,
knowing that, eh, I don’t think my academic,
I don’t think the department here is gonna appreciate this,
but it’s the right thing to do.
It was like.
Can we just linger on that moment?
Yeah, yeah.
Because the importance of the way you thought
and the action you took, I feel is understated
and is rare and I would love to see so much more of it
because what happens as the tools become more popular,
there’s a split that happens.
And it’s a truly heroic and impactful action
to in those early, in that early split,
to step up and it’s like great leaders throughout history,
like, what is it, Braveheart,
like get on a horse and rally the troops
because I think that can have, make a big difference.
We have TensorFlow versus PyTorch
in the machine learning community.
We have the same problem today.
Yeah, I wonder.
It’s actually bigger.
I wonder if it’s possible in the early days
to rally the troops.
It is possible, especially in the early days.
The longer it goes, the harder, right?
The more energy in the factions, the harder.
But in the early days, it is possible
and it’s extremely helpful
and there’s a willingness there,
but the challenge is there’s just not a willingness
to fund it.
There’s not a willingness to, you know,
like I was literally walking into a field
saying I’m going to do this
and here I am, like, you know,
I have five kids at home now.
Pressure builds.
Sometimes my wife hears these stories
and she’s like, you did what?
I thought we were going to,
I thought you were actually on a path
to make sure we had resources and money, but,
but again, there’s a, there’s an aspect,
I’m a very hopeful person.
I’m an optimistic person by nature.
I love people.
I learned that about myself later on.
And part of my, my religious beliefs
actually lead to that.
And it’s why I hold them dear
because it’s actually how I feel about,
that’s what leads me to these attitudes,
sort of this hopefulness and this sense of,
yeah, it may not work out for me financially
or maybe, but that’s not the ultimate gain.
Like that’s a thing, but it’s not,
that’s not the scorecard for me.
And so I just wanted to be helpful
and I knew, and partly because these SciPy conferences,
because the maintenance conversations,
I knew there was a lot of need for this, right?
And so I had this, it wasn’t like I was alone
in terms of no feedback.
I had these people who knew, but it was crazy.
Like people who at the time said,
yeah, we didn’t think you’d be able to do it.
We thought it was crazy.
And also instructive, like practically speaking,
that you had a cool feature
that you were chasing the morphology, like the.
Yes.
Like it’s not just like.
There’s an end result.
It’s not some visionary thing.
I’m going to unite the community.
You were like. Correct.
You were actually practically,
this is what one person actually could do
and actually build.
Cause that is important.
Cause you can get over your skis.
You can definitely get over your skis.
And I had, in fact, this almost got me over my skis, right?
I would say, well, in retrospect, I hate looking back.
I can tell you all the flaws with NumPy, right?
When I go into it, there’s lots of stuff that I’m like,
oh man, that’s embarrassing.
That was wrong.
I wish I had somebody stop me with a wet fish there.
Like I needed, like what I’d wished I’d had
was somebody with more experience in library
writing, and certainly array library writing.
It’s like, I wish I had me.
I could go back in time and go do this, do that.
There’s a more important thing.
Cause there’s things we did that are still there
that are problematic, that created challenges for later.
And I didn’t know it at the time.
Didn’t understand how important that was.
And in many cases, didn’t know what to do.
Like there was pieces of the design of NumPy.
I didn’t know what to do until five years ago.
Now I know what they should have been.
But I didn’t know at the time and nobody,
and I couldn’t get the help.
Anyway, so I wrote it.
It took about, it took four months to write
the first version, then about 14 months to make it usable.
But it was, it wasn’t, it was that first four months
of intense writing, coding, getting something out the door
that worked that was, it was, it was definitely challenging.
And then the big thing I did was create a new type object
called dtype.
That was probably the contribution.
And then the fact that I added broad, not just broadcasting,
but advanced indexing so that you could do masked indexing
and indirect indexing instead of just slicing.
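Editor's sketch of the features named above, using the current NumPy API rather than the 2005 code: a dtype object, broadcasting, masked (boolean) indexing, and indirect (integer-array) indexing next to plain slicing. The variable names are illustrative only.

```python
import numpy as np

# A dtype object describes how each element is stored in memory.
dt = np.dtype(np.float32)
a = np.arange(6, dtype=dt).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

# Broadcasting: the length-3 row is applied to both rows of the (2, 3) array.
row = np.array([10, 20, 30], dtype=dt)
b = a + row

# Masked indexing: a boolean array selects the elements where it is True.
mask = a > 2
big = a[mask]

# Indirect indexing: an integer array picks columns in an arbitrary order.
idx = np.array([2, 0])
cols = a[:, idx]

# Plain slicing, for contrast, returns a view on the same memory.
view = a[:, :2]

print(b)
print(big)
print(cols)
print(np.shares_memory(view, a), np.shares_memory(cols, a))
```

Slicing shares memory with the original array, while masked and indirect indexing produce copies.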
So for people who don’t know, and maybe you can elaborate,
NumPy, I guess the vision in the narrowest sense
is to have this object that represents
n dimensional arrays.
And like at any level of abstraction you want,
but basically it could be a black box
that you can investigate in ways that you would naturally
want to investigate such objects.
Yes, exactly.
So you could do math on it easily.
Math on it easily, yeah.
So it had an associated library of math operations
and effectively SciPy became an even larger set
of math operations.
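Editor's sketch of what "an object you can investigate and do math on" looks like in practice, assuming present-day NumPy. The array contents are made up for illustration.

```python
import numpy as np

x = np.array([[1.0, 4.0, 9.0],
              [16.0, 25.0, 36.0]])

# Investigating the object: dimensionality, shape, element type, memory strides.
print(x.ndim, x.shape, x.dtype, x.strides)

# Math applies to the whole array at once, with no explicit Python loop.
roots = np.sqrt(x)
col_sums = roots.sum(axis=0)   # sum down each column
total = float(roots.sum())     # sum of every element

print(roots)
print(col_sums, total)
```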
So the key for me was I was going to write NumPy
and then move SciPy to depend on NumPy.
In fact, early on, one of the initial proposals
was that we would just write SciPy
and it would have the Numeric object inside of it.
And it’d be scipy.array or something.
That turned out to be problematic because Numeric
already had a little mini library of linear algebra
and some functions, and it had enough momentum,
enough users that nobody wanted to,
they wanted backward compatibility.
One of the big challenges of NumPy
was I had to be backward compatible
with both Numeric and Numarray
in order to allow both of those communities to come together.
There was a ton of work in creating
that backward compatibility
that also created echoes in today’s object.
Like some of the complexity in today’s object
is actually from that goal of backward compatibility
to these other communities,
which if you didn’t have that, you’d do something different,
which is instructive because a lot of things are there.
You think, what is that there for?
It’s like, well, it’s a remnant.
It’s an artifact of its historical existence.
By the way, I love the empathy
and the lack of ego behind that
because I feel, you see that in the split
in the JavaScript framework, for example,
the arbitrary branching.
Right.
I think in order to unite people,
you have to kind of put your ego aside
and truly listen to others.
You do.
What do you love about Numarray?
What do you love about Numeric?
Like actually get a sense,
we were talking about languages earlier,
sort of empathize to the culture,
the people that love something about this particular API,
some of the naming style
or the actual usage patterns
and truly understand them
and so that you can create that same draw
in the united thing. I completely agree.
I completely agree.
And you have to also have enough passion
that you’ll do it.
It can’t be just like a perfunctory,
oh yes, I’ll listen to you
and then I’m not really that excited about it.
So it really is an aspect,
it’s a philosophical, like there’s a philia,
there’s a love of esteeming of others.
It’s actually at the heart of what,
it’s sort of a life philosophy for me, right?
That I’m constantly pursuing and that helped,
absolutely helped.
Makes me wonder in a philosophical,
like looking at human civilization as one object,
it makes me wonder how we can copy and paste Travis’s
in this book.
Well, some aspects, maybe.
Some aspects, right, right, exactly.
Well, it’s a good question.
How do we teach this?
How do we encourage it?
How do we lift it?
Because so much of the software world,
it’s giant communities, right?
But it seems like so much is moved by,
like little individuals.
You talk about like Linus Torvalds.
It’s like, could you have not,
could you have had Linux without him?
Could you?
Yeah, Guido and Python.
Guido and Python.
Well, the SciPy community particularly,
it’s like I said, we wanted to build this big thing,
but ultimately we didn’t.
What happened is we had mavericks and champions
like John Hunter who created Matplotlib.
We had Fernando Perez who created IPython.
And so we sort of inspired each other,
but then it kind of, there’s sort of a culture
of this selfless giving, the stewardship mentality,
as opposed to ownership mentality,
but stewardship and community focused,
community focused, but intentional work.
Like not waiting for everybody else to do the work,
but you’re doing it for the benefit of others
and not worried about what you’re gonna get.
You’re not worried about the credit.
You’re not worried about what you’re gonna get.
You’re worried about, I later realized
that I have to worry a little about credit,
not because I want the credit,
because I want people to understand
what led to the results.
Like, I don’t, it’s not about me.
It’s I want to understand this is what led to the result.
So let’s like, I think doing,
and this is what had no impact on the result.
Like let’s promote, just like you said,
I want to promote the attributes
that help make us better off.
How do we make more of Wes McKinney?
Like Wes McKinney was critical to the success of Python
because of his creation of pandas,
the roots of which were all the way back
in Numeric and Numarray and NumPy,
where NumPy created an array of records.
Wes started to use that almost like a data frame,
except it’s an array of records.
And data frame, the challenge is,
okay, if you want to augment it at another column,
you have to insert, you have to do all this memory movement
to insert a column.
Whereas data frames became,
oh, I’m going to have a loose collection of arrays.
So it’s a record of arrays that is a part of a data frame.
And we thought about that back in the Numeric days,
but Wes ended up doing the work to build it.
And then also the operations that were relevant
for data processing.
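Editor's sketch of the contrast described above: an array of records (a NumPy structured array) versus a record of arrays (a loose collection of columns). The field names and values are invented for illustration, and a plain dict stands in for a data frame; this is not how pandas is implemented internally.

```python
import numpy as np

# Array of records: one block of memory, each element carries every field.
rec_dtype = np.dtype([("price", "f8"), ("volume", "i8")])
records = np.array([(10.0, 100), (11.5, 250), (9.75, 80)], dtype=rec_dtype)

# Adding a column needs a new, wider record layout and a copy of every row.
wide_dtype = np.dtype(rec_dtype.descr + [("flag", "?")])
wide = np.zeros(records.shape, dtype=wide_dtype)
for name in rec_dtype.names:
    wide[name] = records[name]          # the memory movement mentioned above
wide["flag"] = records["volume"] > 90

# Record of arrays: each column is its own independent array.
columns = {
    "price": np.array([10.0, 11.5, 9.75]),
    "volume": np.array([100, 250, 80]),
}
price_before = columns["price"]
columns["flag"] = columns["volume"] > 90   # existing columns are not touched

print(rec_dtype.itemsize, wide_dtype.itemsize)
print(columns["price"] is price_before)
```

In the second layout, adding a column is just one more entry in the collection, which is the design point made in the conversation.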
What I noticed is just that each of these little things
creates just another tick, another up.
So NumPy ultimately took a little while,
about six months in, people started to join me,
Francesc Alted, Robert Kern, Charles Harris.
And these people are many of the unsung heroes, I would say.
People who are, you know,
they sometimes don’t get the credit they deserve
because they were critical both to support,
like, you know, it’s hard and you want,
you need some support, people need support.
And I needed just encouragement.
And they were helping and encouraging by contributing.
And once, the big thing for me was when John Hunter,
he had previously done kind of a simple thing
called numerix to kind of, you know, between Numeric
and Numarray, he had a little high level tool
that would just select each one for Matplotlib.
In 2006, he finally said,
we’re gonna just make NumPy the dependency of Matplotlib.
As soon as he did that,
and I remember specifically when he did that,
I said, okay, we’ve done it.
Like, that was when I knew we had seen success.
Before then it was still unsure,
but that kind of started a roller coaster.
And then 2006 to 2009.
And then I’ve been floored by what it’s done.
Like, I knew it would help.
I had no idea how much it would help.
Right, so.
And it has to do with, again, the language thing.
It just, people started to think in terms of NumPy.
Yes.
And that opened up a whole new way of thinking.
And part of the story that you kind of mentioned,
but maybe you can elaborate,
is it seems like at some point in the story,
Python took over science and data science.
Yes.
And bigger than that,
the scientific community started to think like programmers
or started to utilize the tools of computers to do,
like at a scale that wasn’t done with Fortran.
Like at this gigantic scale,
they started to open in their heart.
And then Python was the thing.
I mean, there’s a few other competitors, I guess,
but Python, I think, really, really took over.
I agree.
There’s a lot of stories here
that are kind of during this journey,
because this is sort of the start of this journey in 2005, 2006.
So my tenure committee, I applied for tenure in 2006, 2007.
It came back, I split the department.
I was very polarizing.
I had some huge fans
and then some people that said no way, right?
So it was very, I was a polarizing figure in the department.
It went all the way up to the university president.
Ultimately, my department chair had the sway
and they didn’t say no.
They said, come back in two years and do it again.
And I went, eh, at that point, I was like,
I mean, I had this interest in entrepreneurship,
this interest in not the academic circles,
not the, like, how do we make industry work?
So I do have to give credit to that exploration of economics
because that led me, oh, I had a lot of opinions.
I was actually very libertarian at the time.
And I still have some libertarian trends,
but I’m more of a, I’m more of a collectivist libertarian.
So you value broadly, philosophically freedom.
I value broadly, philosophically freedom,
but I also understand the power of communities,
like the power of collective behavior.
And so what’s that balance, right?
That makes sense.
So by the time I was just,
I gotta go out and explore this entrepreneur world.
So I left academia.
I said, no thanks, called my friend, Eric, here,
who had, his company was going.
I said, hey, could I join you and start this trend?
And he, at that time they were using SciPy a lot.
They were trying to get clients.
And so I came down to Texas.
And in Texas is where I sort of,
it’s my entrepreneur world, right?
I left academia and went to entrepreneur world in 2007.
So I moved here in 2007, kind of took a leap,
knew nothing really about business,
knew nothing about a lot of stuff there.
There’s, you know, for a long time,
I’ve kept some connections to a lot of academics
because I still value it.
I still love the scientific tradition.
I still value the essence and the soul and the heart
of what is possible.
Don’t like a lot of the administration
and the kind of, we can go into detail about why
and where and how this happens,
what are some of the challenges.
I don’t know, but I’m with you.
So I’m still affiliated with MIT.
I still love MIT because there’s magic there.
There’s people I talk to, like researchers, faculty,
in those conversations and the whiteboard
and just the conversation, that’s magic there.
All the other stuff, the administration,
all that kind of stuff seems to,
you don’t wanna say too harshly criticize
sort of bureaucracies, but there’s a lag
that seems to get in the way of the magic.
And I’m still have a lot of hope
that that can change because I don’t often see
that particular type of magic elsewhere in the industry.
So like we need that and we need that flame going.
And it’s the same thing as exactly as you said,
it has the same kind of elements
like the open source community does.
And, but then if you, like the reason I stepped away,
the reason I’m here, just like you did in Austin is like,
if I wanna build one robot, I’ll stay at MIT.
But if I wanna build millions and make money enough
to where I can explore the magic of that, then you can’t.
And I think that dance is…
That translational dance has been lost a bit, right?
And there’s a lot of reasons for that.
I’m not, I’m certainly not an expert on this stuff.
I can opine like anybody else,
but I realized that I wanted to explore entrepreneurship,
which I, and really figure out,
and it’s been a driving passion for 20 years, 25 years.
How do we connect capital markets and company?
Cause again, I fell in love with the notion of,
oh, profit seeking on its own is not a bad thing.
It’s actually a coordination mechanism
for allocating resources that, you know,
in an emergent way, right?
That respects everybody’s opinions, right?
So this is actually powerful.
So I say all the time, when I make a company
and we do something that makes profit,
what we’re saying is, hey,
we’re collecting of the world’s resources
and voluntarily people are asking us
to do something that they like.
And that’s a huge deal.
And so I really liked that energy.
So that’s what I came to do and to learn
and to try to figure out.
And that’s what I’ve been kind of stumbling through
since for the past 14 years.
And that’s 2007.
2007, yeah.
And so you were still working on NumPy.
So NumPy was just emerging.
Just emerging.
One of the things I’ve done,
it’s worth mentioning because it emphasizes
the exploratory nature of my thinking at the time.
I said, well, I don’t know how to fund this thing.
I’ve got a graduate student I’m paying for
and I’ve got no funding for him.
And I had done some fundraising from the public
to try to get public fundraisers in my lab.
I didn’t really wanna go out
and just do the fundraising circuit
the way it’s traditionally done.
So I wrote a book and I said, I’m gonna write a book
and I’m gonna charge for it.
It was called Guide to NumPy.
And so ultimately NumPy became
documentation driven development
because I basically wrote the book
and made sure the stuff worked or the book would work.
So it really helped actually make NumPy become a thing.
So writing that book,
and it’s not a page turner.
Guide to NumPy is not a book you pick up
and go, oh, this is great, over the fire.
But it’s where you could find the details,
like how’d all this work.
And a lot of people love that book.
And so a lot of people ended up,
so I said, look, I need to, so I’m gonna charge for it.
And I got some flack for that.
Not that much, just probably five angry messages,
people yelling at me saying I was a bad guy
for charging for this book.
Was one of them Richard Stallman?
No. Just kidding.
No, I haven’t really had any interaction with him personally,
like I said, but there were a few,
but actually surprisingly not.
There was actually a lot of people like,
no, it’s fine, you can charge for a book.
That’s no big deal.
We know that’s a way you can try to make money
around open source.
So what I did, I did it in an interesting way.
I said, well, kind of my ideas around IP law and stuff.
I love the idea you can share something, you can spread it.
Like once it’s, the fact that you have a thing
and copying is free, but the creation is not free.
So how do you fund the creation and allow the copying?
And in software, it’s a little more complicated than that
because creation is actually a continuous thing.
It’s not like you build a widget and it’s done.
It’s sort of a process of emerging
and continuing to create.
But I wrote the book
and had this market determined price thing.
I said, look, I need, I think I said 250,000.
If I make 250,000 from this book, I’ll make it free.
So as soon as I get that much money,
or I said five years, so there’s a time limit.
Like it’s not forever.
That’s really cool.
It’s amazing.
I released it on this.
And it’s actually interesting
because one of the people
who also thought that was interesting
ended up being Chris White,
who was the director of DARPA project
that we got funding through at Anaconda.
And the reason he even called us back
is because he remembered my name from this book
and he thought that was interesting.
And so even though we hadn’t gone to the demo days,
we applied and the people said, yeah,
nobody ever gets this without coming to the demo day first.
This is the first time I’ve seen it.
But it’s because I knew, you know,
Chris had done this and had this interaction.
So it did have impact.
I was actually really, really pleased by the result.
I mean, I ended up in three years, I made 90,000.
So sold 30,000 copies by myself.
I just put it up on, you know, use PayPal and sold it.
And that was my first taste of kind of, okay,
this can work to some degree.
And I, you know, all over the world, right?
From Germany to Japan to, it was actually, it did work.
And so I appreciated the fact that PayPal existed
and I had a way to get the money, the distribution was simple.
This is pre Amazon book stuff.
So it was just publishing a website.
It was the popularity of SciPy emerging
and getting company usage.
I ended up not letting it go the five years
and not trying to make the full amount
because, you know, a year and a half later,
I was at Enthought.
I had left academia and was at Enthought
and I kind of had a full time job.
And then actually what happened is the documentation people,
there’s a group that said, hey,
we want to do documentation for SciPy as a collective.
And they’re essentially needing the stuff in the book, right?
And so they kind of ask,
hey, could we just use the stuff in your book?
And at that point I said, yeah, I’ll just open it up.
So that’s, but it has served its purpose.
And the money that I made actually funded my grad student.
Like it was actually, you know,
I paid him 25,000 a year out of that money.
So the funny thing is if you do a very similar
kind of experiment now with NumPy or something like it,
you could probably make a lot more.
It’s probably true.
Because of the tooling and the community building.
Yeah, I agree.
Like the, and social media,
that there’s just a virality to that kind of idea.
I agree.
There’d be things to do.
I’ve thought about that.
And really I thought about a couple of books
or a couple of things that could be done there.
And I just haven’t, right?
Even, I tried to hire a ghostwriter this year too
to see if that could help, but it didn’t.
But part of my problem is this,
I’ve been so excited by a number of things
that have stemmed from that.
Like, so I came here, worked at Enthought for four years,
graciously, Eric made me president.
Then we started to work closely together.
We actually helped him buy out his partner.
It didn’t end great.
Like unfortunately Eric and I aren’t real,
aren’t friends now.
I still respect him.
I have a lot, I wish we were,
but he didn’t like the fact that Peter and I
started Anaconda, right?
That was not, I mean, so there’s two sides to that story.
So I’m not gonna go into it, right?
Sure.
But you, as human beings
and you wish you still could be friends.
I do, I do.
It saddens me.
I mean, that’s a story of great minds
building great companies.
Somehow it’s sad that when there’s that kind of.
And I hold him in esteem.
I’m grateful for him.
I think Enthought still exists.
They’re doing great work helping scientists.
They still run the SciPy conference.
They have an R&D platform they’re selling now
that’s a tool that you can go get today, right?
So Enthought has played a role in the SciPy
in supporting the community around SciPy, I would say.
They ended up not being able to,
they ended up building a tool suite
to write GUI applications.
Like that’s where they could actually make
that the business could work.
And so supporting SciPy and NumPy itself
wasn’t as possible.
Like they didn’t, they tried.
I mean, it was not just because,
it was just because of the business aspect.
So, and I wanted to build a company that could do,
that could get venture funding, right?
For better or worse.
I mean, that’s a longer story.
We could talk a lot about that, but.
And that’s where Anaconda came to be.
That’s where Anaconda came to be.
So let me ask you, it’s a little bit for fun
because you built this amazing thing.
And so let’s talk about like an old warrior
looking over old battles.
You’ve, you know, there’s a sad letter in 2012
that you wrote to the NumPy mailing list
announcing that you’re leaving NumPy.
And some of the things you’ve listed
as some of the things you regret
or not regret necessarily, but some things to think about.
If you could go back and you could fix stuff about NumPy
or both sort of in a personal level,
but also like looking forward,
what kind of things would you like to see changed?
Good question.
So I think there’s technical questions
and social questions right there.
First of all, you know, I wrote NumPy as a service
and I spent a lot of time doing it.
And then other people came to help make it happen.
NumPy succeeded because of the work of a lot of people, right?
So it’s important to understand that.
I’m grateful for the opportunity,
the role I had, I could play
and grateful that things I did had an impact,
but they only had the impact they had
because of the other people that came to the story.
And so they were essential,
but the way data types were handled,
the way data types, we had array scalars, for example,
that are really just a substitute for a type concept, right?
So we had array scalars, which are actual Python objects,
so that there’s for every, for a 32 bit float
or a 16 bit float or a 16 bit integer,
Python doesn’t have a natural,
it’s just one integer, there’s one float.
Well, what about these lower precision types,
these larger precision types?
So we had them in NumPy
so that you could have a collection of them,
but then have an object in Python that was one of them.
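Editor's sketch of array scalars as they exist in present-day NumPy: pulling one element out of an array gives a Python object of a NumPy-specific class that keeps the element's precision, since built-in Python has only one `int` and one `float`.

```python
import numpy as np

a = np.array([1.5, 2.5], dtype=np.float32)

# Indexing a single element returns an array scalar, not a built-in float.
elem = a[0]
print(type(elem))                      # the NumPy float32 scalar class
print(isinstance(elem, np.generic))    # common base class of array scalars
print(isinstance(elem, float))         # float32 is not a Python float

# float64 matches Python's own float precision and does subclass it.
print(isinstance(np.float64(1.5), float))

# A 16-bit integer that can live on its own, outside any array.
small = np.int16(7)
print(small.dtype, small.itemsize)

# Converting back to a built-in Python object.
py_value = elem.item()
print(type(py_value), py_value)
```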
And there’s questions about like in retrospect,
I wouldn’t have created those
if I had improved the type system.
And like made the type system actually a Python type system
as opposed to currently,
it’s a Python 1 level type system.
I don’t know if you know the difference
between Python 1 and Python 2,
it’s kind of technical, kind of deep,
but Python 2, one of its big things that Guido did,
it was really brilliant.
In Python 1, actually,
all classes, all new objects, were one type.
If you as a user wrote a class,
it was an instance of a single Python type
called the class type, right?
In Python 2, he used a meta typing hook
to actually go, oh, we can extend this
and have users write classes that are new types.
So he was able to have your user classes be actual types
and the Python type system got a lot more rich.
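Editor's sketch of the end state described above, as it looks in modern Python 3 (the Python 1 behavior cannot be reproduced on a current interpreter). A user-written class is itself a type, and a metaclass is the hook that customizes how such types are created. `Quaternion`, `Posit`, and `Registered` are hypothetical names chosen for illustration.

```python
class Quaternion:
    def __init__(self, w, x, y, z):
        self.w, self.x, self.y, self.z = w, x, y, z


q = Quaternion(1, 0, 0, 0)

# The instance's type is the user class itself, not a generic "instance" type.
print(type(q) is Quaternion)

# The class is an object too, and its type is the built-in metatype `type`.
print(type(Quaternion) is type)


class Registered(type):
    """A metaclass: it runs whenever a class using it is created."""
    registry = {}

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        mcls.registry[name] = cls    # record every new type by name
        return cls


class Posit(metaclass=Registered):
    pass


print(type(Posit) is Registered)
print(Registered.registry)
```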
I barely understood that at the time that NumPy was written.
And so I essentially in NumPy created a type system
that was Python 1 era.
Every dtype is an instance of the same type
as opposed to having new dtypes be really just Python types
with additional metadata.
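Editor's sketch of the design being described: very different dtypes are all instances of `np.dtype`, and each one points at its array scalar class through the `.type` attribute. Newer NumPy releases have reworked this area, so `type(f4)` may print a dtype-specific subclass; the `isinstance` checks below should hold either way.

```python
import numpy as np

f4 = np.dtype("float32")
i2 = np.dtype("int16")

# Both descriptors are instances of np.dtype, despite describing unrelated types.
print(isinstance(f4, np.dtype), isinstance(i2, np.dtype))

# The per-type information is carried as data on the instance.
print(f4.kind, f4.itemsize)   # floating point, 4 bytes
print(i2.kind, i2.itemsize)   # signed integer, 2 bytes

# The matching Python-level class is the array scalar type.
print(f4.type is np.float32, i2.type is np.int16)
```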
What’s the cost of that?
Is it efficiency, is it usability?
It’s usability primarily.
The cost isn’t really efficiency.
It’s the fact that it’s clumsy to create new types.
It’s hard.
And then one of the challenges,
you wanna create new types.
You wanna quaternion type or you wanna add a new posit type
or you wanna, so it’s hard.
And now, if we had done that well,
when Numba came on the scene
where we could actually compile Python code,
it would integrate with that type system much cleaner.
And now all of a sudden you could do gradual typing
more easily.
You could actually have Python when you add Numba
plus better typing, could actually be a,
you’d smooth out a lot of rough edges.
But there’s already, there’s like,
but are you talking about from the perspective
of developers within NumPy or users of NumPy?
Developers of new, not really users of NumPy so much.
It’s the development of NumPy.
So you’re thinking about like how to design NumPy
so that it’s contributors.
Yeah, the contributors, it’s easier.
It’s easier.
It’s less work to make it better and to keep it maintained.
And where that’s impacted things, for example,
is the GPU.
Like all of a sudden GPUs start getting added
and we don’t have them in NumPy.
Like NumPy should just work on GPUs.
The fact that we’d have to download a whole other object
called CuPy to have arrays on GPUs
is just an artifact of history.
Like there’s no fundamental reason for it.
Well, that’s really interesting.
If we could sort of go on that tangent briefly
is you have PyTorch and other libraries like TensorFlow
that basically tried to mimic NumPy.
Like you’ve created a sort of platonic form
of multi dimension. Basically, yeah.
Yeah, exactly.
Well, and the problem was I didn’t realize that.
Platonic form has a lot of edges.
They’re like, well, we should cut those out
before we present it.
So I wonder if you can comment,
is there like a difference between their implementations?
Do you wish that they were all using NumPy
or like in this abstraction of GPU?
And sorry to interrupt that there’s GPUs, ASICs.
There might be other neuromorphic computing.
There might be other kind of,
or the aliens will come with a new kind of computer.
Like an abstraction that NumPy should just operate nicely
over the things that are more and more
and smarter and smarter with this multi dimensional arrays.
Yeah, yeah.
There’s several comments there.
We are working on something now called data-apis.org.
data-apis.org, you can go there today.
And it’s our answer.
It’s my answer.
It’s not just me.
It’s me and Ralf and Athan and Aaron
and a lot of companies are helping us at Quansight Labs.
It’s not unifying all the arrays.
It’s creating an API that is unified.
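Editor's sketch of what "a unified API rather than a unified array" can mean for library code. The array API standard specifies an `__array_namespace__` method on conforming arrays; the helper `get_namespace` and the function `standardize` are hypothetical names, and the fallback to NumPy is an assumption made so the example runs on older NumPy versions.

```python
import numpy as np


def get_namespace(x):
    """Return the array library that `x` belongs to (hypothetical helper)."""
    if hasattr(x, "__array_namespace__"):
        return x.__array_namespace__()
    return np   # assumed fallback for arrays that predate the standard


def standardize(x):
    """Shift to zero mean and scale to unit standard deviation."""
    xp = get_namespace(x)          # the function never names a specific library
    return (x - xp.mean(x)) / xp.std(x)


z = standardize(np.asarray([1.0, 2.0, 3.0, 4.0]))
print(z)
```

The same function body could in principle accept arrays from any library that implements the standard, which is the point of standardizing the interface rather than the implementation.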
So we do care about this
and we’re trying to work through it.
I actually had the chance to go and meet
with the TensorFlow team and the PyTorch team
and talk to them after exiting Anaconda.
Just talking about,
because the first year after leaving Anaconda in 2018,
I became deeply aware of this and realized that,
oh, this split in the array community that exists today
makes what I was concerned about in 2005 pretty parochial.
It’s a lot worse, right?
Now there’s a lot more people.
So perhaps the industry can sustain more stacks, right?
There’s a lot of money,
but it makes it a lot less efficient.
I mean, but I’ve also learned to appreciate,
it’s okay to have some competition.
It’s okay to have different implementations,
but it’s better if you can at least refactor some parts.
I mean, you’re gonna be more efficient
if you can refactor parts.
It’s nice to have competition over things,
over what is nice to have competition.
They’re innovative.
Yeah, innovative.
And then maybe on the infrastructure,
whatever, however you define infrastructure,
that maybe it’s nice to have come together.
Exactly, I agree.
And I think, but it was interesting to hear the stories.
I mean, TensorFlow came out of a C++ library,
Jeff Dean wrote, I think,
that was basically how they were doing inference, right?
And then they realized, oh,
we could do this TensorFlow thing.
That C++ library, then what was interesting to me
was the fact that both Google and Facebook did not,
it’s not like they supported Python or NumPy initially.
They just realized they had to.
They came to this world and then all the users were like,
hey, where’s the NumPy interface?
Oh, and then they kind of came late to it
and then they had these bolt ons.
TensorFlow’s bolt on, I don’t mean to offend,
but it was so bad.
Yeah, it was bad.
It’s the first time that I’m usually,
I mean, one of the challenges I have
is I don’t criticize enough in the sense
that I don’t give people input enough, you know, if.
I think it’s universally agreed upon
that the bolt ons on TensorFlow were.
But I went to, it was a talk given at Mallorca in Spain
and a great guy came and gave a talk and I said,
you should never show that API again
at a PyData conference.
Like that was, that’s terrible.
Like you’re taking this beautiful system we’ve created
and like you’re corrupting all these poor Python people,
forcing them to write code like that
or thinking they should.
Fortunately, you know, they adopted Keras as their,
and Keras is better.
And so Keras, TensorFlow is fine, is reasonable,
but they bolted it on.
Facebook did too.
Like Facebook had their own C++ library for doing inference
and they also had the same reaction, they had to do this.
One big difference is Facebook,
maybe because of the way it’s situated in part of FAIR,
part of the research library,
TensorFlow is definitely used and, you know,
they have to make, they couldn’t just open it up
and let the community, you know, change what that is.
Cause I guess they were worried
about disrupting their operations.
Facebook’s been much more open to having community input
on the structure itself.
Whereas Google and TensorFlow,
they’re really eager to have community users,
people use it and build the infrastructure,
but it’s much more walled.
Like it’s harder to become a contributor to TensorFlow.
And it’s also, this is very difficult question to answer
and don’t mean to be throwing shade at anybody,
but you have to wonder, it’s the Microsoft question
of when you have a tool like PyTorch or TensorFlow,
how much are you tending to the hackers
and how much are you tending to the big corporate clients?
Correct.
So like the ones that,
do you tend to the millions of people
that are giving you almost no money,
or do you tend to the few
that are giving you a ton of money?
I tend to stand with the people.
Right.
Cause I feel like if you nurture the hackers,
you will make the right decisions in the longterm
that will make the companies happy.
I lean that way too.
I totally agree.
But then you have to find the right dance.
But it’s a balance.
Cause you can lean to the hackers and run out of money.
Yeah, exactly.
Exactly.
Which has been some of the challenge I’ve faced
in the sense that,
like I would look at some of the experiments,
like NumPy, the fact that we have this split
is a factor of I wasn’t able to collect more money
towards NumPy development.
Yeah.
Right?
I mean, I didn’t succeed in the early days
of getting enough financial contribution to NumPy
so that they could work on it.
Right?
I couldn’t work on it full time.
I had to just catch an hour here, an hour there.
And I’ve basically not liked that.
Like I’ve wanted to be able to do something about that
for a long time and try to figure out how,
well, there’s lots of ways.
I mean, possibly one could say,
we had an offer from Microsoft
at early days of Anaconda.
2014, they offered to come buy us, right?
The problem was the right people at Microsoft
didn’t offer to buy us.
And they were still,
they were, it was really a,
we were like a second,
they had really bought, they just bought R,
the R company called,
it was not R studio,
but it was another R company that was emergent.
And it was kind of a,
well, we should also get a Python play,
but they were really doubling down on R.
Right?
And so it was like,
it was where you would go to die.
So it’s not, it wasn’t,
it was before Satya was there.
Satya had just started.
Just started.
Right?
And the offer was coming from someone
two levels down from him.
Got you.
Right?
And if it had come from Scott Guthrie,
so I got a chance to meet Scott Guthrie,
great guy, I like him.
If an offer had come from him,
probably would be at Microsoft right now.
That’d be fascinating.
That would be really nice actually,
especially given what Microsoft has since done
for the open source community and all those things.
Yes, I think they’re doing well.
I really like some of the stuff they’ve been doing.
They’re still working,
and they’ve, you know,
they’ve hired Guido now,
and they’ve hired a lot of Python developers.
Wait, Guido’s now at Microsoft?
Yeah, he works at Microsoft.
I need to.
Which, he retired,
then he came out of retirement,
and he’s working now.
I was just talking to him,
and he didn’t mention this person.
Well.
I should investigate this further.
Well.
Because I know he loved Dropbox,
but I wasn’t sure what he was doing,
what he was up to.
Well, he was kind of saying he’d retire,
but, and it’s literally been five years
since I last sat down and really talked to Guido.
Right?
Guido’s a technology expert, right?
He’s a, so I came,
I was excited because I’d finally figured out
the type system for NumPy.
I wanted to kind of talk about that with him,
and I kind of overwhelmed him.
Could you stay in that,
just for a brief moment,
because he’s a fascinating person
in the history of programming.
He is a fascinating person.
What have you learned from Guido
about programming, about life?
Yeah, yeah.
A lot, actually.
I’ve been a fan of Guido’s.
You know, we have a chance to talk.
Some, I wouldn’t say, you know,
we talk all the time.
Not at all.
He may, but we talk enough to,
I respect his,
in fact, when I first started NumPy,
one of the first things I did was I had a,
I asked Guido for a meeting
with him and Paul Dubois in San Mateo.
And I went and met him for lunch.
And basically, to say,
maybe we can actually,
part of the strategy for NumPy
was to get it into Python 3,
and maybe be part of Python.
And so we talked about that.
That’s a cool conversation.
And about that approach, right?
I would have loved to be a fly on the wall.
That was good.
And over the years for Guido,
I learned,
so he was open.
Like, he was willing to listen to people’s ideas.
Right?
And over the years,
now generally, you know,
I’m not saying universally that’s been true,
but generally that’s been true.
So he’s willing to listen.
He’s willing to defer.
Like on the scientific side,
he would just kind of defer.
He didn’t really always understand
what we were doing.
Yeah.
And he’d defer.
One place where he didn’t enough
was we missed a matrix multiply operator.
Like that finally got added to Python,
but about 10 years later than it should have.
But the reason was because nobody,
it takes a lot of effort.
And I learned this while I was writing NumPy.
I also wrote tools to Python.
I began with Python Dev,
and I added some pieces to Python.
Like the memoryview object.
I wanted the structure of NumPy into Python.
So we didn’t get NumPy into Python,
but we got the basic structure of it into Python.
Like, so you could build on it.
Nobody did for a while,
but eventually database authors started to.
And it’s a lot better.
They did.
And also Antoine Pitrou and Stefan Krah
actually fixed the memoryview object.
Cause I wrote the underlying infrastructure in C,
but the Python exposure was terrible
until they came in and fixed it.
Partly because I was writing NumPy,
and NumPy was the Python exposure.
I didn’t really care about
if you didn’t have NumPy installed.
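As a rough illustration of the piece of array structure that did land in core Python, the built-in memoryview exposes a buffer's format, shape, and strides without copying it. The sketch below uses only the standard library; the variable names are made up for the example.

```python
from array import array

# A typed, contiguous buffer of four C doubles.
buf = array("d", [1.0, 2.0, 3.0, 4.0])

# memoryview exposes the buffer's layout without copying the data.
view = memoryview(buf)
print(view.format, view.itemsize, view.shape)

# Writes through the view land in the original buffer.
view[0] = 10.0

# A strided slice is still a view onto the same memory, not a copy.
every_other = view[::2]
print(every_other.tolist())

# Raw bytes can be given a 2 x 3 shape, much like an array.
grid = memoryview(bytearray(range(6))).cast("B", shape=[2, 3])
print(grid.tolist())
```

This is the kind of shared layout description that lets different libraries hand each other memory without going through NumPy itself.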
Anyway, Guido: open to ideas,
technologically brilliant.
Like really, I really got a lot of respect for him
when I saw what he did
with this type class merger thing.
It was actually tricky, right?
And then willing to share, willing to share his ideas.
So the other thing early on in 1998,
I said, I wrote my first extension module.
The reason I could is because he’d written this blog post
on how to do reference counting, right?
And without it, I would have been lost, right?
But he was willing to at least try to write this post.
And so he’s been motivated early on with Python.
There’s a computer science for everybody.
You kind of have this early on desire to,
oh, maybe we should be pushing programming to more people.
So he had this populist notion, I guess,
or populist sense. I learned that there’s a certain skill,
and I’ve seen it in other people too,
of engaging with contributors sufficiently to,
because when somebody engages with you
and wants to contribute to you,
if you ignore them, they go away.
So building that early contributor base
requires real engagement with other people.
And he would do that.
Can you also comment on this tragic stepping down
from his position as the benevolent dictator for life
over the wars, you know?
The Walrus operator?
The Walrus operator was the last battle.
I don’t know if that’s the cause of it,
but there’s this, for people who don’t know,
you can look up, there’s the Walrus operator,
which looks like a colon and equal sign.
Yeah, colon, equal sign.
And it actually does maybe the thing
that an equal sign should be doing.
Yeah, maybe, right, exactly.
But it’s just historically,
equal sign means something else.
It just means assignment.
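For readers who have not seen it, here is a small sketch of the difference being described. A plain `=` is a statement, while `:=` (Python 3.8 and later) assigns inside an expression. The sample string and variable names are invented for the example.

```python
import re

line = "numpy==1.26.4"
pattern = r"(\w+)==(.+)"

# With plain assignment, binding and testing take two separate steps.
match = re.match(pattern, line)
if match is not None:
    name, version = match.groups()

# With the walrus operator, the assignment happens inside the condition.
if (m := re.match(pattern, line)) is not None:
    walrus_name, walrus_version = m.groups()

print(name, version, walrus_name, walrus_version)
```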
So he stepped down over this.
What do you think about the pressure of leadership?
It’s something that, you mentioned the letter I wrote
in NumPy at the time.
That was a hard time, actually.
I mean, there’s been really hard times.
It was hard.
You get criticized, right?
And you get pushed, and you get,
not everybody loves what you do.
Like anytime you do anything that has impact at all,
you’re not universally loved, right?
You get some real critics.
And that’s an important energy,
because it’s impossible for you to do everything right.
You need people to be pushing.
But sometimes people can get mean, right?
People can, I prefer to give people the benefit of the doubt.
I don’t immediately assume they have bad intentions.
And maybe for other, maybe that doesn’t happen for everybody.
For whatever reason, their past,
their experiences with people, they sometimes have bad,
so they immediately attribute to you bad intentions.
So you’re like, where did this come from?
I mean, I’m definitely open to criticism,
but I think you’re misinterpreting the whole point.
Because I would get that, certainly when I started Anaconda.
Sometimes I say to people,
I care enough about entrepreneurship
to make some open source people uncomfortable.
And I care enough about open source
to make investors uncomfortable.
So I sort of, you create kind of doubters on both sides.
So when you have, and this is just a plea
to the listener and the public, I’ve noticed this too,
that there’s a tendency, and social media makes this worse,
when you don’t have perfect information about the situation,
you tend to fill the gaps with the worst possible,
or at least a bad story that fills those gaps.
And I think it’s good to live life,
maybe not fully naively, but filling in the gaps
with the good, with the best, with the positive,
with the hopeful explanation of why you see this.
So if you see somebody like you trying to make money
on a book about NumPy,
there’s a million stories around that that are positive.
And those are good to think about,
to project positive intent on the people.
Because for many reasons, usually because people are good
and they do have good intent.
And also when you project that positive intent,
people will step up to that too.
Yes.
It’s a great point.
It has this kind of viral nature to it.
And of course with Twitter, early on figured out,
and Facebook is that they can make a lot of money
and engagement from the negative.
Yes.
So there’s this, we’re fighting this mechanism.
I agree.
Which is challenging.
It’s easier.
It’s just easier to be.
To be negative.
And then for some reason, something in our minds
really enjoys sharing that and getting all excited
about the negativity.
We do, yeah.
Some protective mechanism perhaps that we’re gonna get eaten
if we don’t, yeah.
Exactly.
For us to be effective as a group of people
in a software engineering project,
you have to project positive intent, I think.
I totally agree.
Totally agree.
And I think that’s very,
and so that happens in this space.
But Python has done a reasonable job in the past,
but here is a situation where I think it started
to get this pressure where it didn’t.
I really didn’t, I didn’t know enough about what happened.
I’ve talked to several people about it.
And I know most of the steering committee members today,
one person nominated me for that role,
but it’s the wrong role for me right now, right?
I have a lot of respect for the Python developer space
and the Python developers.
I also understand the gap between computer science
Python developers and array programming developers
or science developers.
And in fact, Python succeeds in the array space
the more it has people in that boundary.
And there’s often very few.
Like I was playing a role in that boundary
and working like everything to try to keep up
with even what Guido was saying, like I’m a C programmer,
but not a computer scientist.
Like I was an engineer and physicist and mathematician,
and I didn’t always understand
what they were talking about
and why they would have opinions the way they did.
So, you know, you have to listen and try to understand.
Then you also have to explain your point of view
in a way they can understand.
And that takes a lot of work.
And that communication is always the challenge.
And it’s just what we’re describing here
about the negativity is just another form of that.
Like how do we come together?
And it does appear we’re wired anyway
to at least have a, there’s a part of us
that will enemy, you know, friend, enemy.
And we see, yeah, it’s like,
why are we wiring on the enemy front?
So why are we pushing that?
Why are we promoting that so deeply?
Assume friend until proven otherwise.
Yes, yes.
So, cause you have such a fascinating mind in all of this.
Let me just ask you these questions.
So one interesting side on the Python history
is the move from Python two to Python three.
You mentioned move from Python one to Python two,
but the move from Python two to Python three
is a little bit interesting
because it took a very long time.
It broke, you know, in quite a small way,
backward compatibility, but even that small way
seemed to have been very painful for people.
Is there lessons you draw?
Oh man, tons of lessons.
From how long it took and how painful it seemed to be?
Yeah, tons of lessons.
Well, I mentioned here earlier
that NumPy was written in 2005.
It was in 2005 that I actually went to Guido
to talk about getting NumPy into Python three.
Like my strategy was to,
oh, we were moving to Python three.
Let’s have that be, and it seems funny in retrospect
because like, wait, Python three,
that was in 2020, right?
When we finally ended the support for Python two
or at least 2017.
The reason it took a long time,
a lot of time, I think it was because one of the things is
there wasn’t much to like about Python three.
3.0, 3.1, it really wasn’t until 3.3.
Like I consider Python 3.3 to be Python 3.0.
But it wasn’t until Python 3.3
that I felt there’s enough stuff in it
to make it worth anybody using it, right?
And then 3.4 started to be, oh yeah, I want that.
And then 3.5 has the matrix multiply operator,
and now it’s like, okay, we gotta use that.
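A minimal sketch of what that operator looks like in practice, assuming NumPy is installed. The `@` operator arrived in Python 3.5, and the two small matrices here are arbitrary.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[0, 1],
              [1, 0]])

# `*` is elementwise on arrays, which is why a separate operator was wanted.
elementwise = a * b

# `@` is matrix multiplication.
product = a @ b

print(elementwise)
print(product)
```

Before `@`, the same product had to be spelled as a function call such as `np.dot(a, b)`, which gets hard to read in long formulas.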
Plus the libraries that started leveraging
some of the features of Python three.
Exactly.
So it really, the challenge was it was,
but it also illustrated a truism that, you know,
when you have inertia,
when you have a group of people using something,
it’s really hard to move them away from it.
You can’t just change the world on them.
And Python three, you know, made some,
I think it fixed some things Guido had always hated.
I think he didn’t like the fact
that print was a statement.
He wanted to make it a function.
But in some sense, that’s a bit of gratuitous change
to the language.
And you could argue, and people have,
but one of the challenges was there wasn’t enough features
and too many just changes without features.
And so the empathy for the end user
as to why they would switch wasn’t there.
I think also it illustrated just the funding realities.
Like Python wasn’t funded.
Like it was also a project
with a bunch of volunteer labor, right?
It had more people, so more volunteer labor,
but it was still, it was funded in the sense
that at least Guido had a job.
And I’ve learned some of the behind the scenes on that now
since talking to people who have lived through it
and maybe not on air, we can talk about some of that.
But it’s interesting to see, but Guido had a job,
but his full time job wasn’t just work on Python.
Like he had other things to do.
Just wild.
It is wild, isn’t it?
It’s wild how few people are funded.
Yes.
And how much impact they have.
Yes.
Maybe that’s a feature not a bug, I don’t know.
Maybe, yes, exactly.
At least early on, like it’s sort of, I know, yeah.
It’s like Olympic athletes are often severely underfunded,
but maybe that’s what brings out the greatness.
Perhaps, yes, correct.
No, exactly.
Maybe this is the essential part of it.
Because I do think about that in terms of,
I currently have an incubator for open source startups.
Like what I’m trying to do right now
is create the environment I wished had existed
when I was leaving academia with NumPy
and trying to figure out what to do.
I’m trying to create those opportunities and environments.
So, and that’s what drives me still,
is how do I make the world easier
for the open source entrepreneur?
So let me stay, I mean, I could probably stay on NumPy
for a long time, but this is fun question.
So Andrej Karpathy leads the Tesla Autopilot team,
and he’s also one of the most like legit programmers I know.
It’s like he builds stuff from scratch a lot,
and that’s how he builds intuition about how a problem works.
He just builds it from scratch, and I always love that.
And the primary language he uses is Python
for the intuition building.
But he posted something on Twitter saying
that they got a significant improvement
on some aspect of their like data loading, I think,
by switching away from np.sqrt,
so the NumPy’s implementation of square root,
to math.sqrt, and then somebody else commented
that you can get even a much greater improvement
by using the vanilla Python square root, which is like.
Power 0.5.
And it’s fascinating to me, I just wanted to.
So that was some shade throwing at some.
No, no, and yes, we’re talking about.
It’s a good way to ask the trade off
between usability and efficiency broadly in NumPy,
but also on these specific weird quirks
of like a single function.
Yep, so on that point, if you use a NumPy math function
on a scalar, it’s gonna be slower
than using a Python function on that scalar.
But because the math object in NumPy is more complicated,
because you can also call that math object on an array.
And so effectively, it goes through a similar machine.
There aren’t enough of the, which you would do
and you could do like checks and fast paths.
So yeah, if you’re basically doing a list,
if you run over a list, in fact,
for problems that are less than 1,000,
even maybe 10,000 is probably the,
if you’re going more than 10,000,
that’s where you definitely need to be using arrays.
But if you’re less than that, and for reading,
if you’re doing a reading process
and essentially it’s not compute bound, it’s IO bound.
And so you’re really taking lists of 1,000 at a time
and doing work on it.
Yeah, you could be faster just using Python,
straight up Python.
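The scalar case can be measured directly. The sketch below, assuming NumPy is installed, times the three spellings mentioned above on one Python float. It prints whatever your machine measures and makes no claim about the exact ratios.

```python
import math
import timeit

import numpy as np

x = 2.0

# All three give the same value for a single float.
results = {
    "np.sqrt(x)": float(np.sqrt(x)),
    "math.sqrt(x)": math.sqrt(x),
    "x ** 0.5": x ** 0.5,
}

# Time each spelling on a scalar; np.sqrt goes through the array machinery.
timings = {
    "np.sqrt(x)": timeit.timeit(lambda: np.sqrt(x), number=100_000),
    "math.sqrt(x)": timeit.timeit(lambda: math.sqrt(x), number=100_000),
    "x ** 0.5": timeit.timeit(lambda: x ** 0.5, number=100_000),
}

for label, seconds in timings.items():
    print(f"{label:>12}: {seconds:.4f} s per 100,000 calls")
```

On a large array the comparison flips, because a single `np.sqrt(array)` call replaces the whole Python-level loop.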
See, but also, and this is the side to the top,
there’s the fundamental questions
when you look at the long arc of history,
it’s very possible that np.sqrt is much faster.
It could be.
So like in terms of like, don’t worry about it,
it’s the evils of over optimization or whatever,
all the different quotes around that,
is sometimes obsessing about this particular little quirk
is not sufficient.
For somebody like, if you’re trying to optimize your path,
I mean, I agree, premature optimization
creates all kinds of challenges, right?
Because now, but you may have to do it.
I believe the quote is, it’s the root of all evil.
It’s the root of all evil, right?
Let’s give Donald Knuth, I think,
or is he more than somebody else?
Well, Don Knuth is kind of like Mark Twain,
people just attribute stuff to him, I don’t know.
And it’s fine because he’s brilliant.
So, no, I was a LaTeX user myself,
and so I have a lot of respect,
and he did more than that, of course,
but yeah, someone I really appreciate
in the computer science space.
Yeah, I don’t, I think that’s appropriate.
There’s a lot of little things like that,
where people actually, if you understood it,
you go, yeah, of course, that’s the case.
And the other part, the other part I didn’t mention,
and Numba was a thing we wrote early on,
and I was really excited by Numba
because it’s something we wanted,
it was a compiler for Python syntax,
and I wanted it from the beginning of writing NumPy
because of this function question,
like taking, the power of arrays
is really that you can write functions using all of it.
It has implicit looping, right?
So you don’t worry about,
I write this n dimensional for loop
with four loops, four for statements.
You just say, oh, big four dimensional array,
I’m gonna do this operation, this plus, this minus,
this reduction, and you get this,
it’s called vectorization in other areas,
but you can basically think at a high level
and get massive amounts of computation done
with the added benefit of,
oh, it can be parallelized easily.
It can be put in parallel.
You don’t have to think about that.
In fact, it’s worse to go decompose your,
you write the for loops
and then try to infer parallelism from for loops.
That’s actually a harder problem
than to take the array problem
and just automatically parallelize that problem.
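To make the implicit looping concrete, here is a small sketch, assuming NumPy is installed, that does the same elementwise addition on a four dimensional array twice: once with four explicit `for` statements and once as a single array expression. The shapes and names are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((3, 4, 5, 6))
b = rng.random((3, 4, 5, 6))

# Explicit version: four nested for statements, one element at a time.
looped = np.empty_like(a)
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        for k in range(a.shape[2]):
            for l in range(a.shape[3]):
                looped[i, j, k, l] = a[i, j, k, l] + b[i, j, k, l]

# Array version: the loops are implicit and run in compiled code.
vectorized = a + b

# A reduction over the last dimension is also a single expression.
totals = (a - b).sum(axis=-1)

print(np.array_equal(looped, vectorized), totals.shape)
```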
That’s what, and so functions in NumPy
are called universal functions, ufuncs.
So square root is an example of a ufunc.
There are others, sine, cosine, add, subtract.
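A short sketch of what a ufunc looks like from the user's side, assuming NumPy is installed. The input values are arbitrary.

```python
import numpy as np

# np.sqrt and np.add are instances of the ufunc type.
sqrt_is_ufunc = isinstance(np.sqrt, np.ufunc)
add_is_ufunc = isinstance(np.add, np.ufunc)

# Called on an array, a ufunc applies elementwise.
roots = np.sqrt(np.array([1.0, 4.0, 9.0]))

# Binary ufuncs also carry methods such as reduce.
total = np.add.reduce(np.array([1, 2, 3, 4]))

# Called on a plain Python float, the same ufunc returns a NumPy scalar.
single = np.sqrt(16.0)

print(sqrt_is_ufunc, add_is_ufunc, roots, total, type(single).__name__)
```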
In fact, one of the first libraries to SciPy
was something called Special
where I added Bessel functions
and all these special functions that come up in physics
and I added them as ufuncs so they could work on arrays.
So I understood ufuncs very, very well
from day one inside of Numeric.
That was one of the things we tried to make better
in NumPy was how do they work?
Can they do broadcasting?
What does broadcasting mean?
But one of the problems is, okay,
what do I do with a Python scalar?
So what happens, the Python scalar gets broadcast
to a zero dimensional array
and then it goes through the whole same machinery
as if it were a 10,000 dimensional array.
And then it kind of unpacks the element
and then does the addition.
That’s not to mention the function it calls
in the case of square root
is just the C library square root, right?
In some cases, like Python’s power,
there’s some optimizations they’re doing
that could be faster
than just calling the C library square root.
In the interpreter or in the?
No, in the C code, in the Python runtime.
In the Python runtime, so they really optimize it
and they have the freedom to do that
because they don’t have to worry about.
It’s just a scalar.
Right, they don’t have to worry about the fact
that, oh, this could be an object with many pieces.
The ufunc machine is also generic
in the sense that typecasting and broadcasting,
broadcasting’s idea of I’m gonna go,
I have a zero dimensional array,
I have a scalar with a four dimensional array
and I add them.
Oh, I have to kind of coerce the shape of this guy
to make it work against the whole four dimensional array.
So it’s the idea of I can do a one dimensional array
against a two dimensional array and have it make sense.
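The two broadcasting cases just described can be sketched as follows, assuming NumPy is installed. The shapes are chosen only for illustration.

```python
import numpy as np

# A Python scalar is treated as a zero dimensional array...
scalar_ndim = np.asarray(1.5).ndim

# ...and is stretched to match a four dimensional array.
four_d = np.zeros((2, 3, 4, 5))
shifted = four_d + 1.5

# A one dimensional array against a two dimensional array:
# the row is repeated along the missing leading dimension.
table = np.arange(6).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
row = np.array([10, 20, 30])
combined = table + row

# np.broadcast reports the shape the operands are coerced to.
result_shape = np.broadcast(table, row).shape

print(scalar_ndim, shifted.shape, combined.tolist(), result_shape)
```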
Well, that’s what NumPy does is it challenges you
to reformulate, rethink your problem
as a multi dimensional array problem
versus move away from scalars completely.
Right, exactly, exactly.
In fact, that’s where some of the edge cases boundaries are
is that, well, they’re still there
and this is where array scalars are particular.
So array scalars are particularly bad
in the sense that they were written
so that you could optimize the math on them,
but that hasn’t happened.
And so their default is to coerce the array scalar
to a zero dimensional array
and then use the NumPy machinery.
That’s what, and you could specialize,
but it doesn’t happen all the time.
So in fact, when we first wrote Numba,
we’d do comparisons and say, look, it’s 1000X speed up.
We were lying a little bit in the sense that,
well, first do the 40X slowdown
of using the array scalars inside of a loop.
Cause if you just used Python scalars,
you’d already be 10 times faster.
But then we would get a hundred times faster
over that using just compilation.
But what we do is compile the loop
from out of the interpreter to machine code.
And then that’s always been the power of Python
is this extensibility so that you can,
cause people say, oh, Python’s so slow.
Well, sure, if you do all your logic
in the runtime of the Python interpreter, yeah.
But the power is that you don’t have to.
You write all the logic,
what you do in the high level is just high level logic.
And the actual calls you’re making
could be on gigabyte arrays of data.
And that’s all done at compiled speeds.
And the fact that integration is one can happen,
but two is separable.
That’s one of the, the language like Julia says,
we’re going to be all in one.
You can do all of it together.
And then there’s, the jury’s out, is that possible?
I tend to think that you’re going to,
there’s separate concerns there.
You want to precompile.
In fact, generally you will want to precompile your,
some of your loops.
Like SciPy has a compilation step.
To install SciPy, it takes about two hours.
If you have many machines,
maybe you can get it down to one hour.
But to compile those libraries takes about, takes a while.
You don’t want to do that at runtime.
You don’t want to do that all the time.
You want to have this precompiled binary available
that you’re then just linking into.
So there’s real questions about the whole source code.
Code is, running binary code is more than source code.
It’s creating object code, it’s the linker, it’s the loader,
it’s the how does that interpret it
inside of virtual memory space.
There’s a lot of details there that actually
I didn’t understand for a long time
until I read books on the topic.
And it led to, the more you know, the better off you are
and you can do more details,
but sometimes it helps with abstractions too.
Well, the problem, as we mentioned earlier
with abstractions is you kind of sometimes assume
that whoever implemented this thing
had your case in mind and found the optimal solution.
Yes.
Or like you assume certain things.
I mean, there’s a lot of,
Correct.
One of the really powerful things to me early on,
I mean, it sounds silly to say, but with Python,
probably one of the reasons I fell in love with it
is dictionaries.
Yes.
So obviously probably most languages
have some mapping concept,
but it felt like it was a first class citizen
and it was just my brain was able to think in dictionaries.
But then there’s the thing that I guess I still use
to this day is ordered dictionaries
because that seems like a more natural way
to construct dictionaries.
Yeah.
And from a computer science perspective,
the running time cost is not that significant,
but there’s a lot of things to understand about dictionaries
that the abstraction kind of
doesn’t necessarily incentivize you to understand.
Right, do you really understand the notion of a hash map
and how the dictionary is implemented?
But you’re right.
Dictionaries are a good example
of an abstraction that’s powerful.
And I agree with you.
I agree, I love dictionaries too.
Took me a while to understand that once you do,
you realize, oh, they’re everywhere.
And Python uses them everywhere too.
Like it’s actually constructed,
one of the foundational things is dictionaries
and it does everything with dictionaries.
So it is, it’s powerful.
Ordered dictionaries came later,
but it is very, very powerful.
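Two of the points above can be seen in a few lines of standard-library Python: objects and classes keep their attributes in dictionaries, and dictionaries remember insertion order (guaranteed since Python 3.7), with `collections.OrderedDict` adding explicit reordering. The `Point` class and the keys are invented for the example.

```python
from collections import OrderedDict


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


# Instance attributes live in a dictionary.
p = Point(1, 2)
attrs = vars(p)

# The class namespace is a mapping too.
has_init = "__init__" in vars(Point)

# Plain dicts keep keys in insertion order.
counts = {}
counts["numpy"] = 1
counts["scipy"] = 2
counts["numba"] = 3
order = list(counts)

# OrderedDict adds operations that change the order explicitly.
od = OrderedDict(counts)
od.move_to_end("numpy")
reordered = list(od)

print(attrs, has_init, order, reordered)
```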
It took me a little while coming
from just the array programming entirely
to understand these other objects,
like dictionaries and lists and tuples and binary trees.
Like I said, I wasn’t a computer scientist,
I studied arrays first.
And so I was very array centric.
And you realize, oh, these others
do have purposes and value actually.
I agree.
There’s a friendliness about,
like one way to think about arrays
is arrays are just like full of numbers,
but to make them accessible to humans
and make them less error prone to human users,
sometimes you want to attach names,
human interpretable names
that are sticky to those arrays.
So that’s how you start to think about dictionaries
is you start to convert numbers
into something that’s human interpretable.
And that’s actually the tension I’ve had with NumPy
because I’ve built so much tooling
around human interpretability
and also protecting me from a year later
not making the mistakes by being,
I wanted to force myself to use English versus numbers.
Yes, so there’s a project called Labeled Arrays.
Like very early it was recognized that,
oh, we’re indexing NumPy with just numbers,
all the columns and particularly the dimensions.
I mean, if you have an image,
you don’t necessarily need to label each column or row,
but if you have a lot of images
or you have another dimension,
you’d at least like to label the dimension
as this is X, this is Y, this is Z,
or this is give us some human meaning
or some domain specific meaning.
That was one of the impetuses for Pandas actually
was just, oh, we do need to label these things.
And Labeled Array was an attempt to add
that like a lighter weight version of that.
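A very small sketch of the idea of labeling dimensions, assuming NumPy is installed. This is not the API of any real labeled-array project. The `dims` mapping and the `reduce_over` helper are hypothetical names made up to show what naming an axis buys you.

```python
import numpy as np

# A stack of 2 images, each 3 rows by 4 columns.
images = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)

# Hypothetical labels: dimension name -> axis number.
dims = {"image": 0, "row": 1, "col": 2}


def reduce_over(arr, dims, name, func=np.mean):
    """Apply a reduction along the axis with the given name."""
    return func(arr, axis=dims[name])


# Reads as "average across images" rather than "axis=0".
mean_image = reduce_over(images, dims, "image")

# Reads as "sum across columns" rather than "axis=2".
row_totals = reduce_over(images, dims, "col", func=np.sum)

print(mean_image.shape, row_totals.shape)
```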
And there’s been, like, that’s an example of something
I think NumPy could add, could be added to NumPy,
but one of the challenges again, how do you fund this?
Like I said, one of the tragedies I think is that,
so I never had the chance to,
I was never paid to work on NumPy, right?
So I’ve always just done it in my spare time,
always taken from one thing,
taken from another thing to do it.
And at the time, I mean, today,
it would be the wrong day and today,
like paying me to work on NumPy now
would not be a good use of effort,
but we are finally at Quansight Labs,
I’m actually paying people to work on NumPy and SciPy,
which is I’m thrilled with, I’m excited by.
I’ve wanted to do that.
That’s what I always wanted to do from day one.
It just took me a while to figure out a mechanism to do that.
Even like in the university setting,
respecting that, like pushing students,
young minds and young graduate students to contribute
and then figuring out financial mechanisms
that enable them to contribute
and then sort of reward them
for their innovative scientific journey,
that would be nice.
But then also just a better allocation of resources.
It’s 20 year anniversary since 9/11
and I was just looking, we spent over $6 trillion
in the Middle East after 9/11 in the various efforts there.
And sort of to put politics and all that aside,
it’s just, you think about the education system,
all the other ways we could have
possibly allocated that money.
To me, to take it back,
the amount of impact you would have
by allocating a little bit of money to the programmers
that build the tools that run the world is fascinating.
It is.
I don’t know, I think, again,
there is some aspect to being broke
as somewhat of a feature, not a bug,
that you make sure that you’re valued.
But you can still manage that.
Right, no, I know.
But I don’t think that’s a big part.
So it’s like, I think you can have enough money
and actually be wealthy while maintaining your values.
Agreed, agreed.
There’s an old adage that nations that trade together
don’t go to war together.
I’ve often thought about nations that code together.
Yeah, code together.
Right?
I love that.
Because one of the things I love about open source
is it’s global, it’s multinational.
Like there aren’t national boundaries.
One of the challenges with business and open source
is the fact that, well, business is national.
Like businesses are entities
that are recognized in legal jurisdictions, right?
And have laws that are respected in those jurisdictions
and hiring, and yet the open source ecosystem
is not, it’s not there.
Like currently, one of the problems we’re solving
is hiring people all over the world, right?
Because we, it’s a global effort.
And I’ve had the chance to work, and I’ve loved the chance.
I’ve never been to like Iran,
but I once had a conference
where I was able to talk to people there, right?
And talk to folks in Pakistan.
I’ve never been there, but we had a call
where there were people there,
like just scientists and normal people.
And there’s a certain amount of humanizing, right?
That gets away from the,
like we often get the memes of society
that bubble up and get discussed,
but the memes are not even an accurate reflection
of the reality of what people are.
Well, if you look at the major power centers
that are leading to something like cyber war
in the next few decades,
it’s the United States, it’s Russia, and China.
And those three countries in particular
have incredible developers.
So if they work together, I think that’s one way,
the politicians can do their stupid bickering,
but like there’s a layer of infrastructure, of humanity.
If they collaborate together,
that I think can prevent major military conflict,
which would, I think most likely happen at the cyber level
versus the actual hot war level.
You’re right.
You know, I think that’s a good prediction.
Nations that code together don’t go to war together.
Don’t go to war together.
That’s a hope, right?
That’s one of the philosophical hopes, but yeah.
So you mentioned the project of Numba,
which is fascinating.
So from the early days,
there was kind of a pushback on Python that it’s not fast.
You know, C++,
if you wanna write something that’s fast,
you use C++.
If you wanna write something that’s usable and friendly,
but slow, you use Python.
And so what is Numba?
What is its goal?
How does it work?
Great, yeah.
Yes, that was the argument.
And the reality was people would write high level coding
and use compiled code,
but there’s still user stories, use cases,
where you want to write Python,
but then have it still be fast.
You still need to write a for loop.
Like before Numba, it was always don’t write a for loop.
You know, write it in a vectorized way,
you know, put it in an array.
And often that can make a memory trade off.
Like quite often you can do it,
but then you make maybe use more memory
because you have to build this array of data
that you don’t necessarily need all the time.
So Numba was, it started from a desire to have
kind of a vectorize that worked.
Vectorize was a tool in NumPy, it was released.
You give it a Python function
and it gave you a universal function,
a ufunc that would work on arrays.
So you get the function that just worked on a scalar.
Like you could make a,
like the classic case was a simple function
that had an if then statement in it.
So sine X over X function, sinc function.
If X equals zero, return one, otherwise do sine X over X.
The challenge is you don’t want that loop
happening in Python.
So you want a compiled version of that,
but the ufunc, the vectorize in NumPy
would just give you a Python function.
So it would take the array of numbers
and at every call do a loop back into Python.
So it was very slow.
It gave you the appearance of a ufunc,
but it was very slow.
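The sinc case described above looks roughly like this with `np.vectorize`, assuming NumPy is installed. The function names are made up for the example. As the conversation says, the wrapper only gives the appearance of a ufunc: every element is still handed back to the Python function.

```python
import math

import numpy as np


def sinc_scalar(x):
    """sin(x) / x for one number, with the x == 0 case handled by an if."""
    if x == 0:
        return 1.0
    return math.sin(x) / x


# np.vectorize wraps the scalar function so it accepts arrays.
# otypes pins the output type instead of inferring it from the first call.
sinc = np.vectorize(sinc_scalar, otypes=[float])

xs = np.array([0.0, 0.5, 1.0])
ys = sinc(xs)

print(ys)
```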
So I always wanted a vectorize
that would take that Python scalar function
and produce a ufunc working on binary native code.
So in fact, I had somebody work on that with PyPy
and see if PyPy could be used to produce a ufunc like that
early on in 2009 or something like that, 2010.
They didn’t work that well.
It was kind of pretty bulky.
But in 2012, Peter and I had just started Anaconda.
We had, I just, I’d learned to raise money.
That’s a different topic,
but I’d learned to raise money from friends, family,
and fools, as they say.
And.
That’s a good line.
Oh, that’s a good line.
But, so we were trying to do something.
We were trying to change the world.
Peter and I are super ambitious.
We wanted to make array computing
and we had ideas for really what’s still,
it’s still the energy right now.
How do you do at scale data science?
And we had a bunch of ideas there, but one of them,
I had just talked to people about LLVM
and I was like, there’s a way to do this.
I just, I went, I heard about my friend Dave Beazley
at a compiler course.
So I was looking at compilers like,
and I realized, oh, this is what you do.
And so I wrote a version of Numba
that just basically mapped Python bytecode to LLVM.
Nice.
Right, so, and the first version is like, this works
and it produces code that’s fast.
This is cool for, you know,
obviously a reduced subset of Python.
I didn’t support all the Python language.
There had been efforts to speed up Python in the past,
but those efforts were, I would say,
not from the array computing perspective,
not from the perspective of wanting to produce
a vectorized improvement.
They were from the perspective of speeding up
the runtime of Python, which is fundamentally hard
because Python allows for some constructs
that aren’t, you can’t speed up.
Like it’s this generic, you know, when it does this variable.
So I, from the start, did not try to replicate
Python’s semantics entirely.
I said, I’m gonna take a subset of the Python syntax
and let people write syntax in Python,
but it’s kind of a new language really.
So it’s almost like for loops, like focusing on for loops.
For loops, scalar arithmetic, you know, typed,
you know, really typed language, a typed subset.
That was the key.
So, but we wanted to add inference of types.
So you didn’t have to spell all the types out
because when you call a function,
so Python is typed, it’s just dynamically typed.
So you don’t tell it what the types are,
but when it runs, every time an object runs,
there’s a type for the variables.
You know what it is.
And so that was the design goals of Numba
were to make it possible to write functions
that could be compiled and have them used for NumPy arrays.
Like they needed to support NumPy arrays.
And so how does it work?
Do you add a comment within Python that tells it to do,
like how do you help out the compiler?
Yeah, so there isn’t much actually.
You don’t, it’s kind of magical in the sense
that it just looks at the type of the objects
and then it does type inference to determine
any other variables it needs.
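A toy version of that idea, assuming nothing about how Numba implements it: start from the runtime types of the arguments and propagate them through simple assignments. The helper names (`promote`, `infer`, `infer_locals`) and the int/float promotion rule are illustrative assumptions, built on the standard library `ast` module.

```python
import ast

def promote(left, right):
    # Toy promotion rule: mixing int and float yields float.
    return "float" if "float" in (left, right) else "int"

def infer(node, env):
    """Infer the type name of an expression, given known variable types."""
    if isinstance(node, ast.Name):
        return env[node.id]
    if isinstance(node, ast.Constant):
        return type(node.value).__name__
    if isinstance(node, ast.BinOp):
        left, right = infer(node.left, env), infer(node.right, env)
        # True division always produces a float in Python.
        if isinstance(node.op, ast.Div):
            return "float"
        return promote(left, right)
    raise NotImplementedError(ast.dump(node))

def infer_locals(source, arg_types):
    """Walk simple statements in order, inferring a type for each local."""
    env = dict(arg_types)
    func = ast.parse(source).body[0]
    for stmt in func.body:
        if isinstance(stmt, ast.Assign):
            env[stmt.targets[0].id] = infer(stmt.value, env)
        elif isinstance(stmt, ast.Return):
            env["<return>"] = infer(stmt.value, env)
    return env

SRC = """
def f(n, x):
    k = n + 1
    y = x * k
    return y / 2
"""

def argument_types(n, x):
    # The only information supplied: the types of the actual arguments.
    return {"n": type(n).__name__, "x": type(x).__name__}

types = infer_locals(SRC, argument_types(3, 0.5))
```

Nothing in `f` is annotated; the types of `k`, `y`, and the return value all follow from the types of the two arguments at call time.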
And then it was also, because we had a use case
that could work early.
Like one of the challenges of any kind of new development
is if you have something that to make it work,
it was gonna take you a long time,
it’s really hard to get it off the ground.
If you have a project where there’s some incremental story,
it can start working today and solve a problem,
then you can start getting it out there, getting feedback.
Because Numba today, now Numba is nine years old today,
the first two, three versions were not great, right?
But they solved a problem and some people could try it
and we could get some feedback on it.
Not great in that it was very focused.
Very fragile, the subset it would actually compile
was small and so if you wrote Python code
and said, so the way it worked is you write a function
and you say @jit, use decorators.
So decorators, just these little constructs
let you decorate code with an at and then a name.
The @jit would take your Python function
and actually just compile it and replace the Python function
with another function that interacts
with this compiled function.
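The replacement mechanism can be sketched with a plain decorator. `toy_jit` below is a hypothetical stand-in, not Numba's `jit`: it generates no machine code and only shows how the decorated name ends up bound to a wrapper that picks a specialization per argument-type signature.

```python
import functools

def toy_jit(func):
    """Hypothetical stand-in for a JIT decorator (no real compilation)."""
    specializations = {}  # type signature -> "compiled" callable

    def compile_for(signature):
        # A real JIT would emit native code for this signature here.
        # This sketch simply reuses the original Python function.
        return func

    @functools.wraps(func)
    def dispatcher(*args):
        # Build the signature from the runtime types of the arguments.
        signature = tuple(type(a).__name__ for a in args)
        if signature not in specializations:
            specializations[signature] = compile_for(signature)
        return specializations[signature](*args)

    dispatcher.specializations = specializations
    return dispatcher

@toy_jit
def add(a, b):
    return a + b
```

After decoration, `add` refers to `dispatcher`, not to the original function; calling it with new argument types adds one entry to `add.specializations`.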
And it would just do that and we went from Python bytecode
then we went to AST.
I mean, writing compilers actually,
I learned a lot about why computer science
is taught the way it is because compilers
can be hard to write.
They use tree structures, they use all the concepts
of computer science that are needed.
It’s actually hard to, it’s easy to write a compiler
and then have it be spaghetti code.
Like the passes become challenging
and we ended up with three versions of Numba, right?
Numba got written three times.
What programming language is Numba written in?
Python.
Wait, okay.
Yeah, Python.
So.
Really?
That’s fascinating.
Yeah, so Python, but then the whole goal of Numba
is to translate Python bytecode to LLVM.
And so LLVM actually does the code generation.
In fact, a lot of times they’d say,
yeah, it’s super easy to write a compiler
if you’re not writing the parser nor the code generator.
Right?
So for people who don’t know, LLVM is a compiler itself.
So your compiler.
Yeah, it’s really badly named low level virtual machine,
which that part of it is not used.
It’s really low level.
Chris, he doesn’t mean that.
Yeah, love Chris.
But the name makes you imply that the virtual machine
is what it’s all about.
It’s actually the IR and the library,
the code generation.
That’s the real beauty of it.
The fact that, what I love about LLVM
was the fact that it was a platform you could collaborate on.
Right?
Instead of the internals of GCC
or the internals of the Intel compiler,
or like how do I extend that?
And it was a place we could collaborate.
And we were early.
I mean, people had started before.
It’s a slow compiler.
Like it’s not a fast compiler.
So for some kind of JITs,
like JITs are common in language
because one, every browser has a JavaScript JIT.
It does real time compilation
of the JavaScript to machine code.
For people who don’t know, JIT is just in time compilation.
Thank you.
Yeah, just in time compilation.
They’re actually really sophisticated.
In fact, I got jealous of how much effort
was put into the JavaScript JITs.
Yes, well, it’s kind of incredible what they’ve done.
Yes, I completely agree.
I’m very impressed.
But you know, Numba was an effort
to make that happen with Python.
And so we used some of the money
we raised from Anaconda to do it.
And then we also applied for this DARPA grant
and used some of that money to continue the development.
And then we used proceeds from service projects we would do.
We get consulting projects
that we would then use some of the profits
to invest in Numba.
So we ended up with a team of two or three people
working on Numba.
It was fits and starts, right?
And ultimately, the fact that we had a commercial version
of it also we were writing.
So part of the way I was trying to fund Numba,
say, well, let’s do the free Numba
and then we’ll have a commercial version of Numba
called Numba Pro.
And what Numba Pro did is it targeted GPUs.
So we had the very first CUDA JIT
and the very first @jit compiler that in 2012 or ’13,
you could run not just a ufunc on CPU,
but a ufunc on GPUs.
And it would automatically parallelize it
and get a 1000X speedup on it.
And that’s an interesting funding mechanism
because large companies or larger companies
care about speed in just this way.
So it’s exactly a really good way.
Yeah, there’s been a couple of things
you know people will pay for.
One, they’ll pay for really good user interfaces, right?
And so I’m always looking for what are the things
people will pay for that you could actually adapt
to the open source infrastructure?
One is definitely user interfaces.
The second is speed, like a better runtime, faster runtime.
And then when you say people,
you mean like a small number of people pay a lot of money,
but then there’s also this other mechanism that.
That’s true.
A ton of people pay.
That’s true.
A little bit.
First, I gotta, we mentioned Anaconda,
we mentioned friends, family, and fools.
So Anaconda is yet another.
So there’s a company, but there’s also a project.
Correct.
That is exceptionally impactful in terms of,
for many reasons, but one of which is bringing
a lot more people into the community
of folks who use Python.
So what is Anaconda?
What is its goals?
Maybe what is Conda versus Anaconda?
Yeah, I’ll tell you a little bit of the history of that.
Cause Anaconda, we wanted to do,
we wanted to scale Python.
Cause we, you know, that was the goal.
Peter and I had the goal of when we started Anaconda,
we actually started as Continuum Analytics
was the name of the company that started.
It got renamed Anaconda in 2015.
But we said, we want to scale analytics.
NumPy is great, Pandas is emerging,
but these need to run at scale with lots of machines.
The other thing we wanted to do was make user interfaces
that were web.
We wanted to make sure the web did not pass
by the Python community.
That we had ways to translate your data science to the web.
So those are the two kind of technical areas.
We thought, oh, we’ll build products in this space.
And that was the idea.
Very quickly in, but of course,
the thing I knew how to do was to do consulting
to make money and to make sure my family and friends
and fools that had invested didn’t lose their money.
So it’s a little different
than if you take money from a venture fund.
If you take money from a venture fund,
the venture fund, they want you to go big or go home.
And they’re kind of like expecting nine out of 10 to fail
or 99 out of 100 to fail.
It’s different.
I was, I was on a barbell strategy.
I was like, I can’t fail.
I mean, I may not do super well,
but I cannot lose their money.
So I’m going to do something I know can return a profit,
but I want to have exposure to an upside.
So that’s what happened at Anaconda.
We didn’t, there were lots of things we did not do well
in terms of that structure.
And I’ve learned since how to do it better.
But we’ve, we did a really good job
of kind of attracting the interest around the area
to get good people working
and then funnel some money
on some interesting projects.
Super excited about what came out of our energy there.
Like a lot did.
So what are some of the interesting projects?
So Dask, Numba, Bokeh, Conda.
There was Datashader, Panel, HoloViz.
These are all tools that are extremely relevant
in terms of helping you build applications,
build tools, build, you know, faster code.
There’s a couple I’m forgetting.
Oh, JupyterLab, JupyterLab came out of this too.
And yeah.
Okay, so Bokeh does plotting?
Is that?
Bokeh does plotting.
So Bokeh was one of the foundational things to say,
I want to do plotting in Python,
but have the things show up on the web.
Right, that’s right.
That’s right, that’s right.
And plotting to me still,
with all due respect to Matplotlib and Bokeh,
it feels like still an unsolved problem,
not a solved problem.
It is, it’s a big problem.
Right, because you’re, I mean, I don’t know,
it’s visualization broadly, right?
I think we’ve got a pretty good API story
around certain use cases of plotting.
But there’s a difference between static plots
versus interactive plots versus I’m an end user,
I just want to write a simple,
for Pandas started the idea of here’s a data frame
and a .plot, I’m just going to attach plot
as a method to my object,
which was a little bit controversial, right?
But works pretty well, actually,
because there’s a lot less you have to pass in, right?
You can just say, here’s my object, you know what you are,
you tell the visualization what to do.
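A minimal sketch of that design choice, using a hypothetical `TinySeries` class rather than the pandas API: because the plotting method lives on the object, the label and the data never have to be passed in by the caller.

```python
class TinySeries:
    """Hypothetical miniature data container, not the pandas API."""

    def __init__(self, name, values):
        self.name = name
        self.values = list(values)

    def plot(self, width=20):
        # The object already knows its own label and data, so the caller
        # supplies almost nothing. Bars are scaled to the largest value.
        top = max(self.values) or 1
        lines = [self.name]
        for v in self.values:
            lines.append("#" * round(width * v / top))
        return "\n".join(lines)

sales = TinySeries("sales", [1, 2, 4])
chart = sales.plot(width=8)

if __name__ == "__main__":
    print(chart)
```

A free-standing plotting function would need the values, the label, and any layout hints as separate arguments; the method form needs only the optional `width`.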
So that, and there’s things like that
that have not been super well developed entirely,
but Bokeh was focused on interactive plotting.
So you could, it’s a short path
between interactive plotting and application,
dashboard application.
And there’s some incredible work that got done there, right?
And it was a hard project,
because then you’re basically doing JavaScript and Python.
So we wanted to tackle some of these hard problems
and try to just go after them.
We got some DARPA funding to help,
and it was super helpful, funny story there,
we actually did two DARPA proposals,
but one we were five minutes late for.
And DARPA has a very strict cutoff window.
And so I, we had two proposals,
one for the Bokeh and one for actually Numba
and the other work.
Which one were you late for?
The foundational numerical work.
So Bokeh got funded. Oh no.
Fortunately, Chris let us use some of the money to fund
still some of the other foundational work,
but it wasn’t as, yeah, his hands were tied,
he couldn’t do anything about it.
That was a whole interesting story.
So one of the incredible projects
that you worked on is Conda.
Yes.
So what is Conda? So how that came about,
yeah, Conda, it was early on, like I said, with SciPy.
SciPy was a distribution masquerading as a library.
And I said, you heard me talking about compiler issues
and trying to get the stuff shipped
and the fact that people can only use your libraries
if they have it.
So for a long time,
we’d understood the packaging problem in Python.
And one of the first things we did at Continuum Analytics,
which became Anaconda, was organize the PyData ecosystem
in conjunction with NumFOCUS.
We actually started NumFOCUS
with some other folks in the community
the same year we started Anaconda.
I said, we’re gonna build a corporation,
but we’re also gonna reify the community aspect
and build a nonprofit.
So we did both of those.
Can we pause real quick and can you say what is PyPI,
the Python package index,
like this whole story of packaging in Python?
Yeah, that’s what I’m gonna get to actually.
This is exactly the journey I’m on.
It’s to sort of explain packaging in Python.
I think it’s best expressed through the conversation
I had with Guido at a conference,
where I said, so packaging is kind of a problem.
And Guido said, I don’t ever care about packaging.
I don’t use it.
I don’t install new libraries.
I’m like, I guess if you’re the language creator
and if you need something, you just put it in the distribution
maybe you don’t worry about packaging.
But Guido has never really cared about packaging, right?
And never really cared about the problem of distribution.
It’s somebody else’s problem.
And that’s a fair position to take, I think,
as a language creator.
In fact, there’s a philosophical question about
should you have different development packaging managers?
Should you have a package manager per language?
Is that really the right approach?
I think there are some answers of
it is appropriate to have development tools.
And there’s an aspect of a development tool
that is related to packaging.
And every language should have some story there
to help their developers create.
So you should have language specific development tools.
Development tools that relate to package managers.
But then there’s a very specific user story
around package management
that those language specific package managers
have to interact with.
And currently aren’t doing a good job of that.
That was one of the challenges
that not seeing that difference,
and it still exists in the difference today.
Conda always was a user.
I’m gonna use Python to do data science.
I’m gonna use Python to do something.
How do I get this installed?
It was always focused on that.
So it didn’t have a develop.
Classic example is pip has a pip develop.
It’s like, I wanna install this
into my current development environment today.
Conda doesn’t have that concept
because it’s not part of the story.
For people who don’t know,
pip is a Python specific package manager.
That’s exceptionally popular.
That’s probably like the default thing you’ve learned.
It’s the default user.
And so the story there emerged
because what happened is in 2012,
we had this meeting at the Googleplex
and Guido was there to come talk about what we’re gonna do,
how we’re gonna make things work better.
And Wes McKinney, me, Peter,
Peter has a great photo of me talking to Guido
and he pretends we’re talking about this story.
Maybe we were, maybe we weren’t.
But we did at that meeting talk about it
and asked Guido, we need to fix packaging in Python.
People can’t get the stuff.
And he said, go fix it yourself.
I don’t think we’re gonna do it.
All right.
The origin story right there.
All right, you said, okay, you said to do this ourselves.
So at the same time,
people did start to work on the packaging story in Python.
It just took a little longer.
So in 2012, kind of motivated
by our training courses we were teaching,
like very similar to what you just mentioned
about your mother.
Like it was motivated by the same purpose.
Like how do we get this into people’s hands?
It’s this big, long process.
It’s too expensive.
It was actually hurting NumPy development
because I would hear people were saying,
don’t make that change to NumPy
because I just spent a week getting my Python environment.
And if you change NumPy, I have to reinstall everything.
And reinstalling is such a pain, don’t do it.
I’m like, wait, okay.
So now we’re not making changes to a library
because of the installation problem
that it’ll cause for end users.
Okay, there’s a problem with installation.
We gotta fix this.
So we said, we’re gonna make a distribution in Python.
And we’d previously done that.
I’d previously done that at Enthought.
I wanted to make one that we would give away for free,
that everyone could just get.
Like that was critical that we could just get it.
It wasn’t tied to a product.
It was just you could get it.
And then we had constantly thought about,
well, do we just leverage RPM?
But the challenge had always been,
we want a package manager that works on Windows,
Mac OS X, and Linux the same, right?
And it wasn’t there.
Like you don’t have anything like that.
You have…
And for people who don’t know,
RPM is an operating system specific package manager.
Correct, it’s operating system specific.
Yes, exactly.
So the design question is,
do you create an umbrella package manager
that works across operating systems?
Yes, that was the decision.
And in neighboring design questions,
do you also create a package manager
that spans multiple programming languages?
Correct, exactly.
That was the world we faced.
And we decided to go multiple operating systems,
and programming language independent.
Because even Python, and particularly what was important
was SciPy has a bunch of Fortran in it, right?
And scikit-learn has links to a bunch of C++.
There’s a lot of compiled code.
And the Python package managers, especially early on,
didn’t even support that.
So in 2000, so we released Anaconda,
which was just a distribution of libraries,
but we started to work on Conda in 2012.
First version of Conda came out in early 2013,
summer of 2013, and it was a package manager.
So you could say, conda install scikit-learn.
In fact, scikit-learn was a fantastic project that emerged.
It was the classic example of the scikits.
I talked to you earlier about SciPy being too big
to be a single library.
Well, what the community had done is said,
let’s make scikits.
And there’s scikit-image, there’s scikit-learn,
there’s a lot of scikits.
And it was a fantastic move that the community did.
I didn’t do it.
I was like, okay, that’s a good idea.
I didn’t like the name.
I didn’t like the fact you typed scikit-image.
I was like, that’s gotta be simpler.
That’s scikit-learn, we gotta make that smaller.
I don’t like typing all this stuff from imports.
So I was kind of a pressure that way,
but I love the energy and love the fact
that they went out and they did it,
and those people, Jarrod Millman, and then of course, Gaël,
and there’s people I’m not even naming.
Scikit-learn really emerged as a fantastic project.
And the documentation around that is also incredible.
And the documentation was incredible, exactly.
I don’t know who did that, but they did a great job.
A lot of people in Inria, a lot of European contributors.
There’s some Andreas in the US.
There’s a lot of just people I just adore,
I think are amazing people.
Awesome use of SciPy, right?
I love the fact that they were using SciPy effectively
to do something I love, which is machine learning,
but couldn’t install it.
Because there’s so many pieces involved.
So many dependencies, right?
So our use case of Conda was conda install scikit-learn.
Right, and it was the best way to install scikit-learn
in 2013 to really 2018, 17, 18, PIP finally caught up.
I still think you should conda install scikit-learn
over pip install scikit-learn,
but you can pip install scikit-learn.
The issue is the package they created was wheels
and PIP does not handle the multi vendor approach.
They don’t handle the fact you have C++ libraries
you’re depending on.
They just stop at the Python boundary.
And so what you have to do in the wheel world
is you have to vendor.
You have to take all of the binary and vendor it.
Now, if a change happens in an underlying dependency,
you have to redo the whole wheel.
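The cost of vendoring can be made concrete with a toy model. The package and library names below are invented for illustration, and the sketch assumes the shared-library update stays binary compatible.

```python
# Hypothetical packages and the native libraries each one depends on.
NATIVE_DEPS = {
    "pkg_a": {"libblas", "libpng"},
    "pkg_b": {"libblas"},
    "pkg_c": {"libpng"},
}

def rebuilds_when_vendored(changed_lib):
    # Every package that bundles its own copy of the library
    # has to be rebuilt and re-published.
    return sorted(p for p, libs in NATIVE_DEPS.items() if changed_lib in libs)

def rebuilds_when_shared(changed_lib):
    # When the library is its own package, only that package is
    # rebuilt (assuming the update stays binary compatible).
    return [changed_lib]

if __name__ == "__main__":
    print(rebuilds_when_vendored("libblas"))
    print(rebuilds_when_shared("libblas"))
```

With vendoring, the rebuild count grows with the number of packages that bundle the library; with a shared dependency it stays at one.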
So TensorFlow, as you know,
you should not PIP install TensorFlow.
It’s a terrible idea.
People do it because the popularity of PIP,
many people think, oh, of course,
that’s how I install everything in Python.
Yeah, this is one of the big challenges.
You take a GitHub repository or just a basic blog post.
The number of times PIP is mentioned over Conda
is like 100 X to one.
Correct, correct.
So it just has to do with the.
And that was increasing.
It wasn’t true early because PIP didn’t exist.
Like Conda came first.
So but that’s the problem.
Like Conda came first, but that’s like the long tail
of the internet documentation user generated.
So that like you think, how do I install, you Google,
how do I install TensorFlow?
You’re just not gonna see Conda in that first page.
Correct, exactly.
And that.
Not today, you would have in 2016, 2017.
And it’s sad because Conda solves
a lot of usability issues.
Correct.
Like especially for super challenging things.
I don’t know.
One of the big pain points for me was
just on the computer vision side, OpenCV installation.
Perfect example.
Yes.
I think Conda, I don’t know if Conda solved that one.
Conda has an OpenCV package.
I don’t know.
I certainly know PIP has not solved.
I mean, there’s complexities there because.
Right.
I actually don’t know.
I should probably know a good answer for this,
but if you compile OpenCV with certain dependencies,
you’ll be able to do certain things.
So there’s this kind of flexibility of what you,
like what options you compile with.
Yes.
And I don’t think it’s trivial to do that with Conda or.
So Conda has a notion of variants of a package.
You can actually have different compilation versions
of a package.
So not just the version is different,
but oh, this is compiled with these optimizations on.
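One way to picture variants is as several builds that share a name and version but carry different build labels. The index entries, the package name `imglib`, and the labels below are made up for illustration; this is a sketch of the idea, not Conda's actual metadata or solver.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Build:
    name: str
    version: str
    build: str  # label distinguishing builds of the same version

# Hypothetical index entries; the build labels are invented.
INDEX = [
    Build("imglib", "4.5", "plain_0"),
    Build("imglib", "4.5", "avx2_0"),
    Build("imglib", "4.6", "plain_0"),
]

def find(name, version, build_prefix=None):
    """Return builds matching name and version, optionally one variant."""
    return [
        b for b in INDEX
        if b.name == name
        and b.version == version
        and (build_prefix is None or b.build.startswith(build_prefix))
    ]
```

Asking for version 4.5 alone returns two candidates; adding a build prefix narrows the choice to one compiled flavor.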
So Conda does have an answer.
Has those flavors.
Has flavors, basically.
Well, PIP, as far as I know, does not have flavors.
No, no.
PIP generally hasn’t thought deeply
about the binary dependency problem, right?
And that’s why fundamentally it doesn’t work
for the SciPy ecosystem.
It barely, you can sort of paper over it and duct tape
and it kind of works until it doesn’t
and it falls apart entirely.
So it’s been a mixed bag.
Like, and I’ve been having lots of conversations
with people over the years because again,
it’s an area where if you understand some things,
but not all the things,
but they’ve done a great job of community appeal.
This is an area where I think Anaconda as a company
needed to do some things
in order to make Conda more community centric, right?
And this is a, I talk about this all the time.
There’s a balance between you have every project starts
with what I called company backed open source.
Even if the company is yourself, it’s just one person,
just doing business as.
But ultimately for products to succeed virally
and become massive influencers,
they have to create,
they have to get community people on board.
They have to get other people on board.
So it has to become community driven.
And a big part of that is engagement with those people.
Empowering people, governance around it.
And what happened with Conda in the early days,
PIP emerged and we did do some good things.
Conda Forge, Conda Forge community
is sort of the community recipe creation community.
But Conda itself, I still believe,
and Peter is CEO of Anaconda, he’s my co founder.
I ran Anaconda until 2017, 2018.
Is Peter still at Anaconda?
Peter’s still at Anaconda, right?
And we’re still great friends.
We talk all the time.
I love him to death.
There’s a long story there about like why and how
and we can cover in some other podcast perhaps.
Yeah.
It’s sort of a more, maybe a more business focused one.
But this is one area where I think Conda
should be more community driven.
Like he should be pushing more
to get more community contributors to Conda
and let the, Anaconda shouldn’t be fighting this battle.
Yeah.
Right?
It’s actually, it’s really a developers.
Like you said, like help the developers
and then they’ll actually move us the right direction.
Well, that was the problem I have is many
of the cool kids I know don’t use Conda.
And that to me is confusing.
It is confusing.
It’s really a matter of, Conda has some challenges.
First of all, Conda still needs to be improved.
There’s lots of improvements to be made.
And it’s that aspect of wait, who’s doing this?
And the fact that then the PyPA really stepped up.
Like they were not solving the problem at all.
And now they kind of got to where they’re solving it
for the most part.
And then effectively you could get,
like Conda solved a problem that was there.
And it still does.
It’s still, you know, there’s still great things it can do.
But, and we still use it all the time at Quansight
and with other clients, but with,
but you can kind of do similar things with PIP and Docker.
Right?
So especially with the web development community,
that part of it, again, is this is the,
there’s a lot of different kinds of developers
in the Python ecosystem.
And there’s still a lack of some clear understanding.
I go to the Python conference all the time
and then there’s only a few people in the PyPA who get it.
And then others who are just massively trumpeting
the power of PIP, but just do not understand the problem.
Yeah.
So one of the obvious things to me from a mom,
from a non programmer perspective,
is the across operating system usability.
That’s much more natural.
So there’s people that use Windows and just,
it seems much easier to recommend Conda there,
but then it, you should also recommend it across the board.
So I’ll definitely sort of.
But what I recommend now is a hybrid.
I do.
I mean, I have no problem.
Is it possible to use?
Oh, it is.
It is.
But like build the environment with PIP, with Conda,
build an environment with Conda
and then PIP install on top of that.
That’s fine.
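The hybrid workflow can be written down as two command lines: create the environment with Conda, then run pip inside it. The helper below only builds the argument lists and does not execute anything; the environment name and the package `some-small-lib` are placeholders.

```python
def hybrid_install_commands(env_name, conda_pkgs, pip_pkgs):
    """Build (not run) command lines for a conda-first, pip-second setup."""
    commands = [
        ["conda", "create", "--yes", "--name", env_name,
         "python", "pip", *conda_pkgs]
    ]
    if pip_pkgs:
        # Run pip from inside the new environment so packages land there.
        commands.append(
            ["conda", "run", "--name", env_name,
             "python", "-m", "pip", "install", *pip_pkgs]
        )
    return commands

cmds = hybrid_install_commands(
    "demo", ["numpy", "scikit-learn"], ["some-small-lib"]
)

if __name__ == "__main__":
    for cmd in cmds:
        print(" ".join(cmd))
```

The heavy compiled packages go in the first command and the small pure-Python extras in the second, which matches the ordering recommended in the conversation.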
Be careful about PIP installing OpenCV or TensorFlow
or because if somebody’s allowed that,
it’s gonna be most surely done in a way
that can’t be updated that easily.
So install like the big packages,
the infrastructure with Conda and then the weirdos.
Yeah.
That like the weird like implementation for some.
I had a, there’s a cool library I used
that based on your location and time of day and date
tells you the exact position of the sun
relative to the earth.
And it’s just like a simple library,
but it’s very precise.
And I was like, all right.
But that was, that was, and it’s like PIP.
Well, the thing they did really well is Python developers
who wanna get their stuff published,
you have to have a PIP recipe.
Yeah.
Right?
I mean, even if it’s, you know, the challenge is,
and there’s a key thing that needs to be added to PIP,
just simply add to PIP the ability to defer
to a system package manager.
Like, cause it’s, you know,
recognize you’re not gonna solve all the dependency problem.
So let like give up and allow the system package to work.
That way Anaconda is installed and it has PIP.
It would default to Conda to install stuff,
but Red Hat RPM would default to RPM
to install some more things.
Like that’s the, that’s a key, not difficult,
but somewhat work, some work feature needs to be added.
That’s an example of something like,
I’ve known we need to do it.
I mean, it’s where I wish I had more money.
I wish I was more successful in the business side,
trying to get there, but I wish my, you know,
my family, friends and fools community that I know.
Was larger.
Was larger and had more money.
Cause I know tons of things to do effectively
with more resources, but you know,
I have not yet been successful at channel.
Tons of, you know, some, you know,
I’m happy with what we’ve done.
We created again at Quansight,
what we created to get Anaconda started.
We created community to get Anaconda started.
Done it again with Quansight.
Super excited by that.
But it took three years to do it.
What is Quansight?
What is its mission?
We’ve talked a few times about different fascinating
aspects of it, but let’s like big picture,
what is Quansight?
Big picture Quansight.
Quansight is, its mission is to connect data
to an open economy.
So it’s basically consulting of the PyData ecosystem,
right?
It’s a consulting company.
And what I’ve said when I started it was we’re trying
to create products, people, and technology.
So it’s divided into two groups.
And a third one as well.
The two groups are a consulting services company
that just helps people do data science
and data engineering and data management better
and more efficiently.
Like full stack, like full thing.
Full stack data science, full thing.
We’ll help you build a infrastructure.
If you’re using Jupyter, we need,
we do staff augmentation, need more pro programmers,
help you use Dask more effectively,
help you use GPUs more effectively.
Just basically a lot of people need help.
So we do training as well to help people, you know,
both immediate help and then get, learn from somebody.
We’ve added a bunch of stuff too.
We’ve kind of separated some of these other things
into another company called Open Teams
that we currently started.
One of the things I loved about what we did at Anaconda
was creating a community innovation team.
And so I wanted to replicate that.
This time we did a lot of innovation at Anaconda.
I wanted to do innovation,
but also contribute to the projects that existed,
like create a place where maintainers,
so the SciPy and NumPy and Numba
and all these projects we already started
can pay people to work on them and keep them going.
So that’s Labs.
Quansight Labs is a separate organization.
It’s a nonprofit mission.
The profits of Quansight help fund it.
And in fact, every project that we have at Quansight,
a portion of the money goes directly to Quansight Labs
to help keep it funded.
So we’ve gotten several mechanisms
that we keep Quansight Labs funded.
And currently, so I’m really excited about Labs
because it’s been a mission for a long time.
What kind of projects are within Labs?
So Labs is working to make the software better,
like make NumPy better, make SciPy better.
It only works on open source.
So if somebody wants to, so companies do,
we have a thing called a community work order, we call it.
If a company says, I wanna make Spyder better.
Okay, cool.
You can pay for a month of a developer of Spyder
or a developer of NumPy or a developer of SciPy.
You can’t tell them what you want them to do.
You can give them your priorities and things you wish existed
and they’ll work on those priorities with the community
to get what the community wants
and what emerges of what the community wants.
Is there some aspect on the consulting side
that is helping, as we were talking about morphology
and so on, is there specific application
that are particularly like driving,
sort of inspiring the need for updates to SciPy?
Correct, absolutely, absolutely.
GPUs are absolutely one of them.
And new hardware beyond GPUs.
I mean, Tesla’s Dojo chip, I’m hoping we’ll have a chance
to work on that perhaps.
Things like that are definitely driving it.
The other thing that’s driving it is scalable,
like speed and scale.
How do I write NumPy code or NumPy-like code
if I want it to run across a cluster?
That’s Dask or maybe it’s Ray.
I mean, there’s sort of ways to do that now.
Or there’s Modin and there’s, so Pandas code,
NumPy code, SciPy code, scikit-learn code
that I want to scale.
So that’s one big area.
Have you gotten a chance to chat with Andrej and Elon
about particular, because like.
No, I would love to, by the way.
I have not, but I’d love to.
I just saw their Tesla AI Day video.
Super excited.
That’s one of the, you know, I love great engineering,
software engineering teams and engineering teams in general.
And they’re doing a lot of incredible stuff with Python.
They’re like revolutionary.
So many aspects of the machine learning pipeline.
I agree.
That’s operating in the real world.
And so much of that is Python.
Like you said, the guy running, you know, Andrej Karpathy,
running Autopilot is tweeting about optimization
of NumPy versus.
I would love to talk to him.
In fact, we have at Quansight, we’ve been fortunate enough
to work with Facebook on PyTorch directly.
So we have about 13 developers at Quansight.
Some of them are in Labs working directly on PyTorch.
On PyTorch.
On PyTorch, right.
So I basically started Quansight.
I went to both TensorFlow and PyTorch and said,
hey, I want to help connect what you’re doing
to the broader SciPy ecosystem.
Because I see what you’re doing.
We have this bigger mission that we want to make sure
we don’t, you know, lose energy here.
So, and Facebook responded really positively
and I didn’t get the same reaction.
Not yet, not yet.
Not yet.
So I really love the folks at TensorFlow, too.
They’re fantastic.
I think it’s the, just how it integrates
with their business.
I mean, like I said, there’s a lot of reasons.
Just the timing, the integration with their business,
what they’re looking for.
They’re probably looking for more users.
And I was looking to kind of contribute some development effort
and they couldn’t receive that as easily, I think.
So I’m hoping, I’m really hopeful
and love the people there.
What’s the idea behind OpenTeams?
So OpenTeams, I’m super excited about OpenTeams
because it’s one of the,
I mentioned my idea for investing directly in open source.
So that’s a concept called FairOSS.
But one of the things we, when we started Quansight,
we knew we would do is we develop products and ideas
and new companies might come out.
At Anaconda, this was clear, right?
Anaconda, we did so much innovation
that like five or six companies could have come out of that.
And we just didn’t structure it so they could.
But in fact, they have, you look at Dask,
there’s two companies coming out of Dask.
You know, Bokeh could be a company.
There’s like lots of companies that could exist
off the work we did there.
And so I thought, oh, here’s a recipe for an incubation,
a concept that we could actually spawn new companies
and new innovations.
And then the idea has always been,
well, money they earn should come back
to fund the open source projects.
So labs is, you know, I think there should be
a lot of things like Quansight Labs.
I think this concept is one that scales.
You could have a lot of open source research labs.
Along the way, so in 2018, when the bigger idea came,
how to make open source investable, I said,
oh, I need to write, I need to create a venture fund.
So we created a venture fund called Quansight Initiate
at the same time.
It’s an angel fund, really.
It’s, you know, we started to learn that process.
How do we actually do this?
How do we get LPs?
How do we actually go in this direction and build a fund?
And I’m like, every venture fund should have
an associated open source research lab,
there’s no reason not to.
Like our venture fund, the carried interest,
a portion of it goes to the lab.
It directly will fund the lab.
That’s fascinating, brother.
So you use the power of the organic formation of teams
in the open source community, and then like naturally,
that leads to a business that can make money.
Yeah, correct.
And then it always maintains and loops back
to the open source.
Loops back to open source, exactly.
I mean, to me, it’s a natural fit.
There’s something, there’s absolutely
a repeatable pattern there, and it’s also beneficial
because, oh, I have, I have natural connections
to the open source if I have an open source research lab.
Like, they’ll always, they’ll be out there
talking to people, and so we’ve had a chance
to talk to a lot of early stage companies.
And we, and our fund focuses on the early stage.
So Quansight has the services, the lab, the fund, right?
In that process, a lot of stuff started to happen.
They’re like, oh, you know, we started to do recruiting
and support and training, and I was starting
to build a bigger sales team and marketing team
and people besides just developers.
And one of the challenges with that
is you end up with different cultural aspects.
You know, developers, you know, there’s a,
in any company you go to, you kind of go look,
is this a business led company, a developer led company?
Do they kind of coexist?
Are they, what’s the interface between them?
There’s always a bit of a tension there.
Like we were talking about before.
You know, what is the tension there?
With OpenTeams, I thought, wait a minute,
we can actually just create,
like this concept of Quansight plus labs,
it’s, well, it’s specific to the PyData ecosystem.
The concept is general for all open source.
So OpenTeams emerged as a, oh,
we can create a business development company
for many, many Quansights, like thousands of Quansights.
And it can be a marketplace to connect,
essentially be the enterprise software company
of the future.
If you look at what enterprise software wants
from the customer side, and during this journey,
I’ve had the chance to work and sell to lots of companies,
Exxon and Shell and J.P. Morgan, Bank of America,
like the Fortune 100,
and talk to a lot of people in procurement
and see what are they buying and why are they buying?
So, you know, I don’t know everything,
but I’ve learned a lot about,
oh, what are they really looking for?
And they’re looking for solutions.
They’re constantly given products
from enterprise software.
Here’s open source, leave the enterprise software,
now I buy it.
And then they have to stitch it together into a solution.
Open source is fantastic for gluing
those solutions together.
So, whereas they keep getting new platforms
they’re trying to buy,
but what most enterprises want
is tools that they can customize
that are as inexpensive as they can.
Yeah, and so you always want to maintain
the connection to the open source
because that’s going to be the tools.
Yes, so OpenTeams is about solving
enterprise software problems.
Brilliant, brilliant idea, by the way.
With a connect, but we do it honoring the topology.
We don’t hire all the people.
We are a network connecting the sales energy
and the procurement energy,
and we work on the business side,
get the deals closed,
and then have a network of partners
like Quansight and others who we hand the deals to,
to actually do the work.
And then we have to maintain,
I feel like we have to maintain
some level of quality control
so that the client can rely on OpenTeams
to ensure the delivery.
It’s not just, here’s a lead, go figure that out.
But no, we’re going to make sure you get what you need.
By the way, it’s such a skill,
and I don’t know if I have the patience.
I will have the patience to talk to the business people
or more specific, I mean,
there’s all kinds of flavors of business people
or like marketing people.
There’s a challenge.
I hear what you’re saying
because I’ve had the same challenge.
And it’s true.
There’s sometimes you think, okay, this is way overwrought.
Yeah, but you have to become an adult
and you have to, because the companies have needs.
They have ways to make money
and they also want to learn and grow,
and it’s your job to kind of educate them on the best way,
like the value of open source, for example.
Right, and I’m really grateful for all my experiences
over the past 14 years, understanding that side of it
and still learning for sure,
but not just understanding from companies,
but also dealing with marketing professionals
and sales professionals
and people that make a career out of that
and understanding what they’re thinking about
and also understanding, well, let’s make this better.
We can really make a place.
OpenTeams I see as the transmission layer
between companies and open source communities
producing enterprise software solutions.
Eventually we want to,
today we’re taking on SAS and MATLAB
and tools that we know we can replace for folks.
Really, anytime you have a software tool at an organization
where you have to do a lot of customization
to make it work for you.
It’s not you’re just buying this thing off the shelf
and it works.
It’s like, okay, you buy this system
and then you customize it a lot,
usually with expensive consultants
to actually make it work for you.
All of those should be replaced by open source foundations
with the same customization.
You’re doing such important work,
such important work in these giant organizations
that do exactly that,
taking some proprietary software
and hiring a huge team of consultants
that customize it and then that whole thing
gets outdated quick.
Correct.
And so, I mean, that’s brilliant.
So the one solution to that
is kind of what Tesla’s doing a little bit of,
which is basically build up a software engineering team.
Like build a team from scratch.
Build a team from scratch.
And companies are doing it well,
that’s what they’re doing right now.
Yeah, exactly.
And that’s okay.
And you’re creating a topology for some of that.
You’re right.
You just don’t have to do it.
That’s not the only answer, right?
And so other companies can access this,
be more accessible.
We literally say,
OpenTeams is the future of enterprise software.
We’re still early.
Like this idea just percolated over the past year
as we’ve kind of grown Quansight
and realized the extensibility of it.
We just finished our seed round
to help get more sales people
and then push the messaging correctly.
And there’s lots of tools we’re building
to make this easier.
Like we wanna automate the processes.
We feel like a lot of the power
is the efficiency of the sales process.
There’s a lot of wasted energy in small teams
and the sales energy to get into large companies
and make a deal.
There’s a lot of money spent on that process.
Creating the tools and processes for that sales.
So make that super seamless.
So a single company can go,
oh, I’ve got my contract with OpenTeams.
We’ve got a subscription they can get.
They can make that procurement seamless.
And then the fact they have access
to the entire open source ecosystem.
And we have a part of our work
that’s embracing open source ecosystems
and making sure we’re doing things useful for them
or serving them.
And then companies making sure
they’re getting solutions they care about.
And then figuring out which targets we have.
We’re not taking on all of open source,
all of enterprise software yet.
But we’re step by step.
Well this feels like the future.
The idea and the vision is brilliant.
Can I ask you, why do you think Microsoft bought GitHub
and what do you think is the future of GitHub?
Great question.
I thought it was a brilliant move.
I think they did because Microsoft has always
had a developer centric culture.
Like they always have.
Like one of the things Microsoft’s always done well
is understand that their power is the developers.
It’s been, Ballmer didn’t necessarily make a good meme
about how he approached that.
But they’re broadening that.
I think that’s why.
Because they recognize GitHub is where developers are at.
Right?
And so.
But do they have a vision like an OpenTeams
type of situation, right?
I don’t think so yet.
Are they just basically throwing money at developers
to show their support?
I think so.
Without a topology like you put it.
Like a way to leverage that.
Like to give developers actual money.
Right.
I don’t think so.
They’re still, it’s an enterprise software company.
And they make a bunch of money.
They make a bunch of games.
They’re a big company.
They sell products.
I think part of it is they know there’s opportunity
to make money from GitHub.
Right?
There’s definitely a business there.
You know, to sell to developers.
Or to sell to people using development.
I think there’s part of that.
I think part of it is also there’s,
they had definitely wanted to recognize
that you need to value open source
to get great developers.
Which is an important concept that was emerging
over the past 10 years.
You know, at PyData,
we were able to convince J.P. Morgan
to support PyData because of that fact.
Right?
That was where the money for them putting
a couple hundred thousand into supporting PyData
for several conferences was they want developers.
And they realized that developers want
to participate in open source.
So enterprise software folks don’t always understand
how their software gets used.
Having spent a lot of time on the floors
at J.P. Morgan, at Shell, at ExxonMobil,
you see, oh, these companies have large development teams.
And then they’re kind of dealing with
what’s being delivered to them.
So I really feel kind of a privilege
that I had a chance to learn from some of these people
and see what they’re doing.
And even work alongside them, you know,
as a consultant, using open source and trying to figure,
how do we make this work inside of our large organization?
Some of it is actually, for a large organization,
some of it is messaging to the world
that you care about developers
and you’re cool, you care.
Like, for example, like if Ford,
cause I talked to them, like car companies, right?
They want to attract, you know,
you want to take on Tesla and autopilot.
You want to take on, right?
And so what do you do there?
You show that you’re cool.
Like you try to show off that you care about developers
and they have a lot of trouble doing that.
And like one way, I think like Ford should have bought GitHub.
They just to show off, like these old school companies
and it’s in a lot of different industries.
There’s probably different ways.
It’s probably an art to show that you care about developers.
And the developers, it’s exactly what you, like,
for example, just spit balling here,
but like Ford or somebody like that
could give a hundred million dollars
to the development of NumPy.
And like literally look at like the top most popular projects
in Python and just say, we’re just going to give money.
Like that’s going to immediately make you cool.
They could actually, yeah.
And in fact, we set up NumFOCUS to make it easy.
But the challenge was,
is also you have to have some business development.
Like it’s a bit of a seeding problem, right?
And you look at how,
I’ve talked to the folks at Linux Foundation,
know how they’re doing it.
I know how, and starting NumFOCUS,
because we had two babies in 2012.
One was Anaconda, one was NumFOCUS, right?
And they were both important efforts.
They had distinct journeys
and super grateful that both existed
and still grateful both exist.
But there’s different energies in getting donations
as there is getting, this is important to my business.
Like I’m selling you something that this is a,
I’m going to make money this way.
Like if you can tie it,
if you can tie the message to an ROI for the company,
it becomes a no-brainer.
That’s more effective.
It’s much more effective, right?
So, and there are rational arguments to make.
I’ve tried to have conversations with marketing,
especially marketing departments.
Like very early on, it was clear to me that,
oh, you could just take a fraction of your marketing budget
and just spend it on open source development.
And you get better results from your marketing.
Like, because.
How did those, can I, sorry,
I’m going to try not to go on rants here.
What have you learned from the interaction
with the marketing folks on that kind of,
because you gave a great example
of something that will obviously be much better investment
in terms of marketing is supporting open source projects.
The challenge is not dissimilar
from the challenge you have in academia
or the different colleges, right?
Knowledge gets very specific and very channeled, right?
And so people get,
they get a lot of learning in the thing they know about.
And it’s hard then to bridge that
and to get them to think differently enough
to have a sense that you might have something to offer
because it’s different.
It’s like, well, how do I implement that?
How do I, what do I do with that?
Like, do I, which budget do I take from?
Do I slow down my spend on Google ads
or my spend on Facebook ads?
Or do I not hire a content creator and say like,
there’s an operational aspect to that,
that you have to be the CMO, right?
Or the CEO, you have to get the right level.
So you’ll have to hire at a high position level
where they care about this and this.
Right, or they won’t know how, right?
And because you can also do it very clumsily, right?
And I’ve seen it, cause you can,
you absolutely have to honor and recognize
the people you’re going to and the fact
that if you just throw money at them,
it could actually create more problems.
Can I just say, this is not you saying, can I just,
cause I just need, I need to say this.
I’ve been very surprised how often marketing people
are terrible at marketing.
I feel like the best marketing is doing something novel
and unique that anticipates the future.
It feels like so much of the marketing practice
is like what they took in school,
or maybe they’re studying for what was the best thing
that was done in the past decade,
and they’re just repeating that over and over,
as opposed to innovating, like taking the risk.
To me, marketing.
That’s a great point.
Is taking the big risk.
That’s a great point.
And being the first one to risk.
Yeah, there’s an aspect of data observation
from that risk, right?
That’s, I think, shared what they’re doing already.
But it absolutely, it’s about, I think it’s content.
Like there’s this whole world on content marketing
that you could almost say, well, yeah, it can get over,
you can get inundated with stuff
that’s not relevant to you.
Whereas what you’re saying would be highly relevant
and highly useful and highly beneficial.
Yeah, but it’s risk.
I mean, that’s why I sort of,
there’s a lot of innovative ways of doing that.
Tesla’s an example of people
that basically don’t do marketing.
They do marketing in a very, like,
let’s say Elon hired a person who’s just good at Twitter
for running Tesla’s Twitter account.
No, right, right.
I mean, that’s exactly what you wanna be doing.
You want it to be constantly innovating in the.
Right, there’s an aspect of telling.
I mean, I’ve definitely seen people doing great work
where you’re not talking about it.
Like, I would say that’s actually a problem
I have right now with Quansight Labs.
Quansight Labs has been doing amazing work,
really excited about it,
but we have not been talking about it enough.
We haven’t been.
And there’s different ways to talk about it.
There’s different ways to,
there’s different channels to which to communicate.
There’s also, like, I’ll just throw some shade
at companies I love.
So for example, iRobot,
I just had a conversation with them.
They make Roombas.
Sure.
And I think I love, they’re incredible robots,
but like every time they do like advertisement,
not advertisement, but like marketing type stuff,
it just looks so corporate.
And to me, the incredible,
maybe wrong in the case of iRobot, I don’t know.
But to me, when you’re talking about engineering systems,
it’s really nice to show off the magic of the engineering
and the software and all the geniuses behind this product
and the tinkering and like the raw authenticity
of what it takes to build that system
versus the marketing people who want to have like
pretty people, like standing there all pretty
with the robots, like moving perfectly.
So to me, there’s some aspect,
it’s like speaking to the hackers,
you have to throw some bones,
some care towards the engineers, the developers,
because there’s some aspect, one, for the hiring,
but two, there’s an authenticity to that,
authenticity to that kind of communication
that’s really inspiring to the end user as well.
Like if they know that brilliant people,
the best in the world are working at your company,
they start to believe that that product
that you’re creating is really good.
It’s interesting, because your initial reaction would be,
wait, there’s different users here.
Why would you do that to, you know,
my wife bought a Roomba, and she loves developers,
she loves me, but she doesn’t care about that culture.
So essentially what you said is actually the authenticity,
because everyone has a friend, everyone knows people,
there’s word of mouth, I mean, if you.
Word of mouth is so, so powerful.
Yeah, exactly, that’s interesting.
Because I think it’s the lack of that realization,
there’s this halo effect that influences
your general marketing, interesting.
For some stupid reason, I do have a platform,
and it seems that the reason I have a platform,
many others like me, millions of others,
is like the authenticity,
and like we get excited naturally about stuff.
And like, I don’t want to get excited
about that iRobot video,
because it’s boring, it’s marketing, it’s corporate,
as opposed to, I wanted to do some fun,
this is me, like a shout out to iRobot,
is they’re not letting me get into the robot.
Yeah, well there’s an aspect of,
that could be benefiting from a culture of modularity,
like add ons, and that could actually dramatically help.
You’ve seen that over history,
I mean, Apple is an example of a company like that,
or the, like, I can see what your point is,
is that you have something that needs to be,
it needs to be adopted broadly,
the concept needs to be adopted broadly.
And if you want to go beyond this one device,
you need to engage this community.
Yeah, and connecting to the open source that you said.
I gotta ask you,
you’re a programmer,
one of the most impactful programmers ever.
You’ve led many programmers, you lead many programmers.
What are some, from a programmer perspective,
what makes a good programmer?
What makes a productive programmer?
Is there advice you can give
to be a great programmer in this world?
That’s a great, great question.
And there are times in my life
I’d probably have answered this even better
than I can hope to answer it today.
Because I thought about this numerous times,
like right now I’ve spent so much time
recently hiring salespeople that,
That your mind is a little bit on something else.
On something else.
But I reflected on the past,
and also, you know, I have some really,
the only way I can do this,
is I have some really great programmers that I work with,
who lead the teams that they lead.
And my goal is to inspire them and hopefully help them,
encourage them, and
help them encourage their teams.
I would say there’s a number of things, couple things.
One is curiosity.
Like you, I think a programmer without curiosity
is mundane.
Like you’ll lose interest, you won’t do your best work.
So it’s sort of, it’s an affect.
It’s sort of, are you,
you have some curiosity about things.
I think two, don’t try to do everything at once.
Recognize that you’re, you know, we’re limited as humans.
You’re limited as a human.
And each one of us are limited in different ways.
You know, we all have our different strengths and skills.
So it’s adapting the art of programming to your skills.
One of the things that always works,
is to limit what you’re trying to solve.
Right, so, if you’re part of a team,
usually maybe somebody else has put the architecture together
and they’ve given a portion to you if you’re young.
If you’re not part of a team,
it’s sort of breaking down the problem into smaller parts,
is essential for you to make progress.
It’s very easy to take on a big project
and try to do it all at once, and you get lost.
And then you do it badly.
And so thinking about, you know,
very concretely what you’re doing,
defining the inputs and outputs,
defining what you want to get done.
Even just talking about that and like writing down
before you write code, just what are you trying to accomplish?
I mean, very specific about it, really, really helps.
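One way to picture writing down the inputs and outputs before the code is a function whose signature and docstring state the contract first. The function below is a hypothetical illustration, not something from the conversation; `moving_average` and its parameters are made up for the sketch.

```python
from typing import List


def moving_average(samples: List[float], window: int) -> List[float]:
    """Inputs: a list of numeric samples and a positive window size.

    Output: one average per full window, so the result has
    len(samples) - window + 1 values (or none if the window is too large).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if window > len(samples):
        return []

    # Keep a running total so each step adds one sample and drops one.
    running = sum(samples[:window])
    averages = [running / window]
    for i in range(window, len(samples)):
        running += samples[i] - samples[i - window]
        averages.append(running / window)
    return averages


print(moving_average([1, 2, 3, 4], 2))
```

Stating the output length in the docstring gives a concrete thing to check before and after the body is written.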
I think using other people’s work, right?
Don’t be afraid that somehow you’re,
like you should do it all.
Like, nobody does.
Stand on the shoulders of giants.
And copy and paste from Stack Overflow.
Copy and paste from Stack Overflow.
But don’t just copy and paste,
this is particularly relevant in the era of Codex
and the auto generated code, which is essentially,
I see as an indexing of Stack Overflow.
Right, exactly.
Secondly, it’s like.
It’s a search engine.
It’s a search engine over Stack Overflow, basically.
So it’s not, I mean, we’ve had this for a while.
But really, you want to cut and paste, but not blindly.
Like, absolutely I’ve cut and paste to understand,
but then you understand.
Oh, this is what this means.
Oh, this is what it’s doing.
And understand as much as you can.
So it’s critical, that’s where the curiosity comes in.
If you’re just blindly cutting and pasting,
you’re not gonna understand.
So understand, and then be sensitive to hype cycles.
Right, every so often there’s always a,
oh, test driven development is the answer.
Oh, object oriented is the answer.
Oh, there’s always an answer.
Agile is the answer.
Be cautious of jumping onto a hype cycle.
Like, likely there’s signal.
Like, there’s a thing there
that’s actually valuable, you can learn from.
But it’s almost certainly not the answer
to everything you need.
What lessons do you draw
from you having created NumPy and SciPy?
Like, in service of sort of answering the question
of what it takes to be a great programmer
and giving advice to people.
How can you be the next person to create a SciPy?
Yeah, so one is listen.
To?
Listen.
To who?
To people that have a problem, right?
Which is everybody, right?
But listen, and listen to many.
And then try to, and then do.
Like, you’re gonna have to do an experiment, you know?
Do, fall down, don’t be afraid to fall down.
Don’t be afraid, the first thing you do
is probably gonna suck, and that’s okay, right?
It’s honestly, I think iteration is the key to innovation.
And it’s almost that psychological hesitation we have
to just iterate.
Like, yeah, we know it’s not great,
but next time it’ll be better.
I mean, just keep learning and keep improving.
So it’s an attitude.
And then it does take intense concentration, right?
Good things don’t happen just,
it’s not quite like TikTok or like Facebook, you know?
You can’t scroll your way to good programming, right?
There are sincere hours of deep,
don’t be afraid of the deep problem.
Like, often people will run away from something
because, oh, I can’t solve this.
And you might be right, but give it an hour.
Give it a couple of hours and see.
And just five minutes, not gonna give you that.
Was it lonely when you were building SciPy and NumPy?
Hugely, yeah, absolutely lonely,
in the sense of you had to have an inner drive,
and that inner drive for me always comes from,
I have to see that this is right in some angle.
I have to believe it, that this is the right approach,
the right thing to do.
With SciPy, it was like, oh yeah,
the world needs libraries in Python.
Clearly Python’s popular enough
with enough influential people to start,
and it needs more libraries.
So that is a good in and of itself.
So I’m gonna go do that good.
So find a good, find a thing that you know is good
and just work on it.
So that has to happen, and it is.
And you kind of have to have enough realization
of your mission to be okay with the naysayer
or the fact that not everybody joins you up front.
In fact, one thing I’ve talked to people a lot,
I’ve seen a lot of projects come, and some fail.
Not everything I’ve done has actually worked perfectly.
I’ve tried a bunch of stuff that, okay,
that didn’t really work, or this isn’t working, and why.
But you see the patterns, and one of the key things is
you can’t even know for six months.
I say 18 months right now.
If you’re starting a new project,
you gotta give it a good 18 month run
before you even know if the feedback’s there.
You’re not gonna know in six months.
You might have the perfect thing,
but six months from now, it’s still kind of still emerging.
So give it time, because you’re dealing with humans,
and humans have an inertial energy
that just doesn’t change that quickly, so.
Let me ask a silly question, but like you said,
you’re focused on the sales side of things currently,
but back when you were actively programming,
maybe in the 90s, you talked about IDEs.
What’s a setup that you have that brings you joy?
Keyboard, number of screens, Linux.
I do still like to program some.
It’s not as much as I used to.
I have two projects I’m super interested in,
trying to find funding for them,
trying to figure out teams for them,
but I could talk about those.
But what I, yeah, I’m an Emacs guy.
Great, the superior editor, everybody.
I’ve got, I don’t often delete tweets,
but one of the tweets I deleted
when I said Emacs was better than Vim,
and then the hate I got from it.
It is.
I was like, I’m walking away from this.
I do too, I don’t push it.
I mean, I’m not.
I’m just joking, of course.
Yeah, exactly, it’s kind of like,
but people do take the editor seriously, right?
I did it as a joke.
That’s your life.
It is, but there’s something beautiful to me about Emacs,
but for people that love Vim,
there’s something beautiful to them about that.
There is.
I mean, I do use Vim for quick editing.
Like at the command line, if I need quick editing,
I will still sometimes use it, but not much.
Like it’s simple, to correct a single character.
So when you were developing SciPy, you were using Emacs?
Emacs, yeah.
SciPy and NumPy were all written in Emacs on a Linux box.
And CVS and then SVN, version control.
Git came later.
Like Git has, I love distributed branch stuff.
I think Git is pretty complicated, but I love the concept.
And also, of course, GitHub and then GitLab
make Git definitely consumable, but that came later.
Did you ever touch Lisp at all?
Like what were your emotional feelings
about all the parentheses?
Yeah, so great question.
So I find myself appreciating Lisp today
much more than I did early.
Because when I came to programming, I knew programming,
but I was a domain expert, right?
And to me, the parentheses were in the way.
It’s like, wow, there’s just all this,
like it just gets in the way of my thinking
about what I’m doing.
So why would I have all these, right?
That was my initial reaction to it.
And now as I appreciate kind of the structure
that kind of naturally maps to a logical thinking
about a program, I can appreciate them, right?
And why it’s actually, you could create editors
that make it not so problematic, right, honestly.
So I actually have a much more appreciation of Lisp
and things like Clojure and there’s Hy,
which is a Python Lisp that compiles to Python bytecode.
I think it’s challenging.
Like typically these languages are,
I even saw the whole data science programming system
in Lisp that somebody created, which is cool.
But again, I think it’s the lack of recognition
of the fact that there exists
what I call occasional programmers.
People that are never gonna be programmers for a living.
They don’t want to have all this cuteness in their head.
They want just, it’s why BASIC, you know,
Microsoft had the right idea with BASIC
in terms of having that be the language of Visual Basic,
the language of Excel and SQL Server.
They should have converted that to Python 10 years ago.
Like the world would be a better place if they had, but.
There’s also, there’s a beauty and a magic
to the history behind a language in Lisp.
You know, some of the most interesting people
in the history of computer science
and artificial intelligence have used Lisp.
So you feel.
Well, especially that language,
when you have a language, you can think in it.
And it helps you think better.
And it attracts certain kinds of people
that think in a certain kind of way.
And then that’s there.
Okay, so what about like small laptop with a tiny keyboard,
or is there like three screens?
You know, good question.
I’ve never gotten into the big, many screens to be honest.
I mean, and maybe it’s because in my head,
I kind of just, I just swap between windows.
Like, partly because I guess I really can’t process
three screens at once anyway.
Like, I just am looking at one and I just flip.
You know, I flip an application open.
So where it’s really helpful is actually
when I’m trying to do, you know,
here’s data and I want to input it from here.
Like this is the only time I really need another screen.
So now, because you’re both a developer, lead developers,
but then there’s also these businesses
and there’s salespeople and you’re working
with large companies.
Operations people, hiring people, yeah.
The whole thing.
Which operating system is your favorite at this point?
So Linux was the early days.
So yeah, I love Linux as a server side.
And it was early days I had my own Linux desktop.
I’ve been on Mac laptops for 10 years now.
Yeah, this is what leadership looks like.
As you switch to Mac.
Okay, great.
Pretty much, I mean, just the fact that I had
to do PowerPoints, I had to do presentations
and you know, plug in, I just couldn’t mess
with plugging in laptops, it wouldn’t project and yeah.
So you mentioned also Quansight Labs and things like that.
Can you give advice on how to hire great programmers
and great people?
Yeah, I would say, produce an open source project,
get people contributing to it and hire those people.
Yeah, I mean, you’re doing it sort of,
you may be perhaps a little biased,
but that’s probably 100% really good advice.
I find it hard to hire.
I still find it hard to hire, like in terms of,
it’s not hard to hire
if I’ve worked with somebody for a couple of weeks,
but an hour or two of interviews, I have no idea.
So that instinct, that radar of knowing if you’re good
or not, that you’ve found that you’re still not able to.
It’s really hard, I mean, the resume can help,
but again, the resume is like a presentation
of the things they want you to see, not the reality of,
and there’s also, you have to understand
what you’re hiring for.
There are different stages and different kinds of skills.
And so it isn’t just, one of the things I talk a lot about
internally at my company is just that the whole idea
of measuring ourselves against a single axis is flawed
because we’re not, it’s a multidimensional space
and how do you order a multidimensional space?
There isn’t one ordering.
So this whole idea, you immediately get projected
into a thing when you’re talking about hiring
or best or worst or better or not better.
So what is the thing you’re actually needing?
And you can hire for that.
There is such a thing, generally, I really value people
who have the affect, that care about open source.
Like so in some cases, their affinity to open source
is simply kind of a filter of an affect.
However, I have found this interesting dichotomy
between open source contributors and product creation.
There’s, I don’t know if it’s fully true,
but there does seem to be the more experienced,
the more affect somebody has in an open source community,
the less ability to actually produce product that they have.
And the opposite is kind of true too.
The more product focused are, I find a lot of people,
I’ve talked to a lot of people who produce
really great products and they have a,
they’re looking over the open source communities,
kind of wanting to participate and play,
but they’ve played here and they do a great job here
and then they don’t necessarily have some of the same.
Now I don’t think that’s entirely necessary.
I think part of it is cultural, how they’ve emerged.
Because one of the things that open source communities
often lack is great product management,
like some product management energy.
That’s brilliant, but you want both of those energies
in the same place together.
Yes, you really do.
And so a lot of it’s creating these teams of people
that have these needed skills and attributes
that are hard.
And so one of the big things I look for is somebody
that fundamentally recognizes their need to learn.
Like one of the values that we have
in all of the things we do is learning.
Like if somebody thinks they know it all,
they’re gonna struggle.
And some of that is just, there’s more basic things
like humility, just being humble in the face
of all the things you don’t know.
And that’s step one of learning.
That’s step one of learning, right?
And I’ve spent a lot of time learning, right?
Other people spend a lot more time,
but I’ve spent a lot of time learning.
My whole goal was to get a PhD because I love school
and I wanted to be a scientist.
And then what I found is what’s been written about
elsewhere as well is the more I learned,
the more I didn’t know.
The more I realized, man, I know about this,
but this is such a tiny thing in the global scope
of what I might wanna know about.
So I need to be listening a whole lot better
than I am just talking.
That’s changed a little bit actually.
My wife says that I used to be a better listener.
Now that I’m so full of all these ideas I wanna do,
she kind of says, you gotta give people time to talk.
So you’ve succeeded on multiple dimensions.
So one is the tenure track faculty.
The other is just creating all these products
and building up the businesses,
then working with businesses.
Do you have advice for young people today
in high school and college of how to live a life
as nonlinear and as successful as yours,
a life that they could be proud of?
Well, that’s a super compliment.
I’m humbled by that actually.
I would say a life they can be proud of.
Honestly, one thing that I’ve said to people is first,
find people you love and care about them.
Like family matters to me a lot.
And family means people you love and have committed to.
So it can be whatever you mean by that,
but you need to have a foundation.
So find people you love and wanna commit to and do that.
Because it anchors you in a way that nothing else can.
And then you find other things.
And then kind of from out there,
you find other kinds of things you can commit to,
whether it’s ideas or people or groups of people.
So, especially in high school,
I would say don’t settle on what you think you know.
Like give yourself 10 years to think about the world.
Like I see a lot of high school students
who seem to know everything already.
I think I did too.
I think it’s maybe natural,
but recognize that the things you care about,
you might change your perspective over time.
I certainly have over time.
I was really passionate about one specific thing
and I was kind of softened.
I was a big, I didn’t like the Federal Reserve, right?
And there’s still, we could have a longer conversation
about monetary policy and finances,
but I’m a little more nuanced in my perspective
at this point.
But that’s one area where you learn about something,
go, ah, I wanna attack it.
Build, don’t destroy.
Build, like so often the tendency is to not like something
and wanna go attack it.
Build something, build something to replace it.
Yeah.
Build up, attract people to your new thing.
You’ll be far better, right?
You don’t need to destroy something to build something else.
So that’s, I guess, generally.
And then definitely like curiosity,
follow your curiosity and let it,
don’t just follow the money.
And all of that, like you said,
is grounded in family, friendship, and ultimately love.
Yes.
Which is a great way to end it.
Travis, you’re one of the most impactful people
in the engineering and the computer science
in the human world.
So I truly appreciate everything you’ve done.
And I really appreciate that you would spend
your valuable time with me.
It was an honor.
It was a real pleasure for me.
I appreciate that.
Thanks for listening to this conversation
with Travis Oliphant.
To support this podcast,
please check out our sponsors in the description.
And now, let me leave you with something
that in the programming world is called Hodgson’s Law.
Every sufficiently advanced Lisp application
will eventually be reimplemented in Python.
Thank you for listening and hope to see you next time.