The following is a conversation with Boris Sofman, who is the senior director of engineering and head of trucking at Waymo, the autonomous vehicle company, formerly the Google self-driving car project.
Before that, Boris was the cofounder and CEO of Anki, a robotics company that created Cozmo, which, in my opinion, is one of the most incredible social robots ever built.
It’s a toy robot, but one with an emotional intelligence that creates a fun and engaging human-robot interaction. It was truly sad for me to see Anki shut down when it did.
I had high hopes for those little robots. We talk about this story and the future of autonomous trucks, vehicles, and robotics in general.
I spoke with Steve Viscelli recently on episode 237 about the human side of trucking. This episode looks more at the robotics side.
This is the Lex Fridman podcast. To support it, please check out our sponsors in the description. And now here’s my conversation with Boris Sofman.
Who is your favorite robot in science fiction, books or movies?
I like WALL-E and R2-D2, where they were able to convey such an incredible degree of intent, emotion, and kind of character attachment without having any language whatsoever.
And just purely through the richness of emotional interaction. So those are fantastic. And then the Terminator series just like really, pretty wide range, right?
But I kind of love this dynamic. We have this incredible Terminator itself that Arnold played.
And then he was kind of like the inferior previous generation version that was totally outmatched in terms of specs by the new one, but still kind of held his own.
And so it was kind of interesting where you realize how many levels there are on the spectrum from human to kind of potentials in AI and robotics to futures.
So yeah, that movie really, as much as it was like kind of a dark world in a way, was actually quite fascinating, gets the imagination going.
Well, from an engineering perspective, both the movies you mentioned, WALL-E and Terminator, the first one is probably achievable, you know, humanoid robot.
Maybe not with like the realism in terms of skin and so on, but that humanoid form, we have that humanoid form. It seems like a compelling form.
Maybe the challenge is that it’s super expensive to build, but you can imagine, maybe not a machine of war, but you can imagine Terminator type robots walking around.
And then the same obviously with WALL-E, you’ve basically, so for people who don’t know, you created the company Anki that created a small robot with a big personality called Cozmo that just does exactly what WALL-E does,
which is somehow with very few basic visual tools is able to communicate a depth of emotion. And that’s fascinating.
But then again, the humanoid form is super compelling. So like Cozmo is very distant from a humanoid form.
And then the Terminator has a humanoid form and you can imagine both of those actually being in our society.
That’s true. And it’s interesting because it is very intentional to go really far away from human form when you think about a character like Cozmo or like WALL-E where you can completely rethink the constraints you put on that character,
what tools you leverage and then how you actually create a personality and a level of intelligence interactivity that actually matches the constraints that you’re under, whether it’s mechanical or sensors or AI of the day.
This is why I was always very surprised by how much energy people put towards trying to replicate human form in a robot because you actually take on some pretty significant constraints and downsides when you do that.
The first of which is obviously the cost where the articulation of a human body is just so magical in both the precision as well as the dimensionality that to replicate that even in its reasonably close form takes a giant amount of joints and actuators and motion and sensors and encoders and so forth.
But then you’re almost setting an expectation that the closer you try to get to human form, the more you expect the strengths to match.
And that’s not the way AI works; there are places where you’re way stronger and there are places where you’re weaker.
And by moving away from human form, you can actually change the rules and embrace your strengths and bypass your weaknesses.
And at the same time, the human form has way too many degrees of freedom to play with. It’s kind of counterintuitive, just as you’re saying, but when you have fewer constraints, it’s almost harder to master the communication of emotion.
Like you see this with cartoons, like stick figures, you can communicate quite a lot with just very minimal, like two dots for eyes and a line for a smile. I think you can almost communicate arbitrary levels of emotion with just two dots and a line.
And that’s enough. And if you focus on just that, you can communicate the full range. And then if you do that, then you can focus on the actual magic of human and dot line interaction versus all the engineering mess.
Like dimensionality, voice, all these sort of things actually become a crutch where you get lost in a search space almost. And so some of the best animators that we’ve worked with, they almost like study when they come up kind of in building their expertise by forcing these projects where all you have is like a ball that can like kind of jump and manipulate itself or like really, really like aggressive constraints where you’re forced to kind of extract the deepest level of emotion.
And so in a lot of ways, when we thought about Cozmo, I was like, you’re right. If we had to describe it in like one small phrase, it was bringing a Pixar character to life in the real world. It’s what we were going for.
And in a lot of ways, what was interesting is that with WALL-E, which we studied incredibly deeply, and in fact, some of our team had worked previously at Pixar on that project, they intentionally constrained WALL-E as well, even though in an animated film, you could do whatever you wanted to, because it forced you to like really saturate the smaller amount of dimensions.
But you sometimes end up getting a far more beautiful output because you’re pushing at the extremes of this emotional space in a way that you just wouldn’t because you get lost in the surface area if you have like something that is just infinitely articulable.
So if we backtrack a little bit, you thought of Cozmo in 2011 and in 2013 actually designed and built it. What is Anki? What is Cozmo? I guess, who is Cozmo? And what was the vision behind this incredible little robot?
We started Anki back while we were still in graduate school. So myself and my two cofounders, we were PhD students in the Robotics Institute at Carnegie Mellon. And so we were studying robotics, AI, machine learning, different areas.
One of my cofounders was working on walking robots for a period of time. And so we all had a bit of a deeper passion for applications of robotics and AI where there’s like a spectrum where there’s people that get really fascinated by the theory of AI and machine learning robotics where whether it gets applied in the near future or not is less of a factor on them, but they love the pursuit of the challenge.
And that’s necessary. And there’s a lot of incredible breakthroughs that happened there. We’re probably closer to the other end of the spectrum where we love the technology and all the evolution of it, but we were really driven by applications, like how can you really reinvent experiences and functionality and build value that wouldn’t have been possible without these approaches.
And that’s what drove us. And we had some experiences through previous jobs and internships where we got to see the applied side of robotics. And at that time, there were actually relatively few applications of robotics that were outside of pure research or industrial applications, military applications and so forth.
There were very few outside of it. So maybe iRobot was like one exception and maybe there are a few others, but for the most part, there weren’t that many. And so we got excited about consumer applications of robotics where you could leverage way higher levels of intelligence through software to create value and experiences that were just not possible in those fields today.
And we saw kind of a pretty wide range of applications that varied in the complexity of what it would take to actually solve those. And what we wanted to do was to commercialize this into a company, but actually do a bottoms up approach where we could have a huge impact in a space that was ripe to have an impact at that time and then build up off of that and move into other areas.
And then entertainment became the place to start because you had relatively little innovation in the toy space and entertainment space. You had these really rich experiences in video games and movies, but there was like this chasm in between.
And so we thought that we could really reinvent that experience. And there was a really fascinating transition technically that was happening at the time where the cost of components was plummeting because of the mobile phone industry and then the smartphone industry.
And so the cost of a microcontroller, of a camera, of a motor, of memory, of microphones was dropping by orders of magnitude. And then on top of that with the iPhone coming out in, I think it was 2007, I believe, it started to become apparent within a couple of years that this could become a really incredible interface device
and the brain with much more computation behind a physical world experience that wouldn’t have been possible previously. And so we really got excited about that and how we push all the complexity from the physical world into software by using really inexpensive components, but putting huge amounts of complexity into the AI side.
And so Cozmo became our second product and then the one that we’re probably most proud of. The idea there was to create a physical character that had enough understanding and awareness of the physical world around it and the context that mattered to feel like he was alive.
And to be able to have these emotional connections and experiences with people that you would typically only find inside of a movie. And the motivation very much was Pixar. We had an incredible respect and appreciation for what they were able to build in this really beautiful fashion and film.
But it was always like, one, it was virtual and two, it was like a story on rails that had no interactivity to it. It was very fixed and it obviously had a magic to it, but where you really start to hit a different level of experiences when you’re actually able to physically interact with a robot.
And then that was your idea with Anki, like the first product was the cars. So basically you take a toy, you add intelligence into it in the same way you would add intelligence into AI systems within a video game, but you’re now bringing it into the physical space.
So the idea is really brilliant, which is you’re basically bringing video games to life.
Exactly. That’s exactly right. We literally use that exact same phrase because in the case of Drive, this was a parallel of the racing genre. And the goal was to effectively have a physical racing experience, but have a virtual state at all times that matches what’s happening in the physical world.
And then you can have a video game off of that and you can have different characters, different traits for the cars, weapons and interactions and special abilities and all these sort of things that you think of virtually, but then you can have it physically.
And one of the things that we were really surprised by that really stood out and immediately led us to really accelerate the path towards Cozmo is that things that feel like they’re really constrained and simple in the physical world, they have an amplified impact on people.
The exact same experience virtually would not have anywhere near the impact, but seeing it physically really stood out.
And so effectively with Drive, we were creating a video game engine for the physical world.
And then with Cozmo, we expanded that video game engine to create a character and kind of an animation and interaction engine on top of it that allowed us to start to create these much more rich experiences.
And a lot of those elements were almost like a proving ground for what would human-robot interaction feel like in a domain that’s much more forgiving, where you can make mistakes in a game.
It’s okay if a car goes off the track or if Cozmo makes a mistake.
And what’s funny is actually we were so worried about that.
In reality, we realized very quickly that those mistakes can be endearing, and if you make a mistake, as long as you realize you made a mistake and have the right emotional reaction to it, it builds even more empathy with the character.
Exactly. So when the thing you’re optimizing for is fun, you have so much more freedom to fail, to explore, and also in the toy space.
Like all of this is really brilliant, and I gotta ask you, to backtrack: it seems for a roboticist to take a jump into the direction of fun is a brilliant move.
Because one, you have the freedom to explore and to design all those kinds of things.
And you can also build cheap robots.
If you’re not chasing perfection, and in toys it’s understood that you can go cheaper, that means a robot is still expensive, but it’s actually affordable by a large number of people.
So it’s a really brilliant space to explore.
Yeah, that’s right.
And in fact, we realized pretty quickly that perfection is actually not fun.
Because in a traditional roboticist sense, the first kind of path planner, and this is the part that I worked on out of the gate, was a lot of the AI systems where you have these vehicles and cars racing, making optimal maneuvers to try to get ahead.
And you realize very quickly that that’s actually not fun because you want the chaos from mistakes.
And so you start to kind of intentionally almost add noise to the system in order to kind of create more of a realism in the exact same way the human player might start really ineffective and inefficient and then start to kind of increase their quality bar as they progress.
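The idea of deliberately adding noise to an optimal planner can be sketched in Python as follows. This is an illustration only, not Anki's code: the function `choose_maneuver`, the `skill` parameter, the maneuver names, and the softmax-with-temperature rule are all assumptions made for the example.

```python
import math
import random


def choose_maneuver(scores, skill, rng=random):
    """Pick a maneuver name from `scores` (name -> planner score).

    `skill` is a hypothetical knob in [0, 1]: near 1 the AI driver almost
    always takes the best-scoring maneuver, near 0 its choices are spread
    out and it makes frequent "mistakes".
    """
    # Low skill -> high temperature -> flatter distribution over maneuvers.
    temperature = max(1e-3, 1.0 - skill)
    best = max(scores.values())
    names = list(scores)
    # Softmax weights, shifted by the best score for numerical stability.
    weights = [math.exp((scores[n] - best) / temperature) for n in names]
    return rng.choices(names, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Made-up scores from an imaginary optimal planner.
    scores = {"overtake": 1.0, "hold_line": 0.5, "swerve": 0.0}
    for skill in (0.0, 0.5, 1.0):
        picks = [choose_maneuver(scores, skill) for _ in range(10)]
        print(skill, picks)
```

Raising `skill` over time would mimic the progression described above, where an opponent starts out inefficient and gradually raises its quality bar.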
And there is a really, really aggressive constraint that’s forced on you by being a consumer product where the price point matters a ton, particularly in kind of an entertainment where you can’t make a $1,000 product unless you’re going to meet the expectations of a $1,000 product.
And so in order to make this work, your cost of goods had to be well under $100.
In the case of Cozmo, we got it under $50 end to end, fully packaged and delivered.
And it was under $200 cost at retail.
Okay, if we sit down like at the early stages, if we go back to that and you’re sitting down and thinking about what Cozmo looks like from a design perspective and from a cost perspective, I imagine that was part of the conversation.
Well, first of all, what came first? Did you have a cost in mind? Is there a target you’re trying to chase?
Did you have a vision in mind, like size? Did you have, because there’s a lot of unique qualities to Cozmo.
So for people who don’t know, they should definitely check it out. There’s a display, there’s eyes on the little display and those eyes can, it’s pretty low resolution eyes, right?
But they’re still able to convey a lot of emotion.
And there’s this arm, that sort of lifts stuff.
But there’s something about arm movement that adds even more kind of depth.
It’s like the face communicates emotion and sadness and disappointment and happiness.
And then the arm kind of communicates, I’m trying here.
I’m doing my best in this complicated world.
Exactly. So it’s interesting because like all of Cozmo is only four degrees of freedom and two of them are the two treads, which is for basic movement.
And so you literally have only a head that goes up and down, a lift that goes up and down, and then your two wheels.
And you have sound and a screen, a low resolution screen.
And with that, it’s actually pretty incredible what you can come up with, where, like you said, it’s a really interesting give and take because there’s a lot of ideas far beyond that, obviously, as you can imagine, where, like you said, how big is it?
How many degrees of freedom? What does he look like? What does he sound like? How does he communicate?
It’s a formula that actually scales way beyond entertainment.
This is the formula for human kind of robot interface more generally, is you almost have this triangle between the physical aspects of it, the mechanics, the industrial design, what’s mass producible, the cost constraints and so forth.
You have the AI side of how do you understand the world around you, interact intelligently with it, execute what you want to execute.
So perceive the environment, make intelligent decisions and move forward. And then you have the character side of it.
Most companies that have done anything in human-robot interaction really miss the mark or underinvest in the character side of it.
They overinvest in the mechanical side of it and then varied results on the AI side of it.
And so the thinking is that you put more mechanical flexibility into it, you’re going to do better.
You don’t necessarily, you actually create a much higher bar for a high ROI because now your price point goes up, your expectations go up.
And if the AI can’t meet it or the overall experience isn’t there, you miss the mark.
So how did you, through those conversations, get the cost down so much and made it so simple? There’s a big theme here because you come from the mecca of robotics, which is Carnegie Mellon University, robotics.
For all the people I’ve interacted with that come from there or just from the world experts at robotics, they would never build something like Cozmo.
And so where did that come from? The simplicity.
It came from this combination of a team that we had. It was quite cool.
And by the way, you ask anybody that’s experienced in the toy entertainment space, they’ll tell you you’ll never sell a product over $99.
That was fundamentally false and we believed it to be false. It was just that the experience had to meet the mark.
And so we pushed past that amount, but there was a pressure where the higher you go, the more seasonal you become and the tougher it becomes.
And so on the cost side, we very quickly partnered up with some previous contacts that we worked with where, just as an example, our head of mechanical engineering was one of the earliest heads of engineering at Logitech and has a billion units of consumer products in circulation that he’s worked on.
So like crazy, low cost, high volume consumer product experience.
We had a really great mechanical engineering team and just a very practical mindset where we were not going to compromise on feasibility in the market in order to chase something that would be an enabler.
And we pushed a huge amount of expectations onto the software team where, yes, we’re going to use cheap, noisy motors and sensors, but we’re going to fix it on the software side.
Then we found on the design and character side, there was a faction that was more from like a game design background that thought that Cozmo should be very games driven, where you create a whole bunch of game experiences and it’s all about like game mechanics.
And then there was a faction, which my cofounder and I, the most involved in this, really believed in, which was character driven.
And the argument is that you will never compete with what you can do virtually from a game standpoint, but you actually on a character side, put this into your wheelhouse and put it more towards your advantage because a physical character has a massively higher impact physically than virtually.
Okay, can I just pause on that because this is so brilliant. For people who don’t know, Cozmo plays games with you, but there’s also a depth of character. And I actually, when I was playing with it, I wondered exactly what is the compelling aspect of this.
Because to me, obviously I’m biased, but to me the character, what I enjoyed most, honestly, or what got me to return to it is the character.
But that’s a fascinating discussion of, you’re right, ultimately you cannot compete on the quality of the gaming experience.
It’s too restrictive. The physical world is just too restrictive and you don’t have a graphics engine, it’s like all this.
But on the character side, and clearly we moved in that direction as the winning path and we partnered up with this, we immediately went towards Pixar and Carlos Baena had been at Pixar for nine years.
He’d worked on tons of the movies, including WALL-E and others, and just immediately spoke the language and it just clicked on how you think about that magic and drive.
And then we built out a team with him as a really prominent driver of this with different types of backgrounds and animators and character developers where we put these constraints on the team, but then got them to really try to create magic despite that.
And we converged on this system that was at the overlap of character and the character AI where, if you imagine the dimensionality of emotions, happy, sad, angry, surprised, confused, scared, you think of these extreme emotions.
We almost put this challenge to populate this library of responses on how do you show the extreme response that goes to the extreme spectrum on angry or frustrated or whatever.
And so that gave us a lot of intuition and learnings and then we started parameterizing them where it wasn’t just a fixed recording, but they were parameterized and had randomness to them where you could have infinite permutations of happy and surprised and so forth.
And then we had a behavioral engine that took the context from the real world and would interpret it and then create probability mappings on what sort of responses you would have that actually made sense.
And so if Cozmo saw you for the first time in a day, he’d be really surprised and happy in the same way that the first time you walk in and your toddler sees you, they’re so happy, but they’re not going to be that happy for the entirety of your next two hours.
But you have this spike in response or if you leave him alone for too long, he gets bored and starts causing trouble and nudging things off the table.
Or if you beat him in a game, the most enjoyable emotions are him getting frustrated and grumpy to a point where our testers and our customers would be like, I had to let him win because I don’t want him to be upset.
And so you start to create this feedback loop where you see how powerful those emotions are.
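The behavioral engine described above, where context from the real world is turned into a probability mapping over parameterized, randomized responses, can be sketched in Python as below. Everything here is hypothetical and chosen only for illustration: the `Context` fields, the emotion names, the weights, the ten-minute boredom threshold, and the intensity and duration ranges are assumptions, not details of Anki's system.

```python
import random
from dataclasses import dataclass


@dataclass
class Context:
    """Hypothetical summary of what the robot currently knows."""
    first_sighting_today: bool = False
    minutes_ignored: float = 0.0
    just_lost_game: bool = False


def response_weights(ctx):
    """Map context to unnormalized weights over emotional responses."""
    weights = {"neutral": 1.0, "happy_surprised": 0.1,
               "bored": 0.1, "frustrated": 0.1}
    if ctx.first_sighting_today:
        weights["happy_surprised"] += 5.0  # spike when first seeing the user
    if ctx.minutes_ignored > 10:
        weights["bored"] += 3.0            # left alone too long
    if ctx.just_lost_game:
        weights["frustrated"] += 4.0       # grumpy after losing
    return weights


def pick_response(ctx, rng):
    """Sample an emotion, then randomize its playback parameters."""
    weights = response_weights(ctx)
    emotion = rng.choices(list(weights), weights=list(weights.values()), k=1)[0]
    # Parameterized rather than a fixed recording: each playback differs.
    return {
        "emotion": emotion,
        "intensity": rng.uniform(0.6, 1.0),
        "duration_s": rng.uniform(1.0, 2.5),
    }


if __name__ == "__main__":
    rng = random.Random()
    print(pick_response(Context(first_sighting_today=True), rng))
    print(pick_response(Context(minutes_ignored=30), rng))
```

Sampling from weights instead of always playing the top-weighted response is one way to keep the character from feeling scripted while still making each reaction fit the situation.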
And just to give you an example, something as simple as eye contact, you don’t think about it in a movie.
It kind of happens like camera angles and so forth, but that’s not really a prominent source of interaction.
What happens with a physical character like Cozmo, when he makes eye contact with you, it built a universal kind of connection, kids all the way through adults.
And it was truly universal. It was not like people stopped caring after 10, 12 years old.
And so we started doing experiments and we found something as simple as increasing the amount of eye contact, like the amount of times in a minute that he’ll look over for your approval to make eye contact.
Just by, I think, doubling it, we increased the playtime engagement by 40%.
You see these sort of interactions where you build that empathy.
And so we studied pets. We studied virtual characters.
There’s like a lot of times actually dogs are one of the most perfect influencers behind these sort of interactions.
And what we realized is that the games were not there to entertain you.
The games were to create context to bring out the character.
And if you think about the types of games that you played, they’re relatively simple, but they were always ones that create scenarios of either tension or winning or losing or surprise or whatever the case might be.
And they were purely there to just like create context to where an emotion could feel intelligent and not random.
And in the end, it was all about the character.
So yeah, there’s so many elements to play with here.
So you said dogs. What lessons do we draw from cats who don’t seem to give a damn about you?
Is that just another character?
It’s just another character.
So you could almost like in the early explorations, we thought it would be really incredible if you had a diversity of characters where you almost help encourage which direction it goes, just like in a role playing game.
And you had like think of like the seven dwarves sort of.
And initially we even thought that it would be amazing if like the other like, you know, like their characters actually help them have strengths and weaknesses and some like whatever they end up doing.
Like some are scared, some are, you know, arrogant, some are, you know, super warm and like kind of friendly.
And in the end, we focused on one because it made it very clear that, hey, we got to build out enough depth here because you’re kind of trying to expand.
It’s almost like how long can you maintain a fiction that this character is alive to where the person’s explorations don’t hit a boundary, which happens almost immediately with typical toys.
And, you know, even with video games, how long can we create that immersive experience to where you expand the boundary?
And one of the things we realized is that you’re just way more forgiving when something has a personality and it’s physical.
That is the key that unlocks robotics interacting in the physical world and more generally is that when you don’t have a personality and you make a mistake as a robot, the stupid robot made a mistake.
Why is it not perfect? When you have a character and you make a mistake, you have empathy and it becomes endearing and you’re way more forgiving.
And that was the key that was like I think goes far, far beyond entertainment.
It actually builds the depth of the personality, the mistakes.
So let me ask the movie Her question then.
So Cozmo feels like the early days of something that will obviously be prevalent throughout society at a scale that we cannot even imagine.
My sense is it seems obvious that these kinds of characters will permeate society and that we’ll be friends with them.
We’ll be interacting with them in different ways.
I mean, you don’t think of it this way, but when you play video games, they’re often cold and impersonal.
But even then, you think about role playing games, you become friends with certain characters in that game.
They don’t remember much about you. They’re just telling a story.
It’s exactly what you’re saying. They exist in that virtual world.
But if they acknowledge that you exist in this physical world,
if the characters in the game remember that you exist, that you, like for me, like Lex,
they understand that I’m a human being who has like hopes and dreams and so on.
It seems like there’s going to be like billions, if not trillions of Cozmos in the world.
So if we look at that future, there’s several questions to ask.
How intelligent does that future Cozmo need to be to create fulfilling relationships like friendships?
Yeah, it’s a great question.
And part of it is the recognition that it’s going to take time to get there, because it has to be a lot more intelligent.
What we built was good enough to be a magical experience for an eight-year-old.
It’s a higher bar to do that, be like a pet in the home or to help with functional interface in an office environment
or in a home and so forth.
And the idea was that you build on that and you kind of get there and as technology becomes more prevalent
and less expensive and so forth, you can start to kind of work up to it.
But you’re absolutely right.
At the end of the day, we almost equated it to how the touch screen created like this really novel interface
to physical kind of devices like this.
This is the extension of it where you have much richer physical interaction in the real world.
This is the enabler for it.
And it shows itself in a few kind of really obvious places.
So just take something as simple as a voice assistant.
You will never, most people will never tolerate an Alexa or a Google Home just starting a conversation proactively
when you weren’t kind of expecting it because it feels weird.
It’s like you were listening and like, and then now you’re kind of, it feels intrusive.
But if you had a character like a cat that touches you and gets your attention or toddler, like you never think twice about it.
And what we found really kind of immediately is that with these types of characters like Cozmo, they would like roam around
and kind of get your attention.
And we had a future version that was always on kind of called Vector.
People were way more forgiving.
And so you could initiate interaction in a way that is not acceptable for machines.
And in general, there’s a lot of ways to customize it, but it makes people who are skeptical of technology much more comfortable with it.
There was like, there were a couple of really, really prominent examples of this.
So when we launched in Europe and so we were in I think like a dozen countries, if I remember correctly,
but like we went pretty aggressively in launching in Germany and France and UK.
And we were very worried in Europe because there’s obviously like a really socially higher bar for privacy and security
where you’ve heard about how many companies have had troubles on things that might’ve been okay in the US,
but like are just not okay in Germany and France in particular.
And so we were worried about this because you have Cozmo, and then our future product Vector, where you have cameras,
you have microphones, it’s connected and like you’re playing with kids and like in these experiences.
And you’re like, this is like ripe to be like a nightmare if you’re not careful.
And the journalists are like notoriously like really, really tough on these sorts of things.
We were shocked and we prepared so much for what we would have to encounter.
We were shocked in that not once from any journalists or customer did we have any complaints beyond like a really casual kind of question.
And it was because of the character where when the conversation came up, it was almost like, well, of course he has to see and hear.
How else is he going to be alive and interacting with you?
And it completely disarmed this like fear of technology that enabled this interaction to be much more fluid.
And again, like entertainment was a proving ground, but that is like, you know,
there’s like ingredients there that carry over to a lot of other elements down the road.
That’s hilarious that we’re a lot less concerned about privacy if the thing has value and charisma.
I mean, that’s true for all of human to human interactions.
It’s an understanding of intent where like, well, he’s looking at me, he can see me.
If he’s not looking at me, he can’t see me.
Right. So it’s almost like you’re communicating intent.
And with that intent, people are kind of more understanding and calmer.
And it’s interesting. It was just the earliest kind of version of starting to experiment with this.
But it was an enabler.
And then you have like completely different dimensions where kids with autism had like an incredible connection with Cozmo
that just went beyond anything we’d ever seen.
And we have like these just letters that we would receive from parents.
And we had some research projects kind of going on with some universities on studying this.
But there’s an interesting dimension there that got unlocked that just hadn’t existed before
that has these really interesting kind of links into society and a potential building block of future experience.
So if you look out into the future, do you think we will have beyond a particular game, you know, a companion like Her,
like the movie Her or like a Cozmo that kind of asks you how your day went too, right, you know, like a friend.
How many years away from that do you think we are? What’s your intuition?
So I think the idea of a different type of character, like closer to kind of a pet style companionship, will come way faster.
And there’s a few reasons.
One is like to do something like in Her, that’s like effectively almost general AI.
And the bar is so high that if you miss it by a bit, you hit the uncanny valley where it just becomes creepy and like and not appealing.
Because the closer you try to get to a human in form and interface and voice, the harder it becomes.
Whereas you have way more flexibility on still landing a really great experience if you embrace the idea of a character.
And that’s one of the other reasons why we didn’t have a voice, and also why like a lot of video game characters like Sims,
for example, do not have a voice. When you think about it, it wasn’t just a cost savings like for them.
It was actually for all of these purposes. It was because when you have a voice, you immediately narrow down the appeal to some particular demographic or age range or kind of style or gender.
If you don’t have a voice, people interpret what they want to interpret.
And an eight year old might get a very different interpretation than a 40 year old, but you create a dynamic range.
And so you just you can lean into these advantages much more in something that doesn’t resemble human.
And so that’ll come faster.
I don’t know when a human-like one will come. That’s just still like complete R&D at this point.
The chat interfaces are getting way more interesting and richer, but it’s still a long way to go to kind of pass the test of, you know.
Well, let me like let’s consider like let me play devil’s advocate.
So Google is a very large company that’s servicing.
It’s creating a very compelling product that wants to provide a service to a lot of people.
But let’s go outside of that. You said characters.
Yeah, it feels like and you also said that it requires general intelligence to be a successful participant in a relationship, which could explain why I’m single.
But the I honestly want to push back on that a little bit because I feel like is it possible that if you’re just good at playing a character in a movie, there’s a bunch of characters.
If you just understand what creates compelling characters and then you just are that character and you exist in the world and other people find you and they connect with you just like you do when you talk to somebody at a bar.
I like this character. This character is kind of shady. I don’t like them.
You pick the ones that you like.
And, you know, maybe it’s somebody that reminds you of your father or mother.
I don’t know what it is, but the Freudian thing.
But there’s some kind of connection that happens and that’s the Cosmo you connect to.
That’s the future Cosmo you connect.
And it’s so I guess the statement I’m trying to make, is it possible to achieve a depth of friendship without solving general intelligence?
I think so. And it’s about intelligent kind of constraints, right?
And just you set expectations and constraints such that in the space that’s left, you can be successful.
And so you can do that by having a very focused domain that you can operate in.
For example, you’re a customer support agent for a particular product and you create intelligence and a good interface around that.
Or, you know, kind of in the personal companionship side, you can’t be everything across the board.
You kind of solve those constraints.
And I think it’s possible.
My worry is right now I don’t see anybody that has picked up on where Cosmo left off and is pushing on it in the same way.
And so I don’t know if it’s a sort of thing where similar to like how, you know, in the dot-com era there were all these concepts that we considered like, you know, that didn’t work out or like failed or like were too early or whatnot.
And then 20 years later, you have these like incredible successes on almost the same concept.
Like it might be that sort of thing where like there’s another pass at it that happens in five years or in 10 years.
But it does feel like that appreciation of that, like the three legged stool, if you will, between like, you know, the hardware, the AI and the character, that balance, it’s hard to, I’m not aware of anywhere right now where like that same kind of aggressive drive with the value on the character is happening.
And so to me, just a prediction, exactly as you said, something that looks awfully a lot like Cosmo, not in the actual physical form, but in the three legged stool, something like that in some number of years will be a trillion dollar company.
I don’t understand.
Like, it’s obvious to me that like character, not just as robotic companions, but in all our computers, they’ll be there.
It’s like Clippy was like two legs of that stool or something like that.
I mean, those are all different attempts.
And what’s really confusing to me is they’re born these attempts and everybody gets excited and for some reason they die and then nobody else tries to pick it up.
And then maybe a few years later, a crazy guy like you comes around with just enough brilliance and vision to create this thing and it is born.
A lot of people love it.
A lot of people get excited, but maybe the timing is not right yet.
And then when the timing is right, it just blows up.
It just keeps blowing up more and more until it just blows up.
And I guess everything in the full span of human civilization collapses eventually.
And that wouldn’t surprise me at all.
And like, what’s going to be different in another five years or 10 years or whatnot?
Physical component costs will continue to come down in price and mobile devices and computation is going to become more and more prevalent as well as cloud as a big tool to offload cost.
AI is going to be a massive transformation compared to what we dealt with where everything from voice understanding to just kind of a broader contextual understanding and mapping of semantics and understanding scenes and so forth.
And then the character side will continue to kind of progress as well because that magic does exist.
It just exists in different forms.
And you have just the brilliance of the tapping and animation and these other areas where that was a big unlock in film, obviously.
And so I think, yeah, the pieces can reconnect and the building blocks are actually going to be way more impressive than they were five years ago.
So in 2019, Anki, the company that created Cosmo, the company that you started, had to shut down. How did you feel at that time?
Yeah, it was tough. That was a really emotional stretch and it was a really tough year.
I think about a year ahead of that was actually a pretty brutal stretch because we were kind of life or death on many, many moments just navigating these insane kind of just ups and downs and barriers.
And the thing that made it, just sort of rewinding a tiny bit, what ended up being really challenging about it as a business is from a commercial standpoint and customer reception standpoint, there’s a lot of things you could point to that were pretty big successes.
Sold millions of units, got to pretty serious revenue, kind of close to 100 million annual revenue, number one kind of product in various categories.
But it was pretty expensive. It ended up being very seasonal where something like 85% of our volume was in Q4 because it was a present and it was expensive to market it and explain it and so forth.
And even though the volume was really sizable and the reviews were really fantastic, forecasting and planning for it and managing the cash operations was just brutal.
It was absolutely brutal. You don’t think about this when you’re starting a company or when you have a few million in revenue because it’s just your biggest costs are kind of just your headcount and operations and everything’s ahead of you.
But we got to a point where if you look at the entire year, you have to operate your company, pay all the people and so forth.
You have to pay for the manufacturing, the marketing and everything else to do your sales in mostly November, December and then get paid in December, January by retailers.
And those swings were really rough and just made it so difficult because the more successful it became, the more wild those swings became because you’d have to spend tens of millions of dollars on inventory, tens of millions of dollars on marketing and tens of millions of dollars on payroll and everything else.
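As an editorial illustration of the seasonal swing described above, here is a minimal sketch of how cash can bottom out just before holiday receipts arrive. Only the roughly 100 million annual revenue and the 85% Q4 share come from the conversation. The starting cash, the monthly operating cost, the inventory and marketing schedule, the even split of Q4 sales across October to December, and the one-month retailer payment delay are all hypothetical assumptions chosen to show the shape of the curve, not Anki's actual figures.

```python
ANNUAL_REVENUE = 100.0        # $M, "close to 100 million" per the conversation
Q4_SHARE = 0.85               # share of volume sold in Q4, per the conversation
MONTHLY_OPERATING_COST = 4.0  # $M, hypothetical (payroll, operations)
# $M of inventory and marketing spend by calendar month, hypothetical
INVENTORY_AND_MARKETING = {8: 15.0, 9: 15.0, 10: 10.0}


def monthly_cash_position(starting_cash=30.0):
    """Return [(month, cash_in_$M)] for one calendar year."""
    # Hypothetical sales timing: 15% spread evenly over Jan-Sep,
    # 85% spread evenly over Oct-Dec.
    sales = {m: ANNUAL_REVENUE * (1 - Q4_SHARE) / 9 for m in range(1, 10)}
    sales.update({m: ANNUAL_REVENUE * Q4_SHARE / 3 for m in range(10, 13)})

    cash = starting_cash
    positions = []
    for month in range(1, 13):
        # Hypothetical terms: retailers pay one month after the sale, so
        # December sales are collected the following January (not shown).
        collected = sales.get(month - 1, 0.0)
        spent = MONTHLY_OPERATING_COST + INVENTORY_AND_MARKETING.get(month, 0.0)
        cash += collected - spent
        positions.append((month, round(cash, 1)))
    return positions


if __name__ == "__main__":
    for month, cash in monthly_cash_position():
        print(f"month {month:2d}: {cash:7.1f}")
```

Under these made-up numbers, `min(monthly_cash_position(), key=lambda p: p[1])` picks out the trough month. Scaling up revenue and the associated spend deepens the trough, which is the dynamic described in the conversation.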
The bigger dip and then you’re waiting for the Q4.
Yeah. And it’s not a business that is recurring month to month and predictable. And then you’re locking in your forecast in July, maybe August if you’re lucky.
And it’s also very hit driven and seasonal where you don’t have the sort of continued kind of slow growth like you do in some other consumer electronics industries.
And so before then, hardware kind of went out of favor too. And so you had Fitbit and GoPro drop from 10 billion revenue to 1 billion revenue and hardware companies are getting valued at like 1x revenue oftentimes, which is tough.
And so we effectively kind of got caught in the middle where we were trying to quickly evolve out of entertainment and move into some other categories.
But you can’t let go of that business because that’s what you’re valued on. That’s what you’re raising money on. But there is no path to kind of pure profitability just there because it was such specific type of price points and so forth.
And so we tried really hard to make that transition. And we had a financing round that fell apart at the last second.
And effectively, there was just no path to kind of get through that and get to the next kind of holiday season. And so we ended up selling some of the assets and kind of winding down the company.
It was brutal. I was very transparent with the company and the team while we were going through it where actually, despite how challenging that period was, very few people left.
I mean, people loved the vision, the team, the culture, the kind of chemistry and what we were doing. There was just a huge amount of pride there. And then we wanted to see it through. And we felt like we had a shot to kind of get through these checkpoints.
And by brutal, I mean literally days of cash, like three, four different times runway in the year kind of before it where you’re playing games of chicken on negotiating credit line timelines and repayment terms and how to get a bridge loan from an investor.
There was a level of stress that as hard as things might be anywhere else, you’ll never come close to that where you feel that responsibility for 200 plus people.
And so we were very transparent during our fundraise on who we’re talking to, the challenges that we have, how it’s going and when things are going well, when things were tough.
And so it wasn’t a complete shock when it happened, but it was just very emotional where we announced it finally that we basically were just watching the runway and trying to kind of time it.
And when we realized that we didn’t have any more outs, we wanted to kind of wind it down, make sure that it was clean and we could kind of take care of people the best we could.
But I broke down crying at the all-hands and somebody else had to step in for a bit and it was just very, very emotional. But the beautiful part is afterwards, everybody stayed at the office to two, three in the morning just drinking and hanging out and telling stories and celebrating.
And it was just one of the best, for many people, it was the best kind of work experience that they had. And there was a lot of pride in what we did.
And there wasn’t anything obvious we could point to that like, hey, if only we had done that different, things would have been completely different. It was just like the physics didn’t line up.
But the experience was pretty incredible, though it was hard.
It had this feeling that there was this incredible beauty in both the technology and products and the team that there’s a lot there that in the right context could have been pretty incredible, but it was emotional.
Yeah, just thinking, I mean, just looking at this company, like you said, product and technology, but the vision, the implementation, you got the cost down very low and the compelling, the nature of the product was great.
So many robotics companies failed at this. The robot was too expensive. It didn’t have the personality. It didn’t really provide any value, like a sufficient value to justify the price.
So you succeeded where basically every single other robotics company or most of them that are like going the category of social robotics have kind of failed.
And I mean, it’s quite tragic. I remember reading that. I’m not sure if I talked to you before that happened or not, but I remember, you know, I’m distant from this.
I remember being heartbroken reading that because, like, if Cosmo is not going to succeed, what is going to succeed?
Because that to me was incredible. Like it was an incredible idea.
Cost is down. It’s just like the most minimal design in physical form that you could do.
It’s really compelling. The balance of games. So it’s a fun toy. It’s a great gift for all kinds of age groups.
Right. It’s just it’s compelling in every single way. And it seemed like it was a huge success and it failing was.
I don’t know. There was heartbreak on many levels for me, just as an external observer.
I was thinking, how hard is it to run a business? That’s what I was thinking. Like, if this failed, this must have failed because it’s obviously not like, yeah, it’s business.
Yeah. Maybe it’s some aspect of the manufacturing and so on. But I’m now realizing it’s also not just that it’s.
Yeah. And sales, marketing, all those everything. Right. Like, how do you explain something that’s like a new category to people that like how all these positions.
And so, like, you know, it had some of the hardest elements of if you were to pick a business, it had some of the hardest customer dynamics, because like to sell a hundred fifty dollar product, you got to convince both the child to want it and the parents to agree that it’s valuable.
So you’re having like this dual prong marketing challenge. You have manufacturing, you have like really high precision on the components that you need.
You have the challenges. So there were a lot of tough elements. But is this feeling where like just really great alignment of unique strength across kind of like all these different areas, just an incredible like, you know, kind of character and animation team between this Carlos.
And there’s like a character director day that came on board and really great people there.
The A.I. side, the manufacturing, the, you know, where like never missing a launch. Right. And actually, you know, he kind of had that quality was. Yeah, it was heartbreaking.
But here’s one neat thing is like we had so much like fan mail from kind of kids and parents like I actually like there was a bunch that collected in the end that I actually saved.
And like I never it was too emotional to open it and I still haven’t opened it. And so I actually have this giant envelope of like a stack this much of like letters from, you know, kids and families, just like every kind of permutation you can imagine.
And so planning to kind of I don’t know, maybe like a five year, you know, five year, some year reunion, just inviting everybody over and we’ll just like kind of dig into it and kind of bring back some memories.
But, you know, good impact. And well, I think there will be companies, maybe Waymo and Google will be somehow involved that will carry this flag forward and will make you proud whether you’re involved or not.
I think this is one of the greatest robotics companies in the history of robotics.
So you should be proud. It’s still tragic to know that, you know, because you read all the stories of Apple and let’s see, SpaceX and like companies that were just on the verge of failure several times through that story.
And they just it’s almost like a roll of the dice. They succeeded. And here’s a roll of the dice that just happened to go.
And that’s the appreciation that like when you really like talk to a lot of the founders, like everybody goes through those moments.
And sometimes it really is a matter of like, you know, timing, a little bit of luck, like some things are just out of your control.
And you get a much deeper appreciation for just the dimensionality of that challenge.
But the great thing is that like a lot of the team actually like stayed together. And so there were actually a couple of companies where we kind of kept big chunks of the team together and we actually kind of helped align this, you know, to help people out as well.
And one of them was Waymo, where a majority of the AI and robotics team actually had the exact background that you would look for.
And like kind of AV space was a space that a lot of us like, you know, worked on in grad school, were always passionate about and ended up, you know, maybe the time, you know, serendipitous timings from another perspective where like kind of landed in a really unique circumstances that should have been quite exciting, too.
So it’s interesting to ask you just your thoughts. Cosmo still lives on under Dream Labs, I think. Is that, are you tracking the progress there or is it too much pain? Is it, are you, is that something that you’re excited to see where that goes?
So keeping an eye on it, of course, just out of curiosity and obviously just kind of careful product line, I think it’s deceptive how complex it is to manufacture and evolve that product line and the amount of experiences that are required to complete the picture and be able to move that forward.
And I think that’s going to make it pretty hard to do something really substantial with it. It would be cool if like even the product in the way it was was able to be manufactured.
Which is the current goal, I suppose.
Yeah, which will be neat. But I think it’s deceptive how tricky that is on like everything from the quality control, the details and then like technology changes that force you to reinvent and update certain things. So I haven’t been super close to it, but just kind of keeping an eye on it.
Yeah, it’s really interesting how it’s deceptively difficult, just as you’re saying. For example, those same folks, and I’ve spoken with them, they partnered up with Rick and Morty creators to do the Butter Robot.
I love the idea. I just recently, I kind of half ass watched Rick and Morty previously, but now I just watched like the first season. It’s such a brilliant show.
I like, I did not understand how brilliant that show is. And obviously I think in season one is where the Butter Robot comes along for just a few minutes or whatever, but I just fell in love with the Butter Robot.
The sort of the, that particular character, just like you said, there’s characters you can create, personalities you can create, and that particular robot who’s doing a particular task realizes, you know, this like realizes, that’s the existential question.
The myth of Sisyphus question that Camus writes about, is this all there is? He moves butter. But, you know, that realization, that’s a beautiful little realization for a robot that my purpose is very limited to this particular task.
It’s humor of course, it’s darkness, it’s a beautiful mix. But so they want to release that Butter Robot, but something tells me that to do the same depth of personality as Cosmo had, the same richness, it would be on the manufacturing, on the AI, on the storytelling, on the design, it’s going to be very, very difficult.
It could be a cool sort of toy for Rick and Morty fans, but to create the same depth of existential angst that the Butter Robot symbolizes is really, that’s the brave effort you succeeded at with Cosmo, but it’s not easy. It’s really difficult.
You can fail on almost any one of the kind of dimensions, and it takes a unique convergence of a lot of different skill sets to try to pull that off.
On this topic, let me ask you for some advice, because as I’ve been watching Rick and Morty, I told myself, I have to build the Butter Robot, just as a hobby project. And so I got a nice platform for it with treads and there’s a camera that moves up and down and so on.
But the question I’d like to ask, there’s obvious technical questions I’m fine with, communication, the personality, storytelling, all those kinds of things. I think I understand the process of that, but how do you know when you got it right?
So with Cosmo, how did you know this is great? Or something is off. Is this brainstorming with the team? Do you know it when you see it? Is it like love at first sight? It’s like, this is right.
Or I guess if we think of it as an optimization space, is there Uncanny Valley where you’re like, that’s not right, or this is right, or are a lot of characters right?
Yeah, we stayed away from Uncanny Valley just by having such a different mapping where it didn’t try to look like a dog or a human or anything like that. And so you avoided having a weird pseudo similarity, but not quite hitting the mark.
But you could just fall flat where just a personality or a character emotion just didn’t feel right. And so it actually mirrored very closely to the iterations that a character director at Pixar would have, where you’re running through it and you can virtually see what it’ll look like.
We created a plugin to where we actually used Maya, the animation tools, and then we created a plugin that perfectly matched it to the physical one. And so you could test it out virtually and then push a button and see it physically play out.
And there’s subtle differences. And so you want to make sure that that feedback loop is super easy to be able to test it live.
And then sometimes you would just feel it that it’s right and intuitively know. And then we did user testing. But it was very, very often that if we found it magical, it would scale and be magical more broadly.
There were not too many cases where we were pretty decent about not geeking out or getting too attached to something that was super unique to us, but trying to put a customer hat on and does it truly feel magical?
And so in a lot of ways, we just give a lot of autonomy to the character team to really think about the character board and mood boards and storyboards and what’s the background of this character and how would they react.
And they went through a process that’s actually pretty familiar, but now had to operate under these unique constraints.
The moment where it felt right kind of took a fairly similar journey to a character in an animated film. Actually, it’s quite cool.
Well, the thing that’s really important to me and I wonder if it’s possible.
Well, I hope it’s possible. Pretty sure it’s possible is for me, even though I know how it works to make sure there’s sufficient randomness in the process.
Probably because it would be machine learning based that I’m surprised that I don’t. I’m surprised by certain reactions. I’m surprised by certain communication.
Maybe that’s in a form of a question. Were you surprised by certain things Cosmo did, like certain interactions?
Yeah, we made it intentionally so that there would be some surprise and a decent amount of variability in how he’d respond in certain circumstances. And so in the end, this isn’t general AI.
This is a giant spectrum and library of parameterized emotional responses and an emotional engine that would map your current state of the game, your emotions, the world, the people who are playing with you, so forth, to what’s happening.
But we could make it feel spontaneous by creating enough diversity and randomness, but still within the bounds of what felt like very realistic to make that work.
And then what was really neat is that we could get statistics on how much of that space we were saturating and then add more animations and more diversity in the places that would get hit more often so that you stay ahead of the curve and maximize the chance that it stays feeling alive.
But then when you combine it, the permutations and the combinations of emotions stitched together sometimes surprised us because you see them in isolation.
But when you actually see them and you see them live relative to some event that happened in the game or whatnot, it was kind of cool to see the combination of the two.
And it’s not too different in other robotics applications where you get so used to thinking about the modules of a system and how things progress through a tech stack that the real magic is when all the pieces come together and you start getting the right emergent behavior in a way that’s easy to lose when you just kind of go too deep into any one piece of it.
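As an editorial aside, the idea described above (a library of parameterized responses selected by emotional state and game context, with bounded randomness and usage statistics to show where more variety is needed) can be sketched roughly as follows. This is a toy illustration, not Anki's actual engine: the class name `EmotionEngine`, the animation names, and the saturation metric are all hypothetical.

```python
import random
from collections import Counter

# Hypothetical library: (emotion, game event) -> candidate animation names.
ANIMATIONS = {
    ("happy", "won_game"): ["victory_dance", "fist_pump", "spin"],
    ("frustrated", "lost_game"): ["slump", "grumble"],
    ("curious", "saw_face"): ["head_tilt", "lean_in", "blink"],
}


class EmotionEngine:
    """Toy response selector with bounded randomness and usage tracking."""

    def __init__(self, library, seed=None):
        self.library = library
        self.rng = random.Random(seed)
        self.context_hits = Counter()  # how often each context occurs
        self.last_played = {}          # context -> most recent animation

    def react(self, emotion, event):
        context = (emotion, event)
        options = self.library[context]
        self.context_hits[context] += 1
        # Stay within the bounds of the context, but avoid repeating the
        # previous animation when an alternative exists.
        previous = self.last_played.get(context)
        candidates = [a for a in options if a != previous] or options
        choice = self.rng.choice(candidates)
        self.last_played[context] = choice
        return choice

    def saturation(self):
        """Plays per available animation for each context seen so far.

        A high value suggests the context is hit often relative to its
        variety, so it is a candidate for more animations.
        """
        return {
            context: hits / len(self.library[context])
            for context, hits in self.context_hits.items()
        }


if __name__ == "__main__":
    engine = EmotionEngine(ANIMATIONS, seed=0)
    for _ in range(6):
        print(engine.react("frustrated", "lost_game"))
    print(engine.saturation())
```

The design choice worth noting is that randomness is applied only inside a context, so a response is always plausible for the situation, while the saturation numbers point to the contexts that most need new content.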
Yeah, when the system is sufficiently complex, there is something like emergent behavior and that’s where the magic is. As a human being, you can still appreciate the beauty of that magic at the system level. First of all, thank you for humoring me on this.
It’s really, really fascinating. I think a lot of people would love this. One last thing on the butter robot, I promise.
In terms of speech, Cosmo is able to communicate so much with just movement and face. Do you think speech is too much of a degree of freedom? Like, is speech a feature or a bug of deep interaction, emotional interaction?
For a product, it’s too deep right now. You would immediately break the fiction because the state of the art is just not good enough. And that’s on top of just narrowing down the demographic where the way you speak to an adult versus the way you speak to a child is very different.
Yet a dog is able to appeal to everybody. And so right now there is no speech system that is rich enough and subtly realistic enough to feel appropriate. And so we very, very quickly kind of moved away from it.
Now, speech understanding is a different matter where understanding intent, that’s a really valuable input. But giving it back requires like a way, way higher bar given kind of where today’s world is.
And so that realization that you can do surprisingly much with either no speech or kind of tonal like the way Wally, R2D2 and kind of other characters are able to, it’s quite powerful and it generalizes across cultures and across ages really, really well.
I think we’re going to be in that world for a little while where it’s still very much an unsolved problem on how to like make something. It touches on the uncanny valley thing. So if you have legs and you’re a big humanoid looking thing, you have very different expectations and a much narrower degree of what’s going to be acceptable by society.
And then if you’re a robot like Cosmo or Wally or some other form where you can kind of like reinvent the character, speech has that same property where speech is so well understood in terms of expectations by humans that you have far less flexibility on how to deviate from that and lean into your strengths and avoid weaknesses.
But I wonder if there is, obviously there’s certain kinds of speech that activates the uncanny valley and breaks the illusion faster. So I guess my intuition is we will solve certain, we would be able to create some speech based personalities sooner than others.
So for example, I could think of a robot that doesn’t know English and is learning English, right? Those kinds of personalities.
It’s like a fiction where you’re intentionally kind of like getting a toddler level of speech. So that’s exactly right. So you can have like tied into the experience where it is a more limited character or you embrace the lack of emotions or the lack of dynamic range in the speech kind of capabilities, emotions as like part of the character itself.
And you’ve seen that in like kind of fictional characters as well.
That’s why this podcast works.
Yeah, and you kind of had that with like, I don’t know, I guess like Data and some of the other ones.
But yeah, so you have to, and that becomes a constraint that lets you meet the bar.
See, I honestly think like also if you add drunk and angry, that gives you more constraints that allow you to be dumber from an NLP perspective. Like there’s certain aspects. So if you modify human behavior, like, so forget the sort of artificial thing where you don’t know English toddler thing.
We, if you just look at the full range of humans, I think we, there’s certain situations where we put up with a like lower level of intelligence in our communication.
Like if somebody is drunk, we understand the situation that they’re probably under the influence. Like we understand that they’re not going to be making any sense. Anger is another one like that.
I’m sure there’s a lot of other kind of situations like this. Maybe, again, language, lost in translation, that kind of stuff that I think if you play with that, what is it, the Ukrainian boy that passed the Turing test, you know, play with those ideas.
I think that’s really interesting that you can create compelling characters, but you’re right, that’s a dangerous sort of road to walk because you’re adding degrees of freedom that can get you in trouble.
Yeah. And that’s why like you have these big pushes that like for most of the last decade plus like where you’d have like full like human replicas of robots really being down to like skin and like kind of in some places.
My personal feeling is like, man, like that’s not the direction that’s most fruitful right now.
Beautiful art. It’s not in terms of a rich, deep, fulfilling experience. Yeah, you’re right.
Yeah. And creating a minefield of potential places to feel off. And then you’re sidestepping where like the biggest kind of functional AI challenges are to actually have, you know, kind of like really rich productivity that actually kind of justifies the higher price points.
And that’s part of the challenge is like, yeah, like robots are going to get to like thousands of dollars, tens of thousands of dollars and so forth.
But you can imagine what sort of expectation of value that comes with it. And so that’s where you want to be able to invest the time and depth.
And so going down the full human replica route creates a gigantic distraction and really, really high bar that can end up sucking up so much of your resources.
So it’s weird to say, but you happen to be one of the greatest at this point roboticists ever because you created this little guy. You’re part, obviously, of a great team that created the little guy with a deep personality.
And you’re now switching to an entirely, well, maybe not entirely, but a different fascinating, impactful robotics problem, which is autonomous driving and more specifically, the biggest version of autonomous driving, which is autonomous trucking.
So you are at Waymo now. Can you give us a big picture overview? What is Waymo? What is Waymo Driver? What is Waymo One? What is Waymo Via? Can you give an overview of the company and the vision behind the company?
For sure. Waymo, by the way, has been eye opening on just how incredible the people and the talent is and how in one company you almost have to create 30 companies worth of technology and capability to solve the full spectrum of it.
So I’ve been at Waymo since 2019, so about two and a half years. So Waymo is focused on building what we call a driver, which is creating the ability to have autonomous driving across different environments, vehicle platforms, domains, and use cases.
As you know, it got started in 2009. It was almost like an immediate successor to the Grand Challenge and Urban Challenges that were like incredible catalysts for this whole space.
And so Google started this project and then eventually Waymo spun out. And so what Waymo is doing is creating the systems, both hardware, software, infrastructure, everything that goes into it to enable and to commercialize autonomous driving.
This hits on consumer transportation and ride sharing and kind of vehicles and urban environments. And as you mentioned, it hits on autonomous trucking to transport goods.
So in a lot of ways, it’s transporting people and transporting goods. But at the end of the day, the underlying capabilities required to do that are surprisingly better aligned than one might expect,
where it’s the fundamentals of being able to understand the world around you, process it, make intelligent decisions, and prove that we are at a level of safety that enables large scale autonomy.
So from a branding perspective, Waymo Driver is the system that’s irrespective of a particular vehicle it’s operating in. You have a set of sensors that perceive the world, can act in that world, and move whatever the vehicle is through the world.
And so in the same way that you have a driver’s license and your ability to drive isn’t tied to a particular make and model of a car, and of course, there are special licenses for other types of vehicles, but the fundamentals of a human driver very, very largely carry over.
And then there’s uniquenesses related to a particular environment or domain or a particular vehicle type that kind of add some extra additive challenges.
But that’s exactly right. It’s the underlying systems that enable a physical vehicle without a human driver to very successfully accomplish the task that previously wasn’t possible without 100% human driving.
And then there’s Waymo One, which is the transporting people from a brand perspective. And just in case we refer to it so people know. And then there’s Waymo Via, which is the trucking component. Why Via, by the way? What is that? Is it just like a cool sounding name?
Is there an interesting story there? It is a pretty cool sounding name. It’s a cool sounding name. I mean, when you think about it, it’s just like, well, we’re going to transport it via this and that.
So it’s just kind of like an allusion to the mechanics of transporting something. And it is a pretty good grouping.
And the interesting thing is that even the groupings kind of blur where Waymo One is like human transportation and there’s a fully autonomous service in the Phoenix area that like every day is transporting people. And it’s pretty incredible to like just see that operate at reasonably large scale and just kind of happen.
And then on the Via side, it doesn’t even have to be, like, long haul trucking is a major focus of ours. But down the road, you can stitch together the vehicle transportation as well for local delivery. And a lot of the requirements for local delivery overlap very heavily with consumer transportation.
Obviously, given that you’re operating on a lot of the same roads and navigating the same safety challenges. And Waymo very much is a multi product company that has ambitions in both. They have different challenges and both are tremendous opportunities.
But the cool thing is, is that there’s a huge amount of leverage and this kind of core technology stack now gets pushed on by both sides. And that adds its own unique challenges. But the success case is that the challenges that you push on, they get leveraged across all platforms and all.
From an engineering perspective, are the teams integrated?
It’s a mix. So there’s a huge amount of centralized kind of core teams that support all applications. And so you think of something like the hardware team that develops the lasers, the compute, integrates into vehicle platforms.
This is an experience that carries over across, you know, any application that we’d have, in an ebb and flow with both. Then there’s like really unique perception challenges, planning challenges, like other types of challenges where there’s a huge amount of leverage on a core tech stack.
But then there’s like dedicated teams that think of how do you deal with a unique challenge, for example, an articulated trailer with varying loads that completely changes the physical dynamics of a vehicle that doesn’t exist on a car, but it becomes one of the most important kind of unique new challenges on a truck.
So what’s the long term dream of Waymo Via, the autonomous trucking effort that Waymo is doing?
Yeah, so we’re starting with developing L4 autonomy for class 8 trucks. These are 53 foot trailers that capture like a pretty sizable percentage of the goods transportation in the country.
Long term, the opportunity is obviously to expand to much more diverse types of vehicles, types of goods transportation and start to really expand in both the volume and the route feasibility that’s possible.
And so just like we did on the car side, you start with a single route with a very specific operating kind of domain and constraints that allow you to solve the problem.
But then over time, you start to really try to push against those boundaries and open up deeper feasibility across routes, across surface streets, across environmental conditions, across the type of goods that you carry,
the versatility of those goods and how little supervision is necessary to just start to scale this network. And long term, there’s actually it’s a pretty incredible enabler where today you have already a giant shortage of truck drivers.
It’s over 80,000 truck driver shortage that’s expected to grow to hundreds of thousands in the years ahead.
You have really, really quickly increasing demand from ecommerce and just distribution of where people are located.
You have one of the deepest safety challenges of any profession in the US where there’s a huge, huge, huge kind of challenge around fatigue and around kind of the long routes that are driven.
And even beyond kind of the cost and necessity of it, there are fundamental constraints built into our logistics network that are tied to the type of human constraints and regulatory constraints that are tied to trucking today.
For example, there are limits on how long a driver can be driving in a single day before they’re not allowed to drive anymore, which is a very important safety constraint.
What that does is it enforces limitations on how far jumps with a single driver could be and makes you very subject to availability of drivers, which influences where warehouses are built, which influences how goods are transported, which influences costs.
And so you start to have an opportunity on everything from plugging into existing fleets and brokerages and the existing logistics network and just immediately start to have a huge opportunity to add value from a cost, driving, fuel, insurance, and safety standpoint,
all the way to completely reinventing the logistics network across the United States and enabling something completely different than what it looks like today.
Yeah, I had, to be published before this, a great conversation with Steve Viscelli, where we talked about the manual driving.
He echoed many of the same things that you were talking about, but we talked about much of the fascinating human stories of truck drivers.
He also was a truck driver for a bit as a grad student to try to understand the depth of the problem.
Fascinating lives. We have some drivers that have four million miles of lifetime driving experience.
It’s pretty incredible. And yeah, it’s learning from them, like some of them are on the road for 300 days a year. It’s a very unique type of lifestyle.
So there’s fascinating stuff there. Just like you said, there’s a shortage of actually people, truck drivers taking the job, counter to what I think is publicly believed.
So there’s an excess of jobs and a shortage of people to take up those jobs. And just like you said, it’s such a difficult problem.
And these are experts at driving and solving this particular problem. And it’s fascinating to learn from them to understand, you know, how hard is this problem?
And that’s the question I want to ask you from a perception, from a robotics perspective. What’s your sense of how difficult is autonomous trucking?
Maybe you can comment on which scenarios are super difficult, which are more manageable. Is there a way to kind of convert into words how difficult the problem is?
Yeah, it’s a good question. So there’s and as you can expect, it’s a mix. Some things become a lot easier or at least more flexible.
Some things are harder. And so, you know, on the things that are like the tailwinds, the benefits, a big focus of automating trucking, especially initially, is really focusing on the long haul freeway stretch of it, where that’s where a majority of the value is captured.
On a freeway, you have a lot more structure and a lot more consistency across freeways across the U.S.
compared to surface streets where you have a way higher dimensionality of what can happen, lack of structure, lack of consistency and variability across cities.
So you can leverage that consistency to tackle, at least in that respect, a more constrained problem, which has some benefits to it.
You can itemize much more of the sort of things you might encounter and so forth. And so those are benefits.
Is there a canonical freeway and city we should be thinking about? Like, is there a standard thing that’s brought up in conversation often?
Like, here’s a stretch of road. What is it like when people talk about traveling across country, they’ll talk about New York, San Francisco.
Is that the route? Like, is there a stretch of road that’s like nice and clean and then there’s like cities with difficulties in them that you kind of think of as the canonical problem to solve here?
Right. So starting with the car side.
Well, Waymo very intentionally picked the Phoenix area, and the San Francisco area as a follow-up
once we hit driverless, where when you think of consumer transportation and ride sharing kind of economy, a big percentage of that market is captured in the densest cities in the United States.
And so really pushing out and solving San Francisco becomes a really huge opportunity and importance and places one dot on kind of like the spectrum of complexity.
The Phoenix area, starting with Chandler and then expanding more broadly in the Phoenix metropolitan area, it’s I believe the fastest growing city in the US.
It’s a kind of a higher medium sized city, but growing quickly and still captures a really wide range of kind of complexities.
And so getting to driverless there actually exposes you to a lot of the building blocks you need for the more complicated environments.
And so in a lot of ways, there’s a thesis that if you start to kind of place a few of these kind of dots where San Francisco has these types of unique challenges, dense pedestrians, all this like complexity, especially when you get into the downtown areas and so forth.
And Phoenix has like a really interesting kind of spectrum of challenges, maybe other ones like LA kind of add freeway focus and so forth.
You start to kind of cover the full set of features that you might expect and it becomes faster and faster if you have the right systems and the right organization to then open up the fifth city and the 10th city and the 20th city.
On trucking, there’s similar properties where obviously there’s uniquenesses and freeways when you get into really dense environments, and then the real opportunity to then get even more
value is to think about how you expand with like some of the surface street challenges. But for example, right now we’re looking, we have a big facility that we’re finishing building in Q1 in the Dallas area.
That’ll allow us to do testing from the Dallas area on routes like Dallas to Houston, Dallas to Phoenix, going out east.
Dallas to Austin.
Austin to that triangle.
Waymo should come to Austin.
Well, Waymo, the car side, was in Austin for a while.
Yes, I know. Come back.
But trucking is actually, Texas is one of the best places to start because of volume, regulatory, weather, there’s a lot of benefits.
On trucking, a huge opportunity is Port of LA going east.
So in a lot of ways, a lot of the work is to start to stitch together a network and converge to Port of LA where you have the biggest port in the United States.
And the amount of goods going east from there is pretty tremendous. And then obviously, there’s, you know, kind of channels everywhere. And then you have extra complexities as you get into like snow and inclement weather and so forth.
But what’s interesting about trucking is every single route segment that you add increases the value of the whole network.
And so it has this kind of network effect and cumulative effect that’s very unique. And so there’s all these dimensions that we think about.
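One way to picture the network effect described here is to count how many hub-to-hub pairs become reachable as segments are stitched together. The sketch below is only an illustration: the segment list is hypothetical (the hub names are borrowed from the conversation, not from any real route plan), and "value" is crudely approximated as the number of connected origin-destination pairs.

```python
from collections import defaultdict


def reachable_pairs(segments):
    """Count unordered hub pairs connected by some chain of route segments."""
    graph = defaultdict(set)
    for a, b in segments:
        graph[a].add(b)
        graph[b].add(a)

    seen = set()
    pairs = 0
    for start in list(graph):
        if start in seen:
            continue
        # Depth-first walk to measure the size of this connected component.
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        # A component with n hubs offers n * (n - 1) / 2 origin-destination pairs.
        pairs += size * (size - 1) // 2
    return pairs


# Hypothetical build-out order, for illustration only.
build_out = [
    ("Dallas", "Houston"),
    ("Dallas", "Phoenix"),
    ("Phoenix", "Los Angeles"),
]
for count in range(1, len(build_out) + 1):
    print(count, "segments ->", reachable_pairs(build_out[:count]), "connected pairs")
```

In this toy model each added segment unlocks more new pairs than the one before it, which is the cumulative effect being described.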
And so in a lot of ways, Dallas is a really unique hub that opens up a lot of options and has become a really valuable lever.
So there’s a million questions I could ask you. First of all, you mentioned level four.
For people who totally don’t know, there’s these levels of automation, and level four refers to kind of the first step that you could recognize as fully autonomous driving.
Level five is really fully autonomous driving and level four is kind of fully autonomous driving.
And then there are specific definitions, depending on who you ask what that actually means. But for you, what does the level four mean?
And you mentioned freeway. Let’s say like there’s three parts of long haul trucking.
Maybe I’m wrong in this, but there’s freeway driving. There’s like truck stop.
And then there’s more urban type of area.
So which of those do you want to tackle? Which of them do you include under level four?
Like how do you think about this problem? What do you focus on? What is the biggest impact to be had in the short term?
So the goal is that we got to get to market as fast as we can, because the moment you get to market, you just learn so much and it influences everything that you do.
And it is one of the experiences I carried over from before is that you add constraints.
You figure out the right compromises. You do whatever it takes because getting to market is so critical.
But here with autonomous driving, you can get to market in so many different ways.
That’s right. And so one of the simplifications that we intentionally have put on is using what we call transfer hubs,
where you can imagine depots that are at the entry points to metropolitan areas, like let’s say Dallas, like the hub that we’re building, which does a few things that are very valuable.
So from a first product standpoint, you can automate transfer hub to transfer hub.
And that path from the transfer hub to the full freeway route can be a very intentional single route that you can select for the features that you feel you want to handle at that point in time.
And you build the hub specifically designed for autonomous trucking.
And that’s what’s going to happen, actually. And you need to come out in January and check it out because it’s going to be really cool.
Not only is it our main operating headquarters for our fleet there, but it will be the first fully ground up designed driverless hub for autonomous trucks in terms of where do they enter, where do they depart, how do you think about the flow of people, goods, everything.
It’s quite cool and it’s really beautiful on how it’s thought through.
And so early on, it is totally reasonable to do the last five miles manually to get to the final kind of depot to avoid having to solve the general surface street problem, which is obviously very complex.
Now, when the time comes and we are increasingly, already we’re pushing on some of this, but we will increasingly be pushing on surface street capabilities to build out the value chain to go all the way depot to depot instead of transfer hub to transfer hub.
And we have probably the best advantages in the world because of all the Waymo experience on surface streets, but that’s not the highest ROI right now where the highest ROI is hub to hub and get the routes going.
And so when you ask what’s L4, L4 can be applied to any operating domain or scope, but it’s effectively for the places where we say we’re ready for autonomous operation.
We are 100% operating as a self driving truck with no human behind the wheel.
That is L4 autonomy. And it doesn’t mean that you operate in every condition, it doesn’t mean you operate on every road, but for a particularly well defined area, operating conditions, routes, kind of domain, you are fully autonomous.
And that’s the difference between L4 and L5. And most people would agree that at least anytime in the foreseeable future, L5 is just not even really worth thinking about because there’s always going to be these extremes.
And so it’s a race and almost like a game where you think of what is the sequence of expanded capabilities that create the most value and teach us the most and create this feedback loop where we’re building out and unlocking more and more capability over time.
I gotta ask you, just curious. So first of all, I have to, when I’m allowed, visit the Dallas facility because it’s super cool. It’s like robot on the giving and the receiving end. The truck is a robot and the hub is a robot.
Yeah, it’s got to be very robot friendly.
Yeah, that’s great. I will feel at home. What’s the sensor suite like on the hub if you can just high level mention it? Does the hub have like lidars? Is the truck doing most of the intelligence or is the hub also intelligent?
Yeah, so most of it will be the truck and everything is like connected. So we have our servers where we know exactly where every truck is. We know exactly what’s happening at a hub. And so you can imagine like a large backend system that over time starts to manage timings, goods, delivery windows, all these sort of things.
And so you don’t actually need to, there might be special cases where it is valuable to equip some sensors in the hub, but a majority of the intelligence is going to be on the truck because whatever’s relevant to the truck should be seen by the truck and can be relayed remotely for any sort of kind of cognizance or decision making.
But there’s a distinct type of workflow where do you check trucks? Where do you want them to enter? What if there’s many operating at once? Where’s the staging area to depart? How do you set up the flow of humans and human cars and traffic so that you minimize the interaction between humans and kind of self driving trucks?
And then how do you even intelligently select the locations of these transfer hubs that are both really great service locations for a metropolitan area? And there could be over time, many of them for a metropolitan area while at the same time leaning into the path of least resistance to lean into your current capabilities and strengths so that you minimize the amount of work that’s necessary to unlock the next kind of big bar.
I have a million questions. So first, is the goal to have no human in the truck?
The goal is to have no human in the truck. Now, of course, right now we’re testing with expert operators and so forth. But the goal is to… Now, there might be circumstances where it makes sense to have a human or… And obviously, these trucks can also be manually driven.
So sometimes we talk with our fleet partners about how you can buy a Waymo equipped Daimler truck down the road and on the routes that are autonomous, it’s autonomous. On the routes that are not, it’s human driven. Maybe there’s L2 functionality that adds safety systems and so forth.
But as soon as they become, as soon as we expand in software, the availability of driverless routes, the hardware is forward compatible to just now start using them in real time. And so you can imagine this mixed use.
But at the end of the day, the largest value proposition is where you’re able to have no constraints on how you can operate this truck. And it’s 100% autonomous with nobody inside.
That’s amazing. So the… Let me ask on the logistics front, because you mentioned that there’s also opportunity to revamp or build from scratch some of the ideas around logistics.
I don’t want to throw too much shade, but from talking to Steve, my understanding is logistics is not perhaps as great as it could be in the current trucking environment.
I’m not, maybe you can break down why, but there’s probably competing companies. There’s just a mess. Maybe some of it is literally just, it’s old school.
Like they, it’s just like, it’s not computer, it’s not computerized. Like truckers are almost like contractors.
There’s an independence and there’s not a nice interface where they can communicate where they’re going, where they’re at, you know, all those kinds of things.
And so there, it just feels like there’s so much opportunity to digitize everything to where you could optimize the use of human time, optimize the use of all kinds of resources.
How much are you thinking about that problem? How fascinating is that problem? How difficult is it?
How much opportunity is there to revolutionize the space of logistics in autonomous trucking, in trucking period?
It’s pretty fascinating. It’s one of the most motivating aspects of all this where like, yes, there’s like a mountain of problems that are like you want to, you have to solve to get to like the first checkpoints and first drivers and so forth.
And inevitably, like in a space like this, you plug in initially into the existing kind of system and start to kind of, you know, learn and iterate.
But that opportunity is massive. And so, you know, a couple of the factors that play into it.
So first of all, there’s obviously just the physical constraints of driving time, driver availability.
Some fleets have a 95% attrition rate, you know, right now because of just the demands and like, you know, kind of gaps in competition and so forth.
And then it’s also incredibly fragmented where you would be shocked at like when you look at industries, like when you think of the top 10 players, like the biggest fleets, like the Walmarts and FedExes and so forth.
The percentage of the overall trucking market that’s captured by the top 10 or 50 fleets is surprisingly small.
The average kind of truck operation is like a one to five truck, you know, family business.
And so there’s just like a huge amount of like fragmentation, which makes for really interesting challenges in kind of stitching together through like bulletin boards and brokerages and some people run their own fleets.
And this world’s kind of like evolving, but it is one of the less digitized and optimized worlds that there is.
And the part that is optimized is optimized to the constraints of today.
And even within the constraints of today, this is a 900 billion dollar industry in the US and it’s continuing to grow.
It feels like from a business perspective, if I were to predict that while trying to solve the autonomous trucking problem, Waymo might solve first the logistics problem because that would already be a huge impact.
So on the way to solving autonomous trucking, the human driven, like there’s so much opportunity to significantly improve the human driven trucking, the timing, the logistics. So you use humans optimally.
You use handoffs to like, you know, well, even you get really ambitious, you start to expand this beyond like how does the fulfillment center work and like how does the transfer hub work, how does the warehouse work?
I mean, there’s a lot of opportunities to start to automate these chains. And a lot of the inefficiency today is because like you have a delay, like Port of LA has a bunch of ships right now waiting outside of it because they can’t dock because there’s not enough labor inside of the Port of LA.
There’s a big backlog of trucks, which means there’s a big backlog of deliveries, which means the drivers aren’t where they need to be. And so you have this like huge chain reaction and your feasibility of readjusting in this network is low because everything’s tied to humans and manual kind of processes or distributed processes across a whole bunch of players.
And so one of the biggest enablers is, yes, we have to solve autonomous trucking first. And that, by the way, that’s not like an overnight thing. That’s decades of continued kind of expansion and work. But the first checkpoint in the first route is like is not that far off.
But once you start enabling and you start to learn about how the constraints of autonomous trucking, which are very, very different than the constraints of human trucking and again, strengths and weaknesses, how do you then start to leverage that and rethink a flow of goods more broadly?
And this is where like the learnings of like really partnering with some of the largest fleets in the US and the sort of learnings that they have about the industry and the sort of needs that they have. And what would change if you just like really broke this one constraint that like holds up the whole network?
Or what if you enable this other constraint? That actually drives the roadmap in a lot of ways because this is not like an all or nothing problem. You start to kind of unlock more and more functionality over time, which functionality most enables this optimization ends up being kind of part of the discussion.
But you’re totally right. Like you fast forward to like five years, 10 years, 15 years, and you think about like very generalized capability of automation and logistics, as well as the ability to like poke into how those handoffs work.
The efficiency goes far beyond just direct cost of today’s like unit economics of a truck. They go towards reinventing the entire system in the same way that you see these other industries that like when you get to enough scale, you can really rethink how you build around your new set of capabilities, not the old set of capabilities.
Yeah, to use the analogy, metaphor, or whatever, autonomous trucking is like email versus mail. And then with email, you’re still doing the communication, but it opens up all kinds of communities, varieties of communication that you didn’t anticipate.
That’s right. Constraints are just completely different. And yeah, there’s definitely a property of that here.
And we’re also still learning about it because there is a lot of really fascinating and sometimes really elegant things that the industry has done where there’s companies whose entire existence is around, despite the constraints, optimizing as much as they can out of it.
And those lessons do carry over. But it’s an interesting kind of merger of worlds to think about like, well, what if this was completely different? How would we approach it?
And the interesting thing is that for a really, really, really long time, it’s actually going to be the merger between how to use autonomy and how to use humans that leans into each of their strengths.
Yeah. And then we’re back to Cosmo, human robot interaction.
So and the interesting thing about Waymo is because there’s the passenger vehicle, the human, the transportation of humans and transportation of goods, you could see over time, they may kind of meld together more because you’ll probably have like zero occupancy vehicles moving around.
So you have transportation of goods for short distances and then for slightly longer distances and then slightly longer and then there’ll be this, then you just see the difference between a passenger vehicle and a truck is just size and you can have different sizes and all that kind of stuff.
And at the core, you can have a Waymo Driver that doesn’t, as long as you have the same sensor suite, you can just think of it as one problem.
And that’s why over time, these do kind of converge where in a lot of ways, a lot of the challenges we’re solving are freeway driving, which are going to carry over very well to the vehicles, to the car side.
But there are like then unique challenges like you have a very different dynamics in your vehicle where you have to see much further out in order to have the proper response time because you have an 80,000 pound fully loaded truck.
That’s a very, very different type of braking profile than a car.
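To make the link between braking profile and sensing range concrete, here is a rough sketch using the textbook stopping-distance formula: distance traveled during the reaction time plus v²/(2a) under constant deceleration. The deceleration and reaction-time figures are illustrative assumptions only, not Waymo numbers or measured values.

```python
MPH_TO_MPS = 0.44704  # exact conversion factor from miles per hour to meters per second


def stopping_distance_m(speed_mps, reaction_s, decel_mps2):
    """Reaction distance plus constant-deceleration braking distance, in meters."""
    reaction_distance = speed_mps * reaction_s
    braking_distance = speed_mps ** 2 / (2.0 * decel_mps2)
    return reaction_distance + braking_distance


highway_speed = 65 * MPH_TO_MPS

# Assumed, illustrative decelerations: a loaded truck brakes less hard than a car.
car_distance = stopping_distance_m(highway_speed, reaction_s=0.5, decel_mps2=7.0)
truck_distance = stopping_distance_m(highway_speed, reaction_s=0.5, decel_mps2=4.0)

print(f"car:   {car_distance:.0f} m")
print(f"truck: {truck_distance:.0f} m")
```

Under these assumptions the truck needs noticeably more road to stop from the same speed, so its sensors have to resolve hazards further out.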
You have really interesting kind of dynamic limits because of the trailer. It’s very, very hard to physically flip a car or do something like that; most risk in a car is from just collisions.
It’s very hard in any normal operation, unless you hit something, to actually kind of roll over or something. On a truck, you actually have to drive much closer to the physical bounds of the safety limits.
But you actually have like real constraints because you could have really interesting interactions between the cabin and the trailer.
There’s something called jackknifing if you turn too quickly, you have roll risk and so forth.
And so we spent a huge amount of time understanding those boundaries and those boundaries change based on the load that you have, which is also an interesting difference.
You have to propagate that through the algorithm so that you’re leveraging your dynamic range, but always staying within the safety bounds, but understanding what those safety bounds are.
And so we have this like really cool test facility where we like take it to the max and actually imagine a truck with these giant training wheels on the back of the trailer and you’re pushing it past the safety limits in order to like try to actually see where it rolls.
And so you define this high dimensional boundary, which then gets captured in software to stay safe and actually do the right thing.
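As a much simplified picture of a load-dependent safety bound, the sketch below uses the rigid-body static rollover threshold, where the lateral acceleration limit is roughly g·t/(2h) for track width t and center-of-gravity height h. This is a one-dimensional textbook approximation, not the high dimensional boundary described above, and every vehicle number and the safety margin are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def combined_cg_height(vehicle_mass, vehicle_cg_h, load_mass, load_cg_h):
    """Mass-weighted center-of-gravity height of vehicle plus load, in meters."""
    total = vehicle_mass + load_mass
    return (vehicle_mass * vehicle_cg_h + load_mass * load_cg_h) / total


def rollover_threshold(track_width_m, cg_height_m):
    """Rigid-body static rollover threshold (lateral acceleration, m/s^2)."""
    return G * track_width_m / (2.0 * cg_height_m)


def max_curve_speed(radius_m, track_width_m, cg_height_m, margin=0.5):
    """Highest speed on a curve that uses only `margin` of the rollover threshold."""
    allowed_lateral_accel = margin * rollover_threshold(track_width_m, cg_height_m)
    return math.sqrt(allowed_lateral_accel * radius_m)


# Hypothetical trailer: same curve, light load versus heavy load stacked high.
light_cg = combined_cg_height(14000, 1.2, 5000, 1.8)
heavy_cg = combined_cg_height(14000, 1.2, 22000, 2.2)

for label, cg in (("light", light_cg), ("heavy", heavy_cg)):
    speed = max_curve_speed(radius_m=150.0, track_width_m=1.9, cg_height_m=cg)
    print(f"{label} load: CG {cg:.2f} m, curve speed limit {speed:.1f} m/s")
```

The point of the sketch is only that the bound moves with the load: a heavier, higher load raises the center of gravity and lowers the speed the planner can allow on the same curve.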
But it’s kind of fascinating the sort of kind of challenges you have there.
But then all of these things drive really interesting challenges from perception to unique behavior prediction challenges.
And obviously in the planner, where you have to think about merging and creating gaps with a 53 foot trailer and so forth.
And then obviously the platform itself is very different. We have different numbers of sensors, sometimes types of sensors, and you also have unique blind spots that you have because of the trailer, which you have to think about.
And so it’s a really interesting spectrum. And in the end, you try to capture these special cases in a way that is cleanly augmentations of the existing tech stack because a majority of what we’re solving is actually generalizable to freeway driving and different platforms.
And over time, they all start to kind of merge ideally where the things that are unique are as minimal as possible.
And that’s where you get the most leverage. And that’s why Waymo can take on two trillion-dollar opportunities and be nowhere near 2x the cost or investment or size.
In fact, it’s much, much smaller than that because of the high degree of leverage.
So what kind of sensor suite, if you can speak to that, does a long haul truck need to have? Lidar, vision, how many? What are we talking about here?
Yeah, so it’s more than the car. So very loosely you can think of it as like 2x, but it varies depending on the sensor.
And so we have like dozens of cameras, radar, and then multiple lidars as well.
You’ll see one difference where the cars have a central main sensor pod on the roof in the middle and then some kind of hood sensors for blind spots.
The truck moves to two main sensor pods on the outsides where you would typically have the mirrors next to the driver.
They effectively go as far out as possible, kind of up to the front, kind of on the cabin, not all the way in the front, but like kind of where the mirrors for the driver would be.
And so those are the main sensor pods. And the reason they’re there is because if you had one in the middle, the trailer is higher than the cabin and you would be occluded with this like awkward wedge.
Too much occlusion.
Too much occlusion. And so then you would add a lot of complexity to the software to make up for that and just unnecessary complexity.
There’s so many probably fascinating design choices here.
It’s really cool.
Because you can probably bring up a Lidar higher and have it in the center or something.
You could have all kinds of choices to make the decisions here that ultimately probably will define the industry.
Right. But by having two on the side, there’s actually multiple benefits.
So one is like you’re just beyond the trailer so you can see fully flush with the trailer.
And so you eliminate most of your blind spot except for right behind the trailer, which is great because now the software carries over really well.
And the same perception system you use on the car side, largely that architecture can carry over and you can retrain some models and so forth that you leverage it a lot.
It also actually helps with redundancy where there’s a really nice built in redundancy for all the Lidar, cameras and radar where you can afford to have any one of them fail and you’re still OK.
And at scale, every one of them will fail.
And you will be able to detect when one of them fails because, with the redundancy, they’re giving you data that’s inconsistent with the rest.
That’s right.
And it’s not just like they no longer give data. It could be like they’re fouled or they stop giving data where some electrical thing gets cut or part of your compute goes down.
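The consistency check described here can be pictured with a small sketch. This is only an illustration under stated assumptions: the function name, the median rule and the tolerance are hypothetical, not anything described in the conversation. Each reading stands for one sensor’s range estimate to the same object, and a sensor is flagged if it has gone silent or disagrees with the consensus of the others.

```python
from statistics import median
from typing import Optional, Sequence


def find_suspect_sensors(
    readings: Sequence[Optional[float]], tolerance_m: float = 0.5
) -> list[int]:
    """Return indices of sensors that are silent or disagree with the rest.

    Hypothetical setup: each reading is one sensor's range estimate (meters)
    to the same object; None means the sensor delivered no data.
    """
    valid = [r for r in readings if r is not None]
    if not valid:
        # Nothing to compare against, so every sensor is suspect.
        return list(range(len(readings)))
    consensus = median(valid)
    suspects = []
    for i, r in enumerate(readings):
        if r is None or abs(r - consensus) > tolerance_m:
            suspects.append(i)
    return suspects


# Three sensors agree, one is fouled (reads far off), one has gone silent.
example = [42.1, 41.9, 42.0, 57.3, None]
suspects = find_suspect_sensors(example)
print(suspects)
```

A median is used rather than a mean so that a single wildly wrong reading cannot drag the consensus toward itself.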
So what’s neat is that like you have way more sensors. Part of it is field of view and occlusions, part of it is redundancy and part of it is new use cases.
So there’s new types of sensors to optimize for long range and kind of the sensing horizon that we look for on our vehicles that is unique to trucks because it actually is like kind of much like further out than a car.
But a majority are actually used across both cars and trucks. And so we use the same compute, the same fundamental baseline sensors, cameras, radar, IMUs.
And so you get a great leverage from all of the infrastructure and the hardware development as a result.
So what about cameras? What role does. So LIDAR is this rich set of information that has its strengths, has some weaknesses.
Camera is this rich source of information that has some strengths, has its weaknesses.
What role does LIDAR play? What role does vision cameras play in this beautiful problem of autonomous trucking?
It is beautiful. There’s like so much that comes together.
And how much and at which point do they come together?
Yeah. So I’ll start with LIDAR. So LIDAR has been like Waymo’s, one of Waymo’s big strengths and advantages where we developed our own LIDAR in house. We’re many generations in, both in cost and functionality.
It is the best in this space.
Which generation? Because I know there’s this there’s this cool. I mean, I love versions that are increasing.
Which version of the hardware stack is it currently, officially, publicly?
So some parts iterate more than others. I’m trying to remember on the sensor side.
So the entire self driving system, which includes sensors and compute, is fifth generation.
I can’t wait until there’s like iPhone style like announcements for like new versions of the Waymo hardware.
Well, we try to be careful because, man, when you change the hardware, it takes a lot to like retrain the models and everything.
So we just went through that in going from the Pacificas to the Jaguars.
And so the Jaguars and then the trucks are, you know, have the same generation now.
But yeah, the LIDAR is it’s incredible. And so Waymo has leaned into that as a strength.
And so a lot of the near range perception system that obviously kind of carries over a lot from the car side uses LIDAR as a very prominent kind of like primary sensor.
But then obviously everything has its strengths and weaknesses.
And so in the near range, LIDAR is a gigantic advantage and it has its weaknesses when it comes to occlusions in certain areas, rain and weather, things like that.
But it’s an incredible sensor and it gives you incredible density, perfect location precision and consistency, which is a very valuable property to be able to kind of apply ML approaches.
Can you elaborate consistency?
Yeah. When you have a camera, the position of the sun, the time of the day, various of the properties can have a big impact, whether there’s glare, the field of view, things like that.
So consistent in the face of a changing external environment, the signal.
Yeah. Daytime, nighttime. It’s about 3D physical existence, in effect, like you’re seeing beams of light that physically bounce off of something and come back.
And so whatever the conditions are, like the shape of a human sensor reading from a human or from a car or from an animal, like you have a reliability there, which ends up being valuable for kind of like the long tail of challenges.
So LIDAR is the first sensor to drop off in terms of range and ours has a really good range, but at the end of the day, it drops off. And so particularly for trucks, on top of the general redundancy that you want for near range and complements through cameras and radar for occlusions and for complementary information and so forth,
when you get to the long range, you have to be radar and camera primary because your LIDAR data will fundamentally drop off after a period of time and you have to be able to see kind of objects further out.
Now, cameras have the incredible range where you get a high density, high resolution camera, you can get data, you know, well past a kilometer and it’s like really potentially a huge value.
Now, the signal drops off, the noise is higher, detecting is harder, classifying is harder and one that you may not think about localizing is harder because you can be off by like two meters and where something’s located a kilometer away.
And that’s the difference between being on the shoulder and being in your lane. And so you have like interesting challenges there that you have to solve, which have a bunch of approaches that come into it.
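To put a number on the localization point above: a two meter lateral error at a kilometer corresponds to a bearing error of roughly a tenth of a degree. The short calculation below is just that arithmetic; the lane width is an assumption (a typical US freeway lane), not a figure from the conversation.

```python
import math

range_m = 1000.0        # object roughly a kilometer away (from the text)
lateral_error_m = 2.0   # localization error mentioned in the text
lane_width_m = 3.7      # assumption: typical US freeway lane width

# Angle subtended by the lateral error as seen from the camera.
bearing_error_rad = math.atan2(lateral_error_m, range_m)
bearing_error_deg = math.degrees(bearing_error_rad)

# How large the error is relative to one lane.
fraction_of_lane = lateral_error_m / lane_width_m

print(f"bearing error: {bearing_error_deg:.3f} degrees")
print(f"error as a fraction of lane width: {fraction_of_lane:.2f}")
```

An error of more than half a lane width is enough to confuse an object on the shoulder with one in the lane, which is the difficulty being described.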
Radar is interesting because it also has longer range than LIDAR and it gives you speed information.
So it becomes very, very useful for dynamic information of traffic flow, vehicle motions, animals, pedestrians, like just things that might be useful signals.
And it helps with weather conditions where radar actually penetrates weather conditions in a better way than other sensors.
And so it’s kind of interesting where we’ve kind of started to converge towards not thinking about a problem as a LIDAR problem or a camera problem or radar problem, but it’s a fusion problem where these are all like large scale ML problems where you put data into the system.
And in many cases, you just look for the signals that might be present in the union of all of these and leave it to the system as much as possible to start to really identify how to extract that. And then there’s places we have to intervene and actually include more.
But no single sensor is in a great position to really solve this problem without a huge extra challenge.
That’s fascinating. There’s a question that’s probably still an open question is at which point do you fuse them? Do you solve the perception problem for each sensor suite individually, the LIDAR suite and the camera suite?
Or do you do some kind of heterogeneous fusion or do you fuse at the very beginning? Is there a good answer or at least an inkling of intuitions you can come up with?
Yeah, so people refer to this as early fusion or late fusion. So late fusion might be that you have the camera pipeline, the LIDAR pipeline, and then you fuse them and when it gets to final semantics and classification and tracking, you fuse them together and figure out which one’s best.
There’s more and more evidence that early fusion is important, and that is because late fusion does not allow you to pick up on the complementary strengths and weaknesses of the sensors.
Weather is a great example where if you do late fusion, you have an incredibly hard problem for any single sensor in rain to solve that problem because you have reflections from the LIDAR, you have weird kind of noise from the camera, blah, blah, blah.
But the combination of all of them can help you filter and help you get to the real signal that then gets you as close as possible to the original stack.
And be much more fluid about the strengths and weaknesses where your camera is much more susceptible to fouling on the actual lens from rain or random stuff, whereas you might be a little bit more resilient in other sensors.
So there’s an element of logic that always happens late in the game, but that fusion early on, especially as you move towards ML and large scale data driven approaches, just maximizes your ability to pull out the best signal you can out of each modality before you start making constraining decisions that end up being hard to unwind late in the stack.
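The cost of making constraining decisions too early can be shown with a toy example. This is not real early fusion, which would feed raw sensor data into one learned model, and it is not Waymo’s method. It is a simplified stand-in, assuming two sensors that each report a weak detection probability and assuming they are conditionally independent, which real sensors in rain are not. The prior and threshold values are made up for illustration.

```python
def late_fusion_detect(sensor_probs, threshold=0.5):
    """Each sensor makes its own hard yes/no call; then take a majority vote."""
    votes = [p > threshold for p in sensor_probs]
    return sum(votes) > len(votes) / 2


def soft_fusion_prob(sensor_probs, prior=0.1):
    """Combine per-sensor posteriors before any hard decision is made.

    Naive Bayes combination: each sensor contributes a likelihood ratio
    relative to the shared prior. Independence is an assumption.
    """
    prior_odds = prior / (1.0 - prior)
    odds = prior_odds
    for p in sensor_probs:
        likelihood_ratio = (p / (1.0 - p)) / prior_odds
        odds *= likelihood_ratio
    return odds / (1.0 + odds)


# In rain, each sensor alone is unsure (0.4), but both point the same way.
rainy_probs = [0.4, 0.4]
late_decision = late_fusion_detect(rainy_probs)
fused_prob = soft_fusion_prob(rainy_probs)

print(f"late fusion detects object: {late_decision}")
print(f"fused probability before thresholding: {fused_prob:.2f}")
```

Each sensor on its own falls below the threshold, so voting on hard decisions discards the object, while combining the weak evidence first pushes the estimate well above the threshold.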
So how much of this is a machine learning problem? What role does ML, machine learning, play in this whole problem of autonomous driving, autonomous trucking?
It’s massive, and it’s increasing over time. If you go back to the grand challenge days and the early days of AV development, there was ML, but it was not in the mass scale data style of ML.
It was like learning models, but in a more structured kind of way. And it was a lot of heuristic and search based approaches and planning and so forth. You can make a lot of progress with these types of approaches kind of across the board, an almost deceptive amount of progress.
We can get pretty far, but then you start to really grind the further you get in some parts of the stack if you don’t have an ability to absorb a massive amount of experience in a way that scales very sublinearly in terms of human labor and human attention.
And so when you look at the stack, the perception side is probably the first to get really revolutionized by ML, and it goes back many years because ML for computer vision and these types of approaches kind of took off with a lot of the early kind of push in deep learning.
And so there’s always a debate on the spectrum between end to end ML, which is a little bit too far, to how you architect it to where you have modules, but enough ability to think about long tail problems and so forth.
But at the end of the day, you have big parts of the system that are very ML and data driven, and we’re increasingly moving in that direction all the way across the board, including behavior where even when it’s not like a gigantic ML problem that covers like a giant swath end to end,
more and more parts of the system have this property where you want to be able to put more data into it and it gets better.
And that has been one of the realizations as you drive tens of millions of miles and try to solve new expansions of domains without regressing your old ones, it becomes intractable for a human to approach that in the way that traditionally robotics has kind of approached some elements of the tech stack.
So are you trying to create a data pipeline specifically for the trucking problem? How much leveraging of the autonomous driving is there in terms of data collection? And how unique is the data required for the trucking problem?
So we reuse all the same infrastructure, so labeling workflows, ML workflows, everything, so that actually carries over quite well. We heavily reuse the data even, where almost every model that we have on a truck, we started with the latest car model.
So it’s almost like a good backbone model.
Yeah, it’s like you can think of like, despite the different domain and different numbers of sensors and position of sensors, there’s a lot of signals that carry over across driving. And so it’s almost like pre training and getting a big boost out of the gate where you can reduce the amount of data you need by a lot.
And it goes both ways, actually. And so we’re increasingly thinking about our data strategy on how we leverage both of these.
So you think about, you know, how other agents react to a truck. Yeah, it’s a little bit different, but the fundamentals are actually like, what will other vehicles in the road do? There’s a lot of carry over that’s possible.
And in fact, just to give you an example, we’re constantly kind of like adding more data from the trucking side.
But as of right now, when we think of our, like one of our models, behavior prediction for other agents on the road, like vehicles, 85% of that data comes from cars.
And a lot of that 85% comes from surface streets, because we just had so much of it, and it was really valuable. And so we’re adding in more and more, particularly in the areas where we need more data, but you get a huge boost out of the gate.
Just all different visual characteristics of roads, lane markings, pedestrians, all that that’s still relevant.
It’s all still relevant. And then just the fundamentals of how, you know, you detect the car. Does it really change that much, whether you’re detecting it from a car or a truck?
The fundamentals of how a person will walk around your vehicle, that’ll change a little bit.
But the basics, like there’s a lot of signal in there that as a starting point to a network can actually be very valuable.
Now, we do have some very unique challenges where there’s a sparsity of events on a freeway.
The frequency of events happening on a freeway, whether it’s interesting objects in the road or incidents or even like from a human benchmark, like how often does a human have an accident on a freeway is far more sparse than on a surface street.
And so that leads to really interesting data problems where you can’t just drive infinitely to encounter all the different permutations of things you might encounter.
And so there you get into interesting tools like structured testing and data collection, data augmentation and so forth.
And so there’s really interesting kind of technical challenges that push some of the research that enables these new suites of approaches.
What role does simulation play? Really good question. So Waymo simulates about a thousand miles for every mile it drives.
So you think of in both. So across the board, across the board. Yeah. So you think of, for example, well, if we’ve driven over 20 million miles, that’s over 20 billion miles in simulation.
Now, how do you use simulation? It’s multipurpose. So you use it for basic development.
So you want to make sure you have regression prevention and protection of everything you’re doing. Right. That’s an easy one.
When you encounter something interesting in the world, let’s say there was an issue with how the vehicle behaved versus an ideal human.
You can play that back in simulation and start augmenting your system and seeing how you would have reacted to that scenario with this improvement or this new area.
You can create scenarios that become part of your regression set after that point.
Then you start getting into like really, really kind of hill climbing where you say, hey, I need to improve this system.
I have these metrics that are really correlated with final performance. How do I know how well I’m doing?
Operation, the actual physical driving, is the least efficient form of testing and it’s expensive.
It’s time consuming. So grabbing a large scale batch of historical data and simulating it to get a signal of over these last or just random sample of one hundred thousand miles.
How has this metric changed versus where we are today? You can do that far more efficiently in simulation than just driving with that new system on board.
And then you go all the way to the validation phase where to actually see your human relative safety of like how well are you performing on the car side or the trucking side relative to a human.
A lot of that safety case is actually driven by taking all of the physical operational driving, which probably includes a lot of interventions where the driver took over just in case.
And then you simulate those forward and see if anything would have happened. And in most cases, the answer is no.
But you can simulate it forward and you can even start to do really interesting things where you add virtual agents to create harder environments.
You can fuzz the locations of physical agents. You can muck with the scene and stress test the scenario from a whole bunch of different dimensions.
And effectively, you’re trying to like more efficiently sample this like infinite dimensional space, but try to encounter the problems as fast as possible.
Because what most people don’t realize is the hardest problem in autonomous driving is actually the evaluation problem in many ways, not the actual autonomy problem.
And so if you could, in theory, evaluate perfectly and instantaneously, you can solve that problem in a really fast feedback loop quite well.
But the hardest part is being really smart about this suite of approaches on how can you get an accurate signal on how well you’re doing as quickly as possible in a way that correlates to physical driving.
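The idea of fuzzing the locations of agents to stress test a logged scene can be sketched as follows. Everything here is hypothetical: the `Agent` type, the jitter range and the gap check are illustrative stand-ins. A real pipeline would rerun the driving software on each variant rather than apply a geometric rule.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    x_m: float  # position along the lane, relative to our vehicle
    y_m: float  # lateral offset from our lane center


def fuzz_scenario(agents, max_shift_m, n_variants, seed=0):
    """Generate variants of a logged scene by jittering agent positions."""
    rng = random.Random(seed)  # seeded so a failing variant can be replayed
    variants = []
    for _ in range(n_variants):
        variants.append([
            Agent(a.x_m + rng.uniform(-max_shift_m, max_shift_m),
                  a.y_m + rng.uniform(-max_shift_m, max_shift_m))
            for a in agents
        ])
    return variants


def min_gap_ok(agents, min_gap_m=5.0):
    """Stand-in check: no agent directly ahead of us closer than min_gap_m."""
    return all(
        not (abs(a.y_m) < 1.0 and 0.0 <= a.x_m < min_gap_m) for a in agents
    )


# The logged scene passes the check; some fuzzed variants may not.
logged = [Agent(x_m=6.0, y_m=0.2), Agent(x_m=30.0, y_m=3.5)]
variants = fuzz_scenario(logged, max_shift_m=2.0, n_variants=200)
failures = [v for v in variants if not min_gap_ok(v)]
print(f"{len(failures)} of {len(variants)} variants violate the gap check")
```

Seeding the random generator matters in practice: a variant that exposes a problem is only useful if it can be reproduced and added to a regression set.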
Can you explain the evaluation problem? Which metric are you evaluating towards? Are we talking about safety? What are the performance metrics that we’re talking about?
So in the end, you care about end safety. That’s what’s deceptive where there’s a lot of companies that have a great demo.
The path from a really great demo to being able to go driverless can be deceptively long, even when that demo looks like it’s driverless quality.
And the difference is that the thing that keeps you from going driverless is not the stuff you encounter in a demo.
It’s the stuff that you encounter once at 100,000 miles or 500,000 miles.
And so that is at the root of what is most challenging about going driverless because any issue you encounter, you can go and fix it.
But how do you know you didn’t create five other issues that you haven’t encountered yet?
So those were painful learnings in Waymo’s history that Waymo went through and led to us then finally being able to go driverless in Phoenix and now are at the heart of how we develop.
Evaluation is simultaneously evaluating final kind of end safety of how ready are you to go driverless,
which may be as direct as what is your collision, human relative kind of collision rate for all these types of scenarios and
and severities to make sure that you’re better than a human bar by a good amount.
But that’s not actually the most useful for development.
For development, it’s much more kind of analog metrics that are part of the art of finding how,
what are the properties of driving that give you a way quicker signal that’s more sensitive than a collision that can correlate to the quality you care about and push the feedback loop to all of your development?
A lot of these are, for example, comparisons to human drivers, like manual drivers. How do you do relative to a human driver in various dimensions of various circumstances?
Can I ask you a tricky question? So if I brought you a truck, how would you test it?
Okay, Alan Turing came along and you said,
This one can’t tell if it’s a human driver or autonomous driver.
Yeah, exactly. But not the human because, you know, humans are flawed.
How do you actually know you’re ready, basically? How do you know it’s good enough?
And by the way, this is the reason why Waymo released the safety framework for the car side, because one, it sets the bar so nobody cuts below it and does something bad for the field that causes an accident.
And two, it’s to start the conversation on framing what does this need to look like? Same thing we’ll end up doing for the trucking side.
It ends up being a different portfolio of approaches. There’s easy things like, are you compliant with all these fundamental rules of the road?
Like you never drive above the speed limit. That’s actually pretty easy.
You can fundamentally prove that it’s either impossible to violate that rule or that you can itemize the scenarios where that comes up and you can do a test and show that you pass that test and therefore you can handle that scenario.
And so those are like traditional structured testing kind of system engineering approaches where you can just, like fault rates is another example where when something fails, how do you deal with it?
You’re not going to drive and randomly wait for it to fail. You’re going to force a failure and make sure that you can handle it on closed courses, in simulation, or on the road, and run through all the permutations of failures, which you can oftentimes itemize for some parts of the system, like hardware.
The hardest part is behavioral where you have just infinite situations that could in theory happen and you want to figure out the combinations of approaches that can work there.
You can probably pass the Turing test pretty quickly, even if you’re not like completely ready for driverless because the events that are really kind of like hard will not happen that often.
Just to give you a perspective, a human has a serious accident on a freeway, like a truck driver on a freeway, a serious event happens once every 1.3 million miles, and something that actually has like really serious injuries, 28 million miles.
And so those are really rare. And so you could have a driver that looks like it’s ready to go, but you have no signal on what happens there.
And so that’s where you start to get creative on combinations of sampling and statistical arguments, focused structured arguments where you can kind of simulate those scenarios and show that you can handle them and metrics that are correlated with what you care about,
but you can measure much more quickly and get to a right answer. And that’s what makes it pretty hard.
And in the end, you end up borrowing a lot of properties from aerospace and like space shuttles and so forth where you don’t get the chance to launch it a million times just to say you’re ready because it’s too expensive to fail.
And so you go through a huge amount of kind of structured approaches in order to validate it. And then by thoroughness, you can make a strong argument that you’re ready to go.
This is actually a harder problem in a lot of ways, though, because you can think of a space shuttle as getting to a fixed point and then you kind of like or an airplane and you like freeze the software and then you like prove it and you’re good to go.
Here you have to get to a driverless quality bar, but then continue to aggressively change the software even while you’re driverless.
And also the full range of environment that you there’s an external environment where the shuttle is you’re basically testing the like the systems, the internal stuff. Yeah. And you have a lot of control in the external stuff.
Yeah. And the hard part is how do you know you didn’t get worse in something that you just changed?
Yes. Sure. And so in a lot of ways, like the Turing test starts to fail pretty quickly because you start to feel driverless quality pretty early in that curve.
And if you think about it, right, like in most kind of, you know, really good A.V. demos, maybe you’ll sit there for 30 minutes.
Right. Yeah. So you’ve driven, you know, 15 miles or something like that to go driverless.
Like what’s the sort of rate of issues that you need to have? You won’t even encounter.
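The gap between a demo and a statistical argument can be made concrete with the figures quoted in this conversation. The sketch below assumes serious events arrive as a Poisson process at the human rate of one per 1.3 million miles. That modeling choice is an assumption for illustration, not something stated by either speaker.

```python
import math

miles_per_serious_event = 1.3e6  # human truck driver figure quoted earlier
demo_miles = 15.0                # length of a typical demo, from the text

# Chance that a driver at the human rate has a serious event during the demo.
p_event_in_demo = 1.0 - math.exp(-demo_miles / miles_per_serious_event)

# Event-free miles needed before you could say, with 95% confidence,
# that the true rate is no worse than the human rate.
confidence = 0.95
miles_needed = -math.log(1.0 - confidence) * miles_per_serious_event

print(f"probability of a serious event in the demo: {p_event_in_demo:.2e}")
print(f"event-free miles needed at 95% confidence: {miles_needed:,.0f}")
```

Under this model a flawless demo is close to uninformative about serious events, and even matching the human rate by driving alone takes millions of event-free miles, which is why faster, correlated metrics and simulation carry so much of the argument.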
So let’s try something different then. Let’s try a different version of the Turing test, which is like an IQ test.
So there’s these difficult questions of increasing difficulty. They’re designed.
You don’t know them ahead of time. Nobody knows the answer to them. Right.
And so is it possible to in the future orchestrate basically really difficult course almost of like. Yeah.
That maybe change every year. And that represent if you can pass these, they don’t necessarily represent the full spectrum.
That’s it. Yeah. They won’t be conclusive, but you can at least get a really quick read and filter.
Yeah. Like you’re able to. Yeah. Because you didn’t know them ahead of time. Like, I don’t know.
Probably like construction zones, failures or or driving anywhere in Russia. Yeah. Yeah.
Snow, weather, cut ins, dense traffic, kind of merging, lane closures, animals, foreign objects on a road that pop out on short notice,
mechanical failures, sensor breaking, tire popped, weird behaviors by other vehicles like a hard brake, something reckless that they’ve done,
fouling of sensors like bugs or birds, you know, poop or something.
So but yeah, like you have these like kind of like extreme conditions where like you have a nasty construction zone where everything shuts down and you have to like, you know,
get pulled to the other side of the freeway with a temporary lane like that. Right.
Those are sort of conditions where we do that to ourselves. Right. We itemize everything that could possibly happen to give you a starting point to how to think about what you need to develop.
And at the end of the day, there’s no substitute for real miles.
Like if you think of traditional ML, like, you know how there’s like a validation set where you hold out some data and like real world driving is the ultimate validation set.
That’s the in the end, like the cleanest signal. But you can do a really good job on creating an obstacle course.
And you’re absolutely right. Like at the end, if there was such a thing as automating and kind of a readiness, it would be these extreme conditions like a red light runner.
Right. A really reckless pedestrian that’s jaywalking, a cyclist that, you know, makes like a really awkward maneuver.
That’s actually what keeps you from going driverless. Like in the end, that is the long tail.
Yeah. And it’s interesting to think about that. That to me is the Turing test. Turing test means a lot of things. But to me, in driving, the Turing test is exactly this validation set that is handcrafted.
I don’t know if you know him. There’s a guy named Francois Chollet. He thinks about like how to design a test for general intelligence.
He designs these IQ tests for machines. And the validation set for him is handcrafted. And that it requires like human genius or ingenuity to create a really good test.
And you hold, you truly hold it out. It’s an interesting perspective on the validation set, which is like, make that as hard as possible.
Not a generic representation of the data, but this is the hardest.
The hardest. Yeah. You know, it’s like Go. Like you’ll never fully itemize like all the world states that you’ll expand.
And so you have to come up with different approaches. And this is where you start hitting the struggles of ML, where ML is fantastic at optimizing the average case.
It’s a really unique craft to think about how you deal with the worst case, which is what we care about in the AV space when using an ML system on something that occurs like super infrequently.
So like you don’t care about the worst case really on ads because if you miss a few, it’s not a big deal.
But you do care about it on the driving side. And so typically like you’ll never fully enumerate the world.
And so you have to take a step back and abstract away what are the signals that you care about and the properties of a driver that correlate to defensive driving and avoiding nasty situations.
That even though you’ll always be surprised by things you’ll encounter, you feel good about your ability to generalize from what you’ve learned.
All right. Let me ask you a tricky question. So to me, the two companies that are building at scale some of the most incredible robots ever built are Waymo and Tesla.
So there’s very distinct approaches technically, philosophically in these two systems.
Let me ask you to play sort of devil’s advocate and then the devil’s advocate to the devil’s advocate.
It’s a bit of a race. Of course, everyone can win. But if Waymo wins this race to level four, why would they win?
What aspect of the approach do you think would be the winning aspect? And if Tesla wins, why would they win and which aspect of their approach would be the reason?
Just building some intuition, almost not from a business perspective, from any of that, just technically.
Yeah. And we could summarize, I think maybe you can correct me, one of the more distinct aspects is Waymo has a richer suite of sensors, has LIDAR and vision.
Tesla now removed radar. They do vision only. Tesla has a larger fleet of vehicles operated by humans.
So it’s already deployed on the field and it’s a larger, what do you call it, operational domain.
And then Waymo is more focused on a specific domain and growing it with fewer vehicles.
So both are fascinating approaches. I think there’s a lot of brilliant ideas. Nobody knows the answer.
So I’d love to get your comments on this lay of the land.
Yeah, for sure. So maybe I’ll start with Waymo.
And you’re right, both incredible companies and just a gigantic respect to everything Tesla has accomplished and how they pushed the field forward as well.
So on the Waymo side, there is a fundamental advantage in the fact that it is focused and geared towards L4 from the very beginning.
We’ve customized the sensor suite for it, the hardware, the compute, the infrastructure, the tech stack and all of the investment inside the company.
That’s deceptively important because there’s like a giant spectrum of problems you have to solve in order to really do this from infrastructure to hardware to autonomy stack to the safety framework.
And that’s an advantage because there’s a reason why it’s the fifth generation hardware and why all of those learnings went into the Daimler program.
It becomes such an advantage because you learn a lot as you drive and you optimize for the best information you have.
But fundamentally, like there’s a big, big jump, like every order of magnitude that you drive in numbers of miles and what you learn and the gap from really kind of like decent progress for L2 and so forth to what it takes to actually go L4.
And at the end of the day, there’s a feeling that Waymo has there’s a long way to go.
Nobody’s won, but there’s a lot of advantages in all of these buckets where it’s the only company that has shipped a fully driverless service where you can go and you can use it and it’s at a decently sizable scale.
And those learnings can feed forward to how to solve the more general problems.
And you see this process you’ve deployed in Chandler.
You don’t know the timeline exactly, but you could see the steps.
They seem almost incremental. It’s become more engineering than totally blind R&D.
It works in one place and then you move to another place and you grow it this way.
And just to give you an example, like we fundamentally changed our hardware and our software stack almost entirely from what went driverless in Phoenix to what is the current generation of the system on both sides because the things that got us to driverless,
even though it got to driverless way beyond human relative safety, it is fundamentally not well set up to scale in an exponential fashion without getting into huge kind of scaling pains.
And so those learnings you just can’t shortcut.
And so that’s an advantage.
And so there’s a lot of open challenges to kind of get through, technical, organizational, like how do you solve problems that are increasingly broad and complex like this, work on multiple products.
But there’s the feeling that, okay, like the ball’s in our court, there’s a head start there.
Now we’ve got to go and solve it.
And I think that focus on L4, it’s a fundamentally different problem.
If you think about it, like let’s say we were designing an L2 truck that was meant to be safer and help a human.
You could do that with far less sensors, far less complexity and provide value very quickly, arguably what we already have today just packaged up in a good product.
But you would take a huge risk in having a gap from even the like compute and sensors, not to mention the software, to then jump from that system to an L4 system.
So it’s a huge risk basically.
So again, allow me to be the person that plays the devil’s advocate and argue for the Tesla approach.
So what you just laid out makes perfect sense and is exactly right.
I have some open questions here, which is it’s possible that investing more in faster data collection, which is essentially what Tesla is doing, will get us there faster if the sensor suite doesn’t matter as much and machine learning can do a lot of the work.
My question is, how much is the thing you mentioned before, how much of driving can be end to end learned?
That’s the open question.
Obviously, the Waymo and the vision only machine learning approach will solve driving eventually, both.
The question is of timeline, what’s faster?
And what you mentioned, like if I were to make the opposite argument, like what puts Tesla in the strongest position, it’s data.
That is their superpower where they have an access to real world data effectively with a safety driver.
They found a way to get paid by safety drivers versus paying for safety drivers.
But all joking aside, one, it is incredible that they’ve built a business that’s incredibly successful that can now be a foundation and bootstrap really aggressive investment in the autonomy space.
If you can do it, that’s always like an incredible kind of advantage.
And in the data aspect of it, it is a giant amount of data if you can use it the right way to then solve the problem.
But the ability to collect and filter through to the things that matter at real world scale, like a large distribution, that is huge.
Like it’s a big advantage.
And so then the question becomes, can you use it in the right way?
And do you have the right software systems and hardware systems in order to solve the problem?
And you’re right that in the long term, there’s no reason to believe that pure camera systems can’t solve the problem that humans obviously are solving with vision systems.
But it’s a risk.
So there’s no argument that it’s not a risk.
And it’s already such a hard problem.
And so much of that problem, by the way, is even beyond the perception side. Some of the hardest elements of the problem are on the behavioral side and decision making and the long tail safety case.
If you are adding risk and complexity on the input side from perception, you’re now making a really, really hard problem, which on its own is still almost insurmountably hard, even harder.
And so the question is just how much.
And this is where you can easily get into a little bit of a kind of a trap, similar to how you evaluate how good an AV company’s product is.
Like you go and you do a trial kind of a test run with them, a demo run, which they’ve kind of optimized like crazy and so forth, and it feels good.
Do you put any weight in that? Right.
You know that that gap is kind of like, you know, pretty large still.
Same thing on the like perception case, like the long tail of computer vision is really, really hard.
And there’s a lot of ways that that can come up.
And even if it doesn’t happen that often at all, when you think about the safety bar and what it takes to actually go full driverless, not like incredible assistance driverless, but full driverless, that bar gets crazy high.
And not only do you have to solve it on the behavioral side, but now you have to push computer vision beyond arguably where it’s ever been pushed.
And so, you know, on top of the broader AV challenge, you have a really hard perception challenge as well.
So there’s perception, there’s planning, there’s human robot interaction. To me, what’s fascinating about what Tesla is doing is in this march towards level four, because it’s in the hands of so many humans, you get to see video, you get to see humans.
I mean, forget companies, forget businesses. It’s fascinating for humans to be interacting with robots.
That’s incredible. And they’re actually helping kind of push it forward.
And that is valuable, by the way, where even for us, a decent percentage of our data is human driving.
We intentionally have humans drive higher percentage than you might expect because that creates some of the best signals to train the autonomy. And so that is on its own a value.
So together, we’re kind of learning about this problem in an applied sense, just like you had with Cosmo. When you’re chasing an actual product that people are going to use, robot based product that people are going to use, you have to contend with the reality of what it takes to build a robot that successfully perceives the world and operates in the world.
And what it takes to have a robot that interacts with other humans in the world. And that’s like, to me, one of the most interesting problems humans have ever undertaken because in trying to create an intelligent agent that operates in a human world,
you’re also understanding the nature of intelligence itself. Like how hard is driving is still not answered to me.
Yeah, I still don’t understand the subtle cues, like even little things like your interaction with a pedestrian where you look at each other and just go, OK, go.
Like that’s hard to do without a human driver. Right. And you’re missing that dimension. How do you communicate that?
So there’s like really, really interesting kind of like elements here. Now, here’s what’s beautiful.
Can you imagine that like when autonomous driving is solved, how much of the technology foundation of that space can go and have like tremendous, just transformative impacts on other problem areas and other spaces that have subsets of these same problems?
Like, it’s just incredible to think about that.
It’s both a pro and a con that autonomous driving is so safety critical. So once you solve it, it’s beautiful because there’s so many applications that are a lot less safety critical.
But it’s also the con of that is it’s so hard to solve. And the same journalists that you mentioned to get excited for a demo are the ones who write long articles about the failure of your company.
If there’s one accident that’s based on a robot, it’s just society is so tense and waiting for failure of robots.
You’re in such a high stake environment. Failure has such a high cost. And it slows down development. It slows down development.
Yeah, like the team like definitely noticed that like once you go driverless, like we’re driverless in Phoenix and you continue to iterate, your iteration pace slows down
because your fear of regression forces so much more rigor that obviously you have to find a compromise on like, okay, well, how often do we release driverless builds?
Because every time you release a driverless build, you have to go through this like validation process, which is very expensive and so forth.
So it is interesting. It is one of the hardest things. There’s no other industry where like you wouldn’t release products way, way quicker when you start to kind of provide even portions of the value that you provide.
Healthcare maybe is the other one.
But at the same time, right, like we’ve gotten there where you think of like surgery, right?
Like you have surgery, there’s always a risk, but like it’s really, really bounded.
You know that there’s an accident rate when you go out and drive your car today, right? And you know what the fatality rate in the US is per year.
We’re not banning driving because there was a car accident, but the bar for us is way higher and we hold ourselves very serious to it where you have to not only be better than a human,
but you probably have to like at scale be far better than a human by a big margin and you have to be able to like really, really thoughtfully explain all of the ways that we validate that becomes very comfortable for humans to understand
because a bunch of jargon that we use internally just doesn’t compute.
At the end of the day, we have to be able to explain to society how do we quantify the risk and acknowledge that there is some nonzero risk, but it’s far above a human relative safety.
See, here’s the thing, to push back a little bit and bring Cosmo back in the conversation, you said something quite brilliant at the beginning of this conversation that I think probably applies for autonomous driving, which is, you know, there’s this desire to make autonomous cars safer than human driven cars.
But if you create a product that’s really compelling and is able to explain both the leadership and the engineers and the product itself can communicate intent, then I think people may be able to be willing to put up with the thing that might be even riskier than humans
because they understand the value of taking risks.
You mentioned the speed limit.
Humans understand the value of going over the speed limit.
Humans understand the value of going fast through a yellow light.
When you’re in Manhattan streets, pushing through crossing pedestrians, they understand that.
I mean, this is a much more tense topic of discussion, so this is just me talking.
So with Cosmo’s case, there was something about the way this particular robot communicated, the energy it brought, the intent it was able to communicate to the humans that you understood that of course it needs to have a camera.
Of course it needs to have this information.
And in that same way, to me, of course a car needs to take risks.
Of course there’s going to be accidents.
If you want a car that never has an accident, have a car that just doesn’t go anywhere.
But that’s tricky because that’s not a robotics problem.
Many accidents are not even due to you, obviously.
So there’s a big difference though.
That’s not a personal decision.
You’re also impacting obviously kind of the rest of the road and we’re facilitating it.
And so there’s a higher kind of ethical moral bar, which obviously then translates into as a society and from a regulatory standpoint, kind of like what comes out of it where it’s hard for us to ever see this even being a debate in the sense that you have to be beyond reproach from a safety standpoint because if you’re wrong about this, you could set the entire field back a decade.
See, this is me speaking.
I think if we look into the future, there will be, I personally believe, this is me speaking, that there will be less and less focus on safety.
It’s still very, very high.
Meaning like after autonomy is very common and accepted.
Not so common as everywhere.
But there has to be a transition because I think for innovation, just like you were saying to explore ideas, you have to take risks.
And I think if autonomy in the near term is to become prevalent in society, I think people need to be more willing to understand the nature of risk, the value of risk.
It’s very difficult, you’re right, of course, with driving, but that’s the fascinating nature of it.
It’s a life and death situation that brings value to millions of people, so you have to figure out what do we value about this world?
How much do we value, how deeply do we want to avoid hurting other humans?
And there is a point where you can imagine a scenario where Waymo has a system that is, even when it’s beyond human relative safety and provably statistically will save lives,
there is a thoughtful navigation of that fact versus just kind of society readiness and perception and education of society and regulators and everything else,
where it’s multidimensional and it’s not a purely logical argument.
But ironically, the logic can actually help with the emotions. And just like any technology, there’s early adopters and then there’s kind of like a curve that happens after it.
And eventually celebrities, you get The Rock in a Waymo vehicle and then everybody just comes along.
And then everybody calms down because The Rock likes it.
If you post the…
And it’s an open question on how this plays out. Maybe we’re pleasantly surprised and people just realize that this is such an enabler of life and efficiency and cost and everything that there’s a pull.
At some point, I fully believe that this will go from a thoughtful kind of movement and tiptoeing and kind of like a push to society realizes how wonderful of an enabler this could become and it becomes more of a pull.
And hard to know exactly how that will play out. But at the end of the day, like both the goods transportation and the people transportation side of it has that property where it’s not easy.
There’s a lot of open questions and challenges to navigate. And there’s obviously the technical problems to solve as a kind of prerequisite.
But they have such an opportunity that is on a scale that very few industries in the last 20, 30 years have even had a chance to tackle that maybe we’re pleasantly surprised by how much that tipping point like in a really short amount of time actually turns into a societal pull to kind of embrace the benefits of this.
Yeah, I hope so.
It seems like in the recent few decades, there’s been tipping points for technologies where like overnight things change. It’s like from taxis to ride sharing services, all that shift.
I mean, there’s just shift after shift after shift that requires digitization and technology.
I hope we’re pleasantly surprised in this.
So there’s millions of long haul trucks now in the United States.
Do you see a future where there’s millions of Waymo trucks and maybe just broadly speaking Waymo vehicles, just like ants running around the United States, freeways and local roads?
Yeah, in other countries too.
You look back decades from now and it might be one of those things that just feels so natural and then it becomes almost like this kind of interesting kind of oddity that we had none of it like, you know, kind of decades earlier.
And it’ll take a long time to grow and scale.
Very different challenges appear at every stage.
But over time, like this is one of the most enabling technologies that we have in the world today.
It’ll feel like, you know, how was the world before the Internet?
How was the world before mobile phones?
Like it’s going to have that sort of a feeling to it on both sides.
It’s hard to predict the future, but do you sometimes think about weird ways it might change the world, like surprising ways?
So obviously there’s more direct ways where like it increases efficiency.
It will enable a lot of kind of logistics, optimizations kind of things.
It will change probably our roadways and all that kind of stuff.
But it could also change society in some kind of interesting ways.
Do you ever think about how it might change cities, how it might change our lives, all that kind of stuff?
You can imagine cities, where people live versus work, becoming more distributed because the pain of commuting becomes different, just easier.
And there’s a lot of options that open up.
The layout of cities themselves and how you think about car storage and parking obviously just enables a completely different type of experience in urban environments.
I think there was like a statistic that something like 30 percent of the traffic in cities during rush hour is caused by pursuit of parking or like some really high stats.
So those obviously kind of open up a lot of options.
Flexibility on goods will enable new industries and businesses that never existed before because now the efficiency becomes more palatable.
Goods delivery, timing, consistency and flexibility is going to change.
The way we distribute the logistics network will change.
The way we then can integrate with warehousing, with shipping ports, you can start to think about greater automation through the whole kind of stack and how that supply chain,
the ripples become much more agile versus like very grindy the way they are today where just the adaptation is like very tough and there’s a lot of constraints that we have.
I think it’ll be great for the environment.
It’ll be great for safety where like probably about 95 percent of accidents today statistically are due to just attention or things that are preventable with the strengths of automation.
Yeah, and it’ll be one of those things where industries will shift, but the net creation is going to be massively positive.
And then we just have to be thoughtful about the negative implications that will happen in local places and adjust for those.
But I’m an optimist in general for the technology where you could argue a negative on any new technology,
but you start to kind of see that if there is a big demand for something like this, in almost all cases,
that like it’s an enabling factor that’s going to kind of propagate through society.
And particularly as life expectancies get longer and so forth, like there’s just a lot more need for a greater percentage of the population to kind of just be serviced with a high level of efficiency
because otherwise we’re going to have a really hard time kind of scaling to what’s ahead in the next 50 years.
Yeah, and you’re absolutely right.
Every technology has negative consequences and positive consequences, and we tend to just focus on the negative a little bit too much.
In fact, autonomous trucks are often brought up as an example of artificial intelligence and robots in general taking our jobs.
And as we’ve talked about briefly here, we talked a lot with Steve.
It is a concern that automation will take away certain jobs, it will create other jobs.
There’s temporary pain, hopefully temporary, but pain is pain and people suffer and that human suffering is really important to think about.
But trucking is, I mean, there’s a lot written on this, I would say, far from the thing that will cause the most pain.
Yeah, there’s even more positive properties about trucking where not only is there just a huge shortage which is going to increase,
the average age of truck drivers is getting closer to 50 because the younger people aren’t wanting to come into it.
They’re trying to like incentivize, lower the age limit, like all these sort of things.
And the demand is just going to increase.
And the least favorable, I mean, it depends on the person, but in most cases, the least favorable types of routes are the massive long haul routes
where you’re on the road away from your family 300 plus days a year.
Steve talked about the pain of those kinds of routes from a family perspective.
You’re basically away from family.
It’s not just hours, you work insane hours, but it’s also just time away from family.
Obesity rate is through the roof because you’re just sitting all day.
It’s really, really tough.
And that’s also where like the biggest kind of safety risk is because of fatigue.
And so when you think of the gradual evolution of how trucking comes in, first of all, it’s not overnight.
It’s going to take decades to kind of phase in all the like, there’s just a long, long, long road ahead.
But the routes and the portions of trucking that are going to require humans the longest and benefit the most from humans are the short haul
and most complicated kind of more urban routes, which are also the more pleasant ones, which are less continual driving time,
more flexibility on like geography and location, and you get to kind of sleep at your own home.
And very importantly, if you optimize the logistics, you’re going to use humans much better and thereby pay them much better.
Because like one of the biggest problems is truck drivers currently are paid by like how much they drive.
So they really feel the pain of inefficient logistics because like if they’re just sitting around for hours,
which they often do not driving, waiting, they’re not getting paid for that time.
So like logistics has a significant impact on the quality of life of a truck driver.
And a high percentage of trucks are like empty because of inefficiencies in the system.
Yeah, it’s one of those things where like, and the other thing is when you increase the efficiency of a system like this,
the overall net like volume of the system tends to increase, right?
Like the entire market cap of trucking is going to go up when the efficiency improves
and facilitates both growth in industries and better utilization of trucking.
And so that on its own just creates more and more demand, which of all the places where AI comes in
and starts to really kind of reshape an industry, this is one of those where like there’s just a lot of positives
that for at least any time in the foreseeable future seem really lined up in a good way to kind of come in
and help with the shortage and start to kind of optimize for the routes that are most dangerous and most painful.
Yeah, so this is true for trucking, but if we zoom out broader, automation and AI, just technology broadly, I would say.
But you know, automation is a thing that has a potential in the next couple of decades to shift the kind of jobs available to humans.
And so that results in, like I said, human suffering because people lose their jobs, there’s economic pain there,
and there’s also a pain of meaning.
So for a lot of people, work is a source of meaning, it’s a source of identity, of pride, of pride in getting good at the job,
pride in craftsmanship and excellence, which is what truck drivers talk about.
But this is true for a lot of jobs.
And is that something you think about as a sort of a roboticist zooming out from the trucking thing?
Like where do you think it would be harder to find activity and work that’s a source of identity, a source of meaning in the future?
I do think about it because you want to make sure that you worry about the entire system,
like not just like the part of the economy plays in it, but what are the ripple effects of it down the road.
And on enough of a time window, there’s a lot of opportunity to put in the right policies,
the right opportunities to kind of reshape and retrain and find those openings.
And so just to give you a few examples, both trucking and cars, we have remote assistance facilities
that are there to interface with customers and monitor vehicles and provide like very focused kind of assistance
on kind of areas where the vehicle may want to request help in understanding an environment.
So those are jobs that kind of get created and supported.
I remember like taking a tour of one of the Amazon facilities where you’ve probably seen the Kiva systems robots
where you have these orange robots that have automated the warehouse, like kind of picking and collecting of items.
And it’s like really elegant and beautiful way.
It’s actually one of my favorite applications of robotics of all time.
You know, like I think I kind of came across the company in like 2006. It was just amazing.
And what was the warehouse robots that transport little things?
So basically, instead of a person going and walking around and picking the seven items in your order,
these robots go and pick up a shelf and move it over in a row where like the seven shelves that contain the seven items
are lined up and a laser or whatever points to what you need to get.
And you go and pick it and you place it to fill the order.
And so the people are fulfilling the final orders.
What was interesting about that is that when I was asking them about like kind of the impact on labor
when they transitioned that warehouse, the throughput increased so much
that the jobs shifted towards the final fulfillment, even though the robots took over entirely the search of the items themselves.
And the jobs stayed. That was actually roughly the same amount of jobs that were necessary.
But the throughput increased by I think over 2x or some amount.
Right. So you have these situations that are not zero sum games in this really interesting way.
And the optimist in me thinks that there’s these types of solutions in almost any industry
where the growth that’s enabled creates opportunities that you can then leverage.
But you got to be intentional about finding those and really helping make those links because
even if you make the argument that like there’s a net positive,
locally there’s always tough hits that you got to be very careful about.
That’s right. You have to have an understanding of that link because there’s a short period of time
whether training is required or just mental transition or physical or whatever is required,
that’s still going to be short term pain. The uncertainty of it, there’s families involved.
It’s exceptionally difficult on a human level and you have to really think about that.
You can’t just look at economic metrics always, it’s human beings.
That’s right. And you can’t even just take it as like, okay, well, we need to like subsidize or whatever
because like there is an element of just personal pride where majority of people,
like people don’t want to just be okay, but like they want to actually like have a craft like you said
and have a mission and feel like they’re having a really positive impact.
And so my personal belief is that there’s a lot of transferability and skill set that is possible,
especially if you create a bridge and an investment to enable it.
And to some degree, that’s our responsibility as well in this process.
You mentioned Kiva robots, Amazon. Let me ask you about the Astro robot, which, I don’t know if you’ve seen it,
Amazon has announced. It’s a home robot that has a screen, looks an awful lot like Cosmo,
has I think different vision probably. What are your thoughts about like home robotics in this kind of space?
There’s been quite a bunch of home robots, social robots that very unfortunately have closed their doors
for various reasons. Perhaps they were too expensive, there’s manufacturing challenges, all that kind of stuff.
What are your thoughts about Amazon getting into this space?
Yeah, we had some signs that they’re getting into it like long, long, long ago.
Maybe they were a little bit too interested in Cosmo during our conversations,
but they’re also very good partners actually for us as we kind of just integrated a lot of shared technology.
If I could also get your thoughts on, you could think of Alexa as a robot as well, Echo.
Do you see those as fundamentally different just because you can move and look around?
Is that fundamentally different than the thing that just sits in place?
It opens up options, but my first reaction is I have my doubts that this one’s going to hit the mark
because I think for the price point that it’s at and the kind of functionality and value propositions that they’re trying to put out,
it’s still searching for the killer application that justifies I think it was like a $1,500 price point or kind of somewhere on there.
That’s a really high bar, so there’s enthusiasts and early adopters will obviously kind of pursue it,
but you have to really, really hit a high mark at that price point, which we always tried to –
we were always very cautious about jumping too quickly to the more advanced systems that we really wanted to make,
but would have raised the bar so much you have to be able to hit it in today’s cost structures and technologies.
The mobility is an angle that hasn’t been utilized, but it has to be utilized in the right way,
so that’s going to be the biggest challenge is can you meet the bar of what the mass market consumer –
think our neighbors, our friends, parents, would they find a deep, deep value in this at a mass scale that justifies the price point?
I think that’s in the end one of the biggest challenges for robotics, especially consumer robotics where you have to kind of meet that bar.
It becomes very, very hard.
And there’s also the higher bar, just like you were saying with Cosmo, of a thing that can look one way and then turn around and look at you.
That’s either a super desirable quality or a super undesirable quality depending on how much you trust the thing.
And so there’s a problem of trust to solve there.
There’s a problem of personality.
It’s the quote unquote problem that Cosmo solved so well is that you trust the thing,
and that has to do with the company, with the leadership, with the intent that’s communicated by the device and the company and everything together.
Yeah, exactly right.
And I think they also have to retrace some of the learnings on the character side where, as usual,
I think that’s the place where a lot of companies are great at the hardware side of it and think about those elements.
Thinking about the AI challenges, particularly with the advantage of Alexa, is a pretty huge boost for them.
The character side of it for technology companies is pretty novel territory, and so that will take some iterations.
But yeah, I mean, I hope there’s continued progress in the space and that thread doesn’t kind of go dormant for too long,
and it’s going to take a while to kind of evolve into the ideal applications.
But this is one of Amazon’s – I guess you could call it – it’s definitely part of their DNA,
but in many cases is also a strength where they’re very willing to iterate kind of aggressively and move quickly.
And take risks.
You have deep pockets so you can kind of –
Yeah, and they’ll maybe have more misfires than an Apple would, but it’s different styles and different approaches.
And at the end of the day, it’s like there’s a few familiar kind of elements there for sure, which was kind of –
Is one way to put it.
Yeah, so why is it so hard at a high level to build a robotics company?
A robotics company that lives for a long time.
So if you look at – I thought Cosmo for sure would live for a very long time.
That to me was exceptionally successful vision and idea and implementation.
iRobot is an example of a company that has pivoted in all the right ways to survive and arguably thrive
by focusing on having like a – have a driver that constantly provides profit, which is the vacuum cleaner.
And of course there’s like Amazon, what they’re doing is they’re almost like taking risks so they can afford it
because they have other sources of revenue.
But outside of those examples, most robotics companies fail.
Why do they fail?
Why is it so hard to run a robotics company?
iRobot’s impressive because they found a really, really great fit of where the technology could satisfy
a really clear use case and need, and they did it well, and they didn’t try to overshoot from a cost to benefit standpoint.
Robotics is hard because it like tends to be more expensive.
It combines way more technologies than a lot of other types of companies do.
If I were to like say one thing that is maybe the biggest risk in like a robotics company failing
is that it can be either a technology in search of an application or they try to bite off a kind of an offering
that has a mismatch in kind of price to function.
And just the mass market appeal isn’t there.
And consumer products are just hard.
It’s just, I mean, after all the years, I like definitely kind of feel a lot of the battle scars
because you have, not only do you have to like hit the function, but you have to educate and explain,
get awareness up, deal with different types of consumers.
There’s a reason why a lot of technologies sometimes start in the enterprise space and then kind of continue
forward in the consumer space, even like you see AR like starting to kind of make that shift with HoloLens
and so forth in some ways.
Consumers and price points that they’re willing to kind of be attracted in a mass market way.
And I don’t mean like 10,000 enthusiasts bought it, but I mean like 2 million, 10 million, 50 million
like mass market kind of interest have bought it.
That bar is very, very high and typically robotics is novel enough and nonstandardized enough to where it pushes
on price points so much that you can easily get out of range where the capabilities and today’s technology
or just the function that was picked just doesn’t line up.
And so that product market fit is very important.
So the space of killer apps or rather super compelling apps is much smaller because it’s easy to get outside
of the price range for most consumers.
And it’s not constant, right? And that’s why we picked off entertainment because the quality was just so low
in physical entertainment that we felt we could leapfrog that and still create a really compelling offering
at a price point that was defensible and that proved out to be true.
And over time, that same opportunity opens up in healthcare, in home applications and commercial applications
and kind of broader, more generalized interface, but there’s missing pieces in order for that to happen.
And all of those have to be present for it to line up.
And we see these sort of trends in technology where kind of technologies that start in one place evolve
and kind of grow to another.
Some things start in gaming.
Some things start in space or aerospace and then kind of move into the consumer market.
And sometimes it’s just a timing thing, right, where how many stabs at what became the iPhone were there
over the 20 years before that just weren’t quite ready in the function relative to the kind of price point complexity.
And sometimes it’s a small detail of the implementation that makes all the difference, which is why design is so important.
Something, yeah, like the new generation UX, right?
And it’s tough and oftentimes all of them have to be there and it has to be like a perfect storm.
But yeah, history repeats itself in a lot of ways in a lot of these trends, which is pretty fascinating.
Well, let me ask you about the humanoid form.
What do you think about the Tesla bot and humanoid robotics in general?
So obviously, to me, autonomous driving Waymo and the other companies working in the space,
that seems to be a great place to invest in a potentially revolutionary robotics application.
What’s the role of humanoid robotics?
Do you think Tesla bot is ridiculous?
Do you think it’s super promising?
Do you think it’s interesting, full of mystery, nobody knows?
What do you think about this thing?
Yeah, I think today humanoid form robotics is research.
There’s very few situations where you actually need a humanoid form to solve a problem.
If you think about it, right, like wheels are more efficient than legs.
Joints and degrees of freedom, beyond a certain point, just add a lot of complexity and cost.
Right. So if you’re doing a humanoid robot, oftentimes it’s in the pursuit of a humanoid robot,
not in the pursuit of an application for the time being.
Especially when you have like kind of the gaps in interface and, you know, kind of AI that we kind of talk about today.
So anything Elon does, I’m interested in following.
So there’s an element of that where, no matter how crazy it is,
I’ll just, you know, pay attention. I’m curious to see what comes out of it.
So it’s like you can’t ever, you know, ignore it.
But, you know, it’s definitely far afield from their kind of core business, obviously.
What’s interesting to me, and I’ve disagreed with Elon a lot about this, is that to me,
the compelling aspect of the humanoid form, and of a lot of robots, Cosmo,
for example, is the human-robot interaction part.
From Elon Musk’s perspective, Tesla bot has nothing to do with the human.
It’s a form that’s effective for the factory because the factory is designed for humans.
But to me, the reason you might want to argue for the humanoid form is because, you know,
at a party, it’s a nice way to fit into the party.
The humanoid form has a compelling notion to it in the same way that Cosmo is compelling.
I would argue, if we were arguing about this, that it’s cheaper to build a Cosmo like that form.
But if you wanted to make an argument, which I have had with Jim Keller, that, you know,
you could actually make a humanoid robot for pretty cheap, it’s possible.
And then the question is, all right, if you’re using an application where it can be flawed,
it can have a personality and be flawed in the same way that Cosmo is,
then maybe it’s interesting for integration to human society.
That, to me, is an interesting application of a humanoid form because humans are drawn,
like I mentioned to you, to legged robots; we’re drawn to legs and limbs and body language
and all that kind of stuff. And even a face, even if you don’t have the facial features,
which you might not want to have to reduce the creepiness factor, all that kind of stuff.
But yeah, that, to me, the humanoid form is compelling.
But in terms of that being the right form for the factory environment, I’m not so sure.
Yeah, for the factory environment, like right off the bat, what are you optimizing for?
Is it strength? Is it mobility? Is it versatility, right?
Like that changes completely the look and feel of the robot that you create, you know,
and almost certainly the human form is over designed for some dimensions and constrained for some dimensions.
And so, like, what are you grasping? Is it big? Is it little, right?
So you would customize it and make it customizable for the different needs if that was the optimization, right?
And then, you know, for the other one, I could totally be wrong.
You know, I still feel that the closer you try to get to a human, the more you’re subject to the biases of what a human should be
and you lose flexibility to shift away from your weaknesses and towards your strengths.
And that changes over time, but there’s ways to make really approachable and natural interfaces for robotic kind of characters
and, you know, kind of deployments in these applications that do not at all look like a human directly,
but that actually creates way more flexibility and capability and role and forgiveness and interface and everything else.
Yeah, it’s interesting, but I’m still confused by the magic I see in legged robots.
Yeah, so there is a magic. So I’m absolutely amazed at it from a technical curiosity standpoint
and like the magic that like the Boston Dynamics team can do from, you know, like from walking and jumping and so forth.
Now, like there’s been a long journey to try to find an application for that sort of technology.
But wow, that’s incredible technology, right?
So then you kind of go towards, OK, are you working back from a goal of what you’re trying to solve?
Or are you working forward from a technology and looking for an application?
And I think that’s where it’s kind of a bidirectional search oftentimes, but the two have to meet.
And that’s where humanoid robots is kind of close to that.
And that like it is a decision about a form factor and a technology that it forces
that doesn’t have a clear justification on why that’s the killer app for, you know, from the other end.
But I think the core fascinating idea with the Tesla bot is the one that’s carried by Waymo as well,
is when you’re solving the general robotics problem of perception control where there’s the very clear applications of driving.
As you get better and better at it, when you have something like the Waymo Driver,
yeah, the whole world kind of starts to look like a robotics problem.
So it’s very interesting for now.
Detection, classification, segmentation, tracking, planning, like it’s.
So there’s no reason. I mean, I’m not I’m not speaking for Waymo here, but, you know, moving goods.
There’s no reason this thing couldn’t, Transformer-like, you know, take the goods up an elevator,
like slowly expand what it means to move goods and expand more and more of the world into a robotics problem.
Well, that’s right. And you start to think of it as an end-to-end robotics problem, from loading to, you know, everything else.
And even like the truck itself, you know, today’s generation is integrating into today’s understanding of what a vehicle is, right?
The Pacifica, the Jaguar, the Freightliners from Daimler.
There’s nothing that stops us, down the road, after starting to get to scale, from expanding these partnerships to really rethink what the next generation of a truck would look like, one that is actually optimized for autonomy, not for today’s world.
And maybe that means a very different type of trailer.
Maybe that like there’s a lot of things you could rethink on that front, which is on its own very, very exciting.
Let me ask you, like I said, you went to the Mecca of robotics, which is CMU, Carnegie Mellon University.
You got a PhD there. So maybe by way of advice and maybe by way of story and memories, what does it take to get a PhD in robotics at CMU?
And maybe you can throw in there some advice for people who are thinking about doing work in artificial intelligence and robotics and are thinking about whether to get a PhD.
I actually went, I was at CMU for undergrad as well and didn’t know anything about robotics coming in and was doing electrical computer engineering, computer science, and really got more and more into kind of AI and then fell in love with autonomous driving.
And at that point, that was just by a big margin, such an incredible central spot of investment in that area.
And so what I would say is that robotics, for all the progress that’s happened, is still a really young field.
There’s a huge amount of opportunity. Now that opportunity shifted where something like autonomous driving has moved from being very research and academics driven to being commercial driven where you see the investments happening in commercial.
Now there’s other areas that are much younger and you see like kind of grasping and manipulation, making kind of the same sort of journey that like autonomy made and there’s other areas as well.
What I would say is the space moves very quickly. Anything you do a PhD in, like it is in most areas, will evolve and change as technology changes and constraints change and hardware changes and the world changes.
And so the beautiful thing about robotics is it’s super broad. It’s not a narrow space at all and it could be a million different things in a million different industries.
And so it’s a great opportunity to come in and get a broad foundation on AI, machine learning, computer vision, systems, hardware, sensors, all these separate things.
You do need to go deep and find something that you’re really, really passionate about. Obviously, just like any PhD, this is like a five, six year kind of endeavor.
And you have to love it enough to go super deep to learn all the things necessary to be super deeply functioning in that area and then contribute to it in a way that hasn’t been done before.
And in robotics, that probably means more breadth because robotics is rarely kind of like one particular kind of narrow technology.
And it means being able to collaborate with teams. One of the coolest aspects of the experience that I cherish from my PhD is that we actually had a pretty large AV project that, for that time, was a pretty serious initiative, where you got to partner with a larger team.
And you had the experts in perception and the experts in planning and the staff and the mechanical engineers.
So I was working on a project called UPI back then, which was basically the off road version of the DARPA challenge.
It was a DARPA funded project for basically like a large off road vehicle that you would like drop and then give it a waypoint 10 kilometers away and it would have to navigate a completely unstructured environment.
In an off road environment.
Yeah. So forests, ditches, rocks, vegetation. It was a really, really interesting, hard problem, where the wheels would be up to my shoulders. It’s gigantic, right?
Yeah. By the way, AV for people stands for autonomous vehicles.
Autonomous vehicles. Yeah. Sorry.
And so what I think is like the beauty of robotics, but also kind of like the expectation is that there’s spaces in computer science where you can be very, very narrow and deep.
Robotics, the necessity, but also the beauty of it is that it forces you to be excited about that breadth and that partnership across different disciplines that enable it.
But that also opens up so many more doors, where you can go and do robotics in almost any category, because robotics isn’t really an industry.
It’s like AI, right?
It’s like the application of physical automation to all these other worlds. And so you can do robotic surgery, you can do vehicles, you can do factory automation, you can do health care, you can leverage the AI around the sensing to think about static sensors and scene understanding.
So I think that’s got to be the expectation and the excitement and it breeds people that are probably a little bit more collaborative and more excited about working in teams.
If I could briefly comment on the fact that the robotics people I’ve met in my life from CMU and MIT, they’re really happy people.
Yeah. Because I think it’s the collaborative thing.
I think you don’t...
You’re not like sitting in like the fourth basement.
Yes, exactly. Which when you’re doing machine learning purely software, it’s very tempting to just disappear into your own hole and never collaborate.
And that breeds a little bit more of the silo mentality of like, I have a problem.
It’s almost like negative to talk to somebody else or something like that.
But robotics folks are just very collaborative, very friendly. And there’s also an energy of like you get to confront the physics of reality often, which is humbling and also exciting.
So it’s humbling when it fails and exciting when it finally works.
It’s like a purity of the passion.
And you’ve got to remember that right now robotics and AI are just all the rage, autonomous vehicles and all this, but 15 or 20 years ago it wasn’t that deeply lucrative.
People that went into robotics did it because they thought it was just the coolest thing in the world to make physical things intelligent in the real world.
And so there’s like a raw passion where they went into it for the right reasons and so forth.
And so it’s really great space. And that organizational challenge, by the way, like when you think about the challenges in AV, we talk a lot about the technical challenges.
The organizational challenges are through the roof when you think about what it takes to build an AV system, and you have companies that are now thousands of people.
And you look at other really hard technical problems like an operating system.
It’s pretty well established.
Like you kind of know that there’s a file system, there’s virtual memory, there’s this, there’s that, there’s like caching and like and there’s like a really reasonably well established modularity and APIs and so forth.
And so you can kind of like scale it in an efficient fashion.
That doesn’t exist anywhere near to that level of maturity in autonomous driving right now.
And tech stacks are being reinvented, organizational structures are being reinvented.
You have problems like pedestrians that are not isolated problems. They’re part sensing, part behavior prediction, part planning, part evaluation.
And like one of the biggest challenges is actually how do you solve these problems where the mental capacity of a human is starting to get strained on how do you organize it and think about it where you have this like multidimensional matrix that needs to all work together.
And so that makes it kind of cool as well because it’s not like solved at all from like what does it take to actually scale this, right?
And then you look at like other gigantic challenges that have been successful and are way more mature, there’s a stability to it.
And like maybe the autonomous vehicle space will get there.
But right now, there are just as many organizational challenges as there are technical challenges: how do you solve these problems that touch on so many different areas and efficiently tackle them while maintaining progress among all these constraints, while scaling?
By way of advice, what advice would you give to somebody thinking about doing a robotics startup? You mentioned Cosmo. Somebody that wanted to carry the Cosmo flag forward, the Anki flag forward.
Looking back at your experience, looking forward to the future that will obviously have such robots. What advice would you give to that person?
Yeah, it was the greatest experience ever. And there are things you learn navigating a startup that you’ll never...
It’s very hard to encounter that in a typical kind of work environment. And it’s wonderful. You’ve got to be ready for it.
It’s not, you know, the glamour of a startup. There are just brutal emotional swings up and down.
And so having cofounders actually helps a ton. I cannot imagine doing it solo, but having at least somebody where, on your darkest days, you can really openly just have that conversation and, you know, lean on somebody that’s in the thick of it with you helps a lot.
What I would say...
What was the nature of the darkest days and the emotional swings? Is it worry about the funding? Is it worry about whether any of your ideas are any good or ever were good? Is it the self doubt?
Is it like facing new challenges that have nothing to do with the technology, like organizational, human resources, that kind of stuff?
Yeah, you come from a world in school where you feel that if you put in a lot of effort, you’ll get the right result. Input translates proportionally to output.
And, you know, you need to solve the problem set or do whatever and just kind of get it done. Now, a PhD tests that a little bit.
But at the end of the day, if you put in the effort, you tend to come out with enough results that you get a PhD.
In the startup space, you know, you could talk to 50 investors and they just don’t see your vision. And it doesn’t matter how hard you tried and pitched. You could work incredibly hard and you have a manufacturing defect.
And if you don’t fix it, you’re out of business. You need to raise money by a certain date.
And you’ve got to have this milestone in order to have a good pitch, and you do it.
You have to have this talent and you just don’t have it inside the company, or, you know, you have to get 200 people, or however many people, to kind of come along with you and buy into the journey.
You’re disagreeing with an investor and they’re your investors. So it’s just like, you know, there’s no walking away from it.
Right. So and it tends to be like those things where you just kind of get clobbered in so many different ways that like things end up being harder than you expect.
And it’s like such a gauntlet, but you learn so much in the process.
And there’s a lot of people that actually end up rooting for you and helping you like from the outside.
And you get great mentors and you find fantastic people who step up in the company.
And you have this like magical period where everybody’s like it’s life or death for the company.
But like you’re all fighting for the same thing. And it’s the most satisfying kind of journey ever.
The things that make it easier, and that I would recommend: be really, really thoughtful about the application.
There’s a saying about, you know, team and execution and market, and how important each of those is.
And oftentimes the market wins. You start out thinking that if you’re smart enough and you work hard enough and you have the right talented team and so forth, you’ll always kind of find a way through.
And it’s surprising how much dynamics are driven by the industry you’re in and the timing of you entering that industry.
And so Waymo is a great example of it. I don’t know if there’ll ever be another company or suite of companies that has raised and continues to spend so much money at such an early phase of revenue generation and productization.
You know, from a P&L standpoint, it’s an anomaly by any measure of any industry that’s ever existed, except for maybe the US space program.
But it’s like multiple trillion dollar opportunities, which is so unusual to find that size of a market that just the progress that shows the de risking of it.
You could apply whatever discounts you want off that trillion dollar market and it still justifies the investment that is happening because like being successful in that space makes all the investment feel trivial.
Now, by the same consequence, the size of the market, the size of the target audience, the ability to capture that market share, how hard that’s going to be, who the incumbents are.
That’s probably one of the lessons I appreciate like more than anything else, where like those things really, really do matter.
And oftentimes can dominate the quality of the team or execution, because if you miss the timing or you do it in the wrong space, you run into like the institutional kind of headwinds of a particular environment.
Like let’s say you have the greatest idea in the world, but you burrow into health care, but it takes 10 years to innovate in health care because of a lot of challenges.
Right. Like there’s fundamental laws of physics that you have to think about.
And so the combination of like Anki and Waymo kind of drives that point home for me where you can do a ton if you have the right market, the right opportunity, the right way to explain it and you show the progress in the right sequence.
It actually can really significantly change the course of your journey in a startup.
How much of it is understanding the market and how much of it is creating a new market?
So how do you think about it? The space of robotics is really interesting. You said it exactly right. The space of applications is small.
You know, relative to the cost involved. So how much is like truly revolutionary thinking about like what is the application?
And then, yeah, so creating something that didn’t exist, didn’t really exist.
Like this is pretty obvious to me, the whole space of home robotics, just everything that Cosmo did.
I guess you could talk about it as a toy and people will understand it, but it was much more than a toy.
And I don’t think people fully understand the value of that. You have to create it and the product will communicate it.
Just like the iPhone, nobody understood the value of no keyboard and a thing that can do web browsing.
I don’t think they understood the value of that until you create it.
Yeah. Having a foot in the door and an entry point still helps because at the end of the day, like an iPhone replaced your phone.
And so it had a fundamental purpose and all these things that it did better. Right.
And so then you could do ABC on top of it.
And then you even remember the early commercials where it’s always like one application of what it could do and then you get a phone call.
And so that was intentionally sending a message, something familiar.
But then you can send a text message, you can listen to music, you can surf the web.
And so autonomous driving obviously anchors on that as well.
You don’t have to explain to somebody the functionality of an autonomous truck.
Like there’s nuances around it, but the functionality makes sense.
In the home, you have a fundamental advantage. We always thought about this because it was so painful to explain to people what our products did and how to communicate that super cleanly, especially when something was so experiential.
And so you compare Anki to Nest.
Nest had some beautiful products where they started scaling and actually found really great success and they had really clean and beautiful marketing messaging because they anchored on reinventing existing categories where it was a smart thermostat.
And so you kind of are able to take what’s familiar, anchor that understanding and then explain what’s better about it.
That’s funny. You’re right. Cosmo is a totally new thing.
What is this thing?
We struggled. We spent a lot of money on marketing.
We actually had far greater efficiency on Cosmo than anything else because we found a way to capture the emotion in some little shorts to kind of lean into the personality in our marketing.
And it became viral where we had these kind of videos that would go and get hundreds of thousands of views and get spread and sometimes millions of views.
But it was really, really hard.
And so finding a way to kind of anchor on something that’s familiar but then grow into something that’s not is an advantage.
But then again, there’s successes otherwise.
Alexa never had a comp.
You could argue that that’s very novel and very new.
And there’s a lot of other examples that kind of created a category, like Kiva Systems. I mean, they came in and, well, enterprise is a little easier. It’s less susceptible to this, because if you can argue a clear value proposition, it’s a more logical conversation that you can have with customers.
It’s a little bit less emotional and kind of subjective.
In the home you have to... Yeah, it’s like, a home robot, what does that mean? Yeah. And so then you really have to be crisp about the value proposition and what really makes it worth it.
And we, by the way, went through that same thing, where we almost hit a wall coming out of 2013, when we were so big on explaining why our stuff was so high tech and all the great technology in it and how cool it is and so forth,
and then had to make a super hard pivot to why is it fun and why does the random family of four need this, right?
So it’s learnings, but that’s the challenge.
And I think like robotics tends to sometimes fall into the new category problem, but then you’ve got to be really crisp about why it needs to exist.
Well, I think some of robotics, depending on the category, depending on the application, is a little bit of a marketing challenge.
And I don’t mean the kind of marketing that Waymo is doing, that Tesla is doing, which is showing off incredible engineering, incredible technology.
I mean convincing, like you said, a family of four that this is transformative for your life.
This is fun. They don’t care how much tech is in your thing.
They really don’t care. They need to know why they want it.
And some of that is just marketing. Yeah.
And that’s why, like, Roomba, yes, they didn’t, you know, go and have this huge ramp into the entirety of robotics and so forth. But they built a really great business in the vacuum cleaner world.
And everybody understands what a vacuum cleaner is. Most people are annoyed by doing it.
And now you have one that like kind of does it itself.
Yeah, with various degrees of quality. But that is so compelling that it’s easy to understand. And I think they have like 15 percent of the vacuum cleaner market.
So it’s like pretty successful. Right. I think we need more of those types of thoughtful stepping stones in robotics.
But the opportunities are becoming bigger because hardware is cheaper, compute’s cheaper, cloud’s cheaper, and AI is better.
So there’s a lot of opportunity.
If we zoom out from specifically startups and robotics, what advice do you have to high school students, college students about career and living a life that you’d be proud of?
You lived one heck of a life. You’re very successful in several domains.
If you can convert that into a generalizable potion, what advice would you give?
That’s a very good question. So it’s very hard to go into a space that you’re not passionate about and push hard enough to, you know, maximize your potential in it.
And so there’s always kind of the saying of, OK, follow your passion.
Great. Try to find where your passion overlaps with a growing opportunity and need in the world. It’s not too different from the startup argument that we talked about, where your passion meets the market.
Right. You know, that’s a beautiful thing, where you can do what you love.
But it also just opens up tons of opportunities because the world’s ready for it.
Right. And so and so like if you’re interested in technology, that might point to like go and study machine learning because you don’t have to decide what career you’re going to go into.
But it’s going to be such a versatile space that’s going to be at the root of like everything that’s going to be in front of us that you can have eight different careers in different industries and be an absolute expert in this like kind of tool set that you wield that can go and be applied.
And that doesn’t apply to just technology. Right. It could be the exact same thing, you know; the same thought process applies to design, to marketing, to sales, to anything.
But that versatility where you like when you’re in a space that’s going to continue to grow, it’s just like what company do you join?
One that just is going to grow and the growth creates opportunities where the surface area is just going to increase and the problems will never get stale. And you can have, you know, many like.
And so you go into a career where you have that sort of growth in the world that you’re in, you end up having so much more opportunity that organically just appears.
And you can then have more shots on goal to find like that killer overlap of timing and passion and skill set and point in life where you can like, you know, just really be motivated and fall in love with something.
And then at the same time, like find a balance. Like there’s been times in my life where I worked like a little bit too obsessively and, you know, and crazy.
And I think I’ve kind of tried to correct it, you know, at the right opportunities. But I think I probably appreciate a lot more now friendships that go way back, family and things like that.
And I kind of have the personality where I have so much desire to really try to optimize what I’m working on that I can easily go to a kind of an extreme.
And now I’m trying to find that balance and make sure that I have the friendships, the family, the relationship with the kids, everything,
so that I push really, really hard but kind of find a balance. And I think people can be happy on actually many kind of extremes on that spectrum.
But it’s easy to kind of inadvertently make a choice by how you approach it that then becomes really hard to unwind.
And so being very thoughtful about kind of all of those dimensions makes a lot of sense. And so those are all interrelated.
But at the end of the day, passion and love; love towards, you said, friends, family.
And hopefully one day, if your work pans out, Boris, love towards robots.
Not the creepy kind, the good kind. Just friendship and fun. Yeah.
It’s like another dimension to just how we interface with the world. Yeah.
Boris, you’re one of my favorite human beings and roboticists. You’ve created some incredible robots and I think inspired countless people.
And like I said, I hope Cosmo, I hope your work with Anki lives on. And I can’t wait to see what you do with Waymo.
I mean, that’s if we’re talking about artificial intelligence technology that has the potential to revolutionize so much of our world.
That’s it right there. So thank you so much for the work you’ve done. And thank you for spending your valuable time talking with me.
Thanks for listening to this conversation with Boris Sofman. To support this podcast, please check out our sponsors in the description.
And now let me leave you with some words from Isaac Asimov.
If you were to insist I was a robot, you might not consider me capable of love in some mystic human sense.
Thank you for listening and hope to see you next time.