Transcript
(calm music plays)
- [Voice Off-Screen] I am a translator,
transforming text into creative discovery,
turning movement into animation
and infusing words with emotion.
-
(speaks foreign language)
-
[Voice Off-Screen] I am a healer,
exploring the building blocks that make us unique,
discovering new threats before they happen.
And searching for the cures to keep them at bay.
I am a visionary
creating new medical miracles
and unlocking the secrets of our sun
to keep us safer here on Earth.
I am a navigator finding a single moment
in a sea of content.
-
We are announcing the next generation.
-
[Voice Off-Screen] In the perfect setting
for our most amazing stories.
I am a creator
adding new dimensions to creative expression
and reimagining our virtual selves.
I am a helper,
personalizing our surroundings.
-
[Person Off-Screen] Help me arrange the living room.
-
[Voice Off-Screen] harnessing the wisdom
of a million programmers
and turning the real world into a virtual playground.
I even helped write this script,
breathed life into the words and composed the melody.
(rousing music plays)
I am AI brought to life by Nvidia,
deep learning and brilliant minds everywhere.
- [Announcer] Ladies and gentlemen,
please welcome Nvidia founder and CEO Jensen Huang
- [2nd Announcer] (speaks foreign language)
(rousing music plays)
(audience applauds)
- (speaks foreign language)
(audience cheers)
We’re back.
(audience applauds and cheers)
Our first live event in almost four years.
I haven’t given a public speech in four years.
Wish me luck.
(audience laughs)
(audience applauds)
I have a lot to tell you, very little time,
so let’s get going.
Ray Tracing,
simulating the characteristics of light and materials
is the ultimate accelerated computing challenge.
Six years ago we demonstrated for the very first time
rendering this scene in less than a few hours.
After a decade of research,
we were able to render this scene in seconds,
15 seconds on our highest end GPU six years ago.
And then we invented Nvidia RTX
and combined three fundamental technologies,
hardware-accelerated ray tracing,
artificial intelligence
processing on Nvidia Tensor core GPUs,
and brand new algorithms.
Let’s take a look at the difference in just five years,
roll it.
This is running on CUDA GPUs six years ago
rendering this beautiful image
that would’ve otherwise taken a couple of hours on a CPU.
So this was a giant breakthrough already,
enormous speed up running on accelerated computing.
And then we invented the RTX GPU.
Run it, please.
(upbeat music plays)
The holy grail of computer graphics, Ray tracing,
is now possible in real time.
This is the technology we have put into RTX
and this after five years
is a very important time for us
because for the very first time
we took our third generation Ada architecture, RTX GPUs
and brought it to the mainstream with two new products
that are now completely in production.
Are you?
I got that backwards.
Everything looks different inside out and upside down.
(audience laughs)
Okay, this is our brand new,
right here you're looking at an Ada GPU
running ray tracing and artificial intelligence
at 60 frames a second.
It's 14 inches, it weighs almost nothing.
It’s more powerful than the highest end PlayStation
and this is the RTX 4060 Ti for our core gamers.
Both of these are now in production.
Our partners here in Taiwan are producing
both of these products in very, very large volumes
and I’m really excited about ’em.
Thank you very much.
(audience applauds)
I can almost put this in my pocket.
(audience laughs)
AI made it possible for us to do that.
Everything that you saw
would’ve been utterly impossible without AI.
For every single pixel we render,
we use AI to predict seven others.
For every pixel we compute,
AI predicted seven others.
The amount of energy we save,
the amount of performance we get is incredible.
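(A quick back-of-the-envelope on that one-in-eight ratio; the 4K frame size below is just an illustrative assumption.)

```python
# Illustrative only: if AI predicts 7 of every 8 output pixels,
# only 1/8 of a frame is fully rendered. The 4K frame size is an assumption.
output_pixels = 3840 * 2160
rendered = output_pixels // 8
ai_predicted = output_pixels - rendered
print(f"rendered: {rendered:,}  AI-predicted: {ai_predicted:,}")
print(f"fraction rendered: {rendered / output_pixels:.3f}")   # 0.125
```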
Now of course,
I showed you the performance on those two GPUs,
but it wouldn’t have been possible
if not for the super computer back at Nvidia
running all the time training the model
so that we can enhance applications.
So the future is what I demonstrated to you just now,
you can extrapolate almost everything
that I’m gonna talk about for the rest of the talk
into that simple idea
that there will be a large computer writing software
developing and deploying software that is incredible,
that can be deployed in devices all over the world.
We used AI to render this scene.
We’re gonna also use AI to bring it alive.
Today we’re announcing Nvidia ACE, Avatar Cloud Engine,
that is designed for animating, for bringing a digital avatar to life.
It has several characteristics, several capabilities,
speech recognition, text-to-speech,
natural language understanding,
basically a large language model,
and using the sound
that you generate with your voice,
it animates the face,
and using the sound and the expression of what you're saying,
it animates your gestures.
All of this is completely trained by AI.
We have a service that includes pre-trained models
that you can come, developers can come,
and modify and enhance for your own application,
for your own story because every game has a different story.
And then you can deploy it in the cloud
or deploy it on your device.
It has a great backend, it has TensorRT.
TensorRT is Nvidia's deep learning optimizing compiler,
and you can deploy it on Nvidia GPUs
as well as export ONNX, an industry standard format,
so that you can run it on any device.
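(To make that flow concrete, here is a rough sketch of how such an avatar loop could be wired together. Every function name below is a hypothetical placeholder, not the actual ACE API.)

```python
# Hypothetical sketch of an ACE-style avatar turn: listen, think, speak, animate.
# Each function is a stub standing in for a real service (ASR, LLM, TTS, audio-to-face).

def speech_to_text(audio: bytes) -> str:
    """Placeholder for automatic speech recognition."""
    ...

def generate_reply(player_text: str, backstory: str) -> str:
    """Placeholder for a large language model conditioned on the character's backstory."""
    ...

def text_to_speech(text: str) -> bytes:
    """Placeholder for text-to-speech synthesis."""
    ...

def animate_face(audio: bytes) -> dict:
    """Placeholder for audio-driven facial animation (e.g. blend-shape weights)."""
    ...

def avatar_turn(player_audio: bytes, backstory: str) -> tuple[bytes, dict]:
    """One conversational turn for an NPC like the ramen-shop owner in the demo."""
    heard = speech_to_text(player_audio)
    reply = generate_reply(heard, backstory)
    voice = text_to_speech(reply)
    face = animate_face(voice)
    return voice, face
```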
Let’s take a look at this scene in just a second,
but let me first tell you about it.
It is completely rendered with Ray tracing.
Notice the beautiful lights,
so many different lights
and all of the different lights
are projecting light from that source.
So you have all kinds of direct lights,
you have global illumination.
You’re gonna see incredibly beautiful
shadows and physics simulation
and notice the character,
the beautiful rendering of the character.
Everything is done in Unreal Engine 5.
We partnered with an avatar framework,
an avatar tool maker called (indistinct),
and together we developed this demo you’re about to see.
Okay, run please.
(upbeat music plays)
Everything is real time.
-
Hey Jen, how are you?
-
Unfortunately not so good.
-
How come?
-
I'm worried about the crime around here.
It’s gotten bad lately.
My ramen shop got caught in the crossfire.
-
Can I help?
-
If you want to do something about this,
I have heard rumors that the powerful crime lord Kumon Aoki
is causing all sorts of chaos in the city.
He may be the root of this violence.
-
I’ll talk to him, where can I find him?
-
I have heard he hangs out
in the underground fight clubs on the city’s east side.
Try there.
-
Okay, I’ll go.
-
Be careful, Kai.
-
None of that conversation was scripted.
We gave that AI, this Jin AI character, a backstory,
his story about his ramen shop
and the story of this game.
And all you have to do is go up and talk to this character.
And because this character
has been infused with artificial intelligence
and large language models,
it can interact with you,
understand your meaning and interact with you
in a really reasonable way.
All of the facial animation completely done by the AI,
we have made it possible
for all kinds of characters to be generated.
They all have their own domain knowledge.
You can customize it,
so everybody's game is different,
and look how wonderfully beautiful
and natural they are.
This is the future of video games.
Not only will AI contribute to the rendering
and the synthesis of the environment,
AI will also animate the characters.
AI will be a very big part of the future of video games.
The most important computer
of our generation is unquestionably the IBM System/360.
This computer revolutionized several things.
It was the first computer in history
to introduce the concept of a central processing unit,
the CPU,
virtual memory,
expandable I/O,
multitasking,
the ability to scale this computer
for different applications
across different computing ranges.
And one of the most important contributions
and one of its greatest insights
is the importance of preserving software investment.
The software ran across the entire range of computers
and it ran across multiple generations.
So the software you develop is preserved.
IBM recognized the importance of software,
recognized the importance of preserving your investment,
and very importantly
recognized the importance of installed base.
This computer revolutionized not only computing
and many of us grew up reading the manuals of this computer
to understand how computer architecture worked,
to even learn about DMA for the very first time,
this computer not only revolutionized computing,
it revolutionized the thinking of the computer industry.
System/360 and the programming model of the System/360
have largely been retained until today, 60 years later.
In 60 years, a trillion dollars' worth
of the world's data centers
have all basically used a computing model
that was invented all the way back 60 years ago.
Until now.
There are two fundamental transitions
happening in the computer industry today.
All of you are deep within it and you feel it.
There are two fundamental trends.
The first trend is because CPU scaling has ended,
the ability to get 10 times more performance
every five years has ended.
The ability to get 10 times more performance
every five years at the same cost
is the reason why computers are so fast today.
The ability to sustain 10 times more computing
every five years without an increase in power
is the reason why the world's data centers
haven't consumed so much more power on Earth.
That trend has ended and we need a new computing approach
and accelerated computing is the path forward.
It happened at exactly the time
when a new way of doing software was discovered,
deep learning,
these two events came together
and it’s driving computing today.
Accelerated computing and generative AI.
This way of doing software,
this way of doing computation
is a reinvention from the ground up and it’s not easy.
Accelerated computing is a full stack problem.
It’s not as easy as general purpose computing.
The CPU is a miracle.
High level programming languages, great compilers,
almost anybody could write reasonably good programs,
because the CPU is so flexible.
However,
its ability to continue to scale in performance has ended
and we need a new approach.
Accelerated computing is full stack.
You have to re-engineer everything from the top down
and from the bottom up,
from the chip to the systems,
to the systems’ software,
new algorithms
and of course optimizing the applications.
The second is that it’s a data center scale problem.
And the reason why it’s a data center scale problem
is today the data center is the computer.
Unlike the past,
when your PC was a computer or the phone was a computer,
today your data center is the computer.
The application runs across the entire data center
and therefore it’s vital that you have to understand
how to optimize the chips, the compute,
the software across the NIC, the switch,
all the way to the other end in a distributed computing way.
And third, accelerated computing is multi-domain.
It’s domain specific.
The algorithms and the software stacks that you create
for computational biology
and the software stack you create
for computational fluid dynamics
are fundamentally different.
Each one of these domains of science needs its own stack,
which is the reason why accelerated computing has taken us
nearly three decades to accomplish.
This entire stack has taken us nearly three decades.
However, the performance is incredible,
and I’ll show you.
After three decades,
we realize now that we’re at the tipping point.
A new computing model is extremely hard to come by.
And the reason for that is this.
In order for there to be a new computing model,
you need developers.
But developers have to create applications
that end users would buy.
And without end users, there would be no customers,
no computer companies to build computers,
without computer companies,
like yourself building computers,
there would be no installed base.
Without an installed base, there would be no developers.
Without developers, there would be no applications.
This loop,
this loop has stymied
so many computing companies in the 40 years
that I've been in this industry.
This is really one of the first major times in history
that a new computing model has been developed and created.
We now have 4 million developers,
3000 plus applications,
40 million CUDA downloads in history,
25 million just last year.
40 million downloaded in history, 25 million just last year.
15,000 startup companies in the world are building on Nvidia today,
and 40,000 large companies, enterprises around the world,
are using accelerated computing.
We have now reached the tipping point
of a new computing era.
This new computing model is now enjoyed and embraced
by just about every computer company
and every cloud company in the world.
There’s a reason for that.
It turns out that for every single computing approach,
its benefit in the final analysis is lower cost.
The PC revolution, which Taiwan enjoyed,
started in 1984, the year I graduated;
that decade, the eighties, was the PC revolution.
The PC brought computing to a price point
nobody had ever seen before.
And then of course,
mobile devices were convenient
and they also saved enormous amounts of money.
We aggregated and combined the camera,
the music player, your PC, a phone.
So many different devices were all integrated into one.
And as a result,
not only are you able to enjoy your life better,
it also saves a lot of money and great convenience.
Every single generation
provided something new and saved money.
Well, this is how accelerated computing works.
This is accelerated computing
used for large language models.
For large language models.
Basically the core of generative AI.
This example is a $10 million server
and we costed everything.
We costed the processors, we costed all the chips,
we costed all the network, we costed literally everything.
And so $10 million gets you nearly a thousand CPU servers.
And to train to process this large language model
takes 11 gigawatt hours.
11 gigawatt hours, okay?
And this is what happens when you accelerate
this workload with accelerated computing.
And so with $10 million, for a $10 million server,
you buy 48 GPU servers.
It’s the reason why people say
that GPU servers are so expensive.
Remember people say GPU servers are so expensive.
However, the GPU server is no longer the computer.
The computer is the data center.
Your goal is to build the most cost effective data center,
not build the most cost effective server.
Back in the old days when the computer was the server,
that would be a reasonable thing to do,
but today the computer is the data center.
And so what you want to do
is you want to create the most effective data center
with the best TCO.
So for $10 million,
you buy 48 GPU servers.
It only consumes 3.2 gigawatt hours
and delivers 44 times the performance.
Let me just show it to you one more time.
This is before and this is after.
And this is,
(audience laughs and applauds)
we want dense computers, not big ones.
We want dense computers, fast computers, not big ones.
And so that’s ISO budget.
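(Redoing that back-of-the-envelope with the slide's own numbers, nothing else assumed:)

```python
# ISO-budget comparison from the slide: the same $10M, CPU servers vs GPU servers.
budget_usd = 10_000_000
cpu_servers, cpu_energy_gwh, cpu_perf = 960, 11.0, 1.0   # baseline throughput = 1x
gpu_servers, gpu_energy_gwh, gpu_perf = 48, 3.2, 44.0    # "44 times the performance"

print(f"energy saved: {cpu_energy_gwh - gpu_energy_gwh:.1f} GWh "
      f"({(1 - gpu_energy_gwh / cpu_energy_gwh):.0%} less)")
# Work per gigawatt hour; roughly lines up with the ~150x ISO-power figure shown next.
print(f"perf per GWh advantage: "
      f"{(gpu_perf / gpu_energy_gwh) / (cpu_perf / cpu_energy_gwh):.0f}x")
```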
Let me show you something else.
Okay, so this is $10 million again,
960 CPU servers.
Now this time,
this time we’re gonna be ISO power.
We’re gonna keep this number the same.
We’re gonna keep this number the same, okay?
So this number is the same,
the same amount of power.
This means your data center is power limited.
In fact, most data centers today are power limited.
And so being power limited, using accelerated computing,
you can get 150 times more performance
at three times the cost.
But why is that such a great deal?
The reason for that is because
it’s very expensive and time consuming
to find another data center.
Almost everybody is power limited today.
Almost everybody is scrambling to break new ground
to get more data centers.
And so if you are power limited
or if your customers are power limited,
then what they can do
is invest more into that data center,
which already, which has 11 gigawatts,
and you can get a lot more throughput,
continue to drive your growth.
Here’s another example.
This is my favorite.
If your goal, if your goal is to get the work done,
if your goal is to get the work done,
you don’t care how.
Your goal is to get the work done, you don’t care how.
And this is the work you want to get done.
ISO work, okay?
This is ISO work.
All right, look at this.
(audience laughs and applauds)
Oh, (speaks foreign language)
(audience laughs)
(audience applauds)
That was, people love that, right?
Nice to see you, Carol.
Nice to see you, Spencer.
Okay, so let’s do that one more time.
It’s so, it’s so delightful.
Look at this.
Oh, oh, oh, no, no.
Okay, look at this.
Look at this,
before,
after.
The more you buy, the more you save.
That’s right.
The more you buy, the more you save,
(audience laughs and applauds)
the more you buy, the more you save.
That’s Nvidia.
You don’t have to understand the strategy,
you don’t have to understand the technology.
The more you buy, the more you save.
That’s the only thing you have to understand.
Data center, data center.
Now, why is it?
You have been, you’ve heard me talk about this
for so many years.
In fact, every single time you saw me,
I’ve been talking to you about accelerated computing.
I’ve been talking about accelerated computing,
well, for a long time,
well over two decades.
And now why is it that finally it’s the tipping point?
Because the data center equation is very complicated.
This equation is very complicated.
This is the cost of building a data center.
The data center TCO is a function of,
and this is the part where everybody messes up.
It’s a function of the chips, of course, no question.
It’s a function of the systems, of course, no question.
But it’s also because there’s so many different use cases.
It’s a function of the diversity of systems
that can be created.
It is the reason why Taiwan is the bedrock,
the foundation, of the computer industry.
Without Taiwan,
why would there be
so many different configurations of computers?
Big, small, powerful, cheap,
enterprise, hyperscale, super computing,
so many different types of configurations,
1U, 2U, 4U, right?
And all completely compatible.
The ability for the hardware ecosystem of Taiwan
to have created so many different versions
that are software compatible.
Incredible.
The throughput of the computer of course is very important.
It depends on the chip,
but it also depends, as you know,
on the algorithm.
Because without the algorithm libraries
accelerated computing does nothing.
It just sits there.
And so you need the algorithm software libraries.
It’s a data center scale problem.
So networking matters.
And networking matters,
distributed computing is all about software.
Again, system software matters.
And before, before long,
in order for you to present your system to your customers,
you have to ultimately have a lot of applications
that run on top of it.
The software ecosystem matters.
Well, the utilization of a data center
is one of the most important criteria of its TCO.
Just like a hotel.
If the hotel is wonderful, but it’s mostly empty,
the cost is incredible.
And so you need the utilization to be high.
In order for the utilization to be high,
you have to have many different,
many different applications.
So the richness of the applications matters.
Again, the algorithms and libraries,
and now the software ecosystem.
You purchase a computer,
but these computers are incredibly hard to deploy
from the moment that you buy the computer
to the time that you put that computer to work
to start making money,
that difference can be weeks,
if you’re very good at it, incredibly good at it.
We can stand up a super computer
in a matter of a couple of weeks,
because we build so many all around the world,
hundreds around the world.
But if you’re not very good at it, it could take a year.
That difference,
depriving yourself of a year of making money
while paying a year of depreciation,
is an incredible cost.
Lifecycle optimization.
Because the data center is software defined,
there are so many engineers that will continue to refine
and continue to optimize the software stack.
Because NVIDIA’s software stack
is architecturally compatible across all of our generations,
across all of our GPUs.
Every time we optimize something, it benefits everybody.
Every time we optimize something, it benefits everybody.
So lifecycle optimization,
and of course finally the energy that you use, power.
But this equation is incredibly complicated.
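(One toy way to hold all of those factors in your head at once; a sketch only, with made-up fields and numbers, not NVIDIA's actual TCO model. The only point is that throughput, utilization, deployment time, and power all sit directly in the cost-per-unit-of-work ratio.)

```python
# Illustrative-only data center TCO sketch: total cost divided by useful work delivered.
# Every field and constant here is a placeholder for the factors discussed above.
from dataclasses import dataclass

@dataclass
class DataCenter:
    capex_usd: float        # chips and systems
    power_mw: float         # facility power draw
    usd_per_mwh: float      # energy price
    utilization: float      # fraction of time doing useful work (depends on app richness)
    deploy_months: float    # time from purchase to production (lost revenue, depreciation)
    throughput: float       # useful work units per hour at full load (chips + algorithms)

    def cost_per_unit_work(self, lifetime_years: float = 4.0) -> float:
        productive_hours = lifetime_years * 8760 - self.deploy_months * 730
        energy_cost = self.power_mw * productive_hours * self.usd_per_mwh
        useful_work = self.throughput * productive_hours * self.utilization
        return (self.capex_usd + energy_cost) / useful_work

# e.g. DataCenter(capex_usd=10e6, power_mw=2, usd_per_mwh=100,
#                 utilization=0.8, deploy_months=1, throughput=1000).cost_per_unit_work()
```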
Well,
because we have now addressed
so many different domains of science,
so many industries,
and in data processing,
in deep learning, classical machine learning,
so many different ways for us to deploy software
from the cloud to enterprise to supercomputing to the Edge,
so many different configurations of GPUs,
from our HGX versions to our Omniverse versions,
to our cloud GPU and graphics version,
so many different versions.
Now, the utilization is incredibly high.
The utilization of Nvidia GPU is so high,
almost every single cloud is overextended.
Almost every single data center is overextended.
There are so many different applications using it.
So we have now reached the tipping point
of accelerated computing.
We have now reached the tipping point of generative AI.
And I want to thank all of you for your support
and all of your assistance and partnership
in making this dream happen.
Thank you.
(audience applauds)
Every single time we announce a new product,
the demand for every single generation
increased and increased and increased.
And then one generation, it hockey-sticks.
We stick with it, we stick with it, we stick with it,
Kepler,
and then Pascal,
and then Volta,
and then Ampere.
And now this generation of accelerated computing,
the demand is literally from every corner of the world.
And we are so, so, so excited
to be in full volume production of the H100.
And I wanna thank all of you for your support.
This is incredible.
(audience applauds)
H100 is in full production,
manufactured by companies all over Taiwan,
used in clouds everywhere,
enterprises everywhere.
And let’s take a look at a short video
of how H100 is produced.
(machines whir)
(upbeat music plays)
(machines continue to whir)
(upbeat music continues)
It’s incredible.
This computer,
(audience applauds)
35,000 components on that system board,
eight Hopper GPUs.
Let me show it to you.
Whoosh.
-
[Audience Member] Yay. (audience applauds and cheers)
-
All right, this,
I would lift this,
but I still have the rest of the keynote
I would like to give.
This is 60 pounds, 65 pounds.
It takes robots to lift it, of course,
and it takes robots to insert it,
because the insertion pressure is so high
and it has to be so perfect.
This computer is $200,000, and as you know,
it replaces an entire room of other computers.
So this, I know it’s a very, very expensive computer.
It's the world's single most expensive computer
about which you can say, "The more you buy, the more you save."
(audience laughs)
This is what a compute tray looks like.
Even this is incredibly heavy.
See that?
So this is the brand new H100,
the world's first computer
that has a transformer engine in it.
The performance is utterly incredible.
Hopper is in full production.
We’ve been driving computing,
this new form of computing for 12 years.
When we first met the deep learning researchers,
we were fortunate to realize
that not only was deep learning going to be
a fantastic algorithm for many applications initially,
computer vision and speech,
but it would also be a whole new way of doing software.
This is a fundamental new way of doing software
that can use data to develop,
to train, a universal function approximator
of incredible dimensionality.
It can basically predict almost anything
that you have data for,
so long as the data has structure that it can learn from.
And so we realized the importance of this
new method of developing software,
and that it has the potential
of completely reinventing computing.
And we were right.
12 years later,
we have reinvented literally everything.
We reinvented, of course, we started by creating
a new type of library.
It's essentially like SQL,
except for deep learning, for neural network processing.
It's like a rendering engine,
a solver, for neural network processing called (indistinct).
We reinvented the GPU.
People thought that GPUs would just be GPUs.
They were completely wrong.
We dedicated ourselves to reinventing the GPUs,
so that it’s incredibly good at Tensor processing.
We created a new type of packaging called SXM
and worked with TSMC on CoWoS,
so that we could stack multiple chips on the same die.
NVLink, so that we can connect these SXM modules together
with high speed chip to chip interconnect.
Almost a decade ago,
we built the world’s first chip to chip (indistinct),
so that we can expand the memory size of GPUs
using SXMs and NVLink.
And we created a new type of motherboard,
we call it HGX, that I just showed you.
No computer has ever been this heavy before
or consumed this much current.
Every aspect of a data center had to be reinvented.
We also invented a new type of computer appliance
so that we could develop software on it
so that third party developers could develop software on it
with a simple appliance we call DGX,
basically a giant GPU computer.
DGX.
We also purchased Mellanox,
which is one of the great strategic decisions of our company
because we realized that in the future,
if the data center is the computer,
then the networking is the nervous system.
If the data center is the computer,
then the networking defines the data center.
That was an incredibly good acquisition
and since then we’ve done so many things together
and I’m gonna show you some
really, really amazing work today.
And then of course, an operating system.
If you have a nervous system, a distributed computer,
it needs to have an operating system.
And the operating system of this distributed computer
we call Magnum IO.
Some of our most important work.
And then all of the algorithms and engines
that sit on top of these computers, we call Nvidia AI.
The only AI operating system in the world
that takes you from data processing
to training, to optimization, to deployment and inference.
End-to-end deep learning processing.
It is the engine of AI today.
Well, every single generation since Kepler, which is the K80,
to Pascal, Volta, Ampere, Hopper,
every two years, every two years,
we took a giant leap forward.
But we realized we needed more than even that,
and which is the reason why we connected GPUs
to other GPUs called NVLink,
built one giant GPU,
and we connected those GPUs together
using InfiniBand into larger scale computers.
That ability for us to drive the processor
and extend the scale of computing
made it possible for the AI research organization,
the community,
to advance AI at an incredible rate.
We just kept pushing and pushing and pushing.
Hopper went into production August of last year.
August, 2022.
2024, which is next year, we’ll have Hopper-Next.
Last year we had Quantum.
Two years from now or next year,
we’ll have Quantum-Next.
So every two years we take giant leaps forward
and I’m expecting the next leap to be giant as well.
This is the new computer industry.
Software is no longer programmed just by computer engineers.
Software is programmed by computer engineers
working with AI supercomputers.
These AI supercomputers are a new type of factory.
It is very logical that the car industry has factories.
They build things that you can see, cars.
It is very logical that the computer industry
has computer factories.
They build things that you can see, computers.
In the future,
every single major company
will also have AI factories
and you will build and produce your company’s intelligence.
And it’s a very sensible thing.
We cultivate and develop and nourish our employees
and continue to create the conditions
by which they can do their best work.
We are intelligence producers already.
It's just that the producers of that intelligence
are people.
In the future, we will be intelligence producers,
artificial intelligence producers.
And every single company will have factories
and the factories will be built this way.
This translates to your throughput.
This translates to your scale
and you will build it in a way that is very, very good TCO.
Well, consider our dedication to pursuing this path
and relentlessly increasing the performance.
Just think: in 10 years' time,
we increased the throughput,
we increased the scale,
the overall throughput across all of that stack,
by 1 million x,
1 million x in 10 years.
Well just now, in the beginning,
I showed you computer graphics.
In five years,
we improved the computer graphics by 1000 times.
In five years,
using artificial intelligence and accelerated computing.
Using accelerated computing and artificial intelligence,
we accelerated computer graphics
by 1000 times in five years.
Moore’s law is probably
currently running at about two times.
A thousand times in five years.
A thousand times in five years
is 1 million times in 10.
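(The compounding arithmetic, for what it's worth; the Moore's-law pace below is an assumption for contrast.)

```python
# 1,000x every five years compounds to 1,000,000x over ten.
print(1_000 ** 2)        # 1000000

# For contrast, roughly 2x every two years (an assumed Moore's-law pace)
# compounds to about 32x over the same ten years.
print(2 ** (10 / 2))     # 32.0
```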
We’re doing the same thing in artificial intelligence.
Now, question is,
what can you do when your computer
is 1 million times faster?
What would you do if your computer
was 1 million times faster?
Well, it turns out that the friends we met
at the University of Toronto,
Ilya Sutskever, Alex Krizhevsky, and Geoffrey Hinton.
And Ilya Sutskever, of course, a co-founder of OpenAI,
discovered the continuous scaling
of artificial intelligence and deep learning networks
and came up with the ChatGPT breakthrough.
Well,
in this general form,
this is what has happened.
The transformer engine, the transformer engine,
and the ability to use unsupervised learning,
unsupervised learning,
be able to learn from a giant amount of data
and recognize patterns and relationships
across a large sequence.
And using transformers to predict the next word,
large language models were created,
and the breakthrough, of course, is very, very, very clear.
And I’m sure that everybody here
has already tried ChatGPT.
But the important thing is this.
We now have a software capability
to learn the structure of almost any information.
We can learn the structure of text, sound, images,
there is structure in all of it,
physics,
proteins,
DNA,
chemicals,
anything that has structure,
we can learn that language, learn its language.
Of course you can learn English and Chinese and Japanese
and so on and so forth,
but you can also learn the language of many other things.
And then the next breakthrough came,
generative AI.
Once you can learn the language,
once you can learn the language of certain information,
then with control and guidance
from another source of information,
that we call prompts,
we can now guide the AI
to generate information of all kinds.
We can generate text-to-text, text-to-image.
But the important thing is this,
information transformed to other information
is now possible.
Text to proteins, text to chemicals,
images to 3D,
images to 2D,
images to text, captioning,
video to video.
So many different types of information
can now be transformed.
For the very first time in history
we have a software technology
that is able to understand
the representation of information of many modalities.
We can now apply computer science,
we can now apply the instrument of our industry.
We can now apply the instrument of our industry
to so many different fields that were impossible before.
This is the reason why everybody is so excited.
Now let’s take a look at some of these.
Let’s take a look at what it can do.
This, here’s a prompt, and this prompt says,
“Hi Computex,”
so this is a, we type in the word,
“Hi, Computex, I’m here to tell you
how wonderful stinky tofu is.
(audience laughs)
You can enjoy it right here in Taiwan.
It’s best from the night market.”
I was just there the other night.
(audience laughs)
Okay, play it.
- Hi Computex, I’m here to tell you about
how wonderful stinky tofu is.
You can enjoy it right here in Taiwan.
It’s best from the night market.
- The only input was words.
The output was that video.
(audience applauds)
Okay, here's another prompt.
Taiwanese, we tell this AI, okay?
We tell this AI, this is Google's text-to-music model.
“Traditional Taiwanese music.
Peaceful, like it’s warm and raining
in a lush forest at daybreak.”
Please.
(calm music plays)
(speaks foreign language)
(audience applauds)
We send texts in.
AI says, “Hmm, okay, this music,” okay?
Hear this one.
“I am here at Computex,
I will make you like me best.
Sing sing it with me.
I really like Nvidia.”
(audience laughs)
Okay?
So these are the words,
these are the words.
And I said, "Hey, hey, voice mod, could you write me a song?"
These are the words.
Okay, play it.
- (upbeat accompaniment plays) ♪ I am here at Computex
♪ I will make you like me best, yeah. ♪
♪ Sing sing it with me ♪
♪ I really like Nvidia. ♪
(audience cheers and applauds)
- Okay, so obviously
this is a very, very important new capability,
and that’s the reason why there’s
so many generative AI startups.
We’re working with some 1600 generative AI startups.
They're in all kinds of domains,
in language, in media, in biology.
This is one of the most important areas that we care about.
Digital biology is going to go through its revolution.
This is going to be an incredible thing.
Just as we had Synopsys and Cadence
help us create tools
so that we can build wonderful chips and systems,
for the very first time,
we’re gonna have computer aided drug discovery tools.
And they’ll be able to manipulate
and work with proteins and chemicals
and understand disease targets
and try all kinds of chemicals
that previously had never been thought of before.
Okay, so really, really important area.
Lots of startups, tools and platform companies.
And let me show you a video
of some of the work that they’re doing.
Play it, please.
- Generative AI is the most important
computing platform of our generation.
Everyone from first movers to Fortune 500 companies
are creating new applications
to capitalize on generative AI’s ability
to automate and co-create.
For creatives, there’s a brand new set of tools
that would be simply impossible a few years ago.
Adobe is integrating Firefly into their creative apps,
ethically trained and artist friendly.
You can now create images with a simple text prompt.
Or expand the image of your real photos
to what lies beyond the lens.
Productivity apps will never be the same.
Microsoft has created a co-pilot for office apps.
Every profession is about to change.
Tabnine is democratizing programming
by tapping into the knowledge base
of a million developers
to accelerate application development
and reduce debugging time.
If you’re an architect
or just thinking of remodeling your home,
Planner 5D can instantly turn a 2D floor plan to a 3D model.
And right here in Taiwan,
AnHorn medicines is targeting
difficult to treat diseases and accelerating cures.
- Incredible, right?
Just utterly incredible.
There’s no question that we’re in a new computing era.
There’s just absolutely no question about it.
Every single computing era,
you could do different things that weren’t possible before,
and artificial intelligence certainly qualifies.
This particular computing era is special in several ways.
One,
it is able to understand information
of more than just text and numbers.
It can now understand multimodality,
which is the reason why this computing revolution
can impact every industry, every industry.
Two, because this computer doesn’t care how you program it.
It will try to understand what you mean,
because it has this incredible
large language model capability.
And so the programming barrier is incredibly low.
We have closed the digital divide.
Everyone is a programmer now.
You just have to say something to the computer.
Third,
this computer,
not only is it able to do amazing things for the future,
it can do amazing things for every single application
of the previous era,
which is the reason why all of these APIs
are being connected into Windows applications
here and there, and browsers and PowerPoint and Word.
Every application that exists will be better because of AI.
You don't have to wait for new applications;
this computing era does not need new applications.
It can succeed with old applications
and it's gonna have new applications.
The rate of progress, the rate of progress,
because it’s so easy to use,
is the reason why it’s growing so fast.
This is going to touch literally every single industry.
And at the core,
just as with every single computing era,
it needs a new computing approach.
In this particular era,
the computing approach is accelerated computing,
and it has been completely reinvented from the ground up.
Last several years,
I’ve been talking to you about
the new type of processor we’ve been creating,
and this is the reason we’ve been creating it.
Ladies and gentlemen,
Grace Hopper is now in full production.
This is Grace Hopper.
(audience applauds)
Nearly 200 billion transistors in this computer.
Oh, (speaks foreign language),
(speaks foreign language),
(audience laughs)
look at this.
This is Grace Hopper.
This processor,
this processor is really quite amazing.
There are several characteristics about it.
This is the world’s first accelerated processor,
accelerated computing processor
that also has a giant memory.
It has almost 600 gigabytes of memory
that’s coherent between the CPU and the GPU.
And so the GPU can reference the memory,
the CPU can reference the memory,
and any unnecessary copying back and forth
can be avoided.
The amazing amount of high speed memory
lets the GPU work on very, very large data sets.
This is a computer, this is not a chip.
Practically, the entire computer’s on here.
It uses low-power DDR memory,
just like your cell phone,
except this has been optimized and designed
for high resilience data center applications.
Incredible levels of performance.
This took us several years to build,
and I’m so excited about it
and I’ll show you some of the things
that we’re gonna do with it.
Janine, thank you, (speaks foreign language).
You’re supposed to say (speaks foreign language).
(audience laughs)
(speaks foreign language)
Okay, so four PetaFLOPS,
transformer engine,
72 CPU cores,
they’re connected together.
They're connected together by a high speed chip-to-chip link
at 900 gigabytes per second.
And so the local memory, 96 gigabytes of HBM3 memory
is augmented by LPDDR memory.
Across this link, the LPDDR acts as
a very, very large and high speed cache.
So this computer is like none other the world’s ever seen.
Now, let me show you some of its performance.
So I'm comparing here three different applications,
and this is a very important application.
If you have never heard of it,
be sure to look into it.
It's called a vector database.
A vector database is a database that has tokenized,
that has vectorized, the data that you're trying to store.
And so it understands the relationships
of all of the data inside its storage.
This is incredibly important
for knowledge augmentation of the large language models
to avoid hallucination.
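(For anyone who hasn't met one, here is a minimal sketch of the core idea: store embeddings, retrieve the nearest ones, and hand them to the language model as grounding context. The embedding function below is a deterministic stand-in, not a real model, so the ranking is arbitrary; a real embedding model makes it semantic, and real systems use approximate nearest-neighbor indexes.)

```python
import numpy as np

# Minimal vector-store sketch: embed documents, retrieve by cosine similarity.
docs = [
    "The H100 system board has 35,000 components and eight Hopper GPUs.",
    "The night market in Taiwan sells stinky tofu.",
]

def embed(text: str) -> np.ndarray:
    # Stand-in for a real embedding model: a deterministic pseudo-random vector per text.
    seed = abs(hash(text)) % (2**32)
    return np.random.default_rng(seed).standard_normal(16)

index = np.stack([embed(d) for d in docs])
index /= np.linalg.norm(index, axis=1, keepdims=True)

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    q /= np.linalg.norm(q)
    scores = index @ q                      # cosine similarity against every stored vector
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

# The retrieved passages would be prepended to the LLM prompt as grounding context.
print(retrieve("How many components are on the board?"))
```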
The second is deep learning recommender systems.
This is how we get news and music
and all the texts that you see on your devices.
It recommends, of course, music and goods
and all kinds of things.
Recommender system is the engine of the digital economy.
This is probably the single most valuable piece of software
that any of the companies in the world runs.
This is the world’s first AI factory.
There will be other AI factories in the future,
but this is really the first one.
And the last one is large language model inference.
65 billion.
65 billion parameters is a fairly large language model,
and you can see that on a CPU it’s just not possible.
The CPU is simply not possible.
With Grace Hopper, excuse me,
with Hopper on an x86,
it’s faster, but notice it’s memory limited.
You could of course take this 400 gigabytes
and cut it up into a whole bunch of small pieces,
shard it,
and distribute it across more GPUs.
But in the case of Grace Hopper,
in the case of Grace Hopper,
Janine, (speaks foreign language).
Oh, Janine doesn’t speak Chinese.
(audience laughs)
(speaks foreign language).
Okay,
Grace Hopper, Grace Hopper has more memory,
has more memory on this one module than all of these.
Does that make sense?
And so as a result,
you don’t have to break the data into so many pieces.
Of course, the amount of computation of this is higher,
but this is so much easier to use.
And if you want to scale out large language models,
if you wanna scale out vector databases,
if you want to scale out deep learning recommender systems,
this is the way to do it.
This is so easy to use.
Plug this into your data center and you can scale out AI.
Okay?
So this is the reason why we built Grace Hopper.
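(A rough back-of-the-envelope on why the memory is the point; the two bytes per parameter is my assumption for 16-bit weights, and the LPDDR figure is inferred from the "almost 600 gigabytes" quoted earlier.)

```python
# Rough memory footprint of a 65-billion-parameter model in 16-bit precision.
params = 65e9
bytes_per_param = 2                      # FP16/BF16 weights (assumption)
weights_gb = params * bytes_per_param / 1e9
print(f"weights alone: ~{weights_gb:.0f} GB")            # ~130 GB, before KV cache etc.

hbm3_gb = 96                             # HBM3 on one Hopper (from the talk)
grace_hopper_gb = 96 + 480               # HBM3 plus coherent LPDDR, "almost 600 GB"
print(f"fits in one Hopper's HBM3? {weights_gb <= hbm3_gb}")
print(f"fits in one Grace Hopper's coherent memory? {weights_gb <= grace_hopper_gb}")
```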
The other application that I’m super excited about
is the foundation of our own company.
Nvidia is a big customer of Cadence.
We use all of their tools.
And all of their tools run on CPUs.
And the reason why they run on CPUs
is because NVIDIA’s data sets are very large
and the algorithms are refined
over very long periods of time.
And so most of the algorithms are very CPU centric.
We’ve been accelerating some of these algorithms
with Cadence for some time,
but now with Grace Hopper,
and we’ve only been working on it
for a couple of days and weeks,
the performance speed up,
I can’t wait to show it to you, is insane.
This is going to revolutionize an entire industry,
one of the highest
compute intensive industries in the world, of course,
designing chips, designing electronic systems,
CAE,
CAD,
EDA,
and of course, digital biology.
All of these markets,
all of these industries
require very large amounts of computation.
But the data set is also very large.
Ideal for Grace Hopper.
Well, 600 gigabytes is a lot.
600 gigabytes is a lot.
This is basically a supercomputer I’m holding in my hands.
This 600 gigabytes is a lot.
But when you think about it,
when we went from AlexNet
of 62 million parameters 12 years ago
and trained on 1.2 million images,
it is now 5,000 times bigger with Google's PaLM,
5,000 times bigger with 340 billion parameters.
And of course, we’re gonna make even bigger ones than that.
And that’s been trained on 3 million times more data.
So literally in the course of 10 years,
the computing problem of deep learning
increased by 5,000 times for the software
and 3 million times for the dataset.
No other area of computing has ever increased this fast.
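(Checking that ratio against the quoted numbers:)

```python
# Growth in model size, using the figures quoted above.
alexnet_params = 62e6        # AlexNet, 12 years ago
palm_params = 340e9          # the PaLM figure quoted on stage
print(f"parameter growth: ~{palm_params / alexnet_params:,.0f}x")   # ~5,500x, the "5,000 times"
# Training data grew by the stated "3 million times" over the same period.
```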
And so we’ve been chasing
the deep learning advance
for quite some time.
This is going to make a big, big contribution.
However, 600 gigabytes is still not enough.
We need a lot more.
So let me show you what we’re gonna do.
So the first thing is, of course,
we have the Grace Hopper Super chip,
put that into a computer.
The second thing that we’re gonna do
is we’re gonna connect eight of these together
using NVLink.
This is an NVLink switch.
So eight of these, eight of these,
connect through three switch trays
into an eight Grace Hopper pod.
In these eight Grace Hopper pods,
each one of the Grace Hoppers
is connected to the other Grace Hoppers
at 900 gigabytes per second.
Eight of them connected together as a pod,
and then we connect 32 of them together
with another layer of switches.
And in order to build,
in order to build this,
256 Grace Hopper super chips are connected
into one exaFLOPS,
one exaFLOPS.
You know that countries and nations
have been working on exaFLOPS computing
and just recently achieved it.
256 Grace Hoppers for deep learning
is one exaFLOPS transformer engine.
And it gives us 144 terabytes of memory
that every GPU can see.
This is not 144 terabytes distributed.
This is 144 terabytes connected.
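(The topology arithmetic behind those numbers, using the per-chip figures from earlier in the talk:)

```python
# DGX GH200 arithmetic from the figures in the talk.
superchips = 256
per_pod = 8
pflops_per_chip = 4                     # transformer-engine petaFLOPS per Grace Hopper
memory_per_chip_gb = 96 + 480           # HBM3 plus LPDDR, "almost 600 GB"

print(f"pods: {superchips // per_pod}")                                   # 32
print(f"compute: ~{superchips * pflops_per_chip / 1000:.0f} exaFLOPS")    # ~1 exaFLOPS
print(f"memory: ~{superchips * memory_per_chip_gb / 1024:.0f} TB")        # ~144 TB
```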
Why don’t we take a look at what it really looks like?
Play it, please.
(machine whirs)
(audience applauds)
This is 150 miles of cables,
fiber optic cables,
2000 fans,
70,000 cubic feet per minute.
It probably recycles the air in this entire room
in a couple of minutes.
40,000 pounds.
Four elephants.
(audience laughs)
One GPU.
(audience applauds)
if I can get up on here.
This is actual size.
I wonder if this can play Crysis.
(audience laughs)
Only gamers know that joke.
So this is our brand new Grace Hopper
AI super computer.
It is one giant GPU.
Utterly incredible.
We’re building it now.
Every component is in production
and we’re so excited that Google Cloud, Meta and Microsoft
will be the first companies in the world to have access,
and they will be doing exploratory research
on the pioneering front,
the boundaries of artificial intelligence with us.
We will, of course, build these systems as products.
And so if you would like to have an AI supercomputer,
we would of course, come and install it in your company.
We also share the blueprints of this supercomputer
with all of our cloud partners,
so that they can integrate it into their networks
and into their infrastructure.
And we will also build it inside our company
for us to do research ourselves and do development.
So this is the DGX GH200.
It is one giant GPU.
Okay?
(audience applauds)
1964,
the year after I was born,
was a very good year for technology.
IBM, of course, launched the System/360
and AT&T demonstrated to the world
their first picture phone.
Encoded, compressed,
streamed over copper telephone wires, twisted pair,
and on the other end decoded:
a picture phone with a little tiny black and white screen.
To this day, this very experience is largely the same,
of course, at much, much higher volumes
for all of the reasons we all know well.
Video calls are now one of the most important things we do.
Everybody does it.
About 65% of the Internet’s traffic is now video,
and yet the way it’s done is fundamentally still the same.
Compress it on the device, stream it,
and decompress it on the other end.
Nothing changed in 60 years.
We treat communications like they go down a dumb pipe.
The question is,
what would happen if we applied generative AI to that?
We have now created a computer,
I showed you, Grace Hopper.
It can be deployed broadly all over the world, easily.
And as a result, every data center, every server
will have generative AI capability.
What would happen if, instead of
compression, streaming, and decompression,
the cloud applied generative AI to it?
Let’s take a look.
(upbeat music plays)
- [Voice Off-Screen] The future of wireless
and video communications will be 3D, generated by AI.
Let’s take a look at how Nvidia Maxine 3D
running on the Nvidia Grace Hopper super chip
can enable 3D video conferencing on any device
without specialized software or hardware.
Starting with a standard 2D camera sensor
that’s in most cell phones, laptops, and webcams,
and tapping into the processing power of Grace Hopper.
Maxine 3D converts these 2D videos to 3D
using cloud services.
This brings a new dimension to video conferencing.
With Maxine 3D visualization,
creating an enhanced sense of depth and presence,
you can dynamically adjust the camera
to see every angle, emotion,
engage with others more directly with enhanced eye contact
and personalize your experience with animated avatars.
Stylizing them with simple text prompts.
With Maxine’s language capabilities,
your avatar can speak in other languages,
even ones you don’t know.
-
Nvidia (speaks foreign language).
-
Nvidia (speaks foreign language)
-
[Voice Off-Screen] Nvidia Maxine 3D,
together with Grace Hopper,
bring immersive 3D video conferencing
to anyone with a mobile device,
revolutionizing the way we connect,
communicate, and collaborate.
- (speaks foreign language).
Okay, so all of the words,
all of the words coming out of my mouth, of course,
were generated by AI.
So instead of compression, streaming, and decompression,
in the future
communications will be perception, streaming,
and reconstruction, regeneration.
And it can be generated in all kinds of different ways.
It can be generated in 3D, of course,
it can regenerate your language in another language.
So we now have a universal translator.
This computing technology could be, of course,
placed into every single cloud.
But the thing that’s really amazing,
Grace Hopper is so fast, it can even run the 5G stack.
A state-of-the-art 5G stack
could just run in software in Grace Hopper, completely free.
Completely free.
All of a sudden a 5G radio runs in software,
just like a video codec used to run in software.
Now you can run a 5G stack in software.
Of course, the layer one, PHY layer,
the layer two, MAC layer,
and the 5G core,
all of that computation is quite intensive,
and it has to be timing precise,
which is the reason why we have
a BlueField-3 in the computer.
That gives us precision timing and networking,
and the entire stack can now run on a Grace Hopper.
Basically what’s happening here,
this computer that you’re seeing here
allows us to bring generative AI
into every single data center in the world today,
because we have software defined 5G,
the telecommunication network
can also become a computing platform,
like the cloud data centers.
Every single data center in the future could be intelligent.
Every data center could be software defined.
Whether it’s internet based, networking based,
or 5G communications based.
Everything will be software defined.
This is really a great opportunity
and we're announcing a partnership with SoftBank
to re-architect and implement generative AI
and a software defined 5G stack in the network
of SoftBank data centers around the world.
Really excited about this collaboration.
I just talked about how we are going to
extend the frontier of AI.
I talked about how we’re gonna scale out generative AI,
to scale out generative AI,
to advance generative AI,
but the number of computers in the world
is really quite magnificent.
Data centers all over the world.
And all of them over the next decade
will be recycled and re-engineered into
accelerated data centers
and generative AI capable data centers.
But there are so many different applications
in so many different areas.
Scientific computing, data processing,
large language model training that we've been talking about,
generative AI inference that we just talked about,
cloud and video and graphics,
EDA, SDA, which we just mentioned,
generative AI for enterprise.
And of course the Edge.
Each one of these applications
have different configurations of servers,
different focus of applications,
different deployment methods.
And so security is different.
Operating system is different.
How it’s managed is different.
Where the computers are will be different.
And so each one of these diverse application spaces
will have to be re-engineered with a new type of computer.
Well, this is just an enormous number of configurations.
And so today we’re announcing,
in partnership with so many companies here in Taiwan,
the Nvidia MGX,
it’s an open modular server design specification
and the design for accelerated computing.
Most of the servers today are designed
for general purpose computing.
The mechanical, thermal and electrical design
is insufficient for a very highly dense computing system.
Accelerated computers take, as you know, many servers
and compress them into one.
You save a lot of money,
you save a lot of floor space,
but the architecture is different.
And we designed it so that it’s
multi-generation standardized,
so that once you make an investment,
our next generation GPUs and next generation CPUs
and next generation DPUs
will continue to easily configure into it
so that we can have best time-to-market
and best preservation of our investment.
It is configurable into hundreds of configurations
for different, diverse applications,
and integrates into cloud or enterprise data centers.
So you could have either busbar or power regulators.
You could have cabling in the hot aisle
or cabling in the cold aisle.
Different data centers have different requirements,
and we’ve made this modular and flexible
so that it could address all of these different domains.
Now, this is the basic chassis.
Let’s take a look at some of the other things
you could do with it.
This is the Omniverse OVX server.
It has x86, 4 L40s, BlueField-3,
two CX7, six PCI express slots.
This is the Grace Omniverse server.
Grace, the same four L40s, BlueField-3, and two CX7s, okay?
This is the Grace cloud graphics server.
This is the Hopper NVLink generative AI inference server.
And we need sound effects like,
(makes swooshing sounds)
like that.
And then Grace Hopper 5G aerial server,
okay, for telecommunications?
Software defined telco.
Grace Hopper, 5G, aerial server, short,
and of course, Grace Hopper liquid cooled, okay?
For very dense servers.
And then this one is our dense general purpose
Grace super chip server.
This is just CPU.
And has the ability to accommodate four Grace CPUs
or two Grace super chips,
enormous amounts of performance.
And if you are, if your data center is power limited,
this CPU has incredible capabilities.
In a power limited environment running PageRank,
and there’s all kinds of benchmarks you can run,
but we ran PageRank,
at ISO performance, Grace only consumes
580 watts for the whole server.
Versus the latest generation CPU servers, x86 servers,
1090 watts.
It’s basically half the power at the same performance.
Or another way of saying, you know,
at the same power, if your data center is power constrained,
you get twice the performance.
Most data centers today are power limited,
and so this is really a terrific capability.
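(The arithmetic behind that claim; the "twice the throughput" line assumes throughput scales with the number of servers you can power.)

```python
# ISO-performance PageRank figures from the talk: same work, different power.
grace_watts, x86_watts = 580, 1090
print(f"Grace power vs x86: {grace_watts / x86_watts:.0%}")             # ~53%
# At a fixed power budget, assuming throughput scales with server count:
print(f"throughput at ISO power: ~{x86_watts / grace_watts:.1f}x")      # ~1.9x
```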
There are all kinds of different servers
that are being made here in Taiwan.
Let me show you one of them.
Get my exercise in today.
Whoosh.
I am the sound effect.
(audience laughs)
Okay, you’ve got BlueField-3, got the CX7,
you got the Grace Hopper.
There’s so many systems.
Let me show you some of them.
All of our partners, I’m so grateful,
you’re working on Grace, Grace Hopper,
Hopper, L40s, L4s, BlueField-3s.
Just about every single processor that we’re building
are configured into these servers of all different types.
And so this is Supermicro, this is Gigabyte.
Tens,
I think it’s like 70 different server configurations.
This is Ingrasys,
this is ASRock,
Tyan,
Wistron,
Inventec.
They’re just beautiful servers.
Pegatron.
We love servers, I love servers, they’re beautiful.
They’re beautiful to me.
QCT,
Asus,
Wiwynn,
ZT Systems.
And this ZT System,
what you’re looking at here is one of the pods
of our Grace Hopper AI supercomputer.
So I want to thank all of you.
I want to thank all of you for your great support.
Thank you.
(audience applauds)
We’re gonna expand AI into a new territory.
If you look at the world’s data centers,
the data center is now the computer
and the network defines what that data center does.
Largely there are two types of data centers today.
There’s the data center that’s used for hyperscale,
where you have application workloads of all different kinds.
The number of GPUs you connect to it
is relatively low.
The number of tenants is very high.
The workload is very heterogeneous.
The workloads are loosely coupled,
and you have another type of data center.
They’re like supercomputing data centers,
AI supercomputers,
where the workloads are tightly coupled.
The number of tenants is far fewer,
and sometimes just one.
Its purpose is high throughput
on very large computing problems.
Okay?
And it’s basically a standalone,
it’s basically a standalone.
And so supercomputing centers and AI supercomputers
and the world’s cloud, hyperscale cloud,
are very different in nature.
Ethernet, with TCP running on top of it,
is a lossy approach and it's very resilient.
And whenever there's a loss, a packet loss, it retransmits.
There’s error correction that’s done.
It knows which of the packets are lost
and requests the sender to retransmit them.
The ability for Ethernet to interconnect
components from almost anywhere
is the reason why the world's internet was created.
If it required too much coordination,
how could we have built today’s internet?
So Ethernet's profound contribution
is this lossy capability,
its resilient capability.
And because so,
it basically can connect almost anything together.
However, a supercomputing data center can’t afford that.
You can’t interconnect random things together
because on a billion dollar supercomputer,
the difference between achieving 95%
networking throughput
versus 50% is effectively $500 million.
So the cost of that one workload
running across the entire supercomputer
is so expensive that you can’t afford
to lose anything in the network.
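The dollar figure follows from treating network efficiency as the fraction of the machine's value you actually get to use; a minimal sketch of that arithmetic with the round numbers from the talk:

```python
# Treat network efficiency as the fraction of a supercomputer's value you can
# actually exploit; numbers are the round figures quoted in the talk.
cluster_cost = 1_000_000_000   # a "billion dollar supercomputer"
good_efficiency = 0.95
poor_efficiency = 0.50

stranded_value = cluster_cost * (good_efficiency - poor_efficiency)
print(f"value lost to the network: ${stranded_value:,.0f}")
# -> $450,000,000, i.e. "effectively $500 million" in round numbers
```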
InfiniBand relies on RDMA very heavily.
It is flow controlled.
It's a lossless approach.
It requires flow control,
which basically means
you have to understand the data center
from end to end,
from the switch to the NIC to the software,
so that you can orchestrate the traffic
with adaptive routing,
so that you can deal with congestion control
and avoid the oversaturation of traffic
in an isolated area, which results in packet loss.
You simply can't afford that,
because in the case of InfiniBand, it's lossless.
And so one is lossy, the other one's lossless.
One very resilient, the other very performant.
These two data centers have lived separate lives.
These two data centers have lived separate lives,
but now we would like to bring generative AI
to every data center.
The question is how.
How do we introduce
a new type of Ethernet that's, of course,
backwards compatible with everything,
but is engineered in a way
that achieves the kind of capabilities
that let us bring AI workloads
to any of the world's data centers?
This is a really exciting journey
and at the core of this strategy
is a brand new switch that we’ve made.
This is the Spectrum-4 switch.
And this switch,
everything I'm showing today is very heavy.
Whoosh, like that.
(audience applauds)
This is the Spectrum-4 switch,
128 ports of 400 gigabits per second.
128 ports of 400 gigabits per second,
51.2 terabits per second.
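The aggregate figure is simply ports times per-port line rate; a one-line check:

```python
# Aggregate switch bandwidth: ports x per-port line rate, as quoted above.
ports = 128
gbps_per_port = 400
total_tbps = ports * gbps_per_port / 1000
print(f"{total_tbps} Tb/s")   # -> 51.2 Tb/s
```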
This is the chip.
It’s gigantic.
100 billion transistors,
90 millimeters by 90 millimeters,
800 balls on the bottom.
This is a 500 watt chip.
This switch is 2,800 watts.
It’s air cooled.
There are 48 PCBs that connect the switch together.
48 PCBs that build up the switch.
And the switch is designed,
wait (indistinct), oh,
this switch is designed to enable a new type of Ethernet.
Remember what I said,
InfiniBand is fundamentally different
in the sense that we build InfiniBand from end to end,
so that we could do adaptive routing,
so that we could do congestion control,
so that we can isolate performance,
so we could keep noisy neighbors apart,
so that we could do in fabric computing.
All of these capabilities are simply not possible
in the lossy approach of the internet
and of course of Ethernet.
And so the way that we do InfiniBand
is designed from end to end.
Just the way supercomputers are built,
this is the way AI supercomputers are built.
And we are gonna do the same thing now,
for the very first time for ethernet,
we’ve been waiting for the critical part.
And the critical part is the Spectrum-4 switch.
The entire system consists of several things.
So our new ethernet system for AI,
(speaks foreign language),
is this,
the Spectrum-4 switch
and the BlueField-3 SmartNIC or DPU.
This BlueField-3 is a 400 gigabit per second NIC,
and it connects directly to the Spectrum-4 switch.
The combination of four things,
the switch, the BlueField-3,
the cables that connect them together,
which are super important,
and the software that runs it all together,
represents Spectrum-X.
This is what it takes to build a high performance network.
And we’re gonna take this capability to the world’s CSPs.
The reception has been incredible,
and the reason for that is, of course,
every CSP, every data center would like to turn
every single data center into a generative AI data center.
There are some customers that have
deployed Ethernet throughout their company,
and they have a lot of users for that data center.
Getting the capabilities of InfiniBand
while isolating it within their data center
is very difficult to do.
And so for the very first time,
we’re bringing the capabilities
of high performance computing into the ethernet market.
And we’re gonna bring to the ethernet market several things.
First, adaptive routing.
Adaptive routing basically says,
based on the traffic that is going through your data center,
depending on which one of the ports
of that switch is overcongested,
it will tell the sending BlueField-3
to send it to another port instead.
The BlueField-3 on the other end reassembles it
and presents the data to the computer, to the GPU,
without any CPU intervention.
All completely in RDMA.
Number one, adaptive routing.
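As a mental model only, not Spectrum-4's actual algorithm, adaptive routing amounts to steering each packet toward the least-loaded egress port and letting the receiving NIC reassemble the stream; a small sketch with made-up queue telemetry:

```python
# Toy model of adaptive routing: pick the least-congested egress port for each
# packet. Purely illustrative; not the Spectrum-4 implementation.

def pick_egress_port(port_queue_depths: dict) -> int:
    """Return the port with the shallowest queue (least congestion)."""
    return min(port_queue_depths, key=port_queue_depths.get)

def route(packets: int, port_queue_depths: dict) -> dict:
    """Spread packets across ports, always choosing the emptiest queue.
    The BlueField-3 on the receiving end would reorder and reassemble,
    so per-port ordering doesn't matter here."""
    for _ in range(packets):
        port = pick_egress_port(port_queue_depths)
        port_queue_depths[port] += 1
    return port_queue_depths

if __name__ == "__main__":
    queues = {0: 40, 1: 5, 2: 90, 3: 12}   # hypothetical telemetry snapshot
    print(route(100, queues))              # traffic flows toward ports 1 and 3
```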
Second, congestion control.
Congestion control: it is possible
for certain ports
to become heavily congested,
in which case the telemetry of the switch,
each switch, will see how the network is performing
and communicate to the senders,
"Please don't send any more data right away,
because you're congesting the network."
That congestion control requires
basically an overarching system,
which includes software in the switch
working with all of the endpoints
to manage the congestion,
the traffic and the throughput of the data center overall.
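A toy version of that telemetry-driven loop, ECN-style pacing in spirit, again just an illustration of the idea rather than the Spectrum-X implementation:

```python
# Toy telemetry-driven congestion control: the switch reports queue occupancy
# and senders back off when it crosses a threshold. Purely illustrative.

class Sender:
    def __init__(self, rate_gbps: float = 400.0):
        self.max_rate = rate_gbps
        self.rate_gbps = rate_gbps

    def on_telemetry(self, queue_fill: float):
        """queue_fill is the switch's reported buffer occupancy in [0, 1]."""
        if queue_fill > 0.8:          # "please don't send more right away"
            self.rate_gbps *= 0.5     # multiplicative back-off
        else:
            self.rate_gbps = min(self.max_rate, self.rate_gbps + 10.0)  # gentle ramp

if __name__ == "__main__":
    s = Sender()
    for fill in [0.2, 0.5, 0.9, 0.95, 0.6, 0.3]:
        s.on_telemetry(fill)
        print(f"queue {fill:.0%} -> sender rate {s.rate_gbps:.0f} Gb/s")
```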
Now it’s really important to realize
that in a high performance computing application,
every single GPU must finish its job
so that the application can move on.
In many cases, where you do all-reductions,
you have to wait for the results from every single one.
So if one node takes too long,
everybody gets held back.
This capability is going to increase
Ethernet's overall performance dramatically.
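The straggler effect is easy to see with a little arithmetic: the step time of a tightly coupled job is set by the slowest worker, not the average. A minimal sketch with made-up timings:

```python
# A tightly coupled step (e.g. one ending in an all-reduce) finishes only when
# the slowest worker finishes, so one delayed node drags the whole cluster.
# Timings below are made up for illustration.
node_times_ms = [100] * 255 + [190]   # 255 healthy nodes, one straggler

step_time = max(node_times_ms)                       # everyone waits
ideal_time = sum(node_times_ms) / len(node_times_ms)

print(f"ideal (average) step time: {ideal_time:.1f} ms")
print(f"actual step time:          {step_time} ms")
print(f"cluster efficiency:        {ideal_time / step_time:.0%}")
```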
So Spectrum-X, really excited to roll this out.
The world’s applications,
the world’s enterprise has yet to enjoy generative AI.
So far we’ve been working with CSPs.
And the CSPs, of course,
are going to be able to bring generative AI
to many different regions
and many different applications and industries.
The big journey is still ahead of us.
There are so many enterprises in the world,
and because of the multi-modality capability
that I was mentioning before,
every industry can now benefit from generative AI.
There’s several things that we have to do.
Number one,
we have to help the industries build custom language models.
Not everybody can use the language models
that are available in a public service.
Some customers need language models
that are highly specialized for their particular modality.
For example, proteins or chemicals.
Each one of these industries has proprietary information,
and so how can we help them do that?
We have created a service called NVIDIA AI Foundation.
It is a cloud service that captures NVIDIA’s AI expertise
and makes it possible for you to train your own AI models.
We will help you develop your own AI models
with supervised fine-tuning, with guardrailing,
with proprietary knowledge bases,
and with reinforcement learning from human feedback,
so that this AI model is perfect for your application.
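The customization flow described here, a pretrained base model refined with supervised fine-tuning, RLHF-style alignment, guardrails and a proprietary knowledge base, can be pictured structurally as a pipeline. In the sketch below every function is a stand-in stub so the example runs; none of it is the actual NVIDIA AI Foundation API:

```python
# Structural sketch only: each stage is a stub so the pipeline runs end to end.
# None of this corresponds to the real NVIDIA AI Foundation interface.

def supervised_fine_tune(model: dict, corpus: list) -> dict:
    """Stand-in for supervised fine-tuning on proprietary domain data."""
    return {**model, "sft_examples": len(corpus)}

def rlhf_align(model: dict, preferences: list) -> dict:
    """Stand-in for reinforcement learning from human feedback."""
    return {**model, "preference_pairs": len(preferences)}

def add_guardrails(model: dict, rules: list) -> dict:
    """Stand-in for guardrailing plus a proprietary knowledge base."""
    return {**model, "guardrails": rules, "knowledge_base": "customer_docs"}

base = {"name": "pretrained-foundation-model"}
custom = add_guardrails(
    rlhf_align(
        supervised_fine_tune(base, corpus=["domain doc 1", "domain doc 2"]),
        preferences=[("preferred answer", "rejected answer")],
    ),
    rules=["answer only from the approved knowledge base"],
)
print(custom)
```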
We then deploy this model to run on Nvidia AI Enterprise.
This is the operating system
that I was talking to you about earlier.
This operating system runs in every single cloud.
This very simple system,
with Nvidia AI Foundation for training large language models
and Nvidia AI Enterprise for deploying them,
which is available in every single cloud and on-prem,
allows every single enterprise to engage.
Now, one of the things that very few people realize
is that today there’s only one software stack
that is enterprise secure and enterprise grade.
That software stack is the CPU software stack.
And the reason for that is
because in order to be enterprise grade,
it has to be enterprise secure,
it has to be enterprise managed
and enterprise supported
across its entire lifecycle.
There is so much software in accelerated computing.
Over 4,000 software packages are what it takes
for people to use accelerated computing today.
In data processing and training and optimization,
all the way to inference.
So for the very first time,
we are taking all of that software
and we’re gonna maintain it
and manage it like Red Hat does for Linux.
Nvidia AI Enterprise will do it
for all of NVIDIA’s libraries.
Now, enterprise can finally have
an enterprise grade and enterprise secure software stack.
This is such a big deal.
Otherwise,
even though the promise of accelerated computing
is possible for many researchers and scientists,
it’s not available for enterprise companies.
And so let’s take a look at the benefit for them.
This is a simple image processing application.
If you were to do it on a CPU versus on a GPU
running Nvidia AI Enterprise,
you’re getting 31.8 images per minute
or basically 24 times the throughput,
or you only pay 5% of the cost.
This is really quite amazing.
This is the benefit of accelerated computing in the cloud.
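A back-of-envelope reading of those numbers: with 24x the throughput, the same job needs 1/24 of the instance-hours, so even a pricier GPU instance ends up far cheaper per image. The hourly price ratio below is an assumption back-solved so the talk's two figures line up:

```python
# Back-of-envelope on the throughput/cost claim. The 24x speedup and ~5% cost
# come from the talk; the hourly price ratio is an illustrative assumption.
speedup = 24          # GPU vs CPU throughput, per the talk
price_ratio = 1.2     # assumed GPU-vs-CPU hourly price ratio

cost_ratio = price_ratio / speedup   # relative cost per processed image
print(f"relative cost per image: {cost_ratio:.0%}")   # -> 5%
```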
But for many companies, for enterprises, it is simply not possible
unless you have the stack.
Nvidia AI Enterprise is now fully integrated into
AWS, Google Cloud, Microsoft Azure, and Oracle Cloud.
And so when you go and deploy your workloads in those clouds
and you want software that is enterprise grade,
or if you have customers that need enterprise grade software
Nvidia AI Enterprise is ready for you.
It is also integrated into the world’s
machine learning operations pipeline.
As I mentioned before,
AI is a different type of workload
and this new type of software
has a whole new software industry around it.
And this software industry,
a hundred percent of them,
we have now connected with Nvidia AI Enterprise.
Now lemme talk to you about the next phase of AI,
where AI meets a digital twin.
Now, why does AI need a digital twin?
I’m gonna explain that in just a second,
but first, let me show you what you can do with it.
In order for AI to have a digital twin,
in order for AI to understand heavy industry,
remember, so far AI has only been used for light industry,
information, words, images, music, so on and so forth.
If we want to use AI for heavy industry,
the $50 trillion of manufacturing,
much of which you're part of,
the trillions of dollars of healthcare,
all of the different manufacturing sites,
whether you’re building chip fabs or battery plants
or electric vehicle manufacturing factories,
all of these would have to be digitized
in order for artificial intelligence to be used,
to design, to build and to automate
the future of your business.
And so the first thing that we have to do
is we have to create the ability for their world
to be represented in digital.
Okay, so number one is digitalization.
Well, how would you use that?
So let me give you just a simple example.
In the future, you would say to your robot,
I would like you to do something,
and the robot will understand your words
and it would generate animation.
Remember I said earlier,
you can go from text to text,
you can go from text to image,
you can go from text to music.
Why can’t you go from text to animation?
And so of course, in the future,
robotics will be highly revolutionized
by the technology we already have in front of us.
However,
how does this robot know
that the motion that it is generating
is grounded in reality?
That it is grounded in physics?
You need a software system
that understands the laws of physics.
Now, you've actually seen this already with ChatGPT.
Whereas our AI, Nvidia AI, would use Nvidia Omniverse
in a reinforcement learning loop to ground itself,
you have seen ChatGPT do this
using reinforcement learning from human feedback.
Using humans' feedback,
ChatGPT was able to be developed
by grounding it in human, well, sensibility,
and aligning it with our principles.
So reinforcement learning from human feedback
is really important.
Reinforcement learning from physics feedback
is very important.
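A minimal sketch of what reinforcement learning from physics feedback could look like in principle: a policy proposes a motion, a physics check scores how feasible it is, and that score is the only reward. Both the "simulator" and the one-parameter policy below are toy stand-ins, not Isaac Sim or Omniverse:

```python
import random

def physics_feedback(step_length: float) -> float:
    """Toy 'physics simulator' reward: full marks for feasible motions, with a
    penalty growing as the commanded step exceeds the feasible limit."""
    max_feasible = 0.5                                  # assumed limit, metres
    excess = max(abs(step_length) - max_feasible, 0.0)
    return 1.0 - excess

def train(episodes: int = 2000) -> float:
    """Hill-climb a single policy parameter (step length) using only the
    simulator's reward as the grounding signal."""
    step = 1.5                                          # starts out implausible
    for _ in range(episodes):
        candidate = step + random.gauss(0.0, 0.1)       # propose a variation
        if physics_feedback(candidate) > physics_feedback(step):
            step = candidate                            # keep the better motion
    return step

if __name__ == "__main__":
    learned = train()
    print(f"learned step length: {learned:.2f} m "
          f"(reward {physics_feedback(learned):.2f})")  # ends up feasible
```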
Let me show you everything
that you’re about to see is a simulation.
Let’s roll it, please.
(calm music plays)
(rousing music plays)
(rousing music continues)
(calm music plays)
(rousing music plays)
Everything was a simulation.
Nothing was art, everything was simulation.
Isn’t that amazing?
(audience applauds)
In the last 25 years,
I come to Taiwan, you sell me things.
(audience laughs)
Omniverse will be the first thing I’m gonna sell you.
And this,
(audience applauds)
because this will help you revolutionize your business
and turn it into a digital business
and automate it with AI.
You will build products digitally first,
before you make them physical.
You will build and plan factories digitally first
before you make them physical.
And so in the future, Omniverse is a very big deal.
Now, I’m gonna show you very quickly,
Omniverse in the cloud.
Omniverse, the entire stack is so complicated.
And so we put the whole thing into a cloud managed service
and it’s hosted in Azure.
This particular experience you’re gonna have,
the computer is in California.
And Sean, I’m sorry I took so much time,
so you’re gonna have to (indistinct).
-
[Sean] We’ll go quick.
-
[Jensen Huang] Okay.
-
[Sean] So this is,
let’s take a look at the Omniverse cloud.
So this is, you know, just a browser.
And we’re looking now into Omniverse Factory Explorer.
It’s running 10,000 kilometers away
in our Santa Clara headquarters,
and we’re leveraging the power of our data center now
to visualize this factory floor.
We're using real factory data
from Siemens and Autodesk Revit to take a look.
It’s a cloud application,
so we can have multiple users collaborating.
Let's go ahead and bring up Eloise's screen.
And we can see,
now we have these two users in this environment,
and Jeff on the left there is gonna look at some markup.
We have this task to perform.
We need to move this object.
So we can have Eloise just go ahead
and grab that conveyor belt, move it over,
and as he does so,
you’ll see that it’s reflected
accurately and completely in real time on Jeff’s screen.
So we’re able to collaborate with multiple users.
And even in bringing up this demo,
we had users from around the globe working on the process.
East and West Coast, United States, Germany, even Sydney,
and of course here in Taipei to put this together.
Now, if we're modifying our production line,
of course one of the things we’ll want to do
is add the necessary safety equipment.
So we’re able to simply drag and drop items into Omniverse
and modify this production environment
and begin tweaking this
and optimizing for performance
even before we break ground on construction.
- [Jensen Huang] That is so cool.
This is in California,
6,264 miles away or something like that.
34 milliseconds at the speed of light, one way.
And it’s completely interactive.
Everything is Ray traced.
No art is necessary.
You bring everything, the entire CAD into Omniverse,
open up a browser,
bring your data in,
bring your factory in.
No art is necessary.
The lighting just does what the lighting does,
physics does what the physics does,
if you wanna turn off physics, you can,
if you wanna turn on physics, you can.
And multiple users, as many as you like,
can enter the Omniverse at the same time and work together.
One unified source of data across your entire company.
You could virtually build,
you could virtually design and build
and operate your factory
before you break ground, and not make the mistakes
that usually, at the beginning of integration,
create a lot of change orders,
which cost a lot of money.
Thank you very much, Sean.
Good job.
(audience applauds)
Notice, just now,
it was humans interacting with Omniverse.
In the future, Sean will even have a generative AI,
an AI, interact with him in Omniverse.
We could, of course,
imagine in the very beginning
there was (indistinct) that could be a character,
that could be one of the users of Omniverse
interacting with you, answering questions, helping you.
We can also use generative AI
to help us create virtual worlds.
So for example, this is a bottle
that’s rendered in Omniverse
that could be placed in a whole bunch
of different type of environments.
It could render beautifully physically.
You could place it just by giving it a prompt, saying,
I would like to put these bottles
in a lifestyle-photograph-style backdrop
of a modern, warm farmhouse bathroom.
Change the background,
and everything is integrated and rendered again.
Okay, so generative AI will come together with Omniverse
to assist the creation of virtual worlds.
Today we’re announcing that WPP,
the world’s largest advertising agency
and advertising services company
is partnering with Nvidia to build
a content generation engine
based on Omniverse and generative AI.
It integrates tools from so many different other partners,
Adobe Firefly for example, Getty, Shutterstock.
And it integrates into this entire environment
and it makes it possible for them
to generate unique content for different users,
for ad applications, for example.
So in the future, whenever you engage a particular ad,
it could be generated just for you,
but yet the product is precisely rendered,
because of course the product integrity is very important.
And so when you engage
a particular ad today, it is retrieved.
Today's computing model is that
when you engage information, it is retrieved.
In the future, when you engage information,
much of it will be generated.
Notice the computing model has changed.
WPP generates 25% of the ads that the world sees.
60% of the world’s largest companies are already clients.
And so they made a video
of how they would use this technology.
(calm music plays)
- [Voice Off-Screen] The world’s industries
are racing to realize the benefits of AI.
Nvidia and WPP are building a groundbreaking
generative AI enabled content engine
to enable the next evolution
of the $700 billion digital advertising industry.
Built on Nvidia AI and Omniverse
this engine gives brands the ability
to build and deploy highly personalized
and compelling visual content,
faster and more efficiently than ever before.
The process starts by building
a physically accurate digital twin of a product
using Omniverse Cloud,
which connects product design data
from industry standard tools.
Then WPP artists create customized and diverse virtual sets
using a combination of digitized environments
and generative AI tools by organizations,
such as Getty Images and Adobe,
trained on fully licensed data using Nvidia Picasso.
(rousing music plays)
This unique combination of technologies allows WPP
to build accurate photorealistic visual content
and e-commerce experiences
that bring new levels of realism and scale to the industry.
(audience applauds)
(Jensen Huang knocks) (audience laughs)
- (speaks foreign language), (audience laughs)
no problem, we continue.
Okay, so that’s WPP.
You could see
that that was an example.
If you think about it for a second,
that's an example of a company using digital information
that was created in design,
and carrying that digital information all the way into marketing.
I’m gonna show you now how we’re gonna use Omniverse and AI
here in Taiwan and we’re gonna use it for manufacturing.
Manufacturing, as you know,
is one of the largest industries in the world.
We're gonna use Omniverse to teach an AI,
and then we're gonna use Metropolis,
our AI edge deployment system, to deploy the AI.
Okay, run it.
(upbeat music plays)
- [Voice Off-Screen] The $45 trillion
global manufacturing industry
is comprised of 10 million factories operating 24/7.
Enterprises are racing to become software defined
to ensure they can produce high quality products
as quickly and cost efficiently as possible.
Let’s see how electronics manufacturer Pegatron
uses Nvidia AI and Omniverse to digitalize their factories.
In Omniverse, they start by building
a digital twin of their factory,
unifying disparate 3D and CAD data sets
to provide a real-time view of their complex factory data
to their planners and suppliers.
In the cloud-native digital twin,
planners can then optimize the layout virtually
before deploying changes to the real factory.
The digital twin is also used
as a training ground, a data factory,
for Pegatron's perception AIs.
They use Nvidia Isaac Sim built on Omniverse
to simulate and optimize their fleet of mobile robots,
which help move materials throughout the facility
as well as the pick and place robotic arms
that assist on production lines.
In the fully operational factory,
Pegatron deploys automated optical inspection or AOI points
along their production lines,
which reduces cost and increases line throughput.
Nvidia Metropolis enables Pegatron
to quickly develop and deploy cloud native,
highly accurate AOI workflows
across their production lines.
Omniverse Replicator generates synthetic data sets
of PCBA defects,
which are too complex and costly
to capture in the real world,
like scratches and missing or misaligned components.
Pegatron then combines the synthetic data
with Nvidia pre-trained models,
Nvidia TAO for training, adaptation and optimization,
and NVIDIA DeepStream for realtime inference.
Resulting in AOI performance
that is 99.8% accurate
with a four times improvement in throughput.
With software defined factories
built on Nvidia AI and Omniverse
manufacturers can super accelerate factory bring up
and minimize change orders,
continuously optimize operations,
maximize production line throughput,
all while reducing costs.
- Did you see that?
The whole factory is in Omniverse.
(audience applauds)
It’s completely digital.
Imagine if you have digital information in your hands,
what can you do with it?
Almost everything.
And so this is one of the things that’s really exciting.
What you just saw is basically every factory in the future
will be digital, of course, first.
Every factory will be a robot.
Inside the factories there will be other robots
that the factory is orchestrating.
We are also going to build robots that move themselves.
So far the robots that you saw are stationary.
Now we’re gonna also have robots that move.
Everything that moves in the future
will have artificial intelligence
and will have robotic capability.
And so today we're announcing that our robot platform,
Nvidia Isaac AMR, is now available as a reference design
for anybody who wants to build robots.
Just like we did with our high performance computing.
Nvidia builds the whole stack.
And then we disaggregate it,
so that if you would like to buy the chip, that’s fine,
if you’d like to buy the system, that’s fine,
if you'd like to use your own software, that's fine,
if you'd like to use our software, that's fine.
If you’d like to use your own algorithm, that’s terrific,
if you’d like to use ours, that’s terrific.
However you would like to work with us,
we’re open for business.
So that we can help you integrate accelerated computing
wherever you like.
In the future, we’re gonna do the same with robotics.
We built the entire robotics stack top to bottom
from the chip to the algorithms.
We have state-of-the-art perception
for multi-modality sensors,
state-of-the-art mapping,
state-of-the-art localization and planning,
and a cloud mapping system.
Everything has been created.
However you would like to use it.
You can use pieces of it.
It’s open, available for you,
including all the cloud mapping systems.
So this is Isaac AMR.
It starts with a chip called Orin.
It goes into a computer,
and then into the Nvidia Nova Orin,
which is a reference system,
a blueprint for AMRs.
This is the most advanced AMR in the world today.
And that entire stack has been built.
And let’s take a look at it.
(upbeat music plays)
- [Voice Off-Screen] To improve productivity
and increase worker safety,
factories and warehouses are migrating away
from manual forklifts and guided vehicles to full autonomy.
Nvidia Isaac AMR provides an integrated end-to-end solution
to deploy fully autonomous mobile robots.
The core of the solution is Nova Orin,
a sensor suite and computing hardware
that enables mapping, autonomy and simulation.
Nova’s collection of advanced sensors
speeds the mapping process,
leveraging our cloud-based service
to generate an accurate and detailed 3D voxel map.
This 3D map can then be sliced across a plane
to generate 2D maps tailored for different autonomous robots
that might operate in a facility.
With these maps in place, on-robot lidar
or cost-effective cameras
provide autonomous navigation
that works reliably in the most complex
and dynamic environments.
Isaac mission control optimizes route planning
using the (indistinct) library.
To improve operations developers can use Isaac Sim
and NVIDIA Omniverse to create realistic digital twins
of the operating environment.
This allows fully autonomous robots
to be trained on complex tasks entirely in simulation.
All operations can be fully validated
using Isaac Sim before deployment to the real world.
Isaac AMR accelerates your migration to full autonomy,
reducing costs, and speeding deployment
of the next generation of AMRs.
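The voxel-map-to-2D-map step mentioned in that video is easy to picture: collapse the band of voxel layers at a given robot's height into a flat occupancy map. A small numpy sketch with made-up data, not anything from Isaac:

```python
import numpy as np

# Toy 3D voxel occupancy grid (x, y, z); True = occupied. Made-up data.
rng = np.random.default_rng(0)
voxels = rng.random((40, 40, 20)) > 0.97    # sparse obstacles
voxels[:, :, 0] = True                      # the floor layer is occupied

def slice_to_2d(voxel_grid, z_min: int, z_max: int):
    """Collapse the voxel layers between z_min and z_max (a robot's height
    band) into a 2D occupancy map: a cell is blocked if any voxel in that
    band is occupied."""
    return voxel_grid[:, :, z_min:z_max].any(axis=2)

# Different robots care about different height bands.
amr_map = slice_to_2d(voxels, z_min=1, z_max=6)        # low mobile robot
forklift_map = slice_to_2d(voxels, z_min=1, z_max=15)  # taller vehicle
print(amr_map.shape, int(amr_map.sum()), int(forklift_map.sum()))
```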
- Nova cannot tell that it is not
in the real environment.
Nova thinks it is in the real environment.
It cannot tell.
And the reason for that is because all the sensors work,
physics work, it can navigate, it can localize itself.
Everything is physically based.
So therefore we could design, we could design,
(speaks foreign language),
therefore we can design the robot,
simulate the robot,
train the robot all in Isaac,
and then we take the brain,
the software trained in Isaac Sim,
and we put it into the actual robot.
And with some amount of adaptation,
it should be able to perform the same job.
This is the future of robotics.
Omniverse and AI working together.
The ecosystem that we have been in, the IT ecosystem,
is a quarter of a trillion dollars per year,
$250 billion a year.
This is the IT industry.
For the very first time in our history together,
we finally have the ability to understand
the language of the physical world.
We can understand the language of heavy industry
and we have a software tool.
We have a software system called Omniverse
that allows us to simulate, to develop,
to build and operate our physical plants,
our physical robots,
our physical assets,
as if they were digital.
The excitement in the hard industries,
the heavy industries has been incredible.
We have been connecting Omniverse all over the world
with tools companies, robotics companies, sensor companies,
all kinds of industries.
There are three industries right now, as we speak,
that are putting enormous investments into the world.
Number one, of course, is the chip industry.
Number two, electric battery industry.
Number three, electric vehicle industry.
Trillions of dollars will be invested
in the next several years,
trillions of dollars will be invested
in the next several years.
And they would all like to do it better
and they would like to do it in a modern way.
For the very first time we now give them a system,
a platform, tools that allows them to do that.
I wanna thank all of you for coming today.
I talked about many things.
It’s been a long time since I’ve seen you,
so I had so much to tell you.
(audience laughs)
It was too much,
it was too much.
Last night I said this is too much.
This morning I said this is too much.
And now I realize it’s too much.
(speaks foreign language)
(audience laughs and applauds)
I told you, I told you several things.
I told you that we are going through two
simultaneous computing industry transitions,
accelerated computing and generative AI.
Two.
This form of computing
is not like the traditional general purpose computing.
It is full stack.
It is data center scale
because the data center is the computer.
And it is domain specific,
for every domain that you want to go into,
every industry you go into,
you need to have the software stack.
And if you have the software stack,
then the utility, the utilization of your machine,
the utilization of your computer will be high.
So, number two,
it is full stack, data center scale and domain specific.
We are in full production of the engine of generative AI
and that is HGX H100.
Meanwhile, this engine that’s gonna be used for AI factories
will be scaled out using Grace Hopper,
the engine that we created for the era of generative AI.
We also took Grace Hopper
and realized that we can extend on the one hand
the performance,
but we also have to extend the fabric
so that we can make larger models trainable.
And we took Grace Hopper, connected it into a 256-node NVLink system,
and created the largest GPU in the world, DGX GH200.
We’re trying to extend generative AI
and accelerated computing
in several different directions at the same time.
Number one,
we would like, of course, to extend it in the cloud,
so that every cloud data center can be an AI data center,
not just AI factories and hyperscale.
But every hyperscale data center
can now be a generative AI data center.
And the way we do that is the Spectrum-X.
It takes four components to make Spectrum-X possible.
The switch, the BlueField-3 NIC,
the interconnects themselves,
the cables are so important in high speed communications
and the software stack that goes on top of it.
We would like to extend generative AI
to the world’s enterprise.
And there are so many different configurations of servers.
And the way we're doing that is in partnership
with our Taiwanese ecosystem,
with the MGX modular accelerated computing systems.
We put Nvidia in the cloud
so that every enterprise in the world
can engage us to create generative AI models
and deploy them in an enterprise grade,
enterprise secure way
in every single cloud.
And lastly,
we would like to extend AI to the world’s heavy industries,
the largest industries in the world.
So far, our industry, the industry that
all of us have been part of,
has been a small part of the world's total industry.
For the very first time the work that we’re doing
can engage every single industry.
And we do that by automating factories, automating robots.
And today we even announced our first
full robotics reference stack, Nova Orin.
I wanna thank all of you
for your partnership over the years.
Thank you.
(audience applauds)
- (upbeat accompaniment plays) ♪ I am here at Computex
♪ I hope that you do like me best ♪
♪ now my song say lo-long thank you ♪
♪ from Nvidia ♪