Transcript
(calm music plays)
- [Voice Off-Screen] I am a translator,
transforming text into creative discovery,
turning movement into animation
and infusing words with emotion.
-
(speaks foreign language)
-
[Voice Off-Screen] I am a healer,
exploring the building blocks that make us unique,
discovering new threats before they happen.
And searching for the cures to keep them at bay.
I am a visionary
creating new medical miracles
and unlocking the secrets of our sun
to keep us safer here on Earth.
I am a navigator finding a single moment
in a sea of content.
-
We are announcing the next generation.
-
[Voice Off-Screen] In the perfect setting
for our most amazing stories.
I am a creator
adding new dimensions to creative expression
and reimagining our virtual selves.
I am a helper,
personalizing our surroundings.
-
[Person Off-Screen] Help me arrange the living room.
-
[Voice Off-Screen] harnessing the wisdom
of a million programmers
and turning the real world into a virtual playground.
I even helped write this script,
breathed life into the words and composed the melody.
(rousing music plays)
I am AI brought to life by Nvidia,
deep learning and brilliant minds everywhere.
- [Announcer] Ladies and gentlemen,
please welcome Nvidia founder and CEO Jensen Huang
- [2nd Announcer] (speaks foreign language)
(rousing music plays)
(audience applauds)
- (speaks foreign language)
(audience cheers)
We’re back.
(audience applauds and cheers)
Our first live event in almost four years.
I haven’t given a public speech in four years.
Wish me luck.
(audience laughs)
(audience applauds)
I have a lot to tell you, very little time,
so let’s get going.
Ray Tracing,
simulating the characteristics of light and materials
is the ultimate accelerated computing challenge.
Six years ago we demonstrated for the very first time
rendering this scene in less than a few hours.
After a decade of research,
we were able to render this scene in seconds,
15 seconds on our highest end GPU six years ago.
And then we invented Nvidia RTX
and combined three fundamental technologies,
hardware-accelerated ray tracing,
artificial intelligence
processing on Nvidia Tensor core GPUs,
and brand new algorithms.
Let’s take a look at the difference in just five years,
roll it.
This is running on CUDA GPUs six years ago
rendering this beautiful image
that would’ve otherwise taken a couple of hours on a CPU.
So this was a giant breakthrough already,
enormous speed up running on accelerated computing.
And then we invented the RTX GPU.
Run it, please.
(upbeat music plays)
The holy grail of computer graphics, Ray tracing,
is now possible in real time.
This is the technology we have put into RTX
and this after five years
is a very important time for us
because for the very first time
we took our third generation Ada architecture, RTX GPUs
and brought it to the mainstream with two new products
that are now completely in production.
Are you?
I got that backwards.
Everything looks different inside out and upside down.
(audience laughs)
Okay, this is our brand new,
right here you're looking at an Ada GPU
running ray tracing and artificial intelligence
at 60 frames a second.
It's 14 inches, it weighs almost nothing.
It’s more powerful than the highest end PlayStation
and this is the RTX 4060 Ti for our core gamers.
Both of these are now in production.
Our partners here in Taiwan are producing
both of these products in very, very large volumes
and I’m really excited about ’em.
Thank you very much.
(audience applauds)
I can almost put this in my pocket.
(audience laughs)
AI made it possible for us to do that.
Everything that you saw
would’ve been utterly impossible without AI.
For every single pixel we render,
we use AI to predict seven others.
For every pixel we compute,
AI predicted seven others.
The amount of energy we save,
the amount of performance we get is incredible.
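(A quick back-of-the-envelope on that one-in-eight ratio; the 4K frame size below is just an illustrative assumption.)

```python
# Illustrative only: if AI predicts 7 of every 8 output pixels,
# only 1/8 of a frame is fully rendered. The 4K frame size is an assumption.
output_pixels = 3840 * 2160
rendered = output_pixels // 8
ai_predicted = output_pixels - rendered
print(f"rendered: {rendered:,}  AI-predicted: {ai_predicted:,}")
print(f"fraction rendered: {rendered / output_pixels:.3f}")   # 0.125
```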
Now of course,
I showed you the performance on those two GPUs,
but it wouldn’t have been possible
if not for the super computer back at Nvidia
running all the time training the model
so that we can enhance applications.
So the future is what I demonstrated to you just now,
you can extrapolate almost everything
that I’m gonna talk about for the rest of the talk
into that simple idea
that there will be a large computer writing software
developing and deploying software that is incredible,
that can be deployed in devices all over the world.
We used AI to render this scene.
We’re gonna also use AI to bring it alive.
Today we’re announcing Nvidia ACE, Avatar Cloud Engine,
that is designed for animating, for bringing a digital avatar to life.
It has several characteristics, several capabilities,
speech recognition, text-to-speech,
natural language understanding,
basically a large language model,
and using the sound
that you generate with your voice,
it animates the face,
and using the sound and the expression of what you're saying,
it animates your gestures.
All of this is completely trained by AI.
We have a service that includes pre-trained models
that you can come, developers can come,
and modify and enhance for your own application,
for your own story because every game has a different story.
And then you can deploy it in the cloud
or deploy it on your device.
It has a great backend, it has TensorRT.
TensorRT is Nvidia's deep learning optimizing compiler,
and you can deploy it on Nvidia GPUs
as well as export ONNX, an industry standard format,
so that you can run it on any device.
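(To make that flow concrete, here is a rough sketch of how such an avatar loop could be wired together. Every function name below is a hypothetical placeholder, not the actual ACE API.)

```python
# Hypothetical sketch of an ACE-style avatar turn: listen, think, speak, animate.
# Each function is a stub standing in for a real service (ASR, LLM, TTS, audio-to-face).

def speech_to_text(audio: bytes) -> str:
    """Placeholder for automatic speech recognition."""
    ...

def generate_reply(player_text: str, backstory: str) -> str:
    """Placeholder for a large language model conditioned on the character's backstory."""
    ...

def text_to_speech(text: str) -> bytes:
    """Placeholder for text-to-speech synthesis."""
    ...

def animate_face(audio: bytes) -> dict:
    """Placeholder for audio-driven facial animation (e.g. blend-shape weights)."""
    ...

def avatar_turn(player_audio: bytes, backstory: str) -> tuple[bytes, dict]:
    """One conversational turn for an NPC like the ramen-shop owner in the demo."""
    heard = speech_to_text(player_audio)
    reply = generate_reply(heard, backstory)
    voice = text_to_speech(reply)
    face = animate_face(voice)
    return voice, face
```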
Let’s take a look at this scene in just a second,
but let me first tell you about it.
It is completely rendered with Ray tracing.
Notice the beautiful lights,
so many different lights
and all of the different lights
are projecting light from that source.
So you have all kinds of direct lights,
you have global illumination.
You’re gonna see incredibly beautiful
shadows and physics simulation
and notice the character,
the beautiful rendering of the character.
Everything is done in Unreal Engine 5.
We partnered with an avatar framework,
an avatar tool maker called (indistinct),
and together we developed this demo you’re about to see.
Okay, run please.
(upbeat music plays)
Everything is real time.
-
Hey Jen, how are you?
-
Unfortunately not so good.
-
How come?
-
I'm worried about the crime around here.
It’s gotten bad lately.
My ramen shop got caught in the crossfire.
-
Can I help?
-
If you want to do something about this,
I have heard rumors that the powerful crime lord Kumon Aoki
is causing all sorts of chaos in the city.
He may be the root of this violence.
-
I’ll talk to him, where can I find him?
-
I have heard he hangs out
in the underground fight clubs on the city’s east side.
Try there.
-
Okay, I’ll go.
-
Be careful, Kai.
-
None of that conversation was scripted.
We gave that AI, this Jin AI character, a backstory,
his story about his ramen shop
and the story of this game.
And all you have to do is go up and talk to this character.
And because this character
has been infused with artificial intelligence
and large language models,
it can interact with you,
understand your meaning and interact with you
in a really reasonable way.
All of the facial animation completely done by the AI,
we have made it possible
for all kinds of characters to be generated.
They all have their own domain knowledge.
You can customize it,
so everybody's game is different,
and look how wonderfully beautiful
and natural they are.
This is the future of video games.
Not only will AI contribute to the rendering
and the synthesis of the environment,
AI will also animate the characters.
AI will be a very big part of the future of video games.
The most important computer
of our generation is unquestionably the IBM System/360.
This computer revolutionized several things.
It was the first computer in history
to introduce the concept of a central processing unit,
the CPU,
virtual memory,
expandable I/O,
multitasking,
the ability to scale this computer
for different applications
across different computing ranges.
And one of the most important contributions
and one of its greatest insights
is the importance of preserving software investment.
The software ran across the entire range of computers
and it ran across multiple generations.
So the software you develop is preserved.
IBM recognized the importance of software,
recognized the importance of preserving your investment,
and very importantly
recognized the importance of installed base.
This computer revolutionized not only computing
and many of us grew up reading the manuals of this computer
to understand how computer architecture worked,
to even learn about DMA for the very first time,
this computer not only revolutionized computing,
it revolutionized the thinking of the computer industry.
System/360 and the programming model of the System/360
have largely been retained until today, 60 years later.
In 60 years, a trillion dollars' worth
of the world's data centers
have all basically used a computing model
that was invented all the way back 60 years ago.
Until now.
There are two fundamental transitions
happening in the computer industry today.
All of you are deep within it and you feel it.
There are two fundamental trends.
The first trend is because CPU scaling has ended,
the ability to get 10 times more performance
every five years has ended.
The ability to get 10 times more performance
every five years at the same cost
is the reason why computers are so fast today.
The ability to sustain 10 times more computing
every five years without an increase in power
is the reason why the world's data centers
haven't consumed so much more power on Earth.
That trend has ended and we need a new computing approach
and accelerated computing is the path forward.
It happened at exactly the time
when a new way of doing software was discovered,
deep learning,
these two events came together
and it’s driving computing today.
Accelerated computing and generative AI.
This way of doing software,
this way of doing computation
is a reinvention from the ground up and it’s not easy.
Accelerated computing is a full stack problem.
It’s not as easy as general purpose computing.
The CPU is a miracle.
High level programming languages, great compilers,
almost anybody could write reasonably good programs,
because the CPU is so flexible.
However,
its ability to continue to scale in performance has ended
and we need a new approach.
Accelerated computing is full stack.
You have to re-engineer everything from the top down
and from the bottom up,
from the chip to the systems,
to the systems’ software,
new algorithms
and of course optimizing the applications.
The second is that it’s a data center scale problem.
And the reason why it’s a data center scale problem
is today the data center is the computer.
Unlike the past,
when your PC was a computer or the phone was a computer,
today your data center is the computer.
The application runs across the entire data center
and therefore it’s vital that you have to understand
how to optimize the chips, the compute,
the software across the NIC, the switch,
all the way to the other end in a distributed computing way.
And third, accelerated computing is multi-domain.
It’s domain specific.
The algorithms and the software stacks that you create
for computational biology
and the software stack you create
for computational fluid dynamics
are fundamentally different.
Each one of these domains of science needs its own stack,
which is the reason why accelerated computing has taken us
nearly three decades to accomplish.
This entire stack has taken us nearly three decades.
However, the performance is incredible,
and I’ll show you.
After three decades,
we realize now that we’re at the tipping point.
A new computing model is extremely hard to come by.
And the reason for that is this.
In order for there to be a new computing model,
you need developers.
But developers have to create applications
that end users would buy.
And without end users, there would be no customers,
no computer companies to build computers,
without computer companies,
like yourself building computers,
there would be no installed base.
Without an installed base, there would be no developers.
Without developers, there would be no applications.
This loop,
this loop has stymied
so many computing companies in the 40 years
that I've been in this industry.
This is really one of the first major times in history
that a new computing model has been developed and created.
We now have 4 million developers,
3000 plus applications,
40 million CUDA downloads in history,
25 million just last year.
40 million downloaded in history, 25 million just last year.
15,000 startup companies in the world are building on Nvidia today,
and 40,000 large companies, enterprises around the world,
are using accelerated computing.
We have now reached the tipping point
of a new computing era.
This new computing model is now enjoyed and embraced
by just about every computer company
and every cloud company in the world.
There’s a reason for that.
It turns out that for every single computing approach,
its benefit in the final analysis is lower cost.
The PC revolution, which Taiwan enjoyed,
started in 1984, the year I graduated;
that decade, the eighties, was the PC revolution.
The PC brought computing to a price point
nobody had ever seen before.
And then of course,
mobile devices were convenient
and they also saved enormous amounts of money.
We aggregated and combined the camera,
the music player, your PC, a phone.
So many different devices were all integrated into one.
And as a result,
not only are you able to enjoy your life better,
it also saves a lot of money and great convenience.
Every single generation
provided something new and saved money.
Well, this is how accelerated computing works.
This is accelerated computing
used for large language models.
For large language models.
Basically the core of generative AI.
This example is a $10 million server
and we costed everything.
We costed the processors, we costed all the chips,
we costed all the network, we costed literally everything.
And so $10 million gets you nearly a thousand CPU servers.
And to train to process this large language model
takes 11 gigawatt hours.
11 gigawatt hours, okay?
And this is what happens when you accelerate
this workload with accelerated computing.
And so with $10 million, for a $10 million server,
you buy 48 GPU servers.
It’s the reason why people say
that GPU servers are so expensive.
Remember people say GPU servers are so expensive.
However, the GPU server is no longer the computer.
The computer is the data center.
Your goal is to build the most cost effective data center,
not build the most cost effective server.
Back in the old days when the computer was the server,
that would be a reasonable thing to do,
but today the computer is the data center.
And so what you want to do
is you want to create the most effective data center
with the best TCO.
So for $10 million,
you buy 48 GPU servers.
It only consumes 3.2 gigawatt hours
and delivers 44 times the performance.
Let me just show it to you one more time.
This is before and this is after.
And this is,
(audience laughs and applauds)
we want dense computers, not big ones.
We want dense computers, fast computers, not big ones.
And so that’s ISO budget.
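(Redoing that back-of-the-envelope with the slide's own numbers, nothing else assumed:)

```python
# ISO-budget comparison from the slide: the same $10M, CPU servers vs GPU servers.
budget_usd = 10_000_000
cpu_servers, cpu_energy_gwh, cpu_perf = 960, 11.0, 1.0   # baseline throughput = 1x
gpu_servers, gpu_energy_gwh, gpu_perf = 48, 3.2, 44.0    # "44 times the performance"

print(f"energy saved: {cpu_energy_gwh - gpu_energy_gwh:.1f} GWh "
      f"({(1 - gpu_energy_gwh / cpu_energy_gwh):.0%} less)")
# Work per gigawatt hour; roughly lines up with the ~150x ISO-power figure shown next.
print(f"perf per GWh advantage: "
      f"{(gpu_perf / gpu_energy_gwh) / (cpu_perf / cpu_energy_gwh):.0f}x")
```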
Let me show you something else.
Okay, so this is $10 million again,
960 CPU servers.
Now this time,
this time we’re gonna be ISO power.
We’re gonna keep this number the same.
We’re gonna keep this number the same, okay?
So this number is the same,
the same amount of power.
This means your data center is power limited.
In fact, most data centers today are power limited.
And so being power limited, using accelerated computing,
you can get 150 times more performance
at three times the cost.
But why is that such a great deal?
The reason for that is because
it’s very expensive and time consuming
to find another data center.
Almost everybody is power limited today.
Almost everybody is scrambling to break new ground
to get more data centers.
And so if you are power limited
or if your customers are power limited,
then what they can do
is invest more into that data center,
which already, which has 11 gigawatts,
and you can get a lot more throughput,
continue to drive your growth.
Here’s another example.
This is my favorite.
If your goal, if your goal is to get the work done,
if your goal is to get the work done,
you don’t care how.
Your goal is to get the work done, you don’t care how.
And this is the work you want to get done.
ISO work, okay?
This is ISO work.
All right, look at this.
(audience laughs and applauds)
Oh, (speaks foreign language)
(audience laughs)
(audience applauds)
That was, people love that, right?
Nice to see you, Carol.
Nice to see you, Spencer.
Okay, so let’s do that one more time.
It’s so, it’s so delightful.
Look at this.
Oh, oh, oh, no, no.
Okay, look at this.
Look at this,
before,
after.
The more you buy, the more you save.
That’s right.
The more you buy, the more you save,
(audience laughs and applauds)
the more you buy, the more you save.
That’s Nvidia.
You don’t have to understand the strategy,
you don’t have to understand the technology.
The more you buy, the more you save.
That’s the only thing you have to understand.
Data center, data center.
Now, why is it?
You have been, you’ve heard me talk about this
for so many years.
In fact, every single time you saw me,
I’ve been talking to you about accelerated computing.
I’ve been talking about accelerated computing,
well, for a long time,
well over two decades.
And now why is it that finally it’s the tipping point?
Because the data center equation is very complicated.
This equation is very complicated.
This is the cost of building a data center.
The data center TCO is a function of,
and this is the part where everybody messes up.
It’s a function of the chips, of course, no question.
It’s a function of the systems, of course, no question.
But it’s also because there’s so many different use cases.
It’s a function of the diversity of systems
that can be created.
It is the reason why Taiwan is the bedrock,
the foundation, of the computer industry.
Without Taiwan,
why would there be
so many different configurations of computers?
Big, small, powerful, cheap,
enterprise, hyperscale, super computing,
so many different types of configurations,
1U, 2U, 4U, right?
And all completely compatible.
The ability for the hardware ecosystem of Taiwan
to have created so many different versions
that are software compatible.
Incredible.
The throughput of the computer of course is very important.
It depends on the chip,
but it also depends, as you know,
on the algorithm.
Because without the algorithm libraries
accelerated computing does nothing.
It just sits there.
And so you need the algorithm software libraries.
It’s a data center scale problem.
So networking matters.
And networking matters,
distributed computing is all about software.
Again, system software matters.
And before, before long,
in order for you to present your system to your customers,
you have to ultimately have a lot of applications
that run on top of it.
The software ecosystem matters.
Well, the utilization of a data center
is one of the most important criteria of its TCO.
Just like a hotel.
If the hotel is wonderful, but it’s mostly empty,
the cost is incredible.
And so you need the utilization to be high.
In order for the utilization to be high,
you have to have many different,
many different applications.
So the richness of the applications matters.
Again, the algorithms and libraries,
and now the software ecosystem.
You purchase a computer,
but these computers are incredibly hard to deploy
from the moment that you buy the computer
to the time that you put that computer to work
to start making money,
that difference can be weeks,
if you’re very good at it, incredibly good at it.
We can stand up a super computer
in a matter of a couple of weeks,
because we build so many all around the world,
hundreds around the world.
But if you’re not very good at it, it could take a year.
That difference,
depriving yourself of a year of making money
while paying a year of depreciation,
is an incredible cost.
Lifecycle optimization.
Because the data center is software defined,
there are so many engineers that will continue to refine
and continue to optimize the software stack.
Because NVIDIA’s software stack
is architecturally compatible across all of our generations,
across all of our GPUs.
Every time we optimize something, it benefits everybody.
Every time we optimize something, it benefits everybody.
So lifecycle optimization,
and of course finally the energy that you use, power.
But this equation is incredibly complicated.
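(One toy way to hold all of those factors in your head at once; a sketch only, with made-up fields and numbers, not NVIDIA's actual TCO model. The only point is that throughput, utilization, deployment time, and power all sit directly in the cost-per-unit-of-work ratio.)

```python
# Illustrative-only data center TCO sketch: total cost divided by useful work delivered.
# Every field and constant here is a placeholder for the factors discussed above.
from dataclasses import dataclass

@dataclass
class DataCenter:
    capex_usd: float        # chips and systems
    power_mw: float         # facility power draw
    usd_per_mwh: float      # energy price
    utilization: float      # fraction of time doing useful work (depends on app richness)
    deploy_months: float    # time from purchase to production (lost revenue, depreciation)
    throughput: float       # useful work units per hour at full load (chips + algorithms)

    def cost_per_unit_work(self, lifetime_years: float = 4.0) -> float:
        productive_hours = lifetime_years * 8760 - self.deploy_months * 730
        energy_cost = self.power_mw * productive_hours * self.usd_per_mwh
        useful_work = self.throughput * productive_hours * self.utilization
        return (self.capex_usd + energy_cost) / useful_work

# e.g. DataCenter(capex_usd=10e6, power_mw=2, usd_per_mwh=100,
#                 utilization=0.8, deploy_months=1, throughput=1000).cost_per_unit_work()
```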
Well,
because we have now addressed
so many different domains of science,
so many industries,
and in data processing,
in deep learning, classical machine learning,
so many different ways for us to deploy software
from the cloud to enterprise to supercomputing to the Edge,
so many different configurations of GPUs,
from our HGX versions to our Omniverse versions,
to our cloud GPU and graphics version,
so many different versions.
Now, the utilization is incredibly high.
The utilization of Nvidia GPU is so high,
almost every single cloud is overextended.
Almost every single data center is overextended.
There are so many different applications using it.
So we have now reached the tipping point
of accelerated computing.
We have now reached the tipping point of generative AI.
And I want to thank all of you for your support
and all of your assistance and partnership
in making this dream happen.
Thank you.
(audience applauds)
Every single time we announce a new product,
the demand for every single generation
increased and increased and increased.
And then one generation, it hockey-sticks.
We stick with it, we stick with it, we stick with it,
Kepler,
and then Pascal,
and then Volta,
and then Ampere.
And now this generation of accelerated computing,
the demand is literally from every corner of the world.
And we are so, so, so excited
to be in full volume production of the H100.
And I wanna thank all of you for your support.
This is incredible.
(audience applauds)
H100 is in full production,
manufactured by companies all over Taiwan,
used in clouds everywhere,
enterprises everywhere.
And let’s take a look at a short video
of how H100 is produced.
(machines whir)
(upbeat music plays)
(machines continue to whir)
(upbeat music continues)
It’s incredible.
This computer,
(audience applauds)
35,000 components on that system board,
eight Hopper GPUs.
Let me show it to you.
Whoosh.
-
[Audience Member] Yay. (audience applauds and cheers)
-
All right, this,
I would lift this,
but I still have the rest of the keynote
I would like to give.
This is 60 pounds, 65 pounds.
It takes robots to lift it, of course,
and it takes robots to insert it,
because the insertion pressure is so high
and it has to be so perfect.
This computer is $200,000, and as you know,
it replaces an entire room of other computers.
So this, I know it’s a very, very expensive computer.
It's the world's single most expensive computer
about which you can say, "The more you buy, the more you save."
(audience laughs)
This is what a compute tray looks like.
Even this is incredibly heavy.
See that?
So this is the brand new H100,
the world's first computer
that has a transformer engine in it.
The performance is utterly incredible.
Hopper is in full production.
We’ve been driving computing,
this new form of computing for 12 years.
When we first met the deep learning researchers,
we were fortunate to realize
that not only was deep learning going to be
a fantastic algorithm for many applications initially,
computer vision and speech,
but it would also be a whole new way of doing software.
This is a fundamental new way of doing software
that can use data to develop,
to train, a universal function approximator
of incredible dimensionality.
It can basically predict almost anything
that you have data for,
so long as the data has structure that it can learn from.
And so we realized the importance of this
new method of developing software,
and that it has the potential
of completely reinventing computing.
And we were right.
12 years later,
we have reinvented literally everything.
We reinvented, of course, we started by creating
a new type of library.
It's essentially like SQL,
except for deep learning, for neural network processing.
It's like a rendering engine,
a solver, for neural network processing called (indistinct).
We reinvented the GPU.
People thought that GPUs would just be GPUs.
They were completely wrong.
We dedicated ourselves to reinventing the GPUs,
so that it’s incredibly good at Tensor processing.
We created a new type of packaging called SXM
and worked with TSMC on CoWoS,
so that we could stack multiple chips on the same die.
NVLink, so that we can connect these SXM modules together
with high speed chip to chip interconnect.
Almost a decade ago,
we built the world’s first chip to chip (indistinct),
so that we can expand the memory size of GPUs
using SXMs and NVLink.
And we created a new type of motherboard,
we call it HGX, that I just showed you.
No computer has ever been this heavy before
or consumed this much current.
Every aspect of a data center had to be reinvented.
We also invented a new type of computer appliance
so that we could develop software on it
so that third party developers could develop software on it
with a simple appliance we call DGX,
basically a giant GPU computer.
DGX.
We also purchased Mellanox,
which is one of the great strategic decisions of our company
because we realized that in the future,
if the data center is the computer,
then the networking is the nervous system.
If the data center is the computer,
then the networking defines the data center.
That was an incredibly good acquisition
and since then we’ve done so many things together
and I’m gonna show you some
really, really amazing work today.
And then of course, an operating system.
If you have a nervous system, a distributed computer,
it needs to have an operating system.
And the operating system of this distributed computer
we call Magnum IO.
Some of our most important work.
And then all of the algorithms and engines
that sit on top of these computers, we call Nvidia AI.
The only AI operating system in the world
that takes you from data processing
to training, to optimization, to deployment and inference.
End-to-end deep learning processing.
It is the engine of AI today.
Well, every single generation since Kepler, which is the K80,
to Pascal, Volta, Ampere, Hopper,
every two years, every two years,
we took a giant leap forward.
But we realized we needed more than even that,
and which is the reason why we connected GPUs
to other GPUs called NVLink,
built one giant GPU,
and we connected those GPUs together
using InfiniBand into larger scale computers.
That ability for us to drive the processor
and extend the scale of computing
made it possible for the AI research organization,
the community,
to advance AI at an incredible rate.
We just kept pushing and pushing and pushing.
Hopper went into production August of last year.
August, 2022.
2024, which is next year, we’ll have Hopper-Next.
Last year we had Quantum.
Two years from now or next year,
we’ll have Quantum-Next.
So every two years we take giant leaps forward
and I’m expecting the next leap to be giant as well.
This is the new computer industry.
Software is no longer programmed just by computer engineers.
Software is programmed by computer engineers
working with AI supercomputers.
These AI supercomputers are a new type of factory.
It is very logical that the car industry has factories.
They build things that you can see, cars.
It is very logical that the computer industry
has computer factories.
They build things that you can see, computers.
In the future,
every single major company
will also have AI factories
and you will build and produce your company’s intelligence.
And it’s a very sensible thing.
We cultivate and develop and nourish our employees
and continue to create the conditions
by which they can do their best work.
We are intelligence producers already.
It's just that the producers of that intelligence
are people.
In the future, we will be intelligence producers,
artificial intelligence producers.
And every single company will have factories
and the factories will be built this way.
This translates to your throughput.
This translates to your scale
and you will build it in a way that is very, very good TCO.
Well, consider our dedication to pursuing this path
and relentlessly increasing the performance.
Just think: in 10 years' time,
we increased the throughput,
we increased the scale,
the overall throughput across all of that stack,
by 1 million x,
1 million x in 10 years.
Well just now, in the beginning,
I showed you computer graphics.
In five years,
we improved the computer graphics by 1000 times.
In five years,
using artificial intelligence and accelerated computing.
Using accelerated computing and artificial intelligence,
we accelerated computer graphics
by 1000 times in five years.
Moore’s law is probably
currently running at about two times.
A thousand times in five years.
A thousand times in five years
is 1 million times in 10.
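(The compounding arithmetic, for what it's worth; the Moore's-law pace below is an assumption for contrast.)

```python
# 1,000x every five years compounds to 1,000,000x over ten.
print(1_000 ** 2)        # 1000000

# For contrast, roughly 2x every two years (an assumed Moore's-law pace)
# compounds to about 32x over the same ten years.
print(2 ** (10 / 2))     # 32.0
```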
We’re doing the same thing in artificial intelligence.
Now, question is,
what can you do when your computer
is 1 million times faster?
What would you do if your computer
was 1 million times faster?
Well, it turns out that the friends we met
at the University of Toronto,
Ilya Sutskever, Alex Krizhevsky, and Geoffrey Hinton.
And Ilya Sutskever, of course, a co-founder of OpenAI,
discovered the continuous scaling
of artificial intelligence and deep learning networks
and came up with the ChatGPT breakthrough.
Well,
in this general form,
this is what has happened.
The transformer engine, the transformer engine,
and the ability to use unsupervised learning,
unsupervised learning,
be able to learn from a giant amount of data
and recognize patterns and relationships
across a large sequence.
And using transformers to predict the next word,
large language models were created,
and the breakthrough, of course, is very, very, very clear.
And I’m sure that everybody here
has already tried ChatGPT.
But the important thing is this.
We now have a software capability
to learn the structure of almost any information.
We can learn the structure of text, sound, images,
there is structure in all of it,
physics,
proteins,
DNA,
chemicals,
anything that has structure,
we can learn that language, learn its language.
Of course you can learn English and Chinese and Japanese
and so on and so forth,
but you can also learn the language of many other things.
And then the next breakthrough came,
generative AI.
Once you can learn the language,
once you can learn the language of certain information,
then with control and guidance
from another source of information,
that we call prompts,
we can now guide the AI
to generate information of all kinds.
We can generate text-to-text, text-to-image.
But the important thing is this,
information transformed to other information
is now possible.
Text to proteins, text to chemicals,
images to 3D,
images to 2D,
images to text, captioning,
video to video.
So many different types of information
can now be transformed.
For the very first time in history
we have a software technology
that is able to understand
the representation of information of many modalities.
We can now apply computer science,
we can now apply the instrument of our industry.
We can now apply the instrument of our industry
to so many different fields that were impossible before.
This is the reason why everybody is so excited.
Now let’s take a look at some of these.
Let’s take a look at what it can do.
This, here’s a prompt, and this prompt says,
“Hi Computex,”
so this is a, we type in the word,
“Hi, Computex, I’m here to tell you
how wonderful stinky tofu is.
(audience laughs)
You can enjoy it right here in Taiwan.
It’s best from the night market.”
I was just there the other night.
(audience laughs)
Okay, play it.
- Hi Computex, I’m here to tell you about
how wonderful stinky tofu is.
You can enjoy it right here in Taiwan.
It’s best from the night market.
- The only input was words.
The output was that video.
(audience applauds)
Okay, here's another prompt.
Taiwanese, we tell this AI, okay?
We tell this AI, this is Google's text-to-music model.
“Traditional Taiwanese music.
Peaceful, like it’s warm and raining
in a lush forest at daybreak.”
Please.
(calm music plays)
(speaks foreign language)
(audience applauds)
We send texts in.
AI says, “Hmm, okay, this music,” okay?
Hear this one.
“I am here at Computex,
I will make you like me best.
Sing sing it with me.
I really like Nvidia.”
(audience laughs)
Okay?
So these are the words,
these are the words.
And I said, "Hey, hey, voice mod, could you write me a song?"
These are the words.
Okay, play it.
- (upbeat accompaniment plays) ♪ I am here at Computex
♪ I will make you like me best, yeah. ♪
♪ Sing sing it with me ♪
♪ I really like Nvidia. ♪
(audience cheers and applauds)
- Okay, so obviously
this is a very, very important new capability,
and that’s the reason why there’s
so many generative AI startups.
We’re working with some 1600 generative AI startups.
They're in all kinds of domains,
in language, in media, in biology.
This is one of the most important areas that we care about.
Digital biology is going to go through its revolution.
This is going to be an incredible thing.
Just as we had Synopsys and Cadence
help us create tools
so that we can build wonderful chips and systems,
for the very first time,
we’re gonna have computer aided drug discovery tools.
And they’ll be able to manipulate
and work with proteins and chemicals
and understand disease targets
and try all kinds of chemicals
that previously had never been thought of before.
Okay, so really, really important area.
Lots of startups, tools and platform companies.
And let me show you a video
of some of the work that they’re doing.
Play it, please.
- Generative AI is the most important
computing platform of our generation.
Everyone from first movers to Fortune 500 companies
are creating new applications
to capitalize on generative AI’s ability
to automate and co-create.
For creatives, there’s a brand new set of tools
that would be simply impossible a few years ago.
Adobe is integrating Firefly into their creative apps,
ethically trained and artist friendly.
You can now create images with a simple text prompt.
Or expand the image of your real photos
to what lies beyond the lens.
Productivity apps will never be the same.
Microsoft has created a co-pilot for office apps.
Every profession is about to change.
Tabnine is democratizing programming
by tapping into the knowledge base
of a million developers
to accelerate application development
and reduce debugging time.
If you’re an architect
or just thinking of remodeling your home,
Planner 5D can instantly turn a 2D floor plan to a 3D model.
And right here in Taiwan,
AnHorn medicines is targeting
difficult to treat diseases and accelerating cures.
- Incredible, right?
Just utterly incredible.
There’s no question that we’re in a new computing era.
There’s just absolutely no question about it.
Every single computing era,
you could do different things that weren’t possible before,
and artificial intelligence certainly qualifies.
This particular computing era is special in several ways.
One,
it is able to understand information
of more than just text and numbers.
It can now understand multimodality,
which is the reason why this computing revolution
can impact every industry, every industry.
Two, because this computer doesn’t care how you program it.
It will try to understand what you mean,
because it has this incredible
large language model capability.
And so the programming barrier is incredibly low.
We have closed the digital divide.
Everyone is a programmer now.
You just have to say something to the computer.
Third,
this computer,
not only is it able to do amazing things for the future,
it can do amazing things for every single application
of the previous era,
which is the reason why all of these APIs
are being connected into Windows applications
here and there, and browsers and PowerPoint and Word.
Every application that exists will be better because of AI.
You don't have to wait for new applications;
this computing era does not need new applications.
It can succeed with old applications
and it's gonna have new applications.
The rate of progress, the rate of progress,
because it’s so easy to use,
is the reason why it’s growing so fast.
This is going to touch literally every single industry.
And at the core,
just as with every single computing era,
it needs a new computing approach.
In this particular era,
the computing approach is accelerated computing,
and it has been completely reinvented from the ground up.
Last several years,
I’ve been talking to you about
the new type of processor we’ve been creating,
and this is the reason we’ve been creating it.
Ladies and gentlemen,
Grace Hopper is now in full production.
This is Grace Hopper.
(audience applauds)
Nearly 200 billion transistors in this computer.
Oh, (speaks foreign language),
(speaks foreign language),
(audience laughs)
look at this.
This is Grace Hopper.
This processor,
this processor is really quite amazing.
There are several characteristics about it.
This is the world’s first accelerated processor,
accelerated computing processor
that also has a giant memory.
It has almost 600 gigabytes of memory
that’s coherent between the CPU and the GPU.
And so the GPU can reference the memory,
the CPU can reference the memory,
and any unnecessary copying back and forth
can be avoided.
The amazing amount of high speed memory
lets the GPU work on very, very large data sets.
This is a computer, this is not a chip.
Practically, the entire computer’s on here.
It uses low-power DDR memory,
just like your cell phone,
except this has been optimized and designed
for high resilience data center applications.
Incredible levels of performance.
This took us several years to build,
and I’m so excited about it
and I’ll show you some of the things
that we’re gonna do with it.
Janine, thank you, (speaks foreign language).
You’re supposed to say (speaks foreign language).
(audience laughs)
(speaks foreign language)
Okay, so four PetaFLOPS,
transformer engine,
72 CPU cores,
they’re connected together.
They're connected together by a high speed chip-to-chip link
at 900 gigabytes per second.
And so the local memory, 96 gigabytes of HBM3 memory
is augmented by LPDDR memory.
Across this link, the LPDDR acts as
a very, very large and high speed cache.
So this computer is like none other the world’s ever seen.
Now, let me show you some of its performance.
So I'm comparing here three different applications,
and this is a very important application.
If you have never heard of it,
be sure to look into it.
It's called a vector database.
A vector database is a database that has tokenized,
that has vectorized, the data that you're trying to store.
And so it understands the relationships
of all of the data inside its storage.
This is incredibly important
for knowledge augmentation of the large language models
to avoid hallucination.
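(For anyone who hasn't met one, here is a minimal sketch of the core idea: store embeddings, retrieve the nearest ones, and hand them to the language model as grounding context. The embedding function below is a deterministic stand-in, not a real model, so the ranking is arbitrary; a real embedding model makes it semantic, and real systems use approximate nearest-neighbor indexes.)

```python
import numpy as np

# Minimal vector-store sketch: embed documents, retrieve by cosine similarity.
docs = [
    "The H100 system board has 35,000 components and eight Hopper GPUs.",
    "The night market in Taiwan sells stinky tofu.",
]

def embed(text: str) -> np.ndarray:
    # Stand-in for a real embedding model: a deterministic pseudo-random vector per text.
    seed = abs(hash(text)) % (2**32)
    return np.random.default_rng(seed).standard_normal(16)

index = np.stack([embed(d) for d in docs])
index /= np.linalg.norm(index, axis=1, keepdims=True)

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embed(query)
    q /= np.linalg.norm(q)
    scores = index @ q                      # cosine similarity against every stored vector
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

# The retrieved passages would be prepended to the LLM prompt as grounding context.
print(retrieve("How many components are on the board?"))
```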
The second is deep learning recommender systems.
This is how we get news and music
and all the texts that you see on your devices.
It recommends, of course, music and goods
and all kinds of things.
Recommender system is the engine of the digital economy.
This is probably the single most valuable piece of software
that any of the companies in the world runs.
This is the world’s first AI factory.
There will be other AI factories in the future,
but this is really the first one.
And the last one is large language model inference.
65 billion.
65 billion parameters is a fairly large language model,
and you can see that on a CPU it’s just not possible.
The CPU is simply not possible.
With Grace Hopper, excuse me,
with Hopper on an x86,
it’s faster, but notice it’s memory limited.
You could of course take this 400 gigabytes
and cut it up into a whole bunch of small pieces,
shard it,
and distribute it across more GPUs.
But in the case of Grace Hopper,
in the case of Grace Hopper,
Janine, (speaks foreign language).
Oh, Janine doesn’t speak Chinese.
(audience laughs)
(speaks foreign language).
Okay,
Grace Hopper, Grace Hopper has more memory,
has more memory on this one module than all of these.
Does that make sense?
And so as a result,
you don’t have to break the data into so many pieces.
Of course, the amount of computation of this is higher,
but this is so much easier to use.
And if you want to scale out large language models,
if you wanna scale out vector databases,
if you want to scale out deep learning recommender systems,
this is the way to do it.
This is so easy to use.
Plug this into your data center and you can scale out AI.
Okay?
So this is the reason why we built Grace Hopper.
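(A rough back-of-the-envelope on why the memory is the point; the two bytes per parameter is my assumption for 16-bit weights, and the LPDDR figure is inferred from the "almost 600 gigabytes" quoted earlier.)

```python
# Rough memory footprint of a 65-billion-parameter model in 16-bit precision.
params = 65e9
bytes_per_param = 2                      # FP16/BF16 weights (assumption)
weights_gb = params * bytes_per_param / 1e9
print(f"weights alone: ~{weights_gb:.0f} GB")            # ~130 GB, before KV cache etc.

hbm3_gb = 96                             # HBM3 on one Hopper (from the talk)
grace_hopper_gb = 96 + 480               # HBM3 plus coherent LPDDR, "almost 600 GB"
print(f"fits in one Hopper's HBM3? {weights_gb <= hbm3_gb}")
print(f"fits in one Grace Hopper's coherent memory? {weights_gb <= grace_hopper_gb}")
```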
The other application that I’m super excited about
is the foundation of our own company.
Nvidia is a big customer of Cadence.
We use all of their tools.
And all of their tools run on CPUs.
And the reason why they run on CPUs
is because NVIDIA’s data sets are very large
and the algorithms are refined
over very long periods of time.
And so most of the algorithms are very CPU centric.
We’ve been accelerating some of these algorithms
with Cadence for some time,
but now with Grace Hopper,
and we’ve only been working on it
for a couple of days and weeks,
the performance speed up,
I can’t wait to show it to you, is insane.
This is going to revolutionize an entire industry,
one of the highest
compute intensive industries in the world, of course,
designing chips, designing electronic systems,
CAE,
CAD,
EDA,
and of course, digital biology.
All of these markets,
all of these industries
require very large amounts of computation.
But the data set is also very large.
Ideal for Grace Hopper.
Well, 600 gigabytes is a lot.
600 gigabytes is a lot.
This is basically a supercomputer I’m holding in my hands.
This 600 gigabytes is a lot.
But when you think about it,
when we went from AlexNet
of 62 million parameters 12 years ago
and trained on 1.2 million images,
it is now 5,000 times bigger with Google's PaLM,
5,000 times bigger with 340 billion parameters.
And of course, we’re gonna make even bigger ones than that.
And that’s been trained on 3 million times more data.
So literally in the course of 10 years,
the computing problem of deep learning
increased by 5,000 times for the software
and 3 million times for the dataset.
No other area of computing has ever increased this fast.
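(Checking that ratio against the quoted numbers:)

```python
# Growth in model size, using the figures quoted above.
alexnet_params = 62e6        # AlexNet, 12 years ago
palm_params = 340e9          # the PaLM figure quoted on stage
print(f"parameter growth: ~{palm_params / alexnet_params:,.0f}x")   # ~5,500x, the "5,000 times"
# Training data grew by the stated "3 million times" over the same period.
```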
And so we’ve been chasing
the deep learning advance
for quite some time.
This is going to make a big, big contribution.
However, 600 gigabytes is still not enough.
We need a lot more.
So let me show you what we’re gonna do.
So the first thing is, of course,
we have the Grace Hopper Super chip,
put that into a computer.
The second thing that we’re gonna do
is we’re gonna connect eight of these together
using NVLink.
This is an NVLink switch.
So eight of these, eight of these,
connect through three switch trays
into an eight Grace Hopper pod.
In these eight Grace Hopper pods,
each one of the Grace Hoppers
is connected to the other Grace Hoppers
at 900 gigabytes per second.
Eight of them connected together as a pod,
and then we connect 32 of them together
with another layer of switches.
And in order to build,
in order to build this,
256 Grace Hopper super chips are connected
into one exaFLOPS,
one exaFLOPS.
You know that countries and nations
have been working on exaFLOPS computing
and just recently achieved it.
256 Grace Hoppers for deep learning
is one exaFLOPS transformer engine.
And it gives us 144 terabytes of memory
that every GPU can see.
This is not 144 terabytes distributed.
This is 144 terabytes connected.
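(The topology arithmetic behind those numbers, using the per-chip figures from earlier in the talk:)

```python
# DGX GH200 arithmetic from the figures in the talk.
superchips = 256
per_pod = 8
pflops_per_chip = 4                     # transformer-engine petaFLOPS per Grace Hopper
memory_per_chip_gb = 96 + 480           # HBM3 plus LPDDR, "almost 600 GB"

print(f"pods: {superchips // per_pod}")                                   # 32
print(f"compute: ~{superchips * pflops_per_chip / 1000:.0f} exaFLOPS")    # ~1 exaFLOPS
print(f"memory: ~{superchips * memory_per_chip_gb / 1024:.0f} TB")        # ~144 TB
```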
Why don’t we take a look at what it really looks like?
Play it, please.
(machine whirs)
(audience applauds)
This is 150 miles of cables,
fiber optic cables,
2000 fans,
70,000 cubic feet per minute.
It probably recycles the air in this entire room
in a couple of minutes.
40,000 pounds.
Four elephants.
(audience laughs)
One GPU.
(audience applauds)
if I can get up on here.
This is actual size.
I wonder if this can play Crysis.
(audience laughs)
Only gamers know that joke.
So this is our brand new Grace Hopper
AI super computer.
It is one giant GPU.
Utterly incredible.
We’re building it now.
Every component is in production
and we’re so excited that Google Cloud, Meta and Microsoft
will be the first companies in the world to have access,
and they will be doing exploratory research
on the pioneering front,
the boundaries of artificial intelligence with us.
We will, of course, build these systems as products.
And so if you would like to have an AI supercomputer,
we would of course, come and install it in your company.
We also share the blueprints of this supercomputer
with all of our cloud partners,
so that they can integrate it into their networks
and into their infrastructure.
And we will also build it inside our company
for us to do research ourselves and do development.
So this is the DGX GH200.
It is one giant GPU.
Okay?
(audience applauds)
1964,
the year after I was born,
was a very good year for technology.
IBM, of course, launched the System/360
and AT&T demonstrated to the world
their first picture phone.
Encoded, compressed,
streamed over copper telephone wires, twisted pair,
and on the other end decoded:
a picture phone with a little tiny black and white screen.
To this day, this very experience is largely the same,
of course, at much, much higher volumes
for all of the reasons we all know well.
Video calls are now one of the most important things we do.
Everybody does it.
About 65% of the Internet’s traffic is now video,
and yet the way it’s done is fundamentally still the same.
Compress it on the device, stream it,
and decompress it on the other end.
Nothing changed in 60 years.
We treat communications like they go down a dumb pipe.
The question is,
what would happen if we applied generative AI to that?
We have now created a computer,
I showed you, Grace Hopper.
It can be deployed broadly all over the world, easily.
And as a result, every data center, every server
will have generative AI capability.
What would happen if, instead of
compression, streaming, and decompression,
the cloud applied generative AI to it?
Let’s take a look.
(upbeat music plays)
- [Voice Off-Screen] The future of wireless
and video communications will be 3D, generated by AI.
Let’s take a look at how Nvidia Maxine 3D
running on the Nvidia Grace Hopper super chip
can enable 3D video conferencing on any device
without specialized software or hardware.
Starting with a standard 2D camera sensor
that’s in most cell phones, laptops, and webcams,
and tapping into the processing power of Grace Hopper.
Maxine 3D converts these 2D videos to 3D
using cloud services.
This brings a new dimension to video conferencing.
With Maxine 3D visualization,
creating an enhanced sense of depth and presence,
you can dynamically adjust the camera
to see every angle, emotion,
engage with others more directly with enhanced eye contact
and personalize your experience with animated avatars.
Stylizing them with simple text prompts.
With Maxine’s language capabilities,
your avatar can speak in other languages,
even ones you don’t know.
-
Nvidia (speaks foreign language).
-
Nvidia (speaks foreign language)
-
[Voice Off-Screen] Nvidia Maxine 3D,
together with Grace Hopper,
bring immersive 3D video conferencing
to anyone with a mobile device,
revolutionizing the way we connect,
communicate, and collaborate.
- (speaks foreign language).
Okay, so all of the words,
all of the words coming out of my mouth, of course,
were generated by AI.
So instead of compression, streaming, and decompression,
in the future
communications will be perception, streaming,
and reconstruction, regeneration.
And it can be generated in all kinds of different ways.
It can be generated in 3D, of course,
it can regenerate your language in another language.
So we now have a universal translator.
This computing technology could be, of course,
placed into every single cloud.
But the thing that’s really amazing,
Grace Hopper is so fast, it can even run the 5G stack.
A state-of-the-art 5G stack
could just run in software in Grace Hopper, completely free.
Completely free.
All of a sudden a 5G radio runs in software,
just like a video codec used to run in software.
Now you can run a 5G stack in software.
Of course, the layer one, PHY layer,
the layer two, MAC layer,
and the 5G core,
all of that computation is quite intensive,
and it has to be timing precise,
which is the reason why we have
a BlueField-3 in the computer.
That gives us precision timing and networking,
and the entire stack can now run on a Grace Hopper.
Basically what’s happening here,
this computer that you’re seeing here
allows us to bring generative AI
into every single data center in the world today,
because we have software defined 5G,
the telecommunication network
can also become a computing platform,
like the cloud data centers.
Every single data center in the future could be intelligent.
Every data center could be software defined.
Whether it’s internet based, networking based,
or 5G communications based.
Everything will be software defined.
This is really a great opportunity
and we're announcing a partnership with SoftBank
to re-architect and implement generative AI
and a software defined 5G stack in the network
of SoftBank data centers around the world.
Really excited about this collaboration.
I just talked about how we are going to
extend the frontier of AI.
I talked about how we’re gonna scale out generative AI,
to scale out generative AI,
to advance generative AI,
but the number of computers in the world
is really quite magnificent.
Data centers all over the world.
And all of them over the next decade
will be recycled and re-engineered into
accelerated data centers
and generative AI capable data centers.
But there are so many different applications
in so many different areas.
Scientific computing, data processing,
large language model training that we've been talking about,
generative AI inference that we just talked about,
cloud and video and graphics,
EDA, SDA, which we just mentioned,
generative AI for enterprise.
And of course the Edge.
Each one of these applications
have different configurations of servers,
different focus of applications,
different deployment methods.
And so security is different.
Operating system is different.
How it’s managed is different.
Where the computers are will be different.
And so each one of these diverse application spaces
will have to be re-engineered with a new type of computer.
Well, this is just an enormous number of configurations.
And so today we’re announcing,
in partnership with so many companies here in Taiwan,
the Nvidia MGX,
it’s an open modular server design specification
and the design for accelerated computing.
Most of the servers today are designed
for general purpose computing.
The mechanical, thermal and electrical design
is insufficient for a very highly dense computing system.
Accelerated computers take, as you know, many servers
and compress them into one.
You save a lot of money,
you save a lot of floor space,
but the architecture is different.
And we designed it so that it’s
multi-generation standardized,
so that once you make an investment,
our next generation GPUs and next generation CPUs
and next generation DPUs
will continue to easily configure into it
so that we can have best time-to-market
and best preservation of our investment.
It is configurable into hundreds of configurations
for different, diverse applications,
and integrates into cloud or enterprise data centers.
So you could have either busbar or power regulators.
You could have cabling in the hot aisle
or cabling in the cold aisle.
Different data centers have different requirements,
and we’ve made this modular and flexible
so that it could address all of these different domains.
Now, this is the basic chassis.
Let’s take a look at some of the other things
you could do with it.
This is the Omniverse OVX server.
It has x86, 4 L40s, BlueField-3,
two CX7, six PCI express slots.
This is the Grace Omniverse server.
Grace, the same four L40s, BlueField-3, and two CX7s, okay?
This is the Grace cloud graphics server.
This is the Hopper NVLink generative AI inference server.
And we need sound effects like,
(makes swooshing sounds)
like that.
And then Grace Hopper 5G aerial server,
okay, for telecommunications?
Software defined telco.
Grace Hopper, 5G, aerial server, short,
and of course, Grace Hopper liquid cooled, okay?
For very dense servers.
And then this one is our dense general purpose
Grace super chip server.
This is just CPU.
And has the ability to accommodate four Grace CPUs
or two Grace super chips,
enormous amounts of performance.
And if you are, if your data center is power limited,
this CPU has incredible capabilities.
In a power limited environment running PageRank,
and there’s all kinds of benchmarks you can run,
but we ran PageRank,
at ISO performance, Grace only consumes
580 watts for the whole server.
Versus the latest generation CPU servers, x86 servers,
1090 watts.
It’s basically half the power at the same performance.
Or another way of saying, you know,
at the same power, if your data center is power constrained,
you get twice the performance.
Most data centers today are power limited,
and so this is really a terrific capability.
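(The arithmetic behind that claim; the "twice the throughput" line assumes throughput scales with the number of servers you can power.)

```python
# ISO-performance PageRank figures from the talk: same work, different power.
grace_watts, x86_watts = 580, 1090
print(f"Grace power vs x86: {grace_watts / x86_watts:.0%}")             # ~53%
# At a fixed power budget, assuming throughput scales with server count:
print(f"throughput at ISO power: ~{x86_watts / grace_watts:.1f}x")      # ~1.9x
```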
There are all kinds of different servers
that are being made here in Taiwan.
Let me show you one of them.
Get my exercise in today.
Whoosh.
I am the sound effect.
(audience laughs)
Okay, you’ve got BlueField-3, got the CX7,
you got the Grace Hopper.
There’s so many systems.
Let me show you some of them.
All of our partners, I’m so grateful,
you’re working on Grace, Grace Hopper,
Hopper, L40s, L4s, BlueField-3s.
Just about every single processor that we’re building
are configured into these servers of all different types.
And so this is Supermicro, this is Gigabyte.
Tens,
I think it’s like 70 different server configurations.
This is Ingrasys,
this is ASRock,
Tyan,
Wistron,
Inventec.
They’re just beautiful servers.
Pegatron.
We love servers, I love servers, they’re beautiful.
They’re beautiful to me.
QCT,
Asus,
Wiwynn,
ZT Systems.
And this ZT System,
what you’re looking at here is one of the pods
of our Grace Hopper AI supercomputer.
So I want to thank all of you.
I want to thank all of you for your great support.
Thank you.
(audience applauds)
We’re gonna expand AI into a new territory.
If you look at the world’s data centers,
the data center is now the computer
and the network defines what that data center does.
Largely there are two types of data centers today.
There’s the data center that’s used for hyperscale,
where you have application workloads of all different kinds.
The number of GPUs you connect to it
is relatively low.
The number of tenants is very high.
The workload is very heterogeneous.
The workloads are loosely coupled,
and you have another type of data center.
They’re like supercomputing data centers,
AI supercomputers,
where the workloads are tightly coupled.
The number of tenants is far fewer,
and sometimes just one.
Its purpose is high throughput
on very large computing problems.
Okay?
And it’s basically a standalone,
it’s basically a standalone.
And so supercomputing centers and AI supercomputers
and the world’s cloud, hyperscale cloud,
are very different in nature.
Ethernet, with TCP running on top of it,
is a lossy approach and it's very resilient.
And whenever there's a loss, a packet loss, it retransmits.
There’s error correction that’s done.
It knows which of the packets are lost
and requests the sender to retransmit them.
The ability for Ethernet to interconnect
components from almost anywhere
is the reason why the world's internet was created.
If it required too much coordination,
how could we have built today’s internet?
So Ethernet's profound contribution
is this lossy capability,
its resilient capability.
And because so,
it basically can connect almost anything together.
However, a supercomputing data center can’t afford that.
You can’t interconnect random things together
because on a billion dollar supercomputer,
the difference between achieving 95%
networking throughput
versus 50% is effectively $500 million.
So the cost of that one workload
running across the entire supercomputer
is so expensive that you can’t afford
to lose anything in the network.
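The dollar figure follows from treating network efficiency as the fraction of the machine's value you actually get to use; a minimal sketch of that arithmetic with the round numbers from the talk:

```python
# Treat network efficiency as the fraction of a supercomputer's value you can
# actually exploit; numbers are the round figures quoted in the talk.
cluster_cost = 1_000_000_000   # a "billion dollar supercomputer"
good_efficiency = 0.95
poor_efficiency = 0.50

stranded_value = cluster_cost * (good_efficiency - poor_efficiency)
print(f"value lost to the network: ${stranded_value:,.0f}")
# -> $450,000,000, i.e. "effectively $500 million" in round numbers
```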
InfiniBand relies on RDMA very heavily.
It is flow controlled.
It's a lossless approach.
It requires flow control,
which basically means
you have to understand the data center
from end to end,
from the switch to the NIC to the software,
so that you can orchestrate the traffic
with adaptive routing,
so that you can deal with congestion control
and avoid the oversaturation of traffic
in an isolated area, which results in packet loss.
You simply can't afford that,
because in the case of InfiniBand, it's lossless.
And so one is lossy, the other one's lossless.
One very resilient, the other very performant.
These two data centers have lived separate lives.
These two data centers have lived separate lives,
but now we would like to bring generative AI
to every data center.
The question is how.
How do we introduce
a new type of Ethernet that's, of course,
backwards compatible with everything,
but is engineered in a way
that achieves the kind of capabilities
that let us bring AI workloads
to any of the world's data centers?
This is a really exciting journey
and at the core of this strategy
is a brand new switch that we’ve made.
This is the Spectrum-4 switch.
And this switch,
everything I'm showing today is very heavy.
Whoosh, like that.
(audience applauds)
This is the Spectrum-4 switch,
128 ports of 400 gigabits per second.
128 ports of 400 gigabits per second,
51.2 terabits per second.
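The aggregate figure is simply ports times per-port line rate; a one-line check:

```python
# Aggregate switch bandwidth: ports x per-port line rate, as quoted above.
ports = 128
gbps_per_port = 400
total_tbps = ports * gbps_per_port / 1000
print(f"{total_tbps} Tb/s")   # -> 51.2 Tb/s
```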
This is the chip.
It’s gigantic.
100 billion transistors,
90 millimeters by 90 millimeters,
800 balls on the bottom.
This is a 500 watt chip.
This switch is 2,800 watts.
It’s air cooled.
There are 48 PCBs that connect the switch together.
48 PCBs that build up the switch.
And the switch is designed,
wait (indistinct), oh,
this switch is designed to enable a new type of Ethernet.
Remember what I said,
InfiniBand is fundamentally different
in the sense that we build InfiniBand from end to end,
so that we could do adaptive routing,
so that we could do congestion control,
so that we can isolate performance,
so we could keep noisy neighbors apart,
so that we could do in fabric computing.
All of these capabilities are simply not possible
in the lossy approach of the internet
and of course of Ethernet.
And so the way that we do InfiniBand
is designed from end to end.
Just the way supercomputers are built,
this is the way AI supercomputers are built.
And we are gonna do the same thing now,
for the very first time for ethernet,
we’ve been waiting for the critical part.
And the critical part is the Spectrum-4 switch.
The entire system consists of several things.
So our new ethernet system for AI,
(speaks foreign language),
is this,
the Spectrum-4 switch
and the BlueField-3 SmartNIC or DPU.
This BlueField-3 is a 400 gigabit per second NIC,
and it connects directly to the Spectrum-4 switch.
The combination of four things,
the switch, the BlueField-3,
the cables that connect them together,
which are super important,
and the software that runs it all together,
represents Spectrum-X.
This is what it takes to build a high performance network.
And we’re gonna take this capability to the world’s CSPs.
The reception has been incredible,
and the reason for that is, of course,
every CSP, every data center would like to turn
every single data center into a generative AI data center.
There are some customers that have
deployed Ethernet throughout their company,
and they have a lot of users for that data center.
Getting the capabilities of InfiniBand
while isolating it within their data center
is very difficult to do.
And so for the very first time,
we’re bringing the capabilities
of high performance computing into the ethernet market.
And we’re gonna bring to the ethernet market several things.
First, adaptive routing.
Adaptive routing basically says,
based on the traffic that is going through your data center,
depending on which one of the ports
of that switch is overcongested,
it will tell the sending BlueField-3
to send it to another port instead.
The BlueField-3 on the other end reassembles it
and presents the data to the computer, to the GPU,
without any CPU intervention.
All completely in RDMA.
Number one, adaptive routing.
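As a mental model only, not Spectrum-4's actual algorithm, adaptive routing amounts to steering each packet toward the least-loaded egress port and letting the receiving NIC reassemble the stream; a small sketch with made-up queue telemetry:

```python
# Toy model of adaptive routing: pick the least-congested egress port for each
# packet. Purely illustrative; not the Spectrum-4 implementation.

def pick_egress_port(port_queue_depths: dict) -> int:
    """Return the port with the shallowest queue (least congestion)."""
    return min(port_queue_depths, key=port_queue_depths.get)

def route(packets: int, port_queue_depths: dict) -> dict:
    """Spread packets across ports, always choosing the emptiest queue.
    The BlueField-3 on the receiving end would reorder and reassemble,
    so per-port ordering doesn't matter here."""
    for _ in range(packets):
        port = pick_egress_port(port_queue_depths)
        port_queue_depths[port] += 1
    return port_queue_depths

if __name__ == "__main__":
    queues = {0: 40, 1: 5, 2: 90, 3: 12}   # hypothetical telemetry snapshot
    print(route(100, queues))              # traffic flows toward ports 1 and 3
```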
Second, congestion control.
Congestion control: it is possible
for certain ports
to become heavily congested,
in which case the telemetry of the switch,
each switch, will see how the network is performing
and communicate to the senders,
"Please don't send any more data right away,
because you're congesting the network."
That congestion control requires
basically an overarching system,
which includes software in the switch
working with all of the endpoints
to manage the congestion,
the traffic and the throughput of the data center overall.
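A toy version of that telemetry-driven loop, ECN-style pacing in spirit, again just an illustration of the idea rather than the Spectrum-X implementation:

```python
# Toy telemetry-driven congestion control: the switch reports queue occupancy
# and senders back off when it crosses a threshold. Purely illustrative.

class Sender:
    def __init__(self, rate_gbps: float = 400.0):
        self.max_rate = rate_gbps
        self.rate_gbps = rate_gbps

    def on_telemetry(self, queue_fill: float):
        """queue_fill is the switch's reported buffer occupancy in [0, 1]."""
        if queue_fill > 0.8:          # "please don't send more right away"
            self.rate_gbps *= 0.5     # multiplicative back-off
        else:
            self.rate_gbps = min(self.max_rate, self.rate_gbps + 10.0)  # gentle ramp

if __name__ == "__main__":
    s = Sender()
    for fill in [0.2, 0.5, 0.9, 0.95, 0.6, 0.3]:
        s.on_telemetry(fill)
        print(f"queue {fill:.0%} -> sender rate {s.rate_gbps:.0f} Gb/s")
```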
Now it’s really important to realize
that in a high performance computing application,
every single GPU must finish its job
so that the application can move on.
In many cases, where you do all-reductions,
you have to wait for the results from every single one.
So if one node takes too long,
everybody gets held back.
This capability is going to increase
Ethernet's overall performance dramatically.
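The straggler effect is easy to see with a little arithmetic: the step time of a tightly coupled job is set by the slowest worker, not the average. A minimal sketch with made-up timings:

```python
# A tightly coupled step (e.g. one ending in an all-reduce) finishes only when
# the slowest worker finishes, so one delayed node drags the whole cluster.
# Timings below are made up for illustration.
node_times_ms = [100] * 255 + [190]   # 255 healthy nodes, one straggler

step_time = max(node_times_ms)                       # everyone waits
ideal_time = sum(node_times_ms) / len(node_times_ms)

print(f"ideal (average) step time: {ideal_time:.1f} ms")
print(f"actual step time:          {step_time} ms")
print(f"cluster efficiency:        {ideal_time / step_time:.0%}")
```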
So Spectrum-X, really excited to roll this out.
The world’s applications,
the world’s enterprise has yet to enjoy generative AI.
So far we’ve been working with CSPs.
And the CSPs, of course,
are going to be able to bring generative AI
to many different regions
and many different applications and industries.
The big journey is still ahead of us.
There are so many enterprises in the world,
and because of the multi-modality capability
that I was mentioning before,
every industry can now benefit from generative AI.
There’s several things that we have to do.
Number one,
we have to help the industries build custom language models.
Not everybody can use the language models
that are available in a public service.
Some customers need language models
that are highly specialized for their particular modality.
For example, proteins or chemicals.
Each one of these industries has proprietary information,
and so how can we help them do that?
We have created a service called NVIDIA AI Foundation.
It is a cloud service that captures NVIDIA’s AI expertise
and makes it possible for you to train your own AI models.
We will help you develop your own AI models
with supervised fine-tuning, with guardrailing,
with proprietary knowledge bases,
and with reinforcement learning from human feedback,
so that this AI model is perfect for your application.
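The customization flow described here, a pretrained base model refined with supervised fine-tuning, RLHF-style alignment, guardrails and a proprietary knowledge base, can be pictured structurally as a pipeline. In the sketch below every function is a stand-in stub so the example runs; none of it is the actual NVIDIA AI Foundation API:

```python
# Structural sketch only: each stage is a stub so the pipeline runs end to end.
# None of this corresponds to the real NVIDIA AI Foundation interface.

def supervised_fine_tune(model: dict, corpus: list) -> dict:
    """Stand-in for supervised fine-tuning on proprietary domain data."""
    return {**model, "sft_examples": len(corpus)}

def rlhf_align(model: dict, preferences: list) -> dict:
    """Stand-in for reinforcement learning from human feedback."""
    return {**model, "preference_pairs": len(preferences)}

def add_guardrails(model: dict, rules: list) -> dict:
    """Stand-in for guardrailing plus a proprietary knowledge base."""
    return {**model, "guardrails": rules, "knowledge_base": "customer_docs"}

base = {"name": "pretrained-foundation-model"}
custom = add_guardrails(
    rlhf_align(
        supervised_fine_tune(base, corpus=["domain doc 1", "domain doc 2"]),
        preferences=[("preferred answer", "rejected answer")],
    ),
    rules=["answer only from the approved knowledge base"],
)
print(custom)
```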
We then deploy this model to run on Nvidia AI Enterprise.
This is the operating system
that I was talking to you about earlier.
This operating system runs in every single cloud.
This very simple system,
with Nvidia AI Foundation for training large language models
and Nvidia AI Enterprise for deploying them,
which is available in every single cloud and on-prem,
allows every single enterprise to engage.
Now, one of the things that very few people realize
is that today there’s only one software stack
that is enterprise secure and enterprise grade.
That software stack is the CPU software stack.
And the reason for that is
because in order to be enterprise grade,
it has to be enterprise secure,
it has to be enterprise managed
and enterprise supported
across its entire lifecycle.
There is so much software in accelerated computing.
Over 4,000 software packages are what it takes
for people to use accelerated computing today.
In data processing and training and optimization,
all the way to inference.
So for the very first time,
we are taking all of that software
and we’re gonna maintain it
and manage it like Red Hat does for Linux.
Nvidia AI Enterprise will do it
for all of NVIDIA’s libraries.
Now, enterprise can finally have
an enterprise grade and enterprise secure software stack.
This is such a big deal.
Otherwise,
even though the promise of accelerated computing
is possible for many researchers and scientists,
it’s not available for enterprise companies.
And so let’s take a look at the benefit for them.
This is a simple image processing application.
If you were to do it on a CPU versus on a GPU
running Nvidia AI Enterprise,
you’re getting 31.8 images per minute
or basically 24 times the throughput,
or you only pay 5% of the cost.
This is really quite amazing.
This is the benefit of accelerated computing in the cloud.
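A back-of-envelope reading of those numbers: with 24x the throughput, the same job needs 1/24 of the instance-hours, so even a pricier GPU instance ends up far cheaper per image. The hourly price ratio below is an assumption back-solved so the talk's two figures line up:

```python
# Back-of-envelope on the throughput/cost claim. The 24x speedup and ~5% cost
# come from the talk; the hourly price ratio is an illustrative assumption.
speedup = 24          # GPU vs CPU throughput, per the talk
price_ratio = 1.2     # assumed GPU-vs-CPU hourly price ratio

cost_ratio = price_ratio / speedup   # relative cost per processed image
print(f"relative cost per image: {cost_ratio:.0%}")   # -> 5%
```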
But for many companies, for enterprises, it is simply not possible
unless you have the stack.
Nvidia AI Enterprise is now fully integrated into
AWS, Google Cloud, Microsoft Azure, and Oracle Cloud.
And so when you go and deploy your workloads in those clouds
and you want software that is enterprise grade,
or if you have customers that need enterprise grade software
Nvidia AI Enterprise is ready for you.
It is also integrated into the world’s
machine learning operations pipeline.
As I mentioned before,
AI is a different type of workload
and this new type of software
has a whole new software industry around it.
And this software industry,
a hundred percent of them,
we have now connected with Nvidia AI Enterprise.
Now lemme talk to you about the next phase of AI,
where AI meets a digital twin.
Now, why does AI need a digital twin?
I’m gonna explain that in just a second,
but first, let me show you what you can do with it.
In order for AI to have a digital twin,
in order for AI to understand heavy industry,
remember, so far AI has only been used for light industry,
information, words, images, music, so on and so forth.
If we want to use AI for heavy industry,
the $50 trillion of manufacturing,
much of which you're part of,
the trillions of dollars of healthcare,
all of the different manufacturing sites,
whether you’re building chip fabs or battery plants
or electric vehicle manufacturing factories,
all of these would have to be digitized
in order for artificial intelligence to be used,
to design, to build and to automate
the future of your business.
And so the first thing that we have to do
is we have to create the ability for their world
to be represented in digital.
Okay, so number one is digitalization.
Well, how would you use that?
So let me give you just a simple example.
In the future, you would say to your robot,
I would like you to do something,
and the robot will understand your words
and it would generate animation.
Remember I said earlier,
you can go from text to text,
you can go from text to image,
you can go from text to music.
Why can’t you go from text to animation?
And so of course, in the future,
robotics will be highly revolutionized
by the technology we already have in front of us.
However,
how does this robot know
that the motion that it is generating
is grounded in reality?
That it is grounded in physics?
You need a software system
that understands the laws of physics.
Now, you've actually seen this already with ChatGPT.
Whereas our AI, Nvidia AI, would use Nvidia Omniverse
in a reinforcement learning loop to ground itself,
you have seen ChatGPT do this
using reinforcement learning from human feedback.
Using humans' feedback,
ChatGPT was able to be developed
by grounding it in human, well, sensibility,
and aligning it with our principles.
So reinforcement learning from human feedback
is really important.
Reinforcement learning from physics feedback
is very important.
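A minimal sketch of what reinforcement learning from physics feedback could look like in principle: a policy proposes a motion, a physics check scores how feasible it is, and that score is the only reward. Both the "simulator" and the one-parameter policy below are toy stand-ins, not Isaac Sim or Omniverse:

```python
import random

def physics_feedback(step_length: float) -> float:
    """Toy 'physics simulator' reward: full marks for feasible motions, with a
    penalty growing as the commanded step exceeds the feasible limit."""
    max_feasible = 0.5                                  # assumed limit, metres
    excess = max(abs(step_length) - max_feasible, 0.0)
    return 1.0 - excess

def train(episodes: int = 2000) -> float:
    """Hill-climb a single policy parameter (step length) using only the
    simulator's reward as the grounding signal."""
    step = 1.5                                          # starts out implausible
    for _ in range(episodes):
        candidate = step + random.gauss(0.0, 0.1)       # propose a variation
        if physics_feedback(candidate) > physics_feedback(step):
            step = candidate                            # keep the better motion
    return step

if __name__ == "__main__":
    learned = train()
    print(f"learned step length: {learned:.2f} m "
          f"(reward {physics_feedback(learned):.2f})")  # ends up feasible
```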
Let me show you everything
that you’re about to see is a simulation.
Let’s roll it, please.
(calm music plays)
(rousing music plays)
(rousing music continues)
(calm music plays)
(rousing music plays)
Everything was a simulation.
Nothing was art, everything was simulation.
Isn’t that amazing?
(audience applauds)
In the last 25 years,
I come to Taiwan, you sell me things.
(audience laughs)
Omniverse will be the first thing I’m gonna sell you.
And this,
(audience applauds)
because this will help you revolutionize your business
and turn it into a digital business
and automate it with AI.
You will build products digitally first,
before you make them physical.
You will build and plan factories digitally first
before you make them physical.
And so in the future, Omniverse is a very big deal.
Now, I’m gonna show you very quickly,
Omniverse in the cloud.
Omniverse, the entire stack is so complicated.
And so we put the whole thing into a cloud managed service
and it’s hosted in Azure.
This particular experience you’re gonna have,
the computer is in California.
And Sean, I’m sorry I took so much time,
so you’re gonna have to (indistinct).
-
[Sean] We’ll go quick.
-
[Jensen Huang] Okay.
-
[Sean] So this is,
let’s take a look at the Omniverse cloud.
So this is, you know, just a browser.
And we’re looking now into Omniverse Factory Explorer.
It’s running 10,000 kilometers away
in our Santa Clara headquarters,
and we’re leveraging the power of our data center now
to visualize this factory floor.
We're using real factory data
from Siemens and Autodesk Revit to take a look.
It’s a cloud application,
so we can have multiple users collaborating.
Let's go ahead and bring up Eloise's screen.
And we can see,
now we have these two users in this environment,
and Jeff on the left there is gonna look at some markup.
We have this task to perform.
We need to move this object.
So we can have Eloise just go ahead
and grab that conveyor belt, move it over,
and as he does so,
you’ll see that it’s reflected
accurately and completely in real time on Jeff’s screen.
So we’re able to collaborate with multiple users.
And even in bringing up this demo,
we had users from around the globe working on the process.
East and West Coast, United States, Germany, even Sydney,
and of course here in Taipei to put this together.
Now, if we're modifying our production line,
of course one of the things we’ll want to do
is add the necessary safety equipment.
So we’re able to simply drag and drop items into Omniverse
and modify this production environment
and begin tweaking this
and optimizing for performance
even before we break ground on construction.
- [Jensen Huang] That is so cool.
This is in California,
6,264 miles away or something like that.
34 milliseconds at the speed of light, one way.
And it’s completely interactive.
Everything is Ray traced.
No art is necessary.
You bring everything, the entire CAD into Omniverse,
open up a browser,
bring your data in,
bring your factory in.
No art is necessary.
The lighting just does what the lighting does,
physics does what the physics does,
if you wanna turn off physics, you can,
if you wanna turn on physics, you can.
And multiple users, as many as you like,
can enter the Omniverse at the same time and work together.
One unified source of data across your entire company.
You could virtually build,
you could virtually design and build
and operate your factory
before you break ground, and not make the mistakes
that usually, at the beginning of integration,
create a lot of change orders,
which cost a lot of money.
Thank you very much, Sean.
Good job.
(audience applauds)
Notice, just now,
it was humans interacting with Omniverse.
In the future, Sean will even have a generative AI,
an AI, interact with him in Omniverse.
We could, of course,
imagine in the very beginning
there was (indistinct) that could be a character,
that could be one of the users of Omniverse
interacting with you, answering questions, helping you.
We can also use generative AI
to help us create virtual worlds.
So for example, this is a bottle
that’s rendered in Omniverse
that could be placed in a whole bunch
of different type of environments.
It could render beautifully physically.
You could place it just by giving it a prompt, saying,
I would like to put these bottles
in a lifestyle-photograph-style backdrop
of a modern, warm farmhouse bathroom.
Change the background,
and everything is integrated and rendered again.
Okay, so generative AI will come together with Omniverse
to assist the creation of virtual worlds.
Today we’re announcing that WPP,
the world’s largest advertising agency
and advertising services company
is partnering with Nvidia to build
a content generation engine
based on Omniverse and generative AI.
It integrates tools from so many different other partners,
Adobe Firefly for example, Getty, Shutterstock.
And it integrates into this entire environment
and it makes it possible for them
to generate unique content for different users,
for ad applications, for example.
So in the future, whenever you engage a particular ad,
it could be generated just for you,
but yet the product is precisely rendered,
because of course the product integrity is very important.
And so when you engage
a particular ad today, it is retrieved.
Today's computing model is that
when you engage information, it is retrieved.
In the future, when you engage information,
much of it will be generated.
Notice the computing model has changed.
WPP generates 25% of the ads that the world sees.
60% of the world’s largest companies are already clients.
And so they made a video
of how they would use this technology.
(calm music plays)
- [Voice Off-Screen] The world’s industries
are racing to realize the benefits of AI.
Nvidia and WPP are building a groundbreaking
generative AI enabled content engine
to enable the next evolution
of the $700 billion digital advertising industry.
Built on Nvidia AI and Omniverse
this engine gives brands the ability
to build and deploy highly personalized
and compelling visual content,
faster and more efficiently than ever before.
The process starts by building
a physically accurate digital twin of a product
using Omniverse Cloud,
which connects product design data
from industry standard tools.
Then WPP artists create customized and diverse virtual sets
using a combination of digitized environments
and generative AI tools by organizations,
such as Getty Images and Adobe,
trained on fully licensed data using Nvidia Picasso.
(rousing music plays)
This unique combination of technologies allows WPP
to build accurate photorealistic visual content
and e-commerce experiences
that bring new levels of realism and scale to the industry.
(audience applauds)
(Jensen Huang knocks) (audience laughs)
- (speaks foreign language), (audience laughs)
no problem, we continue.
Okay, so that’s WPP.
You could see
that that was an example.
If you think about it for a second,
that's an example of a company using digital information
that was created in design,
and carrying that digital information all the way into marketing.
I’m gonna show you now how we’re gonna use Omniverse and AI
here in Taiwan and we’re gonna use it for manufacturing.
Manufacturing, as you know,
is one of the largest industries in the world.
We're gonna use Omniverse to teach an AI,
and then we're gonna use Metropolis,
our AI edge deployment system, to deploy the AI.
Okay, run it.
(upbeat music plays)
- [Voice Off-Screen] The $45 trillion
global manufacturing industry
is comprised of 10 million factories operating 24/7.
Enterprises are racing to become software defined
to ensure they can produce high quality products
as quickly and cost efficiently as possible.
Let’s see how electronics manufacturer Pegatron
uses Nvidia AI and Omniverse to digitalize their factories.
In Omniverse, they start by building
a digital twin of their factory,
unifying disparate 3D and CAD data sets
to provide a real-time view of their complex factory data
to their planners and suppliers.
In the cloud-native digital twin,
planners can then optimize the layout virtually
before deploying changes to the real factory.
The digital twin is also used
as a training ground, a data factory,
for Pegatron's perception AIs.
They use Nvidia Isaac Sim built on Omniverse
to simulate and optimize their fleet of mobile robots,
which help move materials throughout the facility
as well as the pick and place robotic arms
that assist on production lines.
In the fully operational factory,
Pegatron deploys automated optical inspection or AOI points
along their production lines,
which reduces cost and increases line throughput.
Nvidia Metropolis enables Pegatron
to quickly develop and deploy cloud native,
highly accurate AOI workflows
across their production lines.
Omniverse Replicator generates synthetic data sets
of PCBA defects,
which are too complex and costly
to capture in the real world,
like scratches and missing or misaligned components.
Pegatron then combines the synthetic data
with Nvidia pre-trained models,
Nvidia TAO for training, adaptation and optimization,
and NVIDIA DeepStream for realtime inference.
Resulting in AOI performance
that is 99.8% accurate
with a four times improvement in throughput.
With software defined factories
built on Nvidia AI and Omniverse
manufacturers can super accelerate factory bring up
and minimize change orders,
continuously optimize operations,
maximize production line throughput,
all while reducing costs.
- Did you see that?
The whole factory is in Omniverse.
(audience applauds)
It’s completely digital.
Imagine if you have digital information in your hands,
what can you do with it?
Almost everything.
And so this is one of the things that’s really exciting.
What you just saw is basically every factory in the future
will be digital, of course, first.
Every factory will be a robot.
Inside the factories there will be other robots
that the factory is orchestrating.
We are also going to build robots that move themselves.
So far the robots that you saw are stationary.
Now we’re gonna also have robots that move.
Everything that moves in the future
will have artificial intelligence
and will have robotic capability.
And so today we're announcing that our robot platform,
Nvidia Isaac AMR, is now available as a reference design
for anybody who wants to build robots.
Just like we did with our high performance computing.
Nvidia builds the whole stack.
And then we disaggregate it,
so that if you would like to buy the chip, that’s fine,
if you’d like to buy the system, that’s fine,
if you'd like to use your own software, that's fine,
if you'd like to use our software, that's fine.
If you’d like to use your own algorithm, that’s terrific,
if you’d like to use ours, that’s terrific.
However you would like to work with us,
we’re open for business.
So that we can help you integrate accelerated computing
wherever you like.
In the future, we’re gonna do the same with robotics.
We built the entire robotics stack top to bottom
from the chip to the algorithms.
We have state-of-the-art perception
for multi-modality sensors,
state-of-the-art mapping,
state-of-the-art localization and planning,
and a cloud mapping system.
Everything has been created.
However you would like to use it.
You can use pieces of it.
It’s open, available for you,
including all the cloud mapping systems.
So this is Isaac AMR.
It starts with a chip called Orin.
It goes into a computer,
and then into the Nvidia Nova Orin,
which is a reference system,
a blueprint for AMRs.
This is the most advanced AMR in the world today.
And that entire stack has been built.
And let’s take a look at it.
(upbeat music plays)
- [Voice Off-Screen] To improve productivity
and increase worker safety,
factories and warehouses are migrating away
from manual forklifts and guided vehicles to full autonomy.
Nvidia Isaac AMR provides an integrated end-to-end solution
to deploy fully autonomous mobile robots.
The core of the solution is Nova Orin,
a sensor suite and computing hardware
that enables mapping, autonomy and simulation.
Nova’s collection of advanced sensors
speeds the mapping process,
leveraging our cloud-based service
to generate an accurate and detailed 3D voxel map.
This 3D map can then be sliced across a plane
to generate 2D maps tailored for different autonomous robots
that might operate in a facility.
With these maps in place, on-robot lidar
or cost-effective cameras
provide autonomous navigation
that works reliably in the most complex
and dynamic environments.
Isaac mission control optimizes route planning
using the (indistinct) library.
To improve operations developers can use Isaac Sim
and NVIDIA Omniverse to create realistic digital twins
of the operating environment.
This allows fully autonomous robots
to be trained on complex tasks entirely in simulation.
All operations can be fully validated
using Isaac Sim before deployment to the real world.
Isaac AMR accelerates your migration to full autonomy,
reducing costs, and speeding deployment
of the next generation of AMRs.
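The voxel-map-to-2D-map step mentioned in that video is easy to picture: collapse the band of voxel layers at a given robot's height into a flat occupancy map. A small numpy sketch with made-up data, not anything from Isaac:

```python
import numpy as np

# Toy 3D voxel occupancy grid (x, y, z); True = occupied. Made-up data.
rng = np.random.default_rng(0)
voxels = rng.random((40, 40, 20)) > 0.97    # sparse obstacles
voxels[:, :, 0] = True                      # the floor layer is occupied

def slice_to_2d(voxel_grid, z_min: int, z_max: int):
    """Collapse the voxel layers between z_min and z_max (a robot's height
    band) into a 2D occupancy map: a cell is blocked if any voxel in that
    band is occupied."""
    return voxel_grid[:, :, z_min:z_max].any(axis=2)

# Different robots care about different height bands.
amr_map = slice_to_2d(voxels, z_min=1, z_max=6)        # low mobile robot
forklift_map = slice_to_2d(voxels, z_min=1, z_max=15)  # taller vehicle
print(amr_map.shape, int(amr_map.sum()), int(forklift_map.sum()))
```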
- Nova cannot tell that it is not
in the real environment.
Nova thinks it is in the real environment.
It cannot tell.
And the reason for that is because all the sensors work,
physics work, it can navigate, it can localize itself.
Everything is physically based.
So therefore we could design, we could design,
(speaks foreign language),
therefore we can design the robot,
simulate the robot,
train the robot all in Isaac,
and then we take the brain,
the software trained in Isaac Sim,
and we put it into the actual robot.
And with some amount of adaptation,
it should be able to perform the same job.
This is the future of robotics.
Omniverse and AI working together.
The ecosystem that we have been in, the IT ecosystem,
is a quarter of a trillion dollars per year,
$250 billion a year.
This is the IT industry.
For the very first time in our history together,
we finally have the ability to understand
the language of the physical world.
We can understand the language of heavy industry
and we have a software tool.
We have a software system called Omniverse
that allows us to simulate, to develop,
to build and operate our physical plants,
our physical robots,
our physical assets,
as if they were digital.
The excitement in the hard industries,
the heavy industries has been incredible.
We have been connecting Omniverse all over the world
with tools companies, robotics companies, sensor companies,
all kinds of industries.
There are three industries right now, as we speak,
that are putting enormous investments into the world.
Number one, of course, is the chip industry.
Number two, electric battery industry.
Number three, electric vehicle industry.
Trillions of dollars will be invested
in the next several years,
trillions of dollars will be invested
in the next several years.
And they would all like to do it better
and they would like to do it in a modern way.
For the very first time we now give them a system,
a platform, tools that allows them to do that.
I wanna thank all of you for coming today.
I talked about many things.
It’s been a long time since I’ve seen you,
so I had so much to tell you.
(audience laughs)
It was too much,
it was too much.
Last night I said this is too much.
This morning I said this is too much.
And now I realize it’s too much.
(speaks foreign language)
(audience laughs and applauds)
I told you, I told you several things.
I told you that we are going through two
simultaneous computing industry transitions,
accelerated computing and generative AI.
Two.
This form of computing
is not like the traditional general purpose computing.
It is full stack.
It is data center scale
because the data center is the computer.
And it is domain specific,
for every domain that you want to go into,
every industry you go into,
you need to have the software stack.
And if you have the software stack,
then the utility, the utilization of your machine,
the utilization of your computer will be high.
So, number two,
it is full stack, data center scale and domain specific.
We are in full production of the engine of generative AI
and that is HGX H100.
Meanwhile, this engine that’s gonna be used for AI factories
will be scaled out using Grace Hopper,
the engine that we created for the era of generative AI.
We also took Grace Hopper
and realized that we can extend on the one hand
the performance,
but we also have to extend the fabric
so that we can make larger models trainable.
And we took Grace Hopper, connected it into a 256-node NVLink system,
and created the largest GPU in the world, DGX GH200.
We’re trying to extend generative AI
and accelerated computing
in several different directions at the same time.
Number one,
we would like, of course, to extend it in the cloud,
so that every cloud data center can be an AI data center,
not just AI factories and hyperscale.
But every hyperscale data center
can now be a generative AI data center.
And the way we do that is the Spectrum-X.
It takes four components to make Spectrum-X possible.
The switch, the BlueField-3 NIC,
the interconnects themselves,
the cables are so important in high speed communications
and the software stack that goes on top of it.
We would like to extend generative AI
to the world’s enterprise.
And there are so many different configurations of servers.
And the way we're doing that is in partnership
with our Taiwanese ecosystem,
with the MGX modular accelerated computing systems.
We put Nvidia in the cloud
so that every enterprise in the world
can engage us to create generative AI models
and deploy them in an enterprise grade,
enterprise secure way
in every single cloud.
And lastly,
we would like to extend AI to the world’s heavy industries,
the largest industries in the world.
So far, our industry, the industry that
all of us have been part of,
has been a small part of the world's total industry.
For the very first time the work that we’re doing
can engage every single industry.
And we do that by automating factories, automating robots.
And today we even announced our first
full robotics reference stack, Nova Orin.
I wanna thank all of you
for your partnership over the years.
Thank you.
(audience applauds)
- (upbeat accompaniment plays) ♪ I am here at Computex
♪ I hope that you do like me best ♪
♪ now my song say lo-long thank you ♪
♪ from Nvidia ♪