NVIDIA - GTC 2023 Keynote with NVIDIA CEO Jensen Huang


For nearly four decades

Moore’s Law has been the governing dynamics of the computer industry

which in turn has impacted every industry.

The exponential performance increase at constant cost and power has slowed.

Yet, computing has gone to lightspeed.

The warp drive engine is accelerated computing and the energy source is AI.

The arrival of accelerated computing and AI is timely

as industries tackle powerful dynamics

sustainability

generative AI

and digitalization.

Without Moore’s Law, as computing surges, data center power is skyrocketing

and companies struggle to achieve Net Zero.

The impressive capabilities of Generative AI

created a sense of urgency for companies to reimagine their products and business models.

Industrial companies are racing to digitalize and reinvent into software-driven tech companies

to be the disruptor

and not the disrupted.

Today, we will discuss how accelerated computing and AI are powerful tools for tackling these challenges

and engaging the enormous opportunities ahead.

We will share new advances in NVIDIA’s full-stack, datacenter-scale, accelerated computing platform.

We will reveal new chips and systems, acceleration libraries, cloud and AI services

and partnerships that open new markets.

Welcome to GTC!

GTC is our conference for developers.

The global NVIDIA ecosystem spans 4 million developers, 40,000 companies

and 14,000 startups.

Thank you to our Diamond sponsors for supporting us and making GTC 2023 a huge success.

We’re so excited to welcome more than 250,000 of you to our conference.

GTC has grown incredibly.

Only four years ago, our in-person GTC conference had 8,000 attendees.

At GTC 2023, we’ll learn from leaders like Demis Hassabis of DeepMind

Valerie Taylor of Argonne National Laboratory

Scott Belsky of Adobe

Paul Debevec of Netflix

Thomas Schulthess of ETH Zurich

and a special fireside chat I’m having with Ilya Sutskever

co-founder of OpenAI, the creator of ChatGPT.

We have 650 amazing talks from the brightest minds in academia and the world’s largest industries:

There are more than 70 talks on Generative AI alone.

Other great talks, like pre-trained multi-task models for robotics…

sessions on synthetic data generation, an important method for advancing AI

including one on using Isaac Sim to generate physically based lidar point clouds

a bunch of talks on digital twins, from using AI to populate virtual factories of the future

to restoring lost Roman mosaics of the past

cool talks on computational instruments, including a giant optical telescope and a photon-counting CT

materials science for carbon capture and solar cells, to climate science, including our work on Earth-2

important works by NVIDIA Research on trustworthy AI and AV safety

From computational lithography for microchips, to make the smallest machines

to AI at the Large Hadron Collider to explain the universe.

The world’s most important companies are here from auto and transportation

healthcare, manufacturing, financial services,

retail, apparel, media and entertainment, telco

and of course, the world’s leading AI companies.

The purpose of GTC is to inspire the world on the art of the possible in accelerated computing

and to celebrate the achievements of the scientists and researchers that use it.

I am a translator.

Transforming text into creative discovery,

movement into animation,

and direction into action.

I am a healer.

Exploring the building blocks that make us unique

modeling new threats before they happen

and searching for the cures to keep them at bay.

I am a visionary.

Generating new medical miracles

and giving us a new perspective on our sun

to keep us safe here on earth.

I am a navigator.

Discovering a unique moment in a sea of content

we’re announcing the next generation

and the perfect setting for any story.

I am a creator.

Building 3D experiences from snapshots

and adding new levels of reality to our virtual selves.

I am a helper.

Bringing brainstorms to life

sharing the wisdom of a million programmers

and turning ideas into virtual worlds.

Build northern forest.

I even helped write this script

breathed life into the words

and composed the melody.

I am AI.

Brought to life by NVIDIA, deep learning, and brilliant minds everywhere.

NVIDIA invented accelerated computing to solve problems that normal computers can’t.

Accelerated computing is not easy

it requires full-stack invention from chips, systems, networking,

acceleration libraries, to refactoring the applications.

Each optimized stack accelerates an application domain

from graphics, imaging, particle or fluid dynamics

quantum physics, to data processing and machine learning.

Once accelerated, the application can enjoy incredible speed-up, as well as scale-up across many computers.

The combination of speed-up and scale-up has enabled us to achieve a million-X

for many applications over the past decade

helping solve problems previously impossible.

Though there are many examples, the most famous is deep learning.

In 2012, Alex Krizhevsky, Ilya Sutskever, and Geoff Hinton needed an insanely fast computer

to train the AlexNet computer vision model.

The researchers trained AlexNet with 14 million images on GeForce GTX 580

processing 262 quadrillion floating-point operations,

and the trained model won the ImageNet challenge by a wide margin, and ignited the Big Bang of AI.

A decade later, the transformer model was invented.

And Ilya, now at OpenAI, trained the GPT-3 large language model to predict the next word.

323 sextillion floating-point operations were required to train GPT-3.

One million times more floating-point operations than to train AlexNet.
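
As a quick sanity check on that ratio, the two FLOP counts quoted above can be divided directly; a few lines of arithmetic confirm the roughly million-fold jump.

```python
# Sanity check on the training-compute comparison quoted above.
alexnet_flops = 262e15   # 262 quadrillion floating-point operations (AlexNet, 2012)
gpt3_flops = 323e21      # 323 sextillion floating-point operations (GPT-3)

ratio = gpt3_flops / alexnet_flops
print(f"GPT-3 used ~{ratio:,.0f}x the training compute of AlexNet")
# -> roughly 1.2 million times more floating-point operations
```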

The result this time – ChatGPT, the AI heard around the world.

A new computing platform has been invented.

The iPhone moment of AI has started.

Accelerated computing and AI have arrived.

Acceleration libraries are at the core of accelerated computing.

These libraries connect to applications which connect to the world’s industries, forming a network of networks.

Three decades in the making, several thousand applications are now NVIDIA accelerated

with libraries in almost every domain of science and industry.

All NVIDIA GPUs are CUDA-compatible, providing a large install base and significant reach for developers.

A wealth of accelerated applications attract end users, which creates a large market for cloud service providers

and computer makers to serve.

A large market affords billions in R&D to fuel its growth.

NVIDIA has established the accelerated computing virtuous cycle.

Of the 300 acceleration libraries and 400 AI models that span ray tracing and neural rendering

physical, earth, and life sciences, quantum physics and chemistry, computer vision

data processing, machine learning, and AI, we updated 100 this year

increasing performance and features for our entire installed base.

Let me highlight some acceleration libraries that solve new challenges and open new markets.

The auto and aerospace industries use CFD for turbulence and aerodynamics simulation.

The electronics industry uses CFD for thermal management design.

This is Cadence’s slide of their new CFD solver accelerated by CUDA.

At equivalent system cost, NVIDIA A100 is 9X the throughput of CPU servers.

Or at equivalent simulation throughput, NVIDIA is 9X lower cost or 17X less energy consumed.

Ansys, Siemens, Cadence, and other leading CFD solvers are now CUDA-accelerated.

Worldwide, industrial CAE uses nearly 100 billion CPU core hours yearly.

Acceleration is the best way to reclaim power and achieve sustainability and Net Zero.

NVIDIA is partnering with the global quantum computing research community.

The NVIDIA Quantum platform consists of libraries and systems for researchers to advance quantum programming models,

system architectures, and algorithms.

cuQuantum is an acceleration library for quantum circuit simulations.

IBM Qiskit, Google Cirq, Baidu Quantum Leaf, QMWare, QuEra, Xanadu PennyLane, Agnostiq, and AWS Braket

have integrated cuQuantum into their simulation frameworks.
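
As one illustration of what such an integration looks like from the user's side, here is a minimal sketch using Qiskit with a GPU-enabled Aer simulator; it assumes a qiskit-aer build with GPU support is installed (such builds can use cuQuantum's cuStateVec under the hood), and the two-qubit circuit is only a placeholder.

```python
# Minimal sketch: simulate a small circuit on a GPU-enabled Aer backend.
# Assumes qiskit and a GPU build of qiskit-aer are installed.
from qiskit import QuantumCircuit, transpile
from qiskit_aer import AerSimulator

qc = QuantumCircuit(2, 2)
qc.h(0)                      # put qubit 0 into superposition
qc.cx(0, 1)                  # entangle qubits 0 and 1 (Bell state)
qc.measure([0, 1], [0, 1])

sim = AerSimulator(method="statevector", device="GPU")   # requires a GPU-enabled Aer build
result = sim.run(transpile(qc, sim), shots=1024).result()
print(result.get_counts())   # expect roughly equal '00' and '11' counts
```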

Open Quantum CUDA is our hybrid GPU-Quantum programming model.

IonQ, ORCA Computing, Atom Computing, QuEra, Oxford Quantum Circuits, IQM, Pasqal, Quantum Brilliance, Quantinuum, Rigetti,

Xanadu, and Anyon have integrated Open Quantum CUDA.

Error correction on a large number of qubits is necessary to recover data from quantum noise and decoherence.

Today, we are announcing a quantum control link, developed in partnership with Quantum Machines

that connects NVIDIA GPUs to a quantum computer to do error correction at extremely high speeds.

Though commercial quantum computers are still a decade or two away, we are delighted to support this large and vibrant

research community with NVIDIA Quantum.

Enterprises worldwide use Apache Spark to process data lakes and warehouses

SQL queries, graph analytics, and recommender systems.

Spark-RAPIDS is NVIDIA’s accelerated Apache Spark data processing engine.

Data processing is the leading workload of the world’s $500B cloud computing spend.

Spark-RAPIDS now accelerates major cloud data processing platforms, including GCP Dataproc

Amazon EMR, Databricks, and Cloudera.
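
Enabling GPU acceleration in Spark is largely a configuration exercise. Below is a minimal PySpark sketch, assuming the RAPIDS Accelerator for Apache Spark jar is already on the cluster's classpath and GPUs are visible to the executors; the dataset path is a hypothetical placeholder.

```python
# Minimal sketch: turn on the RAPIDS Accelerator for Apache Spark.
# Assumes the rapids-4-spark jar is on the classpath and GPUs are schedulable by the executors.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("spark-rapids-sketch")
    .config("spark.plugins", "com.nvidia.spark.SQLPlugin")   # load the RAPIDS SQL plugin
    .config("spark.rapids.sql.enabled", "true")              # route supported SQL ops to the GPU
    .config("spark.executor.resource.gpu.amount", "1")       # one GPU per executor
    .config("spark.task.resource.gpu.amount", "0.25")        # four concurrent tasks share a GPU
    .getOrCreate()
)

df = spark.read.parquet("s3://example-bucket/events/")       # hypothetical dataset path
df.groupBy("country").count().show()                         # aggregation runs on the GPU where supported
```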

Recommender systems use vector databases to store, index, search, and retrieve massive datasets of unstructured data.

An important new use case for vector databases is storing domain-specific or proprietary facts

that large language models can retrieve during text generation.

We are introducing a new library, RAFT, to accelerate indexing, loading the data

and retrieving a batch of neighbors for a single query.

We are bringing the acceleration of RAFT to Meta’s open-source FAISS AI Similarity Search, Milvus open-source vector DB

used by over 1,000 organizations, and Redis with over 4B docker pulls.

Vector databases will be essential for organizations building proprietary large language models.
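
To make the retrieval step concrete, here is a minimal sketch of GPU-accelerated nearest-neighbor search with Meta's open-source FAISS library mentioned above; the index type, dimensions, and random data are illustrative placeholders, and the faiss-gpu package is assumed to be installed.

```python
# Minimal sketch: exact nearest-neighbor search on the GPU with FAISS.
import numpy as np
import faiss   # requires the faiss-gpu package

d = 768                                               # embedding dimension (placeholder)
xb = np.random.rand(100_000, d).astype("float32")     # database vectors (placeholder data)
xq = np.random.rand(5, d).astype("float32")           # query vectors

index = faiss.IndexFlatL2(d)                          # exact L2 index, built on the CPU
res = faiss.StandardGpuResources()                    # GPU scratch memory and streams
gpu_index = faiss.index_cpu_to_gpu(res, 0, index)     # move the index to GPU 0

gpu_index.add(xb)                                     # index the database vectors
distances, ids = gpu_index.search(xq, 4)              # retrieve 4 nearest neighbors per query
print(ids)
```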

Twenty-two years ago, operations research scientists Li and Lim posted a series of challenging pickup and delivery problems.

PDP shows up in manufacturing, transportation, retail and logistics, and even disaster relief.

PDP is a generalization of the Traveling Salesperson Problem and is NP-hard

meaning there is no efficient algorithm to find an exact solution.

The solution time grows factorially as the problem size increases.
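
To see why the solution time grows factorially, consider a brute-force solver for the plain Traveling Salesperson Problem: with n stops there are (n-1)! possible tours, which is astronomically large long before n reaches real-world fleet sizes. The toy sketch below is only for illustration.

```python
# Brute-force TSP: enumerate every tour and keep the shortest.
# With n stops there are (n-1)! tours, so this only works for tiny n.
from itertools import permutations
from math import dist, factorial

stops = [(0, 0), (2, 3), (5, 1), (6, 4), (1, 6), (4, 7)]   # toy coordinates

def tour_length(order):
    route = [stops[0]] + [stops[i] for i in order] + [stops[0]]
    return sum(dist(a, b) for a, b in zip(route, route[1:]))

best = min(permutations(range(1, len(stops))), key=tour_length)
print("best tour:", best, "length:", round(tour_length(best), 2))
print("tours checked:", factorial(len(stops) - 1))          # 120 for 6 stops, ~9.3e157 for 100 stops
```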

Using an evolution algorithm and accelerated computing to analyze 30 billion moves per second

NVIDIA cuOpt has broken the world record and discovered the best solution for Li & Lim's challenge.

AT&T routinely dispatches 30,000 technicians to service 13 million customers across 700 geographic zones.

Today, running on CPUs, AT&T's dispatch optimization takes overnight to complete.

AT&T wants to find a dispatch solution in real time that continuously optimizes for urgent customer needs

and overall customer satisfaction, while adjusting for delays and new incidents that arise.

With cuOpt, AT&T can find a solution 100X faster and update their dispatch in real time.

AT&T has adopted a full suite of NVIDIA AI libraries.

In addition to Spark-RAPIDS and cuOpt, they’re using Riva for conversational AI and Omniverse for digital avatars.

AT&T is tapping into NVIDIA accelerated computing and AI for sustainability, cost savings, and new services.

cuOpt can also optimize logistics services. 400 billion parcels are delivered to 377 billion stops each year.

Deloitte, Capgemini, Softserve, Accenture, and Quantiphi are using NVIDIA cuOpt to help customers optimize operations.

NVIDIA’s inference platform consists of three software SDKs.

NVIDIA TensorRT is our inference runtime that optimizes for the target GPU.

NVIDIA Triton is a multi-framework data center inference serving software supporting GPUs and CPUs.

Microsoft Office and Teams, Amazon, American Express, and the U.S. Postal Service

are among the 40,000 customers using TensorRT and Triton.

Uber uses Triton to serve hundreds of thousands of ETA predictions per second.

With over 60 million daily users, Roblox uses Triton to serve models for game recommendations,

building avatars, and moderating content and marketplace ads.

We are releasing some great new features – model analyzer support for model ensembles, multiple concurrent model serving,

and multi-GPU, multi-node inference for GPT-3 large language models.
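
For reference, sending a request to a running Triton server looks roughly like the sketch below from Python; the model name, tensor names, and shapes are hypothetical placeholders, and the tritonclient package is assumed to be installed.

```python
# Minimal sketch: query a model served by Triton over HTTP.
# The model and tensor names below are hypothetical; use the names from your model's config.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

batch = np.random.rand(1, 3, 224, 224).astype(np.float32)          # e.g. one image
inp = httpclient.InferInput("input__0", list(batch.shape), "FP32")
inp.set_data_from_numpy(batch)
out = httpclient.InferRequestedOutput("output__0")

response = client.infer(model_name="my_classifier", inputs=[inp], outputs=[out])
scores = response.as_numpy("output__0")
print(scores.shape)
```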

NVIDIA Triton Management Service is our new software that automates the scaling and orchestration

of Triton inference instances across a data center.

Triton Management Service will help you improve the throughput and cost efficiency of deploying your models.

50-80% of cloud video pipelines are processed on CPUs

consuming power and cost and adding latency.

CV-CUDA for computer vision, and VPF for video processing, are new cloud-scale acceleration libraries.

CV-CUDA includes 30 computer vision operators for detection, segmentation, and classification.

VPF is a Python video encode and decode acceleration library.

Tencent uses CV-CUDA and VPF to process 300,000 videos per day.

Microsoft uses CV-CUDA and VPF to process visual search.

Runway is a super cool company that uses CV-CUDA and VPF to process video

for their cloud Generative AI video editing service.

Already, 80% of internet traffic is video.

User-generated video content is driving significant growth and consuming massive amounts of power.

We should accelerate all video processing and reclaim the power.

CV-CUDA and VPF are in early access.

NVIDIA accelerated computing helped achieve a genomics milestone

now doctors can draw blood and sequence a patient’s DNA in the same visit.

In another milestone, NVIDIA-powered instruments reduced the cost of whole genome sequencing to just $100.

Genomics is a critical tool in synthetic biology with applications ranging from drug discovery

and agriculture to energy production.

NVIDIA Parabricks is a suite of AI-accelerated libraries for end-to-end genomics analysis in the cloud or in-instrument.

NVIDIA Parabricks is available in every public cloud and on genomics platforms like Terra, DNAnexus, and FormBio.

Today, we’re announcing Parabricks 4.1, which will run on NVIDIA-accelerated genomics instruments

from PacBio, Oxford Nanopore, Ultima, Singular, BioNano, and Nanostring.

The world’s $250B medical instruments market is being transformed.

Medical instruments will be software-defined and AI powered.

NVIDIA Holoscan is a software library for real-time sensor processing systems.

Over 75 companies are developing medical instruments on Holoscan.

Today, we are announcing that Medtronic, the world leader in medical instruments, and NVIDIA are building an AI platform

for software-defined medical devices.

This partnership will create a common platform for Medtronic systems, ranging from surgical navigation

to robotic-assisted surgery.

Today, Medtronic announced that its next-generation GI Genius system, with AI for early detection of colon cancer

is built on NVIDIA Holoscan and will ship around the end of this year.

The chip industry is the foundation of nearly every industry.

Chip manufacturing demands extreme precision, producing features 1,000 times smaller than a bacterium

and on the order of a single gold atom or a strand of human DNA.

Lithography, the process of creating patterns on a wafer, is the beginning of the chip manufacturing process

and consists of two stages – photomask making and pattern projection.

It is fundamentally an imaging problem at the limits of physics.

The photomask is like a stencil of a chip. Light is blocked or passed through the mask

to the wafer to create the pattern.

The light is produced by the ASML EUV (extreme ultraviolet) lithography system.

Each system is more than a quarter-of-a-billion dollars.

ASML EUV uses a radical way to create light.

Laser pulses firing 50,000 times a second at a drop of tin, vaporizing it, creating a plasma that emits 13.5nm EUV light

nearly X-ray.

Multilayer mirrors guide the light to the mask.

The multilayer reflectors in the mask reticle take advantage of interference patterns of the 13.5nm light

to create finer features down to 3nm.

Magic.

The wafer is positioned within a quarter of a nanometer and aligned 20,000 times a second to adjust for any vibration.

The step before lithography is equally miraculous.

Computational lithography applies inverse physics algorithms to predict the patterns on the mask

that will produce the final patterns on the wafer.

In fact, the patterns on the mask do not resemble the final features at all.

Computational lithography simulates Maxwell’s equations of the behavior of light passing through optics

and interacting with photoresists.

Computational lithography is the largest computation workload in chip design and manufacturing

consuming tens of billions of CPU hours annually.

Massive data centers run 24/7 to create reticles used in lithography systems.

These data centers are part of the nearly $200 billion annual CAPEX invested by chip manufacturers.

Computational lithography is growing fast as algorithm complexity increases

enabling the industry to go to 2nm and beyond.

NVIDIA today is announcing cuLitho, a library for computational lithography.

cuLitho, a massive body of work nearly four years in the making, developed in close collaboration with TSMC,

ASML, and Synopsys, accelerates computational lithography by over 40X.

There are 89 reticles for the NVIDIA H100.

Running on CPUs, a single reticle currently takes two weeks to process.

cuLitho, running on GPUs, can process a reticle in a single 8-hour shift.

TSMC can reduce their 40,000 CPU servers used for computational lithography by accelerating with cuLitho

on just 500 DGX H100 systems, reducing power from 35MW to just 5MW.
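
The figures quoted above are easy to sanity-check with a few lines of arithmetic; every input below is a number stated in this section.

```python
# Sanity check on the cuLitho figures quoted above.
hours_per_reticle_cpu = 2 * 7 * 24   # two weeks on CPU servers
hours_per_reticle_gpu = 8            # a single shift with cuLitho on GPUs
print(f"per reticle: {hours_per_reticle_cpu} h on CPUs vs {hours_per_reticle_gpu} h on GPUs "
      f"(~{hours_per_reticle_cpu / hours_per_reticle_gpu:.0f}x faster)")

power_cpu_mw, power_dgx_mw = 35, 5   # 40,000 CPU servers vs 500 DGX H100 systems
print(f"power: {power_cpu_mw} MW -> {power_dgx_mw} MW ({power_cpu_mw / power_dgx_mw:.0f}x less)")
```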

With cuLitho, TSMC can reduce prototype cycle time, increase throughput

and reduce the carbon footprint of their manufacturing, and prepare for 2nm and beyond.

TSMC will be qualifying cuLitho for production starting in June.

Every industry needs to accelerate every workload, so that we can reclaim power and do more with less.

Over the past ten years, cloud computing has grown 20% annually into a massive $1T industry.

Some 30 million CPU servers do the majority of the processing.

There are challenges on the horizon.

As Moore’s Law ends, increasing CPU performance comes with increased power.

And the mandate to decrease carbon emissions is fundamentally at odds with the need to increase data centers.

Cloud computing growth is power-limited.

First and foremost, data centers must accelerate every workload.

Acceleration will reclaim power.

The energy saved can fuel new growth.

Whatever is not accelerated will be processed on CPUs.

The CPU design point for accelerated cloud datacenters differs fundamentally from the past.

In AI and cloud services, accelerated computing offloads parallelizable workloads, and CPUs process other workloads,

like web RPC and database queries.

We designed the Grace CPU for an AI and cloud-first world, where AI workloads are GPU-accelerated

and Grace excels at single-threaded execution and memory processing.

It’s not just about the CPU chip. Datacenter operators optimize for throughput and total cost of ownership of the entire datacenter.

We designed Grace for high energy-efficiency at cloud datacenter scale.

Grace comprises 72 Arm cores connected by a super high-speed on-chip scalable coherent fabric that delivers 3.2 TB/sec

of cross-sectional bandwidth.

Grace Superchip connects 144 cores between two CPU dies over a 900 GB/sec low-power chip-to-chip coherent interface.

The memory system is LPDDR low-power memory, like that used in cellphones, which we specially enhanced for use in datacenters.

It delivers 1 TB/s, 2.5x the bandwidth of today’s systems at 1/8th the power.

The entire 144-core Grace Superchip module with 1TB of memory is only 5x8 inches.

It is so low power it can be air cooled.

This is the computing module with passive cooling.

Two Grace Superchip computers can fit in a single 1U air-cooled server.

Grace’s performance and power efficiency are excellent for cloud and scientific computing applications.

We tested Grace on a popular Google benchmark, which tests how quickly cloud microservices communicate

and the Hi-Bench suite that tests Apache Spark memory-intensive data processing.

These kinds of workloads are foundational for cloud datacenters.

On microservices, Grace is 1.3X faster than the average of the newest generation x86 CPUs

and 1.2X faster at data processing

And that higher performance is achieved using only 60% of the power measured at the full server node.

CSPs can outfit a power-limited data center with 1.7X more Grace servers, each delivering 25% higher throughput.

At iso-power, Grace gives CSPs 2X the growth opportunity.
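
Those two claims are consistent with each other: at the same power budget a CSP can deploy about 1.7X as many Grace nodes, each delivering roughly 1.2X to 1.3X the throughput, which multiplies out to about 2X. A quick check, using the midpoint of the per-node figures above:

```python
# Quick check that the iso-power claim follows from the per-node figures above.
power_fraction = 0.60        # a Grace node uses ~60% of the x86 node's power
perf_per_node = 1.25         # ~1.2-1.3x throughput per node (midpoint)

nodes_at_iso_power = 1 / power_fraction           # ~1.7x more servers in the same power envelope
total_throughput = nodes_at_iso_power * perf_per_node
print(f"{nodes_at_iso_power:.1f}x nodes x {perf_per_node}x per node = {total_throughput:.1f}x throughput")
# -> about 2x the datacenter throughput at the same power
```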

Grace is sampling.

And Asus, Atos, Gigabyte, HPE, QCT, Supermicro, Wistron, and ZT are building systems now.

In a modern software-defined data center, the operating system doing virtualization, network, storage, and security can

consume nearly half of the datacenter’s CPU cores and associated power.

Datacenters must accelerate every workload to reclaim power and free CPUs for revenue-generating workloads.

NVIDIA BlueField offloads and accelerates the datacenter operating system and infrastructure software.

Over two dozen ecosystem partners, including Check Point, Cisco, DDN, Dell EMC

Juniper, Palo Alto Networks, Red Hat, and VMWare,

use BlueField’s datacenter acceleration technology to run their software platforms more efficiently.

BlueField-3 is in production and adopted by leading cloud service providers, Baidu, CoreWeave, JD.com, Microsoft Azure,

Oracle OCI, and Tencent Games, to accelerate their clouds.

NVIDIA accelerated computing starts with DGX, the world’s AI supercomputer

the engine behind the large language model breakthrough.

I hand-delivered the world’s first DGX to OpenAI.

Since then, half of the Fortune 100 companies have installed DGX AI supercomputers.

DGX has become the essential instrument of AI.

The GPU of DGX is eight H100 modules.

H100 has a Transformer Engine designed to process models like the amazing ChatGPT,

which stands for Generative Pre-trained Transformers.

The eight H100 modules are NVLink’d to each other across NVLink switches to allow fully non-blocking transactions.

The eight H100s work as one giant GPU.

The computing fabric is one of the most vital systems of the AI supercomputer.

400 Gbps ultra-low latency NVIDIA Quantum InfiniBand

with in-network processing

connects hundreds or thousands of DGX nodes

into an AI supercomputer.

NVIDIA DGX H100 is the blueprint for customers building AI infrastructure worldwide.

It is now in full production.

I am thrilled that Microsoft announced Azure is opening private previews to their H100 AI supercomputer.

Other systems and cloud services will soon come from Atos, AWS, Cirrascale, CoreWeave, Dell, Gigabyte, Google, HPE,

Lambda Labs, Lenovo, Oracle, Quanta, and SuperMicro.

The market for DGX AI supercomputers has grown significantly.

Originally used as an AI research instrument, DGX AI supercomputers are expanding into operation

running 24/7 to refine data and process AI.

DGX supercomputers are modern AI factories.

We are at the iPhone moment of AI.

Start-ups are racing to build disruptive products and business models, while incumbents are looking to respond.

Generative AI has triggered a sense of urgency in enterprises worldwide to develop AI strategies.

Customers need easier and faster access to NVIDIA AI.

We are announcing NVIDIA DGX Cloud through partnerships with Microsoft Azure, Google GCP, and Oracle OCI

to bring NVIDIA DGX AI supercomputers to every company, instantly, from a browser.

DGX Cloud is optimized to run NVIDIA AI Enterprise, the world’s leading acceleration library suite

for end-to-end development and deployment of AI.

DGX Cloud offers customers the best of NVIDIA AI and the best of the world’s leading cloud service providers.

This partnership brings NVIDIA’s ecosystem to the CSPs, while amplifying NVIDIA’s scale and reach.

This win-win partnership gives customers racing to engage Generative AI instant access to NVIDIA in global-scale clouds.

We’re excited by the speed, scale, and reach of this cloud extension of our business model.

Oracle Cloud Infrastructure, OCI, will be the first NVIDIA DGX Cloud.

OCI has excellent performance. They have a two-tier computing fabric and management network.

NVIDIA’s ConnectX-7, with the industry’s best RDMA, is the computing fabric.

And BlueField-3 will be the infrastructure processor for the management network.

The combination is a state-of-the-art DGX AI supercomputer that can be offered as a multi-tenant cloud service.

We have 50 early access enterprise customers, spanning consumer internet and software, healthcare

media and entertainment, and financial services.

ChatGPT, Stable Diffusion, DALL-E, and Midjourney have awakened the world to Generative AI.

These applications’ ease-of-use and impressive capabilities attracted over a hundred million users in just a few months.

ChatGPT is the fastest-growing application in history.

No training is necessary. Just ask these models to do something.

The prompts can be precise or ambiguous. If not clear,

through conversation, ChatGPT learns your intentions.

The generated text is beyond impressive.

ChatGPT can compose memos and poems, paraphrase a research paper, solve math problems,

highlight key points of a contract, and even code software programs.

ChatGPT is a computer that not only runs software but writes software.

Many breakthroughs led to Generative AI.

Transformers learn context and meaning from the relationships and dependencies of data, in parallel and at large scale.

This led to large language models that learn from so much data

they can perform downstream tasks without explicit training.

And diffusion models, inspired by physics, learn without supervision to generate images.

In just over a decade, we went from trying to recognize cats to generating realistic images of a cat

in a space suit

walking on the moon.

Generative AI is a new kind of computer — one that we program in human language.

This ability has profound implications. Everyone can direct a computer to solve problems.

This was a domain only for computer programmers.

Now everyone is a programmer.

Generative AI is a new computing platform like PC, internet, mobile, and cloud.

And like in previous computing eras, first-movers are creating new applications

and founding new companies to capitalize on Generative AI’s ability to automate and co-create.

Debuild lets users design and deploy web applications just by explaining what they want.

Grammarly is a writing assistant that considers context.

Tabnine helps developers write code.

Omneky generates customized ads and copy.

Kore.ai is a virtual customer service agent.

Jasper generates marketing material. Jasper has written nearly 5 billion words,

reducing time to generate the first draft by 80%.

Insilico uses AI to accelerate drug design.

Absci is using AI to predict therapeutic antibodies.

Generative AI will reinvent nearly every industry.

Many companies can use one of the excellent Generative AI APIs coming to market.

Some companies need to build custom models, with their proprietary data, that are experts in their domain.

They need to set up usage guardrails and refine their models to align

with their company’s safety, privacy, and security requirements.

The industry needs a foundry, a TSMC, for custom large language models.

Today, we announce the NVIDIA AI Foundations

a cloud service for customers needing to build, refine, and operate

custom large language models (LLMs) and Generative AI

trained with their proprietary data

and for their domain-specific tasks.

NVIDIA AI Foundations comprises Language,

Visual, and Biology model-making services.

NVIDIA NeMo is for building custom language text-to-text

generative models.

Customers can bring their model or start with the NeMo pre-trained language models: GPT-8, GPT-43,

and GPT-530, ranging from 8 billion to 530 billion parameters.

Throughout the entire process, NVIDIA AI experts will work with you, from creating your proprietary model to operations.

Let’s take a look.

Generative models, like NVIDIA’s 43B foundational model, learn by training on billions of sentences

and trillions of words.

As the model converges, it begins to understand the relationships between words and their underlying concepts

captured in the weights in the embedding space of the model.

Transformer models use a technique called self-attention: a mechanism designed to learn dependencies and relationships

within a sequence of words.
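
For readers who want to see what self-attention means in code, here is a minimal single-head sketch in NumPy; it is a bare illustration of the scaled dot-product mechanism, not the implementation of any particular production model.

```python
# Minimal single-head scaled dot-product self-attention (illustrative only).
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """x: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_head) projection matrices."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv                   # project tokens to queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])            # how strongly each token attends to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over the sequence
    return weights @ v                                 # each output is a weighted mix of value vectors

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 5, 16, 8
x = rng.standard_normal((seq_len, d_model))            # five token embeddings
Wq, Wk, Wv = (rng.standard_normal((d_model, d_head)) for _ in range(3))
print(self_attention(x, Wq, Wk, Wv).shape)             # -> (5, 8)
```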

The result is a model that provides the foundation for a ChatGPT-like experience.

These generative models require expansive amounts of data

deep AI expertise for data processing and distributed training

and large scale compute to train, deploy and maintain at the pace of innovation.

Enterprises can fast-track their generative AI adoption

with NVIDIA NeMo service running on NVIDIA DGX Cloud.

The quickest path is starting with one of NVIDIA’s state-of-the-art

pre-trained foundation models.

With the NeMo service, organizations can easily customize a model

with p-tuning to teach it specialized skills

like summarizing financial documents

creating brand-specific content

and composing emails with personalized writing styles.

Connecting the model to a proprietary knowledge base

ensures that responses are accurate, current

and cited for their business.

Next, they can provide guardrails by adding logic

and monitoring inputs, outputs, toxicity, and bias thresholds

so it operates within a specified domain

and prevents undesired responses.

After putting the model to work, it can continuously improve

with reinforcement learning based on user interactions.

And NeMo’s playground is available for rapid prototyping before moving to the cloud API

for larger-scale evaluation and application integration.

Sign up for the NVIDIA NeMo service today

to codify your enterprise’s knowledge into a personalized

AI model that you control.

Picasso is a visual language model-making service for customers who want to build custom models

trained with licensed or proprietary content.

Let’s take a look.

Generative AI is transforming how visual content is created.

But to realize its full potential, enterprises need massive amounts of copyright-cleared data, AI experts, and an AI supercomputer.

NVIDIA Picasso is a cloud service for building and deploying

generative AI-powered image, video, and 3D applications.

With it, enterprises, ISVs, and service providers

can deploy their own models.

We’re working with premier partners to bring

generative AI capabilities to every industry

Organizations can also start with NVIDIA Edify models

and train them on their data to create a product or service.

These models generate images, videos, and 3D assets.

To access generative AI models

applications send an API call with text prompts

and metadata to Picasso.

Picasso uses the appropriate model running on NVIDIA DGX Cloud

to send back the generated asset to the application.

This can be a photorealistic image, a high-resolution video, or a detailed 3D geometry.

Generated assets can be imported into editing tools or into NVIDIA Omniverse to build photorealistic virtual worlds,

metaverse applications, and digital twin simulations.

With NVIDIA Picasso services running on NVIDIA DGX Cloud

you can streamline training, optimization, and inference

needed to build custom generative AI applications.

See how NVIDIA Picasso can bring transformative generative AI capabilities into your applications.
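
The call-and-response flow described above could be exercised from an application roughly as in the hypothetical sketch below; the endpoint URL, JSON fields, and model name are illustrative placeholders, not a published Picasso API specification.

```python
# Hypothetical sketch of the prompt-in, asset-out flow described above.
# The endpoint, fields, and model name are placeholders, not a real API specification.
import requests

payload = {
    "model": "edify-image",                            # placeholder model identifier
    "prompt": "a photorealistic red sports car in a studio",
    "metadata": {"resolution": "1024x1024", "format": "png"},
}
resp = requests.post(
    "https://picasso.example.com/v1/generate",         # placeholder endpoint
    json=payload,
    headers={"Authorization": "Bearer <API_KEY>"},     # placeholder credential
    timeout=120,
)
asset_url = resp.json()["asset_url"]                   # location of the generated image/video/3D asset
print(asset_url)
```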

We are delighted that Getty Images will use the Picasso service to build Edify-image and Edify-video generative models

trained on their rich library of responsibly licensed professional images and video assets.

Enterprises will be able to create custom images and video with simple text or image prompts.

Shutterstock is developing an Edify-3D generative model

trained on their professional image, 3D, and video assets library.

Shutterstock will help simplify the creation of 3D assets for creative production, digital twins and virtual collaboration,

making these workflows faster and easier for enterprises to implement.

And I’m thrilled to announce a significant expansion of our long-time partnership with Adobe

to build a set of next-generation AI capabilities for the future of creativity

integrating generative AI into the everyday workflows of marketers and creative professionals.

The new Generative AI models will be optimized

for image creation, video, 3D, and animation.

To protect artists’ rights, Adobe is developing with a focus on commercial viability and proper content attribution

powered by Adobe’s Content Authenticity Initiative.

Our third language domain is biology.

Drug discovery is a nearly $2T industry

with $250B dedicated to R&D.

NVIDIA’s Clara is a healthcare application framework for imaging

instruments, genomics, and drug discovery.

The industry is now jumping onto generative AI to discover disease targets

design novel molecules or protein-based drugs, and predict the behavior of the medicines in the body.

Insilico Medicine, Exscientia, Absci, and Evozyne are among hundreds of new AI drug discovery start-ups.

Several have discovered novel targets or drug candidates and have started human clinical trials.

BioNeMo helps researchers create

fine-tune, and serve custom models with their proprietary data.

Let’s take a look.

There are 3 key stages to drug discovery

discovering the biology that causes disease

designing new molecules - whether those are small-molecules, proteins or antibodies

and finally screening how those molecules interact with each other.

Today, Generative AI is transforming every step of the drug discovery process.

NVIDIA BioNeMo Service provides state-of-the-art

generative AI models for drug discovery.

It’s available as a cloud service, providing instant and easy access to accelerated drug discovery workflows.

BioNeMo includes models like AlphaFold, ESMFold and OpenFold

for 3D protein structure prediction.

ProtGPT for protein generation,

ESM1 and ESM2 for protein property prediction

MegaMolBART and MoFlow for molecule generation

and DiffDock for molecular docking.

Drug discovery teams can use the models through BioNeMo’s web interface

or cloud APIs.

Here is an example of using NVIDIA BioNeMo

for drug discovery virtual screening.

Generative models can now read a protein’s amino acid sequence

and in seconds, accurately predict the structure of a target protein.

They can also generate molecules with desirable ADME properties that optimize how a drug behaves in the body.

Generative models can even predict the 3D interactions of a protein and molecule

accelerating the discovery of optimal drug candidates.

With NVIDIA DGX Cloud, BioNeMo also provides on-demand supercomputing infrastructure to further optimize and train models,

saving teams valuable time and money so they can focus on discovering life-saving medicines.

The new AI drug discovery pipelines are here.

Sign up for access to the NVIDIA BioNeMo Service.

We will continue to work with the industry to include models in BioNeMo

that encompass the end-to-end workflow of drug discovery and virtual screening.

Amgen, AstraZeneca, Insilico Medicine, Evozyne, Innophore, and Alchemab Therapeutics are early access users of BioNeMo.

NVIDIA AI Foundations is a cloud service, a foundry, for building custom language models and Generative AI.

Since AlexNet a decade ago, deep learning has opened giant new markets — automated driving, robotics, smart speakers,

and reinvented how we shop, consume news, and enjoy music.

That’s just the tip of the iceberg.

AI is at an inflection point as Generative AI has started a new wave of opportunities, driving a step-function increase

in inference workloads.

AI can now generate diverse data, spanning voice, text, images, video, and 3D graphics to proteins and chemicals.

Designing a cloud data center to process Generative AI is a great challenge.

On the one hand, a single type of accelerator is ideal, because it allows the datacenter to be elastic

and handle the unpredictable peaks and valleys of traffic.

On the other hand, no one accelerator can optimally process the diversity of algorithms, models, data types, and sizes.

NVIDIA’s One Architecture platform offers both acceleration and elasticity.

Today, we are announcing our new inference platform - four configurations - one architecture - one software stack.

Each configuration is optimized for a class of workloads.

For AI video workloads, we have L4 optimized for video decoding and transcoding, video content moderation,

and video call features like background replacement, relighting, making eye contact,

transcription, and real-time language translation.

Most cloud videos today are processed on CPUs.

One 8-GPU L4 server will replace over a hundred dual-socket CPU servers for processing AI Video.

Snap is a leading user of NVIDIA AI for computer vision and recommender systems.

Snap will use L4 for AV1 video processing, generative AI, and augmented reality.

Snapchat users upload hundreds of millions of videos every day.

Google announced today NVIDIA L4 on GCP.

NVIDIA and Google Cloud are working to deploy major workloads on L4.

Let me highlight five.

First, we’re accelerating inference for generative AI models for cloud services like Wombo and Descript.

Second, we’re integrating Triton Inference Server with Google Kubernetes Engine and VertexAI.

Third, we’re accelerating Google Dataproc with NVIDIA Spark-RAPIDS.

Fourth, we’re accelerating AlphaFold, and UL2 and T5 large language models.

And fifth, we are accelerating Google Cloud’s Immersive Stream that renders 3D and AR experiences.

With this collaboration, Google GCP is a premier NVIDIA AI cloud.

We look forward to telling you even more about our collaboration very soon.

For Omniverse, graphics rendering and generative AI like text-to-image and text-to-video, we are announcing L40.

L40 is up to 10 times the performance of NVIDIA’s T4, the most popular cloud inference GPU.

Runway is a pioneer in Generative AI.

Their research team was a key creator of Stable Diffusion and its predecessor, Latent Diffusion.

Runway is inventing generative AI models for creating and editing content.

With over 30 AI Magic Tools, their service is revolutionizing the creative process, all from the cloud.

Let’s take a look.

Runway is making amazing AI-powered video editing and image creation tools accessible to everyone.

Powered by the latest generation of NVIDIA GPUs running locally or in the cloud, Runway makes it possible

to remove an object from a video with just a few brush strokes.

Or apply different styles to video using just an input image.

Or change the background or the foreground of a video.

What used to take hours using conventional tools can now be completed with professional broadcast quality results

in just a few minutes.

Runway does this by utilizing CV-CUDA, an open-source project that enables developers to build highly efficient

GPU-accelerated pre- and post-processing pipelines for computer vision workloads and scale them into the cloud.

With NVIDIA technology, Runway is able to make the impossible possible, giving the best experience to content creators.

What was previously limited to pros can now be done by you.

In fact, Runway is used in Oscar-nominated Hollywood films and we are placing this technology

in the hands of the world’s creators.

Large language models like ChatGPT are a significant new inference workload.

GPT models are memory and computationally intensive.

Furthermore, inference is a high-volume, scale-out workload and requires standard commodity servers.

For large language model inference, like ChatGPT, we are announcing a new Hopper GPU — the PCIE H100

with dual-GPU NVLINK. The new H100 has 94GB of HBM3 memory.

H100 can process the 175-billion-parameter GPT-3

and supporting commodity PCIE servers make it easy to scale out.

The only GPU in the cloud today that can practically process ChatGPT is HGX A100.

Compared to HGX A100 for GPT-3 processing, a standard server with four pairs of H100 with dual-GPU NVLINK

is up to 10X faster.

H100 can reduce large language model processing costs by an order of magnitude.

Grace Hopper is our new superchip that connects Grace CPU and Hopper GPU over a high-speed 900 GB/sec

coherent chip-to-chip interface.

Grace Hopper is ideal for processing giant data sets like AI databases for recommender systems

and large language models.

Today, CPUs, with large memory, store and query giant embedding tables then transfer results to GPUs for inference.

With Grace-Hopper, Grace queries the embedding tables and transfers the results directly to Hopper

across the high-speed interface – 7 times faster than PCIE.
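
The "7 times faster than PCIe" figure follows directly from the link bandwidths; a quick check, assuming a PCIe Gen 5 x16 link at roughly 128 GB/sec of bidirectional bandwidth:

```python
# Quick bandwidth comparison behind the "7x faster than PCIe" figure.
nvlink_c2c_gb_s = 900        # Grace-Hopper chip-to-chip coherent link (figure from the text)
pcie_gen5_x16_gb_s = 128     # assumption: PCIe Gen 5 x16, ~128 GB/s bidirectional

print(f"~{nvlink_c2c_gb_s / pcie_gen5_x16_gb_s:.0f}x the bandwidth of a PCIe Gen 5 x16 link")
# -> about 7x
```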

Customers want to build AI databases several orders of magnitude larger.

Grace-Hopper is the ideal engine.

This is NVIDIA’s inference platform – one architecture for diverse AI workloads,

and maximum datacenter acceleration and elasticity.

The world’s largest industries make physical things, but they want to build them digitally.

Omniverse is a platform for industrial digitalization that bridges digital and physical.

It lets industries design, build, operate, and optimize physical products and factories digitally,

before making a physical replica.

Digitalization boosts efficiency and speed and saves money.

One use of Omniverse is the virtual bring-up of a factory, where all of its machinery is integrated digitally

before the real factory is built.

This reduces last-minute surprises, change orders, and plant opening delays.

Virtual factory integration can save billions for the world’s factories.

The semiconductor industry is investing half a trillion dollars to build a record 84 new fabs.

By 2030, auto manufacturers will build 300 factories to make 200 million electric vehicles.

And battery makers are building 100 more mega factories.

Digitalization is also transforming logistics, moving goods through billions of square feet of warehouses worldwide.

Let’s look at how Amazon uses Omniverse to automate, optimize, and plan its autonomous warehouses.

Amazon Robotics has manufactured and deployed the largest fleet of mobile industrial robots in the world.

The newest member of this robotic fleet is Proteus, Amazon’s first fully autonomous warehouse robot.

Proteus is built to move through our facilities using advanced safety, perception, and navigation technology.

Let’s see how NVIDIA Isaac Sim, built on Omniverse is creating physically accurate, photoreal simulations

to help accelerate Proteus deployments.

Proteus features multiple sensors that include cameras, lidars, and ultrasonic sensors

to power its autonomy software systems.

The Proteus team needed to improve the performance of a neural network that read fiducial markers and helped the robot

determine its location on the map.

It takes lots of data—and the right kind—to train the ML models that are driven by the robot sensor input.

With Omniverse Replicator in Isaac Sim, Amazon Robotics was able to generate large photoreal synthetic datasets that improved

the marker detection success rate from 88.6% to 98%.

The use of the synthetic data generated by Omniverse Replicator also sped up development times, from months to days,

as we were able to iteratively test and train our models much faster than when only using real data.

To enable new autonomous capabilities for the expanding fleet of Proteus robots, Amazon Robotics is working towards

closing the gap from simulation to reality, building large scale multi-sensor, multi-robot simulations.

With Omniverse, Amazon Robotics will optimize operations with full fidelity warehouse digital twins.

Whether we’re generating synthetic data or developing new levels of autonomy, Isaac Sim on Omniverse

helps the Amazon Robotics team save time and money as we deploy Proteus across our facilities.
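
For a sense of what domain-randomized synthetic data generation looks like in practice, here is a small sketch in the style of the Omniverse Replicator Python API used inside Isaac Sim; treat the function names as assumptions based on Replicator's published examples and verify them against the installed version.

```python
# Sketch of domain-randomized data generation in the style of Omniverse Replicator.
# Runs inside Omniverse/Isaac Sim with the Replicator extension; names follow its published
# examples and should be verified against your installed version.
import omni.replicator.core as rep

with rep.new_layer():
    camera = rep.create.camera(position=(0, 0, 1000))
    render_product = rep.create.render_product(camera, (1024, 1024))

    # Stand-in for a fiducial marker or warehouse asset, tagged with a semantic class.
    marker = rep.create.cube(semantics=[("class", "marker")], position=(0, 0, 0))

    with rep.trigger.on_frame(num_frames=100):         # generate 100 randomized frames
        with marker:
            rep.modify.pose(
                position=rep.distribution.uniform((-300, -300, 0), (300, 300, 200)),
                rotation=rep.distribution.uniform((0, 0, -180), (0, 0, 180)),
            )

    writer = rep.WriterRegistry.get("BasicWriter")     # writes RGB images plus labels to disk
    writer.initialize(output_dir="_out_synthetic", rgb=True, bounding_box_2d_tight=True)
    writer.attach([render_product])
```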

Omniverse has unique technologies for digitalization.

And Omniverse is the premier development platform for USD, which serves as a common language that lets teams collaborate

to create virtual worlds and digital twins.

Omniverse is physically based, mirroring the laws of physics.

It can connect to robotic systems and operate with hardware-in-the-loop.

It features Generative AI to accelerate the creation of virtual worlds.

And Omniverse can manage data sets of enormous scale.

We’ve made significant updates to Omniverse in every area.

Let’s take a look.

Nearly 300,000 creators and designers have downloaded Omniverse.

Omniverse is not a tool, but a USD network and shared database,

a fabric connecting to design tools used across industries.

It connects, composes, and simulates the assets created by industry-leading tools.

We are delighted to see the growth of Omniverse connections.

Each connection links the ecosystem of one platform to the ecosystems of all the others.

Omniverse’s network of networks is growing exponentially.

Bentley Systems LumenRT is now connected.

So are Siemens Teamcenter, NX, and Process Simulate, Rockwell Automation Emulate 3D, Cesium, Unity, and many more.

Let’s look at the digitalization of the $3T auto industry

and see how car companies are evaluating Omniverse in their workflows.

Volvo Cars and GM use Omniverse USD Composer to connect and unify their asset pipelines.

GM connects designers, sculptors, and artists using Alias, Siemens NX, Unreal, Maya, 3ds Max,

and virtually assembles the components into a digital twin of the car.

In engineering and simulation, they visualize the power flow aerodynamics in Omniverse.

For next-generation Mercedes-Benz and Jaguar Land Rover vehicles, engineers use Drive Sim in Omniverse to generate

synthetic data to train AI models, validate the active-safety system against a virtual NCAP driving test,

and simulate real driving scenarios.

Omniverse’s generative AI reconstructs previously driven routes into 3D

so past experiences can be reenacted or modified.

Working with Idealworks, BMW uses Isaac Sim in Omniverse to generate synthetic data

and scenarios to train factory robots.

Lotus is using Omniverse to virtually assemble welding stations.

Toyota is using Omniverse to build digital twins of their plants.

Mercedes-Benz uses Omniverse to build, optimize, and plan assembly lines for new models.

Rimac and Lucid Motors use Omniverse to build digital stores from actual design data that faithfully represent their cars.

BMW is using Omniverse to plan operations across nearly three dozen factories worldwide.

And they are building a new EV factory, completely in Omniverse, two years before the physical plant opens.

Let’s visit.

The world’s industries are accelerating digitalization with over $3.4 trillion being invested in the next three years.

We at BMW strive to be leading edge in automotive digitalization.

With NVIDIA Omniverse and AI we set up new factories faster and produce more efficiently than ever.

This results in significant savings for us.

It all starts with planning – a complex process in which we need to connect many tools,

datasets and specialists around the world.

Traditionally, we are limited, since data is managed separately in a variety of systems and tools.

Today, we’ve changed all that.

We are developing custom Omniverse applications to connect our existing tools, know-how and teams

all in a unified view.

Omniverse is cloud-native and cloud-agnostic enabling teams to collaborate across our virtual factories from everywhere.

I’m about to join a virtual planning session for Debrecen in Hungary – our new EV factory – opening in 2025.

Let’s jump in.

Planner 1: Ah, Milan is joining.

Milan: Hello, everyone!

Planner 1: Hi Milan – great to see you, we’re in the middle of an optimization loop for our body shop.

Would you like to see?

Milan: Thanks – I’m highly interested. And I’d like to invite a friend.

Planner 1: Sure.

Jensen: Hey Milan! Good to see you.

Milan: Jensen, welcome to our virtual planning session.

Jensen: It’s great to be here. What are we looking at?

Milan: This is our global planning team who are working on a robot cell in Debrecen’s digital twin.

Matthias, tell us what’s happening …

Matthias: So, we just learned the production concept requires some changes.

We’re now reconfiguring the layout to add a new robot into the cell.

Planner 2: Ok, but if we add a new robot, on the logistics side, we’ll need to move our storage container.

Planner 3: Alright, let’s get this new robot in.

Matthias: That’s perfect. But let’s double-check - can we run the cell?

Excellent.

Jensen: Milan, this is just incredible!

Virtual factory integration is essential for every industry.

I’m so proud to see what our teams did together. Congratulations!

Milan: We are working globally to optimize locally.

After planning, operations is king, and we’ve already started!

To celebrate the launch of our virtual plant, I’d like to invite you to open the first digital factory with me.

Jensen: I’d be honored. Let’s do it!

Car companies employ nearly 14 million people.

Digitalization will enhance the industry’s efficiency, productivity, and speed.

Omniverse is the digital-to-physical operating system to realize industrial digitalization.

Today we are announcing three systems designed to run Omniverse.

First, we’re launching a new generation of workstations powered by NVIDIA Ada RTX GPUs and Intel’s newest CPUs.

The new workstations are ideal for doing ray tracing, physics simulation, neural graphics, and generative AI.

They will be available from Boxx, Dell, HP, and Lenovo starting in March.

Second, new NVIDIA OVX servers optimized for Omniverse.

OVX consists of L40 Ada RTX server GPUs and our new BlueField-3.

OVX servers will be available from Dell, HPE, Quanta, Gigabyte, Lenovo, and Supermicro.

Each layer of the Omniverse stack, including the chips, systems, networking, and software, is a new invention.

Building and operating the Omniverse computer requires a sophisticated IT team.

We’re going to make Omniverse fast and easy to scale and engage.

Let’s take a look.

The world’s largest industries are racing to digitalize their physical processes.

Today, that’s a complex undertaking.

NVIDIA Omniverse Cloud is a platform-as-a-service that provides instant, secure access to managed Omniverse Cloud APIs,

workflows, and customizable applications running on NVIDIA OVX.

Enterprise teams access the suite of managed services through a web browser, the Omniverse Launcher,

or via a custom-built integration.

Once in Omniverse Cloud, enterprise teams can instantly access, extend, and publish foundation applications

and workflows - to assemble and compose virtual worlds -

generate data to train perception AIs -

test and validate autonomous vehicles -

or simulate autonomous robots…

…accessing and publishing shared data to Omniverse Nucleus.

Designers and engineers working in their favorite 3rd-party design tools on RTX workstations

publish edits to Nucleus in parallel.

Then, when ready to iterate or view their integrated model in Omniverse,

they can simply open a web browser and log in.

As projects and teams scale, Omniverse Cloud helps optimize cost

by provisioning compute resources and licenses as needed.

And new services and upgrades are automatically provided with real time updates.

With Omniverse Cloud, enterprises can fast-track unified digitalization and collaboration

across major industrial workflows, increasing efficiency, reducing costs and waste,

and accelerating the path to innovation.

See you in Omniverse!

Today, we announce the NVIDIA Omniverse Cloud, a fully managed cloud service.

We’re partnering with Microsoft to bring Omniverse Cloud to the world’s industries.

We will host it in Azure, benefiting from Microsoft’s rich storage, security, applications, and services portfolio.

We are connecting Omniverse Cloud to Microsoft 365 productivity suite, including Teams, OneDrive, SharePoint,

and the Azure IoT Digital Twins services.

Microsoft and NVIDIA are bringing Omniverse to hundreds of millions of Microsoft 365 and Azure users.

Accelerated computing and AI have arrived.

Developers use NVIDIA to speed up and scale up, solving problems previously impossible.

A daunting challenge is Net Zero. Every company must accelerate every workload to reclaim power.

Accelerated computing is a full-stack, datacenter-scale computing challenge.

Grace, Grace-Hopper, and BlueField-3 are new chips for super energy-efficient accelerated data centers.

Acceleration libraries solve new challenges and open new markets.

We updated 100 acceleration libraries, including cuQuantum for quantum computing, cuOpt for combinatorial optimization,

and cuLitho for computational lithography.

We are thrilled to partner with TSMC, ASML, and Synopsys to go to 2nm and beyond.

NVIDIA DGX AI Supercomputer is the engine behind the generative large language model breakthrough.

The DGX H100 AI Supercomputer is in production and available soon

from an expanding network of OEM and cloud partners worldwide.

The DGX supercomputer is going beyond research and becoming a modern AI factory.

Every company will manufacture intelligence.

We are extending our business model with NVIDIA DGX Cloud by partnering with Microsoft Azure, Google GCP, and Oracle OCI

to instantly bring NVIDIA AI to every company, from a browser.

DGX Cloud offers customers the best of NVIDIA and the best of the world’s leading CSPs.

We are at the iPhone moment for AI.

Generative AI inference workloads have gone into overdrive.

We launched our new inference platform - four configurations - one architecture.

L4 for AI video.

L40 for Omniverse and graphics rendering.

H100 PCIE for scaling out large language model inference.

Grace-Hopper for recommender systems and vector databases.

NVIDIA’s inference platform enables maximum data center acceleration and elasticity.

NVIDIA and Google Cloud are working together to deploy a broad range of inference workloads.

With this collaboration, Google GCP is a premier NVIDIA AI cloud.

NVIDIA AI Foundations is a cloud service, a foundry, for building custom language models and Generative AI.

NVIDIA AI Foundations comprises language, visual, and biology model-making services.

Getty Images and Shutterstock are building custom visual language models.

And we’re partnering with Adobe to build a set of next-generation AI capabilities for the future of creativity.

Omniverse is the digital-to-physical operating system to realize industrial digitalization.

Omniverse can unify the end-to-end workflow and digitalize the $3T, 14 million-employee automotive industry.

Omniverse is leaping to the cloud.

Hosted in Azure, we partner with Microsoft to bring Omniverse Cloud to the world’s industries.

I thank our systems, cloud, and software partners, researchers, scientists,

and especially our amazing employees

for building the NVIDIA accelerated computing ecosystem.

Together, we are helping the world do the impossible.

Have a great GTC!