For nearly four decades
Moore’s Law has been the governing dynamic of the computer industry
which in turn has impacted every industry.
The exponential performance increase at constant cost and power has slowed.
Yet, computing advance has gone to lightspeed.
The warp drive engine is accelerated computing and the energy source is AI.
The arrival of accelerated computing and AI is timely
as industries tackle powerful dynamics.
Without Moore’s Law, as computing surges, data center power is skyrocketing
and companies struggle to achieve Net Zero.
The impressive capabilities of Generative AI
has created a sense of urgency for companies to reimagine their products and business models.
Industrial companies are racing to digitalize and reinvent themselves as software-driven tech companies
to be the disruptor
and not the disrupted.
Today, we will discuss how accelerated computing and AI are powerful tools for tackling these challenges
and engaging the enormous opportunities ahead.
We will share new advances in NVIDIA’s full-stack, datacenter-scale, accelerated computing platform.
We will reveal new chips and systems, acceleration libraries, cloud and AI services
and partnerships that open new markets.
Welcome to GTC!
GTC is our conference for developers.
The global NVIDIA ecosystem spans 4 million developers, 40,000 companies
and 14,000 startups.
Thank you to our Diamond sponsors for supporting us and making GTC 2023 a huge success.
We’re so excited to welcome more than 250,000 of you to our conference.
GTC has grown incredibly.
Only four years ago, our in-person GTC conference had 8,000 attendees.
At GTC 2023, we’ll learn from leaders like Demis Hassabis of DeepMind
Valerie Taylor of Argonne Labs
Scott Belsky of Adobe
Paul Debevec of Netflix
Thomas Schulthess of ETH Zurich
and a special fireside chat I’m having with Ilya Sutskever
co-founder of OpenAI, the creator of ChatGPT.
We have 650 amazing talks from the brightest minds in academia and the world’s largest industries:
There are more than 70 talks on Generative AI alone.
Other great talks, like pre-trained multi-task models for robotics…
sessions on synthetic data generation, an important method for advancing AI
including one on using Isaac Sim to generate physically based lidar point clouds
a bunch of talks on digital twins, from using AI to populate virtual factories of the future
to restoring lost Roman mosaics of the past
cool talks on computational instruments, including a giant optical telescope and a photon-counting CT
materials science for carbon capture and solar cells, to climate science, including our work on Earth-2
important work by NVIDIA Research on trustworthy AI and AV safety
From computational lithography for micro-chips, to make the smallest machines
to AI at the Large Hadron Collider to explain the universe.
The world’s most important companies are here from auto and transportation
healthcare, manufacturing, financial services,
retail, apparel, media and entertainment, telco
and of course, the world’s leading AI companies.
The purpose of GTC is to inspire the world on the art-of-the-possible of accelerated computing
and to celebrate the achievements of the scientists and researchers that use it.
I am a translator.
Transforming text into creative discovery,
movement into animation,
and direction into action.
I am a healer.
Exploring the building blocks that make us unique
modeling new threats before they happen
and searching for the cures to keep them at bay.
I am a visionary.
Generating new medical miracles
and giving us a new perspective on our sun
to keep us safe here on earth.
I am a navigator.
Discovering a unique moment in a sea of content
we’re announcing the next generation
and the perfect setting for any story.
I am a creator.
Building 3D experiences from snapshots
and adding new levels of reality to our virtual selves.
I am a helper.
Bringing brainstorms to life
sharing the wisdom of a million programmers
and turning ideas into virtual worlds.
Build northern forest.
I even helped write this script
breathed life into the words
and composed the melody.
I am AI.
Brought to life by NVIDIA, deep learning, and brilliant minds everywhere.
NVIDIA invented accelerated computing to solve problems that normal computers can’t.
Accelerated computing is not easy
it requires full-stack invention from chips, systems, networking,
acceleration libraries, to refactoring the applications.
Each optimized stack accelerates an application domain
from graphics, imaging, particle or fluid dynamics
quantum physics, to data processing and machine learning.
Once accelerated, the application can enjoy incredible speed-up, as well as scale-up across many computers.
The combination of speed-up and scale-up has enabled us to achieve a million-X acceleration
for many applications over the past decade
helping solve problems previously impossible.
Though there are many examples, the most famous is deep learning.
In 2012, Alex Krizhevsky, Ilya Sutskever, and Geoff Hinton needed an insanely fast computer
to train the AlexNet computer vision model.
The researchers trained AlexNet with 14 million images on GeForce GTX 580
processing 262 quadrillion floating-point operations,
and the trained model won the ImageNet challenge by a wide margin, and ignited the Big Bang of AI.
A decade later, the transformer model was invented.
And Ilya, now at OpenAI, trained the GPT-3 large language model to predict the next word.
323 sextillion floating-point operations were required to train GPT-3.
One million times more floating-point operations than to train AlexNet.
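As a quick check of the arithmetic, the ratio of the two training budgets can be computed directly from the figures stated above (262 quadrillion and 323 sextillion floating-point operations). This is only a back-of-the-envelope sketch; the helper name is hypothetical.

```python
# Training-compute figures quoted in the keynote (floating-point operations).
ALEXNET_FLOPS = 262e15  # 262 quadrillion
GPT3_FLOPS = 323e21     # 323 sextillion


def compute_ratio(larger: float, smaller: float) -> float:
    """Return how many times more operations `larger` needs than `smaller`."""
    return larger / smaller


ratio = compute_ratio(GPT3_FLOPS, ALEXNET_FLOPS)
print(f"GPT-3 needed roughly {ratio:,.0f}x the operations used for AlexNet")
```

The result is about 1.2 million, consistent with "one million times more".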
The result this time – ChatGPT, the AI heard around the world.
A new computing platform has been invented.
The iPhone moment of AI has started.
Accelerated computing and AI have arrived.
Acceleration libraries are at the core of accelerated computing.
These libraries connect to applications which connect to the world’s industries, forming a network of networks.
Three decades in the making, several thousand applications are now NVIDIA accelerated
with libraries in almost every domain of science and industry.
All NVIDIA GPUs are CUDA-compatible, providing a large installed base and significant reach for developers.
A wealth of accelerated applications attracts end users, which creates a large market for cloud service providers
and computer makers to serve.
A large market affords billions in R&D to fuel its growth.
NVIDIA has established the accelerated computing virtuous cycle.
Of the 300 acceleration libraries and 400 AI models that span ray tracing and neural rendering
physical, earth, and life sciences, quantum physics and chemistry, computer vision
data processing, machine learning and AI, we updated 100 this year
to increase performance and add features for our entire installed base.
Let me highlight some acceleration libraries that solve new challenges and open new markets.
The auto and aerospace industries use CFD for turbulence and aerodynamics simulation.
The electronics industry uses CFD for thermal management design.
This is Cadence’s slide of their new CFD solver accelerated by CUDA.
At equivalent system cost, NVIDIA A100 is 9X the throughput of CPU servers.
Or at equivalent simulation throughput, NVIDIA is 9X lower cost or 17X less energy consumed.
Ansys, Siemens, Cadence, and other leading CFD solvers are now CUDA-accelerated.
Worldwide, industrial CAE uses nearly 100 billion CPU core hours yearly.
Acceleration is the best way to reclaim power and achieve sustainability and Net Zero.
NVIDIA is partnering with the global quantum computing research community.
The NVIDIA Quantum platform consists of libraries and systems for researchers to advance quantum programming models,
system architectures, and algorithms.
cuQuantum is an acceleration library for quantum circuit simulations.
IBM Qiskit, Google Cirq, Baidu Quantum Leaf, QMWare, QuEra, Xanadu PennyLane, Agnostiq, and AWS Braket
have integrated cuQuantum into their simulation frameworks.
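Quantum circuit simulation is, at its core, linear algebra on a state vector of 2^n complex amplitudes, which is the work libraries like cuQuantum accelerate. The sketch below is plain Python, not the cuQuantum API; the function names are hypothetical. It applies a Hadamard gate and a CNOT to two qubits to produce a Bell state.

```python
import math

# Hadamard gate: puts a qubit into an equal superposition of |0> and |1>.
S = 1 / math.sqrt(2)
HADAMARD = [[S, S], [S, -S]]


def apply_single_qubit_gate(state, gate, target):
    """Apply a 2x2 gate to qubit `target` (bit `target` of each basis index)."""
    new_state = list(state)
    stride = 1 << target
    for i in range(len(state)):
        if (i & stride) == 0:  # i has the target bit 0; its partner j has it set to 1
            j = i | stride
            a, b = state[i], state[j]
            new_state[i] = gate[0][0] * a + gate[0][1] * b
            new_state[j] = gate[1][0] * a + gate[1][1] * b
    return new_state


def apply_cnot(state, control, target):
    """Flip the target bit of every basis state whose control bit is 1."""
    new_state = list(state)
    for i in range(len(state)):
        if (i >> control) & 1:
            new_state[i ^ (1 << target)] = state[i]
    return new_state


# Two qubits start in |00>: 2**2 = 4 complex amplitudes.
initial = [1 + 0j, 0j, 0j, 0j]
bell = apply_cnot(apply_single_qubit_gate(initial, HADAMARD, target=0), control=0, target=1)
probabilities = [abs(amp) ** 2 for amp in bell]
print(probabilities)  # equal weight on |00> and |11>
```

Each added qubit doubles the vector length, which is why large circuit simulations need GPU acceleration.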
Open Quantum CUDA is our hybrid GPU-Quantum programming model.
IonQ, ORCA Computing, Atom, QuEra, Oxford Quantum Circuits, IQM, Pasqal, Quantum Brilliance, Quantinuum, Rigetti,
Xanadu, and Anyon have integrated Open Quantum CUDA.
Error correction on a large number of qubits is necessary to recover data from quantum noise and decoherence.
Today, we are announcing a quantum control link, developed in partnership with Quantum Machines
that connects NVIDIA GPUs to a quantum computer to do error correction at extremely high speeds.
Though commercial quantum computers are still a decade or two away, we are delighted to support this large and vibrant
research community with NVIDIA Quantum.
Enterprises worldwide use Apache Spark to process data lakes and warehouses
SQL queries, graph analytics, and recommender systems.
Spark-RAPIDS is NVIDIA’s accelerated Apache Spark data processing engine.
Data processing is the leading workload of the world’s $500B cloud computing spend.
Spark-RAPIDS now accelerates major cloud data processing platforms, including GCP Dataproc
Amazon EMR, Databricks, and Cloudera.
Recommender systems use vector databases to store, index, search, and retrieve massive datasets of unstructured data.
An important new use case for vector databases is helping large language models retrieve domain-specific or proprietary facts
that can be queried during text generation.
We are introducing a new library, RAFT, to accelerate indexing, loading the data
and retrieving a batch of neighbors for a single query.
We are bringing the acceleration of RAFT to Meta’s open-source FAISS (Facebook AI Similarity Search), the Milvus open-source vector DB
used by over 1,000 organizations, and Redis, with over 4B Docker pulls.
Vector databases will be essential for organizations building proprietary large language models.
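The retrieval step described above is nearest-neighbor search over embedding vectors. The brute-force sketch below is pure Python, not the RAFT, FAISS, Milvus, or Redis API; the document names and 3-dimensional embeddings are hypothetical. It scores one query against every stored vector by cosine similarity and returns the top k.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def top_k(query, vectors, k):
    """Return (doc_id, score) pairs for the k vectors most similar to `query`."""
    scored = ((doc_id, cosine(query, vec)) for doc_id, vec in vectors.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])


# Hypothetical embeddings for a few proprietary documents.
documents = {
    "pricing-policy": [0.9, 0.1, 0.0],
    "refund-rules": [0.8, 0.3, 0.1],
    "office-hours": [0.0, 0.2, 0.9],
}
hits = top_k([1.0, 0.2, 0.0], documents, k=2)
print(hits)
```

Production systems replace this exhaustive scan with batched GPU distance computations and approximate indexes so that search stays fast over very large datasets.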
Twenty-two years ago, operations research scientists Li and Lim posed a series of challenging pickup and delivery problems (PDP).
PDP shows up in manufacturing, transportation, retail and logistics, and even disaster relief.
PDP is a generalization of the Traveling Salesperson Problem and is NP-hard,
meaning no known algorithm can find an exact solution efficiently.
For exhaustive search, the solution time grows factorially as the problem size increases.
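Factorial growth is easy to see in the simplest special case, a symmetric Traveling Salesperson Problem. The brute-force sketch below uses hypothetical helper names and made-up coordinates. It fixes the starting city and tries every ordering of the rest, so it examines (n-1)! tours. At 10 cities that is already 362,880 tours, and each added city multiplies the count again.

```python
import itertools
import math


def tour_length(order, coords):
    """Total length of a closed tour visiting cities in `order`."""
    return sum(
        math.dist(coords[order[i]], coords[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )


def brute_force_tsp(coords):
    """Exact TSP by enumerating all (n-1)! tours that start at city 0."""
    n = len(coords)
    best_order, best_len, tours_checked = None, math.inf, 0
    for rest in itertools.permutations(range(1, n)):
        order = (0,) + rest
        length = tour_length(order, coords)
        tours_checked += 1
        if length < best_len:
            best_order, best_len = order, length
    return best_order, best_len, tours_checked


cities = [(0, 0), (0, 1), (1, 1), (1, 0), (0.5, 2)]
order, length, checked = brute_force_tsp(cities)
print(order, round(length, 3), checked)
```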
Using an evolutionary algorithm and accelerated computing to analyze 30 billion moves per second,
NVIDIA cuOpt has broken the world record and discovered the best solution for Li and Lim’s challenge.
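cuOpt's internal algorithm is not described here, but the general idea of evolutionary search can be sketched. The toy below is a hypothetical (1+1)-style evolutionary loop on a plain TSP. It is not cuOpt's API or method. Each generation mutates the current tour by reversing a random segment (one "move") and keeps the child if it is no worse than its parent. An accelerated solver evaluates enormous numbers of such moves in parallel.

```python
import math
import random


def tour_length(order, coords):
    """Total length of a closed tour visiting cities in `order`."""
    return sum(
        math.dist(coords[order[i]], coords[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )


def evolve_tour(coords, generations=5000, seed=0):
    """(1+1) evolutionary search: mutate by reversing a segment, keep non-worse children."""
    rng = random.Random(seed)
    parent = list(range(len(coords)))
    rng.shuffle(parent)
    start_len = parent_len = tour_length(parent, coords)
    for _ in range(generations):
        i, j = sorted(rng.sample(range(len(coords)), 2))
        child = parent[:i] + parent[i:j + 1][::-1] + parent[j + 1:]  # mutation
        child_len = tour_length(child, coords)
        if child_len <= parent_len:  # selection
            parent, parent_len = child, child_len
    return parent, parent_len, start_len


point_rng = random.Random(42)
cities = [(point_rng.random(), point_rng.random()) for _ in range(30)]
best, best_len, start_len = evolve_tour(cities)
print(f"random tour: {start_len:.3f}  evolved tour: {best_len:.3f}")
```

Unlike the exhaustive search, this heuristic gives no optimality guarantee. It trades certainty for a good tour found in a tiny fraction of the time.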
AT&T routinely dispatches 30,000 technicians to service 13 million customers across 700 geographic zones.
Today, AT&T’s dispatch optimization runs overnight on CPUs.
AT&T wants to find a dispatch solution in real time that continuously optimizes for urgent customer needs
and overall customer satisfaction, while adjusting for delays and new incidents that arise.
With cuOpt, AT&T can find a solution 100X faster and update their dispatch in real time.
AT&T has adopted a full suite of NVIDIA AI libraries.
In addition to Spark-RAPIDS and cuOpt, they’re using Riva for conversational AI and Omniverse for digital avatars.
AT&T is tapping into NVIDIA accelerated computing and AI for sustainability, cost savings, and new services.
cuOpt can also optimize logistics services. 400 billion parcels are delivered to 377 billion stops each year.
Deloitte, Capgemini, Softserve, Accenture, and Quantiphi are using NVIDIA cuOpt to help customers optimize operations.
NVIDIA’s inference platform consists of three SDKs.
NVIDIA TensorRT is our inference runtime that optimizes for the target GPU.
NVIDIA Triton is a multi-framework data center inference serving software supporting GPUs and CPUs.
Microsoft Office and Teams, Amazon, American Express, and the U.S. Postal Service
are among the 40,000 customers using TensorRT and Triton.
Uber uses Triton to serve hundreds of thousands of ETA predictions per second.
With over 60 million daily users, Roblox uses Triton to serve models for game recommendations
build avatars, and moderate content and marketplace ads.
We are releasing some great new features – model analyzer support for model ensembles, multiple concurrent model serving,
and multi-GPU, multi-node inference for GPT-3 large language models.
NVIDIA Triton Management Service is our new software that automates the scaling and orchestration
of Triton inference instances across a data center.
Triton Management Service will help you improve the throughput and cost efficiency of deploying your models.
50-80% of cloud video pipelines are processed on CPUs
consuming power and cost and adding latency.
CV-CUDA for computer vision, and VPF for video processing, are new cloud-scale acceleration libraries.
CV-CUDA includes 30 computer vision operators for detection, segmentation, and classification.
VPF is a Python video encode and decode acceleration library.
Tencent uses CV-CUDA and VPF to process 300,000 videos per day.
Microsoft uses CV-CUDA and VPF to power visual search.
Runway is a super cool company that uses CV-CUDA and VPF to process video
for their cloud Generative AI video editing service.
Already, 80% of internet traffic is video.
User-generated video content is driving significant growth and consuming massive amounts of power.
We should accelerate all video processing and reclaim the power.
CV-CUDA and VPF are in early access.
NVIDIA accelerated computing helped achieve a genomics milestone:
now doctors can draw blood and sequence a patient’s DNA in the same visit.
In another milestone, NVIDIA-powered instruments reduced the cost of whole genome sequencing to just $100.
Genomics is a critical tool in synthetic biology with applications ranging from drug discovery
and agriculture to energy production.
NVIDIA Parabricks is a suite of AI-accelerated libraries for end-to-end genomics analysis in the cloud or in-instrument.
NVIDIA Parabricks is available in every public cloud and on genomics platforms like Terra, DNAnexus, and FormBio.
Today, we’re announcing Parabricks 4.1, which will run on NVIDIA-accelerated genomics instruments
from PacBio, Oxford Nanopore, Ultima, Singular, BioNano, and Nanostring.
The world’s $250B medical instruments market is being transformed.
Medical instruments will be software-defined and AI powered.
NVIDIA Holoscan is a software library for real-time sensor processing systems.
Over 75 companies are developing medical instruments on Holoscan.
Today, we are announcing that Medtronic, the world leader in medical instruments, and NVIDIA are building Medtronic’s AI platform
for software-defined medical devices.
This partnership will create a common platform for Medtronic systems, ranging from surgical navigation
to robotic-assisted surgery.
Today, Medtronic announced that its next-generation GI Genius system, with AI for early detection of colon cancer
is built on NVIDIA Holoscan and will ship around the end of this year.
The chip industry is the foundation of nearly every industry.
Chip manufacturing demands extreme precision, producing features 1,000 times smaller than a bacterium
and on the order of a single gold atom or a strand of human DNA.
Lithography, the process of creating patterns on a wafer, is the beginning of the chip manufacturing process
and consists of two stages – photomask making and pattern projection.
It is fundamentally an imaging problem at the limits of physics.
The photomask is like a stencil of a chip. Light is blocked or passed through the mask
to the wafer to create the pattern.
The light is produced by the ASML EUV (extreme ultraviolet) lithography system.
Each system costs more than a quarter of a billion dollars.
ASML EUV uses a radical way to create light.
Laser pulses fire 50,000 times a second at droplets of tin, vaporizing each one into a plasma that emits 13.5nm EUV light.
Multilayer mirrors guide the light to the mask.
The multilayer reflectors in the mask reticle take advantage of interference patterns of the 13.5nm light
to create finer features down to 3nm.
The wafer is positioned within a quarter of a nanometer and aligned 20,000 times a second to adjust for any vibration.
The step before lithography is equally miraculous.
Computational lithography applies inverse physics algorithms to predict the patterns on the mask
that will produce the final patterns on the wafer.
In fact, the patterns on the mask do not resemble the final features at all.
Computational lithography simulates Maxwell’s equations of the behavior of light passing through optics
and interacting with photoresists.
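The inverse problem can be illustrated in one dimension. The sketch below is a deliberately simplified stand-in written in pure Python with hypothetical names and values. It is not cuLitho or any production algorithm. The "optics" are a symmetric blur kernel, and the "wafer image" is the blurred mask. Projected gradient descent then searches for mask transmissions in [0, 1] whose blurred image best matches a target pattern. Real computational lithography models light through optics and photoresist far more elaborately, yet even in this toy the best mask differs from the target itself.

```python
# Toy "optics": a symmetric blur kernel whose weights sum to 1 (hypothetical values).
KERNEL = [0.1, 0.2, 0.4, 0.2, 0.1]


def blur(signal, kernel=KERNEL):
    """Forward model: zero-padded 1D convolution of the mask with the kernel."""
    r = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        total = 0.0
        for j, k in enumerate(kernel):
            p = i + j - r
            if 0 <= p < n:
                total += k * signal[p]
        out.append(total)
    return out


def loss(mask, target):
    """Squared error between the printed (blurred) image and the target image."""
    return sum((a - b) ** 2 for a, b in zip(blur(mask), target))


def optimize_mask(target, steps=500, lr=0.4):
    """Inverse problem: projected gradient descent on mask transmission in [0, 1]."""
    mask = list(target)  # naive starting guess: mask equals the desired pattern
    for _ in range(steps):
        residual = [a - b for a, b in zip(blur(mask), target)]
        # With a symmetric kernel and zero padding, blurring is self-adjoint,
        # so the gradient of the squared error is 2 * blur(residual).
        grad = [2.0 * g for g in blur(residual)]
        # lr = 0.4 stays below 1/L (L <= 2 here), so the loss never increases.
        mask = [min(1.0, max(0.0, m - lr * g)) for m, g in zip(mask, grad)]
    return mask


# Target wafer intensity: 0.6 inside a 4-pixel feature, 0 elsewhere (illustrative values).
target = [0.0] * 6 + [0.6] * 4 + [0.0] * 6
mask = optimize_mask(target)
print("naive mask loss:    ", round(loss(target, target), 4))
print("optimized mask loss:", round(loss(mask, target), 4))
print("optimized mask:", [round(m, 2) for m in mask])
```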
Computational lithography is the largest computation workload in chip design and manufacturing
consuming tens of billions of CPU hours annually.
Massive data centers run 24/7 to create reticles used in lithography systems.
These data centers are part of the nearly $200 billion annual CAPEX invested by chip manufacturers.
Computational lithography is growing fast as algorithm complexity increases
enabling the industry to go to 2nm and beyond.
NVIDIA today is announcing cuLitho, a library for computational lithography.
cuLitho, a massive body of work nearly four years in the making and developed in close collaboration with TSMC,
ASML, and Synopsys, accelerates computational lithography by over 40X.
There are 89 reticles for the NVIDIA H100.
Running on CPUs, a single reticle currently takes two weeks to process.
cuLitho, running on GPUs, can process a reticle in a single 8-hour shift.
By accelerating with cuLitho, TSMC can replace the 40,000 CPU servers it uses for computational lithography
with just 500 DGX H100 systems, reducing power from 35MW to just 5MW.
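The quoted figures are internally consistent. Two weeks is 336 hours, so finishing a reticle in an 8-hour shift is a 42X speed-up, in line with "over 40X". The sketch below only re-derives ratios from the numbers stated above.

```python
HOURS_PER_WEEK = 7 * 24

# Figures quoted in the keynote.
cpu_hours_per_reticle = 2 * HOURS_PER_WEEK  # "two weeks" on CPUs
gpu_hours_per_reticle = 8                   # "a single 8-hour shift" with cuLitho
cpu_servers, dgx_systems = 40_000, 500
power_before_mw, power_after_mw = 35, 5

speedup = cpu_hours_per_reticle / gpu_hours_per_reticle
servers_replaced_per_dgx = cpu_servers / dgx_systems
power_reduction = power_before_mw / power_after_mw

print(f"per-reticle speed-up:     {speedup:.0f}x")
print(f"CPU servers per DGX H100: {servers_replaced_per_dgx:.0f}")
print(f"power reduction:          {power_reduction:.0f}x")
```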
With cuLitho, TSMC can reduce prototype cycle time, increase throughput,
reduce the carbon footprint of its manufacturing, and prepare for 2nm and beyond.
TSMC will be qualifying cuLitho for production starting in June.
Every industry needs to accelerate every workload, so that we can reclaim power and do more with less.
Over the past ten years, cloud computing has grown 20% annually into a massive $1T industry.
Some 30 million CPU servers do the majority of the processing.
There are challenges on the horizon.
As Moore’s Law ends, increasing CPU performance comes with increased power.
And the mandate to decrease carbon emissions is fundamentally at odds with the need to grow data centers.
Cloud computing growth is power-limited.
First and foremost, data centers must accelerate every workload.
Acceleration will reclaim power.
The energy saved can fuel new growth.
Whatever is not accelerated will be processed on CPUs.
The CPU design point for accelerated cloud datacenters differs fundamentally from the past.
In AI and cloud services, accelerated computing offloads parallelizable workloads, and CPUs process other workloads,
like web RPC and database queries.
We designed the Grace CPU for an AI and cloud-first world, where AI workloads are GPU-accelerated
and Grace excels at single-threaded execution and memory processing.
It’s not just about the CPU chip. Datacenter operators optimize for throughput and total cost of ownership of the entire datacenter.
We designed Grace for high energy-efficiency at cloud datacenter scale.
Grace comprises 72 Arm cores connected by a super high-speed on-chip scalable coherent fabric that delivers 3.2 TB/sec
of cross-sectional bandwidth.
Grace Superchip connects 144 cores between two CPU dies over a 900 GB/sec low-power chip-to-chip coherent interface.
The memory system is LPDDR low-power memory, like the memory used in cellphones, which we specially enhanced for use in datacenters.
It delivers 1 TB/s, 2.5X the bandwidth of today’s systems at 1/8th the power.
The entire 144-core Grace Superchip module with 1TB of memory is only 5x8 inches.
It is so low power it can be air cooled.
This is the computing module with passive cooling.
Two Grace Superchip computers can fit in a single 1U air-cooled server.
Grace’s performance and power efficiency are excellent for cloud and scientific computing applications.
We tested Grace on a popular Google benchmark, which tests how quickly cloud microservices communicate
and the HiBench suite, which tests Apache Spark memory-intensive data processing.
These kinds of workloads are foundational for cloud datacenters.
At microservices, Grace is 1.3X faster than the average of the newest generation x86 CPUs
and 1.2X faster at data processing.
And that higher performance is achieved using only 60% of the power measured at the full server node.
CSPs can outfit a power-limited data center with 1.7X more Grace servers, each delivering 25% higher throughput.
At iso-power, Grace gives CSPs 2X the growth opportunity.
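The iso-power claim follows from the measured numbers. The sketch assumes the 1.7X figure comes from dividing a fixed power budget by Grace's 60% per-node power draw. It takes the stated 25% per-server throughput gain at face value.

```python
# Figures quoted in the keynote for Grace versus the newest x86 CPUs.
grace_power_fraction = 0.60  # Grace node power as a fraction of the x86 node's power
throughput_gain = 1.25       # "each delivering 25% higher throughput"

# With a fixed power budget, a lower per-node draw means more nodes fit.
servers_at_iso_power = 1 / grace_power_fraction            # ~1.67, quoted as 1.7X
total_throughput = servers_at_iso_power * throughput_gain  # ~2.08, quoted as 2X

print(f"servers: {servers_at_iso_power:.2f}x, throughput: {total_throughput:.2f}x")
```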
Grace is sampling.
And Asus, Atos, Gigabyte, HPE, QCT, Supermicro, Wistron, and ZT are building systems now.
In a modern software-defined data center, the operating system doing virtualization, network, storage, and security can
consume nearly half of the datacenter’s CPU cores and associated power.
Datacenters must accelerate every workload to reclaim power and free CPUs for revenue-generating workloads.
NVIDIA BlueField offloads and accelerates the datacenter operating system and infrastructure software.
Over two dozen ecosystem partners, including Check Point, Cisco, DDN, Dell EMC
Juniper, Palo Alto Networks, Red Hat, and VMware,
use BlueField’s datacenter acceleration technology to run their software platforms more efficiently.
BlueField-3 is in production and adopted by leading cloud service providers, Baidu, CoreWeave, JD.com, Microsoft Azure,
Oracle OCI, and Tencent Games, to accelerate their clouds.
NVIDIA accelerated computing starts with DGX, the world’s AI supercomputer,
the engine behind the large language model breakthrough.
I hand-delivered the world’s first DGX to OpenAI.
Since then, half of the Fortune 100 companies have installed DGX AI supercomputers.
DGX has become the essential instrument of AI.
The GPU of DGX is eight H100 modules.
H100 has a Transformer Engine designed to process models like the amazing ChatGPT;
GPT stands for Generative Pre-trained Transformer.
The eight H100 modules are connected to each other through NVLink switches to allow fully non-blocking transactions.
The eight H100s work as one giant GPU.
The computing fabric is one of the most vital systems of the AI supercomputer.
400 Gbps ultra-low latency NVIDIA Quantum InfiniBand
with in-network processing
connects hundreds to thousands of DGX nodes
into an AI supercomputer.
NVIDIA DGX H100 is the blueprint for customers building AI infrastructure worldwide.
It is now in full production.
I am thrilled that Microsoft announced Azure is opening private previews of its H100 AI supercomputer.
Other systems and cloud services will soon come from Atos, AWS, Cirrascale, CoreWeave, Dell, Gigabyte, Google, HPE,
Lambda Labs, Lenovo, Oracle, Quanta, and Supermicro.
The market for DGX AI supercomputers has grown significantly.
Originally used as an AI research instrument, DGX AI supercomputers are expanding into operations
running 24/7 to refine data and process AI.
DGX supercomputers are modern AI factories.
We are at the iPhone moment of AI.
Start-ups are racing to build disruptive products and business models, while incumbents are looking to respond.
Generative AI has triggered a sense of urgency in enterprises worldwide to develop AI strategies.
Customers need to access NVIDIA AI more easily and quickly.
We are announcing NVIDIA DGX Cloud through partnerships with Microsoft Azure, Google GCP, and Oracle OCI
to bring NVIDIA DGX AI supercomputers to every company, instantly, from a browser.
DGX Cloud is optimized to run NVIDIA AI Enterprise, the world’s leading acceleration library suite
for end-to-end development and deployment of AI.
DGX Cloud offers customers the best of NVIDIA AI and the best of the world’s leading cloud service providers.
This partnership brings NVIDIA’s ecosystem to the CSPs, while amplifying NVIDIA’s scale and reach.
This win-win partnership gives customers racing to engage Generative AI instant access to NVIDIA in global-scale clouds.
We’re excited by the speed, scale, and reach of this cloud extension of our business model.
Oracle Cloud Infrastructure, OCI, will be the first NVIDIA DGX Cloud.
OCI has excellent performance. They have a two-tier computing fabric and management network.
NVIDIA’s ConnectX-7 (CX-7), with the industry’s best RDMA, is the computing fabric.
And BlueField-3 will be the infrastructure processor for the management network.
The combination is a state-of-the-art DGX AI supercomputer that can be offered as a multi-tenant cloud service.
We have 50 early access enterprise customers, spanning consumer internet and software, healthcare
media and entertainment, and financial services.
ChatGPT, Stable Diffusion, DALL-E, and Midjourney have awakened the world to Generative AI.
These applications’ ease of use and impressive capabilities attracted over a hundred million users in just a few months;
ChatGPT is the fastest-growing application in history.
No training is necessary. Just ask these models to do something.
The prompts can be precise or ambiguous. If a prompt is unclear,
ChatGPT learns your intentions through conversation.
The generated text is beyond impressive.
ChatGPT can compose memos and poems, paraphrase a research paper, solve math problems,
highlight key points of a contract, and even code software programs.
ChatGPT is a computer that not only runs software but writes software.
Many breakthroughs led to Generative AI.
Transformers learn context and meaning from the relationships and dependencies of data, in parallel and at large scale.
This led to large language models that learn from so much data
they can perform downstream tasks without explicit training.
And diffusion models, inspired by physics, learn without supervision to generate images.
In just over a decade, we went from trying to recognize cats to generating realistic images of a cat
in a space suit
walking on the moon.
Generative AI is a new kind of computer — one that we program in human language.
This ability has profound implications. Everyone can direct a computer to solve problems.
This was once the domain of computer programmers alone.
Now everyone is a programmer.
Generative AI is a new computing platform like the PC, internet, mobile, and cloud.
And like in previous computing eras, first-movers are creating new applications
and founding new companies to capitalize on Generative AI’s ability to automate and co-create.
Debuild lets users design and deploy web applications just by explaining what they want.
Grammarly is a writing assistant that considers context.
Tabnine helps developers write code.
Omneky generates customized ads and copy.
Kore.ai is a virtual customer service agent.
Jasper generates marketing material. Jasper has written nearly 5 billion words,
reducing time to generate the first draft by 80%.
Insilico uses AI to accelerate drug design.
Absci is using AI to predict therapeutic antibodies.
Generative AI will reinvent nearly every industry.
Many companies can use one of the excellent Generative AI APIs coming to market.
Some companies need to build custom models, with their proprietary data, that are experts in their domain.
They need to set up usage guardrails and refine their models to align
with their company’s safety, privacy, and security requirements.
The industry needs a foundry, a TSMC, for custom large language models.
Today, we announce the NVIDIA AI Foundations
a cloud service for customers needing to build, refine, and operate
custom large language models (LLMs) and Generative AI
trained with their proprietary data
and for their domain-specific tasks.
NVIDIA AI Foundations comprises Language,
Visual, and Biology model-making services.
NVIDIA NeMo is for building custom text-to-text language models.
Customers can bring their own model or start with the NeMo pre-trained language models: GPT-8, GPT-43,
and GPT-530, with 8, 43, and 530 billion parameters respectively.
Throughout the entire process, NVIDIA AI experts will work with you, from creating your proprietary model to operations.
Let’s take a look.
Generative models, like NVIDIA’s 43B foundational model, learn by training on billions of sentences
and trillions of words.
As the model converges, it begins to understand the relationships between words and their underlying concepts
captured in the weights in the embedding space of the model.
Transformer models use a technique called self-attention: a mechanism designed to learn dependencies and relationships
within a sequence of words.
The result is a model that provides the foundation for a ChatGPT-like experience.
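Self-attention can be written in a few lines. The sketch below implements single-head scaled dot-product attention, softmax(QK^T / sqrt(d)) V, in pure Python on tiny hand-made vectors. The inputs are hypothetical. For simplicity, the same vectors serve as queries, keys, and values. Real transformers derive them through learned projections, use many heads, and run on GPUs.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def self_attention(queries, keys, values):
    """Single-head scaled dot-product attention over a sequence of vectors."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)  # how strongly this token attends to every token
        outputs.append(
            [sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))]
        )
    return outputs


# Hypothetical 2-dimensional representations for a 3-token sequence.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = self_attention(x, x, x)
print([[round(v, 3) for v in row] for row in out])
```

Each output is a weighted average of the value vectors. The weights come from how well that token's query matches every key, which is how the model relates each word to the rest of the sequence.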
These generative models require expansive amounts of data
deep AI expertise for data processing and distributed training
and large scale compute to train, deploy and maintain at the pace of innovation.
Enterprises can fast-track their generative AI adoption
with NVIDIA NeMo service running on NVIDIA DGX Cloud.
The quickest path is starting with one of NVIDIA’s state-of-the-art
pre-trained foundation models.
With the NeMo service, organizations can easily customize a model
with p-tuning to teach it specialized skills
like summarizing financial documents
creating brand-specific content
and composing emails with personalized writing styles.
Connecting the model to a proprietary knowledge base
ensures that responses are accurate, current
and cited for their business.
Next, they can provide guardrails by adding logic
and monitoring inputs, outputs, toxicity, and bias thresholds
so it operates within a specified domain
and prevents undesired responses.
After putting the model to work, it can continuously improve
with reinforcement learning based on user interactions.
And NeMo’s playground is available for rapid prototyping before moving to the cloud API
for larger-scale evaluation and application integration.
Sign up for the NVIDIA NeMo service today
to codify your enterprise’s knowledge into a personalized
AI model that you control.
Picasso is a visual language model-making service for customers who want to build custom models
trained with licensed or proprietary content.
Let’s take a look.
Generative AI is transforming how visual content is created.
But to realize its full potential, enterprises need massive amounts of copyright-cleared data, AI experts, and an AI supercomputer.
NVIDIA Picasso is a cloud service for building and deploying
generative AI-powered image, video, and 3D applications.
With it, enterprises, ISVs, and service providers
can deploy their own models.
We’re working with premier partners to bring
generative AI capabilities to every industry.
Organizations can also start with NVIDIA Edify models
and train them on their data to create a product or service.
These models generate images, videos, and 3D assets.
To access generative AI models
applications send an API call with text prompts
and metadata to Picasso.
Picasso uses the appropriate model running on NVIDIA DGX Cloud
to send back the generated asset to the application.
This can be a photorealistic image, a high-resolution video, or a detailed 3D geometry.
Generated assets can be imported into editing tools or into NVIDIA Omniverse to build photorealistic virtual worlds,
metaverse applications, and digital twin simulations.
With NVIDIA Picasso services running on NVIDIA DGX Cloud
you can streamline training, optimization, and inference
needed to build custom generative AI applications.
See how NVIDIA Picasso can bring transformative generative AI capabilities into your applications.
We are delighted that Getty Images will use the Picasso service to build Edify-image and Edify-video generative models
trained on their rich library of responsibly licensed professional images and video assets.
Enterprises will be able to create custom images and video with simple text or image prompts.
Shutterstock is developing an Edify-3D generative model
trained on their professional image, 3D, and video assets library.
Shutterstock will help simplify the creation of 3D assets for creative production, digital twins and virtual collaboration,
making these workflows faster and easier for enterprises to implement.
And I’m thrilled to announce a significant expansion of our long-time partnership with Adobe
to build a set of next-generation AI capabilities for the future of creativity
integrating generative AI into the everyday workflows of marketers and creative professionals.
The new Generative AI models will be optimized
for image creation, video, 3D, and animation.
To protect artists’ rights, Adobe is developing with a focus on commercial viability and proper content attribution
powered by Adobe’s Content Authenticity Initiative.
Our third language domain is biology.
Drug discovery is a nearly $2T industry
with $250B dedicated to R&D.
NVIDIA’s Clara is a healthcare application framework for imaging
instruments, genomics, and drug discovery.
The industry is now jumping onto generative AI to discover disease targets
design novel molecules or protein-based drugs, and predict the behavior of the medicines in the body.
Insilico Medicine, Exscientia, Absci, and Evozyne are among hundreds of new AI drug discovery start-ups.
Several have discovered novel targets or drug candidates and have started human clinical trials.
BioNeMo helps researchers create
fine-tune, and serve custom models with their proprietary data.
Let’s take a look.
There are 3 key stages to drug discovery
discovering the biology that causes disease
designing new molecules, whether those are small molecules, proteins, or antibodies
and finally screening how those molecules interact with each other.
Today, Generative AI is transforming every step of the drug discovery process.
NVIDIA BioNeMo Service provides state-of-the-art
generative AI models for drug discovery.
It’s available as a cloud service, providing instant and easy access to accelerated drug discovery workflows.
BioNeMo includes models like AlphaFold, ESMFold and OpenFold
for 3D protein structure prediction.
ProtGPT for protein generation,
ESM1 and ESM2 for protein property prediction
MegaMolBART and MoFlow for molecule generation
and DiffDock for molecular docking.
Drug discovery teams can use the models through BioNeMo’s web interface
or cloud APIs.
Here is an example of using NVIDIA BioNeMo
for drug discovery virtual screening.
Generative models can now read a protein’s amino acid sequence
and in seconds, accurately predict the structure of a target protein.
They can also generate molecules with desirable ADME properties that optimize how a drug behaves in the body.
Generative models can even predict the 3D interactions of a protein and molecule
accelerating the discovery of optimal drug candidates.
With NVIDIA DGX Cloud, BioNeMo also provides on-demand supercomputing infrastructure to further optimize and train models,
saving teams valuable time and money so they can focus on discovering life-saving medicines.
The new AI drug discovery pipelines are here.
Sign up for access to NVIDIA BioNeMo Service.
We will continue to work with the industry to include models in BioNeMo
that encompass the end-to-end workflow of drug discovery and virtual screening.
Amgen, AstraZeneca, Insilico Medicine, Evozyne, Innophore, and Alchemab Therapeutics are early access users of BioNeMo.
NVIDIA AI Foundations is a cloud service, a foundry, for building custom language models and Generative AI.
Since AlexNet a decade ago, deep learning has opened giant new markets — automated driving, robotics, smart speakers,
and reinvented how we shop, consume news, and enjoy music.
That’s just the tip of the iceberg.
AI is at an inflection point as Generative AI has started a new wave of opportunities, driving a step-function increase
in inference workloads.
AI can now generate diverse data, spanning voice, text, images, video, and 3D graphics to proteins and chemicals.
Designing a cloud data center to process Generative AI is a great challenge.
On the one hand, a single type of accelerator is ideal, because it allows the datacenter to be elastic
and handle the unpredictable peaks and valleys of traffic.
On the other hand, no one accelerator can optimally process the diversity of algorithms, models, data types, and sizes.
NVIDIA’s One Architecture platform offers both acceleration and elasticity.
Today, we are announcing our new inference platform - four configurations - one architecture - one software stack.
Each configuration is optimized for a class of workloads.
For AI video workloads, we have L4 optimized for video decoding and transcoding, video content moderation,
and video call features like background replacement, relighting, making eye contact,
transcription, and real-time language translation.
Most cloud videos today are processed on CPUs.
One 8-GPU L4 server will replace over a hundred dual-socket CPU servers for processing AI Video.
Snap is a leading user of NVIDIA AI for computer vision and recommender systems.
Snap will use L4 for AV1 video processing, generative AI, and augmented reality.
Snapchat users upload hundreds of millions of videos every day.
Today, Google announced NVIDIA L4 on GCP.
NVIDIA and Google Cloud are working to deploy major workloads on L4.
Let me highlight five.
First, we’re accelerating inference for generative AI models for cloud services like Wombo and Descript.
Second, we’re integrating Triton Inference Server with Google Kubernetes Engine and Vertex AI.
Third, we’re accelerating Google Dataproc with NVIDIA Spark-RAPIDS.
Fourth, we’re accelerating AlphaFold, and UL2 and T5 large language models.
And fifth, we are accelerating Google Cloud’s Immersive Stream that renders 3D and AR experiences.
With this collaboration, Google GCP is a premier NVIDIA AI cloud.
We look forward to telling you even more about our collaboration very soon.
For Omniverse, graphics rendering and generative AI like text-to-image and text-to-video, we are announcing L40.
L40 is up to 10 times the performance of NVIDIA’s T4, the most popular cloud inference GPU.
Runway is a pioneer in Generative AI.
Their research team was a key creator of Stable Diffusion and its predecessor, Latent Diffusion.
Runway is inventing generative AI models for creating and editing content.
With over 30 AI Magic Tools, their service is revolutionizing the creative process, all from the cloud.
Let’s take a look.
Runway is making amazing AI-powered video editing and image creation tools accessible to everyone.
Powered by the latest generation of NVIDIA GPUs running locally or in the cloud, Runway makes it possible
to remove an object from a video with just a few brush strokes.
Or apply different styles to video using just an input image.
Or change the background or the foreground of a video.
What used to take hours using conventional tools can now be completed with professional broadcast quality results
in just a few minutes.
Runway does this by utilizing CV-CUDA, an open-source project that enables developers to build highly efficient
GPU-accelerated pre- and post-processing pipelines for computer vision workloads and scale them into the cloud.
With NVIDIA technology, Runway is able to do the impossible and give content creators the best experience.
What was once limited to pros can now be done by you.
In fact, Runway is used in Oscar-nominated Hollywood films and we are placing this technology
in the hands of the world’s creators.
Large language models like ChatGPT are a significant new inference workload.
GPT models are memory and computationally intensive.
Furthermore, inference is a high-volume, scale-out workload and requires standard commodity servers.
For large language model inference, like ChatGPT, we are announcing a new Hopper GPU: the PCIe H100
with dual-GPU NVLink. The new H100 has 94 GB of HBM3 memory.
H100 can process the 175-billion-parameter GPT-3,
and its support for commodity PCIe servers makes it easy to scale out.
The only GPU in the cloud today that can practically process ChatGPT is HGX A100.
Compared to HGX A100 for GPT-3 processing, a standard server with four pairs of H100 with dual-GPU NVLink
is up to 10X faster.
H100 can reduce large language model processing costs by an order of magnitude.
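A rough back-of-the-envelope check shows why a multi-pair server is the natural unit for a model of GPT-3’s size. The sketch uses the 175-billion-parameter and 94 GB figures quoted above and assumes FP16 weights (2 bytes per parameter) sharded across the GPUs; it counts weights only and ignores activations, the KV cache, and framework overhead, so real deployments need more headroom than shown.

```python
# Back-of-the-envelope memory check for serving GPT-3 on H100 NVLink pairs.
# Assumptions: FP16 weights (2 bytes/parameter) sharded across all GPUs;
# weights only (no KV cache, activations, or runtime overhead);
# 94 GB of HBM3 per H100, as quoted in the keynote.

GPT3_PARAMS = 175e9
BYTES_PER_PARAM_FP16 = 2
HBM_PER_GPU_GB = 94


def weights_gb(params: float, bytes_per_param: int) -> float:
    """Size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def pooled_hbm_gb(num_pairs: int, gpus_per_pair: int = 2) -> int:
    """Total HBM across `num_pairs` NVLink-connected GPU pairs."""
    return num_pairs * gpus_per_pair * HBM_PER_GPU_GB


need = weights_gb(GPT3_PARAMS, BYTES_PER_PARAM_FP16)  # 350 GB of weights
for pairs in (1, 4):
    have = pooled_hbm_gb(pairs)
    print(f"{pairs} pair(s): {have} GB HBM, weights need {need:.0f} GB, "
          f"fits={have >= need}")
```

Under these assumptions a single pair (188 GB) cannot hold the FP16 weights, while a four-pair server (752 GB) holds them with capacity left over for the KV cache and batching.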
Grace Hopper is our new superchip that connects Grace CPU and Hopper GPU over a high-speed 900 GB/sec
coherent chip-to-chip interface.
Grace Hopper is ideal for processing giant data sets like AI databases for recommender systems
and large language models.
Today, CPUs with large memory store and query giant embedding tables, then transfer the results to GPUs for inference.
With Grace-Hopper, Grace queries the embedding tables and transfers the results directly to Hopper
across the high-speed interface – 7 times faster than PCIe.
Customers want to build AI databases several orders of magnitude larger.
Grace-Hopper is the ideal engine.
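The “7 times faster than PCIe” figure can be reproduced with simple arithmetic. The sketch assumes the comparison point is a PCIe Gen5 x16 link (32 GT/s per lane) counted in both directions, and it ignores line-encoding and protocol overhead; the 900 GB/sec chip-to-chip figure is the one quoted above.

```python
# Rough bandwidth comparison: Grace Hopper chip-to-chip link vs. PCIe.
# Assumptions: PCIe Gen5 x16 as the baseline, 32 GT/s per lane, one bit per
# transfer, both directions counted, encoding/protocol overhead ignored.

C2C_GB_PER_S = 900  # chip-to-chip bandwidth quoted in the keynote


def pcie_gb_per_s(gt_per_s: float, lanes: int, bidirectional: bool = True) -> float:
    """Raw PCIe bandwidth in GB/s (8 bits per byte, no encoding overhead)."""
    one_way = gt_per_s * lanes / 8
    return one_way * 2 if bidirectional else one_way


pcie5_x16 = pcie_gb_per_s(32, 16)  # 64 GB/s each way, 128 GB/s total
ratio = C2C_GB_PER_S / pcie5_x16
print(f"PCIe Gen5 x16: {pcie5_x16:.0f} GB/s, C2C/PCIe ratio: {ratio:.1f}x")
```

Comparing like with like matters here: both figures are totals over both directions, so the ratio comes out at roughly 7x.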
This is NVIDIA’s inference platform – one architecture for diverse AI workloads,
and maximum datacenter acceleration and elasticity.
The world’s largest industries make physical things, but they want to build them digitally.
Omniverse is a platform for industrial digitalization that bridges digital and physical.
It lets industries design, build, operate, and optimize physical products and factories digitally,
before making a physical replica.
Digitalization boosts efficiency and speed and saves money.
One use of Omniverse is the virtual bring-up of a factory, where all of its machinery is integrated digitally
before the real factory is built.
This reduces last-minute surprises, change orders, and plant opening delays.
Virtual factory integration can save billions for the world’s factories.
The semiconductor industry is investing half a trillion dollars to build a record 84 new fabs.
By 2030, auto manufacturers will build 300 factories to make 200 million electric vehicles.
And battery makers are building 100 more mega factories.
Digitalization is also transforming logistics, moving goods through billions of square feet of warehouses worldwide.
Let’s look at how Amazon uses Omniverse to automate, optimize, and plan its autonomous warehouses.
Amazon Robotics has manufactured and deployed the largest fleet of mobile industrial robots in the world.
The newest member of this robotic fleet is Proteus, Amazon’s first fully autonomous warehouse robot.
Proteus is built to move through our facilities using advanced safety, perception, and navigation technology.
Let’s see how NVIDIA Isaac Sim, built on Omniverse, is creating physically accurate, photoreal simulations
to help accelerate Proteus deployments.
Proteus features multiple sensors that include cameras, lidars, and ultrasonic sensors
to power its autonomy software systems.
The Proteus team needed to improve the performance of a neural network that read fiducial markers and helped the robot
determine its location on the map.
It takes lots of data—and the right kind—to train the ML models that are driven by the robot sensor input.
With Omniverse Replicator in Isaac Sim, Amazon Robotics was able to generate large photoreal synthetic datasets that improved
the marker detection success rate from 88.6% to 98%.
The use of the synthetic data generated by Omniverse Replicator also sped up development times, from months to days,
as we were able to iteratively test and train our models much faster than when only using real data.
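Restated as failure rates, the marker-detection improvement is larger than the headline percentages suggest. The short calculation below uses only the two success rates quoted above (88.6% and 98%) and converts them into the share of markers that were missed.

```python
# Convert detection success rates into failure (miss) rates and compare them.
# Inputs are the before/after success rates quoted in the narration.


def failure_rate(success_pct: float) -> float:
    """Percentage of markers that were not detected."""
    return 100.0 - success_pct


before = failure_rate(88.6)  # about 11.4% of markers missed
after = failure_rate(98.0)   # 2.0% of markers missed
reduction = before / after   # how many times fewer misses

print(f"missed before: {before:.1f}%, after: {after:.1f}%, "
      f"{reduction:.1f}x fewer misses")
```

Going from 11.4% to 2% missed detections is roughly a 5.7x reduction in errors, which is the quantity that matters for a robot relying on the markers to localize itself.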
To enable new autonomous capabilities for the expanding fleet of Proteus robots, Amazon Robotics is working towards
closing the gap from simulation to reality, building large scale multi-sensor, multi-robot simulations.
With Omniverse, Amazon Robotics will optimize operations with full fidelity warehouse digital twins.
Whether we’re generating synthetic data or developing new levels of autonomy, Isaac Sim on Omniverse
helps the Amazon Robotics team save time and money as we deploy Proteus across our facilities.
Omniverse has unique technologies for digitalization.
And Omniverse is the premier development platform for USD, which serves as a common language that lets teams collaborate
to create virtual worlds and digital twins.
Omniverse is physically based, mirroring the laws of physics.
It can connect to robotic systems and operate with hardware-in-the-loop.
It features Generative AI to accelerate the creation of virtual worlds.
And Omniverse can manage data sets of enormous scale.
We’ve made significant updates to Omniverse in every area.
Let’s take a look.
Nearly 300,000 creators and designers have downloaded Omniverse.
Omniverse is not a tool, but a USD network and shared database,
a fabric connecting to design tools used across industries.
It connects, composes, and simulates the assets created by industry-leading tools.
We are delighted to see the growth of Omniverse connections.
Each connection links the ecosystem of one platform to the ecosystems of all the others.
Omniverse’s network of networks is growing exponentially.
Bentley Systems LumenRT is now connected.
So are Siemens Teamcenter, NX, and Process Simulate, Rockwell Automation Emulate 3D, Cesium, Unity, and many more.
Let’s look at the digitalization of the $3T auto industry
and see how car companies are evaluating Omniverse in their workflows.
Volvo Cars and GM use Omniverse USD Composer to connect and unify their asset pipelines.
GM connects designers, sculptors, and artists using Alias, Siemens NX, Unreal, Maya, 3ds Max,
and virtually assembles the components into a digital twin of the car.
In engineering and simulation, they visualize the power flow aerodynamics in Omniverse.
For next-generation Mercedes-Benz and Jaguar Land Rover vehicles, engineers use Drive Sim in Omniverse to generate
synthetic data to train AI models, validate the active-safety system against a virtual NCAP driving test,
and simulate real driving scenarios.
Omniverse’s generative AI reconstructs previously driven routes into 3D
so past experiences can be reenacted or modified.
Working with Idealworks, BMW uses Isaac Sim in Omniverse to generate synthetic data
and scenarios to train factory robots.
Lotus is using Omniverse to virtually assemble welding stations.
Toyota is using Omniverse to build digital twins of their plants.
Mercedes-Benz uses Omniverse to build, optimize, and plan assembly lines for new models.
Rimac and Lucid Motors use Omniverse to build digital stores from actual design data that faithfully represent their cars.
BMW is using Omniverse to plan operations across nearly three dozen factories worldwide.
And they are building a new EV factory, completely in Omniverse, two years before the physical plant opens.
The world’s industries are accelerating digitalization with over $3.4 trillion being invested in the next three years.
We at BMW strive to be leading edge in automotive digitalization.
With NVIDIA Omniverse and AI we set up new factories faster and produce more efficiently than ever.
This results in significant savings for us.
It all starts with planning – a complex process in which we need to connect many tools,
datasets and specialists around the world.
Traditionally, we were limited, since data was managed separately in a variety of systems and tools.
Today, we’ve changed all that.
We are developing custom Omniverse applications to connect our existing tools, know-how and teams
all in a unified view.
Omniverse is cloud-native and cloud-agnostic, enabling teams to collaborate across our virtual factories from anywhere.
I’m about to join a virtual planning session for Debrecen in Hungary – our new EV factory – opening in 2025.
Let’s jump in.
Planner 1: Ah, Milan is joining.
Milan: Hello, everyone!
Planner 1: Hi Milan – great to see you, we’re in the middle of an optimization loop for our body shop.
Would you like to see?
Milan: Thanks – I’m highly interested. And I’d like to invite a friend.
Planner 1: Sure.
Jensen: Hey Milan! Good to see you.
Milan: Jensen, welcome to our virtual planning session.
Jensen: It’s great to be here. What are we looking at?
Milan: This is our global planning team who are working on a robot cell in Debrecen’s digital twin.
Matthias, tell us what’s happening …
Matthias: So, we just learned the production concept requires some changes.
We’re now reconfiguring the layout to add a new robot into the cell.
Planner 2: Ok, but if we add a new robot, on the logistics side, we’ll need to move our storage container.
Planner 3: Alright, let’s get this new robot in.
Matthias: That’s perfect. But let’s double-check - can we run the cell?
Jensen: Milan, this is just incredible!
Virtual factory integration is essential for every industry.
I’m so proud to see what our teams did together. Congratulations!
Milan: We are working globally to optimize locally.
After planning, operations is king, and we’ve already started!
To celebrate the launch of our virtual plant, I’d like to invite you to open the first digital factory with me.
Jensen: I’d be honored. Let’s do it!
Car companies employ nearly 14 million people.
Digitalization will enhance the industry’s efficiency, productivity, and speed.
Omniverse is the digital-to-physical operating system to realize industrial digitalization.
Today we are announcing three systems designed to run Omniverse.
First, we’re launching a new generation of workstations powered by NVIDIA Ada RTX GPUs and Intel’s newest CPUs.
The new workstations are ideal for doing ray tracing, physics simulation, neural graphics, and generative AI.
They will be available from Boxx, Dell, HP, and Lenovo starting in March.
Second, new NVIDIA OVX servers optimized for Omniverse.
OVX consists of L40 Ada RTX server GPUs and our new BlueField-3.
OVX servers will be available from Dell, HPE, Quanta, Gigabyte, Lenovo, and Supermicro.
Each layer of the Omniverse stack, including the chips, systems, networking, and software, is a new invention.
Building and operating the Omniverse computer requires a sophisticated IT team.
We’re going to make Omniverse fast and easy to scale and engage.
Let’s take a look.
The world’s largest industries are racing to digitalize their physical processes.
Today, that’s a complex undertaking.
NVIDIA Omniverse Cloud is a platform-as-a-service that provides instant, secure access to managed Omniverse Cloud APIs,
workflows, and customizable applications running on NVIDIA OVX.
Enterprise teams access the suite of managed services through the web browser Omniverse Launcher
or via a custom-built integration.
Once in Omniverse Cloud, enterprise teams can instantly access, extend, and publish foundation applications
and workflows - to assemble and compose virtual worlds -
generate data to train perception AIs -
test and validate autonomous vehicles -
or simulate autonomous robots…
…accessing and publishing shared data to Omniverse Nucleus.
Designers and engineers working in their favorite third-party design tools on RTX workstations
publish edits to Nucleus in parallel.
Then, when ready to iterate or view their integrated model in Omniverse,
they can simply open a web browser and log in.
As projects and teams scale, Omniverse Cloud helps optimize cost
by provisioning compute resources and licenses as needed.
And new services and upgrades are automatically provided with real-time updates.
With Omniverse Cloud, enterprises can fast-track unified digitalization and collaboration
across major industrial workflows, increasing efficiency, reducing costs and waste,
and accelerating the path to innovation.
See you in Omniverse!
Today, we announce the NVIDIA Omniverse Cloud, a fully managed cloud service.
We’re partnering with Microsoft to bring Omniverse Cloud to the world’s industries.
We will host it in Azure, benefiting from Microsoft’s rich storage, security, applications, and services portfolio.
We are connecting Omniverse Cloud to Microsoft 365 productivity suite, including Teams, OneDrive, SharePoint,
and the Azure IoT Digital Twins services.
Microsoft and NVIDIA are bringing Omniverse to hundreds of millions of Microsoft 365 and Azure users.
Accelerated computing and AI have arrived.
Developers use NVIDIA to speed up and scale up to solve problems previously impossible.
A daunting challenge is Net Zero. Every company must accelerate every workload to reclaim power.
Accelerated computing is a full-stack, datacenter-scale computing challenge.
Grace, Grace-Hopper, and BlueField-3 are new chips for super energy-efficient accelerated data centers.
Acceleration libraries solve new challenges and open new markets.
We updated 100 acceleration libraries, including cuQuantum for quantum computing, cuOpt for combinatorial optimization,
and cuLitho for computational lithography.
We are thrilled to partner with TSMC, ASML, and Synopsys to go to 2nm and beyond.
NVIDIA DGX AI Supercomputer is the engine behind the generative large language model breakthrough.
The DGX H100 AI Supercomputer is in production and available soon
from an expanding network of OEM and cloud partners worldwide.
The DGX supercomputer is going beyond research and becoming a modern AI factory.
Every company will manufacture intelligence.
We are extending our business model with NVIDIA DGX Cloud by partnering with Microsoft Azure, Google GCP, and Oracle OCI
to instantly bring NVIDIA AI to every company, from a browser.
DGX Cloud offers customers the best of NVIDIA and the best of the world’s leading CSPs.
We are at the iPhone moment for AI.
Generative AI inference workloads have gone into overdrive.
We launched our new inference platform - four configurations - one architecture.
L4 for AI video.
L40 for Omniverse and graphics rendering.
H100 PCIe for scaling out large language model inference.
Grace-Hopper for recommender systems and vector databases.
NVIDIA’s inference platform enables maximum data center acceleration and elasticity.
NVIDIA and Google Cloud are working together to deploy a broad range of inference workloads.
With this collaboration, Google GCP is a premier NVIDIA AI cloud.
NVIDIA AI Foundations is a cloud service, a foundry, for building custom language models and Generative AI.
NVIDIA AI Foundations comprises language, visual, and biology model-making services.
Getty Images and Shutterstock are building custom visual language models.
And we’re partnering with Adobe to build a set of next-generation AI capabilities for the future of creativity.
Omniverse is the digital-to-physical operating system to realize industrial digitalization.
Omniverse can unify the end-to-end workflow and digitalize the $3T, 14-million-employee automotive industry.
Omniverse is leaping to the cloud.
We are partnering with Microsoft to bring Omniverse Cloud, hosted in Azure, to the world’s industries.
I thank our systems, cloud, and software partners, researchers, scientists,
and especially our amazing employees
for building the NVIDIA accelerated computing ecosystem.
Together, we are helping the world do the impossible.
Have a great GTC!