NVIDIA HGX H100

Based on the NVIDIA Hopper™ architecture, NVIDIA HGX H100 features fourth-generation Tensor Cores and a Transformer Engine with FP8 precision that provides up to 4x faster training over the prior generation for GPT-3 (175B) models.

NVIDIA H200 Tensor Core GPU

NVIDIA H200 is the first GPU to offer 141 gigabytes (GB) of HBM3e memory at 4.8 terabytes per second (TB/s) – nearly double the capacity of the NVIDIA H100 Tensor Core GPU, with 1.4x more memory bandwidth. The H200’s larger, faster memory accelerates generative AI and large language models while advancing scientific computing for HPC workloads, with better energy efficiency and lower total cost of ownership.

NVIDIA HGX H100
Starting at
$2.30 per hour
Unprecedented acceleration for the world’s most demanding AI and machine learning workloads
Availability

Mini Cluster: 64 H100 GPUs
Base Cluster: 24 H100 GPUs

Pricing

Starting at $2.30 per hour with a 36-month contract, billed at 730 hours per month

Key features

Powered by the breakthrough NVIDIA Hopper™ architecture and Tensor Core technology for accelerated AI model training

Connected by NVIDIA Quantum-2 InfiniBand networking at 3,200Gb/s with a non-blocking network design

NVIDIA H100 SXM with FP8 support available

Enterprise-ready at any scale and any location

Clusters at any size

Vultr's enterprise-ready infrastructure seamlessly supports any cluster size of NVIDIA HGX H100 and H200 GPUs. Whether you require a small cluster or a massive deployment, Vultr ensures reliable, high-performance computing to meet your specific needs.

Globally available, locally accessible

Large clusters of NVIDIA HGX H100 and H200 GPUs are available where you need them, thanks to Vultr’s extensive infrastructure. With 32 global cloud data center locations across six continents, we guarantee low latency and high availability, enabling your enterprise to achieve optimal performance worldwide.

Enterprise-grade compliance and security

Vultr ensures our platform, products, and services meet diverse global compliance, privacy, and security needs, covering areas such as server availability, data protection, and privacy. Our commitment to industry-wide privacy and security frameworks, including ISO and SOC 2+ standards, demonstrates our dedication to protecting our customers' data.

Purpose-built for AI, simulation, and data analytics
AI, complex simulations, and massive datasets require multiple GPUs with extremely fast interconnections and a fully accelerated software stack. The NVIDIA HGX™ AI supercomputing platform brings together the full power of NVIDIA GPUs, NVLink®, NVIDIA networking, and fully optimized AI and high-performance computing (HPC) software stacks to provide the highest application performance and drive the fastest time to insights.
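
As a quick illustration (not part of the original page), here is a minimal PyTorch sketch that enumerates the GPUs on an instance and checks pairwise peer-to-peer access, the path that NVLink/NVSwitch serves on an HGX baseboard:

```python
import torch

# Enumerate the GPUs visible to this instance.
num_gpus = torch.cuda.device_count()
for i in range(num_gpus):
    print(f"GPU {i}: {torch.cuda.get_device_name(i)}")

# Check pairwise peer-to-peer access; on an HGX baseboard this path
# is served by NVLink/NVSwitch rather than PCIe.
for i in range(num_gpus):
    for j in range(num_gpus):
        if i != j and torch.cuda.can_device_access_peer(i, j):
            print(f"GPU {i} <-> GPU {j}: peer access available")
```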
Powerful GPUs for AI workloads

NVIDIA H200 supercharges generative AI and high-performance computing (HPC) workloads with game-changing performance and memory capabilities.

As the first GPU with HBM3e, the H200’s larger and faster memory fuels the acceleration of generative AI and large language models (LLMs) while advancing scientific computing for HPC workloads.

  • Llama2 70B inference: 1.9x faster
  • GPT-3 175B inference: 1.6x faster
  • High-performance computing: 110x faster

Specification | NVIDIA H100 SXM | NVIDIA H200 SXM¹
FP64 | 34 TFLOPS | 34 TFLOPS
FP64 Tensor Core | 67 TFLOPS | 67 TFLOPS
FP32 | 67 TFLOPS | 67 TFLOPS
TF32 Tensor Core | 989 TFLOPS² | 989 TFLOPS²
BFLOAT16 Tensor Core | 1,979 TFLOPS² | 1,979 TFLOPS²
FP16 Tensor Core | 1,979 TFLOPS² | 1,979 TFLOPS²
FP8 Tensor Core | 3,958 TFLOPS² | 3,958 TFLOPS²
INT8 Tensor Core | 3,958 TOPS² | 3,958 TOPS²
GPU Memory | 80 GB HBM3 | 141 GB HBM3e
GPU Memory Bandwidth | 3.35 TB/s | 4.8 TB/s
Decoders | 7 NVDEC, 7 JPEG | 7 NVDEC, 7 JPEG
Interconnect | NVLink®: 900 GB/s; PCIe Gen5: 128 GB/s | NVLink®: 900 GB/s; PCIe Gen5: 128 GB/s

¹ Preliminary specifications; may be subject to change.
² With sparsity.

Additional resources

FAQ

What is the difference between NVIDIA H100 and H200 GPUs?

The NVIDIA H100 and H200 GPUs are both powerful accelerators designed for demanding AI and high-performance computing (HPC) workloads. The H100 GPU excels in AI model training and HPC tasks, offering industry-leading performance for deep learning applications. Building on this foundation, the H200 GPU is an evolution of the H100, featuring higher memory bandwidth and increased memory capacity. These enhancements make the H200 especially well-suited for large-scale AI inference and intensive data processing tasks.

What workloads are best suited for NVIDIA H100 and H200 GPUs?

These GPUs excel in:

  • AI model training & inference
  • Natural language processing (NLP)
  • Large-scale data analytics
  • Scientific computing
  • High-performance computing
  • High-frequency trading (HFT)

Can I rent NVIDIA H100 GPUs on demand?

Yes, Vultr offers on-demand NVIDIA H100 GPUs, allowing businesses and developers to scale AI workloads without investing in expensive hardware.

How does renting an NVIDIA GPU from Vultr work?

You can deploy NVIDIA H100 GPUs in just a few clicks via Vultr’s cloud platform. Simply choose the GPU instance that fits your needs and start running your AI or HPC workloads immediately.
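
As an illustrative sketch only, deployment can also be scripted against the Vultr v2 REST API; the region, plan, and OS IDs below are placeholders to be replaced with values listed by the API:

```python
import os
import requests

API = "https://api.vultr.com/v2"
headers = {"Authorization": f"Bearer {os.environ['VULTR_API_KEY']}"}

# List available plans first and pick a GPU plan ID from the response.
plans = requests.get(f"{API}/plans", headers=headers).json()
print(plans)

payload = {
    "region": "ewr",             # example region ID; list regions via GET /v2/regions
    "plan": "vcg-example-h100",  # hypothetical GPU plan ID -- replace with a real one
    "os_id": 2284,               # hypothetical OS ID -- list via GET /v2/os
    "label": "h100-node",
}
resp = requests.post(f"{API}/instances", headers=headers, json=payload)
resp.raise_for_status()
print(resp.json()["instance"]["id"])  # ID of the newly created instance
```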

How much VRAM do the NVIDIA H100 & H200 GPUs have?

  • H100 GPU: Up to 80GB HBM3 memory
  • H200 GPU: 141GB of HBM3e memory at 4.8TB/s, giving it more capacity and bandwidth than the H100 for large-scale AI applications

How do NVIDIA H100 and H200 GPUs improve AI inference?

These GPUs offer low-latency inference performance with FP8 Tensor Cores, making them ideal for real-time AI applications like chatbots, voice assistants, and recommendation engines.
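
For a concrete sense of what low latency means in practice, here is a minimal PyTorch sketch that times a single half-precision forward pass with CUDA events; the 4096-wide linear layer is a stand-in for a real model:

```python
import torch

# A stand-in "model": a single half-precision linear layer on the GPU.
model = torch.nn.Linear(4096, 4096, device="cuda", dtype=torch.float16).eval()
x = torch.randn(8, 4096, device="cuda", dtype=torch.float16)

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

with torch.inference_mode():
    for _ in range(10):          # warm-up iterations
        model(x)
    torch.cuda.synchronize()
    start.record()
    model(x)
    end.record()
torch.cuda.synchronize()
print(f"forward-pass latency: {start.elapsed_time(end):.3f} ms")
```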

Can I run multiple GPUs in parallel for distributed training?

Yes, Vultr supports multi-GPU clusters using NVIDIA H100 & H200 GPUs, enabling faster AI model training and inference across multiple nodes.
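
A minimal sketch of what such a multi-GPU job looks like with PyTorch DistributedDataParallel, assuming an 8-GPU node launched via torchrun; the toy model and objective are placeholders:

```python
# Launch with: torchrun --nproc_per_node=8 train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")  # NCCL rides NVLink/InfiniBand
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda()
    model = DDP(model, device_ids=[local_rank])

    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    for _ in range(100):
        x = torch.randn(32, 1024, device="cuda")
        loss = model(x).square().mean()   # toy objective
        opt.zero_grad()
        loss.backward()                   # gradients all-reduced across ranks
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```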

What are NVIDIA H100 and H200 Tensor Core GPUs?

NVIDIA H100 and H200 GPUs are high-performance GPUs designed for AI, deep learning, and high-performance computing (HPC). They feature Tensor Cores optimized for machine learning workloads, offering superior speed and efficiency compared to previous generations.

What are Tensor Cores, and why are they important?

Tensor Cores are specialized AI cores in NVIDIA GPUs that accelerate matrix multiplications, a key operation in deep learning. This technology enables faster AI model training and inference, significantly boosting performance for machine learning applications.
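
A short PyTorch sketch, for illustration only, of how Tensor Cores are engaged in practice: matmuls in reduced precision use them automatically, and TF32 mode can be enabled for FP32 inputs:

```python
import torch

# Tensor Cores engage automatically for matmuls in reduced precision.
a = torch.randn(8192, 8192, device="cuda", dtype=torch.bfloat16)
b = torch.randn(8192, 8192, device="cuda", dtype=torch.bfloat16)
c = a @ b  # runs on BF16 Tensor Cores

# For FP32 inputs, TF32 Tensor Core math can be enabled explicitly:
torch.backends.cuda.matmul.allow_tf32 = True
x = torch.randn(8192, 8192, device="cuda")
y = x @ x  # FP32 matmul executed with TF32 Tensor Cores
```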

How do FP8 and Transformer Engine innovations in the NVIDIA H100 GPU optimize deep learning performance?

The NVIDIA H100 introduces FP8 precision and the Transformer Engine, which deliver up to 4x faster training and inference for transformer-based models like GPT and BERT. FP8 enables smaller data formats with minimal accuracy loss, while the Transformer Engine dynamically chooses the optimal precision for each layer – ideal for large-scale deep learning on Vultr’s cloud infrastructure.
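
A minimal sketch using NVIDIA's Transformer Engine library for PyTorch, assuming the transformer-engine package is installed on a Hopper-class GPU; the layer sizes are arbitrary placeholders:

```python
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# A Transformer Engine layer in place of torch.nn.Linear.
layer = te.Linear(4096, 4096, bias=True).cuda()
x = torch.randn(16, 4096, device="cuda")

# DelayedScaling tracks per-tensor amax history to pick FP8 scaling factors;
# HYBRID uses E4M3 for the forward pass and E5M2 for gradients.
fp8_recipe = recipe.DelayedScaling(fp8_format=recipe.Format.HYBRID)

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    out = layer(x)  # forward pass runs in FP8 where supported
loss = out.float().mean()
loss.backward()
```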

What are the cost-performance tradeoffs when selecting NVIDIA H100 vs H200 GPUs for cloud deployment?

The H100 offers exceptional performance for training large AI models, while the H200 builds on that with expanded memory and bandwidth, which makes it ideal for inference at scale. On Vultr, H100 instances start at $2.30/GPU/hour, making them a cost-efficient option for training workloads. The H200 justifies its premium for memory-bound inference tasks by delivering higher throughput and lower latency.

How do NVIDIA H100 and H200 GPUs handle multi-instance GPU (MIG) configurations for workload isolation?

Both H100 and H200 support Multi-Instance GPU (MIG), allowing up to seven isolated GPU instances per physical GPU. This enables better utilization, isolation, and security across AI, ML, and HPC workloads. On Vultr’s cloud GPU hosting platform, MIG makes it easy to allocate right-sized GPU resources without overprovisioning.
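
A sketch of how MIG partitioning is typically driven with the standard nvidia-smi tool, wrapped in Python here for consistency; the 1g.10gb profile name is an example, so list the profiles your GPU actually supports first:

```python
import subprocess

def run(cmd: str) -> str:
    """Run a shell command and return its stdout (these commands need root)."""
    return subprocess.run(cmd.split(), capture_output=True, text=True, check=True).stdout

# Enable MIG mode on GPU 0 (a GPU reset may be required afterwards).
print(run("nvidia-smi -i 0 -mig 1"))

# List the GPU instance profiles the card supports.
print(run("nvidia-smi mig -lgip"))

# Create a GPU instance plus its compute instance from a named profile.
# The profile name is an example; pick one from the -lgip listing.
print(run("nvidia-smi mig -cgi 1g.10gb -C"))

# MIG devices now appear with their own UUIDs.
print(run("nvidia-smi -L"))
```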

Are NVIDIA H100 and H200 GPUs suitable for deploying AI inference at the edge via cloud regions?

Yes – paired with Vultr’s globally distributed infrastructure, H100 and H200 GPUs deliver low-latency AI inference capabilities close to end-users. These GPUs are well-suited for edge inference workloads such as video analytics, recommendation engines, and speech recognition, where real-time performance and availability across global regions are critical.

Reserve the NVIDIA HGX H100 and H200 GPU now

Get ready to build, test, and deploy on The Everywhere Cloud.