How to Choose the Right GPU: RTX 4090 vs A100 vs H100 vs Rubin

"Which GPU should I use?" has no single answer; the right answer depends on your workload. Using an H100 to serve a tiny model is waste, and trying to train a giant model on an RTX 4090 is frustration. This guide matches each kind of task to the right GPU, from the consumer RTX 4090 to the next-gen NVIDIA Vera Rubin. The best part: on GPUBrazil you rent them by the hour, without paying import prices.

⚡ TL;DR

Inference of small/mid models and QLoRA → RTX 4090/5090. Serious training → A100. Large-scale training and high-throughput inference → H100. Largest models / most memory → H200. The future (H2 2026) → NVIDIA Vera Rubin. All rentable by the hour in reais.

Comparison table

GPU	Memory	Strong at	When to choose
RTX 4090	24GB GDDR6X	Price/perf, inference, QLoRA, gaming/streaming	Small/mid models and light fine-tuning
RTX 5090	32GB GDDR7	Newer consumer flagship	More headroom than the 4090 without going data center
A100	40GB HBM2 / 80GB HBM2e	Training, NVLink	Training workhorse
H100	80GB HBM3	Large-scale training, high throughput	Heavy production inference/training
H200	141GB HBM3e	Biggest current memory	The largest models on a single GPU
NVIDIA Vera Rubin	Next-gen	~5x Blackwell inference	When it arrives (H2 2026)

Decision framework: "if you do X, pick Y"

RTX 4090 — the price/performance all-rounder

With 24GB of GDDR6X, the RTX 4090 is the smart choice for inference of small and mid-size models, QLoRA fine-tuning, and mixed development workloads. It also works well for cloud gaming and streaming. For prototyping or serving, a quantized model in the low tens of billions of parameters usually fits: at 4-bit, a 30B-parameter model needs about 15GB for its weights, which leaves room for the KV cache.
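As a rough check on what fits in 24GB, the sketch below estimates the memory taken by model weights alone at two precisions. The function name `weight_memory_gb` and the model sizes are illustrative assumptions, and the figures leave out KV cache, activations, and framework overhead.

```python
def weight_memory_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


# Illustrative model sizes (assumption), compared at FP16 and 4-bit.
for params in (7, 13, 30, 70):
    fp16 = weight_memory_gb(params, 16)
    int4 = weight_memory_gb(params, 4)
    print(f"{params}B params: FP16 ~{fp16:.0f} GB, 4-bit ~{int4:.1f} GB")
```

By this estimate a 30B model at 4-bit needs about 15 GB for weights, while a 70B model at 4-bit needs about 35 GB and will not fit on a single 24GB card.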

RTX 5090 — the newer consumer flagship

The RTX 5090 is the next consumer generation, with 32GB of GDDR7 and more performance than the 4090. It is a good fit when you want more headroom without moving to a data center GPU.

A100 — the training workhorse

The A100 (40 or 80GB, with NVLink) is the reference for training. NVLink connects multiple GPUs at high bandwidth, which matters when a model is too large for one card and training has to be distributed across several.

H100 — large scale and high throughput

The H100 (80GB HBM3) is the step up from the A100, built for large-scale training and high-throughput inference. When you need to serve many requests per second or train at scale, this is the one.

H200 — the biggest current memory

With 141GB of HBM3e, the H200 has the most memory of the GPUs in this guide, ideal for fitting the largest models on a single GPU instead of splitting them across cards.

NVIDIA Vera Rubin — the next frontier

The NVIDIA Vera Rubin platform is the 2026 bet: the Rubin R100 GPU (around 336 billion transistors) paired with the Vera CPU, promising roughly 5x the inference performance of Blackwell. Cloud availability is expected in the second half of 2026.
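The "if you do X, pick Y" framework above can be sketched as a small lookup. This is only an illustration: the workload labels (`light`, `training`, `scale`) and the function `pick_gpu` are hypothetical, the memory figures come from the comparison table, the A100 is assumed to be the 80GB variant, and Vera Rubin is left out because it is not available yet.

```python
from typing import Optional

# Memory per GPU in GB, from the comparison table (A100 assumed to be the 80GB variant).
VRAM_GB = {"RTX 4090": 24, "RTX 5090": 32, "A100": 80, "H100": 80, "H200": 141}

# Candidate GPUs per workload class, in the order to try them.
# The class names are hypothetical labels for the categories in this guide.
CANDIDATES = {
    "light": ["RTX 4090", "RTX 5090", "A100", "H100", "H200"],  # small/mid inference, QLoRA
    "training": ["A100", "H100", "H200"],                       # serious training
    "scale": ["H100", "H200"],                                  # large-scale training, high throughput
}


def pick_gpu(workload: str, required_vram_gb: float) -> Optional[str]:
    """Return the first candidate with enough memory, or None if none fits."""
    for gpu in CANDIDATES[workload]:
        if VRAM_GB[gpu] >= required_vram_gb:
            return gpu
    return None  # does not fit on one GPU in this list; consider multiple GPUs


print(pick_gpu("light", 18))
print(pick_gpu("scale", 100))
```

The order of each candidate list encodes the "don't over-provision" rule: the function only moves to a bigger GPU when the smaller one lacks memory.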

💡 You don't need to buy

Buying any of these cards in Brazil involves steep import taxes (see how much it costs to run AI in Brazil). On GPUBrazil you rent by the hour in reais via Pix — starting with the RTX A4000 from R$1.80/h. For the rest, see live pricing in the console.

Common mistakes when choosing

Over-provisioning: paying for an H100 to serve a model that fits on a 4090.
Under-provisioning: trying to train or run a large model on a GPU without enough memory — leading to OOM or overly aggressive quantization.
Ignoring memory: the most common bottleneck isn't compute, it's VRAM. Check model size + KV cache before choosing.
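To make the "model size + KV cache" check concrete, here is a minimal sketch. It uses the usual KV cache estimate for a transformer (keys and values stored per layer, per KV head, per token). The model shape, the 16 GB weight figure, and the 10% headroom are illustrative assumptions, not measurements from any specific model or GPU.

```python
def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                seq_len: int, batch: int, bytes_per_elem: int = 2) -> float:
    """Approximate KV cache size in GB (1 GB = 1e9 bytes); FP16 by default."""
    # The factor 2 covers the key tensor and the value tensor.
    total_bytes = 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_elem
    return total_bytes / 1e9


def fits(vram_gb: float, weights_gb: float, kv_gb: float, headroom: float = 0.10) -> bool:
    """True if weights + KV cache fit while keeping a fraction of VRAM free.

    The headroom stands in for activations, runtime context, and fragmentation
    (10% is an assumption, not a measured value).
    """
    return weights_gb + kv_gb <= vram_gb * (1 - headroom)


# Illustrative shape (assumption): 32 layers, 8 KV heads, head dim 128, 8192-token context.
weights_gb = 16.0  # e.g. an 8B-parameter model in FP16
for batch in (1, 8):
    kv = kv_cache_gb(32, 8, 128, 8192, batch)
    print(f"batch={batch}: KV cache ~{kv:.2f} GB, fits in 24GB: {fits(24, weights_gb, kv)}")
```

With these numbers, one 8192-token sequence adds about 1 GB of KV cache and fits in 24GB, while a batch of eight adds about 8.6 GB and no longer does. The same model can need a bigger GPU purely because of context length and concurrency.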

Test the right GPU before committing

Rent by the hour, from the RTX A4000 (from R$1.80/h) up to data center GPUs.

Frequently asked questions

Which GPU should I use for inference of small and mid-size models?

For inference of small and mid-size models and light fine-tuning (QLoRA), the RTX 4090 (24GB GDDR6X) offers excellent price/performance. The newer RTX 5090 is the top consumer flagship. To start cheap, the RTX A4000 is available from R$1.80/h on GPUBrazil. All are rentable by the hour in reais.

What's the difference between A100, H100, and H200?

The A100 (40/80GB, NVLink) is the training workhorse. The H100 (80GB HBM3) delivers large-scale training and high-throughput inference. The H200 (141GB HBM3e) has the most memory of the three, ideal for the largest models. The choice depends on model size and the throughput you need.

What is the NVIDIA Vera Rubin platform?

Vera Rubin is NVIDIA's next-generation platform for 2026: the Rubin R100 GPU (around 336 billion transistors) paired with the Vera CPU, promising roughly 5x the inference performance of Blackwell. Cloud availability is expected in the second half of 2026.

Conclusion

Choosing a GPU is about matching memory and throughput to your real workload. Start with model size, then throughput, and only then think about raw power. And since everything on GPUBrazil is rentable by the hour in reais, you can test the GPU before committing, without paying Brazilian import prices for hardware that may not be the right fit.