"Which GPU should I use?" has no single answer, only the right answer for your workload. Using an H100 to serve a tiny model is wasteful; trying to train a giant model on an RTX 4090 is frustrating. This guide matches each kind of task to the right GPU, from the consumer RTX 4090 to the next-gen NVIDIA Vera Rubin. Best of all, on GPUBrazil you can rent them by the hour, without paying import prices.

⚡ TL;DR

Inference of small/mid models and QLoRA → RTX 4090/5090. Serious training → A100. Large-scale training and high-throughput inference → H100. Largest models / most memory → H200. The future (H2 2026) → NVIDIA Vera Rubin. All rentable by the hour in reais.

Comparison table

GPU | Memory | Strong at | When to choose
RTX 4090 | 24GB GDDR6X | Price/perf, inference, QLoRA, gaming/streaming | Small/mid models and light fine-tuning
RTX 5090 | 32GB GDDR7 | Newer consumer flagship | More headroom than the 4090 without going data center
A100 | 40/80GB HBM2e | Training, NVLink | Training workhorse
H100 | 80GB HBM3 | Large-scale training, high throughput | Heavy production inference/training
H200 | 141GB HBM3e | Biggest current memory | The largest models on a single GPU
NVIDIA Vera Rubin | Next-gen | ~5x Blackwell inference | When it arrives (H2 2026)
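
As a rough illustration, the memory column of the table above can be encoded as data and searched for the smallest card that fits a given requirement. This is only a sketch: the function name and the list layout are made up for this example, the A100 is listed at its 80GB size, Vera Rubin is left out because the table gives no memory figure for it, and the choice looks at memory alone, not throughput or price.

```python
from typing import Optional

# (name, memory in GB) from the comparison table, smallest first.
# Assumption: A100 is taken at 80GB; Vera Rubin is omitted (no figure given).
GPUS = [
    ("RTX 4090", 24),
    ("RTX 5090", 32),
    ("A100", 80),
    ("H100", 80),
    ("H200", 141),
]


def smallest_gpu_that_fits(required_gb: float) -> Optional[str]:
    """Return the first GPU in the list whose memory covers required_gb.

    Returns None when nothing in the list is large enough, which means
    the model would have to be split across several cards.
    """
    for name, memory_gb in GPUS:
        if required_gb <= memory_gb:
            return name
    return None


if __name__ == "__main__":
    for need in (20, 30, 60, 100, 200):
        print(f"{need} GB needed: {smallest_gpu_that_fits(need)}")
```

Because the A100 and H100 both offer 80GB, a memory-only search always returns the A100 first; picking between them is a throughput question, which this sketch does not model.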

Decision framework: "if you do X, pick Y"

RTX 4090: the price/performance all-rounder

With 24GB of GDDR6X, the RTX 4090 is the smart choice for inference of small and mid-size models, QLoRA fine-tuning, and mixed development workloads. It's also excellent for cloud gaming and streaming. If you're prototyping or serving a quantized model with up to a few tens of billions of parameters, it usually handles it comfortably.
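
To see why quantization matters on a 24GB card, here is a back-of-the-envelope sketch of weight memory: parameters times bits per parameter, divided by 8. The function names and the 0.8 headroom factor are assumptions for illustration only; real usage also depends on the KV cache, activations, and the serving runtime.

```python
def weight_memory_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate memory for model weights only, in GB (1 GB = 1e9 bytes)."""
    bytes_total = params_billion * 1e9 * bits_per_param / 8
    return bytes_total / 1e9


def fits(params_billion: float, bits_per_param: float, vram_gb: float,
         headroom: float = 0.8) -> bool:
    """True if the weights fit within `headroom` of the card's VRAM.

    The 0.8 factor is an assumed rule of thumb that leaves room for
    KV cache, activations, and runtime overhead.
    """
    return weight_memory_gb(params_billion, bits_per_param) <= vram_gb * headroom


if __name__ == "__main__":
    for size in (7, 13, 34, 70):
        gb = weight_memory_gb(size, 4)
        print(f"{size}B at 4-bit: {gb:.1f} GB, fits in 24 GB: {fits(size, 4, 24)}")
```

Under these assumptions a 34B model at 4-bit needs about 17 GB for weights and fits, while a 70B model at 4-bit (about 35 GB) does not.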

RTX 5090: the newer consumer flagship

The RTX 5090 is the newer consumer generation, with 32GB of GDDR7 and more performance than the 4090. It's a good fit when you want a bit more headroom without moving to a data center GPU.

A100: the training workhorse

The A100 (40 or 80GB, with NVLink) is the reference for training. NVLink lets you combine multiple GPUs with high bandwidth, essential for distributing training of larger models.
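
A quick estimate shows why training pushes you toward multiple GPUs. The sketch below assumes mixed-precision training with the Adam optimizer at roughly 16 bytes per parameter, and it ignores activations entirely, so treat the result as a lower bound. The function names are hypothetical, and the GPU count assumes the training state can be sharded evenly across cards.

```python
import math

# Assumed per-parameter cost of mixed-precision training with Adam:
# 2 (FP16 weights) + 2 (FP16 gradients) + 4 (FP32 master weights)
# + 4 (Adam momentum) + 4 (Adam variance) = 16 bytes.
BYTES_PER_PARAM = 16


def training_state_gb(params_billion: float) -> float:
    """Weights, gradients, and optimizer state in GB (activations excluded)."""
    return params_billion * 1e9 * BYTES_PER_PARAM / 1e9


def min_gpus(params_billion: float, gpu_memory_gb: float) -> int:
    """Lower bound on GPU count if the state is sharded evenly across cards."""
    return math.ceil(training_state_gb(params_billion) / gpu_memory_gb)


if __name__ == "__main__":
    for size in (1, 7, 70):
        state = training_state_gb(size)
        print(f"{size}B params: ~{state:.0f} GB of state, "
              f"at least {min_gpus(size, 80)} x 80GB GPU(s)")
```

Under these assumptions, full training of a 7B model already carries about 112 GB of state, more than a single 80GB card holds, which is where a fast interconnect between GPUs starts to matter.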

H100: large scale and high throughput

The H100 (80GB HBM3) is the step up: large-scale training and high-throughput inference. When you need to serve many requests per second or train seriously, this is the one.

H200: the biggest current memory

With 141GB of HBM3e, the H200 has the most memory of the GPUs covered here, ideal for fitting the largest models on a single GPU instead of sharding them across cards.

NVIDIA Vera Rubin: the next frontier

The NVIDIA Vera Rubin platform is the 2026 bet: the Rubin R100 GPU (around 336 billion transistors) paired with the Vera CPU, promising roughly 5x the inference performance of Blackwell. Cloud availability is expected in the second half of 2026.

💡 You don't need to buy

Buying any of these cards in Brazil involves steep import taxes (see how much it costs to run AI in Brazil). On GPUBrazil you rent by the hour in reais via Pix, starting with the RTX A4000 from R$1.80/h. For the rest, see live pricing in the console.

Common mistakes when choosing

  • Over-provisioning: paying for an H100 to serve a model that fits on a 4090.
  • Under-provisioning: trying to train or run a large model on a GPU without enough memory, which leads to OOM errors or overly aggressive quantization.
  • Ignoring memory: the most common bottleneck isn't compute, it's VRAM. Check model size + KV cache before choosing.
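
The "model size + KV cache" check from the last point can be sketched in a few lines. The formula below is the usual approximation for a transformer KV cache (one key and one value tensor per layer). The model configuration, the 2 GB overhead allowance, and the function names are hypothetical values chosen for illustration, not measurements of any specific model.

```python
def kv_cache_gb(num_layers: int, num_kv_heads: int, head_dim: int,
                seq_len: int, batch_size: int, bytes_per_value: int = 2) -> float:
    """Approximate KV cache size in GB (1 GB = 1e9 bytes).

    The leading 2 counts one key and one value tensor per layer;
    bytes_per_value=2 assumes an FP16/BF16 cache.
    """
    total_bytes = (2 * num_layers * num_kv_heads * head_dim
                   * seq_len * batch_size * bytes_per_value)
    return total_bytes / 1e9


def required_vram_gb(weights_gb: float, cache_gb: float,
                     overhead_gb: float = 2.0) -> float:
    """Weights + KV cache + an assumed flat allowance for runtime overhead."""
    return weights_gb + cache_gb + overhead_gb


if __name__ == "__main__":
    # Hypothetical 8B-parameter model in FP16: 8e9 params * 2 bytes = 16 GB.
    weights_gb = 16.0
    for batch in (1, 16):
        cache = kv_cache_gb(num_layers=32, num_kv_heads=8, head_dim=128,
                            seq_len=8192, batch_size=batch)
        total = required_vram_gb(weights_gb, cache)
        print(f"batch={batch}: KV cache {cache:.1f} GB, total {total:.1f} GB")
```

In this hypothetical case a single 8K-token request fits on a 24GB card, while a batch of 16 at the same context length does not: the weights are unchanged, but the cache grows linearly with batch size and sequence length.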

Test the right GPU before committing

Rent by the hour, from the RTX A4000 (from R$1.80/h) up to data center GPUs.


Frequently asked questions

Which GPU should I use for inference of small and mid-size models?

For inference of small and mid-size models and light fine-tuning (QLoRA), the RTX 4090 (24GB GDDR6X) offers excellent price/performance. The newer RTX 5090 is the top consumer flagship. To start cheap, the RTX A4000 is available from R$1.80/h on GPUBrazil. All are rentable by the hour in reais.

What's the difference between A100, H100, and H200?

The A100 (40/80GB, NVLink) is the training workhorse. The H100 (80GB HBM3) delivers large-scale training and high-throughput inference. The H200 (141GB HBM3e) is the current biggest-memory option, ideal for the largest models. The choice depends on model size and the throughput you need.

What is the NVIDIA Vera Rubin platform?

Vera Rubin is NVIDIA's next-generation platform for 2026: the Rubin R100 GPU (around 336 billion transistors) paired with the Vera CPU, promising roughly 5x the inference performance of Blackwell. Cloud availability is expected in the second half of 2026.

Conclusion

Choosing a GPU is about matching memory and throughput to your real workload. Start with model size, then throughput, and only then think about raw power. And since everything on GPUBrazil is rentable by the hour in reais, you can test the GPU before committing, without paying Brazilian import prices for hardware that may turn out to be the wrong fit.

Read next: NVIDIA Vera Rubin explained · Open-source LLM comparison 2026 · How much it costs to run AI in Brazil