"Which GPU should I use?" has no universal answer; it depends on your workload. Using an H100 to serve a tiny model is wasteful; trying to train a giant model on an RTX 4090 is frustrating. This guide matches each kind of task to the right GPU, from the consumer RTX 4090 to NVIDIA's next-generation Vera Rubin. On GPUBrazil you can rent the GPUs that are already available by the hour, without paying import prices.
⚡ TL;DR
Inference of small/mid models and QLoRA → RTX 4090/5090. Serious training → A100. Large-scale training and high-throughput inference → H100. Largest models / most memory → H200. The future (H2 2026) → NVIDIA Vera Rubin. Everything already shipping is rentable by the hour in reais.
Comparison table
| GPU | Memory | Strengths | When to choose |
|---|---|---|---|
| RTX 4090 | 24GB GDDR6X | Price/perf, inference, QLoRA, gaming/streaming | Small/mid models and light fine-tuning |
| RTX 5090 | 32GB GDDR7 | Newer consumer flagship | More headroom than the 4090 without going data center |
| A100 | 40/80GB HBM2e | Training, NVLink | Training workhorse |
| H100 | 80GB HBM3 | Large-scale training, high throughput | Heavy production inference/training |
| H200 | 141GB HBM3e | Most memory in this lineup | The largest models on a single GPU |
| NVIDIA Vera Rubin | Next-gen (not yet available) | ~5x Blackwell inference (claimed) | When it arrives (H2 2026) |
Decision framework: "if you do X, pick Y"
RTX 4090: the price/performance all-rounder
With 24GB of GDDR6X, the RTX 4090 is the smart choice for inference of small and mid-size models, QLoRA fine-tuning, and mixed development workloads. It's also excellent for cloud gaming and streaming. For prototyping, or for serving a quantized model of up to roughly 30B parameters at 4-bit, it usually handles the job comfortably.
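A quick way to sanity-check "fits when quantized" is to multiply the parameter count by the bytes each parameter takes at a given precision. The sketch below is a rough estimate, not a guarantee: the 1.2x overhead factor standing in for activations, KV cache and framework buffers is an assumption, and the function names are hypothetical.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {
    "16-bit": 2.0,  # fp16 / bf16
    "8-bit": 1.0,   # 8-bit quantization
    "4-bit": 0.5,   # 4-bit quantization (e.g. the NF4 format QLoRA uses)
}


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """GB (1e9 bytes) needed for the weights alone."""
    # billions of params x bytes per param = billions of bytes = GB
    return params_billions * BYTES_PER_PARAM[precision]


def fits(params_billions: float, precision: str, vram_gb: float,
         overhead: float = 1.2) -> bool:
    """True if the weights, scaled by an assumed overhead factor, fit in VRAM."""
    return weight_memory_gb(params_billions, precision) * overhead <= vram_gb


if __name__ == "__main__":
    for size in (7, 13, 30, 70):
        for precision in ("16-bit", "4-bit"):
            need = weight_memory_gb(size, precision)
            print(f"{size}B @ {precision}: {need:.1f} GB of weights, "
                  f"fits a 24GB RTX 4090: {fits(size, precision, 24)}")
```

Under these assumptions a 30B model at 4-bit needs about 15GB for its weights and fits on a 24GB card, while a 70B model at 4-bit (about 35GB) does not.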
RTX 5090: the newer consumer flagship
The RTX 5090 is the next consumer generation, with 32GB of GDDR7 (versus 24GB on the 4090) and more compute. It's a good fit when you want more headroom without moving to a data center GPU.
A100: the training workhorse
The A100 (40 or 80GB, with NVLink) is the long-standing reference GPU for training. NVLink connects multiple GPUs over high-bandwidth links, which is essential when training a larger model has to be distributed across several cards.
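To see why multi-GPU training comes up so quickly, a commonly cited rule of thumb for full training with the Adam optimizer in mixed precision is about 16 bytes per parameter, before counting activations. The sketch below applies it; the per-GPU split assumes that state is sharded perfectly evenly (what ZeRO-3 or FSDP-style sharding aims for), so treat the result as a lower bound. Function names are hypothetical.

```python
import math

# Rule of thumb per parameter for mixed-precision Adam training:
# fp16 weights (2) + fp16 gradients (2) + fp32 master weights (4)
# + Adam first moment (4) + Adam second moment (4).
BYTES_PER_PARAM_TRAINING = 2 + 2 + 4 + 4 + 4  # = 16


def training_state_gb(params_billions: float) -> float:
    """Weights + gradients + optimizer state in GB; activations excluded."""
    return params_billions * BYTES_PER_PARAM_TRAINING


def min_gpus(params_billions: float, vram_gb: float) -> int:
    """Lower bound on GPU count, assuming the state is split evenly."""
    return math.ceil(training_state_gb(params_billions) / vram_gb)


if __name__ == "__main__":
    for size in (1, 7, 13):
        print(f"{size}B: {training_state_gb(size):.0f} GB of training state, "
              f"at least {min_gpus(size, 80)} x A100 80GB")
```

Under this rule a 7B model already needs about 112GB of training state, so at least two 80GB A100s, which is where fast GPU-to-GPU links like NVLink start to matter.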
H100: large scale and high throughput
The H100 (80GB HBM3) is the step up: large-scale training and high-throughput inference. When you need to serve many requests per second or train seriously, this is the one.
H200: the most memory in this lineup
With 141GB of HBM3e, the H200 has the most memory of the GPUs covered here, ideal for fitting the largest models on a single GPU instead of splitting them across cards.
NVIDIA Vera Rubin: the next frontier
The NVIDIA Vera Rubin platform is the 2026 bet: the Rubin R100 GPU (around 336 billion transistors) paired with the Vera CPU, promising roughly 5x the inference performance of Blackwell. Cloud availability is expected in the second half of 2026.
💡 You don't need to buy
Buying any of these cards in Brazil involves steep import taxes (see how much it costs to run AI in Brazil). On GPUBrazil you rent by the hour in reais via Pix, starting with the RTX A4000 at R$1.80/h. For the other GPUs, see live pricing in the console.
Common mistakes when choosing
- Over-provisioning: paying for an H100 to serve a model that fits on a 4090.
- Under-provisioning: trying to train or run a large model on a GPU without enough memory, which leads to out-of-memory (OOM) errors or overly aggressive quantization.
- Ignoring memory: the most common bottleneck isn't compute, it's VRAM. Before choosing, check that the model weights plus the KV cache fit in the GPU's memory.
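The KV cache grows with context length and with the number of concurrent requests, which is why it can rival the weights on a busy server. Below is a minimal estimator, assuming an fp16/bf16 cache and standard (grouped-query) attention; the model configuration (32 layers, 8 KV heads, head dimension 128, about 8B parameters) is a hypothetical example, and the function name is illustrative.

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys and values for every token in flight.

    The leading 2 accounts for storing both K and V; bytes_per_elem=2
    assumes an fp16/bf16 cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem


if __name__ == "__main__":
    # Hypothetical config resembling an ~8B model with grouped-query attention.
    cfg = dict(n_layers=32, n_kv_heads=8, head_dim=128)
    weights_gib = 8e9 * 2 / 2**30  # ~8B params stored in fp16
    for batch in (1, 8, 32):
        kv_gib = kv_cache_bytes(**cfg, seq_len=8192, batch=batch) / 2**30
        print(f"batch={batch}: KV cache {kv_gib:.0f} GiB + weights "
              f"{weights_gib:.1f} GiB = {kv_gib + weights_gib:.1f} GiB")
```

With this configuration each 8K-token sequence costs 1 GiB of cache, so 32 concurrent sequences add 32 GiB on top of roughly 15 GiB of fp16 weights: too much for a 24GB card, comfortable on an 80GB one.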
Test the right GPU before committing
Rent by the hour, from the RTX A4000 (R$1.80/h) up to data center GPUs.
Frequently asked questions
Which GPU should I use for inference of small and mid-size models?
For inference of small and mid-size models and light fine-tuning (QLoRA), the RTX 4090 (24GB GDDR6X) offers excellent price/performance. The newer RTX 5090 is the top consumer flagship. To start cheap, the RTX A4000 is available from R$1.80/h on GPUBrazil. All are rentable by the hour in reais.
What's the difference between A100, H100, and H200?
The A100 (40/80GB, NVLink) is the training workhorse. The H100 (80GB HBM3) delivers large-scale training and high-throughput inference. The H200 (141GB HBM3e) has the most memory of the three, ideal for the largest models. The choice depends on model size and the throughput you need.
What is the NVIDIA Vera Rubin platform?
Vera Rubin is NVIDIA's next-generation platform for 2026: the Rubin R100 GPU (around 336 billion transistors) paired with the Vera CPU, promising roughly 5x the inference performance of Blackwell. Cloud availability is expected in the second half of 2026.
Conclusion
Choosing a GPU is about matching memory and throughput to your real workload. Start with model size, then throughput, and only then think about raw power. And since everything on GPUBrazil is rentable by the hour in reais, you can test a GPU before committing, without paying Brazilian import prices for hardware that may turn out to be the wrong fit.
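The "memory first, then throughput" order can be written as a small filter over the comparison table. This is a sketch under stated assumptions: using the catalog order as a proxy for price, the 10% headroom, and the throughput flag on H100/H200 are simplifications for illustration, and `pick_gpu` is a hypothetical helper, not a GPUBrazil API. Vera Rubin is left out because it is not yet available.

```python
from typing import NamedTuple, Optional


class Gpu(NamedTuple):
    name: str
    memory_gb: int
    high_throughput: bool  # H100/H200 class, per this guide


# Memory figures from the comparison table. Ordering from least to most
# expensive is an assumption for this sketch; check live pricing.
CATALOG = [
    Gpu("RTX 4090", 24, False),
    Gpu("RTX 5090", 32, False),
    Gpu("A100 40GB", 40, False),
    Gpu("A100 80GB", 80, False),
    Gpu("H100", 80, True),
    Gpu("H200", 141, True),
]


def pick_gpu(required_gb: float, need_high_throughput: bool = False,
             headroom: float = 0.1) -> Optional[Gpu]:
    """Return the first catalog GPU that satisfies memory, then throughput.

    None means no single GPU fits and the model must be sharded.
    """
    for gpu in CATALOG:
        if gpu.memory_gb < required_gb * (1 + headroom):
            continue  # step 1: memory is a hard constraint
        if need_high_throughput and not gpu.high_throughput:
            continue  # step 2: throughput narrows what is left
        return gpu
    return None


if __name__ == "__main__":
    print(pick_gpu(18))                             # e.g. a quantized ~30B model
    print(pick_gpu(60, need_high_throughput=True))  # a busy production endpoint
    print(pick_gpu(200))                            # too big for any single card
```

Returning `None` is deliberate: if nothing in the list fits, the answer is sharding across several GPUs rather than a bigger single card.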
Read next: NVIDIA Vera Rubin explained · Open-source LLM comparison 2026 · How much it costs to run AI in Brazil