LoRA Training Guide: Custom AI Models in Hours

What is LoRA?

LoRA (Low-Rank Adaptation) lets you customize AI models by training small adapter weights instead of the full model. Benefits include:

Fast training: Hours instead of days
Low VRAM: Train on consumer GPUs
Small files: 10-200MB vs 2-7GB for full models
Stackable: Combine multiple LoRAs
Reversible: Base model unchanged
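
The idea behind these benefits can be sketched in a few lines of NumPy. This is an illustration only, not any library's implementation: the layer size (1024x1024), the rank (8), and the names W, A, and B are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank, alpha = 1024, 1024, 8, 8  # illustrative sizes, not from a real model

W = rng.standard_normal((d_out, d_in))        # frozen base weight, never modified
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))                   # trainable up-projection, starts at zero

def forward(x, use_lora=True):
    """Base output plus the scaled low-rank correction B @ A @ x."""
    y = W @ x
    if use_lora:
        y = y + (alpha / rank) * (B @ (A @ x))
    return y

full_params = W.size
lora_params = A.size + B.size
print(f"full layer: {full_params:,} values, adapter: {lora_params:,} values")
```

For this toy layer the adapter holds 16,384 values against 1,048,576 in the full weight, which is where the small file sizes come from. Because B starts at zero and W is never touched, dropping the adapter gives back the original model.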

💡 What you'll learn

How to train LoRAs for Stable Diffusion (images) and LLMs (text), from dataset preparation to deployment.

Part 1: Stable Diffusion LoRA

Use Cases

Custom characters or people
Specific art styles
Product/brand imagery
Consistent environments

Setup: Kohya SS Trainer

# Clone Kohya trainer
git clone https://github.com/kohya-ss/sd-scripts.git
cd sd-scripts

# Create environment
python -m venv venv
source venv/bin/activate

# Install dependencies
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
pip install xformers

# Download base model (SDXL)
mkdir models
cd models
wget https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors

Dataset Preparation

Quality data is crucial for good LoRAs:

# Dataset structure
dataset/
├── 10_character_name/     # 10 = repeat count
│   ├── image1.png
│   ├── image1.txt         # Caption for image1
│   ├── image2.png
│   ├── image2.txt
│   └── ...
└── 5_style_name/          # Different concept
    ├── style1.png
    ├── style1.txt
    └── ...
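
The repeat count in the folder name decides how often each image is seen per epoch. A rough sketch of that bookkeeping follows, assuming steps per epoch are simply images times repeats divided by batch size; the helper names and the list of image extensions are made up for this example, and the trainer's own count can differ (for instance with resolution bucketing).

```python
import re
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}  # assumed set of image types

def parse_repeats(folder_name):
    """Split '10_character_name' into (10, 'character_name')."""
    match = re.match(r"^(\d+)_(.+)$", folder_name)
    if match is None:
        raise ValueError(f"expected '<repeats>_<name>', got {folder_name!r}")
    return int(match.group(1)), match.group(2)

def steps_per_epoch(dataset_dir, batch_size=1):
    """Sum images * repeats over all concept folders, divided by batch size (rounded up)."""
    total = 0
    for folder in Path(dataset_dir).iterdir():
        if not folder.is_dir():
            continue
        repeats, _ = parse_repeats(folder.name)
        n_images = sum(1 for f in folder.iterdir() if f.suffix.lower() in IMAGE_EXTS)
        total += n_images * repeats
    return -(-total // batch_size)  # ceiling division
```

With 30 images in 10_character_name and a batch size of 1, this gives 300 steps per epoch, so 10 epochs would be about 3,000 steps.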

Dataset Guidelines

Quantity: 15-50 images for characters, 50-200 for styles
Quality: High resolution, varied poses/angles
Captions: Describe each image, include trigger word
Consistency: Similar quality across images

# Example caption (image1.txt):
sks character, a woman with red hair, standing in a park, 
sunny day, wearing blue dress, smiling

# "sks" is the trigger word - use something unique
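
Missing captions and forgotten trigger words are easy to overlook in a folder of dozens of files. A small check like the following can catch them before training; the function name and the extension list are hypothetical, and it assumes captions sit next to the images as .txt files, as in the structure above.

```python
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}  # assumed set of image types

def check_captions(concept_dir, trigger_word):
    """Return (images without a caption file, caption files missing the trigger word)."""
    missing_caption, missing_trigger = [], []
    for image in sorted(Path(concept_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_EXTS:
            continue
        caption_file = image.with_suffix(".txt")
        if not caption_file.exists():
            missing_caption.append(image.name)
        elif trigger_word not in caption_file.read_text(encoding="utf-8"):
            missing_trigger.append(caption_file.name)
    return missing_caption, missing_trigger
```

Calling check_captions("dataset/10_character_name", "sks") should return two empty lists for a clean dataset.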

Training Configuration

# train_lora.sh
# SDXL checkpoints need sdxl_train_network.py; train_network.py is for SD 1.x/2.x models
accelerate launch --num_cpu_threads_per_process=2 sdxl_train_network.py \
  --pretrained_model_name_or_path="./models/sd_xl_base_1.0.safetensors" \
  --train_data_dir="./dataset" \
  --output_dir="./output" \
  --output_name="my_character_lora" \
  --save_model_as=safetensors \
  --resolution=1024,1024 \
  --train_batch_size=1 \
  --max_train_epochs=10 \
  --learning_rate=1e-4 \
  --unet_lr=1e-4 \
  --text_encoder_lr=1e-5 \
  --network_dim=32 \
  --network_alpha=16 \
  --network_module=networks.lora \
  --optimizer_type="AdamW8bit" \
  --mixed_precision="bf16" \
  --cache_latents \
  --gradient_checkpointing \
  --save_every_n_epochs=2 \
  --sample_every_n_epochs=1 \
  --sample_prompts="./prompts.txt"

Key Parameters Explained

network_dim (rank): Higher = more capacity and a larger file. 16-64 is typical
network_alpha: Scaling factor; the LoRA update is multiplied by alpha/dim. Usually dim/2 or equal to dim
learning_rate: Start at 1e-4; reduce it if the model overfits
max_train_epochs: 5-15 for characters, 10-30 for styles
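
How network_dim and network_alpha interact is easier to see with numbers. The sketch below assumes the usual LoRA convention that the update is multiplied by alpha divided by dim; the function name is made up for illustration.

```python
def lora_scale(network_dim, network_alpha):
    """Multiplier applied to the low-rank update: alpha / dim."""
    return network_alpha / network_dim

for dim, alpha in [(32, 16), (32, 32), (64, 16)]:
    print(f"dim={dim:>2} alpha={alpha:>2} -> scale={lora_scale(dim, alpha)}")
```

The configuration above (dim 32, alpha 16) gives a scale of 0.5. Raising dim while leaving alpha alone shrinks the scale, so a higher-rank run may need a higher alpha or learning rate to have the same effect.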

⚠️ Overfitting Warning

If generated images reproduce the training images (same pose, framing, or background regardless of the prompt), the LoRA has overfit. Reduce epochs, lower the learning rate or network_dim, or add more diverse training data.

Part 2: LLM LoRA Training

Use Cases

Custom writing styles
Domain-specific knowledge
Specialized tasks (coding, analysis)
Character roleplay

Setup with PEFT

pip install transformers peft datasets accelerate bitsandbytes

Training Script

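from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    Trainer,
    DataCollatorForLanguageModeling
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from datasets import load_dataset
import torch

# Load base model in 4-bit; quantization settings go through BitsAndBytesConfig
# (passing load_in_4bit directly to from_pretrained is deprecated in recent versions)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # common QLoRA choice; the library default is "fp4"
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)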

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
tokenizer.pad_token = tokenizer.eos_token

# Prepare for training
model = prepare_model_for_kbit_training(model)

# LoRA configuration
lora_config = LoraConfig(
    r=16,                          # Rank
    lora_alpha=32,                 # Scaling
    target_modules=[               # Which layers to adapt
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj"
    ],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Typically ~0.5-2% of total parameters

# Load dataset
dataset = load_dataset("json", data_files="training_data.jsonl")

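def tokenize(example):
    # No padding here: the data collator pads each batch to its longest sequence
    return tokenizer(
        example["text"],
        truncation=True,
        max_length=2048,
        add_special_tokens=False,  # the text already starts with <|begin_of_text|>
    )

tokenized_dataset = dataset.map(tokenize, batched=True, remove_columns=["text"])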

# Training arguments
training_args = TrainingArguments(
    output_dir="./lora_output",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    bf16=True,
    logging_steps=10,
    save_steps=100,
    warmup_ratio=0.03,
    lr_scheduler_type="cosine",
)

# Train
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

trainer.train()

# Save LoRA
model.save_pretrained("./my_lora")
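
The number reported by print_trainable_parameters() can be estimated by hand. The sketch below assumes the layer shapes of Llama-3.1-8B as published in its model config (hidden size 4096, MLP size 14336, key/value projection size 1024, 32 layers); the constant and function names are made up for this example.

```python
# Layer shapes for Llama-3.1-8B (assumed from the published model config)
HIDDEN, INTERMEDIATE, KV_DIM, N_LAYERS = 4096, 14336, 1024, 32

# (in_features, out_features) of each projection listed in target_modules
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_param_count(rank):
    """Each adapted layer adds A (rank x in) and B (out x rank)."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in TARGETS.values())
    return per_layer * N_LAYERS

trainable = lora_param_count(rank=16)
print(f"{trainable:,} trainable parameters")  # 41,943,040
```

That is roughly 0.5% of an 8B-parameter model, at the low end of the range quoted in the script. Doubling the rank doubles the count.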

Dataset Format

# training_data.jsonl (one JSON object per line; this label is not part of the file)
{"text": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nYour question here<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nYour desired response here<|eot_id|>"}
{"text": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nAnother question<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAnother response<|eot_id|>"}
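
Writing these lines by hand invites escaping mistakes. One way to generate them is sketched below; the helper names are hypothetical, and the template string is copied from the single-turn format shown above rather than taken from the tokenizer.

```python
import json

def format_example(user_message, assistant_message):
    """Wrap one exchange in the Llama 3 chat markers used in training_data.jsonl."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
        f"{assistant_message}<|eot_id|>"
    )

def write_jsonl(pairs, path="training_data.jsonl"):
    """Write one {"text": ...} object per line; json.dumps escapes newlines and quotes."""
    with open(path, "w", encoding="utf-8") as f:
        for user_message, assistant_message in pairs:
            record = {"text": format_example(user_message, assistant_message)}
            f.write(json.dumps(record) + "\n")
```

Usage would look like write_jsonl([("Your question here", "Your desired response here")]). The tokenizer's apply_chat_template method is an alternative, though its output may differ slightly from this hand-written template, so it is worth comparing the two.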

Loading and Using LoRAs

Stable Diffusion (ComfyUI)

# Place LoRA in models/loras/
# In ComfyUI:
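1. Add a "Load LoRA" node
2. Connect it after the checkpoint loader: feed the checkpoint's MODEL and CLIP outputs into it, then route its MODEL and CLIP outputs to the sampler and text-encode nodes
3. Set the weight (strength_model / strength_clip; 0.5-1.0 typical)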

Stable Diffusion (Diffusers)

from diffusers import StableDiffusionXLPipeline
import torch

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16
).to("cuda")

# Load LoRA
# The training script above writes to ./output
pipe.load_lora_weights("./output/my_character_lora.safetensors")

# Generate with trigger word
image = pipe(
    "sks character standing in a forest, sunset",
    num_inference_steps=25
).images[0]

# Unload LoRA
pipe.unload_lora_weights()

LLM LoRA

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Load LoRA
model = PeftModel.from_pretrained(base_model, "./my_lora")

# Or merge for faster inference
model = model.merge_and_unload()

# Use normally
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
# ...
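
Why merging speeds up inference without changing the output can be shown with a toy layer. This is a NumPy sketch of the arithmetic only, with made-up sizes and names; it does not use or reproduce PEFT's internals.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank, alpha = 32, 32, 4, 8  # toy sizes for illustration

W = rng.standard_normal((d_out, d_in))  # base weight
A = rng.standard_normal((rank, d_in))   # stand-in for a trained adapter
B = rng.standard_normal((d_out, rank))
x = rng.standard_normal(d_in)
scale = alpha / rank

# Unmerged: base matmul plus two small adapter matmuls on every forward pass
y_adapter = W @ x + scale * (B @ (A @ x))

# Merged: fold the adapter into the weight once, then a single matmul per pass
W_merged = W + scale * (B @ A)
y_merged = W_merged @ x

print(np.allclose(y_adapter, y_merged))  # True
```

The trade-off is that a merged model no longer keeps the adapter separate, so keep the saved ./my_lora folder if you want to swap or remove the LoRA later.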

Training Time Estimates

Task                  Dataset Size   RTX 4090 Time   Cost on GPUBrazil
SDXL Character LoRA   30 images      ~30 min         ~$0.20
SDXL Style LoRA       100 images     ~1-2 hrs        ~$0.60
LLM LoRA (8B)         10k examples   ~2-4 hrs        ~$1.20
LLM LoRA (70B)        10k examples   ~8-12 hrs       ~$10.00
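
The first three cost figures follow from a simple hours-times-rate calculation. The sketch below assumes the $0.40/hr RTX 4090 rate quoted in this guide and uses the midpoint of each time range, which is my assumption; the 70B row does not follow from that rate, so it is left out.

```python
HOURLY_RATE_USD = 0.40  # RTX 4090 rate quoted in this guide

def estimate_cost(hours, rate=HOURLY_RATE_USD):
    """Rental cost in dollars for a run of the given length."""
    return round(hours * rate, 2)

runs = [
    ("SDXL character LoRA", 0.5),  # ~30 min
    ("SDXL style LoRA", 1.5),      # midpoint of 1-2 hrs
    ("LLM LoRA (8B)", 3.0),        # midpoint of 2-4 hrs
]
for task, hours in runs:
    print(f"{task}: ~${estimate_cost(hours):.2f}")
```

Swap in your own measured run time to budget a job; sample generation and checkpoint saving add to the wall-clock time.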

Train Your Custom LoRA Today

RTX 4090s from $0.40/hr. Train a character LoRA for under $1.

Best Practices

Image LoRAs

Use consistent, high-quality images
Include variety in poses, lighting, backgrounds
Write detailed, accurate captions
Use unique trigger word
Monitor samples during training
Test at different weights (0.5, 0.7, 1.0)

LLM LoRAs

High-quality, consistent training examples
Match the chat template exactly
Include diverse examples of desired behavior
Validate with held-out test set
Start with smaller rank, increase if needed
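
Holding out a test set can be as simple as shuffling the JSONL file and setting a slice aside. A minimal sketch, assuming the training_data.jsonl format from this guide; the function name, the 10% fraction, and the seed are arbitrary choices for illustration.

```python
import json
import random

def split_jsonl(path, test_fraction=0.1, seed=42):
    """Shuffle the examples and hold out a fraction for evaluation."""
    with open(path, encoding="utf-8") as f:
        examples = [json.loads(line) for line in f if line.strip()]
    random.Random(seed).shuffle(examples)
    n_test = max(1, int(len(examples) * test_fraction))
    return examples[n_test:], examples[:n_test]  # (train, test)
```

The held-out examples should never appear in training. After tokenization they can be passed to Trainer as eval_dataset to track loss on unseen data.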

Troubleshooting

LoRA Has No Effect

Check trigger word is in prompt
Increase LoRA weight
Train longer or with higher rank

Overfitting

Reduce epochs
Lower learning rate
Add more training data
Reduce rank/dim

Poor Quality

Improve training data quality
Check captions are accurate
Ensure consistent style in training images

Conclusion

LoRA training democratizes AI customization. For under $1, you can train a custom character model that would have cost thousands just a few years ago.

Start with image LoRAs: they're simpler, and the results are immediately visible. Once comfortable, try LLM LoRAs for specialized text generation.

Train on GPUBrazil to access powerful GPUs without upfront hardware investment.