What is LoRA?
LoRA (Low-Rank Adaptation) lets you customize AI models by training small adapter weights instead of the full model. Benefits include:
- Fast training: Hours instead of days
- Low VRAM: Train on consumer GPUs
- Small files: 10-200MB vs 2-7GB for full models
- Stackable: Combine multiple LoRAs
- Reversible: Base model unchanged
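To make the "small adapter weights" idea concrete, here is a minimal pure-Python sketch of the low-rank update, W' = W + (alpha / r) * B * A. The helper names, matrix sizes, and values are illustrative assumptions and do not come from any library; real trainers do the same arithmetic on GPU tensors.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def merge_lora(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A without modifying W."""
    r = len(a)  # rank = number of rows in A
    scale = alpha / r
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def lora_param_count(d_out, d_in, r):
    """Trainable values in the adapter: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


# Toy 3x4 base weight with a rank-1 adapter (hypothetical numbers).
W = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
A = [[1.0, 2.0, 3.0, 4.0]]   # r x d_in
B = [[0.5], [0.0], [-0.5]]   # d_out x r

merged = merge_lora(W, A, B, alpha=2.0)
print(merged[0])  # first row of the adapted weight
print(W[0])       # the base weight is untouched
print(lora_param_count(4096, 4096, 16), "adapter values vs", 4096 * 4096, "in the full matrix")
```

Because the base matrix is never modified, dropping the adapter restores the original model, and several adapters can be added to the same base weights.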
What you'll learn
How to train LoRAs for Stable Diffusion (images) and LLMs (text), from dataset preparation to deployment.
Part 1: Stable Diffusion LoRA
Use Cases
- Custom characters or people
- Specific art styles
- Product/brand imagery
- Consistent environments
Setup: Kohya SS Trainer
# Clone Kohya trainer
git clone https://github.com/kohya-ss/sd-scripts.git
cd sd-scripts
# Create environment
python -m venv venv
source venv/bin/activate
# Install dependencies
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
pip install xformers
# Download base model (SDXL)
mkdir -p models
# -P saves into ./models while staying in sd-scripts, where the training command is run
wget -P models https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors
Dataset Preparation
Quality data is crucial for good LoRAs:
# Dataset structure
dataset/
├── 10_character_name/   # 10 = repeat count
│   ├── image1.png
│   ├── image1.txt       # Caption for image1
│   ├── image2.png
│   ├── image2.txt
│   └── ...
└── 5_style_name/        # Different concept
    ├── style1.png
    ├── style1.txt
    └── ...
Dataset Guidelines
- Quantity: 15-50 images for characters, 50-200 for styles
- Quality: High resolution, varied poses/angles
- Captions: Describe each image, include trigger word
- Consistency: Similar quality across images
# Example caption (image1.txt):
sks character, a woman with red hair, standing in a park,
sunny day, wearing blue dress, smiling
# "sks" is the trigger word - use something unique
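Before training, it can help to check that the dataset matches the layout and caption rules above. The following is a minimal sketch, not part of the Kohya tooling: the function name `check_dataset` and the list of image extensions are assumptions, and it checks a single trigger word across all folders, which is a simplification if each concept uses its own.

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp"}  # assumed set; adjust as needed


def check_dataset(root, trigger_word):
    """Return a list of human-readable problems found under `root`."""
    problems = []
    for folder in sorted(Path(root).iterdir()):
        if not folder.is_dir():
            continue
        # Folder names follow "<repeats>_<concept>", e.g. "10_character_name".
        repeats, _, concept = folder.name.partition("_")
        if not repeats.isdigit() or not concept:
            problems.append(f"{folder.name}: expected '<repeats>_<concept>'")
        for image in sorted(folder.iterdir()):
            if image.suffix.lower() not in IMAGE_EXTENSIONS:
                continue
            caption = image.with_suffix(".txt")
            if not caption.exists():
                problems.append(f"{image}: missing caption file")
            elif trigger_word not in caption.read_text(encoding="utf-8"):
                problems.append(f"{caption}: trigger word '{trigger_word}' not found")
    return problems


if __name__ == "__main__":
    root = Path("dataset")
    if root.is_dir():
        for problem in check_dataset(root, trigger_word="sks"):
            print(problem)
```

An empty result means every image has a caption containing the trigger word and every folder name starts with a repeat count.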
Training Configuration
# train_lora.sh
accelerate launch --num_cpu_threads_per_process=2 sdxl_train_network.py \
--pretrained_model_name_or_path="./models/sd_xl_base_1.0.safetensors" \
--train_data_dir="./dataset" \
--caption_extension=".txt" \
--output_dir="./output" \
--output_name="my_character_lora" \
--save_model_as=safetensors \
--resolution=1024,1024 \
--train_batch_size=1 \
--max_train_epochs=10 \
--learning_rate=1e-4 \
--unet_lr=1e-4 \
--text_encoder_lr=1e-5 \
--network_dim=32 \
--network_alpha=16 \
--network_module=networks.lora \
--optimizer_type="AdamW8bit" \
--mixed_precision="bf16" \
--cache_latents \
--gradient_checkpointing \
--save_every_n_epochs=2 \
--sample_every_n_epochs=1 \
--sample_prompts="./prompts.txt"
Key Parameters Explained
- network_dim (rank): Higher = more capacity, larger file. 16-64 typical
- network_alpha: Scaling factor. Usually dim/2 or equal to dim
- learning_rate: Start at 1e-4 and lower it if the model overfits. When set, unet_lr and text_encoder_lr override it for their component
- epochs: 5-15 for characters, 10-30 for styles
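The parameters above interact: the repeat count in each folder name, the epoch count, and the batch size together set how many optimizer steps a run takes, and network_alpha divided by network_dim sets how strongly the adapter is scaled. Here is a rough sketch of that arithmetic, assuming each image is seen `repeats` times per epoch as the folder naming implies; the function names are hypothetical, and resolution bucketing can shift the real step count slightly.

```python
import math


def total_steps(folders, epochs, batch_size):
    """folders maps '<repeats>_<concept>' names to image counts."""
    samples_per_epoch = 0
    for name, image_count in folders.items():
        repeats = int(name.split("_", 1)[0])
        samples_per_epoch += repeats * image_count
    return math.ceil(samples_per_epoch / batch_size) * epochs


def lora_scale(network_alpha, network_dim):
    """Multiplier applied to the LoRA update (alpha / rank)."""
    return network_alpha / network_dim


# Hypothetical dataset: 30 character images in a folder with 10 repeats.
print(total_steps({"10_character_name": 30}, epochs=10, batch_size=1))
print(lora_scale(network_alpha=16, network_dim=32))
```

If the step count looks far too high or low for your dataset, adjusting the folder's repeat count is usually simpler than changing the epoch count, since checkpoints and samples are saved per epoch.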
Overfitting Warning
If generated images look exactly like training images, you've overfit. Reduce epochs, increase regularization, or add more diverse training data.
Part 2: LLM LoRA Training
Use Cases
- Custom writing styles
- Domain-specific knowledge
- Specialized tasks (coding, analysis)
- Character roleplay
Setup with PEFT
pip install transformers peft datasets accelerate bitsandbytes
Training Script
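from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    Trainer,
    DataCollatorForLanguageModeling,
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from datasets import load_dataset
import torch
# Load base model in 4-bit; quantization settings are passed through BitsAndBytesConfig
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",  # NF4 is the common choice for 4-bit LoRA training
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)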
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
tokenizer.pad_token = tokenizer.eos_token
# Prepare for training
model = prepare_model_for_kbit_training(model)
# LoRA configuration
lora_config = LoraConfig(
r=16, # Rank
lora_alpha=32, # Scaling
target_modules=[ # Which layers to adapt
"q_proj", "k_proj", "v_proj", "o_proj",
"gate_proj", "up_proj", "down_proj"
],
lora_dropout=0.05,
bias="none",
task_type="CAUSAL_LM"
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Typically ~0.5-2% of total parameters
# Load dataset
dataset = load_dataset("json", data_files="training_data.jsonl")
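def tokenize(example):
    # The training text already starts with <|begin_of_text|>,
    # so skip the tokenizer's automatic special tokens to avoid a duplicate.
    return tokenizer(
        example["text"],
        truncation=True,
        max_length=2048,
        padding="max_length",
        add_special_tokens=False,
    )
# Drop the raw text column so only tokenized fields reach the Trainer
tokenized_dataset = dataset.map(tokenize, batched=True, remove_columns=["text"])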
# Training arguments
training_args = TrainingArguments(
output_dir="./lora_output",
num_train_epochs=3,
per_device_train_batch_size=4,
gradient_accumulation_steps=4,
learning_rate=2e-4,
bf16=True,
logging_steps=10,
save_steps=100,
warmup_ratio=0.03,
lr_scheduler_type="cosine",
)
# Train
trainer = Trainer(
model=model,
args=training_args,
train_dataset=tokenized_dataset["train"],
data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
# Save LoRA
model.save_pretrained("./my_lora")
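The figure reported by print_trainable_parameters() in the script above can be estimated by hand: each adapted linear layer adds r * (in_features + out_features) trainable values. The sketch below assumes the Llama-3.1-8B layer shapes listed in the comments (verify them against the model's config.json), and the function name is hypothetical.

```python
# Assumed shapes for Llama-3.1-8B; check the model's config.json before relying on them.
HIDDEN = 4096          # hidden_size
INTERMEDIATE = 14336   # intermediate_size
KV_DIM = 1024          # num_key_value_heads (8) * head_dim (128)
LAYERS = 32            # num_hidden_layers

# (in_features, out_features) for each targeted projection.
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_trainable_params(rank, target_modules):
    """Sum r * (in + out) over the targeted projections in every layer."""
    per_layer = sum(rank * sum(PROJECTIONS[name]) for name in target_modules)
    return per_layer * LAYERS


all_modules = list(PROJECTIONS)
attention_only = ["q_proj", "k_proj", "v_proj", "o_proj"]
print(f"r=16, all projections: {lora_trainable_params(16, all_modules):,}")
print(f"r=16, attention only:  {lora_trainable_params(16, attention_only):,}")
```

Under these assumptions, rank 16 on all seven projections comes to roughly 42 million trainable values, about 0.5% of an 8-billion-parameter model. Restricting target_modules to the attention projections cuts that by about two thirds, at the cost of adapter capacity.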
Dataset Format
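File: training_data.jsonl (one JSON object per line; JSONL has no comment syntax, so do not include this label in the file)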
{"text": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nYour question here<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nYour desired response here<|eot_id|>"}
{"text": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nAnother question<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAnother response<|eot_id|>"}
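Writing these lines by hand is error-prone, so a small script can generate them from question/answer pairs. This is a sketch that reproduces the template shown above exactly; the function names and the sample pairs are hypothetical, and `json.dumps` handles the newline and quote escaping.

```python
import json

# Same layout as the example lines above.
TEMPLATE = (
    "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n{question}<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n{answer}<|eot_id|>"
)


def to_jsonl_line(question, answer):
    """Format one question/answer pair as a JSON line with a 'text' field."""
    text = TEMPLATE.format(question=question.strip(), answer=answer.strip())
    return json.dumps({"text": text}, ensure_ascii=False)


def write_training_file(pairs, path):
    """Write one JSON object per line to `path`."""
    with open(path, "w", encoding="utf-8") as handle:
        for question, answer in pairs:
            handle.write(to_jsonl_line(question, answer) + "\n")


# Hypothetical examples; replace with your own data.
pairs = [
    ("What is LoRA?", "LoRA trains small adapter weights instead of the full model."),
    ("Why use a trigger word?", "It gives the model a unique token to associate with the concept."),
]
for question, answer in pairs:
    print(to_jsonl_line(question, answer))
```

Calling `write_training_file(pairs, "training_data.jsonl")` produces a file that the `load_dataset("json", ...)` call in the training script can read.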
Loading and Using LoRAs
Stable Diffusion (ComfyUI)
# Place LoRA in models/loras/
# In ComfyUI:
1. Add "Load LoRA" node
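2. Insert it after the checkpoint loader: wire the loader's MODEL and CLIP outputs into the LoRA node, then use the LoRA node's outputs for the prompt encoders and sampler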
3. Set weight (0.5-1.0 typical)
Stable Diffusion (Diffusers)
from diffusers import StableDiffusionXLPipeline
import torch
pipe = StableDiffusionXLPipeline.from_pretrained(
"stabilityai/stable-diffusion-xl-base-1.0",
torch_dtype=torch.float16
).to("cuda")
# Load LoRA
pipe.load_lora_weights("./my_character_lora.safetensors")
# Generate with trigger word
image = pipe(
"sks character standing in a forest, sunset",
num_inference_steps=25
).images[0]
# Unload LoRA
pipe.unload_lora_weights()
LLM LoRA
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
"meta-llama/Llama-3.1-8B-Instruct",
torch_dtype=torch.bfloat16,
device_map="auto"
)
# Load LoRA
model = PeftModel.from_pretrained(base_model, "./my_lora")
# Or merge for faster inference
model = model.merge_and_unload()
# Use normally
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
# ...
Training Time Estimates
| Task | Dataset Size | RTX 4090 Time | Cost on GPUBrazil |
|---|---|---|---|
| SDXL Character LoRA | 30 images | ~30 min | ~$0.20 |
| SDXL Style LoRA | 100 images | ~1-2 hrs | ~$0.60 |
| LLM LoRA (8B) | 10k examples | ~2-4 hrs | ~$1.20 |
| LLM LoRA (70B) | 10k examples | ~8-12 hrs | ~$10.00 |
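The cost column for the RTX 4090 rows is simply training hours multiplied by the hourly rate. Here is a small sketch of that arithmetic, assuming the $0.40/hr RTX 4090 rate quoted in this article; the function name is hypothetical. The 70B row does not follow from that rate, so it presumably assumes different hardware and is left out.

```python
HOURLY_RATE_USD = 0.40  # RTX 4090 rate quoted in this article


def estimate_cost(hours_low, hours_high, rate=HOURLY_RATE_USD):
    """Return (low, high) cost in USD for a training-time range."""
    return round(hours_low * rate, 2), round(hours_high * rate, 2)


# Time ranges taken from the table above, in hours.
jobs = {
    "SDXL Character LoRA": (0.5, 0.5),
    "SDXL Style LoRA": (1, 2),
    "LLM LoRA (8B)": (2, 4),
}
for name, (low, high) in jobs.items():
    cost_low, cost_high = estimate_cost(low, high)
    print(f"{name}: ${cost_low:.2f} to ${cost_high:.2f}")
```

The table's single figures sit at the midpoint of each range, so budget for the upper bound if your dataset or epoch count is larger than the examples.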
Train Your Custom LoRA Today
RTX 4090s from $0.40/hr. Train a character LoRA for under $1.
Best Practices
Image LoRAs
- Use consistent, high-quality images
- Include variety in poses, lighting, backgrounds
- Write detailed, accurate captions
- Use unique trigger word
- Monitor samples during training
- Test at different weights (0.5, 0.7, 1.0)
LLM LoRAs
- High-quality, consistent training examples
- Match the chat template exactly
- Include diverse examples of desired behavior
- Validate with held-out test set
- Start with smaller rank, increase if needed
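Holding out a test set, as recommended above, only needs a deterministic shuffle and a split before training. The sketch below uses plain Python lists; the function name, the 10% fraction, and the seed are assumptions, and the example strings stand in for lines read from training_data.jsonl.

```python
import random


def split_examples(examples, test_fraction=0.1, seed=42):
    """Shuffle deterministically and return (train, test) lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    test_size = max(1, int(len(shuffled) * test_fraction))
    return shuffled[test_size:], shuffled[:test_size]


# Hypothetical stand-in for lines read from training_data.jsonl.
examples = [f"example {i}" for i in range(100)]
train, test = split_examples(examples)
print(len(train), len(test))
```

Fixing the seed keeps the held-out examples the same across runs, so results from different ranks or learning rates stay comparable.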
Troubleshooting
LoRA Has No Effect
- Check trigger word is in prompt
- Increase LoRA weight
- Train longer or with higher rank
Overfitting
- Reduce epochs
- Lower learning rate
- Add more training data
- Reduce rank/dim
Poor Quality
- Improve training data quality
- Check captions are accurate
- Ensure consistent style in training images
Conclusion
LoRA training democratizes AI customization. For under $1, you can train a custom character model that would have cost thousands just a few years ago.
Start with image LoRAs: they're simpler, and the results are immediately visible. Once you're comfortable, try LLM LoRAs for specialized text generation.
Train on GPUBrazil to access powerful GPUs without upfront hardware investment.