What is LoRA?

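LoRA (Low-Rank Adaptation) lets you customize AI models by training small adapter weights instead of the full model. Because the base weights stay frozen, training needs far less GPU memory than full fine-tuning, the resulting adapter file is small, and several adapters can be swapped on top of the same base model.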
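
To make "small adapter weights" concrete, here is a minimal NumPy sketch of a single LoRA-adapted linear layer. The sizes, the random initialization, and the name `lora_forward` are illustrative assumptions, not taken from any particular trainer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes only; real layers are much larger.
d_out, d_in, r, alpha = 64, 64, 4, 8

W = rng.normal(size=(d_out, d_in))     # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable, small random init
B = np.zeros((d_out, r))               # trainable, zero init: no change at step 0

def lora_forward(x, W, A, B, alpha, r):
    """Compute y = x W^T + (alpha / r) * x A^T B^T."""
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.normal(size=(2, d_in))
y_base = x @ W.T
y_lora = lora_forward(x, W, A, B, alpha, r)

full_params = W.size           # parameters in the frozen layer
lora_params = A.size + B.size  # parameters actually trained

print("same output before training:", np.allclose(y_base, y_lora))
print("full layer params:", full_params, "| LoRA params:", lora_params)
```

Only `A` and `B` receive gradients, so the trained state is a small fraction of the layer it adapts.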

💡 What you'll learn

How to train LoRAs for Stable Diffusion (images) and LLMs (text), from dataset preparation to deployment.

Part 1: Stable Diffusion LoRA

Use Cases

Setup: Kohya SS Trainer

# Clone Kohya trainer
git clone https://github.com/kohya-ss/sd-scripts.git
cd sd-scripts

# Create environment
python -m venv venv
source venv/bin/activate

# Install dependencies
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
pip install xformers

# Download base model (SDXL)
mkdir models
cd models
wget https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors

Dataset Preparation

Quality data is crucial for good LoRAs:

# Dataset structure
dataset/
├── 10_character_name/     # 10 = repeat count
│   ├── image1.png
│   ├── image1.txt         # Caption for image1
│   ├── image2.png
│   ├── image2.txt
│   └── ...
└── 5_style_name/          # Different concept
    ├── style1.png
    ├── style1.txt
    └── ...
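
Before training, it can help to check that every folder follows the `<repeats>_<concept>` naming and that every image has a caption file. The sketch below is a hypothetical helper, not part of Kohya's scripts; the function names and the list of image extensions are assumptions.

```python
import re
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}  # assumed set of image types
FOLDER_RE = re.compile(r"^(\d+)_(.+)$")

def parse_concept_folder(name):
    """Split '10_character_name' into (10, 'character_name'); None if no match."""
    m = FOLDER_RE.match(name)
    if m is None:
        return None
    return int(m.group(1)), m.group(2)

def scan_dataset(root):
    """Report repeat count, image count, and missing captions per concept folder."""
    report = {}
    for folder in sorted(Path(root).iterdir()):
        parsed = parse_concept_folder(folder.name) if folder.is_dir() else None
        if parsed is None:
            continue  # skip files and folders without a repeat-count prefix
        repeats, concept = parsed
        images = sorted(p for p in folder.iterdir() if p.suffix.lower() in IMAGE_EXTS)
        missing = [p.name for p in images if not p.with_suffix(".txt").exists()]
        report[concept] = {
            "repeats": repeats,
            "images": len(images),
            "missing_captions": missing,
        }
    return report
```

Calling `scan_dataset("./dataset")` on the layout above should list `character_name` with 10 repeats and `style_name` with 5, plus any images that still lack a `.txt` caption.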

Dataset Guidelines

# Example caption (image1.txt):
sks character, a woman with red hair, standing in a park, sunny day, wearing blue dress, smiling

# "sks" is the trigger word - use something unique
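
If captions come from an auto-tagger or were written by hand, a small helper can make sure each one is a single line that starts with the trigger phrase. This is a sketch; `ensure_trigger` and its default trigger are hypothetical names chosen to match the example caption.

```python
def ensure_trigger(caption, trigger="sks character"):
    """Return the caption on one line, with the trigger phrase at the front."""
    # Collapse line breaks and repeated whitespace into single spaces.
    text = " ".join(caption.split())
    if text.lower().startswith(trigger.lower()):
        return text
    return f"{trigger}, {text}"

print(ensure_trigger("a woman with red hair,\nstanding in a park"))
```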

Training Configuration

# train_lora.sh
# SDXL base models use sdxl_train_network.py (train_network.py targets SD 1.x/2.x)
accelerate launch --num_cpu_threads_per_process=2 sdxl_train_network.py \
  --pretrained_model_name_or_path="./models/sd_xl_base_1.0.safetensors" \
  --train_data_dir="./dataset" \
  --output_dir="./output" \
  --output_name="my_character_lora" \
  --save_model_as=safetensors \
  --resolution=1024,1024 \
  --train_batch_size=1 \
  --max_train_epochs=10 \
  --learning_rate=1e-4 \
  --unet_lr=1e-4 \
  --text_encoder_lr=1e-5 \
  --network_dim=32 \
  --network_alpha=16 \
  --network_module=networks.lora \
  --optimizer_type="AdamW8bit" \
  --mixed_precision="bf16" \
  --cache_latents \
  --gradient_checkpointing \
  --save_every_n_epochs=2 \
  --sample_every_n_epochs=1 \
  --sample_prompts="./prompts.txt"

Key Parameters Explained
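
Two numbers follow directly from the command above: the LoRA scaling factor (`network_alpha / network_dim`) and the total number of optimizer steps. The sketch below works them out. The 30-image dataset is an assumption borrowed from the character LoRA estimate later in this article, and the step formula ignores effects such as resolution bucketing.

```python
import math

def lora_scale(network_alpha, network_dim):
    """Multiplier applied to the LoRA update: alpha divided by rank."""
    return network_alpha / network_dim

def total_steps(num_images, repeats, epochs, batch_size):
    """Approximate optimizer steps: each image is seen `repeats` times per epoch."""
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs

scale = lora_scale(network_alpha=16, network_dim=32)
steps = total_steps(num_images=30, repeats=10, epochs=10, batch_size=1)

print("LoRA scale:", scale)
print("total steps:", steps)
```

With `--network_dim=32` and `--network_alpha=16` the update is scaled by 0.5, so raising the rank without raising alpha also weakens the update.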

⚠️ Overfitting Warning

If generated images look exactly like training images, you've overfit. Reduce epochs, increase regularization, or add more diverse training data.

Part 2: LLM LoRA Training

Use Cases

Setup with PEFT

pip install transformers peft datasets accelerate bitsandbytes

Training Script

from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    Trainer,
    DataCollatorForLanguageModeling
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from datasets import load_dataset
import torch

# Load base model in 4-bit (quantization settings go through BitsAndBytesConfig)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
tokenizer.pad_token = tokenizer.eos_token

# Prepare for training
model = prepare_model_for_kbit_training(model)

# LoRA configuration
lora_config = LoraConfig(
    r=16,                          # Rank
    lora_alpha=32,                 # Scaling
    target_modules=[               # Which layers to adapt
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj"
    ],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Typically ~0.5-2% of total parameters

# Load dataset
dataset = load_dataset("json", data_files="training_data.jsonl")

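def tokenize(example):
    return tokenizer(
        example["text"],
        truncation=True,
        max_length=2048,
        # The training text already starts with <|begin_of_text|>,
        # so skip the tokenizer's own special tokens to avoid a duplicate.
        add_special_tokens=False,
        # No padding here: the data collator pads each batch to its longest sequence.
    )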

tokenized_dataset = dataset.map(tokenize, batched=True)

# Training arguments
training_args = TrainingArguments(
    output_dir="./lora_output",
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    bf16=True,
    logging_steps=10,
    save_steps=100,
    warmup_ratio=0.03,
    lr_scheduler_type="cosine",
)

# Train
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)

trainer.train()

# Save LoRA
model.save_pretrained("./my_lora")
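
The figure printed by `print_trainable_parameters()` can be estimated by hand: a rank-r adapter on a linear layer adds r * (in_features + out_features) weights. The sketch below does this for the `LoraConfig` above. The layer shapes and the total parameter count are assumptions based on the published Llama 3.1 8B configuration, so verify them against your model's config.

```python
R = 16               # LoRA rank from the config above
NUM_LAYERS = 32      # assumed transformer depth for Llama 3.1 8B
HIDDEN = 4096        # assumed hidden size
INTERMEDIATE = 14336 # assumed MLP width
KV_DIM = 1024        # assumed: 8 key/value heads x 128 head dim
TOTAL_PARAMS = 8.03e9  # approximate full-precision parameter count

# (in_features, out_features) for each targeted module in one layer
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_params(in_features, out_features, r):
    """A is (r x in_features), B is (out_features x r)."""
    return r * (in_features + out_features)

per_layer = sum(lora_params(i, o, R) for i, o in TARGET_SHAPES.values())
trainable = per_layer * NUM_LAYERS
fraction = trainable / TOTAL_PARAMS

print(f"trainable LoRA parameters: {trainable:,}")
print(f"share of base model: {fraction:.2%}")
```

The percentage PEFT reports may differ from this estimate, particularly for a 4-bit model, because quantized weights are packed and counted differently.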

Dataset Format

// training_data.jsonl (filename label only; JSONL has no comment syntax, so leave this line out of the file)
{"text": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nYour question here<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nYour desired response here<|eot_id|>"}
{"text": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\nAnother question<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\nAnother response<|eot_id|>"}
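
Writing these lines by hand is error-prone, so one option is to generate the file from question/answer pairs. This sketch reproduces the template shown above; `format_llama3_example` and `write_jsonl` are hypothetical helper names.

```python
import json

def format_llama3_example(question, answer):
    """Wrap one exchange in the chat template used in the examples above."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{question}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
        f"{answer}<|eot_id|>"
    )

def write_jsonl(pairs, path):
    """Write one JSON object per line; json.dumps escapes newlines and quotes."""
    with open(path, "w", encoding="utf-8") as f:
        for question, answer in pairs:
            record = {"text": format_llama3_example(question, answer)}
            f.write(json.dumps(record) + "\n")

example = format_llama3_example("Your question here", "Your desired response here")
print(json.dumps({"text": example}))
```

If the tokenizer ships a chat template, `tokenizer.apply_chat_template` is an alternative that keeps the format in sync with the model.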

Loading and Using LoRAs

Stable Diffusion (ComfyUI)

# Place LoRA in models/loras/
# In ComfyUI:
1. Add "Load LoRA" node
2. Feed the checkpoint loader's MODEL and CLIP outputs into it, then use its outputs for the sampler and text-encode nodes
3. Set weight (0.5-1.0 typical)

Stable Diffusion (Diffusers)

from diffusers import StableDiffusionXLPipeline
import torch

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16
).to("cuda")

# Load LoRA
pipe.load_lora_weights("./my_character_lora.safetensors")

# Generate with trigger word
image = pipe(
    "sks character standing in a forest, sunset",
    num_inference_steps=25
).images[0]

# Unload LoRA
pipe.unload_lora_weights()

LLM LoRA

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Load LoRA
model = PeftModel.from_pretrained(base_model, "./my_lora")

# Or merge for faster inference
model = model.merge_and_unload()

# Use normally
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
# ...
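
Merging is faster at inference because the adapter is folded into the base weight once, instead of being applied as an extra matrix product on every forward pass. The NumPy sketch below checks that both paths give the same output for one linear layer. The shapes and values are illustrative, and this shows the underlying arithmetic rather than PEFT's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 32, 48, 4, 8  # illustrative sizes

W = rng.normal(size=(d_out, d_in))  # base weight
A = rng.normal(size=(r, d_in))      # trained LoRA down-projection
B = rng.normal(size=(d_out, r))     # trained LoRA up-projection
scale = alpha / r

x = rng.normal(size=(5, d_in))

# Unmerged: base path plus a separate adapter path on every call.
y_adapter = x @ W.T + scale * (x @ A.T @ B.T)

# Merged: fold the adapter into the weight once, then a single matmul.
W_merged = W + scale * (B @ A)
y_merged = x @ W_merged.T

same = np.allclose(y_adapter, y_merged)
print("merged and unmerged outputs match:", same)
```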

Training Time Estimates

Task                  Dataset Size   RTX 4090 Time   Cost on GPUBrazil
SDXL Character LoRA   30 images      ~30 min         ~$0.20
SDXL Style LoRA       100 images     ~1-2 hrs        ~$0.60
LLM LoRA (8B)         10k examples   ~2-4 hrs        ~$1.20
LLM LoRA (70B)        10k examples   ~8-12 hrs       ~$10.00
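
The cost column is simply hours multiplied by the hourly rate. The sketch below reproduces the first three rows from the $0.40/hr RTX 4090 price quoted in this article, using the midpoint of each time range as an assumption. The 70B row does not follow from the same rate, so it presumably reflects different hardware; that is a guess, not something the table states.

```python
HOURLY_RATE_USD = 0.40  # RTX 4090 rate quoted in this article

def cost_usd(hours, rate=HOURLY_RATE_USD):
    """Estimated rental cost, rounded to cents."""
    return round(hours * rate, 2)

# Midpoints of the time ranges in the table (assumed).
estimates = {
    "SDXL Character LoRA": cost_usd(0.5),
    "SDXL Style LoRA": cost_usd(1.5),
    "LLM LoRA (8B)": cost_usd(3.0),
}

for task, cost in estimates.items():
    print(f"{task}: ~${cost:.2f}")
```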

Train Your Custom LoRA Today

RTX 4090s from $0.40/hr. Train a character LoRA for under $1.

Best Practices

Image LoRAs

  1. Use consistent, high-quality images
  2. Include variety in poses, lighting, backgrounds
  3. Write detailed, accurate captions
  4. Use a unique trigger word
  5. Monitor samples during training
  6. Test at different weights (0.5, 0.7, 1.0)

LLM LoRAs

  1. Use high-quality, consistent training examples
  2. Match the chat template exactly
  3. Include diverse examples of desired behavior
  4. Validate with a held-out test set
  5. Start with a smaller rank, increase if needed
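
For the held-out test set, a reproducible split made before training is enough to get started. This is a minimal standard-library sketch; `split_examples`, the 10% fraction, and the seed are arbitrary assumptions.

```python
import random

def split_examples(examples, test_fraction=0.1, seed=42):
    """Shuffle with a fixed seed and return (train, test) lists."""
    items = list(examples)
    random.Random(seed).shuffle(items)
    n_test = max(1, int(len(items) * test_fraction))
    return items[n_test:], items[:n_test]

train, test = split_examples(range(100))
print(len(train), "training examples,", len(test), "held out")
```

Keep the held-out examples out of `training_data.jsonl` entirely, and compare the model's answers on them before and after training.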

Troubleshooting

LoRA Has No Effect

Overfitting

Poor Quality

Conclusion

LoRA training democratizes AI customization. For under $1, you can train a custom character model that would have cost thousands just a few years ago.

Start with image LoRAs: they're simpler and results are immediately visible. Once comfortable, try LLM LoRAs for specialized text generation.

Train on GPUBrazil to access powerful GPUs without upfront hardware investment.