Why ComfyUI?
ComfyUI is the power user's choice for AI image generation. Unlike simple interfaces, ComfyUI's node-based workflow gives you:
- Complete control: Every step of the generation pipeline
- Reproducibility: Save and share exact workflows
- Advanced features: ControlNet support built in, with IP-Adapter and AnimateDiff available as custom nodes
- Efficiency: Batch processing, queuing, API access
- Extensibility: Thousands of custom nodes
💡 What you'll learn
How to set up ComfyUI on a cloud GPU, create advanced workflows with ControlNet and IP-Adapter, and automate generation via API.
Setup on GPUBrazil
Spin up an RTX 4090 instance on GPUBrazil, then:
# Clone ComfyUI
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
# Create virtual environment
python -m venv venv
source venv/bin/activate
# Install PyTorch with CUDA
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
# Install ComfyUI dependencies
pip install -r requirements.txt
# Run ComfyUI
python main.py --listen 0.0.0.0 --port 8188
Access via http://your-server-ip:8188
Installing Models
# SDXL base model
cd models/checkpoints
wget https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors
# SDXL VAE (better colors)
cd ../vae
wget https://huggingface.co/stabilityai/sdxl-vae/resolve/main/sdxl_vae.safetensors
# ControlNet models
cd ../controlnet
wget https://huggingface.co/lllyasviel/sd_control_collection/resolve/main/diffusers_xl_canny_full.safetensors
wget https://huggingface.co/lllyasviel/sd_control_collection/resolve/main/diffusers_xl_depth_full.safetensors
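Before launching, it can help to confirm the downloads landed where ComfyUI looks for them. The sketch below is a hypothetical helper: the name `missing_models` and the `EXPECTED` table are illustrative and not part of ComfyUI. It assumes the folder layout and filenames used in the commands above.

```python
from pathlib import Path

# Files downloaded in the steps above, keyed by subfolder of ComfyUI/models.
# (Illustrative table; extend it with whatever models you add.)
EXPECTED = {
    "checkpoints": ["sd_xl_base_1.0.safetensors"],
    "vae": ["sdxl_vae.safetensors"],
    "controlnet": [
        "diffusers_xl_canny_full.safetensors",
        "diffusers_xl_depth_full.safetensors",
    ],
}


def missing_models(comfy_root, expected=EXPECTED):
    """Return relative paths of expected model files that are absent or empty."""
    models_dir = Path(comfy_root) / "models"
    missing = []
    for folder, filenames in expected.items():
        for name in filenames:
            path = models_dir / folder / name
            # A zero-byte file usually means an interrupted download.
            if not path.is_file() or path.stat().st_size == 0:
                missing.append(f"{folder}/{name}")
    return missing
```

Calling `missing_models(".")` from the ComfyUI root returns an empty list when all four files are in place.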
Essential Custom Nodes
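# Install ComfyUI Manager (essential!)
# Run from the ComfyUI root directory (if you are still in models/controlnet, use: cd ../../custom_nodes)
cd custom_nodes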
git clone https://github.com/ltdrdata/ComfyUI-Manager.git
# IP-Adapter
git clone https://github.com/cubiq/ComfyUI_IPAdapter_plus.git
# ControlNet Preprocessors
git clone https://github.com/Fannovel16/comfyui_controlnet_aux.git
# AnimateDiff
git clone https://github.com/Kosinkadink/ComfyUI-AnimateDiff-Evolved.git
# Restart ComfyUI to load nodes
# Then use ComfyUI Manager to install dependencies
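If you prefer to install dependencies by hand rather than through ComfyUI Manager, many custom node repositories ship a `requirements.txt`. The following is a minimal sketch that lists which cloned node folders have one; the function name `nodes_with_requirements` is hypothetical, and it assumes you run it from the ComfyUI root.

```python
from pathlib import Path


def nodes_with_requirements(custom_nodes_dir):
    """Map each custom node folder name to its requirements.txt path, if present."""
    found = {}
    for entry in sorted(Path(custom_nodes_dir).iterdir()):
        req = entry / "requirements.txt"
        if entry.is_dir() and req.is_file():
            found[entry.name] = req
    return found


if __name__ == "__main__":
    root = Path("custom_nodes")
    if root.is_dir():
        for name, req in nodes_with_requirements(root).items():
            # Run the printed command inside the ComfyUI virtual environment.
            print(f"{name}: pip install -r {req}")
```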
Basic SDXL Workflow
The fundamental workflow connects these nodes:
- Load Checkpoint: Load SDXL model
- CLIP Text Encode: Convert prompt to embeddings
- Empty Latent Image: Create blank canvas
- KSampler: Run diffusion process
- VAE Decode: Convert latent to image
- Save Image: Output result
// Simplified outline of the node graph (illustrative only; a loadable workflow also records the links between nodes)
{
  "nodes": [
    {"type": "CheckpointLoaderSimple", "id": 1},
    {"type": "CLIPTextEncode", "id": 2, "inputs": {"text": "your prompt here"}},
    {"type": "CLIPTextEncode", "id": 3, "inputs": {"text": "ugly, blurry"}},
    {"type": "EmptyLatentImage", "id": 4, "inputs": {"width": 1024, "height": 1024}},
    {"type": "KSampler", "id": 5, "inputs": {"steps": 25, "cfg": 7.5}},
    {"type": "VAEDecode", "id": 6},
    {"type": "SaveImage", "id": 7}
  ]
}
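For use with the `/prompt` endpoint, the same graph has to be expressed in ComfyUI's API format, where each node is keyed by an ID and every connection is a `[source_node_id, output_index]` pair. The sketch below builds that structure for the six stages listed above. The function name `build_sdxl_workflow`, the `euler`/`normal` sampler settings, and the `sdxl` filename prefix are assumptions chosen for illustration; compare against an API-format export from your own install before relying on it.

```python
import json


def build_sdxl_workflow(prompt, negative="ugly, blurry", seed=0,
                        ckpt_name="sd_xl_base_1.0.safetensors"):
    """Build the basic text-to-image graph in ComfyUI's API format.

    Links are [source_node_id, output_index]. CheckpointLoaderSimple
    outputs MODEL (0), CLIP (1) and VAE (2).
    """
    return {
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": ckpt_name}},
        "2": {"class_type": "CLIPTextEncode",          # positive prompt
              "inputs": {"text": prompt, "clip": ["1", 1]}},
        "3": {"class_type": "CLIPTextEncode",          # negative prompt
              "inputs": {"text": negative, "clip": ["1", 1]}},
        "4": {"class_type": "EmptyLatentImage",
              "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
        "5": {"class_type": "KSampler",
              "inputs": {"model": ["1", 0], "positive": ["2", 0],
                         "negative": ["3", 0], "latent_image": ["4", 0],
                         "seed": seed, "steps": 25, "cfg": 7.5,
                         "sampler_name": "euler", "scheduler": "normal",
                         "denoise": 1.0}},
        "6": {"class_type": "VAEDecode",
              "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
        "7": {"class_type": "SaveImage",
              "inputs": {"images": ["6", 0], "filename_prefix": "sdxl"}},
    }


if __name__ == "__main__":
    print(json.dumps(build_sdxl_workflow("your prompt here"), indent=2))
```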
ControlNet for Precise Control
ControlNet lets you guide generation with reference images:
Canny Edge Detection
Preserves edges/outlines from a reference image:
# Add these nodes:
1. Load Image (reference)
2. Canny Edge Detector (preprocessor)
3. Load ControlNet Model
4. Apply ControlNet
5. Connect to KSampler
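The five steps above can be sketched as a function that patches an API-format workflow. This is a hedged example: the function name `add_canny_controlnet` is hypothetical, the node class names (`LoadImage`, `Canny`, `ControlNetLoader`, `ControlNetApply`) and their inputs are as I understand them from ComfyUI core and should be checked against your install, the Canny thresholds are illustrative, and the code assumes node IDs are numeric strings. The reference image must already be in ComfyUI's input folder.

```python
import copy


def add_canny_controlnet(workflow, positive_id, sampler_id, image_name,
                         strength=0.8,
                         control_net_name="diffusers_xl_canny_full.safetensors"):
    """Return a copy of an API-format workflow with a Canny ControlNet
    inserted between the positive prompt and the sampler."""
    wf = copy.deepcopy(workflow)
    # Pick fresh IDs that do not collide with existing nodes
    # (assumes every existing ID is a numeric string).
    next_id = max(int(k) for k in wf) + 1
    load_id, canny_id, cn_id, apply_id = (str(next_id + i) for i in range(4))

    # 1. Load the reference image
    wf[load_id] = {"class_type": "LoadImage", "inputs": {"image": image_name}}
    # 2. Extract edges
    wf[canny_id] = {"class_type": "Canny",
                    "inputs": {"image": [load_id, 0],
                               "low_threshold": 0.4, "high_threshold": 0.8}}
    # 3. Load the ControlNet model
    wf[cn_id] = {"class_type": "ControlNetLoader",
                 "inputs": {"control_net_name": control_net_name}}
    # 4. Apply it to the positive conditioning
    wf[apply_id] = {"class_type": "ControlNetApply",
                    "inputs": {"conditioning": [positive_id, 0],
                               "control_net": [cn_id, 0],
                               "image": [canny_id, 0],
                               "strength": strength}}
    # 5. The sampler now reads the ControlNet-conditioned prompt
    wf[sampler_id]["inputs"]["positive"] = [apply_id, 0]
    return wf
```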
Depth Control
Maintains 3D structure and perspective:
# Nodes needed:
1. Load Image
2. Depth Estimator (MiDaS or Zoe)
3. Load ControlNet (depth model)
4. Apply ControlNet
OpenPose for Characters
Control character poses precisely:
# Pose workflow:
1. Load reference image with pose
2. OpenPose Detector
3. Load ControlNet (openpose model)
4. Apply with strength 0.8-1.0
💡 ControlNet Tips
Start with strength 0.7-0.8. Higher values follow the control more strictly but may reduce creativity. Combine multiple ControlNets for complex control.
IP-Adapter for Style Transfer
IP-Adapter transfers style from reference images without training:
# IP-Adapter workflow:
1. Load SDXL checkpoint
2. Load IP-Adapter model
3. Load CLIP Vision model
4. Load reference style image
5. IPAdapter Apply node
- weight: 0.7 (style influence)
- noise: 0.3 (variation)
6. Connect to KSampler
# Great for:
- Consistent character generation
- Style transfer
- Brand consistency
Face-Specific IP-Adapter
# For consistent faces:
1. Use IP-Adapter FaceID model
2. Load face reference
3. InstantID nodes for best results (InstantID ships as a separate custom node pack, not installed above)
4. Combine with ControlNet pose
AnimateDiff for Video
Generate AI animations from text prompts:
# AnimateDiff workflow:
1. Load Checkpoint
2. Load AnimateDiff Model (motion module)
3. AnimateDiff Loader
4. Standard prompt encoding
5. AnimateDiff Sampler
6. AnimateDiff Combine (to video)
# Settings:
- Frames: 16-32 for short clips
- Motion scale: 1.0 default
- Context length: 16
⚠️ VRAM Requirements
AnimateDiff needs significant VRAM. For 16 frames at 512x512, expect ~12GB usage. Use an RTX 4090 or better for comfortable generation.
Batch Processing
Generate multiple images efficiently:
# Method 1: Batch in EmptyLatentImage
- Set batch_size to 4
- Generates 4 images per run
# Method 2: Queue multiple prompts
- Use ComfyUI API
- Queue workflow with different seeds
# Method 3: Prompt scheduling
- Use different prompts per frame
- Great for animations
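Methods 1 and 2 amount to small edits on an API-format workflow dictionary. The sketch below shows both; the helper names `set_batch_size` and `seed_variants` are hypothetical, and the node IDs you pass in depend on your own exported workflow.

```python
import copy


def set_batch_size(workflow, latent_id, batch_size):
    """Method 1: return a copy whose EmptyLatentImage node uses batch_size."""
    wf = copy.deepcopy(workflow)
    wf[latent_id]["inputs"]["batch_size"] = batch_size
    return wf


def seed_variants(workflow, sampler_id, seeds):
    """Method 2: yield one independent copy of the workflow per seed."""
    for seed in seeds:
        variant = copy.deepcopy(workflow)
        variant[sampler_id]["inputs"]["seed"] = seed
        yield variant
```

Each variant can then be submitted separately, for example with the `queue_prompt` function from the API Automation section; ComfyUI runs queued jobs one after another.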
API Automation
import json
import time
import uuid
from urllib.parse import urlencode

import requests

SERVER = "http://your-server:8188"
CLIENT_ID = str(uuid.uuid4())  # identifies this client to the server

def queue_prompt(workflow):
    """Queue an API-format workflow; the response carries the server-assigned prompt_id"""
    response = requests.post(
        f"{SERVER}/prompt",
        json={
            "prompt": workflow,
            "client_id": CLIENT_ID
        }
    )
    response.raise_for_status()
    return response.json()

def get_images(prompt_id):
    """Get /view URLs for every image produced by a finished prompt"""
    response = requests.get(f"{SERVER}/history/{prompt_id}")
    response.raise_for_status()
    history = response.json()
    images = []
    for output in history[prompt_id]["outputs"].values():
        for img in output.get("images", []):
            # /view needs the subfolder and type as well as the filename
            query = urlencode({
                "filename": img["filename"],
                "subfolder": img.get("subfolder", ""),
                "type": img.get("type", "output"),
            })
            images.append(f"{SERVER}/view?{query}")
    return images

# Load a workflow exported in API format (the regular UI save uses a different layout)
with open("workflow.json") as f:
    workflow = json.load(f)

# Modify prompt ("3" must be the ID of the CLIPTextEncode node in your export)
workflow["3"]["inputs"]["text"] = "A cyberpunk city at night"

# Queue and wait
result = queue_prompt(workflow)
prompt_id = result["prompt_id"]

# Poll for completion: the prompt appears in /history once it has finished
while True:
    history = requests.get(f"{SERVER}/history/{prompt_id}").json()
    if prompt_id in history:
        break
    time.sleep(1)

# Get images
images = get_images(prompt_id)
print(f"Generated images: {images}")
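The polling loop above never gives up, so a failed job or an unreachable server leaves the script waiting indefinitely. One possible refinement is a helper with a deadline. This is a sketch under assumptions: `wait_for_history` and its `fetch_history` parameter are hypothetical names, and the fetch callable is injected so the logic can be exercised without a running server.

```python
import time


def wait_for_history(fetch_history, prompt_id, timeout=300.0, interval=1.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll until prompt_id appears in the history, or raise TimeoutError.

    fetch_history is any callable that takes a prompt ID and returns the
    decoded JSON from GET /history/{prompt_id}.
    """
    deadline = clock() + timeout
    while True:
        history = fetch_history(prompt_id)
        if prompt_id in history:
            return history[prompt_id]
        if clock() >= deadline:
            raise TimeoutError(
                f"prompt {prompt_id} not finished after {timeout}s")
        sleep(interval)
```

With the script above, it could be called as `wait_for_history(lambda pid: requests.get(f"{SERVER}/history/{pid}").json(), prompt_id)`.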
Performance Optimization
Memory Optimization
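# Launch flags for VRAM management (use one at a time):
python main.py --lowvram      # Split the model into parts to use less VRAM (slower)
python main.py --normalvram   # Force normal VRAM use if low-VRAM mode was enabled automatically
python main.py --gpu-only     # Keep everything, including text encoders, on the GPU (fast, needs VRAM)
# For best performance on 24GB:
python main.py --highvram     # Keep models in GPU memory instead of unloading them after use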
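If you start ComfyUI from a script, for instance on instance boot, the launch command can be assembled programmatically. The helper below is a hypothetical sketch: `build_launch_command` and the `VRAM_FLAGS` set are illustrative names, and the set lists only ComfyUI VRAM flags I am confident exist. ComfyUI treats these flags as mutually exclusive, so the helper accepts at most one.

```python
import sys

# VRAM-related ComfyUI flags; at most one may be passed per launch.
VRAM_FLAGS = {"lowvram", "normalvram", "highvram", "gpu-only"}


def build_launch_command(vram_mode=None, listen="0.0.0.0", port=8188,
                         python=sys.executable):
    """Return the argv list for starting ComfyUI with an optional VRAM flag."""
    cmd = [python, "main.py", "--listen", listen, "--port", str(port)]
    if vram_mode is not None:
        if vram_mode not in VRAM_FLAGS:
            raise ValueError(f"unknown VRAM mode: {vram_mode!r}")
        cmd.append(f"--{vram_mode}")
    return cmd
```

The result can be passed to `subprocess.run(..., cwd="ComfyUI", check=True)` from a process that has the virtual environment active.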
Speed Optimization
# Enable xformers (faster attention)
pip install xformers
# Use fp16 precision
# Set in node settings or via --force-fp16
# Batch similar operations
# Queue multiple jobs instead of one at a time
Useful Workflows
1. Product Photography
- Load product image
- Remove background (rembg node)
- ControlNet canny for shape
- Prompt for studio lighting
2. Consistent Characters
- IP-Adapter with face reference
- ControlNet OpenPose for pose
- Same seed for consistency
3. Architecture Visualization
- ControlNet depth from 3D render
- ControlNet canny for edges
- Style with IP-Adapter
Run ComfyUI in the Cloud
No local GPU? Run ComfyUI on GPUBrazil's RTX 4090s from $0.40/hr.
Troubleshooting
CUDA Out of Memory
- Reduce image resolution
- Use --lowvram flag
- Disable unnecessary ControlNets
- Reduce batch size
Missing Custom Nodes
- Use ComfyUI Manager to install
- Check custom_nodes folder
- Restart ComfyUI after installing
Slow Generation
- Enable xformers
- Use fp16 precision
- Reduce steps (20-25 usually sufficient)
- Use faster samplers (DPM++ 2M with the Karras scheduler; in KSampler, sampler_name dpmpp_2m and scheduler karras)
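When driving ComfyUI through the API, the step count and sampler suggestions above translate into three KSampler inputs. A minimal sketch, assuming an API-format workflow and a hypothetical helper name `apply_fast_settings`:

```python
import copy


def apply_fast_settings(workflow, sampler_id, steps=20):
    """Return a copy of the workflow with fewer steps and DPM++ 2M Karras."""
    wf = copy.deepcopy(workflow)
    inputs = wf[sampler_id]["inputs"]
    inputs["steps"] = steps
    inputs["sampler_name"] = "dpmpp_2m"  # DPM++ 2M
    inputs["scheduler"] = "karras"       # Karras noise schedule
    return wf
```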
Conclusion
ComfyUI is one of the most flexible tools for AI image generation. The learning curve is steeper than with simpler interfaces, but the node graph gives you control over every step of the pipeline.
Start with basic workflows, then add ControlNet and IP-Adapter as you get comfortable. Deploying on GPUBrazil gives you cloud-based generation without a hardware investment.