How to Train Custom LoRAs for Uncensored Stable Diffusion Locally
Step-by-step guide to training your own LoRA models for Stable Diffusion on your local machine. Create custom styles, characters, and concepts with zero cloud dependency and no content restrictions.
Why Train Your Own LoRA?
CivitAI has thousands of LoRAs — but none of them are you. Maybe you want a model that nails a specific art style, a consistent character, a product look, or a concept that doesn't exist in any public model.
Training locally means no upload restrictions, no content review, and no one else gets access to your training data or finished model. Your dataset stays on your machine.
What You Need
- GPU: NVIDIA with 8+ GB VRAM (RTX 3060 12 GB recommended). AMD GPUs work with ROCm but training is slower.
- RAM: 16 GB minimum, 32 GB recommended
- Storage: 20 GB free for training tools + datasets
- Software: Kohya_ss (GUI) or LoRA Easy Training Scripts
- Base model: The checkpoint you want to train against (e.g., Juggernaut XL, Pony v6)
- Training images: 15–200 images depending on use case
Step 1: Prepare Your Dataset
This is the most important step. Bad data = bad LoRA. Here's how to do it right:
- Resolution: Crop/resize all images to 1024×1024 for SDXL LoRAs (512×512 for SD 1.5)
- Variety: Include different angles, lighting conditions, backgrounds, and expressions
- Quality: Remove blurry, low-res, or badly cropped images — they'll degrade the output
- Captions: Each image needs a text caption describing what's in it. Use WD14 Tagger to auto-caption, then edit manually
Pro tip:
Put a unique trigger word in every caption (e.g., "ohwx_style" or "myperson"). This becomes the activation tag you'll use in prompts later.
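Before you start training, it's worth sanity-checking the dataset programmatically. The sketch below (stdlib only; the folder layout and trigger word are up to you) flags images that have no caption file and captions that are missing your trigger word:

```python
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp"}

def check_dataset(folder, trigger_word):
    """Report images with no .txt caption, and captions missing the trigger word."""
    folder = Path(folder)
    missing_caption, missing_trigger = [], []
    for img in sorted(folder.iterdir()):
        if img.suffix.lower() not in IMAGE_EXTS:
            continue
        caption = img.with_suffix(".txt")
        if not caption.exists():
            missing_caption.append(img.name)
        elif trigger_word not in caption.read_text(encoding="utf-8"):
            missing_trigger.append(img.name)
    return missing_caption, missing_trigger
```

Run it once after auto-captioning with WD14 Tagger and again after your manual edits; an empty result on both lists means every image will contribute to the trigger word.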
Step 2: Configure Training in Kohya_ss
Launch Kohya_ss and set these parameters (SDXL LoRA defaults that work for most cases):
| Setting | Recommended Value |
|---|---|
| Network Rank | 32 (character) / 64 (style) |
| Network Alpha | 16 (half of rank) |
| Learning Rate | 1e-4 (UNet) / 1e-5 (text encoder) |
| Scheduler | cosine_with_restarts |
| Epochs | 10–20 (monitor for overtraining) |
| Batch Size | 1 (8 GB VRAM) / 2 (12+ GB) |
| Optimizer | AdamW8bit or Prodigy |
| Resolution | 1024 (SDXL) / 512 (SD 1.5) |
Training a 30-image LoRA at these settings takes roughly 30–60 minutes on an RTX 3060.
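If you prefer the command line over the GUI, the same table translates into arguments for kohya's `sdxl_train_network.py`. This is a sketch: the flag names below match kohya sd-scripts as commonly documented, but verify them against `--help` on your installed version, and the paths are placeholders.

```python
def kohya_args(dataset_dir, output_dir, base_model):
    """Build an sdxl_train_network.py command line mirroring the settings table.

    Flag names follow kohya sd-scripts conventions; check them against your
    installed version before running.
    """
    return [
        "accelerate", "launch", "sdxl_train_network.py",
        "--pretrained_model_name_or_path", base_model,
        "--train_data_dir", dataset_dir,
        "--output_dir", output_dir,
        "--network_module", "networks.lora",
        "--network_dim", "32",                       # rank: 32 character / 64 style
        "--network_alpha", "16",                     # half of rank
        "--unet_lr", "1e-4",
        "--text_encoder_lr", "1e-5",
        "--lr_scheduler", "cosine_with_restarts",
        "--optimizer_type", "AdamW8bit",
        "--train_batch_size", "1",                   # raise to 2 with 12+ GB VRAM
        "--max_train_epochs", "15",
        "--resolution", "1024,1024",                 # 512,512 for SD 1.5
        "--save_every_n_epochs", "2",                # keep checkpoints to compare
    ]
```

Kohya_ss is essentially a GUI wrapper over these same scripts, so exporting your GUI config and diffing it against a list like this is a quick way to spot a mis-set parameter.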
Step 3: Train & Test
- Hit "Start Training" in Kohya_ss — watch the loss curve in the terminal
- Save checkpoints every 2 epochs so you can pick the best one
- Copy the `.safetensors` file to your Forge/ComfyUI `models/Lora/` folder
- Test with your trigger word, e.g., "photo of ohwx_style woman in a garden"
- Adjust the LoRA weight in the prompt with `<lora:my_lora:0.7>`. Start at 0.7 and move it up or down until the effect looks right
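Rather than eyeballing one weight at a time, it's faster to generate the same prompt at several weights and compare outputs side by side. A minimal helper (the `<lora:name:weight>` tag syntax is the one used by Forge/A1111-style prompts):

```python
def lora_weight_sweep(lora_name, prompt, weights=(0.5, 0.7, 0.9, 1.0)):
    """Produce one prompt per LoRA weight for a side-by-side comparison run."""
    return [f"<lora:{lora_name}:{w}> {prompt}" for w in weights]
```

Feed the resulting prompts into an X/Y plot or a batch run with a fixed seed so the only variable is the LoRA weight.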
Common Mistakes
- Overtraining: If outputs look "fried" or identical regardless of prompt, reduce epochs or lower learning rate
- Bad captions: Generic captions like "a person" don't teach the model anything useful — be descriptive
- Too few images: Under 10 images rarely produces good results for characters
- Wrong base model: Train against the same model family you'll use for generation (don't train on SD 1.5 and generate with SDXL)
- Rank too high: Rank 128+ doesn't mean better — it often causes overfit. Rank 32 is the sweet spot for most use cases.
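A useful number for judging overtraining risk is total optimizer steps: images × per-folder repeats × epochs, divided by batch size. (In kohya, "repeats" is the number prefix on the dataset folder name, e.g. `10_myconcept`.) A quick calculator, assuming that layout:

```python
import math

def total_steps(num_images, repeats, epochs, batch_size):
    """Total optimizer steps for a kohya-style run.

    Each epoch sees every image `repeats` times, grouped into batches;
    a partial final batch still counts as a step.
    """
    steps_per_epoch = math.ceil(num_images * repeats / batch_size)
    return steps_per_epoch * epochs
```

If the math says you're headed for many thousands of steps on a small dataset, cut epochs or repeats before you hit "fried" outputs, and lean on the every-2-epochs checkpoints to pick the best midpoint.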
Reality check: Setting up Kohya_ss from scratch means installing Python, CUDA, resolving dependency conflicts, and configuring 20+ training parameters. Most people spend 2–4 hours before their first training run — if nothing breaks. LocalForge AI gives you the generation side pre-configured so you can focus on what matters: training and creating.
FAQ
Can I share my LoRA publicly?
Yes — upload to CivitAI or Hugging Face. But since you trained locally, sharing is entirely optional. Your model, your choice.
Does training use internet?
No. Once you have Kohya_ss and your base model downloaded, training is 100% offline. No data leaves your machine.
Can I train on AMD GPUs?
Yes, with ROCm on Linux. Training is possible on RX 7900 XT/XTX but roughly 40% slower than equivalent NVIDIA hardware. Windows AMD training support is still experimental in 2026.
Bottom Line
Training LoRAs locally gives you complete creative control — custom characters, styles, and concepts that no public model offers. Your training data never leaves your machine.
The training side requires Kohya_ss regardless. But for the generation side — loading, testing, and using your LoRAs — LocalForge AI saves you hours of Forge setup and configuration. Train your LoRA, drop it in, and generate immediately.
