Install Stable Diffusion WebUI Forge - 5 Steps, Nothing Extra
Five steps. That's all Forge needs. Download the right package, extract it, let first boot finish, drop one checkpoint in, generate one image. Everything after that is optional until you prove the base works.
Forge (lllyasviel/stable-diffusion-webui-forge on GitHub, 12.5k+ stars) is a fork of the AUTOMATIC1111 WebUI rebuilt around better VRAM management. It runs SD1.5, SDXL, and Flux models through the same Gradio interface you already know - but uses less memory doing it. The recommended one-click package for Windows is webui_forge_cu121_torch231.7z (CUDA 12.1 + PyTorch 2.3.1). That's the one with 1.3 million downloads and the fewest compatibility reports.
This guide is intentionally short. No extension shopping. No theme tweaks. No "bookmark these 12 docs" filler. You'll confirm your GPU, install Forge, and generate a test image. If that works, everything else can wait.
The Quick Answer
Key Takeaway - May 2026
Download webui_forge_cu121_torch231.7z from the Forge GitHub releases page. Extract to a short path like C:\sd-forge\. Run run.bat. Wait 5-10 minutes on an SSD for first boot. Drop one .safetensors checkpoint into models/Stable-diffusion/. Generate at 512x512 (SD1.5) or 1024x1024 (SDXL). Done. Total disk budget: ~25 GB for Forge plus one SDXL model.
What You Need
- GPU: NVIDIA RTX 2060 or newer with 6+ GB VRAM. 8 GB is the comfort zone for SDXL. 4 GB works for SD1.5 only.
- RAM: 16 GB minimum. 32 GB stops Windows from paging when the browser and Forge compete for memory.
- Disk: 25-30 GB free on an SSD. Forge's Python environment takes ~6 GB. One SDXL checkpoint adds 6-7 GB. The pip cache eats another ~9 GB in %USERPROFILE%\.cache.
- OS: Windows 10 or 11. Linux works with the shell scripts. macOS is not supported.
- Driver: NVIDIA driver 535+ for CUDA 12.1 compatibility. Open Task Manager, click Performance, click GPU - your driver version is right there.
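If you'd rather check from a script than Task Manager, the sketch below shells out to nvidia-smi (installed alongside the NVIDIA driver) and compares the major driver version against the 535 floor. The helper name and comparison logic are my own, not part of Forge:

```python
import subprocess

def supports_cuda121(driver_version: str) -> bool:
    """True if the driver's major version meets the 535 floor for CUDA 12.1."""
    return int(driver_version.split(".")[0]) >= 535

if __name__ == "__main__":
    try:
        # nvidia-smi ships with the NVIDIA driver; this prints just the version string.
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True,
        )
        version = out.stdout.strip()
        verdict = "OK for cu121" if supports_cuda121(version) else "update your driver"
        print(version, "->", verdict)
    except FileNotFoundError:
        print("nvidia-smi not found - is the NVIDIA driver installed?")
```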
Step 1 - Download the Correct Package
Go to the Forge GitHub releases page and download webui_forge_cu121_torch231.7z. This is the CUDA 12.1 + PyTorch 2.3.1 bundle - the one the README recommends and the one with over 1.3 million downloads.
There's also a webui_forge_cu124_torch24.7z (CUDA 12.4 + PyTorch 2.4). It can be faster on RTX 40-series cards, but users report more compatibility issues. Start with cu121. Switch later if you have a reason.
The download is ~1.8 GB. Don't grab random packages from third-party sites. Don't try to pip-install PyTorch separately into this package. The bundle is self-contained for a reason.
Step 2 - Extract and Run First Boot
Create a folder with a short path and no spaces: C:\sd-forge\ works. Avoid C:\Users\Your Name\Downloads\Stable Diffusion Forge v2\ - spaces in paths cause subtle breakage with Python scripts and some extensions.
Extract the .7z archive into your folder. You should see run.bat, update.bat, and a webui subfolder. Run update.bat first to pull the latest code, then run run.bat.
First boot takes 5-10 minutes on an NVMe SSD while Forge downloads Python dependencies and configures the environment. On a SATA SSD, expect closer to 10-15 minutes. On a spinning HDD, go make coffee. Don't close the console window when it looks frozen - it's still working.
When you see Running on local URL: http://127.0.0.1:7860 in the console, Forge is ready. Your browser should open automatically.
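If the browser doesn't open on its own, you can confirm the server is actually listening with a quick stdlib-only check (the function name is illustrative, not a Forge API):

```python
import urllib.request
import urllib.error

def forge_is_up(url: str = "http://127.0.0.1:7860", timeout: float = 3.0) -> bool:
    """Return True if something answers HTTP on the Forge port."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

if __name__ == "__main__":
    print("Forge is up" if forge_is_up() else "Nothing listening on 7860 yet")
```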
Step 3 - Drop One Checkpoint In
You need exactly one model file to start. Go to CivitAI or Hugging Face, pick a popular checkpoint, and download the .safetensors file.
- SD1.5 model (e.g., Realistic Vision v5.1): ~2 GB, runs on 4+ GB VRAM at 512x512
- SDXL model (e.g., Juggernaut XL v9): ~6.5 GB, needs 8+ GB VRAM at 1024x1024
Copy the file into models/Stable-diffusion/ inside your Forge folder. Back in the Forge UI, click the refresh button next to the checkpoint dropdown. Select your model.
That's it. One model. Prove it works before you download ten more.
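If you'd rather script the copy than drag the file, a minimal sketch (helper names are mine; the models/Stable-diffusion folder name matches the guide - in the one-click package it sits under the webui subfolder, so point forge_root there if needed):

```python
import shutil
from pathlib import Path

def checkpoint_dest(src: str, forge_root: str) -> Path:
    """Build the target path for a checkpoint inside Forge's model folder."""
    return Path(forge_root) / "models" / "Stable-diffusion" / Path(src).name

def install_checkpoint(src: str, forge_root: str) -> Path:
    """Copy a .safetensors checkpoint into place and return the new path."""
    dest = checkpoint_dest(src, forge_root)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)  # copy2 preserves timestamps
    return dest
```

After copying, you still need to hit the refresh button next to the checkpoint dropdown in the UI.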
Step 4 - Generate Your First Image
Type a simple prompt. Something like a red fox sitting in snow, detailed fur, natural lighting. Set these values:
- SD1.5: 512x512, 20 steps, DPM++ 2M Karras sampler, CFG 7
- SDXL: 1024x1024, 25 steps, DPM++ 2M Karras sampler, CFG 7
Click Generate. SD1.5 at 512x512 uses ~4 GB VRAM and finishes in 5-15 seconds on an RTX 3060. SDXL at 1024x1024 uses ~8 GB VRAM and takes 15-30 seconds on the same card.
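Those same settings map onto the JSON body of the A1111-style API that Forge inherits, available when you launch with the --api flag. The payload-builder below is a sketch: the field names follow the /sdapi/v1/txt2img schema, the helper is my own, and note that newer builds may split the sampler into separate sampler and scheduler fields:

```python
import json

def txt2img_payload(prompt: str, size: int, steps: int, cfg: float = 7.0,
                    sampler: str = "DPM++ 2M Karras") -> dict:
    """Build a txt2img request body using the guide's square-resolution settings."""
    return {
        "prompt": prompt,
        "width": size,
        "height": size,
        "steps": steps,
        "cfg_scale": cfg,
        "sampler_name": sampler,
    }

# SDXL settings from this guide: 1024x1024, 25 steps, CFG 7.
payload = txt2img_payload("a red fox sitting in snow, detailed fur, natural lighting", 1024, 25)
body = json.dumps(payload)
# POST `body` to http://127.0.0.1:7860/sdapi/v1/txt2img once Forge runs with --api.
```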
Black image? You need the right VAE. Some checkpoints require a separate VAE file in models/VAE/. Check the model's download page for instructions. Don't stack multiple VAEs - use the one the model card specifies.
Step 5 - Fix Memory Errors (If They Happen)
If you get a CUDA out-of-memory error, Forge has built-in VRAM-management options that stock A1111 lacks. In the Forge UI, look for the VRAM management settings:
- "Automatic" mode: Forge's default. It handles offloading between GPU and CPU without flags. This alone lets 6 GB cards run SDXL at reduced resolutions (768x768 or 832x832).
- "Always maximize offload": Drops VRAM usage to under 2 GB but slows generation significantly. Use this on 4 GB cards as a last resort.
If the UI-level options aren't enough, add --medvram to the launch arguments in webui-user.bat. Add one flag at a time. Test after each change.
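For reference, the flag goes on the COMMANDLINE_ARGS line of webui-user.bat; leave the rest of the file as shipped:

```bat
rem webui-user.bat - add flags one at a time and test after each change
set COMMANDLINE_ARGS=--medvram
```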
Verify It Works
Your install is good if all four of these are true:
- Forge console shows Running on local URL: http://127.0.0.1:7860 without CUDA errors
- Your checkpoint appears in the dropdown after refresh
- A test prompt at the model's native resolution produces a visible image (not black, not noise)
- Task Manager shows GPU utilization spiking during generation, not CPU
If any of those fail, don't install extensions or download more models. Fix the base first.
Troubleshooting
- CUDA OOM at recommended resolution: Lower the resolution by one step (1024 to 896, or 512 to 448). If that fixes it, your VRAM is the bottleneck - use Forge's automatic offloading or switch to a smaller model.
- "Torch is not compiled with CUDA enabled": Wrong package. You downloaded a CPU-only build or a mismatched CUDA version. Delete the folder and re-extract webui_forge_cu121_torch231.7z.
- Black images every time: Missing or wrong VAE. Download the VAE file specified on your checkpoint's model card and place it in models/VAE/. Select it in the Forge UI under Settings or the VAE dropdown.
- Antivirus quarantines files: Windows Defender flags Python executables in fresh extractions. Add your Forge folder to the exclusion list: Windows Security, Virus & Threat Protection, Manage Settings, Exclusions.
- Console stuck for 20+ minutes: On first boot, this can be normal on slow connections (Forge downloads ~2 GB of dependencies). If it's genuinely hung, check your internet connection and try again. Don't Ctrl+C and restart repeatedly - that corrupts half-downloaded packages.
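The resolution step-down in the first troubleshooting item can be mechanized. This sketch (my own helper, not a Forge feature) walks a square resolution down in fixed steps, matching the guide's suggested drops of 128 for SDXL and 64 for SD1.5:

```python
def fallback_resolutions(start: int, step: int = 64, floor: int = 448):
    """Yield progressively smaller square resolutions to retry after a CUDA OOM."""
    res = start
    while res >= floor:
        yield res
        res -= step

# SDXL retry ladder per the guide's 1024 -> 896 suggestion.
ladder = list(fallback_resolutions(1024, step=128, floor=768))
print(ladder)  # -> [1024, 896, 768]
```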
What to Do Next
Once your test image works, you have three directions worth exploring:
- Understand the difference: Read the Forge vs AUTOMATIC1111 comparison to know what you gained by choosing Forge.
- Check your hardware ceiling: The hardware requirements guide has real VRAM measurements for every model class.
- Try zero-setup instead: If this install felt like too many moving parts, LocalForge AI packages Forge with the CUDA stack pre-configured - no manual package selection or path management.
Bottom Line
Forge installation is five decisions: which package (cu121), where to put it (short path, SSD), which checkpoint (one), what resolution (match the model), and whether the test image looks right. Everything else - extensions, LoRAs, custom samplers, themes - is noise until those five are settled. Do less. Do it right.
