How to Install Stable Diffusion Forge Locally
Forge is a fork of AUTOMATIC1111 that runs Stable Diffusion and Flux faster on the same GPU. lllyasviel's last personal commit landed in November 2024 - but the install still works fine, and 2026 forks like sd-webui-forge-classic pick up where the original left off. This guide covers the official install paths, the two fixes to apply right after, and when to switch to a fork instead.
About this Use Case
Forge is a local, offline AI image generation tool that is fully open source. It allows unrestricted content generation without filters.
The State of Forge in 2026
Forge's original maintainer (lllyasviel - also behind ControlNet, Fooocus, and IC-Light) hasn't pushed a personal commit since November 1, 2024. Community PRs kept landing through July 2025, but there's been no fresh official release. The README's "Forge Status" table still lists test dates from August/September 2024.
That doesn't mean it's broken. The latest one-click bundle still installs fine in 2026, runs SDXL and Flux NF4 faster than AUTOMATIC1111, and most issues you'll hit have maintainer-written answers in the GitHub Discussions tab. But if you need active 2026 development - Torch 2.8 support, Qwen Image, GGUF - install sd-webui-forge-classic (by Haoming02) instead. It's a drop-in fork of the same UI, ~1.1k stars, ongoing commits.
Pick lllyasviel's original Forge if you want a stable, known-working install for SD 1.5 / SDXL / Flux NF4. Pick Forge Classic if you want the same UI with active 2026 development.
Hardware You'll Actually Need
VRAM is the constraint. Each model class wants different headroom on Forge.
| Model | VRAM minimum | Sweet spot |
|---|---|---|
| SD 1.5 | 2 GB (community-reported) | 6 GB+ |
| SDXL | 4 GB | 8 GB |
| Flux NF4 | 6 GB | 8-12 GB |
| Flux FP8 | 8 GB | 12 GB+ |
You'll also need 16 GB system RAM (32 GB recommended) and an NVIDIA GPU on the CUDA path. AMD, Intel, and Apple Silicon work via fallbacks but are rougher. The non-negotiable: at least 40 GB of system swap.
The Forge maintainer's own troubleshooting thread traces roughly fourteen unrelated-looking errors back to insufficient swap. Set the Windows pagefile or Linux swap to ≥ 40 GB and put it on an SSD, not an HDD.
Disk: 20 GB for the install itself, plus 6-17 GB per checkpoint (Flux dev FP8 alone is ~17 GB).
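The VRAM table and the disk figures above can be turned into a quick pre-flight check. The sketch below is only an illustration: the function names are made up for this example, and the thresholds are simply the "VRAM minimum" column and the 20 GB install figure quoted in this guide, not values read from Forge itself.

```python
# Minimum VRAM (GB) per model class, copied from the table above.
VRAM_MINIMUM_GB = {
    "SD 1.5": 2,
    "SDXL": 4,
    "Flux NF4": 6,
    "Flux FP8": 8,
}

INSTALL_GB = 20  # disk space for the Forge install itself, per this guide


def runnable_models(vram_gb: float) -> list[str]:
    """Return the model classes whose table minimum fits in vram_gb."""
    return [name for name, need in VRAM_MINIMUM_GB.items() if vram_gb >= need]


def disk_budget_gb(checkpoint_sizes_gb: list[float]) -> float:
    """Install size plus the sum of the checkpoints you plan to keep."""
    return INSTALL_GB + sum(checkpoint_sizes_gb)


if __name__ == "__main__":
    # A 6 GB card, one ~6.5 GB SDXL checkpoint and one ~17 GB Flux FP8 checkpoint.
    print(runnable_models(6))
    print(disk_budget_gb([6.5, 17]))
```

Meeting the minimum only means the model should load; the "Sweet spot" column is the better target if you want comfortable generation speed.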
Install: Windows One-Click (Recommended)
Use this if you have an NVIDIA GPU and don't want to mess with Python versions:
- Install 7-Zip from 7-zip.org. The one-click bundle ships as a `.7z` archive and Windows can't extract that natively.
- Download `webui_forge_cu121_torch231.7z` from the Forge releases page. This is the CUDA 12.1 + PyTorch 2.3.1 bundle, which the README explicitly marks as Recommended. Skip the cu124 / Torch 2.4 bundle unless you know you need it - the README warns its xformers and MSVC support are unstable.
- Extract to a short path close to the drive root - `C:\AI\forge\` works, `C:\Users\Your Name\Desktop\AI Stuff\Forge\` will quietly break the bundled venv.
- Run `update.bat` first. The maintainer's README says this is required, not optional. The bundles can be weeks or months stale at any given moment, and the stale snapshots have known bugs.
- Drop your `.safetensors` checkpoints into `webui\models\Stable-diffusion\`. Forge ships with no models - at minimum, grab one SDXL checkpoint from CivitAI (Juggernaut XL is a fine starting point, ~6.5 GB).
- Run `run.bat`. Forge launches a Gradio server on `http://127.0.0.1:7860` (or `:7861` if 7860 is already taken).
If run.bat fails or the console flashes and closes immediately, jump to "Two Things to Do Right After Install" - it's almost always the system swap.
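The "short path close to the drive root" advice in the steps above can be checked before you extract anything. This is a rough heuristic, not a rule taken from Forge: the function name, the "no spaces" test, and the default limit of two folders below the drive root are all assumptions chosen to match the two example paths in the list.

```python
from pathlib import PureWindowsPath


def path_looks_safe(path: str, max_depth: int = 2) -> bool:
    """Heuristic only: no spaces, and at most max_depth folders below the drive root."""
    p = PureWindowsPath(path)
    # Drop the drive/root part ("C:\") and count the remaining folders.
    folders = [part for part in p.parts if part != p.anchor]
    return " " not in path and len(folders) <= max_depth
```

With these assumptions, `path_looks_safe(r"C:\AI\forge")` passes and `path_looks_safe(r"C:\Users\Your Name\Desktop\AI Stuff\Forge")` fails. `PureWindowsPath` parses Windows paths on any OS, so the check can be run from anywhere.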
Install: Git Clone (Windows / Linux / macOS)
Use this if you want to reuse an existing Python install, run on Linux/macOS, or pin specific versions:
- Install Python 3.10.6 (not 3.11+ - see "Common First-Run Errors"). On Windows, grab it from python.org and check "Add Python to PATH". On macOS, `brew install python@3.10`. On Linux, use your package manager.
- Install Git from git-scm.com if you don't have it. Default install options are fine.
- Clone the repo and enter it: `git clone https://github.com/lllyasviel/stable-diffusion-webui-forge`, then `cd stable-diffusion-webui-forge`.
- Launch - first launch creates `venv/` and downloads PyTorch + dependencies (10-15 minutes on a decent connection):
  - Windows: double-click `webui-user.bat`
  - Linux/macOS: `./webui.sh`
- Drop checkpoints into `models/Stable-diffusion/`, then open `http://127.0.0.1:7860`.
On Linux specifically, the Gradio binding sometimes lands on :7861 instead of :7860. Watch the terminal output.
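Two of the steps above are easy to verify from a script: whether the interpreter is in the 3.10 series, and which of the two ports the server ended up on. The sketch below uses only the standard library. Both function names are invented for this example, and accepting any 3.10.x release (rather than exactly 3.10.6) is an assumption.

```python
import socket
import sys


def python_version_ok(version=None) -> bool:
    """True for any Python 3.10.x, the series this guide installs (3.10.6)."""
    major, minor = (version or sys.version_info)[:2]
    return (major, minor) == (3, 10)


def first_listening_port(host="127.0.0.1", ports=(7860, 7861), timeout=0.5):
    """Return the first port in `ports` that accepts a TCP connection, else None."""
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return port
        except OSError:
            # Nothing is listening here (or the connection timed out): try the next port.
            continue
    return None
```

Run `python_version_ok()` before the first launch and `first_listening_port()` once the terminal says the server is up. A result of `None` means nothing answered on either port; it does not tell you why.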
Two Things to Do Right After Install
These two steps fix more than half of the issues people post about:
- Set system swap to at least 40 GB. On Windows: System → About → Advanced system settings → Performance Settings → Advanced → Virtual Memory. Uncheck "Automatically manage," and set a custom size of 40000 MB minimum / 60000 MB maximum on an SSD drive. On Linux: `sudo fallocate -l 40G /swapfile && sudo chmod 600 /swapfile && sudo mkswap /swapfile && sudo swapon /swapfile`. The swap file is active only until the next reboot unless you also add it to `/etc/fstab`. The maintainer's troubleshooting thread maps roughly fourteen apparently-unrelated errors back to this single root cause. The symptoms: "Connection errored out," "Press any key to continue," `Killed`, `Aborted`, segfaults, `RuntimeError: CPUAllocator`, and the console flashing closed.
- Run `update.bat` once a week. Community-maintained branches still get patches even though lllyasviel is quiet. Running update is the only way you pick them up.
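On Linux you can confirm the swap setting took effect by reading the `SwapTotal` line of `/proc/meminfo`. The following is a minimal sketch with hypothetical function names. It treats "40 GB" as decimal gigabytes, which is an assumption made so that a 40 GiB swap file (which reports slightly under 40 GiB once the swap header is subtracted) still passes.

```python
def swap_total_gb(meminfo_text: str) -> float:
    """Parse the SwapTotal line of /proc/meminfo into decimal gigabytes."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            kib = int(line.split()[1])   # the value is labeled "kB" but counts 1024-byte units
            return kib * 1024 / 1e9
    raise ValueError("SwapTotal line not found")


def swap_is_enough(meminfo_text: str, minimum_gb: float = 40.0) -> bool:
    """Compare total swap against the 40 GB floor recommended in this guide."""
    return swap_total_gb(meminfo_text) >= minimum_gb
```

To use it, pass in the file contents, for example `swap_is_enough(open("/proc/meminfo").read())`. On Windows there is no `/proc/meminfo`; check the Virtual Memory dialog instead.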
Adding Flux NF4 Support
Flux is the main reason most people switch from A1111 to Forge. The setup is short:
- Confirm your GPU supports CUDA 11.7+ (every RTX 30xx/40xx does). On GTX 10xx/20xx, NF4 isn't available - use FP8 instead.
- Download `flux1-dev-bnb-nf4-v2.safetensors` (12 GB) from huggingface.co/lllyasviel/flux1-dev-bnb-nf4 into `models/Stable-diffusion/`.
- Select it from the checkpoint dropdown. Forge auto-detects the NF4 format.
- Set these in the UI before generating: CFG = 1.0, Distilled CFG = 3.5, Sampler = Euler, Schedule = Simple, Steps = 20. Raising CFG above 1 won't help - Flux dev is a distilled model that takes its guidance from Distilled CFG instead, and at CFG = 1 the negative prompt has no effect, so leave it empty.
- Touch the GPU Weight slider carefully. Set it conservatively first, then nudge up if you have headroom. A too-high GPU Weight is the #1 cause of "why is Flux suddenly 10x slower on my machine." The maintainer says this single setting resolves 99% of Flux performance complaints.
On the same hardware, NF4 generates ~3.86× faster than FP8 (8 GB 3070 Ti laptop test from the maintainer: 8.3 s/it FP8 → 2.15 s/it NF4). Prefer NF4 on any card with ≤ 12 GB VRAM.
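The speedup figure is just the ratio of the two per-iteration times, and it translates directly into wall-clock time per image. A small worked example follows, assuming the time per iteration stays constant across steps and ignoring model loading and VAE decoding (both are simplifications of this sketch, not measurements).

```python
FP8_S_PER_IT = 8.3    # maintainer's figure, 8 GB 3070 Ti laptop, FP8
NF4_S_PER_IT = 2.15   # same test, NF4
STEPS = 20            # step count suggested in the settings above


def total_seconds(s_per_it: float, steps: int = STEPS) -> float:
    """Sampling time for one image if every step takes s_per_it seconds."""
    return s_per_it * steps


speedup = FP8_S_PER_IT / NF4_S_PER_IT

if __name__ == "__main__":
    print(f"speedup: {speedup:.2f}x")
    print(f"FP8: {total_seconds(FP8_S_PER_IT):.0f} s per image")
    print(f"NF4: {total_seconds(NF4_S_PER_IT):.0f} s per image")
```

At 20 steps that is roughly 166 seconds of sampling per image on FP8 versus roughly 43 seconds on NF4 for that particular laptop.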
Common First-Run Errors
Almost all of these are documented in maintainer-written GitHub Discussions:
| Error | Cause | Fix |
|---|---|---|
| `Torch not compiled with CUDA enabled` | CPU-only PyTorch installed by hand | Use the one-click bundle, or `pip install -r requirements.txt` only - don't pip-install torch separately |
| `subprocess-exited-with-error` on first launch | Python 3.11+ in PATH | Install Python 3.10.6 and put it first in PATH |
| "Connection errored out" / `Killed` / `Aborted` / segfault / `RuntimeError: CPUAllocator` | System swap < 40 GB | Set swap to ≥ 40 GB on an SSD |
| `MetadataIncompleteBuffer` / `PytorchStreamReader failed` | Corrupted checkpoint download | Re-download the model, then run `pip uninstall safetensors && pip install safetensors` |
| `SSL: CERTIFICATE_VERIFY_FAILED` | VPN, proxy, or restricted network | Disable VPN/proxy or download models manually outside the venv |
| Forge can't find models in another folder | Forge only reads `models/Stable-diffusion/` by default | Add `--ckpt-dir`, `--lora-dir`, `--embeddings-dir` to `COMMANDLINE_ARGS=` in `webui-user.bat` (forward slashes; quote paths with spaces) |
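If you save the console output to a file, the table above can be applied mechanically. The sketch below is a plain substring matcher: the rule list is paraphrased from the table, the function name is hypothetical, and short signatures such as `Killed` or `Aborted` can produce false positives, so treat a hit as a hint rather than a diagnosis.

```python
# (signatures to look for, likely cause, suggested fix), paraphrased from the table above.
RULES = [
    (("Torch not compiled with CUDA enabled",),
     "CPU-only PyTorch installed by hand",
     "Use the one-click bundle; do not pip-install torch separately"),
    (("subprocess-exited-with-error",),
     "Python 3.11+ in PATH",
     "Install Python 3.10.6 and put it first in PATH"),
    (("Connection errored out", "Killed", "Aborted", "RuntimeError: CPUAllocator"),
     "System swap < 40 GB",
     "Set swap to >= 40 GB on an SSD"),
    (("MetadataIncompleteBuffer", "PytorchStreamReader failed"),
     "Corrupted checkpoint download",
     "Re-download the model and reinstall safetensors"),
    (("CERTIFICATE_VERIFY_FAILED",),
     "VPN, proxy, or restricted network",
     "Disable VPN/proxy or download models manually"),
]


def triage(log_text: str) -> list[tuple[str, str]]:
    """Return (cause, fix) pairs for every rule with a signature found in log_text."""
    hits = []
    for signatures, cause, fix in RULES:
        if any(sig in log_text for sig in signatures):
            hits.append((cause, fix))
    return hits
```

An empty result only means none of the listed signatures appeared; the GitHub Discussions tab is still the place to look for anything else.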
Reusing Your AUTOMATIC1111 Models
If you already have an A1111 install, don't duplicate gigabytes of checkpoints. Edit webui-user.bat and add:
set COMMANDLINE_ARGS=--forge-ref-a1111-home "C:\path\to\stable-diffusion-webui"
Forge will transparently see your A1111 checkpoints, LoRAs, embeddings, and extensions. It's the lowest-friction A1111 → Forge migration.
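Getting the quoting right in `webui-user.bat` is the usual stumbling block, so here is a small helper that builds the line for you. It is a sketch, not part of Forge: the function names are hypothetical, it only knows the four flags named in this guide, and it applies the "forward slashes, quoted paths" advice to every path, which is a choice made for this example.

```python
def normalize(path: str) -> str:
    """Use forward slashes and always wrap the path in double quotes."""
    return '"' + path.replace("\\", "/") + '"'


def commandline_args_line(a1111_home=None, ckpt_dir=None,
                          lora_dir=None, embeddings_dir=None) -> str:
    """Build the `set COMMANDLINE_ARGS=...` line for webui-user.bat."""
    pairs = [
        ("--forge-ref-a1111-home", a1111_home),
        ("--ckpt-dir", ckpt_dir),
        ("--lora-dir", lora_dir),
        ("--embeddings-dir", embeddings_dir),
    ]
    # Skip any flag whose path was not supplied.
    args = [f"{flag} {normalize(value)}" for flag, value in pairs if value]
    return "set COMMANDLINE_ARGS=" + " ".join(args)


if __name__ == "__main__":
    print(commandline_args_line(a1111_home=r"C:\path\to\stable-diffusion-webui"))
```

Paste the printed line into `webui-user.bat` in place of the existing `set COMMANDLINE_ARGS=` line, replacing the placeholder path with your own.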
If You'd Rather Use a Fork (or Skip Setup Entirely)
A few options once the maintenance reality of vanilla Forge becomes a problem:
- sd-webui-forge-classic (Haoming02) - the actively-maintained 2026 continuation. Same one-click pattern, supports Torch 2.8, CUDA 12.9, and Flash Attention. ~1.1k stars, ongoing commits.
- Forge Neo (also Haoming02) - newer fork on top of Forge Classic with Qwen Image, Nunchaku, and GGUF. Bleeding-edge; expect rough edges.
- AUTOMATIC1111 SD-WebUI - the original. Slower than Forge on low-VRAM GPUs (Forge advertises +75% on 6 GB, +45% on 8 GB) and has no first-class Flux NF4 support. Not recommended in 2026 - Forge is a strict superset.
- ComfyUI - node-based, more flexible for Flux pipelines, but a steeper learning curve. See ComfyUI for local install.
- LocalForge AI - Forge pre-configured with checkpoints and Flux NF4 included, no Python or Git required. One option if you'd rather skip every step on this page.
Bottom Line
The original Forge install still works in 2026, but you're installing a frozen-in-time codebase. Run the one-click bundle, set 40 GB of swap, run update.bat first, and you'll be generating SDXL or Flux NF4 in under half an hour. If you need active 2026 development - new model architectures, Torch 2.8 - install Forge Classic instead.
About Forge
| Property | Value |
|---|---|
| Runs Locally | Yes |
| Open Source | Yes |
| NSFW Allowed | Yes |
| Website | https://github.com/lllyasviel/stable-diffusion-webui-forge |
