How to Install Stable Diffusion Forge Locally

Forge is a fork of AUTOMATIC1111 that runs Stable Diffusion and Flux faster on the same GPU. lllyasviel's last personal commit landed in November 2024 - but the install still works fine, and 2026 forks like sd-webui-forge-classic pick up where the original left off. This guide covers the official install paths, the two fixes to apply right after, and when to switch to a fork instead.

About this Use Case

Forge is a local, offline AI image generation tool that is fully open source. It allows unrestricted content generation without filters.

The State of Forge in 2026

Forge's original maintainer (lllyasviel - also behind ControlNet, Fooocus, and IC-Light) hasn't pushed a personal commit since November 1, 2024. Community PRs kept landing through July 2025, but there's been no fresh official release. The README's "Forge Status" table still lists test dates from August/September 2024.

That doesn't mean it's broken. The latest one-click bundle still installs fine in 2026, runs SDXL and Flux NF4 faster than AUTOMATIC1111, and most issues you'll hit have maintainer-written answers in the GitHub Discussions tab. But if you need active 2026 development - Torch 2.8 support, Qwen Image, GGUF - install sd-webui-forge-classic (by Haoming02) instead. It's a drop-in fork of the same UI, ~1.1k stars, ongoing commits.

Pick lllyasviel's original Forge if you want a stable, known-working install for SD 1.5 / SDXL / Flux NF4. Pick Forge Classic if you want the same UI with active 2026 development.

Hardware You'll Actually Need

VRAM is the constraint. Each model class wants different headroom on Forge.

| Model | VRAM minimum | Sweet spot |
| --- | --- | --- |
| SD 1.5 | 2 GB (community-reported) | 6 GB+ |
| SDXL | 4 GB | 8 GB |
| Flux NF4 | 6 GB | 8-12 GB |
| Flux FP8 | 8 GB | 12 GB+ |
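
As a rough illustration, the table can be turned into a lookup that labels each model class for a given card. This is only a sketch: the function name and the three labels are made up for this example, and the sweet spot uses the lower bound of each range.

```python
# VRAM figures (GB) copied from the table above. The SD 1.5 minimum is
# community-reported; "sweet_spot" is the lower bound of each range.
VRAM_TABLE = {
    "SD 1.5": {"minimum": 2, "sweet_spot": 6},
    "SDXL": {"minimum": 4, "sweet_spot": 8},
    "Flux NF4": {"minimum": 6, "sweet_spot": 8},
    "Flux FP8": {"minimum": 8, "sweet_spot": 12},
}


def classify_models(vram_gb: float) -> dict[str, str]:
    """Label each model class as 'comfortable', 'tight', or 'too small'."""
    result = {}
    for model, need in VRAM_TABLE.items():
        if vram_gb >= need["sweet_spot"]:
            result[model] = "comfortable"
        elif vram_gb >= need["minimum"]:
            result[model] = "tight"
        else:
            result[model] = "too small"
    return result


if __name__ == "__main__":
    # Example: an 8 GB card.
    for model, verdict in classify_models(8).items():
        print(f"{model}: {verdict}")
```

On an 8 GB card this labels everything except Flux FP8 as comfortable, which matches the advice later in this guide to prefer NF4 on smaller cards.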

You'll also need 16 GB system RAM (32 GB recommended) and an NVIDIA GPU on the CUDA path. AMD, Intel, and Apple Silicon work via fallbacks but are rougher. The non-negotiable: at least 40 GB of system swap.

That last one isn't optional. The Forge maintainer's own troubleshooting thread traces roughly fourteen unrelated-looking errors back to insufficient swap. Set the Windows pagefile or Linux swap to ≥ 40 GB and put it on an SSD, not an HDD.
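
On Linux, one way to confirm the swap size before launching is to read SwapTotal from /proc/meminfo. The sketch below assumes a Linux host; the function names are illustrative, and on Windows the pagefile size has to be checked in the system settings instead.

```python
from pathlib import Path

REQUIRED_SWAP_GB = 40  # threshold taken from this guide


def swap_total_gb(meminfo_text: str) -> float:
    """Return SwapTotal from /proc/meminfo-style text, in GiB."""
    for line in meminfo_text.splitlines():
        if line.startswith("SwapTotal:"):
            kib = int(line.split()[1])  # the kernel reports this value in kB
            return kib / (1024 ** 2)
    raise ValueError("SwapTotal not found")


def has_enough_swap(meminfo_text: str, required_gb: float = REQUIRED_SWAP_GB) -> bool:
    # Round to one decimal: mkswap keeps one page for its header, so a 40G
    # swap file reports a few kB under 40 GiB.
    return round(swap_total_gb(meminfo_text), 1) >= required_gb


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        text = meminfo.read_text()
        print(f"swap: {swap_total_gb(text):.1f} GiB, enough: {has_enough_swap(text)}")
    else:
        print("No /proc/meminfo here; check the pagefile size in Windows settings.")
```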

Disk: 20 GB for the install itself, plus 6-17 GB per checkpoint (Flux dev FP8 alone is ~17 GB).
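
To see what that adds up to for a specific plan, a small estimate helps. This is a minimal sketch using the figures from this guide; the helper names are hypothetical and the checkpoint sizes in the example are approximate.

```python
import shutil

INSTALL_GB = 20  # base install size quoted in this guide


def required_disk_gb(checkpoint_sizes_gb: list[float]) -> float:
    """Base install plus every checkpoint you plan to keep."""
    return INSTALL_GB + sum(checkpoint_sizes_gb)


def free_disk_gb(path: str = ".") -> float:
    """Free space on the drive holding `path`, in GB (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9


if __name__ == "__main__":
    # Example plan: one ~6.5 GB SDXL checkpoint plus the ~17 GB Flux dev FP8 file.
    needed = required_disk_gb([6.5, 17])
    print(f"need ~{needed:.1f} GB, have {free_disk_gb():.1f} GB free")
```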

Install: Windows One-Click (Recommended)

Use this if you have an NVIDIA GPU and don't want to mess with Python versions:

  1. Install 7-Zip from 7-zip.org. The one-click bundle ships as a .7z archive and Windows can't extract that natively.
  2. Download webui_forge_cu121_torch231.7z from the Forge releases page. This is the CUDA 12.1 + PyTorch 2.3.1 bundle, which the README explicitly marks as Recommended. Skip the cu124 / Torch 2.4 bundle unless you know you need it - the README warns its xformers and MSVC support are unstable.
  3. Extract to a short path close to the drive root - C:\AI\forge\ works, C:\Users\Your Name\Desktop\AI Stuff\Forge\ will quietly break the bundled venv.
  4. Run update.bat first. The maintainer's README says this is required, not optional. The bundles can be weeks or months stale at any given moment, and the stale snapshots have known bugs.
  5. Drop your .safetensors checkpoints into webui\models\Stable-diffusion\. Forge ships with no models - at minimum, grab one SDXL checkpoint from CivitAI (Juggernaut XL is a fine starting point, ~6.5 GB).
  6. Run run.bat. Forge launches a Gradio server on http://127.0.0.1:7860 (or :7861 if 7860 is already taken).

If run.bat fails or the console flashes and closes immediately, jump to "Two Things to Do Right After Install" - it's almost always the system swap.
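
The path advice in step 3 can be checked before extracting. The guide does not say which property of a path breaks the bundled venv, so this sketch tests two plausible ones, spaces and length. The 40-character cutoff is an assumption chosen for illustration, not a documented limit.

```python
from pathlib import PureWindowsPath

MAX_PATH_CHARS = 40  # hypothetical cutoff for "short"; the guide gives no number


def install_path_problems(path: str) -> list[str]:
    """Return reasons a Windows path looks risky for the one-click bundle."""
    problems = []
    parts = PureWindowsPath(path).parts
    if any(" " in part for part in parts):
        problems.append("contains spaces")
    if len(path) > MAX_PATH_CHARS:
        problems.append(f"longer than {MAX_PATH_CHARS} characters")
    return problems


if __name__ == "__main__":
    for candidate in (r"C:\AI\forge", r"C:\Users\Your Name\Desktop\AI Stuff\Forge"):
        print(candidate, "->", install_path_problems(candidate) or "looks fine")
```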

Install: Git Clone (Windows / Linux / macOS)

Use this if you want to reuse an existing Python install, run on Linux/macOS, or pin specific versions:

  1. Install Python 3.10.6 (not 3.11+ - see "Common First-Run Errors"). On Windows, download it from python.org and check "Add Python to PATH". On macOS, run brew install python@3.10. On Linux, install Python 3.10 through your package manager.

  2. Install Git from git-scm.com if you don't have it. Default install options are fine.

  3. Clone the repo:

    git clone https://github.com/lllyasviel/stable-diffusion-webui-forge
    cd stable-diffusion-webui-forge
    
  4. Launch - the first launch creates venv/ and downloads PyTorch + dependencies (10-15 minutes on a decent connection):

     • Windows: double-click webui-user.bat
     • Linux/macOS: ./webui.sh

  5. Drop checkpoints into models/Stable-diffusion/, then open http://127.0.0.1:7860.

On Linux specifically, the Gradio binding sometimes lands on :7861 instead of :7860. Watch the terminal output.
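
If the terminal output has scrolled away, the two ports can be probed directly. The sketch below only checks whether something accepts TCP connections on 7860 or 7861; it cannot tell whether that something is Forge. The function name is illustrative.

```python
import socket

CANDIDATE_PORTS = (7860, 7861)  # the two ports mentioned in this guide


def find_forge_port(host: str = "127.0.0.1", ports=CANDIDATE_PORTS, timeout: float = 0.5):
    """Return the first port accepting TCP connections, or None."""
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return port
        except OSError:
            continue  # nothing listening on this port; try the next one
    return None


if __name__ == "__main__":
    port = find_forge_port()
    print(f"http://127.0.0.1:{port}" if port else "Nothing is listening yet")
```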

Two Things to Do Right After Install

These two steps fix more than half of the issues people post about:

  1. Set system swap to at least 40 GB. On Windows: System → About → Advanced system settings → Performance Settings → Advanced → Virtual Memory. Uncheck "Automatically manage," and set a custom size of 40000 MB minimum / 60000 MB maximum on an SSD drive. On Linux: sudo fallocate -l 40G /swapfile && sudo chmod 600 /swapfile && sudo mkswap /swapfile && sudo swapon /swapfile (add the swap file to /etc/fstab if it should survive a reboot). The maintainer's troubleshooting thread maps roughly fourteen apparently-unrelated errors back to this single root cause. The symptoms: "Connection errored out," "Press any key to continue," Killed, Aborted, segfaults, RuntimeError: CPUAllocator, and the console flashing closed.
  2. Run update.bat once a week (on a git-clone install, run git pull inside the repo folder instead). Community patches kept landing after lllyasviel went quiet, and updating is the only way to pick them up.

Adding Flux NF4 Support

Flux is the main reason most people switch from A1111 to Forge. The setup is short:

  1. Confirm your GPU supports CUDA 11.7+ (every RTX 30xx/40xx does). On older 10xx/20xx-series cards, NF4 may not be supported - use FP8 instead.
  2. Download flux1-dev-bnb-nf4-v2.safetensors (12 GB) from huggingface.co/lllyasviel/flux1-dev-bnb-nf4 into models/Stable-diffusion/.
  3. Select it from the checkpoint dropdown. Forge auto-detects the NF4 format.
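  4. Set these in the UI before generating: CFG = 1.0, Distilled CFG = 3.5, Sampler = Euler, Schedule = Simple, Steps = 20. Leave CFG at 1: Flux dev is a distilled model that takes its guidance from Distilled CFG, and at CFG = 1 the negative prompt is ignored. Raising CFG above 1 turns the negative prompt back on but adds a second model pass per step, so generation gets roughly twice as slow.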
  5. Touch the GPU Weight slider carefully. Set it conservatively first, then nudge up if you have headroom. A too-high GPU Weight is the #1 cause of "why is Flux suddenly 10x slower on my machine." The maintainer says this single setting resolves 99% of Flux performance complaints.
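
The starting values from step 4 can be kept as a small reference dict and compared against whatever is currently set. This is a sketch for note-keeping only: the key names are invented for this example and are not Forge API fields.

```python
# Recommended starting values from step 4 above. The key names are
# illustrative only; they are not Forge API fields.
FLUX_NF4_DEFAULTS = {
    "cfg": 1.0,
    "distilled_cfg": 3.5,
    "sampler": "Euler",
    "schedule": "Simple",
    "steps": 20,
}


def flux_setting_warnings(settings: dict) -> list[str]:
    """Compare a settings dict with the defaults and describe each difference."""
    warnings = []
    for key, expected in FLUX_NF4_DEFAULTS.items():
        actual = settings.get(key)
        if actual != expected:
            warnings.append(f"{key}: expected {expected!r}, got {actual!r}")
    return warnings


if __name__ == "__main__":
    # Example: CFG left at a typical SDXL value by mistake.
    mine = {**FLUX_NF4_DEFAULTS, "cfg": 7.0}
    print(flux_setting_warnings(mine))
```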

On the same hardware, NF4 generates ~3.86× faster than FP8 (8 GB 3070 Ti laptop test from the maintainer: 8.3 s/it FP8 → 2.15 s/it NF4). Prefer NF4 on any card with ≤ 12 GB VRAM.
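
The arithmetic behind that figure, worked through for a 20-step image. The per-iteration times are the maintainer's numbers quoted above; the totals ignore model loading and VAE decoding, so real wall-clock times will be somewhat higher.

```python
FP8_SECONDS_PER_IT = 8.3    # maintainer's figure quoted above
NF4_SECONDS_PER_IT = 2.15   # maintainer's figure quoted above
STEPS = 20                  # step count recommended in this guide

speedup = FP8_SECONDS_PER_IT / NF4_SECONDS_PER_IT
fp8_total = FP8_SECONDS_PER_IT * STEPS
nf4_total = NF4_SECONDS_PER_IT * STEPS

print(f"speedup: {speedup:.2f}x")
print(f"{STEPS} steps: FP8 {fp8_total:.0f} s vs NF4 {nf4_total:.0f} s")
```

This prints a 3.86x speedup, or about 166 seconds against 43 seconds of sampling per image.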

Common First-Run Errors

Almost all of these are documented in maintainer-written GitHub Discussions:

| Error | Cause | Fix |
| --- | --- | --- |
| Torch not compiled with CUDA enabled | CPU-only PyTorch installed by hand | Use the one-click bundle, or pip install -r requirements.txt only - don't pip-install torch separately |
| subprocess-exited-with-error on first launch | Python 3.11+ in PATH | Install Python 3.10.6 and put it first in PATH |
| "Connection errored out" / Killed / Aborted / segfault / RuntimeError: CPUAllocator | System swap < 40 GB | Set swap to ≥ 40 GB on an SSD |
| MetadataIncompleteBuffer / PytorchStreamReader failed | Corrupted checkpoint download | Re-download the model and pip uninstall safetensors && pip install safetensors |
| SSL: CERTIFICATE_VERIFY_FAILED | VPN, proxy, or restricted network | Disable VPN/proxy or download models manually outside the venv |
| Forge can't find models in another folder | Forge only reads models/Stable-diffusion/ by default | Add --ckpt-dir, --lora-dir, --embeddings-dir to COMMANDLINE_ARGS= in webui-user.bat (forward slashes; quote paths with spaces) |
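
For long console logs, the table can be applied mechanically by searching for the distinctive error strings. This is a rough sketch, not a Forge tool: the function name is made up, and generic words such as Killed or Aborted are left out on purpose because they would match too much unrelated output.

```python
# Error substring -> suggested fix, taken from the table above.
KNOWN_ERRORS = {
    "Torch not compiled with CUDA enabled": "Use the one-click bundle; don't pip-install torch separately",
    "subprocess-exited-with-error": "Install Python 3.10.6 and put it first in PATH",
    "Connection errored out": "Set swap to >= 40 GB on an SSD",
    "CPUAllocator": "Set swap to >= 40 GB on an SSD",
    "MetadataIncompleteBuffer": "Re-download the model and reinstall safetensors",
    "PytorchStreamReader failed": "Re-download the model and reinstall safetensors",
    "CERTIFICATE_VERIFY_FAILED": "Disable VPN/proxy or download models manually",
}


def suggest_fixes(log_text: str) -> list[str]:
    """Return the distinct fixes whose error substring appears in the log."""
    fixes = []
    for needle, fix in KNOWN_ERRORS.items():
        if needle in log_text and fix not in fixes:
            fixes.append(fix)
    return fixes


if __name__ == "__main__":
    print(suggest_fixes("RuntimeError: CPUAllocator failed to allocate memory"))
```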

Reusing Your AUTOMATIC1111 Models

If you already have an A1111 install, don't duplicate gigabytes of checkpoints. Edit webui-user.bat and add:

set COMMANDLINE_ARGS=--forge-ref-a1111-home "C:\path\to\stable-diffusion-webui"

Forge will then read your A1111 checkpoints, VAEs, LoRAs, embeddings, and ControlNet models in place. Extensions are not shared this way and need to be installed in Forge separately. It's the lowest-friction A1111 → Forge migration.
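
When several folders are involved, the COMMANDLINE_ARGS line is easy to get wrong by hand. The sketch below assembles it from a flag-to-path mapping, following the quoting advice in the errors table (forward slashes, quotes around paths with spaces). The helper names and example paths are placeholders; the flags are the ones named in this guide.

```python
def quote_arg(path: str) -> str:
    """Use forward slashes and wrap the path in quotes if it contains spaces."""
    path = path.replace("\\", "/")
    return f'"{path}"' if " " in path else path


def commandline_args_line(flags: dict[str, str]) -> str:
    """Build the `set COMMANDLINE_ARGS=...` line for webui-user.bat."""
    parts = [f"{flag} {quote_arg(path)}" for flag, path in flags.items()]
    return "set COMMANDLINE_ARGS=" + " ".join(parts)


if __name__ == "__main__":
    # Placeholder paths; substitute your own folders.
    print(commandline_args_line({
        "--ckpt-dir": "D:/models/checkpoints",
        "--lora-dir": "D:/My Models/lora",
    }))
```

Paste the printed line into webui-user.bat in place of the existing COMMANDLINE_ARGS line.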

If You'd Rather Use a Fork (or Skip Setup Entirely)

A few options once the maintenance reality of vanilla Forge becomes a problem:

  • sd-webui-forge-classic (Haoming02) - the actively-maintained 2026 continuation. Same one-click pattern, supports Torch 2.8, CUDA 12.9, and Flash Attention. ~1.1k stars, ongoing commits.
  • Forge Neo (also Haoming02) - newer fork on top of Forge Classic with Qwen Image, Nunchaku, and GGUF. Bleeding-edge; expect rough edges.
  • AUTOMATIC1111 SD-WebUI - the original. Slower than Forge on low-VRAM GPUs (Forge advertises +75% on 6 GB, +45% on 8 GB) and has no first-class Flux NF4 support. Hard to recommend in 2026 - Forge covers the same workflow and adds Flux support on top.
  • ComfyUI - node-based, more flexible for Flux pipelines, but a steeper learning curve. See ComfyUI for local install.
  • LocalForge AI - Forge pre-configured with checkpoints and Flux NF4 included, no Python or Git required. One option if you'd rather skip every step on this page.

Bottom Line

The original Forge install still works in 2026, but you're installing a frozen-in-time codebase. Run the one-click bundle, set 40 GB of swap, run update.bat first, and you'll be generating SDXL or Flux NF4 in under half an hour. If you need active 2026 development - new model architectures, Torch 2.8 - install Forge Classic instead.

About Forge

Runs Locally: Yes
Open Source: Yes
NSFW Allowed: Yes
Website: https://github.com/lllyasviel/stable-diffusion-webui-forge

Frequently Asked Questions

Is Forge still being maintained in 2026?
lllyasviel - the original author - hasn't pushed a personal commit since November 2024, and the README's status table still shows 2024 dates. Community PRs kept the original repo alive through mid-2025, and the actively-maintained 2026 fork is sd-webui-forge-classic by Haoming02. The vanilla install still works; for new features, use the fork.
Why does my Forge install crash with 'Connection errored out' or 'Press any key to continue'?
Insufficient system swap. The maintainer traces around fourteen apparently-unrelated symptoms (segfaults, Killed, Aborted, console flashing closed) all to the same root cause. Set Windows pagefile or Linux swap to at least 40 GB on an SSD, then re-run.
Should I install Forge or Forge Classic?
Install vanilla Forge if you want a known-working SD 1.5 / SDXL / Flux NF4 setup with the most documented troubleshooting. Install Forge Classic if you need 2026 features - Torch 2.8, CUDA 12.9, Flash Attention, newer model architectures. Forge Classic is a UI-compatible drop-in, so models and settings transfer.
Can I share my AUTOMATIC1111 models with Forge instead of duplicating them?
Yes. Add `set COMMANDLINE_ARGS=--forge-ref-a1111-home "C:\path\to\stable-diffusion-webui"` to `webui-user.bat`, and Forge will read your A1111 checkpoints, VAEs, LoRAs, embeddings, and ControlNet models in place. Extensions still need to be installed in Forge separately. It's the lowest-effort migration path.
Which CUDA bundle should I download?
`webui_forge_cu121_torch231.7z` - CUDA 12.1 with PyTorch 2.3.1. The README explicitly marks it Recommended. The cu124 / Torch 2.4 bundle is slightly faster but the README warns xformers and MSVC may not work on it; skip it unless you know what you're doing.
Why is Flux suddenly slower than it was yesterday?
GPU Weight slider too high. When the model gets evicted from VRAM mid-generation, generation time can grow 5-10x. The maintainer says lowering this single setting resolves 99% of Flux performance complaints. Set it conservatively first, then increase if you have headroom.
