Install Stable Diffusion WebUI Forge - 5 Steps, Nothing Extra
Five steps. That's all Forge needs. Download the right package, extract it, let first boot finish, drop one checkpoint in, generate one image. Everything after that is optional until you prove the base works.
Forge (lllyasviel/stable-diffusion-webui-forge on GitHub, 12.5k+ stars) is a fork of the AUTOMATIC1111 WebUI rebuilt around better VRAM management. It runs SD1.5, SDXL, and Flux models through the same Gradio interface you already know - but uses less memory doing it. The recommended one-click package for Windows is webui_forge_cu121_torch231.7z (CUDA 12.1 + PyTorch 2.3.1). That's the one with 1.3 million downloads and the fewest compatibility reports.
This guide is intentionally short. No extension shopping. No theme tweaks. No "bookmark these 12 docs" filler. You'll confirm your GPU, install Forge, and generate a test image. If that works, everything else can wait.
The Quick Answer
Key Takeaway - May 2026
Download webui_forge_cu121_torch231.7z from the Forge GitHub releases page. Extract to a short path like C:\sd-forge\. Run run.bat. Wait 5-10 minutes on an SSD for first boot. Drop one .safetensors checkpoint into models/Stable-diffusion/. Generate at 512x512 (SD1.5) or 1024x1024 (SDXL). Done. Total disk budget: ~25 GB for Forge plus one SDXL model.
What You Need
- GPU: NVIDIA RTX 2060 or newer with 6+ GB VRAM. 8 GB is the comfort zone for SDXL. 4 GB works for SD1.5 only.
- RAM: 16 GB minimum. 32 GB stops Windows from paging when the browser and Forge compete for memory.
- Disk: 25-30 GB free on an SSD. Forge's Python environment takes ~6 GB. One SDXL checkpoint adds 6-7 GB. The pip cache eats another ~9 GB in %USERPROFILE%\.cache.
- OS: Windows 10 or 11. Linux works with the shell scripts. macOS is not supported.
- Driver: NVIDIA driver 535+ for CUDA 12.1 compatibility. Open Task Manager, click Performance, click GPU - your driver version is right there.
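If you'd rather check from a script than Task Manager, the sketch below shells out to nvidia-smi (installed alongside the NVIDIA driver) and compares the major driver version against the 535 floor. The helper name and comparison logic are my own, not part of Forge:

```python
import subprocess

def supports_cuda121(driver_version: str) -> bool:
    """True if the driver's major version meets the 535 floor for CUDA 12.1."""
    return int(driver_version.split(".")[0]) >= 535

if __name__ == "__main__":
    try:
        # nvidia-smi ships with the NVIDIA driver; this prints just the version string.
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True,
        )
        version = out.stdout.strip()
        verdict = "OK for cu121" if supports_cuda121(version) else "update your driver"
        print(version, "->", verdict)
    except FileNotFoundError:
        print("nvidia-smi not found - is the NVIDIA driver installed?")
```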
Step 1 - Download the Correct Package
Go to the Forge GitHub releases page and download webui_forge_cu121_torch231.7z. This is the CUDA 12.1 + PyTorch 2.3.1 bundle - the one the README recommends and the one with over 1.3 million downloads.
There's also a webui_forge_cu124_torch24.7z (CUDA 12.4 + PyTorch 2.4). It can be faster on RTX 40-series cards, but users report more compatibility issues. Start with cu121. Switch later if you have a reason.
The download is ~1.8 GB. Don't grab random packages from third-party sites. Don't try to pip-install PyTorch separately into this package. The bundle is self-contained for a reason.
Step 2 - Extract and Run First Boot
Create a folder with a short path and no spaces: C:\sd-forge\ works. Avoid C:\Users\Your Name\Downloads\Stable Diffusion Forge v2\ - spaces in paths cause subtle breakage with Python scripts and some extensions.
Extract the .7z archive into your folder. You should see run.bat, update.bat, and a webui subfolder. Run update.bat first to pull the latest code, then run run.bat.
First boot takes 5-10 minutes on an NVMe SSD while Forge downloads Python dependencies and configures the environment. On a SATA SSD, expect closer to 10-15 minutes. On a spinning HDD, go make coffee. Don't close the console window when it looks frozen - it's still working.
When you see Running on local URL: http://127.0.0.1:7860 in the console, Forge is ready. Your browser should open automatically.
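If the browser doesn't open on its own, you can confirm the server is actually listening with a quick stdlib-only check (the function name is illustrative, not a Forge API):

```python
import urllib.request
import urllib.error

def forge_is_up(url: str = "http://127.0.0.1:7860", timeout: float = 3.0) -> bool:
    """Return True if something answers HTTP on the Forge port."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

if __name__ == "__main__":
    print("Forge is up" if forge_is_up() else "Nothing listening on 7860 yet")
```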
Step 3 - Drop One Checkpoint In
You need exactly one model file to start. Go to CivitAI or Hugging Face, pick a popular checkpoint, and download the .safetensors file.
- SD1.5 model (e.g., Realistic Vision v5.1): ~2 GB, runs on 4+ GB VRAM at 512x512
- SDXL model (e.g., Juggernaut XL v9): ~6.5 GB, needs 8+ GB VRAM at 1024x1024
Copy the file into models/Stable-diffusion/ inside your Forge folder. Back in the Forge UI, click the refresh button next to the checkpoint dropdown. Select your model.
That's it. One model. Prove it works before you download ten more.
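If you'd rather script the copy than drag the file, a minimal sketch (helper names are mine; the models/Stable-diffusion folder name matches the guide - in the one-click package it sits under the webui subfolder, so point forge_root there if needed):

```python
import shutil
from pathlib import Path

def checkpoint_dest(src: str, forge_root: str) -> Path:
    """Build the target path for a checkpoint inside Forge's model folder."""
    return Path(forge_root) / "models" / "Stable-diffusion" / Path(src).name

def install_checkpoint(src: str, forge_root: str) -> Path:
    """Copy a .safetensors checkpoint into place and return the new path."""
    dest = checkpoint_dest(src, forge_root)
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)  # copy2 preserves timestamps
    return dest
```

After copying, you still need to hit the refresh button next to the checkpoint dropdown in the UI.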
Step 4 - Generate Your First Image
Type a simple prompt. Something like a red fox sitting in snow, detailed fur, natural lighting. Set these values:
- SD1.5: 512x512, 20 steps, DPM++ 2M Karras sampler, CFG 7
- SDXL: 1024x1024, 25 steps, DPM++ 2M Karras sampler, CFG 7
Click Generate. SD1.5 at 512x512 uses ~4 GB VRAM and finishes in 5-15 seconds on an RTX 3060. SDXL at 1024x1024 uses ~8 GB VRAM and takes 15-30 seconds on the same card.
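Those same settings map onto the JSON body of the A1111-style API that Forge inherits, available when you launch with the --api flag. The payload-builder below is a sketch: the field names follow the /sdapi/v1/txt2img schema, the helper is my own, and note that newer builds may split the sampler into separate sampler and scheduler fields:

```python
import json

def txt2img_payload(prompt: str, size: int, steps: int, cfg: float = 7.0,
                    sampler: str = "DPM++ 2M Karras") -> dict:
    """Build a txt2img request body using the guide's square-resolution settings."""
    return {
        "prompt": prompt,
        "width": size,
        "height": size,
        "steps": steps,
        "cfg_scale": cfg,
        "sampler_name": sampler,
    }

# SDXL settings from this guide: 1024x1024, 25 steps, CFG 7.
payload = txt2img_payload("a red fox sitting in snow, detailed fur, natural lighting", 1024, 25)
body = json.dumps(payload)
# POST `body` to http://127.0.0.1:7860/sdapi/v1/txt2img once Forge runs with --api.
```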
Black image? You need the right VAE. Some checkpoints require a separate VAE file in models/VAE/. Check the model's download page for instructions. Don't stack multiple VAEs - use the one the model card specifies.
Step 5 - Fix Memory Errors (If They Happen)
If you get a CUDA out-of-memory error, Forge has built-in VRAM-management options that stock A1111 lacks. In the Forge UI, look for the VRAM management settings:
- "Automatic" mode: Forge's default. It handles offloading between GPU and CPU without flags. This alone lets 6 GB cards run SDXL at reduced resolutions (768x768 or 832x832).
- "Always maximize offload": Drops VRAM usage to under 2 GB but slows generation significantly. Use this on 4 GB cards as a last resort.
If the UI-level options aren't enough, add --medvram to the launch arguments in webui-user.bat. Add one flag at a time. Test after each change.
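For reference, the flag goes on the COMMANDLINE_ARGS line of webui-user.bat; leave the rest of the file as shipped:

```bat
rem webui-user.bat - add flags one at a time and test after each change
set COMMANDLINE_ARGS=--medvram
```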
Verify It Works
Your install is good if all four of these are true:
- Forge console shows Running on local URL: http://127.0.0.1:7860 without CUDA errors
- Your checkpoint appears in the dropdown after refresh
- A test prompt at the model's native resolution produces a visible image (not black, not noise)
- Task Manager shows GPU utilization spiking during generation, not CPU
If any of those fail, don't install extensions or download more models. Fix the base first.
Troubleshooting
- CUDA OOM at recommended resolution: Lower the resolution by one step (1024 to 896, or 512 to 448). If that fixes it, your VRAM is the bottleneck - use Forge's automatic offloading or switch to a smaller model.
- "Torch is not compiled with CUDA enabled": Wrong package. You downloaded a CPU-only build or a mismatched CUDA version. Delete the folder and re-extract webui_forge_cu121_torch231.7z.
- Black images every time: Missing or wrong VAE. Download the VAE file specified on your checkpoint's model card and place it in models/VAE/. Select it in the Forge UI under Settings or the VAE dropdown.
- Antivirus quarantines files: Windows Defender flags Python executables in fresh extractions. Add your Forge folder to the exclusion list: Windows Security, Virus & Threat Protection, Manage Settings, Exclusions.
- Console stuck for 20+ minutes: On first boot, this can be normal on slow connections (Forge downloads ~2 GB of dependencies). If it's genuinely hung, check your internet connection and try again. Don't Ctrl+C and restart repeatedly - that corrupts half-downloaded packages.
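The resolution step-down in the first troubleshooting item can be mechanized. This sketch (my own helper, not a Forge feature) walks a square resolution down in fixed steps, matching the guide's suggested drops of 128 for SDXL and 64 for SD1.5:

```python
def fallback_resolutions(start: int, step: int = 64, floor: int = 448):
    """Yield progressively smaller square resolutions to retry after a CUDA OOM."""
    res = start
    while res >= floor:
        yield res
        res -= step

# SDXL retry ladder per the guide's 1024 -> 896 suggestion.
ladder = list(fallback_resolutions(1024, step=128, floor=768))
print(ladder)  # -> [1024, 896, 768]
```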
What to Do Next
Once your test image works, you have three directions worth exploring:
- Understand the difference: Read the Forge vs AUTOMATIC1111 comparison to know what you gained by choosing Forge.
- Check your hardware ceiling: The hardware requirements guide has real VRAM measurements for every model class.
- Try zero-setup instead: If this install felt like too many moving parts, LocalForge AI packages Forge with the CUDA stack pre-configured - no manual package selection or path management.
Bottom Line
Forge installation is five decisions: which package (cu121), where to put it (short path, SSD), which checkpoint (one), what resolution (match the model), and whether the test image looks right. Everything else - extensions, LoRAs, custom samplers, themes - is noise until those five are settled. Do less. Do it right.
