LocalForge AI

How to Run SDXL Locally — Complete Offline Setup Guide

You can run Stable Diffusion XL on your own PC with no cloud account, no subscription, and no content filters. This guide covers hardware requirements, which UI to install, where to download models, and how to fix the errors that trip up most people. Total setup time: 15–45 minutes depending on your path.

What You Need

  • GPU: NVIDIA with 8+ GB VRAM (RTX 3060 12GB or better). 4 GB works with ComfyUI + fp16 optimizations, but it's painfully slow. AMD GPUs work via DirectML — expect significantly worse performance.
  • RAM: 32 GB recommended. 16 GB is technically possible, but your system will freeze when loading the base + refiner models simultaneously.
  • Disk space: 20 GB minimum for one model + UI + dependencies. Budget 30–50 GB if you plan to use multiple models and LoRAs.
  • OS: Windows 10/11 (primary). Linux works well. macOS with Apple Silicon runs 2–4x slower than equivalent NVIDIA setups.
  • Software: Python 3.10.x and Git (skip if using a one-click installer).
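Before installing anything, you can confirm your GPU and VRAM from a terminal. This sketch assumes the NVIDIA driver is already installed (nvidia-smi ships with it):

```shell
# Print GPU name and total VRAM; fall back to a message if no NVIDIA GPU/driver is found
nvidia-smi --query-gpu=name,memory.total --format=csv,noheader \
  || echo "No NVIDIA GPU detected - check your driver install"
```

If the reported memory is under 8192 MiB, plan on the --medvram path described in Troubleshooting below.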

Step 1 — Pick Your UI

Three options, one clear recommendation:

  • Forge UI — Best for most users. It's a fork of AUTOMATIC1111 with better VRAM management, native SDXL support, and 10–30% faster generation on the same hardware. If you'd pick A1111, pick Forge instead — it's the successor.
  • Fooocus — Easiest option. Extract, run, done. Auto-downloads models on first launch. Built-in inpainting, styles, and upscaling. Great if you just want to type prompts and get images.
  • ComfyUI — Most powerful, steepest learning curve. Node-based workflow editor gives you total control over the generation pipeline. Pick this for custom workflows or video generation.

Alternatively, LocalForge AI gives you Forge pre-configured with zero setup.

Step 2 — Install Prerequisites

If you're using Fooocus or Forge's one-click package, skip this step — everything's bundled.

For the git clone path: download Python 3.10.x (not 3.11+, which causes dependency conflicts with torch/xformers) and Git for Windows. Verify both:

python --version   # should show 3.10.x
git --version      # any version works

Step 3 — Install and Launch the UI

Forge (one-click): Download the .7z package from the Forge GitHub releases. Extract, run update.bat, then run.bat. First launch installs PyTorch and dependencies — takes 10–20 minutes.

Fooocus: Download from Fooocus releases. Extract, run run.bat. First launch auto-downloads the SDXL model (~6.5 GB), about 10–15 minutes.

ComfyUI: Download the portable package from ComfyUI releases. Extract, run run_nvidia_gpu.bat.

Your UI opens at http://127.0.0.1:7860 (Forge) or http://127.0.0.1:8188 (ComfyUI). Fooocus opens automatically.
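If you'd rather take the git clone route for Forge (it makes updating via git pull easy), the steps look roughly like this — the repository URL is Forge's GitHub repo, so verify it there before cloning:

```shell
# Clone Forge and launch; first run creates a venv and installs PyTorch (10-20 min)
git clone https://github.com/lllyasviel/stable-diffusion-webui-forge.git
cd stable-diffusion-webui-forge
# Windows: run webui-user.bat instead of the shell script
./webui.sh
```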

Step 4 — Download the SDXL Model

Fooocus handles this automatically. For Forge and ComfyUI, download two files from Hugging Face: the SDXL base checkpoint, sd_xl_base_1.0.safetensors (~6.5 GB, from stabilityai/stable-diffusion-xl-base-1.0), and the fixed VAE, sdxl_vae.safetensors (from madebyollin/sdxl-vae-fp16-fix).

Place checkpoints in models/Stable-diffusion/ (Forge) or models/checkpoints/ (ComfyUI). VAE goes in models/VAE/ or models/vae/.
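As a sketch, creating the folders and placing the files from inside the UI's install directory looks like this (filenames assume the standard SDXL base checkpoint and the fp16-fix VAE):

```shell
# From the Forge install root - create the model folders if they don't exist yet
mkdir -p models/Stable-diffusion models/VAE
# The downloaded files then go here:
#   models/Stable-diffusion/sd_xl_base_1.0.safetensors   (checkpoint)
#   models/VAE/sdxl_vae.safetensors                      (VAE)
ls models
```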

Always download .safetensors format — it's safer and loads faster than .ckpt.
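If you prefer downloading from the command line, curl works. These Hugging Face URLs were correct at the time of writing, but double-check them on the model pages before running (the checkpoint alone is ~6.5 GB):

```shell
# SDXL base checkpoint (~6.5 GB) from Stability AI
curl -L -o sd_xl_base_1.0.safetensors \
  "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors"
# Fixed fp16 VAE - avoids the black-image issue covered in Troubleshooting
curl -L -o sdxl_vae.safetensors \
  "https://huggingface.co/madebyollin/sdxl-vae-fp16-fix/resolve/main/sdxl_vae.safetensors"
```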

Step 5 — Configure and Generate

Select the SDXL model in the checkpoint dropdown, then use these settings:

  • Resolution: 1024×1024 (SDXL's native resolution). Other good sizes: 1152×896, 1216×832. Don't use 512×512 — you'll get blurry, distorted output.
  • Sampler: DPM++ 2M Karras or Euler a
  • Steps: 25–30
  • CFG Scale: 5–7 (SDXL prefers lower CFG than SD 1.5)
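Forge (like A1111) also exposes these settings over a local REST API if you add --api to the launch arguments. A sketch of the same settings as an API call, assuming the server is running on the default port:

```shell
# Requires Forge/A1111 started with --api; endpoint and fields follow the A1111 web API
curl -s -X POST "http://127.0.0.1:7860/sdapi/v1/txt2img" \
  -H "Content-Type: application/json" \
  -d '{
        "prompt": "a photo of a cat sitting on a windowsill, golden hour lighting, detailed fur, 8k",
        "width": 1024, "height": 1024,
        "steps": 25, "cfg_scale": 6,
        "sampler_name": "DPM++ 2M Karras"
      }'
# The response JSON carries the image as base64 in the "images" array
```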

Verify It Works

Run this test prompt: "a photo of a cat sitting on a windowsill, golden hour lighting, detailed fur, 8k"

You should get a coherent, detailed 1024×1024 image. Expected generation times:

  • RTX 4080 (16 GB): 15–20 seconds
  • RTX 3060 (12 GB): 30–45 seconds
  • 8 GB GPU with --medvram: 1–3 minutes

Troubleshooting

  • "CUDA out of memory": Add --medvram or --lowvram to launch args in webui-user.bat. Reduce resolution to 1024×1024 if you went higher. Set batch size to 1.
  • Black/blank images: Switch VAE to sdxl-vae-fp16-fix. SDXL's default VAE has float16 precision issues that produce black output.
  • Extremely slow generation: Enable xformers (--xformers in launch args). Switch from A1111 to Forge for better memory management. Reduce steps to 20–25.
  • UI won't start: Verify Python 3.10.x (not 3.11/3.12). Delete the venv folder and re-run the batch file to rebuild the environment.
  • LoRA not working: Confirm it's SDXL-compatible, not SD 1.5. Check the weight (0.7–1.0) and add the trigger word from the download page to your prompt.
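Several of the fixes above go through the launch arguments. In Forge/A1111 on Windows, they're set on one line in webui-user.bat; a sketch combining the low-VRAM and xformers flags:

```bat
REM webui-user.bat excerpt - flags from the troubleshooting list above
set COMMANDLINE_ARGS=--medvram --xformers
call webui.bat
```

On Linux, the equivalent is export COMMANDLINE_ARGS="--medvram --xformers" in webui-user.sh.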

FAQ

Can I run SDXL with 4 GB VRAM?
Technically yes, using ComfyUI with fp16 optimizations and model CPU offloading. But expect generation times of 5+ minutes per image. 8 GB VRAM is the realistic minimum for a usable experience.

Do I need Python installed to run SDXL locally?
Not if you use a one-click installer. Fooocus and Forge both offer standalone packages that bundle Python, PyTorch, and all dependencies. Just extract and run.

What resolution should I use for SDXL?
1024×1024 is SDXL's native training resolution. Other supported sizes include 1152×896 and 1216×832. Using 512×512 produces blurry, distorted output because the model wasn't trained at that size.

How long does SDXL take to generate one image?
15–20 seconds on an RTX 4080, 30–45 seconds on an RTX 3060 12GB, and 1–3 minutes on an 8 GB GPU with --medvram enabled. These times are for 1024×1024 at 25–30 steps.

Is SDXL better than SD 1.5?
SDXL produces significantly more detailed images at higher resolution (1024×1024 vs 512×512) with better text rendering and composition. The tradeoff: it needs roughly twice the VRAM and generates slower.