# Running Stable Diffusion on AMD GPUs: Uncensored Local Setup in 2026
Can you run Stable Diffusion on AMD Radeon GPUs locally? Yes — here's exactly how. ROCm on Linux, DirectML on Windows, performance expectations, and which AMD cards actually work for uncensored AI image generation.
## The Honest Truth About AMD + Stable Diffusion
Every guide focuses on NVIDIA. There's a reason for that — CUDA dominance means AMD is always playing catch-up in the AI space. But in 2026, AMD GPUs genuinely work for local Stable Diffusion.
The caveats are real though: Linux is strongly preferred, performance is slower than equivalent NVIDIA hardware, and some extensions don't work. Here's the full picture.
## AMD GPU Compatibility Table
| GPU | VRAM | Linux (ROCm) | Windows (DirectML) | Notes |
|---|---|---|---|---|
| RX 7900 XTX | 24 GB | Excellent | Decent | Best AMD option, runs Flux models |
| RX 7900 XT | 20 GB | Excellent | Decent | Great value for VRAM |
| RX 7800 XT | 16 GB | Good | Decent | Handles SDXL well |
| RX 7700 XT | 12 GB | OK | Basic | SDXL works, tight on Flux |
| RX 6900 XT | 16 GB | OK | Limited | RDNA 2, ROCm support varies |
| RX 6800 XT | 16 GB | OK | Limited | Works but slower compute |
## Option 1: Linux + ROCm (Recommended)
This is the best path for AMD. ROCm is AMD's equivalent of CUDA and provides native GPU acceleration.
- Install Ubuntu 22.04 or 24.04 (these releases have the best ROCm support)
- Install ROCm 6.x, following AMD's official installation guide
- Install PyTorch for ROCm: `pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.2`
- Clone Forge or ComfyUI and launch normally; both auto-detect ROCm
- If the GPU isn't detected, set the environment variable `HSA_OVERRIDE_GFX_VERSION=11.0.0` (needed for RDNA 3 cards)
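Put together, the steps above amount to something like the following setup sketch. It assumes ROCm 6.2 is already installed per AMD's guide and uses ComfyUI as the example UI; adjust the ROCm version in the index URL to match your install.

```shell
# PyTorch wheels built against ROCm 6.2 (match this to your installed ROCm version)
pip install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.2

# ComfyUI as the example frontend; Forge works the same way
git clone https://github.com/comfyanonymous/ComfyUI
cd ComfyUI
pip install -r requirements.txt

# RDNA 3 cards often need this override before ROCm will use the GPU
export HSA_OVERRIDE_GFX_VERSION=11.0.0
python main.py
```

If the UI starts but falls back to CPU, the `HSA_OVERRIDE_GFX_VERSION` line is the first thing to check.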
Performance: RX 7900 XTX on ROCm generates SDXL images at roughly 70% the speed of an RTX 4080. Usable, not record-breaking.
## Option 2: Windows + DirectML
If you're staying on Windows, DirectML works but with trade-offs:
- Use a DirectML build: either the DirectML fork of AUTOMATIC1111 or the AMD fork of Forge
- Slower than ROCm — roughly 40–60% of ROCm performance
- Some extensions break on DirectML (ADetailer works, some ControlNet preprocessors don't)
- No xFormers — memory optimization is more limited on Windows AMD
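A minimal Windows launch sketch, assuming the stable-diffusion-webui-directml fork (the repository name and `--use-directml` flag are assumptions; check the fork's README for the current instructions):

```shell
git clone https://github.com/lshqqytiger/stable-diffusion-webui-directml
cd stable-diffusion-webui-directml
:: Put the backend flag in webui-user.bat before launching, e.g.:
::   set COMMANDLINE_ARGS=--use-directml --medvram
webui-user.bat
```

`--medvram` is worth keeping on AMD/DirectML, since the missing xFormers means you rely on the built-in memory optimizations.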
Reading this and thinking "this is complicated"? You're not wrong. AMD setup requires ROCm, environment variables, and fork-specific installations. If you have access to any NVIDIA GPU (even an old RTX 3060), LocalForge AI turns it into a fully working uncensored image generator in under 10 minutes — zero terminal commands.
## AMD vs NVIDIA: Should You Switch?
If you already have an AMD GPU, use it — the guides above work. Don't buy a new GPU just for Stable Diffusion.
If you're buying specifically for AI image generation, NVIDIA is the pragmatic choice. CUDA ecosystem support is years ahead. An RTX 4060 Ti 16 GB outperforms an RX 7800 XT in SD despite costing less, because of software optimization.
The exception: if you need 24 GB VRAM on a budget, the RX 7900 XTX ($800–900) beats the RTX 4090 ($1,600+) on price. You sacrifice speed but gain VRAM headroom for large models.
## FAQ
### Does AMD work with Flux models?
Yes, on ROCm (Linux) with 16+ GB VRAM. DirectML support for Flux is hit-or-miss. The RX 7900 XTX/XT are the only AMD cards with enough VRAM to comfortably run Flux.
### Can I train LoRAs on AMD?
Yes, on Linux with ROCm. Kohya_ss works with ROCm. Training is roughly 40% slower than equivalent NVIDIA hardware. Not supported on Windows DirectML.
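Because PyTorch's ROCm build reuses the CUDA device API, a kohya (sd-scripts) LoRA run is invoked the same way as on NVIDIA. A sketch with illustrative paths and a minimal flag set — a real run needs many more options (resolution, learning rate, dataset config), so treat this as the shape of the command, not a working recipe:

```shell
# Inside the kohya sd-scripts repo, with the ROCm build of PyTorch installed
accelerate launch train_network.py \
  --pretrained_model_name_or_path=./models/base_model.safetensors \
  --train_data_dir=./dataset \
  --network_module=networks.lora \
  --output_dir=./lora_output \
  --output_name=my_lora
```

The paths and `my_lora` name above are placeholders; everything else follows the standard sd-scripts CLI.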
### What about Intel Arc GPUs?
Intel Arc A770 (16 GB) has experimental support via OpenVINO or IPEX. It's slower than AMD ROCm and much less tested. Not recommended as a primary SD GPU in 2026.
