ComfyUI NSFW video generation (local, 2026)
Video gen is the hardest thing in this guide series. That's okay - start with 16 frames at 512x512 and you'll have something moving in 20 minutes. ComfyUI handles NSFW video through AnimateDiff - a motion module system that bolts onto SD1.5 checkpoints you already own. You're not rendering feature films. You're generating 2-second motion clips at 8 fps, and even that takes real VRAM (10-13 GB for 16 frames). The workflow: install ComfyUI-AnimateDiff-Evolved nodes, drop a motion module into the right folder, wire up a small test graph, and export with VideoHelperSuite. Flickering and temporal inconsistency are problems you'll hit immediately - SD1.5 AnimateDiff handles both better than SDXL right now. Newer options like Wan2.1 skip AnimateDiff entirely and generate cleaner video, but need separate model downloads. This guide covers both paths. If you want zero-setup stills before attempting motion, LocalForge AI handles that side.
The Quick Answer
Key Takeaway - May 2026
AnimateDiff through ComfyUI-AnimateDiff-Evolved produces 16-frame motion clips at 512x512 on 10-13 GB VRAM. SD1.5 motion modules (v3_sd15_mm) flicker less than SDXL variants. For cleaner video with fewer nodes, Wan2.1's 1.3B model runs on ~8 GB VRAM and generates 5-second 480p clips natively in ComfyUI - but it's a completely different model architecture. Start with AnimateDiff if you already have SD1.5 checkpoints. Start with Wan2.1 if you're setting up fresh.
What You Need
| Component | AnimateDiff (SD1.5) | Wan2.1 (1.3B) | Wan2.1 (14B) |
|---|---|---|---|
| VRAM | 10-13 GB (16 frames, 512x512) | ~8 GB | 16+ GB |
| Resolution | 512x512 (safe start) | 480p native | 720p native |
| Core model | v3_sd15_mm motion module (~837 MB) | Wan diffusion model (~2.5 GB) | Wan diffusion model (~28 GB) |
| Extra downloads | SD1.5 checkpoint (~2 GB) | Text encoder + VAE + CLIP (~5 GB total) | Same encoders + larger model |
| Frame rate | 8 fps (standard) | 16 fps (native) | 16 fps (native) |
| Best for | Short loops, existing SD1.5 users | Quick 5-second clips | Higher quality, more VRAM |
Step 1 - Install AnimateDiff Evolved nodes
Open ComfyUI Manager, search ComfyUI-AnimateDiff-Evolved (by Kosinkadink), and click Install. Restart ComfyUI after it finishes. You'll see new nodes under the AnimateDiff category: AnimateDiff Loader, Uniform Context Options, and Apply AnimateDiff Model.
Don't clone random forks. The Kosinkadink version is the maintained one as of 2026, with recent fixes for motion model dtype loading and multi-GPU support.
Install ComfyUI-VideoHelperSuite the same way - you'll need its Video Combine node to export anything watchable.
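Optional sanity check: after the restart, you can confirm the nodes actually registered by querying ComfyUI's object_info endpoint, which lists every loaded node class. A minimal sketch, assuming ComfyUI is running at the default 127.0.0.1:8188:

```python
import json
import urllib.request

# ComfyUI lists every registered node class at /object_info.
with urllib.request.urlopen("http://127.0.0.1:8188/object_info") as resp:
    nodes = json.loads(resp.read())

# Exact class names vary between AnimateDiff-Evolved versions,
# so just look for anything containing "AnimateDiff".
matches = sorted(name for name in nodes if "AnimateDiff" in name)
print(f"{len(matches)} AnimateDiff node classes registered")
for name in matches[:10]:
    print(" -", name)
```

If the list comes back empty, the install didn't take - see the troubleshooting section at the end.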
Step 2 - Download a motion module
Grab v3_sd15_mm.safetensors (837 MB) from the guoyww/animatediff Hugging Face repo. Drop it in ComfyUI/custom_nodes/ComfyUI-AnimateDiff-Evolved/models/. That folder should exist after installing the node pack.
Other options:
- mm_sd15_v2.safetensors (909 MB) - older but stable
- mm_sd15_AnimateLCM.safetensors (907 MB) - faster generation with LCM sampling, fewer steps needed
Don't download SDXL motion modules yet. SDXL AnimateDiff has worse flickering and needs 16+ GB VRAM. Stick with SD1.5 for your first working clip.
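If you'd rather script the download than click through Hugging Face, huggingface_hub can fetch the module straight into the node pack's models folder. A sketch, assuming you run it from the directory containing ComfyUI/ and that the filename matches what the repo page shows (some mirrors ship .ckpt instead of .safetensors - check first):

```python
from huggingface_hub import hf_hub_download

# Folder created by the AnimateDiff-Evolved node pack.
dest = "ComfyUI/custom_nodes/ComfyUI-AnimateDiff-Evolved/models"

# Repo and filename from this guide - verify both on the repo page
# before running, since hosted filenames change over time.
path = hf_hub_download(
    repo_id="guoyww/animatediff",
    filename="v3_sd15_mm.safetensors",
    local_dir=dest,
)
print("Saved to", path)
```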
Step 3 - Build the baseline workflow
Wire this minimal graph:
- Load Checkpoint - any SD1.5 model you use for stills
- AnimateDiff Loader - point to your v3_sd15_mm motion module
- Uniform Context Options - set context_length to 16, context_overlap to 4
- Apply AnimateDiff Model - connect loader output to model
- KSampler - steps: 20, CFG: 7, sampler: euler_ancestral
- VAE Decode - standard
- Video Combine (VHS node) - frame_rate: 8, format: mp4
Set your latent image to 512x512, batch size 16. That's 16 frames - 2 seconds of video at 8 fps.
Queue it. Don't change the prompt yet. Just confirm you get 16 frames of motion without an OOM error.
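Once the graph runs from the browser, you can re-queue it headlessly for batch testing. Export the graph with Save (API Format), then post it to ComfyUI's /prompt endpoint - the same call the Queue button makes. A minimal sketch, assuming the default server address and a hypothetical export named animatediff_test.json:

```python
import json
import urllib.request

# A workflow exported via "Save (API Format)" in ComfyUI.
with open("animatediff_test.json") as f:
    workflow = json.load(f)

# POST /prompt queues the graph exactly like the UI's Queue button.
payload = json.dumps({"prompt": workflow}).encode("utf-8")
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))  # returns a prompt_id on success
```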
Step 4 - Configure frames, resolution, and sampler
Once your baseline runs:
- Frame count: Stay at 16 until stable. Going to 24 adds ~50% VRAM. Going to 48 often OOMs on 12 GB cards. (See the frame math sketch after this list.)
- Resolution: 512x512 is the safe zone. 768x768 jumps to 12-14 GB VRAM. Don't try 1024x1024 with AnimateDiff on consumer GPUs.
- LCM sampling (optional): Swap in the AnimateLCM motion module + an LCM-LoRA. Switch sampler to lcm, scheduler to sgm_uniform, drop steps to 6-8, set CFG to 1.5-2.0. Renders 3-4x faster but slightly less detailed.
- Motion strength: Start at 1.0 (default). Lower values (0.6-0.8) give subtle movement. Higher values (1.2+) create more motion but increase flickering.
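The frame math is worth keeping straight, since batch size is your only length control: clip length = frames / fps. A quick reference, using the numbers above:

```python
def clip_frames(seconds: float, fps: int = 8) -> int:
    """Latent batch size needed for a target clip length."""
    return round(seconds * fps)

def clip_seconds(frames: int, fps: int = 8) -> float:
    """Playback length of a frame batch at a given frame rate."""
    return frames / fps

print(clip_frames(2.0))    # 16 - the safe baseline
print(clip_frames(3.0))    # 24 - expect roughly 50% more VRAM
print(clip_seconds(48))    # 6.0 - a batch that often OOMs on 12 GB cards
```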
Step 5 - Generate and iterate
First prompt: Keep it simple. One subject, minimal background, no camera movement. Something like "woman standing, wind blowing hair, soft lighting" works better than a paragraph of detail.
Seed strategy: Lock your seed while tuning motion settings. Change the seed only when you're happy with the movement quality and want variation.
What to expect honestly: Flickering is normal, especially on faces and fine details. SD1.5 AnimateDiff produces "motion illustrations" more than photorealistic video. Temporal consistency breaks most often on fast movement, detailed faces at low resolution, and complex backgrounds with multiple subjects.
Step 6 - Export with VideoHelperSuite
The Video Combine node handles export. Key settings:
- frame_rate: 8 for standard AnimateDiff, 12-16 if you've interpolated frames
- format: video/h264-mp4 for MP4, image/gif for looping GIFs
- pingpong: Enable for back-and-forth loops (doubles apparent length)
- save_output: True to keep the file in ComfyUI's output folder
Output lands in ComfyUI/output/ with a timestamp filename. GIFs work for short loops under 3 seconds. MP4 for anything you'll edit further.
Verify It Works
Your first successful clip should show:
- 16 frames of visible motion (not 16 identical stills)
- No OOM errors in the ComfyUI console
- Playable MP4 or GIF in your system video player
- Render time under 3 minutes on a 12 GB GPU at 512x512
If you got a static image repeated 16 times, the AnimateDiff model isn't loading - check the console for errors about missing motion modules.
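You can also check the export programmatically: OpenCV reads back the frame rate and frame count, and a frame-to-frame diff catches the static-stills failure above. A sketch, assuming opencv-python and numpy are installed and a hypothetical output filename (point it at your actual file):

```python
import cv2
import numpy as np

path = "ComfyUI/output/animatediff_00001.mp4"  # adjust to your file
cap = cv2.VideoCapture(path)

fps = cap.get(cv2.CAP_PROP_FPS)
n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
print(f"{n_frames} frames at {fps:.0f} fps ({n_frames / fps:.1f} s)")

# Mean absolute difference between consecutive frames. Near-zero
# values mean you rendered identical stills, i.e. no motion.
prev, diffs = None, []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
    if prev is not None:
        diffs.append(float(np.abs(gray - prev).mean()))
    prev = gray
cap.release()

if diffs:
    print("mean frame-to-frame change:", round(sum(diffs) / len(diffs), 2))
```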
Alternative: Wan2.1
If you're starting fresh in 2026, Wan2.1 might be the better path. It's natively supported in ComfyUI, generates 5-second 480p clips, and the 1.3B model runs on ~8 GB VRAM.
Setup: Download the Wan2.1 diffusion model + umt5_xxl_fp8_e4m3fn_scaled.safetensors (text encoder) + wan_2.1_vae.safetensors + clip_vision_h.safetensors. Place them in ComfyUI's standard model directories. Load the example workflow from ComfyUI's built-in template library.
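These downloads can be scripted too. A sketch, assuming the files sit at their usual paths in the Comfy-Org repackaged repo (Comfy-Org/Wan_2.1_ComfyUI_repackaged) - verify the repo and paths on the Hugging Face page before running, and run it from the directory containing ComfyUI/:

```python
import shutil
from huggingface_hub import hf_hub_download

# Assumed repo and file paths - confirm on huggingface.co first.
repo = "Comfy-Org/Wan_2.1_ComfyUI_repackaged"
wanted = [
    ("split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors",
     "ComfyUI/models/text_encoders"),
    ("split_files/vae/wan_2.1_vae.safetensors", "ComfyUI/models/vae"),
    ("split_files/clip_vision/clip_vision_h.safetensors",
     "ComfyUI/models/clip_vision"),
]
for remote, dest in wanted:
    cached = hf_hub_download(repo_id=repo, filename=remote)  # HF cache
    shutil.copy(cached, dest)  # flatten into ComfyUI's model folder
    print("Placed", remote.split("/")[-1], "in", dest)
```

The diffusion model itself goes in ComfyUI/models/diffusion_models the same way.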
Trade-off: Wan2.1 produces cleaner, longer video than AnimateDiff. But your SD1.5 checkpoints and LoRAs don't transfer. AnimateDiff lets you reuse the same models you already have for stills - that's its real advantage.
Troubleshooting
- OOM at 16 frames: Drop resolution to 512x384. If that still fails, your GPU is under AnimateDiff's practical floor - budget 10+ GB, or switch to the Wan2.1 1.3B path, which fits in ~8 GB. (See the VRAM check after this list.)
- No motion in output: Confirm the AnimateDiff Loader node shows a loaded model name, not "None." Check that the motion module file sits in the correct folder path.
- Severe flickering: Switch to v3_sd15_mm if you're on an older module. Lower motion_strength to 0.7. Reduce CFG to 5-6.
- Video plays too fast: VHS defaults to 8 fps. If your player assumes 24 or 30 fps, it'll look 3x too fast. Re-export with the correct frame_rate value.
- ControlNet OOM: Each ControlNet adds 2-4 GB VRAM overhead. Get clean motion without ControlNet first, then add one net at 0.5 strength and keep resolution at 512x512.
- "Module not found" error after install: Restart ComfyUI fully. AnimateDiff Evolved registers its nodes on startup, not hot-reload.
Bottom Line
AnimateDiff gets you short NSFW motion clips using checkpoints and LoRAs you already own. Start at 16 frames, 512x512, 8 fps - prove it runs before you scale anything. Wan2.1 is the cleaner path for new setups but you lose LoRA compatibility. Neither approach gives you cinema. You're making 2-second animated loops, and that's the honest starting point.
