LocalForge AI

ComfyUI NSFW video generation (local, 2026)

Video gen is the hardest thing in this guide series. That's okay - start with 16 frames at 512x512 and you'll have something moving in 20 minutes. ComfyUI handles NSFW video through AnimateDiff - a motion module system that bolts onto SD1.5 checkpoints you already own. You're not rendering feature films. You're generating 2-second motion clips at 8 fps, and even that takes real VRAM (10-13 GB for 16 frames). The workflow: install ComfyUI-AnimateDiff-Evolved nodes, drop a motion module into the right folder, wire up a small test graph, and export with VideoHelperSuite. Flickering and temporal consistency are real problems you'll hit immediately - SD1.5 AnimateDiff handles these better than SDXL right now. Newer options like Wan2.1 skip AnimateDiff entirely and generate cleaner video, but need separate model downloads. This guide covers both paths. If you want zero-setup stills before attempting motion, LocalForge AI handles that side.

The Quick Answer

Key Takeaway - May 2026

AnimateDiff through ComfyUI-AnimateDiff-Evolved produces 16-frame motion clips at 512x512 on 10-13 GB VRAM. SD1.5 motion modules (v3_sd15_mm) flicker less than SDXL variants. For cleaner video with fewer nodes, Wan2.1's 1.3B model runs on ~8 GB VRAM and generates 5-second 480p clips natively in ComfyUI - but it's a completely different model architecture. Start with AnimateDiff if you already have SD1.5 checkpoints. Start with Wan2.1 if you're setting up fresh.


What You Need

| Component | AnimateDiff (SD1.5) | Wan2.1 (1.3B) | Wan2.1 (14B) |
|---|---|---|---|
| VRAM | 10-13 GB (16 frames, 512x512) | ~8 GB | 16+ GB |
| Resolution | 512x512 (safe start) | 480p native | 720p native |
| Core model | v3_sd15_mm motion module (~837 MB) | Wan diffusion model (~2.5 GB) | Wan diffusion model (~28 GB) |
| Extra downloads | SD1.5 checkpoint (~2 GB) | Text encoder + VAE + CLIP (~5 GB total) | Same encoders + larger model |
| Frame rate | 8 fps (standard) | 24 fps | 24 fps |
| Best for | Short loops, existing SD1.5 users | Quick 5-second clips | Higher quality, more VRAM |
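The table above reduces to a simple decision rule. Here is that rule as a small Python helper - the thresholds are taken straight from the table and should be treated as ballpark guidance, not hard guarantees:

```python
def pick_path(vram_gb, have_sd15_checkpoints):
    """Pick a starting point for local video gen, mirroring the
    comparison table. Thresholds are the table's ballpark figures."""
    if have_sd15_checkpoints and vram_gb >= 10:
        # Reuse the checkpoints/LoRAs you already have for stills.
        return "AnimateDiff (SD1.5)"
    if vram_gb >= 16:
        return "Wan2.1 14B"
    if vram_gb >= 8:
        return "Wan2.1 1.3B"
    return "upgrade GPU or rent cloud compute"

print(pick_path(12, True))   # AnimateDiff (SD1.5)
print(pick_path(8, False))   # Wan2.1 1.3B
```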

Step 1 - Install AnimateDiff Evolved nodes

Open ComfyUI Manager, search ComfyUI-AnimateDiff-Evolved (by Kosinkadink), and click Install. Restart ComfyUI after it finishes. You'll see new nodes under the AnimateDiff category: AnimateDiff Loader, Uniform Context Options, and Apply AnimateDiff Model.

Don't clone random forks. The Kosinkadink version is the maintained one as of 2026, with recent fixes for motion model dtype loading and multi-GPU support.

Install ComfyUI-VideoHelperSuite the same way - you'll need its Video Combine node to export anything watchable.


Step 2 - Download a motion module

Grab v3_sd15_mm.safetensors (837 MB) from the guoyww/animatediff Hugging Face repo. Drop it in ComfyUI/custom_nodes/ComfyUI-AnimateDiff-Evolved/models/. That folder should exist after installing the node pack.

Other options:

  • mm_sd15_v2.safetensors (909 MB) - older but stable
  • mm_sd15_AnimateLCM.safetensors (907 MB) - faster generation with LCM sampling, fewer steps needed

Don't download SDXL motion modules yet. SDXL AnimateDiff has worse flickering and needs 16+ GB VRAM. Stick with SD1.5 for your first working clip.
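A half-downloaded motion module is the most common silent failure at this step - ComfyUI may error cryptically or produce static frames. This small sketch (paths per Step 2; the 800 MB floor is a rough sanity check against v3_sd15_mm's ~837 MB) verifies the file landed in the right place and isn't truncated:

```python
from pathlib import Path

def check_motion_module(comfy_root, name="v3_sd15_mm.safetensors",
                        min_bytes=800_000_000):
    """Confirm the motion module sits where AnimateDiff-Evolved looks
    for it, and that it isn't a truncated download."""
    path = (Path(comfy_root) / "custom_nodes"
            / "ComfyUI-AnimateDiff-Evolved" / "models" / name)
    if not path.is_file():
        return False, f"missing: {path}"
    size = path.stat().st_size
    if size < min_bytes:
        return False, f"likely truncated download: {size / 1e6:.0f} MB"
    return True, f"ok: {size / 1e6:.0f} MB"
```

Call it with your ComfyUI root, e.g. `check_motion_module("/home/you/ComfyUI")`, before queueing anything.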


Step 3 - Build the baseline workflow

Wire this minimal graph:

  1. Load Checkpoint - any SD1.5 model you use for stills
  2. AnimateDiff Loader - point to your v3_sd15_mm motion module
  3. Uniform Context Options - set context_length to 16, context_overlap to 4
  4. Apply AnimateDiff Model - connect loader output to model
  5. KSampler - steps: 20, CFG: 7, sampler: euler_ancestral
  6. VAE Decode - standard
  7. Video Combine (VHS node) - frame_rate: 8, format: mp4

Set your latent image to 512x512, batch size 16. That's 16 frames - 2 seconds of video at 8 fps.

Queue it. Don't change the prompt yet. Just confirm you get 16 frames of motion without an OOM error.
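Once the graph runs from the UI, you can also queue it headlessly: export the graph with ComfyUI's "Save (API Format)" option and POST the JSON to the server's /prompt endpoint. A minimal sketch, assuming the default local port 8188 and a hypothetical `workflow_api.json` filename:

```python
import json
import urllib.request

def build_prompt_payload(workflow):
    """Wrap an API-format workflow dict the way /prompt expects it."""
    return json.dumps({"prompt": workflow}).encode("utf-8")

def queue_workflow(path, host="127.0.0.1", port=8188):
    """POST an API-format workflow JSON to a running ComfyUI server."""
    with open(path) as f:
        workflow = json.load(f)
    req = urllib.request.Request(
        f"http://{host}:{port}/prompt",
        data=build_prompt_payload(workflow),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # response includes the queued prompt_id

# queue_workflow("workflow_api.json")
```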


Step 4 - Configure frames, resolution, and sampler

Once your baseline runs:

  • Frame count: Stay at 16 until stable. Going to 24 adds ~50% VRAM. Going to 48 often OOMs on 12 GB cards.
  • Resolution: 512x512 is the safe zone. 768x768 jumps to 12-14 GB VRAM. Don't try 1024x1024 with AnimateDiff on consumer GPUs.
  • LCM sampling (optional): Swap in the AnimateLCM motion module + an LCM-LoRA. Switch sampler to lcm, scheduler to sgm_uniform, drop steps to 6-8, set CFG to 1.5-2.0. Renders 3-4x faster but slightly less detailed.
  • Motion strength: Start at 1.0 (default). Lower values (0.6-0.8) give subtle movement. Higher values (1.2+) create more motion but increase flickering.

Step 5 - Generate and iterate

First prompt: Keep it simple. One subject, minimal background, no camera movement. Something like "woman standing, wind blowing hair, soft lighting" works better than a paragraph of detail.

Seed strategy: Lock your seed while tuning motion settings. Change the seed only when you're happy with the movement quality and want variation.

What to expect honestly: Flickering is normal, especially on faces and fine details. SD1.5 AnimateDiff produces "motion illustrations" more than photorealistic video. Temporal consistency breaks most often on fast movement, detailed faces at low resolution, and complex backgrounds with multiple subjects.


Step 6 - Export with VideoHelperSuite

The Video Combine node handles export. Key settings:

  • frame_rate: 8 for standard AnimateDiff, 12-16 if you've interpolated frames
  • format: video/h264-mp4 for MP4, image/gif for looping GIFs
  • pingpong: Enable for back-and-forth loops (doubles apparent length)
  • save_output: True to keep the file in ComfyUI's output folder

Output lands in ComfyUI/output/ with a timestamp filename. GIFs work for short loops under 3 seconds. MP4 for anything you'll edit further.
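The frame count, frame rate, and pingpong settings together determine how long the clip feels. A quick sanity-check helper - it models the pingpong bullet's "doubles apparent length" as a plain 2x:

```python
def clip_seconds(frames, fps=8, pingpong=False):
    """Apparent playback length of a Video Combine export. Pingpong
    plays the frames forward then backward."""
    effective = frames * 2 if pingpong else frames
    return effective / fps

print(clip_seconds(16))                 # 2.0 - the baseline clip
print(clip_seconds(16, pingpong=True))  # 4.0 - same render, looped back
```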


Verify It Works

Your first successful clip should show:

  • 16 frames of visible motion (not 16 identical stills)
  • No OOM errors in the ComfyUI console
  • Playable MP4 or GIF in your system video player
  • Render time under 3 minutes on a 12 GB GPU at 512x512

If you got a static image repeated 16 times, the AnimateDiff model isn't loading - check the console for errors about missing motion modules.
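To script this check, look for a freshly written clip in the output folder. A small sketch - `max_age_s` is an arbitrary ten-minute window, and a `None` result right after a queue run usually means Video Combine never executed:

```python
import time
from pathlib import Path

def latest_clip(output_dir, max_age_s=600):
    """Return the newest recently written mp4/gif in ComfyUI's output
    folder, or None if nothing fresh exists."""
    fresh = [
        p for p in Path(output_dir).iterdir()
        if p.suffix.lower() in (".mp4", ".gif")
        and time.time() - p.stat().st_mtime < max_age_s
    ]
    return max(fresh, key=lambda p: p.stat().st_mtime, default=None)

# latest_clip("/home/you/ComfyUI/output")
```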


Alternative: Wan2.1

If you're starting fresh in 2026, Wan2.1 might be the better path. It's natively supported in ComfyUI, generates 5-second 480p clips, and the 1.3B model runs on ~8 GB VRAM.

Setup: Download the Wan2.1 diffusion model + umt5_xxl_fp8_e4m3fn_scaled.safetensors (text encoder) + wan_2.1_vae.safetensors + clip_vision_h.safetensors. Place them in ComfyUI's standard model directories. Load the example workflow from ComfyUI's built-in template library.

Trade-off: Wan2.1 produces cleaner, longer video than AnimateDiff. But your SD1.5 checkpoints and LoRAs don't transfer. AnimateDiff lets you reuse the same models you already have for stills - that's its real advantage.


Troubleshooting

  • OOM at 16 frames: Drop resolution to 512x384. If that still fails, your GPU needs more than 8 GB for AnimateDiff.
  • No motion in output: Confirm the AnimateDiff Loader node shows a loaded model name, not "None." Check that the motion module file sits in the correct folder path.
  • Severe flickering: Switch to v3_sd15_mm if you're on an older module. Lower motion_strength to 0.7. Reduce CFG to 5-6.
  • Video plays too fast: VHS defaults to 8 fps. If your player assumes 24 or 30 fps, it'll look 3x too fast. Re-export with the correct frame_rate value.
  • ControlNet OOM: Each ControlNet adds 2-4 GB VRAM overhead. Get clean motion without ControlNet first, then add one net at 0.5 strength and keep resolution at 512x512.
  • "Module not found" error after install: Restart ComfyUI fully. AnimateDiff Evolved registers its nodes on startup, not hot-reload.

Bottom Line

AnimateDiff gets you short NSFW motion clips using checkpoints and LoRAs you already own. Start at 16 frames, 512x512, 8 fps - prove it runs before you scale anything. Wan2.1 is the cleaner path for new setups but you lose LoRA compatibility. Neither approach gives you cinema. You're making 2-second animated loops, and that's the honest starting point.


FAQ

How much VRAM do I need for AnimateDiff in ComfyUI?
10-13 GB for 16 frames at 512x512 with SD1.5. A 12 GB card (RTX 3060 12GB, RTX 4070) handles this comfortably. 8 GB cards can work with optimizations but expect slower renders and occasional OOM on complex prompts.
What's the difference between AnimateDiff and Wan2.1 for NSFW video?
AnimateDiff bolts motion onto your existing SD1.5 checkpoints and LoRAs - same models you use for stills. Wan2.1 is a standalone video model that generates cleaner, longer clips but doesn't use your SD1.5 LoRAs. Pick AnimateDiff if you have LoRAs you need. Pick Wan2.1 if you're starting from scratch.
Why does my AnimateDiff output flicker so much?
Flickering is AnimateDiff's biggest known limitation. SD1.5 modules flicker less than SDXL. Use v3_sd15_mm, lower motion_strength to 0.7, and keep CFG at 5-7. Faces and fine details flicker most at low resolution.
Which motion module should I download first?
v3_sd15_mm.safetensors (837 MB) from the guoyww/animatediff Hugging Face repo. It's the latest official SD1.5 motion module with the best temporal consistency. Use mm_sd15_AnimateLCM if you want faster renders with fewer sampling steps.
Can I use ControlNet with AnimateDiff?
Yes, but each ControlNet adds 2-4 GB VRAM. Get clean motion without ControlNet first, then add one net at half strength. Don't stack multiple ControlNets until you're comfortable managing VRAM at your target resolution.
Is NSFW video technically different from SFW in ComfyUI?
Same nodes, same motion modules, same VRAM math. The difference is your checkpoint and LoRA choices. Local generation has no content filter - what you generate on your own hardware is private.