ControlNet for realistic anatomy (local NSFW workflows, 2026)
ControlNet doesn't fix bad prompts. It fixes bad geometry. If your hands have six fingers and your elbows bend backward, that's not a model problem - it's a missing pose constraint. OpenPose gives the diffusion model a skeleton to follow. Depth maps give it spatial relationships - what's in front, what's behind. Stack both and you eliminate the anatomy failures that no amount of seed rerolling or CFG tweaking will solve.
This guide covers the exact models, weights, and node setup for ComfyUI and Forge. We're using SD1.5 and SDXL ControlNet models (1.35 - 2.5 GB each), keeping weights conservative so skin doesn't turn to plastic. If your poses look like mannequins, you've probably cranked weights too high - we'll fix that. NSFW context here means adult anatomy studies generated locally on your own hardware - not cloud policy workarounds.
Quick Answer
Key Takeaway - May 2026
Stack OpenPose (skeleton structure) + Depth (spatial layout) to fix anatomy. Use control_v11p_sd15_openpose and control_v11f1p_sd15_depth for SD1.5, or their SDXL equivalents. Set each ControlNet weight to 0.35 - 0.55 as a starting point. Two nets at conservative weights beat one net cranked to 1.0.
Or use LocalForge AI for pre-configured ControlNet setups with zero manual installs.
What You Need
SD1.5 ControlNet models (1.35 - 1.45 GB each):
- OpenPose: control_v11p_sd15_openpose - skeleton keypoints for body, hands, face
- Depth: control_v11f1p_sd15_depth - spatial relationships and occlusion
SDXL ControlNet models (~2.5 GB each):
- OpenPose: thibaud/controlnet-openpose-sdxl-1.0 or diffusers/controlnet-openpose-sdxl-1.0
- Depth: diffusers/controlnet-depth-sdxl-1.0
VRAM budget: Each SD1.5 ControlNet adds roughly 1.5 GB to your baseline. Each SDXL ControlNet adds about 2.5 GB. Two SD1.5 nets on a 3 GB base model need 6 GB minimum. Two SDXL nets on a 4.6 GB UNet need 10+ GB. If you're hitting OOM, use fp16 variants (723 MB each for SD1.5) or run one net at a time.
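The budget arithmetic above can be sketched as a quick calculator. The per-net costs are the ballpark figures quoted in this guide, not measured allocations; adjust them for your own checkpoints.

```python
# Rough VRAM budget check for stacking ControlNets. Costs are the
# approximate per-model figures from this guide (GB), not measurements.
CONTROLNET_COST_GB = {"sd15": 1.5, "sd15_fp16": 0.75, "sdxl": 2.5}

def vram_needed(base_model_gb: float, nets: list[str]) -> float:
    """Estimate total VRAM for a base checkpoint plus N ControlNets."""
    return base_model_gb + sum(CONTROLNET_COST_GB[n] for n in nets)

# Two SD1.5 nets on a 3 GB base model: 3 + 1.5 + 1.5 = 6 GB minimum.
print(round(vram_needed(3.0, ["sd15", "sd15"]), 2))   # 6.0
# Two SDXL nets on a 4.6 GB UNet: 4.6 + 2.5 + 2.5, i.e. 10+ GB in practice.
print(round(vram_needed(4.6, ["sdxl", "sdxl"]), 2))   # 9.6
```

Swapping both entries to "sd15_fp16" shows why the 723 MB variants are the first OOM fix: the same two-net stack drops to 4.5 GB.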
Software: ComfyUI with ControlNet Auxiliary Preprocessors installed via ComfyUI Manager, or Forge with multi-ControlNet enabled in settings.
Step 1 - Download ControlNet Models
Grab SafeTensors files from Hugging Face:
- SD1.5 OpenPose: lllyasviel/control_v11p_sd15_openpose - diffusion_pytorch_model.safetensors (1.35 GB)
- SD1.5 Depth: lllyasviel/control_v11f1p_sd15_depth - diffusion_pytorch_model.safetensors (1.45 GB)
- SDXL variants: diffusers/controlnet-openpose-sdxl-1.0 and diffusers/controlnet-depth-sdxl-1.0 (2.5 GB each)
Drop them in your models/controlnet/ folder. ComfyUI picks them up automatically. In Forge, they appear in the ControlNet model dropdown after a UI restart.
fp16 option: If VRAM is tight, control_v11f1p_sd15_depth_fp16.safetensors at 723 MB has minimal quality loss.
Step 2 - Generate or Edit Your Pose Map
OpenPose maps: In ComfyUI, add the OpenPose Preprocessor node from ControlNet Auxiliary Preprocessors. Enable hand and face detection for anatomy work - body-only skeletons miss the fingers that cause most complaints. The ComfyUI OpenPose Editor node lets you manually adjust keypoints before generation.
Depth maps: Depth Anything V3 (released November 2025) with V2-style normalization is the current best option for ControlNet workflows. MiDaS still works but produces noisier results on human subjects. Install either through ComfyUI Manager.
Manual editing matters. Auto-extracted maps from reference photos are a starting point, not gospel. Open the pose map in an image editor and fix obvious problems - overlapping limbs, impossible joint angles, missing keypoints. Ten minutes fixing a map saves an hour of rerolling seeds.
Step 3 - Configure Weights
Start each ControlNet at 0.4 and adjust from there. The practical range is 0.3 - 0.6 per net when stacking two.
- Too high (0.7+): Skin turns waxy, poses look stiff and plastic, fine details vanish
- Too low (below 0.2): ControlNet barely influences generation - you're burning VRAM for nothing
- Sweet spot (0.35 - 0.55): Skeleton guides anatomy without overriding the model's skin texture and lighting
Change one weight at a time. Adjusting OpenPose and Depth simultaneously means you can't tell which change caused what. Log your results: pose 0.42 depth 0.48 - elbows fixed, hands still rough.
Step 4 - Stack OpenPose + Depth
In ComfyUI: Add two Apply ControlNet nodes. Chain them by connecting the first node's conditioning output to the second node's conditioning input. Load OpenPose in the first, Depth in the second. Attach your preprocessed maps to each node's image input.
In Forge: Open the ControlNet section, enable Unit 0 (OpenPose) and Unit 1 (Depth). Upload or auto-detect maps for each. Set preprocessor to "none" if you're providing pre-generated maps - otherwise Forge re-processes them and potentially changes your carefully edited keypoints.
Preview before generating. Both UIs can show you the preprocessed map. If the preview looks wrong - merged limbs, missing depth gradients, skeleton joints in impossible positions - the render will be wrong too. Fix the map, not the sampler.
When maps disagree (depth shows an arm in front but pose shows it behind), composite in an image editor. Don't crank both weights to 1.0 hoping they'll average out. They won't.
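The ComfyUI chaining described above looks like this in the API ("prompt") JSON format, sketched here as a Python dict. Node ids, the `"6"` positive-prompt reference, and the 0.45 strengths are placeholders; the class names follow ComfyUI's built-in nodes, but verify the exact field names against a workflow exported from your own install via Save (API Format).

```python
# Two Apply ControlNet nodes chained by conditioning: OpenPose first,
# Depth second. Links are ["source_node_id", output_index].
chain = {
    "10": {"class_type": "ControlNetLoader",
           "inputs": {"control_net_name": "control_v11p_sd15_openpose.safetensors"}},
    "11": {"class_type": "ControlNetLoader",
           "inputs": {"control_net_name": "control_v11f1p_sd15_depth.safetensors"}},
    "12": {"class_type": "LoadImage", "inputs": {"image": "pose_map.png"}},
    "13": {"class_type": "LoadImage", "inputs": {"image": "depth_map.png"}},
    "14": {"class_type": "ControlNetApply",       # first net: OpenPose
           "inputs": {"conditioning": ["6", 0],   # "6" = your positive prompt node
                      "control_net": ["10", 0],
                      "image": ["12", 0],
                      "strength": 0.45}},
    "15": {"class_type": "ControlNetApply",       # second net: Depth
           "inputs": {"conditioning": ["14", 0],  # chained off node 14's output
                      "control_net": ["11", 0],
                      "image": ["13", 0],
                      "strength": 0.45}},
}
```

The key detail is node 15's conditioning input pointing at node 14, not at the prompt node: that is what makes the two nets stack rather than compete.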
Step 5 - Generate
Run a small batch (2 - 4 images) to validate your map + weight combination before queuing larger runs.
Resolution matters. Generate at the resolution your maps were created for. Upscaling a 512px map to guide a 1024px generation produces blurry guidance. Regenerate maps at the target resolution or upscale after generation.
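The resolution rule above can be encoded as a small check. The multiple-of-8 snapping reflects Stable Diffusion's latent-grid constraint; the helper names are my own, not part of any UI.

```python
# Given a target generation size, compute the size maps should be
# regenerated at (snapped to multiples of 8) and flag mismatches.
def map_size_for(gen_w: int, gen_h: int) -> tuple[int, int]:
    """Maps should match generation resolution, rounded to multiples of 8."""
    snap = lambda v: max(8, round(v / 8) * 8)
    return snap(gen_w), snap(gen_h)

def maps_match(map_w, map_h, gen_w, gen_h) -> bool:
    """False means regenerate the map at target size; don't upscale it."""
    return (map_w, map_h) == (gen_w, gen_h)

print(map_size_for(1024, 1024))          # (1024, 1024)
print(maps_match(512, 512, 1024, 1024))  # False: regenerate, don't upscale
```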
Batch debugging: If 3 out of 4 results look correct and one has mangled hands, the problem is likely prompt ambiguity, not ControlNet. The map is doing its job on the body - hands need a separate fix pass.
Step 6 - Refine
Inpainting for hands: Export a cropped OpenPose map covering just the arm and hand region. A full-body map keeps pulling the inpainted region back toward the global pose, overriding your local edits. Inpaint the problem area using the cropped map as guidance instead.
ADetailer: Runs YOLO detection to find hands and faces, then re-generates them at higher effective resolution. Good for batch workflows where manual inpainting isn't practical.
Save maps as lossless PNG. JPEG compression adds mosquito noise that ControlNet treats as intentional detail. You'll see it as weird texture artifacts on skin.
Verify It Works
Check these after your first successful generation:
- Finger count: All hands have five fingers with correct proportions
- Joint angles: Elbows and knees bend in anatomically possible directions
- Depth consistency: Foreground objects occlude background - no transparency glitches
- Skin texture: Smooth and natural, not waxy or plasticky (weights too high if so)
- Map alignment: Toggle the ControlNet preview overlay to confirm skeleton aligns with the generated pose
Troubleshooting
- Plastic/waxy skin: Lower both weights by 0.1. You're over-constraining the model.
- Extra limbs: Your OpenPose map has ghost keypoints from a second detected person. Delete stray skeletons in the map.
- Hands still broken: Use a cropped hand-only OpenPose map and inpaint. Full-body maps are too coarse for finger detail.
- CUDA OOM: Use fp16 ControlNet models (723 MB vs 1.45 GB). Run one net at a time. Close other VRAM-hungry apps.
- Maps look right, render is wrong: Check lighting. A depth map encodes geometry only; if the lighting in your reference image contradicts the geometry you painted into the map, the model gets confused.
- Stiff poses: Drop OpenPose weight to 0.3. Let the model add natural variation within the skeleton constraint.
Bottom Line
ControlNet doesn't make images better. It makes anatomy correct. OpenPose handles skeleton placement, Depth handles spatial relationships, and conservative weights keep everything looking human instead of mannequin. Download two models, spend ten minutes on your maps, and stop rerolling seeds for problems that geometry solves.
