LocalForge AI

ComfyUI Pony Diffusion V6 XL workflow (2026)

Pony Diffusion V6 XL is a 6.46 GB SDXL fine-tune with 950k+ downloads on Civitai and one hard requirement: clip skip 2. Skip that setting and your outputs look like generic SDXL mush. This guide walks through the exact ComfyUI graph setup - checkpoint loader, VAE node, CLIPSetLastLayer wired to -2, and score tag prompts - so you get Pony-quality results on the first run. You don't need exotic hardware: 8 GB VRAM handles 1024x1024 with ComfyUI's memory management. The model ships with community workflows for detailers, upscale chains, and LoRA stacking, but we're starting simple: one checkpoint, one VAE, one resolution. Get a clean baseline before you touch anything else. LocalForge AI bundles this if you'd rather skip the manual setup.

The Quick Answer

Key Takeaway - May 2026

Pony Diffusion V6 XL (civitai.com/models/257749) is a 6.46 GB SDXL-class checkpoint trained on 2.6M aesthetically ranked images, with 71k+ reviews on Civitai. You need three things configured correctly: clip skip 2 via CLIPSetLastLayer, the external sdxl_vae.safetensors wired explicitly, and score tags (score_9, score_8_up, score_7_up) at the start of your positive prompt. Generate at 1024x1024 with Euler a or DPM++ 2M Karras, 25 steps, CFG 7. The model has 950k+ downloads for a reason - it works when you follow its conventions.


What You Need

  • Pony Diffusion V6 XL checkpoint - download the "V6 (start with this one)" version from Civitai model 257749 (6.46 GB, fp16 safetensors, pruned)
  • sdxl_vae.safetensors - 335 MB, available from the same Civitai page or HuggingFace mirrors
  • ComfyUI installed and launching (this guide assumes you've done that already)
  • GPU with 8+ GB VRAM - RTX 3060 or better runs 1024x1024 without tiling. 6 GB cards work with --lowvram but expect 2-3x slower generations
  • ComfyUI Manager (optional but saves time resolving missing custom nodes)
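A 6.46 GB download can silently corrupt, and a truncated checkpoint produces confusing loader errors rather than a clean failure. One way to check is to hash the file and compare against the SHA256 shown on the Civitai version page. A minimal sketch - the path is a placeholder for wherever you saved the file:

```python
import hashlib

def sha256sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream-hash a large file so a 6.46 GB checkpoint never needs to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare the printed digest to the hash listed on the Civitai model page:
# print(sha256sum("models/checkpoints/ponyDiffusionV6XL.safetensors"))
```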

Step 1 - Set Up the Base Graph

Start with ComfyUI's default SDXL workflow. Delete any SD 1.5 nodes - Pony is SDXL-class and needs the dual CLIP encoder path. Your minimum graph:

  • CheckpointLoaderSimple - connects to KSampler, positive/negative CLIP, and VAE
  • CLIPSetLastLayer - wired between checkpoint CLIP output and both CLIPTextEncode nodes
  • VAELoader - override the checkpoint's baked VAE with the external sdxl_vae.safetensors
  • KSampler - feeds into VAEDecode, then SaveImage
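The graph above can be sketched in ComfyUI's API format (the JSON you get from "Save (API Format)" in the UI): keys are node ids, and a two-element list like ["1", 1] wires in another node's output slot. The node class names are stock ComfyUI; the checkpoint and VAE filenames are assumptions - match them to your models/ folders.

```python
# Minimal Pony V6 XL graph in ComfyUI API format.
# Filenames below are placeholders -- use whatever you actually downloaded.
workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "ponyDiffusionV6XL.safetensors"}},
    "2": {"class_type": "CLIPSetLastLayer",   # clip skip 2
          "inputs": {"clip": ["1", 1], "stop_at_clip_layer": -2}},
    "3": {"class_type": "CLIPTextEncode",     # positive prompt
          "inputs": {"clip": ["2", 0],
                     "text": "score_9, score_8_up, score_7_up, your prompt here"}},
    "4": {"class_type": "CLIPTextEncode",     # negative prompt
          "inputs": {"clip": ["2", 0],
                     "text": "score_6_up, score_5_up, score_4_up"}},
    "5": {"class_type": "VAELoader",
          "inputs": {"vae_name": "sdxl_vae.safetensors"}},
    "6": {"class_type": "EmptyLatentImage",
          "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
    "7": {"class_type": "KSampler",
          "inputs": {"model": ["1", 0], "positive": ["3", 0], "negative": ["4", 0],
                     "latent_image": ["6", 0], "seed": 42, "steps": 25, "cfg": 7.0,
                     "sampler_name": "euler_ancestral", "scheduler": "karras",
                     "denoise": 1.0}},
    "8": {"class_type": "VAEDecode",          # decodes with the external VAE
          "inputs": {"samples": ["7", 0], "vae": ["5", 0]}},
    "9": {"class_type": "SaveImage",
          "inputs": {"images": ["8", 0], "filename_prefix": "pony_v6"}},
}
```

Note both text encoders (nodes 3 and 4) take their clip input from node 2, the CLIPSetLastLayer - that detail is what Step 4 below insists on.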

Pin your ComfyUI commit before you install custom nodes. Comfy updates break node compatibility more than the changelogs admit - a frozen install means your workflow JSON stays reproducible.


Step 2 - Load the Checkpoint

Point CheckpointLoaderSimple at your Pony V6 XL file. Verify the filename matches what you downloaded - renaming checkpoint files is how you spend an hour debugging "wrong model loaded" when the problem is a stale symlink.

The file is 6.46 GB in fp16 safetensors format. If you downloaded the full fp32 version (~12 GB), switch to fp16. There's no visible quality difference, and the fp16 file frees roughly 6 GB of disk space and VRAM headroom.


Step 3 - Wire the VAE

Use a separate VAELoader node pointing at sdxl_vae.safetensors (335 MB). The checkpoint has a baked VAE, but the external SDXL VAE produces cleaner color reproduction - particularly in skin tones and gradients.

Connect the VAELoader output to both your VAEDecode node and anywhere else the graph expects a VAE input. Don't leave this on "auto" - explicit VAE wiring prevents silent fallbacks when you swap checkpoints later.
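If you're editing an API-format workflow by hand, the rewiring can be scripted. A sketch - the helper and the default filename are assumptions, not part of ComfyUI itself:

```python
def force_external_vae(workflow: dict, vae_name: str = "sdxl_vae.safetensors") -> str:
    """Add a VAELoader node and point every VAEDecode at it instead of the
    checkpoint's baked VAE. Returns the new loader's node id."""
    loader_id = str(max((int(k) for k in workflow), default=0) + 1)
    workflow[loader_id] = {"class_type": "VAELoader",
                          "inputs": {"vae_name": vae_name}}
    for node in workflow.values():
        if node["class_type"] == "VAEDecode":
            node["inputs"]["vae"] = [loader_id, 0]
    return loader_id
```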


Step 4 - CLIP Set Last Layer (Clip Skip 2)

This is the setting that separates Pony results from generic SDXL. Add a CLIPSetLastLayer node and set stop_at_clip_layer to -2.

Wire it between the checkpoint's CLIP output and both CLIPTextEncode nodes (positive and negative). Every prompt encode must go through this node. If you wire one encoder directly to the checkpoint, that encoder runs at clip skip 1 and your positive/negative prompts interpret the model differently.

Clip skip 2 isn't optional for Pony V6 - skip this and your outputs look like generic SDXL with plastic skin and muddy details. The model was fine-tuned with this setting; fighting it wastes your time.
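The "every encode goes through the node" rule can be checked mechanically against an API-format workflow dict. A small validator sketch (the function is illustrative, not a ComfyUI API):

```python
def clip_skip_ok(workflow: dict) -> bool:
    """True only if every CLIPTextEncode's clip input comes from a
    CLIPSetLastLayer node set to -2 (ComfyUI's clip skip 2)."""
    for node in workflow.values():
        if node["class_type"] != "CLIPTextEncode":
            continue
        src = workflow[node["inputs"]["clip"][0]]
        if src["class_type"] != "CLIPSetLastLayer":
            return False  # this encoder bypasses the clip skip node
        if src["inputs"]["stop_at_clip_layer"] != -2:
            return False
    return True
```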


Step 5 - Score Tags in Your Prompts

Pony V6 was trained with aesthetic score tags derived from CLIP-based ranking across 2.6M images. Place them at the start of your positive prompt:

score_9, score_8_up, score_7_up, [your actual prompt here]

  • Minimum viable set: score_9, score_8_up, score_7_up - this is what most high-quality Civitai workflows use
  • Full stack: score_9, score_8_up, score_7_up, score_6_up, score_5_up, score_4_up - broader quality range, slightly less curated feel
  • Negative prompt: add score_6_up, score_5_up, score_4_up to your negative if you used only the top 3 in positive

Don't treat these as incantations. They're quality priors from training data. If your image gets muddy after adding more score tags, you added too many, not too few. Remove one layer at a time until the composition sharpens.
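The tag conventions above reduce to a few lines of string assembly. A sketch - the helper name is made up, but the tag sets mirror the lists above:

```python
TOP3 = ["score_9", "score_8_up", "score_7_up"]
LOWER = ["score_6_up", "score_5_up", "score_4_up"]

def pony_prompts(subject: str, full_stack: bool = False) -> tuple[str, str]:
    """Return (positive, negative). With only the top 3 tags in the positive,
    the lower tiers go into the negative, as described above."""
    if full_stack:
        return ", ".join(TOP3 + LOWER + [subject]), ""
    return ", ".join(TOP3 + [subject]), ", ".join(LOWER)
```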


Step 6 - Generate and Lock Settings

Set your KSampler:

  • Sampler: euler_ancestral (Euler a) for variety, dpmpp_2m for consistency
  • Scheduler: karras
  • Steps: 25 (start here; going above 30 rarely improves output)
  • CFG: 7 (safe default; 8-9 adds contrast but risks oversaturation)
  • Resolution: 1024x1024 for square, 896x1152 for portrait, 1152x896 for landscape, 832x1216 for tall compositions

Hit Queue Prompt. Your first gen should show Pony-characteristic style: strong color saturation, good anatomy, responsive to the score tags. If it looks like vanilla SDXL, your clip skip node isn't wired correctly - go back to Step 4.
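Queueing doesn't have to go through the UI: ComfyUI exposes an HTTP API, and POSTing an API-format workflow to the /prompt endpoint is equivalent to hitting Queue Prompt. The default port 8188 is ComfyUI's stock setting; adjust if you launched with --port.

```python
import json
import urllib.request

def build_payload(workflow: dict) -> bytes:
    """Wrap an API-format workflow in the envelope /prompt expects."""
    return json.dumps({"prompt": workflow}).encode("utf-8")

def queue_prompt(workflow: dict, host: str = "127.0.0.1:8188") -> dict:
    """POST the workflow to a running ComfyUI instance; the response
    includes the prompt_id of the queued job."""
    req = urllib.request.Request(f"http://{host}/prompt",
                                 data=build_payload(workflow),
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```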


Verify It Works

A correctly configured Pony V6 graph shows these tells:

  • Color saturation is noticeably higher than base SDXL at the same prompt
  • Score tag response - removing score tags from positive prompt produces visibly lower-quality output
  • Clip skip test - temporarily set CLIPSetLastLayer to -1, regenerate with the same seed. If the output changes (more plastic, less detailed), your -2 setting was working correctly
  • VAE test - disconnect your external VAE, let the checkpoint's baked VAE take over. If colors shift slightly, your external VAE was active
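The clip skip test above is easy to get wrong by hand (changing the seed between runs invalidates the comparison). One way to keep the A/B honest is to generate both variants from the same dict so only stop_at_clip_layer differs - a sketch, with an illustrative helper name:

```python
import copy

def clip_skip_ab(workflow: dict) -> tuple[dict, dict]:
    """Return two seed-identical copies of a workflow, one at clip skip 2 (-2)
    and one at -1, so the only difference between the runs is that setting."""
    a, b = copy.deepcopy(workflow), copy.deepcopy(workflow)
    for variant, layer in ((a, -2), (b, -1)):
        for node in variant.values():
            if node["class_type"] == "CLIPSetLastLayer":
                node["inputs"]["stop_at_clip_layer"] = layer
    return a, b
```

Queue both copies and compare the images; the original dict is left untouched.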

Troubleshooting

  • Plastic/waxy skin: Check clip skip first - this is the cause 80% of the time. Then check stacked LoRAs fighting the base model's rendering. Third, lower CFG by 1-2 points
  • VRAM out of memory at 1024x1024: Enable ComfyUI's --lowvram flag. If you're on exactly 8 GB, close other GPU-using apps (Discord, browser hardware acceleration). Tiled VAE decode also helps
  • Score tags do nothing: Your CLIPTextEncode is wired directly to the checkpoint, bypassing CLIPSetLastLayer. Trace the graph connections from checkpoint CLIP output
  • Colors look washed out: You're using the baked VAE instead of the external sdxl_vae.safetensors. Wire a VAELoader node explicitly
  • LoRA breaks composition: You added a LoRA before your baseline was stable. Remove all LoRAs, confirm the base generates cleanly, then add one at a time at 0.5-0.7 strength

Bottom Line

Pony V6 XL in ComfyUI is three things done right: clip skip 2, external SDXL VAE, and score tags at prompt start. Get those three nodes wired correctly and you've got a 950k-download checkpoint doing what made it popular. Build boring first, experiment after.

FAQ

What clip skip setting does Pony V6 XL need in ComfyUI?
Set CLIPSetLastLayer to stop_at_clip_layer = -2. This is mandatory - the model was fine-tuned at clip skip 2 and produces plastic, muddy output without it.
Where do I download Pony Diffusion V6 XL?
Civitai model page at civitai.com/models/257749. Download the "V6 (start with this one)" version - it's 6.46 GB in fp16 safetensors format. Verify file hashes if you mirror from third-party sources.
Does Pony V6 XL need a separate VAE file?
It has a baked VAE, but loading sdxl_vae.safetensors (335 MB) explicitly through a VAELoader node gives cleaner color reproduction - especially skin tones. Worth the extra node.
How much VRAM does Pony V6 XL need?
8 GB runs 1024x1024 in ComfyUI without tiling. 6 GB cards work with --lowvram but generations take 2-3x longer. Adding detailers and upscalers increases requirements - preview small, then commit.
What are score tags and do I have to use them?
Score tags (score_9, score_8_up, score_7_up) are aesthetic quality priors from training data. They're not required, but outputs without them are noticeably lower quality. Start with the top 3 at the beginning of your positive prompt.
Can I use SD 1.5 LoRAs with Pony V6 XL?
No. Pony is SDXL-class - it needs SDXL or Pony-specific LoRAs. SD 1.5 LoRAs will error or produce garbage. Check the LoRA's base model tag on Civitai before downloading.