denk committed
Commit 4b7ac99 · Parent: 781c16c
init

- README.md +112 -0
- config.json +25 -0
- diffusion_pytorch_model.safetensors +3 -0
README.md ADDED
@@ -0,0 +1,112 @@
---
license: apache-2.0
language:
- en
tags:
- video
- video-generation
- video-to-video
- controlnet
- diffusers
- wan2.2
---
# Controlnet for Wan2.2 A14B (depth)

This repo contains the ControlNet module for Wan2.2; the code lives in the <a href="https://github.com/TheDenk/wan2.2-controlnet">GitHub repo</a>.
It uses the same approach as the ControlNet for [Wan2.1](https://github.com/TheDenk/wan2.1-dilated-controlnet).

<video controls autoplay src=""></video>

### For ComfyUI
Use the cool [ComfyUI-WanVideoWrapper](https://github.com/kijai/ComfyUI-WanVideoWrapper).
### Inference examples
#### Simple inference with CLI
```bash
python -m inference.cli_demo \
    --video_path "resources/bubble.mp4" \
    --prompt "Close-up shot with soft lighting, focusing sharply on the lower half of a young woman's face. Her lips are slightly parted as she blows an enormous bubblegum bubble. The bubble is semi-transparent, shimmering gently under the light, and surprisingly contains a miniature aquarium inside, where two orange-and-white goldfish slowly swim, their fins delicately fluttering as if in an aquatic universe. The background is a pure light blue color." \
    --controlnet_type "depth" \
    --base_model_path Wan-AI/Wan2.2-T2V-A14B \
    --controlnet_model_path TheDenk/wan2.2-t2v-a14b-controlnet-depth-v1
```
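Note: the `inference.cli_demo` entry point is part of the [GitHub repo](https://github.com/TheDenk/wan2.2-controlnet) linked above, so run the command from a clone of that repo.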
#### Minimal code example
```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = "0"
os.environ["TOKENIZERS_PARALLELISM"] = "false"

import torch
from diffusers.utils import load_video, export_to_video
from diffusers import AutoencoderKLWan, UniPCMultistepScheduler
from controlnet_aux import MidasDetector

# Custom modules from https://github.com/TheDenk/wan2.2-controlnet
from wan_controlnet import WanControlnet
from wan_transformer import CustomWanTransformer3DModel
from wan_t2v_controlnet_pipeline import WanTextToVideoControlnetPipeline

base_model_path = "Wan-AI/Wan2.2-T2V-A14B"
controlnet_model_path = "TheDenk/wan2.2-t2v-a14b-controlnet-depth-v1"

# The VAE is kept in float32; the transformer and controlnet run in bfloat16.
vae = AutoencoderKLWan.from_pretrained(base_model_path, subfolder="vae", torch_dtype=torch.float32)
transformer = CustomWanTransformer3DModel.from_pretrained(base_model_path, subfolder="transformer", torch_dtype=torch.bfloat16)
controlnet = WanControlnet.from_pretrained(controlnet_model_path, torch_dtype=torch.bfloat16)

pipe = WanTextToVideoControlnetPipeline.from_pretrained(
    pretrained_model_name_or_path=base_model_path,
    controlnet=controlnet,
    transformer=transformer,
    vae=vae,
    torch_dtype=torch.bfloat16
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config, flow_shift=12.0)
pipe.enable_model_cpu_offload()

# Depth annotator that produces the control frames.
controlnet_processor = MidasDetector.from_pretrained('lllyasviel/Annotators')
img_h = 704       # 704 or 480
img_w = 1280      # 1280 or 832
num_frames = 121  # 121, 81 or 49

# Read the reference video, resize it, and extract a depth map per frame.
video_path = 'bubble.mp4'
video_frames = load_video(video_path)[:num_frames]
video_frames = [x.resize((img_w, img_h)) for x in video_frames]
controlnet_frames = [controlnet_processor(x) for x in video_frames]

prompt = "Close-up shot with soft lighting, focusing sharply on the lower half of a young woman's face. Her lips are slightly parted as she blows an enormous bubblegum bubble. The bubble is semi-transparent, shimmering gently under the light, and surprisingly contains a miniature aquarium inside, where two orange-and-white goldfish slowly swim, their fins delicately fluttering as if in an aquatic universe. The background is a pure light blue color."
negative_prompt = "bad quality, worst quality"

output = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=img_h,
    width=img_w,
    num_frames=num_frames,
    guidance_scale=5,
    generator=torch.Generator(device="cuda").manual_seed(42),
    output_type="pil",

    # Apply the controlnet at weight 0.8 over the first 80% of denoising steps.
    controlnet_frames=controlnet_frames,
    controlnet_guidance_start=0.0,
    controlnet_guidance_end=0.8,
    controlnet_weight=0.8,

    teacache_treshold=0.6,  # TeaCache speed-up (parameter name as defined in the pipeline)
).frames[0]

export_to_video(output, "output.mp4", fps=16)
```
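If this setting does not fit on your GPU, the smaller sizes noted in the comments (480×832, 81 or 49 frames) reduce memory directly. One further option is sketched below, assuming the Wan VAE in your diffusers version exposes the standard `enable_tiling()` helper; this is an assumption to verify, not something taken from the repo's docs.

```python
# Hedged sketch: reduce peak VRAM further, on top of enable_model_cpu_offload().
# Assumes AutoencoderKLWan implements enable_tiling(), as recent diffusers releases do.
pipe.vae.enable_tiling()  # decode video latents tile by tile instead of all at once
```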
## Acknowledgements
Original code and models: [Wan2.2](https://github.com/Wan-Video/Wan2.2).

## Citations
```
@misc{TheDenk,
  title={Wan2.2 Controlnet},
  author={Karachev Denis},
  url={https://github.com/TheDenk/wan2.2-controlnet},
  publisher={Github},
  year={2025}
}
```

## Contacts
<p>Issues should be raised directly in the repository. For professional support and recommendations, please contact <a>[email protected]</a>.</p>
config.json ADDED
@@ -0,0 +1,25 @@
{
  "_class_name": "WanControlnet",
  "_diffusers_version": "0.35.0.dev0",
  "added_kv_proj_dim": null,
  "attention_head_dim": 128,
  "cross_attn_norm": true,
  "downscale_coef": 8,
  "eps": 1e-06,
  "ffn_dim": 8960,
  "freq_dim": 256,
  "image_dim": null,
  "in_channels": 3,
  "num_attention_heads": 12,
  "num_layers": 6,
  "out_proj_dim": 5120,
  "patch_size": [
    1,
    2,
    2
  ],
  "qk_norm": "rms_norm_across_heads",
  "rope_max_seq_len": 1024,
  "text_dim": 4096,
  "vae_channels": 16
}
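For orientation, a short sketch that loads this config through the repo's `WanControlnet` class (from the GitHub code above) and prints a few fields; the interpretations in the comments are assumptions read off the config, not documented facts.

```python
# Hedged sketch: inspect the controlnet config shipped in this repo.
# WanControlnet is the custom class from https://github.com/TheDenk/wan2.2-controlnet;
# the field interpretations in the comments are assumptions.
import torch
from wan_controlnet import WanControlnet

controlnet = WanControlnet.from_pretrained(
    "TheDenk/wan2.2-t2v-a14b-controlnet-depth-v1", torch_dtype=torch.bfloat16
)
print(controlnet.config.in_channels)   # 3    -> consumes RGB control frames (depth maps)
print(controlnet.config.num_layers)    # 6    -> small block stack, cheap next to the A14B transformer
print(controlnet.config.out_proj_dim)  # 5120 -> projects into the Wan2.2 A14B hidden size
```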
diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:878a730283dc0471cf27c68497e3db4edd791a60763d2fd7e13a67c6ab3825ec
size 705206016
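This is a Git LFS pointer, not the weights themselves; the real file is fetched on checkout or download, and the pointer records its sha256 and byte size. A minimal sketch for checking a downloaded copy against the pointer follows; the local filename is an assumption, so adjust the path to wherever the file landed.

```python
# Hedged sketch: verify downloaded weights against the LFS pointer above.
# "diffusion_pytorch_model.safetensors" is an assumed local download path.
import hashlib
import os

path = "diffusion_pytorch_model.safetensors"
expected_oid = "878a730283dc0471cf27c68497e3db4edd791a60763d2fd7e13a67c6ab3825ec"
expected_size = 705206016

assert os.path.getsize(path) == expected_size, "size mismatch"

h = hashlib.sha256()
with open(path, "rb") as f:
    for chunk in iter(lambda: f.read(1 << 20), b""):  # stream in 1 MiB chunks
        h.update(chunk)
assert h.hexdigest() == expected_oid, "sha256 mismatch"
print("weights match the LFS pointer")
```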