WAN 2.5 FP16 - Image-to-Video Generation Model
- Version: v1.4
- Precision: FP16 (16-bit floating point)
- Model Family: WAN (Video Generation)
- Task: Image-to-Video Generation
Model Description
WAN 2.5 Image-to-Video (I2V) is a state-of-the-art diffusion model capable of generating high-quality video sequences from static images. This FP16 version provides a balance between model quality and computational efficiency, making it suitable for systems with moderate GPU resources.
Key Capabilities
- Image-to-Video Generation: Animate static images into coherent video sequences
- Temporal Coherence: Produces smooth, temporally consistent video frames
- Motion Control: Advanced control over motion dynamics and camera movements
- Lighting Preservation: Maintains lighting consistency from source image
- Quality Enhancement: Support for LoRA adapters for improved output quality
- Efficient Inference: FP16 precision reduces memory footprint while maintaining quality
Model Architecture
- Diffusion Framework: Latent diffusion-based video generation
- Conditioning: Image-conditioned video synthesis
- Precision: FP16 (half-precision floating point)
- Format: SafeTensors (secure, efficient format)
- VAE: Variational Autoencoder for latent space encoding/decoding
Repository Contents
Status: Repository structure prepared for model files (currently empty).
Current Directory Structure
wan25-fp16-i2v/
├── diffusion_models/
│   └── wan/                  # Empty - awaiting model download
├── README.md                 # This file (15 KB)
└── (model files to be added)
Expected Model Files (After Download)
The repository is organized to store WAN 2.5 FP16 I2V model files once downloaded from Hugging Face:
Core Model Files (to be placed in diffusion_models/wan/):
- wan_2.5_i2v_fp16.safetensors - Main UNet diffusion model for video generation (~8-12 GB)
- wan_vae_fp16.safetensors - VAE for encoding/decoding video frames (~1-2 GB)
- image_encoder.safetensors - CLIP/VAE image encoder for conditioning (~1-2 GB)
- config.json - Model architecture configuration and hyperparameters (~5-10 KB)
Optional LoRA Adapters (to be placed in loras/ directory if downloaded):
- motion_control_lora.safetensors - Fine-grained motion dynamics control (~100-500 MB)
- camera_control_lora.safetensors - Camera movement and perspective control (~100-500 MB)
- quality_enhancement_lora.safetensors - Output quality improvements (~100-500 MB)
Total Repository Size:
- Current: ~15 KB (documentation only)
- After Model Download: 10-15 GB (core model) + 0.3-1.5 GB (optional LoRAs)
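Once the model files have been downloaded (see the instructions below), a minimal sketch like the following can confirm that the core files are present and report their sizes. The file names are the expected names listed above and may differ in the actual release.
import os

# Expected core model files (names as documented above; adjust if the
# actual release uses different file names)
MODEL_DIR = "E:/huggingface/wan25-fp16-i2v/diffusion_models/wan"
EXPECTED = [
    "wan_2.5_i2v_fp16.safetensors",
    "wan_vae_fp16.safetensors",
    "image_encoder.safetensors",
    "config.json",
]

for name in EXPECTED:
    path = os.path.join(MODEL_DIR, name)
    if os.path.exists(path):
        size_gb = os.path.getsize(path) / 1024**3
        print(f"OK      {name} ({size_gb:.2f} GB)")
    else:
        print(f"MISSING {name}")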
Download Instructions
To populate this repository with model files:
# Install Hugging Face CLI
pip install huggingface-hub
# Download WAN 2.5 FP16 I2V model (requires HF authentication)
huggingface-cli login
huggingface-cli download Wan/WAN-2.5-I2V --revision fp16 --local-dir "E:/huggingface/wan25-fp16-i2v/diffusion_models/wan"
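Alternatively, the same download can be done from Python with the huggingface_hub library. This is a minimal sketch; the repository id and fp16 revision mirror the CLI command above and assume the model is published under that name.
from huggingface_hub import snapshot_download

# Download the fp16 revision into the local repository layout
# (repo id, revision, and path mirror the CLI command above)
snapshot_download(
    repo_id="Wan/WAN-2.5-I2V",
    revision="fp16",
    local_dir="E:/huggingface/wan25-fp16-i2v/diffusion_models/wan",
)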
Hardware Requirements
Minimum Requirements (FP16)
- GPU: NVIDIA RTX 3090 (24 GB VRAM) or AMD equivalent
- System RAM: 32 GB
- Disk Space: 20 GB free space
- CUDA: 11.8 or higher (for NVIDIA GPUs)
Recommended Requirements
- GPU: NVIDIA RTX 4090 (24 GB VRAM) or A5000/A6000
- System RAM: 64 GB
- Disk Space: 30 GB free space (for model + output cache)
- CUDA: 12.1 or higher
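A quick way to check whether the local GPU meets these requirements is a short PyTorch snippet such as the following; the 24 GB threshold reflects the minimum listed above.
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"VRAM: {vram_gb:.1f} GB")
    print(f"CUDA (PyTorch build): {torch.version.cuda}")
    if vram_gb < 24:
        print("Warning: below the 24 GB minimum recommended for FP16 I2V generation")
else:
    print("No CUDA-capable GPU detected")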
Performance Expectations
- Short Videos (2-4 seconds): ~30-60 seconds generation time
- Medium Videos (5-10 seconds): ~1-3 minutes generation time
- Long Videos (10-15 seconds): ~3-5 minutes generation time
Generation times vary based on resolution, frame rate, and sampling steps.
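To measure generation time on your own hardware, a simple timing wrapper can be used. This sketch assumes the pipeline and image objects from the usage examples below; the settings are illustrative.
import time

# Time a single generation run (pipeline and image come from the
# usage examples below)
start = time.perf_counter()
video = pipeline(
    image=image,
    num_frames=64,
    height=512,
    width=512,
    num_inference_steps=50,
).frames
elapsed = time.perf_counter() - start
print(f"Generation took {elapsed:.1f} s")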
Usage Examples
Installation
# Install dependencies
pip install diffusers transformers accelerate safetensors torch torchvision pillow
Basic Image-to-Video Generation
import torch
from diffusers import DiffusionPipeline
from PIL import Image
# Load the model
pipeline = DiffusionPipeline.from_pretrained(
    "E:/huggingface/wan25-fp16-i2v/diffusion_models/wan",
    torch_dtype=torch.float16,
    variant="fp16"
)
pipeline.to("cuda")
# Enable memory optimizations
pipeline.enable_attention_slicing()
pipeline.enable_vae_slicing()
# Load source image
image = Image.open("input_image.jpg")
# Generate video from image
prompt = "Add gentle camera pan and natural motion"
video = pipeline(
    image=image,                 # Source image
    prompt=prompt,               # Optional motion guidance
    num_frames=64,               # Number of frames to generate
    height=512,                  # Video height
    width=512,                   # Video width
    num_inference_steps=50,      # Sampling steps (higher = better quality)
    guidance_scale=7.5,          # Prompt adherence (higher = closer to prompt)
    image_guidance_scale=1.0     # Image fidelity (higher = closer to source)
).frames
# Save video
from diffusers.utils import export_to_video
export_to_video(video, "output.mp4", fps=8)
Advanced Generation with Motion Control
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video
from PIL import Image

# Load model with LoRA support
pipeline = DiffusionPipeline.from_pretrained(
    "E:/huggingface/wan25-fp16-i2v/diffusion_models/wan",
    torch_dtype=torch.float16
)
pipeline.to("cuda")
# Load LoRA adapters for enhanced control
pipeline.load_lora_weights("E:/huggingface/wan25-fp16-i2v/loras", weight_name="motion_control_lora.safetensors", adapter_name="motion_control")
pipeline.load_lora_weights("E:/huggingface/wan25-fp16-i2v/loras", weight_name="camera_control_lora.safetensors", adapter_name="camera_control")
# Enable adapters with specific weights
pipeline.set_adapters(["motion_control", "camera_control"], adapter_weights=[0.8, 0.7])
# Load source image
image = Image.open("landscape.jpg")
# Generate with enhanced control
prompt = "Smooth dolly forward, subtle parallax, cinematic motion"
video = pipeline(
    image=image,
    prompt=prompt,
    num_frames=96,               # More frames for longer video
    height=768,                  # Higher resolution
    width=768,
    num_inference_steps=75,      # More steps for quality
    guidance_scale=8.0,
    image_guidance_scale=1.2     # Strong image fidelity
).frames
export_to_video(video, "enhanced_output.mp4", fps=12)
Memory-Efficient Generation
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video
from PIL import Image

pipeline = DiffusionPipeline.from_pretrained(
    "E:/huggingface/wan25-fp16-i2v/diffusion_models/wan",
    torch_dtype=torch.float16
)
# Note: do not call pipeline.to("cuda") here; sequential CPU offload
# manages device placement itself
# Enable all memory optimizations
pipeline.enable_attention_slicing()
pipeline.enable_vae_slicing()
pipeline.enable_sequential_cpu_offload()  # Offload components to CPU when not in use
# Load and resize image for efficiency
image = Image.open("photo.jpg")
image = image.resize((512, 512))
# Generate with reduced memory footprint
prompt = "Subtle natural motion and breathing life into the scene"
video = pipeline(
    image=image,
    prompt=prompt,
    num_frames=48,               # Fewer frames for memory efficiency
    height=512,
    width=512,
    num_inference_steps=30,      # Fewer steps for faster generation
    guidance_scale=7.0
).frames
export_to_video(video, "memory_efficient_output.mp4", fps=8)
Batch Processing Multiple Images
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video
from PIL import Image
import os

pipeline = DiffusionPipeline.from_pretrained(
    "E:/huggingface/wan25-fp16-i2v/diffusion_models/wan",
    torch_dtype=torch.float16
)
pipeline.to("cuda")
pipeline.enable_attention_slicing()
pipeline.enable_vae_slicing()
# Process multiple images
input_dir = "E:/input_images"
output_dir = "E:/output_videos"
os.makedirs(output_dir, exist_ok=True)
for img_file in os.listdir(input_dir):
    if img_file.lower().endswith(('.jpg', '.jpeg', '.png')):
        # Load image
        image = Image.open(os.path.join(input_dir, img_file))

        # Generate video
        video = pipeline(
            image=image,
            prompt="Cinematic motion, natural dynamics",
            num_frames=64,
            height=512,
            width=512,
            num_inference_steps=40
        ).frames

        # Save with matching name
        output_path = os.path.join(output_dir, f"{os.path.splitext(img_file)[0]}.mp4")
        export_to_video(video, output_path, fps=8)
        print(f"Generated: {output_path}")
Model Specifications
Technical Details
| Specification | Value |
|---|---|
| Model Type | Latent Diffusion (Image-to-Video) |
| Precision | FP16 (16-bit) |
| Format | SafeTensors |
| Max Frames | 96-128 frames |
| Resolution | 512x512 to 1024x1024 |
| Image Encoder | CLIP/VAE-based |
| VAE Channels | 4 (latent) |
| Sampling | DDPM, DDIM, DPM-Solver++ |
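The samplers listed above can be selected through the diffusers scheduler API. A minimal sketch, assuming the pipeline object from the usage examples earlier in this document:
from diffusers import DDIMScheduler, DPMSolverMultistepScheduler

# Swap the sampler on an already-loaded pipeline; both schedulers are
# configured from the pipeline's existing scheduler config
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)

# Or use DPM-Solver++ for faster sampling with fewer steps
pipeline.scheduler = DPMSolverMultistepScheduler.from_config(
    pipeline.scheduler.config, algorithm_type="dpmsolver++"
)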
Supported Features
- ✅ Image-to-video generation
- ✅ Motion dynamics control
- ✅ Camera movement control
- ✅ Prompt-guided motion
- ✅ Image fidelity preservation
- ✅ LoRA adapter support
- ✅ Memory optimization techniques
- ✅ Batch processing
- ✅ Custom sampling schedulers
- ✅ Frame interpolation support
Limitations
- ⚠️ Video length limited by VRAM (typically 2-15 seconds)
- ⚠️ Requires significant GPU memory (24 GB minimum recommended)
- ⚠️ Generation time increases with frame count and resolution
- ⚠️ Complex motions may require higher sampling steps for coherence
- ⚠️ Source image quality directly affects output quality
- ⚠️ Very high contrast or unusual images may produce artifacts
Performance Tips and Optimization
Memory Optimization
- Enable Attention Slicing: Reduces VRAM usage at a slight speed cost
  pipeline.enable_attention_slicing()
- Enable VAE Slicing: Processes the VAE in smaller chunks
  pipeline.enable_vae_slicing()
- CPU Offloading: Moves model components to CPU when not in use
  pipeline.enable_sequential_cpu_offload()
- Reduce Resolution: Start with 512x512 for testing, upscale later
- Resize Source Images: Preprocess images to the target resolution
  image = image.resize((512, 512), Image.LANCZOS)
Quality Optimization
- Increase Inference Steps: 50-100 steps for higher quality (slower)
- Adjust Guidance Scales (a comparison sketch follows this list):
  - guidance_scale: 7.0-9.0 for prompt adherence
  - image_guidance_scale: 1.0-1.5 for image fidelity
- Use LoRA Adapters: Enhance motion, camera, and quality aspects
- Frame Interpolation: Generate fewer frames, interpolate with RIFE/FILM
- High-Quality Source Images: Use clean, well-lit source images
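When tuning guidance values, it can help to render a few short clips across the range and compare them side by side. The sketch below assumes the pipeline and image objects from the usage examples above; the prompt and settings are illustrative.
# A minimal parameter sweep to compare guidance settings
# (pipeline and image come from the usage examples above)
from diffusers.utils import export_to_video

for gs in (7.0, 8.0, 9.0):
    video = pipeline(
        image=image,
        prompt="Smooth, cinematic motion",
        num_frames=48,
        height=512,
        width=512,
        num_inference_steps=50,
        guidance_scale=gs,
        image_guidance_scale=1.2,
    ).frames
    export_to_video(video, f"compare_gs_{gs}.mp4", fps=8)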
Speed Optimization
- Reduce Inference Steps: 20-30 steps for faster generation (lower quality)
- Lower Resolution: 512x512 generates 4x faster than 1024x1024
- Fewer Frames: Generate 48-64 frames instead of 96-128
- Use DPM-Solver++: Faster sampling scheduler
  from diffusers import DPMSolverMultistepScheduler
  pipeline.scheduler = DPMSolverMultistepScheduler.from_config(pipeline.scheduler.config)
Prompt Engineering Tips
- Describe Motion: "gentle pan", "slow zoom", "subtle motion"
- Camera Movements: "dolly in", "crane up", "orbit around"
- Motion Quality: "smooth", "cinematic", "natural dynamics"
- Avoid Contradictions: Keep motion descriptions coherent
- Optional Prompts: Prompts guide motion; can be empty for automatic motion
- Scene Context: Reference elements in the source image
Source Image Best Practices
- Resolution: Use images at or near target video resolution
- Quality: High-quality, well-exposed images work best
- Composition: Well-composed images produce better results
- Lighting: Consistent lighting makes animation more coherent
- Subject Matter: Clear subjects with defined edges animate better
- Avoid: Very blurry, low-resolution, or extremely dark images
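A short preprocessing sketch that follows these recommendations (RGB conversion, orientation fix, and high-quality resizing); the helper name and 512x512 target size are illustrative choices matching the examples above.
from PIL import Image, ImageOps

def prepare_source_image(path, size=(512, 512)):
    """Load and normalize a source image for I2V generation."""
    image = Image.open(path)
    image = ImageOps.exif_transpose(image)     # Respect camera orientation metadata
    image = image.convert("RGB")               # Drop alpha / palette modes
    image = image.resize(size, Image.LANCZOS)  # High-quality resampling
    return image

image = prepare_source_image("input_image.jpg")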
License
This model is released under a custom WAN license. Please review the license terms before use.
Usage Restrictions
- ✅ Research and non-commercial use permitted
- ✅ Educational and academic use permitted
- ⚠️ Commercial use may require separate licensing
- ❌ Do not use for generating harmful, misleading, or illegal content
- ❌ Do not use for deepfakes or impersonation without consent
- ❌ Do not violate copyright or intellectual property rights of source images
Please refer to the official WAN model documentation for complete license terms.
Citation
If you use this model in your research or projects, please cite:
@misc{wan25-i2v-fp16,
title={WAN 2.5: Image-to-Video Diffusion Model},
author={WAN Team},
year={2025},
publisher={Hugging Face},
howpublished={\url{https://huggingface.co/Wan/WAN-2.5-I2V}},
note={FP16 variant}
}
Resources and Links
Official Resources
- Hugging Face Model Card: https://huggingface.co/Wan/WAN-2.5-I2V
- WAN Official Documentation: [Link to official docs when available]
- Model Paper: [ArXiv link when available]
Community and Support
- Hugging Face Forums: https://discuss.huggingface.co/
- GitHub Issues: [Repository link when available]
- Discord Community: [Discord invite when available]
Related Models
- WAN 2.5 Text-to-Video: Text-conditioned video generation variant
- WAN 2.5 FP8: More memory-efficient variant (lower precision)
- WAN 2.5 Full: Full precision variant (higher quality, more VRAM)
- FLUX.1: Alternative text-to-image models in this repository
Tutorials and Examples
- Diffusers Documentation: https://huggingface.co/docs/diffusers
- Image-to-Video Guide: https://huggingface.co/docs/diffusers/using-diffusers/image-to-video
- LoRA Training Guide: https://huggingface.co/docs/diffusers/training/lora
Version History
v1.4 (2025-10-28)
- Verified YAML frontmatter compliance with HuggingFace requirements
- Confirmed repository structure documentation accuracy
- Validated metadata fields (license, library_name, pipeline_tag, tags)
- Repository remains prepared for model file downloads
v1.3 (2025-10-14)
- CRITICAL FIX: Corrected pipeline_tag from text-to-video to image-to-video
- Updated all documentation to reflect Image-to-Video (I2V) functionality
- Revised usage examples for image-conditioned generation
- Added image_guidance_scale parameter documentation
- Updated tags to include image-to-video
- Added source image best practices section
- Corrected model file naming conventions for I2V variant
v1.2 (2025-10-14)
- Simplified YAML frontmatter to essential fields only per requirements
- Removed base_model and base_model_relation (base model, not derived)
- Streamlined tags for better discoverability
- Verified directory structure (still awaiting model download)
v1.1 (2025-10-14)
- Updated YAML frontmatter to be first in file
- Corrected repository contents to reflect actual directory state
- Added download instructions for model files
- Clarified that model files are pending download
- Moved version comment after YAML frontmatter per HuggingFace standards
v1.0 (2025-10-13)
- Created repository structure
- Documented expected model files and usage
- Provided comprehensive usage examples
- Included hardware requirements and optimization tips
Contact and Contributions
For questions, issues, or contributions related to this repository organization:
- Local repository maintained for personal use
- See official WAN model repository for model-specific issues
- Refer to Hugging Face documentation for diffusers library support
Repository Maintained By: Local User
Last Updated: 2025-10-28
README Version: v1.4