Anime Face Diffusion Model 🎨

A fine-tuned diffusion model for generating high-quality anime faces using DDPM. This model is based on Google's pre-trained ddpm-celebahq-256 model and fine-tuned on 7,000+ anime face images.

Model Details

  • Model Type: Denoising Diffusion Probabilistic Model (DDPM)
  • Base Model: google/ddpm-celebahq-256
  • Task: Unconditional Image Generation (256×256 anime faces)
  • Training Data: 7,000+ high-quality anime face images
  • Framework: 🧨 Diffusers
  • License: MIT

Training Parameters

  • Learning Rate: 2e-5
  • Epochs: 15
  • Batch Size: 4
  • Gradient Accumulation Steps: 2
  • Training Steps: ~26,250 (1,750 steps/epoch × 15 epochs)
  • Optimizer: AdamW
  • Loss: MSE (Mean Squared Error)
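The parameters above imply 7,000 images / batch size 4 = 1,750 batches per epoch, matching the ~26,250 total over 15 epochs. The actual training script is not published; the following is a minimal sketch of one fine-tuning step assuming a standard diffusers-style noise-prediction loop, with a tiny stand-in network in place of the real 256×256 UNet so it runs anywhere.

```python
import torch
import torch.nn.functional as F

# Stand-in for the UNet (assumption: real training uses pipeline.unet)
model = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
accum_steps = 2  # gradient accumulation steps from the table above

optimizer.zero_grad()
for _ in range(accum_steps):
    clean = torch.randn(4, 3, 32, 32)        # batch size 4 (stand-in images)
    noise = torch.randn_like(clean)
    # Stand-in for the DDPM noise schedule: blend image and noise so the
    # mixture keeps unit variance.
    alpha = torch.rand(4, 1, 1, 1)
    noisy = alpha.sqrt() * clean + (1 - alpha).sqrt() * noise
    pred = model(noisy)                      # network predicts the added noise
    loss = F.mse_loss(pred, noise) / accum_steps  # MSE objective, scaled for accumulation
    loss.backward()
optimizer.step()
```

Scaling the loss by the accumulation steps keeps the effective gradient equal to that of a single batch of 8, which is why batch size 4 with accumulation 2 behaves like batch size 8.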

Usage

Basic Usage

from diffusers import DDPMPipeline
import torch

# Load the model
pipeline = DDPMPipeline.from_pretrained("abcd2019/Anime-face-generation")
device = "cuda" if torch.cuda.is_available() else "cpu"
pipeline = pipeline.to(device)

# Generate a single image
image = pipeline(num_inference_steps=100).images[0]
image.save("anime_face.png")

Generate Multiple Images

from diffusers import DDPMPipeline

pipeline = DDPMPipeline.from_pretrained("abcd2019/Anime-face-generation")
pipeline = pipeline.to("cuda")

# Generate 5 anime faces
images = pipeline(batch_size=5, num_inference_steps=100).images

for i, image in enumerate(images):
    image.save(f"anime_face_{i}.png")

Adjust Inference Steps for Quality vs Speed

# Fast generation (fewer steps, less quality)
fast_image = pipeline(num_inference_steps=50).images[0]

# High quality (more steps, slower)
quality_image = pipeline(num_inference_steps=150).images[0]

# Recommended: 100 steps for good balance
balanced_image = pipeline(num_inference_steps=100).images[0]

Use Different Scheduler

from diffusers import DDPMPipeline, DDIMScheduler

pipeline = DDPMPipeline.from_pretrained("abcd2019/Anime-face-generation")

# Switch to DDIM for faster sampling
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)

# Pass num_inference_steps at call time: the pipeline sets the scheduler's
# timesteps on every call, so calling set_timesteps beforehand would be overwritten.
fast_image = pipeline(num_inference_steps=50).images[0]  # ~50 steps instead of 1000

Model Performance

  • Training Loss: ~0.0077 (final epoch)
  • Image Resolution: 256×256 pixels
  • Inference Speed: ~30-60 seconds per image (depending on inference steps and hardware)
  • Recommended Inference Steps: 100 (for best quality)
  • Generated Face Styles: A wide diversity of anime faces, varying in:
    • Hair colors and styles
    • Eye colors and expressions
    • Face shapes and features
    • Skin tones

Limitations & Bias

  • Resolution: Limited to 256×256 pixels (inherent to the base model architecture)
  • Style: Trained specifically on anime faces; it is not suited to generating realistic/photorealistic faces
  • Diversity: Generated faces are limited to the patterns present in the training data
  • Quality Variation: Face clarity depends on the number of inference steps (more steps = better quality)

Training Details

Data Preparation

  • Dataset: Anime Face Dataset (Kaggle)
  • Total Images: 7,000
  • Selection Method: Top images ranked by file size (used as a quality proxy)
  • Preprocessing:
    • Resized to 256×256
    • Random horizontal flip (50% probability)
    • Normalized to [-1, 1]
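The preprocessing steps above can be sketched as follows. This is a minimal PIL/NumPy illustration of the stated pipeline (resize, random horizontal flip, normalize to [-1, 1]), not the project's actual data loader.

```python
import random
import numpy as np
from PIL import Image

def preprocess(img: Image.Image) -> np.ndarray:
    """Resize to 256x256, randomly flip, and scale pixels to [-1, 1]."""
    img = img.convert("RGB").resize((256, 256), Image.BILINEAR)
    if random.random() < 0.5:                     # horizontal flip, p = 0.5
        img = img.transpose(Image.FLIP_LEFT_RIGHT)
    # uint8 [0, 255] -> float32 [-1, 1]
    return np.asarray(img, dtype=np.float32) / 127.5 - 1.0

x = preprocess(Image.new("RGB", (64, 64), color=(128, 64, 200)))  # stand-in image
```

The [-1, 1] range matches what the pre-trained UNet expects, since DDPM models are trained on inputs scaled symmetrically around zero.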

Fine-tuning Approach

  • Started from pre-trained ddpm-celebahq-256
  • Fine-tuned with low learning rate to preserve general face generation knowledge
  • Adapted to anime-specific features (large eyes, stylized features, etc.)

Training Dynamics

  • Epoch 0-3: Model adapts from photorealistic to anime style
  • Epoch 4-8: Loss continues to decrease, anime features solidify
  • Epoch 9+: Marginal improvements, risk of overfitting

Ethical Considerations

This model generates synthetic anime faces and should not be used to:

  • Create misleading/deceptive content
  • Generate non-consensual images of real people
  • Violate any local laws or regulations

Recommended Citation

If you use this model in your research or project, please credit:

  • The original DDPM paper
  • Google's pre-trained ddpm-celebahq-256 model
  • This fine-tuned adaptation

Future Improvements

Potential enhancements for future versions:

  • Higher resolution (512×512 or more)
  • Conditional generation (text-to-image for anime faces)
  • Better diversity through larger training datasets
  • Improved training with advanced schedulers or techniques

Created: 2025-12-28

Model Card Contact: [Your Name/Username]
