Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents
This repository contains the pretrained checkpoints for Bifrost-1, a unified framework that bridges pretrained multimodal LLMs (MLLMs) and diffusion models using patch-level CLIP image embeddings as latent variables. Bifrost-1 enables high-fidelity controllable image generation with significant training efficiency without compromising the strong reasoning capabilities of MLLMs.
Paper: Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents
Project Page: https://bifrost-1.github.io
GitHub Repository: https://github.com/hanlincs/Bifrost-1
Bifrost-1 is designed for:
- High-Fidelity Generation: Patch-level CLIP latents are natively aligned with the MLLM's visual encoder, enabling high-quality image generation.
- Training Efficiency: Achieves better image generation quality than architecture variants that use non-MLLM-aligned visual features, under controlled experimental settings, with substantially lower training compute.
- Preserved Visual Reasoning: Bifrost-1 fully inherits the strong visual understanding capabilities of the backbone MLLM by equipping it with a visual generation branch initialized from the original MLLM parameters (see the conceptual sketch below).
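The high-level data flow implied by these design points can be sketched as follows. This is an illustrative toy in PyTorch, not the released implementation: every class, dimension, and tensor shape here (e.g. `ToyVisualGenerationBranch`, the 1024-dim hidden size, 256 patches) is a hypothetical placeholder; please refer to the paper and the GitHub repository for the actual architecture.

```python
# Conceptual sketch only: a branch that maps MLLM hidden states to
# patch-level CLIP latents, which a diffusion decoder would then render.
# All names and sizes below are hypothetical placeholders.
import torch
import torch.nn as nn

class ToyVisualGenerationBranch(nn.Module):
    """Stand-in for the MLLM-initialized branch that predicts patch-level CLIP latents."""
    def __init__(self, hidden_dim=1024, clip_dim=1024, num_patches=256):
        super().__init__()
        # Learnable patch queries, one per CLIP patch latent to predict.
        self.queries = nn.Parameter(torch.randn(num_patches, hidden_dim))
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model=hidden_dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.to_clip = nn.Linear(hidden_dim, clip_dim)

    def forward(self, text_hidden_states):
        # text_hidden_states: (B, T, hidden_dim) hidden states from the MLLM backbone.
        q = self.queries.unsqueeze(0).expand(text_hidden_states.size(0), -1, -1)
        patch_states = self.decoder(q, text_hidden_states)
        # (B, num_patches, clip_dim) patch-level CLIP latents; a diffusion decoder
        # (conditioned on these latents) would then generate the final image.
        return self.to_clip(patch_states)

branch = ToyVisualGenerationBranch()
dummy_text_states = torch.randn(1, 32, 1024)  # hypothetical MLLM hidden states
clip_latents = branch(dummy_text_states)
print(clip_latents.shape)  # torch.Size([1, 256, 1024])
```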
Environment Setup

```bash
conda create -n bifrost1 python==3.11
conda activate bifrost1
pip install -r requirements.txt
```
Inference

Model Checkpoints

The model checkpoint can be downloaded from the Hugging Face Hub (repo hanlincs/Bifrost-1). You can download it to a local directory of your choice (local_dir) with the following code:
```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="hanlincs/Bifrost-1",
    repo_type="model",
    local_dir="xxxxxxxx",  # Replace with your local directory path
    local_dir_use_symlinks=False,
)
```
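If you prefer the command line, the same snapshot can also be fetched with the `huggingface-cli` tool that ships with `huggingface_hub` (the target directory below is just an example path):

```bash
# Download the full model snapshot to a local directory (example path)
huggingface-cli download hanlincs/Bifrost-1 --repo-type model --local-dir ./checkpoints/Bifrost-1
```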
Run Inference Scripts

Generate images from GenEval prompts:

```bash
python inference_geneval_dpgbench.py --eval_geneval --output_dir "./outputs" --local_checkpoint_path XXXXX  # Replace XXXXX with your local checkpoint path
```
BibTeX

Please let us know via issues or PRs if you have any questions. If you find our project useful in your research or application development, citing our paper would be the best way to support us!
```bibtex
@misc{lin2025bifrost1bridgingmultimodalllms,
  title={Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents},
  author={Han Lin and Jaemin Cho and Amir Zadeh and Chuan Li and Mohit Bansal},
  year={2025},
  eprint={2508.05954},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2508.05954},
}
```
Acknowledgements

The development of Bifrost-1 has been greatly inspired by many amazing prior works and teams. We hope that releasing this model and codebase helps the community continue to push these creative tools forward in an open and responsible way.