Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents
This repository contains the pretrained checkpoints for Bifrost-1, a unified framework that bridges pretrained multimodal LLMs (MLLMs) and diffusion models using patch-level CLIP image embeddings as latent variables. Bifrost-1 enables high-fidelity controllable image generation with significant training efficiency without compromising the strong reasoning capabilities of MLLMs.
Paper: Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents
Project Page: https://bifrost-1.github.io
GitHub Repository: https://github.com/hanlincs/Bifrost-1
Bifrost-1 is designed for:
- High-Fidelity Generation: Patch-level CLIP latents are natively aligned with the MLLM's visual encoder, enabling high-quality image generation.
- Training Efficiency: Achieves better image generation quality than architecture variants that use non-MLLM-aligned visual features, under controlled experimental settings, with substantially lower training compute.
- Preserved Visual Reasoning: Bifrost-1 fully inherits the strong visual understanding capabilities of the backbone MLLM by equipping it with a visual generation branch initialized from the original MLLM parameters (see the conceptual sketch below).
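The high-level data flow implied by these design points can be sketched as follows. This is an illustrative toy in PyTorch, not the released implementation: every class, dimension, and tensor shape here (e.g. `ToyVisualGenerationBranch`, the 1024-dim hidden size, 256 patches) is a hypothetical placeholder; please refer to the paper and the GitHub repository for the actual architecture.

```python
# Conceptual sketch only: a branch that maps MLLM hidden states to
# patch-level CLIP latents, which a diffusion decoder would then render.
# All names and sizes below are hypothetical placeholders.
import torch
import torch.nn as nn

class ToyVisualGenerationBranch(nn.Module):
    """Stand-in for the MLLM-initialized branch that predicts patch-level CLIP latents."""
    def __init__(self, hidden_dim=1024, clip_dim=1024, num_patches=256):
        super().__init__()
        # Learnable patch queries, one per CLIP patch latent to predict.
        self.queries = nn.Parameter(torch.randn(num_patches, hidden_dim))
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model=hidden_dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.to_clip = nn.Linear(hidden_dim, clip_dim)

    def forward(self, text_hidden_states):
        # text_hidden_states: (B, T, hidden_dim) hidden states from the MLLM backbone.
        q = self.queries.unsqueeze(0).expand(text_hidden_states.size(0), -1, -1)
        patch_states = self.decoder(q, text_hidden_states)
        # (B, num_patches, clip_dim) patch-level CLIP latents; a diffusion decoder
        # (conditioned on these latents) would then generate the final image.
        return self.to_clip(patch_states)

branch = ToyVisualGenerationBranch()
dummy_text_states = torch.randn(1, 32, 1024)  # hypothetical MLLM hidden states
clip_latents = branch(dummy_text_states)
print(clip_latents.shape)  # torch.Size([1, 256, 1024])
```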
Environment Setup

```bash
conda create -n bifrost1 python==3.11
conda activate bifrost1
pip install -r requirements.txt
```
Inference

Model Checkpoints

The model checkpoint can be downloaded from the Hugging Face Hub (repo hanlincs/Bifrost-1). You can download it to a local directory of your choice (local_dir) with the following code:
```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="hanlincs/Bifrost-1",
    repo_type="model",
    local_dir="xxxxxxxx",  # Replace with your local directory path
    local_dir_use_symlinks=False,
)
```
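If you prefer the command line, the same snapshot can also be fetched with the `huggingface-cli` tool that ships with `huggingface_hub` (the target directory below is just an example path):

```bash
# Download the full model snapshot to a local directory (example path)
huggingface-cli download hanlincs/Bifrost-1 --repo-type model --local-dir ./checkpoints/Bifrost-1
```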
Run Inference Scripts

Generate images from GenEval prompts:

```bash
python inference_geneval_dpgbench.py --eval_geneval --output_dir "./outputs" --local_checkpoint_path XXXXX  # Replace XXXXX with your local checkpoint path
```
BibTeX

Please let us know via issues or PRs if you have any questions. If you find our project useful in your research or application development, citing our paper would be the best way to support us!
```bibtex
@misc{lin2025bifrost1bridgingmultimodalllms,
  title={Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents},
  author={Han Lin and Jaemin Cho and Amir Zadeh and Chuan Li and Mohit Bansal},
  year={2025},
  eprint={2508.05954},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2508.05954},
}
```
Acknowledgements

The development of Bifrost-1 has been greatly inspired by many amazing prior works and teams. We hope that releasing this model and codebase helps the community continue to push these creative tools forward in an open and responsible way.