---
language:
- en
license: other
license_name: flux-1-dev
license_link: https://huggingface.co/black-forest-labs/FLUX.1-dev/blob/main/LICENSE.md
tags:
- image-restoration
- diffusion
- computer-vision
- flux
pipeline_tag: image-to-image
library_name: diffusers
---
|
|
|
|
|
<div align="center"> |
|
|
<h1>🎨 LucidFlux:<br/>Caption-Free Universal Image Restoration with a Large-Scale Diffusion Transformer</h1> |
|
|
|
|
|
|
|
[**🌍 Website**](https://w2genai-lab.github.io/LucidFlux/) | [**📄 Paper**](https://huggingface.co/papers/2509.22414) | [**💻 Code**](https://github.com/W2GenAI-Lab/LucidFlux) | [**🧩 Models**](https://huggingface.co/W2GenAI/LucidFlux) |
|
|
</div> |
|
|
|
|
|
--- |
|
|
<img width="1420" height="1116" alt="abs_image" src="https://github.com/user-attachments/assets/791c0c60-29a6-4497-86a9-5716049afe9a" /> |
|
|
|
|
|
--- |
|
|
## News & Updates |
|
|
|
|
|
--- |
|
|
|
|
|
|
|
|
|
|
## 👥 Authors |
|
|
|
|
|
> [**Song Fei**](https://github.com/FeiSong123)<sup>1</sup>\*, [**Tian Ye**](https://owen718.github.io/)<sup>1</sup>\*‡, [**Lei Zhu**](https://sites.google.com/site/indexlzhu/home)<sup>1,2</sup>† |
|
|
> |
|
|
> <sup>1</sup>The Hong Kong University of Science and Technology (Guangzhou) |
|
|
> <sup>2</sup>The Hong Kong University of Science and Technology |
|
|
> |
|
|
> \*Equal Contribution, ‡Project Leader, †Corresponding Author |
|
|
|
|
|
--- |
|
|
|
|
|
## 🌟 What is LucidFlux? |
|
|
|
|
|
<!-- <div align="center"> |
|
|
<img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/demo/demo2.png" alt="What is LucidFlux - Quick Prompt Demo" width="1200"/> |
|
|
<br> |
|
|
</div> --> |
|
|
|
|
|
LucidFlux is a framework for high-fidelity image restoration across a wide range of degradations that requires no text captions. It combines a Flux-based DiT backbone with a lightweight condition module and SigLIP semantic alignment, which provides caption-free guidance while preserving structural and semantic consistency. On the RealSR and RealLQ250 benchmarks reported below, it achieves the best score on most no-reference quality metrics.
|
|
|
|
|
|
|
|
|
|
|
|
|
## 📊 Performance Benchmarks |
|
|
|
|
|
<div align="center"> |
|
|
|
|
|
### 📈 Quantitative Results |
|
|
|
|
|
<table>
<thead>
<tr>
<th>Benchmark</th>
<th>Metric</th>
<th>ResShift</th>
<th>StableSR</th>
<th>SinSR</th>
<th>SeeSR</th>
<th>DreamClear</th>
<th>SUPIR</th>
<th>LucidFlux<br/>(Ours)</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="7" style="text-align:center; vertical-align:middle;">RealSR</td>
<td style="white-space: nowrap;">CLIP-IQA+ ↑</td>
<td>0.5005</td>
<td>0.4408</td>
<td>0.5416</td>
<td>0.6731</td>
<td>0.5331</td>
<td>0.5640</td>
<td><b>0.7074</b></td>
</tr>
<tr>
<td style="white-space: nowrap;">Q-Align ↑</td>
<td>3.1045</td>
<td>2.5087</td>
<td>3.3615</td>
<td>3.6073</td>
<td>3.0044</td>
<td>3.4682</td>
<td><b>3.7555</b></td>
</tr>
<tr>
<td style="white-space: nowrap;">MUSIQ ↑</td>
<td>49.50</td>
<td>39.98</td>
<td>57.95</td>
<td>67.57</td>
<td>49.48</td>
<td>55.68</td>
<td><b>70.20</b></td>
</tr>
<tr>
<td style="white-space: nowrap;">MANIQA ↑</td>
<td>0.2976</td>
<td>0.2356</td>
<td>0.3753</td>
<td>0.5087</td>
<td>0.3092</td>
<td>0.3426</td>
<td><b>0.5437</b></td>
</tr>
<tr>
<td style="white-space: nowrap;">NIMA ↑</td>
<td>4.7026</td>
<td>4.3639</td>
<td>4.8282</td>
<td>4.8957</td>
<td>4.4948</td>
<td>4.6401</td>
<td><b>5.1072</b></td>
</tr>
<tr>
<td style="white-space: nowrap;">CLIP-IQA ↑</td>
<td>0.5283</td>
<td>0.3521</td>
<td>0.6601</td>
<td><b>0.6993</b></td>
<td>0.5390</td>
<td>0.4857</td>
<td>0.6783</td>
</tr>
<tr>
<td style="white-space: nowrap;">NIQE ↓</td>
<td>9.0674</td>
<td>6.8733</td>
<td>6.4682</td>
<td>5.4594</td>
<td>5.2873</td>
<td>5.2819</td>
<td><b>4.2893</b></td>
</tr>
<tr>
<td rowspan="7" style="text-align:center; vertical-align:middle;">RealLQ250</td>
<td style="white-space: nowrap;">CLIP-IQA+ ↑</td>
<td>0.5529</td>
<td>0.5804</td>
<td>0.6054</td>
<td>0.7034</td>
<td>0.6810</td>
<td>0.6532</td>
<td><b>0.7406</b></td>
</tr>
<tr>
<td style="white-space: nowrap;">Q-Align ↑</td>
<td>3.6318</td>
<td>3.5586</td>
<td>3.7451</td>
<td>4.1423</td>
<td>4.0640</td>
<td>4.1347</td>
<td><b>4.3935</b></td>
</tr>
<tr>
<td style="white-space: nowrap;">MUSIQ ↑</td>
<td>59.50</td>
<td>57.25</td>
<td>65.45</td>
<td>70.38</td>
<td>67.08</td>
<td>65.81</td>
<td><b>73.01</b></td>
</tr>
<tr>
<td style="white-space: nowrap;">MANIQA ↑</td>
<td>0.3397</td>
<td>0.2937</td>
<td>0.4230</td>
<td>0.4895</td>
<td>0.4400</td>
<td>0.3826</td>
<td><b>0.5589</b></td>
</tr>
<tr>
<td style="white-space: nowrap;">NIMA ↑</td>
<td>5.0624</td>
<td>5.0538</td>
<td>5.2397</td>
<td>5.3146</td>
<td>5.2200</td>
<td>5.0806</td>
<td><b>5.4836</b></td>
</tr>
<tr>
<td style="white-space: nowrap;">CLIP-IQA ↑</td>
<td>0.6129</td>
<td>0.5160</td>
<td><b>0.7166</b></td>
<td>0.7063</td>
<td>0.6950</td>
<td>0.5767</td>
<td>0.7122</td>
</tr>
<tr>
<td style="white-space: nowrap;">NIQE ↓</td>
<td>6.6326</td>
<td>4.6236</td>
<td>5.4425</td>
<td>4.4383</td>
<td>3.8700</td>
<td><b>3.6591</b></td>
<td>3.6742</td>
</tr>
</tbody>
</table>
|
|
|
|
|
|
|
|
|
|
|
<!-- <img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/user_study/hpc.png" alt="User Study Results" width="1200"/> --> |
|
|
|
|
|
</div> |
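
To put the table's gaps in perspective, a score difference can be expressed as a relative change against a baseline. The snippet below is a minimal sketch: `rel_gain` is a hypothetical helper, not part of the LucidFlux repository, and the inputs are copied from the RealSR rows above (MUSIQ: LucidFlux 70.20 vs. SeeSR 67.57; NIQE: LucidFlux 4.2893 vs. SUPIR 5.2819).

```bash
# Hypothetical helper (not part of the LucidFlux repo).
# $1 = score of interest, $2 = baseline score.
# Prints the relative change in percent, rounded to two decimals.
rel_gain() {
  awk -v cur="$1" -v base="$2" 'BEGIN { printf "%.2f\n", (cur - base) / base * 100 }'
}

# MUSIQ on RealSR (higher is better): LucidFlux vs. SeeSR
rel_gain 70.20 67.57

# NIQE on RealSR (lower is better): LucidFlux vs. SUPIR
rel_gain 4.2893 5.2819
```

For metrics marked ↓, a negative result indicates an improvement over the baseline.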
|
|
|
|
|
--- |
|
|
|
|
|
## 🎭 Gallery & Examples |
|
|
|
|
|
<div align="center"> |
|
|
|
|
|
### 🎨 LucidFlux Gallery |
|
|
|
|
|
--- |
|
|
|
|
|
### 🔍 Comparison with Open-Source Methods |
|
|
|
|
|
<table> |
|
|
<tr align="center"> |
|
|
<td width="200"><b>LQ</b></td> |
|
|
<td width="200"><b>SinSR</b></td> |
|
|
<td width="200"><b>SeeSR</b></td> |
|
|
<td width="200"><b>SUPIR</b></td> |
|
|
<td width="200"><b>DreamClear</b></td> |
|
|
<td width="200"><b>Ours</b></td> |
|
|
</tr> |
|
|
<tr align="center"><td colspan="6"><img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/comparison/040.jpg" width="1200"></td></tr> |
|
|
<tr align="center"><td colspan="6"><img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/comparison/041.jpg" width="1200"></td></tr> |
|
|
<tr align="center"><td colspan="6"><img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/comparison/111.jpg" width="1200"></td></tr> |
|
|
<tr align="center"><td colspan="6"><img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/comparison/123.jpg" width="1200"></td></tr> |
|
|
<tr align="center"><td colspan="6"><img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/comparison/160.jpg" width="1200"></td></tr> |
|
|
</table> |
|
|
|
|
|
<details> |
|
|
<summary>Show more examples</summary> |
|
|
|
|
|
<table> |
|
|
<tr align="center"><td colspan="6"><img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/comparison/013.jpg" width="1200"></td></tr> |
|
|
<tr align="center"><td colspan="6"><img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/comparison/079.jpg" width="1200"></td></tr> |
|
|
<tr align="center"><td colspan="6"><img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/comparison/082.jpg" width="1200"></td></tr> |
|
|
<tr align="center"><td colspan="6"><img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/comparison/137.jpg" width="1200"></td></tr> |
|
|
<tr align="center"><td colspan="6"><img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/comparison/166.jpg" width="1200"></td></tr> |
|
|
</table> |
|
|
|
|
|
</details> |
|
|
|
|
|
--- |
|
|
|
|
|
### 💼 Comparison with Commercial Models |
|
|
|
|
|
<table> |
|
|
<tr align="center"> |
|
|
<td width="200"><b>LQ</b></td> |
|
|
<td width="200"><b>HYPIR</b></td> |
|
|
<td width="200"><b>Topaz</b></td> |
|
|
<td width="200"><b>SeeDream 4.0</b></td> |
|
|
<td width="200"><b>Gemini-NanoBanana</b></td> |
|
|
<td width="200"><b>GPT-4o</b></td> |
|
|
<td width="200"><b>Ours</b></td> |
|
|
</tr> |
|
|
<tr align="center"><td colspan="7"><img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/commercial_comparison/commercial_061.jpg" width="1400"></td></tr> |
|
|
<tr align="center"><td colspan="7"><img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/commercial_comparison/commercial_094.jpg" width="1400"></td></tr> |
|
|
<tr align="center"><td colspan="7"><img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/commercial_comparison/commercial_205.jpg" width="1400"></td></tr> |
|
|
<tr align="center"><td colspan="7"><img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/commercial_comparison/commercial_209.jpg" width="1400"></td></tr> |
|
|
</table> |
|
|
|
|
|
<details> |
|
|
<summary>Show more examples</summary> |
|
|
|
|
|
<table> |
|
|
<tr align="center"><td colspan="7"><img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/commercial_comparison/commercial_062.jpg" width="1400"></td></tr> |
|
|
<tr align="center"><td colspan="7"><img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/commercial_comparison/commercial_160.jpg" width="1400"></td></tr> |
|
|
<tr align="center"><td colspan="7"><img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/commercial_comparison/commercial_111.jpg" width="1400"></td></tr> |
|
|
<tr align="center"><td colspan="7"><img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/commercial_comparison/commercial_123.jpg" width="1400"></td></tr> |
|
|
</table> |
|
|
|
|
|
</details> |
|
|
</div> |
|
|
|
|
|
--- |
|
|
|
|
|
## 🏗️ Model Architecture |
|
|
|
|
|
<div align="center"> |
|
|
<img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/framework/framework.png" alt="LucidFlux Framework Overview" width="1200"/> |
|
|
<br> |
|
|
<em><strong>Caption-Free Universal Image Restoration with a Large-Scale Diffusion Transformer</strong></em> |
|
|
</div> |
|
|
|
|
|
Our unified framework consists of **four critical components in the training workflow**: |
|
|
|
|
|
**🔤 Scaling Up Real-world High-Quality Data for Universal Image Restoration** |
|
|
|
|
|
**🎨 Two Parallel Light-weight Condition Module Branches for Low-Quality Image Conditioning** |
|
|
|
|
|
**🎯 Timestep and Layer-Adaptive Condition Injection** |
|
|
|
|
|
**🔄 Semantic Priors from SigLIP for Caption-Free Semantic Alignment**
|
|
|
|
|
|
|
|
## 🚀 Quick Start |
|
|
|
|
|
### 🔧 Installation |
|
|
|
|
|
```bash
# Clone the repository
git clone https://github.com/W2GenAI-Lab/LucidFlux.git
cd LucidFlux

# Create conda environment
conda create -n lucidflux python=3.9
conda activate lucidflux

# Install dependencies
pip install -r requirements.txt
```
|
|
|
|
|
### Inference |
|
|
|
|
|
Prepare models in 2 steps, then run a single command. |
|
|
|
|
|
1) Log in to Hugging Face (required for the gated FLUX.1-dev weights). Skip this step if you are already logged in.
|
|
|
|
|
```bash
python -m tools.hf_login --token "$HF_TOKEN"
```
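
The login command above expands `$HF_TOKEN`, so an unset variable would pass an empty token. A small guard can catch that earlier. This is a sketch only: `require_hf_token` is a hypothetical helper, not something the repository provides.

```bash
# Hypothetical helper (not part of the LucidFlux repo).
# Returns non-zero and prints a hint when HF_TOKEN is empty or unset.
require_hf_token() {
  if [ -z "${HF_TOKEN:-}" ]; then
    echo "HF_TOKEN is not set; export a token that has access to FLUX.1-dev." >&2
    return 1
  fi
}
```

One way to use it is `require_hf_token && python -m tools.hf_login --token "$HF_TOKEN"`, so the login only runs when a token is present.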
|
|
|
|
|
2) Download the required weights to fixed paths and export the environment variables:
|
|
|
|
|
```bash
# FLUX.1-dev (flow+ae), SwinIR prior, T5, CLIP, SigLIP and LucidFlux checkpoint to ./weights
python -m tools.download_weights --dest weights

# Exports FLUX_DEV_FLOW/FLUX_DEV_AE to your shell
source weights/env.sh
```
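
Because `weights/env.sh` has to be sourced in the same shell that later runs inference, it can help to confirm the two variables are actually present. The check below is a sketch under that assumption: `check_flux_env` is a hypothetical helper, and it only tests that `FLUX_DEV_FLOW` and `FLUX_DEV_AE` are non-empty, not what they point to.

```bash
# Hypothetical helper (not part of the LucidFlux repo).
# Runs in a subshell so its loop variables do not leak into the caller.
check_flux_env() (
  missing=0
  for name in FLUX_DEV_FLOW FLUX_DEV_AE; do
    # Look up the variable whose name is stored in $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name is not set; did you run 'source weights/env.sh'?" >&2
      missing=1
    fi
  done
  # Exit status: 0 when both variables are set, 1 otherwise.
  [ "$missing" -eq 0 ]
)
```

Calling `check_flux_env` before the inference step reports every missing variable at once.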
|
|
|
|
|
|
|
|
Run inference (uses fixed relative paths): |
|
|
|
|
|
```bash
bash inference.sh
```
|
|
|
|
|
You can also obtain results of LucidFlux on RealSR and RealLQ250 from Hugging Face: [**LucidFlux**](https://huggingface.co/W2GenAI/LucidFlux). |
|
|
|
|
|
## 🪪 License |
|
|
|
|
|
The provided code and pre-trained weights are licensed under the [FLUX.1 [dev] license](LICENSE).
|
|
|
|
|
## 🙏 Acknowledgments |
|
|
|
|
|
- This code is based on [FLUX](https://github.com/black-forest-labs/flux). Some code is adapted from [DreamClear](https://github.com/shallowdream204/DreamClear) and [x-flux](https://github.com/XLabs-AI/x-flux). We thank the authors for their awesome work.
|
|
|
|
|
- 🏛️ Thanks to our affiliated institutions for their support. |
|
|
- 🤝 Special thanks to the open-source community for inspiration. |
|
|
|
|
|
--- |
|
|
|
|
|
## 📬 Contact |
|
|
|
|
|
For any questions or inquiries, please reach out to us: |
|
|
|
|
|
- **Song Fei**: `[email protected]` |
|
|
- **Tian Ye**: `[email protected]` |
|
|
|
|
|
## 🧑🤝🧑 WeChat Group |
|
|
<details> |
|
|
<summary>Click to show the WeChat group QR code</summary>
|
|
|
|
|
<br> |
|
|
|
|
|
<img src="https://github.com/user-attachments/assets/047faa4e-da63-415c-97a0-8dbe8045a839" alt="WeChat Group QR" width="320">
|
|
</details> |