Xsong123 committed (verified)
Commit 816d522 · Parent: 86d4273

Update README.md

Files changed (1): README.md (+386 βˆ’3)

<div align="center">
<h1>🎨 LucidFlux:<br/>Caption-Free Universal Image Restoration with a Large-Scale Diffusion Transformer</h1>

[**🌍 Website**](https://w2genai-lab.github.io/LucidFlux/) | [**πŸ“˜ Technical Report**](https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/Technical_Report.pdf) | [**🧩 Models**](https://huggingface.co/W2GenAI/LucidFlux)
</div>

---
<img width="1420" height="1116" alt="abs_image" src="https://github.com/user-attachments/assets/791c0c60-29a6-4497-86a9-5716049afe9a" />

---
## News & Updates

---

## πŸ‘₯ Authors

> [**Song Fei**](https://github.com/FeiSong123)<sup>1</sup>\*, [**Tian Ye**](https://owen718.github.io/)<sup>1</sup>\*‑, [**Lei Zhu**](https://sites.google.com/site/indexlzhu/home)<sup>1,2</sup>†
>
> <sup>1</sup>The Hong Kong University of Science and Technology (Guangzhou)
> <sup>2</sup>The Hong Kong University of Science and Technology
>
> \*Equal Contribution, ‑Project Leader, †Corresponding Author

---

## 🌟 What is LucidFlux?
LucidFlux is a framework for high-fidelity image restoration across a wide range of degradations, without requiring textual captions. It pairs a Flux-based diffusion transformer (DiT) backbone with a lightweight condition module and SigLIP semantic alignment, enabling caption-free guidance while preserving structural and semantic consistency and delivering high restoration quality.

## πŸ“Š Performance Benchmarks

<div align="center">

### πŸ“ˆ Quantitative Results

<table>
<thead>
<tr>
<th>Benchmark</th>
<th>Metric</th>
<th>ResShift</th>
<th>StableSR</th>
<th>SinSR</th>
<th>SeeSR</th>
<th>DreamClear</th>
<th>SUPIR</th>
<th>LucidFlux<br/>(Ours)</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="7" style="text-align:center; vertical-align:middle;">RealSR</td>
<td style="white-space: nowrap;">CLIP-IQA+ ↑</td>
<td>0.5005</td>
<td>0.4408</td>
<td>0.5416</td>
<td>0.6731</td>
<td>0.5331</td>
<td>0.5640</td>
<td><b>0.7074</b></td>
</tr>
<tr>
<td style="white-space: nowrap;">Q-Align ↑</td>
<td>3.1045</td>
<td>2.5087</td>
<td>3.3615</td>
<td>3.6073</td>
<td>3.0044</td>
<td>3.4682</td>
<td><b>3.7555</b></td>
</tr>
<tr>
<td style="white-space: nowrap;">MUSIQ ↑</td>
<td>49.50</td>
<td>39.98</td>
<td>57.95</td>
<td>67.57</td>
<td>49.48</td>
<td>55.68</td>
<td><b>70.20</b></td>
</tr>
<tr>
<td style="white-space: nowrap;">MANIQA ↑</td>
<td>0.2976</td>
<td>0.2356</td>
<td>0.3753</td>
<td>0.5087</td>
<td>0.3092</td>
<td>0.3426</td>
<td><b>0.5437</b></td>
</tr>
<tr>
<td style="white-space: nowrap;">NIMA ↑</td>
<td>4.7026</td>
<td>4.3639</td>
<td>4.8282</td>
<td>4.8957</td>
<td>4.4948</td>
<td>4.6401</td>
<td><b>5.1072</b></td>
</tr>
<tr>
<td style="white-space: nowrap;">CLIP-IQA ↑</td>
<td>0.5283</td>
<td>0.3521</td>
<td>0.6601</td>
<td><b>0.6993</b></td>
<td>0.5390</td>
<td>0.4857</td>
<td>0.6783</td>
</tr>
<tr>
<td style="white-space: nowrap;">NIQE ↓</td>
<td>9.0674</td>
<td>6.8733</td>
<td>6.4682</td>
<td>5.4594</td>
<td>5.2873</td>
<td>5.2819</td>
<td><b>4.2893</b></td>
</tr>
<tr>
<td rowspan="7" style="text-align:center; vertical-align:middle;">RealLQ250</td>
<td style="white-space: nowrap;">CLIP-IQA+ ↑</td>
<td>0.5529</td>
<td>0.5804</td>
<td>0.6054</td>
<td>0.7034</td>
<td>0.6810</td>
<td>0.6532</td>
<td><b>0.7406</b></td>
</tr>
<tr>
<td style="white-space: nowrap;">Q-Align ↑</td>
<td>3.6318</td>
<td>3.5586</td>
<td>3.7451</td>
<td>4.1423</td>
<td>4.0640</td>
<td>4.1347</td>
<td><b>4.3935</b></td>
</tr>
<tr>
<td style="white-space: nowrap;">MUSIQ ↑</td>
<td>59.50</td>
<td>57.25</td>
<td>65.45</td>
<td>70.38</td>
<td>67.08</td>
<td>65.81</td>
<td><b>73.01</b></td>
</tr>
<tr>
<td style="white-space: nowrap;">MANIQA ↑</td>
<td>0.3397</td>
<td>0.2937</td>
<td>0.4230</td>
<td>0.4895</td>
<td>0.4400</td>
<td>0.3826</td>
<td><b>0.5589</b></td>
</tr>
<tr>
<td style="white-space: nowrap;">NIMA ↑</td>
<td>5.0624</td>
<td>5.0538</td>
<td>5.2397</td>
<td>5.3146</td>
<td>5.2200</td>
<td>5.0806</td>
<td><b>5.4836</b></td>
</tr>
<tr>
<td style="white-space: nowrap;">CLIP-IQA ↑</td>
<td>0.6129</td>
<td>0.5160</td>
<td><b>0.7166</b></td>
<td>0.7063</td>
<td>0.6950</td>
<td>0.5767</td>
<td>0.7122</td>
</tr>
<tr>
<td style="white-space: nowrap;">NIQE ↓</td>
<td>6.6326</td>
<td>4.6236</td>
<td>5.4425</td>
<td>4.4383</td>
<td>3.8700</td>
<td><b>3.6591</b></td>
<td>3.6742</td>
</tr>
</tbody>
</table>

</div>
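
All metrics in the table are no-reference image-quality scores (↑ higher is better, ↓ lower is better). If you want to score your own outputs along the same axes, most of these metrics are implemented in the open-source `pyiqa` package. The sketch below assumes `pip install pyiqa` and a hypothetical output image path; the exact evaluation settings behind the table are not specified here, so numbers may not match precisely.

```python
# Hedged sketch: scoring restored images with no-reference IQA metrics via pyiqa.
# Metric names follow pyiqa's registry; preprocessing may differ from the paper's.
import torch
import pyiqa

device = "cuda" if torch.cuda.is_available() else "cpu"
musiq = pyiqa.create_metric("musiq", device=device)  # higher is better
niqe = pyiqa.create_metric("niqe", device=device)    # lower is better

img = "restored/0001.png"  # hypothetical path to one of your restored images
print("MUSIQ:", float(musiq(img)))
print("NIQE: ", float(niqe(img)))
```
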
---

## 🎭 Gallery & Examples

<div align="center">

### 🎨 LucidFlux Gallery

---

### πŸ” Comparison with Open-Source Methods

<table>
<tr align="center">
<td width="200"><b>LQ</b></td>
<td width="200"><b>SinSR</b></td>
<td width="200"><b>SeeSR</b></td>
<td width="200"><b>SUPIR</b></td>
<td width="200"><b>DreamClear</b></td>
<td width="200"><b>Ours</b></td>
</tr>
<tr align="center"><td colspan="6"><img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/comparison/040.jpg" width="1200"></td></tr>
<tr align="center"><td colspan="6"><img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/comparison/041.jpg" width="1200"></td></tr>
<tr align="center"><td colspan="6"><img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/comparison/111.jpg" width="1200"></td></tr>
<tr align="center"><td colspan="6"><img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/comparison/123.jpg" width="1200"></td></tr>
<tr align="center"><td colspan="6"><img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/comparison/160.jpg" width="1200"></td></tr>
</table>

<details>
<summary>Show more examples</summary>

<table>
<tr align="center"><td colspan="6"><img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/comparison/013.jpg" width="1200"></td></tr>
<tr align="center"><td colspan="6"><img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/comparison/079.jpg" width="1200"></td></tr>
<tr align="center"><td colspan="6"><img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/comparison/082.jpg" width="1200"></td></tr>
<tr align="center"><td colspan="6"><img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/comparison/137.jpg" width="1200"></td></tr>
<tr align="center"><td colspan="6"><img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/comparison/166.jpg" width="1200"></td></tr>
</table>

</details>

---

### πŸ’Ό Comparison with Commercial Models

<table>
<tr align="center">
<td width="200"><b>LQ</b></td>
<td width="200"><b>HYPIR</b></td>
<td width="200"><b>Topaz</b></td>
<td width="200"><b>Gemini-NanoBanana</b></td>
<td width="200"><b>GPT-4o</b></td>
<td width="200"><b>Ours</b></td>
</tr>
<tr align="center"><td colspan="6"><img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/commercial_comparison/commercial_061.jpg" width="1200"></td></tr>
<tr align="center"><td colspan="6"><img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/commercial_comparison/commercial_094.jpg" width="1200"></td></tr>
<tr align="center"><td colspan="6"><img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/commercial_comparison/commercial_205.jpg" width="1200"></td></tr>
<tr align="center"><td colspan="6"><img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/commercial_comparison/commercial_209.jpg" width="1200"></td></tr>
</table>

<details>
<summary>Show more examples</summary>

<table>
<tr align="center"><td colspan="6"><img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/commercial_comparison/commercial_062.jpg" width="1200"></td></tr>
<tr align="center"><td colspan="6"><img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/commercial_comparison/commercial_160.jpg" width="1200"></td></tr>
<tr align="center"><td colspan="6"><img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/commercial_comparison/commercial_111.jpg" width="1200"></td></tr>
<tr align="center"><td colspan="6"><img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/commercial_comparison/commercial_123.jpg" width="1200"></td></tr>
</table>

</details>
</div>

---

## πŸ—οΈ Model Architecture

<div align="center">
<img src="https://raw.githubusercontent.com/W2GenAI-Lab/LucidFlux/main/images/framework/framework.png" alt="LucidFlux Framework Overview" width="1200"/>
<br>
<em><strong>Caption-Free Universal Image Restoration with a Large-Scale Diffusion Transformer</strong></em>
</div>

Our unified framework consists of **four critical components in the training workflow**:

**πŸ”€ Scaling Up Real-World High-Quality Data for Universal Image Restoration**

**🎨 Two Parallel Lightweight Condition Module Branches for Low-Quality Image Conditioning**

**🎯 Timestep- and Layer-Adaptive Condition Injection**

**πŸ”„ Semantic Priors from SigLIP for Caption-Free Semantic Alignment**

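The injection idea in the third component can be sketched in a few lines of PyTorch. The snippet below is an illustrative sketch only, not the LucidFlux implementation: the module and parameter names (`AdaptiveConditionInjector`, `layer_gates`, `time_mlp`) are hypothetical. It shows one plausible way to gate a per-layer condition residual by weights derived from the diffusion timestep and from the depth of the DiT block being fed.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: names and wiring are hypothetical, not the actual
# LucidFlux code. A condition residual is scaled by a timestep-dependent factor
# and a learnable per-layer gate before being added to the block's hidden state.
class AdaptiveConditionInjector(nn.Module):
    def __init__(self, hidden_dim: int, num_layers: int):
        super().__init__()
        self.layer_gates = nn.Parameter(torch.zeros(num_layers))  # one gate per DiT block
        self.time_mlp = nn.Sequential(                            # timestep embedding -> scalar
            nn.Linear(hidden_dim, hidden_dim),
            nn.SiLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, hidden: torch.Tensor, cond: torch.Tensor,
                t_emb: torch.Tensor, layer_idx: int) -> torch.Tensor:
        # hidden, cond: (B, tokens, hidden_dim); t_emb: (B, hidden_dim)
        time_scale = torch.sigmoid(self.time_mlp(t_emb)).unsqueeze(1)  # (B, 1, 1)
        layer_scale = torch.sigmoid(self.layer_gates[layer_idx])       # scalar
        return hidden + time_scale * layer_scale * cond


# Tiny smoke test with random tensors.
injector = AdaptiveConditionInjector(hidden_dim=64, num_layers=12)
h = torch.randn(2, 16, 64)  # DiT tokens
c = torch.randn(2, 16, 64)  # condition tokens from the LQ branch
t = torch.randn(2, 64)      # timestep embedding
print(injector(h, c, t, layer_idx=3).shape)  # torch.Size([2, 16, 64])
```
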
## πŸš€ Quick Start

### πŸ”§ Installation

```bash
# Clone the repository
git clone https://github.com/W2GenAI-Lab/LucidFlux.git
cd LucidFlux

# Create conda environment
conda create -n lucidflux python=3.9
conda activate lucidflux

# Install dependencies
pip install -r requirements.txt
```
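
As a quick sanity check before moving on, you can confirm that PyTorch installed correctly and that a GPU is visible (this assumes `requirements.txt` pulls in `torch`, which inference requires):

```python
# Post-install sanity check (assumes requirements.txt installs torch).
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```
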

### Inference
Download the required checkpoints and point the code at your local copies:

- **Flux.1 dev** β†’ [πŸ€— FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev)
  Then update the model path in the `configs` entry for `flux-dev` in `src/flux/util.py` to your local FLUX.1-dev model path (a path-bookkeeping sketch follows this list).

- **T5** β†’ [πŸ€— T5](https://huggingface.co/XLabs-AI/xflux_text_encoders)
  Then update the T5 path in the function `load_t5` in `src/flux/util.py` to your local T5 path.

- **CLIP** β†’ [πŸ€— CLIP](https://huggingface.co/openai/clip-vit-large-patch14)
  Then update the CLIP path in the function `load_clip` in `src/flux/util.py` to your local CLIP path.

- **SigLIP** β†’ [πŸ€— siglip2-so400m-patch16-512](https://huggingface.co/google/siglip2-so400m-patch16-512)
  Then set `siglip_ckpt` to the corresponding local path.

- **SwinIR** β†’ [πŸ€— SwinIR](https://huggingface.co/lxq007/DiffBIR/blob/main/general_swinir_v1.ckpt)
  Then set `swin_ir_ckpt` to the corresponding local path.

- **LucidFlux** β†’ [πŸ€— LucidFlux](https://huggingface.co/W2GenAI/LucidFlux)
  Then set `checkpoint` to the corresponding local path.

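As referenced in the first item above, it can help to gather all six local paths in one place before wiring them into `src/flux/util.py` and the inference flags. The mapping below is purely illustrative: none of these names come from the LucidFlux code, and the exact structures inside `util.py` may differ, so inspect that file and substitute your own paths.

```python
# Hypothetical bookkeeping for the six checkpoints this README asks you to download.
# Keys and paths are illustrative only; comments map each download to the place
# that must point at it.
LOCAL_PATHS = {
    "flux-dev": "/models/FLUX.1-dev",                # `configs` entry for flux-dev in src/flux/util.py
    "t5": "/models/xflux_text_encoders",             # `load_t5` in src/flux/util.py
    "clip": "/models/clip-vit-large-patch14",        # `load_clip` in src/flux/util.py
    "siglip": "/models/siglip2-so400m-patch16-512",  # --siglip_ckpt
    "swinir": "/models/general_swinir_v1.ckpt",      # --swinir_pretrained
    "lucidflux": "/models/lucidflux.pth",            # --checkpoint
}
```
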
```bash
# inference.sh

result_dir=output_images_folder
input_folder=input_images_folder
checkpoint_path=path/to/lucidflux.pth
swin_ir_ckpt=path/to/swinir.ckpt
siglip_ckpt=path/to/siglip.ckpt

mkdir -p ${result_dir}
echo "Processing checkpoint..."
python inference.py \
    --checkpoint ${checkpoint_path} \
    --swinir_pretrained ${swin_ir_ckpt} \
    --control_image ${input_folder} \
    --siglip_ckpt ${siglip_ckpt} \
    --prompt "restore this image into high-quality, clean, high-resolution result" \
    --output_dir ${result_dir}/ \
    --width 1024 --height 1024 --num_steps 50
```

Finally, run `bash inference.sh`. You can also obtain LucidFlux's results on RealSR and RealLQ250 from Hugging Face: [**LucidFlux**](https://huggingface.co/W2GenAI/LucidFlux).

## πŸͺͺ License

The provided code and pre-trained weights are licensed under the [FLUX.1 \[dev\] license](LICENSE).

## πŸ™ Acknowledgments

- This code is based on [FLUX](https://github.com/black-forest-labs/flux). Some code is adapted from [DreamClear](https://github.com/shallowdream204/DreamClear) and [x-flux](https://github.com/XLabs-AI/x-flux). We thank the authors for their excellent work.
- πŸ›οΈ Thanks to our affiliated institutions for their support.
- 🀝 Special thanks to the open-source community for inspiration.

---

## πŸ“¬ Contact

For any questions or inquiries, please reach out to us:

- **Song Fei**: `[email protected]`
- **Tian Ye**: `[email protected]`

## πŸ§‘β€πŸ€β€πŸ§‘ WeChat Group
<details>
<summary>Show the WeChat group QR code</summary>

<br>

<img src="https://github.com/user-attachments/assets/047faa4e-da63-415c-97a0-8dbe8045a839"
     alt="WeChat Group QR"
     width="320">
</details>