---
license: other
license_name: flux-1-dev-non-commercial-license
license_link: https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/LICENSE.md
language:
  - en
base_model:
  - black-forest-labs/FLUX.1-dev
pipeline_tag: text-to-image
tags:
  - flux.1-dev
  - flux
  - text-to-image
  - multi-subject
  - FOCUS
  - flow-matching
  - optimal-control
  - fine-tuned
---

# FLUX.1 [dev] + FOCUS

FLUX.1 [dev] fine-tuned for multi-subject prompts

**TL;DR:** A fine-tuned derivative of [black-forest-labs/FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev) focused on multi-subject fidelity: keeping multiple entities and their attributes disentangled while preserving the base model's style. Works across animals, people, and objects.
Read the paper: [Optimal Control Meets Flow Matching: A Principled Route to Multi-Subject Fidelity](https://arxiv.org/abs/2510.02315).

⚠️ **Licensing:** This model inherits the FLUX.1 [dev] Non-Commercial License from the base model and is distributed under compatible terms. Use is subject to the [base model's license](https://huggingface.co/black-forest-labs/FLUX.1-dev/resolve/main/LICENSE.md).


## What’s improved

- Entity disentanglement: better separation across 2–4 subjects, fewer merges/omissions.
- Attribute binding: colors, clothing, and small accessories stick to the correct subject.
- Single-subject generation: also improved, while staying stylistically close to the base model.
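
For instance, prompts along these lines stress both behaviors (illustrative examples of ours, not from the paper); see the Quick start below for the full pipeline setup:

```python
# Illustrative multi-subject prompts (hypothetical examples, not from the paper);
# pass any of them as `prompt=` in the Quick start pipeline below.
prompts = [
    "A lion and a tiger resting side by side in a jungle clearing",
    "A corgi in a red scarf next to a gray cat wearing a blue collar",
    "Three friends on a bench: one in a yellow raincoat, one with a green umbrella, one holding a small brown dog",
]
```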

## Quick start (Diffusers)

Install the 🧨 Diffusers and Transformers libraries:

```bash
pip install -U transformers==4.53.0 diffusers==0.33.1
```

Then:

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "ericbill21/focus_flux",
    torch_dtype=torch.bfloat16,
).to("cuda")
# For smaller GPUs, use pipe.enable_sequential_cpu_offload() instead of .to("cuda")

image = pipe(
    prompt="A lion and a tiger resting side by side in a jungle clearing",
    num_inference_steps=28,
    guidance_scale=3.5,
    max_sequence_length=256,
    height=512,
    width=512,
    generator=torch.Generator("cpu").manual_seed(5),
).images[0]

image.save("sample.png")
```

Since this uses the standard Diffusers pipeline, you can apply features like xFormers attention, VAE tiling/slicing, and quantization as usual.
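
For example, here is a minimal sketch of a few optional memory savers (standard Diffusers APIs; whether each helps depends on your hardware):

```python
# Optional memory savers (standard Diffusers APIs; a sketch, adjust to your setup).
pipe.vae.enable_slicing()        # decode batched latents one image at a time
pipe.vae.enable_tiling()         # decode large images in tiles to cap peak VRAM
pipe.enable_model_cpu_offload()  # use instead of .to("cuda"): submodules move to GPU only when needed
# pipe.enable_xformers_memory_efficient_attention()  # if xformers is installed and supported
```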

## How was this achieved?

We cast multi-subject fidelity as a stochastic optimal control problem over flow-matching samplers and fine-tune via FOCUS (an adjoint-matching heuristic). A lightweight controller is trained to respect subject identity, attributes, and spatial relations while staying close to the base distribution, yielding improved multi-subject fidelity without sacrificing style. Full details and ablations are in the paper and code.
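
In generic stochastic optimal control terms (our notation, a schematic of this kind of setup rather than the paper's exact objective), one learns a control \(u\) on top of the frozen base velocity field \(v_{\text{base}}\), trading a fidelity reward \(r\) against a quadratic control cost that keeps sampling close to the base model:

$$
\min_{u}\; \mathbb{E}\!\left[\int_0^1 \tfrac{1}{2}\,\lVert u(x_t, t)\rVert^2 \,\mathrm{d}t \;-\; r(x_1)\right]
\quad \text{s.t.} \quad
\mathrm{d}x_t = \bigl(v_{\text{base}}(x_t, t) + u(x_t, t)\bigr)\,\mathrm{d}t + \sigma_t\,\mathrm{d}W_t .
$$

The quadratic cost term is what keeps the fine-tuned sampler near the base distribution, matching the "without sacrificing style" behavior described above.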

## Model details

- Base: [black-forest-labs/FLUX.1-dev](https://huggingface.co/black-forest-labs/FLUX.1-dev)
- Type: full pipeline (no LoRA required at inference)
- Intended use: research and creative work where multi-subject consistency matters
- Limitations: under extreme clutter or with highly similar subjects, attributes may still leak; biases of the base model may persist.

## Citation

If you find this useful, please cite:

```bibtex
@article{Bill2025FOCUS,
  title   = {Optimal Control Meets Flow Matching: A Principled Route to Multi-Subject Fidelity},
  author  = {Eric Tillmann Bill and Enis Simsar and Thomas Hofmann},
  journal = {arXiv preprint arXiv:2510.02315},
  year    = {2025},
  url     = {https://arxiv.org/abs/2510.02315}
}
```

## Contact

Feedback and issues welcome via the Hugging Face model page or GitHub.