Recurrent Gemma encoder as CLIP-L

Inference

from transformers import AutoTokenizer

# RecurrentEncoderModel is assumed to be the custom encoder class shipped with
# this repository (it is not part of standard transformers).
encoder_path = 'nightknocker/recurrent-t5gemma-l-l-ul2-encoder'
tokenizer = AutoTokenizer.from_pretrained(encoder_path)
encoder = RecurrentEncoderModel.from_pretrained(encoder_path)

text = 'a photo of a cat'  # example prompt
inputs = tokenizer(
    [text],
    max_length=77,  # or any longer sequence length
    padding='max_length',
    truncation=True,
    return_tensors='pt',
)

# `pipeline` is a pre-loaded text-to-image diffusion pipeline (see the sketch below);
# its CLIP-L prompt embeddings are replaced by this encoder's hidden states.
image = pipeline(
    prompt=None,
    prompt_embeds=encoder(**inputs).last_hidden_state,
).images[0]
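
The snippet above assumes a `pipeline` object is already loaded. Below is a minimal sketch of one way to construct it with diffusers; the base checkpoint, device, and dtype are illustrative assumptions rather than part of this card, and the encoder's hidden size is assumed to match the pipeline's expected prompt-embedding width (768 for CLIP-L-conditioned pipelines).

import torch
from diffusers import StableDiffusionPipeline

# Hypothetical base model; any CLIP-L-conditioned pipeline with a matching
# cross-attention width could be used instead.
pipeline = StableDiffusionPipeline.from_pretrained(
    'runwayml/stable-diffusion-v1-5',
    torch_dtype=torch.float16,
).to('cuda')

# Encode the prompt with the recurrent encoder instead of the pipeline's CLIP-L,
# then match the pipeline's device and dtype before generating.
with torch.no_grad():
    prompt_embeds = encoder(**inputs).last_hidden_state
prompt_embeds = prompt_embeds.to(device='cuda', dtype=torch.float16)

image = pipeline(prompt=None, prompt_embeds=prompt_embeds).images[0]

If a sequence length longer than 77 is used, also pass a negative_prompt_embeds tensor of the same length, since a Stable Diffusion pipeline's built-in negative-prompt encoding comes from its CLIP text encoder, which is limited to 77 tokens.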

References

  • arXiv:2504.06225
  • arXiv:2511.06876
  • arXiv:2511.07384

Datasets

  • artbench-pd-256x256
  • colormix (synthetic color/fashion dataset)
  • danbooru2023-florence2-caption (verb/action clauses)
  • danbooru2025-1000-balanced-448px
  • spatial-caption
  • spright-coco
  • benchmarks from the Qwen-Image Technical Report