Recurrent Gemma encoder as CLIP-L

Inference

from transformers import AutoTokenizer

# RecurrentEncoderModel is assumed to be the custom encoder class shipped with
# this repository (it is not part of standard transformers).
encoder_path = 'nightknocker/recurrent-t5gemma-l-l-ul2-encoder'
tokenizer = AutoTokenizer.from_pretrained(encoder_path)
encoder = RecurrentEncoderModel.from_pretrained(encoder_path)

text = 'a photo of a cat'  # example prompt
inputs = tokenizer(
    [text],
    max_length=77,  # or any longer sequence length
    padding='max_length',
    truncation=True,
    return_tensors='pt',
)

# `pipeline` is a pre-loaded text-to-image diffusion pipeline (see the sketch below);
# its CLIP-L prompt embeddings are replaced by this encoder's hidden states.
image = pipeline(
    prompt=None,
    prompt_embeds=encoder(**inputs).last_hidden_state,
).images[0]
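
The snippet above assumes a `pipeline` object is already loaded. Below is a minimal sketch of one way to construct it with diffusers; the base checkpoint, device, and dtype are illustrative assumptions rather than part of this card, and the encoder's hidden size is assumed to match the pipeline's expected prompt-embedding width (768 for CLIP-L-conditioned pipelines).

import torch
from diffusers import StableDiffusionPipeline

# Hypothetical base model; any CLIP-L-conditioned pipeline with a matching
# cross-attention width could be used instead.
pipeline = StableDiffusionPipeline.from_pretrained(
    'runwayml/stable-diffusion-v1-5',
    torch_dtype=torch.float16,
).to('cuda')

# Encode the prompt with the recurrent encoder instead of the pipeline's CLIP-L,
# then match the pipeline's device and dtype before generating.
with torch.no_grad():
    prompt_embeds = encoder(**inputs).last_hidden_state
prompt_embeds = prompt_embeds.to(device='cuda', dtype=torch.float16)

image = pipeline(prompt=None, prompt_embeds=prompt_embeds).images[0]

If a sequence length longer than 77 is used, also pass a negative_prompt_embeds tensor of the same length, since a Stable Diffusion pipeline's built-in negative-prompt encoding comes from its CLIP text encoder, which is limited to 77 tokens.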

References

  • arXiv:2504.06225
  • arXiv:2511.06876
  • arXiv:2511.07384

Datasets

  • artbench-pd-256x256
  • colormix (synthetic color/fashion dataset)
  • danbooru2023-florence2-caption (verb/action clauses)
  • danbooru2025-1000-balanced-448px
  • spatial-caption
  • spright-coco
  • benchmarks from the Qwen-Image Technical Report