Recurrent Gemma encoder as CLIP-L
Inference
```python
from transformers import AutoTokenizer

# `RecurrentEncoderModel`, the prompt string `text`, and the diffusion
# `pipeline` are assumed to be defined/loaded elsewhere.
encoder_path = 'nightknocker/recurrent-t5gemma-l-l-ul2-encoder'
tokenizer = AutoTokenizer.from_pretrained(encoder_path)
encoder = RecurrentEncoderModel.from_pretrained(encoder_path)

inputs = tokenizer.batch_encode_plus(
    [text],
    max_length=77,  # or any longer seq length
    padding='max_length',
    truncation=True,
    return_tensors='pt',
)

# Pass the encoder's hidden states in place of the CLIP-L text embeddings.
image = pipeline(
    None,
    prompt_embeds=encoder(**inputs).last_hidden_state,
).images[0]
```
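For clarity on what `padding='max_length'` with `truncation=True` does above: every prompt is cut or padded to exactly `max_length` tokens, with the attention mask marking which positions are real. A minimal pure-Python sketch of that behaviour (toy token ids and a hypothetical `pad_id`, not the real tokenizer):

```python
def pad_or_truncate(ids, max_length=77, pad_id=0):
    """Mimic tokenizer behaviour with padding='max_length', truncation=True."""
    ids = ids[:max_length]                # truncate overlong input
    mask = [1] * len(ids)                 # 1 = real token
    pad = max_length - len(ids)
    return ids + [pad_id] * pad, mask + [0] * pad

ids, mask = pad_or_truncate([101, 7592, 102], max_length=5)
# ids  -> [101, 7592, 102, 0, 0]
# mask -> [1, 1, 1, 0, 0]
```

Longer sequence lengths simply move the cut/pad point; the encoder output will then have that many positions in its `last_hidden_state`.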
References
- arXiv:2504.06225
- arXiv:2511.06876
- arXiv:2511.07384
Datasets
- artbench-pd-256x256
- colormix (synthetic color and fashion dataset)
- danbooru2023-florence2-caption (verb and action clauses)
- danbooru2025-1000-balanced-448px
- spatial-caption
- spright-coco
- benchmarks from the Qwen-Image Technical Report