Timestep-Aware Semantic Alignment for Gemma Encoder
- uses a bidirectional attention mask, with padded tokens masked out
- trained on text-image pairs targeting low- and high-frequency details according to the diffusion timestep
- explicit timestep conditioning integrated into the encoder
- as in previous releases, the dataset was cleaned of overused phrases and stop words
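The bullets above can be sketched as a minimal module. This is an illustrative sketch only, assuming a sinusoidal embedding of a normalized timestep added to every token position before bidirectional self-attention; the class and parameter names (`TimestepAwareEncoder`, `hidden_dim`, `pad_mask`) are hypothetical and not the model's actual API or architecture.

```python
import torch
import torch.nn as nn

class TimestepAwareEncoder(nn.Module):
    """Illustrative sketch (not the released architecture): inject a
    timestep embedding into a bidirectional text encoder."""

    def __init__(self, hidden_dim=256, vocab_size=1000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        # Bidirectional self-attention over the (padded) token sequence
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads=4, batch_first=True)
        self.t_proj = nn.Linear(hidden_dim, hidden_dim)

    def timestep_embedding(self, t, dim):
        # Standard sinusoidal embedding of a normalized timestep t in [0, 1]
        half = dim // 2
        freqs = torch.exp(-torch.arange(half) * torch.log(torch.tensor(10000.0)) / half)
        args = t[:, None] * freqs[None, :]
        return torch.cat([torch.cos(args), torch.sin(args)], dim=-1)

    def forward(self, token_ids, t, pad_mask=None):
        h = self.embed(token_ids)
        # Add the timestep signal to every token position
        h = h + self.t_proj(self.timestep_embedding(t, h.shape[-1]))[:, None, :]
        # Bidirectional attention; key_padding_mask hides padded tokens
        out, _ = self.attn(h, h, h, key_padding_mask=pad_mask)
        return out
```

The key design point is that the same prompt yields different embeddings at different timesteps, letting the encoder emphasize coarse structure early in denoising and fine detail late.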
# Encode once with a fixed timestep (t normalized to [0, 1])
prompt_embeds = encoder.encode(text, t=torch.ones((1,)))

# Or re-encode at every denoising step via the pipeline callback;
# the raw scheduler timestep is normalized by the 1000 training timesteps
callback_on_step_end = lambda pipe, i, t, kwargs: {
    'prompt_embeds': encoder.encode(text, t=t / 1000.0)
}
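The per-step re-encoding pattern above can be sketched end-to-end with a stand-in encoder. This is a hedged sketch: `DummyEncoder` replaces the real Gemma encoder, `make_callback` is a hypothetical helper, and the 1000 divisor assumes a scheduler with 1000 training timesteps. In a real diffusers pipeline, `callback_on_step_end` receives `(pipeline, step_index, timestep, callback_kwargs)` and the returned dict overrides tensors for the next step.

```python
class DummyEncoder:
    # Stand-in for the timestep-aware encoder's encode(text, t) method
    def encode(self, text, t):
        return {"text": text, "t": float(t)}

def make_callback(encoder, text, num_train_timesteps=1000):
    # Normalize the raw scheduler timestep to [0, 1] before re-encoding
    def callback(pipe, step_index, timestep, callback_kwargs):
        return {"prompt_embeds": encoder.encode(text, t=timestep / num_train_timesteps)}
    return callback

cb = make_callback(DummyEncoder(), "1girl, solo")
out = cb(None, 0, 500, {})
# out["prompt_embeds"]["t"] is 0.5 at the halfway timestep
```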
References
- arXiv:2403.05135
Datasets
- animelover/touhou-images
- animesfw
- alfredplpl/artbench-pd-256x256
- anime-art-multicaptions (multicharacter interactions)
- colormix (synthetic color, fashion dataset)
- danbooru2023-florence2-caption (verb, action clauses)
- spatial-caption
- spright-coco
- picollect
- pixiv rank
- trojblue/danbooru2025-metadata