Timestep-Aware Semantic Alignment for Gemma Encoder

  • uses a bidirectional attention mask, with padded tokens excluded from attention
  • trained on text-image pairs that specifically target low- or high-frequency details depending on the diffusion timestep
  • explicit timestep conditioning integrated into the encoder (see the sketch after this list)
  • as in previous releases, the dataset was cleaned of overused phrases and stop words
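
A minimal, self-contained sketch of the encoder-side ideas above. Every name and size here is an assumption for illustration, and a toy TransformerEncoder stands in for the actual Gemma backbone; this is not this repository's code.

import torch
import torch.nn as nn

class TimestepAwareEncoder(nn.Module):
    def __init__(self, vocab_size=256000, hidden=512, layers=4, heads=8):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, hidden)
        # no causal mask is ever applied, so attention is bidirectional
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(hidden, heads, batch_first=True),
            num_layers=layers,
        )
        # maps a scalar timestep in [0, 1] to a learned conditioning vector
        self.t_embed = nn.Sequential(
            nn.Linear(1, hidden), nn.SiLU(), nn.Linear(hidden, hidden)
        )

    def forward(self, input_ids, attention_mask, t):
        x = self.tok(input_ids)
        # explicit timestep conditioning: add the embedding to every token
        x = x + self.t_embed(t.view(-1, 1, 1).float())
        # padded tokens are masked out of attention on the key side;
        # real tokens still attend to each other in both directions
        return self.encoder(x, src_key_padding_mask=~attention_mask.bool())

The same normalized timestep t is what the usage snippet below passes at inference time.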
Usage

import torch

prompt_embeds = encoder.encode(text, t=torch.ones((1,)))  # initial embeddings at t=1

# re-encode at every denoising step with the normalized timestep
callback_on_step_end = lambda pipe, step, timestep, callback_kwargs: {
    'prompt_embeds': encoder.encode(text, t=timestep / 1000.0)
}
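
One possible wiring into a diffusers pipeline call. This is an assumed usage, relying on the pipeline accepting prompt_embeds updates through its step-end callback (the standard Stable Diffusion pipelines list "prompt_embeds" among their callback tensor inputs):

image = pipe(
    prompt_embeds=prompt_embeds,
    callback_on_step_end=callback_on_step_end,
    callback_on_step_end_tensor_inputs=["prompt_embeds"],
).images[0]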

References

  • ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment (arXiv:2403.05135)

Datasets

  • animelover/touhou-images
  • animesfw
  • alfredplpl/artbench-pd-256x256
  • anime-art-multicaptions (multicharacter interactions)
  • colormix (synthetic color, fashion dataset)
  • danbooru2023-florence2-caption (verb, action clauses)
  • spatial-caption
  • spright-coco
  • picollect
  • pixiv rank
  • trojblue/danbooru2025-metadata